* 2.6.31-rc7-git2: Reported regressions from 2.6.30
@ 2009-08-25 20:00 ` Rafael J. Wysocki
0 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:00 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Adrian Bunk, Andrew Morton, Linus Torvalds, Natalie Protasevich,
Kernel Testers List, Network Development, Linux ACPI,
Linux PM List, Linux SCSI List, Linux Wireless List, DRI
This message contains a list of some regressions from 2.6.30, for which there
are no fixes in the mainline I know of. If any of them have been fixed already,
please let me know.
If you know of any other unresolved regressions from 2.6.30, please let me know
as well and I'll add them to the list. Also, please let me know if any of the
entries below are invalid.
Each entry on the list will also be sent as an automatic reply to this
message, with CCs to the people involved in reporting and handling the
issue.
Listed regressions statistics:
  Date          Total  Pending  Unresolved
  ----------------------------------------
  2009-08-26      108       33          26
  2009-08-20      102       32          29
  2009-08-10       89       27          24
  2009-08-02       76       36          28
  2009-07-27       70       51          43
  2009-07-07       35       25          21
  2009-06-29       22       22          15
Unresolved regressions
----------------------
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14060
Subject : oops: sysfs_remove_link and i915
Submitter : Dominik Brodowski <linux@dominikbrodowski.net>
Date : 2009-08-22 5:48 (4 days old)
References : http://marc.info/?l=linux-kernel&m=125092139113955&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14058
Subject : Oops in fsnotify
Submitter : Grant Wilson <grant.wilson@zen.co.uk>
Date : 2009-08-20 15:48 (6 days old)
References : http://marc.info/?l=linux-kernel&m=125078450923133&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14057
Subject : Strange network timeouts w/ e100
Submitter : Walt Holman <walt@holmansrus.com>
Date : 2009-08-20 0:21 (6 days old)
References : http://marc.info/?l=linux-kernel&m=125072831831443&w=4
Handled-By : Krzysztof Halasa <khc@pm.waw.pl>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14031
Subject : dvb_usb_af9015: Oops on hotplugging
Submitter : Stefan Lippers-Hollmann <s.L-H@gmx.de>
Date : 2009-08-05 20:32 (21 days old)
References : http://marc.info/?l=linux-kernel&m=124949716608828&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14018
Subject : kernel freezes, inotify problem
Submitter : Christoph Thielecke <christoph.thielecke@gmx.de>
Date : 2009-08-19 12:48 (7 days old)
References : http://marc.info/?l=linux-kernel&m=125068616818353&w=4
Handled-By : Eric Paris <eparis@parisplace.org>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14016
Subject : mm/ipw2200 regression
Submitter : Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Date : 2009-08-15 16:56 (11 days old)
References : http://marc.info/?l=linux-kernel&m=125036437221408&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14015
Subject : pty regressed again, breaking expect and gcc's testsuite
Submitter : Mikael Pettersson <mikpe@it.uu.se>
Date : 2009-08-14 23:41 (12 days old)
References : http://marc.info/?l=linux-kernel&m=125029329805643&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14013
Subject : hd don't show up
Submitter : Tim Blechmann <tim@klingt.org>
Date : 2009-08-14 8:26 (12 days old)
References : http://marc.info/?l=linux-kernel&m=125023842514480&w=4
Handled-By : Tejun Heo <tj@kernel.org>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14012
Subject : latest git fried my x86_64 imac
Submitter : Justin P. Mattock <justinmattock@gmail.com>
Date : 2009-08-13 07:20 (13 days old)
References : http://marc.info/?l=linux-kernel&m=125014080427090&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14011
Subject : Kernel paging request failed in kmem_cache_alloc
Submitter : Matthias Dahl <ml_kernel@mortal-soul.de>
Date : 2009-08-10 22:26 (16 days old)
References : http://marc.info/?l=linux-kernel&m=124993603825082&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13987
Subject : Received NMI interrupt at resume
Submitter : Christian Casteyde <casteyde.christian@free.fr>
Date : 2009-08-15 07:55 (11 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13950
Subject : Oops when USB Serial disconnected while in use
Submitter : Bruno Prémont <bonbons@linux-vserver.org>
Date : 2009-08-08 17:47 (18 days old)
References : http://marc.info/?l=linux-kernel&m=124975432900466&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13943
Subject : WARNING: at net/mac80211/mlme.c:2292 with ath5k
Submitter : Fabio Comolli <fabio.comolli@gmail.com>
Date : 2009-08-06 20:15 (20 days old)
References : http://marc.info/?l=linux-kernel&m=124958978600600&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13942
Subject : Troubles with AoE and uninitialized object
Submitter : Bruno Prémont <bonbons@linux-vserver.org>
Date : 2009-08-04 10:12 (22 days old)
References : http://marc.info/?l=linux-kernel&m=124938117104811&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13941
Subject : x86 Geode issue
Submitter : Martin-Éric Racine <q-funk@iki.fi>
Date : 2009-08-03 12:58 (23 days old)
References : http://marc.info/?l=linux-kernel&m=124930434732481&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13940
Subject : iwlagn and sky2 stopped working, ACPI-related
Submitter : Ricardo Jorge da Fonseca Marques Ferreira <storm@sys49152.net>
Date : 2009-08-07 22:33 (19 days old)
References : http://marc.info/?l=linux-kernel&m=124968457731107&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13935
Subject : 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version)
Submitter : Adrian Ulrich <kernel@blinkenlights.ch>
Date : 2009-08-08 22:08 (18 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fa047e4f6fa63a6e9d0ae4d7749538830d14a343
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13906
Subject : Huawei E169 GPRS connection causes Ooops
Submitter : Clemens Eisserer <linuxhippy@gmail.com>
Date : 2009-08-04 09:02 (22 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13869
Subject : Radeon framebuffer (w/o KMS) corruption at boot.
Submitter : Duncan <1i5t5.duncan@cox.net>
Date : 2009-07-29 16:44 (28 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13848
Subject : iwlwifi (4965) regression since 2.6.30
Submitter : Lukas Hejtmanek <xhejtman@ics.muni.cz>
Date : 2009-07-26 7:57 (31 days old)
References : http://marc.info/?l=linux-kernel&m=124859658502866&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13836
Subject : suspend script fails, related to stdout?
Submitter : Tomas M. <tmezzadra@gmail.com>
Date : 2009-07-17 21:24 (40 days old)
References : http://marc.info/?l=linux-kernel&m=124785853811667&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13819
Subject : system freeze when switching to console
Submitter : Reinette Chatre <reinette.chatre@intel.com>
Date : 2009-07-23 17:57 (34 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13809
Subject : oprofile: possible circular locking dependency detected
Submitter : Jerome Marchand <jmarchan@redhat.com>
Date : 2009-07-22 13:35 (35 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13740
Subject : X server crashes with 2.6.31-rc2 when options are changed
Submitter : Michael S. Tsirkin <m.s.tsirkin@gmail.com>
Date : 2009-07-07 15:19 (50 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13733
Subject : 2.6.31-rc2: irq 16: nobody cared
Submitter : Niel Lambrechts <niel.lambrechts@gmail.com>
Date : 2009-07-06 18:32 (51 days old)
References : http://marc.info/?l=linux-kernel&m=124690524027166&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13645
Subject : NULL pointer dereference at (null) (level2_spare_pgt)
Submitter : poornima nayak <mpnayak@linux.vnet.ibm.com>
Date : 2009-06-17 17:56 (70 days old)
References : http://lkml.org/lkml/2009/6/17/194
Regressions with patches
------------------------
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14062
Subject : Failure to boot as xen guest
Submitter : Arnd Hannemann <hannemann@nets.rwth-aachen.de>
Date : 2009-08-25 15:48 (1 day old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=83b519e8b9572c319c8e0c615ee5dd7272856090
References : http://marc.info/?l=linux-kernel&m=125121534229538&w=4
Handled-By : Jeremy Fitzhardinge <jeremy@goop.org>
Patch : http://patchwork.kernel.org/patch/43799/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14061
Subject : Crash due to buggy flat_phys_pkg_id
Submitter : Ravikiran G Thirumalai <kiran@scalex86.org>
Date : 2009-08-24 18:26 (2 days old)
References : http://marc.info/?l=linux-kernel&m=125114085701508&w=4
Handled-By : Yinghai Lu <yinghai@kernel.org>
Patch : http://patchwork.kernel.org/patch/43806/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14030
Subject : Kernel NULL pointer dereference at 0000000000000008, pty-related
Submitter : Eric W. Biederman <ebiederm@xmission.com>
Date : 2009-08-20 5:46 (6 days old)
References : http://marc.info/?l=linux-kernel&m=125074724623423&w=4
Handled-By : Linus Torvalds <torvalds@linux-foundation.org>
Patch : http://patchwork.kernel.org/patch/43679/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14017
Subject : _end symbol missing from Symbol.map
Submitter : Hannes Reinecke <hare@suse.de>
Date : 2009-08-13 6:45 (13 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=091e52c3551d3031343df24b573b770b4c6c72b6
References : http://marc.info/?l=linux-kernel&m=125014649102253&w=4
Handled-By : Hannes Reinecke <hare@suse.de>
Patch : http://marc.info/?l=linux-kernel&m=125014649102253&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13960
Subject : rtl8187 not connect to wifi
Submitter : okias <d.okias@gmail.com>
Date : 2009-08-10 19:16 (16 days old)
Handled-By : Larry Finger <Larry.Finger@lwfinger.net>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=22798
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13948
Subject : ath5k broken after suspend-to-ram
Submitter : Johannes Stezenbach <js@sig21.net>
Date : 2009-08-07 21:51 (19 days old)
References : http://marc.info/?l=linux-kernel&m=124968192727854&w=4
Handled-By : Nick Kossifidis <mickflemm@gmail.com>
Patch : http://patchwork.kernel.org/patch/38550/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13947
Subject : Libertas: Association request to the driver failed
Submitter : Daniel Mack <daniel@caiaq.de>
Date : 2009-08-07 19:11 (19 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=57921c312e8cef72ba35a4cfe870b376da0b1b87
References : http://marc.info/?l=linux-kernel&m=124967234311481&w=4
Handled-By : Roel Kluin <roel.kluin@gmail.com>
             Dan Williams <dcbw@redhat.com>
Patch : http://patchwork.kernel.org/patch/43114/
For details, please visit the bug entries and follow the links given in
references.
As you can see, there is a Bugzilla entry for each of the listed regressions.
There is also a Bugzilla entry used for tracking the regressions from 2.6.30,
unresolved as well as resolved, at:
http://bugzilla.kernel.org/show_bug.cgi?id=13615
Please let me know if there are any Bugzilla entries that should be added to
that list.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #13645] NULL pointer dereference at (null) (level2_spare_pgt)
2009-08-25 20:00 ` Rafael J. Wysocki
(?)
@ 2009-08-25 20:00 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:00 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Kernel Testers List, poornima nayak
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13645
Subject : NULL pointer dereference at (null) (level2_spare_pgt)
Submitter : poornima nayak <mpnayak@linux.vnet.ibm.com>
Date : 2009-06-17 17:56 (70 days old)
References : http://lkml.org/lkml/2009/6/17/194
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #13733] 2.6.31-rc2: irq 16: nobody cared
2009-08-25 20:00 ` Rafael J. Wysocki
(?)
(?)
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Niel Lambrechts
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13733
Subject : 2.6.31-rc2: irq 16: nobody cared
Submitter : Niel Lambrechts <niel.lambrechts@gmail.com>
Date : 2009-07-06 18:32 (51 days old)
References : http://marc.info/?l=linux-kernel&m=124690524027166&w=4
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #13740] X server crashes with 2.6.31-rc2 when options are changed
2009-08-25 20:00 ` Rafael J. Wysocki
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Michael S. Tsirkin
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13740
Subject : X server crashes with 2.6.31-rc2 when options are changed
Submitter : Michael S. Tsirkin <m.s.tsirkin@gmail.com>
Date : 2009-07-07 15:19 (50 days old)
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #13809] oprofile: possible circular locking dependency detected
2009-08-25 20:00 ` Rafael J. Wysocki
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Jerome Marchand
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13809
Subject : oprofile: possible circular locking dependency detected
Submitter : Jerome Marchand <jmarchan@redhat.com>
Date : 2009-07-22 13:35 (35 days old)
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #13819] system freeze when switching to console
2009-08-25 20:00 ` Rafael J. Wysocki
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Kernel Testers List, Eric Anholt, ling.ma, Linus Torvalds,
Ma Ling, Reinette Chatre
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13819
Subject : system freeze when switching to console
Submitter : Reinette Chatre <reinette.chatre@intel.com>
Date : 2009-07-23 17:57 (34 days old)
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #13848] iwlwifi (4965) regression since 2.6.30
2009-08-25 20:00 ` Rafael J. Wysocki
` (5 preceding siblings ...)
(?)
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Lukas Hejtmanek
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13848
Subject : iwlwifi (4965) regression since 2.6.30
Submitter : Lukas Hejtmanek <xhejtman@ics.muni.cz>
Date : 2009-07-26 7:57 (31 days old)
References : http://marc.info/?l=linux-kernel&m=124859658502866&w=4
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #13836] suspend script fails, related to stdout?
2009-08-25 20:00 ` Rafael J. Wysocki
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Tomas M.
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13836
Subject : suspend script fails, related to stdout?
Submitter : Tomas M. <tmezzadra@gmail.com>
Date : 2009-07-17 21:24 (40 days old)
References : http://marc.info/?l=linux-kernel&m=124785853811667&w=4
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13836] suspend script fails, related to stdout?
2009-08-25 20:34 ` Rafael J. Wysocki
@ 2009-08-26 11:10 ` Tomas M.
-1 siblings, 0 replies; 286+ messages in thread
From: Tomas M. @ 2009-08-26 11:10 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: Linux Kernel Mailing List, Kernel Testers List
Yes, this is still the case. The same script (netcfg from Arch Linux)
fails during boot too.
Rafael J. Wysocki wrote:
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13836
Subject : suspend script fails, related to stdout?
Submitter : Tomas M. <tmezzadra@gmail.com>
Date : 2009-07-17 21:24 (40 days old)
References : http://marc.info/?l=linux-kernel&m=124785853811667&w=4
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13836] suspend script fails, related to stdout?
@ 2009-08-26 20:56 ` Rafael J. Wysocki
0 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-26 20:56 UTC (permalink / raw)
To: Tomas M.; +Cc: Linux Kernel Mailing List, Kernel Testers List
On Wednesday 26 August 2009, Tomas M. wrote:
> Yes, this is still the case. The same script (netcfg from Arch Linux)
> fails during boot too.
Thanks for the update.
Rafael
> Rafael J. Wysocki wrote:
>
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.30. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13836
> Subject : suspend script fails, related to stdout?
> Submitter : Tomas M. <tmezzadra@gmail.com>
> Date : 2009-07-17 21:24 (40 days old)
> References : http://marc.info/?l=linux-kernel&m=124785853811667&w=4
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #13935] 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version)
2009-08-25 20:00 ` Rafael J. Wysocki
` (7 preceding siblings ...)
(?)
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Kernel Testers List, Adrian Ulrich, Jan Scholz, Jiri Kosina
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13935
Subject : 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version)
Submitter : Adrian Ulrich <kernel@blinkenlights.ch>
Date : 2009-08-08 22:08 (18 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fa047e4f6fa63a6e9d0ae4d7749538830d14a343
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #13906] Huawei E169 GPRS connection causes Ooops
2009-08-25 20:00 ` Rafael J. Wysocki
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Clemens Eisserer
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13906
Subject : Huawei E169 GPRS connection causes Ooops
Submitter : Clemens Eisserer <linuxhippy@gmail.com>
Date : 2009-08-04 09:02 (22 days old)
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #13869] Radeon framebuffer (w/o KMS) corruption at boot.
2009-08-25 20:00 ` Rafael J. Wysocki
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Duncan
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13869
Subject : Radeon framebuffer (w/o KMS) corruption at boot.
Submitter : Duncan <1i5t5.duncan@cox.net>
Date : 2009-07-29 16:44 (28 days old)
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #13941] x86 Geode issue
2009-08-25 20:00 ` Rafael J. Wysocki
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Kernel Testers List, Al Viro, Martin-Éric Racine
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13941
Subject : x86 Geode issue
Submitter : Martin-Éric Racine <q-funk@iki.fi>
Date : 2009-08-03 12:58 (23 days old)
References : http://marc.info/?l=linux-kernel&m=124930434732481&w=4
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13941] x86 Geode issue
2009-08-25 20:34 ` Rafael J. Wysocki
(?)
@ 2009-08-25 23:37 ` Martin-Éric Racine
2009-08-26 20:59 ` Rafael J. Wysocki
-1 siblings, 1 reply; 286+ messages in thread
From: Martin-Éric Racine @ 2009-08-25 23:37 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: Linux Kernel Mailing List, Kernel Testers List, Al Viro
Yes, it still is valid.
Screen dumps of the kernel panic were provided. Can we get anyone to
investigate them now and find a fix? Thanks!
On Tue, Aug 25, 2009 at 11:34 PM, Rafael J. Wysocki<rjw@sisk.pl> wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.30. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13941
> Subject : x86 Geode issue
> Submitter : Martin-Éric Racine <q-funk@iki.fi>
> Date : 2009-08-03 12:58 (23 days old)
> References : http://marc.info/?l=linux-kernel&m=124930434732481&w=4
>
>
>
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13941] x86 Geode issue
@ 2009-08-26 20:59 ` Rafael J. Wysocki
0 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-26 20:59 UTC (permalink / raw)
To: q-funk; +Cc: Linux Kernel Mailing List, Kernel Testers List, Al Viro
On Wednesday 26 August 2009, Martin-Éric Racine wrote:
> Yes, it still is valid.
Thanks for the update.
Rafael
> On Tue, Aug 25, 2009 at 11:34 PM, Rafael J. Wysocki<rjw@sisk.pl> wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.30. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13941
> > Subject : x86 Geode issue
> > Submitter : Martin-Éric Racine <q-funk@iki.fi>
> > Date : 2009-08-03 12:58 (23 days old)
> > References : http://marc.info/?l=linux-kernel&m=124930434732481&w=4
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #13943] WARNING: at net/mac80211/mlme.c:2292 with ath5k
2009-08-25 20:00 ` Rafael J. Wysocki
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Kernel Testers List, Fabio Comolli, Luis R. Rodriguez
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13943
Subject : WARNING: at net/mac80211/mlme.c:2292 with ath5k
Submitter : Fabio Comolli <fabio.comolli@gmail.com>
Date : 2009-08-06 20:15 (20 days old)
References : http://marc.info/?l=linux-kernel&m=124958978600600&w=4
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13943] WARNING: at net/mac80211/mlme.c:2292 with ath5k
2009-08-25 20:34 ` Rafael J. Wysocki
@ 2009-08-26 6:39 ` Fabio Comolli
-1 siblings, 0 replies; 286+ messages in thread
From: Fabio Comolli @ 2009-08-26 6:39 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux Kernel Mailing List, Kernel Testers List, Luis R. Rodriguez
Still present as of -rc6-git7 (didn't try -rc7)
On Tue, Aug 25, 2009 at 10:34 PM, Rafael J. Wysocki<rjw@sisk.pl> wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.30. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13943
> Subject : WARNING: at net/mac80211/mlme.c:2292 with ath5k
> Submitter : Fabio Comolli <fabio.comolli@gmail.com>
> Date : 2009-08-06 20:15 (20 days old)
> References : http://marc.info/?l=linux-kernel&m=124958978600600&w=4
>
>
>
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13943] WARNING: at net/mac80211/mlme.c:2292 with ath5k
@ 2009-08-26 21:00 ` Rafael J. Wysocki
0 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-26 21:00 UTC (permalink / raw)
To: Fabio Comolli
Cc: Linux Kernel Mailing List, Kernel Testers List, Luis R. Rodriguez
On Wednesday 26 August 2009, Fabio Comolli wrote:
> Still present as of -rc6-git7 (didn't try -rc7)
Thanks for the update.
Rafael
> On Tue, Aug 25, 2009 at 10:34 PM, Rafael J. Wysocki<rjw@sisk.pl> wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.30. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13943
> > Subject : WARNING: at net/mac80211/mlme.c:2292 with ath5k
> > Submitter : Fabio Comolli <fabio.comolli@gmail.com>
> > Date : 2009-08-06 20:15 (20 days old)
> > References : http://marc.info/?l=linux-kernel&m=124958978600600&w=4
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #13942] Troubles with AoE and uninitialized object
2009-08-25 20:00 ` Rafael J. Wysocki
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Bruno Prémont
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13942
Subject : Troubles with AoE and uninitialized object
Submitter : Bruno Prémont <bonbons@linux-vserver.org>
Date : 2009-08-04 10:12 (22 days old)
References : http://marc.info/?l=linux-kernel&m=124938117104811&w=4
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #13940] iwlagn and sky2 stopped working, ACPI-related
2009-08-25 20:00 ` Rafael J. Wysocki
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Kernel Testers List, Ricardo Jorge da Fonseca Marques Ferreira
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13940
Subject : iwlagn and sky2 stopped working, ACPI-related
Submitter : Ricardo Jorge da Fonseca Marques Ferreira <storm@sys49152.net>
Date : 2009-08-07 22:33 (19 days old)
References : http://marc.info/?l=linux-kernel&m=124968457731107&w=4
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13940] iwlagn and sky2 stopped working, ACPI-related
2009-08-25 20:34 ` Rafael J. Wysocki
@ 2009-08-26 0:00 ` Ricardo Jorge da Fonseca Marques Ferreira
-1 siblings, 0 replies; 286+ messages in thread
From: Ricardo Jorge da Fonseca Marques Ferreira @ 2009-08-26 0:00 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: Linux Kernel Mailing List, Kernel Testers List
A patch that fixes the problem has been proposed in the bug report, so if the
patch is committed, the regression is fixed for me. I don't think the patch has
been committed yet.
On Tuesday 25 August 2009, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.30. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13940
> Subject : iwlagn and sky2 stopped working, ACPI-related
> Submitter : Ricardo Jorge da Fonseca Marques Ferreira <storm@sys49152.net>
> Date : 2009-08-07 22:33 (19 days old)
> References : http://marc.info/?l=linux-kernel&m=124968457731107&w=4
>
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13940] iwlagn and sky2 stopped working, ACPI-related
@ 2009-08-26 20:58 ` Rafael J. Wysocki
0 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-26 20:58 UTC (permalink / raw)
To: Ricardo Jorge da Fonseca Marques Ferreira
Cc: Linux Kernel Mailing List, Kernel Testers List
On Wednesday 26 August 2009, Ricardo Jorge da Fonseca Marques Ferreira wrote:
> A patch that fixes the problem has been proposed in the bug report, so if the
> patch is committed, the regression is fixed for me. I don't think the patch has
> been committed yet.
Well, honestly, it doesn't seem it will be applied.
Thanks for the update anyway.
Rafael
> On Tuesday 25 August 2009, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.30. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13940
> > Subject : iwlagn and sky2 stopped working, ACPI-related
> > Submitter : Ricardo Jorge da Fonseca Marques Ferreira <storm@sys49152.net>
> > Date : 2009-08-07 22:33 (19 days old)
> > References : http://marc.info/?l=linux-kernel&m=124968457731107&w=4
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #13947] Libertas: Association request to the driver failed
2009-08-25 20:00 ` Rafael J. Wysocki
` (14 preceding siblings ...)
(?)
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Kernel Testers List, Daniel Mack, Dan Williams, John W. Linville,
Roel Kluin
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13947
Subject : Libertas: Association request to the driver failed
Submitter : Daniel Mack <daniel@caiaq.de>
Date : 2009-08-07 19:11 (19 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=57921c312e8cef72ba35a4cfe870b376da0b1b87
References : http://marc.info/?l=linux-kernel&m=124967234311481&w=4
Handled-By : Roel Kluin <roel.kluin@gmail.com>
Dan Williams <dcbw@redhat.com>
Patch : http://patchwork.kernel.org/patch/43114/
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #13948] ath5k broken after suspend-to-ram
2009-08-25 20:00 ` Rafael J. Wysocki
` (15 preceding siblings ...)
(?)
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Kernel Testers List, Bob Copeland, Johannes Stezenbach, Nick Kossifidis
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13948
Subject : ath5k broken after suspend-to-ram
Submitter : Johannes Stezenbach <js@sig21.net>
Date : 2009-08-07 21:51 (19 days old)
References : http://marc.info/?l=linux-kernel&m=124968192727854&w=4
Handled-By : Nick Kossifidis <mickflemm@gmail.com>
Patch : http://patchwork.kernel.org/patch/38550/
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #13950] Oops when USB Serial disconnected while in use
2009-08-25 20:00 ` Rafael J. Wysocki
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Kernel Testers List, Alan Stern, Bruno Prémont
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13950
Subject : Oops when USB Serial disconnected while in use
Submitter : Bruno Prémont <bonbons@linux-vserver.org>
Date : 2009-08-08 17:47 (18 days old)
References : http://marc.info/?l=linux-kernel&m=124975432900466&w=4
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #13960] rtl8187 not connect to wifi
2009-08-25 20:00 ` Rafael J. Wysocki
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Larry Finger, okias
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13960
Subject : rtl8187 not connect to wifi
Submitter : okias <d.okias@gmail.com>
Date : 2009-08-10 19:16 (16 days old)
Handled-By : Larry Finger <Larry.Finger@lwfinger.net>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=22798
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #13987] Received NMI interrupt at resume
2009-08-25 20:00 ` Rafael J. Wysocki
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Christian Casteyde
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13987
Subject : Received NMI interrupt at resume
Submitter : Christian Casteyde <casteyde.christian@free.fr>
Date : 2009-08-15 07:55 (11 days old)
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #14012] latest git fried my x86_64 imac
2009-08-25 20:00 ` Rafael J. Wysocki
` (19 preceding siblings ...)
(?)
@ 2009-08-25 20:34 ` Rafael J. Wysocki
2009-08-26 0:28 ` Justin P. Mattock
-1 siblings, 1 reply; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Justin P. Mattock
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14012
Subject : latest git fried my x86_64 imac
Submitter : Justin P. Mattock <justinmattock@gmail.com>
Date : 2009-08-13 07:20 (13 days old)
References : http://marc.info/?l=linux-kernel&m=125014080427090&w=4
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14012] latest git fried my x86_64 imac
2009-08-25 20:34 ` [Bug #14012] latest git fried my x86_64 imac Rafael J. Wysocki
@ 2009-08-26 0:28 ` Justin P. Mattock
2009-08-26 21:06 ` Rafael J. Wysocki
0 siblings, 1 reply; 286+ messages in thread
From: Justin P. Mattock @ 2009-08-26 0:28 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: Linux Kernel Mailing List, Kernel Testers List
Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.30. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14012
> Subject : latest git fried my x86_64 imac
> Submitter : Justin P. Mattock<justinmattock@gmail.com>
> Date : 2009-08-13 07:20 (13 days old)
> References : http://marc.info/?l=linux-kernel&m=125014080427090&w=4
>
>
>
>
If I revert this commit:
af6af30c0fcd77e621638e53ef8b176bca8bd3b4
I can get a normal bootup.
As for this bug, it seems I'm the only one
hitting it. The system is a fresh LFS build
on x86_64.
As for keeping this open, I'm
not sure. I don't have a problem with closing it
and taking the blame as something I did during my build
of the system; if this becomes more frequent,
we can open a new bug.
Justin P. Mattock
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14012] latest git fried my x86_64 imac
@ 2009-08-26 21:06 ` Rafael J. Wysocki
0 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-26 21:06 UTC (permalink / raw)
To: Justin P. Mattock
Cc: Linux Kernel Mailing List, Kernel Testers List, Peter Zijlstra,
Ingo Molnar
On Wednesday 26 August 2009, Justin P. Mattock wrote:
> Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.30. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14012
> > Subject : latest git fried my x86_64 imac
> > Submitter : Justin P. Mattock<justinmattock@gmail.com>
> > Date : 2009-08-13 07:20 (13 days old)
> > References : http://marc.info/?l=linux-kernel&m=125014080427090&w=4
> >
> >
> >
> >
> if I revert this commit:
> af6af30c0fcd77e621638e53ef8b176bca8bd3b4
> I can get a normal bootup.
Hm, that's
commit af6af30c0fcd77e621638e53ef8b176bca8bd3b4
Author: Peter Zijlstra <peterz@infradead.org>
Date: Wed Aug 5 20:41:04 2009 +0200
ftrace: Fix perf-tracepoint OOPS
I wonder what happens if you compile out ftrace?
> As for this bug, it seems I'm the only
> hitting this. The system is a fresh LFS build
> x86_64.
> In regards to keeping this open
> not sure, I don't have a problem with closing this
> and taking the blame as something I did during my build
> of the system, then if this becomes more frequent
> then open a new bug.
OK, I'll close it for now.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14012] latest git fried my x86_64 imac
@ 2009-08-26 21:58 ` Justin P. Mattock
0 siblings, 0 replies; 286+ messages in thread
From: Justin P. Mattock @ 2009-08-26 21:58 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux Kernel Mailing List, Kernel Testers List, Peter Zijlstra,
Ingo Molnar
Rafael J. Wysocki wrote:
> On Wednesday 26 August 2009, Justin P. Mattock wrote:
>
>> Rafael J. Wysocki wrote:
>>
>>> This message has been generated automatically as a part of a report
>>> of recent regressions.
>>>
>>> The following bug entry is on the current list of known regressions
>>> from 2.6.30. Please verify if it still should be listed and let me know
>>> (either way).
>>>
>>>
>>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14012
>>> Subject : latest git fried my x86_64 imac
>>> Submitter : Justin P. Mattock<justinmattock@gmail.com>
>>> Date : 2009-08-13 07:20 (13 days old)
>>> References : http://marc.info/?l=linux-kernel&m=125014080427090&w=4
>>>
>>>
>>>
>>>
>>>
>> if I revert this commit:
>> af6af30c0fcd77e621638e53ef8b176bca8bd3b4
>> I can get a normal bootup.
>>
>
> Hm, that's
>
> commit af6af30c0fcd77e621638e53ef8b176bca8bd3b4
> Author: Peter Zijlstra<peterz@infradead.org>
> Date: Wed Aug 5 20:41:04 2009 +0200
>
> ftrace: Fix perf-tracepoint OOPS
>
> I wonder what happens if you compile out ftrace?
>
>
>> As for this bug, it seems I'm the only
>> hitting this. The system is a fresh LFS build
>> x86_64.
>> In regards to keeping this open
>> not sure, I don't have a problem with closing this
>> and taking the blame as something I did during my build
>> of the system, then if this becomes more frequent
>> then open a new bug.
>>
>
> OK, I'll close it for now.
>
> Thanks,
> Rafael
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at http://www.tux.org/lkml/
>
>
I'll give that a try and see.
Justin P. Mattock
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14012] latest git fried my x86_64 imac
@ 2009-08-27 18:01 ` Justin P. Mattock
0 siblings, 0 replies; 286+ messages in thread
From: Justin P. Mattock @ 2009-08-27 18:01 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux Kernel Mailing List, Kernel Testers List, Peter Zijlstra,
Ingo Molnar
Rafael J. Wysocki wrote:
> On Wednesday 26 August 2009, Justin P. Mattock wrote:
>
>> Rafael J. Wysocki wrote:
>>
>>> This message has been generated automatically as a part of a report
>>> of recent regressions.
>>>
>>> The following bug entry is on the current list of known regressions
>>> from 2.6.30. Please verify if it still should be listed and let me know
>>> (either way).
>>>
>>>
>>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14012
>>> Subject : latest git fried my x86_64 imac
>>> Submitter : Justin P. Mattock<justinmattock@gmail.com>
>>> Date : 2009-08-13 07:20 (13 days old)
>>> References : http://marc.info/?l=linux-kernel&m=125014080427090&w=4
>>>
>>>
>>>
>>>
>>>
>> if I revert this commit:
>> af6af30c0fcd77e621638e53ef8b176bca8bd3b4
>> I can get a normal bootup.
>>
>
> Hm, that's
>
> commit af6af30c0fcd77e621638e53ef8b176bca8bd3b4
> Author: Peter Zijlstra<peterz@infradead.org>
> Date: Wed Aug 5 20:41:04 2009 +0200
>
> ftrace: Fix perf-tracepoint OOPS
>
> I wonder what happens if you compile out ftrace?
>
>
>> As for this bug, it seems I'm the only
>> hitting this. The system is a fresh LFS build
>> x86_64.
>> In regards to keeping this open
>> not sure, I don't have a problem with closing this
>> and taking the blame as something I did during my build
>> of the system, then if this becomes more frequent
>> then open a new bug.
>>
>
> OK, I'll close it for now.
>
> Thanks,
> Rafael
>
>
OK, I tried disabling all of ftrace in the kernel.
Unfortunately, the only option left
is HAVE_FTRACE_SYSCALLS,
which seems to be selected by x86.
The system still seems to get stuck
unless I revert the perf-tracepoint OOPS fix.
Justin P. Mattock
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14012] latest git fried my x86_64 imac
@ 2009-08-27 19:45 ` Rafael J. Wysocki
0 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-27 19:45 UTC (permalink / raw)
To: Justin P. Mattock
Cc: Linux Kernel Mailing List, Kernel Testers List, Peter Zijlstra,
Ingo Molnar
On Thursday 27 August 2009, Justin P. Mattock wrote:
> Rafael J. Wysocki wrote:
> > On Wednesday 26 August 2009, Justin P. Mattock wrote:
> >
> >> Rafael J. Wysocki wrote:
> >>
> >>> This message has been generated automatically as a part of a report
> >>> of recent regressions.
> >>>
> >>> The following bug entry is on the current list of known regressions
> >>> from 2.6.30. Please verify if it still should be listed and let me know
> >>> (either way).
> >>>
> >>>
> >>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14012
> >>> Subject : latest git fried my x86_64 imac
> >>> Submitter : Justin P. Mattock<justinmattock@gmail.com>
> >>> Date : 2009-08-13 07:20 (13 days old)
> >>> References : http://marc.info/?l=linux-kernel&m=125014080427090&w=4
> >>>
> >>>
> >>>
> >>>
> >>>
> >> if I revert this commit:
> >> af6af30c0fcd77e621638e53ef8b176bca8bd3b4
> >> I can get a normal bootup.
> >>
> >
> > Hm, that's
> >
> > commit af6af30c0fcd77e621638e53ef8b176bca8bd3b4
> > Author: Peter Zijlstra<peterz@infradead.org>
> > Date: Wed Aug 5 20:41:04 2009 +0200
> >
> > ftrace: Fix perf-tracepoint OOPS
> >
> > I wonder what happens if you compile out ftrace?
> >
> >
> >> As for this bug, it seems I'm the only
> >> hitting this. The system is a fresh LFS build
> >> x86_64.
> >> In regards to keeping this open
> >> not sure, I don't have a problem with closing this
> >> and taking the blame as something I did during my build
> >> of the system, then if this becomes more frequent
> >> then open a new bug.
> >>
> >
> > OK, I'll close it for now.
> >
> > Thanks,
> > Rafael
> >
> >
> o.k. I tried disabling all of ftrace in the kernel,
> unfortunately the only one left
> is HAVE_FTRACE_SYSCALLS
> which seems to be selected by x86.
> seems the system still sticks
> without reverting perf-tracepoint oops.
That's kind of strange. Can you attach the .config, please?
Rafael
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14012] latest git fried my x86_64 imac
2009-08-27 19:45 ` Rafael J. Wysocki
(?)
@ 2009-08-27 20:47 ` Randy Dunlap
2009-08-27 21:01 ` Justin P. Mattock
-1 siblings, 1 reply; 286+ messages in thread
From: Randy Dunlap @ 2009-08-27 20:47 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Justin P. Mattock, Linux Kernel Mailing List,
Kernel Testers List, Peter Zijlstra, Ingo Molnar
On Thu, 27 Aug 2009 21:45:01 +0200 Rafael J. Wysocki wrote:
> On Thursday 27 August 2009, Justin P. Mattock wrote:
> > Rafael J. Wysocki wrote:
> > > On Wednesday 26 August 2009, Justin P. Mattock wrote:
> > >
> > >> Rafael J. Wysocki wrote:
> > >>
> > >>> This message has been generated automatically as a part of a report
> > >>> of recent regressions.
> > >>>
> > >>> The following bug entry is on the current list of known regressions
> > >>> from 2.6.30. Please verify if it still should be listed and let me know
> > >>> (either way).
> > >>>
> > >>>
> > >>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14012
> > >>> Subject : latest git fried my x86_64 imac
> > >>> Submitter : Justin P. Mattock<justinmattock@gmail.com>
> > >>> Date : 2009-08-13 07:20 (13 days old)
> > >>> References : http://marc.info/?l=linux-kernel&m=125014080427090&w=4
> > >>>
> > >>>
> > >>>
> > >>>
> > >>>
> > >> if I revert this commit:
> > >> af6af30c0fcd77e621638e53ef8b176bca8bd3b4
> > >> I can get a normal bootup.
> > >>
> > >
> > > Hm, that's
> > >
> > > commit af6af30c0fcd77e621638e53ef8b176bca8bd3b4
> > > Author: Peter Zijlstra<peterz@infradead.org>
> > > Date: Wed Aug 5 20:41:04 2009 +0200
> > >
> > > ftrace: Fix perf-tracepoint OOPS
> > >
> > > I wonder what happens if you compile out ftrace?
> > >
> > >
> > >> As for this bug, it seems I'm the only
> > >> hitting this. The system is a fresh LFS build
> > >> x86_64.
> > >> In regards to keeping this open
> > >> not sure, I don't have a problem with closing this
> > >> and taking the blame as something I did during my build
> > >> of the system, then if this becomes more frequent
> > >> then open a new bug.
> > >>
> > >
> > > OK, I'll close it for now.
> > >
> > > Thanks,
> > > Rafael
> > >
> > >
> > o.k. I tried disabling all of ftrace in the kernel,
> > unfortunately the only one left
> > is HAVE_FTRACE_SYSCALLS
> > which seems to be selected by x86.
> > seems the system still sticks
> > without reverting perf-tracepoint oops.
>
> That's kind of strange. Can you attach the .config, please?
That's what arch/x86/Kconfig does:
### Arch settings
config X86
	def_bool y
	...
	select HAVE_FTRACE_SYSCALLS
It just means that the $arch has that capability, not that it is enabled.
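To make the capability-versus-enabled distinction concrete, here is a minimal sketch (not from the thread) that scans the text of a .config and reports the two separately. The sample config and the function names are made up for illustration, and CONFIG_FTRACE_SYSCALLS is assumed to be the user-selectable syscall tracer option that depends on HAVE_FTRACE_SYSCALLS.

```python
import re


def enabled_options(config_text):
    """Return the set of CONFIG_* symbols set to y or m in .config text."""
    enabled = set()
    for line in config_text.splitlines():
        # Lines such as "# CONFIG_FOO is not set" do not match and are skipped.
        match = re.match(r"^(CONFIG_\w+)=([ym])$", line.strip())
        if match:
            enabled.add(match.group(1))
    return enabled


def ftrace_syscalls_status(config_text):
    """Separate 'the arch can do it' from 'the tracer is actually built'."""
    opts = enabled_options(config_text)
    return {
        # Selected by the architecture: only advertises the capability.
        "arch_capable": "CONFIG_HAVE_FTRACE_SYSCALLS" in opts,
        # Assumed user-visible option: this is what actually builds the tracer.
        "tracer_enabled": "CONFIG_FTRACE_SYSCALLS" in opts,
    }


# Hypothetical .config excerpt resembling the situation described above.
SAMPLE = """\
CONFIG_X86=y
CONFIG_HAVE_FTRACE_SYSCALLS=y
# CONFIG_FTRACE_SYSCALLS is not set
"""

print(ftrace_syscalls_status(SAMPLE))
```

With a config like the sample, the capability symbol is present while the tracer itself is not built, which is the case described above.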
---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14012] latest git fried my x86_64 imac
@ 2009-08-27 21:01 ` Justin P. Mattock
0 siblings, 0 replies; 286+ messages in thread
From: Justin P. Mattock @ 2009-08-27 21:01 UTC (permalink / raw)
To: Randy Dunlap
Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Peter Zijlstra, Ingo Molnar
Randy Dunlap wrote:
> On Thu, 27 Aug 2009 21:45:01 +0200 Rafael J. Wysocki wrote:
>
>
>> On Thursday 27 August 2009, Justin P. Mattock wrote:
>>
>>> Rafael J. Wysocki wrote:
>>>
>>>> On Wednesday 26 August 2009, Justin P. Mattock wrote:
>>>>
>>>>
>>>>> Rafael J. Wysocki wrote:
>>>>>
>>>>>
>>>>>> This message has been generated automatically as a part of a report
>>>>>> of recent regressions.
>>>>>>
>>>>>> The following bug entry is on the current list of known regressions
>>>>>> from 2.6.30. Please verify if it still should be listed and let me know
>>>>>> (either way).
>>>>>>
>>>>>>
>>>>>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14012
>>>>>> Subject : latest git fried my x86_64 imac
>>>>>> Submitter : Justin P. Mattock<justinmattock@gmail.com>
>>>>>> Date : 2009-08-13 07:20 (13 days old)
>>>>>> References : http://marc.info/?l=linux-kernel&m=125014080427090&w=4
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>> if I revert this commit:
>>>>> af6af30c0fcd77e621638e53ef8b176bca8bd3b4
>>>>> I can get a normal bootup.
>>>>>
>>>>>
>>>> Hm, that's
>>>>
>>>> commit af6af30c0fcd77e621638e53ef8b176bca8bd3b4
>>>> Author: Peter Zijlstra<peterz@infradead.org>
>>>> Date: Wed Aug 5 20:41:04 2009 +0200
>>>>
>>>> ftrace: Fix perf-tracepoint OOPS
>>>>
>>>> I wonder what happens if you compile out ftrace?
>>>>
>>>>
>>>>
>>>>> As for this bug, it seems I'm the only
>>>>> hitting this. The system is a fresh LFS build
>>>>> x86_64.
>>>>> In regards to keeping this open
>>>>> not sure, I don't have a problem with closing this
>>>>> and taking the blame as something I did during my build
>>>>> of the system, then if this becomes more frequent
>>>>> then open a new bug.
>>>>>
>>>>>
>>>> OK, I'll close it for now.
>>>>
>>>> Thanks,
>>>> Rafael
>>>>
>>>>
>>>>
>>> o.k. I tried disabling all of ftrace in the kernel,
>>> unfortunately the only one left
>>> is HAVE_FTRACE_SYSCALLS
>>> which seems to be selected by x86.
>>> seems the system still sticks
>>> without reverting perf-tracepoint oops.
>>>
>> That's kind of strange. Can you attach the .config, please?
>>
>
> That's what arch/x86/Kconfig does:
>
> ### Arch settings
> config X86
> def_bool y
> ...
> select HAVE_FTRACE_SYSCALLS
>
> It just means that the $arch has that capability, not that it is enabled.
>
>
> ---
> ~Randy
> *** Remember to use Documentation/SubmitChecklist when testing your code ***
>
>
Alright (see how much of a newbie I am),
then disabling ftrace (if it's safe to say so)
still doesn't resolve the issue
for me.
The best bet, in my honest opinion,
is to hold off on anything until
other reports, if any, start showing up along these lines.
If there are no such reports in the future, then it's
probably safe to say I did something wrong.
If anybody does hit issues later,
then we have an idea of where/what might be causing this.
Justin P. Mattock
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14012] latest git fried my x86_64 imac
@ 2009-08-27 21:01 ` Justin P. Mattock
0 siblings, 0 replies; 286+ messages in thread
From: Justin P. Mattock @ 2009-08-27 21:01 UTC (permalink / raw)
To: Randy Dunlap
Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Peter Zijlstra, Ingo Molnar
Randy Dunlap wrote:
> On Thu, 27 Aug 2009 21:45:01 +0200 Rafael J. Wysocki wrote:
>
>
>> On Thursday 27 August 2009, Justin P. Mattock wrote:
>>
>>> Rafael J. Wysocki wrote:
>>>
>>>> On Wednesday 26 August 2009, Justin P. Mattock wrote:
>>>>
>>>>
>>>>> Rafael J. Wysocki wrote:
>>>>>
>>>>>
>>>>>> This message has been generated automatically as a part of a report
>>>>>> of recent regressions.
>>>>>>
>>>>>> The following bug entry is on the current list of known regressions
>>>>>> from 2.6.30. Please verify if it still should be listed and let me know
>>>>>> (either way).
>>>>>>
>>>>>>
>>>>>> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14012
>>>>>> Subject : latest git fried my x86_64 imac
>>>>>> Submitter : Justin P. Mattock<justinmattock-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
>>>>>> Date : 2009-08-13 07:20 (13 days old)
>>>>>> References : http://marc.info/?l=linux-kernel&m=125014080427090&w=4
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>> if I revert this commit:
>>>>> af6af30c0fcd77e621638e53ef8b176bca8bd3b4
>>>>> I can get a normal bootup.
>>>>>
>>>>>
>>>> Hm, that's
>>>>
>>>> commit af6af30c0fcd77e621638e53ef8b176bca8bd3b4
>>>> Author: Peter Zijlstra<peterz-wEGCiKHe2LqWVfeAwA7xHQ@public.gmane.org>
>>>> Date: Wed Aug 5 20:41:04 2009 +0200
>>>>
>>>> ftrace: Fix perf-tracepoint OOPS
>>>>
>>>> I wonder what happens if you compile out ftrace?
>>>>
>>>>
>>>>
>>>>> As for this bug, it seems I'm the only
>>>>> hitting this. The system is a fresh LFS build
>>>>> x86_64.
>>>>> In regards to keeping this open
>>>>> not sure, I don't have a problem with closing this
>>>>> and taking the blame as something I did during my build
>>>>> of the system, then if this becomes more frequent
>>>>> then open a new bug.
>>>>>
>>>>>
>>>> OK, I'll close it for now.
>>>>
>>>> Thanks,
>>>> Rafael
>>>> --
>>>> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
>>>> the body of a message to majordomo-u79uwXL29TY76Z2rM5mHXA@public.gmane.org
>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>> Please read the FAQ at http://www.tux.org/lkml/
>>>>
>>>>
>>>>
>>> OK, I tried disabling all of ftrace in the kernel.
>>> Unfortunately, the only option left
>>> is HAVE_FTRACE_SYSCALLS,
>>> which seems to be selected by x86.
>>> It seems the system still sticks
>>> without reverting the perf-tracepoint OOPS commit.
>>>
>> That's kind of strange. Can you attach the .config, please?
>>
>
> That's what arch/x86/Kconfig does:
>
> ### Arch settings
> config X86
> def_bool y
> ...
> select HAVE_FTRACE_SYSCALLS
>
> It just means that the $arch has that capability, not that it is enabled.
>
>
> ---
> ~Randy
> *** Remember to use Documentation/SubmitChecklist when testing your code ***
>
>
Alright (see how much of a newbie I am),
then disabling ftrace (if it's safe to say so)
still doesn't resolve the issue
for me.
The best bet, in my honest opinion,
is to hold off on anything until
other reports, if any, start showing up in this manner.
If there are no such reports in the future, then it's
probably safe to say I did something wrong.
And if anybody runs into issues later on,
then we have an idea of where/what might be causing this.
Justin P. Mattock
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #14011] Kernel paging request failed in kmem_cache_alloc
2009-08-25 20:00 ` Rafael J. Wysocki
` (20 preceding siblings ...)
(?)
@ 2009-08-25 20:34 ` Rafael J. Wysocki
2009-08-26 6:17 ` Pekka Enberg
-1 siblings, 1 reply; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Matthias Dahl
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14011
Subject : Kernel paging request failed in kmem_cache_alloc
Submitter : Matthias Dahl <ml_kernel@mortal-soul.de>
Date : 2009-08-10 22:26 (16 days old)
References : http://marc.info/?l=linux-kernel&m=124993603825082&w=4
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14011] Kernel paging request failed in kmem_cache_alloc
2009-08-25 20:34 ` [Bug #14011] Kernel paging request failed in kmem_cache_alloc Rafael J. Wysocki
@ 2009-08-26 6:17 ` Pekka Enberg
0 siblings, 0 replies; 286+ messages in thread
From: Pekka Enberg @ 2009-08-26 6:17 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux Kernel Mailing List, Kernel Testers List, Matthias Dahl
Hi Matthias,
On Tue, Aug 25, 2009 at 11:34 PM, Rafael J. Wysocki<rjw@sisk.pl> wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.30. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14011
> Subject : Kernel paging request failed in kmem_cache_alloc
> Submitter : Matthias Dahl <ml_kernel@mortal-soul.de>
> Date : 2009-08-10 22:26 (16 days old)
> References : http://marc.info/?l=linux-kernel&m=124993603825082&w=4
Can you reproduce the bug without the proprietary nvidia module that
seems to be loaded?
Pekka
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14011] Kernel paging request failed in kmem_cache_alloc
2009-08-26 6:17 ` Pekka Enberg
(?)
@ 2009-08-26 14:01 ` Matthias Dahl
2009-08-26 14:59 ` Pekka Enberg
-1 siblings, 1 reply; 286+ messages in thread
From: Matthias Dahl @ 2009-08-26 14:01 UTC (permalink / raw)
To: Pekka Enberg
Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List
Hi Pekka,
> Can you reproduce the bug without the proprietary nvidia module that
> seems to be loaded?
I am sorry but I forgot to test that and right now I am not very keen on
trying again since this is my primary machine and I had quite some fs
corruption (ext4 on md raid5 -> no barriers) the last times. :-( But this also
happened w/o Xorg ever being run during that session (though naturally the
nvidia kernel module was still loaded).
So long,
Matthias
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14011] Kernel paging request failed in kmem_cache_alloc
@ 2009-08-26 14:59 ` Pekka Enberg
0 siblings, 0 replies; 286+ messages in thread
From: Pekka Enberg @ 2009-08-26 14:59 UTC (permalink / raw)
To: Matthias Dahl
Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Eric Paris
On Wed, Aug 26, 2009 at 5:01 PM, Matthias Dahl<ml_kernel@mortal-soul.de> wrote:
>> Can you reproduce the bug without the proprietary nvidia module that
>> seems to be loaded?
>
> I am sorry but I forgot to test that and right now I am not very keen on
> trying again since this is my primary machine and I had quite some fs
> corruption (ext4 on md raid5 -> no barriers) the last times. :-( But this also
> happened w/o Xorg ever being run during that session (though naturally the
> nvidia kernel module was still loaded).
Sure, I can understand that. The bug looks like regular slab
corruption which could have been caused by the nvidia blob. So I think
the issue should be closed unless someone can reproduce it without the
blob.
That said, sys_inotify_add_watch() also appears in the trace and
there's been quite a few bug fixes in that area recently so I guess we
should CC Eric Paris just in case the oops rings a bell to him.
Pekka
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14011] Kernel paging request failed in kmem_cache_alloc
@ 2009-08-26 15:08 ` Eric Paris
0 siblings, 0 replies; 286+ messages in thread
From: Eric Paris @ 2009-08-26 15:08 UTC (permalink / raw)
To: Pekka Enberg
Cc: Matthias Dahl, Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List
On Wed, 2009-08-26 at 17:59 +0300, Pekka Enberg wrote:
> On Wed, Aug 26, 2009 at 5:01 PM, Matthias Dahl<ml_kernel@mortal-soul.de> wrote:
> >> Can you reproduce the bug without the proprietary nvidia module that
> >> seems to be loaded?
> >
> > I am sorry but I forgot to test that and right now I am not very keen on
> > trying again since this is my primary machine and I had quite some fs
> > corruption (ext4 on md raid5 -> no barriers) the last times. :-( But this also
> > happened w/o Xorg ever being run during that session (though naturally the
> > nvidia kernel module was still loaded).
>
> Sure, I can understand that. The bug looks like regular slab
> corruption which could have been caused by the nvidia blob. So I think
> the issue should be closed unless someone can reproduce it without the
> blob.
>
> That said, sys_inotify_add_watch() also appears in the trace and
> there's been quite a few bug fixes in that area recently so I guess we
> should CC Eric Paris just in case the oops rings a bell to him.
Nope, no bells here :( That slab cache is declared globally and
allocated at __init time. I haven't seen any reports of writes off the
ends of marks which might mess up a cache... sorry.....
-Eric
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14011] Kernel paging request failed in kmem_cache_alloc
@ 2009-08-26 21:03 ` Rafael J. Wysocki
0 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-26 21:03 UTC (permalink / raw)
To: Pekka Enberg
Cc: Matthias Dahl, Linux Kernel Mailing List, Kernel Testers List,
Eric Paris
On Wednesday 26 August 2009, Pekka Enberg wrote:
> On Wed, Aug 26, 2009 at 5:01 PM, Matthias Dahl<ml_kernel@mortal-soul.de> wrote:
> >> Can you reproduce the bug without the proprietary nvidia module that
> >> seems to be loaded?
> >
> > I am sorry but I forgot to test that and right now I am not very keen on
> > trying again since this is my primary machine and I had quite some fs
> > corruption (ext4 on md raid5 -> no barriers) the last times. :-( But this also
> > happened w/o Xorg ever being run during that session (though naturally the
> > nvidia kernel module was still loaded).
>
> Sure, I can understand that. The bug looks like regular slab
> corruption which could have been caused by the nvidia blob. So I think
> the issue should be closed unless someone can reproduce it without the
> blob.
I've closed it as "insufficient data".
Thanks,
Rafael
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #14016] mm/ipw2200 regression
2009-08-25 20:00 ` Rafael J. Wysocki
` (21 preceding siblings ...)
(?)
@ 2009-08-25 20:34 ` Rafael J. Wysocki
2009-08-26 6:09 ` Pekka Enberg
-1 siblings, 1 reply; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Bartlomiej Zolnierkiewicz
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14016
Subject : mm/ipw2200 regression
Submitter : Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Date : 2009-08-15 16:56 (11 days old)
References : http://marc.info/?l=linux-kernel&m=125036437221408&w=4
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14016] mm/ipw2200 regression
2009-08-25 20:34 ` [Bug #14016] mm/ipw2200 regression Rafael J. Wysocki
@ 2009-08-26 6:09 ` Pekka Enberg
0 siblings, 0 replies; 286+ messages in thread
From: Pekka Enberg @ 2009-08-26 6:09 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux Kernel Mailing List, Kernel Testers List,
Bartlomiej Zolnierkiewicz, Mel Gorman, Andrew Morton, linux-mm
On Tue, Aug 25, 2009 at 11:34 PM, Rafael J. Wysocki<rjw@sisk.pl> wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.30. Please verify if it still should be listed and let me know
> (either way).
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14016
> Subject : mm/ipw2200 regression
> Submitter : Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> Date : 2009-08-15 16:56 (11 days old)
> References : http://marc.info/?l=linux-kernel&m=125036437221408&w=4
If I am reading the page allocator dump correctly, there's plenty of
pages left but we're unable to satisfy an order 6 allocation. There's
no slab allocator involved so the page allocator changes that went
into 2.6.31 seem likely. Mel, ideas?
Bartlomiej, can we see your .config, please?
Pekka
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14016] mm/ipw2200 regression
2009-08-26 6:09 ` Pekka Enberg
@ 2009-08-26 8:27 ` Johannes Weiner
-1 siblings, 0 replies; 286+ messages in thread
From: Johannes Weiner @ 2009-08-26 8:27 UTC (permalink / raw)
To: Pekka Enberg
Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Bartlomiej Zolnierkiewicz, Mel Gorman,
Andrew Morton, netdev, linux-mm
[Cc netdev]
On Wed, Aug 26, 2009 at 09:09:44AM +0300, Pekka Enberg wrote:
> On Tue, Aug 25, 2009 at 11:34 PM, Rafael J. Wysocki<rjw@sisk.pl> wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.30. Please verify if it still should be listed and let me know
> > (either way).
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14016
> > Subject : mm/ipw2200 regression
> > Submitter : Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> > Date : 2009-08-15 16:56 (11 days old)
> > References : http://marc.info/?l=linux-kernel&m=125036437221408&w=4
>
> If I am reading the page allocator dump correctly, there's plenty of
> pages left but we're unable to satisfy an order 6 allocation. There's
> no slab allocator involved so the page allocator changes that went
> into 2.6.31 seem likely. Mel, ideas?
It's an atomic order-6 allocation, the chances for this to succeed
after some uptime become infinitesimal. The chunks > order-2 are
pretty much exhausted on this dump.
64 pages, presumably 256k, for fw->boot_size while current ipw
firmware images have ~188k. I don't know jack squat about this
driver, but given the field name and the struct:
struct ipw_fw {
	__le32 ver;
	__le32 boot_size;
	__le32 ucode_size;
	__le32 fw_size;
	u8 data[0];
};
fw->boot_size alone being that big sounds a bit fishy to me.
Hannes
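
[Editor's note] The "64 pages, presumably 256k" arithmetic above can be checked with a small userspace sketch. It assumes 4 KiB pages (the thread only says "presumably"), and order_for_size() is a hypothetical helper written for illustration, not the kernel's get_order(). It returns the smallest order whose power-of-two block of pages covers the requested size:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace helper (not a kernel API): return the smallest
 * order such that (page_size << order) >= size. The page allocator hands
 * out power-of-two blocks of pages, so a request is rounded up to the
 * next such block.
 */
static unsigned int order_for_size(size_t size, size_t page_size)
{
	unsigned int order = 0;
	size_t span = page_size;

	assert(page_size > 0);
	while (span < size) {
		span <<= 1;	/* double the block: one order higher */
		order++;
	}
	return order;
}
```

Under the 4 KiB assumption, order_for_size(256 * 1024, 4096) gives 6, i.e. 64 contiguous pages. A ~188k image, taken here as 188 * 1024 bytes, needs 47 pages and also rounds up to order 6, so the size of the allocation alone would not separate the two cases.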
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14016] mm/ipw2200 regression
@ 2009-08-26 9:37 ` Mel Gorman
0 siblings, 0 replies; 286+ messages in thread
From: Mel Gorman @ 2009-08-26 9:37 UTC (permalink / raw)
To: Johannes Weiner
Cc: Pekka Enberg, Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Bartlomiej Zolnierkiewicz, Mel Gorman,
Andrew Morton, netdev, linux-mm
On Wed, Aug 26, 2009 at 10:27:41AM +0200, Johannes Weiner wrote:
> [Cc netdev]
>
> On Wed, Aug 26, 2009 at 09:09:44AM +0300, Pekka Enberg wrote:
> > On Tue, Aug 25, 2009 at 11:34 PM, Rafael J. Wysocki<rjw@sisk.pl> wrote:
> > > This message has been generated automatically as a part of a report
> > > of recent regressions.
> > >
> > > The following bug entry is on the current list of known regressions
> > > from 2.6.30. Please verify if it still should be listed and let me know
> > > (either way).
> > >
> > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14016
> > > Subject : mm/ipw2200 regression
> > > Submitter : Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> > > Date : 2009-08-15 16:56 (11 days old)
> > > References : http://marc.info/?l=linux-kernel&m=125036437221408&w=4
> >
> > If I am reading the page allocator dump correctly, there's plenty of
> > pages left but we're unable to satisfy an order 6 allocation. There's
> > no slab allocator involved so the page allocator changes that went
> > into 2.6.31 seem likely. Mel, ideas?
>
> It's an atomic order-6 allocation, the chances for this to succeed
> after some uptime become infinitesimal. The chunks > order-2 are
> pretty much exhausted on this dump.
>
> 64 pages, presumably 256k, for fw->boot_size while current ipw
> firmware images have ~188k. I don't know jack squat about this
> driver, but given the field name and the struct:
>
> struct ipw_fw {
> 	__le32 ver;
> 	__le32 boot_size;
> 	__le32 ucode_size;
> 	__le32 fw_size;
> 	u8 data[0];
> };
>
> fw->boot_size alone being that big sounds a bit fishy to me.
>
Agreed. While there are a low number of order-6 pages free in the page
allocation failure dump, there are not enough for watermarks to be
satisfied. As it's atomic, there is little that can be done from a VM
perspective and it's the responsibility of the driver. I'm no driver expert
but I'll have a go at fixing it anyway.
My reading of this is that the firmware is being loaded from a workqueue and
I am failing to see any restriction on sleeping in the path. It would appear
that the driver just used the most convenient *_alloc_coherent function
available forgetting that it assumes GFP_ATOMIC. Can someone who does know
which way is up with a driver tell me why the patch below might not
work?
Bartlomiej, any chance you could give this a spin? Preferably, you'd
have preempt enabled and CONFIG_DEBUG_SPINLOCK_SLEEP on as well because
that combination will complain loudly if we really can't sleep in this
path.
=====
ipw2200: Avoid large GFP_ATOMIC allocation during firmware loading
ipw2200 uses pci_alloc_consistent() to allocate a large coherent buffer for
the loading of firmware which is an order-6 allocation of GFP_ATOMIC. At
system start-up time, this is not a problem. However, the firmware on the
card can get confused and the corrective action taken is to reload the
firmware and reinit the card. High-order GFP_ATOMIC allocations of this
type can and will fail when the system is already up and running.
As the firmware is loaded from a workqueue, it should be possible for
the driver to go to sleep. This patch converts the call of
pci_alloc_consistent() which assumes GFP_ATOMIC to dma_alloc_coherent()
which can specify its own flags.
The big downside with this patch is that it uses GFP_REPEAT to avoid the
driver unloading. There is potential that this will cause a reclaim
storm as the machine tries to find a free order-6 buffer. A suggested
alternative for the driver owner is in the comments.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
 drivers/net/wireless/ipw2x00/ipw2200.c |   14 +++++++++++++-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/net/wireless/ipw2x00/ipw2200.c b/drivers/net/wireless/ipw2x00/ipw2200.c
index 44c29b3..f2e251e 100644
--- a/drivers/net/wireless/ipw2x00/ipw2200.c
+++ b/drivers/net/wireless/ipw2x00/ipw2200.c
@@ -3167,7 +3167,19 @@ static int ipw_load_firmware(struct ipw_priv *priv, u8 * data, size_t len)
 	u8 *shared_virt;
 
 	IPW_DEBUG_TRACE("<< : \n");
-	shared_virt = pci_alloc_consistent(priv->pci_dev, len, &shared_phys);
+
+	/*
+	 * This is a whopping large allocation, in or around order-6 so
+	 * dma_alloc_coherent is used to specify the GFP_KERNEL|__GFP_REPEAT
+	 * flags. Note that this action means the system could go into a
+	 * reclaim loop until it cannot reclaim any more trying to satisfy
+	 * the allocation. It would be preferable if one buffer is allocated
+	 * at driver initialisation and reused when the firmware needs to
+	 * be reloaded, overwriting the existing firmware each time
+	 */
+	shared_virt = dma_alloc_coherent(
+			priv->pci_dev == NULL ? NULL : &priv->pci_dev->dev,
+			len, &shared_phys, GFP_KERNEL|__GFP_REPEAT);
 
 	if (!shared_virt)
 		return -ENOMEM;
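
[Editor's note] The alternative mentioned in the patch comment, allocating one buffer at initialisation and overwriting it on every firmware reload, might look roughly like the userspace sketch below. It is an illustration under stated assumptions, not driver code: struct fw_staging and the fw_staging_*() helpers are hypothetical names, and malloc() stands in for the coherent DMA allocation the driver would really need.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a driver-owned firmware staging buffer. */
struct fw_staging {
	unsigned char *virt;	/* allocated once, at init time */
	size_t cap;		/* capacity in bytes */
};

/* Allocate the buffer once, while large contiguous memory is still easy to get. */
static int fw_staging_init(struct fw_staging *s, size_t cap)
{
	s->virt = malloc(cap);	/* assumption: a driver would use a DMA allocator */
	s->cap = s->virt ? cap : 0;
	return s->virt ? 0 : -1;
}

/* Reload path: no allocation at all, just overwrite the previous image. */
static int fw_staging_load(struct fw_staging *s, const void *img, size_t len)
{
	assert(s->virt != NULL);
	if (len > s->cap)
		return -1;	/* image does not fit in the preallocated buffer */
	memcpy(s->virt, img, len);
	return 0;
}

/* Release the buffer when the device goes away. */
static void fw_staging_free(struct fw_staging *s)
{
	free(s->virt);
	s->virt = NULL;
	s->cap = 0;
}
```

The trade-off is that the memory stays pinned for the lifetime of the device, in exchange for a reload path that cannot fail on a fragmented system.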
^ permalink raw reply related [flat|nested] 286+ messages in thread
* Re: [Bug #14016] mm/ipw2200 regression
@ 2009-08-26 9:37 ` Mel Gorman
0 siblings, 0 replies; 286+ messages in thread
From: Mel Gorman @ 2009-08-26 9:37 UTC (permalink / raw)
To: Johannes Weiner
Cc: Pekka Enberg, Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Bartlomiej Zolnierkiewicz, Mel Gorman,
Andrew Morton, netdev, linux-mm
On Wed, Aug 26, 2009 at 10:27:41AM +0200, Johannes Weiner wrote:
> [Cc netdev]
>
> On Wed, Aug 26, 2009 at 09:09:44AM +0300, Pekka Enberg wrote:
> > On Tue, Aug 25, 2009 at 11:34 PM, Rafael J. Wysocki<rjw@sisk.pl> wrote:
> > > This message has been generated automatically as a part of a report
> > > of recent regressions.
> > >
> > > The following bug entry is on the current list of known regressions
> > > from 2.6.30. Please verify if it still should be listed and let me know
> > > (either way).
> > >
> > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14016
> > > Subject : mm/ipw2200 regression
> > > Submitter : Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> > > Date : 2009-08-15 16:56 (11 days old)
> > > References : http://marc.info/?l=linux-kernel&m=125036437221408&w=4
> >
> > If am reading the page allocator dump correctly, there's plenty of
> > pages left but we're unable to satisfy an order 6 allocation. There's
> > no slab allocator involved so the page allocator changes that went
> > into 2.6.31 seem likely. Mel, ideas?
>
> It's an atomic order-6 allocation, the chances for this to succeed
> after some uptime become infinitesimal. The chunks > order-2 are
> pretty much exhausted on this dump.
>
> 64 pages, presumably 256k, for fw->boot_size while current ipw
> firmware images have ~188k. I don't know jack squat about this
> driver, but given the field name and the struct:
>
> struct ipw_fw {
> __le32 ver;
> __le32 boot_size;
> __le32 ucode_size;
> __le32 fw_size;
> u8 data[0];
> };
>
> fw->boot_size alone being that big sounds a bit fishy to me.
>
Agreed. While there are a small number of order-6 pages free in the page
allocation failure dump, there are not enough for the watermarks to be
satisfied. As the allocation is atomic, there is little that can be done
from a VM perspective and it's the responsibility of the driver. I'm no
driver expert but I'll have a go at fixing it anyway.
My reading of this is that the firmware is being loaded from a workqueue and
I am failing to see any restriction on sleeping in the path. It would appear
that the driver just used the most convenient *_alloc_coherent function
available, forgetting that pci_alloc_consistent() assumes GFP_ATOMIC. Can
someone who does know which way is up with a driver tell me why the patch
below might not work?
Bartlomiej, any chance you could give this a spin? Preferably, you'd
have preempt enabled and CONFIG_DEBUG_SPINLOCK_SLEEP on as well because
that combination will complain loudly if we really can't sleep in this
path.
=====
ipw2200: Avoid large GFP_ATOMIC allocation during firmware loading
ipw2200 uses pci_alloc_consistent() to allocate a large coherent buffer for
the loading of firmware which is an order-6 allocation of GFP_ATOMIC. At
system start-up time, this is not a problem. However, the firmware on the
card can get confused and the corrective action taken is to reload the
firmware and reinit the card. High-order GFP_ATOMIC allocations of this
type can and will fail when the system is already up and running.
As the firmware is loaded from a workqueue, it should be possible for
the driver to go to sleep. This patch converts the call of
pci_alloc_consistent() which assumes GFP_ATOMIC to dma_alloc_coherent()
which can specify its own flags.
The big downside with this patch is that it uses GFP_REPEAT to avoid the
driver unloading. There is potential that this will cause a reclaim
storm as the machine tries to find a free order-6 buffer. A suggested
alternative for the driver owner is in the comments.
Signed-off-by: Mel Gorman <mel@csn.ul.ie>
---
drivers/net/wireless/ipw2x00/ipw2200.c | 14 +++++++++++++-
1 file changed, 13 insertions(+), 1 deletion(-)
diff --git a/drivers/net/wireless/ipw2x00/ipw2200.c b/drivers/net/wireless/ipw2x00/ipw2200.c
index 44c29b3..f2e251e 100644
--- a/drivers/net/wireless/ipw2x00/ipw2200.c
+++ b/drivers/net/wireless/ipw2x00/ipw2200.c
@@ -3167,7 +3167,19 @@ static int ipw_load_firmware(struct ipw_priv *priv, u8 * data, size_t len)
u8 *shared_virt;
IPW_DEBUG_TRACE("<< : \n");
- shared_virt = pci_alloc_consistent(priv->pci_dev, len, &shared_phys);
+
+ /*
+ * This is a whopping large allocation, in or around order-6 so
+ * dma_alloc_coherent is used to specify the GFP_KERNEL|__GFP_REPEAT
+ * flags. Note that this action means the system could go into a
+ * reclaim loop until it cannot reclaim any more trying to satisfy
+ * the allocation. It would be preferable if one buffer is allocated
+ * at driver initialisation and reused when the firmware needs to
+ * be reloaded, overwriting the existing firmware each time
+ */
+ shared_virt = dma_alloc_coherent(
+ priv->pci_dev == NULL ? NULL : &priv->pci_dev->dev,
+ len, &shared_phys, GFP_KERNEL|__GFP_REPEAT);
if (!shared_virt)
return -ENOMEM;
^ permalink raw reply related [flat|nested] 286+ messages in thread
* Re: [Bug #14016] mm/ipw2200 regression
2009-08-26 9:37 ` Mel Gorman
@ 2009-08-26 14:44 ` Andrew Morton
-1 siblings, 0 replies; 286+ messages in thread
From: Andrew Morton @ 2009-08-26 14:44 UTC (permalink / raw)
To: Mel Gorman
Cc: Johannes Weiner, Pekka Enberg, Rafael J. Wysocki,
Linux Kernel Mailing List, Kernel Testers List,
Bartlomiej Zolnierkiewicz, Mel Gorman, netdev, linux-mm, Zhu Yi,
James Ketrenos, Reinette Chatre, linux-wireless, ipw2100-devel
(cc IPW maintainers and mailing lists)
On Wed, 26 Aug 2009 10:37:49 +0100 Mel Gorman <mel@csn.ul.ie> wrote:
> On Wed, Aug 26, 2009 at 10:27:41AM +0200, Johannes Weiner wrote:
> > [Cc netdev]
> >
> > On Wed, Aug 26, 2009 at 09:09:44AM +0300, Pekka Enberg wrote:
> > > On Tue, Aug 25, 2009 at 11:34 PM, Rafael J. Wysocki<rjw@sisk.pl> wrote:
> > > > This message has been generated automatically as a part of a report
> > > > of recent regressions.
> > > >
> > > > The following bug entry is on the current list of known regressions
> > > > from 2.6.30. Please verify if it still should be listed and let me know
> > > > (either way).
> > > >
> > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14016
> > > > Subject : mm/ipw2200 regression
> > > > Submitter : Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
> > > > Date : 2009-08-15 16:56 (11 days old)
> > > > References : http://marc.info/?l=linux-kernel&m=125036437221408&w=4
> > >
> > > If am reading the page allocator dump correctly, there's plenty of
> > > pages left but we're unable to satisfy an order 6 allocation. There's
> > > no slab allocator involved so the page allocator changes that went
> > > into 2.6.31 seem likely. Mel, ideas?
> >
> > It's an atomic order-6 allocation, the chances for this to succeed
> > after some uptime become infinitesimal. The chunks > order-2 are
> > pretty much exhausted on this dump.
> >
> > 64 pages, presumably 256k, for fw->boot_size while current ipw
> > firmware images have ~188k. I don't know jack squat about this
> > driver, but given the field name and the struct:
> >
> > struct ipw_fw {
> > __le32 ver;
> > __le32 boot_size;
> > __le32 ucode_size;
> > __le32 fw_size;
> > u8 data[0];
> > };
> >
> > fw->boot_size alone being that big sounds a bit fishy to me.
> >
>
> Agreed. While there are a low number of order-6 pages free in the page
> allocation failure dump, there are not enough for watermarks to be
> satisified. As it's atomic, there is little that can be done from a VM
> perspective and it's the responsibility of the driver. I'm no driver expert
> but I'll have a go at fixing it anyway.
>
> My reading of this is that the firmware is being loaded from a workqueue and
> I am failing to see any restriction on sleeping in the path. It would appear
> that the driver just used the most convenient *_alloc_coherent function
> available forgetting that it assumes GFP_ATOMIC. Can someone who does know
> which way is up with a driver tell me why the patch below might not
> work?
>
> Bartlomiej, any chance you could give this a spin? Preferably, you'd
> have preempt enabled and CONFIG_DEBUG_SPINLOCK_SLEEP on as well because
> that combination will complain loudly if we really can't sleep in this
> path.
>
> =====
> ipw2200: Avoid large GFP_ATOMIC allocation during firmware loading
>
> ipw2200 uses pci_alloc_consistent() to allocate a large coherent buffer for
> the loading of firmware which is an order-6 allocation of GFP_ATOMIC. At
> system start-up time, this is not a problem. However, the firmware on the
> card can get confused and the corrective action taken is to reload the
> firmware and reinit the card. High-order GFP_ATOMIC allocations of this
> type can and will fail when the system is already up and running.
>
> As the firmware is loaded from a workqueue, it should be possible for
> the driver to go to sleep. This patch converts the call of
> pci_alloc_consistent() which assumes GFP_ATOMIC to dma_alloc_coherent()
> which can specify its own flags.
>
> The big downside with this patch is that it uses GFP_REPEAT to avoid the
> driver unloading. There is potential that this will cause a reclaim
> storm as the machine tries to find a free order-6 buffer. A suggested
> alternative for the driver owner is in the comments.
>
> Signed-off-by: Mel Gorman <mel@csn.ul.ie>
> ---
> drivers/net/wireless/ipw2x00/ipw2200.c | 14 +++++++++++++-
> 1 file changed, 13 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/net/wireless/ipw2x00/ipw2200.c b/drivers/net/wireless/ipw2x00/ipw2200.c
> index 44c29b3..f2e251e 100644
> --- a/drivers/net/wireless/ipw2x00/ipw2200.c
> +++ b/drivers/net/wireless/ipw2x00/ipw2200.c
> @@ -3167,7 +3167,19 @@ static int ipw_load_firmware(struct ipw_priv *priv, u8 * data, size_t len)
> u8 *shared_virt;
>
> IPW_DEBUG_TRACE("<< : \n");
> - shared_virt = pci_alloc_consistent(priv->pci_dev, len, &shared_phys);
> +
> + /*
> + * This is a whopping large allocation, in or around order-6 so
> + * dma_alloc_coherent is used to specify the GFP_KERNEL|__GFP_REPEAT
> + * flags. Note that this action means the system could go into a
> + * reclaim loop until it cannot reclaim any more trying to satisfy
> + * the allocation. It would be preferable if one buffer is allocated
> + * at driver initialisation and reused when the firmware needs to
> + * be reloaded, overwriting the existing firmware each time
> + */
> + shared_virt = dma_alloc_coherent(
> + priv->pci_dev == NULL ? NULL : &priv->pci_dev->dev,
> + len, &shared_phys, GFP_KERNEL|__GFP_REPEAT);
>
> if (!shared_virt)
> return -ENOMEM;
Of course, the risk of making a change like this is that we'll then go
and leave it there.
To fix this code properly we should, as you say, stop doing an order-6
allocation altogether.
And right now I think it's doing _two_ order-6 allocations:
	shared_virt = pci_alloc_consistent(priv->pci_dev, len, &shared_phys);
	if (!shared_virt)
		return -ENOMEM;
	memmove(shared_virt, data, len);
whoever allocated `data' is being obnoxious as well.
It is perhaps pretty simple to make the second (GFP_ATOMIC) allocation
go away. The code is already conveniently structured to do this:
	do {
		chunk = (struct fw_chunk *)(data + offset);
		offset += sizeof(struct fw_chunk);
		/* build DMA packet and queue up for sending */
		/* dma to chunk->address, the chunk->length bytes from data +
		 * offeset*/
		/* Dma loading */
		rc = ipw_fw_dma_add_buffer(priv, shared_phys + offset,
					   le32_to_cpu(chunk->address),
					   le32_to_cpu(chunk->length));
		if (rc) {
			IPW_DEBUG_INFO("dmaAddBuffer Failed\n");
			goto out;
		}
		offset += le32_to_cpu(chunk->length);
	} while (offset < len);
what is the typical/expected value of chunk->length here? If it's
significantly less than 4096*(2^6), could we convert this function to
use a separate DMAable allocation per fw_chunk?
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14016] mm/ipw2200 regression
2009-08-26 14:44 ` Andrew Morton
(?)
(?)
@ 2009-08-27 9:11 ` Zhu Yi
-1 siblings, 0 replies; 286+ messages in thread
From: Zhu Yi @ 2009-08-27 9:11 UTC (permalink / raw)
To: Andrew Morton
Cc: Mel Gorman, Johannes Weiner, Pekka Enberg, Rafael J. Wysocki,
Linux Kernel Mailing List, Kernel Testers List,
Bartlomiej Zolnierkiewicz, Mel Gorman, netdev, linux-mm,
James Ketrenos, Chatre, Reinette, linux-wireless, ipw2100-devel
On Wed, 2009-08-26 at 22:44 +0800, Andrew Morton wrote:
>
> It is perhaps pretty simple to make the second (GFP_ATOMIC) allocation
> go away. The code is already conveniently structured to do this:
>
> do {
> chunk = (struct fw_chunk *)(data + offset);
> offset += sizeof(struct fw_chunk);
> /* build DMA packet and queue up for sending */
> /* dma to chunk->address, the chunk->length bytes from data +
> * offeset*/
> /* Dma loading */
> rc = ipw_fw_dma_add_buffer(priv, shared_phys + offset,
> le32_to_cpu(chunk->address),
> le32_to_cpu(chunk->length));
> if (rc) {
> IPW_DEBUG_INFO("dmaAddBuffer Failed\n");
> goto out;
> }
>
> offset += le32_to_cpu(chunk->length);
> } while (offset < len);
>
> what is the typical/expected value of chunk->length here? If it's
> significantly less than 4096*(2^6), could we convert this function to
> use a separate DMAable allocation per fw_chunk?
Unfortunately, the largest chunk size for the latest 3.1 firmware is
0x20040, which also requires an order-6 page allocation. I'll try to use
the firmware DMA command block (64 slots) to handle the image (each slot
covering 4k, 256k in total).
Thanks,
-yi
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14016] mm/ipw2200 regression
2009-08-27 9:11 ` Zhu Yi
(?)
@ 2009-08-27 9:45 ` Mel Gorman
-1 siblings, 0 replies; 286+ messages in thread
From: Mel Gorman @ 2009-08-27 9:45 UTC (permalink / raw)
To: Zhu Yi
Cc: Andrew Morton, Johannes Weiner, Pekka Enberg, Rafael J. Wysocki,
Linux Kernel Mailing List, Kernel Testers List,
Bartlomiej Zolnierkiewicz, Mel Gorman, netdev, linux-mm,
James Ketrenos, Chatre, Reinette, linux-wireless, ipw2100-devel
On Thu, Aug 27, 2009 at 05:11:29PM +0800, Zhu Yi wrote:
> On Wed, 2009-08-26 at 22:44 +0800, Andrew Morton wrote:
> >
> > It is perhaps pretty simple to make the second (GFP_ATOMIC) allocation
> > go away. The code is already conveniently structured to do this:
> >
> > do {
> > chunk = (struct fw_chunk *)(data + offset);
> > offset += sizeof(struct fw_chunk);
> > /* build DMA packet and queue up for sending */
> > /* dma to chunk->address, the chunk->length bytes from data +
> > * offeset*/
> > /* Dma loading */
> > rc = ipw_fw_dma_add_buffer(priv, shared_phys + offset,
> > le32_to_cpu(chunk->address),
> > le32_to_cpu(chunk->length));
> > if (rc) {
> > IPW_DEBUG_INFO("dmaAddBuffer Failed\n");
> > goto out;
> > }
> >
> > offset += le32_to_cpu(chunk->length);
> > } while (offset < len);
> >
> > what is the typical/expected value of chunk->length here? If it's
> > significantly less than 4096*(2^6), could we convert this function to
> > use a separate DMAable allocation per fw_chunk?
>
> Unfortunately, the largest chunk size for the latest 3.1 firmware is
> 0x20040, which also requires order 6 page allocation. I'll try to use
> the firmware DMA command block (64 slots) to handle the image (each for
> 4k, totally 256k).
>
That would be preferable as trying to make order-6 atomic allocations isn't
going to pan out. As I noted, doing it as GFP_KERNEL is possible but it'll
manifest as weird stalls periodically when the driver is loaded due to
reclaim and if the system is swapless, it might not work at all if memory
is mostly anonymous.
If the DMA command block doesn't work out, what is the feasibility of holding
onto the order-6 allocation once the module is loaded instead of allocating
for the duration of the firmware loading and then freeing it again?
--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
^ permalink raw reply [flat|nested] 286+ messages in thread
* ipw2200: firmware DMA loading rework
2009-08-26 14:44 ` Andrew Morton
(?)
(?)
@ 2009-08-28 3:42 ` Zhu Yi
-1 siblings, 0 replies; 286+ messages in thread
From: Zhu Yi @ 2009-08-28 3:42 UTC (permalink / raw)
To: Andrew Morton
Cc: Mel Gorman, Johannes Weiner, Pekka Enberg, Rafael J. Wysocki,
Linux Kernel Mailing List, Kernel Testers List,
Bartlomiej Zolnierkiewicz, Mel Gorman, netdev, linux-mm,
James Ketrenos, Chatre, Reinette, linux-wireless, ipw2100-devel
Bartlomiej Zolnierkiewicz reported an atomic order-6 allocation failure
for ipw2200 firmware loading in kernel 2.6.30. High order allocation is
likely to fail and should always be avoided.
The patch fixes this problem by replacing the original order-6
pci_alloc_consistent() with an array of order-1 pages from a pci pool.
This utilizes the ipw2200 DMA command blocks (up to 64 slots). The
maximum supported firmware size remains the same (64*8K).
This patch fixes bug http://bugzilla.kernel.org/show_bug.cgi?id=14016
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Mel Gorman <mel@csn.ul.ie>
Signed-off-by: Zhu Yi <yi.zhu@intel.com>
---
drivers/net/wireless/ipw2x00/ipw2200.c | 120 ++++++++++++++++++--------------
1 files changed, 67 insertions(+), 53 deletions(-)
diff --git a/drivers/net/wireless/ipw2x00/ipw2200.c b/drivers/net/wireless/ipw2x00/ipw2200.c
index 6dcac73..f593fbb 100644
--- a/drivers/net/wireless/ipw2x00/ipw2200.c
+++ b/drivers/net/wireless/ipw2x00/ipw2200.c
@@ -2874,45 +2874,27 @@ static int ipw_fw_dma_add_command_block(struct ipw_priv *priv,
return 0;
}
-static int ipw_fw_dma_add_buffer(struct ipw_priv *priv,
- u32 src_phys, u32 dest_address, u32 length)
+static int ipw_fw_dma_add_buffer(struct ipw_priv *priv, dma_addr_t *src_address,
+ int nr, u32 dest_address, u32 len)
{
- u32 bytes_left = length;
- u32 src_offset = 0;
- u32 dest_offset = 0;
- int status = 0;
+ int ret, i;
+ u32 size;
+
IPW_DEBUG_FW(">> \n");
- IPW_DEBUG_FW_INFO("src_phys=0x%x dest_address=0x%x length=0x%x\n",
- src_phys, dest_address, length);
- while (bytes_left > CB_MAX_LENGTH) {
- status = ipw_fw_dma_add_command_block(priv,
- src_phys + src_offset,
- dest_address +
- dest_offset,
- CB_MAX_LENGTH, 0, 0);
- if (status) {
+ IPW_DEBUG_FW_INFO("nr=%d dest_address=0x%x len=0x%x\n",
+ nr, dest_address, len);
+
+ for (i = 0; i < nr; i++) {
+ size = min_t(u32, len - i * CB_MAX_LENGTH, CB_MAX_LENGTH);
+ ret = ipw_fw_dma_add_command_block(priv, src_address[i],
+ dest_address +
+ i * CB_MAX_LENGTH, size,
+ 0, 0);
+ if (ret) {
IPW_DEBUG_FW_INFO(": Failed\n");
return -1;
} else
IPW_DEBUG_FW_INFO(": Added new cb\n");
-
- src_offset += CB_MAX_LENGTH;
- dest_offset += CB_MAX_LENGTH;
- bytes_left -= CB_MAX_LENGTH;
- }
-
- /* add the buffer tail */
- if (bytes_left > 0) {
- status =
- ipw_fw_dma_add_command_block(priv, src_phys + src_offset,
- dest_address + dest_offset,
- bytes_left, 0, 0);
- if (status) {
- IPW_DEBUG_FW_INFO(": Failed on the buffer tail\n");
- return -1;
- } else
- IPW_DEBUG_FW_INFO
- (": Adding new cb - the buffer tail\n");
}
IPW_DEBUG_FW("<< \n");
@@ -3160,59 +3142,91 @@ static int ipw_load_ucode(struct ipw_priv *priv, u8 * data, size_t len)
static int ipw_load_firmware(struct ipw_priv *priv, u8 * data, size_t len)
{
- int rc = -1;
+ int ret = -1;
int offset = 0;
struct fw_chunk *chunk;
- dma_addr_t shared_phys;
- u8 *shared_virt;
+ int total_nr = 0;
+ int i;
+ struct pci_pool *pool;
+ u32 *virts[CB_NUMBER_OF_ELEMENTS_SMALL];
+ dma_addr_t phys[CB_NUMBER_OF_ELEMENTS_SMALL];
IPW_DEBUG_TRACE("<< : \n");
- shared_virt = pci_alloc_consistent(priv->pci_dev, len, &shared_phys);
- if (!shared_virt)
+ pool = pci_pool_create("ipw2200", priv->pci_dev, CB_MAX_LENGTH, 0, 0);
+ if (!pool) {
+ IPW_ERROR("pci_pool_create failed\n");
return -ENOMEM;
-
- memmove(shared_virt, data, len);
+ }
/* Start the Dma */
- rc = ipw_fw_dma_enable(priv);
+ ret = ipw_fw_dma_enable(priv);
/* the DMA is already ready this would be a bug. */
BUG_ON(priv->sram_desc.last_cb_index > 0);
do {
+ u32 chunk_len;
+ u8 *start;
+ int size;
+ int nr = 0;
+
chunk = (struct fw_chunk *)(data + offset);
offset += sizeof(struct fw_chunk);
+ chunk_len = le32_to_cpu(chunk->length);
+ start = data + offset;
+
+ nr = (chunk_len + CB_MAX_LENGTH - 1) / CB_MAX_LENGTH;
+ for (i = 0; i < nr; i++) {
+ virts[total_nr] = pci_pool_alloc(pool, GFP_KERNEL,
+ &phys[total_nr]);
+ if (!virts[total_nr]) {
+ ret = -ENOMEM;
+ goto out;
+ }
+ size = min_t(u32, chunk_len - i * CB_MAX_LENGTH,
+ CB_MAX_LENGTH);
+ memcpy(virts[total_nr], start, size);
+ start += size;
+ total_nr++;
+ /* We don't support fw chunk larger than 64*8K */
+ BUG_ON(total_nr > CB_NUMBER_OF_ELEMENTS_SMALL);
+ }
+
/* build DMA packet and queue up for sending */
/* dma to chunk->address, the chunk->length bytes from data +
* offeset*/
/* Dma loading */
- rc = ipw_fw_dma_add_buffer(priv, shared_phys + offset,
- le32_to_cpu(chunk->address),
- le32_to_cpu(chunk->length));
- if (rc) {
+ ret = ipw_fw_dma_add_buffer(priv, &phys[total_nr - nr],
+ nr, le32_to_cpu(chunk->address),
+ chunk_len);
+ if (ret) {
IPW_DEBUG_INFO("dmaAddBuffer Failed\n");
goto out;
}
- offset += le32_to_cpu(chunk->length);
+ offset += chunk_len;
} while (offset < len);
/* Run the DMA and wait for the answer */
- rc = ipw_fw_dma_kick(priv);
- if (rc) {
+ ret = ipw_fw_dma_kick(priv);
+ if (ret) {
IPW_ERROR("dmaKick Failed\n");
goto out;
}
- rc = ipw_fw_dma_wait(priv);
- if (rc) {
+ ret = ipw_fw_dma_wait(priv);
+ if (ret) {
IPW_ERROR("dmaWaitSync Failed\n");
goto out;
}
- out:
- pci_free_consistent(priv->pci_dev, len, shared_virt, shared_phys);
- return rc;
+ out:
+ for (i = 0; i < total_nr; i++)
+ pci_pool_free(pool, virts[i], phys[i]);
+
+ pci_pool_destroy(pool);
+
+ return ret;
}
/* stop nic */
--
1.5.3.6
^ permalink raw reply related [flat|nested] 286+ messages in thread
* Re: ipw2200: firmware DMA loading rework
2009-08-28 3:42 ` Zhu Yi
(?)
@ 2009-08-30 12:37 ` Bartlomiej Zolnierkiewicz
-1 siblings, 0 replies; 286+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2009-08-30 12:37 UTC (permalink / raw)
To: Zhu Yi
Cc: Andrew Morton, Mel Gorman, Johannes Weiner, Pekka Enberg,
Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Mel Gorman, netdev, linux-mm,
James Ketrenos, Chatre, Reinette, linux-wireless, ipw2100-devel
On Friday 28 August 2009 05:42:31 Zhu Yi wrote:
> Bartlomiej Zolnierkiewicz reported an atomic order-6 allocation failure
> for ipw2200 firmware loading in kernel 2.6.30. High order allocation is
s/2.6.30/2.6.31-rc6/
The issue has always been there but it was some recent change that
explicitly triggered the allocation failures (after 2.6.31-rc1).
> likely to fail and should always be avoided.
>
> The patch fixes this problem by replacing the original order-6
> pci_alloc_consistent() with an array of order-1 pages from a pci pool.
> This utilized the ipw2200 DMA command blocks (up to 64 slots). The
> maximum firmware size support remains the same (64*8K).
>
> This patch fixes bug http://bugzilla.kernel.org/show_bug.cgi?id=14016
>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Mel Gorman <mel@csn.ul.ie>
> Signed-off-by: Zhu Yi <yi.zhu@intel.com>
Thanks for the fix (also kudos to the other people helping with the bug report),
it works fine so far and looks OK to me:
Tested-and-reviewed-by: Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: ipw2200: firmware DMA loading rework
2009-08-30 12:37 ` Bartlomiej Zolnierkiewicz
(?)
@ 2009-09-02 17:48 ` Bartlomiej Zolnierkiewicz
-1 siblings, 0 replies; 286+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2009-09-02 17:48 UTC (permalink / raw)
To: Zhu Yi
Cc: Andrew Morton, Mel Gorman, Johannes Weiner, Pekka Enberg,
Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Mel Gorman, netdev, linux-mm,
James Ketrenos, Chatre, Reinette, linux-wireless, ipw2100-devel
On Sunday 30 August 2009 14:37:42 Bartlomiej Zolnierkiewicz wrote:
> On Friday 28 August 2009 05:42:31 Zhu Yi wrote:
> > Bartlomiej Zolnierkiewicz reported an atomic order-6 allocation failure
> > for ipw2200 firmware loading in kernel 2.6.30. High order allocation is
>
> s/2.6.30/2.6.31-rc6/
>
> The issue has always been there but it was some recent change that
> explicitly triggered the allocation failures (after 2.6.31-rc1).
The ipw2200 fix works fine, but yesterday I got the following error while mounting
an ext4 filesystem (mb_history is optional so the mount succeeded):
EXT4-fs (dm-2): barriers enabled
kjournald2 starting: pid 3137, dev dm-2:8, commit interval 5 seconds
EXT4-fs (dm-2): internal journal on dm-2:8
EXT4-fs (dm-2): delayed allocation enabled
EXT4-fs: file extents enabled
mount: page allocation failure. order:5, mode:0xc0d0
Pid: 3136, comm: mount Not tainted 2.6.31-rc8-00015-gadda766-dirty #78
Call Trace:
[<c0394de3>] ? printk+0xf/0x14
[<c016a693>] __alloc_pages_nodemask+0x400/0x442
[<c016a71b>] __get_free_pages+0xf/0x32
[<c01865cf>] __kmalloc+0x28/0xfa
[<c023d96f>] ? __spin_lock_init+0x28/0x4d
[<c01f529d>] ext4_mb_init+0x392/0x460
[<c01e99d2>] ext4_fill_super+0x1b96/0x2012
[<c0239bc8>] ? snprintf+0x15/0x17
[<c01c0b26>] ? disk_name+0x24/0x69
[<c018ba63>] get_sb_bdev+0xda/0x117
[<c01e6711>] ext4_get_sb+0x13/0x15
[<c01e7e3c>] ? ext4_fill_super+0x0/0x2012
[<c018ad2d>] vfs_kern_mount+0x3b/0x76
[<c018adad>] do_kern_mount+0x33/0xbd
[<c019d0af>] do_mount+0x660/0x6b8
[<c016a71b>] ? __get_free_pages+0xf/0x32
[<c019d168>] sys_mount+0x61/0x99
[<c0102908>] sysenter_do_call+0x12/0x36
Mem-Info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Normal per-cpu:
CPU 0: hi: 186, btch: 31 usd: 0
Active_anon:25471 active_file:22802 inactive_anon:25812
inactive_file:33619 unevictable:2 dirty:2452 writeback:135 unstable:0
free:4346 slab:4308 mapped:26038 pagetables:912 bounce:0
DMA free:2060kB min:84kB low:104kB high:124kB active_anon:1660kB inactive_anon:1848kB active_file:144kB inactive_file:868kB unevictable:0kB present:15788kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 489 489
Normal free:15324kB min:2788kB low:3484kB high:4180kB active_anon:100224kB inactive_anon:101400kB active_file:91064kB inactive_file:133608kB unevictable:8kB present:501392kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2060kB
Normal: 1283*4kB 648*8kB 159*16kB 53*32kB 10*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15324kB
57947 total pagecache pages
878 pages in swap cache
Swap cache stats: add 920, delete 42, find 11/11
Free swap = 1016436kB
Total swap = 1020116kB
131056 pages RAM
4233 pages reserved
90573 pages shared
77286 pages non-shared
EXT4-fs: mballoc enabled
EXT4-fs (dm-2): mounted filesystem with ordered data mode
Thus it seems that the original bug is still there. Any ideas on how to
debug the problem further are appreciated.
The complete dmesg and kernel config are here:
http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.dmesg
http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.config
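The per-zone free lists in the dump above can be checked by hand, but a few lines of script make the arithmetic explicit. The following is a minimal sketch, assuming 4 kB pages (as the "1283*4kB" entries suggest); the helper name parse_buddy_line is hypothetical and is not part of any kernel tooling. It converts the "Normal:" line into per-order block counts, re-derives the reported total, and counts how many free blocks are large enough for an order-5 (128 kB) request.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, matching the smallest entry in the dump


def parse_buddy_line(line):
    """Return {order: free_block_count} for a 'Zone: N*SIZEkB ...' dump line."""
    counts = {}
    for count, size_kb in re.findall(r"(\d+)\*(\d+)kB", line):
        # A block of 2**order pages is (PAGE_KB << order) kB in size.
        order = (int(size_kb) // PAGE_KB).bit_length() - 1
        counts[order] = int(count)
    return counts


# Copied from the Mem-Info dump in this message.
NORMAL = ("Normal: 1283*4kB 648*8kB 159*16kB 53*32kB 10*64kB 1*128kB "
          "0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15324kB")

normal = parse_buddy_line(NORMAL)

# Sum of all free blocks in kB; should match the "= 15324kB" trailer.
total_kb = sum(n * (PAGE_KB << order) for order, n in normal.items())

# Size of the failed request and the number of free blocks that could hold it.
order5_kb = PAGE_KB << 5
blocks_ge_5 = sum(n for order, n in normal.items() if order >= 5)

print(f"total free: {total_kb} kB")
print(f"order-5 request size: {order5_kb} kB")
print(f"free blocks of order >= 5: {blocks_ge_5}")
```

With the numbers from this dump, the Normal zone has 15324 kB free in total but only a single block of order 5 or larger, so nearly all of the free memory sits in fragments too small for the request.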
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: ipw2200: firmware DMA loading rework
2009-09-02 17:48 ` Bartlomiej Zolnierkiewicz
(?)
@ 2009-09-02 18:02 ` Luis R. Rodriguez
-1 siblings, 0 replies; 286+ messages in thread
From: Luis R. Rodriguez @ 2009-09-02 18:02 UTC (permalink / raw)
To: Bartlomiej Zolnierkiewicz, Tso Ted, Aneesh Kumar K.V
Cc: Zhu Yi, Andrew Morton, Mel Gorman, Johannes Weiner, Pekka Enberg,
Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Mel Gorman, netdev, linux-mm,
James Ketrenos, Chatre, Reinette, linux-wireless, ipw2100-devel
On Wed, Sep 2, 2009 at 10:48 AM, Bartlomiej
Zolnierkiewicz<bzolnier@gmail.com> wrote:
> On Sunday 30 August 2009 14:37:42 Bartlomiej Zolnierkiewicz wrote:
>> On Friday 28 August 2009 05:42:31 Zhu Yi wrote:
>> > Bartlomiej Zolnierkiewicz reported an atomic order-6 allocation failure
>> > for ipw2200 firmware loading in kernel 2.6.30. High order allocation is
>>
>> s/2.6.30/2.6.31-rc6/
>>
>> The issue has always been there but it was some recent change that
>> explicitly triggered the allocation failures (after 2.6.31-rc1).
>
> ipw2200 fix works fine but yesterday I got the following error while mounting
> ext4 filesystem (mb_history is optional so the mount succeeded):
OK so the mount succeeded.
> EXT4-fs (dm-2): barriers enabled
> kjournald2 starting: pid 3137, dev dm-2:8, commit interval 5 seconds
> EXT4-fs (dm-2): internal journal on dm-2:8
> EXT4-fs (dm-2): delayed allocation enabled
> EXT4-fs: file extents enabled
> mount: page allocation failure. order:5, mode:0xc0d0
> Pid: 3136, comm: mount Not tainted 2.6.31-rc8-00015-gadda766-dirty #78
> Call Trace:
> [<c0394de3>] ? printk+0xf/0x14
> [<c016a693>] __alloc_pages_nodemask+0x400/0x442
> [<c016a71b>] __get_free_pages+0xf/0x32
> [<c01865cf>] __kmalloc+0x28/0xfa
> [<c023d96f>] ? __spin_lock_init+0x28/0x4d
> [<c01f529d>] ext4_mb_init+0x392/0x460
> [<c01e99d2>] ext4_fill_super+0x1b96/0x2012
> [<c0239bc8>] ? snprintf+0x15/0x17
> [<c01c0b26>] ? disk_name+0x24/0x69
> [<c018ba63>] get_sb_bdev+0xda/0x117
> [<c01e6711>] ext4_get_sb+0x13/0x15
> [<c01e7e3c>] ? ext4_fill_super+0x0/0x2012
> [<c018ad2d>] vfs_kern_mount+0x3b/0x76
> [<c018adad>] do_kern_mount+0x33/0xbd
> [<c019d0af>] do_mount+0x660/0x6b8
> [<c016a71b>] ? __get_free_pages+0xf/0x32
> [<c019d168>] sys_mount+0x61/0x99
> [<c0102908>] sysenter_do_call+0x12/0x36
> Mem-Info:
> DMA per-cpu:
> CPU 0: hi: 0, btch: 1 usd: 0
> Normal per-cpu:
> CPU 0: hi: 186, btch: 31 usd: 0
> Active_anon:25471 active_file:22802 inactive_anon:25812
> inactive_file:33619 unevictable:2 dirty:2452 writeback:135 unstable:0
> free:4346 slab:4308 mapped:26038 pagetables:912 bounce:0
> DMA free:2060kB min:84kB low:104kB high:124kB active_anon:1660kB inactive_anon:1848kB active_file:144kB inactive_file:868kB unevictable:0kB present:15788kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 489 489
> Normal free:15324kB min:2788kB low:3484kB high:4180kB active_anon:100224kB inactive_anon:101400kB active_file:91064kB inactive_file:133608kB unevictable:8kB present:501392kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0
> DMA: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2060kB
> Normal: 1283*4kB 648*8kB 159*16kB 53*32kB 10*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15324kB
> 57947 total pagecache pages
> 878 pages in swap cache
> Swap cache stats: add 920, delete 42, find 11/11
> Free swap = 1016436kB
> Total swap = 1020116kB
> 131056 pages RAM
> 4233 pages reserved
> 90573 pages shared
> 77286 pages non-shared
> EXT4-fs: mballoc enabled
> EXT4-fs (dm-2): mounted filesystem with ordered data mode
>
> Thus it seems like the original bug is still there and any ideas how to
> debug the problem further are appreciated..
>
> The complete dmesg and kernel config are here:
>
> http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.dmesg
> http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.config
This looks very similar to the kmemleak ext4 reports on mount. If
it is the same issue, which from the trace it seems to be, then it
is due to an extra kmalloc() allocation. Apparently it will not
get fixed in 2.6.31, because the merge window is close and the
issue has been deemed non-critical.
A fix is part of the ext4 patch queue:
http://repo.or.cz/w/ext4-patch-queue.git
Luis
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: ipw2200: firmware DMA loading rework
2009-09-02 18:02 ` Luis R. Rodriguez
(?)
(?)
@ 2009-09-02 18:26 ` Bartlomiej Zolnierkiewicz
-1 siblings, 0 replies; 286+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2009-09-02 18:26 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Tso Ted, Aneesh Kumar K.V, Zhu Yi, Andrew Morton, Mel Gorman,
Johannes Weiner, Pekka Enberg, Rafael J. Wysocki,
Linux Kernel Mailing List, Kernel Testers List, Mel Gorman,
netdev, linux-mm, James Ketrenos, Chatre, Reinette,
linux-wireless, ipw2100-devel
On Wednesday 02 September 2009 20:02:14 Luis R. Rodriguez wrote:
> On Wed, Sep 2, 2009 at 10:48 AM, Bartlomiej
> Zolnierkiewicz<bzolnier@gmail.com> wrote:
> > On Sunday 30 August 2009 14:37:42 Bartlomiej Zolnierkiewicz wrote:
> >> On Friday 28 August 2009 05:42:31 Zhu Yi wrote:
> >> > Bartlomiej Zolnierkiewicz reported an atomic order-6 allocation failure
> >> > for ipw2200 firmware loading in kernel 2.6.30. High order allocation is
> >>
> >> s/2.6.30/2.6.31-rc6/
> >>
> >> The issue has always been there but it was some recent change that
> >> explicitly triggered the allocation failures (after 2.6.31-rc1).
> >
> > ipw2200 fix works fine but yesterday I got the following error while mounting
> > ext4 filesystem (mb_history is optional so the mount succeeded):
>
> OK so the mount succeeded.
>
> > EXT4-fs (dm-2): barriers enabled
> > kjournald2 starting: pid 3137, dev dm-2:8, commit interval 5 seconds
> > EXT4-fs (dm-2): internal journal on dm-2:8
> > EXT4-fs (dm-2): delayed allocation enabled
> > EXT4-fs: file extents enabled
> > mount: page allocation failure. order:5, mode:0xc0d0
> > Pid: 3136, comm: mount Not tainted 2.6.31-rc8-00015-gadda766-dirty #78
> > Call Trace:
> > [<c0394de3>] ? printk+0xf/0x14
> > [<c016a693>] __alloc_pages_nodemask+0x400/0x442
> > [<c016a71b>] __get_free_pages+0xf/0x32
> > [<c01865cf>] __kmalloc+0x28/0xfa
> > [<c023d96f>] ? __spin_lock_init+0x28/0x4d
> > [<c01f529d>] ext4_mb_init+0x392/0x460
> > [<c01e99d2>] ext4_fill_super+0x1b96/0x2012
> > [<c0239bc8>] ? snprintf+0x15/0x17
> > [<c01c0b26>] ? disk_name+0x24/0x69
> > [<c018ba63>] get_sb_bdev+0xda/0x117
> > [<c01e6711>] ext4_get_sb+0x13/0x15
> > [<c01e7e3c>] ? ext4_fill_super+0x0/0x2012
> > [<c018ad2d>] vfs_kern_mount+0x3b/0x76
> > [<c018adad>] do_kern_mount+0x33/0xbd
> > [<c019d0af>] do_mount+0x660/0x6b8
> > [<c016a71b>] ? __get_free_pages+0xf/0x32
> > [<c019d168>] sys_mount+0x61/0x99
> > [<c0102908>] sysenter_do_call+0x12/0x36
> > Mem-Info:
> > DMA per-cpu:
> > CPU 0: hi: 0, btch: 1 usd: 0
> > Normal per-cpu:
> > CPU 0: hi: 186, btch: 31 usd: 0
> > Active_anon:25471 active_file:22802 inactive_anon:25812
> > inactive_file:33619 unevictable:2 dirty:2452 writeback:135 unstable:0
> > free:4346 slab:4308 mapped:26038 pagetables:912 bounce:0
> > DMA free:2060kB min:84kB low:104kB high:124kB active_anon:1660kB inactive_anon:1848kB active_file:144kB inactive_file:868kB unevictable:0kB present:15788kB pages_scanned:0 all_unreclaimable? no
> > lowmem_reserve[]: 0 489 489
> > Normal free:15324kB min:2788kB low:3484kB high:4180kB active_anon:100224kB inactive_anon:101400kB active_file:91064kB inactive_file:133608kB unevictable:8kB present:501392kB pages_scanned:0 all_unreclaimable? no
> > lowmem_reserve[]: 0 0 0
> > DMA: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2060kB
> > Normal: 1283*4kB 648*8kB 159*16kB 53*32kB 10*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15324kB
> > 57947 total pagecache pages
> > 878 pages in swap cache
> > Swap cache stats: add 920, delete 42, find 11/11
> > Free swap = 1016436kB
> > Total swap = 1020116kB
> > 131056 pages RAM
> > 4233 pages reserved
> > 90573 pages shared
> > 77286 pages non-shared
> > EXT4-fs: mballoc enabled
> > EXT4-fs (dm-2): mounted filesystem with ordered data mode
> >
> > Thus it seems like the original bug is still there and any ideas how to
> > debug the problem further are appreciated..
> >
> > The complete dmesg and kernel config are here:
> >
> > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.dmesg
> > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.config
>
> This looks very similar to the kmemleak ext4 reports upon a mount. If
> it is the same issue, which from the trace it seems it is, then this
> is due to an extra kmalloc() allocation and this apparently will not
> get fixed on 2.6.31 due to the closeness of the merge window and the
> non-criticalness this issue has been deemed.
>
> A patch fix is part of the ext4-patchqueue
> http://repo.or.cz/w/ext4-patch-queue.git
Thanks for the pointer, but the page allocation failures that I hit seem
to be caused by the memory management itself. The ext4 issue fixed by:
http://repo.or.cz/w/ext4-patch-queue.git?a=blob;f=memory-leak-fix-ext4_group_info-allocation;h=c919fff34e70ec85f96d1833f9ce460c451000de;hb=HEAD
is a different problem (unrelated to this one).
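One way to explore the memory-management side is to replay a watermark walk over the free lists from the quoted dump. The sketch below is an assumption, not a statement of what the kernel did here: it is modeled from memory on the per-order watermark check in mm/page_alloc.c of that era, it ignores any allocation-flag adjustments to the minimum, and it assumes 4 kB pages. The function name watermark_ok and the constants are illustrative only.

```python
PAGE_KB = 4  # assumption: 4 kB pages


def watermark_ok(nr_free, order, min_pages, lowmem_reserve=0):
    """Replay a per-order watermark walk (illustrative model only).

    nr_free[o] is the number of free blocks of order o in the zone.
    """
    total_pages = sum(n << o for o, n in enumerate(nr_free))
    free_pages = total_pages - (1 << order) + 1
    if free_pages <= min_pages + lowmem_reserve:
        return False
    for o in range(order):
        # Blocks smaller than the request cannot satisfy it, so drop them.
        free_pages -= nr_free[o] << o
        # Require fewer pages to remain free at each higher order.
        min_pages >>= 1
        if free_pages <= min_pages:
            return False
    return True


# Block counts for orders 0..10, copied from the quoted Mem-Info dump.
NORMAL_FREE = [1283, 648, 159, 53, 10, 1, 0, 0, 0, 0, 0]
NORMAL_MIN = 2788 // PAGE_KB   # "min:2788kB" in pages

DMA_FREE = [1, 1, 0, 0, 0, 0, 0, 0, 0, 1, 0]
DMA_MIN = 84 // PAGE_KB        # "min:84kB" in pages
DMA_RESERVE = 489              # "lowmem_reserve[]: 0 489 489", in pages

for order in (0, 4, 5):
    print("Normal, order", order, "->",
          watermark_ok(NORMAL_FREE, order, NORMAL_MIN))
print("DMA, order 5 ->",
      watermark_ok(DMA_FREE, 5, DMA_MIN, DMA_RESERVE))
```

Under these assumptions the Normal zone passes the check up to order 4 and fails it at order 5, even though one 128 kB block is free, and the DMA zone fails on its lowmem reserve. That would be consistent with fragmentation plus watermarks rather than an ext4 leak, but it is only a model of the check and not a diagnosis.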
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: ipw2200: firmware DMA loading rework
@ 2009-09-02 18:26 ` Bartlomiej Zolnierkiewicz
0 siblings, 0 replies; 286+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2009-09-02 18:26 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Tso Ted, Aneesh Kumar K.V, Zhu Yi, Andrew Morton, Mel Gorman,
Johannes Weiner, Pekka Enberg, Rafael J. Wysocki,
Linux Kernel Mailing List, Kernel Testers List, Mel Gorman,
netdev, linux-mm, James Ketrenos, Chatre, Reinette,
linux-wireless, ipw2100-devel
On Wednesday 02 September 2009 20:02:14 Luis R. Rodriguez wrote:
> On Wed, Sep 2, 2009 at 10:48 AM, Bartlomiej
> Zolnierkiewicz<bzolnier@gmail.com> wrote:
> > On Sunday 30 August 2009 14:37:42 Bartlomiej Zolnierkiewicz wrote:
> >> On Friday 28 August 2009 05:42:31 Zhu Yi wrote:
> >> > Bartlomiej Zolnierkiewicz reported an atomic order-6 allocation failure
> >> > for ipw2200 firmware loading in kernel 2.6.30. High order allocation is
> >>
> >> s/2.6.30/2.6.31-rc6/
> >>
> >> The issue has always been there but it was some recent change that
> >> explicitly triggered the allocation failures (after 2.6.31-rc1).
> >
> > ipw2200 fix works fine but yesterday I got the following error while mounting
> > ext4 filesystem (mb_history is optional so the mount succeeded):
>
> OK so the mount succeeded.
>
> > EXT4-fs (dm-2): barriers enabled
> > kjournald2 starting: pid 3137, dev dm-2:8, commit interval 5 seconds
> > EXT4-fs (dm-2): internal journal on dm-2:8
> > EXT4-fs (dm-2): delayed allocation enabled
> > EXT4-fs: file extents enabled
> > mount: page allocation failure. order:5, mode:0xc0d0
> > Pid: 3136, comm: mount Not tainted 2.6.31-rc8-00015-gadda766-dirty #78
> > Call Trace:
> > [<c0394de3>] ? printk+0xf/0x14
> > [<c016a693>] __alloc_pages_nodemask+0x400/0x442
> > [<c016a71b>] __get_free_pages+0xf/0x32
> > [<c01865cf>] __kmalloc+0x28/0xfa
> > [<c023d96f>] ? __spin_lock_init+0x28/0x4d
> > [<c01f529d>] ext4_mb_init+0x392/0x460
> > [<c01e99d2>] ext4_fill_super+0x1b96/0x2012
> > [<c0239bc8>] ? snprintf+0x15/0x17
> > [<c01c0b26>] ? disk_name+0x24/0x69
> > [<c018ba63>] get_sb_bdev+0xda/0x117
> > [<c01e6711>] ext4_get_sb+0x13/0x15
> > [<c01e7e3c>] ? ext4_fill_super+0x0/0x2012
> > [<c018ad2d>] vfs_kern_mount+0x3b/0x76
> > [<c018adad>] do_kern_mount+0x33/0xbd
> > [<c019d0af>] do_mount+0x660/0x6b8
> > [<c016a71b>] ? __get_free_pages+0xf/0x32
> > [<c019d168>] sys_mount+0x61/0x99
> > [<c0102908>] sysenter_do_call+0x12/0x36
> > Mem-Info:
> > DMA per-cpu:
> > CPU 0: hi: 0, btch: 1 usd: 0
> > Normal per-cpu:
> > CPU 0: hi: 186, btch: 31 usd: 0
> > Active_anon:25471 active_file:22802 inactive_anon:25812
> > inactive_file:33619 unevictable:2 dirty:2452 writeback:135 unstable:0
> > free:4346 slab:4308 mapped:26038 pagetables:912 bounce:0
> > DMA free:2060kB min:84kB low:104kB high:124kB active_anon:1660kB inactive_anon:1848kB active_file:144kB inactive_file:868kB unevictable:0kB present:15788kB pages_scanned:0 all_unreclaimable? no
> > lowmem_reserve[]: 0 489 489
> > Normal free:15324kB min:2788kB low:3484kB high:4180kB active_anon:100224kB inactive_anon:101400kB active_file:91064kB inactive_file:133608kB unevictable:8kB present:501392kB pages_scanned:0 all_unreclaimable? no
> > lowmem_reserve[]: 0 0 0
> > DMA: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2060kB
> > Normal: 1283*4kB 648*8kB 159*16kB 53*32kB 10*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15324kB
> > 57947 total pagecache pages
> > 878 pages in swap cache
> > Swap cache stats: add 920, delete 42, find 11/11
> > Free swap = 1016436kB
> > Total swap = 1020116kB
> > 131056 pages RAM
> > 4233 pages reserved
> > 90573 pages shared
> > 77286 pages non-shared
> > EXT4-fs: mballoc enabled
> > EXT4-fs (dm-2): mounted filesystem with ordered data mode
> >
> > Thus it seems like the original bug is still there and any ideas how to
> > debug the problem further are appreciated..
> >
> > The complete dmesg and kernel config are here:
> >
> > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.dmesg
> > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.config
>
> This looks very similar to the kmemleak ext4 reports upon a mount. If
> it is the same issue, which from the trace it seems it is, then this
> is due to an extra kmalloc() allocation and this apparently will not
> get fixed on 2.6.31 due to the closeness of the merge window and the
> non-criticalness this issue has been deemed.
>
> A patch fix is part of the ext4-patchqueue
> http://repo.or.cz/w/ext4-patch-queue.git

Thanks for the pointer but the page allocation failures that I hit seem
to be caused by the memory management itself and the ext4 issue fixed by:

http://repo.or.cz/w/ext4-patch-queue.git?a=blob;f=memory-leak-fix-ext4_group_info-allocation;h=c919fff34e70ec85f96d1833f9ce460c451000de;hb=HEAD

is a different problem (unrelated to this one).
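
As a rough cross-check of the free lists in the quoted order:5 report, the
per-order lines can be parsed to count how many free blocks are large enough
for a 128kB (order-5) request. This is only a minimal sketch, assuming 4kB
pages as the "N*4kB" entries suggest; parse_buddy_line, free_kb and
blocks_at_or_above are hypothetical helper names, not kernel or tool APIs.

```python
import re

PAGE_KB = 4  # assumption: 4kB pages, matching the smallest entry in the report


def parse_buddy_line(line):
    """Split one free-list line into (zone, {order: count}, reported_total_kb)."""
    zone, rest = line.split(":", 1)
    counts = {}
    for count, size_kb in re.findall(r"(\d+)\*(\d+)kB", rest):
        # A block of size_kb holds 2**order pages, so order = log2(size / page).
        order = (int(size_kb) // PAGE_KB).bit_length() - 1
        counts[order] = int(count)
    reported = int(re.search(r"=\s*(\d+)kB", rest).group(1))
    return zone.strip(), counts, reported


def free_kb(counts):
    """Recompute the zone total from the per-order counts."""
    return sum(n * (PAGE_KB << order) for order, n in counts.items())


def blocks_at_or_above(counts, order):
    """Number of free blocks that could satisfy a request of the given order."""
    return sum(n for o, n in counts.items() if o >= order)


# The two free-list lines from the order:5 report quoted above.
report = [
    "DMA: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB "
    "0*1024kB 1*2048kB 0*4096kB = 2060kB",
    "Normal: 1283*4kB 648*8kB 159*16kB 53*32kB 10*64kB 1*128kB 0*256kB "
    "0*512kB 0*1024kB 0*2048kB 0*4096kB = 15324kB",
]

for line in report:
    zone, counts, reported = parse_buddy_line(line)
    assert free_kb(counts) == reported  # the report's own total is consistent
    print(zone, "blocks usable for order 5:", blocks_at_or_above(counts, 5))
```

On these numbers each zone holds exactly one block of order 5 or larger, so
contiguous memory is scarce even though about 17MB is free in total. A free
block of the right size does not by itself guarantee success, because the
allocator also applies per-zone watermark and lowmem_reserve checks.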
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: ipw2200: firmware DMA loading rework
2009-09-02 18:26 ` Bartlomiej Zolnierkiewicz
(?)
(?)
@ 2009-09-19 13:25 ` Bartlomiej Zolnierkiewicz
-1 siblings, 0 replies; 286+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2009-09-19 13:25 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Tso Ted, Aneesh Kumar K.V, Zhu Yi, Andrew Morton, Mel Gorman,
Johannes Weiner, Pekka Enberg, Rafael J. Wysocki,
Linux Kernel Mailing List, Kernel Testers List, Mel Gorman,
netdev, linux-mm, James Ketrenos, Chatre, Reinette,
linux-wireless, ipw2100-devel
On Wednesday 02 September 2009 20:26:17 Bartlomiej Zolnierkiewicz wrote:
> On Wednesday 02 September 2009 20:02:14 Luis R. Rodriguez wrote:
> > On Wed, Sep 2, 2009 at 10:48 AM, Bartlomiej
> > Zolnierkiewicz<bzolnier@gmail.com> wrote:
> > > On Sunday 30 August 2009 14:37:42 Bartlomiej Zolnierkiewicz wrote:
> > >> On Friday 28 August 2009 05:42:31 Zhu Yi wrote:
> > >> > Bartlomiej Zolnierkiewicz reported an atomic order-6 allocation failure
> > >> > for ipw2200 firmware loading in kernel 2.6.30. High order allocation is
> > >>
> > >> s/2.6.30/2.6.31-rc6/
> > >>
> > >> The issue has always been there but it was some recent change that
> > >> explicitly triggered the allocation failures (after 2.6.31-rc1).
> > >
> > > ipw2200 fix works fine but yesterday I got the following error while mounting
> > > ext4 filesystem (mb_history is optional so the mount succeeded):
> >
> > OK so the mount succeeded.
> >
> > > EXT4-fs (dm-2): barriers enabled
> > > kjournald2 starting: pid 3137, dev dm-2:8, commit interval 5 seconds
> > > EXT4-fs (dm-2): internal journal on dm-2:8
> > > EXT4-fs (dm-2): delayed allocation enabled
> > > EXT4-fs: file extents enabled
> > > mount: page allocation failure. order:5, mode:0xc0d0
> > > Pid: 3136, comm: mount Not tainted 2.6.31-rc8-00015-gadda766-dirty #78
> > > Call Trace:
> > > [<c0394de3>] ? printk+0xf/0x14
> > > [<c016a693>] __alloc_pages_nodemask+0x400/0x442
> > > [<c016a71b>] __get_free_pages+0xf/0x32
> > > [<c01865cf>] __kmalloc+0x28/0xfa
> > > [<c023d96f>] ? __spin_lock_init+0x28/0x4d
> > > [<c01f529d>] ext4_mb_init+0x392/0x460
> > > [<c01e99d2>] ext4_fill_super+0x1b96/0x2012
> > > [<c0239bc8>] ? snprintf+0x15/0x17
> > > [<c01c0b26>] ? disk_name+0x24/0x69
> > > [<c018ba63>] get_sb_bdev+0xda/0x117
> > > [<c01e6711>] ext4_get_sb+0x13/0x15
> > > [<c01e7e3c>] ? ext4_fill_super+0x0/0x2012
> > > [<c018ad2d>] vfs_kern_mount+0x3b/0x76
> > > [<c018adad>] do_kern_mount+0x33/0xbd
> > > [<c019d0af>] do_mount+0x660/0x6b8
> > > [<c016a71b>] ? __get_free_pages+0xf/0x32
> > > [<c019d168>] sys_mount+0x61/0x99
> > > [<c0102908>] sysenter_do_call+0x12/0x36
> > > Mem-Info:
> > > DMA per-cpu:
> > > CPU 0: hi: 0, btch: 1 usd: 0
> > > Normal per-cpu:
> > > CPU 0: hi: 186, btch: 31 usd: 0
> > > Active_anon:25471 active_file:22802 inactive_anon:25812
> > > inactive_file:33619 unevictable:2 dirty:2452 writeback:135 unstable:0
> > > free:4346 slab:4308 mapped:26038 pagetables:912 bounce:0
> > > DMA free:2060kB min:84kB low:104kB high:124kB active_anon:1660kB inactive_anon:1848kB active_file:144kB inactive_file:868kB unevictable:0kB present:15788kB pages_scanned:0 all_unreclaimable? no
> > > lowmem_reserve[]: 0 489 489
> > > Normal free:15324kB min:2788kB low:3484kB high:4180kB active_anon:100224kB inactive_anon:101400kB active_file:91064kB inactive_file:133608kB unevictable:8kB present:501392kB pages_scanned:0 all_unreclaimable? no
> > > lowmem_reserve[]: 0 0 0
> > > DMA: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2060kB
> > > Normal: 1283*4kB 648*8kB 159*16kB 53*32kB 10*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15324kB
> > > 57947 total pagecache pages
> > > 878 pages in swap cache
> > > Swap cache stats: add 920, delete 42, find 11/11
> > > Free swap = 1016436kB
> > > Total swap = 1020116kB
> > > 131056 pages RAM
> > > 4233 pages reserved
> > > 90573 pages shared
> > > 77286 pages non-shared
> > > EXT4-fs: mballoc enabled
> > > EXT4-fs (dm-2): mounted filesystem with ordered data mode
> > >
> > > Thus it seems like the original bug is still there and any ideas how to
> > > debug the problem further are appreciated..
> > >
> > > The complete dmesg and kernel config are here:
> > >
> > > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.dmesg
> > > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.config
> >
> > This looks very similar to the kmemleak ext4 reports upon a mount. If
> > it is the same issue, which from the trace it seems it is, then this
> > is due to an extra kmalloc() allocation and this apparently will not
> > get fixed on 2.6.31 due to the closeness of the merge window and the
> > non-criticalness this issue has been deemed.
> >
> > A patch fix is part of the ext4-patchqueue
> > http://repo.or.cz/w/ext4-patch-queue.git
>
> Thanks for the pointer but the page allocation failures that I hit seem
> to be caused by the memory management itself and the ext4 issue fixed by:
>
> http://repo.or.cz/w/ext4-patch-queue.git?a=blob;f=memory-leak-fix-ext4_group_info-allocation;h=c919fff34e70ec85f96d1833f9ce460c451000de;hb=HEAD
>
> is a different problem (unrelated to this one).

Here is another data point.

This time it is an order-6 page allocation failure for rt2870sta
(w/ upcoming driver changes) on Linus' tree from a few days ago.

ifconfig: page allocation failure. order:6, mode:0x8020
Pid: 4752, comm: ifconfig Tainted: G WC 2.6.31-04082-g1824090-dirty #80
Call Trace:
[<c03996f2>] ? printk+0xf/0x15
[<c016b841>] __alloc_pages_nodemask+0x41d/0x462
[<c010681e>] dma_generic_alloc_coherent+0x53/0xbd
[<c02f83aa>] hcd_buffer_alloc+0xdb/0xe8
[<c01067cb>] ? dma_generic_alloc_coherent+0x0/0xbd
[<c02ee2d6>] usb_buffer_alloc+0x16/0x1d
[<e121b627>] NICInitTransmit+0xe2/0x7e4 [rt2870sta]
[<e121bfb1>] RTMPAllocTxRxRingMemory+0x11c/0x17b [rt2870sta]
[<e11f0960>] rt28xx_init+0xa5/0x3f8 [rt2870sta]
[<e121194a>] rt28xx_open+0x53/0xa2 [rt2870sta]
[<e1211b77>] MainVirtualIF_open+0x23/0xf6 [rt2870sta]
[<c03383a4>] dev_open+0x86/0xbb
[<c0337b1a>] dev_change_flags+0x96/0x147
[<c036e9cb>] devinet_ioctl+0x20f/0x4f8
[<c036fc8f>] inet_ioctl+0x8e/0xa7
[<c032ab50>] sock_ioctl+0x1c9/0x1ed
[<c032a987>] ? sock_ioctl+0x0/0x1ed
[<c0195732>] vfs_ioctl+0x18/0x71
[<c0195cbb>] do_vfs_ioctl+0x491/0x4cf
[<c01779d6>] ? handle_mm_fault+0x242/0x4ff
[<c0119609>] ? do_page_fault+0x102/0x292
[<c0140721>] ? up_read+0x16/0x29
[<c0195d27>] sys_ioctl+0x2e/0x48
[<c0102908>] sysenter_do_call+0x12/0x36
Mem-Info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Normal per-cpu:
CPU 0: hi: 186, btch: 31 usd: 84
Active_anon:14664 active_file:30057 inactive_anon:31744
inactive_file:29940 unevictable:2 dirty:11 writeback:0 unstable:0
free:5421 slab:4037 mapped:7781 pagetables:963 bounce:0
DMA free:2060kB min:84kB low:104kB high:124kB active_anon:0kB inactive_anon:124kB active_file:3284kB inactive_file:972kB unevictable:0kB present:15788kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 489 489
Normal free:19624kB min:2788kB low:3484kB high:4180kB active_anon:58656kB inactive_anon:126852kB active_file:116944kB inactive_file:118788kB unevictable:8kB present:501392kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 3*4kB 0*8kB 2*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2060kB
Normal: 2180*4kB 625*8kB 303*16kB 33*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 19624kB
64568 total pagecache pages
3652 pages in swap cache
Swap cache stats: add 21642, delete 17990, find 4906/6079
Free swap = 981700kB
Total swap = 1020116kB
131056 pages RAM
4262 pages reserved
91941 pages shared
60834 pages non-shared
<-- ERROR in Alloc TX TxContext[0] HTTX_BUFFER !!
<-- RTMPAllocTxRxRingMemory, Status=3
ERROR!!! RTMPAllocDMAMemory failed, Status[=0x00000003]
!!! rt28xx Initialized fail !!!
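
The mode: values in the two reports (0xc0d0 for the order:5 mount failure,
0x8020 for the order:6 ifconfig failure) are GFP flag masks, and decoding them
shows what each caller allowed the allocator to do. The sketch below is an
assumption-laden aid, not kernel code: the bit values are recalled from
include/linux/gfp.h of roughly this kernel generation and should be checked
against the tree in use, and decode_gfp is a hypothetical helper name.

```python
# Assumed subset of GFP bits (from memory of include/linux/gfp.h, ~2.6.31).
# Verify against the actual tree before relying on them.
GFP_BITS = {
    0x10: "__GFP_WAIT",    # caller may sleep, so reclaim is possible
    0x20: "__GFP_HIGH",    # may dip into emergency reserves
    0x40: "__GFP_IO",      # may start physical I/O
    0x80: "__GFP_FS",      # may call into the filesystem
    0x4000: "__GFP_COMP",  # compound page metadata
    0x8000: "__GFP_ZERO",  # return zeroed memory
}


def decode_gfp(mode):
    """Return the names of the known bits set in mode, plus any leftover bits."""
    names = [name for bit, name in sorted(GFP_BITS.items()) if mode & bit]
    known = sum(bit for bit in GFP_BITS if mode & bit)
    leftover = mode & ~known
    if leftover:
        names.append(hex(leftover))  # bits outside the assumed table, shown raw
    return names


def may_sleep(mode):
    """True if the request is allowed to block (and hence wait for reclaim)."""
    return bool(mode & 0x10)


for mode in (0xC0D0, 0x8020):
    print(hex(mode), decode_gfp(mode), "may sleep:", may_sleep(mode))
```

Under those assumed values, 0xc0d0 reads as a sleeping GFP_KERNEL-style
request with __GFP_COMP and __GFP_ZERO, while 0x8020 reads as an atomic
request (__GFP_HIGH, no __GFP_WAIT) with __GFP_ZERO. An atomic order-6 request
cannot wait for reclaim, so it depends entirely on a 256kB block already being
free.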
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: ipw2200: firmware DMA loading rework
@ 2009-09-19 13:25 ` Bartlomiej Zolnierkiewicz
0 siblings, 0 replies; 286+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2009-09-19 13:25 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Tso Ted, Aneesh Kumar K.V, Zhu Yi, Andrew Morton, Mel Gorman,
Johannes Weiner, Pekka Enberg, Rafael J. Wysocki,
Linux Kernel Mailing List, Kernel Testers List, Mel Gorman,
netdev, linux-mm, James Ketrenos, Chatre, Reinette,
linux-wireless, ipw2100-devel
On Wednesday 02 September 2009 20:26:17 Bartlomiej Zolnierkiewicz wrote:
> On Wednesday 02 September 2009 20:02:14 Luis R. Rodriguez wrote:
> > On Wed, Sep 2, 2009 at 10:48 AM, Bartlomiej
> > Zolnierkiewicz<bzolnier@gmail.com> wrote:
> > > On Sunday 30 August 2009 14:37:42 Bartlomiej Zolnierkiewicz wrote:
> > >> On Friday 28 August 2009 05:42:31 Zhu Yi wrote:
> > >> > Bartlomiej Zolnierkiewicz reported an atomic order-6 allocation failure
> > >> > for ipw2200 firmware loading in kernel 2.6.30. High order allocation is
> > >>
> > >> s/2.6.30/2.6.31-rc6/
> > >>
> > >> The issue has always been there but it was some recent change that
> > >> explicitly triggered the allocation failures (after 2.6.31-rc1).
> > >
> > > ipw2200 fix works fine but yesterday I got the following error while mounting
> > > ext4 filesystem (mb_history is optional so the mount succeeded):
> >
> > OK so the mount succeeded.
> >
> > > EXT4-fs (dm-2): barriers enabled
> > > kjournald2 starting: pid 3137, dev dm-2:8, commit interval 5 seconds
> > > EXT4-fs (dm-2): internal journal on dm-2:8
> > > EXT4-fs (dm-2): delayed allocation enabled
> > > EXT4-fs: file extents enabled
> > > mount: page allocation failure. order:5, mode:0xc0d0
> > > Pid: 3136, comm: mount Not tainted 2.6.31-rc8-00015-gadda766-dirty #78
> > > Call Trace:
> > > [<c0394de3>] ? printk+0xf/0x14
> > > [<c016a693>] __alloc_pages_nodemask+0x400/0x442
> > > [<c016a71b>] __get_free_pages+0xf/0x32
> > > [<c01865cf>] __kmalloc+0x28/0xfa
> > > [<c023d96f>] ? __spin_lock_init+0x28/0x4d
> > > [<c01f529d>] ext4_mb_init+0x392/0x460
> > > [<c01e99d2>] ext4_fill_super+0x1b96/0x2012
> > > [<c0239bc8>] ? snprintf+0x15/0x17
> > > [<c01c0b26>] ? disk_name+0x24/0x69
> > > [<c018ba63>] get_sb_bdev+0xda/0x117
> > > [<c01e6711>] ext4_get_sb+0x13/0x15
> > > [<c01e7e3c>] ? ext4_fill_super+0x0/0x2012
> > > [<c018ad2d>] vfs_kern_mount+0x3b/0x76
> > > [<c018adad>] do_kern_mount+0x33/0xbd
> > > [<c019d0af>] do_mount+0x660/0x6b8
> > > [<c016a71b>] ? __get_free_pages+0xf/0x32
> > > [<c019d168>] sys_mount+0x61/0x99
> > > [<c0102908>] sysenter_do_call+0x12/0x36
> > > Mem-Info:
> > > DMA per-cpu:
> > > CPU 0: hi: 0, btch: 1 usd: 0
> > > Normal per-cpu:
> > > CPU 0: hi: 186, btch: 31 usd: 0
> > > Active_anon:25471 active_file:22802 inactive_anon:25812
> > > inactive_file:33619 unevictable:2 dirty:2452 writeback:135 unstable:0
> > > free:4346 slab:4308 mapped:26038 pagetables:912 bounce:0
> > > DMA free:2060kB min:84kB low:104kB high:124kB active_anon:1660kB inactive_anon:1848kB active_file:144kB inactive_file:868kB unevictable:0kB present:15788kB pages_scanned:0 all_unreclaimable? no
> > > lowmem_reserve[]: 0 489 489
> > > Normal free:15324kB min:2788kB low:3484kB high:4180kB active_anon:100224kB inactive_anon:101400kB active_file:91064kB inactive_file:133608kB unevictable:8kB present:501392kB pages_scanned:0 all_unreclaimable? no
> > > lowmem_reserve[]: 0 0 0
> > > DMA: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2060kB
> > > Normal: 1283*4kB 648*8kB 159*16kB 53*32kB 10*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15324kB
> > > 57947 total pagecache pages
> > > 878 pages in swap cache
> > > Swap cache stats: add 920, delete 42, find 11/11
> > > Free swap = 1016436kB
> > > Total swap = 1020116kB
> > > 131056 pages RAM
> > > 4233 pages reserved
> > > 90573 pages shared
> > > 77286 pages non-shared
> > > EXT4-fs: mballoc enabled
> > > EXT4-fs (dm-2): mounted filesystem with ordered data mode
> > >
> > > Thus it seems like the original bug is still there and any ideas how to
> > > debug the problem further are appreciated..
> > >
> > > The complete dmesg and kernel config are here:
> > >
> > > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.dmesg
> > > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.config
> >
> > This looks very similar to the kmemleak ext4 reports upon a mount. If
> > it is the same issue, which from the trace it seems it is, then this
> > is due to an extra kmalloc() allocation and this apparently will not
> > get fixed on 2.6.31 due to the closeness of the merge window and the
> > non-criticalness this issue has been deemed.
> >
> > A patch fix is part of the ext4-patchqueue
> > http://repo.or.cz/w/ext4-patch-queue.git
>
> Thanks for the pointer but the page allocation failures that I hit seem
> to be caused by the memory management itself and the ext4 issue fixed by:
>
> http://repo.or.cz/w/ext4-patch-queue.git?a=blob;f=memory-leak-fix-ext4_group_info-allocation;h=c919fff34e70ec85f96d1833f9ce460c451000de;hb=HEAD
>
> is a different problem (unrelated to this one).

Here is another data point.

This time it is an order-6 page allocation failure for rt2870sta
(w/ upcoming driver changes) on Linus' tree from a few days ago.

ifconfig: page allocation failure. order:6, mode:0x8020
Pid: 4752, comm: ifconfig Tainted: G WC 2.6.31-04082-g1824090-dirty #80
Call Trace:
[<c03996f2>] ? printk+0xf/0x15
[<c016b841>] __alloc_pages_nodemask+0x41d/0x462
[<c010681e>] dma_generic_alloc_coherent+0x53/0xbd
[<c02f83aa>] hcd_buffer_alloc+0xdb/0xe8
[<c01067cb>] ? dma_generic_alloc_coherent+0x0/0xbd
[<c02ee2d6>] usb_buffer_alloc+0x16/0x1d
[<e121b627>] NICInitTransmit+0xe2/0x7e4 [rt2870sta]
[<e121bfb1>] RTMPAllocTxRxRingMemory+0x11c/0x17b [rt2870sta]
[<e11f0960>] rt28xx_init+0xa5/0x3f8 [rt2870sta]
[<e121194a>] rt28xx_open+0x53/0xa2 [rt2870sta]
[<e1211b77>] MainVirtualIF_open+0x23/0xf6 [rt2870sta]
[<c03383a4>] dev_open+0x86/0xbb
[<c0337b1a>] dev_change_flags+0x96/0x147
[<c036e9cb>] devinet_ioctl+0x20f/0x4f8
[<c036fc8f>] inet_ioctl+0x8e/0xa7
[<c032ab50>] sock_ioctl+0x1c9/0x1ed
[<c032a987>] ? sock_ioctl+0x0/0x1ed
[<c0195732>] vfs_ioctl+0x18/0x71
[<c0195cbb>] do_vfs_ioctl+0x491/0x4cf
[<c01779d6>] ? handle_mm_fault+0x242/0x4ff
[<c0119609>] ? do_page_fault+0x102/0x292
[<c0140721>] ? up_read+0x16/0x29
[<c0195d27>] sys_ioctl+0x2e/0x48
[<c0102908>] sysenter_do_call+0x12/0x36
Mem-Info:
DMA per-cpu:
CPU 0: hi: 0, btch: 1 usd: 0
Normal per-cpu:
CPU 0: hi: 186, btch: 31 usd: 84
Active_anon:14664 active_file:30057 inactive_anon:31744
inactive_file:29940 unevictable:2 dirty:11 writeback:0 unstable:0
free:5421 slab:4037 mapped:7781 pagetables:963 bounce:0
DMA free:2060kB min:84kB low:104kB high:124kB active_anon:0kB inactive_anon:124kB active_file:3284kB inactive_file:972kB unevictable:0kB present:15788kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 489 489
Normal free:19624kB min:2788kB low:3484kB high:4180kB active_anon:58656kB inactive_anon:126852kB active_file:116944kB inactive_file:118788kB unevictable:8kB present:501392kB pages_scanned:0 all_unreclaimable? no
lowmem_reserve[]: 0 0 0
DMA: 3*4kB 0*8kB 2*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2060kB
Normal: 2180*4kB 625*8kB 303*16kB 33*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 19624kB
64568 total pagecache pages
3652 pages in swap cache
Swap cache stats: add 21642, delete 17990, find 4906/6079
Free swap = 981700kB
Total swap = 1020116kB
131056 pages RAM
4262 pages reserved
91941 pages shared
60834 pages non-shared
<-- ERROR in Alloc TX TxContext[0] HTTX_BUFFER !!
<-- RTMPAllocTxRxRingMemory, Status=3
ERROR!!! RTMPAllocDMAMemory failed, Status[=0x00000003]
!!! rt28xx Initialized fail !!!
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: ipw2200: firmware DMA loading rework
2009-09-19 13:25 ` Bartlomiej Zolnierkiewicz
(?)
@ 2009-09-21 8:58 ` Mel Gorman
-1 siblings, 0 replies; 286+ messages in thread
From: Mel Gorman @ 2009-09-21 8:58 UTC (permalink / raw)
To: Bartlomiej Zolnierkiewicz
Cc: Luis R. Rodriguez, Tso Ted, Aneesh Kumar K.V, Zhu Yi,
Andrew Morton, Johannes Weiner, Pekka Enberg, Rafael J. Wysocki,
Linux Kernel Mailing List, Kernel Testers List, Mel Gorman,
netdev, linux-mm, James Ketrenos, Chatre, Reinette,
linux-wireless, ipw2100-devel
On Sat, Sep 19, 2009 at 03:25:32PM +0200, Bartlomiej Zolnierkiewicz wrote:
> On Wednesday 02 September 2009 20:26:17 Bartlomiej Zolnierkiewicz wrote:
> > On Wednesday 02 September 2009 20:02:14 Luis R. Rodriguez wrote:
> > > On Wed, Sep 2, 2009 at 10:48 AM, Bartlomiej
> > > Zolnierkiewicz<bzolnier@gmail.com> wrote:
> > > > On Sunday 30 August 2009 14:37:42 Bartlomiej Zolnierkiewicz wrote:
> > > >> On Friday 28 August 2009 05:42:31 Zhu Yi wrote:
> > > >> > Bartlomiej Zolnierkiewicz reported an atomic order-6 allocation failure
> > > >> > for ipw2200 firmware loading in kernel 2.6.30. High order allocation is
> > > >>
> > > >> s/2.6.30/2.6.31-rc6/
> > > >>
> > > >> The issue has always been there but it was some recent change that
> > > >> explicitly triggered the allocation failures (after 2.6.31-rc1).
> > > >
> > > > ipw2200 fix works fine but yesterday I got the following error while mounting
> > > > ext4 filesystem (mb_history is optional so the mount succeeded):
> > >
> > > OK so the mount succeeded.
> > >
> > > > EXT4-fs (dm-2): barriers enabled
> > > > kjournald2 starting: pid 3137, dev dm-2:8, commit interval 5 seconds
> > > > EXT4-fs (dm-2): internal journal on dm-2:8
> > > > EXT4-fs (dm-2): delayed allocation enabled
> > > > EXT4-fs: file extents enabled
> > > > mount: page allocation failure. order:5, mode:0xc0d0
> > > > Pid: 3136, comm: mount Not tainted 2.6.31-rc8-00015-gadda766-dirty #78
> > > > Call Trace:
> > > > [<c0394de3>] ? printk+0xf/0x14
> > > > [<c016a693>] __alloc_pages_nodemask+0x400/0x442
> > > > [<c016a71b>] __get_free_pages+0xf/0x32
> > > > [<c01865cf>] __kmalloc+0x28/0xfa
> > > > [<c023d96f>] ? __spin_lock_init+0x28/0x4d
> > > > [<c01f529d>] ext4_mb_init+0x392/0x460
> > > > [<c01e99d2>] ext4_fill_super+0x1b96/0x2012
> > > > [<c0239bc8>] ? snprintf+0x15/0x17
> > > > [<c01c0b26>] ? disk_name+0x24/0x69
> > > > [<c018ba63>] get_sb_bdev+0xda/0x117
> > > > [<c01e6711>] ext4_get_sb+0x13/0x15
> > > > [<c01e7e3c>] ? ext4_fill_super+0x0/0x2012
> > > > [<c018ad2d>] vfs_kern_mount+0x3b/0x76
> > > > [<c018adad>] do_kern_mount+0x33/0xbd
> > > > [<c019d0af>] do_mount+0x660/0x6b8
> > > > [<c016a71b>] ? __get_free_pages+0xf/0x32
> > > > [<c019d168>] sys_mount+0x61/0x99
> > > > [<c0102908>] sysenter_do_call+0x12/0x36
> > > > Mem-Info:
> > > > DMA per-cpu:
> > > > CPU 0: hi: 0, btch: 1 usd: 0
> > > > Normal per-cpu:
> > > > CPU 0: hi: 186, btch: 31 usd: 0
> > > > Active_anon:25471 active_file:22802 inactive_anon:25812
> > > > inactive_file:33619 unevictable:2 dirty:2452 writeback:135 unstable:0
> > > > free:4346 slab:4308 mapped:26038 pagetables:912 bounce:0
> > > > DMA free:2060kB min:84kB low:104kB high:124kB active_anon:1660kB inactive_anon:1848kB active_file:144kB inactive_file:868kB unevictable:0kB present:15788kB pages_scanned:0 all_unreclaimable? no
> > > > lowmem_reserve[]: 0 489 489
> > > > Normal free:15324kB min:2788kB low:3484kB high:4180kB active_anon:100224kB inactive_anon:101400kB active_file:91064kB inactive_file:133608kB unevictable:8kB present:501392kB pages_scanned:0 all_unreclaimable? no
> > > > lowmem_reserve[]: 0 0 0
> > > > DMA: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2060kB
> > > > Normal: 1283*4kB 648*8kB 159*16kB 53*32kB 10*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15324kB
> > > > 57947 total pagecache pages
> > > > 878 pages in swap cache
> > > > Swap cache stats: add 920, delete 42, find 11/11
> > > > Free swap = 1016436kB
> > > > Total swap = 1020116kB
> > > > 131056 pages RAM
> > > > 4233 pages reserved
> > > > 90573 pages shared
> > > > 77286 pages non-shared
> > > > EXT4-fs: mballoc enabled
> > > > EXT4-fs (dm-2): mounted filesystem with ordered data mode
> > > >
> > > > Thus it seems like the original bug is still there and any ideas how to
> > > > debug the problem further are appreciated..
> > > >
> > > > The complete dmesg and kernel config are here:
> > > >
> > > > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.dmesg
> > > > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.config
> > >
> > > This looks very similar to the kmemleak ext4 reports upon a mount. If
> > > it is the same issue, which from the trace it seems it is, then this
> > > is due to an extra kmalloc() allocation and this apparently will not
> > > get fixed on 2.6.31 due to the closeness of the merge window and the
> > > non-criticalness this issue has been deemed.
> > >
> > > A patch fix is part of the ext4-patchqueue
> > > http://repo.or.cz/w/ext4-patch-queue.git
> >
> > Thanks for the pointer but the page allocation failures that I hit seem
> > to be caused by the memory management itself and the ext4 issue fixed by:
> >
> > http://repo.or.cz/w/ext4-patch-queue.git?a=blob;f=memory-leak-fix-ext4_group_info-allocation;h=c919fff34e70ec85f96d1833f9ce460c451000de;hb=HEAD
> >
> > is a different problem (unrelated to this one).
>
> Here is another data point.
>
> This time it is an order-6 page allocation failure for rt2870sta
> (w/ upcoming driver changes) and Linus' tree from few days ago..
>
It's another high-order atomic allocation, which is difficult to grant.
I didn't look closely, but is this the same type of thing - a large allocation
failure during firmware loading? If so, is this during resume or is the
device being reloaded for some other reason?

I suspect that there are going to be a few of these bugs cropping up
every so often where network devices are assuming large atomic
allocations will succeed because the "only time they happen" is during
boot, but these days they are happening at runtime for other reasons.

> ifconfig: page allocation failure. order:6, mode:0x8020
> Pid: 4752, comm: ifconfig Tainted: G WC 2.6.31-04082-g1824090-dirty #80
> Call Trace:
> [<c03996f2>] ? printk+0xf/0x15
> [<c016b841>] __alloc_pages_nodemask+0x41d/0x462
> [<c010681e>] dma_generic_alloc_coherent+0x53/0xbd
> [<c02f83aa>] hcd_buffer_alloc+0xdb/0xe8
> [<c01067cb>] ? dma_generic_alloc_coherent+0x0/0xbd
> [<c02ee2d6>] usb_buffer_alloc+0x16/0x1d
> [<e121b627>] NICInitTransmit+0xe2/0x7e4 [rt2870sta]
> [<e121bfb1>] RTMPAllocTxRxRingMemory+0x11c/0x17b [rt2870sta]
> [<e11f0960>] rt28xx_init+0xa5/0x3f8 [rt2870sta]
> [<e121194a>] rt28xx_open+0x53/0xa2 [rt2870sta]
> [<e1211b77>] MainVirtualIF_open+0x23/0xf6 [rt2870sta]
> [<c03383a4>] dev_open+0x86/0xbb
> [<c0337b1a>] dev_change_flags+0x96/0x147
> [<c036e9cb>] devinet_ioctl+0x20f/0x4f8
> [<c036fc8f>] inet_ioctl+0x8e/0xa7
> [<c032ab50>] sock_ioctl+0x1c9/0x1ed
> [<c032a987>] ? sock_ioctl+0x0/0x1ed
> [<c0195732>] vfs_ioctl+0x18/0x71
> [<c0195cbb>] do_vfs_ioctl+0x491/0x4cf
> [<c01779d6>] ? handle_mm_fault+0x242/0x4ff
> [<c0119609>] ? do_page_fault+0x102/0x292
> [<c0140721>] ? up_read+0x16/0x29
> [<c0195d27>] sys_ioctl+0x2e/0x48
> [<c0102908>] sysenter_do_call+0x12/0x36
> Mem-Info:
> DMA per-cpu:
> CPU 0: hi: 0, btch: 1 usd: 0
> Normal per-cpu:
> CPU 0: hi: 186, btch: 31 usd: 84
> Active_anon:14664 active_file:30057 inactive_anon:31744
> inactive_file:29940 unevictable:2 dirty:11 writeback:0 unstable:0
> free:5421 slab:4037 mapped:7781 pagetables:963 bounce:0
> DMA free:2060kB min:84kB low:104kB high:124kB active_anon:0kB inactive_anon:124kB active_file:3284kB inactive_file:972kB unevictable:0kB present:15788kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 489 489
> Normal free:19624kB min:2788kB low:3484kB high:4180kB active_anon:58656kB inactive_anon:126852kB active_file:116944kB inactive_file:118788kB unevictable:8kB present:501392kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0
> DMA: 3*4kB 0*8kB 2*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2060kB
> Normal: 2180*4kB 625*8kB 303*16kB 33*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 19624kB
> 64568 total pagecache pages
> 3652 pages in swap cache
> Swap cache stats: add 21642, delete 17990, find 4906/6079
> Free swap = 981700kB
> Total swap = 1020116kB
> 131056 pages RAM
> 4262 pages reserved
> 91941 pages shared
> 60834 pages non-shared
> <-- ERROR in Alloc TX TxContext[0] HTTX_BUFFER !!
> <-- RTMPAllocTxRxRingMemory, Status=3
> ERROR!!! RTMPAllocDMAMemory failed, Status[=0x00000003]
> !!! rt28xx Initialized fail !!!
>
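For anyone decoding the buddy lists in the quoted report, here is a minimal
sketch of the arithmetic. It is not kernel code: the helper names are made up
for illustration, and 4 kB pages are assumed (as on the 32-bit x86 machine in
the report). It parses the per-zone free lists and checks whether any free
block is large enough for an order-6 request (2^6 pages = 256 kB).

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages


def parse_buddy_line(line):
    """Return {order: free_block_count} from a 'Zone: N*SkB ...' log line."""
    counts = {}
    for count, size_kb in re.findall(r"(\d+)\*(\d+)kB", line):
        pages = int(size_kb) // PAGE_KB
        order = pages.bit_length() - 1  # block sizes are powers of two
        counts[order] = int(count)
    return counts


def total_free_kb(counts):
    """Sum of all free blocks, in kB (should match the '= NkB' total)."""
    return sum(n * (PAGE_KB << order) for order, n in counts.items())


def has_block(counts, order):
    """True if at least one free block of the requested order or larger exists."""
    return any(n > 0 for o, n in counts.items() if o >= order)


# Free lists copied from the rt2870sta failure report.
normal = parse_buddy_line(
    "Normal: 2180*4kB 625*8kB 303*16kB 33*32kB 0*64kB 0*128kB 0*256kB "
    "0*512kB 0*1024kB 0*2048kB 0*4096kB = 19624kB"
)
dma = parse_buddy_line(
    "DMA: 3*4kB 0*8kB 2*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB "
    "1*1024kB 0*2048kB 0*4096kB = 2060kB"
)

print("order-6 request size:", PAGE_KB << 6, "kB")
print("Normal has an order-6 block:", has_block(normal, 6))
print("DMA has an order-6 block:", has_block(dma, 6))
```

The Normal zone has about 19 MB free but nothing larger than 32 kB
contiguous, so it cannot satisfy the request. A free block of the right size
is necessary but not sufficient: the allocator also applies watermark and
lowmem_reserve checks, which this sketch does not model, so the 256 kB block
in the DMA zone does not by itself mean the allocation could have succeeded.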
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: ipw2200: firmware DMA loading rework
@ 2009-09-21 8:58 ` Mel Gorman
0 siblings, 0 replies; 286+ messages in thread
From: Mel Gorman @ 2009-09-21 8:58 UTC (permalink / raw)
To: Bartlomiej Zolnierkiewicz
Cc: Luis R. Rodriguez, Tso Ted, Aneesh Kumar K.V, Zhu Yi,
Andrew Morton, Johannes Weiner, Pekka Enberg, Rafael J. Wysocki,
Linux Kernel Mailing List, Kernel Testers List, Mel Gorman,
netdev, linux-mm, James Ketrenos, Chatre, Reinette,
linux-wireless, ipw2100-devel
On Sat, Sep 19, 2009 at 03:25:32PM +0200, Bartlomiej Zolnierkiewicz wrote:
> On Wednesday 02 September 2009 20:26:17 Bartlomiej Zolnierkiewicz wrote:
> > On Wednesday 02 September 2009 20:02:14 Luis R. Rodriguez wrote:
> > > On Wed, Sep 2, 2009 at 10:48 AM, Bartlomiej
> > > Zolnierkiewicz<bzolnier@gmail.com> wrote:
> > > > On Sunday 30 August 2009 14:37:42 Bartlomiej Zolnierkiewicz wrote:
> > > >> On Friday 28 August 2009 05:42:31 Zhu Yi wrote:
> > > >> > Bartlomiej Zolnierkiewicz reported an atomic order-6 allocation failure
> > > >> > for ipw2200 firmware loading in kernel 2.6.30. High order allocation is
> > > >>
> > > >> s/2.6.30/2.6.31-rc6/
> > > >>
> > > >> The issue has always been there but it was some recent change that
> > > >> explicitly triggered the allocation failures (after 2.6.31-rc1).
> > > >
> > > > ipw2200 fix works fine but yesterday I got the following error while mounting
> > > > ext4 filesystem (mb_history is optional so the mount succeeded):
> > >
> > > OK so the mount succeeded.
> > >
> > > > EXT4-fs (dm-2): barriers enabled
> > > > kjournald2 starting: pid 3137, dev dm-2:8, commit interval 5 seconds
> > > > EXT4-fs (dm-2): internal journal on dm-2:8
> > > > EXT4-fs (dm-2): delayed allocation enabled
> > > > EXT4-fs: file extents enabled
> > > > mount: page allocation failure. order:5, mode:0xc0d0
> > > > Pid: 3136, comm: mount Not tainted 2.6.31-rc8-00015-gadda766-dirty #78
> > > > Call Trace:
> > > > [<c0394de3>] ? printk+0xf/0x14
> > > > [<c016a693>] __alloc_pages_nodemask+0x400/0x442
> > > > [<c016a71b>] __get_free_pages+0xf/0x32
> > > > [<c01865cf>] __kmalloc+0x28/0xfa
> > > > [<c023d96f>] ? __spin_lock_init+0x28/0x4d
> > > > [<c01f529d>] ext4_mb_init+0x392/0x460
> > > > [<c01e99d2>] ext4_fill_super+0x1b96/0x2012
> > > > [<c0239bc8>] ? snprintf+0x15/0x17
> > > > [<c01c0b26>] ? disk_name+0x24/0x69
> > > > [<c018ba63>] get_sb_bdev+0xda/0x117
> > > > [<c01e6711>] ext4_get_sb+0x13/0x15
> > > > [<c01e7e3c>] ? ext4_fill_super+0x0/0x2012
> > > > [<c018ad2d>] vfs_kern_mount+0x3b/0x76
> > > > [<c018adad>] do_kern_mount+0x33/0xbd
> > > > [<c019d0af>] do_mount+0x660/0x6b8
> > > > [<c016a71b>] ? __get_free_pages+0xf/0x32
> > > > [<c019d168>] sys_mount+0x61/0x99
> > > > [<c0102908>] sysenter_do_call+0x12/0x36
> > > > Mem-Info:
> > > > DMA per-cpu:
> > > > CPU 0: hi: 0, btch: 1 usd: 0
> > > > Normal per-cpu:
> > > > CPU 0: hi: 186, btch: 31 usd: 0
> > > > Active_anon:25471 active_file:22802 inactive_anon:25812
> > > > inactive_file:33619 unevictable:2 dirty:2452 writeback:135 unstable:0
> > > > free:4346 slab:4308 mapped:26038 pagetables:912 bounce:0
> > > > DMA free:2060kB min:84kB low:104kB high:124kB active_anon:1660kB inactive_anon:1848kB active_file:144kB inactive_file:868kB unevictable:0kB present:15788kB pages_scanned:0 all_unreclaimable? no
> > > > lowmem_reserve[]: 0 489 489
> > > > Normal free:15324kB min:2788kB low:3484kB high:4180kB active_anon:100224kB inactive_anon:101400kB active_file:91064kB inactive_file:133608kB unevictable:8kB present:501392kB pages_scanned:0 all_unreclaimable? no
> > > > lowmem_reserve[]: 0 0 0
> > > > DMA: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2060kB
> > > > Normal: 1283*4kB 648*8kB 159*16kB 53*32kB 10*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15324kB
> > > > 57947 total pagecache pages
> > > > 878 pages in swap cache
> > > > Swap cache stats: add 920, delete 42, find 11/11
> > > > Free swap = 1016436kB
> > > > Total swap = 1020116kB
> > > > 131056 pages RAM
> > > > 4233 pages reserved
> > > > 90573 pages shared
> > > > 77286 pages non-shared
> > > > EXT4-fs: mballoc enabled
> > > > EXT4-fs (dm-2): mounted filesystem with ordered data mode
> > > >
> > > > Thus it seems like the original bug is still there and any ideas how to
> > > > debug the problem further are appreciated..
> > > >
> > > > The complete dmesg and kernel config are here:
> > > >
> > > > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.dmesg
> > > > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.config
> > >
> > > This looks very similar to the kmemleak ext4 reports upon a mount. If
> > > it is the same issue, which from the trace it seems it is, then this
> > > is due to an extra kmalloc() allocation and this apparently will not
> > > get fixed on 2.6.31 due to the closeness of the merge window and the
> > > non-criticalness this issue has been deemed.
> > >
> > > A patch fix is part of the ext4-patchqueue
> > > http://repo.or.cz/w/ext4-patch-queue.git
> >
> > Thanks for the pointer but the page allocation failures that I hit seem
> > to be caused by the memory management itself and the ext4 issue fixed by:
> >
> > http://repo.or.cz/w/ext4-patch-queue.git?a=blob;f=memory-leak-fix-ext4_group_info-allocation;h=c919fff34e70ec85f96d1833f9ce460c451000de;hb=HEAD
> >
> > is a different problem (unrelated to this one).
>
> Here is another data point.
>
> This time it is an order-6 page allocation failure for rt2870sta
> (w/ upcoming driver changes) and Linus' tree from few days ago..
>
It's another high-order atomic allocation which is difficult to grant.
I didn't look closely, but is this the same type of thing - large allocation
failure during firmware loading? If so, is this during resume or is the
device being reloaded for some other reason?
I suspect that there are going to be a few of these bugs cropping up
every so often where network devices are assuming large atomic
allocations will succeed because the "only time they happen" is during
boot, but these days they are happening at runtime for other reasons.
> ifconfig: page allocation failure. order:6, mode:0x8020
> Pid: 4752, comm: ifconfig Tainted: G WC 2.6.31-04082-g1824090-dirty #80
> Call Trace:
> [<c03996f2>] ? printk+0xf/0x15
> [<c016b841>] __alloc_pages_nodemask+0x41d/0x462
> [<c010681e>] dma_generic_alloc_coherent+0x53/0xbd
> [<c02f83aa>] hcd_buffer_alloc+0xdb/0xe8
> [<c01067cb>] ? dma_generic_alloc_coherent+0x0/0xbd
> [<c02ee2d6>] usb_buffer_alloc+0x16/0x1d
> [<e121b627>] NICInitTransmit+0xe2/0x7e4 [rt2870sta]
> [<e121bfb1>] RTMPAllocTxRxRingMemory+0x11c/0x17b [rt2870sta]
> [<e11f0960>] rt28xx_init+0xa5/0x3f8 [rt2870sta]
> [<e121194a>] rt28xx_open+0x53/0xa2 [rt2870sta]
> [<e1211b77>] MainVirtualIF_open+0x23/0xf6 [rt2870sta]
> [<c03383a4>] dev_open+0x86/0xbb
> [<c0337b1a>] dev_change_flags+0x96/0x147
> [<c036e9cb>] devinet_ioctl+0x20f/0x4f8
> [<c036fc8f>] inet_ioctl+0x8e/0xa7
> [<c032ab50>] sock_ioctl+0x1c9/0x1ed
> [<c032a987>] ? sock_ioctl+0x0/0x1ed
> [<c0195732>] vfs_ioctl+0x18/0x71
> [<c0195cbb>] do_vfs_ioctl+0x491/0x4cf
> [<c01779d6>] ? handle_mm_fault+0x242/0x4ff
> [<c0119609>] ? do_page_fault+0x102/0x292
> [<c0140721>] ? up_read+0x16/0x29
> [<c0195d27>] sys_ioctl+0x2e/0x48
> [<c0102908>] sysenter_do_call+0x12/0x36
> Mem-Info:
> DMA per-cpu:
> CPU 0: hi: 0, btch: 1 usd: 0
> Normal per-cpu:
> CPU 0: hi: 186, btch: 31 usd: 84
> Active_anon:14664 active_file:30057 inactive_anon:31744
> inactive_file:29940 unevictable:2 dirty:11 writeback:0 unstable:0
> free:5421 slab:4037 mapped:7781 pagetables:963 bounce:0
> DMA free:2060kB min:84kB low:104kB high:124kB active_anon:0kB inactive_anon:124kB active_file:3284kB inactive_file:972kB unevictable:0kB present:15788kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 489 489
> Normal free:19624kB min:2788kB low:3484kB high:4180kB active_anon:58656kB inactive_anon:126852kB active_file:116944kB inactive_file:118788kB unevictable:8kB present:501392kB pages_scanned:0 all_unreclaimable? no
> lowmem_reserve[]: 0 0 0
> DMA: 3*4kB 0*8kB 2*16kB 1*32kB 1*64kB 1*128kB 1*256kB 1*512kB 1*1024kB 0*2048kB 0*4096kB = 2060kB
> Normal: 2180*4kB 625*8kB 303*16kB 33*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 19624kB
> 64568 total pagecache pages
> 3652 pages in swap cache
> Swap cache stats: add 21642, delete 17990, find 4906/6079
> Free swap = 981700kB
> Total swap = 1020116kB
> 131056 pages RAM
> 4262 pages reserved
> 91941 pages shared
> 60834 pages non-shared
> <-- ERROR in Alloc TX TxContext[0] HTTX_BUFFER !!
> <-- RTMPAllocTxRxRingMemory, Status=3
> ERROR!!! RTMPAllocDMAMemory failed, Status[=0x00000003]
> !!! rt28xx Initialized fail !!!
>
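
[Editorial illustration, not part of the original mail.] The per-order
free lists in the log above can be checked mechanically. The sketch below
is a minimal, hypothetical helper (parse_buddy_line and blocks_at_or_above
are names invented here) that assumes 4 kB pages and the "count*sizekB"
layout shown in the log. It only looks at the free lists; the real
allocator also applies watermark and lowmem_reserve checks, which this
sketch ignores.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages, as the "N*4kB" column suggests


def parse_buddy_line(line):
    """Return (zone, {order: free_block_count}, total_kb) for one buddy line."""
    zone, rest = line.split(":", 1)
    counts = {}
    for count, size_kb in re.findall(r"(\d+)\*(\d+)kB", rest):
        # 4kB -> order 0, 8kB -> order 1, ..., 4096kB -> order 10
        order = (int(size_kb) // PAGE_KB).bit_length() - 1
        counts[order] = int(count)
    total_kb = int(re.search(r"=\s*(\d+)kB", rest).group(1))
    return zone.strip(), counts, total_kb


def blocks_at_or_above(counts, order):
    """Number of free blocks large enough to satisfy a request of `order`."""
    return sum(n for o, n in counts.items() if o >= order)


# Normal zone line copied from the rt2870sta failure log above.
normal = ("Normal: 2180*4kB 625*8kB 303*16kB 33*32kB 0*64kB 0*128kB "
          "0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 19624kB")

zone, counts, total_kb = parse_buddy_line(normal)
free_kb = sum(n * (PAGE_KB << o) for o, n in counts.items())
largest = max(o for o, n in counts.items() if n)

print(zone, "free:", free_kb, "kB")
print("largest free order:", largest)
print("blocks of order >= 6:", blocks_at_or_above(counts, 6))
```

For this line the per-order counts add up to the reported total, and the
largest non-empty free list is order 3 (32kB), so the Normal zone holds no
contiguous block big enough for the order-6 request even though about 19MB
is free.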
--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: ipw2200: firmware DMA loading rework
2009-09-21 8:58 ` Mel Gorman
(?)
@ 2009-09-21 9:59 ` Bartlomiej Zolnierkiewicz
-1 siblings, 0 replies; 286+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2009-09-21 9:59 UTC (permalink / raw)
To: Mel Gorman
Cc: Luis R. Rodriguez, Tso Ted, Aneesh Kumar K.V, Zhu Yi,
Andrew Morton, Johannes Weiner, Pekka Enberg, Rafael J. Wysocki,
Linux Kernel Mailing List, Kernel Testers List, Mel Gorman,
netdev, linux-mm, James Ketrenos, Chatre, Reinette,
linux-wireless, ipw2100-devel
On Monday 21 September 2009 10:58:44 Mel Gorman wrote:
> On Sat, Sep 19, 2009 at 03:25:32PM +0200, Bartlomiej Zolnierkiewicz wrote:
> > On Wednesday 02 September 2009 20:26:17 Bartlomiej Zolnierkiewicz wrote:
> > > On Wednesday 02 September 2009 20:02:14 Luis R. Rodriguez wrote:
> > > > On Wed, Sep 2, 2009 at 10:48 AM, Bartlomiej
> > > > Zolnierkiewicz<bzolnier@gmail.com> wrote:
> > > > > On Sunday 30 August 2009 14:37:42 Bartlomiej Zolnierkiewicz wrote:
> > > > >> On Friday 28 August 2009 05:42:31 Zhu Yi wrote:
> > > > >> > Bartlomiej Zolnierkiewicz reported an atomic order-6 allocation failure
> > > > >> > for ipw2200 firmware loading in kernel 2.6.30. High order allocation is
> > > > >>
> > > > >> s/2.6.30/2.6.31-rc6/
> > > > >>
> > > > >> The issue has always been there but it was some recent change that
> > > > >> explicitly triggered the allocation failures (after 2.6.31-rc1).
> > > > >
> > > > > ipw2200 fix works fine but yesterday I got the following error while mounting
> > > > > ext4 filesystem (mb_history is optional so the mount succeeded):
> > > >
> > > > OK so the mount succeeded.
> > > >
> > > > > EXT4-fs (dm-2): barriers enabled
> > > > > kjournald2 starting: pid 3137, dev dm-2:8, commit interval 5 seconds
> > > > > EXT4-fs (dm-2): internal journal on dm-2:8
> > > > > EXT4-fs (dm-2): delayed allocation enabled
> > > > > EXT4-fs: file extents enabled
> > > > > mount: page allocation failure. order:5, mode:0xc0d0
> > > > > Pid: 3136, comm: mount Not tainted 2.6.31-rc8-00015-gadda766-dirty #78
> > > > > Call Trace:
> > > > > [<c0394de3>] ? printk+0xf/0x14
> > > > > [<c016a693>] __alloc_pages_nodemask+0x400/0x442
> > > > > [<c016a71b>] __get_free_pages+0xf/0x32
> > > > > [<c01865cf>] __kmalloc+0x28/0xfa
> > > > > [<c023d96f>] ? __spin_lock_init+0x28/0x4d
> > > > > [<c01f529d>] ext4_mb_init+0x392/0x460
> > > > > [<c01e99d2>] ext4_fill_super+0x1b96/0x2012
> > > > > [<c0239bc8>] ? snprintf+0x15/0x17
> > > > > [<c01c0b26>] ? disk_name+0x24/0x69
> > > > > [<c018ba63>] get_sb_bdev+0xda/0x117
> > > > > [<c01e6711>] ext4_get_sb+0x13/0x15
> > > > > [<c01e7e3c>] ? ext4_fill_super+0x0/0x2012
> > > > > [<c018ad2d>] vfs_kern_mount+0x3b/0x76
> > > > > [<c018adad>] do_kern_mount+0x33/0xbd
> > > > > [<c019d0af>] do_mount+0x660/0x6b8
> > > > > [<c016a71b>] ? __get_free_pages+0xf/0x32
> > > > > [<c019d168>] sys_mount+0x61/0x99
> > > > > [<c0102908>] sysenter_do_call+0x12/0x36
> > > > > Mem-Info:
> > > > > DMA per-cpu:
> > > > > CPU 0: hi: 0, btch: 1 usd: 0
> > > > > Normal per-cpu:
> > > > > CPU 0: hi: 186, btch: 31 usd: 0
> > > > > Active_anon:25471 active_file:22802 inactive_anon:25812
> > > > > inactive_file:33619 unevictable:2 dirty:2452 writeback:135 unstable:0
> > > > > free:4346 slab:4308 mapped:26038 pagetables:912 bounce:0
> > > > > DMA free:2060kB min:84kB low:104kB high:124kB active_anon:1660kB inactive_anon:1848kB active_file:144kB inactive_file:868kB unevictable:0kB present:15788kB pages_scanned:0 all_unreclaimable? no
> > > > > lowmem_reserve[]: 0 489 489
> > > > > Normal free:15324kB min:2788kB low:3484kB high:4180kB active_anon:100224kB inactive_anon:101400kB active_file:91064kB inactive_file:133608kB unevictable:8kB present:501392kB pages_scanned:0 all_unreclaimable? no
> > > > > lowmem_reserve[]: 0 0 0
> > > > > DMA: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2060kB
> > > > > Normal: 1283*4kB 648*8kB 159*16kB 53*32kB 10*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15324kB
> > > > > 57947 total pagecache pages
> > > > > 878 pages in swap cache
> > > > > Swap cache stats: add 920, delete 42, find 11/11
> > > > > Free swap = 1016436kB
> > > > > Total swap = 1020116kB
> > > > > 131056 pages RAM
> > > > > 4233 pages reserved
> > > > > 90573 pages shared
> > > > > 77286 pages non-shared
> > > > > EXT4-fs: mballoc enabled
> > > > > EXT4-fs (dm-2): mounted filesystem with ordered data mode
> > > > >
> > > > > Thus it seems like the original bug is still there and any ideas how to
> > > > > debug the problem further are appreciated..
> > > > >
> > > > > The complete dmesg and kernel config are here:
> > > > >
> > > > > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.dmesg
> > > > > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.config
> > > >
> > > > This looks very similar to the kmemleak ext4 reports upon a mount. If
> > > > it is the same issue, which from the trace it seems it is, then this
> > > > is due to an extra kmalloc() allocation and this apparently will not
> > > > get fixed on 2.6.31 due to the closeness of the merge window and the
> > > > non-criticalness this issue has been deemed.
> > > >
> > > > A patch fix is part of the ext4-patchqueue
> > > > http://repo.or.cz/w/ext4-patch-queue.git
> > >
> > > Thanks for the pointer but the page allocation failures that I hit seem
> > > to be caused by the memory management itself and the ext4 issue fixed by:
> > >
> > > http://repo.or.cz/w/ext4-patch-queue.git?a=blob;f=memory-leak-fix-ext4_group_info-allocation;h=c919fff34e70ec85f96d1833f9ce460c451000de;hb=HEAD
> > >
> > > is a different problem (unrelated to this one).
> >
> > Here is another data point.
> >
> > This time it is an order-6 page allocation failure for rt2870sta
> > (w/ upcoming driver changes) and Linus' tree from few days ago..
> >
>
> It's another high-order atomic allocation which is difficult to grant.
> I didn't look closely, but is this the same type of thing - large allocation
> failure during firmware loading? If so, is this during resume or is the
> device being reloaded for some other reason?
Just modprobing the driver on a system running for some time.
> I suspect that there are going to be a few of these bugs cropping up
> every so often where network devices are assuming large atomic
> allocations will succeed because the "only time they happen" is during
> boot but these days are happening at runtime for other reasons.
I wouldn't go so far as calling a normal order-6 (256kB) allocation on
a 512MB machine with 1024MB swap a bug. Moreover such failures just never
happened before 2.6.31-rc1.
I don't know why people don't see it but for me it has a memory management
regression and reliability issue written all over it.
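
[Editorial illustration, not part of the original mail.] The sizes quoted
in this exchange follow from the buddy-allocator convention that an
order-N request asks for 2^N contiguous pages. A minimal sketch, assuming
4 kB pages (the helper name order_to_bytes is invented here):

```python
PAGE_SIZE = 4096  # assumption: 4 kB pages


def order_to_bytes(order, page_size=PAGE_SIZE):
    """Size in bytes of a contiguous allocation of 2**order pages."""
    return page_size << order


# The two failing requests in this thread: order 5 (ext4) and order 6.
for order in (5, 6):
    print("order", order, "=", order_to_bytes(order) // 1024, "kB")

# "131056 pages RAM" from the Mem-Info dump, expressed in MB.
ram_mb = 131056 * PAGE_SIZE // (1024 * 1024)
print("RAM:", ram_mb, "MB")
```

Order 5 works out to 128kB and order 6 to 256kB, and the page count in
the dump corresponds to just under 512MB of RAM.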
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: ipw2200: firmware DMA loading rework
2009-09-21 9:59 ` Bartlomiej Zolnierkiewicz
(?)
@ 2009-09-21 10:08 ` Mel Gorman
-1 siblings, 0 replies; 286+ messages in thread
From: Mel Gorman @ 2009-09-21 10:08 UTC (permalink / raw)
To: Bartlomiej Zolnierkiewicz
Cc: Luis R. Rodriguez, Tso Ted, Aneesh Kumar K.V, Zhu Yi,
Andrew Morton, Johannes Weiner, Pekka Enberg, Rafael J. Wysocki,
Linux Kernel Mailing List, Kernel Testers List, Mel Gorman,
netdev, linux-mm, James Ketrenos, Chatre, Reinette,
linux-wireless, ipw2100-devel
On Mon, Sep 21, 2009 at 11:59:27AM +0200, Bartlomiej Zolnierkiewicz wrote:
> > > > > > On Sunday 30 August 2009 14:37:42 Bartlomiej Zolnierkiewicz wrote:
> > > > > >> On Friday 28 August 2009 05:42:31 Zhu Yi wrote:
> > > > > >> > Bartlomiej Zolnierkiewicz reported an atomic order-6 allocation failure
> > > > > >> > for ipw2200 firmware loading in kernel 2.6.30. High order allocation is
> > > > > >>
> > > > > >> s/2.6.30/2.6.31-rc6/
> > > > > >>
> > > > > >> The issue has always been there but it was some recent change that
> > > > > >> explicitly triggered the allocation failures (after 2.6.31-rc1).
> > > > > >
> > > > > > ipw2200 fix works fine but yesterday I got the following error while mounting
> > > > > > ext4 filesystem (mb_history is optional so the mount succeeded):
> > > > >
> > > > > OK so the mount succeeded.
> > > > >
> > > > > > EXT4-fs (dm-2): barriers enabled
> > > > > > kjournald2 starting: pid 3137, dev dm-2:8, commit interval 5 seconds
> > > > > > EXT4-fs (dm-2): internal journal on dm-2:8
> > > > > > EXT4-fs (dm-2): delayed allocation enabled
> > > > > > EXT4-fs: file extents enabled
> > > > > > mount: page allocation failure. order:5, mode:0xc0d0
> > > > > > Pid: 3136, comm: mount Not tainted 2.6.31-rc8-00015-gadda766-dirty #78
> > > > > > Call Trace:
> > > > > > [<c0394de3>] ? printk+0xf/0x14
> > > > > > [<c016a693>] __alloc_pages_nodemask+0x400/0x442
> > > > > > [<c016a71b>] __get_free_pages+0xf/0x32
> > > > > > [<c01865cf>] __kmalloc+0x28/0xfa
> > > > > > [<c023d96f>] ? __spin_lock_init+0x28/0x4d
> > > > > > [<c01f529d>] ext4_mb_init+0x392/0x460
> > > > > > [<c01e99d2>] ext4_fill_super+0x1b96/0x2012
> > > > > > [<c0239bc8>] ? snprintf+0x15/0x17
> > > > > > [<c01c0b26>] ? disk_name+0x24/0x69
> > > > > > [<c018ba63>] get_sb_bdev+0xda/0x117
> > > > > > [<c01e6711>] ext4_get_sb+0x13/0x15
> > > > > > [<c01e7e3c>] ? ext4_fill_super+0x0/0x2012
> > > > > > [<c018ad2d>] vfs_kern_mount+0x3b/0x76
> > > > > > [<c018adad>] do_kern_mount+0x33/0xbd
> > > > > > [<c019d0af>] do_mount+0x660/0x6b8
> > > > > > [<c016a71b>] ? __get_free_pages+0xf/0x32
> > > > > > [<c019d168>] sys_mount+0x61/0x99
> > > > > > [<c0102908>] sysenter_do_call+0x12/0x36
> > > > > > Mem-Info:
> > > > > > DMA per-cpu:
> > > > > > CPU 0: hi: 0, btch: 1 usd: 0
> > > > > > Normal per-cpu:
> > > > > > CPU 0: hi: 186, btch: 31 usd: 0
> > > > > > Active_anon:25471 active_file:22802 inactive_anon:25812
> > > > > > inactive_file:33619 unevictable:2 dirty:2452 writeback:135 unstable:0
> > > > > > free:4346 slab:4308 mapped:26038 pagetables:912 bounce:0
> > > > > > DMA free:2060kB min:84kB low:104kB high:124kB active_anon:1660kB inactive_anon:1848kB active_file:144kB inactive_file:868kB unevictable:0kB present:15788kB pages_scanned:0 all_unreclaimable? no
> > > > > > lowmem_reserve[]: 0 489 489
> > > > > > Normal free:15324kB min:2788kB low:3484kB high:4180kB active_anon:100224kB inactive_anon:101400kB active_file:91064kB inactive_file:133608kB unevictable:8kB present:501392kB pages_scanned:0 all_unreclaimable? no
> > > > > > lowmem_reserve[]: 0 0 0
> > > > > > DMA: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2060kB
> > > > > > Normal: 1283*4kB 648*8kB 159*16kB 53*32kB 10*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15324kB
> > > > > > 57947 total pagecache pages
> > > > > > 878 pages in swap cache
> > > > > > Swap cache stats: add 920, delete 42, find 11/11
> > > > > > Free swap = 1016436kB
> > > > > > Total swap = 1020116kB
> > > > > > 131056 pages RAM
> > > > > > 4233 pages reserved
> > > > > > 90573 pages shared
> > > > > > 77286 pages non-shared
> > > > > > EXT4-fs: mballoc enabled
> > > > > > EXT4-fs (dm-2): mounted filesystem with ordered data mode
> > > > > >
> > > > > > Thus it seems like the original bug is still there and any ideas how to
> > > > > > debug the problem further are appreciated..
> > > > > >
> > > > > > The complete dmesg and kernel config are here:
> > > > > >
> > > > > > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.dmesg
> > > > > > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.config
> > > > >
> > > > > This looks very similar to the kmemleak ext4 reports upon a mount. If
> > > > > it is the same issue, which from the trace it seems it is, then this
> > > > > is due to an extra kmalloc() allocation and this apparently will not
> > > > > get fixed on 2.6.31 due to the closeness of the merge window and the
> > > > > non-criticalness this issue has been deemed.
> > > > >
> > > > > A patch fix is part of the ext4-patchqueue
> > > > > http://repo.or.cz/w/ext4-patch-queue.git
> > > >
> > > > Thanks for the pointer but the page allocation failures that I hit seem
> > > > to be caused by the memory management itself and the ext4 issue fixed by:
> > > >
> > > > http://repo.or.cz/w/ext4-patch-queue.git?a=blob;f=memory-leak-fix-ext4_group_info-allocation;h=c919fff34e70ec85f96d1833f9ce460c451000de;hb=HEAD
> > > >
> > > > is a different problem (unrelated to this one).
> > >
> > > Here is another data point.
> > >
> > > This time it is an order-6 page allocation failure for rt2870sta
> > > (w/ upcoming driver changes) and Linus' tree from few days ago..
> > >
> >
> > It's another high-order atomic allocation which is difficult to grant.
> > I didn't look closely, but is this the same type of thing - large allocation
> > failure during firmware loading? If so, is this during resume or is the
> > device being reloaded for some other reason?
>
> Just modprobing the driver on a system running for some time.
>
Was this a common situation before?
> > I suspect that there are going to be a few of these bugs cropping up
> > every so often where network devices are assuming large atomic
> > allocations will succeed because the "only time they happen" is during
> > boot but these days are happening at runtime for other reasons.
>
> I wouldn't go so far as calling a normal order-6 (256kB) allocation on
> 512MB machine with 1024MB swap a bug. Moreover such failures just never
> happened before 2.6.31-rc1.

It's not that normal, it's an allocation that cannot sleep and cannot
reclaim. Why is something like firmware loading allocating memory like
that? Is this use of GFP_ATOMIC relatively recent or has it always been
that way?

> I don't know why people don't see it but for me it has a memory management
> regression and reliability issue written all over it.
>

Possibly, but drivers that reload their firmware as a response to an
error condition are relatively new, and loading network drivers while the
system has already been up and running for a long time does not strike me
as typical system behaviour.

--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: ipw2200: firmware DMA loading rework
2009-09-21 10:08 ` Mel Gorman
(?)
@ 2009-09-21 10:46 ` Bartlomiej Zolnierkiewicz
-1 siblings, 0 replies; 286+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2009-09-21 10:46 UTC (permalink / raw)
To: Mel Gorman
Cc: Luis R. Rodriguez, Tso Ted, Aneesh Kumar K.V, Zhu Yi,
Andrew Morton, Johannes Weiner, Pekka Enberg, Rafael J. Wysocki,
Linux Kernel Mailing List, Kernel Testers List, Mel Gorman,
netdev, linux-mm, James Ketrenos, Chatre, Reinette,
linux-wireless, ipw2100-devel
On Monday 21 September 2009 12:08:13 Mel Gorman wrote:
> On Mon, Sep 21, 2009 at 11:59:27AM +0200, Bartlomiej Zolnierkiewicz wrote:
> > > > > > > On Sunday 30 August 2009 14:37:42 Bartlomiej Zolnierkiewicz wrote:
> > > > > > >> On Friday 28 August 2009 05:42:31 Zhu Yi wrote:
> > > > > > >> > Bartlomiej Zolnierkiewicz reported an atomic order-6 allocation failure
> > > > > > >> > for ipw2200 firmware loading in kernel 2.6.30. High order allocation is
> > > > > > >>
> > > > > > >> s/2.6.30/2.6.31-rc6/
> > > > > > >>
> > > > > > >> The issue has always been there but it was some recent change that
> > > > > > >> explicitly triggered the allocation failures (after 2.6.31-rc1).
> > > > > > >
> > > > > > > ipw2200 fix works fine but yesterday I got the following error while mounting
> > > > > > > ext4 filesystem (mb_history is optional so the mount succeeded):
> > > > > >
> > > > > > OK so the mount succeeded.
> > > > > >
> > > > > > > EXT4-fs (dm-2): barriers enabled
> > > > > > > kjournald2 starting: pid 3137, dev dm-2:8, commit interval 5 seconds
> > > > > > > EXT4-fs (dm-2): internal journal on dm-2:8
> > > > > > > EXT4-fs (dm-2): delayed allocation enabled
> > > > > > > EXT4-fs: file extents enabled
> > > > > > > mount: page allocation failure. order:5, mode:0xc0d0
> > > > > > > Pid: 3136, comm: mount Not tainted 2.6.31-rc8-00015-gadda766-dirty #78
> > > > > > > Call Trace:
> > > > > > > [<c0394de3>] ? printk+0xf/0x14
> > > > > > > [<c016a693>] __alloc_pages_nodemask+0x400/0x442
> > > > > > > [<c016a71b>] __get_free_pages+0xf/0x32
> > > > > > > [<c01865cf>] __kmalloc+0x28/0xfa
> > > > > > > [<c023d96f>] ? __spin_lock_init+0x28/0x4d
> > > > > > > [<c01f529d>] ext4_mb_init+0x392/0x460
> > > > > > > [<c01e99d2>] ext4_fill_super+0x1b96/0x2012
> > > > > > > [<c0239bc8>] ? snprintf+0x15/0x17
> > > > > > > [<c01c0b26>] ? disk_name+0x24/0x69
> > > > > > > [<c018ba63>] get_sb_bdev+0xda/0x117
> > > > > > > [<c01e6711>] ext4_get_sb+0x13/0x15
> > > > > > > [<c01e7e3c>] ? ext4_fill_super+0x0/0x2012
> > > > > > > [<c018ad2d>] vfs_kern_mount+0x3b/0x76
> > > > > > > [<c018adad>] do_kern_mount+0x33/0xbd
> > > > > > > [<c019d0af>] do_mount+0x660/0x6b8
> > > > > > > [<c016a71b>] ? __get_free_pages+0xf/0x32
> > > > > > > [<c019d168>] sys_mount+0x61/0x99
> > > > > > > [<c0102908>] sysenter_do_call+0x12/0x36
> > > > > > > Mem-Info:
> > > > > > > DMA per-cpu:
> > > > > > > CPU 0: hi: 0, btch: 1 usd: 0
> > > > > > > Normal per-cpu:
> > > > > > > CPU 0: hi: 186, btch: 31 usd: 0
> > > > > > > Active_anon:25471 active_file:22802 inactive_anon:25812
> > > > > > > inactive_file:33619 unevictable:2 dirty:2452 writeback:135 unstable:0
> > > > > > > free:4346 slab:4308 mapped:26038 pagetables:912 bounce:0
> > > > > > > DMA free:2060kB min:84kB low:104kB high:124kB active_anon:1660kB inactive_anon:1848kB active_file:144kB inactive_file:868kB unevictable:0kB present:15788kB pages_scanned:0 all_unreclaimable? no
> > > > > > > lowmem_reserve[]: 0 489 489
> > > > > > > Normal free:15324kB min:2788kB low:3484kB high:4180kB active_anon:100224kB inactive_anon:101400kB active_file:91064kB inactive_file:133608kB unevictable:8kB present:501392kB pages_scanned:0 all_unreclaimable? no
> > > > > > > lowmem_reserve[]: 0 0 0
> > > > > > > DMA: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2060kB
> > > > > > > Normal: 1283*4kB 648*8kB 159*16kB 53*32kB 10*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15324kB
> > > > > > > 57947 total pagecache pages
> > > > > > > 878 pages in swap cache
> > > > > > > Swap cache stats: add 920, delete 42, find 11/11
> > > > > > > Free swap = 1016436kB
> > > > > > > Total swap = 1020116kB
> > > > > > > 131056 pages RAM
> > > > > > > 4233 pages reserved
> > > > > > > 90573 pages shared
> > > > > > > 77286 pages non-shared
> > > > > > > EXT4-fs: mballoc enabled
> > > > > > > EXT4-fs (dm-2): mounted filesystem with ordered data mode
> > > > > > >
> > > > > > > Thus it seems like the original bug is still there and any ideas how to
> > > > > > > debug the problem further are appreciated..
> > > > > > >
> > > > > > > The complete dmesg and kernel config are here:
> > > > > > >
> > > > > > > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.dmesg
> > > > > > > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.config
> > > > > >
> > > > > > This looks very similar to the kmemleak ext4 reports upon a mount. If
> > > > > > it is the same issue, which from the trace it seems it is, then this
> > > > > > is due to an extra kmalloc() allocation and this apparently will not
> > > > > > get fixed on 2.6.31 due to the closeness of the merge window and the
> > > > > > non-criticalness this issue has been deemed.
> > > > > >
> > > > > > A patch fix is part of the ext4-patchqueue
> > > > > > http://repo.or.cz/w/ext4-patch-queue.git
> > > > >
> > > > > Thanks for the pointer but the page allocation failures that I hit seem
> > > > > to be caused by the memory management itself and the ext4 issue fixed by:
> > > > >
> > > > > http://repo.or.cz/w/ext4-patch-queue.git?a=blob;f=memory-leak-fix-ext4_group_info-allocation;h=c919fff34e70ec85f96d1833f9ce460c451000de;hb=HEAD
> > > > >
> > > > > is a different problem (unrelated to this one).
> > > >
> > > > Here is another data point.
> > > >
> > > > This time it is an order-6 page allocation failure for rt2870sta
> > > > (w/ upcoming driver changes) and Linus' tree from few days ago..
> > > >
> > >
> > > It's another high-order atomic allocation which is difficult to grant.
> > > I didn't look closely, but is this the same type of thing - large allocation
> > > failure during firmware loading? If so, is this during resume or is the
> > > device being reloaded for some other reason?
> >
> > Just modprobing the driver on a system running for some time.
> >
>
> Was this a common situation before?

Yes, just like firmware restarts with ipw2200.

> > > I suspect that there are going to be a few of these bugs cropping up
> > > every so often where network devices are assuming large atomic
> > > allocations will succeed because the "only time they happen" is during
> > > boot but these days are happening at runtime for other reasons.
> >
> > I wouldn't go so far as calling a normal order-6 (256kB) allocation on
> > 512MB machine with 1024MB swap a bug. Moreover such failures just never
> > happened before 2.6.31-rc1.
>
> It's not that normal, it's an allocation that cannot sleep and cannot
> reclaim. Why is something like firmware loading allocating memory like

OK.

> that? Is this use of GFP_ATOMIC relatively recent or has it always been
> that way?

It has always been like that.

> > I don't know why people don't see it but for me it has a memory management
> > regression and reliability issue written all over it.
> >
>
> Possibly but drivers that reload their firmware as a response to an
> error condition is relatively new and loading network drivers while the
> system is already up and running a long time does not strike me as
> typical system behaviour.

Loading drivers after boot is typical desktop/laptop behavior; please
think about hotplug (the hardware in question is a USB dongle).
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: ipw2200: firmware DMA loading rework
2009-09-21 10:46 ` Bartlomiej Zolnierkiewicz
(?)
@ 2009-09-21 10:56 ` Pekka Enberg
-1 siblings, 0 replies; 286+ messages in thread
From: Pekka Enberg @ 2009-09-21 10:56 UTC (permalink / raw)
To: Bartlomiej Zolnierkiewicz
Cc: Mel Gorman, Luis R. Rodriguez, Tso Ted, Aneesh Kumar K.V, Zhu Yi,
Andrew Morton, Johannes Weiner, Rafael J. Wysocki,
Linux Kernel Mailing List, Kernel Testers List, Mel Gorman,
netdev, linux-mm, James Ketrenos, Chatre, Reinette,
linux-wireless, ipw2100-devel
On Mon, 2009-09-21 at 12:46 +0200, Bartlomiej Zolnierkiewicz wrote:
> > > I don't know why people don't see it but for me it has a memory management
> > > regression and reliability issue written all over it.
> >
> > Possibly but drivers that reload their firmware as a response to an
> > error condition is relatively new and loading network drivers while the
> > system is already up and running a long time does not strike me as
> > typical system behaviour.
>
> Loading drivers after boot is a typical desktop/laptop behavior, please
> think about hotplug (the hardware in question is an USB dongle).
Yeah, I wonder what broke things. Did the wireless stack change in
2.6.31-rc1 too? IIRC Mel ruled out page allocator changes as a suspect.
Pekka
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: ipw2200: firmware DMA loading rework
2009-09-21 10:56 ` Pekka Enberg
(?)
@ 2009-09-21 13:12 ` Bartlomiej Zolnierkiewicz
-1 siblings, 0 replies; 286+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2009-09-21 13:12 UTC (permalink / raw)
To: Pekka Enberg
Cc: Mel Gorman, Luis R. Rodriguez, Tso Ted, Aneesh Kumar K.V, Zhu Yi,
Andrew Morton, Johannes Weiner, Rafael J. Wysocki,
Linux Kernel Mailing List, Kernel Testers List, Mel Gorman,
netdev, linux-mm, James Ketrenos, Chatre, Reinette,
linux-wireless, ipw2100-devel
On Monday 21 September 2009 12:56:48 Pekka Enberg wrote:
> On Mon, 2009-09-21 at 12:46 +0200, Bartlomiej Zolnierkiewicz wrote:
> > > > I don't know why people don't see it but for me it has a memory management
> > > > regression and reliability issue written all over it.
> > >
> > > Possibly but drivers that reload their firmware as a response to an
> > > error condition is relatively new and loading network drivers while the
> > > system is already up and running a long time does not strike me as
> > > typical system behaviour.
> >
> > Loading drivers after boot is a typical desktop/laptop behavior, please
> > think about hotplug (the hardware in question is an USB dongle).
>
> Yeah, I wonder what broke things. Did the wireless stack change in
> 2.6.31-rc1 too? IIRC Mel ruled out page allocator changes as a suspect.
The thing is that the mm behavior change was already narrowed down over a
month ago to the -mm merge in 2.6.31-rc1 (as noted in my initial reports).
I first thought that it was -next breakage but it turned out that it came
the other way around (because -mm is not even pulled into -next
currently -- great way to set an example for other kernel maintainers BTW).
I understand that behavior change may be justified and technically correct
in itself. I also completely agree that high order allocations in certain
drivers need fixing anyway.
However there is something wrong with the big picture and the way changes
are happening. I'm not saying that I'm surprised though, especially given
the recent decline in quality assurance and the paradigm shift that
I'm seeing (some influential top level people saying that -rc1 is fine for
testing new code now or the "new kernel new hardware" thing).
Sorry but I currently have no more time to narrow down the issue further
(guess what, there are other kernel bugs standing in the way of bisecting it
and I would have to provide some reliable way to reproduce it first) so I
see no more point in wasting people's time on this. I can certainly get by
with an allocation failure here and there. Not a big deal for me personally..
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: ipw2200: firmware DMA loading rework
2009-09-21 13:12 ` Bartlomiej Zolnierkiewicz
(?)
@ 2009-09-21 13:37 ` Mel Gorman
-1 siblings, 0 replies; 286+ messages in thread
From: Mel Gorman @ 2009-09-21 13:37 UTC (permalink / raw)
To: Bartlomiej Zolnierkiewicz
Cc: Pekka Enberg, Luis R. Rodriguez, Tso Ted, Aneesh Kumar K.V,
Zhu Yi, Andrew Morton, Johannes Weiner, Rafael J. Wysocki,
Linux Kernel Mailing List, Kernel Testers List, Mel Gorman,
netdev, linux-mm, James Ketrenos, Chatre, Reinette,
linux-wireless, ipw2100-devel
On Mon, Sep 21, 2009 at 03:12:14PM +0200, Bartlomiej Zolnierkiewicz wrote:
> On Monday 21 September 2009 12:56:48 Pekka Enberg wrote:
> > On Mon, 2009-09-21 at 12:46 +0200, Bartlomiej Zolnierkiewicz wrote:
> > > > > I don't know why people don't see it but for me it has a memory management
> > > > > regression and reliability issue written all over it.
> > > >
> > > > Possibly but drivers that reload their firmware as a response to an
> > > > error condition is relatively new and loading network drivers while the
> > > > system is already up and running a long time does not strike me as
> > > > typical system behaviour.
> > >
> > > Loading drivers after boot is a typical desktop/laptop behavior, please
> > > think about hotplug (the hardware in question is an USB dongle).
> >
> > Yeah, I wonder what broke things. Did the wireless stack change in
> > 2.6.31-rc1 too? IIRC Mel ruled out page allocator changes as a suspect.
>
> The thing is that the mm behavior change has been narrowed down already
> over a month ago to -mm merge in 2.6.31-rc1 (as has been noted in my initial
> reports), I first though that that it was -next breakage but it turned out
> that it came the other way around (because -mm is not even pulled into -next
> currently -- great way to set an example for other kernel maintainers BTW).
>
Is there a reliable reproduction case for this that narrowed it down to
2.6.31-rc1? That is the window where a number of page-allocator optimisation
patches made it in. None of them should have affected the allocator from a
fragmentation perspective though.
If you have a reliable reproduction case, testing between commits
d239171e4f6efd58d7e423853056b1b6a74f1446..a1dd268cf6306565a31a48deff8bf4f6b4b105f7
would be nice, particularly if it can be bisected within that small
window rather than a full bisect of an rc1 which I know can be a major
mess.
> I understand that behavior change may be justified and technically correct
> in itself. I also completely agree that high order allocations in certain
> drivers need fixing anyway.
>
> However there is something wrong with the big picture and the way changes
> are happening. I'm not saying that I'm surprised though, especially given
> the recent decline in the quality assurance and the paradigm shift that
> I'm seeing (some influential top level people talking that -rc1 is fine for
> testing new code now or the "new kernel new hardware" thing).
>
The quality assurance comment is a bit unfair with respect to the page
allocator. There are a lot of things that could have changed that would hose
order-6 atomic allocations. Furthermore, test cases used for mm patches
would not have treated such allocations as critical. Even if they had been
considered, they would have been dismissed as "it makes no sense for drivers
to be doing order-6 GFP_ATOMIC allocations".
> Sorry but I have no more time currently to narrow down the issue some more
> (guess what, there are other kernel bugs standing in the way to bisect it
> and I would have to provide some reliable way to reproduce it first) so I
> see no more point in wasting people's time on this. I can certainly get by
> with allocation failure here and there. Not a big deal for me personally..
>
That is somewhat unfortunate. Even testing within the window above, if
possible, would be very helpful if you get the chance.
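A restricted bisect over the window named above could be planned roughly as follows. This is only a sketch: it assumes the first hash of the range is the known-good end and the second the known-bad end, which the message does not state outright, and it only prints the git commands rather than running them. The function names are invented for illustration.

```python
import math

# Assumption: in "A..B" the first commit is the good (older) end and the
# second is the bad (newer) end of the window.
GOOD = "d239171e4f6efd58d7e423853056b1b6a74f1446"
BAD = "a1dd268cf6306565a31a48deff8bf4f6b4b105f7"


def bisect_plan(good, bad):
    """Return the git commands (as argv lists) that open and close a bisect
    session limited to the commits between `good` and `bad`."""
    return [
        # 'git bisect start' takes the bad revision first, then the good one.
        ["git", "bisect", "start", bad, good],
        # Between these two commands: build and test each kernel that git
        # checks out, then mark it with 'git bisect good' or 'git bisect bad'.
        ["git", "bisect", "reset"],
    ]


def rough_steps(n_commits):
    """Rough number of test rounds needed for n candidate commits."""
    return math.ceil(math.log2(n_commits)) if n_commits > 1 else 0


for argv in bisect_plan(GOOD, BAD):
    print(" ".join(argv))
```

Because each round halves the candidate set, a small window needs only a handful of builds, which is the point of limiting the range instead of bisecting all of -rc1.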
--
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: ipw2200: firmware DMA loading rework
2009-09-21 10:46 ` Bartlomiej Zolnierkiewicz
(?)
(?)
@ 2009-09-21 11:02 ` Mel Gorman
-1 siblings, 0 replies; 286+ messages in thread
From: Mel Gorman @ 2009-09-21 11:02 UTC (permalink / raw)
To: Bartlomiej Zolnierkiewicz
Cc: Luis R. Rodriguez, Tso Ted, Aneesh Kumar K.V, Zhu Yi,
Andrew Morton, Johannes Weiner, Pekka Enberg, Rafael J. Wysocki,
Linux Kernel Mailing List, Kernel Testers List, Mel Gorman,
netdev, linux-mm, James Ketrenos, Chatre, Reinette,
linux-wireless, ipw2100-devel
On Mon, Sep 21, 2009 at 12:46:34PM +0200, Bartlomiej Zolnierkiewicz wrote:
> > > > > <SNIP>
> > > > >
> > > > > This time it is an order-6 page allocation failure for rt2870sta
> > > > > (w/ upcoming driver changes) and Linus' tree from few days ago..
> > > > >
> > > >
> > > > It's another high-order atomic allocation which is difficult to grant.
> > > > I didn't look closely, but is this the same type of thing - large allocation
> > > > failure during firmware loading? If so, is this during resume or is the
> > > > device being reloaded for some other reason?
> > >
> > > Just modprobing the driver on a system running for some time.
> > >
> >
> > Was this a common situation before?
>
> Yes, just like firmware restarts with ipw2200.
>
> > > > I suspect that there are going to be a few of these bugs cropping up
> > > > every so often where network devices are assuming large atomic
> > > > allocations will succeed because the "only time they happen" is during
> > > > boot but these days are happening at runtime for other reasons.
> > >
> > > I wouldn't go so far as calling a normal order-6 (256kB) allocation on
> > > 512MB machine with 1024MB swap a bug. Moreover such failures just never
> > > happened before 2.6.31-rc1.
> >
> > It's not that normal, it's an allocation that cannot sleep and cannot
> > reclaim. Why is something like firmware loading allocating memory like
>
> OK.
>
> > that? Is this use of GFP_ATOMIC relatively recent or has it always been
> > that way?
>
> It has always been like that.
>
Nuts, why is firmware loading depending on GFP_ATOMIC?
> > > I don't know why people don't see it but for me it has a memory management
> > > regression and reliability issue written all over it.
> > >
> >
> > Possibly but drivers that reload their firmware as a response to an
> > error condition is relatively new and loading network drivers while the
> > system is already up and running a long time does not strike me as
> > typical system behaviour.
>
> Loading drivers after boot is a typical desktop/laptop behavior, please
> think about hotplug (the hardware in question is an USB dongle).
>
In that case, how reproducible is this problem so it can be
bisected? Basically, there are no guarantees that GFP_ATOMIC allocations
of this order will succeed although you can improve the odds by increasing
min_free_kbytes. Network drivers should never have been depending on GFP_ATOMIC
succeeding like this but the hole has been dug now.
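For reference, the order/size relationship being discussed can be worked
through as below. This is only a sketch, assuming 4 kB pages (as on the
32-bit x86 machine in this report); the helper names are illustrative and
are not kernel APIs.

```python
PAGE_SIZE = 4096  # assumption: 4 kB pages, as on 32-bit x86


def order_to_bytes(order: int, page_size: int = PAGE_SIZE) -> int:
    """Size of one physically contiguous block of the given order."""
    return page_size << order


def bytes_to_order(size: int, page_size: int = PAGE_SIZE) -> int:
    """Smallest order whose block can hold `size` bytes."""
    if size <= 0:
        raise ValueError("size must be positive")
    pages = -(-size // page_size)  # ceiling division
    return (pages - 1).bit_length()


if __name__ == "__main__":
    for order in (0, 2, 5, 6):
        print(f"order-{order}: {order_to_bytes(order) // 1024} kB")
```

With these assumptions an order-6 request is 64 contiguous pages (256 kB),
and any buffer just over 128 kB already rounds up to order-6.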
If it's happening more frequently now than it used to, then one of the
following may be the cause:
1. The allocations are occurring more frequently, whereas previously a
pool might have been reused or the memory not freed for the lifetime of
the system.
2. Something has changed in the allocator, though I'm not aware of
changes in such a recent time-frame that could cause this.
3. Something has changed recently with respect to reclaim. There have
been changes made recently to lumpy reclaim and that might be impacting
kswapd's efforts at keeping large contiguous regions free.
4. Hotplug events that involve driver loads are more common now than they
were previously for some reason. You mention that this is a USB dongle,
for example. Was it the case before that the driver loaded early and
remained resident but only became active after a hotplug event? If so,
the memory would be allocated once at boot. However, if an optimisation
made recently unloads those unused drivers and re-loads them later, there
would be more order-6 allocations than there were previously, which would
manifest as these bug reports. Is this a possibility?
The ideal would be that network drivers not make allocations like this
in the first place by, for example, DMAing the firmware across in
page-size chunks instead of one contiguous lump :/
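The chunked approach can be illustrated in userspace terms. The sketch
below is not driver code and uses no real DMA API; it only shows how a
firmware image could be cut into page-size pieces so that each piece
would need just an order-0 allocation. PAGE_SIZE, the function name and
the 200 kB image size are assumptions made for illustration.

```python
PAGE_SIZE = 4096  # assumption: 4 kB pages


def split_into_page_chunks(image: bytes, page_size: int = PAGE_SIZE):
    """Yield (offset, chunk) pairs, each chunk at most one page long."""
    for offset in range(0, len(image), page_size):
        yield offset, image[offset:offset + page_size]


if __name__ == "__main__":
    firmware = bytes(200 * 1024)  # hypothetical 200 kB firmware image
    chunks = list(split_into_page_chunks(firmware))
    largest = max(len(chunk) for _, chunk in chunks)
    print(len(chunks), "chunks, largest", largest, "bytes")
```

The trade-off is more transfers per load in exchange for never needing a
large physically contiguous buffer.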
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: ipw2200: firmware DMA loading rework
2009-09-02 18:02 ` Luis R. Rodriguez
(?)
(?)
@ 2009-09-03 12:49 ` Mel Gorman
-1 siblings, 0 replies; 286+ messages in thread
From: Mel Gorman @ 2009-09-03 12:49 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Bartlomiej Zolnierkiewicz, Tso Ted, Aneesh Kumar K.V, Zhu Yi,
Andrew Morton, Johannes Weiner, Pekka Enberg, Rafael J. Wysocki,
Linux Kernel Mailing List, Kernel Testers List, Mel Gorman,
netdev, linux-mm, James Ketrenos, Chatre, Reinette,
linux-wireless, ipw2100-devel
On Wed, Sep 02, 2009 at 11:02:14AM -0700, Luis R. Rodriguez wrote:
> On Wed, Sep 2, 2009 at 10:48 AM, Bartlomiej
> Zolnierkiewicz<bzolnier@gmail.com> wrote:
> > On Sunday 30 August 2009 14:37:42 Bartlomiej Zolnierkiewicz wrote:
> >> On Friday 28 August 2009 05:42:31 Zhu Yi wrote:
> >> > Bartlomiej Zolnierkiewicz reported an atomic order-6 allocation failure
> >> > for ipw2200 firmware loading in kernel 2.6.30. High order allocation is
> >>
> >> s/2.6.30/2.6.31-rc6/
> >>
> >> The issue has always been there but it was some recent change that
> >> explicitly triggered the allocation failures (after 2.6.31-rc1).
> >
> > ipw2200 fix works fine but yesterday I got the following error while mounting
> > ext4 filesystem (mb_history is optional so the mount succeeded):
>
> OK so the mount succeeded.
>
> > EXT4-fs (dm-2): barriers enabled
> > kjournald2 starting: pid 3137, dev dm-2:8, commit interval 5 seconds
> > EXT4-fs (dm-2): internal journal on dm-2:8
> > EXT4-fs (dm-2): delayed allocation enabled
> > EXT4-fs: file extents enabled
> > mount: page allocation failure. order:5, mode:0xc0d0
> > Pid: 3136, comm: mount Not tainted 2.6.31-rc8-00015-gadda766-dirty #78
> > Call Trace:
> > [<c0394de3>] ? printk+0xf/0x14
> > [<c016a693>] __alloc_pages_nodemask+0x400/0x442
> > [<c016a71b>] __get_free_pages+0xf/0x32
> > [<c01865cf>] __kmalloc+0x28/0xfa
> > [<c023d96f>] ? __spin_lock_init+0x28/0x4d
> > [<c01f529d>] ext4_mb_init+0x392/0x460
> > [<c01e99d2>] ext4_fill_super+0x1b96/0x2012
> > [<c0239bc8>] ? snprintf+0x15/0x17
> > [<c01c0b26>] ? disk_name+0x24/0x69
> > [<c018ba63>] get_sb_bdev+0xda/0x117
> > [<c01e6711>] ext4_get_sb+0x13/0x15
> > [<c01e7e3c>] ? ext4_fill_super+0x0/0x2012
> > [<c018ad2d>] vfs_kern_mount+0x3b/0x76
> > [<c018adad>] do_kern_mount+0x33/0xbd
> > [<c019d0af>] do_mount+0x660/0x6b8
> > [<c016a71b>] ? __get_free_pages+0xf/0x32
> > [<c019d168>] sys_mount+0x61/0x99
> > [<c0102908>] sysenter_do_call+0x12/0x36
> > Mem-Info:
> > DMA per-cpu:
> > CPU 0: hi: 0, btch: 1 usd: 0
> > Normal per-cpu:
> > CPU 0: hi: 186, btch: 31 usd: 0
> > Active_anon:25471 active_file:22802 inactive_anon:25812
> > inactive_file:33619 unevictable:2 dirty:2452 writeback:135 unstable:0
> > free:4346 slab:4308 mapped:26038 pagetables:912 bounce:0
> > DMA free:2060kB min:84kB low:104kB high:124kB active_anon:1660kB inactive_anon:1848kB active_file:144kB inactive_file:868kB unevictable:0kB present:15788kB pages_scanned:0 all_unreclaimable? no
> > lowmem_reserve[]: 0 489 489
> > Normal free:15324kB min:2788kB low:3484kB high:4180kB active_anon:100224kB inactive_anon:101400kB active_file:91064kB inactive_file:133608kB unevictable:8kB present:501392kB pages_scanned:0 all_unreclaimable? no
> > lowmem_reserve[]: 0 0 0
> > DMA: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2060kB
> > Normal: 1283*4kB 648*8kB 159*16kB 53*32kB 10*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15324kB
> > 57947 total pagecache pages
> > 878 pages in swap cache
> > Swap cache stats: add 920, delete 42, find 11/11
> > Free swap = 1016436kB
> > Total swap = 1020116kB
> > 131056 pages RAM
> > 4233 pages reserved
> > 90573 pages shared
> > 77286 pages non-shared
> > EXT4-fs: mballoc enabled
> > EXT4-fs (dm-2): mounted filesystem with ordered data mode
> >
> > Thus it seems like the original bug is still there and any ideas how to
> > debug the problem further are appreciated..
> >
> > The complete dmesg and kernel config are here:
> >
> > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.dmesg
> > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.config
>
> This looks very similar to the kmemleak ext4 reports upon a mount. If
> it is the same issue, which from the trace it seems it is, then this
> is due to an extra kmalloc() allocation and this apparently will not
> get fixed on 2.6.31 due to the closeness of the merge window and the
> non-criticalness this issue has been deemed.
>
I suspect the more pressing concern is why this kmalloc() is resulting in
an order-5 allocation request. What size is the buffer being requested?
Was that expected? What are the contents of /proc/slabinfo, in case a buffer
that should have required order-1 or order-2 is using a higher order for
some reason?
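As an aid to reading the report, the per-order free lists printed in the
failure message can be tallied as below. This is a rough sketch assuming
4 kB pages; the function names are made up for illustration and the input
line is copied from the quoted dmesg.

```python
import re

PAGE_KB = 4  # assumption: 4 kB pages


def parse_buddy_line(line: str) -> dict:
    """Map order -> number of free blocks from a 'N*SkB' free-list line."""
    free = {}
    for count, size_kb in re.findall(r"(\d+)\*(\d+)kB", line):
        order = (int(size_kb) // PAGE_KB).bit_length() - 1
        free[order] = int(count)
    return free


def blocks_at_or_above(free: dict, order: int) -> int:
    """Count free blocks large enough to hold a request of `order`."""
    return sum(n for o, n in free.items() if o >= order)


NORMAL = ("Normal: 1283*4kB 648*8kB 159*16kB 53*32kB 10*64kB 1*128kB "
          "0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15324kB")

if __name__ == "__main__":
    free = parse_buddy_line(NORMAL)
    total_kb = sum(n * (PAGE_KB << o) for o, n in free.items())
    print("total free:", total_kb, "kB")
    print("blocks of order >= 5:", blocks_at_or_above(free, 5))
```

Note that this only counts blocks in a snapshot printed after the failure.
The allocator also applies watermark checks, so a non-zero count here does
not by itself mean the request could have been granted.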
> A patch fix is part of the ext4-patchqueue
> http://repo.or.cz/w/ext4-patch-queue.git
>
p.s. I will be offline until Tuesday so will not be very responsive
initially.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: ipw2200: firmware DMA loading rework
@ 2009-09-03 12:49 ` Mel Gorman
0 siblings, 0 replies; 286+ messages in thread
From: Mel Gorman @ 2009-09-03 12:49 UTC (permalink / raw)
To: Luis R. Rodriguez
Cc: Bartlomiej Zolnierkiewicz, Tso Ted, Aneesh Kumar K.V, Zhu Yi,
Andrew Morton, Johannes Weiner, Pekka Enberg, Rafael J. Wysocki,
Linux Kernel Mailing List, Kernel Testers List, Mel Gorman,
netdev, linux-mm, James Ketrenos, Chatre, Reinette,
linux-wireless, ipw2100-devel
On Wed, Sep 02, 2009 at 11:02:14AM -0700, Luis R. Rodriguez wrote:
> On Wed, Sep 2, 2009 at 10:48 AM, Bartlomiej
> Zolnierkiewicz<bzolnier@gmail.com> wrote:
> > On Sunday 30 August 2009 14:37:42 Bartlomiej Zolnierkiewicz wrote:
> >> On Friday 28 August 2009 05:42:31 Zhu Yi wrote:
> >> > Bartlomiej Zolnierkiewicz reported an atomic order-6 allocation failure
> >> > for ipw2200 firmware loading in kernel 2.6.30. High order allocation is
> >>
> >> s/2.6.30/2.6.31-rc6/
> >>
> >> The issue has always been there but it was some recent change that
> >> explicitly triggered the allocation failures (after 2.6.31-rc1).
> >
> > ipw2200 fix works fine but yesterday I got the following error while mounting
> > ext4 filesystem (mb_history is optional so the mount succeeded):
>
> OK so the mount succeeded.
>
> > EXT4-fs (dm-2): barriers enabled
> > kjournald2 starting: pid 3137, dev dm-2:8, commit interval 5 seconds
> > EXT4-fs (dm-2): internal journal on dm-2:8
> > EXT4-fs (dm-2): delayed allocation enabled
> > EXT4-fs: file extents enabled
> > mount: page allocation failure. order:5, mode:0xc0d0
> > Pid: 3136, comm: mount Not tainted 2.6.31-rc8-00015-gadda766-dirty #78
> > Call Trace:
> > [<c0394de3>] ? printk+0xf/0x14
> > [<c016a693>] __alloc_pages_nodemask+0x400/0x442
> > [<c016a71b>] __get_free_pages+0xf/0x32
> > [<c01865cf>] __kmalloc+0x28/0xfa
> > [<c023d96f>] ? __spin_lock_init+0x28/0x4d
> > [<c01f529d>] ext4_mb_init+0x392/0x460
> > [<c01e99d2>] ext4_fill_super+0x1b96/0x2012
> > [<c0239bc8>] ? snprintf+0x15/0x17
> > [<c01c0b26>] ? disk_name+0x24/0x69
> > [<c018ba63>] get_sb_bdev+0xda/0x117
> > [<c01e6711>] ext4_get_sb+0x13/0x15
> > [<c01e7e3c>] ? ext4_fill_super+0x0/0x2012
> > [<c018ad2d>] vfs_kern_mount+0x3b/0x76
> > [<c018adad>] do_kern_mount+0x33/0xbd
> > [<c019d0af>] do_mount+0x660/0x6b8
> > [<c016a71b>] ? __get_free_pages+0xf/0x32
> > [<c019d168>] sys_mount+0x61/0x99
> > [<c0102908>] sysenter_do_call+0x12/0x36
> > Mem-Info:
> > DMA per-cpu:
> > CPU 0: hi: 0, btch: 1 usd: 0
> > Normal per-cpu:
> > CPU 0: hi: 186, btch: 31 usd: 0
> > Active_anon:25471 active_file:22802 inactive_anon:25812
> > inactive_file:33619 unevictable:2 dirty:2452 writeback:135 unstable:0
> > free:4346 slab:4308 mapped:26038 pagetables:912 bounce:0
> > DMA free:2060kB min:84kB low:104kB high:124kB active_anon:1660kB inactive_anon:1848kB active_file:144kB inactive_file:868kB unevictable:0kB present:15788kB pages_scanned:0 all_unreclaimable? no
> > lowmem_reserve[]: 0 489 489
> > Normal free:15324kB min:2788kB low:3484kB high:4180kB active_anon:100224kB inactive_anon:101400kB active_file:91064kB inactive_file:133608kB unevictable:8kB present:501392kB pages_scanned:0 all_unreclaimable? no
> > lowmem_reserve[]: 0 0 0
> > DMA: 1*4kB 1*8kB 0*16kB 0*32kB 0*64kB 0*128kB 0*256kB 0*512kB 0*1024kB 1*2048kB 0*4096kB = 2060kB
> > Normal: 1283*4kB 648*8kB 159*16kB 53*32kB 10*64kB 1*128kB 0*256kB 0*512kB 0*1024kB 0*2048kB 0*4096kB = 15324kB
> > 57947 total pagecache pages
> > 878 pages in swap cache
> > Swap cache stats: add 920, delete 42, find 11/11
> > Free swap = 1016436kB
> > Total swap = 1020116kB
> > 131056 pages RAM
> > 4233 pages reserved
> > 90573 pages shared
> > 77286 pages non-shared
> > EXT4-fs: mballoc enabled
> > EXT4-fs (dm-2): mounted filesystem with ordered data mode
> >
> > Thus it seems like the original bug is still there and any ideas how to
> > debug the problem further are appreciated..
> >
> > The complete dmesg and kernel config are here:
> >
> > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.dmesg
> > http://www.kernel.org/pub/linux/kernel/people/bart/ext4-paf.config
>
> This looks very similar to the kmemleak ext4 reports upon a mount. If
> it is the same issue, which from the trace it seems it is, then this
> is due to an extra kmalloc() allocation and this apparently will not
> get fixed on 2.6.31 due to the closeness of the merge window and the
> non-criticalness this issue has been deemed.
>
I suspect the more pressing concern is why this kmalloc() results in
an order-5 allocation request. What size is the buffer being requested?
Was that expected? What are the contents of /proc/slabinfo, in case a buffer
that should have required order-1 or order-2 is using a higher order for
some reason?
> A patch fix is part of the ext4-patchqueue
> http://repo.or.cz/w/ext4-patch-queue.git
>
p.s. I will be offline until Tuesday, so I will not be very responsive
at first.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: ipw2200: firmware DMA loading rework
2009-09-03 12:49 ` Mel Gorman
(?)
(?)
@ 2009-09-05 14:28 ` Theodore Tso
-1 siblings, 0 replies; 286+ messages in thread
From: Theodore Tso @ 2009-09-05 14:28 UTC (permalink / raw)
To: Mel Gorman
Cc: Luis R. Rodriguez, Bartlomiej Zolnierkiewicz, Aneesh Kumar K.V,
Zhu Yi, Andrew Morton, Johannes Weiner, Pekka Enberg,
Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Mel Gorman, netdev, linux-mm,
James Ketrenos, Chatre, Reinette, linux-wireless, ipw2100-devel
On Thu, Sep 03, 2009 at 01:49:14PM +0100, Mel Gorman wrote:
> >
> > This looks very similar to the kmemleak ext4 reports upon a mount. If
> > it is the same issue, which from the trace it seems it is, then this
> > is due to an extra kmalloc() allocation and this apparently will not
> > get fixed on 2.6.31 due to the closeness of the merge window and the
> > non-criticalness this issue has been deemed.
No, it's a different problem.
> I suspect the more pressing concern is why is this kmalloc() resulting in
> an order-5 allocation request? What size is the buffer being requested?
> Was that expected? What is the contents of /proc/slabinfo in case a buffer
> that should have required order-1 or order-2 is using a higher order for
> some reason.
It's allocating 68,000 bytes for the mb_history structure, which is
used for debugging purposes. That's why it's optional and we continue
if it's not allocated. We should fix it to use vmalloc() and I'm
inclined to turn it off by default since it's not worth the overhead,
and most ext4 users won't find it useful or interesting.
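
For illustration only (this is not code from the thread): the order-5
request in the trace is consistent with a 68,000-byte buffer on 4 kB pages,
since the page allocator hands out power-of-two runs of pages. A minimal
userspace sketch of that rounding, assuming a helper named
order_for_size() that is hypothetical and merely models what the kernel's
get_order() computes:

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical userspace model of the size-to-order rounding.
 * Returns the smallest order such that (page_size << order) >= size.
 * page_size is passed explicitly; 4096 matches the 4kB blocks in the
 * Mem-Info dump quoted earlier in the thread.
 */
static unsigned int order_for_size(size_t size, size_t page_size)
{
    unsigned int order = 0;
    size_t span = page_size;

    assert(size > 0 && page_size > 0);

    /* Double the span until it covers the requested size. */
    while (span < size) {
        span <<= 1;
        order++;
    }
    return order;
}
```

With these assumptions, 68,000 bytes needs 17 pages, which rounds up to
32 contiguous pages (128 kB), i.e. order 5. A 64 kB buffer would have fit
in order 4.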
- Ted
^ permalink raw reply [flat|nested] 286+ messages in thread
[parent not found: <20090905142837.GI16217-3s7WtUTddSA@public.gmane.org>]
* Re: ipw2200: firmware DMA loading rework
[not found] ` <20090905142837.GI16217-3s7WtUTddSA@public.gmane.org>
@ 2009-09-08 11:00 ` Mel Gorman
0 siblings, 0 replies; 286+ messages in thread
From: Mel Gorman @ 2009-09-08 11:00 UTC (permalink / raw)
To: Theodore Tso, Luis R. Rodriguez, Bartlomiej Zolnierkiewicz,
Aneesh Kumar K.V, Zhu Yi
On Sat, Sep 05, 2009 at 10:28:37AM -0400, Theodore Tso wrote:
> On Thu, Sep 03, 2009 at 01:49:14PM +0100, Mel Gorman wrote:
> > >
> > > This looks very similar to the kmemleak ext4 reports upon a mount. If
> > > it is the same issue, which from the trace it seems it is, then this
> > > is due to an extra kmalloc() allocation and this apparently will not
> > > get fixed on 2.6.31 due to the closeness of the merge window and the
> > > non-criticalness this issue has been deemed.
>
> No, it's a different problem.
>
> > I suspect the more pressing concern is why is this kmalloc() resulting in
> > an order-5 allocation request? What size is the buffer being requested?
> > Was that expected? What is the contents of /proc/slabinfo in case a buffer
> > that should have required order-1 or order-2 is using a higher order for
> > some reason.
>
> It's allocating 68,000 bytes for the mb_history structure, which is
> used for debugging purposes. That's why it's optional and we continue
> if it's not allocated. We should fix it to use vmalloc()
You could call kmalloc(FLAGS|__GFP_NOWARN) with a fallback to
vmalloc(), and disable the feature if vmalloc() fails as well. Maybe check
out what kernel/profile.c#profile_init() does to allocate a large buffer and
do something similar?
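
A rough sketch of the pattern described above, written as a userspace
model rather than kernel code. Everything here is an assumption made for
illustration: history_alloc(), struct history_buf, alloc_fn and
always_fail() are hypothetical names, and the two allocator callbacks
merely stand in for kmalloc() with __GFP_NOWARN and for vmalloc().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Stand-in type for an allocator; in the kernel these would be
 * kmalloc(..., flags | __GFP_NOWARN) and vmalloc(). */
typedef void *(*alloc_fn)(size_t size);

struct history_buf {
    void *data;          /* NULL means the optional feature is disabled */
    bool from_fallback;  /* remembers which free routine must be used */
};

/* Stub allocator that always fails, to model a fragmented system. */
static void *always_fail(size_t size)
{
    (void)size;
    return NULL;
}

/*
 * Try the contiguous allocator first; if it fails, try the fallback.
 * If both fail, return an empty buffer so the caller can carry on
 * with the feature switched off instead of failing the mount.
 */
static struct history_buf history_alloc(size_t size, alloc_fn contiguous,
                                        alloc_fn fallback)
{
    struct history_buf buf = { NULL, false };

    assert(contiguous && fallback);

    buf.data = contiguous(size);
    if (buf.data)
        return buf;

    buf.data = fallback(size);
    buf.from_fallback = (buf.data != NULL);
    return buf;
}
```

Recording which path succeeded matters because memory obtained from the
two kernel allocators is released with different calls (kfree() versus
vfree()).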
> and I'm
> inclined to turn it off by default since it's not worth the overhead,
> and most ext4 users won't find it useful or interesting.
>
I can't comment as I don't know what sort of debugging it's useful for.
--
Mel Gorman
Part-time Phd Student Linux Technology Center
University of Limerick IBM Dublin Software Lab
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: ipw2200: firmware DMA loading rework
2009-09-08 11:00 ` Mel Gorman
(?)
(?)
@ 2009-09-08 20:39 ` Simon Kitching
-1 siblings, 0 replies; 286+ messages in thread
From: Simon Kitching @ 2009-09-08 20:39 UTC (permalink / raw)
To: Mel Gorman
Cc: Theodore Tso, Luis R. Rodriguez, Bartlomiej Zolnierkiewicz,
Aneesh Kumar K.V, Zhu Yi, Andrew Morton, Johannes Weiner,
Pekka Enberg, Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Mel Gorman, netdev, linux-mm,
James Ketrenos, Chatre, Reinette, linux-wireless, ipw2100-devel
On Tue, 2009-09-08 at 12:00 +0100, Mel Gorman wrote:
> On Sat, Sep 05, 2009 at 10:28:37AM -0400, Theodore Tso wrote:
> > On Thu, Sep 03, 2009 at 01:49:14PM +0100, Mel Gorman wrote:
> > > >
> > > > This looks very similar to the kmemleak ext4 reports upon a mount. If
> > > > it is the same issue, which from the trace it seems it is, then this
> > > > is due to an extra kmalloc() allocation and this apparently will not
> > > > get fixed on 2.6.31 due to the closeness of the merge window and the
> > > > non-criticalness this issue has been deemed.
> >
> > No, it's a different problem.
> >
> > > I suspect the more pressing concern is why is this kmalloc() resulting in
> > > an order-5 allocation request? What size is the buffer being requested?
> > > Was that expected? What is the contents of /proc/slabinfo in case a buffer
> > > that should have required order-1 or order-2 is using a higher order for
> > > some reason.
> >
> > It's allocating 68,000 bytes for the mb_history structure, which is
> > used for debugging purposes. That's why it's optional and we continue
> > if it's not allocated. We should fix it to use vmalloc()
>
> You could call with kmalloc(FLAGS|GFP_NOWARN) with a fallback to
> vmalloc() and a disable if vmalloc() fails as well. Maybe check out what
> kernel/profile.c#profile_init() to allocate a large buffer and do something
> similar?
>
> > and I'm
> > inclined to turn it off by default since it's not worth the overhead,
> > and most ext4 users won't find it useful or interesting.
> >
>
> I can't comment as I don't know what sort of debugging it's useful for.
>
Perhaps this is a suitable use for the newly proposed flex_array? From an
initial glance, I can't see why the allocated memory has to be
contiguous.
http://lwn.net/Articles/345273/
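
To make the idea concrete, here is a minimal userspace sketch of an array
built from separately allocated page-sized parts. It is not the flex_array
API from the proposal; chunked_array, chunked_init(), chunked_slot() and
chunked_free() are hypothetical names, and the 4096-byte part size and the
68-byte element size used in the note below are assumptions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

#define PART_BYTES 4096  /* assumed page size; one part is one page */

struct chunked_array {
    size_t elem_size;  /* bytes per element */
    size_t per_part;   /* whole elements that fit in one part */
    size_t nr_parts;   /* number of part pointers */
    void **parts;      /* each part is allocated on first use */
};

/* Set up the pointer table only; returns 0 on success, -1 on failure. */
static int chunked_init(struct chunked_array *a, size_t elem_size,
                        size_t total)
{
    assert(elem_size > 0 && elem_size <= PART_BYTES);

    a->elem_size = elem_size;
    a->per_part = PART_BYTES / elem_size;
    a->nr_parts = (total + a->per_part - 1) / a->per_part;
    a->parts = calloc(a->nr_parts ? a->nr_parts : 1, sizeof(*a->parts));
    return a->parts ? 0 : -1;
}

/* Return a pointer to element idx, allocating its part if needed. */
static void *chunked_slot(struct chunked_array *a, size_t idx)
{
    size_t part = idx / a->per_part;

    if (part >= a->nr_parts)
        return NULL;
    if (!a->parts[part]) {
        a->parts[part] = calloc(1, PART_BYTES);
        if (!a->parts[part])
            return NULL;
    }
    return (char *)a->parts[part] + (idx % a->per_part) * a->elem_size;
}

/* Release every part and the pointer table. */
static void chunked_free(struct chunked_array *a)
{
    for (size_t i = 0; i < a->nr_parts; i++)
        free(a->parts[i]);
    free(a->parts);
    a->parts = NULL;
}
```

Under these assumptions, 1000 elements of 68 bytes would need 17
single-page parts and no allocation larger than one page, at the cost of
one extra pointer lookup per access.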
Cheers, Simon
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: ipw2200: firmware DMA loading rework
@ 2009-09-08 20:39 ` Simon Kitching
0 siblings, 0 replies; 286+ messages in thread
From: Simon Kitching @ 2009-09-08 20:39 UTC (permalink / raw)
To: Mel Gorman
Cc: Theodore Tso, Luis R. Rodriguez, Bartlomiej Zolnierkiewicz,
Aneesh Kumar K.V, Zhu Yi, Andrew Morton, Johannes Weiner,
Pekka Enberg, Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Mel Gorman, netdev, linux-mm,
James Ketrenos, Chatre, Reinette, linux-wireless, ipw2100-devel
On Tue, 2009-09-08 at 12:00 +0100, Mel Gorman wrote:
> On Sat, Sep 05, 2009 at 10:28:37AM -0400, Theodore Tso wrote:
> > On Thu, Sep 03, 2009 at 01:49:14PM +0100, Mel Gorman wrote:
> > > >
> > > > This looks very similar to the kmemleak ext4 reports upon a mount. If
> > > > it is the same issue, which from the trace it seems it is, then this
> > > > is due to an extra kmalloc() allocation and this apparently will not
> > > > get fixed on 2.6.31 due to the closeness of the merge window and the
> > > > non-criticalness this issue has been deemed.
> >
> > No, it's a different problem.
> >
> > > I suspect the more pressing concern is why is this kmalloc() resulting in
> > > an order-5 allocation request? What size is the buffer being requested?
> > > Was that expected? What is the contents of /proc/slabinfo in case a buffer
> > > that should have required order-1 or order-2 is using a higher order for
> > > some reason.
> >
> > It's allocating 68,000 bytes for the mb_history structure, which is
> > used for debugging purposes. That's why it's optional and we continue
> > if it's not allocated. We should fix it to use vmalloc()
>
> You could call with kmalloc(FLAGS|GFP_NOWARN) with a fallback to
> vmalloc() and a disable if vmalloc() fails as well. Maybe check out what
> kernel/profile.c#profile_init() to allocate a large buffer and do something
> similar?
>
> > and I'm
> > inclined to turn it off by default since it's not worth the overhead,
> > and most ext4 users won't find it useful or interesting.
> >
>
> I can't comment as I don't know what sort of debugging it's useful for.
>
Perhaps this is a suitable use for the new proposed flex_array? From an
initial glance, I can't see why the allocated memory has to be
contiguous..
http://lwn.net/Articles/345273/
Cheers, Simon
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: ipw2200: firmware DMA loading rework
@ 2009-09-08 20:39 ` Simon Kitching
0 siblings, 0 replies; 286+ messages in thread
From: Simon Kitching @ 2009-09-08 20:39 UTC (permalink / raw)
To: Mel Gorman
Cc: Theodore Tso, Luis R. Rodriguez, Bartlomiej Zolnierkiewicz,
Aneesh Kumar K.V, Zhu Yi, Andrew Morton, Johannes Weiner,
Pekka Enberg, Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Mel Gorman, netdev, linux-mm,
James Ketrenos, Chatre, Reinette, linux-wireless,
ipw2100-devel@lists.sourceforge.net
On Tue, 2009-09-08 at 12:00 +0100, Mel Gorman wrote:
> On Sat, Sep 05, 2009 at 10:28:37AM -0400, Theodore Tso wrote:
> > On Thu, Sep 03, 2009 at 01:49:14PM +0100, Mel Gorman wrote:
> > > >
> > > > This looks very similar to the kmemleak ext4 reports upon a mount. If
> > > > it is the same issue, which from the trace it seems it is, then this
> > > > is due to an extra kmalloc() allocation and this apparently will not
> > > > get fixed on 2.6.31 due to the closeness of the merge window and the
> > > > non-criticalness this issue has been deemed.
> >
> > No, it's a different problem.
> >
> > > I suspect the more pressing concern is why is this kmalloc() resulting in
> > > an order-5 allocation request? What size is the buffer being requested?
> > > Was that expected? What is the contents of /proc/slabinfo in case a buffer
> > > that should have required order-1 or order-2 is using a higher order for
> > > some reason.
> >
> > It's allocating 68,000 bytes for the mb_history structure, which is
> > used for debugging purposes. That's why it's optional and we continue
> > if it's not allocated. We should fix it to use vmalloc()
>
> You could call with kmalloc(FLAGS|GFP_NOWARN) with a fallback to
> vmalloc() and a disable if vmalloc() fails as well. Maybe check out what
> kernel/profile.c#profile_init() to allocate a large buffer and do something
> similar?
>
> > and I'm
> > inclined to turn it off by default since it's not worth the overhead,
> > and most ext4 users won't find it useful or interesting.
> >
>
> I can't comment as I don't know what sort of debugging it's useful for.
>
Perhaps this is a suitable use for the new proposed flex_array? From an
initial glance, I can't see why the allocated memory has to be
contiguous..
http://lwn.net/Articles/345273/
Cheers, Simon
--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org. For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14016] mm/ipw2200 regression
@ 2009-08-26 9:51 ` Johannes Weiner
0 siblings, 0 replies; 286+ messages in thread
From: Johannes Weiner @ 2009-08-26 9:51 UTC (permalink / raw)
To: Pekka Enberg
Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Bartlomiej Zolnierkiewicz, Mel Gorman,
Andrew Morton, netdev, linux-mm
On Wed, Aug 26, 2009 at 10:27:41AM +0200, Johannes Weiner wrote:
> 64 pages, presumably 256k, for fw->boot_size while current ipw
> firmware images have ~188k. I don't know jack squat about this
> driver, but given the field name and the struct:
>
> struct ipw_fw {
> __le32 ver;
> __le32 boot_size;
> __le32 ucode_size;
> __le32 fw_size;
> u8 data[0];
> };
>
> fw->boot_size alone being that big sounds a bit fishy to me.
Scrap that, I just noticed the second call to ipw_load_firmware() a
few lines later... :)
Hannes 'when logic and proportion have fallen sloppy dead...'
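For readers following the struct quoted above, here is a minimal
userspace sketch of how the three size fields could be read and
sanity-checked against the length of a firmware blob. It is not the
driver's code. The layout it assumes (a 16-byte header followed by the
boot, ucode and fw images back to back) is not stated in this thread,
and the names fw_header_sketch, sketch_le32() and sketch_parse_fw() are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Host-endian copy of the four __le32 header fields quoted above. */
struct fw_header_sketch {
	uint32_t ver;
	uint32_t boot_size;
	uint32_t ucode_size;
	uint32_t fw_size;
};

#define FW_HEADER_BYTES 16u	/* four 32-bit fields */

/* Decode a little-endian 32-bit value byte by byte, independent of host endianness. */
uint32_t sketch_le32(const unsigned char *p)
{
	return (uint32_t)p[0] |
	       ((uint32_t)p[1] << 8) |
	       ((uint32_t)p[2] << 16) |
	       ((uint32_t)p[3] << 24);
}

/*
 * Fill *out from the blob header.
 * Returns 0 if the blob is long enough for the header plus all three
 * images (assumed to be stored back to back), -1 otherwise.
 */
int sketch_parse_fw(const unsigned char *blob, size_t len,
		    struct fw_header_sketch *out)
{
	uint64_t need;

	assert(blob != NULL && out != NULL);
	if (len < FW_HEADER_BYTES)
		return -1;

	out->ver = sketch_le32(blob);
	out->boot_size = sketch_le32(blob + 4);
	out->ucode_size = sketch_le32(blob + 8);
	out->fw_size = sketch_le32(blob + 12);

	/* 64-bit sum so that three large 32-bit sizes cannot wrap around. */
	need = (uint64_t)FW_HEADER_BYTES + out->boot_size +
	       out->ucode_size + out->fw_size;
	return need > len ? -1 : 0;
}
```

Printing the three sizes separately is one way to see which image a
large allocation belongs to, which is the distinction the follow-up
above turns on.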
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
2009-08-25 20:00 ` Rafael J. Wysocki
` (22 preceding siblings ...)
(?)
@ 2009-08-25 20:34 ` Rafael J. Wysocki
2009-08-27 19:54 ` Mikael Pettersson
-1 siblings, 1 reply; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Mikael Pettersson
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14015
Subject : pty regressed again, breaking expect and gcc's testsuite
Submitter : Mikael Pettersson <mikpe@it.uu.se>
Date : 2009-08-14 23:41 (12 days old)
References : http://marc.info/?l=linux-kernel&m=125029329805643&w=4
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
2009-08-25 20:34 ` [Bug #14015] pty regressed again, breaking expect and gcc's testsuite Rafael J. Wysocki
@ 2009-08-27 19:54 ` Mikael Pettersson
0 siblings, 0 replies; 286+ messages in thread
From: Mikael Pettersson @ 2009-08-27 19:54 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux Kernel Mailing List, Kernel Testers List, Mikael Pettersson
On Tue, 25 Aug 2009 22:34:53 +0200 (CEST), Rafael J. Wysocki wrote:
> The following bug entry is on the current list of known regressions
> from 2.6.30. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14015
> Subject : pty regressed again, breaking expect and gcc's testsuite
> Submitter : Mikael Pettersson <mikpe@it.uu.se>
> Date : 2009-08-14 23:41 (12 days old)
> References : http://marc.info/?l=linux-kernel&m=125029329805643&w=4
Not fixed. With 2.6.31-rc7 I'm still seeing repeatable testsuite
failures on powerpc64. Reverting to 2.6.30 makes the failures go away.
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-08-28 18:56 ` Rafael J. Wysocki
0 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-28 18:56 UTC (permalink / raw)
To: Mikael Pettersson; +Cc: Linux Kernel Mailing List, Kernel Testers List
On Thursday 27 August 2009, Mikael Pettersson wrote:
> On Tue, 25 Aug 2009 22:34:53 +0200 (CEST), Rafael J. Wysocki wrote:
> > The following bug entry is on the current list of known regressions
> > from 2.6.30. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14015
> > Subject : pty regressed again, breaking expect and gcc's testsuite
> > Submitter : Mikael Pettersson <mikpe@it.uu.se>
> > Date : 2009-08-14 23:41 (12 days old)
> > References : http://marc.info/?l=linux-kernel&m=125029329805643&w=4
>
> Not fixed. With 2.6.31-rc7 I'm still seeing repeatable testsuite
> failures on powerpc64. Reverting to 2.6.30 makes the failures go away.
Thanks for the update.
I guess 2.6.31-rc8 doesn't make any difference, does it?
Rafael
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-08-28 20:23 ` Mikael Pettersson
0 siblings, 0 replies; 286+ messages in thread
From: Mikael Pettersson @ 2009-08-28 20:23 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Mikael Pettersson, Linux Kernel Mailing List, Kernel Testers List
Rafael J. Wysocki writes:
> On Thursday 27 August 2009, Mikael Pettersson wrote:
> > On Tue, 25 Aug 2009 22:34:53 +0200 (CEST), Rafael J. Wysocki wrote:
> > > The following bug entry is on the current list of known regressions
> > > from 2.6.30. Please verify if it still should be listed and let me know
> > > (either way).
> > >
> > >
> > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14015
> > > Subject : pty regressed again, breaking expect and gcc's testsuite
> > > Submitter : Mikael Pettersson <mikpe@it.uu.se>
> > > Date : 2009-08-14 23:41 (12 days old)
> > > References : http://marc.info/?l=linux-kernel&m=125029329805643&w=4
> >
> > Not fixed. With 2.6.31-rc7 I'm still seeing repeatable testsuite
> > failures on powerpc64. Reverting to 2.6.30 makes the failures go away.
>
> Thanks for the update.
>
> I guess 2.6.31-rc8 doesn't make any difference, does it?
I've scheduled a number of gcc bootstraps and testsuite runs
with -rc8 on x86, powerpc64, and arm. I'll post an update in
a day or so.
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-08-29 14:16 ` Mikael Pettersson
0 siblings, 0 replies; 286+ messages in thread
From: Mikael Pettersson @ 2009-08-29 14:16 UTC (permalink / raw)
To: Mikael Pettersson
Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List
Mikael Pettersson writes:
> Rafael J. Wysocki writes:
> > On Thursday 27 August 2009, Mikael Pettersson wrote:
> > > On Tue, 25 Aug 2009 22:34:53 +0200 (CEST), Rafael J. Wysocki wrote:
> > > > The following bug entry is on the current list of known regressions
> > > > from 2.6.30. Please verify if it still should be listed and let me know
> > > > (either way).
> > > >
> > > >
> > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14015
> > > > Subject : pty regressed again, breaking expect and gcc's testsuite
> > > > Submitter : Mikael Pettersson <mikpe@it.uu.se>
> > > > Date : 2009-08-14 23:41 (12 days old)
> > > > References : http://marc.info/?l=linux-kernel&m=125029329805643&w=4
> > >
> > > Not fixed. With 2.6.31-rc7 I'm still seeing repeatable testsuite
> > > failures on powerpc64. Reverting to 2.6.30 makes the failures go away.
> >
> > Thanks for the update.
> >
> > I guess 2.6.31-rc8 doesn't make any difference, does it?
>
> I've scheduled a number of gcc bootstraps and testsuite runs
> with -rc8 on x86, powerpc64, and arm. I'll post an update in
> a day or so.
2.6.31-rc8 results in bogus testsuite failures on all three platforms.
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-08-29 19:01 ` Rafael J. Wysocki
0 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-29 19:01 UTC (permalink / raw)
To: Mikael Pettersson; +Cc: Linux Kernel Mailing List, Kernel Testers List
On Saturday 29 August 2009, Mikael Pettersson wrote:
> Mikael Pettersson writes:
> > Rafael J. Wysocki writes:
> > > On Thursday 27 August 2009, Mikael Pettersson wrote:
> > > > On Tue, 25 Aug 2009 22:34:53 +0200 (CEST), Rafael J. Wysocki wrote:
> > > > > The following bug entry is on the current list of known regressions
> > > > > from 2.6.30. Please verify if it still should be listed and let me know
> > > > > (either way).
> > > > >
> > > > >
> > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14015
> > > > > Subject : pty regressed again, breaking expect and gcc's testsuite
> > > > > Submitter : Mikael Pettersson <mikpe@it.uu.se>
> > > > > Date : 2009-08-14 23:41 (12 days old)
> > > > > References : http://marc.info/?l=linux-kernel&m=125029329805643&w=4
> > > >
> > > > Not fixed. With 2.6.31-rc7 I'm still seeing repeatable testsuite
> > > > failures on powerpc64. Reverting to 2.6.30 makes the failures go away.
> > >
> > > Thanks for the update.
> > >
> > > I guess 2.6.31-rc8 doesn't make any difference, does it?
> >
> > I've scheduled a number of gcc bootstraps and testsuite runs
> > with -rc8 on x86, powerpc64, and arm. I'll post an update in
> > a day or so.
>
> 2.6.31-rc8 results in bogus testsuite failures on all three platforms.
That may be a result of the known inotify borkage in -rc8 that has been fixed
in the current Linus' tree.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
2009-08-29 19:01 ` Rafael J. Wysocki
(?)
@ 2009-08-31 13:22 ` Mikael Pettersson
2009-09-01 1:34 ` Mikael Pettersson
-1 siblings, 1 reply; 286+ messages in thread
From: Mikael Pettersson @ 2009-08-31 13:22 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Mikael Pettersson, Linux Kernel Mailing List, Kernel Testers List
Rafael J. Wysocki writes:
> On Saturday 29 August 2009, Mikael Pettersson wrote:
> > Mikael Pettersson writes:
> > > Rafael J. Wysocki writes:
> > > > On Thursday 27 August 2009, Mikael Pettersson wrote:
> > > > > On Tue, 25 Aug 2009 22:34:53 +0200 (CEST), Rafael J. Wysocki wrote:
> > > > > > The following bug entry is on the current list of known regressions
> > > > > > from 2.6.30. Please verify if it still should be listed and let me know
> > > > > > (either way).
> > > > > >
> > > > > >
> > > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14015
> > > > > > Subject : pty regressed again, breaking expect and gcc's testsuite
> > > > > > Submitter : Mikael Pettersson <mikpe@it.uu.se>
> > > > > > Date : 2009-08-14 23:41 (12 days old)
> > > > > > References : http://marc.info/?l=linux-kernel&m=125029329805643&w=4
> > > > >
> > > > > Not fixed. With 2.6.31-rc7 I'm still seeing repeatable testsuite
> > > > > failures on powerpc64. Reverting to 2.6.30 makes the failures go away.
> > > >
> > > > Thanks for the update.
> > > >
> > > > I guess 2.6.31-rc8 doesn't make any difference, does it?
> > >
> > > I've scheduled a number of gcc bootstraps and testsuite runs
> > > with -rc8 on x86, powerpc64, and arm. I'll post an update in
> > > a day or so.
> >
> > 2.6.31-rc8 results in bogus testsuite failures on all three platforms.
>
> That may be a result of the known inotify borkage in -rc8 that has been fixed
> in the current Linus' tree.
No, it's the same old semi-random pty breakage. My kernels are built
without inotify.
A bisection has identified Alan's
pty: Rework the pty layer to use the normal buffering logic
d945cb9cce20ac7143c2de8d88b187f62db99bdc
as the culprit. This patch introduces a massive number of bogus
failures in the gcc testsuite. Subsequent pty/tty patches do fix
most of those failures, but clearly not all.
/Mikael
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-09-01 1:34 ` Mikael Pettersson
0 siblings, 0 replies; 286+ messages in thread
From: Mikael Pettersson @ 2009-09-01 1:34 UTC (permalink / raw)
To: Mikael Pettersson
Cc: Rafael J. Wysocki, Linux Kernel Mailing List, Kernel Testers List
Mikael Pettersson writes:
> Rafael J. Wysocki writes:
> > On Saturday 29 August 2009, Mikael Pettersson wrote:
> > > Mikael Pettersson writes:
> > > > Rafael J. Wysocki writes:
> > > > > On Thursday 27 August 2009, Mikael Pettersson wrote:
> > > > > > On Tue, 25 Aug 2009 22:34:53 +0200 (CEST), Rafael J. Wysocki wrote:
> > > > > > > The following bug entry is on the current list of known regressions
> > > > > > > from 2.6.30. Please verify if it still should be listed and let me know
> > > > > > > (either way).
> > > > > > >
> > > > > > >
> > > > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14015
> > > > > > > Subject : pty regressed again, breaking expect and gcc's testsuite
> > > > > > > Submitter : Mikael Pettersson <mikpe@it.uu.se>
> > > > > > > Date : 2009-08-14 23:41 (12 days old)
> > > > > > > References : http://marc.info/?l=linux-kernel&m=125029329805643&w=4
> > > > > >
> > > > > > Not fixed. With 2.6.31-rc7 I'm still seeing repeatable testsuite
> > > > > > failures on powerpc64. Reverting to 2.6.30 makes the failures go away.
> > > > >
> > > > > Thanks for the update.
> > > > >
> > > > > I guess 2.6.31-rc8 doesn't make any difference, does it?
> > > >
> > > > I've scheduled a number of gcc bootstraps and testsuite runs
> > > > with -rc8 on x86, powerpc64, and arm. I'll post an update in
> > > > a day or so.
> > >
> > > 2.6.31-rc8 results in bogus testsuite failures on all three platforms.
> >
> > That may be a result of the known inotify borkage in -rc8 that has been fixed
> > in the current Linus' tree.
>
> No, it's the same old semi-random pty breakage. My kernels are built
> without inotify.
>
> A bisection has identified Alan's
>
> pty: Rework the pty layer to use the normal buffering logic
> d945cb9cce20ac7143c2de8d88b187f62db99bdc
>
> as the culprit. This patch introduces a massive number of bogus
> failures in the gcc testsuite. Subsequent pty/tty patches do fix
> most of those failures, but clearly not all.
Starting with 2.6.31-rc8 and reverting
85dfd81dc57e8183a277ddd7a56aa65c96f3f487 pty: fix data loss when stopped (^S/^Q)
d945cb9cce20ac7143c2de8d88b187f62db99bdc pty: Rework the pty layer to use the normal buffering logic
in that order gives me a kernel that works on both x86 and powerpc64.
So the bug is definitely limited to the pty buffering logic change.
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-09-01 18:42 ` Rafael J. Wysocki
0 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-09-01 18:42 UTC (permalink / raw)
To: Mikael Pettersson
Cc: Linux Kernel Mailing List, Kernel Testers List, Alan Cox,
Linus Torvalds, Greg KH, Andrew Morton
On Tuesday 01 September 2009, Mikael Pettersson wrote:
> Mikael Pettersson writes:
> > Rafael J. Wysocki writes:
> > > On Saturday 29 August 2009, Mikael Pettersson wrote:
> > > > Mikael Pettersson writes:
> > > > > Rafael J. Wysocki writes:
> > > > > > On Thursday 27 August 2009, Mikael Pettersson wrote:
> > > > > > > On Tue, 25 Aug 2009 22:34:53 +0200 (CEST), Rafael J. Wysocki wrote:
> > > > > > > > The following bug entry is on the current list of known regressions
> > > > > > > > from 2.6.30. Please verify if it still should be listed and let me know
> > > > > > > > (either way).
> > > > > > > >
> > > > > > > >
> > > > > > > > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14015
> > > > > > > > Subject : pty regressed again, breaking expect and gcc's testsuite
> > > > > > > > Submitter : Mikael Pettersson <mikpe@it.uu.se>
> > > > > > > > Date : 2009-08-14 23:41 (12 days old)
> > > > > > > > References : http://marc.info/?l=linux-kernel&m=125029329805643&w=4
> > > > > > >
> > > > > > > Not fixed. With 2.6.31-rc7 I'm still seeing repeatable testsuite
> > > > > > > failures on powerpc64. Reverting to 2.6.30 makes the failures go away.
> > > > > >
> > > > > > Thanks for the update.
> > > > > >
> > > > > > I guess 2.6.31-rc8 doesn't make any difference, does it?
> > > > >
> > > > > I've scheduled a number of gcc bootstraps and testsuite runs
> > > > > with -rc8 on x86, powerpc64, and arm. I'll post an update in
> > > > > a day or so.
> > > >
> > > > 2.6.31-rc8 results in bogus testsuite failures on all three platforms.
> > >
> > > That may be a result of the known inotify borkage in -rc8 that has been fixed
> > > in the current Linus' tree.
> >
> > No, it's the same old semi-random pty breakage. My kernels are built
> > without inotify.
> >
> > A bisection has identified Alan's
> >
> > pty: Rework the pty layer to use the normal buffering logic
> > d945cb9cce20ac7143c2de8d88b187f62db99bdc
> >
> > as the culprit. This patch introduces a massive number of bogus
> > failures in the gcc testsuite. Subsequent pty/tty patches do fix
> > most of those failures, but clearly not all.
>
> Starting with 2.6.31-rc8 and reverting
>
> 85dfd81dc57e8183a277ddd7a56aa65c96f3f487 pty: fix data loss when stopped (^S/^Q)
> d945cb9cce20ac7143c2de8d88b187f62db99bdc pty: Rework the pty layer to use the normal buffering logic
>
> in that order gives me a kernel that works on both x86 and powerpc64.
>
> So the bug is definitely limited to the pty buffering logic change.
Thanks a lot for this information, adding some CCs to the list.
Best,
Rafael
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-09-03 1:23 ` Linus Torvalds
0 siblings, 0 replies; 286+ messages in thread
From: Linus Torvalds @ 2009-09-03 1:23 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Mikael Pettersson, Linux Kernel Mailing List,
Kernel Testers List, Alan Cox, Greg KH, Andrew Morton,
OGAWA Hirofumi
On Tue, 1 Sep 2009, Rafael J. Wysocki wrote:
> On Tuesday 01 September 2009, Mikael Pettersson wrote:
> >
> > Starting with 2.6.31-rc8 and reverting
> >
> > 85dfd81dc57e8183a277ddd7a56aa65c96f3f487 pty: fix data loss when stopped (^S/^Q)
> > d945cb9cce20ac7143c2de8d88b187f62db99bdc pty: Rework the pty layer to use the normal buffering logic
> >
> > in that order gives me a kernel that works on both x86 and powerpc64.
> >
> > So the bug is definitely limited to the pty buffering logic change.
>
> Thanks a lot for this information, adding some CCs to the list.
Mikael, is there any way to get the gcc testsuite to show the "expected"
vs "result" cases when the failures occur, so that we can see what the
pattern is ("it drops one character every 8kB" or something like that)?
However, I get the feeling that it's really the same bug that
OGAWA-san already fixed - and that his fix just doesn't always do 100%
of the job.
So what Ogawa did was to make sure that we flush any pending data whenever
we're checking "do we have any data left". He did that by calling out to
tty_flush_to_ldisc(), which should flush the data through to the ldisc.
The keyword here being "should". In flush_to_ldisc(), we have at least one
case where we say "we'll delay it a bit more":
	if (!tty->receive_room) {
		schedule_delayed_work(&tty->buf.work, 1);
		break;
	}
and while I think this _should_ be ok (because if there is no
receive-room, then we'll hopefully always return non-zero from
"input_available_p()"), we do have this really odd case that the
reader side will do "n_tty_set_room()" only _after_ having checked for
input_available_p(), and so maybe we do sometimes trigger the case that
- input_available_p() tries to flush to the input buffer before checking
how much data is available, by calling 'tty_flush_to_ldisc()'
- but 'tty_flush_to_ldisc()' won't do anything, because tty->receive_room
is zero.
- so now input_available_p will say "I don't have any data", even though
there was data in the write buffers.
- we'll notice that the other end has hung up, and return EOF/EIO.
- which is very WRONG, because the other end may have hung up, but before
it did that, it wrote data that is still in the write queues, and we
should have returned that data.
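The property at stake in the list above can be checked from userspace. The following is a minimal sketch, assuming the failures come from the reader on the pty master seeing EOF/EIO before it has received everything the slave side wrote. It is Linux-specific, the function name pty_bytes_before_hangup() is made up for this example, and it is not taken from the gcc testsuite or from expect.

```c
/* Userspace check, not kernel code. Linux-specific: it uses /dev/ptmx and the
 * TIOCSPTLCK/TIOCGPTN ioctls; portable code would use posix_openpt(),
 * grantpt(), unlockpt() and ptsname(). The function name is made up. */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <termios.h>
#include <unistd.h>

/* Returns how many of the 'total' bytes written to the slave were read back
 * from the master, or -1 if the pty pair could not be set up. */
long pty_bytes_before_hangup(size_t total)
{
	int unlock = 0;
	unsigned int ptn;
	char name[32];
	struct termios tio;

	int master = open("/dev/ptmx", O_RDWR | O_NOCTTY);
	if (master < 0)
		return -1;
	if (ioctl(master, TIOCSPTLCK, &unlock) < 0 ||
	    ioctl(master, TIOCGPTN, &ptn) < 0) {
		close(master);
		return -1;
	}
	snprintf(name, sizeof(name), "/dev/pts/%u", ptn);
	int slave = open(name, O_RDWR | O_NOCTTY);
	if (slave < 0) {
		close(master);
		return -1;
	}
	/* No output post-processing, so byte counts match on both sides. */
	if (tcgetattr(slave, &tio) == 0) {
		tio.c_oflag &= ~(tcflag_t)OPOST;
		tcsetattr(slave, TCSANOW, &tio);
	}

	pid_t pid = fork();
	if (pid < 0) {
		close(slave);
		close(master);
		return -1;
	}
	if (pid == 0) {
		/* Child: write everything to the slave, then exit (hang up). */
		char chunk[256];
		size_t left = total;
		close(master);
		memset(chunk, 'x', sizeof(chunk));
		while (left > 0) {
			size_t n = left < sizeof(chunk) ? left : sizeof(chunk);
			ssize_t w = write(slave, chunk, n);
			if (w < 0 && errno == EINTR)
				continue;
			if (w <= 0)
				_exit(1);
			left -= (size_t)w;
		}
		_exit(0);
	}

	/* Parent: drop our slave reference so the child's exit is the hangup. */
	close(slave);
	size_t got = 0;
	char buf[512];
	while (got < total) {
		ssize_t r = read(master, buf, sizeof(buf));
		if (r < 0 && errno == EINTR)
			continue;
		if (r <= 0)
			break;	/* EOF or EIO: the writer is gone, nothing left */
		got += (size_t)r;
	}
	assert(got <= total);	/* more than was written would mean duplication */
	close(master);
	waitpid(pid, NULL, 0);
	return (long)got;
}
```

Calling pty_bytes_before_hangup(2048) in a loop and comparing the result with 2048 would show whether data written before the hangup is ever lost. A shortfall would only be expected on a kernel that has the problem described here, and since the breakage is reported as semi-random, a single clean run proves little.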
Anyway, I'm not at all sure that the "receive_room == 0" case can happen
at all, but maybe it can. Ogawa-san?
Here's a totally untested trial patch. I only have this dead-slow netbook
for reading email with me, and I don't have a failing test-case anyway,
but if my analysis is right, then the patch might fix it. It just forces
the re-calculation of the receive buffer before flushing the ldisc.
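To make the suspected sequence concrete, here is a toy userspace model of it. It is only a sketch under stated assumptions: the toy_* names, the buffer size and the starting state are invented, the starting state is built by hand rather than shown to be reachable in the real kernel, and the real n_tty code is considerably more involved.

```c
/* Toy userspace model of the sequence above. This is NOT the kernel code:
 * every toy_* name and the buffer size are invented for illustration. */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_BUF_SIZE 16	/* stands in for the n_tty read buffer size */

struct toy_tty {
	size_t pending;		/* bytes the writer queued, not yet flushed */
	size_t read_cnt;	/* bytes the reader can already see */
	size_t receive_room;	/* cached free space; may be stale */
	bool other_closed;	/* the writer has hung up */
};

/* Recompute the cached room from the real reader-side state. */
static void toy_set_room(struct toy_tty *t)
{
	assert(t->read_cnt <= TOY_BUF_SIZE);
	t->receive_room = TOY_BUF_SIZE - t->read_cnt;
}

/* Move queued bytes to the reader, but do nothing while the cached room is
 * zero (the real code reschedules itself as delayed work instead). */
static void toy_flush_to_ldisc(struct toy_tty *t)
{
	size_t n;

	if (!t->receive_room)
		return;
	n = t->pending < t->receive_room ? t->pending : t->receive_room;
	t->pending -= n;
	t->read_cnt += n;
	t->receive_room -= n;
}

static bool toy_input_available(struct toy_tty *t, bool set_room_first)
{
	if (set_room_first)
		toy_set_room(t);	/* models the one-line trial patch */
	toy_flush_to_ldisc(t);
	return t->read_cnt > 0;
}

/* One read attempt: bytes consumed, 0 for "nothing yet", -1 for EOF/EIO. */
int toy_read(struct toy_tty *t, bool set_room_first)
{
	int n;

	if (!toy_input_available(t, set_room_first))
		return t->other_closed ? -1 : 0;
	n = (int)t->read_cnt;
	t->read_cnt = 0;
	toy_set_room(t);	/* the reader refreshes the room only after reading */
	return n;
}

/* Hand-built starting state: reader buffer empty, cached room a stale zero,
 * five bytes queued by a writer that has already hung up. */
struct toy_tty toy_stale_state(void)
{
	struct toy_tty t = { .pending = 5, .read_cnt = 0,
			     .receive_room = 0, .other_closed = true };
	return t;
}
```

Starting from toy_stale_state(), toy_read(&t, false) reports EOF/EIO while five bytes are still queued, whereas toy_read(&t, true) recomputes the room first and returns those five bytes. Whether the real code can ever reach that starting state is exactly the open question.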
(And btw, from a performance standpoint, it might make more sense to only
do this whole read-room / ldisc-flush thing if we are about to return
zero. If we already have data available, we probably shouldn't waste time
trying to see if we need to do anything fancy like this.)
CAVEAT EMPTOR. Not tested. It compiled for me, but maybe that was due to
me compiling the wrong file or something.
Linus
---
 drivers/char/n_tty.c |    1 +
 1 files changed, 1 insertions(+), 0 deletions(-)
diff --git a/drivers/char/n_tty.c b/drivers/char/n_tty.c
index 973be2f..7fa3452 100644
--- a/drivers/char/n_tty.c
+++ b/drivers/char/n_tty.c
@@ -1583,6 +1583,7 @@ static int n_tty_open(struct tty_struct *tty)
 static inline int input_available_p(struct tty_struct *tty, int amt)
 {
+	n_tty_set_room(tty);
 	tty_flush_to_ldisc(tty);
 	if (tty->icanon) {
 		if (tty->canon_data)
^ permalink raw reply related [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
2009-09-03 1:23 ` Linus Torvalds
(?)
@ 2009-09-03 11:29 ` OGAWA Hirofumi
2009-09-03 21:00 ` Mikael Pettersson
2009-09-04 0:01 ` Linus Torvalds
-1 siblings, 2 replies; 286+ messages in thread
From: OGAWA Hirofumi @ 2009-09-03 11:29 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rafael J. Wysocki, Mikael Pettersson, Linux Kernel Mailing List,
Kernel Testers List, Alan Cox, Greg KH, Andrew Morton
Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Tue, 1 Sep 2009, Rafael J. Wysocki wrote:
>> On Tuesday 01 September 2009, Mikael Pettersson wrote:
>> >
>> > Starting with 2.6.31-rc8 and reverting
>> >
>> > 85dfd81dc57e8183a277ddd7a56aa65c96f3f487 pty: fix data loss when stopped (^S/^Q)
>> > d945cb9cce20ac7143c2de8d88b187f62db99bdc pty: Rework the pty layer to use the normal buffering logic
>> >
>> > in that order gives me a kernel that works on both x86 and powerpc64.
>> >
>> > So the bug is definitely limited to the pty buffering logic change.
>>
>> Thanks a lot for this information, adding some CCs to the list.
>
> Mikael, is there any way to get the gcc testsuite to show the "expected"
> vs "result" cases when the failures occur, so that we can see what the
> pattern is ("it drops one character every 8kB" or something like that)?
>
> However, I get the feeling that it's really the same bug that
> OGAWA-san already fixed - and that his fix just doesn't always do 100%
> of the job.
>
> So what Ogawa did was to make sure that we flush any pending data whenever
> we're checking "do we have any data left". He did that by calling out to
> tty_flush_to_ldisc(), which should flush the data through to the ldisc.
>
> The keyword here being "should". In flush_to_ldisc(), we have at least one
> case where we say "we'll delay it a bit more":
>
> 	if (!tty->receive_room) {
> 		schedule_delayed_work(&tty->buf.work, 1);
> 		break;
> 	}
>
> and while I think this _should_ be ok (because if there is no
> receive-room, then we'll hopefully always return non-zero from
> "input_available_p()"), we do have this really odd case that the
> reader side will do "n_tty_set_room()" only _after_ having checked for
> input_available_p(), and so maybe we do sometimes trigger the case that
>
> - input_available_p() tries to flush to the input buffer before checking
> how much data is available, by calling 'tty_flush_to_ldisc()'
>
> - but 'tty_flush_to_ldisc()' won't do anything, because tty->receive_room
> is zero.
>
> - so now input_available_p will say "I don't have any data", even though
> there was data in the write buffers.
>
> - we'll notice that the other end has hung up, and return EOF/EIO.
>
> - which is very WRONG, because the other end may have hung up, but before
> it did that, it wrote data that is still in the write queues, and we
> should have returned that data.
>
> Anyway, I'm not at all sure that the "receive_room == 0" case can happen
> at all, but maybe it can. Ogawa-san?
If I'm not missing something, I don't think this differs much from the
old code. But I would need to check more deeply.
Um... if "receive_room == 0 && tty->read_cnt == 0" is possible, I wonder
why reverting the buffer handling fixes the problem.
Well, anyway, I'd like to reproduce this on my machine. Could you tell
me the versions of the tools? I guess the gcc testsuite uses gcc's source
(svn revision?), expect, dejagnu, and tcl. (BTW, I'm using Debian
testing. If it can be reproduced on kvm, I can install the distro version
that you are using.)
Thanks.
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-09-03 21:00 ` Mikael Pettersson
0 siblings, 0 replies; 286+ messages in thread
From: Mikael Pettersson @ 2009-09-03 21:00 UTC (permalink / raw)
To: OGAWA Hirofumi
Cc: Linus Torvalds, Rafael J. Wysocki, Mikael Pettersson,
Linux Kernel Mailing List, Kernel Testers List, Alan Cox,
Greg KH, Andrew Morton
OGAWA Hirofumi writes:
> Well, anyway, I'd like to reproduce this on my machine. Could you tell
> me the versions of the tools? I guess the gcc testsuite uses gcc's source
> (svn revision?), expect, dejagnu, and tcl. (BTW, I'm using Debian
> testing. If it can be reproduced on kvm, I can install the distro version
> that you are using.)
Nothing fancy needed. You can use the gcc-4.4.1 release tarball and any recent
gcc-4.3 weekly snapshot tarball, like 4.3-20090830.
I've always seen the bogus errors in the C or C++ testsuites, so to save
some time you can just --enable-languages=c,c++ when building gcc.
The machines I've been running the testsuites on are a mix of architectures
running older or stability-oriented distros:
M1: i686 PC, Fedora Core 6, dejagnu-1.4.4-5.1, expect-5.43.0-5.1, tcl-8.4.13-3.fc6
M2: powerpc64 (G5), YellowDog 6.2, dejagnu-1.4.4-5.1, expect-5.43.0-5.1, tcl-8.4.13-3
M3: ARM, FC8-based, dejagnu-1.4.4-12.fc8, expect-5.43.0-9.fc8, tcl-8.4.17-1.fc8
/Mikael
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-09-04 0:01 ` Linus Torvalds
0 siblings, 0 replies; 286+ messages in thread
From: Linus Torvalds @ 2009-09-04 0:01 UTC (permalink / raw)
To: OGAWA Hirofumi
Cc: Rafael J. Wysocki, Mikael Pettersson, Linux Kernel Mailing List,
Kernel Testers List, Alan Cox, Greg KH, Andrew Morton
On Thu, 3 Sep 2009, OGAWA Hirofumi wrote:
>
> If I'm not missing something, I don't think this differs much from the
> old code. But I would need to check more deeply.
The thing is, the old pty code pushed _directly_ to the receiving ldisc,
with no buffering. I'm not entirely sure why Alan felt it needed changing,
but moving over to the generic tty buffering code did get rid of some
duplicate logic, and the locking is now done in one place, so that's
probably the main reason.
Anyway, the old pty code would be entirely synchronous, and would do the
	ld->ops->receive_buf(to, buf, NULL, c);
to push the data all the way to the receive side from pty_write(). So with
the old code, the destination "receive_room" was always accurate, because
both the reading side and the writing side basically accessed it directly.
With the new code, it all goes through tty_buffer.c, and the bugs have
been mostly about the receiving side not seeing all the data in the
buffers. And those buffers simply didn't use to exist before.
> Um... if "receive_room == 0 && tty->read_cnt == 0" is possible, I wonder
> why reverting the buffer handling fixes the problem.
In the old code, if 'receive_room' was zero, then the writer would simply
stop writing (no buffers in between). So in the old code, you could never
get into a situation where receive_room was zero and there was still
pending data.
At least that's how I read the situation.
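The difference between the two write paths described above can be sketched in a few lines. This is a toy model under stated assumptions, not the pty driver: the sk_* names and both buffer sizes are invented, and the only point is that the direct path can never leave bytes in flight when the receive room is zero, while the buffered path can.

```c
/* Toy comparison of the two write paths. NOT kernel code: the sk_* names
 * and both sizes are invented for illustration. */
#include <assert.h>
#include <stddef.h>

#define SK_READ_BUF 8	/* reader-side buffer */
#define SK_QUEUE    32	/* intermediate queue, buffered path only */

struct sk_pty {
	size_t read_cnt;	/* bytes the reader can see */
	size_t queued;		/* bytes written but not yet pushed to the reader */
};

size_t sk_receive_room(const struct sk_pty *p)
{
	assert(p->read_cnt <= SK_READ_BUF);
	return SK_READ_BUF - p->read_cnt;
}

/* Old-style path: push straight into the reader's buffer and accept only
 * what fits, so the writer stops when the receive room is zero. */
size_t sk_write_direct(struct sk_pty *p, size_t len)
{
	size_t room = sk_receive_room(p);
	size_t n = len < room ? len : room;

	p->read_cnt += n;
	return n;
}

/* New-style path: accept into the intermediate queue regardless of the
 * receive room; pushing to the reader is a separate, later step. */
size_t sk_write_buffered(struct sk_pty *p, size_t len)
{
	size_t space = SK_QUEUE - p->queued;
	size_t n = len < space ? len : space;

	p->queued += n;
	return n;
}

/* The separate step that moves queued bytes to the reader. */
void sk_flush(struct sk_pty *p)
{
	size_t room = sk_receive_room(p);
	size_t n = p->queued < room ? p->queued : room;

	p->queued -= n;
	p->read_cnt += n;
}
```

Writing 20 bytes through sk_write_direct() accepts 8 and leaves nothing queued. Writing 20 bytes through sk_write_buffered() accepts all of them, and after sk_flush() the receive room is zero while 12 bytes are still pending, which is the state the old code could not get into.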
If I'm right, I'm hoping that the patch I sent out fixes it, and if so,
we'll do that for 2.6.31 (and then after that maybe re-think whether the
extra buffering is worth all this pain).
And if it _doesn't_ fix it, then I think we'll just have to revert the
commits in question. We won't have time to root-cause it if the above
isn't it.
Linus
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-09-04 1:41 ` OGAWA Hirofumi
0 siblings, 0 replies; 286+ messages in thread
From: OGAWA Hirofumi @ 2009-09-04 1:41 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rafael J. Wysocki, Mikael Pettersson, Linux Kernel Mailing List,
Kernel Testers List, Alan Cox, Greg KH, Andrew Morton
Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Thu, 3 Sep 2009, OGAWA Hirofumi wrote:
>>
>> If I'm not missing something, I don't think this differs much from the
>> old code. But I would need to check more deeply.
>
> The thing is, the old pty code pushed _directly_ to the receiving ldisc,
> with no buffering.
Yes.
> I'm not entirely sure why Alan felt it needed changing,
> but moving over to the generic tty buffering code did get rid of some
> duplicate logic, and the locking is now done in one place, so that's
> probably the main reason.
IIRC, ppp had the locking issue without that patch?
> Anyway, the old pty code would be entirely synchronous, and would do the
>
> 	ld->ops->receive_buf(to, buf, NULL, c);
>
> to push the data all the way to the receive side from pty_write(). So with
> the old code, the destination "receive_room" was always accurate, because
> both the reading side and the writing side basically accessed it directly.
>
> With the new code, it all goes through tty_buffer.c, and the bugs have
> been mostly about the receiving side not seeing all the data in the
> buffers. And those buffers simply didn't use to exist before.
Yes. However, pty_write() checks tty_buffer instead of receive_room. So
I thought the main change on the write side is the buffer size
(receive_room size + tty_buffer size). It will stop after filling
tty_buffer, not receive_room.
And (I hope) the read side guarantees that both buffers are consumed. If
that is right, I guess the change is timing issues from the larger buffer
size.
>> Um... if "receive_room == 0 && tty->read_cnt == 0" is possible, I wonder
>> why reverting the buffer handling fixes the problem.
>
> In the old code, if 'receive_room' was zero, then the writer would simply
> stop writing (no buffers in between). So in the old code, you could never
> get into a situation where receive_room was zero and there was still
> pending data.
>
> At least that's how I read the situation.
Another possibility, I guess, is the change to pty_flush_buffer() and
pty_chars_in_buffer(). I'm not sure at all, though; in particular, I
suspect that pty_flush_buffer() may have changed behavior.
> If I'm right, I'm hoping that the patch I sent out fixes it, and if so,
> we'll do that for 2.6.31 (and then after that maybe re-think whether the
> extra buffering is worth all this pain).
I also hope it works.
> And if it _doesn't_ fix it, then I think we'll just have to revert the
> commits in question. We won't have time to root-cause it if the above
> isn't it.
For me at least, a revert sounds fine if it works. I have no
preference about it.
FWIW, in the meantime I'll try to find the root cause of this as
another/fallback solution.
Thanks.
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-09-04 1:52 ` Linus Torvalds
0 siblings, 0 replies; 286+ messages in thread
From: Linus Torvalds @ 2009-09-04 1:52 UTC (permalink / raw)
To: OGAWA Hirofumi
Cc: Rafael J. Wysocki, Mikael Pettersson, Linux Kernel Mailing List,
Kernel Testers List, Alan Cox, Greg KH, Andrew Morton
On Fri, 4 Sep 2009, OGAWA Hirofumi wrote:
>
> Yes. However, pty_write() checks tty_buffer instead of receive_room. So
> I thought the main change on the write side is the buffer size
> (receive_room size + tty_buffer size).
The problem has never been the write side. That side works - just with
extra buffering.
> It will stop after filling tty_buffer, not receive_room.
Yes.
> And (I hope) the read side guarantees that both buffers are consumed. If
> that is right, I guess the change is timing issues from the larger buffer
> size.
That's the change. The read side only consumes the buffers it _sees_. And
it doesn't look at the buffers that the write side has written, only at
the 'received' buffers. That's why we had to add that 'tty_flush_to_ldisc'
so that the buffers that got written were properly moved to the receive
side.
And that's the part that I suspect is broken - ie tty_flush_to_ldisc
doesn't always guarantee that it moves all the written stuff to the
receive side.
Before, this wasn't an issue, because the writer always filled up the
receive buffers directly, so there were never any flushing issues.
> Another possibility, I guess, is the change to pty_flush_buffer() and
> pty_chars_in_buffer(). I'm not sure at all, though; in particular, I
> suspect that pty_flush_buffer() may have changed behavior.
I don't think 'pty_flush_buffer()' is ever called in any normal
circumstances. Afaik, it's only called for a TIOCFLUSH ioctl (or whatever
it's called) when the user asks for all the contents to be thrown away.
> FWIW, in the meantime I'll try to find the root cause of this as
> another/fallback solution.
Absolutely. If you can find some other possibility, that would be great.
I'm not really sure how that 'receive_room == 0' case would ever happen in
practice, so my patch was really based on the assumption that the bug is
in the flushing code.
The bug could easily be elsewhere.
Linus
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-09-04 15:28 ` Alan Cox
0 siblings, 0 replies; 286+ messages in thread
From: Alan Cox @ 2009-09-04 15:28 UTC (permalink / raw)
To: Linus Torvalds
Cc: OGAWA Hirofumi, Rafael J. Wysocki, Mikael Pettersson,
Linux Kernel Mailing List, Kernel Testers List, Alan Cox,
Greg KH, Andrew Morton
> And if it _doesn't_ fix it, then I think we'll just have to revert the
> commits in question. We won't have time to root-cause it if the above
> isn't it.
In which case ppp will no longer work properly in some cases (ditto
other protocols) and things like the pppoe gateway won't work as they
don't in 2.6.30 - you need to go back to somewhere between 2.6.28/29 to
undo this, then apply the alternative locking patches to the
ppp/slip/ax25/etc ldiscs
Is the missing byte always part of the \r\n? That is handled slightly
specially by the n_tty ldisc code and has always been buggy if there is
flow control buffering - back to 1.3 and probably earlier. It happens on
real ttys too given sufficient flow control blockage because the char by
char opost stuff never did space/buffering checks. At least I don't think
the n_tty rework fixed it.
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-09-04 17:33 ` Linus Torvalds
0 siblings, 0 replies; 286+ messages in thread
From: Linus Torvalds @ 2009-09-04 17:33 UTC (permalink / raw)
To: Alan Cox
Cc: OGAWA Hirofumi, Rafael J. Wysocki, Mikael Pettersson,
Linux Kernel Mailing List, Kernel Testers List, Alan Cox,
Greg KH, Andrew Morton
On Fri, 4 Sep 2009, Alan Cox wrote:
>
> In which case ppp will no longer work properly in some cases (ditto
> other protocols) and things like the pppoe gateway won't work as they
> don't in 2.6.30 - you need to go back to somewhere between 2.6.28/29 to
> undo this, then apply the alternative locking patches to the
> ppp/slip/ax25/etc ldiscs
It should be fairly trivial to just add the locking to the pty write
routines. That said, we need to fix the 2.6.31 regression, and right now
that is the big one. If we go back to the broken 2.6.30 situation, that's way
more acceptable.
But I'm still hoping we can fix this.
Linus
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
2009-09-03 1:23 ` Linus Torvalds
(?)
(?)
@ 2009-09-03 20:27 ` Mikael Pettersson
-1 siblings, 0 replies; 286+ messages in thread
From: Mikael Pettersson @ 2009-09-03 20:27 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rafael J. Wysocki, Mikael Pettersson, Linux Kernel Mailing List,
Kernel Testers List, Alan Cox, Greg KH, Andrew Morton,
OGAWA Hirofumi
Linus Torvalds writes:
>
>
> On Tue, 1 Sep 2009, Rafael J. Wysocki wrote:
> > On Tuesday 01 September 2009, Mikael Pettersson wrote:
> > >
> > > Starting with 2.6.31-rc8 and reverting
> > >
> > > 85dfd81dc57e8183a277ddd7a56aa65c96f3f487 pty: fix data loss when stopped (^S/^Q)
> > > d945cb9cce20ac7143c2de8d88b187f62db99bdc pty: Rework the pty layer to use the normal buffering logic
> > >
> > > in that order gives me a kernel that works on both x86 and powerpc64.
> > >
> > > So the bug is definitely limited to the pty buffering logic change.
> >
> > Thanks a lot for this information, adding some CCs to the list.
>
> Mikael, is there any way to get the gcc testsuite to show the "expected"
> vs "result" cases when the failures occur, so that we can see what the
> pattern is ("it drops one character every 8kB" or something like that).
I don't know. It dumps a summary on stdout and saves logs in various
subdirs. Those logs do show the gcc commands executed and the output
from those commands, but they don't show why some test is considered
failed, or the exact boundaries of each fragment of text returned by
read(2) [which I guess may be significant].
What I can say for certain is that the test cases I've seen fail most
frequently deliberately generate lots of diagnostic output from the
compiler, like 100s or 1000s of lines of warning/error messages.
So the comments in the pty changes that talk about using some 8KB
buffer instead of a 64KB tty flip buffer definitely made me nervous.
> However, I get the feeling that it's really the same bug that
> OGAWA-san already fixed - and that his fix just doesn't always do a 100%
> of the job.
>
> So what Ogawa did was to make sure that we flush any pending data whenever
> we're checking "do we have any data left". He did that by calling out to
> tty_flush_to_ldisc(), which should flush the data through to the ldisc.
>
> The keyword here being "should". In flush_to_ldisc(), we have at least one
> case where we say "we'll delay it a bit more":
>
> 	if (!tty->receive_room) {
> 		schedule_delayed_work(&tty->buf.work, 1);
> 		break;
> 	}
>
> and while I think this _should_ be ok (because if there is no
> receive-room, then we'll hopefully always return non-zero from
> "input_available_p()"), we do have this really odd case that the
> reader side will do "n_tty_set_room()" only _after_ having checked for
> input_available_p(), and so maybe we do sometimes trigger the case that
>
> - input_available_p() tries to flush to the input buffer before checking
> how much data is available, by calling 'tty_flush_to_ldisc()'
>
> - but 'tty_flush_to_ldisc()' won't do anything, because tty->receive_room
> is zero.
>
> - so now input_available_p will say "I don't have any data", even though
> there was data in the write buffers.
>
> - we'll notice that the other end has hung up, and return EOF/EIO.
>
> - which is very WRONG, because the other end may have hung up, but before
> it did that, it wrote data that is still in the write queues, and we
> should have returned that data.
>
> Anyway, I'm not at all sure that the "receive_room == 0" case can happen
> at all, but maybe it can. Ogawa-san?
>
> Here's a totally untested trial patch. I only have this dead-slow netbook
> for reading email with me, and I don't have a failing test-case anyway,
> but if my analysis is right, then the patch might fix it. It just forces
> the re-calculation of the receive buffer before flushing the ldisc.
>
> (And btw, from a performance standpoint, it might make more sense to only
> do this whole read-room / ldisc-flush thing if we are about to return
> zero. If we already have data available, we probably shouldn't waste time
> trying to see if we need to do anything fancy like this.)
>
> CAVEAT EMPTOR. Not tested. It compiled for me, but maybe that was due to
> me compiling the wrong file or something.
>
> Linus
Thanks. I'll give this a try tomorrow.
/Mikael
> drivers/char/n_tty.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/char/n_tty.c b/drivers/char/n_tty.c
> index 973be2f..7fa3452 100644
> --- a/drivers/char/n_tty.c
> +++ b/drivers/char/n_tty.c
> @@ -1583,6 +1583,7 @@ static int n_tty_open(struct tty_struct *tty)
>
>  static inline int input_available_p(struct tty_struct *tty, int amt)
>  {
> +	n_tty_set_room(tty);
>  	tty_flush_to_ldisc(tty);
>  	if (tty->icanon) {
>  		if (tty->canon_data)
>
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-09-04 13:23 ` Mikael Pettersson
0 siblings, 0 replies; 286+ messages in thread
From: Mikael Pettersson @ 2009-09-04 13:23 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rafael J. Wysocki, Mikael Pettersson, Linux Kernel Mailing List,
Kernel Testers List, Alan Cox, Greg KH, Andrew Morton,
OGAWA Hirofumi
Linus Torvalds writes:
> Here's a totally untested trial patch. I only have this dead-slow netbook
> for reading email with me, and I don't have a failing test-case anyway,
> but if my analysis is right, then the patch might fix it. It just forces
> the re-calculation of the receive buffer before flushing the ldisc.
>
> (And btw, from a performance standpoint, it might make more sense to only
> do this whole read-room / ldisc-flush thing if we are about to return
> zero. If we already have data available, we probably shouldn't waste time
> trying to see if we need to do anything fancy like this.)
>
> CAVEAT EMPTOR. Not tested. It compiled for me, but maybe that was due to
> me compiling the wrong file or something.
>
> Linus
>
> ---
> drivers/char/n_tty.c | 1 +
> 1 files changed, 1 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/char/n_tty.c b/drivers/char/n_tty.c
> index 973be2f..7fa3452 100644
> --- a/drivers/char/n_tty.c
> +++ b/drivers/char/n_tty.c
> @@ -1583,6 +1583,7 @@ static int n_tty_open(struct tty_struct *tty)
>
>  static inline int input_available_p(struct tty_struct *tty, int amt)
>  {
> +	n_tty_set_room(tty);
>  	tty_flush_to_ldisc(tty);
>  	if (tty->icanon) {
>  		if (tty->canon_data)
>
Unfortunately this did not fix the bug. The gcc-4.3 testsuite failed
as usual in gcc.dg/c99-typespec-1.c.
Comparing the gcc outputs for this test case from runs with 2.6.30 and
2.6.31-rc8 shows that 2.6.31-rc8 lost a single newline (\n) byte at byte
offset 131660. So two lines of diagnostics were fused together and the
testsuite framework failed to match the second of those lines.
This is what 2.6.30 output at that place:
/mnt/work1/gcc-4.3-20090830/gcc/testsuite/gcc.dg/c99-typespec-1.c:1143: error: two or more data types in declaration specifiers
/mnt/work1/gcc-4.3-20090830/gcc/testsuite/gcc.dg/c99-typespec-1.c:1144: error: two or more data types in declaration specifiers
/mnt/work1/gcc-4.3-20090830/gcc/testsuite/gcc.dg/c99-typespec-1.c:1145: error: both 'long' and 'short' in declaration specifiers
/mnt/work1/gcc-4.3-20090830/gcc/testsuite/gcc.dg/c99-typespec-1.c:1146: error: two or more data types in declaration specifiers
And this is what 2.6.31-rc8 + the patch output at that place:
/mnt/work1/gcc-4.3-20090830/gcc/testsuite/gcc.dg/c99-typespec-1.c:1143: error: two or more data types in declaration specifiers
/mnt/work1/gcc-4.3-20090830/gcc/testsuite/gcc.dg/c99-typespec-1.c:1144: error: two or more data types in declaration specifiers/mnt/work1/gcc-4.3-20090830/gcc/testsuite/gcc.dg/c99-typespec-1.c:1145: error: both 'long' and 'short' in declaration specifiers
/mnt/work1/gcc-4.3-20090830/gcc/testsuite/gcc.dg/c99-typespec-1.c:1146: error: two or more data types in declaration specifiers
The actual logs use \r\n line endings, so between the diagnostics for source
lines 1144 and 1145 there is now a single \r. Some software will display \r
line ending as \r\n, so a missing \n may not be visible. So I've removed the
\r characters in the text above to avoid affecting how it is presented.
The original logs are available in <http://user.it.uu.se/~mikpe/linux/pty-bug/>
if you need them.
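For anyone repeating the comparison, a minimal sketch of locating the first differing byte between two captured outputs follows. It assumes both logs have already been read into memory; the function name first_difference is hypothetical and not part of any testsuite tooling.

```c
#include <stddef.h>

/*
 * Return the offset of the first byte at which the two buffers differ.
 * If one buffer is a strict prefix of the other, return the length of
 * the shorter one. Return -1 if the buffers are identical.
 */
static long first_difference(const unsigned char *a, size_t alen,
			     const unsigned char *b, size_t blen)
{
	size_t n = alen < blen ? alen : blen;
	size_t i;

	for (i = 0; i < n; i++)
		if (a[i] != b[i])
			return (long)i;
	if (alen != blen)
		return (long)n;
	return -1;
}
```

With a good log containing "\r\n" and a bad log where the "\n" is missing, the returned offset points at the position of the lost "\n".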
/Mikael
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-09-04 17:30 ` Linus Torvalds
0 siblings, 0 replies; 286+ messages in thread
From: Linus Torvalds @ 2009-09-04 17:30 UTC (permalink / raw)
To: Mikael Pettersson
Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Alan Cox, Greg KH, Andrew Morton,
OGAWA Hirofumi
On Fri, 4 Sep 2009, Mikael Pettersson wrote:
>
> Comparing the gcc outputs for this test case from runs with 2.6.30 and
> 2.6.31-rc8 shows that 2.6.31-rc8 lost a single newline (\n) byte at byte
> offset 131660. So two lines of diagnostics were fused together and the
> testsuite framework failed to match the second of those lines.
Goodie. That was the kind of hint I was looking for.
And I suspect that that means that the bug is related to do_output_char()
expanding '\n' into '\r\n'. And the different buffering (and the pty
'space' logic) just means that we now hit a case that we didn't use to
hit. The relevant call chain is
 - n_tty handling:
     n_tty_write() ->
        process_output() ->
           do_output_char() ->
              tty_put_char(tty, '\r')
              tty_put_char(tty, '\n')
I'll see what I can find. But your "loses \n character" does mean that the
'lost bytes at the end when the other end closed it' is probably not the
issue, and we're talking about a different kind of bug entirely.
Linus
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-09-04 17:53 ` Linus Torvalds
0 siblings, 0 replies; 286+ messages in thread
From: Linus Torvalds @ 2009-09-04 17:53 UTC (permalink / raw)
To: Mikael Pettersson
Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Alan Cox, Greg KH, Andrew Morton,
OGAWA Hirofumi
On Fri, 4 Sep 2009, Linus Torvalds wrote:
>
> And I suspect that that means that the bug is related to do_output_char()
> expanding '\n' into '\r\n'. And the different buffering (and the pty
> 'space' logic) just means that we now hit a case that we didn't use to
> hit. The relevant call chain is
>
>  - n_tty handling:
>      n_tty_write() ->
>         process_output() ->
>            do_output_char() ->
>               tty_put_char(tty, '\r')
>               tty_put_char(tty, '\n')
Hmm. I think I have a clue.
process_output() does
	space = tty_write_room(tty);
	retval = do_output_char(c, tty, space);
so 'space' can never become off-by-one, since it's always re-calculated
just before. And do_output_char() checks that there is room for two
characters, and won't do just the '\r'.
So the fact that you see the '\r' and not the '\n' means that something
dropped the second character _despite_ tty_write_room() saying there was
room for two characters.
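The "room for two characters" check can be illustrated with a toy model. This is a sketch under assumptions, not the n_tty code: toy_out, toy_put_char, toy_write_room and toy_output_char are hypothetical names, and the only rule modelled is that '\n' is expanded to "\r\n" when at least two bytes of space were reported.

```c
#include <stddef.h>

/* Hypothetical output sink standing in for the driver. */
struct toy_out {
	char buf[8];
	size_t len;
};

/* Returns 1 if the byte was accepted, 0 if the sink was full. */
static int toy_put_char(struct toy_out *o, char c)
{
	if (o->len >= sizeof(o->buf))
		return 0;
	o->buf[o->len++] = c;
	return 1;
}

static size_t toy_write_room(const struct toy_out *o)
{
	return sizeof(o->buf) - o->len;
}

/*
 * Emit one character with newline expansion. Returns the number of
 * bytes emitted, or -1 if 'space' is too small. Note that the result
 * of toy_put_char() is deliberately ignored, so the model trusts the
 * 'space' value completely.
 */
static int toy_output_char(struct toy_out *o, char c, size_t space)
{
	if (c == '\n') {
		if (space < 2)
			return -1;
		toy_put_char(o, '\r');
		toy_put_char(o, '\n');
		return 2;
	}
	if (space < 1)
		return -1;
	toy_put_char(o, c);
	return 1;
}
```

Because the put results are ignored, a sink whose idea of free space changes between the two puts would silently lose the second byte. That is the kind of disagreement being hunted here.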
Now, with flow control that can in theory happen in case 'tty->stopped'
gets set asynchronously in between, but that's not an issue here.
So the most likely cause is just that the pty_write_room() function is
simply buggered, or at least doesn't work together with the new world
order.
How about something like this? It's way too anal - it says that we can
only write data if there's enough space to always push it all the way to
the receive buffer (including all the data that was already buffered up,
ie the "memory_used" part). But if it finally makes the problem go away,
we have another clue.
Linus
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-09-04 17:55 ` Linus Torvalds
0 siblings, 0 replies; 286+ messages in thread
From: Linus Torvalds @ 2009-09-04 17:55 UTC (permalink / raw)
To: Mikael Pettersson
Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Alan Cox, Greg KH, Andrew Morton,
OGAWA Hirofumi
On Fri, 4 Sep 2009, Linus Torvalds wrote:
>
> How about something like this? It's way too anal - it says that we can
> only write data if there's enough space to always push it all the way to
> the receive buffer (including all the data that was already buffered up,
> ie the "memory_used" part). But if it finally makes the problem go away,
> we have another clue.
I forgot to actually include the patch. Duh.
And again - UNTESTED. Maybe this makes the buffering _too_ small (the
'memory_used' thing is not really counted in bytes buffered, it's counted
in how much buffer space we've allocated) and things break even worse and
pty's don't work at all. But I think it might work.
Linus
---
drivers/char/pty.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
diff --git a/drivers/char/pty.c b/drivers/char/pty.c
index d083c73..139fa5a 100644
--- a/drivers/char/pty.c
+++ b/drivers/char/pty.c
@@ -91,7 +91,7 @@ static void pty_unthrottle(struct tty_struct *tty)
 static int pty_space(struct tty_struct *to)
 {
-	int n = 8192 - to->buf.memory_used;
+	int n = to->receive_room - to->buf.memory_used;
 	if (n < 0)
 		return 0;
 	return n;
^ permalink raw reply related [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-09-04 18:11 ` Linus Torvalds
0 siblings, 0 replies; 286+ messages in thread
From: Linus Torvalds @ 2009-09-04 18:11 UTC (permalink / raw)
To: Mikael Pettersson
Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Alan Cox, Greg KH, Andrew Morton,
OGAWA Hirofumi
On Fri, 4 Sep 2009, Linus Torvalds wrote:
>
> And again - UNTESTED. Maybe this makes the buffering _too_ small (the
> 'memory_used' thing is not really counted in bytes buffered, it's counted
> in how much buffer space we've allocated) and things break even worse and
> pty's don't work at all. But I think it might work.
Actually, scratch that patch.
After writing the above, the voices in my head started clamoring about
this "space allocated" vs "bytes buffered" thing, which I was obviously
aware of, but hadn't thought about as an issue.
And you know what? The thing about "space allocated" vs "bytes buffered"
is that writing _one_ byte (the '\r') can cause a lot more than one byte
to be allocated for a buffer (we do minimum 256-byte buffers).
So let's say that 'space' was initially 20 - plenty big enough to hold two
characters. But if the '\r' just happened to need a new buffer, it would
actually increase 'memory_used' by 256, and now the next time we call
'pty_space()' it doesn't return 19, but 0 - because now memory_used is
larger than the 8192 we allowed.
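The arithmetic of that scenario can be checked with a small model. This is only a sketch: toy_buf and the toy_* functions are hypothetical, the 8192 and 256 constants are the figures quoted above, and the starting value of 8172 is chosen purely to reproduce the "space was 20" example.

```c
#include <stddef.h>

#define TOY_LIMIT     8192  /* the 8K figure used by pty_space() */
#define TOY_MIN_ALLOC 256   /* minimum buffer allocation mentioned above */

/* Hypothetical stand-in for the tty buffer accounting. */
struct toy_buf {
	size_t memory_used;  /* bytes allocated, not bytes buffered */
	size_t tail_free;    /* unused bytes left in the newest buffer */
};

static int toy_pty_space(const struct toy_buf *b)
{
	if (b->memory_used >= TOY_LIMIT)
		return 0;
	return (int)(TOY_LIMIT - b->memory_used);
}

/* Store one byte, allocating a whole new buffer if the tail is full. */
static void toy_insert_byte(struct toy_buf *b)
{
	if (b->tail_free == 0) {
		b->memory_used += TOY_MIN_ALLOC;
		b->tail_free = TOY_MIN_ALLOC;
	}
	b->tail_free--;
}

/*
 * Writer that re-checks the space on every call, as pty_write() does
 * before the proposed change. Returns 1 if accepted, 0 if dropped.
 */
static int toy_pty_write_byte(struct toy_buf *b)
{
	if (toy_pty_space(b) < 1)
		return 0;
	toy_insert_byte(b);
	return 1;
}
```

Starting from memory_used = 8172 with a full tail buffer, the first byte is accepted and pushes memory_used to 8428, so the second byte is refused even though 20 bytes of space had been reported.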
So I'm starting to suspect that the real bug is that we do that
'pty_space()' in pty_write() call at all. The _callers_ should already
have done the write_room() check, and if somebody doesn't do it, then the
tty buffering will eventually do a hard limit at the 65kB allocation mark.
So doing it in pty_write() is (a) unnecessary and (b) actively wrong,
because it means that in the situation above, pty_write() won't be allowing
the slop that it _needs_ to allow due to the buffering not being exact
"this many bytes buffered up", but "this many bytes allocated for
buffering".
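A sketch of the alternative, again as a toy model with hypothetical names: the caller asks for the write room once, and the write path itself only refuses data at a hard allocation cap. The 65536 cap stands in for the "65kB allocation mark" mentioned above and is an assumption of this model.

```c
#include <stddef.h>

#define TOY_SOFT_LIMIT 8192   /* what the write-room query reports against */
#define TOY_HARD_LIMIT 65536  /* assumed hard cap on allocated buffer space */
#define TOY_MIN_ALLOC  256    /* minimum buffer allocation */

struct toy_buf {
	size_t memory_used;  /* bytes allocated, not bytes buffered */
	size_t tail_free;    /* unused bytes left in the newest buffer */
};

/* What callers consult before writing. */
static size_t toy_write_room(const struct toy_buf *b)
{
	if (b->memory_used >= TOY_SOFT_LIMIT)
		return 0;
	return TOY_SOFT_LIMIT - b->memory_used;
}

/*
 * Write 'count' bytes without re-checking the soft limit. Only the
 * hard cap can make this return less than 'count'.
 */
static size_t toy_write(struct toy_buf *b, size_t count)
{
	size_t done = 0;

	while (done < count) {
		if (b->tail_free == 0) {
			if (b->memory_used + TOY_MIN_ALLOC > TOY_HARD_LIMIT)
				break;
			b->memory_used += TOY_MIN_ALLOC;
			b->tail_free = TOY_MIN_ALLOC;
		}
		b->tail_free--;
		done++;
	}
	return done;
}
```

In this model a caller that saw 20 bytes of room can always complete a two-byte write, at the cost of overshooting the soft limit by up to one allocation.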
So rather than the previous patch, try this one instead.
Linus
---
drivers/char/pty.c | 6 ------
1 files changed, 0 insertions(+), 6 deletions(-)
diff --git a/drivers/char/pty.c b/drivers/char/pty.c
index d083c73..45a7ca2 100644
--- a/drivers/char/pty.c
+++ b/drivers/char/pty.c
@@ -118,12 +118,6 @@ static int pty_write(struct tty_struct *tty, const unsigned char *buf,
 	if (tty->stopped)
 		return 0;
-	/* This isn't locked but our 8K is quite sloppy so no
-	   big deal */
-
-	c = pty_space(to);
-	if (c > count)
-		c = count;
 	if (c > 0) {
 		/* Stuff the data into the input queue of the other end */
 		c = tty_insert_flip_string(to, buf, c);
^ permalink raw reply related [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-09-04 19:11 ` Linus Torvalds
0 siblings, 0 replies; 286+ messages in thread
From: Linus Torvalds @ 2009-09-04 19:11 UTC (permalink / raw)
To: Mikael Pettersson
Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Alan Cox, Greg KH, Andrew Morton,
OGAWA Hirofumi
On Fri, 4 Sep 2009, Linus Torvalds wrote:
>
> So I'm starting to suspect that the real bug is that we do that
> 'pty_space()' in pty_write() call at all. The _callers_ should already
> have done the write_room() check, and if somebody doesn't do it, then the
> tty buffering will eventually do a hard limit at the 65kB allocation mark.
Ok, so the thought was right, but the patch was obviously not even
compiled, because the compiler points out that 'c' was not initialized.
I'm sure you already figured the obvious meaning out, but here's a fixed
version.
Linus
---
drivers/char/pty.c | 10 +---------
1 files changed, 1 insertions(+), 9 deletions(-)
diff --git a/drivers/char/pty.c b/drivers/char/pty.c
index d083c73..b33d668 100644
--- a/drivers/char/pty.c
+++ b/drivers/char/pty.c
@@ -109,21 +109,13 @@ static int pty_space(struct tty_struct *to)
* the other side of the pty/tty pair.
*/
-static int pty_write(struct tty_struct *tty, const unsigned char *buf,
- int count)
+static int pty_write(struct tty_struct *tty, const unsigned char *buf, int c)
{
struct tty_struct *to = tty->link;
- int c;
if (tty->stopped)
return 0;
- /* This isn't locked but our 8K is quite sloppy so no
- big deal */
-
- c = pty_space(to);
- if (c > count)
- c = count;
if (c > 0) {
/* Stuff the data into the input queue of the other end */
c = tty_insert_flip_string(to, buf, c);
^ permalink raw reply related [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-09-04 19:19 ` Linus Torvalds
0 siblings, 0 replies; 286+ messages in thread
From: Linus Torvalds @ 2009-09-04 19:19 UTC (permalink / raw)
To: Mikael Pettersson
Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Alan Cox, Greg KH, Andrew Morton,
OGAWA Hirofumi
On Fri, 4 Sep 2009, Linus Torvalds wrote:
>
> I'm sure you already figured the obvious meaning out, but here's a fixed
> version.
And here's another patch that may also fix this, simply by virtue of
writing the "\r\n" as a single string, rather than as two characters. That
way, we should never get into the situation that the '\r' allocates a new
buffer (larger than one character), and then the later '\n' writing
decides that we've filled up.
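A rough user-space model of why the single two-byte write helps, written under the same simplifying assumptions as the discussion (8192-byte limit, 256-byte minimum allocation, accounting by allocated bytes). All names here (`model_buf`, `model_write()`, `crlf_as_two_writes()`, `crlf_as_one_write()`) are hypothetical, and `model_write()` only imitates the old clamping behaviour of pty_write(); it is not the kernel implementation.

```c
#define PTY_LIMIT 8192  /* limit applied by pty_space() */
#define MIN_CHUNK 256   /* minimum buffer allocation */

/* Hypothetical stand-in for the tty buffer state. */
struct model_buf {
    int memory_used;    /* bytes allocated for buffering */
    int tail_free;      /* unused bytes in the most recent chunk */
};

static int model_space(const struct model_buf *b)
{
    int n = PTY_LIMIT - b->memory_used;
    return n < 0 ? 0 : n;
}

/*
 * Imitates the old pty_write(): clamp the request to the reported
 * space, then store the bytes, allocating chunks as needed.
 * Returns the number of bytes actually accepted.
 */
static int model_write(struct model_buf *b, int count)
{
    int c = model_space(b);
    int i;

    if (c > count)
        c = count;
    for (i = 0; i < c; i++) {
        if (b->tail_free == 0) {
            b->memory_used += MIN_CHUNK;
            b->tail_free = MIN_CHUNK;
        }
        b->tail_free--;
    }
    return c;
}

/* "\r\n" sent as two one-byte writes; returns bytes accepted in total. */
static int crlf_as_two_writes(int space)
{
    struct model_buf b = { PTY_LIMIT - space, 0 };

    return model_write(&b, 1) + model_write(&b, 1);
}

/* "\r\n" sent as one two-byte write; returns bytes accepted. */
static int crlf_as_one_write(int space)
{
    struct model_buf b = { PTY_LIMIT - space, 0 };

    return model_write(&b, 2);
}
```

With 20 bytes of reported room and a full tail chunk, the two-write variant accepts only the '\r' (1 byte), while the one-write variant accepts both bytes, since the space check happens once, before the new chunk is allocated.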
Besides, it's a cleanup. An untested one, naturally.
Linus
---
drivers/char/n_tty.c | 3 +--
1 files changed, 1 insertions(+), 2 deletions(-)
diff --git a/drivers/char/n_tty.c b/drivers/char/n_tty.c
index 973be2f..4e28b35 100644
--- a/drivers/char/n_tty.c
+++ b/drivers/char/n_tty.c
@@ -300,8 +300,7 @@ static int do_output_char(unsigned char c, struct tty_struct *tty, int space)
if (space < 2)
return -1;
tty->canon_column = tty->column = 0;
- tty_put_char(tty, '\r');
- tty_put_char(tty, c);
+ tty->ops->write(tty, "\r\n", 2);
return 2;
}
tty->canon_column = tty->column;
^ permalink raw reply related [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-09-05 10:46 ` Mikael Pettersson
0 siblings, 0 replies; 286+ messages in thread
From: Mikael Pettersson @ 2009-09-05 10:46 UTC (permalink / raw)
To: Linus Torvalds
Cc: Mikael Pettersson, Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Alan Cox, Greg KH, Andrew Morton,
OGAWA Hirofumi
Linus Torvalds writes:
>
>
> On Fri, 4 Sep 2009, Linus Torvalds wrote:
> >
> > I'm sure you already figured the obvious meaning out, but here's a fixed
> > version.
>
> And here's another patch that may also fix this, simply by virtue of
> writing the "\r\n" as a single string, rather than as two characters. That
> way, we should never get into the situation that the '\r' allocates a new
> buffer (larger than one character), and then the later '\n' writing
> decides that we've filled up.
Thanks, I'm testing this and the pty_write() fix on i686 and ppc64 now.
Sometimes the bug is difficult to trigger, so I may need to do loads of
testing with different gcc versions before I dare to say that it's fixed.
/Mikael
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-09-05 20:29 ` Linus Torvalds
0 siblings, 0 replies; 286+ messages in thread
From: Linus Torvalds @ 2009-09-05 20:29 UTC (permalink / raw)
To: Mikael Pettersson
Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Alan Cox, Greg KH, Andrew Morton,
OGAWA Hirofumi
On Sat, 5 Sep 2009, Mikael Pettersson wrote:
>
> Thanks, I'm testing this and the pty_write() fix on i686 and ppc64 now.
>
> Sometimes the bug is difficult to trigger, so I may need to do loads of
> testing with different gcc versions before I dare to say that it's fixed.
Ok, I'm going to commit the two patches, because even if there is some
other bug hiding too (and it doesn't fix your thing - although I think it
will), this is definitely a real fix regardless.
And I want to do a -rc9 and let it get a couple of days of testing, since
I got way more pull requests than I hoped for while I was off diving.
Linus
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-09-05 22:42 ` Mikael Pettersson
0 siblings, 0 replies; 286+ messages in thread
From: Mikael Pettersson @ 2009-09-05 22:42 UTC (permalink / raw)
To: Linus Torvalds
Cc: Mikael Pettersson, Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Alan Cox, Greg KH, Andrew Morton,
OGAWA Hirofumi
Linus Torvalds writes:
>
>
> On Sat, 5 Sep 2009, Mikael Pettersson wrote:
> >
> > Thanks, I'm testing this and the pty_write() fix on i686 and ppc64 now.
> >
> > Sometimes the bug is difficult to trigger, so I may need to do loads of
> > testing with different gcc versions before I dare to say that it's fixed.
>
> Ok, I'm going to commit the two patches, because even if there is some
> other bug hiding too (and it doesn't fix your thing - although I think it
> will), this is definitely a real fix regardless.
>
> And I want to do a -rc9 and let it get a couple of days of testing, since
> I got way more pull requests than I hoped for while I was off diving.
With these two fixes my i686 box finished all gcc bootstrap/regtest
cycles I had scheduled with no signs of pty errors. My ppc64 box
isn't done yet, but so far it's not seen any errors either. So I'm
reasonably optimistic about these fixes.
/Mikael
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-09-05 17:00 ` OGAWA Hirofumi
0 siblings, 0 replies; 286+ messages in thread
From: OGAWA Hirofumi @ 2009-09-05 17:00 UTC (permalink / raw)
To: Linus Torvalds
Cc: Mikael Pettersson, Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Alan Cox, Greg KH, Andrew Morton
Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Fri, 4 Sep 2009, Linus Torvalds wrote:
>>
>> So I'm starting to suspect that the real bug is that we do that
>> 'pty_space()' in pty_write() call at all. The _callers_ should already
>> have done the write_room() check, and if somebody doesn't do it, then the
>> tty buffering will eventually do a hard limit at the 65kB allocation mark.
>
> Ok, so the thought was right, but the patch was obviously not even
> compiled, because the compiler points out that 'c' was not initialized.
>
> I'm sure you already figured the obvious meaning out, but here's a fixed
> version.
I don't mean to object to your patch, but I think it would be good to fix
pty_space() rather than leave it wrong. With it fixed, I guess we won't see
strange behavior near the buffer limit.
Also, it seems the non-n_tty path doesn't do the tty_write_room() check;
instead it just tries to write and checks the number of bytes written, as
returned by tty->ops->write().
So at least a few paths will use the 64kb limit. I'm not sure, but the
non-n_tty path (e.g. ppp) doesn't always do the tty_write_room() check, so
behavior may become inconsistent if we remove pty_space() from pty_write().
It may not be an issue, but I made this patch to fix pty_space(). What do
you think?
Anyway, I've tested your patch, and this patch fixed the
gcc-testsuite on my machine.
Thanks.
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Signed-off-by: OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
---
diff -puN drivers/char/pty.c~tty-debug drivers/char/pty.c
--- linux-2.6/drivers/char/pty.c~tty-debug 2009-09-05 20:10:35.000000000 +0900
+++ linux-2.6-hirofumi/drivers/char/pty.c 2009-09-06 01:50:44.000000000 +0900
@@ -91,7 +91,7 @@ static void pty_unthrottle(struct tty_st
static int pty_space(struct tty_struct *to)
{
- int n = 8192 - to->buf.memory_used;
+ int n = 8192 - tty_buffer_used(to);
if (n < 0)
return 0;
return n;
diff -puN drivers/char/tty_buffer.c~tty-debug drivers/char/tty_buffer.c
--- linux-2.6/drivers/char/tty_buffer.c~tty-debug 2009-09-05 21:02:48.000000000 +0900
+++ linux-2.6-hirofumi/drivers/char/tty_buffer.c 2009-09-05 21:08:47.000000000 +0900
@@ -231,6 +231,30 @@ int tty_buffer_request_room(struct tty_s
EXPORT_SYMBOL_GPL(tty_buffer_request_room);
/**
+ * tty_buffer_used - return used tty buffer size
+ * @tty: tty structure
+ *
+ * Return used tty buffer size.
+ */
+size_t tty_buffer_used(struct tty_struct *tty)
+{
+ size_t size;
+ int left;
+ unsigned long flags;
+
+ spin_lock_irqsave(&tty->buf.lock, flags);
+ if (tty->buf.tail)
+ left = tty->buf.tail->size - tty->buf.tail->used;
+ else
+ left = 0;
+ size = tty->buf.memory_used - left;
+ spin_unlock_irqrestore(&tty->buf.lock, flags);
+
+ return size;
+}
+EXPORT_SYMBOL_GPL(tty_buffer_used);
+
+/**
* tty_insert_flip_string - Add characters to the tty buffer
* @tty: tty structure
* @chars: characters
diff -puN include/linux/tty_flip.h~tty-debug include/linux/tty_flip.h
--- linux-2.6/include/linux/tty_flip.h~tty-debug 2009-09-05 21:06:25.000000000 +0900
+++ linux-2.6-hirofumi/include/linux/tty_flip.h 2009-09-05 21:06:39.000000000 +0900
@@ -2,6 +2,7 @@
#define _LINUX_TTY_FLIP_H
extern int tty_buffer_request_room(struct tty_struct *tty, size_t size);
+extern size_t tty_buffer_used(struct tty_struct *tty);
extern int tty_insert_flip_string(struct tty_struct *tty, const unsigned char *chars, size_t size);
extern int tty_insert_flip_string_flags(struct tty_struct *tty, const unsigned char *chars, const char *flags, size_t size);
extern int tty_prepare_flip_string(struct tty_struct *tty, unsigned char **chars, size_t size);
_
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-09-05 18:06 ` Linus Torvalds
0 siblings, 0 replies; 286+ messages in thread
From: Linus Torvalds @ 2009-09-05 18:06 UTC (permalink / raw)
To: OGAWA Hirofumi
Cc: Mikael Pettersson, Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Alan Cox, Greg KH, Andrew Morton
On Sun, 6 Sep 2009, OGAWA Hirofumi wrote:
>
> This is not meaning to object to your patch though, I think we would be
> good to fix pty_space(), not leaving as wrong. With fix it, I guess we
> don't get strange behavior in the near of buffer limit.
I'd actually rather not make that function any more complicated.
Just make the rules be very simple:
- the pty layer has ~64kB buffering, and if you just blindly do a
->write() op, you can see how many characters you were able to write.
- before doing a ->write() op, you can ask how many characters you are
guaranteed to be able to write by doing a "->write_room()" call.
..and then the bug literally was just that "pty_write()" was confused, and
thought that it should do that "write_room()" thing, which it really
shouldn't ever have done.
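The two rules can be illustrated with a small user-space sketch. This is only an analogy for the ->write() / ->write_room() contract described above: `struct sink`, `sink_write()`, `sink_write_room()` and the two caller helpers are hypothetical names, and the 16-byte capacity is chosen purely to keep the example small.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical byte sink with a tiny fixed capacity. */
struct sink {
    char data[16];
    size_t used;
};

/* Analogue of ->write_room(): bytes guaranteed to be accepted right now. */
static size_t sink_write_room(const struct sink *s)
{
    return sizeof(s->data) - s->used;
}

/* Analogue of ->write(): accept as much as fits, return the count taken. */
static size_t sink_write(struct sink *s, const char *buf, size_t n)
{
    size_t room = sizeof(s->data) - s->used;

    if (n > room)
        n = room;
    memcpy(s->data + s->used, buf, n);
    s->used += n;
    return n;
}

/* Rule 1: write blindly and let the caller inspect the return value. */
static size_t write_best_effort(struct sink *s, const char *buf, size_t n)
{
    return sink_write(s, buf, n);
}

/*
 * Rule 2: a caller that must emit all n bytes or none (for example an
 * expanded "\r\n") asks for room first. Returns 1 on success, 0 if the
 * bytes would not fit; nothing is written in that case.
 */
static int write_all_or_nothing(struct sink *s, const char *buf, size_t n)
{
    if (sink_write_room(s) < n)
        return 0;
    return sink_write(s, buf, n) == n;
}
```

The point of the split is that only the caller does the room check; the write routine itself never second-guesses a request beyond its own hard capacity.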
So I really think that the true fix is to just remove the code from
pty_write(), and not do anything more complicated. I'll also commit the
change to write '\r\n' as one single string, because quite frankly, it's
just stupid to do it as two characters, but at that point it's just a
cleanup.
> Also, it seems the non-n_tty path doesn't use tty_write_room() check,
> and instead it just try to write and check written bytes which returned
> by tty->ops->write().
.. and I think that's fine. I think write_room() should be used sparingly,
and only by code that cares about being able to fit at least 'n'
characters in the tty buffers. In fact, I think even n_tty would likely in
general be better off without it (and just check the return value), but
because of the stateful character translation (that doesn't actually keep
any state around, it just wants to expand things as it goes along), and
because of historical reasons, we'll just keep it using write_room.
Linus
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-09-05 18:56 ` OGAWA Hirofumi
0 siblings, 0 replies; 286+ messages in thread
From: OGAWA Hirofumi @ 2009-09-05 18:56 UTC (permalink / raw)
To: Linus Torvalds
Cc: Mikael Pettersson, Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Alan Cox, Greg KH, Andrew Morton
Linus Torvalds <torvalds@linux-foundation.org> writes:
> On Sun, 6 Sep 2009, OGAWA Hirofumi wrote:
>>
>> This is not meaning to object to your patch though, I think we would be
>> good to fix pty_space(), not leaving as wrong. With fix it, I guess we
>> don't get strange behavior in the near of buffer limit.
>
> I'd actually rather not make that function any more complicated.
>
> Just make the rules be very simple:
>
> - the pty layer has ~64kB buffering, and if you just blindly do a
> ->write() op, you can see how many characters you were able to write.
>
> - before doing a ->write() op, you can ask how many characters you are
> guaranteed to be able to write by doing a "->write_room()" call.
>
> ..and then the bug literally was just that "pty_write()" was confused, and
> thought that it should do that "write_room()" thing, which it really
> shouldn't ever have done.
>
> So I really think that the true fix is to just remove the code from
> pty_write(), and not do anything more complicated. I'll also commit the
> change to write '\r\n' as one single string, because quite frankly, it's
> just stupid to do it as two characters, but at that point it's just a
> cleanup.
But the current write_room() returns a wrong value almost all the time. For
example, if a 4kb preallocated buffer is in use, ->memory_used will be 4kb
even if we are actually using only one byte.
I think that's strange/wrong, even if we remove the pty_space() call from
pty_write().
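The 4kb example can be worked through with a small user-space model. It follows the idea of the tty_buffer_used() patch posted earlier in the thread (allocated bytes minus the free space in the tail buffer), but the types and function names below (`model_chunk`, `model_buf`, `space_by_allocation()`, `bytes_used()`, `space_by_bytes()`) are hypothetical, and there is no locking as there would be in the kernel.

```c
#include <stddef.h>

#define PTY_LIMIT 8192  /* limit applied by pty_space() */

/* Hypothetical stand-ins for the tail buffer and the tty buffer state. */
struct model_chunk {
    int size;           /* bytes allocated for this chunk */
    int used;           /* bytes actually stored in it */
};

struct model_buf {
    int memory_used;                    /* total bytes allocated */
    const struct model_chunk *tail;     /* most recent chunk, or NULL */
};

/* Current behaviour: room computed from allocated bytes. */
static int space_by_allocation(const struct model_buf *b)
{
    int n = PTY_LIMIT - b->memory_used;
    return n < 0 ? 0 : n;
}

/* Proposed accounting: allocated bytes minus the unused part of the tail. */
static int bytes_used(const struct model_buf *b)
{
    int left = b->tail ? b->tail->size - b->tail->used : 0;

    return b->memory_used - left;
}

/* Room computed from bytes actually stored. */
static int space_by_bytes(const struct model_buf *b)
{
    int n = PTY_LIMIT - bytes_used(b);
    return n < 0 ? 0 : n;
}
```

For a single 4096-byte chunk holding one byte, `space_by_allocation()` reports 4096 bytes of room while `space_by_bytes()` reports 8191, which is the discrepancy described above.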
>> Also, it seems the non-n_tty path doesn't use tty_write_room() check,
>> and instead it just try to write and check written bytes which returned
>> by tty->ops->write().
>
> .. and I think that's fine. I think write_room() should be used sparingly,
> and only by code that cares about being able to fit at least 'n'
> characters in the tty buffers. In fact, I think even n_tty would likely in
> general be better off without it (and just check the return value), but
> because of the stateful character translation (that doesn't actually keep
> any state around, it just wants to expand things as it goes along), and
> because of historical reasons, we'll just keep it using write_room.
As a somewhat longer-term solution, I agree. The current code seems to have
fragile buffer handling for echoes, \n, etc. And yes, perhaps avoiding
write_room() is the clean way.
But for now, 64kb (pty_write) vs 8kb (pty_write_room) seems strange to
me.
Thanks.
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-09-05 21:56 ` Alan Cox
0 siblings, 0 replies; 286+ messages in thread
From: Alan Cox @ 2009-09-05 21:56 UTC (permalink / raw)
To: OGAWA Hirofumi
Cc: Linus Torvalds, Mikael Pettersson, Rafael J. Wysocki,
Linux Kernel Mailing List, Kernel Testers List, Alan Cox,
Greg KH, Andrew Morton
> So, it will use the 64kb limit at least few paths, and I'm not sure
> though, non-n_tty path (e.g. ppp) doesn't use tty_write_room() check
> always. It may not be consistent if we removed pty_space() in pty_write().
The correct behaviour for most network protocols to overflow is to drop
packets so the behaviour of not checking was intentional.
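As a rough illustration of drop-on-overflow (a userspace sketch, not the ppp
or tty code; toy_link and toy_link_send() are hypothetical names), a frame
that does not fit is discarded whole and counted, and the sender carries on:

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy transmit queue that drops whole frames when it is full. */
struct toy_link {
    size_t capacity;       /* bytes the queue can hold */
    size_t queued;         /* bytes currently queued */
    unsigned long dropped; /* frames discarded on overflow */
};

/* Queue a frame if it fits entirely; otherwise drop it and count it. */
static bool toy_link_send(struct toy_link *l, size_t frame_len)
{
    if (frame_len > l->capacity - l->queued) {
        l->dropped++;
        return false;
    }
    l->queued += frame_len;
    return true;
}
```

Nothing here checks for room before building the frame; the protocol above is
expected to cope with the loss.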
Alan
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-09-05 22:46 ` OGAWA Hirofumi
0 siblings, 0 replies; 286+ messages in thread
From: OGAWA Hirofumi @ 2009-09-05 22:46 UTC (permalink / raw)
To: Alan Cox
Cc: Linus Torvalds, Mikael Pettersson, Rafael J. Wysocki,
Linux Kernel Mailing List, Kernel Testers List, Alan Cox,
Greg KH, Andrew Morton
Alan Cox <alan@lxorguk.ukuu.org.uk> writes:
>> So, it will use the 64kb limit at least few paths, and I'm not sure
>> though, non-n_tty path (e.g. ppp) doesn't use tty_write_room() check
>> always. It may not be consistent if we removed pty_space() in pty_write().
>
> The correct behaviour for most network protocols to overflow is to drop
> packets so the behaviour of not checking was intentional.
I see. I meant that ppp doesn't check on write (64kb), but the
post-processing(?) checks (8kb). If that situation can occur, I was just
worried it could become a cause of wrong behavior.
Thanks.
--
OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14015] pty regressed again, breaking expect and gcc's testsuite
@ 2009-09-04 21:12 ` Alan Cox
0 siblings, 0 replies; 286+ messages in thread
From: Alan Cox @ 2009-09-04 21:12 UTC (permalink / raw)
To: Linus Torvalds
Cc: Mikael Pettersson, Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Alan Cox, Greg KH, Andrew Morton,
OGAWA Hirofumi
> After writing the above, the voices in my head started clamoring about
> this "space allocated" vs "bytes buffered" thing, which I was obviously
> aware of, but hadn't thought about as an issue.
>
> And you know what? The thing about "space allocated" vs "bytes buffered"
> is that writing _one_ byte (the '\r') can cause a lot more than one byte
> to be allocated for a buffer (we do minimum 256-byte buffers)
Doh, yes, that's an utterly dumb bug on my part - we do have to work on
chars.
100% agree with the diagnosis
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #14013] hd don't show up
2009-08-25 20:00 ` Rafael J. Wysocki
` (23 preceding siblings ...)
(?)
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Tejun Heo, Tim Blechmann
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14013
Subject : hd don't show up
Submitter : Tim Blechmann <tim@klingt.org>
Date : 2009-08-14 8:26 (12 days old)
References : http://marc.info/?l=linux-kernel&m=125023842514480&w=4
Handled-By : Tejun Heo <tj@kernel.org>
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #14018] kernel freezes, inotify problem
2009-08-25 20:00 ` Rafael J. Wysocki
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Kernel Testers List, Christoph Thielecke, Eric Paris
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14018
Subject : kernel freezes, inotify problem
Submitter : Christoph Thielecke <christoph.thielecke@gmx.de>
Date : 2009-08-19 12:48 (7 days old)
References : http://marc.info/?l=linux-kernel&m=125068616818353&w=4
Handled-By : Eric Paris <eparis@parisplace.org>
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #14017] _end symbol missing from Symbol.map
2009-08-25 20:00 ` Rafael J. Wysocki
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Hannes Reinecke
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14017
Subject : _end symbol missing from Symbol.map
Submitter : Hannes Reinecke <hare@suse.de>
Date : 2009-08-13 6:45 (13 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=091e52c3551d3031343df24b573b770b4c6c72b6
References : http://marc.info/?l=linux-kernel&m=125014649102253&w=4
Handled-By : Hannes Reinecke <hare@suse.de>
Patch : http://marc.info/?l=linux-kernel&m=125014649102253&w=4
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #14030] Kernel NULL pointer dereference at 0000000000000008, pty-related
2009-08-25 20:00 ` Rafael J. Wysocki
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Kernel Testers List, Eric W. Biederman, Linus Torvalds
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14030
Subject : Kernel NULL pointer dereference at 0000000000000008, pty-related
Submitter : Eric W. Biederman <ebiederm@xmission.com>
Date : 2009-08-20 5:46 (6 days old)
References : http://marc.info/?l=linux-kernel&m=125074724623423&w=4
Handled-By : Linus Torvalds <torvalds@linux-foundation.org>
Patch : http://patchwork.kernel.org/patch/43679/
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14030] Kernel NULL pointer dereference at 0000000000000008, pty-related
2009-08-25 20:34 ` Rafael J. Wysocki
(?)
@ 2009-08-26 0:16 ` Linus Torvalds
2009-08-26 21:11 ` Rafael J. Wysocki
-1 siblings, 1 reply; 286+ messages in thread
From: Linus Torvalds @ 2009-08-26 0:16 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux Kernel Mailing List, Kernel Testers List, Eric W. Biederman
On Tue, 25 Aug 2009, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.30. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14030
> Subject : Kernel NULL pointer dereference at 0000000000000008, pty-related
> Submitter : Eric W. Biederman <ebiederm@xmission.com>
> Date : 2009-08-20 5:46 (6 days old)
> References : http://marc.info/?l=linux-kernel&m=125074724623423&w=4
> Handled-By : Linus Torvalds <torvalds@linux-foundation.org>
> Patch : http://patchwork.kernel.org/patch/43679/
This is now committed as 5c58ceff103d8a654f24769bb1baaf84a841b0cc.
Linus
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14030] Kernel NULL pointer dereference at 0000000000000008, pty-related
@ 2009-08-26 21:11 ` Rafael J. Wysocki
0 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-26 21:11 UTC (permalink / raw)
To: Linus Torvalds
Cc: Linux Kernel Mailing List, Kernel Testers List, Eric W. Biederman
On Wednesday 26 August 2009, Linus Torvalds wrote:
>
> On Tue, 25 Aug 2009, Rafael J. Wysocki wrote:
>
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.30. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14030
> > Subject : Kernel NULL pointer dereference at 0000000000000008, pty-related
> > Submitter : Eric W. Biederman <ebiederm@xmission.com>
> > Date : 2009-08-20 5:46 (6 days old)
> > References : http://marc.info/?l=linux-kernel&m=125074724623423&w=4
> > Handled-By : Linus Torvalds <torvalds@linux-foundation.org>
> > Patch : http://patchwork.kernel.org/patch/43679/
>
> This is now committed as 5c58ceff103d8a654f24769bb1baaf84a841b0cc.
Thanks, closed.
Rafael
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #14031] dvb_usb_af9015: Oops on hotplugging
2009-08-25 20:00 ` Rafael J. Wysocki
` (27 preceding siblings ...)
(?)
@ 2009-08-25 20:34 ` Rafael J. Wysocki
2009-08-25 23:57 ` Stefan Lippers-Hollmann
-1 siblings, 1 reply; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Stefan Lippers-Hollmann
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14031
Subject : dvb_usb_af9015: Oops on hotplugging
Submitter : Stefan Lippers-Hollmann <s.L-H@gmx.de>
Date : 2009-08-05 20:32 (21 days old)
References : http://marc.info/?l=linux-kernel&m=124949716608828&w=4
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14031] dvb_usb_af9015: Oops on hotplugging
2009-08-25 20:34 ` [Bug #14031] dvb_usb_af9015: Oops on hotplugging Rafael J. Wysocki
@ 2009-08-25 23:57 ` Stefan Lippers-Hollmann
0 siblings, 0 replies; 286+ messages in thread
From: Stefan Lippers-Hollmann @ 2009-08-25 23:57 UTC (permalink / raw)
To: Rafael J. Wysocki; +Cc: Linux Kernel Mailing List, Kernel Testers List
Hi
On Wednesday 26 August 2009, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.30. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14031
> Subject : dvb_usb_af9015: Oops on hotplugging
> Submitter : Stefan Lippers-Hollmann <s.L-H@gmx.de>
> Date : 2009-08-05 20:32 (21 days old)
> References : http://marc.info/?l=linux-kernel&m=124949716608828&w=4
This issue does not exist in 2.6.31-rc7(-git2) anymore.
Unfortunately I've been away from my system and with limited testing
abilities recently, so I can't pinpoint the actual patch that fixed this
(it was not in the first DVB pull request following that report, but maybe
in the second or third), but it does work very well again. Thank you.
Regards
Stefan Lippers-Hollmann
usb 1-6: new high speed USB device using ehci_hcd and address 4
usb 1-6: New USB device found, idVendor=0ccd, idProduct=0069
usb 1-6: New USB device strings: Mfr=1, Product=2, SerialNumber=3
usb 1-6: Product: Cinergy T USB XE Ver.2
usb 1-6: Manufacturer: TerraTec
usb 1-6: SerialNumber: 10012007
usb 1-6: configuration #1 chosen from 1 choice
dvb-usb: found a 'TerraTec Cinergy T USB XE' in cold state, will try to load a firmware
usb 1-6: firmware: requesting dvb-usb-af9015.fw
dvb-usb: downloading firmware from file 'dvb-usb-af9015.fw'
dvb-usb: found a 'TerraTec Cinergy T USB XE' in warm state.
dvb-usb: will pass the complete MPEG2 transport stream to the software demuxer.
DVB: registering new adapter (TerraTec Cinergy T USB XE)
af9013: firmware version:4.95.0
DVB: registering adapter 0 frontend 0 (Afatech AF9013 DVB-T)...
mc44s803: successfully identified (ID = 14)
dvb-usb: TerraTec Cinergy T USB XE successfully initialized and connected.
usbcore: registered new interface driver dvb_usb_af9015
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #14031] dvb_usb_af9015: Oops on hotplugging
@ 2009-08-26 0:03 ` Rafael J. Wysocki
0 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-26 0:03 UTC (permalink / raw)
To: Stefan Lippers-Hollmann; +Cc: Linux Kernel Mailing List, Kernel Testers List
On Wednesday 26 August 2009, Stefan Lippers-Hollmann wrote:
> Hi
>
> On Wednesday 26 August 2009, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.30. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14031
> > Subject : dvb_usb_af9015: Oops on hotplugging
> > Submitter : Stefan Lippers-Hollmann <s.L-H@gmx.de>
> > Date : 2009-08-05 20:32 (21 days old)
> > References : http://marc.info/?l=linux-kernel&m=124949716608828&w=4
>
> This issue does not exist in 2.6.31-rc7(-git2) anymore.
>
> Unfortunately I've been away from my system and with limited testing
> abilities recently, so I can't pinpoint the actual patch that fixed this
> (it was not in the first DVB pull request following that report, but maybe
> in the second or third), but it does work very well again. Thank you.
Thanks, bug closed.
Rafael
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #14057] Strange network timeouts w/ e100
2009-08-25 20:00 ` Rafael J. Wysocki
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Kernel Testers List, Krzysztof Halasa, Walt Holman
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14057
Subject : Strange network timeouts w/ e100
Submitter : Walt Holman <walt@holmansrus.com>
Date : 2009-08-20 0:21 (6 days old)
References : http://marc.info/?l=linux-kernel&m=125072831831443&w=4
Handled-By : Krzysztof Halasa <khc@pm.waw.pl>
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #14060] oops: sysfs_remove_link and i915
2009-08-25 20:00 ` Rafael J. Wysocki
` (29 preceding siblings ...)
(?)
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Dominik Brodowski
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14060
Subject : oops: sysfs_remove_link and i915
Submitter : Dominik Brodowski <linux@dominikbrodowski.net>
Date : 2009-08-22 5:48 (4 days old)
References : http://marc.info/?l=linux-kernel&m=125092139113955&w=4
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #14058] Oops in fsnotify
2009-08-25 20:00 ` Rafael J. Wysocki
` (30 preceding siblings ...)
(?)
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List; +Cc: Kernel Testers List, Eric Paris, Grant Wilson
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14058
Subject : Oops in fsnotify
Submitter : Grant Wilson <grant.wilson@zen.co.uk>
Date : 2009-08-20 15:48 (6 days old)
References : http://marc.info/?l=linux-kernel&m=125078450923133&w=4
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #14061] Crash due to buggy flat_phys_pkg_id
2009-08-25 20:00 ` Rafael J. Wysocki
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Kernel Testers List, Ravikiran G Thirumalai, Yinghai Lu
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14061
Subject : Crash due to buggy flat_phys_pkg_id
Submitter : Ravikiran G Thirumalai <kiran@scalex86.org>
Date : 2009-08-24 18:26 (2 days old)
References : http://marc.info/?l=linux-kernel&m=125114085701508&w=4
Handled-By : Yinghai Lu <yinghai@kernel.org>
Patch : http://patchwork.kernel.org/patch/43806/
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #14062] Failure to boot as xen guest
2009-08-25 20:00 ` Rafael J. Wysocki
@ 2009-08-25 20:34 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-25 20:34 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Kernel Testers List, Arnd Hannemann, Jeremy Fitzhardinge, Pekka Enberg
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14062
Subject : Failure to boot as xen guest
Submitter : Arnd Hannemann <hannemann@nets.rwth-aachen.de>
Date : 2009-08-25 15:48 (1 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=83b519e8b9572c319c8e0c615ee5dd7272856090
References : http://marc.info/?l=linux-kernel&m=125121534229538&w=4
Handled-By : Jeremy Fitzhardinge <jeremy@goop.org>
Patch : http://patchwork.kernel.org/patch/43799/
^ permalink raw reply [flat|nested] 286+ messages in thread
* 2.6.31-rc9: Reported regressions from 2.6.30
@ 2009-09-06 17:15 Rafael J. Wysocki
2009-09-06 17:24 ` Rafael J. Wysocki
0 siblings, 1 reply; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-09-06 17:15 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Adrian Bunk, Andrew Morton, Linus Torvalds, Natalie Protasevich,
Kernel Testers List, Network Development, Linux ACPI,
Linux PM List, Linux SCSI List, Linux Wireless List, DRI
This message contains a list of some regressions from 2.6.30, for which there
are no fixes in the mainline I know of. If any of them have been fixed already,
please let me know.
If you know of any other unresolved regressions from 2.6.30, please let me know
as well and I'll add them to the list. Also, please let me know if any of the
entries below are invalid.
Each entry from the list will be sent additionally in an automatic reply to
this message with CCs to the people involved in reporting and handling the
issue.
Listed regressions statistics:
Date          Total  Pending  Unresolved
----------------------------------------
2009-09-06      123       34          27
2009-08-26      108       33          26
2009-08-20      102       32          29
2009-08-10       89       27          24
2009-08-02       76       36          28
2009-07-27       70       51          43
2009-07-07       35       25          21
2009-06-29       22       22          15
Unresolved regressions
----------------------
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14141
Subject : order 2 page allocation failures
Submitter : Frans Pop <elendil@planet.nl>
Date : 2009-09-06 7:40 (1 days old)
References : http://marc.info/?l=linux-kernel&m=125222287419691&w=4
Handled-By : Pekka Enberg <penberg@cs.helsinki.fi>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14139
Subject : Output to external monitor is broken
Submitter : Carlos R. Mafra <crmafra2@gmail.com>
Date : 2009-09-06 14:22 (1 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=f8aed700c6ec46ddade6570004ce25332283b306
References : http://marc.info/?l=linux-kernel&m=125224701520738&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14135
Subject : NULL pointer dereference in ima_counts_put
Submitter : Ciprian Docan <docan@eden.rutgers.edu>
Date : 2009-09-02 13:49 (5 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=94e5d714f604d4cb4cb13163f01ede278e69258b
References : http://marc.info/?l=linux-kernel&m=125190146028116&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14133
Subject : WARNING: at arch/x86/kernel/smp.c:117 native_smp_send_reschedule
Submitter : Jens Axboe <jens.axboe@oracle.com>
Date : 2009-08-31 20:43 (7 days old)
References : http://marc.info/?l=linux-kernel&m=125175143918050&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14114
Subject : Tuning a saa7134 based card is broken in kernel 2.6.31-rc7
Submitter : Tsvety Petrov <Tsvetoslav.Petrov@itron.com>
Date : 2009-09-03 21:06 (4 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14103
Subject : cdc_acm gives I/O error
Submitter : Paul Martin <pm@debian.org>
Date : 2009-09-01 13:30 (6 days old)
Handled-By : Oliver Neukum <oliver@neukum.org>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14095
Subject : Asus EeePC 1005HA-M: Suspend hangs and disables the wireless
Submitter : Karsten Jaeger <lists@oss42.com>
Date : 2009-08-31 10:14 (7 days old)
References : http://lists.alioth.debian.org/pipermail/debian-eeepc-devel/2009-August/002513.html
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14070
Subject : lockdep warning triggered by dup_fd
Submitter : Bart Van Assche <bart.vanassche@gmail.com>
Date : 2009-08-23 09:36 (15 days old)
References : http://lkml.org/lkml/2009/8/23/8
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14058
Subject : Oops in fsnotify
Submitter : Grant Wilson <grant.wilson@zen.co.uk>
Date : 2009-08-20 15:48 (18 days old)
References : http://marc.info/?l=linux-kernel&m=125078450923133&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14043
Subject : System sometimes hangs during boot
Submitter : Bart Van Assche <bart.vanassche@gmail.com>
Date : 2009-08-23 18:04 (15 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14018
Subject : kernel freezes, inotify problem
Submitter : Christoph Thielecke <christoph.thielecke@gmx.de>
Date : 2009-08-19 12:48 (19 days old)
References : http://marc.info/?l=linux-kernel&m=125068616818353&w=4
Handled-By : Eric Paris <eparis@parisplace.org>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14013
Subject : hd don't show up
Submitter : Tim Blechmann <tim@klingt.org>
Date : 2009-08-14 8:26 (24 days old)
References : http://marc.info/?l=linux-kernel&m=125023842514480&w=4
Handled-By : Tejun Heo <tj@kernel.org>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13987
Subject : Received NMI interrupt at resume
Submitter : Christian Casteyde <casteyde.christian@free.fr>
Date : 2009-08-15 07:55 (23 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13950
Subject : Oops when USB Serial disconnected while in use
Submitter : Bruno Prémont <bonbons@linux-vserver.org>
Date : 2009-08-08 17:47 (30 days old)
References : http://marc.info/?l=linux-kernel&m=124975432900466&w=4
Handled-By : Alan Stern <stern@rowland.harvard.edu>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13943
Subject : WARNING: at net/mac80211/mlme.c:2292 with ath5k
Submitter : Fabio Comolli <fabio.comolli@gmail.com>
Date : 2009-08-06 20:15 (32 days old)
References : http://marc.info/?l=linux-kernel&m=124958978600600&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13942
Subject : Troubles with AoE and uninitialized object
Submitter : Bruno Prémont <bonbons@linux-vserver.org>
Date : 2009-08-04 10:12 (34 days old)
References : http://marc.info/?l=linux-kernel&m=124938117104811&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13941
Subject : x86 Geode issue
Submitter : Martin-Éric Racine <q-funk@iki.fi>
Date : 2009-08-03 12:58 (35 days old)
References : http://marc.info/?l=linux-kernel&m=124930434732481&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13940
Subject : iwlagn and sky2 stopped working, ACPI-related
Submitter : Ricardo Jorge da Fonseca Marques Ferreira <storm@sys49152.net>
Date : 2009-08-07 22:33 (31 days old)
References : http://marc.info/?l=linux-kernel&m=124968457731107&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13935
Subject : 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version)
Submitter : Adrian Ulrich <kernel@blinkenlights.ch>
Date : 2009-08-08 22:08 (30 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fa047e4f6fa63a6e9d0ae4d7749538830d14a343
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13906
Subject : Huawei E169 GPRS connection causes Ooops
Submitter : Clemens Eisserer <linuxhippy@gmail.com>
Date : 2009-08-04 09:02 (34 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13869
Subject : Radeon framebuffer (w/o KMS) corruption at boot.
Submitter : Duncan <1i5t5.duncan@cox.net>
Date : 2009-07-29 16:44 (40 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13836
Subject : suspend script fails, related to stdout?
Submitter : Tomas M. <tmezzadra@gmail.com>
Date : 2009-07-17 21:24 (52 days old)
References : http://marc.info/?l=linux-kernel&m=124785853811667&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13819
Subject : system freeze when switching to console
Submitter : Reinette Chatre <reinette.chatre@intel.com>
Date : 2009-07-23 17:57 (46 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13809
Subject : oprofile: possible circular locking dependency detected
Submitter : Jerome Marchand <jmarchan@redhat.com>
Date : 2009-07-22 13:35 (47 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13740
Subject : X server crashes with 2.6.31-rc2 when options are changed
Submitter : Michael S. Tsirkin <m.s.tsirkin@gmail.com>
Date : 2009-07-07 15:19 (62 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13733
Subject : 2.6.31-rc2: irq 16: nobody cared
Submitter : Niel Lambrechts <niel.lambrechts@gmail.com>
Date : 2009-07-06 18:32 (63 days old)
References : http://marc.info/?l=linux-kernel&m=124690524027166&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13645
Subject : NULL pointer dereference at (null) (level2_spare_pgt)
Submitter : poornima nayak <mpnayak@linux.vnet.ibm.com>
Date : 2009-06-17 17:56 (82 days old)
References : http://lkml.org/lkml/2009/6/17/194
Regressions with patches
------------------------
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14140
Subject : 2.6.31-rc9 breaks gianfar
Submitter : Michael Guntsche <mike@it-loops.com>
Date : 2009-09-06 7:27 (1 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=38bddf04bcfe661fbdab94888c3b72c32f6873b3
References : http://marc.info/?l=linux-kernel&m=125222206218784&w=4
Handled-By : David Miller <davem@davemloft.net>
Patch : http://patchwork.kernel.org/patch/45965/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14138
Subject : Regression in suspend to ram
Submitter : Zdenek Kabelac <zdenek.kabelac@gmail.com>
Date : 2009-08-31 11:51 (7 days old)
References : http://marc.info/?l=linux-kernel&m=125171952817851&w=4
Handled-By : OGAWA Hirofumi <hirofumi@mail.parknet.co.jp>
Patch : http://patchwork.kernel.org/patch/45945/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14137
Subject : usb console regressions
Submitter : Jason Wessel <jason.wessel@windriver.com>
Date : 2009-09-05 21:08 (2 days old)
References : http://marc.info/?l=linux-kernel&m=125218501310512&w=4
Handled-By : Jason Wessel <jason.wessel@windriver.com>
Patch : http://patchwork.kernel.org/patch/45953/
http://patchwork.kernel.org/patch/45952/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14136
Subject : readcd Oops
Submitter : Bob Tracy <rct@gherkin.frus.com>
Date : 2009-09-03 3:39 (4 days old)
References : http://marc.info/?l=linux-kernel&m=125195043617418&w=4
Handled-By : Michal Schmidt <mschmidt@redhat.com>
Patch : http://patchwork.kernel.org/patch/45347/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14017
Subject : _end symbol missing from Symbol.map
Submitter : Hannes Reinecke <hare@suse.de>
Date : 2009-08-13 6:45 (25 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=091e52c3551d3031343df24b573b770b4c6c72b6
References : http://marc.info/?l=linux-kernel&m=125014649102253&w=4
Handled-By : Hannes Reinecke <hare@suse.de>
Patch : http://marc.info/?l=linux-kernel&m=125014649102253&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13948
Subject : ath5k broken after suspend-to-ram
Submitter : Johannes Stezenbach <js@sig21.net>
Date : 2009-08-07 21:51 (31 days old)
References : http://marc.info/?l=linux-kernel&m=124968192727854&w=4
Handled-By : Nick Kossifidis <mickflemm@gmail.com>
Patch : http://patchwork.kernel.org/patch/38550/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13947
Subject : Libertas: Association request to the driver failed
Submitter : Daniel Mack <daniel@caiaq.de>
Date : 2009-08-07 19:11 (31 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=57921c312e8cef72ba35a4cfe870b376da0b1b87
References : http://marc.info/?l=linux-kernel&m=124967234311481&w=4
Handled-By : Roel Kluin <roel.kluin@gmail.com>
Dan Williams <dcbw@redhat.com>
Patch : http://patchwork.kernel.org/patch/43114/
For details, please visit the bug entries and follow the links given in
references.
As you can see, there is a Bugzilla entry for each of the listed regressions.
There also is a Bugzilla entry used for tracking the regressions from 2.6.30,
unresolved as well as resolved, at:
http://bugzilla.kernel.org/show_bug.cgi?id=13615
Please let me know if there are any Bugzilla entries that should be added to
the list in there.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #13819] system freeze when switching to console
2009-09-06 17:15 2.6.31-rc9: Reported regressions from 2.6.30 Rafael J. Wysocki
@ 2009-09-06 17:24 ` Rafael J. Wysocki
0 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-09-06 17:24 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Kernel Testers List, Eric Anholt, ling.ma, Linus Torvalds,
Ma Ling, Reinette Chatre
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13819
Subject : system freeze when switching to console
Submitter : Reinette Chatre <reinette.chatre@intel.com>
Date : 2009-07-23 17:57 (46 days old)
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13819] system freeze when switching to console
2009-09-06 17:24 ` Rafael J. Wysocki
@ 2009-09-08 16:29 ` reinette chatre
-1 siblings, 0 replies; 286+ messages in thread
From: reinette chatre @ 2009-09-08 16:29 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux Kernel Mailing List, Kernel Testers List, Eric Anholt, Ma,
Ling, Linus Torvalds
On Sun, 2009-09-06 at 10:24 -0700, Rafael J. Wysocki wrote:
> Please verify if it still should be listed and let me know
> (either way).
Issue is still present in 2.6.31-rc8.
Reinette
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13819] system freeze when switching to console
2009-09-08 16:29 ` reinette chatre
@ 2009-09-08 17:00 ` Linus Torvalds
-1 siblings, 0 replies; 286+ messages in thread
From: Linus Torvalds @ 2009-09-08 17:00 UTC (permalink / raw)
To: reinette chatre
Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Eric Anholt, Ma, Ling, bugzilla-daemon
On Tue, 8 Sep 2009, reinette chatre wrote:
> On Sun, 2009-09-06 at 10:24 -0700, Rafael J. Wysocki wrote:
> > Please verify if it still should be listed and let me know
> > (either way).
>
> Issue is still present in 2.6.31-rc8.
Is there any chance that you could connect a serial line to the machine?
Your report about blinking keyboard LEDs means that there's an oops, but
since the display isn't in textmode (and the oops obviously happens when
trying to enter it), we don't know what it is.
A serial line (along with a kernel compiled with serial console support,
of course, and a kernel command line option like "console=ttyS0,115200
console=tty0") would get that. You'd just need another machine with a
terminal program like minicom.
The network console could also work out, but serial lines tend to be more
reliable if you have them. But in the absence of serial lines, see the
Documentation/networking/netconsole.txt file for some details. The setup
is more complicated, but on the other hand it's a lot more dynamic, and in
your case - since the box works until you try to switch to text-mode, I
suspect the network console dynamic run-time setup would be easy for you
to use.
(For other examples of using netconsole with that dynamic mode, just
google for "sys/kernel/config/netconsole" and you'll find a number of docs
that explain how to find the MAC address for setup etc).
Linus
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13819] system freeze when switching to console
@ 2009-09-08 17:36 ` reinette chatre
0 siblings, 0 replies; 286+ messages in thread
From: reinette chatre @ 2009-09-08 17:36 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Eric Anholt, Ma, Ling, bugzilla-daemon
On Tue, 2009-09-08 at 10:00 -0700, Linus Torvalds wrote:
>
> On Tue, 8 Sep 2009, reinette chatre wrote:
>
> > On Sun, 2009-09-06 at 10:24 -0700, Rafael J. Wysocki wrote:
> > > Please verify if it still should be listed and let me know
> > > (either way).
> >
> > Issue is still present in 2.6.31-rc8.
>
> Is there any chance that you could connect a serial line to the machine?
The system does not have a serial console, but I was able to set up
netconsole. For what it is worth, I did not do this until now because
(1) I was able to bisect the problem, and (2) I asked driver developers
directly how I can help to debug this and I received no response.
As you can see from the kernel version it is not a build of a vanilla
kernel. It only contains changes related to the wireless networking work
I am doing.
Here is the output:
[ 352.803652] render error detected, EIR: 0x00000010
[ 352.803684] IPEIR: 0x00000000
[ 352.803709] IPEHR: 0x01000000
[ 352.803732] INSTDONE: 0xfffffffe
[ 352.803754] INSTPS: 0x0001e000
[ 352.803776] INSTDONE1: 0xffffffff
[ 352.803801] ACTHD: 0x0480a3c8
[ 352.803823] page table error
[ 352.803846] PGTBL_ER: 0x00100000
[ 352.803870] [drm:i915_handle_error] *ERROR* EIR stuck: 0x00000010, masking
[ 352.803960] BUG: unable to handle kernel NULL pointer dereference at 0000000000000084
[ 352.804006] IP: [<ffffffffa03ecaab>] i915_driver_irq_handler+0x26b/0xd20 [i915]
[ 352.804006] PGD b5d00067 PUD b9753067 PMD 0
[ 352.804006] Oops: 0000 [#1] SMP
[ 352.804006] last sysfs file: /sys/class/power_supply/BAT0/energy_full
[ 352.804006] CPU 0
[ 352.804006] Modules linked in: i915 drm i2c_algo_bit i2c_core ipv6 acpi_cpufreq cpufreq_userspace cpufreq_powersave cpufreq_ondemand cpufreq_conservative cpufreq_stats freq_table container sbs sbshc arc4 ecb joydev af_packet pcmcia psmouse sony_laptop serio_raw yenta_socket rsrc_nonstatic pcmcia_core pcspkr iTCO_wdt iTCO_vendor_support rfkill intel_agp button battery tpm_infineon tpm tpm_bios processor video output ac evdev ext3 jbd mbcache sr_mod sg cdrom sd_mod ahci libata scsi_mod ehci_hcd uhci_hcd usbcore thermal fan thermal_sys [last unloaded: cfg80211]
[ 352.804006] Pid: 4424, comm: Xorg Not tainted 2.6.31-rc8-wl-50925-gdcecd82-dirty #57 VGN-Z540N
[ 352.804006] RIP: 0010:[<ffffffffa03ecaab>] [<ffffffffa03ecaab>] i915_driver_irq_handler+0x26b/0xd20 [i915]
[ 352.804006] RSP: 0018:ffff880001e9de58 EFLAGS: 00010082
[ 352.804006] RAX: 0000000000000000 RBX: 0000000000000000 RCX: 0000000000000000
[ 352.804006] RDX: ffffc9000007d898 RSI: 0000000000000001 RDI: ffffffff8132f0f8
[ 352.804006] RBP: ffff880001e9dee8 R08: 0000000000000002 R09: ffff880037373c38
[ 352.804006] R10: 0000000000000000 R11: 0000000000000001 R12: ffff8800b57fe000
[ 352.804006] R13: 000000000000001f R14: ffff8800b57fe000 R15: ffff8800b9746000
[ 352.804006] FS: 00007fcc05d20700(0000) GS:ffff880001e9a000(0000) knlGS:0000000000000000
[ 352.804006] CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
[ 352.804006] CR2: 0000000000000084 CR3: 00000000b50c3000 CR4: 00000000000006f0
[ 352.804006] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 352.804006] DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
[ 352.804006] Process Xorg (pid: 4424, threadinfo ffff8800b6b1a000, task ffff880037373c00)
[ 352.804006] Stack:
[ 352.804006] ffffffff8106db7d 0000000000000086 ffff88009a5ce040 ffff8800b57fe158
[ 352.804006] <0> ffff8800b57fe1a8 ffff8800b57fe110 0004000000008000 0000000400440202
[ 352.804006] <0> 0000000000000086 0044020200000000 0000001000040000 0000000000000040
[ 352.804006] Call Trace:
[ 352.804006] <IRQ>
[ 352.804006] [<ffffffff8106db7d>] ? mark_held_locks+0x6d/0x90
[ 352.804006] [<ffffffff81098ee8>] handle_IRQ_event+0x68/0x170
[ 352.804006] [<ffffffff8109ac01>] handle_edge_irq+0xc1/0x160
[ 352.804006] [<ffffffff8100e76f>] handle_irq+0x1f/0x30
[ 352.804006] [<ffffffff8100dc6a>] do_IRQ+0x6a/0xf0
[ 352.804006] [<ffffffff8100c793>] ret_from_intr+0x0/0xf
[ 352.804006] <EOI>
[ 352.804006] [<ffffffff81070b88>] ? lock_acquire+0xe8/0x100
[ 352.804006] [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
[ 352.804006] [<ffffffff8132d7b5>] ? mutex_lock_nested+0x45/0x320
[ 352.804006] [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
[ 352.804006] [<ffffffff8106de85>] ? trace_hardirqs_on_caller+0x145/0x190
[ 352.804006] [<ffffffff8106dedd>] ? trace_hardirqs_on+0xd/0x10
[ 352.804006] [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
[ 352.804006] [<ffffffffa03f3335>] ? i915_gem_idle+0x225/0x330 [i915]
[ 352.804006] [<ffffffffa03f34c7>] ? i915_gem_leavevt_ioctl+0x37/0x50 [i915]
[ 352.804006] [<ffffffffa03bdafd>] ? drm_ioctl+0x17d/0x3c0 [drm]
[ 352.804006] [<ffffffffa03f3490>] ? i915_gem_leavevt_ioctl+0x0/0x50 [i915]
[ 352.804006] [<ffffffff810d0ad5>] ? do_wp_page+0x185/0x7a0
[ 352.804006] [<ffffffff811a9a33>] ? __up_read+0x23/0xb0
[ 352.804006] [<ffffffff810ff17d>] ? vfs_ioctl+0x7d/0xa0
[ 352.804006] [<ffffffff810ff2ba>] ? do_vfs_ioctl+0x8a/0x5c0
[ 352.804006] [<ffffffff8105fec6>] ? up_read+0x26/0x30
[ 352.804006] [<ffffffff8100c829>] ? retint_swapgs+0xe/0x13
[ 352.804006] [<ffffffff810ff889>] ? sys_ioctl+0x99/0xa0
[ 352.804006] [<ffffffff8100bd6b>] ? system_call_fastpath+0x16/0x1b
[ 352.804006] Code: 00 8b 18 49 8b 87 b0 05 00 00 48 8b 80 20 02 00 00 48 85 c0 74 21 48 8b 80 00 01 00 00 48 8b 50 08 48 85 d2 74 11 49 8b 44 24 78 <8b> 80 84 00 00 00 89 82 08 08 00 00 f6 45 a0 02 0f 85 47 03 00
[ 352.804006] RIP [<ffffffffa03ecaab>] i915_driver_irq_handler+0x26b/0xd20 [i915]
[ 352.804006] RSP <ffff880001e9de58>
[ 352.804006] CR2: 0000000000000084
[ 352.804006] ---[ end trace 756dbe26c2f29fdd ]---
[ 352.804006] Kernel panic - not syncing: Fatal exception in interrupt
[ 352.804006] Pid: 4424, comm: Xorg Tainted: G D 2.6.31-rc8-wl-50925-gdcecd82-dirty #57
[ 352.804006] Call Trace:
[ 352.804006] <IRQ> [<ffffffff8132ba7f>] panic+0xa0/0x170
[ 352.804006] [<ffffffff8132f0f8>] ? _spin_unlock_irqrestore+0x58/0x60
[ 352.804006] [<ffffffff81041b35>] ? release_console_sem+0x1f5/0x240
[ 352.804006] [<ffffffff81041e05>] ? console_unblank+0x75/0x90
[ 352.804006] [<ffffffff813306c4>] oops_end+0xd4/0xe0
[ 352.804006] [<ffffffff810279d8>] no_context+0xe8/0x260
[ 352.804006] [<ffffffff81027ca5>] __bad_area_nosemaphore+0x155/0x1f0
[ 352.804006] [<ffffffff8106ca5d>] ? trace_hardirqs_off+0xd/0x10
[ 352.804006] [<ffffffff8132f0f8>] ? _spin_unlock_irqrestore+0x58/0x60
[ 352.804006] [<ffffffff8103bb58>] ? try_to_wake_up+0xe8/0x210
[ 352.804006] [<ffffffff81027d4e>] bad_area_nosemaphore+0xe/0x10
[ 352.804006] [<ffffffff8133204e>] do_page_fault+0x29e/0x350
[ 352.804006] [<ffffffff8132f8af>] page_fault+0x1f/0x30
[ 352.804006] [<ffffffff8132f0f8>] ? _spin_unlock_irqrestore+0x58/0x60
[ 352.804006] [<ffffffffa03ecaab>] ? i915_driver_irq_handler+0x26b/0xd20 [i915]
[ 352.804006] [<ffffffffa03ec9cb>] ? i915_driver_irq_handler+0x18b/0xd20 [i915]
[ 352.804006] [<ffffffff8106db7d>] ? mark_held_locks+0x6d/0x90
[ 352.804006] [<ffffffff81098ee8>] handle_IRQ_event+0x68/0x170
[ 352.804006] [<ffffffff8109ac01>] handle_edge_irq+0xc1/0x160
[ 352.804006] [<ffffffff8100e76f>] handle_irq+0x1f/0x30
[ 352.804006] [<ffffffff8100dc6a>] do_IRQ+0x6a/0xf0
[ 352.804006] [<ffffffff8100c793>] ret_from_intr+0x0/0xf
[ 352.804006] <EOI> [<ffffffff81070b88>] ? lock_acquire+0xe8/0x100
[ 352.804006] [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
[ 352.804006] [<ffffffff8132d7b5>] ? mutex_lock_nested+0x45/0x320
[ 352.804006] [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
[ 352.804006] [<ffffffff8106de85>] ? trace_hardirqs_on_caller+0x145/0x190
[ 352.804006] [<ffffffff8106dedd>] ? trace_hardirqs_on+0xd/0x10
[ 352.804006] [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
[ 352.804006] [<ffffffffa03f3335>] ? i915_gem_idle+0x225/0x330 [i915]
[ 352.804006] [<ffffffffa03f34c7>] ? i915_gem_leavevt_ioctl+0x37/0x50 [i915]
[ 352.804006] [<ffffffffa03bdafd>] ? drm_ioctl+0x17d/0x3c0 [drm]
[ 352.804006] [<ffffffffa03f3490>] ? i915_gem_leavevt_ioctl+0x0/0x50 [i915]
[ 352.804006] [<ffffffff810d0ad5>] ? do_wp_page+0x185/0x7a0
[ 352.804006] [<ffffffff811a9a33>] ? __up_read+0x23/0xb0
[ 352.804006] [<ffffffff810ff17d>] ? vfs_ioctl+0x7d/0xa0
[ 352.804006] [<ffffffff810ff2ba>] ? do_vfs_ioctl+0x8a/0x5c0
[ 352.804006] [<ffffffff8105fec6>] ? up_read+0x26/0x30
[ 352.804006] [<ffffffff8100c829>] ? retint_swapgs+0xe/0x13
[ 352.804006] [<ffffffff810ff889>] ? sys_ioctl+0x99/0xa0
[ 352.804006] [<ffffffff8100bd6b>] ? system_call_fastpath+0x16/0x1b
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13819] system freeze when switching to console
2009-09-08 17:36 ` reinette chatre
(?)
@ 2009-09-08 18:06 ` Linus Torvalds
2009-09-08 18:20 ` Jesse Barnes
` (2 more replies)
-1 siblings, 3 replies; 286+ messages in thread
From: Linus Torvalds @ 2009-09-08 18:06 UTC (permalink / raw)
To: reinette chatre
Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Eric Anholt, Ma, Ling, bugzilla-daemon
On Tue, 8 Sep 2009, reinette chatre wrote:
>
> As you can see from the kernel version it is not a build of a vanilla
> kernel. It only contains changes related to the wireless networking work
> I am doing.
>
> Here is the output:
Thanks, this is great. It pinpoints the problem very effectively.
> [ 352.803960] BUG: unable to handle kernel NULL pointer dereference at 0000000000000084
> [ 352.804006] IP: [<ffffffffa03ecaab>] i915_driver_irq_handler+0x26b/0xd20 [i915]
The code here is
16: 48 8b 80 00 01 00 00 mov 0x100(%rax),%rax
1d: 48 8b 50 08 mov 0x8(%rax),%rdx
21: 48 85 d2 test %rdx,%rdx
24: 74 11 je 0x37
26: 49 8b 44 24 78 mov 0x78(%r12),%rax
2b:* 8b 80 84 00 00 00 mov 0x84(%rax),%eax <-- trapping instruction
31: 89 82 08 08 00 00 mov %eax,0x808(%rdx)
37: f6 45 a0 02 testb $0x2,-0x60(%rbp)
and that "testb $0x2, -0x60(%rbp)" seems to be the
if (iir & I915_USER_INTERRUPT) {
test if I'm reading things right. Although it could also be the
if (eir & I915_ERROR_MEMORY_REFRESH) {
thing. The disassembly is totally impossible to read, because the stupid
i915 driver is chock-full of crap like
if (IS_G4X(dev)) {
..
which expands to insane amounts of code that check the PCI ID's one by
one.
Intel guys: could you _please_ stop doing that. Create a capability mask
in the device or something, so that you can test for "is this a G4x" with
a single bit test, rather than have code like this:
mov 0x31c(%rsi),%eax
cmp $0x2982,%eax
je 0xffffffff8124b669 <i915_driver_irq_handler+177>
cmp $0x2972,%eax
je 0xffffffff8124b669 <i915_driver_irq_handler+177>
cmp $0x2992,%eax
je 0xffffffff8124b669 <i915_driver_irq_handler+177>
cmp $0x29a2,%eax
je 0xffffffff8124b669 <i915_driver_irq_handler+177>
cmp $0x2a02,%eax
je 0xffffffff8124b669 <i915_driver_irq_handler+177>
cmp $0x2a12,%eax
je 0xffffffff8124b669 <i915_driver_irq_handler+177>
cmp $0x2a42,%eax
je 0xffffffff8124b669 <i915_driver_irq_handler+177>
cmp $0x2e02,%eax
je 0xffffffff8124b669 <i915_driver_irq_handler+177>
cmp $0x2e12,%eax
je 0xffffffff8124b669 <i915_driver_irq_handler+177>
cmp $0x2e22,%eax
je 0xffffffff8124b669 <i915_driver_irq_handler+177>
cmp $0x2e32,%eax
je 0xffffffff8124b669 <i915_driver_irq_handler+177>
cmp $0x42,%eax
je 0xffffffff8124b669 <i915_driver_irq_handler+177>
for that IS_G4X() thing (I'm not kidding - that's exactly a hundred bytes
of code for that _stupid_ test, and it's inlined!)
Anyway, we're getting that DRM irq, and it has a normal IRQ stack trace:
> [ 352.804006] Process Xorg (pid: 4424, threadinfo ffff8800b6b1a000, task ffff880037373c00)
> [ 352.804006] Call Trace:
> [ 352.804006] <IRQ>
> [ 352.804006] [<ffffffff8106db7d>] ? mark_held_locks+0x6d/0x90
> [ 352.804006] [<ffffffff81098ee8>] handle_IRQ_event+0x68/0x170
> [ 352.804006] [<ffffffff8109ac01>] handle_edge_irq+0xc1/0x160
> [ 352.804006] [<ffffffff8100e76f>] handle_irq+0x1f/0x30
> [ 352.804006] [<ffffffff8100dc6a>] do_IRQ+0x6a/0xf0
> [ 352.804006] [<ffffffff8100c793>] ret_from_intr+0x0/0xf
.. but it happened just as we're tearing down the DRM irq handling:
> [ 352.804006] <EOI>
> [ 352.804006] [<ffffffff81070b88>] ? lock_acquire+0xe8/0x100
> [ 352.804006] [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
> [ 352.804006] [<ffffffff8132d7b5>] ? mutex_lock_nested+0x45/0x320
> [ 352.804006] [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
> [ 352.804006] [<ffffffff8106de85>] ? trace_hardirqs_on_caller+0x145/0x190
> [ 352.804006] [<ffffffff8106dedd>] ? trace_hardirqs_on+0xd/0x10
> [ 352.804006] [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
> [ 352.804006] [<ffffffffa03f3335>] ? i915_gem_idle+0x225/0x330 [i915]
> [ 352.804006] [<ffffffffa03f34c7>] ? i915_gem_leavevt_ioctl+0x37/0x50 [i915]
> [ 352.804006] [<ffffffffa03bdafd>] ? drm_ioctl+0x17d/0x3c0 [drm]
> [ 352.804006] [<ffffffffa03f3490>] ? i915_gem_leavevt_ioctl+0x0/0x50 [i915]
so what is going on is that the i915 driver has obviously torn down some
state before it uninstalls the irq, so the irq happens when the state has
already been torn down, and the irq handler is not ready for that.
This patch *may* fix it - simply by getting rid of the irq early. However,
I did not check whether maybe something in i915_gem_idle() actually needs
the interrupt to be able to happen, so this is TOTALLY UNTESTED!
Linus
---
drivers/gpu/drm/i915/i915_gem.c | 6 +-----
1 files changed, 1 insertions(+), 5 deletions(-)
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 7edb5b9..80e5ba4 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -4232,15 +4232,11 @@ int
 i915_gem_leavevt_ioctl(struct drm_device *dev, void *data,
 		       struct drm_file *file_priv)
 {
-	int ret;
-
 	if (drm_core_check_feature(dev, DRIVER_MODESET))
 		return 0;
 
-	ret = i915_gem_idle(dev);
 	drm_irq_uninstall(dev);
-
-	return ret;
+	return i915_gem_idle(dev);
 }
 
 void
^ permalink raw reply related [flat|nested] 286+ messages in thread
* Re: [Bug #13819] system freeze when switching to console
@ 2009-09-08 18:20 ` Jesse Barnes
0 siblings, 0 replies; 286+ messages in thread
From: Jesse Barnes @ 2009-09-08 18:20 UTC (permalink / raw)
To: Linus Torvalds
Cc: reinette chatre, Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Eric Anholt, Ma, Ling, bugzilla-daemon
On Tue, 8 Sep 2009 11:06:21 -0700 (PDT)
Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>
> On Tue, 8 Sep 2009, reinette chatre wrote:
> >
> > As you can see from the kernel version it is not a build of a
> > vanilla kernel. It only contains changes related to the wireless
> > networking work I am doing.
> >
> > Here is the output:
>
> Thanks, this is great. It pinpoints the problem very effectively.
>
> > [ 352.803960] BUG: unable to handle kernel NULL pointer dereference at 0000000000000084
> > [ 352.804006] IP: [<ffffffffa03ecaab>] i915_driver_irq_handler+0x26b/0xd20 [i915]
>
> The code here is
>
> 16: 48 8b 80 00 01 00 00 mov 0x100(%rax),%rax
> 1d: 48 8b 50 08 mov 0x8(%rax),%rdx
> 21: 48 85 d2 test %rdx,%rdx
> 24: 74 11 je 0x37
> 26: 49 8b 44 24 78 mov 0x78(%r12),%rax
> 2b:* 8b 80 84 00 00 00 mov 0x84(%rax),%eax <-- trapping instruction
> 31: 89 82 08 08 00 00 mov %eax,0x808(%rdx)
> 37: f6 45 a0 02 testb $0x2,-0x60(%rbp)
>
> and that "testb $0x2, -0x60(%rbp)" seems to be the
>
> if (iir & I915_USER_INTERRUPT) {
>
> test if I'm reading things right. Although it could also be the
>
> if (eir & I915_ERROR_MEMORY_REFRESH) {
>
> thing. The disassembly is totally impossible to read, because the
> stupid i915 driver is chock-full of crap like
>
> if (IS_G4X(dev)) {
> ..
>
> which expands to insane amounts of code that check the PCI ID's one
> by one.
>
> Intel guys: could you _please_ stop doing that. Create a capability
> mask in the device or something, so that you can test for "is this a
> G4x" with a single bit test, rather than have code like this:
>
> mov 0x31c(%rsi),%eax
> cmp $0x2982,%eax
> je 0xffffffff8124b669 <i915_driver_irq_handler+177>
> cmp $0x2972,%eax
> je 0xffffffff8124b669 <i915_driver_irq_handler+177>
> cmp $0x2992,%eax
> je 0xffffffff8124b669 <i915_driver_irq_handler+177>
> cmp $0x29a2,%eax
> je 0xffffffff8124b669 <i915_driver_irq_handler+177>
> cmp $0x2a02,%eax
> je 0xffffffff8124b669 <i915_driver_irq_handler+177>
> cmp $0x2a12,%eax
> je 0xffffffff8124b669 <i915_driver_irq_handler+177>
> cmp $0x2a42,%eax
> je 0xffffffff8124b669 <i915_driver_irq_handler+177>
> cmp $0x2e02,%eax
> je 0xffffffff8124b669 <i915_driver_irq_handler+177>
> cmp $0x2e12,%eax
> je 0xffffffff8124b669 <i915_driver_irq_handler+177>
> cmp $0x2e22,%eax
> je 0xffffffff8124b669 <i915_driver_irq_handler+177>
> cmp $0x2e32,%eax
> je 0xffffffff8124b669 <i915_driver_irq_handler+177>
> cmp $0x42,%eax
> je 0xffffffff8124b669 <i915_driver_irq_handler+177>
>
> for that IS_G4X() thing (I'm not kidding - that's exactly a hundred
> bytes of code for that _stupid_ test, and it's inlined!)
Yeah things are getting a bit out of hand there... We've moved to
feature tests for some things, but they're still PCI ID based; however
they should be easy to convert.
>
> Anyway, we're getting that DRM irq, and it has a normal IRQ stack
> trace:
>
> > [ 352.804006] Process Xorg (pid: 4424, threadinfo ffff8800b6b1a000, task ffff880037373c00)
> > [ 352.804006] Call Trace:
> > [ 352.804006] <IRQ>
> > [ 352.804006] [<ffffffff8106db7d>] ? mark_held_locks+0x6d/0x90
> > [ 352.804006] [<ffffffff81098ee8>] handle_IRQ_event+0x68/0x170
> > [ 352.804006] [<ffffffff8109ac01>] handle_edge_irq+0xc1/0x160
> > [ 352.804006] [<ffffffff8100e76f>] handle_irq+0x1f/0x30
> > [ 352.804006] [<ffffffff8100dc6a>] do_IRQ+0x6a/0xf0
> > [ 352.804006] [<ffffffff8100c793>] ret_from_intr+0x0/0xf
>
> .. but it happened just as we're tearing down the DRM irq handling:
>
> > [ 352.804006] <EOI>
> > [ 352.804006] [<ffffffff81070b88>] ? lock_acquire+0xe8/0x100
> > [ 352.804006] [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
> > [ 352.804006] [<ffffffff8132d7b5>] ? mutex_lock_nested+0x45/0x320
> > [ 352.804006] [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
> > [ 352.804006] [<ffffffff8106de85>] ? trace_hardirqs_on_caller+0x145/0x190
> > [ 352.804006] [<ffffffff8106dedd>] ? trace_hardirqs_on+0xd/0x10
> > [ 352.804006] [<ffffffffa03c0b85>] ? drm_irq_uninstall+0x65/0x180 [drm]
> > [ 352.804006] [<ffffffffa03f3335>] ? i915_gem_idle+0x225/0x330 [i915]
> > [ 352.804006] [<ffffffffa03f34c7>] ? i915_gem_leavevt_ioctl+0x37/0x50 [i915]
> > [ 352.804006] [<ffffffffa03bdafd>] ? drm_ioctl+0x17d/0x3c0 [drm]
> > [ 352.804006] [<ffffffffa03f3490>] ? i915_gem_leavevt_ioctl+0x0/0x50 [i915]
>
> so what is going on is that the i915 driver has obviously torn down
> some state before it uninstalls the irq, so the irq happens when the
> state has already been torn down, and the irq handler is not ready
> for that.
>
> This patch *may* fix it - simply by getting rid of the irq early.
> However, I did not check whether maybe something in i915_gem_idle()
> actually needs the interrupt to be able to happen, so this is TOTALLY
> UNTESTED!
>
> Linus
> ---
> drivers/gpu/drm/i915/i915_gem.c | 6 +-----
> 1 files changed, 1 insertions(+), 5 deletions(-)
>
> diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
> index 7edb5b9..80e5ba4 100644
> --- a/drivers/gpu/drm/i915/i915_gem.c
> +++ b/drivers/gpu/drm/i915/i915_gem.c
> @@ -4232,15 +4232,11 @@ int
> i915_gem_leavevt_ioctl(struct drm_device *dev, void *data,
> struct drm_file *file_priv)
> {
> - int ret;
> -
> if (drm_core_check_feature(dev, DRIVER_MODESET))
> return 0;
>
> - ret = i915_gem_idle(dev);
> drm_irq_uninstall(dev);
> -
> - return ret;
> + return i915_gem_idle(dev);
> }
Theoretically i915_gem_idle should prevent any user interrupts from
coming in. If we uninstall the IRQ first, i915_gem_idle probably
won't work anymore, since it queues an interrupt and waits for it.
Eric, any thoughts on this? We shouldn't be racing to queue new work
after the idle call since we suspend GEM at that point, so we must be
failing to manage our active lists properly somehow?
--
Jesse Barnes, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13819] system freeze when switching to console
2009-09-08 18:20 ` Jesse Barnes
@ 2009-09-08 19:26 ` Linus Torvalds
-1 siblings, 0 replies; 286+ messages in thread
From: Linus Torvalds @ 2009-09-08 19:26 UTC (permalink / raw)
To: Jesse Barnes
Cc: reinette chatre, Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Eric Anholt, Ma, Ling, bugzilla-daemon
On Tue, 8 Sep 2009, Jesse Barnes wrote:
>
> Theoretically i915_gem_idle should prevent any user interrupts from
> coming in.
That is _entirely_ immaterial.
The thing is, interrupts can be shared. So it does not matter ONE WHIT
that you are trying to idle the hardware - there may be _other_ hardware
in the machine that is not idle, and that raises the same shared
interrupt. End result: the irq handler will be called, whether your
particular hardware is idle or not.
So if you tear down data structures that the interrupt handler needs, you
_ABSOLUTELY_ must first unregister the whole interrupt.
Also, even if there are no shared interrupts or any other devices, there
can easily be old pending interrupts still queued up on IO-APIC's etc. So
even though you quiesce the hardware, there is no guarantee that there
aren't some pending interrupts that happened just before you turned off
the interrupt from the hardware side, and are still "en route" to the CPU.
Which gets us exactly the same rule as if there were shared interrupts: if
your interrupt handler depends on some data structure, you must tear down
the interrupt handler _before_ you tear down the data structures it
depends on (and in the reverse order when setting things up, of course).
> If we uninstall the IRQ first, i915_gem_idle probably
> won't work anymore, since it queues an interrupt and waits for it.
So then you'd better fix that. Because the code as is is very
fundamentally buggy.
> Eric, any thoughts on this? We shouldn't be racing to queue new work
> after the idle call since we suspend GEM at that point, so we must be
> failing to manage our active lists properly somehow?
See my previous email. The bug is that you do
i915_gem_cleanup_ringbuffer ->
i915_gem_cleanup_hws ->
dev_priv->hw_status_page = NULL;
while interrupts are still enabled and coming in. And the interrupt path
wants to access that hw_status_page. Which you just destroyed.
Linus
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13819] system freeze when switching to console
@ 2009-09-08 19:31 ` Jesse Barnes
0 siblings, 0 replies; 286+ messages in thread
From: Jesse Barnes @ 2009-09-08 19:31 UTC (permalink / raw)
To: Linus Torvalds
Cc: reinette chatre, Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Eric Anholt, Ma, Ling, bugzilla-daemon
On Tue, 8 Sep 2009 12:26:45 -0700 (PDT)
Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>
> On Tue, 8 Sep 2009, Jesse Barnes wrote:
> >
> > Theoretically i915_gem_idle should prevent any user interrupts from
> > coming in.
>
> That is _entirely_ immaterial.
>
> The thing is, interrupts can be shared. So it does not matter ONE
> WHIT that you are trying to idle the hardware - there may be _other_
> hardware in the machine that is not idle, and that raises the same
> shared interrupt. End result: the irq handler will be called, whether
> your particular hardware is idle or not.
Which is fine. We can handle interrupts in the shared case. It's
specific IRQ statuses we can't handle. E.g. if we've explicitly turned
off vblank events we definitely won't expect to see them in the handler
(assuming we've taken care to barrier things like you mention below).
> So if you tear down data structures that the interrupt handler needs,
> you _ABSOLUTELY_ must first unregister the whole interrupt.
>
> Also, even if there are no shared interrupts or any other devices,
> there can easily be old pending interrupts still queued up on
> IO-APIC's etc. So even though you quiesce the hardware, there is no
> guarantee that there aren't some pending interrupts that happened
> just before you turned off the interrupt from the hardware side, and
> are still "en route" to the CPU.
The way we barrier things should handle that case.
> Which gets us exactly the same rule as if there were shared
> interrupts: if your interrupt handler depends on some data structure,
> you must tear down the interrupt handler _before_ you tear down the
> data structures it depends on (and in the reverse order when setting
> things up, of course).
>
> > If we uninstall the IRQ first, i915_gem_idle probably
> > won't work anymore, since it queues an interrupt and waits for it.
>
> So then you'd better fix that. Because the code as is is very
> fundamentally buggy.
>
> > Eric, any thoughts on this? We shouldn't be racing to queue new
> > work after the idle call since we suspend GEM at that point, so we
> > must be failing to manage our active lists properly somehow?
>
> See my previous email. The bug is that you do
>
> i915_gem_cleanup_ringbuffer ->
> i915_gem_cleanup_hws ->
> dev_priv->hw_status_page = NULL;
>
> while interrupts are still enabled and coming in. And the interrupt
> path wants to access that hw_status_page. Which you just destroyed.
Yeah, saw that. I don't think that's the root cause though. If we see
a user interrupt after gem_idle is called we may have serious issues in
our command handling code.
--
Jesse Barnes, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13819] system freeze when switching to console
2009-09-08 19:31 ` Jesse Barnes
@ 2009-09-08 22:06 ` Linus Torvalds
-1 siblings, 0 replies; 286+ messages in thread
From: Linus Torvalds @ 2009-09-08 22:06 UTC (permalink / raw)
To: Jesse Barnes
Cc: reinette chatre, Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Eric Anholt, Ma, Ling, bugzilla-daemon
On Tue, 8 Sep 2009, Jesse Barnes wrote:
>
> Yeah, saw that. I don't think that's the root cause though. If we see
> a user interrupt after gem_idle is called we may have serious issues in
> our command handling code.
Quite frankly, I do not understand why you seem to be making excuses for
code that causes a very nasty and undebuggable oops, causing the machine
to die.
This regression is almost two months old, and apparently the Intel
graphics people DID ABSOLUTELY NOTHING about it during those two months,
because they couldn't be bothered to look at it.
And now, when I pinpointed exactly where the oops happens, and what the
cause is, you seem to be trying to hold things up. I wanted to do the
final 2.6.31 release yesterday. Quite frankly, I'm not in the _least_
interested in excuses, I'm interested in something that at least gets us
back to the 2.6.30 state that doesn't oops!
Get me a patch, please. If disabling the interrupts early won't work, get
me something else. Stop delaying it - it's been pending for 48 days
already.
Linus
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13819] system freeze when switching to console
@ 2009-09-08 22:11 ` Jesse Barnes
0 siblings, 0 replies; 286+ messages in thread
From: Jesse Barnes @ 2009-09-08 22:11 UTC (permalink / raw)
To: Linus Torvalds
Cc: reinette chatre, Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Eric Anholt, Ma, Ling, bugzilla-daemon
On Tue, 8 Sep 2009 15:06:21 -0700 (PDT)
Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>
> On Tue, 8 Sep 2009, Jesse Barnes wrote:
> >
> > Yeah, saw that. I don't think that's the root cause though. If we
> > see a user interrupt after gem_idle is called we may have serious
> > issues in our command handling code.
>
> Quite frankly, I do not understand why you seem to be making excuses
> for code that causes a very nasty and undebuggable oops, causing the
> machine to die.
No excuses. This is a serious bug; I just don't want to paper over it.
> This regression is almost two months old, and apparently the Intel
> graphics people DID ABSOLUTELY NOTHING about it during those two
> months, because they couldn't be bothered to look at it.
Yeah sorry, this is the first I've seen of it... I usually troll the
regressions lists but I must have missed this one.
> And now, when I pinpointed exactly where the oops happens, and what
> the cause is, you seem to be trying to hold things up. I wanted to do
> the final 2.6.31 release yesterday, quite frankly I'm not in the
> _least_ interested in excuses, I'm interested in something that at
> least gets us back to the 2.6.30 state that doesn't oops!
>
> Get me a patch, please. If disabling the interrupts early won't work,
> get me something else. Stop delaying it - it's been pending for 48
> days already.
Sure, looking at it now.
--
Jesse Barnes, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13819] system freeze when switching to console
2009-09-08 22:11 ` Jesse Barnes
@ 2009-09-08 23:36 ` Linus Torvalds
-1 siblings, 0 replies; 286+ messages in thread
From: Linus Torvalds @ 2009-09-08 23:36 UTC (permalink / raw)
To: Jesse Barnes
Cc: reinette chatre, Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Eric Anholt, Ma, Ling, bugzilla-daemon
On Tue, 8 Sep 2009, Jesse Barnes wrote:
> > This regression is almost two months old, and apparently the Intel
> > graphics people DID ABSOLUTELY NOTHING about it during those two
> > months, because they couldn't be bothered to look at it.
>
> Yeah sorry, this is the first I've seen of it... I usually troll the
> regressions lists but I must have missed this one.
Hmm. We must have screwed up something, because this was bisected to the
intel DRI commits back in July. See
http://bugzilla.kernel.org/show_bug.cgi?id=13819#c4
and while there was some confusion about exactly which commit caused
it - probably because the irq thing obviously depends on timing -
Reinette had a list of three commits that he used to be able to revert to
get things going:
drm/i915: Don't update display FIFO watermark on IGDNG
drm/i915: add FIFO watermark support
drm/i915: enable error detection & state collection
So Andrew assigned it to DRI, and Rafael has had both Eric and Ma Ling on
the cc for his regression reports because of the bisection. And that has
been going on for a long time, I just checked:
Date: Sun, 26 Jul 2009 22:28:26 +0200 (CEST)
From: Rafael J. Wysocki <rjw@sisk.pl>
To: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
Cc: Kernel Testers List <kernel-testers@vger.kernel.org>, Eric Anholt <eric@anholt.net>, "ling.ma@intel.com" <ling.ma@intel.com>,
Linus Torvalds <torvalds@linux-foundation.org>, Ma Ling <ling.ma@intel.com>, Reinette Chatre <reinette.chatre@intel.com>
Subject: [Bug #13819] system freeze when switching to console
If you didn't see it, then that means that we have screw-ups with the
bugzilla thing. You're actually listed as a "Reviewed-by" on the commit
that the fixed-up bisection blamed - And I get the feeling that Rafael's
bugzilla "bugme" scripts may only pick up "Signed-off-by:" lines.
The point is: this bug has been bisected in bugzilla for a month and a
half, and had at least two Intel DRI people cc'd on the weekly reminder
reports, along with being
Assigned To: drivers_video-dri@kernel-bugs.osdl.org
We have other bugs on the regression list that are even older (no, I'm not
proud of them):
http://bugzilla.kernel.org/show_bug.cgi?id=13809
http://bugzilla.kernel.org/show_bug.cgi?id=13740
http://bugzilla.kernel.org/show_bug.cgi?id=13733
http://bugzilla.kernel.org/show_bug.cgi?id=13645
but they aren't bisected and it's not nearly as clear what is going on
there. The last one in particular I don't know if it even happens any
more and the first one seems to be fixed in -rc5, or at least the
reporter couldn't reproduce it any more..
Linus
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13819] system freeze when switching to console
2009-09-08 23:36 ` Linus Torvalds
(?)
@ 2009-09-08 23:45 ` Jesse Barnes
-1 siblings, 0 replies; 286+ messages in thread
From: Jesse Barnes @ 2009-09-08 23:45 UTC (permalink / raw)
To: Linus Torvalds
Cc: reinette chatre, Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Eric Anholt, Ma, Ling, bugzilla-daemon
On Tue, 8 Sep 2009 16:36:06 -0700 (PDT)
Linus Torvalds <torvalds@linux-foundation.org> wrote:
>
>
> On Tue, 8 Sep 2009, Jesse Barnes wrote:
> > > This regression is almost two months old, and apparently the
> > > Intel graphics people DID ABSOLUTELY NOTHING about it during
> > > those two months, because they couldn't be bothered to look at it.
> >
> > Yeah sorry, this is the first I've seen of it... I usually troll
> > the regressions lists but I must have missed this one.
>
> Hmm. We must have screwed up something, because this was bisected to
> the intel DRI commits back in July. See
>
> http://bugzilla.kernel.org/show_bug.cgi?id=13819#c4
>
> and while there was some confusion about exactly which commit caused
> it - probably because the irq thing obviously depends on timing -
> Reinette had a list of three commits that he used to be able to
> revert to get things going:
>
> drm/i915: Don't update display FIFO watermark on IGDNG
> drm/i915: add FIFO watermark support
> drm/i915: enable error detection & state collection
>
> So Andrew assigned it to DRI, and Rafael has had both Eric and Ma
> Ling on the cc for his regression reports because of the bisection.
> And that has been going on for a long time, I just checked:
>
> Date: Sun, 26 Jul 2009 22:28:26 +0200 (CEST)
> From: Rafael J. Wysocki <rjw@sisk.pl>
> To: Linux Kernel Mailing List <linux-kernel@vger.kernel.org>
> Cc: Kernel Testers List <kernel-testers@vger.kernel.org>, Eric
> Anholt <eric@anholt.net>, "ling.ma@intel.com" <ling.ma@intel.com>,
> Linus Torvalds <torvalds@linux-foundation.org>, Ma Ling
> <ling.ma@intel.com>, Reinette Chatre <reinette.chatre@intel.com>
> Subject: [Bug #13819] system freeze when switching to console
>
> If you didn't see it, then that means that we have screw-ups with the
> bugzilla thing. You're actually listed as a "Reviewed-by" on the
> commit that the fixed-up bisection blamed - And I get the feeling
> that Rafael's bugzilla "bugme" scripts may only pick up
> "Signed-off-by:" lines.
Reinette actually mailed me offlist about this; we corresponded
privately about this issue a month ago; I lost track of it while on
vacation (yeah I'm not on the cc lists for the bz or regression
updates). Totally my fault.
Anyway the bisects look like they might just be lucky; it sounds like
this wasn't a KMS related issue at all...
> We have other bugs on the regression list that are even older (no,
> I'm not proud of them):
>
> http://bugzilla.kernel.org/show_bug.cgi?id=13740
This one looks gfx related, upstream bug is
https://bugs.freedesktop.org/show_bug.cgi?id=23096.
The graphics group tracks freedesktop.org bugs on a weekly basis since
that's where a vast majority of our bugs are filed (often from OSVs);
I'll get the kernel bugzilla stuff included in our future scrubs so we
don't miss stuff like this.
--
Jesse Barnes, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13819] system freeze when switching to console
2009-09-08 22:06 ` Linus Torvalds
(?)
(?)
@ 2009-09-08 23:05 ` Jesse Barnes
2009-09-08 23:56 ` reinette chatre
-1 siblings, 1 reply; 286+ messages in thread
From: Jesse Barnes @ 2009-09-08 23:05 UTC (permalink / raw)
To: Linus Torvalds
Cc: reinette chatre, Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Eric Anholt, Ma, Ling, bugzilla-daemon
On Tue, 8 Sep 2009 15:06:21 -0700 (PDT)
Linus Torvalds <torvalds@linux-foundation.org> wrote:
> And now, when I pinpointed exactly where the oops happens, and what
> the cause is, you seem to be trying to hold things up. I wanted to do
> the final 2.6.31 release yesterday, quite frankly I'm not in the
> _least_ interested in excuses, I'm interested in something that at
> least gets us back to the 2.6.30 state that doesn't oops!
Based on the earlier mail I thought this might have been a bigger
problem with the way we handle command submission and completion; but
on looking at things again (both Linus's debugging and your
configuration), I think this is actually a DRI1 & userspace related
issue. Back in the DRI1 days, the X server told the driver when to
register and unregister its irq handler, and had some responsibility
for making sure it didn't hose things (very easy to do with the old
architecture). Stuff like this was one of the main reasons we moved
most of the handling of this into the kernel...
We obviously need a kernel fix though; panics like this aren't
acceptable.
This fix is along the lines of Linus's initial suggestion; we
definitely are tearing down some state that the interrupt handler
needs. And the 2D driver isn't saving us from ourselves like it used
to (previously it would uninstall the IRQ handler before tearing down
the mappings; but with the kernel in charge of those now, we have to
handle it).
This one should disable i915 interrupts (we'll still handle shared ones
just fine as no-ops) at the point where we no longer need them, then
let the DRM core code take care of finally unregistering it.
Ugly, but I'd like to know if it works for you. Any chance you could
give it a try Reinette?
--
Jesse Barnes, Intel Open Source Technology Center
diff --git a/drivers/gpu/drm/i915/i915_gem.c b/drivers/gpu/drm/i915/i915_gem.c
index 0767521..487d902 100644
--- a/drivers/gpu/drm/i915/i915_gem.c
+++ b/drivers/gpu/drm/i915/i915_gem.c
@@ -3990,6 +3990,7 @@ i915_gem_idle(struct drm_device *dev)
return ret;
}
+ i915_driver_irq_uninstall(dev);
i915_gem_cleanup_ringbuffer(dev);
mutex_unlock(&dev->struct_mutex);
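As a rough illustration of the "shared ones as no-ops" point in the description above, here is a user-space model. All names (fake_i915, fake_irq_handler, fake_irq_disable, the enabled/pending fields) are hypothetical stand-ins, not driver code. The assumption that a handler reports "not mine" when none of its enabled sources are pending is the general convention for shared interrupt lines, not something shown in this thread.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical return codes, modelled on the usual "none"/"handled" split. */
enum fake_irqreturn { FAKE_IRQ_NONE, FAKE_IRQ_HANDLED };

/* Hypothetical device model: which sources are enabled, which are pending. */
struct fake_i915 {
	uint32_t enabled;
	uint32_t pending;
};

/* Models turning off the device's interrupt sources while the handler
 * itself stays registered on the (possibly shared) line. */
static void fake_irq_disable(struct fake_i915 *dev)
{
	dev->enabled = 0;
}

/* Handler that is invoked for every interrupt on the shared line. */
static enum fake_irqreturn fake_irq_handler(struct fake_i915 *dev)
{
	uint32_t ours;

	assert(dev != NULL);
	ours = dev->pending & dev->enabled;
	if (ours == 0)
		return FAKE_IRQ_NONE;	/* not ours: touch no device state */

	dev->pending &= ~ours;		/* acknowledge what we handled */
	return FAKE_IRQ_HANDLED;
}
```

In this model, once fake_irq_disable() has run, the handler returns FAKE_IRQ_NONE without touching any state, so it no longer matters that the data it would otherwise read has been torn down.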
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13819] system freeze when switching to console
2009-09-08 23:05 ` Jesse Barnes
@ 2009-09-08 23:56 ` reinette chatre
0 siblings, 0 replies; 286+ messages in thread
From: reinette chatre @ 2009-09-08 23:56 UTC (permalink / raw)
To: Jesse Barnes
Cc: Linus Torvalds, Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Eric Anholt, Ma, Ling, bugzilla-daemon
On Tue, 2009-09-08 at 16:05 -0700, Jesse Barnes wrote:
> Any chance you could
> give it a try Reinette?
This patch also solves the issue for me.
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Thank you very much
Reinette
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13819] system freeze when switching to console
@ 2009-09-08 19:19 ` Linus Torvalds
0 siblings, 0 replies; 286+ messages in thread
From: Linus Torvalds @ 2009-09-08 19:19 UTC (permalink / raw)
To: reinette chatre
Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Eric Anholt, Ma, Ling, bugzilla-daemon
On Tue, 8 Sep 2009, Linus Torvalds wrote:
>
> The code here is
>
> 16: 48 8b 80 00 01 00 00 mov 0x100(%rax),%rax
> 1d: 48 8b 50 08 mov 0x8(%rax),%rdx
> 21: 48 85 d2 test %rdx,%rdx
> 24: 74 11 je 0x37
> 26: 49 8b 44 24 78 mov 0x78(%r12),%rax
> 2b:* 8b 80 84 00 00 00 mov 0x84(%rax),%eax <-- trapping instruction
> 31: 89 82 08 08 00 00 mov %eax,0x808(%rdx)
> 37: f6 45 a0 02 testb $0x2,-0x60(%rbp)
>
> and that "testb $0x2, -0x60(%rbp)" seems to be the
>
> if (iir & I915_USER_INTERRUPT) {
Yeah, that seems to be the right thing.
So the actual faulting instruction is from this:
if (dev->primary->master) {
master_priv = dev->primary->master->driver_priv;
if (master_priv->sarea_priv)
master_priv->sarea_priv->last_dispatch =
READ_BREADCRUMB(dev_priv);
and it looks like %rax starts out being 'dev', then the
mov 0x100(%rax),%rax
means that %rax is now 'dev->primary', and then
mov 0x8(%rax),%rdx
moves 'dev->primary->master' into %rdx. It's not zero, so we then do that
READ_BREADCRUMB(dev_priv), which expands to
READ_HWSP(dev_priv, I915_BREADCRUMB_INDEX)
which in turn is
(((volatile u32*)(dev_priv->hw_status_page))[reg])
and it looks like dev_priv->hw_status_page is NULL.
You can verify this by looking at the exception address:
BUG: unable to handle kernel NULL pointer dereference at 0000000000000084
and that '84' is I915_BREADCRUMB_INDEX*4 (0x21*4).
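A minimal user-space sketch of that arithmetic, assuming only the index value 0x21 quoted above and 32-bit status-page entries. The helper names are hypothetical and nothing here is driver code; addresses are computed as plain integers, so nothing is dereferenced.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Value quoted in the mail above; the real definition lives in the driver. */
#define BREADCRUMB_INDEX 0x21

/*
 * Address touched by ((volatile u32 *)base)[index], computed with integer
 * arithmetic only.
 */
static uintptr_t hwsp_access_address(uintptr_t base, size_t index)
{
	return base + index * sizeof(uint32_t);
}

/* Hypothetical helper: the address a NULL status page would fault at. */
static uintptr_t null_page_fault_address(void)
{
	uintptr_t addr = hwsp_access_address(0, BREADCRUMB_INDEX);

	assert(addr == 0x84);	/* 0x21 * 4, matching the BUG line */
	return addr;
}
```

With a NULL base the faulting address is just the byte offset, which is why the low bits of the address in the oops identify the field being read.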
And the problem seems to be that we've cleared the hw_status_page pointer
in i915_gem_cleanup_hws():
dev_priv->hw_status_page = NULL;
and we did that in
i915_gem_idle() ->
i915_gem_cleanup_ringbuffer() ->
i915_gem_cleanup_hws()
so now since interrupts are still enabled, you'll get a NULL pointer
dereference.
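The sequence above can be modelled in user space roughly as follows. This is a sketch, not the driver: struct fake_dev and the fake_* functions are hypothetical stand-ins. The only facts taken from the analysis are that the teardown path sets hw_status_page to NULL and that the handler reads through it while interrupts are still enabled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's private state. */
struct fake_dev {
	bool irq_enabled;		/* models the device interrupt enable */
	uint32_t *hw_status_page;	/* NULL once the status page is gone */
};

/* Models the handler: true if it would have read through a NULL page. */
static bool fake_irq_would_oops(const struct fake_dev *dev)
{
	assert(dev != NULL);
	if (!dev->irq_enabled)
		return false;			/* interrupt never delivered */
	return dev->hw_status_page == NULL;	/* the READ_BREADCRUMB case */
}

/* Teardown as described above: page cleared, interrupts left on. */
static void fake_idle_buggy(struct fake_dev *dev)
{
	dev->hw_status_page = NULL;
}

/* Teardown with interrupts turned off before the page goes away. */
static void fake_idle_fixed(struct fake_dev *dev)
{
	dev->irq_enabled = false;
	dev->hw_status_page = NULL;
}
```

After fake_idle_buggy() the model reports that a late interrupt would oops; after fake_idle_fixed() it does not, which is the ordering argument in a nutshell.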
I think my patch is correct.
Linus
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13819] system freeze when switching to console
@ 2009-09-08 22:37 ` reinette chatre
0 siblings, 0 replies; 286+ messages in thread
From: reinette chatre @ 2009-09-08 22:37 UTC (permalink / raw)
To: Linus Torvalds
Cc: Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Eric Anholt, Ma, Ling, bugzilla-daemon
On Tue, 2009-09-08 at 11:06 -0700, Linus Torvalds wrote:
> so this is TOTALLY UNTESTED!
I understand that the discussion is still going on whether this is the
right thing to do. Even so, I thought you may like to know that with
this patch I can again switch to console, back again, hibernate, and
shut down .. all without crashing my system.
Tested-by: Reinette Chatre <reinette.chatre@intel.com>
Thank you very much!
Reinette
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13819] system freeze when switching to console
2009-09-08 22:37 ` reinette chatre
(?)
@ 2009-09-08 23:16 ` Jesse Barnes
2009-09-08 23:27 ` reinette chatre
-1 siblings, 1 reply; 286+ messages in thread
From: Jesse Barnes @ 2009-09-08 23:16 UTC (permalink / raw)
To: reinette chatre
Cc: Linus Torvalds, Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Eric Anholt, Ma, Ling, bugzilla-daemon
On Tue, 08 Sep 2009 15:37:41 -0700
reinette chatre <reinette.chatre@intel.com> wrote:
> On Tue, 2009-09-08 at 11:06 -0700, Linus Torvalds wrote:
> > so this is TOTALLY UNTESTED!
>
> I understand that the discussion is still going on whether this is the
> right thing to do. Even so, I thought you may like to know that with
> this patch I can again switch to console, back again, hibernate, and
> shut down .. all without crashing my system.
>
> Tested-by: Reinette Chatre <reinette.chatre@intel.com>
>
> Thank you very much!
Do you see "hardware wedged" messages in your log after using Linus's
patch? That's what I'd expect... ah no I see we don't call the
routine that requires interrupts in that path like I thought.
So Linus's patch is fine with me.
Acked-by: Jesse Barnes <jbarnes@virtuousgeek.org>
Sorry Linus, you were right; I was making this more complicated than it
had to be.
--
Jesse Barnes, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13819] system freeze when switching to console
2009-09-08 23:16 ` Jesse Barnes
@ 2009-09-08 23:27 ` reinette chatre
0 siblings, 0 replies; 286+ messages in thread
From: reinette chatre @ 2009-09-08 23:27 UTC (permalink / raw)
To: Jesse Barnes
Cc: Linus Torvalds, Rafael J. Wysocki, Linux Kernel Mailing List,
Kernel Testers List, Eric Anholt, Ma, Ling, bugzilla-daemon
On Tue, 2009-09-08 at 16:16 -0700, Jesse Barnes wrote:
> Do you see "hardware wedged" messages in your log after using Linus's
> patch? That's what I'd expect... ah no I see we don't call the
> routine that requires interrupts in that path like I thought.
I can confirm that. While using this patch, when I am in X and then
switch to console and back to X there are no new messages (checked with
dmesg).
Reinette
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13819] system freeze when switching to console
2009-09-06 17:24 ` Rafael J. Wysocki
(?)
(?)
@ 2009-09-08 17:24 ` Jesse Barnes
-1 siblings, 0 replies; 286+ messages in thread
From: Jesse Barnes @ 2009-09-08 17:24 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux Kernel Mailing List, Kernel Testers List, Eric Anholt,
ling.ma, Linus Torvalds, Reinette Chatre
On Sun, 6 Sep 2009 19:24:50 +0200 (CEST)
"Rafael J. Wysocki" <rjw@sisk.pl> wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.30. Please verify if it still should be listed and let me
> know (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13819
> Subject : system freeze when switching to console
> Submitter : Reinette Chatre <reinette.chatre@intel.com>
> Date : 2009-07-23 17:57 (46 days old)
So simply switching VTs causes this problem too? Based on your initial
description it sounds like a panic (keyboard LEDs were flashing). If
it happens at VT switch time you should be able to capture the panic
output with netconsole like Linus mentioned.
--
Jesse Barnes, Intel Open Source Technology Center
^ permalink raw reply [flat|nested] 286+ messages in thread
* 2.6.31-rc6-git5: Reported regressions from 2.6.30
@ 2009-08-19 20:20 Rafael J. Wysocki
2009-08-19 20:26 ` [Bug #13819] system freeze when switching to console Rafael J. Wysocki
0 siblings, 1 reply; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-19 20:20 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Adrian Bunk, Andrew Morton, Linus Torvalds, Natalie Protasevich,
Kernel Testers List, Network Development, Linux ACPI,
Linux PM List, Linux SCSI List, Linux Wireless List, DRI
This message contains a list of some regressions from 2.6.30, for which there
are no fixes in the mainline I know of. If any of them have been fixed already,
please let me know.
If you know of any other unresolved regressions from 2.6.30, please let me know
as well and I'll add them to the list. Also, please let me know if any of the
entries below are invalid.
Each entry from the list will be sent additionally in an automatic reply to
this message with CCs to the people involved in reporting and handling the
issue.
Listed regressions statistics:
Date Total Pending Unresolved
----------------------------------------
2009-08-20 102 32 29
2009-08-10 89 27 24
2009-08-02 76 36 28
2009-07-27 70 51 43
2009-07-07 35 25 21
2009-06-29 22 22 15
Unresolved regressions
----------------------
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14018
Subject : kernel freezes, inotify problem
Submitter : Christoph Thielecke <christoph.thielecke@gmx.de>
Date : 2009-08-19 12:48 (1 days old)
References : http://marc.info/?l=linux-kernel&m=125068616818353&w=4
Handled-By : Eric Paris <eparis@parisplace.org>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14016
Subject : mm/ipw2200 regression
Submitter : Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Date : 2009-08-15 16:56 (5 days old)
References : http://marc.info/?l=linux-kernel&m=125036437221408&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14015
Subject : pty regressed again, breaking expect and gcc's testsuite
Submitter : Mikael Pettersson <mikpe@it.uu.se>
Date : 2009-08-14 23:41 (6 days old)
References : http://marc.info/?l=linux-kernel&m=125029329805643&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14014
Subject : kernel bug at shut down
Submitter : Norbert Preining <preining@logic.at>
Date : 2009-08-14 9:11 (6 days old)
References : http://marc.info/?l=linux-kernel&m=125024112418870&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14013
Subject : hd don't show up
Submitter : Tim Blechmann <tim@klingt.org>
Date : 2009-08-14 8:26 (6 days old)
References : http://marc.info/?l=linux-kernel&m=125023842514480&w=4
Handled-By : Tejun Heo <tj@kernel.org>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14012
Subject : latest git fried my x86_64 imac
Submitter : Justin P. Mattock <justinmattock@gmail.com>
Date : 2009-08-13 07:20 (7 days old)
References : http://marc.info/?l=linux-kernel&m=125014080427090&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14011
Subject : Kernel paging request failed in kmem_cache_alloc
Submitter : Matthias Dahl <ml_kernel@mortal-soul.de>
Date : 2009-08-10 22:26 (10 days old)
References : http://marc.info/?l=linux-kernel&m=124993603825082&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14003
Subject : Infinite loop on bootup while handling DMAR
Submitter : Bernhard Rosenkraenzer <bero@arklinux.org>
Date : 2009-08-18 14:54 (2 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14002
Subject : WARNING: at net/ipv4/af_inet.c:154 inet_sock_destruct+0x164/0x1c0()
Submitter : Ralf Hildebrandt <ralf.hildebrandt@charite.de>
Date : 2009-08-18 12:37 (2 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13987
Subject : Received NMI interrupt at resume
Submitter : Christian Casteyde <casteyde.christian@free.fr>
Date : 2009-08-15 07:55 (5 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13960
Subject : rtl8187 not connect to wifi
Submitter : okias <d.okias@gmail.com>
Date : 2009-08-10 19:16 (10 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13950
Subject : Oops when USB Serial disconnected while in use
Submitter : Bruno Prémont <bonbons@linux-vserver.org>
Date : 2009-08-08 17:47 (12 days old)
References : http://marc.info/?l=linux-kernel&m=124975432900466&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13947
Subject : Libertas: Association request to the driver failed
Submitter : Daniel Mack <daniel@caiaq.de>
Date : 2009-08-07 19:11 (13 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=57921c312e8cef72ba35a4cfe870b376da0b1b87
References : http://marc.info/?l=linux-kernel&m=124967234311481&w=4
Handled-By : Roel Kluin <roel.kluin@gmail.com>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13943
Subject : WARNING: at net/mac80211/mlme.c:2292 with ath5k
Submitter : Fabio Comolli <fabio.comolli@gmail.com>
Date : 2009-08-06 20:15 (14 days old)
References : http://marc.info/?l=linux-kernel&m=124958978600600&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13942
Subject : Troubles with AoE and uninitialized object
Submitter : Bruno Prémont <bonbons@linux-vserver.org>
Date : 2009-08-04 10:12 (16 days old)
References : http://marc.info/?l=linux-kernel&m=124938117104811&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13941
Subject : x86 Geode issue
Submitter : Martin-Éric Racine <q-funk@iki.fi>
Date : 2009-08-03 12:58 (17 days old)
References : http://marc.info/?l=linux-kernel&m=124930434732481&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13940
Subject : iwlagn and sky2 stopped working, ACPI-related
Submitter : Ricardo Jorge da Fonseca Marques Ferreira <storm@sys49152.net>
Date : 2009-08-07 22:33 (13 days old)
References : http://marc.info/?l=linux-kernel&m=124968457731107&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13935
Subject : 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version)
Submitter : Adrian Ulrich <kernel@blinkenlights.ch>
Date : 2009-08-08 22:08 (12 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fa047e4f6fa63a6e9d0ae4d7749538830d14a343
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13914
Subject : e1000e reports invalid NVM Checksum on 82566DM-2 (bisected)
Submitter : <jsbronder@gentoo.org>
Date : 2009-08-04 18:06 (16 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13906
Subject : Huawei E169 GPRS connection causes Ooops
Submitter : Clemens Eisserer <linuxhippy@gmail.com>
Date : 2009-08-04 09:02 (16 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13899
Subject : Oops from tar, 2.6.31-rc5, 32 bit on quad core phenom.
Submitter : Gene Heskett <gene.heskett@verizon.net>
Date : 2009-08-01 13:04 (19 days old)
References : http://marc.info/?l=linux-kernel&m=124913190304149&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13869
Subject : Radeon framebuffer (w/o KMS) corruption at boot.
Submitter : Duncan <1i5t5.duncan@cox.net>
Date : 2009-07-29 16:44 (22 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13848
Subject : iwlwifi (4965) regression since 2.6.30
Submitter : Lukas Hejtmanek <xhejtman@ics.muni.cz>
Date : 2009-07-26 7:57 (25 days old)
References : http://marc.info/?l=linux-kernel&m=124859658502866&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13836
Subject : suspend script fails, related to stdout?
Submitter : Tomas M. <tmezzadra@gmail.com>
Date : 2009-07-17 21:24 (34 days old)
References : http://marc.info/?l=linux-kernel&m=124785853811667&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13819
Subject : system freeze when switching to console
Submitter : Reinette Chatre <reinette.chatre@intel.com>
Date : 2009-07-23 17:57 (28 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13809
Subject : oprofile: possible circular locking dependency detected
Submitter : Jerome Marchand <jmarchan@redhat.com>
Date : 2009-07-22 13:35 (29 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13740
Subject : X server crashes with 2.6.31-rc2 when options are changed
Submitter : Michael S. Tsirkin <m.s.tsirkin@gmail.com>
Date : 2009-07-07 15:19 (44 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13733
Subject : 2.6.31-rc2: irq 16: nobody cared
Submitter : Niel Lambrechts <niel.lambrechts@gmail.com>
Date : 2009-07-06 18:32 (45 days old)
References : http://marc.info/?l=linux-kernel&m=124690524027166&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13645
Subject : NULL pointer dereference at (null) (level2_spare_pgt)
Submitter : poornima nayak <mpnayak@linux.vnet.ibm.com>
Date : 2009-06-17 17:56 (64 days old)
References : http://lkml.org/lkml/2009/6/17/194
Regressions with patches
------------------------
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=14017
Subject : _end symbol missing from Symbol.map
Submitter : Hannes Reinecke <hare@suse.de>
Date : 2009-08-13 6:45 (7 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=091e52c3551d3031343df24b573b770b4c6c72b6
References : http://marc.info/?l=linux-kernel&m=125014649102253&w=4
Handled-By : Hannes Reinecke <hare@suse.de>
Patch : http://marc.info/?l=linux-kernel&m=125014649102253&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13948
Subject : ath5k broken after suspend-to-ram
Submitter : Johannes Stezenbach <js@sig21.net>
Date : 2009-08-07 21:51 (13 days old)
References : http://marc.info/?l=linux-kernel&m=124968192727854&w=4
Handled-By : Nick Kossifidis <mickflemm@gmail.com>
Patch : http://patchwork.kernel.org/patch/38550/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13946
Subject : x86 MCE malfunction on Thinkpad T42p
Submitter : Johannes Stezenbach <js@sig21.net>
Date : 2009-08-07 17:09 (13 days old)
References : http://marc.info/?l=linux-kernel&m=124966500232399&w=4
Handled-By : Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Patch : http://patchwork.kernel.org/patch/37908/
For details, please visit the bug entries and follow the links given in
references.
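For anyone who wants to process the list mechanically, here is a minimal
sketch, assuming each entry keeps the "Key : value" layout used above and
starts with a Bug-Entry line. The function name parse_entries is hypothetical
and not part of any kernel or Bugzilla tooling. It also assumes the input
holds only entry blocks, with no section headings in between.

```python
import re

# Hypothetical helper: turn "Key : value" entry blocks into dictionaries.
# A field line is a key made of letters and hyphens, a colon, then whitespace.
FIELD_RE = re.compile(r"^([A-Za-z][A-Za-z-]*?)\s*:\s+(.*)$")

# Fields that may continue on the next line (extra URLs or extra handlers).
MULTI_LINE_KEYS = {"References", "Handled-By"}


def parse_entries(text):
    entries, current, last_key = [], None, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue
        match = FIELD_RE.match(line)
        if match:
            key, value = match.group(1), match.group(2).strip()
            if key == "Bug-Entry":
                # Every Bug-Entry line opens a new record.
                current = {}
                entries.append(current)
            if current is not None:
                current.setdefault(key, []).append(value)
                last_key = key
        elif current is not None and last_key in MULTI_LINE_KEYS:
            # Continuation line, e.g. a second References URL.
            current[last_key].append(line)
    return entries
```

Each record maps a field name to a list of values, so an entry with two
References URLs keeps both.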
As you can see, there is a Bugzilla entry for each of the listed regressions.
There is also a Bugzilla entry used for tracking the regressions from 2.6.30,
unresolved as well as resolved, at:
http://bugzilla.kernel.org/show_bug.cgi?id=13615
Please let me know if there are any Bugzilla entries that should be added to
the list in there.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #13819] system freeze when switching to console
2009-08-19 20:20 2.6.31-rc6-git5: Reported regressions from 2.6.30 Rafael J. Wysocki
@ 2009-08-19 20:26 ` Rafael J. Wysocki
2009-08-19 23:35 ` reinette chatre
0 siblings, 1 reply; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-19 20:26 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Kernel Testers List, Eric Anholt, ling.ma, Linus Torvalds,
Ma Ling, Reinette Chatre
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13819
Subject : system freeze when switching to console
Submitter : Reinette Chatre <reinette.chatre@intel.com>
Date : 2009-07-23 17:57 (28 days old)
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13819] system freeze when switching to console
2009-08-19 20:26 ` [Bug #13819] system freeze when switching to console Rafael J. Wysocki
@ 2009-08-19 23:35 ` reinette chatre
0 siblings, 0 replies; 286+ messages in thread
From: reinette chatre @ 2009-08-19 23:35 UTC (permalink / raw)
To: Rafael J. Wysocki
Cc: Linux Kernel Mailing List, Kernel Testers List, Eric Anholt, Ma,
Ling, Linus Torvalds
On Wed, 2009-08-19 at 13:26 -0700, Rafael J. Wysocki wrote:
> This message has been generated automatically as a part of a report
> of recent regressions.
>
> The following bug entry is on the current list of known regressions
> from 2.6.30. Please verify if it still should be listed and let me know
> (either way).
>
>
> Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13819
> Subject : system freeze when switching to console
> Submitter : Reinette Chatre <reinette.chatre@intel.com>
> Date : 2009-07-23 17:57 (28 days old)
This issue is still present in 2.6.31-rc6. Unfortunately the patches I
reverted to get a working system do not revert cleanly anymore.
Reinette
^ permalink raw reply [flat|nested] 286+ messages in thread
* Re: [Bug #13819] system freeze when switching to console
2009-08-19 23:35 ` reinette chatre
@ 2009-08-20 14:55 ` Rafael J. Wysocki
-1 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-20 14:55 UTC (permalink / raw)
To: reinette chatre
Cc: Linux Kernel Mailing List, Kernel Testers List, Eric Anholt, Ma,
Ling, Linus Torvalds
On Thursday 20 August 2009, reinette chatre wrote:
> On Wed, 2009-08-19 at 13:26 -0700, Rafael J. Wysocki wrote:
> > This message has been generated automatically as a part of a report
> > of recent regressions.
> >
> > The following bug entry is on the current list of known regressions
> > from 2.6.30. Please verify if it still should be listed and let me know
> > (either way).
> >
> >
> > Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13819
> > Subject : system freeze when switching to console
> > Submitter : Reinette Chatre <reinette.chatre@intel.com>
> > Date : 2009-07-23 17:57 (28 days old)
>
> This issue is still present in 2.6.31-rc6. Unfortunately the patches I
> reverted to get a working system do not revert cleanly anymore.
Thanks for the update.
Rafael
^ permalink raw reply [flat|nested] 286+ messages in thread
* 2.6.31-rc5-git5: Reported regressions from 2.6.30
@ 2009-08-09 20:36 Rafael J. Wysocki
2009-08-09 20:44 ` Rafael J. Wysocki
0 siblings, 1 reply; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-09 20:36 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Adrian Bunk, Andrew Morton, Linus Torvalds, Natalie Protasevich,
Kernel Testers List, Network Development, Linux ACPI,
Linux PM List, Linux SCSI List, Linux Wireless List, DRI
This message contains a list of some regressions from 2.6.30, for which there
are no fixes in the mainline I know of. If any of them have been fixed already,
please let me know.
If you know of any other unresolved regressions from 2.6.30, please let me know
as well and I'll add them to the list. Also, please let me know if any of the
entries below are invalid.
Each entry from the list will be sent additionally in an automatic reply to
this message with CCs to the people involved in reporting and handling the
issue.
Listed regressions statistics:
  Date          Total  Pending  Unresolved
  ----------------------------------------
  2009-08-10       89       27          24
  2009-08-02       76       36          28
  2009-07-27       70       51          43
  2009-07-07       35       25          21
  2009-06-29       22       22          15
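As a worked reading of the table, the sketch below parses the rows and
subtracts Unresolved from Pending. It assumes that difference is the number of
regressions with proposed patches. The report does not state this, but the
2009-08-10 row is consistent with it: 27 - 24 = 3, matching the three entries
under "Regressions with patches" in this message. The function names are
hypothetical.

```python
def parse_stats(text):
    """Return (date, total, pending, unresolved) tuples from the table."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        # Data rows have four columns and begin with a year; this skips the
        # header line and the dashed separator.
        if len(parts) == 4 and parts[0][:4].isdigit():
            date, total, pending, unresolved = parts
            rows.append((date, int(total), int(pending), int(unresolved)))
    return rows


def with_patches(rows):
    """Assumed interpretation: Pending minus Unresolved per report date."""
    return [(date, pending - unresolved)
            for date, _total, pending, unresolved in rows]


table = """
Date Total Pending Unresolved
----------------------------------------
2009-08-10 89 27 24
2009-08-02 76 36 28
2009-07-27 70 51 43
2009-07-07 35 25 21
2009-06-29 22 22 15
"""

for date, count in with_patches(parse_stats(table)):
    print(date, count)
```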
Unresolved regressions
----------------------
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13950
Subject : Oops when USB Serial disconnected while in use
Submitter : Bruno Prémont <bonbons@linux-vserver.org>
Date : 2009-08-08 17:47 (2 days old)
References : http://marc.info/?l=linux-kernel&m=124975432900466&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13947
Subject : Libertas: Association request to the driver failed
Submitter : Daniel Mack <daniel@caiaq.de>
Date : 2009-08-07 19:11 (3 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=57921c312e8cef72ba35a4cfe870b376da0b1b87
References : http://marc.info/?l=linux-kernel&m=124967234311481&w=4
Handled-By : Roel Kluin <roel.kluin@gmail.com>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13943
Subject : WARNING: at net/mac80211/mlme.c:2292 with ath5k
Submitter : Fabio Comolli <fabio.comolli@gmail.com>
Date : 2009-08-06 20:15 (4 days old)
References : http://marc.info/?l=linux-kernel&m=124958978600600&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13942
Subject : Troubles with AoE and uninitialized object
Submitter : Bruno Prémont <bonbons@linux-vserver.org>
Date : 2009-08-04 10:12 (6 days old)
References : http://marc.info/?l=linux-kernel&m=124938117104811&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13941
Subject : x86 Geode issue
Submitter : Martin-Éric Racine <q-funk@iki.fi>
Date : 2009-08-03 12:58 (7 days old)
References : http://marc.info/?l=linux-kernel&m=124930434732481&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13940
Subject : iwlagn and sky2 stopped working, ACPI-related
Submitter : Ricardo Jorge da Fonseca Marques Ferreira <storm@sys49152.net>
Date : 2009-08-07 22:33 (3 days old)
References : http://marc.info/?l=linux-kernel&m=124968457731107&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13935
Subject : 2.6.31-rcX breaks Apple MightyMouse (Bluetooth version)
Submitter : Adrian Ulrich <kernel@blinkenlights.ch>
Date : 2009-08-08 22:08 (2 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=fa047e4f6fa63a6e9d0ae4d7749538830d14a343
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13914
Subject : e1000e reports invalid NVM Checksum on 82566DM-2 (bisected)
Submitter : <jsbronder@gentoo.org>
Date : 2009-08-04 18:06 (6 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13906
Subject : Huawei E169 GPRS connection causes Ooops
Submitter : Clemens Eisserer <linuxhippy@gmail.com>
Date : 2009-08-04 09:02 (6 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13899
Subject : Oops from tar, 2.6.31-rc5, 32 bit on quad core phenom.
Submitter : Gene Heskett <gene.heskett@verizon.net>
Date : 2009-08-01 13:04 (9 days old)
References : http://marc.info/?l=linux-kernel&m=124913190304149&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13895
Subject : 2.6.31-rc4 - slab entry tak_delay_info leaking ???
Submitter : Paul Rolland <rol@as2917.net>
Date : 2009-07-29 08:20 (12 days old)
References : http://marc.info/?l=linux-kernel&m=124884847925375&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13869
Subject : Radeon framebuffer (w/o KMS) corruption at boot.
Submitter : Duncan <1i5t5.duncan@cox.net>
Date : 2009-07-29 16:44 (12 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13848
Subject : iwlwifi (4965) regression since 2.6.30
Submitter : Lukas Hejtmanek <xhejtman@ics.muni.cz>
Date : 2009-07-26 7:57 (15 days old)
References : http://marc.info/?l=linux-kernel&m=124859658502866&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13846
Subject : LEDs switched off permanently by power saving with rt61pci driver
Submitter : Chris Clayton <chris2553@googlemail.com>
Date : 2009-07-13 8:27 (28 days old)
References : http://marc.info/?l=linux-kernel&m=124747418828398&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13837
Subject : Input : regression - touchpad not detected
Submitter : Dave Young <hidave.darkstar@gmail.com>
Date : 2009-07-17 07:13 (24 days old)
References : http://marc.info/?l=linux-kernel&m=124780763701571&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13836
Subject : suspend script fails, related to stdout?
Submitter : Tomas M. <tmezzadra@gmail.com>
Date : 2009-07-17 21:24 (24 days old)
References : http://marc.info/?l=linux-kernel&m=124785853811667&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13833
Subject : Kernel Oops when trying to suspend with ubifs mounted on block2mtd mtd device
Submitter : Tobias Diedrich <ranma@tdiedrich.de>
Date : 2009-07-15 14:20 (26 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=15bce40cb3133bcc07d548013df97e4653d363c1
References : http://marc.info/?l=linux-kernel&m=124766049207807&w=4
http://marc.info/?l=linux-kernel&m=124704927819769&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13819
Subject : system freeze when switching to console
Submitter : Reinette Chatre <reinette.chatre@intel.com>
Date : 2009-07-23 17:57 (18 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13809
Subject : oprofile: possible circular locking dependency detected
Submitter : Jerome Marchand <jmarchan@redhat.com>
Date : 2009-07-22 13:35 (19 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13740
Subject : X server crashes with 2.6.31-rc2 when options are changed
Submitter : Michael S. Tsirkin <m.s.tsirkin@gmail.com>
Date : 2009-07-07 15:19 (34 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13733
Subject : 2.6.31-rc2: irq 16: nobody cared
Submitter : Niel Lambrechts <niel.lambrechts@gmail.com>
Date : 2009-07-06 18:32 (35 days old)
References : http://marc.info/?l=linux-kernel&m=124690524027166&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13716
Subject : The AIC-7892P controller does not work any more
Submitter : Andrej Podzimek <andrej@podzimek.org>
Date : 2009-07-05 19:23 (36 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13713
Subject : [drm/i915] Possible regression due to commit "Change GEM throttling to be 20ms (...)"
Submitter : <kazikcz@gmail.com>
Date : 2009-07-05 10:49 (36 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b962442e46a9340bdbc6711982c59ff0cc2b5afb
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13645
Subject : NULL pointer dereference at (null) (level2_spare_pgt)
Submitter : poornima nayak <mpnayak@linux.vnet.ibm.com>
Date : 2009-06-17 17:56 (54 days old)
References : http://lkml.org/lkml/2009/6/17/194
Regressions with patches
------------------------
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13948
Subject : ath5k broken after suspend-to-ram
Submitter : Johannes Stezenbach <js@sig21.net>
Date : 2009-08-07 21:51 (3 days old)
References : http://marc.info/?l=linux-kernel&m=124968192727854&w=4
Handled-By : Nick Kossifidis <mickflemm@gmail.com>
Patch : http://patchwork.kernel.org/patch/38550/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13946
Subject : x86 MCE malfunction on Thinkpad T42p
Submitter : Johannes Stezenbach <js@sig21.net>
Date : 2009-08-07 17:09 (3 days old)
References : http://marc.info/?l=linux-kernel&m=124966500232399&w=4
Handled-By : Bartlomiej Zolnierkiewicz <bzolnier@gmail.com>
Patch : http://patchwork.kernel.org/patch/37908/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13944
Subject : MD raid regression
Submitter : Mike Snitzer <snitzer@redhat.com>
Date : 2009-08-05 15:06 (5 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=449aad3e25358812c43afc60918c5ad3819488e7
References : http://marc.info/?l=linux-kernel&m=124948481218857&w=4
Handled-By : NeilBrown <neilb@suse.de>
Patch : http://patchwork.kernel.org/patch/39521/
For details, please visit the bug entries and follow the links given in
references.
As you can see, there is a Bugzilla entry for each of the listed regressions.
There is also a Bugzilla entry used for tracking the regressions from 2.6.30,
unresolved as well as resolved, at:
http://bugzilla.kernel.org/show_bug.cgi?id=13615
Please let me know if there are any Bugzilla entries that should be added to
the list in there.
Thanks,
Rafael
^ permalink raw reply [flat|nested] 286+ messages in thread
* [Bug #13819] system freeze when switching to console
2009-08-09 20:36 2.6.31-rc5-git5: Reported regressions from 2.6.30 Rafael J. Wysocki
@ 2009-08-09 20:44 ` Rafael J. Wysocki
0 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-09 20:44 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Kernel Testers List, Eric Anholt, ling.ma, Linus Torvalds,
Ma Ling, Reinette Chatre
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13819
Subject : system freeze when switching to console
Submitter : Reinette Chatre <reinette.chatre@intel.com>
Date : 2009-07-23 17:57 (18 days old)
^ permalink raw reply [flat|nested] 286+ messages in thread
* 2.6.31-rc5: Reported regressions from 2.6.30
@ 2009-08-02 18:49 Rafael J. Wysocki
2009-08-02 18:58 ` [Bug #13819] system freeze when switching to console Rafael J. Wysocki
0 siblings, 1 reply; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-08-02 18:49 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Adrian Bunk, Andrew Morton, Linus Torvalds, Natalie Protasevich,
Kernel Testers List, Network Development, Linux ACPI,
Linux PM List, Linux SCSI List, Linux Wireless List, DRI
This message contains a list of some regressions from 2.6.30, for which there
are no fixes in the mainline I know of. If any of them have been fixed already,
please let me know.
If you know of any other unresolved regressions from 2.6.30, please let me know
as well and I'll add them to the list. Also, please let me know if any of the
entries below are invalid.
Each entry from the list will be sent additionally in an automatic reply to
this message with CCs to the people involved in reporting and handling the
issue.
Listed regressions statistics:
  Date          Total  Pending  Unresolved
  ----------------------------------------
  2009-08-02       76       36          28
  2009-07-27       70       51          43
  2009-07-07       35       25          21
  2009-06-29       22       22          15
Unresolved regressions
----------------------
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13899
Subject : Oops from tar, 2.6.31-rc5, 32 bit on quad core phenom.
Submitter : Gene Heskett <gene.heskett@verizon.net>
Date : 2009-08-01 13:04 (2 days old)
References : http://marc.info/?l=linux-kernel&m=124913190304149&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13896
Subject : 2.6.31-rc4 broke expect and gcc's testsuite
Submitter : Mikael Pettersson <mikpe@it.uu.se>
Date : 2009-07-29 11:00 (5 days old)
References : http://marc.info/?l=linux-kernel&m=124885806406520&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13895
Subject : 2.6.31-rc4 - slab entry tak_delay_info leaking ???
Submitter : Paul Rolland <rol@as2917.net>
Date : 2009-07-29 08:20 (5 days old)
References : http://marc.info/?l=linux-kernel&m=124884847925375&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13894
Subject : intermittent hibernation problem
Submitter : Ferenc Wagner <wferi@niif.hu>
Date : 2009-07-30 13:29 (4 days old)
References : https://lists.linux-foundation.org/pipermail/linux-pm/2009-July/022095.html
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13869
Subject : Radeon framebuffer (w/o KMS) corruption at boot.
Submitter : Duncan <1i5t5.duncan@cox.net>
Date : 2009-07-29 16:44 (5 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13848
Subject : iwlwifi (4965) regression since 2.6.30
Submitter : Lukas Hejtmanek <xhejtman@ics.muni.cz>
Date : 2009-07-26 7:57 (8 days old)
References : http://marc.info/?l=linux-kernel&m=124859658502866&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13846
Subject : Possible regression in rt61pci driver
Submitter : Chris Clayton <chris2553@googlemail.com>
Date : 2009-07-13 8:27 (21 days old)
References : http://marc.info/?l=linux-kernel&m=124747418828398&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13842
Subject : Oops when writing to /sys/block/ram0/queue/max_sectors_kb
Submitter : Jens Rosenboom <jens@mcbone.net>
Date : 2009-07-23 15:30 (11 days old)
References : http://marc.info/?l=linux-kernel&m=124836574403032&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13837
Subject : Input : regression - touchpad not detected
Submitter : Dave Young <hidave.darkstar@gmail.com>
Date : 2009-07-17 07:13 (17 days old)
References : http://marc.info/?l=linux-kernel&m=124780763701571&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13836
Subject : suspend script fails, related to stdout?
Submitter : Tomas M. <tmezzadra@gmail.com>
Date : 2009-07-17 21:24 (17 days old)
References : http://marc.info/?l=linux-kernel&m=124785853811667&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13833
Subject : Kernel Oops when trying to suspend with ubifs mounted on block2mtd mtd device
Submitter : Tobias Diedrich <ranma@tdiedrich.de>
Date : 2009-07-15 14:20 (19 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=15bce40cb3133bcc07d548013df97e4653d363c1
References : http://marc.info/?l=linux-kernel&m=124766049207807&w=4
http://marc.info/?l=linux-kernel&m=124704927819769&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13826
Subject : thinkpad boots with backlight low
Submitter : Pavel Machek <pavel@ucw.cz>
Date : 2009-07-15 15:13 (19 days old)
References : http://marc.info/?l=linux-kernel&m=124756359126830&w=4
Handled-By : Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13819
Subject : system freeze when switching to console
Submitter : Reinette Chatre <reinette.chatre@intel.com>
Date : 2009-07-23 17:57 (11 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13815
Subject : emacs -nw compilation doesn't show the error message
Submitter : Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date : 2009-07-23 06:22 (11 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d945cb9cce20ac7143c2de8d88b187f62db99bdc
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13813
Subject : Hangups in n_tty_read()
Submitter : Johannes Weiner <hannes@cmpxchg.org>
Date : 2009-07-16 18:48 (18 days old)
References : http://marc.info/?l=linux-kernel&m=124777019920579&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13812
Subject : Ooops on uplug
Submitter : Daniel Mack <daniel@caiaq.de>
Date : 2009-07-20 17:51 (14 days old)
References : http://marc.info/?l=linux-kernel&m=124811234302786&w=4
Handled-By : Alan Stern <stern@rowland.harvard.edu>
Alan Cox <alan@linux.intel.com>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13809
Subject : oprofile: possible circular locking dependency detected
Submitter : Jerome Marchand <jmarchan@redhat.com>
Date : 2009-07-22 13:35 (12 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13770
Subject : System freeze on XFS filesystem recovery on an external disk
Submitter : Jean-Luc Coulon <jean.luc.coulon@gmail.com>
Date : 2009-07-14 10:31 (20 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13762
Subject : AHCI on HP Compaq 6715s broken, did not detect slots/ports -> unable to boot
Submitter : Matthias Tingelhoff <mindo83@t-online.de>
Date : 2009-07-11 20:48 (23 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=a76117dfd687ec4be0a9a05214f3009cc5f73a42
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13740
Subject : X server crashes with 2.6.31-rc2 when options are changed
Submitter : Michael S. Tsirkin <m.s.tsirkin@gmail.com>
Date : 2009-07-07 15:19 (27 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13733
Subject : 2.6.31-rc2: irq 16: nobody cared
Submitter : Niel Lambrechts <niel.lambrechts@gmail.com>
Date : 2009-07-06 18:32 (28 days old)
References : http://marc.info/?l=linux-kernel&m=124690524027166&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13731
Subject : Inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
Submitter : Miles Lane <miles.lane@gmail.com>
Date : 2009-07-06 4:22 (28 days old)
References : http://marc.info/?l=linux-kernel&m=124685417325348&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13726
Subject : fio sync read 4k block size 35% regression
Submitter : Zhang, Yanmin <yanmin_zhang@linux.intel.com>
Date : 2009-07-01 11:25 (33 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=51daa88ebd8e0d437289f589af29d4b39379ea76
References : http://lkml.org/lkml/2009/6/30/679
Handled-By : Wu Fengguang <fengguang.wu@intel.com>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13716
Subject : The AIC-7892P controller does not work any more
Submitter : Andrej Podzimek <andrej@podzimek.org>
Date : 2009-07-05 19:23 (29 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13713
Subject : [drm/i915] Possible regression due to commit "Change GEM throttling to be 20ms (...)"
Submitter : <kazikcz@gmail.com>
Date : 2009-07-05 10:49 (29 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b962442e46a9340bdbc6711982c59ff0cc2b5afb
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13657
Subject : Linux-2.6.31-rc1 Fails To Recognize Some USB Disks
Submitter : Tarkan Erimer <tarkan.erimer@turknet.net.tr>
Date : 2009-06-26 10:03 (38 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=3821d768912a47ddbd6cab52943a8284df88003c
References : http://lkml.org/lkml/2009/6/26/34
Handled-By : Martin K. Petersen <martin.petersen@oracle.com>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13656
Subject : 2.6.31-rc1 crashes randomly on my Machine.
Submitter : Zeno Davatz <zdavatz@gmail.com>
Date : 2009-06-26 08:56 (38 days old)
References : http://lkml.org/lkml/2009/6/26/27
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13645
Subject : NULL pointer dereference at (null) (level2_spare_pgt)
Submitter : poornima nayak <mpnayak@linux.vnet.ibm.com>
Date : 2009-06-17 17:56 (47 days old)
References : http://lkml.org/lkml/2009/6/17/194
Regressions with patches
------------------------
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13891
Subject : PCI resources allocation problem on HP nx6325
Submitter : Rafael J. Wysocki <rjw@sisk.pl>
Date : 2009-08-02 13:37 (1 day old)
Handled-By : Linus Torvalds <torvalds@linux-foundation.org>
Patch : http://patchwork.kernel.org/patch/38774/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13872
Subject : cpufreq bug (null pointer dereference)
Submitter : Christophe Lermytte <christophe.lermytte@thomson.net>
Date : 2009-07-21 22:07 (13 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=ee88415caf736b89500f16e0a545614541a45005
References : http://marc.info/?l=linux-kernel&m=124820689011112&w=4
Handled-By : Pallipadi, Venkatesh <venkatesh.pallipadi@intel.com>
Patch : http://patchwork.kernel.org/patch/38229/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13861
Subject : CIFS mounts ignore uid argument (ok in 2.6.30.3)
Submitter : <bugzilla.kernel.org@falkensweb.com>
Date : 2009-07-28 21:39 (6 days old)
Handled-By : Jeff Layton <jlayton@redhat.com>
Patch : http://patchwork.kernel.org/patch/38498/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13840
Subject : KMS oops on 945G system
Submitter : Diego Calleja <diegocg@gmail.com>
Date : 2009-07-21 20:50 (13 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=7662c8bd6545c12ac7b2b39e4554c3ba34789c50
References : http://marc.info/?l=linux-kernel&m=124820945815030&w=4
Handled-By : Jesse Barnes <jbarnes@virtuousgeek.org>
Patch : http://git.kernel.org/?p=linux/kernel/git/anholt/drm-intel.git;a=commit;h=dff33cfcefa31c30b72c57f44586754ea9e8f3e2
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13838
Subject : kernel BUG at include/net/netns/generic.h:41!
Submitter : Luca Tettamanti <kronos.it@gmail.com>
Date : 2009-07-20 15:27 (14 days old)
References : http://lkml.org/lkml/2009/7/20/105
Handled-By : Eric Dumazet <eric.dumazet@gmail.com>
Patch : http://patchwork.kernel.org/patch/37779/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13825
Subject : eeepc-laptop: fix hot-unplug on resume
Submitter : Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Date : 2009-06-29 13:12 (35 days old)
References : http://lkml.org/lkml/2009/6/29/150
Handled-By : Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Patch : http://patchwork.kernel.org/patch/32926/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13781
Subject : System freeze at resume after suspend to RAM
Submitter : Christian Casteyde <casteyde.christian@free.fr>
Date : 2009-07-15 18:42 (19 days old)
Handled-By : Zhao Yakui <yakui.zhao@intel.com>
Patch : http://bugzilla.kernel.org/attachment.cgi?id=22463
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13742
Subject : iwlagn (4965): regression when hardware rf switch is used
Submitter : Frans Pop <elendil@planet.nl>
Date : 2009-06-29 11:28 (35 days old)
References : http://lkml.org/lkml/2009/6/29/88
Handled-By : Reinette Chatre <reinette.chatre@intel.com>
Patch : http://lkml.org/lkml/2009/6/30/224
For details, please visit the bug entries and follow the links given in
references.
As you can see, there is a Bugzilla entry for each of the listed regressions.
There also is a Bugzilla entry used for tracking the regressions from 2.6.30,
unresolved as well as resolved, at:
http://bugzilla.kernel.org/show_bug.cgi?id=13615
Please let me know if there are any Bugzilla entries that should be added to
the list in there.
Thanks,
Rafael
* 2.6.31-rc4: Reported regressions from 2.6.30
@ 2009-07-26 20:23 Rafael J. Wysocki
2009-07-26 20:28 ` Rafael J. Wysocki
0 siblings, 1 reply; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-07-26 20:23 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Adrian Bunk, Andrew Morton, Linus Torvalds, Natalie Protasevich,
Kernel Testers List, Network Development, Linux ACPI,
Linux PM List, Linux SCSI List, Linux Wireless List, DRI
This message contains a list of some regressions from 2.6.30, for which there
are no fixes in the mainline I know of. If any of them have been fixed already,
please let me know.
If you know of any other unresolved regressions from 2.6.30, please let me know
as well and I'll add them to the list. Also, please let me know if any of the
entries below are invalid.
Each entry from the list will be sent additionally in an automatic reply to
this message with CCs to the people involved in reporting and handling the
issue.
Listed regressions statistics:
  Date          Total  Pending  Unresolved
  ----------------------------------------
  2009-07-27       70       51          43
  2009-07-07       35       25          21
  2009-06-29       22       22          15
Unresolved regressions
----------------------
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13848
Subject : iwlwifi (4965) regression since 2.6.30
Submitter : Lukas Hejtmanek <xhejtman@ics.muni.cz>
Date : 2009-07-26 7:57 (1 day old)
References : http://marc.info/?l=linux-kernel&m=124859658502866&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13847
Subject : X stopped accepting keystrokes
Submitter : Pavel Machek <pavel@ucw.cz>
Date : 2009-07-14 9:24 (13 days old)
References : http://marc.info/?l=linux-kernel&m=124756352426737&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13846
Subject : Possible regression in rt61pci driver
Submitter : Chris Clayton <chris2553@googlemail.com>
Date : 2009-07-13 8:27 (14 days old)
References : http://marc.info/?l=linux-kernel&m=124747418828398&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13845
Subject : inotify regression, missing events
Submitter : Scott James Remnant <scott@ubuntu.com>
Date : 2009-07-11 16:02 (16 days old)
References : http://marc.info/?l=linux-kernel&m=124732816314881&w=4
Handled-By : Eric Paris <eparis@redhat.com>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13844
Subject : i915 errors
Submitter : Fabio Comolli <fabio.comolli@gmail.com>
Date : 2009-07-25 9:30 (2 days old)
References : http://marc.info/?l=linux-kernel&m=124851427612720&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13843
Subject : Linux-2.6.31-rc4 fails to open a USB serial port
Submitter : e9hack <e9hack@googlemail.com>
Date : 2009-07-25 9:16 (2 days old)
References : http://marc.info/?l=linux-kernel&m=124851343512022&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13842
Subject : Oops when writing to /sys/block/ram0/queue/max_sectors_kb
Submitter : Jens Rosenboom <jens@mcbone.net>
Date : 2009-07-23 15:30 (4 days old)
References : http://marc.info/?l=linux-kernel&m=124836574403032&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13841
Subject : 2.6.31-rc4 boot failure
Submitter : Gene Heskett <gene.heskett@verizon.net>
Date : 2009-07-23 14:12 (4 days old)
References : http://marc.info/?l=linux-kernel&m=124835839019906&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13839
Subject : xc5000 no longer works with Myth-0.21-fixes branch
Submitter : Mark Lord <lkml@rtr.ca>
Date : 2009-07-19 15:15 (8 days old)
References : http://marc.info/?l=linux-kernel&m=124814394016848&w=4
Handled-By : Devin Heitmueller <dheitmueller@kernellabs.com>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13838
Subject : kernel BUG at include/net/netns/generic.h:41!
Submitter : Luca Tettamanti <kronos.it@gmail.com>
Date : 2009-07-20 15:27 (7 days old)
References : http://lkml.org/lkml/2009/7/20/105
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13837
Subject : Input : regression - touchpad not detected
Submitter : Dave Young <hidave.darkstar@gmail.com>
Date : 2009-07-17 07:13 (10 days old)
References : http://marc.info/?l=linux-kernel&m=124780763701571&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13836
Subject : suspend script fails, related to stdout?
Submitter : Tomas M. <tmezzadra@gmail.com>
Date : 2009-07-17 21:24 (10 days old)
References : http://marc.info/?l=linux-kernel&m=124785853811667&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13835
Subject : e1000e massive packet loss
Submitter : Caleb Cushing <xenoterracide@gmail.com>
Date : 2009-07-16 09:49 (11 days old)
References : http://marc.info/?l=linux-kernel&m=124773057917887&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13833
Subject : Kernel Oops when trying to suspend with ubifs mounted on block2mtd mtd device
Submitter : Tobias Diedrich <ranma@tdiedrich.de>
Date : 2009-07-15 14:20 (12 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=15bce40cb3133bcc07d548013df97e4653d363c1
References : http://marc.info/?l=linux-kernel&m=124766049207807&w=4
http://marc.info/?l=linux-kernel&m=124704927819769&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13826
Subject : thinkpad boots with backlight low
Submitter : Pavel Machek <pavel@ucw.cz>
Date : 2009-07-15 15:13 (12 days old)
References : http://marc.info/?l=linux-kernel&m=124756359126830&w=4
Handled-By : Henrique de Moraes Holschuh <hmh@hmh.eng.br>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13821
Subject : Replugging USB serial converter uses new device node
Submitter : Ferenc Wagner <wferi@niif.hu>
Date : 2009-07-18 20:04 (9 days old)
References : http://marc.info/?l=linux-kernel&m=124794754015776&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13819
Subject : system freeze when switching to console
Submitter : Reinette Chatre <reinette.chatre@intel.com>
Date : 2009-07-23 17:57 (4 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13815
Subject : emacs -nw compilation doesn't show the error message
Submitter : Aneesh Kumar K.V <aneesh.kumar@linux.vnet.ibm.com>
Date : 2009-07-23 06:22 (4 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=d945cb9cce20ac7143c2de8d88b187f62db99bdc
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13813
Subject : Hangups in n_tty_read()
Submitter : Johannes Weiner <hannes@cmpxchg.org>
Date : 2009-07-16 18:48 (11 days old)
References : http://marc.info/?l=linux-kernel&m=124777019920579&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13812
Subject : Ooops on uplug
Submitter : Daniel Mack <daniel@caiaq.de>
Date : 2009-07-20 17:51 (7 days old)
References : http://marc.info/?l=linux-kernel&m=124811234302786&w=4
Handled-By : Alan Stern <stern@rowland.harvard.edu>
Alan Cox <alan@linux.intel.com>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13809
Subject : oprofile: possible circular locking dependency detected
Submitter : Jerome Marchand <jmarchan@redhat.com>
Date : 2009-07-22 13:35 (5 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13806
Subject : system does not boot due to device-mapper error
Submitter : Roman Shtylman <shtylman@gmail.com>
Date : 2009-07-21 00:03 (6 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13781
Subject : System freeze at resume after suspend to RAM
Submitter : Christian Casteyde <casteyde.christian@free.fr>
Date : 2009-07-15 18:42 (12 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13770
Subject : System freeze on XFS filesystem recovery on an external disk
Submitter : Jean-Luc Coulon <jean.luc.coulon@gmail.com>
Date : 2009-07-14 10:31 (13 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13762
Subject : AHCI on HP Compaq 6715s broken, did not detect slots/ports -> unable to boot
Submitter : Matthias Tingelhoff <mindo83@t-online.de>
Date : 2009-07-11 20:48 (16 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13750
Subject : Load average flatlines after returning from hibernate
Submitter : Duncan <1i5t5.duncan@cox.net>
Date : 2009-07-09 15:14 (18 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13740
Subject : X server crashes with 2.6.31-rc2 when options are changed
Submitter : Michael S. Tsirkin <m.s.tsirkin@gmail.com>
Date : 2009-07-07 15:19 (20 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13733
Subject : 2.6.31-rc2: irq 16: nobody cared
Submitter : Niel Lambrechts <niel.lambrechts@gmail.com>
Date : 2009-07-06 18:32 (21 days old)
References : http://marc.info/?l=linux-kernel&m=124690524027166&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13732
Subject : tty layer instabilities
Submitter : Mikael Pettersson <mikpe@it.uu.se>
Date : 2009-07-06 13:43 (21 days old)
References : http://marc.info/?l=linux-kernel&m=124688781732419&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13731
Subject : Inconsistent {IN-RECLAIM_FS-W} -> {RECLAIM_FS-ON-W} usage.
Submitter : Miles Lane <miles.lane@gmail.com>
Date : 2009-07-06 4:22 (21 days old)
References : http://marc.info/?l=linux-kernel&m=124685417325348&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13730
Subject : hitting lockdep limits...
Submitter : Daniel J Blueman <daniel.blueman@gmail.com>
Date : 2009-07-05 18:19 (22 days old)
References : http://marc.info/?l=linux-kernel&m=124681799023782&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13729
Subject : kernel BUG at fs/notify/notification.c:93!
Submitter : Mikko C. <mikko.cal@gmail.com>
Date : 2009-06-04 10:16 (53 days old)
References : http://lkml.org/lkml/2009/7/4/12
Handled-By : Eric Paris <eparis@redhat.com>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13728
Subject : 2.6.31-rc2 soft lockups, RPC-related
Submitter : Paul Collins <paul@burly.ondioline.org>
Date : 2009-07-05 7:17 (22 days old)
References : http://marc.info/?l=linux-kernel&m=124677884816794&w=4
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13726
Subject : fio sync read 4k block size 35% regression
Submitter : Zhang, Yanmin <yanmin_zhang@linux.intel.com>
Date : 2009-07-01 11:25 (26 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=51daa88ebd8e0d437289f589af29d4b39379ea76
References : http://lkml.org/lkml/2009/6/30/679
Handled-By : Wu Fengguang <fengguang.wu@intel.com>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13716
Subject : The AIC-7892P controller does not work any more
Submitter : Andrej Podzimek <andrej@podzimek.org>
Date : 2009-07-05 19:23 (22 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13713
Subject : [drm/i915] Possible regression due to commit "Change GEM throttling to be 20ms (...)"
Submitter : <kazikcz@gmail.com>
Date : 2009-07-05 10:49 (22 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=b962442e46a9340bdbc6711982c59ff0cc2b5afb
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13709
Subject : b2c2-flexcop: no frontend driver found for this B2C2/FlexCop adapter w/ kernel-2.6.31-rc2
Submitter : boris64 <bugzilla.kernel.org@boris64.net>
Date : 2009-07-05 01:36 (22 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13700
Subject : usb error flood in dmesg, makes kde use plenty of cpu - bisected
Submitter : jouni susiluoto <jouni.susiluoto@helsinki.fi>
Date : 2009-07-03 21:13 (24 days old)
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13667
Subject : drm: display arifacts when X.Org is stopped
Submitter : Frans Pop <elendil@planet.nl>
Date : 2009-06-27 18:52 (30 days old)
References : http://lkml.org/lkml/2009/6/27/105
http://lkml.org/lkml/2009/7/8/30
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13666
Subject : WARNING: at mm/page_alloc.c:1743 __alloc_pages_nodemask
Submitter : Thomas Meyer <thomas@m3y3r.de>
Date : 2009-06-27 16:15 (30 days old)
References : http://lkml.org/lkml/2009/6/27/75
http://lkml.org/lkml/2009/7/7/6
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13657
Subject : Linux-2.6.31-rc1 Fails To Recognize Some USB Disks
Submitter : Tarkan Erimer <tarkan.erimer@turknet.net.tr>
Date : 2009-06-26 10:03 (31 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=3821d768912a47ddbd6cab52943a8284df88003c
References : http://lkml.org/lkml/2009/6/26/34
Handled-By : Martin K. Petersen <martin.petersen@oracle.com>
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13656
Subject : 2.6.31-rc1 crashes randomly on my Machine.
Submitter : Zeno Davatz <zdavatz@gmail.com>
Date : 2009-06-26 08:56 (31 days old)
References : http://lkml.org/lkml/2009/6/26/27
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13645
Subject : NULL pointer dereference at (null) (level2_spare_pgt)
Submitter : poornima nayak <mpnayak@linux.vnet.ibm.com>
Date : 2009-06-17 17:56 (40 days old)
References : http://lkml.org/lkml/2009/6/17/194
Regressions with patches
------------------------
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13840
Subject : KMS oops on 945G system
Submitter : Diego Calleja <diegocg@gmail.com>
Date : 2009-07-21 20:50 (6 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=7662c8bd6545c12ac7b2b39e4554c3ba34789c50
References : http://marc.info/?l=linux-kernel&m=124820945815030&w=4
Handled-By : Jesse Barnes <jbarnes@virtuousgeek.org>
Patch : http://git.kernel.org/?p=linux/kernel/git/anholt/drm-intel.git;a=commit;h=dff33cfcefa31c30b72c57f44586754ea9e8f3e2
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13834
Subject : device mapper fails on some logical volumes
Submitter : Christian Bornträger <borntraeger@de.ibm.com>
Date : 2009-07-15 18:52 (12 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=754c5fc7ebb417b23601a6222a6005cc2e7f2913
References : http://marc.info/?l=linux-kernel&m=124767677206470&w=4
Handled-By : Mike Snitzer <snitzer@redhat.com>
Patch : http://patchwork.kernel.org/patch/33534/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13827
Subject : PM/hibernate swapfile regression
Submitter : Heiko Carstens <heiko.carstens@de.ibm.com>
Date : 2009-07-14 15:54 (13 days old)
References : http://marc.info/?l=linux-kernel&m=124757972118196&w=4
Handled-By : Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Patch : http://patchwork.kernel.org/patch/36510/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13825
Subject : eeepc-laptop: fix hot-unplug on resume
Submitter : Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Date : 2009-06-29 13:12 (28 days old)
References : http://lkml.org/lkml/2009/6/29/150
Handled-By : Alan Jenkins <alan-jenkins@tuffmail.co.uk>
Patch : http://patchwork.kernel.org/patch/32926/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13742
Subject : iwlagn (4965): regression when hardware rf switch is used
Submitter : Frans Pop <elendil@planet.nl>
Date : 2009-06-29 11:28 (28 days old)
References : http://lkml.org/lkml/2009/6/29/88
Handled-By : Reinette Chatre <reinette.chatre@intel.com>
Patch : http://lkml.org/lkml/2009/6/30/224
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13665
Subject : commit 69c854817566 causes OOMs
Submitter : David Howells <dhowells@redhat.com>
Date : 2009-06-27 08:12 (30 days old)
First-Bad-Commit: http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=69c854817566db82c362797b4a6521d0b00fe1d8
References : http://lkml.org/lkml/2009/6/27/28
Handled-By : Wu Fengguang <fengguang.wu@intel.com>
Patch : http://patchwork.kernel.org/patch/32740/
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13659
Subject : iwlagn (4965): no wireless due to RFKILL problem
Submitter : Frans Pop <elendil@planet.nl>
Date : 2009-06-26 13:36 (31 days old)
References : http://lkml.org/lkml/2009/6/26/127
Handled-By : Johannes Berg <johannes@sipsolutions.net>
Patch : http://lkml.org/lkml/2009/6/27/35
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13643
Subject : Touchpad lost synchronization after resume from suspend to RAM
Submitter : Cijoml Cijomlovic Cijomlov <cijoml@volny.cz>
Date : 2009-06-28 10:42 (29 days old)
References : http://lkml.org/lkml/2009/6/23/365
2.6.30-rc1: touchpad disabled
http://lkml.org/lkml/2009/6/25/256
Handled-By : Thadeu Lima de Souza Cascardo <cascardo@holoscopio.com>
Patch : http://patchwork.kernel.org/patch/34075/
For details, please visit the bug entries and follow the links given in
references.
As you can see, there is a Bugzilla entry for each of the listed regressions.
There also is a Bugzilla entry used for tracking the regressions from 2.6.30,
unresolved as well as resolved, at:
http://bugzilla.kernel.org/show_bug.cgi?id=13615
Please let me know if there are any Bugzilla entries that should be added to
the list in there.
Thanks,
Rafael
* [Bug #13819] system freeze when switching to console
2009-07-26 20:23 2.6.31-rc4: Reported regressions from 2.6.30 Rafael J. Wysocki
@ 2009-07-26 20:28 ` Rafael J. Wysocki
0 siblings, 0 replies; 286+ messages in thread
From: Rafael J. Wysocki @ 2009-07-26 20:28 UTC (permalink / raw)
To: Linux Kernel Mailing List
Cc: Kernel Testers List, Eric Anholt, ling.ma, Linus Torvalds,
Ma Ling, Reinette Chatre
This message has been generated automatically as a part of a report
of recent regressions.
The following bug entry is on the current list of known regressions
from 2.6.30. Please verify if it still should be listed and let me know
(either way).
Bug-Entry : http://bugzilla.kernel.org/show_bug.cgi?id=13819
Subject : system freeze when switching to console
Submitter : Reinette Chatre <reinette.chatre@intel.com>
Date : 2009-07-23 17:57 (4 days old)
2009-09-08 23:45 ` Jesse Barnes
2009-09-08 23:05 ` Jesse Barnes
2009-09-08 23:56 ` reinette chatre
2009-09-08 19:19 ` Linus Torvalds
2009-09-08 19:19 ` Linus Torvalds
2009-09-08 22:37 ` reinette chatre
2009-09-08 22:37 ` reinette chatre
2009-09-08 23:16 ` Jesse Barnes
2009-09-08 23:27 ` reinette chatre
2009-09-08 23:27 ` reinette chatre
2009-09-08 17:24 ` Jesse Barnes
2009-08-19 20:20 2.6.31-rc6-git5: Reported regressions from 2.6.30 Rafael J. Wysocki
2009-08-19 20:26 ` [Bug #13819] system freeze when switching to console Rafael J. Wysocki
2009-08-19 23:35 ` reinette chatre
2009-08-19 23:35 ` reinette chatre
2009-08-20 14:55 ` Rafael J. Wysocki
2009-08-20 14:55 ` Rafael J. Wysocki
2009-08-09 20:36 2.6.31-rc5-git5: Reported regressions from 2.6.30 Rafael J. Wysocki
2009-08-09 20:44 ` [Bug #13819] system freeze when switching to console Rafael J. Wysocki
2009-08-09 20:44 ` Rafael J. Wysocki
2009-08-02 18:49 2.6.31-rc5: Reported regressions from 2.6.30 Rafael J. Wysocki
2009-08-02 18:58 ` [Bug #13819] system freeze when switching to console Rafael J. Wysocki
2009-07-26 20:23 2.6.31-rc4: Reported regressions from 2.6.30 Rafael J. Wysocki
2009-07-26 20:28 ` [Bug #13819] system freeze when switching to console Rafael J. Wysocki
2009-07-26 20:28 ` Rafael J. Wysocki