* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  [not found] <bug-15709-10286@https.bugzilla.kernel.org/>
From: Andrew Morton @ 2010-04-08 19:34 UTC
To: linux-mm
Cc: bugzilla-daemon, bugme-daemon, Avi Kivity, Rusty Russell, kernel, Mel Gorman

(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Wed, 7 Apr 2010 10:29:20 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=15709
>
>            Summary: swapper page allocation failure
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 2.6.32 and 2.6.33
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Slab Allocator
>         AssignedTo: akpm@linux-foundation.org
>         ReportedBy: kernel@tauceti.net
>         Regression: No
>
>
> Created an attachment (id=25903)
>  --> (https://bugzilla.kernel.org/attachment.cgi?id=25903)
> dmesg output
>
> I'm having problems with "swapper page allocation failures" since upgrading
> from kernel 2.6.30 to 2.6.32/2.6.33. The problems occur inside a kernel virtual
> machine (KVM). The host runs Gentoo with kernel 2.6.32 and works fine. As
> long as kernel 2.6.30 is used as guest kernel the guest runs fine. But after
> upgrading to 2.6.32 and 2.6.33 I get "swapper page allocation failures" (see
> attachment of dmesg output). The guest is only running an Apache webserver and
> serves files from an NFS share. It has 1 GB RAM and 2 virtual CPUs. I've tried
> different kernel configurations (e.g. an unmodified version from Sabayon Linux
> Distribution) but that doesn't help. Load of the guest (and host) is very low.
> Network traffic is about 20-50 MBit/s.
>

hm, this is a regression.

: [ 454.006706] users: page allocation failure. order:0, mode:0x20
: [ 454.006712] Pid: 7992, comm: users Not tainted 2.6.34-rc3-git6 #2
: [ 454.006714] Call Trace:
: [ 454.006717] <IRQ> [<ffffffff8109dff7>] __alloc_pages_nodemask+0x5c8/0x615
: [ 454.006796] [<ffffffff817860ce>] ? ip_local_deliver+0x65/0x6d
: [ 454.006820] [<ffffffff810c39c4>] alloc_pages_current+0x96/0x9f
: [ 454.006842] [<ffffffff8167f2c7>] try_fill_recv+0x5e/0x20f
: [ 454.006846] [<ffffffff8167fe13>] virtnet_poll+0x52a/0x5c7
: [ 454.006858] [<ffffffff8104fe74>] ? run_timer_softirq+0x1dc/0x1f4
: [ 454.006873] [<ffffffff8176035d>] net_rx_action+0xad/0x1a5
: [ 454.006882] [<ffffffff8104b6cd>] __do_softirq+0x9c/0x127
: [ 454.006897] [<ffffffff81008ffc>] call_softirq+0x1c/0x30
: [ 454.006901] [<ffffffff8100af01>] do_softirq+0x41/0x7e
: [ 454.006904] [<ffffffff8104b3e3>] irq_exit+0x36/0x75
: [ 454.006907] [<ffffffff8100a5ee>] do_IRQ+0xaa/0xc1
: [ 454.006926] [<ffffffff8183bc13>] ret_from_intr+0x0/0x11
: [ 454.006928] <EOI> [<ffffffff81026b25>] ? kvm_deferred_mmu_op+0x5e/0xe7
: [ 454.006942] [<ffffffff81026b19>] ? kvm_deferred_mmu_op+0x52/0xe7
: [ 454.006946] [<ffffffff81026c03>] kvm_mmu_write+0x2e/0x35
: [ 454.006949] [<ffffffff81026c7d>] kvm_set_pte_at+0x19/0x1b
: [ 454.006953] [<ffffffff810aba67>] __do_fault+0x3c4/0x492
: [ 454.006957] [<ffffffff810adcf4>] handle_mm_fault+0x478/0x9d8
: [ 454.006966] [<ffffffff810deb59>] ? path_put+0x2c/0x30
: [ 454.006975] [<ffffffff8102f162>] do_page_fault+0x2f6/0x31a
: [ 454.006979] [<ffffffff8183b81e>] ? _raw_spin_lock+0x9/0xd
: [ 454.006982] [<ffffffff8183bef5>] page_fault+0x25/0x30
: [ 454.006985] Mem-Info:
: [ 454.006987] Node 0 DMA per-cpu:
: [ 454.006990] CPU 0: hi: 0, btch: 1 usd: 0
: [ 454.006992] CPU 1: hi: 0, btch: 1 usd: 0
: [ 454.006993] Node 0 DMA32 per-cpu:
: [ 454.006996] CPU 0: hi: 186, btch: 31 usd: 185
: [ 454.006998] CPU 1: hi: 186, btch: 31 usd: 112
: [ 454.007003] active_anon:8308 inactive_anon:8544 isolated_anon:0
: [ 454.007005] active_file:4882 inactive_file:205902 isolated_file:0
: [ 454.007006] unevictable:0 dirty:11 writeback:0 unstable:0
: [ 454.007007] free:1385 slab_reclaimable:2445 slab_unreclaimable:4466
: [ 454.007008] mapped:1895 shmem:113 pagetables:1370 bounce:0
: [ 454.007010] Node 0 DMA free:4000kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:11844kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15768kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:64kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
: [ 454.007021] lowmem_reserve[]: 0 994 994 994
: [ 454.007025] Node 0 DMA32 free:1540kB min:4000kB low:5000kB high:6000kB active_anon:33232kB inactive_anon:34176kB active_file:19528kB inactive_file:811764kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1018068kB mlocked:0kB dirty:44kB writeback:0kB mapped:7580kB shmem:452kB slab_reclaimable:9716kB slab_unreclaimable:17832kB kernel_stack:1144kB pagetables:5480kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
: [ 454.007036] lowmem_reserve[]: 0 0 0 0
: [ 454.007040] Node 0 DMA: 0*4kB 4*8kB 6*16kB 5*32kB 6*64kB 4*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 4000kB
: [ 454.007050] Node 0 DMA32: 13*4kB 2*8kB 3*16kB 1*32kB 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1556kB
: [ 454.007059] 210914 total pagecache pages
: [ 454.007061] 0 pages in swap cache
: [ 454.007063] Swap cache stats: add 0, delete 0, find 0/0
: [ 454.007065] Free swap = 1959924kB
: [ 454.007067] Total swap = 1959924kB
: [ 454.014238] 262140 pages RAM
: [ 454.014241] 7489 pages reserved
: [ 454.014242] 21430 pages shared
: [ 454.014244] 247174 pages non-shared

Either page reclaim got worse or kvm/virtio-net got more aggressive.

Avi, Rusty: can you think of any changes in the KVM/virtio area in the
2.6.30 -> 2.6.32 timeframe which may have increased the GFP_ATOMIC
demands upon the page allocator?

Thanks.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
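
A note on reading the Mem-Info dump in the message above: each per-zone free-list line gives the number of free blocks at every buddy order, and the trailing figure is their sum. The following is only a minimal Python sketch for re-deriving those totals from the trace; the helper name `buddy_total_kb` is made up for illustration and is not part of any kernel tool, and a 4 kB page size is assumed.

```python
import re

def buddy_total_kb(line):
    """Return (computed_kb, reported_kb) for a 'Node N ZONE: a*4kB ... = XkB' line."""
    blocks, _, reported = line.partition("=")
    # Every "count*sizekB" pair contributes count * size kilobytes.
    computed = sum(int(count) * int(size)
                   for count, size in re.findall(r"(\d+)\*(\d+)kB", blocks))
    return computed, int(re.search(r"(\d+)kB", reported).group(1))

# The two free-list lines, copied from the trace.
dma = ("Node 0 DMA: 0*4kB 4*8kB 6*16kB 5*32kB 6*64kB 4*128kB "
       "1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 4000kB")
dma32 = ("Node 0 DMA32: 13*4kB 2*8kB 3*16kB 1*32kB 2*64kB 0*128kB "
         "1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1556kB")

# Figures from the zone summary lines of the same dump.
free_pages = 1385        # "free:1385" (pages)
page_kb = 4              # assumed page size in kB
dma_free_kb = 4000       # "Node 0 DMA free:4000kB"
dma32_free_kb = 1540     # "Node 0 DMA32 free:1540kB"
dma32_min_kb = 4000      # "Node 0 DMA32 ... min:4000kB"

print(buddy_total_kb(dma))
print(buddy_total_kb(dma32))
print(free_pages * page_kb, dma_free_kb + dma32_free_kb)
print("DMA32 below min watermark:", dma32_free_kb < dma32_min_kb)
```

Both free-list lines add up to their reported totals, and the global `free:1385` page count matches the sum of the two per-zone `free:` values. The figure that matters for the failure is the last one: DMA32 has 1540kB free against a `min` watermark of 4000kB.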
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
From: Avi Kivity @ 2010-04-08 19:39 UTC
To: Andrew Morton
Cc: linux-mm, bugzilla-daemon, bugme-daemon, Rusty Russell, kernel, Mel Gorman, Michael S. Tsirkin

cc: mst

On 04/08/2010 10:34 PM, Andrew Morton wrote:
> Either page reclaim got worse or kvm/virtio-net got more aggressive.
>
> Avi, Rusty: can you think of any changes in the KVM/virtio area in the
> 2.6.30 -> 2.6.32 timeframe which may have increased the GFP_ATOMIC
> demands upon the page allocator?
>
> Thanks.
>

--
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
From: Michael S. Tsirkin @ 2010-04-08 20:04 UTC
To: Avi Kivity
Cc: Andrew Morton, linux-mm, bugzilla-daemon, bugme-daemon, Rusty Russell, kernel, Mel Gorman

On Thu, Apr 08, 2010 at 10:39:34PM +0300, Avi Kivity wrote:
> cc: mst
>
> On 04/08/2010 10:34 PM, Andrew Morton wrote:
>> Either page reclaim got worse or kvm/virtio-net got more aggressive.
>>
>> Avi, Rusty: can you think of any changes in the KVM/virtio area in the
>> 2.6.30 -> 2.6.32 timeframe which may have increased the GFP_ATOMIC
>> demands upon the page allocator?
>>
>> Thanks.
>>

On the contrary, with commit
3161e453e496eb5643faad30fff5a5ab183da0fe
we should be using GFP_ATOMIC less.
But maybe there's a bug and it has the reverse effect somehow ...

Robert, could you please try 3161e453e496eb5643faad30fff5a5ab183da0fe
and, if that *does* have the problem,
0b4f2928f14c4a9770b0866923fc81beb7f4aa57?

--
MST
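
For readers following the GFP_ATOMIC discussion: the `mode:0x20` in the failure line is the allocation's GFP mask. The sketch below decodes it under the assumption that the low flag bits have the values listed; they are written from memory of include/linux/gfp.h in 2.6.3x-era kernels and should be checked against the tree being debugged. `decode_gfp` is a made-up helper name, not a kernel interface.

```python
# ASSUMPTION: bit values as recalled from include/linux/gfp.h (2.6.3x era).
# Verify against the actual source tree before relying on them.
GFP_BITS = {
    0x01: "__GFP_DMA",
    0x02: "__GFP_HIGHMEM",
    0x04: "__GFP_DMA32",
    0x08: "__GFP_MOVABLE",
    0x10: "__GFP_WAIT",
    0x20: "__GFP_HIGH",
    0x40: "__GFP_IO",
    0x80: "__GFP_FS",
}

def decode_gfp(mode):
    """Return the names of the known low GFP bits set in `mode`."""
    names = [name for bit, name in sorted(GFP_BITS.items()) if mode & bit]
    unknown = mode & ~sum(GFP_BITS)   # bits outside the table above
    if unknown:
        names.append(hex(unknown))
    return names

print(decode_gfp(0x20))   # mask from the failure line
print(decode_gfp(0xd0))   # a sleeping, reclaim-capable mask for comparison
```

Under that assumption 0x20 decodes to `__GFP_HIGH` alone. With no `__GFP_WAIT` bit the allocator may not sleep or reclaim, which fits an allocation made by try_fill_recv from softirq context as shown in the call trace.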
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
From: Robert Wimmer @ 2010-04-09 10:15 UTC
To: Michael S. Tsirkin
Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, bugme-daemon, Rusty Russell, Mel Gorman

I'm not really a git hero, so here is what I've done:

cd /usr/src
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
cd linux
git checkout -b mykernel 0b4f2928f14c4a9770b0866923fc81beb7f4aa57

Then I checked

drivers/net/virtio_net.c
drivers/net/smc91x.c

to make sure the committed changes were not in there.
Next I built my kernel as usual. I used my .config
from 2.6.30 (which works fine in several
guests; for the .config see
https://bugzilla.kernel.org/attachment.cgi?id=25925)
and built the kernel with

genkernel --menuconfig --lvm --oldconfig all

which finally gave me a 2.6.31-rc5. I should mention
that 2.6.30 was using SLUB. Here is the output
from the 2.6.31-rc5 kernel after running for about 20 min.:
https://bugzilla.kernel.org/attachment.cgi?id=25926

It does not seem very useful to me. I'm currently compiling
the same kernel with SLAB.

Please let me know if the git commands above are
right and/or if you need other kernel options enabled.

Thanks!
Robert

On 04/08/10 22:04, Michael S. Tsirkin wrote:
> On the contrary, with commit
> 3161e453e496eb5643faad30fff5a5ab183da0fe
> we should be using GFP_ATOMIC less.
> But maybe there's a bug and it has the reverse effect somehow ...
>
> Robert, could you pls try 3161e453e496eb5643faad30fff5a5ab183da0fe
> and if that *does* have the problem,
> 0b4f2928f14c4a9770b0866923fc81beb7f4aa57?
>
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
From: Michael S. Tsirkin @ 2010-04-11 11:03 UTC
To: Robert Wimmer
Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, bugme-daemon, Rusty Russell, Mel Gorman

On Fri, Apr 09, 2010 at 12:15:01PM +0200, Robert Wimmer wrote:
> I'm not really a git hero, so here is what I've done:
>
> cd /usr/src
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
> cd linux
> git checkout -b mykernel 0b4f2928f14c4a9770b0866923fc81beb7f4aa57

Looks right.

> Then I checked
>
> drivers/net/virtio_net.c
> drivers/net/smc91x.c
>
> to make sure the committed changes were not in there.
> Next I built my kernel as usual. I used my .config
> from 2.6.30 (which works fine in several
> guests; for the .config see
> https://bugzilla.kernel.org/attachment.cgi?id=25925)
> and built the kernel with
>
> genkernel --menuconfig --lvm --oldconfig all
>
> which finally gave me a 2.6.31-rc5.

That's right.

> I should mention
> that 2.6.30 was using SLUB. Here is the output
> from the 2.6.31-rc5 kernel after running for about 20 min.:
> https://bugzilla.kernel.org/attachment.cgi?id=25926

Hmm, so we see the error here as well?

> It does not seem very useful to me. I'm currently compiling
> the same kernel with SLAB.
>
> Please let me know if the git commands above are
> right and/or if you need other kernel options enabled.

Looks right. You don't have to add the -b flag if you don't want to.

> Thanks!
> Robert

Hmm, I do not see anything else that seems related.
Could you please try to bisect?

git bisect start v2.6.31 v2.6.30 -- drivers/virtio/ drivers/net/virtio_net.c

should help, assuming the change that triggers this is in virtio.
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
From: Robert Wimmer @ 2010-04-12  9:25 UTC
To: Michael S. Tsirkin
Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, bugme-daemon, Rusty Russell, Mel Gorman

server10:/usr/src/linux # git bisect start v2.6.31 v2.6.30 -- drivers/virtio/ drivers/net/virtio_net.c
Bisecting: 12 revisions left to test after this (roughly 4 steps)
[e3353853730eb99c56b7b0aed1667d51c0e3699a] virtio: enhance id_matching for virtio drivers

Today I've upgraded to qemu-kvm-0.12.3-r1 (Gentoo package), but that
doesn't help. I'm still getting "page allocation failure" with 2.6.31-rc5.

Does it make sense to use the same 2.6.31-rc5 kernel in the host and
guest for testing? Currently I'm still using 2.6.32 in the host and
testing 2.6.31-rc5 in the guest until it "crashes". Then I start the
guest with 2.6.30 again, which works without trouble with 2.6.32 as host.

This is really strange. I have hosts with 2.6.32 running guests with
2.6.32 which work perfectly. These hosts and guests are running on
HP DL 380 G6 with Intel Xeon X5560. The guests which don't work with
2.6.32 (and 2.6.32 as host) are running on HP DL 380 G5 with Intel
Xeon L5420. All guests among themselves, and all hosts among themselves,
have the same packages in the same versions and the same kernel config
(hosts and guests use different .config files, but the difference is
very small, e.g. CONFIG_PARAVIRT_SPINLOCKS=y and CONFIG_PARAVIRT_GUEST=y
in the guests' but not in the hosts' .config).

I've had problems with qemu-kvm 0.12.2 under high network traffic, which
were solved by a patch submitted by Tom Lendacky:

"Fix a race condition where qemu finds that there are not enough virtio
ring buffers available and the guest make more buffers available before
qemu can enable notifications."
http://www.mail-archive.com/kvm@vger.kernel.org/msg28667.html

It was a real lifesaver for the HP DL 380 G6 mentioned above, but maybe
it is now causing the problems with the G5 machines. The symptoms are
the same: I can still log into the guest via VNC, but the network is down.

Thanks!
Robert

On 04/11/10 13:03, Michael S. Tsirkin wrote:
> Hmm, I do not see anything else that seems related.
> Could you please try to bisect?
>
> git bisect start v2.6.31 v2.6.30 -- drivers/virtio/ drivers/net/virtio_net.c
>
> should help assuming the change that triggers this is in virtio.
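As a rough illustration of what the path arguments after `--` do in the `git bisect start` call above, the sketch below counts commits in a range with and without a path limit. It runs on a throwaway repository; the directory layout, the tag names `good-tag` and `bad-tag`, and the commit counts are all invented for the demo and say nothing about the real kernel history. The idea is simply that only commits touching the listed paths remain candidates, which is presumably why so few revisions were left to test.

```shell
#!/bin/sh
# Toy repository: names and counts are hypothetical, for illustration only.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "demo@example.invalid"
git config user.name "demo"

mkdir -p drivers/virtio other
echo 0 > drivers/virtio/ring.c
echo 0 > other/file.c
git add .
git commit -q -m "base"
git tag good-tag

# Twelve further commits; only every fourth one touches drivers/virtio/.
i=1
while [ "$i" -le 12 ]; do
    if [ $((i % 4)) -eq 0 ]; then
        echo "$i" > drivers/virtio/ring.c
    else
        echo "$i" > other/file.c
    fi
    git add .
    git commit -q -m "change $i"
    i=$((i + 1))
done
git tag bad-tag

# All commits after good-tag up to bad-tag, then only those touching the path.
all=$(git rev-list --count good-tag..bad-tag)
limited=$(git rev-list --count good-tag..bad-tag -- drivers/virtio/)
echo "commits in range: $all, touching drivers/virtio/: $limited"
```

The trade-off is the one Michael names: a path-limited bisect is only useful if the change that triggers the problem really is in those paths.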
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
From: Michael S. Tsirkin @ 2010-04-12 11:23 UTC
To: Robert Wimmer
Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, bugme-daemon, Rusty Russell, Mel Gorman

On Mon, Apr 12, 2010 at 11:25:26AM +0200, Robert Wimmer wrote:
> server10:/usr/src/linux # git bisect start v2.6.31 v2.6.30 --
> drivers/virtio/ drivers/net/virtio_net.c
> Bisecting: 12 revisions left to test after this (roughly 4 steps)
> [e3353853730eb99c56b7b0aed1667d51c0e3699a] virtio: enhance id_matching
> for virtio drivers

Sorry I wasn't clear. The way to use bisect is as follows:

- First, start as you did now.
1. Now build the kernel, install it and test.
2. If the bug is there, type 'git bisect bad'.
3. If the bug is not there, type 'git bisect good'.
4. The above will give you another kernel version to test;
   if so, go back to step 1.
5. This will be repeated about 4 times (the number of steps above).
6. In the end you will get the first revision which has the
   problem. Let's assume it is revision ABCDEF.

Type git bisect log to see your history.

7. Now git reset --hard ABCDEF~1 and try again.

If you see the problem with ABCDEF but not ABCDEF~1
then we will have a good guess at the culprit.

Some more tips here:
http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html

> Today I've upgraded to qemu-kvm-0.12.3-r1 (Gentoo package),
> but that doesn't help. I'm still getting "page allocation failure" with
> 2.6.31-rc5.
>
> Does it make sense to use the same 2.6.31-rc5 kernel
> in the host and guest for testing? Currently I'm still using 2.6.32
> in the host and testing 2.6.31-rc5 in the guest until it "crashes".
> Then I start the guest with 2.6.30 again, which works
> without trouble with 2.6.32 as host.
>
> This is really strange. I have hosts with 2.6.32 running
> guests with 2.6.32 which work perfectly. These hosts
> and guests are running on HP DL 380 G6 with Intel Xeon X5560.
> The guests which don't work with 2.6.32 (and 2.6.32
> as host) are running on HP DL 380 G5 with Intel Xeon L5420.

Hmm. Some subtle race?

> All guests among themselves, and all hosts among themselves,
> have the same packages in the same versions and the same kernel
> config (hosts and guests use different .config files, but the
> difference is very small, e.g. CONFIG_PARAVIRT_SPINLOCKS=y and
> CONFIG_PARAVIRT_GUEST=y in the guests' but not in the hosts'
> .config).
>
> I've had problems with qemu-kvm 0.12.2 under high network
> traffic, which were solved by a patch submitted by Tom
> Lendacky:
>
> "Fix a race condition where qemu finds that there are not enough virtio
> ring buffers available and the guest make more buffers available before
> qemu can enable notifications."
> http://www.mail-archive.com/kvm@vger.kernel.org/msg28667.html
>
> It was a real lifesaver for the HP DL 380 G6 mentioned
> above, but maybe it is now causing the problems with the G5 machines.
> The symptoms are the same: I can still log into the guest
> via VNC, but the network is down.
>
> Thanks!
> Robert

For now the only thing we seem to know for sure is that on
specific hardware there's a regression between 2.6.30 and
2.6.31-rc5. Yes, it is possible that all it does
is expose a qemu bug, but it's hard to say.
Let's find out which change does that; this should give us a hint.
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
From: Robert Wimmer @ 2010-04-12 13:50 UTC
To: Michael S. Tsirkin
Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, bugme-daemon, Rusty Russell, Mel Gorman

Sorry, but I need some more git help. Here is what I've done. I started
with a fresh clone of the kernel:

cd /usr/src
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
cd linux
git checkout 0b4f2928f14c4a9770b0866923fc81beb7f4aa57

Since I already knew that this commit wasn't good, I did

git bisect start
git bisect bad

then compiled and started over. As expected, the problem returned. So I
did another

git bisect bad

but I always get the same commit:

kabul:/usr/src/linux # git bisect log
git bisect start
# bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix compilation on SMP
git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
# bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix compilation on SMP
git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
# bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix compilation on SMP
git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57

I expected that after each "git bisect bad" I would get the commit
before the "bad" one. How can I get the previous commit? The bisect
documentation couldn't help me.

Thanks!
Robert

On 04/12/10 13:23, Michael S. Tsirkin wrote:
> Sorry I wasn't clear. the way to use bisect is as follows:
> - first start as you did now.
> 1. now build kernel, install and test
> 2. if bug is there, type 'git bisect bad'
> 3. if bug is not there, type 'git bisect good'
> [...]
>>>>>> On 04/08/2010 10:34 PM, Andrew Morton wrote:
>>>>>>> hm, this is a regression.
>>>>>>>
>>>>>>> : [  454.006706] users: page allocation failure.
order:0, mode:0x20 >>>>>>> : [ 454.006712] Pid: 7992, comm: users Not tainted 2.6.34-rc3-git6 #2 >>>>>>> : [ 454.006714] Call Trace: >>>>>>> : [ 454.006717]<IRQ> [<ffffffff8109dff7>] __alloc_pages_nodemask+0x5c8/0x615 >>>>>>> : [ 454.006796] [<ffffffff817860ce>] ? ip_local_deliver+0x65/0x6d >>>>>>> : [ 454.006820] [<ffffffff810c39c4>] alloc_pages_current+0x96/0x9f >>>>>>> : [ 454.006842] [<ffffffff8167f2c7>] try_fill_recv+0x5e/0x20f >>>>>>> : [ 454.006846] [<ffffffff8167fe13>] virtnet_poll+0x52a/0x5c7 >>>>>>> : [ 454.006858] [<ffffffff8104fe74>] ? run_timer_softirq+0x1dc/0x1f4 >>>>>>> : [ 454.006873] [<ffffffff8176035d>] net_rx_action+0xad/0x1a5 >>>>>>> : [ 454.006882] [<ffffffff8104b6cd>] __do_softirq+0x9c/0x127 >>>>>>> : [ 454.006897] [<ffffffff81008ffc>] call_softirq+0x1c/0x30 >>>>>>> : [ 454.006901] [<ffffffff8100af01>] do_softirq+0x41/0x7e >>>>>>> : [ 454.006904] [<ffffffff8104b3e3>] irq_exit+0x36/0x75 >>>>>>> : [ 454.006907] [<ffffffff8100a5ee>] do_IRQ+0xaa/0xc1 >>>>>>> : [ 454.006926] [<ffffffff8183bc13>] ret_from_intr+0x0/0x11 >>>>>>> : [ 454.006928]<EOI> [<ffffffff81026b25>] ? kvm_deferred_mmu_op+0x5e/0xe7 >>>>>>> : [ 454.006942] [<ffffffff81026b19>] ? kvm_deferred_mmu_op+0x52/0xe7 >>>>>>> : [ 454.006946] [<ffffffff81026c03>] kvm_mmu_write+0x2e/0x35 >>>>>>> : [ 454.006949] [<ffffffff81026c7d>] kvm_set_pte_at+0x19/0x1b >>>>>>> : [ 454.006953] [<ffffffff810aba67>] __do_fault+0x3c4/0x492 >>>>>>> : [ 454.006957] [<ffffffff810adcf4>] handle_mm_fault+0x478/0x9d8 >>>>>>> : [ 454.006966] [<ffffffff810deb59>] ? path_put+0x2c/0x30 >>>>>>> : [ 454.006975] [<ffffffff8102f162>] do_page_fault+0x2f6/0x31a >>>>>>> : [ 454.006979] [<ffffffff8183b81e>] ? 
_raw_spin_lock+0x9/0xd >>>>>>> : [ 454.006982] [<ffffffff8183bef5>] page_fault+0x25/0x30 >>>>>>> : [ 454.006985] Mem-Info: >>>>>>> : [ 454.006987] Node 0 DMA per-cpu: >>>>>>> : [ 454.006990] CPU 0: hi: 0, btch: 1 usd: 0 >>>>>>> : [ 454.006992] CPU 1: hi: 0, btch: 1 usd: 0 >>>>>>> : [ 454.006993] Node 0 DMA32 per-cpu: >>>>>>> : [ 454.006996] CPU 0: hi: 186, btch: 31 usd: 185 >>>>>>> : [ 454.006998] CPU 1: hi: 186, btch: 31 usd: 112 >>>>>>> : [ 454.007003] active_anon:8308 inactive_anon:8544 isolated_anon:0 >>>>>>> : [ 454.007005] active_file:4882 inactive_file:205902 isolated_file:0 >>>>>>> : [ 454.007006] unevictable:0 dirty:11 writeback:0 unstable:0 >>>>>>> : [ 454.007007] free:1385 slab_reclaimable:2445 slab_unreclaimable:4466 >>>>>>> : [ 454.007008] mapped:1895 shmem:113 pagetables:1370 bounce:0 >>>>>>> : [ 454.007010] Node 0 DMA free:4000kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:11844kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15768kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:64kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no >>>>>>> : [ 454.007021] lowmem_reserve[]: 0 994 994 994 >>>>>>> : [ 454.007025] Node 0 DMA32 free:1540kB min:4000kB low:5000kB high:6000kB active_anon:33232kB inactive_anon:34176kB active_file:19528kB inactive_file:811764kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1018068kB mlocked:0kB dirty:44kB writeback:0kB mapped:7580kB shmem:452kB slab_reclaimable:9716kB slab_unreclaimable:17832kB kernel_stack:1144kB pagetables:5480kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no >>>>>>> : [ 454.007036] lowmem_reserve[]: 0 0 0 0 >>>>>>> : [ 454.007040] Node 0 DMA: 0*4kB 4*8kB 6*16kB 5*32kB 6*64kB 4*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 4000kB >>>>>>> : [ 454.007050] Node 0 DMA32: 13*4kB 2*8kB 3*16kB 1*32kB 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1556kB >>>>>>> : [ 454.007059] 210914 total pagecache pages >>>>>>> : [ 454.007061] 0 pages in swap cache >>>>>>> : [ 454.007063] Swap cache stats: add 0, delete 0, find 0/0 >>>>>>> : [ 454.007065] Free swap = 1959924kB >>>>>>> : [ 454.007067] Total swap = 1959924kB >>>>>>> : [ 454.014238] 262140 pages RAM >>>>>>> : [ 454.014241] 7489 pages reserved >>>>>>> : [ 454.014242] 21430 pages shared >>>>>>> : [ 454.014244] 247174 pages non-shared >>>>>>> >>>>>>> Either page reclaim got worse or kvm/virtio-net got more aggressive. >>>>>>> >>>>>>> Avi, Rusty: can you think of any changes in the KVM/virtio area in the >>>>>>> 2.6.30 -> 2.6.32 timeframe which may have increased the GFP_ATOMIC >>>>>>> demands upon the page allocator? >>>>>>> >>>>>>> Thanks. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>> On the contrary, with commit >>>>> 3161e453e496eb5643faad30fff5a5ab183da0fe >>>>> we should be using GFP_ATOMIC less. >>>>> But maybe there's a bug and it has the reverse effect somehow ... >>>>> >>>>> Robert, could you pls try 3161e453e496eb5643faad30fff5a5ab183da0fe >>>>> and if that *does* have the problem, >>>>> 0b4f2928f14c4a9770b0866923fc81beb7f4aa57? >>>>> >>>>> >>>>> >>>>> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-12 13:50 ` Robert Wimmer
@ 2010-04-12 13:52 ` Michael S. Tsirkin
  2010-04-13 8:51 ` Robert Wimmer
  0 siblings, 1 reply; 62+ messages in thread
From: Michael S. Tsirkin @ 2010-04-12 13:52 UTC (permalink / raw)
To: Robert Wimmer
Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, bugme-daemon, Rusty Russell, Mel Gorman

On Mon, Apr 12, 2010 at 03:50:31PM +0200, Robert Wimmer wrote:
> Sorry but I need some more git help. Here is what I've done.
> Started with a fresh clone of the kernel:
>
> cd /usr/src
> git clone
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
> cd linux
> git checkout 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
>
> Since I already knew that this commit wasn't good I did
>
> git bisect start
> git bisect bad

I think what you are missing is marking the good commit.
bisect does a binary search, but it needs to know
both a good and a bad commit so it can search the range between them.

Optionally, you can add '-- drivers/virtio/ drivers/net/virtio_net.c'.
What this does is limit bisect to commits that touch the files in
question. This way you get far fewer tests to run
(about 4), but after you find the first problematic commit
you must verify that the commit just before it does not have the issue.

If this turns out not to be the case, you'll have to
fall back on a full bisect, and we will know that it is some
other change in the kernel that triggered the regression.

> compiled and started over. As expected the problem returns.
> So I've done another > > git bisect bad > > but I always get the same commit: > > kabul:/usr/src/linux # git bisect log > git bisect start > # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix > compilation on SMP > git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57 > # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix > compilation on SMP > git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57 > # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix > compilation on SMP > git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57 > > I've expected that after each "git bisect bad" I get the previous > commit before the "bad" one. How can get the previous commit? > The bisect documentation couldn't help me. > > Thanks! > Robert > > > > On 04/12/10 13:23, Michael S. Tsirkin wrote: > > On Mon, Apr 12, 2010 at 11:25:26AM +0200, Robert Wimmer wrote: > > > >> server10:/usr/src/linux # git bisect start v2.6.31 v2.6.30 -- > >> drivers/virtio/ drivers/net/virtio_net.c > >> Bisecting: 12 revisions left to test after this (roughly 4 steps) > >> [e3353853730eb99c56b7b0aed1667d51c0e3699a] virtio: enhance id_matching > >> for virtio drivers > >> > >> > > Sorry I wasn't clear. the way to use bisect is as follows: > > - first start as you did now. > > 1. now build kernel, install and test > > 2. if bug is there, type 'git bisect bad' > > 3. if bug is not there, type 'git bisect good' > > 4. The above will give you another kernel version to test > > if so go back to step 1 > > 6. this will be repeated about 4 times (number of steps above) > > 7. in the end you will get the first revision which has the > > problem. Let's assume it is revision ABCDEF. > > > > Type git bisect log to see your history. > > > > 8. Now git reset --hard ABCDEF~1 and try again. > > > > If you see the problem with ABCDEF but not ABCDEF~1 > > then we will have a good guess at the culprit. 
> > > > Some more tips here: > > http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html > > > > > > > >> Today I've upgraded to qemu-kvm-0.12.3-r1 (Gentoo package) > >> but doesn't help. Still getting "page allocation failure" with > >> 2.6.31-rc5. > >> > >> Does it makes sense to use the same 2.6.31-rc5 kernel > >> in the host and guest for testing? Currently I'm still using 2.6.32 > >> in host and testing 2.6.31-rc5 in guest until "crashes". > >> Then I start the guest with 2.6.30 again which works > >> without trouble with 2.6.32 as host. > >> > >> This is really strange. I have hosts with 2.6.32 running > >> guests with 2.6.32 which works perfectly. These hosts > >> and guests running on HP DL 380 G6 with Intel Xeon X5560. > >> The guests which don't work with 2.6.32 (and 2.6.32 > >> as host) running on HP DL 380 G5 with Intel Xeon L5420. > >> > > Hmm. Some subtle race? > > > > > >> (All guests) and (all hosts) have the same packages > >> and the same versions installed and the same kernel > >> configs (hosts and guests using different .config but the > >> difference is very small e.g. CONFIG_PARAVIRT_SPINLOCKS=y, > >> CONFIG_PARAVIRT_GUEST=y in guests but not in hosts > >> .config). > >> > >> I've had problems with qemu-kvm 0.12.2 with high network > >> traffic which was solved by a patch submitted by Tom > >> Lendacky: > >> > >> "Fix a race condition where qemu finds that there are not enough virtio > >> ring buffers available and the guest make more buffers available before > >> qemu can enable notifications." > >> http://www.mail-archive.com/kvm@vger.kernel.org/msg28667.html > >> > >> It was a real lifesaver for the HP DL 380 G6 mentioned > >> above but maybe this is now causing the problems with the G5 machines. > >> The symptoms are the same. I can still log into the guest > >> via VNC but the network is down. > >> > >> Thanks! 
> >> Robert > >> > >> > > For now the only thing we seem to know for sure is that on > > specific hardware there's a regression between 2.6.30 and > > 2.6.31-rc5. Yes, it is possible that all it does > > is expose a qemu bug, but it's hard to say. > > Let's find out what change > > does that, this should give us a hint. > > > > > >> On 04/11/10 13:03, Michael S. Tsirkin wrote: > >> > >>> On Fri, Apr 09, 2010 at 12:15:01PM +0200, Robert Wimmer wrote: > >>> > >>> > >>>> I'm not really a git hero so here is what I've done: > >>>> > >>>> cd /usr/src > >>>> git clone > >>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux > >>>> cd linux > >>>> git checkout -b mykernel 0b4f2928f14c4a9770b0866923fc81beb7f4aa57 > >>>> > >>>> > >>> Looks right. > >>> > >>> > >>> > >>>> Then I've checked > >>>> > >>>> drivers/net/virtio_net.c > >>>> drivers/net/smc91x.c > >>>> > >>>> if the changes commited where not in there. > >>>> Next I build my kernel as usual. I used my .config > >>>> from 2.6.30 (which is working fine in a several > >>>> guests / .config see here: > >>>> https://bugzilla.kernel.org/attachment.cgi?id=25925) > >>>> and build the kernel > >>>> > >>>> genkernel --menuconfig --lvm --oldconfig all > >>>> > >>>> which finally gave me a 2.6.31-rc5. > >>>> > >>>> > >>> That's right. > >>> > >>> > >>> > >>>> I should mention > >>>> that 2.6.30 was using SLUB. So here is the output > >>>> from the 2.6.31-rc5 kernel running about 20 min.: > >>>> https://bugzilla.kernel.org/attachment.cgi?id=25926 > >>>> > >>>> > >>> Hmm, so we see the error here as well? > >>> > >>> > >>> > >>>> Seems not very usefull to me. I'm currently compiling > >>>> the same kernel with SLAB. > >>>> > >>>> Please let me know if the git commands above are > >>>> right and/or if you need other kernel options enabled. > >>>> > >>>> > >>> Looks right. You don't have to add -b flag if you don't > >>> want to. > >>> > >>> > >>> > >>>> Thanks! 
> >>>> Robert > >>>> > >>>> > >>> Hmm, I do not see anything else that seems related. > >>> Could you please try to bisect? > >>> > >>> git bisect start v2.6.31 v2.6.30 -- drivers/virtio/ drivers/net/virtio_net.c > >>> > >>> should help assuming the change that triggers this is in virtio. > >>> > >>> > >>> > >>> > >>>> On 04/08/10 22:04, Michael S. Tsirkin wrote: > >>>> > >>>> > >>>>> On Thu, Apr 08, 2010 at 10:39:34PM +0300, Avi Kivity wrote: > >>>>> > >>>>> > >>>>> > >>>>>> cc: mst > >>>>>> > >>>>>> On 04/08/2010 10:34 PM, Andrew Morton wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>>> (switched to email. Please respond via emailed reply-to-all, not via the > >>>>>>> bugzilla web interface). > >>>>>>> > >>>>>>> On Wed, 7 Apr 2010 10:29:20 GMT > >>>>>>> bugzilla-daemon@bugzilla.kernel.org wrote: > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=15709 > >>>>>>>> > >>>>>>>> Summary: swapper page allocation failure > >>>>>>>> Product: Memory Management > >>>>>>>> Version: 2.5 > >>>>>>>> Kernel Version: 2.6.32 and 2.6.33 > >>>>>>>> Platform: All > >>>>>>>> OS/Version: Linux > >>>>>>>> Tree: Mainline > >>>>>>>> Status: NEW > >>>>>>>> Severity: normal > >>>>>>>> Priority: P1 > >>>>>>>> Component: Slab Allocator > >>>>>>>> AssignedTo: akpm@linux-foundation.org > >>>>>>>> ReportedBy: kernel@tauceti.net > >>>>>>>> Regression: No > >>>>>>>> > >>>>>>>> > >>>>>>>> Created an attachment (id=25903) > >>>>>>>> --> (https://bugzilla.kernel.org/attachment.cgi?id=25903) > >>>>>>>> dmesg output > >>>>>>>> > >>>>>>>> I'm having problems with "swapper page allocation failure's" since upgrading > >>>>>>>> from kernel 2.6.30 to 2.6.32/2.6.33. The problems occur inside a kernel virtual > >>>>>>>> maschine (KVM). Running Gentoo with kernel 2.6.32 as host which works fine. As > >>>>>>>> long as kernel 2.6.30 is used as guest kernel the guest runs fine. 
But after > >>>>>>>> upgrading to 2.6.32 and 2.6.33 I get "swapper page allocation failure's" (see > >>>>>>>> attachment of dmesg output). The guest is only running a Apache webserver and > >>>>>>>> serves files from a NFS share. It has 1 GB RAM and 2 virtual CPUs. I've tried > >>>>>>>> different kernel configurations (e.g. a unmodified version from Sabayon Linux > >>>>>>>> Distribution) but doesn't help. Load of the guest (and host) is very low. > >>>>>>>> Network traffic is about 20-50 MBit/s. > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>> hm, this is a regression. > >>>>>>> > >>>>>>> : [ 454.006706] users: page allocation failure. order:0, mode:0x20 > >>>>>>> : [ 454.006712] Pid: 7992, comm: users Not tainted 2.6.34-rc3-git6 #2 > >>>>>>> : [ 454.006714] Call Trace: > >>>>>>> : [ 454.006717]<IRQ> [<ffffffff8109dff7>] __alloc_pages_nodemask+0x5c8/0x615 > >>>>>>> : [ 454.006796] [<ffffffff817860ce>] ? ip_local_deliver+0x65/0x6d > >>>>>>> : [ 454.006820] [<ffffffff810c39c4>] alloc_pages_current+0x96/0x9f > >>>>>>> : [ 454.006842] [<ffffffff8167f2c7>] try_fill_recv+0x5e/0x20f > >>>>>>> : [ 454.006846] [<ffffffff8167fe13>] virtnet_poll+0x52a/0x5c7 > >>>>>>> : [ 454.006858] [<ffffffff8104fe74>] ? run_timer_softirq+0x1dc/0x1f4 > >>>>>>> : [ 454.006873] [<ffffffff8176035d>] net_rx_action+0xad/0x1a5 > >>>>>>> : [ 454.006882] [<ffffffff8104b6cd>] __do_softirq+0x9c/0x127 > >>>>>>> : [ 454.006897] [<ffffffff81008ffc>] call_softirq+0x1c/0x30 > >>>>>>> : [ 454.006901] [<ffffffff8100af01>] do_softirq+0x41/0x7e > >>>>>>> : [ 454.006904] [<ffffffff8104b3e3>] irq_exit+0x36/0x75 > >>>>>>> : [ 454.006907] [<ffffffff8100a5ee>] do_IRQ+0xaa/0xc1 > >>>>>>> : [ 454.006926] [<ffffffff8183bc13>] ret_from_intr+0x0/0x11 > >>>>>>> : [ 454.006928]<EOI> [<ffffffff81026b25>] ? kvm_deferred_mmu_op+0x5e/0xe7 > >>>>>>> : [ 454.006942] [<ffffffff81026b19>] ? 
kvm_deferred_mmu_op+0x52/0xe7 > >>>>>>> : [ 454.006946] [<ffffffff81026c03>] kvm_mmu_write+0x2e/0x35 > >>>>>>> : [ 454.006949] [<ffffffff81026c7d>] kvm_set_pte_at+0x19/0x1b > >>>>>>> : [ 454.006953] [<ffffffff810aba67>] __do_fault+0x3c4/0x492 > >>>>>>> : [ 454.006957] [<ffffffff810adcf4>] handle_mm_fault+0x478/0x9d8 > >>>>>>> : [ 454.006966] [<ffffffff810deb59>] ? path_put+0x2c/0x30 > >>>>>>> : [ 454.006975] [<ffffffff8102f162>] do_page_fault+0x2f6/0x31a > >>>>>>> : [ 454.006979] [<ffffffff8183b81e>] ? _raw_spin_lock+0x9/0xd > >>>>>>> : [ 454.006982] [<ffffffff8183bef5>] page_fault+0x25/0x30 > >>>>>>> : [ 454.006985] Mem-Info: > >>>>>>> : [ 454.006987] Node 0 DMA per-cpu: > >>>>>>> : [ 454.006990] CPU 0: hi: 0, btch: 1 usd: 0 > >>>>>>> : [ 454.006992] CPU 1: hi: 0, btch: 1 usd: 0 > >>>>>>> : [ 454.006993] Node 0 DMA32 per-cpu: > >>>>>>> : [ 454.006996] CPU 0: hi: 186, btch: 31 usd: 185 > >>>>>>> : [ 454.006998] CPU 1: hi: 186, btch: 31 usd: 112 > >>>>>>> : [ 454.007003] active_anon:8308 inactive_anon:8544 isolated_anon:0 > >>>>>>> : [ 454.007005] active_file:4882 inactive_file:205902 isolated_file:0 > >>>>>>> : [ 454.007006] unevictable:0 dirty:11 writeback:0 unstable:0 > >>>>>>> : [ 454.007007] free:1385 slab_reclaimable:2445 slab_unreclaimable:4466 > >>>>>>> : [ 454.007008] mapped:1895 shmem:113 pagetables:1370 bounce:0 > >>>>>>> : [ 454.007010] Node 0 DMA free:4000kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:11844kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15768kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:64kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no > >>>>>>> : [ 454.007021] lowmem_reserve[]: 0 994 994 994 > >>>>>>> : [ 454.007025] Node 0 DMA32 free:1540kB min:4000kB low:5000kB high:6000kB active_anon:33232kB inactive_anon:34176kB active_file:19528kB inactive_file:811764kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1018068kB mlocked:0kB dirty:44kB writeback:0kB mapped:7580kB shmem:452kB slab_reclaimable:9716kB slab_unreclaimable:17832kB kernel_stack:1144kB pagetables:5480kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no > >>>>>>> : [ 454.007036] lowmem_reserve[]: 0 0 0 0 > >>>>>>> : [ 454.007040] Node 0 DMA: 0*4kB 4*8kB 6*16kB 5*32kB 6*64kB 4*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 4000kB > >>>>>>> : [ 454.007050] Node 0 DMA32: 13*4kB 2*8kB 3*16kB 1*32kB 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1556kB > >>>>>>> : [ 454.007059] 210914 total pagecache pages > >>>>>>> : [ 454.007061] 0 pages in swap cache > >>>>>>> : [ 454.007063] Swap cache stats: add 0, delete 0, find 0/0 > >>>>>>> : [ 454.007065] Free swap = 1959924kB > >>>>>>> : [ 454.007067] Total swap = 1959924kB > >>>>>>> : [ 454.014238] 262140 pages RAM > >>>>>>> : [ 454.014241] 7489 pages reserved > >>>>>>> : [ 454.014242] 21430 pages shared > >>>>>>> : [ 454.014244] 247174 pages non-shared > >>>>>>> > >>>>>>> Either page reclaim got worse or kvm/virtio-net got more aggressive. > >>>>>>> > >>>>>>> Avi, Rusty: can you think of any changes in the KVM/virtio area in the > >>>>>>> 2.6.30 -> 2.6.32 timeframe which may have increased the GFP_ATOMIC > >>>>>>> demands upon the page allocator? > >>>>>>> > >>>>>>> Thanks. > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>> On the contrary, with commit > >>>>> 3161e453e496eb5643faad30fff5a5ab183da0fe > >>>>> we should be using GFP_ATOMIC less. > >>>>> But maybe there's a bug and it has the reverse effect somehow ... 
> >>>>> > >>>>> Robert, could you pls try 3161e453e496eb5643faad30fff5a5ab183da0fe > >>>>> and if that *does* have the problem, > >>>>> 0b4f2928f14c4a9770b0866923fc81beb7f4aa57? > >>>>> > >>>>> > >>>>> > >>>>>
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-12 13:52 ` Michael S. Tsirkin
@ 2010-04-13 8:51 ` Robert Wimmer
  2010-04-19 12:55 ` Robert Wimmer
  0 siblings, 1 reply; 62+ messages in thread
From: Robert Wimmer @ 2010-04-13 8:51 UTC (permalink / raw)
To: Michael S. Tsirkin
Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, bugme-daemon, Rusty Russell, Mel Gorman

I've done my best. In general I can say:
all 2.6.30 versions work, all 2.6.31 versions fail. 2.6.31-rc3
fails with a "soft lockup" and is the only one which doesn't show
any "swapper page allocation failure". But the end result is
the same... 2.6.31-rc4 doesn't show "soft lockups" but does show
"swapper page allocation failure".

Here is the dmesg output for 2.6.31-rc3:
https://bugzilla.kernel.org/attachment.cgi?id=25986

So here is what I've done. I started with a fresh tree
and my 2.6.30 .config:

rm -fr /usr/src/linux
cd /usr/src
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
cd linux
git checkout 0b4f2928f14c4a9770b0866923fc81beb7f4aa57

Here is the "git bisect log" output:

# bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
# good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
git bisect start 'v2.6.31' 'v2.6.30' '--' 'drivers/virtio/' 'drivers/net/virtio_net.c'
# good: [e3353853730eb99c56b7b0aed1667d51c0e3699a] virtio: enhance id_matching for virtio drivers
git bisect good e3353853730eb99c56b7b0aed1667d51c0e3699a
# good: [9cbc1cb8cd46ce1f7645b9de249b2ce8460129bb] Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
git bisect good 9cbc1cb8cd46ce1f7645b9de249b2ce8460129bb
# bad: [ff52c3fc7188855ede75d87b022271f0da309e5b] virtio: fix memory leak on device removal
git bisect bad ff52c3fc7188855ede75d87b022271f0da309e5b
# good: [31278e71471399beaff9280737e52b47db4dc345] net: group address list and its count
git bisect good 31278e71471399beaff9280737e52b47db4dc345
# bad:
[4b892e6582e3a4fe01f623aea386907270d5bf83] virtio-pci: correctly unregister root device on error
git bisect bad 4b892e6582e3a4fe01f623aea386907270d5bf83

Hopefully this gives you some hints. The problem for me is
that I don't know which commits I should consider good or bad.
Should I consider the commit with the "soft lockup" good
because it doesn't show the allocation failure? Currently it is marked
as bad (4b892e6582e3a4fe01f623aea386907270d5bf83).

What should I do next?

Thanks!
Robert

On 04/12/10 15:52, Michael S. Tsirkin wrote:
> On Mon, Apr 12, 2010 at 03:50:31PM +0200, Robert Wimmer wrote:
>
>> Sorry but I need some more git help. Here is what I've done.
>> Started with a fresh clone of the kernel:
>>
>> cd /usr/src
>> git clone
>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
>> cd linux
>> git checkout 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
>>
>> Since I already knew that this commit wasn't good I did
>>
>> git bisect start
>> git bisect bad
>>
> I think what you miss is marking the good commit.
> bisect does a binary search but it needs to know
> both good and bad commits to search in the range.
>
> Optionally, you can use '-- drivers/virtio/ drivers/net/virtio_net.c'
> what this does is limit bisect to commits that touch files in
> question. This way you get much less tests to run
> (about 4) but after you find a first problematic commit
> you must verify that a commit just before it does not have the issue.
>
> If this turns out not to be the case, you'll have to
> fallback on full bisect, and we will now this is some
> other change in kernel that triggered the regression.
>
>
>
>> compiled and started over. As expected the problem returns.
>> So I've done another >> >> git bisect bad >> >> but I always get the same commit: >> >> kabul:/usr/src/linux # git bisect log >> git bisect start >> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix >> compilation on SMP >> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57 >> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix >> compilation on SMP >> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57 >> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix >> compilation on SMP >> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57 >> >> I've expected that after each "git bisect bad" I get the previous >> commit before the "bad" one. How can get the previous commit? >> The bisect documentation couldn't help me. >> >> Thanks! >> Robert >> >> >> >> On 04/12/10 13:23, Michael S. Tsirkin wrote: >> >>> On Mon, Apr 12, 2010 at 11:25:26AM +0200, Robert Wimmer wrote: >>> >>> >>>> server10:/usr/src/linux # git bisect start v2.6.31 v2.6.30 -- >>>> drivers/virtio/ drivers/net/virtio_net.c >>>> Bisecting: 12 revisions left to test after this (roughly 4 steps) >>>> [e3353853730eb99c56b7b0aed1667d51c0e3699a] virtio: enhance id_matching >>>> for virtio drivers >>>> >>>> >>>> >>> Sorry I wasn't clear. the way to use bisect is as follows: >>> - first start as you did now. >>> 1. now build kernel, install and test >>> 2. if bug is there, type 'git bisect bad' >>> 3. if bug is not there, type 'git bisect good' >>> 4. The above will give you another kernel version to test >>> if so go back to step 1 >>> 6. this will be repeated about 4 times (number of steps above) >>> 7. in the end you will get the first revision which has the >>> problem. Let's assume it is revision ABCDEF. >>> >>> Type git bisect log to see your history. >>> >>> 8. Now git reset --hard ABCDEF~1 and try again. >>> >>> If you see the problem with ABCDEF but not ABCDEF~1 >>> then we will have a good guess at the culprit. 
>>> >>> Some more tips here: >>> http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html >>> >>> >>> >>> >>>> Today I've upgraded to qemu-kvm-0.12.3-r1 (Gentoo package) >>>> but doesn't help. Still getting "page allocation failure" with >>>> 2.6.31-rc5. >>>> >>>> Does it makes sense to use the same 2.6.31-rc5 kernel >>>> in the host and guest for testing? Currently I'm still using 2.6.32 >>>> in host and testing 2.6.31-rc5 in guest until "crashes". >>>> Then I start the guest with 2.6.30 again which works >>>> without trouble with 2.6.32 as host. >>>> >>>> This is really strange. I have hosts with 2.6.32 running >>>> guests with 2.6.32 which works perfectly. These hosts >>>> and guests running on HP DL 380 G6 with Intel Xeon X5560. >>>> The guests which don't work with 2.6.32 (and 2.6.32 >>>> as host) running on HP DL 380 G5 with Intel Xeon L5420. >>>> >>>> >>> Hmm. Some subtle race? >>> >>> >>> >>>> (All guests) and (all hosts) have the same packages >>>> and the same versions installed and the same kernel >>>> configs (hosts and guests using different .config but the >>>> difference is very small e.g. CONFIG_PARAVIRT_SPINLOCKS=y, >>>> CONFIG_PARAVIRT_GUEST=y in guests but not in hosts >>>> .config). >>>> >>>> I've had problems with qemu-kvm 0.12.2 with high network >>>> traffic which was solved by a patch submitted by Tom >>>> Lendacky: >>>> >>>> "Fix a race condition where qemu finds that there are not enough virtio >>>> ring buffers available and the guest make more buffers available before >>>> qemu can enable notifications." >>>> http://www.mail-archive.com/kvm@vger.kernel.org/msg28667.html >>>> >>>> It was a real lifesaver for the HP DL 380 G6 mentioned >>>> above but maybe this is now causing the problems with the G5 machines. >>>> The symptoms are the same. I can still log into the guest >>>> via VNC but the network is down. >>>> >>>> Thanks! 
>>>> Robert >>>> >>>> >>>> >>> For now the only thing we seem to know for sure is that on >>> specific hardware there's a regression between 2.6.30 and >>> 2.6.31-rc5. Yes, it is possible that all it does >>> is expose a qemu bug, but it's hard to say. >>> Let's find out what change >>> does that, this should give us a hint. >>> >>> >>> >>>> On 04/11/10 13:03, Michael S. Tsirkin wrote: >>>> >>>> >>>>> On Fri, Apr 09, 2010 at 12:15:01PM +0200, Robert Wimmer wrote: >>>>> >>>>> >>>>> >>>>>> I'm not really a git hero so here is what I've done: >>>>>> >>>>>> cd /usr/src >>>>>> git clone >>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux >>>>>> cd linux >>>>>> git checkout -b mykernel 0b4f2928f14c4a9770b0866923fc81beb7f4aa57 >>>>>> >>>>>> >>>>>> >>>>> Looks right. >>>>> >>>>> >>>>> >>>>> >>>>>> Then I've checked >>>>>> >>>>>> drivers/net/virtio_net.c >>>>>> drivers/net/smc91x.c >>>>>> >>>>>> if the changes commited where not in there. >>>>>> Next I build my kernel as usual. I used my .config >>>>>> from 2.6.30 (which is working fine in a several >>>>>> guests / .config see here: >>>>>> https://bugzilla.kernel.org/attachment.cgi?id=25925) >>>>>> and build the kernel >>>>>> >>>>>> genkernel --menuconfig --lvm --oldconfig all >>>>>> >>>>>> which finally gave me a 2.6.31-rc5. >>>>>> >>>>>> >>>>>> >>>>> That's right. >>>>> >>>>> >>>>> >>>>> >>>>>> I should mention >>>>>> that 2.6.30 was using SLUB. So here is the output >>>>>> from the 2.6.31-rc5 kernel running about 20 min.: >>>>>> https://bugzilla.kernel.org/attachment.cgi?id=25926 >>>>>> >>>>>> >>>>>> >>>>> Hmm, so we see the error here as well? >>>>> >>>>> >>>>> >>>>> >>>>>> Seems not very usefull to me. I'm currently compiling >>>>>> the same kernel with SLAB. >>>>>> >>>>>> Please let me know if the git commands above are >>>>>> right and/or if you need other kernel options enabled. >>>>>> >>>>>> >>>>>> >>>>> Looks right. You don't have to add -b flag if you don't >>>>> want to. 
>>>>> >>>>> >>>>> >>>>> >>>>>> Thanks! >>>>>> Robert >>>>>> >>>>>> >>>>>> >>>>> Hmm, I do not see anything else that seems related. >>>>> Could you please try to bisect? >>>>> >>>>> git bisect start v2.6.31 v2.6.30 -- drivers/virtio/ drivers/net/virtio_net.c >>>>> >>>>> should help assuming the change that triggers this is in virtio. >>>>> >>>>> >>>>> >>>>> >>>>> >>>>>> On 04/08/10 22:04, Michael S. Tsirkin wrote: >>>>>> >>>>>> >>>>>> >>>>>>> On Thu, Apr 08, 2010 at 10:39:34PM +0300, Avi Kivity wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> cc: mst >>>>>>>> >>>>>>>> On 04/08/2010 10:34 PM, Andrew Morton wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> (switched to email. Please respond via emailed reply-to-all, not via the >>>>>>>>> bugzilla web interface). >>>>>>>>> >>>>>>>>> On Wed, 7 Apr 2010 10:29:20 GMT >>>>>>>>> bugzilla-daemon@bugzilla.kernel.org wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=15709 >>>>>>>>>> >>>>>>>>>> Summary: swapper page allocation failure >>>>>>>>>> Product: Memory Management >>>>>>>>>> Version: 2.5 >>>>>>>>>> Kernel Version: 2.6.32 and 2.6.33 >>>>>>>>>> Platform: All >>>>>>>>>> OS/Version: Linux >>>>>>>>>> Tree: Mainline >>>>>>>>>> Status: NEW >>>>>>>>>> Severity: normal >>>>>>>>>> Priority: P1 >>>>>>>>>> Component: Slab Allocator >>>>>>>>>> AssignedTo: akpm@linux-foundation.org >>>>>>>>>> ReportedBy: kernel@tauceti.net >>>>>>>>>> Regression: No >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> Created an attachment (id=25903) >>>>>>>>>> --> (https://bugzilla.kernel.org/attachment.cgi?id=25903) >>>>>>>>>> dmesg output >>>>>>>>>> >>>>>>>>>> I'm having problems with "swapper page allocation failure's" since upgrading >>>>>>>>>> from kernel 2.6.30 to 2.6.32/2.6.33. The problems occur inside a kernel virtual >>>>>>>>>> maschine (KVM). Running Gentoo with kernel 2.6.32 as host which works fine. As >>>>>>>>>> long as kernel 2.6.30 is used as guest kernel the guest runs fine. 
But after >>>>>>>>>> upgrading to 2.6.32 and 2.6.33 I get "swapper page allocation failure's" (see >>>>>>>>>> attachment of dmesg output). The guest is only running a Apache webserver and >>>>>>>>>> serves files from a NFS share. It has 1 GB RAM and 2 virtual CPUs. I've tried >>>>>>>>>> different kernel configurations (e.g. a unmodified version from Sabayon Linux >>>>>>>>>> Distribution) but doesn't help. Load of the guest (and host) is very low. >>>>>>>>>> Network traffic is about 20-50 MBit/s. >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>> hm, this is a regression. >>>>>>>>> >>>>>>>>> : [ 454.006706] users: page allocation failure. order:0, mode:0x20 >>>>>>>>> : [ 454.006712] Pid: 7992, comm: users Not tainted 2.6.34-rc3-git6 #2 >>>>>>>>> : [ 454.006714] Call Trace: >>>>>>>>> : [ 454.006717]<IRQ> [<ffffffff8109dff7>] __alloc_pages_nodemask+0x5c8/0x615 >>>>>>>>> : [ 454.006796] [<ffffffff817860ce>] ? ip_local_deliver+0x65/0x6d >>>>>>>>> : [ 454.006820] [<ffffffff810c39c4>] alloc_pages_current+0x96/0x9f >>>>>>>>> : [ 454.006842] [<ffffffff8167f2c7>] try_fill_recv+0x5e/0x20f >>>>>>>>> : [ 454.006846] [<ffffffff8167fe13>] virtnet_poll+0x52a/0x5c7 >>>>>>>>> : [ 454.006858] [<ffffffff8104fe74>] ? run_timer_softirq+0x1dc/0x1f4 >>>>>>>>> : [ 454.006873] [<ffffffff8176035d>] net_rx_action+0xad/0x1a5 >>>>>>>>> : [ 454.006882] [<ffffffff8104b6cd>] __do_softirq+0x9c/0x127 >>>>>>>>> : [ 454.006897] [<ffffffff81008ffc>] call_softirq+0x1c/0x30 >>>>>>>>> : [ 454.006901] [<ffffffff8100af01>] do_softirq+0x41/0x7e >>>>>>>>> : [ 454.006904] [<ffffffff8104b3e3>] irq_exit+0x36/0x75 >>>>>>>>> : [ 454.006907] [<ffffffff8100a5ee>] do_IRQ+0xaa/0xc1 >>>>>>>>> : [ 454.006926] [<ffffffff8183bc13>] ret_from_intr+0x0/0x11 >>>>>>>>> : [ 454.006928]<EOI> [<ffffffff81026b25>] ? kvm_deferred_mmu_op+0x5e/0xe7 >>>>>>>>> : [ 454.006942] [<ffffffff81026b19>] ? 
kvm_deferred_mmu_op+0x52/0xe7 >>>>>>>>> : [ 454.006946] [<ffffffff81026c03>] kvm_mmu_write+0x2e/0x35 >>>>>>>>> : [ 454.006949] [<ffffffff81026c7d>] kvm_set_pte_at+0x19/0x1b >>>>>>>>> : [ 454.006953] [<ffffffff810aba67>] __do_fault+0x3c4/0x492 >>>>>>>>> : [ 454.006957] [<ffffffff810adcf4>] handle_mm_fault+0x478/0x9d8 >>>>>>>>> : [ 454.006966] [<ffffffff810deb59>] ? path_put+0x2c/0x30 >>>>>>>>> : [ 454.006975] [<ffffffff8102f162>] do_page_fault+0x2f6/0x31a >>>>>>>>> : [ 454.006979] [<ffffffff8183b81e>] ? _raw_spin_lock+0x9/0xd >>>>>>>>> : [ 454.006982] [<ffffffff8183bef5>] page_fault+0x25/0x30 >>>>>>>>> : [ 454.006985] Mem-Info: >>>>>>>>> : [ 454.006987] Node 0 DMA per-cpu: >>>>>>>>> : [ 454.006990] CPU 0: hi: 0, btch: 1 usd: 0 >>>>>>>>> : [ 454.006992] CPU 1: hi: 0, btch: 1 usd: 0 >>>>>>>>> : [ 454.006993] Node 0 DMA32 per-cpu: >>>>>>>>> : [ 454.006996] CPU 0: hi: 186, btch: 31 usd: 185 >>>>>>>>> : [ 454.006998] CPU 1: hi: 186, btch: 31 usd: 112 >>>>>>>>> : [ 454.007003] active_anon:8308 inactive_anon:8544 isolated_anon:0 >>>>>>>>> : [ 454.007005] active_file:4882 inactive_file:205902 isolated_file:0 >>>>>>>>> : [ 454.007006] unevictable:0 dirty:11 writeback:0 unstable:0 >>>>>>>>> : [ 454.007007] free:1385 slab_reclaimable:2445 slab_unreclaimable:4466 >>>>>>>>> : [ 454.007008] mapped:1895 shmem:113 pagetables:1370 bounce:0 >>>>>>>>> : [ 454.007010] Node 0 DMA free:4000kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:11844kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15768kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:64kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no >>>>>>>>> : [ 454.007021] lowmem_reserve[]: 0 994 994 994 >>>>>>>>> : [ 454.007025] Node 0 DMA32 free:1540kB min:4000kB low:5000kB high:6000kB active_anon:33232kB inactive_anon:34176kB active_file:19528kB inactive_file:811764kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1018068kB mlocked:0kB dirty:44kB writeback:0kB mapped:7580kB shmem:452kB slab_reclaimable:9716kB slab_unreclaimable:17832kB kernel_stack:1144kB pagetables:5480kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no >>>>>>>>> : [ 454.007036] lowmem_reserve[]: 0 0 0 0 >>>>>>>>> : [ 454.007040] Node 0 DMA: 0*4kB 4*8kB 6*16kB 5*32kB 6*64kB 4*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 4000kB >>>>>>>>> : [ 454.007050] Node 0 DMA32: 13*4kB 2*8kB 3*16kB 1*32kB 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1556kB >>>>>>>>> : [ 454.007059] 210914 total pagecache pages >>>>>>>>> : [ 454.007061] 0 pages in swap cache >>>>>>>>> : [ 454.007063] Swap cache stats: add 0, delete 0, find 0/0 >>>>>>>>> : [ 454.007065] Free swap = 1959924kB >>>>>>>>> : [ 454.007067] Total swap = 1959924kB >>>>>>>>> : [ 454.014238] 262140 pages RAM >>>>>>>>> : [ 454.014241] 7489 pages reserved >>>>>>>>> : [ 454.014242] 21430 pages shared >>>>>>>>> : [ 454.014244] 247174 pages non-shared >>>>>>>>> >>>>>>>>> Either page reclaim got worse or kvm/virtio-net got more aggressive. >>>>>>>>> >>>>>>>>> Avi, Rusty: can you think of any changes in the KVM/virtio area in the >>>>>>>>> 2.6.30 -> 2.6.32 timeframe which may have increased the GFP_ATOMIC >>>>>>>>> demands upon the page allocator? >>>>>>>>> >>>>>>>>> Thanks. >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>> On the contrary, with commit >>>>>>> 3161e453e496eb5643faad30fff5a5ab183da0fe >>>>>>> we should be using GFP_ATOMIC less. >>>>>>> But maybe there's a bug and it has the reverse effect somehow ... 
>>>>>>> >>>>>>> Robert, could you pls try 3161e453e496eb5643faad30fff5a5ab183da0fe >>>>>>> and if that *does* have the problem, >>>>>>> 0b4f2928f14c4a9770b0866923fc81beb7f4aa57?
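For anyone reading the quoted dmesg output above: the per-order free-list lines can be checked by hand. The sketch below redoes that arithmetic for the two "Node 0" lines and compares the DMA32 total with the "min:4000kB" value printed for that zone. The helper name `buddy_total_kb` is made up for illustration and is not part of any kernel tool; the input strings are copied from the log.

```python
import re


def buddy_total_kb(line: str) -> int:
    """Sum the 'count*sizekB' terms of a buddy free-list line from dmesg."""
    return sum(int(count) * int(size)
               for count, size in re.findall(r"(\d+)\*(\d+)kB", line))


# Free-list lines as quoted in the report.
dma32 = ("Node 0 DMA32: 13*4kB 2*8kB 3*16kB 1*32kB 2*64kB 0*128kB "
         "1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1556kB")
dma = ("Node 0 DMA: 0*4kB 4*8kB 6*16kB 5*32kB 6*64kB 4*128kB "
       "1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 4000kB")

# "min:4000kB" as printed on the Node 0 DMA32 zone line.
MIN_WATERMARK_DMA32_KB = 4000

print(buddy_total_kb(dma32))                           # 1556
print(buddy_total_kb(dma))                             # 4000
print(buddy_total_kb(dma32) < MIN_WATERMARK_DMA32_KB)  # True
```

The totals match the "= 1556kB" and "= 4000kB" figures in the log, and the DMA32 zone's free memory is below its min value at the time of the failure. This only restates what the log shows; it does not say which change caused the zone to run that low.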
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-13  8:51 ` Robert Wimmer
@ 2010-04-19 12:55 ` Robert Wimmer
  2010-04-19 13:17 ` Michael S. Tsirkin
  0 siblings, 1 reply; 62+ messages in thread
From: Robert Wimmer @ 2010-04-19 12:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, Rusty Russell, Mel Gorman

Is there a way to track this down further?
For a few weeks I've had problems on two other KVM guests
which I think are related to this. The hosts for
these guests run kernel 2.6.32. Until today the guests were
also running 2.6.32. Inside the guests we're using GlusterFS,
NFSv4 and Apache with PHP. From time to time the
httpd processes "hang". When this happens
we see a lot of soft lockups. These
hosts are running Xeon X5560 processors. Until
today I suspected that this problem only happens
on older Xeons, but this doesn't seem to be true.
I've attached the output from /var/log/messages
(https://bugzilla.kernel.org/attachment.cgi?id=26048)
from one of the hosts with GlusterFS. I've now
downgraded to kernel 2.6.30 in the guests. But since
this problem also exists in 2.6.34-rc3, I suspect that
we will never be able to do a kernel update
in the guests as long as they're using NFS :-(

What I can definitely say is that all the problems
only happen with guests running kernel >= 2.6.31
and with a remote file system (NFS, GlusterFS). Some
days ago another KVM guest had a network shutdown using
kernel 2.6.32 in host and guest plus NFSv4. But this has only
happened once so far, and there isn't much
traffic running through the interfaces of that host.

All other guests with kernel 2.6.30 (about 80 guests on
18 hosts) with NFS and KVM 0.12.3 are running
perfectly.

Thanks!
Robert

On 04/13/10 10:51, Robert Wimmer wrote:
> I've tried to do my very best. In general I can
> say: All 2.6.30 versions work, all 2.6.31 fail.
2.6.31-rc3 > fails with "soft lockup" and is the only one which > don't show any "swapper page allocation failure". > But the result is finally the same... 2.6.31-rc4 > don't show "soft lockups" but "swapper page allocation failure". > Here is the dmesg output for 2.6.31-rc3: > https://bugzilla.kernel.org/attachment.cgi?id=25986 > > So here is what I've done. Started with a fresh tree > and my 2.6.30 .config: > > rm -fr /usr/src/linux > cd /usr/src > git clone > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux > cd linux > git checkout 0b4f2928f14c4a9770b0866923fc81beb7f4aa57 > > Here is the "git bisect log" output: > > # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31 > # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30 > git bisect start 'v2.6.31' 'v2.6.30' '--' 'drivers/virtio/' > 'drivers/net/virtio_net.c' > # good: [e3353853730eb99c56b7b0aed1667d51c0e3699a] virtio: enhance > id_matching for virtio drivers > git bisect good e3353853730eb99c56b7b0aed1667d51c0e3699a > # good: [9cbc1cb8cd46ce1f7645b9de249b2ce8460129bb] Merge branch 'master' > of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 > git bisect good 9cbc1cb8cd46ce1f7645b9de249b2ce8460129bb > # bad: [ff52c3fc7188855ede75d87b022271f0da309e5b] virtio: fix memory > leak on device removal > git bisect bad ff52c3fc7188855ede75d87b022271f0da309e5b > # good: [31278e71471399beaff9280737e52b47db4dc345] net: group address > list and its count > git bisect good 31278e71471399beaff9280737e52b47db4dc345 > # bad: [4b892e6582e3a4fe01f623aea386907270d5bf83] virtio-pci: correctly > unregister root device on error > git bisect bad 4b892e6582e3a4fe01f623aea386907270d5bf83 > > Hopefully this gives you some hints. The problem > for me is that I don't know what commit I should > consider good or bad. Should I consider the > commit with the "soft lockup" as good because it > don't show the allocation failure? 
Currently it is > marked as bad (4b892e6582e3a4fe01f623aea386907270d5bf83). > What should I do next? > > Thanks! > Robert > > On 04/12/10 15:52, Michael S. Tsirkin wrote: > >> On Mon, Apr 12, 2010 at 03:50:31PM +0200, Robert Wimmer wrote: >> >> >>> Sorry but I need some more git help. Here is what I've done. >>> Started with a fresh clone of the kernel: >>> >>> cd /usr/src >>> git clone >>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux >>> cd linux >>> git checkout 0b4f2928f14c4a9770b0866923fc81beb7f4aa57 >>> >>> Since I already knew that this commit wasn't good I did >>> >>> git bisect start >>> git bisect bad >>> >>> >> I think what you miss is marking the good commit. >> bisect does a binary search but it needs to know >> both good and bad commits to search in the range. >> >> Optionally, you can use '-- drivers/virtio/ drivers/net/virtio_net.c' >> what this does is limit bisect to commits that touch files in >> question. This way you get much less tests to run >> (about 4) but after you find a first problematic commit >> you must verify that a commit just before it does not have the issue. >> >> If this turns out not to be the case, you'll have to >> fallback on full bisect, and we will now this is some >> other change in kernel that triggered the regression. >> >> >> >> >>> compiled and started over. As expected the problem returns. 
>>> So I've done another >>> >>> git bisect bad >>> >>> but I always get the same commit: >>> >>> kabul:/usr/src/linux # git bisect log >>> git bisect start >>> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix >>> compilation on SMP >>> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57 >>> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix >>> compilation on SMP >>> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57 >>> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix >>> compilation on SMP >>> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57 >>> >>> I've expected that after each "git bisect bad" I get the previous >>> commit before the "bad" one. How can get the previous commit? >>> The bisect documentation couldn't help me. >>> >>> Thanks! >>> Robert >>> >>> >>> >>> On 04/12/10 13:23, Michael S. Tsirkin wrote: >>> >>> >>>> On Mon, Apr 12, 2010 at 11:25:26AM +0200, Robert Wimmer wrote: >>>> >>>> >>>> >>>>> server10:/usr/src/linux # git bisect start v2.6.31 v2.6.30 -- >>>>> drivers/virtio/ drivers/net/virtio_net.c >>>>> Bisecting: 12 revisions left to test after this (roughly 4 steps) >>>>> [e3353853730eb99c56b7b0aed1667d51c0e3699a] virtio: enhance id_matching >>>>> for virtio drivers >>>>> >>>>> >>>>> >>>>> >>>> Sorry I wasn't clear. the way to use bisect is as follows: >>>> - first start as you did now. >>>> 1. now build kernel, install and test >>>> 2. if bug is there, type 'git bisect bad' >>>> 3. if bug is not there, type 'git bisect good' >>>> 4. The above will give you another kernel version to test >>>> if so go back to step 1 >>>> 6. this will be repeated about 4 times (number of steps above) >>>> 7. in the end you will get the first revision which has the >>>> problem. Let's assume it is revision ABCDEF. >>>> >>>> Type git bisect log to see your history. >>>> >>>> 8. Now git reset --hard ABCDEF~1 and try again. 
>>>> >>>> If you see the problem with ABCDEF but not ABCDEF~1 >>>> then we will have a good guess at the culprit. >>>> >>>> Some more tips here: >>>> http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html >>>> >>>> >>>> >>>> >>>> >>>>> Today I've upgraded to qemu-kvm-0.12.3-r1 (Gentoo package) >>>>> but doesn't help. Still getting "page allocation failure" with >>>>> 2.6.31-rc5. >>>>> >>>>> Does it makes sense to use the same 2.6.31-rc5 kernel >>>>> in the host and guest for testing? Currently I'm still using 2.6.32 >>>>> in host and testing 2.6.31-rc5 in guest until "crashes". >>>>> Then I start the guest with 2.6.30 again which works >>>>> without trouble with 2.6.32 as host. >>>>> >>>>> This is really strange. I have hosts with 2.6.32 running >>>>> guests with 2.6.32 which works perfectly. These hosts >>>>> and guests running on HP DL 380 G6 with Intel Xeon X5560. >>>>> The guests which don't work with 2.6.32 (and 2.6.32 >>>>> as host) running on HP DL 380 G5 with Intel Xeon L5420. >>>>> >>>>> >>>>> >>>> Hmm. Some subtle race? >>>> >>>> >>>> >>>> >>>>> (All guests) and (all hosts) have the same packages >>>>> and the same versions installed and the same kernel >>>>> configs (hosts and guests using different .config but the >>>>> difference is very small e.g. CONFIG_PARAVIRT_SPINLOCKS=y, >>>>> CONFIG_PARAVIRT_GUEST=y in guests but not in hosts >>>>> .config). >>>>> >>>>> I've had problems with qemu-kvm 0.12.2 with high network >>>>> traffic which was solved by a patch submitted by Tom >>>>> Lendacky: >>>>> >>>>> "Fix a race condition where qemu finds that there are not enough virtio >>>>> ring buffers available and the guest make more buffers available before >>>>> qemu can enable notifications." >>>>> http://www.mail-archive.com/kvm@vger.kernel.org/msg28667.html >>>>> >>>>> It was a real lifesaver for the HP DL 380 G6 mentioned >>>>> above but maybe this is now causing the problems with the G5 machines. >>>>> The symptoms are the same. 
I can still log into the guest >>>>> via VNC but the network is down. >>>>> >>>>> Thanks! >>>>> Robert >>>>> >>>>> >>>>> >>>>> >>>> For now the only thing we seem to know for sure is that on >>>> specific hardware there's a regression between 2.6.30 and >>>> 2.6.31-rc5. Yes, it is possible that all it does >>>> is expose a qemu bug, but it's hard to say. >>>> Let's find out what change >>>> does that, this should give us a hint. >>>> >>>> >>>> >>>> >>>>> On 04/11/10 13:03, Michael S. Tsirkin wrote: >>>>> >>>>> >>>>> >>>>>> On Fri, Apr 09, 2010 at 12:15:01PM +0200, Robert Wimmer wrote: >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> I'm not really a git hero so here is what I've done: >>>>>>> >>>>>>> cd /usr/src >>>>>>> git clone >>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux >>>>>>> cd linux >>>>>>> git checkout -b mykernel 0b4f2928f14c4a9770b0866923fc81beb7f4aa57 >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> Looks right. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> Then I've checked >>>>>>> >>>>>>> drivers/net/virtio_net.c >>>>>>> drivers/net/smc91x.c >>>>>>> >>>>>>> if the changes commited where not in there. >>>>>>> Next I build my kernel as usual. I used my .config >>>>>>> from 2.6.30 (which is working fine in a several >>>>>>> guests / .config see here: >>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=25925) >>>>>>> and build the kernel >>>>>>> >>>>>>> genkernel --menuconfig --lvm --oldconfig all >>>>>>> >>>>>>> which finally gave me a 2.6.31-rc5. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> That's right. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> I should mention >>>>>>> that 2.6.30 was using SLUB. So here is the output >>>>>>> from the 2.6.31-rc5 kernel running about 20 min.: >>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=25926 >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> Hmm, so we see the error here as well? >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> Seems not very usefull to me. I'm currently compiling >>>>>>> the same kernel with SLAB. 
>>>>>>> >>>>>>> Please let me know if the git commands above are >>>>>>> right and/or if you need other kernel options enabled. >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> Looks right. You don't have to add -b flag if you don't >>>>>> want to. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> Thanks! >>>>>>> Robert >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>> Hmm, I do not see anything else that seems related. >>>>>> Could you please try to bisect? >>>>>> >>>>>> git bisect start v2.6.31 v2.6.30 -- drivers/virtio/ drivers/net/virtio_net.c >>>>>> >>>>>> should help assuming the change that triggers this is in virtio. >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>> >>>>>>> On 04/08/10 22:04, Michael S. Tsirkin wrote: >>>>>>> >>>>>>> >>>>>>> >>>>>>> >>>>>>>> On Thu, Apr 08, 2010 at 10:39:34PM +0300, Avi Kivity wrote: >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>> >>>>>>>>> cc: mst >>>>>>>>> >>>>>>>>> On 04/08/2010 10:34 PM, Andrew Morton wrote: >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>> >>>>>>>>>> (switched to email. Please respond via emailed reply-to-all, not via the >>>>>>>>>> bugzilla web interface). 
>>>>>>>>>> >>>>>>>>>> On Wed, 7 Apr 2010 10:29:20 GMT >>>>>>>>>> bugzilla-daemon@bugzilla.kernel.org wrote: >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>> >>>>>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=15709 >>>>>>>>>>> >>>>>>>>>>> Summary: swapper page allocation failure >>>>>>>>>>> Product: Memory Management >>>>>>>>>>> Version: 2.5 >>>>>>>>>>> Kernel Version: 2.6.32 and 2.6.33 >>>>>>>>>>> Platform: All >>>>>>>>>>> OS/Version: Linux >>>>>>>>>>> Tree: Mainline >>>>>>>>>>> Status: NEW >>>>>>>>>>> Severity: normal >>>>>>>>>>> Priority: P1 >>>>>>>>>>> Component: Slab Allocator >>>>>>>>>>> AssignedTo: akpm@linux-foundation.org >>>>>>>>>>> ReportedBy: kernel@tauceti.net >>>>>>>>>>> Regression: No >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> Created an attachment (id=25903) >>>>>>>>>>> --> (https://bugzilla.kernel.org/attachment.cgi?id=25903) >>>>>>>>>>> dmesg output >>>>>>>>>>> >>>>>>>>>>> I'm having problems with "swapper page allocation failure's" since upgrading >>>>>>>>>>> from kernel 2.6.30 to 2.6.32/2.6.33. The problems occur inside a kernel virtual >>>>>>>>>>> maschine (KVM). Running Gentoo with kernel 2.6.32 as host which works fine. As >>>>>>>>>>> long as kernel 2.6.30 is used as guest kernel the guest runs fine. But after >>>>>>>>>>> upgrading to 2.6.32 and 2.6.33 I get "swapper page allocation failure's" (see >>>>>>>>>>> attachment of dmesg output). The guest is only running a Apache webserver and >>>>>>>>>>> serves files from a NFS share. It has 1 GB RAM and 2 virtual CPUs. I've tried >>>>>>>>>>> different kernel configurations (e.g. a unmodified version from Sabayon Linux >>>>>>>>>>> Distribution) but doesn't help. Load of the guest (and host) is very low. >>>>>>>>>>> Network traffic is about 20-50 MBit/s. >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>>> >>>>>>>>>> hm, this is a regression. >>>>>>>>>> >>>>>>>>>> : [ 454.006706] users: page allocation failure. 
order:0, mode:0x20 >>>>>>>>>> : [ 454.006712] Pid: 7992, comm: users Not tainted 2.6.34-rc3-git6 #2 >>>>>>>>>> : [ 454.006714] Call Trace: >>>>>>>>>> : [ 454.006717]<IRQ> [<ffffffff8109dff7>] __alloc_pages_nodemask+0x5c8/0x615 >>>>>>>>>> : [ 454.006796] [<ffffffff817860ce>] ? ip_local_deliver+0x65/0x6d >>>>>>>>>> : [ 454.006820] [<ffffffff810c39c4>] alloc_pages_current+0x96/0x9f >>>>>>>>>> : [ 454.006842] [<ffffffff8167f2c7>] try_fill_recv+0x5e/0x20f >>>>>>>>>> : [ 454.006846] [<ffffffff8167fe13>] virtnet_poll+0x52a/0x5c7 >>>>>>>>>> : [ 454.006858] [<ffffffff8104fe74>] ? run_timer_softirq+0x1dc/0x1f4 >>>>>>>>>> : [ 454.006873] [<ffffffff8176035d>] net_rx_action+0xad/0x1a5 >>>>>>>>>> : [ 454.006882] [<ffffffff8104b6cd>] __do_softirq+0x9c/0x127 >>>>>>>>>> : [ 454.006897] [<ffffffff81008ffc>] call_softirq+0x1c/0x30 >>>>>>>>>> : [ 454.006901] [<ffffffff8100af01>] do_softirq+0x41/0x7e >>>>>>>>>> : [ 454.006904] [<ffffffff8104b3e3>] irq_exit+0x36/0x75 >>>>>>>>>> : [ 454.006907] [<ffffffff8100a5ee>] do_IRQ+0xaa/0xc1 >>>>>>>>>> : [ 454.006926] [<ffffffff8183bc13>] ret_from_intr+0x0/0x11 >>>>>>>>>> : [ 454.006928]<EOI> [<ffffffff81026b25>] ? kvm_deferred_mmu_op+0x5e/0xe7 >>>>>>>>>> : [ 454.006942] [<ffffffff81026b19>] ? kvm_deferred_mmu_op+0x52/0xe7 >>>>>>>>>> : [ 454.006946] [<ffffffff81026c03>] kvm_mmu_write+0x2e/0x35 >>>>>>>>>> : [ 454.006949] [<ffffffff81026c7d>] kvm_set_pte_at+0x19/0x1b >>>>>>>>>> : [ 454.006953] [<ffffffff810aba67>] __do_fault+0x3c4/0x492 >>>>>>>>>> : [ 454.006957] [<ffffffff810adcf4>] handle_mm_fault+0x478/0x9d8 >>>>>>>>>> : [ 454.006966] [<ffffffff810deb59>] ? path_put+0x2c/0x30 >>>>>>>>>> : [ 454.006975] [<ffffffff8102f162>] do_page_fault+0x2f6/0x31a >>>>>>>>>> : [ 454.006979] [<ffffffff8183b81e>] ? 
_raw_spin_lock+0x9/0xd >>>>>>>>>> : [ 454.006982] [<ffffffff8183bef5>] page_fault+0x25/0x30 >>>>>>>>>> : [ 454.006985] Mem-Info: >>>>>>>>>> : [ 454.006987] Node 0 DMA per-cpu: >>>>>>>>>> : [ 454.006990] CPU 0: hi: 0, btch: 1 usd: 0 >>>>>>>>>> : [ 454.006992] CPU 1: hi: 0, btch: 1 usd: 0 >>>>>>>>>> : [ 454.006993] Node 0 DMA32 per-cpu: >>>>>>>>>> : [ 454.006996] CPU 0: hi: 186, btch: 31 usd: 185 >>>>>>>>>> : [ 454.006998] CPU 1: hi: 186, btch: 31 usd: 112 >>>>>>>>>> : [ 454.007003] active_anon:8308 inactive_anon:8544 isolated_anon:0 >>>>>>>>>> : [ 454.007005] active_file:4882 inactive_file:205902 isolated_file:0 >>>>>>>>>> : [ 454.007006] unevictable:0 dirty:11 writeback:0 unstable:0 >>>>>>>>>> : [ 454.007007] free:1385 slab_reclaimable:2445 slab_unreclaimable:4466 >>>>>>>>>> : [ 454.007008] mapped:1895 shmem:113 pagetables:1370 bounce:0 >>>>>>>>>> : [ 454.007010] Node 0 DMA free:4000kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:11844kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15768kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:64kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no >>>>>>>>>> : [ 454.007021] lowmem_reserve[]: 0 994 994 994 >>>>>>>>>> : [ 454.007025] Node 0 DMA32 free:1540kB min:4000kB low:5000kB high:6000kB active_anon:33232kB inactive_anon:34176kB active_file:19528kB inactive_file:811764kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1018068kB mlocked:0kB dirty:44kB writeback:0kB mapped:7580kB shmem:452kB slab_reclaimable:9716kB slab_unreclaimable:17832kB kernel_stack:1144kB pagetables:5480kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no >>>>>>>>>> : [ 454.007036] lowmem_reserve[]: 0 0 0 0 >>>>>>>>>> : [ 454.007040] Node 0 DMA: 0*4kB 4*8kB 6*16kB 5*32kB 6*64kB 4*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 4000kB >>>>>>>>>> : [ 454.007050] Node 0 DMA32: 13*4kB 2*8kB 3*16kB 1*32kB 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1556kB >>>>>>>>>> : [ 454.007059] 210914 total pagecache pages >>>>>>>>>> : [ 454.007061] 0 pages in swap cache >>>>>>>>>> : [ 454.007063] Swap cache stats: add 0, delete 0, find 0/0 >>>>>>>>>> : [ 454.007065] Free swap = 1959924kB >>>>>>>>>> : [ 454.007067] Total swap = 1959924kB >>>>>>>>>> : [ 454.014238] 262140 pages RAM >>>>>>>>>> : [ 454.014241] 7489 pages reserved >>>>>>>>>> : [ 454.014242] 21430 pages shared >>>>>>>>>> : [ 454.014244] 247174 pages non-shared >>>>>>>>>> >>>>>>>>>> Either page reclaim got worse or kvm/virtio-net got more aggressive. >>>>>>>>>> >>>>>>>>>> Avi, Rusty: can you think of any changes in the KVM/virtio area in the >>>>>>>>>> 2.6.30 -> 2.6.32 timeframe which may have increased the GFP_ATOMIC >>>>>>>>>> demands upon the page allocator? >>>>>>>>>> >>>>>>>>>> Thanks. >>>>>>>>>> >>>>>>>> On the contrary, with commit >>>>>>>> 3161e453e496eb5643faad30fff5a5ab183da0fe >>>>>>>> we should be using GFP_ATOMIC less. >>>>>>>> But maybe there's a bug and it has the reverse effect somehow ... >>>>>>>> >>>>>>>> Robert, could you pls try 3161e453e496eb5643faad30fff5a5ab183da0fe >>>>>>>> and if that *does* have the problem, >>>>>>>> 0b4f2928f14c4a9770b0866923fc81beb7f4aa57?
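The "12 revisions left to test after this (roughly 4 steps)" estimate in the quoted bisect output follows from binary search: each good/bad answer roughly halves the set of remaining candidates. Below is a minimal sketch of that idea over a plain ordered list. It is not how git itself is implemented (git bisects over a commit graph, not a list), and `first_bad`, the toy revision names and the choice of culprit are all assumptions for illustration.

```python
from typing import Callable, Sequence, Tuple


def first_bad(revs: Sequence[str],
              is_bad: Callable[[str], bool]) -> Tuple[str, int]:
    """Return (first bad revision, number of tests run).

    Assumes revs is ordered oldest to newest, that the last entry is
    known to be bad, and that every revision after the first bad one
    is also bad.
    """
    lo, hi = 0, len(revs) - 1  # hi always points at a known-bad revision
    tests = 0
    while lo < hi:
        mid = (lo + hi) // 2
        tests += 1             # one build/boot/test cycle per probe
        if is_bad(revs[mid]):
            hi = mid           # culprit is at mid or earlier
        else:
            lo = mid + 1       # culprit is after mid
    return revs[hi], tests


# Thirteen hypothetical candidate revisions, r1 (oldest) to r13 (newest).
revs = [f"r{i}" for i in range(1, 14)]

# Pretend the regression was introduced in r9.
culprit, tests = first_bad(revs, lambda rev: int(rev[1:]) >= 9)
print(culprit, tests)  # r9 4
```

The search only works if "bad" means the same thing at every step. A revision that fails in a different way, such as the soft-lockup kernel mentioned above, has to be counted consistently as good or bad; git also provides `git bisect skip` for revisions that cannot be judged either way.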
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-19 12:55 ` Robert Wimmer
@ 2010-04-19 13:17 ` Michael S. Tsirkin
  2010-04-21 11:23 ` kernel
  0 siblings, 1 reply; 62+ messages in thread
From: Michael S. Tsirkin @ 2010-04-19 13:17 UTC (permalink / raw)
  To: Robert Wimmer
  Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, Rusty Russell, Mel Gorman

So it seems the change that created the problem was not specific to virtio.
To track this down further, I think the thing to try would be a full
bisect. That is, instead of

git bisect start 'v2.6.31' 'v2.6.30' '--' 'drivers/virtio/' 'drivers/net/virtio_net.c'

do

git bisect start 'v2.6.31' 'v2.6.30'

and then test kernel versions as they are generated.

On Mon, Apr 19, 2010 at 02:55:21PM +0200, Robert Wimmer wrote:
> Is there a way to track this down further?
> For a few weeks I've had problems on two other KVM guests
> which I think are related to this. The hosts for
> these guests run kernel 2.6.32. Until today the guests were
> also running 2.6.32. Inside the guests we're using GlusterFS,
> NFSv4 and Apache with PHP. From time to time the
> httpd processes "hang". When this happens
> we see a lot of soft lockups. These
> hosts are running Xeon X5560 processors. Until
> today I suspected that this problem only happens
> on older Xeons, but this doesn't seem to be true.
> I've attached the output from /var/log/messages
> (https://bugzilla.kernel.org/attachment.cgi?id=26048)
> from one of the hosts with GlusterFS. I've now
> downgraded to kernel 2.6.30 in the guests. But since
> this problem also exists in 2.6.34-rc3, I suspect that
> we will never be able to do a kernel update
> in the guests as long as they're using NFS :-(
>
> What I can definitely say is that all the problems
> only happen with guests running kernel >= 2.6.31
> and with a remote file system (NFS, GlusterFS).
Some > days ago another KVM have had a network shutdown using > kernel 2.6.32 in host and guest + NFSv4. But this only > happend once until now and there isn't so much > traffic running through the interfaces of that host. > > All other guests with kernel 2.6.30 (about 80 guests on > 18 hosts) with NFS and KVM 0.12.3 are really running > perfectly. > > Thanks! > Robert > > > > On 04/13/10 10:51, Robert Wimmer wrote: > > I've tried to do my very best. In general I can > > say: All 2.6.30 versions work, all 2.6.31 fail. 2.6.31-rc3 > > fails with "soft lockup" and is the only one which > > don't show any "swapper page allocation failure". > > But the result is finally the same... 2.6.31-rc4 > > don't show "soft lockups" but "swapper page allocation failure". > > Here is the dmesg output for 2.6.31-rc3: > > https://bugzilla.kernel.org/attachment.cgi?id=25986 > > > > So here is what I've done. Started with a fresh tree > > and my 2.6.30 .config: > > > > rm -fr /usr/src/linux > > cd /usr/src > > git clone > > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux > > cd linux > > git checkout 0b4f2928f14c4a9770b0866923fc81beb7f4aa57 > > > > Here is the "git bisect log" output: > > > > # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31 > > # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30 > > git bisect start 'v2.6.31' 'v2.6.30' '--' 'drivers/virtio/' > > 'drivers/net/virtio_net.c' > > # good: [e3353853730eb99c56b7b0aed1667d51c0e3699a] virtio: enhance > > id_matching for virtio drivers > > git bisect good e3353853730eb99c56b7b0aed1667d51c0e3699a > > # good: [9cbc1cb8cd46ce1f7645b9de249b2ce8460129bb] Merge branch 'master' > > of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 > > git bisect good 9cbc1cb8cd46ce1f7645b9de249b2ce8460129bb > > # bad: [ff52c3fc7188855ede75d87b022271f0da309e5b] virtio: fix memory > > leak on device removal > > git bisect bad ff52c3fc7188855ede75d87b022271f0da309e5b > > # good: 
[31278e71471399beaff9280737e52b47db4dc345] net: group address > > list and its count > > git bisect good 31278e71471399beaff9280737e52b47db4dc345 > > # bad: [4b892e6582e3a4fe01f623aea386907270d5bf83] virtio-pci: correctly > > unregister root device on error > > git bisect bad 4b892e6582e3a4fe01f623aea386907270d5bf83 > > > > Hopefully this gives you some hints. The problem > > for me is that I don't know what commit I should > > consider good or bad. Should I consider the > > commit with the "soft lockup" as good because it > > don't show the allocation failure? Currently it is > > marked as bad (4b892e6582e3a4fe01f623aea386907270d5bf83). > > What should I do next? > > > > Thanks! > > Robert > > > > On 04/12/10 15:52, Michael S. Tsirkin wrote: > > > >> On Mon, Apr 12, 2010 at 03:50:31PM +0200, Robert Wimmer wrote: > >> > >> > >>> Sorry but I need some more git help. Here is what I've done. > >>> Started with a fresh clone of the kernel: > >>> > >>> cd /usr/src > >>> git clone > >>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux > >>> cd linux > >>> git checkout 0b4f2928f14c4a9770b0866923fc81beb7f4aa57 > >>> > >>> Since I already knew that this commit wasn't good I did > >>> > >>> git bisect start > >>> git bisect bad > >>> > >>> > >> I think what you miss is marking the good commit. > >> bisect does a binary search but it needs to know > >> both good and bad commits to search in the range. > >> > >> Optionally, you can use '-- drivers/virtio/ drivers/net/virtio_net.c' > >> what this does is limit bisect to commits that touch files in > >> question. This way you get much less tests to run > >> (about 4) but after you find a first problematic commit > >> you must verify that a commit just before it does not have the issue. > >> > >> If this turns out not to be the case, you'll have to > >> fallback on full bisect, and we will now this is some > >> other change in kernel that triggered the regression. 
> >> > >> > >> > >> > >>> compiled and started over. As expected the problem returns. > >>> So I've done another > >>> > >>> git bisect bad > >>> > >>> but I always get the same commit: > >>> > >>> kabul:/usr/src/linux # git bisect log > >>> git bisect start > >>> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix > >>> compilation on SMP > >>> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57 > >>> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix > >>> compilation on SMP > >>> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57 > >>> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix > >>> compilation on SMP > >>> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57 > >>> > >>> I've expected that after each "git bisect bad" I get the previous > >>> commit before the "bad" one. How can get the previous commit? > >>> The bisect documentation couldn't help me. > >>> > >>> Thanks! > >>> Robert > >>> > >>> > >>> > >>> On 04/12/10 13:23, Michael S. Tsirkin wrote: > >>> > >>> > >>>> On Mon, Apr 12, 2010 at 11:25:26AM +0200, Robert Wimmer wrote: > >>>> > >>>> > >>>> > >>>>> server10:/usr/src/linux # git bisect start v2.6.31 v2.6.30 -- > >>>>> drivers/virtio/ drivers/net/virtio_net.c > >>>>> Bisecting: 12 revisions left to test after this (roughly 4 steps) > >>>>> [e3353853730eb99c56b7b0aed1667d51c0e3699a] virtio: enhance id_matching > >>>>> for virtio drivers > >>>>> > >>>>> > >>>>> > >>>>> > >>>> Sorry I wasn't clear. the way to use bisect is as follows: > >>>> - first start as you did now. > >>>> 1. now build kernel, install and test > >>>> 2. if bug is there, type 'git bisect bad' > >>>> 3. if bug is not there, type 'git bisect good' > >>>> 4. The above will give you another kernel version to test > >>>> if so go back to step 1 > >>>> 6. this will be repeated about 4 times (number of steps above) > >>>> 7. in the end you will get the first revision which has the > >>>> problem. Let's assume it is revision ABCDEF. 
> >>>> > >>>> Type git bisect log to see your history. > >>>> > >>>> 8. Now git reset --hard ABCDEF~1 and try again. > >>>> > >>>> If you see the problem with ABCDEF but not ABCDEF~1 > >>>> then we will have a good guess at the culprit. > >>>> > >>>> Some more tips here: > >>>> http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html > >>>> > >>>> > >>>> > >>>> > >>>> > >>>>> Today I've upgraded to qemu-kvm-0.12.3-r1 (Gentoo package) > >>>>> but doesn't help. Still getting "page allocation failure" with > >>>>> 2.6.31-rc5. > >>>>> > >>>>> Does it makes sense to use the same 2.6.31-rc5 kernel > >>>>> in the host and guest for testing? Currently I'm still using 2.6.32 > >>>>> in host and testing 2.6.31-rc5 in guest until "crashes". > >>>>> Then I start the guest with 2.6.30 again which works > >>>>> without trouble with 2.6.32 as host. > >>>>> > >>>>> This is really strange. I have hosts with 2.6.32 running > >>>>> guests with 2.6.32 which works perfectly. These hosts > >>>>> and guests running on HP DL 380 G6 with Intel Xeon X5560. > >>>>> The guests which don't work with 2.6.32 (and 2.6.32 > >>>>> as host) running on HP DL 380 G5 with Intel Xeon L5420. > >>>>> > >>>>> > >>>>> > >>>> Hmm. Some subtle race? > >>>> > >>>> > >>>> > >>>> > >>>>> (All guests) and (all hosts) have the same packages > >>>>> and the same versions installed and the same kernel > >>>>> configs (hosts and guests using different .config but the > >>>>> difference is very small e.g. CONFIG_PARAVIRT_SPINLOCKS=y, > >>>>> CONFIG_PARAVIRT_GUEST=y in guests but not in hosts > >>>>> .config). > >>>>> > >>>>> I've had problems with qemu-kvm 0.12.2 with high network > >>>>> traffic which was solved by a patch submitted by Tom > >>>>> Lendacky: > >>>>> > >>>>> "Fix a race condition where qemu finds that there are not enough virtio > >>>>> ring buffers available and the guest make more buffers available before > >>>>> qemu can enable notifications." 
> >>>>> http://www.mail-archive.com/kvm@vger.kernel.org/msg28667.html > >>>>> > >>>>> It was a real lifesaver for the HP DL 380 G6 mentioned > >>>>> above but maybe this is now causing the problems with the G5 machines. > >>>>> The symptoms are the same. I can still log into the guest > >>>>> via VNC but the network is down. > >>>>> > >>>>> Thanks! > >>>>> Robert > >>>>> > >>>>> > >>>>> > >>>>> > >>>> For now the only thing we seem to know for sure is that on > >>>> specific hardware there's a regression between 2.6.30 and > >>>> 2.6.31-rc5. Yes, it is possible that all it does > >>>> is expose a qemu bug, but it's hard to say. > >>>> Let's find out what change > >>>> does that, this should give us a hint. > >>>> > >>>> > >>>> > >>>> > >>>>> On 04/11/10 13:03, Michael S. Tsirkin wrote: > >>>>> > >>>>> > >>>>> > >>>>>> On Fri, Apr 09, 2010 at 12:15:01PM +0200, Robert Wimmer wrote: > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>> I'm not really a git hero so here is what I've done: > >>>>>>> > >>>>>>> cd /usr/src > >>>>>>> git clone > >>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux > >>>>>>> cd linux > >>>>>>> git checkout -b mykernel 0b4f2928f14c4a9770b0866923fc81beb7f4aa57 > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>> Looks right. > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>> Then I've checked > >>>>>>> > >>>>>>> drivers/net/virtio_net.c > >>>>>>> drivers/net/smc91x.c > >>>>>>> > >>>>>>> if the changes commited where not in there. > >>>>>>> Next I build my kernel as usual. I used my .config > >>>>>>> from 2.6.30 (which is working fine in a several > >>>>>>> guests / .config see here: > >>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=25925) > >>>>>>> and build the kernel > >>>>>>> > >>>>>>> genkernel --menuconfig --lvm --oldconfig all > >>>>>>> > >>>>>>> which finally gave me a 2.6.31-rc5. > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>> That's right. 
> >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>> I should mention > >>>>>>> that 2.6.30 was using SLUB. So here is the output > >>>>>>> from the 2.6.31-rc5 kernel running about 20 min.: > >>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=25926 > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>> Hmm, so we see the error here as well? > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>> Seems not very usefull to me. I'm currently compiling > >>>>>>> the same kernel with SLAB. > >>>>>>> > >>>>>>> Please let me know if the git commands above are > >>>>>>> right and/or if you need other kernel options enabled. > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>> Looks right. You don't have to add -b flag if you don't > >>>>>> want to. > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>> Thanks! > >>>>>>> Robert > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>> Hmm, I do not see anything else that seems related. > >>>>>> Could you please try to bisect? > >>>>>> > >>>>>> git bisect start v2.6.31 v2.6.30 -- drivers/virtio/ drivers/net/virtio_net.c > >>>>>> > >>>>>> should help assuming the change that triggers this is in virtio. > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>> > >>>>>>> On 04/08/10 22:04, Michael S. Tsirkin wrote: > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>> > >>>>>>>> On Thu, Apr 08, 2010 at 10:39:34PM +0300, Avi Kivity wrote: > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>>> cc: mst > >>>>>>>>> > >>>>>>>>> On 04/08/2010 10:34 PM, Andrew Morton wrote: > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>> > >>>>>>>>>> (switched to email. Please respond via emailed reply-to-all, not via the > >>>>>>>>>> bugzilla web interface). 
> >>>>>>>>>> > >>>>>>>>>> On Wed, 7 Apr 2010 10:29:20 GMT > >>>>>>>>>> bugzilla-daemon@bugzilla.kernel.org wrote: > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=15709 > >>>>>>>>>>> > >>>>>>>>>>> Summary: swapper page allocation failure > >>>>>>>>>>> Product: Memory Management > >>>>>>>>>>> Version: 2.5 > >>>>>>>>>>> Kernel Version: 2.6.32 and 2.6.33 > >>>>>>>>>>> Platform: All > >>>>>>>>>>> OS/Version: Linux > >>>>>>>>>>> Tree: Mainline > >>>>>>>>>>> Status: NEW > >>>>>>>>>>> Severity: normal > >>>>>>>>>>> Priority: P1 > >>>>>>>>>>> Component: Slab Allocator > >>>>>>>>>>> AssignedTo: akpm@linux-foundation.org > >>>>>>>>>>> ReportedBy: kernel@tauceti.net > >>>>>>>>>>> Regression: No > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> Created an attachment (id=25903) > >>>>>>>>>>> --> (https://bugzilla.kernel.org/attachment.cgi?id=25903) > >>>>>>>>>>> dmesg output > >>>>>>>>>>> > >>>>>>>>>>> I'm having problems with "swapper page allocation failure's" since upgrading > >>>>>>>>>>> from kernel 2.6.30 to 2.6.32/2.6.33. The problems occur inside a kernel virtual > >>>>>>>>>>> maschine (KVM). Running Gentoo with kernel 2.6.32 as host which works fine. As > >>>>>>>>>>> long as kernel 2.6.30 is used as guest kernel the guest runs fine. But after > >>>>>>>>>>> upgrading to 2.6.32 and 2.6.33 I get "swapper page allocation failure's" (see > >>>>>>>>>>> attachment of dmesg output). The guest is only running a Apache webserver and > >>>>>>>>>>> serves files from a NFS share. It has 1 GB RAM and 2 virtual CPUs. I've tried > >>>>>>>>>>> different kernel configurations (e.g. a unmodified version from Sabayon Linux > >>>>>>>>>>> Distribution) but doesn't help. Load of the guest (and host) is very low. > >>>>>>>>>>> Network traffic is about 20-50 MBit/s. > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>>> > >>>>>>>>>> hm, this is a regression. 
> >>>>>>>>>> > >>>>>>>>>> : [ 454.006706] users: page allocation failure. order:0, mode:0x20 > >>>>>>>>>> : [ 454.006712] Pid: 7992, comm: users Not tainted 2.6.34-rc3-git6 #2 > >>>>>>>>>> : [ 454.006714] Call Trace: > >>>>>>>>>> : [ 454.006717]<IRQ> [<ffffffff8109dff7>] __alloc_pages_nodemask+0x5c8/0x615 > >>>>>>>>>> : [ 454.006796] [<ffffffff817860ce>] ? ip_local_deliver+0x65/0x6d > >>>>>>>>>> : [ 454.006820] [<ffffffff810c39c4>] alloc_pages_current+0x96/0x9f > >>>>>>>>>> : [ 454.006842] [<ffffffff8167f2c7>] try_fill_recv+0x5e/0x20f > >>>>>>>>>> : [ 454.006846] [<ffffffff8167fe13>] virtnet_poll+0x52a/0x5c7 > >>>>>>>>>> : [ 454.006858] [<ffffffff8104fe74>] ? run_timer_softirq+0x1dc/0x1f4 > >>>>>>>>>> : [ 454.006873] [<ffffffff8176035d>] net_rx_action+0xad/0x1a5 > >>>>>>>>>> : [ 454.006882] [<ffffffff8104b6cd>] __do_softirq+0x9c/0x127 > >>>>>>>>>> : [ 454.006897] [<ffffffff81008ffc>] call_softirq+0x1c/0x30 > >>>>>>>>>> : [ 454.006901] [<ffffffff8100af01>] do_softirq+0x41/0x7e > >>>>>>>>>> : [ 454.006904] [<ffffffff8104b3e3>] irq_exit+0x36/0x75 > >>>>>>>>>> : [ 454.006907] [<ffffffff8100a5ee>] do_IRQ+0xaa/0xc1 > >>>>>>>>>> : [ 454.006926] [<ffffffff8183bc13>] ret_from_intr+0x0/0x11 > >>>>>>>>>> : [ 454.006928]<EOI> [<ffffffff81026b25>] ? kvm_deferred_mmu_op+0x5e/0xe7 > >>>>>>>>>> : [ 454.006942] [<ffffffff81026b19>] ? kvm_deferred_mmu_op+0x52/0xe7 > >>>>>>>>>> : [ 454.006946] [<ffffffff81026c03>] kvm_mmu_write+0x2e/0x35 > >>>>>>>>>> : [ 454.006949] [<ffffffff81026c7d>] kvm_set_pte_at+0x19/0x1b > >>>>>>>>>> : [ 454.006953] [<ffffffff810aba67>] __do_fault+0x3c4/0x492 > >>>>>>>>>> : [ 454.006957] [<ffffffff810adcf4>] handle_mm_fault+0x478/0x9d8 > >>>>>>>>>> : [ 454.006966] [<ffffffff810deb59>] ? path_put+0x2c/0x30 > >>>>>>>>>> : [ 454.006975] [<ffffffff8102f162>] do_page_fault+0x2f6/0x31a > >>>>>>>>>> : [ 454.006979] [<ffffffff8183b81e>] ? 
_raw_spin_lock+0x9/0xd > >>>>>>>>>> : [ 454.006982] [<ffffffff8183bef5>] page_fault+0x25/0x30 > >>>>>>>>>> : [ 454.006985] Mem-Info: > >>>>>>>>>> : [ 454.006987] Node 0 DMA per-cpu: > >>>>>>>>>> : [ 454.006990] CPU 0: hi: 0, btch: 1 usd: 0 > >>>>>>>>>> : [ 454.006992] CPU 1: hi: 0, btch: 1 usd: 0 > >>>>>>>>>> : [ 454.006993] Node 0 DMA32 per-cpu: > >>>>>>>>>> : [ 454.006996] CPU 0: hi: 186, btch: 31 usd: 185 > >>>>>>>>>> : [ 454.006998] CPU 1: hi: 186, btch: 31 usd: 112 > >>>>>>>>>> : [ 454.007003] active_anon:8308 inactive_anon:8544 isolated_anon:0 > >>>>>>>>>> : [ 454.007005] active_file:4882 inactive_file:205902 isolated_file:0 > >>>>>>>>>> : [ 454.007006] unevictable:0 dirty:11 writeback:0 unstable:0 > >>>>>>>>>> : [ 454.007007] free:1385 slab_reclaimable:2445 slab_unreclaimable:4466 > >>>>>>>>>> : [ 454.007008] mapped:1895 shmem:113 pagetables:1370 bounce:0 > >>>>>>>>>> : [ 454.007010] Node 0 DMA free:4000kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:11844kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15768kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:64kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no > >>>>>>>>>> : [ 454.007021] lowmem_reserve[]: 0 994 994 994 > >>>>>>>>>> : [ 454.007025] Node 0 DMA32 free:1540kB min:4000kB low:5000kB high:6000kB active_anon:33232kB inactive_anon:34176kB active_file:19528kB inactive_file:811764kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1018068kB mlocked:0kB dirty:44kB writeback:0kB mapped:7580kB shmem:452kB slab_reclaimable:9716kB slab_unreclaimable:17832kB kernel_stack:1144kB pagetables:5480kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? 
no > >>>>>>>>>> : [ 454.007036] lowmem_reserve[]: 0 0 0 0 > >>>>>>>>>> : [ 454.007040] Node 0 DMA: 0*4kB 4*8kB 6*16kB 5*32kB 6*64kB 4*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 4000kB > >>>>>>>>>> : [ 454.007050] Node 0 DMA32: 13*4kB 2*8kB 3*16kB 1*32kB 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1556kB > >>>>>>>>>> : [ 454.007059] 210914 total pagecache pages > >>>>>>>>>> : [ 454.007061] 0 pages in swap cache > >>>>>>>>>> : [ 454.007063] Swap cache stats: add 0, delete 0, find 0/0 > >>>>>>>>>> : [ 454.007065] Free swap = 1959924kB > >>>>>>>>>> : [ 454.007067] Total swap = 1959924kB > >>>>>>>>>> : [ 454.014238] 262140 pages RAM > >>>>>>>>>> : [ 454.014241] 7489 pages reserved > >>>>>>>>>> : [ 454.014242] 21430 pages shared > >>>>>>>>>> : [ 454.014244] 247174 pages non-shared > >>>>>>>>>> > >>>>>>>>>> Either page reclaim got worse or kvm/virtio-net got more aggressive. > >>>>>>>>>> > >>>>>>>>>> Avi, Rusty: can you think of any changes in the KVM/virtio area in the > >>>>>>>>>> 2.6.30 -> 2.6.32 timeframe which may have increased the GFP_ATOMIC > >>>>>>>>>> demands upon the page allocator? > >>>>>>>>>> > >>>>>>>>>> Thanks. > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>>>> > >>>>>>>> On the contrary, with commit > >>>>>>>> 3161e453e496eb5643faad30fff5a5ab183da0fe > >>>>>>>> we should be using GFP_ATOMIC less. > >>>>>>>> But maybe there's a bug and it has the reverse effect somehow ... > >>>>>>>> > >>>>>>>> Robert, could you pls try 3161e453e496eb5643faad30fff5a5ab183da0fe > >>>>>>>> and if that *does* have the problem, > >>>>>>>> 0b4f2928f14c4a9770b0866923fc81beb7f4aa57? > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > >>>>>>>> > > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 62+ messages in thread
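Editorial note: the bisect procedure quoted in the message above (start with a bad and a good revision, test, mark, repeat, then confirm that the parent of the culprit is good) can be sketched on a throwaway repository. This is only an illustration under stated assumptions: the repository, the "status" file and the "good-base" tag are hypothetical stand-ins for the kernel tree, and "git bisect run" replaces the manual "git bisect good" / "git bisect bad" marking done by hand in the thread.

```shell
#!/bin/sh
# Toy demonstration of the bisect workflow. The repository, the file
# names and the "bug" are hypothetical stand-ins, not the kernel tree.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email "bisect-demo@example.invalid"
git config user.name "Bisect Demo"

# Ten commits; commit 6 introduces the "regression" (status flips to bad).
i=1
while [ "$i" -le 10 ]; do
    if [ "$i" -ge 6 ]; then
        echo bad > status
    else
        echo good > status
    fi
    echo "$i" > counter
    git add status counter
    git commit -q -m "commit $i"
    if [ "$i" -eq 1 ]; then
        git tag good-base
    fi
    i=$((i + 1))
done

# Same argument order as "git bisect start v2.6.31 v2.6.30":
# the bad revision first, then the good one.
git bisect start HEAD good-base >/dev/null

# "git bisect run" marks each step from the exit code of the test
# command: 0 means good, 1 means bad. Here the test is a simple grep;
# in the thread it is "build kernel, install and test".
git bisect run sh -c 'grep -q good status' >/dev/null

# Once bisect has finished, refs/bisect/bad names the first bad commit.
first_bad=$(git rev-parse refs/bisect/bad)
first_bad_subject=$(git log -1 --format=%s "$first_bad")
echo "first bad commit: $first_bad_subject"

# Step 8 from the thread: go to the parent of the culprit and check
# that the problem is absent there.
git bisect reset >/dev/null 2>&1
git reset -q --hard "${first_bad}~1"
parent_status=$(cat status)
echo "parent of first bad commit is: $parent_status"
```

With a path-limited bisect (the "-- drivers/virtio/ drivers/net/virtio_net.c" form used earlier in the thread) the final parent check matters more, because commits outside the listed paths are skipped and the reported culprit may only be the nearest commit that touches those paths.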
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-19 13:17 ` Michael S. Tsirkin
@ 2010-04-21 11:23 ` kernel
  2010-04-21  9:42 ` Michael S. Tsirkin
  0 siblings, 1 reply; 62+ messages in thread
From: kernel @ 2010-04-21 11:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, Rusty Russell, Mel Gorman

So after the compiler was running hot I've now the following result:

server10:/usr/src/linux # git bisect log
# bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
# good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
git bisect start 'v2.6.31' 'v2.6.30'
# good: [925d74ae717c9a12d3618eb4b36b9fb632e2cef3] V4L/DVB (11736): videobuf: modify return value of VIDIOC_REQBUFS ioctl
git bisect good 925d74ae717c9a12d3618eb4b36b9fb632e2cef3
# bad: [a380137900fca5c79e6daa9500bdb6ea5649188e] ixgbe: Fix device capabilities of 82599 single speed fiber NICs.
git bisect bad a380137900fca5c79e6daa9500bdb6ea5649188e
# good: [1dbb5765acc7a6fe4bc1957c001037cc9d02ae03] Staging: android: lowmemorykiller: fix up remaining checkpatch warnings
git bisect good 1dbb5765acc7a6fe4bc1957c001037cc9d02ae03
# good: [df36b439c5fedefe013d4449cb6a50d15e2f4d70] Merge branch 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
git bisect good df36b439c5fedefe013d4449cb6a50d15e2f4d70
# bad: [a800faec1b21d7133b5f0c8c6dac593b7c4e118d] Merge branch 'for-linus' of git://www.jni.nu/cris
git bisect bad a800faec1b21d7133b5f0c8c6dac593b7c4e118d
# good: [ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2] Merge git://git.infradead.org/mtd-2.6
git bisect good ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2
# bad: [37c6dbe290c05023b47f52528e30ce51336b93eb] V4L/DVB (12091): gspca_sonixj: Add light frequency control
git bisect bad 37c6dbe290c05023b47f52528e30ce51336b93eb
# bad: [687d680985b1438360a9ba470ece8b57cd205c3b] Merge git://git.infradead.org/~dwmw2/iommu-2.6.31
git bisect bad 687d680985b1438360a9ba470ece8b57cd205c3b
# bad: [1053414068bad659479e6efa62a67403b8b1ec0a] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
git bisect bad 1053414068bad659479e6efa62a67403b8b1ec0a
# good: [b01b4babbf204443b5a846a7494546501614cefc] firewire: net: fix card driver reloading
git bisect good b01b4babbf204443b5a846a7494546501614cefc
# bad: [c02d7adf8c5429727a98bad1d039bccad4c61c50] NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private namespace
git bisect bad c02d7adf8c5429727a98bad1d039bccad4c61c50
# good: [616511d039af402670de8500d0e24495113a9cab] VFS: Uninline the function put_mnt_ns()
git bisect good 616511d039af402670de8500d0e24495113a9cab
# good: [cf8d2c11cb77f129675478792122f50827e5b0ae] VFS: Add VFS helper functions for setting up private namespaces
git bisect good cf8d2c11cb77f129675478792122f50827e5b0ae

The last "git bisect good" prints out:

server10:/usr/src/linux # git bisect good
c02d7adf8c5429727a98bad1d039bccad4c61c50 is the first bad commit
commit c02d7adf8c5429727a98bad1d039bccad4c61c50
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date: Mon Jun 22 15:09:14 2009 -0400

NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private namespace

As noted in the previous patch, the NFSv4 client mount code currently
has several limitations. If the mount path contains symlinks, or
referrals, or even if it just contains a '..', then the client code in
nfs4_path_walk() will fail with an error.

This patch replaces the nfs4_path_walk()-based lookup with a helper
function that sets up a private namespace to represent the namespace on
the server, then uses the ordinary VFS and NFS path lookup code to walk
down the mount path in that namespace.

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

:040000 040000 97a18818f26ab9a0987f157257eb6f399c3cc1cc 9ab6c712bb64f1349b5ac9f2020191abb5780ca0 M fs

Does this help you any further?

Thanks!
Robert On Mon, 19 Apr 2010 16:17:18 +0300, "Michael S. Tsirkin" <mst@redhat.com> wrote: > So it seems the change that created the problem was not > specific to virtio. > > To track this further down, I think the thing to try > would be to do a full bisect. > > That is instead of git bisect start 'v2.6.31' 'v2.6.30' '--' > 'drivers/virtio/' > 'drivers/net/virtio_net.c' > > do > > git bisect start 'v2.6.31' 'v2.6.30' > > and then test kernel versions as they are generated. > > > On Mon, Apr 19, 2010 at 02:55:21PM +0200, Robert Wimmer wrote: >> Is there a possibility to track this further down? >> I've problems on two other KVMs since a few weeks >> which I think that they're related to this. Host for >> this KVMs are kernel 2.6.32. Guests until today were >> also running 2.6.32. Inside the KVMs we're using GlusterFS, >> NFSv4 and Apache with PHP. From time to time the >> httpd-processes are "hanging". When this happens >> then we're seeing a lot of soft lockups. This >> hosts are running Xeon X5560 processors. Until >> today I suspected that this problems only happens >> on older Xeon's but this doesn't seems to be true. >> I've attached the output from /var/log/messages >> (https://bugzilla.kernel.org/attachment.cgi?id=26048) >> from one of the hosts with GlusterFS. I've now >> downgraded to kernel 2.6.30 in the guests. But since >> this problem also exists in 2.6.34-rc3 I suspect that >> we're never ever will be able to do a kernel update >> in the guests when they're using NFS :-( >> >> But what I definitely can say is that all the problems >> only happens with guests running kernel >= 2.6.31 >> and with a remote file system (NFS, GlusterFS). Some >> days ago another KVM have had a network shutdown using >> kernel 2.6.32 in host and guest + NFSv4. But this only >> happend once until now and there isn't so much >> traffic running through the interfaces of that host. 
>> >> All other guests with kernel 2.6.30 (about 80 guests on >> 18 hosts) with NFS and KVM 0.12.3 are really running >> perfectly. >> >> Thanks! >> Robert >> >> >> >> On 04/13/10 10:51, Robert Wimmer wrote: >> > I've tried to do my very best. In general I can >> > say: All 2.6.30 versions work, all 2.6.31 fail. 2.6.31-rc3 >> > fails with "soft lockup" and is the only one which >> > don't show any "swapper page allocation failure". >> > But the result is finally the same... 2.6.31-rc4 >> > don't show "soft lockups" but "swapper page allocation failure". >> > Here is the dmesg output for 2.6.31-rc3: >> > https://bugzilla.kernel.org/attachment.cgi?id=25986 >> > >> > So here is what I've done. Started with a fresh tree >> > and my 2.6.30 .config: >> > >> > rm -fr /usr/src/linux >> > cd /usr/src >> > git clone >> > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git >> > linux >> > cd linux >> > git checkout 0b4f2928f14c4a9770b0866923fc81beb7f4aa57 >> > >> > Here is the "git bisect log" output: >> > >> > # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31 >> > # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30 >> > git bisect start 'v2.6.31' 'v2.6.30' '--' 'drivers/virtio/' >> > 'drivers/net/virtio_net.c' >> > # good: [e3353853730eb99c56b7b0aed1667d51c0e3699a] virtio: enhance >> > id_matching for virtio drivers >> > git bisect good e3353853730eb99c56b7b0aed1667d51c0e3699a >> > # good: [9cbc1cb8cd46ce1f7645b9de249b2ce8460129bb] Merge branch >> > 'master' >> > of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6 >> > git bisect good 9cbc1cb8cd46ce1f7645b9de249b2ce8460129bb >> > # bad: [ff52c3fc7188855ede75d87b022271f0da309e5b] virtio: fix memory >> > leak on device removal >> > git bisect bad ff52c3fc7188855ede75d87b022271f0da309e5b >> > # good: [31278e71471399beaff9280737e52b47db4dc345] net: group address >> > list and its count >> > git bisect good 31278e71471399beaff9280737e52b47db4dc345 >> > # bad: 
[4b892e6582e3a4fe01f623aea386907270d5bf83] virtio-pci: correctly >> > unregister root device on error >> > git bisect bad 4b892e6582e3a4fe01f623aea386907270d5bf83 >> > >> > Hopefully this gives you some hints. The problem >> > for me is that I don't know what commit I should >> > consider good or bad. Should I consider the >> > commit with the "soft lockup" as good because it >> > don't show the allocation failure? Currently it is >> > marked as bad (4b892e6582e3a4fe01f623aea386907270d5bf83). >> > What should I do next? >> > >> > Thanks! >> > Robert >> > >> > On 04/12/10 15:52, Michael S. Tsirkin wrote: >> > >> >> On Mon, Apr 12, 2010 at 03:50:31PM +0200, Robert Wimmer wrote: >> >> >> >> >> >>> Sorry but I need some more git help. Here is what I've done. >> >>> Started with a fresh clone of the kernel: >> >>> >> >>> cd /usr/src >> >>> git clone >> >>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git >> >>> linux >> >>> cd linux >> >>> git checkout 0b4f2928f14c4a9770b0866923fc81beb7f4aa57 >> >>> >> >>> Since I already knew that this commit wasn't good I did >> >>> >> >>> git bisect start >> >>> git bisect bad >> >>> >> >>> >> >> I think what you miss is marking the good commit. >> >> bisect does a binary search but it needs to know >> >> both good and bad commits to search in the range. >> >> >> >> Optionally, you can use '-- drivers/virtio/ drivers/net/virtio_net.c' >> >> what this does is limit bisect to commits that touch files in >> >> question. This way you get much less tests to run >> >> (about 4) but after you find a first problematic commit >> >> you must verify that a commit just before it does not have the issue. >> >> >> >> If this turns out not to be the case, you'll have to >> >> fallback on full bisect, and we will now this is some >> >> other change in kernel that triggered the regression. >> >> >> >> >> >> >> >> >> >>> compiled and started over. As expected the problem returns. 
>> >>> So I've done another >> >>> >> >>> git bisect bad >> >>> >> >>> but I always get the same commit: >> >>> >> >>> kabul:/usr/src/linux # git bisect log >> >>> git bisect start >> >>> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix >> >>> compilation on SMP >> >>> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57 >> >>> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix >> >>> compilation on SMP >> >>> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57 >> >>> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix >> >>> compilation on SMP >> >>> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57 >> >>> >> >>> I've expected that after each "git bisect bad" I get the previous >> >>> commit before the "bad" one. How can get the previous commit? >> >>> The bisect documentation couldn't help me. >> >>> >> >>> Thanks! >> >>> Robert >> >>> >> >>> >> >>> >> >>> On 04/12/10 13:23, Michael S. Tsirkin wrote: >> >>> >> >>> >> >>>> On Mon, Apr 12, 2010 at 11:25:26AM +0200, Robert Wimmer wrote: >> >>>> >> >>>> >> >>>> >> >>>>> server10:/usr/src/linux # git bisect start v2.6.31 v2.6.30 -- >> >>>>> drivers/virtio/ drivers/net/virtio_net.c >> >>>>> Bisecting: 12 revisions left to test after this (roughly 4 steps) >> >>>>> [e3353853730eb99c56b7b0aed1667d51c0e3699a] virtio: enhance >> >>>>> id_matching >> >>>>> for virtio drivers >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>> Sorry I wasn't clear. the way to use bisect is as follows: >> >>>> - first start as you did now. >> >>>> 1. now build kernel, install and test >> >>>> 2. if bug is there, type 'git bisect bad' >> >>>> 3. if bug is not there, type 'git bisect good' >> >>>> 4. The above will give you another kernel version to test >> >>>> if so go back to step 1 >> >>>> 6. this will be repeated about 4 times (number of steps above) >> >>>> 7. in the end you will get the first revision which has the >> >>>> problem. Let's assume it is revision ABCDEF. 
>> >>>> >> >>>> Type git bisect log to see your history. >> >>>> >> >>>> 8. Now git reset --hard ABCDEF~1 and try again. >> >>>> >> >>>> If you see the problem with ABCDEF but not ABCDEF~1 >> >>>> then we will have a good guess at the culprit. >> >>>> >> >>>> Some more tips here: >> >>>> http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html >> >>>> >> >>>> >> >>>> >> >>>> >> >>>> >> >>>>> Today I've upgraded to qemu-kvm-0.12.3-r1 (Gentoo package) >> >>>>> but doesn't help. Still getting "page allocation failure" with >> >>>>> 2.6.31-rc5. >> >>>>> >> >>>>> Does it makes sense to use the same 2.6.31-rc5 kernel >> >>>>> in the host and guest for testing? Currently I'm still using 2.6.32 >> >>>>> in host and testing 2.6.31-rc5 in guest until "crashes". >> >>>>> Then I start the guest with 2.6.30 again which works >> >>>>> without trouble with 2.6.32 as host. >> >>>>> >> >>>>> This is really strange. I have hosts with 2.6.32 running >> >>>>> guests with 2.6.32 which works perfectly. These hosts >> >>>>> and guests running on HP DL 380 G6 with Intel Xeon X5560. >> >>>>> The guests which don't work with 2.6.32 (and 2.6.32 >> >>>>> as host) running on HP DL 380 G5 with Intel Xeon L5420. >> >>>>> >> >>>>> >> >>>>> >> >>>> Hmm. Some subtle race? >> >>>> >> >>>> >> >>>> >> >>>> >> >>>>> (All guests) and (all hosts) have the same packages >> >>>>> and the same versions installed and the same kernel >> >>>>> configs (hosts and guests using different .config but the >> >>>>> difference is very small e.g. CONFIG_PARAVIRT_SPINLOCKS=y, >> >>>>> CONFIG_PARAVIRT_GUEST=y in guests but not in hosts >> >>>>> .config). 
>> >>>>> >> >>>>> I've had problems with qemu-kvm 0.12.2 with high network >> >>>>> traffic which was solved by a patch submitted by Tom >> >>>>> Lendacky: >> >>>>> >> >>>>> "Fix a race condition where qemu finds that there are not enough >> >>>>> virtio >> >>>>> ring buffers available and the guest make more buffers available >> >>>>> before >> >>>>> qemu can enable notifications." >> >>>>> http://www.mail-archive.com/kvm@vger.kernel.org/msg28667.html >> >>>>> >> >>>>> It was a real lifesaver for the HP DL 380 G6 mentioned >> >>>>> above but maybe this is now causing the problems with the G5 >> >>>>> machines. >> >>>>> The symptoms are the same. I can still log into the guest >> >>>>> via VNC but the network is down. >> >>>>> >> >>>>> Thanks! >> >>>>> Robert >> >>>>> >> >>>>> >> >>>>> >> >>>>> >> >>>> For now the only thing we seem to know for sure is that on >> >>>> specific hardware there's a regression between 2.6.30 and >> >>>> 2.6.31-rc5. Yes, it is possible that all it does >> >>>> is expose a qemu bug, but it's hard to say. >> >>>> Let's find out what change >> >>>> does that, this should give us a hint. >> >>>> >> >>>> >> >>>> >> >>>> >> >>>>> On 04/11/10 13:03, Michael S. Tsirkin wrote: >> >>>>> >> >>>>> >> >>>>> >> >>>>>> On Fri, Apr 09, 2010 at 12:15:01PM +0200, Robert Wimmer wrote: >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>>> I'm not really a git hero so here is what I've done: >> >>>>>>> >> >>>>>>> cd /usr/src >> >>>>>>> git clone >> >>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git >> >>>>>>> linux >> >>>>>>> cd linux >> >>>>>>> git checkout -b mykernel 0b4f2928f14c4a9770b0866923fc81beb7f4aa57 >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>> Looks right. >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>>> Then I've checked >> >>>>>>> >> >>>>>>> drivers/net/virtio_net.c >> >>>>>>> drivers/net/smc91x.c >> >>>>>>> >> >>>>>>> if the changes commited where not in there. 
>> >>>>>>> Next I build my kernel as usual. I used my .config >> >>>>>>> from 2.6.30 (which is working fine in a several >> >>>>>>> guests / .config see here: >> >>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=25925) >> >>>>>>> and build the kernel >> >>>>>>> >> >>>>>>> genkernel --menuconfig --lvm --oldconfig all >> >>>>>>> >> >>>>>>> which finally gave me a 2.6.31-rc5. >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>> That's right. >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>>> I should mention >> >>>>>>> that 2.6.30 was using SLUB. So here is the output >> >>>>>>> from the 2.6.31-rc5 kernel running about 20 min.: >> >>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=25926 >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>> Hmm, so we see the error here as well? >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>>> Seems not very usefull to me. I'm currently compiling >> >>>>>>> the same kernel with SLAB. >> >>>>>>> >> >>>>>>> Please let me know if the git commands above are >> >>>>>>> right and/or if you need other kernel options enabled. >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>> Looks right. You don't have to add -b flag if you don't >> >>>>>> want to. >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>>> Thanks! >> >>>>>>> Robert >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>> Hmm, I do not see anything else that seems related. >> >>>>>> Could you please try to bisect? >> >>>>>> >> >>>>>> git bisect start v2.6.31 v2.6.30 -- drivers/virtio/ >> >>>>>> drivers/net/virtio_net.c >> >>>>>> >> >>>>>> should help assuming the change that triggers this is in virtio. >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>> >> >>>>>>> On 04/08/10 22:04, Michael S. 
Tsirkin wrote: >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>>> On Thu, Apr 08, 2010 at 10:39:34PM +0300, Avi Kivity wrote: >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>>>>> cc: mst >> >>>>>>>>> >> >>>>>>>>> On 04/08/2010 10:34 PM, Andrew Morton wrote: >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>> >> >>>>>>>>>> (switched to email. Please respond via emailed reply-to-all, >> >>>>>>>>>> not via the >> >>>>>>>>>> bugzilla web interface). >> >>>>>>>>>> >> >>>>>>>>>> On Wed, 7 Apr 2010 10:29:20 GMT >> >>>>>>>>>> bugzilla-daemon@bugzilla.kernel.org wrote: >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=15709 >> >>>>>>>>>>> >> >>>>>>>>>>> Summary: swapper page allocation failure >> >>>>>>>>>>> Product: Memory Management >> >>>>>>>>>>> Version: 2.5 >> >>>>>>>>>>> Kernel Version: 2.6.32 and 2.6.33 >> >>>>>>>>>>> Platform: All >> >>>>>>>>>>> OS/Version: Linux >> >>>>>>>>>>> Tree: Mainline >> >>>>>>>>>>> Status: NEW >> >>>>>>>>>>> Severity: normal >> >>>>>>>>>>> Priority: P1 >> >>>>>>>>>>> Component: Slab Allocator >> >>>>>>>>>>> AssignedTo: akpm@linux-foundation.org >> >>>>>>>>>>> ReportedBy: kernel@tauceti.net >> >>>>>>>>>>> Regression: No >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> Created an attachment (id=25903) >> >>>>>>>>>>> --> (https://bugzilla.kernel.org/attachment.cgi?id=25903) >> >>>>>>>>>>> dmesg output >> >>>>>>>>>>> >> >>>>>>>>>>> I'm having problems with "swapper page allocation failure's" >> >>>>>>>>>>> since upgrading >> >>>>>>>>>>> from kernel 2.6.30 to 2.6.32/2.6.33. The problems occur >> >>>>>>>>>>> inside a kernel virtual >> >>>>>>>>>>> maschine (KVM). Running Gentoo with kernel 2.6.32 as host >> >>>>>>>>>>> which works fine. As >> >>>>>>>>>>> long as kernel 2.6.30 is used as guest kernel the guest runs >> >>>>>>>>>>> fine. 
But after >> >>>>>>>>>>> upgrading to 2.6.32 and 2.6.33 I get "swapper page >> >>>>>>>>>>> allocation failure's" (see >> >>>>>>>>>>> attachment of dmesg output). The guest is only running a >> >>>>>>>>>>> Apache webserver and >> >>>>>>>>>>> serves files from a NFS share. It has 1 GB RAM and 2 virtual >> >>>>>>>>>>> CPUs. I've tried >> >>>>>>>>>>> different kernel configurations (e.g. a unmodified version >> >>>>>>>>>>> from Sabayon Linux >> >>>>>>>>>>> Distribution) but doesn't help. Load of the guest (and host) >> >>>>>>>>>>> is very low. >> >>>>>>>>>>> Network traffic is about 20-50 MBit/s. >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>>> >> >>>>>>>>>> hm, this is a regression. >> >>>>>>>>>> >> >>>>>>>>>> : [ 454.006706] users: page allocation failure. order:0, >> >>>>>>>>>> mode:0x20 >> >>>>>>>>>> : [ 454.006712] Pid: 7992, comm: users Not tainted >> >>>>>>>>>> 2.6.34-rc3-git6 #2 >> >>>>>>>>>> : [ 454.006714] Call Trace: >> >>>>>>>>>> : [ 454.006717]<IRQ> [<ffffffff8109dff7>] >> >>>>>>>>>> __alloc_pages_nodemask+0x5c8/0x615 >> >>>>>>>>>> : [ 454.006796] [<ffffffff817860ce>] ? >> >>>>>>>>>> ip_local_deliver+0x65/0x6d >> >>>>>>>>>> : [ 454.006820] [<ffffffff810c39c4>] >> >>>>>>>>>> alloc_pages_current+0x96/0x9f >> >>>>>>>>>> : [ 454.006842] [<ffffffff8167f2c7>] >> >>>>>>>>>> try_fill_recv+0x5e/0x20f >> >>>>>>>>>> : [ 454.006846] [<ffffffff8167fe13>] >> >>>>>>>>>> virtnet_poll+0x52a/0x5c7 >> >>>>>>>>>> : [ 454.006858] [<ffffffff8104fe74>] ? 
>> >>>>>>>>>> run_timer_softirq+0x1dc/0x1f4 >> >>>>>>>>>> : [ 454.006873] [<ffffffff8176035d>] >> >>>>>>>>>> net_rx_action+0xad/0x1a5 >> >>>>>>>>>> : [ 454.006882] [<ffffffff8104b6cd>] __do_softirq+0x9c/0x127 >> >>>>>>>>>> : [ 454.006897] [<ffffffff81008ffc>] call_softirq+0x1c/0x30 >> >>>>>>>>>> : [ 454.006901] [<ffffffff8100af01>] do_softirq+0x41/0x7e >> >>>>>>>>>> : [ 454.006904] [<ffffffff8104b3e3>] irq_exit+0x36/0x75 >> >>>>>>>>>> : [ 454.006907] [<ffffffff8100a5ee>] do_IRQ+0xaa/0xc1 >> >>>>>>>>>> : [ 454.006926] [<ffffffff8183bc13>] ret_from_intr+0x0/0x11 >> >>>>>>>>>> : [ 454.006928]<EOI> [<ffffffff81026b25>] ? >> >>>>>>>>>> kvm_deferred_mmu_op+0x5e/0xe7 >> >>>>>>>>>> : [ 454.006942] [<ffffffff81026b19>] ? >> >>>>>>>>>> kvm_deferred_mmu_op+0x52/0xe7 >> >>>>>>>>>> : [ 454.006946] [<ffffffff81026c03>] kvm_mmu_write+0x2e/0x35 >> >>>>>>>>>> : [ 454.006949] [<ffffffff81026c7d>] >> >>>>>>>>>> kvm_set_pte_at+0x19/0x1b >> >>>>>>>>>> : [ 454.006953] [<ffffffff810aba67>] __do_fault+0x3c4/0x492 >> >>>>>>>>>> : [ 454.006957] [<ffffffff810adcf4>] >> >>>>>>>>>> handle_mm_fault+0x478/0x9d8 >> >>>>>>>>>> : [ 454.006966] [<ffffffff810deb59>] ? path_put+0x2c/0x30 >> >>>>>>>>>> : [ 454.006975] [<ffffffff8102f162>] >> >>>>>>>>>> do_page_fault+0x2f6/0x31a >> >>>>>>>>>> : [ 454.006979] [<ffffffff8183b81e>] ? 
>> >>>>>>>>>> _raw_spin_lock+0x9/0xd >> >>>>>>>>>> : [ 454.006982] [<ffffffff8183bef5>] page_fault+0x25/0x30 >> >>>>>>>>>> : [ 454.006985] Mem-Info: >> >>>>>>>>>> : [ 454.006987] Node 0 DMA per-cpu: >> >>>>>>>>>> : [ 454.006990] CPU 0: hi: 0, btch: 1 usd: 0 >> >>>>>>>>>> : [ 454.006992] CPU 1: hi: 0, btch: 1 usd: 0 >> >>>>>>>>>> : [ 454.006993] Node 0 DMA32 per-cpu: >> >>>>>>>>>> : [ 454.006996] CPU 0: hi: 186, btch: 31 usd: 185 >> >>>>>>>>>> : [ 454.006998] CPU 1: hi: 186, btch: 31 usd: 112 >> >>>>>>>>>> : [ 454.007003] active_anon:8308 inactive_anon:8544 >> >>>>>>>>>> isolated_anon:0 >> >>>>>>>>>> : [ 454.007005] active_file:4882 inactive_file:205902 >> >>>>>>>>>> isolated_file:0 >> >>>>>>>>>> : [ 454.007006] unevictable:0 dirty:11 writeback:0 >> >>>>>>>>>> unstable:0 >> >>>>>>>>>> : [ 454.007007] free:1385 slab_reclaimable:2445 >> >>>>>>>>>> slab_unreclaimable:4466 >> >>>>>>>>>> : [ 454.007008] mapped:1895 shmem:113 pagetables:1370 >> >>>>>>>>>> bounce:0 >> >>>>>>>>>> : [ 454.007010] Node 0 DMA free:4000kB min:60kB low:72kB >> >>>>>>>>>> high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB >> >>>>>>>>>> inactive_file:11844kB unevictable:0kB isolated(anon):0kB >> >>>>>>>>>> isolated(file):0kB present:15768kB mlocked:0kB dirty:0kB >> >>>>>>>>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:64kB >> >>>>>>>>>> slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB >> >>>>>>>>>> unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 >> >>>>>>>>>> all_unreclaimable? 
no >> >>>>>>>>>> : [ 454.007021] lowmem_reserve[]: 0 994 994 994 >> >>>>>>>>>> : [ 454.007025] Node 0 DMA32 free:1540kB min:4000kB >> >>>>>>>>>> low:5000kB high:6000kB active_anon:33232kB >> >>>>>>>>>> inactive_anon:34176kB active_file:19528kB >> >>>>>>>>>> inactive_file:811764kB unevictable:0kB isolated(anon):0kB >> >>>>>>>>>> isolated(file):0kB present:1018068kB mlocked:0kB dirty:44kB >> >>>>>>>>>> writeback:0kB mapped:7580kB shmem:452kB >> >>>>>>>>>> slab_reclaimable:9716kB slab_unreclaimable:17832kB >> >>>>>>>>>> kernel_stack:1144kB pagetables:5480kB unstable:0kB bounce:0kB >> >>>>>>>>>> writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no >> >>>>>>>>>> : [ 454.007036] lowmem_reserve[]: 0 0 0 0 >> >>>>>>>>>> : [ 454.007040] Node 0 DMA: 0*4kB 4*8kB 6*16kB 5*32kB 6*64kB >> >>>>>>>>>> 4*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 4000kB >> >>>>>>>>>> : [ 454.007050] Node 0 DMA32: 13*4kB 2*8kB 3*16kB 1*32kB >> >>>>>>>>>> 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = >> >>>>>>>>>> 1556kB >> >>>>>>>>>> : [ 454.007059] 210914 total pagecache pages >> >>>>>>>>>> : [ 454.007061] 0 pages in swap cache >> >>>>>>>>>> : [ 454.007063] Swap cache stats: add 0, delete 0, find 0/0 >> >>>>>>>>>> : [ 454.007065] Free swap = 1959924kB >> >>>>>>>>>> : [ 454.007067] Total swap = 1959924kB >> >>>>>>>>>> : [ 454.014238] 262140 pages RAM >> >>>>>>>>>> : [ 454.014241] 7489 pages reserved >> >>>>>>>>>> : [ 454.014242] 21430 pages shared >> >>>>>>>>>> : [ 454.014244] 247174 pages non-shared >> >>>>>>>>>> >> >>>>>>>>>> Either page reclaim got worse or kvm/virtio-net got more >> >>>>>>>>>> aggressive. >> >>>>>>>>>> >> >>>>>>>>>> Avi, Rusty: can you think of any changes in the KVM/virtio >> >>>>>>>>>> area in the >> >>>>>>>>>> 2.6.30 -> 2.6.32 timeframe which may have increased the >> >>>>>>>>>> GFP_ATOMIC >> >>>>>>>>>> demands upon the page allocator? >> >>>>>>>>>> >> >>>>>>>>>> Thanks. 
>> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>>>> >> >>>>>>>> On the contrary, with commit >> >>>>>>>> 3161e453e496eb5643faad30fff5a5ab183da0fe >> >>>>>>>> we should be using GFP_ATOMIC less. >> >>>>>>>> But maybe there's a bug and it has the reverse effect somehow >> >>>>>>>> ... >> >>>>>>>> >> >>>>>>>> Robert, could you pls try >> >>>>>>>> 3161e453e496eb5643faad30fff5a5ab183da0fe >> >>>>>>>> and if that *does* have the problem, >> >>>>>>>> 0b4f2928f14c4a9770b0866923fc81beb7f4aa57? >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> >> >>>>>>>> >> > -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure 2010-04-21 11:23 ` kernel @ 2010-04-21 9:42 ` Michael S. Tsirkin 2010-04-22 11:31 ` kernel 0 siblings, 1 reply; 62+ messages in thread From: Michael S. Tsirkin @ 2010-04-21 9:42 UTC (permalink / raw) To: kernel Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, Rusty Russell, Mel Gorman

On Wed, Apr 21, 2010 at 01:23:12PM +0200, kernel wrote:
> So after the compiler was running hot I've now the following result:
>
> server10:/usr/src/linux # git bisect log
> # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
> # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
> git bisect start 'v2.6.31' 'v2.6.30'
> # good: [925d74ae717c9a12d3618eb4b36b9fb632e2cef3] V4L/DVB (11736): videobuf: modify return value of VIDIOC_REQBUFS ioctl
> git bisect good 925d74ae717c9a12d3618eb4b36b9fb632e2cef3
> # bad: [a380137900fca5c79e6daa9500bdb6ea5649188e] ixgbe: Fix device capabilities of 82599 single speed fiber NICs.
> git bisect bad a380137900fca5c79e6daa9500bdb6ea5649188e
> # good: [1dbb5765acc7a6fe4bc1957c001037cc9d02ae03] Staging: android: lowmemorykiller: fix up remaining checkpatch warnings
> git bisect good 1dbb5765acc7a6fe4bc1957c001037cc9d02ae03
> # good: [df36b439c5fedefe013d4449cb6a50d15e2f4d70] Merge branch 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
> git bisect good df36b439c5fedefe013d4449cb6a50d15e2f4d70
> # bad: [a800faec1b21d7133b5f0c8c6dac593b7c4e118d] Merge branch 'for-linus' of git://www.jni.nu/cris
> git bisect bad a800faec1b21d7133b5f0c8c6dac593b7c4e118d
> # good: [ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2] Merge git://git.infradead.org/mtd-2.6
> git bisect good ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2
> # bad: [37c6dbe290c05023b47f52528e30ce51336b93eb] V4L/DVB (12091): gspca_sonixj: Add light frequency control
> git bisect bad 37c6dbe290c05023b47f52528e30ce51336b93eb
> # bad: [687d680985b1438360a9ba470ece8b57cd205c3b] Merge git://git.infradead.org/~dwmw2/iommu-2.6.31
> git bisect bad 687d680985b1438360a9ba470ece8b57cd205c3b
> # bad: [1053414068bad659479e6efa62a67403b8b1ec0a] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
> git bisect bad 1053414068bad659479e6efa62a67403b8b1ec0a
> # good: [b01b4babbf204443b5a846a7494546501614cefc] firewire: net: fix card driver reloading
> git bisect good b01b4babbf204443b5a846a7494546501614cefc
> # bad: [c02d7adf8c5429727a98bad1d039bccad4c61c50] NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private namespace
> git bisect bad c02d7adf8c5429727a98bad1d039bccad4c61c50
> # good: [616511d039af402670de8500d0e24495113a9cab] VFS: Uninline the function put_mnt_ns()
> git bisect good 616511d039af402670de8500d0e24495113a9cab
> # good: [cf8d2c11cb77f129675478792122f50827e5b0ae] VFS: Add VFS helper functions for setting up private namespaces
> git bisect good cf8d2c11cb77f129675478792122f50827e5b0ae
>
> The last "git bisect good" prints out:
>
> server10:/usr/src/linux # git bisect good
> c02d7adf8c5429727a98bad1d039bccad4c61c50 is the first bad commit
> commit c02d7adf8c5429727a98bad1d039bccad4c61c50
> Author: Trond Myklebust <Trond.Myklebust@netapp.com>
> Date: Mon Jun 22 15:09:14 2009 -0400
>
> NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private namespace
>
> As noted in the previous patch, the NFSv4 client mount code currently
> has several limitations. If the mount path contains symlinks, or
> referrals, or even if it just contains a '..', then the client code in
> nfs4_path_walk() will fail with an error.
>
> This patch replaces the nfs4_path_walk()-based lookup with a helper
> function that sets up a private namespace to represent the namespace on the
> server, then uses the ordinary VFS and NFS path lookup code to walk down the
> mount path in that namespace.
>
> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>
> :040000 040000 97a18818f26ab9a0987f157257eb6f399c3cc1cc 9ab6c712bb64f1349b5ac9f2020191abb5780ca0 M fs
>
> Does this help you any further?
>
> Thanks!
> Robert

Looks suspiciously like some error in testing. Could you pls retest and verify again that cf8d2c11cb77f129675478792122f50827e5b0ae is good and c02d7adf8c5429727a98bad1d039bccad4c61c50 is bad?

--
MST

-- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 62+ messages in thread
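The two commits named in the retest request are the last commit marked bad and the last commit marked good in the quoted bisect log. A minimal sketch of pulling that pair out of a saved log, assuming the log was written to a file with `git bisect log > FILE` (so without the "> " quote prefixes of the mail). The function name `bisect_boundary` and its output format are illustrative only and not part of git.

```shell
# bisect_boundary LOGFILE
# Print the most recently marked bad and good commits from a saved
# "git bisect log". Hypothetical helper, not a git command.
bisect_boundary() {
    awk '
        # Only the replayable "git bisect <verdict> <sha>" lines matter;
        # the "# good:" / "# bad:" comment lines are skipped.
        $1 == "git" && $2 == "bisect" && $3 == "bad"  { bad = $4 }
        $1 == "git" && $2 == "bisect" && $3 == "good" { good = $4 }
        END { print "bad=" bad " good=" good }
    ' "$1"
}
```

On a finished bisect the last commit marked bad is the one git reports as the first bad commit. The last commit marked good is not, in general, its direct parent, so checking out and rebuilding both, as requested above, is still the real verification.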
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure 2010-04-21 9:42 ` Michael S. Tsirkin @ 2010-04-22 11:31 ` kernel 2010-04-22 10:03 ` Michael S. Tsirkin 0 siblings, 1 reply; 62+ messages in thread From: kernel @ 2010-04-22 11:31 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, Rusty Russell, Mel Gorman

Some comments on my previous mail about what I did: I started with a fresh clone (after deleting the old /usr/src/linux, of course).

    git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux

Then I started the bisect

    git bisect start 'v2.6.31' 'v2.6.30'

built the first kernel, and marked kernels that "crashed" with a "soft lockup" or a "swapper page allocation failure" as bad and the others as good. Before compiling each new kernel I always ran "make mrproper". I don't know if this is needed, but I thought it wouldn't hurt.

It was not clear to me whether I should have stopped testing after the first commit that came up with a "swapper page allocation failure". Only one commit caused the allocation failure. All the other commits marked as bad came up with a soft lockup. But I thought it was important to find the earliest commit that crashes. So should I find the commit with the allocation failure?

As you requested, I've now done a

    git checkout c02d7adf8c5429727a98bad1d039bccad4c61c50

which ended with a soft lockup within 3 min. after starting the VM with this kernel (see https://bugzilla.kernel.org/attachment.cgi?id=26089&action=edit). Then I did a

    git checkout cf8d2c11cb77f129675478792122f50827e5b0ae

compiled, and restarted the VM with this kernel version (BTW: of course I've always used the same .config for all the kernels I've built). cf8d2c11cb77f129675478792122f50827e5b0ae is running fine.

Thanks!
Robert

On Wed, 21 Apr 2010 12:42:49 +0300, "Michael S.
Tsirkin" <mst@redhat.com> wrote: > On Wed, Apr 21, 2010 at 01:23:12PM +0200, kernel wrote: >> So after the compiler was running hot I've now the following result: >> >> server10:/usr/src/linux # git bisect log >> # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31 >> # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30 >> git bisect start 'v2.6.31' 'v2.6.30' >> # good: [925d74ae717c9a12d3618eb4b36b9fb632e2cef3] V4L/DVB (11736): >> videobuf: modify return value of VIDIOC_REQBUFS ioctl >> git bisect good 925d74ae717c9a12d3618eb4b36b9fb632e2cef3 >> # bad: [a380137900fca5c79e6daa9500bdb6ea5649188e] ixgbe: Fix device >> capabilities of 82599 single speed fiber NICs. >> git bisect bad a380137900fca5c79e6daa9500bdb6ea5649188e >> # good: [1dbb5765acc7a6fe4bc1957c001037cc9d02ae03] Staging: android: >> lowmemorykiller: fix up remaining checkpatch warnings >> git bisect good 1dbb5765acc7a6fe4bc1957c001037cc9d02ae03 >> # good: [df36b439c5fedefe013d4449cb6a50d15e2f4d70] Merge branch >> 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 >> git bisect good df36b439c5fedefe013d4449cb6a50d15e2f4d70 >> # bad: [a800faec1b21d7133b5f0c8c6dac593b7c4e118d] Merge branch >> 'for-linus' >> of git://www.jni.nu/cris >> git bisect bad a800faec1b21d7133b5f0c8c6dac593b7c4e118d >> # good: [ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2] Merge >> git://git.infradead.org/mtd-2.6 >> git bisect good ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2 >> # bad: [37c6dbe290c05023b47f52528e30ce51336b93eb] V4L/DVB (12091): >> gspca_sonixj: Add light frequency control >> git bisect bad 37c6dbe290c05023b47f52528e30ce51336b93eb >> # bad: [687d680985b1438360a9ba470ece8b57cd205c3b] Merge >> git://git.infradead.org/~dwmw2/iommu-2.6.31 >> git bisect bad 687d680985b1438360a9ba470ece8b57cd205c3b >> # bad: [1053414068bad659479e6efa62a67403b8b1ec0a] Merge branch >> 'for-linus' >> of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 >> git bisect bad 
1053414068bad659479e6efa62a67403b8b1ec0a >> # good: [b01b4babbf204443b5a846a7494546501614cefc] firewire: net: fix >> card >> driver reloading >> git bisect good b01b4babbf204443b5a846a7494546501614cefc >> # bad: [c02d7adf8c5429727a98bad1d039bccad4c61c50] NFSv4: Replace >> nfs4_path_walk() with VFS path lookup in a private namespace >> git bisect bad c02d7adf8c5429727a98bad1d039bccad4c61c50 >> # good: [616511d039af402670de8500d0e24495113a9cab] VFS: Uninline the >> function put_mnt_ns() >> git bisect good 616511d039af402670de8500d0e24495113a9cab >> # good: [cf8d2c11cb77f129675478792122f50827e5b0ae] VFS: Add VFS helper >> functions for setting up private namespaces >> git bisect good cf8d2c11cb77f129675478792122f50827e5b0ae >> >> >> The last "git bisect good" prints out: >> >> server10:/usr/src/linux # git bisect good >> c02d7adf8c5429727a98bad1d039bccad4c61c50 is the first bad commit >> commit c02d7adf8c5429727a98bad1d039bccad4c61c50 >> Author: Trond Myklebust <Trond.Myklebust@netapp.com> >> Date: Mon Jun 22 15:09:14 2009 -0400 >> >> NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private >> namespace >> >> As noted in the previous patch, the NFSv4 client mount code currently >> has several limitations. If the mount path contains symlinks, or >> referrals, or even if it just contains a '..', then the client code >> in >> nfs4_path_walk() will fail with an error. >> >> This patch replaces the nfs4_path_walk()-based lookup with a helper >> function that sets up a private namespace to represent the namespace >> on the >> server, then uses the ordinary VFS and NFS path lookup code to walk >> down the >> mount path in that namespace. >> >> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> >> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> >> >> :040000 040000 97a18818f26ab9a0987f157257eb6f399c3cc1cc >> 9ab6c712bb64f1349b5ac9f2020191abb5780ca0 M fs >> >> Does this help you any further? >> >> Thanks! 
>> Robert > > Looks suspiciously like some error in testing. > Could you pls retest and verify again that > cf8d2c11cb77f129675478792122f50827e5b0ae > is good and c02d7adf8c5429727a98bad1d039bccad4c61c50 is bad? -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure 2010-04-22 11:31 ` kernel @ 2010-04-22 10:03 ` Michael S. Tsirkin 0 siblings, 0 replies; 62+ messages in thread From: Michael S. Tsirkin @ 2010-04-22 10:03 UTC (permalink / raw) To: kernel Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, Rusty Russell, Mel Gorman, Trond Myklebust, linux-nfs, linux-kernel

On Thu, Apr 22, 2010 at 01:31:06PM +0200, kernel wrote:
> Maybe some comments to my former mail about what I've done:
> I started with a fresh clone (deleted the old /usr/src/linux
> of course).
>
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
>
> Then I started bisect
>
> git bisect start 'v2.6.31' 'v2.6.30'
>
> and build the first kernel and then marked kernels which
> "crashed" with "soft lockup" or "swapper page allocation failure"
> as bad and the other ones as good. Before I've compiled
> a new kernel I've always done a "make mrproper". I don't know
> if this is needed but thought it wouldn't hurt.
>
> For me it was not clear that maybe I should have had stopped
> testing after the first commit that came up with a "swapper
> page allocation failure". It was only one commit which caused
> the allocation failure. All the other commits marked as bad
> came up with a soft lockup. But I thought it is important to
> find the earliest commit which crashes. So should I find out
> the commit with the allocation failure?

I think you did the right thing. We'll have to figure out the soft lockup issue first; then, if the page allocation failure turns out to be a different issue, look at that.

> As you requested I've now done a
>
> git checkout c02d7adf8c5429727a98bad1d039bccad4c61c50
>
> which ended with a soft lockup within 3 min. after starting
> the VM (see
> https://bugzilla.kernel.org/attachment.cgi?id=26089&action=edit)
> with this kernel.

I'm not sure why the lockup backtrace does not show function names - is the kernel stripped?

>
> Then I've done a
>
> git checkout cf8d2c11cb77f129675478792122f50827e5b0ae
>
> compiled and restarted the VM with this kernel version
> (BTW: Of course I've always used the same .config for
> all kernels I've built.). cf8d2c11cb77f129675478792122f50827e5b0ae
> is running fine.
>
> Thanks!
> Robert

Well, so the soft lockup issue seems NFS-related? Trond, commit c02d7adf8c5429727a98bad1d039bccad4c61c50 seems to be causing problems on some old kernels (see bisect below). Any idea why?

> On Wed, 21 Apr 2010 12:42:49 +0300, "Michael S. Tsirkin" <mst@redhat.com>
> wrote:
> > On Wed, Apr 21, 2010 at 01:23:12PM +0200, kernel wrote:
> >> So after the compiler was running hot I've now the following result:
> >>
> >> server10:/usr/src/linux # git bisect log
> >> # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
> >> # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
> >> git bisect start 'v2.6.31' 'v2.6.30'
> >> # good: [925d74ae717c9a12d3618eb4b36b9fb632e2cef3] V4L/DVB (11736): videobuf: modify return value of VIDIOC_REQBUFS ioctl
> >> git bisect good 925d74ae717c9a12d3618eb4b36b9fb632e2cef3
> >> # bad: [a380137900fca5c79e6daa9500bdb6ea5649188e] ixgbe: Fix device capabilities of 82599 single speed fiber NICs.
> >> git bisect bad a380137900fca5c79e6daa9500bdb6ea5649188e > >> # good: [1dbb5765acc7a6fe4bc1957c001037cc9d02ae03] Staging: android: > >> lowmemorykiller: fix up remaining checkpatch warnings > >> git bisect good 1dbb5765acc7a6fe4bc1957c001037cc9d02ae03 > >> # good: [df36b439c5fedefe013d4449cb6a50d15e2f4d70] Merge branch > >> 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 > >> git bisect good df36b439c5fedefe013d4449cb6a50d15e2f4d70 > >> # bad: [a800faec1b21d7133b5f0c8c6dac593b7c4e118d] Merge branch > >> 'for-linus' > >> of git://www.jni.nu/cris > >> git bisect bad a800faec1b21d7133b5f0c8c6dac593b7c4e118d > >> # good: [ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2] Merge > >> git://git.infradead.org/mtd-2.6 > >> git bisect good ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2 > >> # bad: [37c6dbe290c05023b47f52528e30ce51336b93eb] V4L/DVB (12091): > >> gspca_sonixj: Add light frequency control > >> git bisect bad 37c6dbe290c05023b47f52528e30ce51336b93eb > >> # bad: [687d680985b1438360a9ba470ece8b57cd205c3b] Merge > >> git://git.infradead.org/~dwmw2/iommu-2.6.31 > >> git bisect bad 687d680985b1438360a9ba470ece8b57cd205c3b > >> # bad: [1053414068bad659479e6efa62a67403b8b1ec0a] Merge branch > >> 'for-linus' > >> of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 > >> git bisect bad 1053414068bad659479e6efa62a67403b8b1ec0a > >> # good: [b01b4babbf204443b5a846a7494546501614cefc] firewire: net: fix > >> card > >> driver reloading > >> git bisect good b01b4babbf204443b5a846a7494546501614cefc > >> # bad: [c02d7adf8c5429727a98bad1d039bccad4c61c50] NFSv4: Replace > >> nfs4_path_walk() with VFS path lookup in a private namespace > >> git bisect bad c02d7adf8c5429727a98bad1d039bccad4c61c50 > >> # good: [616511d039af402670de8500d0e24495113a9cab] VFS: Uninline the > >> function put_mnt_ns() > >> git bisect good 616511d039af402670de8500d0e24495113a9cab > >> # good: [cf8d2c11cb77f129675478792122f50827e5b0ae] VFS: Add VFS helper > >> functions for setting 
up private namespaces > >> git bisect good cf8d2c11cb77f129675478792122f50827e5b0ae > >> > >> > >> The last "git bisect good" prints out: > >> > >> server10:/usr/src/linux # git bisect good > >> c02d7adf8c5429727a98bad1d039bccad4c61c50 is the first bad commit > >> commit c02d7adf8c5429727a98bad1d039bccad4c61c50 > >> Author: Trond Myklebust <Trond.Myklebust@netapp.com> > >> Date: Mon Jun 22 15:09:14 2009 -0400 > >> > >> NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private > >> namespace > >> > >> As noted in the previous patch, the NFSv4 client mount code > currently > >> has several limitations. If the mount path contains symlinks, or > >> referrals, or even if it just contains a '..', then the client code > >> in > >> nfs4_path_walk() will fail with an error. > >> > >> This patch replaces the nfs4_path_walk()-based lookup with a helper > >> function that sets up a private namespace to represent the > namespace > >> on the > >> server, then uses the ordinary VFS and NFS path lookup code to walk > >> down the > >> mount path in that namespace. > >> > >> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> > >> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> > >> > >> :040000 040000 97a18818f26ab9a0987f157257eb6f399c3cc1cc > >> 9ab6c712bb64f1349b5ac9f2020191abb5780ca0 M fs > >> > >> Does this help you any further? > >> > >> Thanks! > >> Robert > > > > Looks suspiciously like some error in testing. > > Could you pls retest and verify again that > > cf8d2c11cb77f129675478792122f50827e5b0ae > > is good and c02d7adf8c5429727a98bad1d039bccad4c61c50 is bad? ^ permalink raw reply [flat|nested] 62+ messages in thread
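Since the soft lockup and the page allocation failure may turn out to be separate issues, it can help to record which of the two symptoms a given test kernel showed instead of a single "bad" verdict. A minimal sketch, assuming the guest's kernel log has been saved to a file and contains the phrases used in this thread ("soft lockup", "page allocation failure"). The function name `classify_guest_log` and the labels it prints are hypothetical.

```shell
# classify_guest_log LOGFILE
# Print "lockup", "allocfail" or "good" depending on which symptom,
# if any, appears in a saved guest kernel log. Hypothetical helper.
classify_guest_log() {
    if grep -q 'soft lockup' "$1"; then
        # Checked first: most kernels marked bad in the bisect showed this.
        echo lockup
    elif grep -q 'page allocation failure' "$1"; then
        echo allocfail
    else
        echo good
    fi
}
```

With labels like these, one bisect run can treat only `lockup` as bad, and a second run can be started later for `allocfail` if it proves to be a different problem.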
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure 2010-04-22 10:03 ` Michael S. Tsirkin @ 2010-04-23 5:26 ` Robert Wimmer -1 siblings, 0 replies; 62+ messages in thread From: Robert Wimmer @ 2010-04-23 5:26 UTC (permalink / raw) To: Michael S. Tsirkin Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, Rusty Russell, Mel Gorman, Trond Myklebust, linux-nfs, linux-kernel > I'm not sure why the lockup backtrace does not show function names - > is the kernel stripped? I'm building the kernels always with "genkernel" a Gentoo helper program for kernel building. But I've looked into the log file of genkernel and there is nothing mentioned about stripping the kernel. There will be a future release of genkernel which supports this but this is currently not the case. Since I haven't stripped the kernel I would answer no. Maybe a kernel option which should be enabled? Thanks! Robert On 04/22/10 12:03, Michael S. Tsirkin wrote: > On Thu, Apr 22, 2010 at 01:31:06PM +0200, kernel wrote: > >> Maybe some comments to my former mail about what I've done: >> I started with a fresh clone (deleted the old /usr/src/linux >> of course). >> >> git clone >> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux >> >> Then I started bisect >> >> git bisect start 'v2.6.31' 'v2.6.30' >> >> and build the first kernel and then marked kernels which >> "crashed" with "soft lockup" or "swapper page allocation failure" >> as bad and the other ones as good. Before I've compiled >> a new kernel I've always done a "make mrproper". I don't know >> if this is needed but thought it wouldn't hurt. >> >> For me it was not clear that maybe I should have had stopped >> testing after the first commit that came up with a "swapper >> page allocation failure". It was only one commit which caused >> the allocation failure. All the other commits marked as bad >> came up with a soft lockup. But I thought it is important to >> find the earliest commit which crashes. 
So should I find out >> the commit with the allocation failure? >> > I think you did the right thing. We'll have to > figure out soft lockup thing, then if page allocation failure > turns out to be a different issue, look at it. > > >> As you requested I've now done now a >> >> git checkout c02d7adf8c5429727a98bad1d039bccad4c61c50 >> >> which ended with a soft lockup within 3 min. after starting >> the VM (see >> https://bugzilla.kernel.org/attachment.cgi?id=26089&action=edit) >> with this kernel. >> > I'm not sure why the lockup backtrace does not show function names - > is the kernel stripped? > > >> Then I've done a >> >> git checkout cf8d2c11cb77f129675478792122f50827e5b0ae >> >> compiled and restarted the VM with this kernel version >> (BTW: Of course I've always used the same .config for >> all kernels I've build.). cf8d2c11cb77f129675478792122f50827e5b0ae >> is running fine. >> >> Thanks! >> Robert >> > Well, so the soft lockup issue seems NFS-related? > Trond, commit cf8d2c11cb77f129675478792122f50827e5b0ae seems to > be causing problems on some old kernels (See bisect below). Any idea why? > > > >> On Wed, 21 Apr 2010 12:42:49 +0300, "Michael S. Tsirkin" <mst@redhat.com> >> wrote: >> >>> On Wed, Apr 21, 2010 at 01:23:12PM +0200, kernel wrote: >>> >>>> So after the compiler was running hot I've now the following result: >>>> >>>> server10:/usr/src/linux # git bisect log >>>> # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31 >>>> # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30 >>>> git bisect start 'v2.6.31' 'v2.6.30' >>>> # good: [925d74ae717c9a12d3618eb4b36b9fb632e2cef3] V4L/DVB (11736): >>>> videobuf: modify return value of VIDIOC_REQBUFS ioctl >>>> git bisect good 925d74ae717c9a12d3618eb4b36b9fb632e2cef3 >>>> # bad: [a380137900fca5c79e6daa9500bdb6ea5649188e] ixgbe: Fix device >>>> capabilities of 82599 single speed fiber NICs. 
>>>> git bisect bad a380137900fca5c79e6daa9500bdb6ea5649188e >>>> # good: [1dbb5765acc7a6fe4bc1957c001037cc9d02ae03] Staging: android: >>>> lowmemorykiller: fix up remaining checkpatch warnings >>>> git bisect good 1dbb5765acc7a6fe4bc1957c001037cc9d02ae03 >>>> # good: [df36b439c5fedefe013d4449cb6a50d15e2f4d70] Merge branch >>>> 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 >>>> git bisect good df36b439c5fedefe013d4449cb6a50d15e2f4d70 >>>> # bad: [a800faec1b21d7133b5f0c8c6dac593b7c4e118d] Merge branch >>>> 'for-linus' >>>> of git://www.jni.nu/cris >>>> git bisect bad a800faec1b21d7133b5f0c8c6dac593b7c4e118d >>>> # good: [ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2] Merge >>>> git://git.infradead.org/mtd-2.6 >>>> git bisect good ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2 >>>> # bad: [37c6dbe290c05023b47f52528e30ce51336b93eb] V4L/DVB (12091): >>>> gspca_sonixj: Add light frequency control >>>> git bisect bad 37c6dbe290c05023b47f52528e30ce51336b93eb >>>> # bad: [687d680985b1438360a9ba470ece8b57cd205c3b] Merge >>>> git://git.infradead.org/~dwmw2/iommu-2.6.31 >>>> git bisect bad 687d680985b1438360a9ba470ece8b57cd205c3b >>>> # bad: [1053414068bad659479e6efa62a67403b8b1ec0a] Merge branch >>>> 'for-linus' >>>> of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 >>>> git bisect bad 1053414068bad659479e6efa62a67403b8b1ec0a >>>> # good: [b01b4babbf204443b5a846a7494546501614cefc] firewire: net: fix >>>> card >>>> driver reloading >>>> git bisect good b01b4babbf204443b5a846a7494546501614cefc >>>> # bad: [c02d7adf8c5429727a98bad1d039bccad4c61c50] NFSv4: Replace >>>> nfs4_path_walk() with VFS path lookup in a private namespace >>>> git bisect bad c02d7adf8c5429727a98bad1d039bccad4c61c50 >>>> # good: [616511d039af402670de8500d0e24495113a9cab] VFS: Uninline the >>>> function put_mnt_ns() >>>> git bisect good 616511d039af402670de8500d0e24495113a9cab >>>> # good: [cf8d2c11cb77f129675478792122f50827e5b0ae] VFS: Add VFS helper >>>> functions for setting 
up private namespaces >>>> git bisect good cf8d2c11cb77f129675478792122f50827e5b0ae >>>> >>>> >>>> The last "git bisect good" prints out: >>>> >>>> server10:/usr/src/linux # git bisect good >>>> c02d7adf8c5429727a98bad1d039bccad4c61c50 is the first bad commit >>>> commit c02d7adf8c5429727a98bad1d039bccad4c61c50 >>>> Author: Trond Myklebust <Trond.Myklebust@netapp.com> >>>> Date: Mon Jun 22 15:09:14 2009 -0400 >>>> >>>> NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private >>>> namespace >>>> >>>> As noted in the previous patch, the NFSv4 client mount code >>>> >> currently >> >>>> has several limitations. If the mount path contains symlinks, or >>>> referrals, or even if it just contains a '..', then the client code >>>> in >>>> nfs4_path_walk() will fail with an error. >>>> >>>> This patch replaces the nfs4_path_walk()-based lookup with a helper >>>> function that sets up a private namespace to represent the >>>> >> namespace >> >>>> on the >>>> server, then uses the ordinary VFS and NFS path lookup code to walk >>>> down the >>>> mount path in that namespace. >>>> >>>> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> >>>> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> >>>> >>>> :040000 040000 97a18818f26ab9a0987f157257eb6f399c3cc1cc >>>> 9ab6c712bb64f1349b5ac9f2020191abb5780ca0 M fs >>>> >>>> Does this help you any further? >>>> >>>> Thanks! >>>> Robert >>>> >>> Looks suspiciously like some error in testing. >>> Could you pls retest and verify again that >>> cf8d2c11cb77f129675478792122f50827e5b0ae >>> is good and c02d7adf8c5429727a98bad1d039bccad4c61c50 is bad? >>> ^ permalink raw reply [flat|nested] 62+ messages in thread
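The marking rule Robert describes (a kernel is "bad" if the guest shows a soft lockup or a page allocation failure, otherwise "good") can be sketched as a small shell helper. This is only an illustration under stated assumptions: the function name `classify_dmesg` and the log file name `guest-dmesg.txt` are hypothetical, and the two grep patterns are simply the failure messages named in the thread.

```shell
#!/bin/sh
# Decide the bisect verdict for one tested kernel from a saved guest dmesg.
# classify_dmesg and the log path are hypothetical names for illustration;
# the two patterns are the failure messages mentioned in the thread.
classify_dmesg() {
    if grep -q -e 'soft lockup' -e 'page allocation failure' "$1"; then
        echo bad
    else
        echo good
    fi
}

# Possible use inside an ongoing bisect (run from the kernel tree):
#   git bisect start v2.6.31 v2.6.30
#   ... build, boot the guest, save its dmesg to guest-dmesg.txt ...
#   git bisect "$(classify_dmesg guest-dmesg.txt)"
```

Because this treats both symptoms as one "bad" verdict, the bisect finds the earliest commit showing either of them, which matches how the log above was produced. Tracking down the allocation failure on its own would need a second bisect that matches only that message.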
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure 2010-04-23 5:26 ` Robert Wimmer @ 2010-04-25 9:18 ` Michael S. Tsirkin -1 siblings, 0 replies; 62+ messages in thread From: Michael S. Tsirkin @ 2010-04-25 9:18 UTC (permalink / raw) To: Robert Wimmer Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, Rusty Russell, Mel Gorman, Trond Myklebust, linux-nfs, linux-kernel On Fri, Apr 23, 2010 at 07:26:52AM +0200, Robert Wimmer wrote: > > I'm not sure why the lockup backtrace does not show function names - > > is the kernel stripped? > > I'm building the kernels always with "genkernel" a Gentoo > helper program for kernel building. But I've looked into > the log file of genkernel and there is nothing mentioned about > stripping the kernel. There will be a future release of genkernel > which supports this but this is currently not the case. Since > I haven't stripped the kernel I would answer no. Maybe a > kernel option which should be enabled? > > Thanks! > Robert > Hmm. I have these CONFIG_KALLSYMS=y CONFIG_KALLSYMS_ALL=y CONFIG_KALLSYMS_EXTRA_PASS=y # CONFIG_STRIP_ASM_SYMS is not set > > > On 04/22/10 12:03, Michael S. Tsirkin wrote: > > On Thu, Apr 22, 2010 at 01:31:06PM +0200, kernel wrote: > > > >> Maybe some comments to my former mail about what I've done: > >> I started with a fresh clone (deleted the old /usr/src/linux > >> of course). > >> > >> git clone > >> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux > >> > >> Then I started bisect > >> > >> git bisect start 'v2.6.31' 'v2.6.30' > >> > >> and build the first kernel and then marked kernels which > >> "crashed" with "soft lockup" or "swapper page allocation failure" > >> as bad and the other ones as good. Before I've compiled > >> a new kernel I've always done a "make mrproper". I don't know > >> if this is needed but thought it wouldn't hurt. 
> >> > >> For me it was not clear that maybe I should have had stopped > >> testing after the first commit that came up with a "swapper > >> page allocation failure". It was only one commit which caused > >> the allocation failure. All the other commits marked as bad > >> came up with a soft lockup. But I thought it is important to > >> find the earliest commit which crashes. So should I find out > >> the commit with the allocation failure? > >> > > I think you did the right thing. We'll have to > > figure out soft lockup thing, then if page allocation failure > > turns out to be a different issue, look at it. > > > > > >> As you requested I've now done now a > >> > >> git checkout c02d7adf8c5429727a98bad1d039bccad4c61c50 > >> > >> which ended with a soft lockup within 3 min. after starting > >> the VM (see > >> https://bugzilla.kernel.org/attachment.cgi?id=26089&action=edit) > >> with this kernel. > >> > > I'm not sure why the lockup backtrace does not show function names - > > is the kernel stripped? > > > > > >> Then I've done a > >> > >> git checkout cf8d2c11cb77f129675478792122f50827e5b0ae > >> > >> compiled and restarted the VM with this kernel version > >> (BTW: Of course I've always used the same .config for > >> all kernels I've build.). cf8d2c11cb77f129675478792122f50827e5b0ae > >> is running fine. > >> > >> Thanks! > >> Robert > >> > > Well, so the soft lockup issue seems NFS-related? > > Trond, commit cf8d2c11cb77f129675478792122f50827e5b0ae seems to > > be causing problems on some old kernels (See bisect below). Any idea why? > > > > > > > >> On Wed, 21 Apr 2010 12:42:49 +0300, "Michael S. 
Tsirkin" <mst@redhat.com> > >> wrote: > >> > >>> On Wed, Apr 21, 2010 at 01:23:12PM +0200, kernel wrote: > >>> > >>>> So after the compiler was running hot I've now the following result: > >>>> > >>>> server10:/usr/src/linux # git bisect log > >>>> # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31 > >>>> # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30 > >>>> git bisect start 'v2.6.31' 'v2.6.30' > >>>> # good: [925d74ae717c9a12d3618eb4b36b9fb632e2cef3] V4L/DVB (11736): > >>>> videobuf: modify return value of VIDIOC_REQBUFS ioctl > >>>> git bisect good 925d74ae717c9a12d3618eb4b36b9fb632e2cef3 > >>>> # bad: [a380137900fca5c79e6daa9500bdb6ea5649188e] ixgbe: Fix device > >>>> capabilities of 82599 single speed fiber NICs. > >>>> git bisect bad a380137900fca5c79e6daa9500bdb6ea5649188e > >>>> # good: [1dbb5765acc7a6fe4bc1957c001037cc9d02ae03] Staging: android: > >>>> lowmemorykiller: fix up remaining checkpatch warnings > >>>> git bisect good 1dbb5765acc7a6fe4bc1957c001037cc9d02ae03 > >>>> # good: [df36b439c5fedefe013d4449cb6a50d15e2f4d70] Merge branch > >>>> 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6 > >>>> git bisect good df36b439c5fedefe013d4449cb6a50d15e2f4d70 > >>>> # bad: [a800faec1b21d7133b5f0c8c6dac593b7c4e118d] Merge branch > >>>> 'for-linus' > >>>> of git://www.jni.nu/cris > >>>> git bisect bad a800faec1b21d7133b5f0c8c6dac593b7c4e118d > >>>> # good: [ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2] Merge > >>>> git://git.infradead.org/mtd-2.6 > >>>> git bisect good ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2 > >>>> # bad: [37c6dbe290c05023b47f52528e30ce51336b93eb] V4L/DVB (12091): > >>>> gspca_sonixj: Add light frequency control > >>>> git bisect bad 37c6dbe290c05023b47f52528e30ce51336b93eb > >>>> # bad: [687d680985b1438360a9ba470ece8b57cd205c3b] Merge > >>>> git://git.infradead.org/~dwmw2/iommu-2.6.31 > >>>> git bisect bad 687d680985b1438360a9ba470ece8b57cd205c3b > >>>> # bad: [1053414068bad659479e6efa62a67403b8b1ec0a] 
Merge branch > >>>> 'for-linus' > >>>> of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6 > >>>> git bisect bad 1053414068bad659479e6efa62a67403b8b1ec0a > >>>> # good: [b01b4babbf204443b5a846a7494546501614cefc] firewire: net: fix > >>>> card > >>>> driver reloading > >>>> git bisect good b01b4babbf204443b5a846a7494546501614cefc > >>>> # bad: [c02d7adf8c5429727a98bad1d039bccad4c61c50] NFSv4: Replace > >>>> nfs4_path_walk() with VFS path lookup in a private namespace > >>>> git bisect bad c02d7adf8c5429727a98bad1d039bccad4c61c50 > >>>> # good: [616511d039af402670de8500d0e24495113a9cab] VFS: Uninline the > >>>> function put_mnt_ns() > >>>> git bisect good 616511d039af402670de8500d0e24495113a9cab > >>>> # good: [cf8d2c11cb77f129675478792122f50827e5b0ae] VFS: Add VFS helper > >>>> functions for setting up private namespaces > >>>> git bisect good cf8d2c11cb77f129675478792122f50827e5b0ae > >>>> > >>>> > >>>> The last "git bisect good" prints out: > >>>> > >>>> server10:/usr/src/linux # git bisect good > >>>> c02d7adf8c5429727a98bad1d039bccad4c61c50 is the first bad commit > >>>> commit c02d7adf8c5429727a98bad1d039bccad4c61c50 > >>>> Author: Trond Myklebust <Trond.Myklebust@netapp.com> > >>>> Date: Mon Jun 22 15:09:14 2009 -0400 > >>>> > >>>> NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private > >>>> namespace > >>>> > >>>> As noted in the previous patch, the NFSv4 client mount code > >>>> > >> currently > >> > >>>> has several limitations. If the mount path contains symlinks, or > >>>> referrals, or even if it just contains a '..', then the client code > >>>> in > >>>> nfs4_path_walk() will fail with an error. > >>>> > >>>> This patch replaces the nfs4_path_walk()-based lookup with a helper > >>>> function that sets up a private namespace to represent the > >>>> > >> namespace > >> > >>>> on the > >>>> server, then uses the ordinary VFS and NFS path lookup code to walk > >>>> down the > >>>> mount path in that namespace. 
> >>>> > >>>> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com> > >>>> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org> > >>>> > >>>> :040000 040000 97a18818f26ab9a0987f157257eb6f399c3cc1cc > >>>> 9ab6c712bb64f1349b5ac9f2020191abb5780ca0 M fs > >>>> > >>>> Does this help you any further? > >>>> > >>>> Thanks! > >>>> Robert > >>>> > >>> Looks suspiciously like some error in testing. > >>> Could you pls retest and verify again that > >>> cf8d2c11cb77f129675478792122f50827e5b0ae > >>> is good and c02d7adf8c5429727a98bad1d039bccad4c61c50 is bad? > >>> ^ permalink raw reply [flat|nested] 62+ messages in thread
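Michael's reply above lists the symbol-table options set in his own configuration. A quick way to compare a guest kernel's .config against that list is sketched below. The function name `check_kallsyms` and the default path `.config` are assumptions made for illustration; only the two option names that Robert later reports adding are checked, and the thread does not confirm that they alone account for the missing function names.

```shell
#!/bin/sh
# Report whether the symbol-table options quoted in the thread are enabled
# in a kernel .config. check_kallsyms and the default path are hypothetical.
check_kallsyms() {
    config="${1:-.config}"
    missing=0
    for opt in CONFIG_KALLSYMS CONFIG_KALLSYMS_ALL; do
        # Anchor the pattern so CONFIG_KALLSYMS does not match CONFIG_KALLSYMS_ALL.
        if grep -q "^${opt}=y\$" "$config"; then
            echo "$opt: enabled"
        else
            echo "$opt: not enabled"
            missing=1
        fi
    done
    # Non-zero exit status when at least one option is not set to y.
    return "$missing"
}
```

Run as `check_kallsyms /usr/src/linux/.config` before building; a non-zero exit status means at least one of the two options is not set to `y`.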
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-25  9:18 ` Michael S. Tsirkin
@ 2010-04-25 20:41 ` Robert Wimmer
  -1 siblings, 0 replies; 62+ messages in thread
From: Robert Wimmer @ 2010-04-25 20:41 UTC
To: Michael S. Tsirkin
Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, Rusty Russell,
    Mel Gorman, Trond Myklebust, linux-nfs, linux-kernel

I've added CONFIG_KALLSYMS and CONFIG_KALLSYMS_ALL
to my .config. I've uploaded the dmesg output. Maybe it
helps a little bit:

https://bugzilla.kernel.org/attachment.cgi?id=26138

- Robert


On 04/25/10 11:18, Michael S. Tsirkin wrote:
> On Fri, Apr 23, 2010 at 07:26:52AM +0200, Robert Wimmer wrote:
>
>>> I'm not sure why the lockup backtrace does not show function names -
>>> is the kernel stripped?
>>>
>> I'm building the kernels always with "genkernel", a Gentoo
>> helper program for kernel building. But I've looked into
>> the log file of genkernel and there is nothing mentioned about
>> stripping the kernel. There will be a future release of genkernel
>> which supports this but this is currently not the case. Since
>> I haven't stripped the kernel I would answer no. Maybe a
>> kernel option which should be enabled?
>>
>> Thanks!
>> Robert
>>
> Hmm. I have these
> CONFIG_KALLSYMS=y
> CONFIG_KALLSYMS_ALL=y
> CONFIG_KALLSYMS_EXTRA_PASS=y
> # CONFIG_STRIP_ASM_SYMS is not set
>
>>
>> On 04/22/10 12:03, Michael S. Tsirkin wrote:
>>
>>> On Thu, Apr 22, 2010 at 01:31:06PM +0200, kernel wrote:
>>>
>>>> Maybe some comments to my former mail about what I've done:
>>>> I started with a fresh clone (deleted the old /usr/src/linux
>>>> of course).
>>>>
>>>> git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
>>>>
>>>> Then I started bisect
>>>>
>>>> git bisect start 'v2.6.31' 'v2.6.30'
>>>>
>>>> and built the first kernel and then marked kernels which
>>>> "crashed" with "soft lockup" or "swapper page allocation failure"
>>>> as bad and the other ones as good. Before I've compiled
>>>> a new kernel I've always done a "make mrproper". I don't know
>>>> if this is needed but thought it wouldn't hurt.
>>>>
>>>> For me it was not clear that maybe I should have stopped
>>>> testing after the first commit that came up with a "swapper
>>>> page allocation failure". It was only one commit which caused
>>>> the allocation failure. All the other commits marked as bad
>>>> came up with a soft lockup. But I thought it is important to
>>>> find the earliest commit which crashes. So should I find out
>>>> the commit with the allocation failure?
>>>>
>>> I think you did the right thing. We'll have to
>>> figure out the soft lockup thing, then if page allocation failure
>>> turns out to be a different issue, look at it.
>>>
>>>> As you requested I've now done a
>>>>
>>>> git checkout c02d7adf8c5429727a98bad1d039bccad4c61c50
>>>>
>>>> which ended with a soft lockup within 3 min. after starting
>>>> the VM (see
>>>> https://bugzilla.kernel.org/attachment.cgi?id=26089&action=edit)
>>>> with this kernel.
>>>>
>>> I'm not sure why the lockup backtrace does not show function names -
>>> is the kernel stripped?
>>>
>>>> Then I've done a
>>>>
>>>> git checkout cf8d2c11cb77f129675478792122f50827e5b0ae
>>>>
>>>> compiled and restarted the VM with this kernel version
>>>> (BTW: Of course I've always used the same .config for
>>>> all kernels I've built.). cf8d2c11cb77f129675478792122f50827e5b0ae
>>>> is running fine.
>>>>
>>>> Thanks!
>>>> Robert
>>>>
>>> Well, so the soft lockup issue seems NFS-related?
>>> Trond, commit cf8d2c11cb77f129675478792122f50827e5b0ae seems to
>>> be causing problems on some old kernels (See bisect below). Any idea why?
>>>
>>>> On Wed, 21 Apr 2010 12:42:49 +0300, "Michael S. Tsirkin" <mst@redhat.com>
>>>> wrote:
>>>>
>>>>> On Wed, Apr 21, 2010 at 01:23:12PM +0200, kernel wrote:
>>>>>
>>>>>> So after the compiler was running hot I've now the following result:
>>>>>>
>>>>>> server10:/usr/src/linux # git bisect log
>>>>>> # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
>>>>>> # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
>>>>>> git bisect start 'v2.6.31' 'v2.6.30'
>>>>>> # good: [925d74ae717c9a12d3618eb4b36b9fb632e2cef3] V4L/DVB (11736): videobuf: modify return value of VIDIOC_REQBUFS ioctl
>>>>>> git bisect good 925d74ae717c9a12d3618eb4b36b9fb632e2cef3
>>>>>> # bad: [a380137900fca5c79e6daa9500bdb6ea5649188e] ixgbe: Fix device capabilities of 82599 single speed fiber NICs.
>>>>>> git bisect bad a380137900fca5c79e6daa9500bdb6ea5649188e
>>>>>> # good: [1dbb5765acc7a6fe4bc1957c001037cc9d02ae03] Staging: android: lowmemorykiller: fix up remaining checkpatch warnings
>>>>>> git bisect good 1dbb5765acc7a6fe4bc1957c001037cc9d02ae03
>>>>>> # good: [df36b439c5fedefe013d4449cb6a50d15e2f4d70] Merge branch 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
>>>>>> git bisect good df36b439c5fedefe013d4449cb6a50d15e2f4d70
>>>>>> # bad: [a800faec1b21d7133b5f0c8c6dac593b7c4e118d] Merge branch 'for-linus' of git://www.jni.nu/cris
>>>>>> git bisect bad a800faec1b21d7133b5f0c8c6dac593b7c4e118d
>>>>>> # good: [ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2] Merge git://git.infradead.org/mtd-2.6
>>>>>> git bisect good ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2
>>>>>> # bad: [37c6dbe290c05023b47f52528e30ce51336b93eb] V4L/DVB (12091): gspca_sonixj: Add light frequency control
>>>>>> git bisect bad 37c6dbe290c05023b47f52528e30ce51336b93eb
>>>>>> # bad: [687d680985b1438360a9ba470ece8b57cd205c3b] Merge git://git.infradead.org/~dwmw2/iommu-2.6.31
>>>>>> git bisect bad 687d680985b1438360a9ba470ece8b57cd205c3b
>>>>>> # bad: [1053414068bad659479e6efa62a67403b8b1ec0a] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
>>>>>> git bisect bad 1053414068bad659479e6efa62a67403b8b1ec0a
>>>>>> # good: [b01b4babbf204443b5a846a7494546501614cefc] firewire: net: fix card driver reloading
>>>>>> git bisect good b01b4babbf204443b5a846a7494546501614cefc
>>>>>> # bad: [c02d7adf8c5429727a98bad1d039bccad4c61c50] NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private namespace
>>>>>> git bisect bad c02d7adf8c5429727a98bad1d039bccad4c61c50
>>>>>> # good: [616511d039af402670de8500d0e24495113a9cab] VFS: Uninline the function put_mnt_ns()
>>>>>> git bisect good 616511d039af402670de8500d0e24495113a9cab
>>>>>> # good: [cf8d2c11cb77f129675478792122f50827e5b0ae] VFS: Add VFS helper functions for setting up private namespaces
>>>>>> git bisect good cf8d2c11cb77f129675478792122f50827e5b0ae
>>>>>>
>>>>>> The last "git bisect good" prints out:
>>>>>>
>>>>>> server10:/usr/src/linux # git bisect good
>>>>>> c02d7adf8c5429727a98bad1d039bccad4c61c50 is the first bad commit
>>>>>> commit c02d7adf8c5429727a98bad1d039bccad4c61c50
>>>>>> Author: Trond Myklebust <Trond.Myklebust@netapp.com>
>>>>>> Date: Mon Jun 22 15:09:14 2009 -0400
>>>>>>
>>>>>> NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private
>>>>>> namespace
>>>>>>
>>>>>> As noted in the previous patch, the NFSv4 client mount code currently
>>>>>> has several limitations. If the mount path contains symlinks, or
>>>>>> referrals, or even if it just contains a '..', then the client code in
>>>>>> nfs4_path_walk() will fail with an error.
>>>>>>
>>>>>> This patch replaces the nfs4_path_walk()-based lookup with a helper
>>>>>> function that sets up a private namespace to represent the namespace
>>>>>> on the server, then uses the ordinary VFS and NFS path lookup code to
>>>>>> walk down the mount path in that namespace.
>>>>>>
>>>>>> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
>>>>>> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>>>>>>
>>>>>> :040000 040000 97a18818f26ab9a0987f157257eb6f399c3cc1cc 9ab6c712bb64f1349b5ac9f2020191abb5780ca0 M fs
>>>>>>
>>>>>> Does this help you any further?
>>>>>>
>>>>>> Thanks!
>>>>>> Robert
>>>>>>
>>>>> Looks suspiciously like some error in testing.
>>>>> Could you pls retest and verify again that
>>>>> cf8d2c11cb77f129675478792122f50827e5b0ae
>>>>> is good and c02d7adf8c5429727a98bad1d039bccad4c61c50 is bad?
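Editorial sketch, not part of the original thread: the symbol options discussed in this message can be checked in a kernel .config before rebuilding, so that the next backtrace has a chance of showing function names. The helper name check_kallsyms and the path in the usage note are hypothetical; the option names are the ones quoted by Michael S. Tsirkin above.

```shell
#!/bin/sh
# check_kallsyms (hypothetical helper): report whether the symbol-table
# options named in the thread are set to "y" in a kernel .config file.
check_kallsyms() {
    config_file="$1"
    for opt in CONFIG_KALLSYMS CONFIG_KALLSYMS_ALL CONFIG_KALLSYMS_EXTRA_PASS; do
        # Match the exact "OPTION=y" line; "# OPTION is not set" does not match.
        if grep -q "^${opt}=y\$" "$config_file"; then
            echo "$opt: enabled"
        else
            echo "$opt: not enabled"
        fi
    done
}
```

A call such as `check_kallsyms /usr/src/linux/.config` (path assumed from the shell prompt in the bisect log) prints one line per option.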
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-25 20:41 ` Robert Wimmer
@ 2010-04-25 20:49 ` Michael S. Tsirkin
  -1 siblings, 0 replies; 62+ messages in thread
From: Michael S. Tsirkin @ 2010-04-25 20:49 UTC
To: Robert Wimmer
Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, Rusty Russell,
    Mel Gorman, Trond Myklebust, linux-nfs, linux-kernel

So, it's an NFS-related regression, which is consistent with the bisect
results. I guess someone who knows about NFS will have to look at it...
BTW, you probably want to label the bug as regression.

On Sun, Apr 25, 2010 at 10:41:59PM +0200, Robert Wimmer wrote:
> I've added CONFIG_KALLSYMS and CONFIG_KALLSYMS_ALL
> to my .config. I've uploaded the dmesg output. Maybe it
> helps a little bit:
>
> https://bugzilla.kernel.org/attachment.cgi?id=26138
>
> - Robert
>
>
> On 04/25/10 11:18, Michael S. Tsirkin wrote:
> > On Fri, Apr 23, 2010 at 07:26:52AM +0200, Robert Wimmer wrote:
> >
> >>> I'm not sure why the lockup backtrace does not show function names -
> >>> is the kernel stripped?
> >>>
> >> I'm building the kernels always with "genkernel", a Gentoo
> >> helper program for kernel building. But I've looked into
> >> the log file of genkernel and there is nothing mentioned about
> >> stripping the kernel. There will be a future release of genkernel
> >> which supports this but this is currently not the case. Since
> >> I haven't stripped the kernel I would answer no. Maybe a
> >> kernel option which should be enabled?
> >>
> >> Thanks!
> >> Robert
> >>
> > Hmm. I have these
> > CONFIG_KALLSYMS=y
> > CONFIG_KALLSYMS_ALL=y
> > CONFIG_KALLSYMS_EXTRA_PASS=y
> > # CONFIG_STRIP_ASM_SYMS is not set
> >
> >>
> >> On 04/22/10 12:03, Michael S. Tsirkin wrote:
> >>
> >>> On Thu, Apr 22, 2010 at 01:31:06PM +0200, kernel wrote:
> >>>
> >>>> Maybe some comments to my former mail about what I've done:
> >>>> I started with a fresh clone (deleted the old /usr/src/linux
> >>>> of course).
> >>>>
> >>>> git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
> >>>>
> >>>> Then I started bisect
> >>>>
> >>>> git bisect start 'v2.6.31' 'v2.6.30'
> >>>>
> >>>> and built the first kernel and then marked kernels which
> >>>> "crashed" with "soft lockup" or "swapper page allocation failure"
> >>>> as bad and the other ones as good. Before I've compiled
> >>>> a new kernel I've always done a "make mrproper". I don't know
> >>>> if this is needed but thought it wouldn't hurt.
> >>>>
> >>>> For me it was not clear that maybe I should have stopped
> >>>> testing after the first commit that came up with a "swapper
> >>>> page allocation failure". It was only one commit which caused
> >>>> the allocation failure. All the other commits marked as bad
> >>>> came up with a soft lockup. But I thought it is important to
> >>>> find the earliest commit which crashes. So should I find out
> >>>> the commit with the allocation failure?
> >>>>
> >>> I think you did the right thing. We'll have to
> >>> figure out the soft lockup thing, then if page allocation failure
> >>> turns out to be a different issue, look at it.
> >>>
> >>>> As you requested I've now done a
> >>>>
> >>>> git checkout c02d7adf8c5429727a98bad1d039bccad4c61c50
> >>>>
> >>>> which ended with a soft lockup within 3 min. after starting
> >>>> the VM (see
> >>>> https://bugzilla.kernel.org/attachment.cgi?id=26089&action=edit)
> >>>> with this kernel.
> >>>>
> >>> I'm not sure why the lockup backtrace does not show function names -
> >>> is the kernel stripped?
> >>>
> >>>> Then I've done a
> >>>>
> >>>> git checkout cf8d2c11cb77f129675478792122f50827e5b0ae
> >>>>
> >>>> compiled and restarted the VM with this kernel version
> >>>> (BTW: Of course I've always used the same .config for
> >>>> all kernels I've built.). cf8d2c11cb77f129675478792122f50827e5b0ae
> >>>> is running fine.
> >>>>
> >>>> Thanks!
> >>>> Robert
> >>>>
> >>> Well, so the soft lockup issue seems NFS-related?
> >>> Trond, commit cf8d2c11cb77f129675478792122f50827e5b0ae seems to
> >>> be causing problems on some old kernels (See bisect below). Any idea why?
> >>>
> >>>> On Wed, 21 Apr 2010 12:42:49 +0300, "Michael S. Tsirkin" <mst@redhat.com>
> >>>> wrote:
> >>>>
> >>>>> On Wed, Apr 21, 2010 at 01:23:12PM +0200, kernel wrote:
> >>>>>
> >>>>>> So after the compiler was running hot I've now the following result:
> >>>>>>
> >>>>>> server10:/usr/src/linux # git bisect log
> >>>>>> # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
> >>>>>> # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
> >>>>>> git bisect start 'v2.6.31' 'v2.6.30'
> >>>>>> # good: [925d74ae717c9a12d3618eb4b36b9fb632e2cef3] V4L/DVB (11736): videobuf: modify return value of VIDIOC_REQBUFS ioctl
> >>>>>> git bisect good 925d74ae717c9a12d3618eb4b36b9fb632e2cef3
> >>>>>> # bad: [a380137900fca5c79e6daa9500bdb6ea5649188e] ixgbe: Fix device capabilities of 82599 single speed fiber NICs.
> >>>>>> git bisect bad a380137900fca5c79e6daa9500bdb6ea5649188e
> >>>>>> # good: [1dbb5765acc7a6fe4bc1957c001037cc9d02ae03] Staging: android: lowmemorykiller: fix up remaining checkpatch warnings
> >>>>>> git bisect good 1dbb5765acc7a6fe4bc1957c001037cc9d02ae03
> >>>>>> # good: [df36b439c5fedefe013d4449cb6a50d15e2f4d70] Merge branch 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
> >>>>>> git bisect good df36b439c5fedefe013d4449cb6a50d15e2f4d70
> >>>>>> # bad: [a800faec1b21d7133b5f0c8c6dac593b7c4e118d] Merge branch 'for-linus' of git://www.jni.nu/cris
> >>>>>> git bisect bad a800faec1b21d7133b5f0c8c6dac593b7c4e118d
> >>>>>> # good: [ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2] Merge git://git.infradead.org/mtd-2.6
> >>>>>> git bisect good ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2
> >>>>>> # bad: [37c6dbe290c05023b47f52528e30ce51336b93eb] V4L/DVB (12091): gspca_sonixj: Add light frequency control
> >>>>>> git bisect bad 37c6dbe290c05023b47f52528e30ce51336b93eb
> >>>>>> # bad: [687d680985b1438360a9ba470ece8b57cd205c3b] Merge git://git.infradead.org/~dwmw2/iommu-2.6.31
> >>>>>> git bisect bad 687d680985b1438360a9ba470ece8b57cd205c3b
> >>>>>> # bad: [1053414068bad659479e6efa62a67403b8b1ec0a] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
> >>>>>> git bisect bad 1053414068bad659479e6efa62a67403b8b1ec0a
> >>>>>> # good: [b01b4babbf204443b5a846a7494546501614cefc] firewire: net: fix card driver reloading
> >>>>>> git bisect good b01b4babbf204443b5a846a7494546501614cefc
> >>>>>> # bad: [c02d7adf8c5429727a98bad1d039bccad4c61c50] NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private namespace
> >>>>>> git bisect bad c02d7adf8c5429727a98bad1d039bccad4c61c50
> >>>>>> # good: [616511d039af402670de8500d0e24495113a9cab] VFS: Uninline the function put_mnt_ns()
> >>>>>> git bisect good 616511d039af402670de8500d0e24495113a9cab
> >>>>>> # good: [cf8d2c11cb77f129675478792122f50827e5b0ae] VFS: Add VFS helper functions for setting up private namespaces
> >>>>>> git bisect good cf8d2c11cb77f129675478792122f50827e5b0ae
> >>>>>>
> >>>>>> The last "git bisect good" prints out:
> >>>>>>
> >>>>>> server10:/usr/src/linux # git bisect good
> >>>>>> c02d7adf8c5429727a98bad1d039bccad4c61c50 is the first bad commit
> >>>>>> commit c02d7adf8c5429727a98bad1d039bccad4c61c50
> >>>>>> Author: Trond Myklebust <Trond.Myklebust@netapp.com>
> >>>>>> Date: Mon Jun 22 15:09:14 2009 -0400
> >>>>>>
> >>>>>> NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private
> >>>>>> namespace
> >>>>>>
> >>>>>> As noted in the previous patch, the NFSv4 client mount code currently
> >>>>>> has several limitations. If the mount path contains symlinks, or
> >>>>>> referrals, or even if it just contains a '..', then the client code in
> >>>>>> nfs4_path_walk() will fail with an error.
> >>>>>>
> >>>>>> This patch replaces the nfs4_path_walk()-based lookup with a helper
> >>>>>> function that sets up a private namespace to represent the namespace
> >>>>>> on the server, then uses the ordinary VFS and NFS path lookup code to
> >>>>>> walk down the mount path in that namespace.
> >>>>>>
> >>>>>> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> >>>>>> Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> >>>>>>
> >>>>>> :040000 040000 97a18818f26ab9a0987f157257eb6f399c3cc1cc 9ab6c712bb64f1349b5ac9f2020191abb5780ca0 M fs
> >>>>>>
> >>>>>> Does this help you any further?
> >>>>>>
> >>>>>> Thanks!
> >>>>>> Robert
> >>>>>>
> >>>>> Looks suspiciously like some error in testing.
> >>>>> Could you pls retest and verify again that
> >>>>> cf8d2c11cb77f129675478792122f50827e5b0ae
> >>>>> is good and c02d7adf8c5429727a98bad1d039bccad4c61c50 is bad?
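Editorial sketch, not part of the original thread: when bisect results are questioned, as they are in the quoted exchange, it can help to list the verdicts recorded in a saved `git bisect log` and count them before retesting. The function name summarize_bisect_log and the file name in the usage note are hypothetical; the only assumption about the input is the log format shown in the quoted output.

```shell
#!/bin/sh
# summarize_bisect_log (hypothetical helper): print "<verdict> <commit>" for
# every verdict recorded in a saved `git bisect log`, followed by totals.
summarize_bisect_log() {
    awk '
        # Verdict lines look like: git bisect good <sha> / git bisect bad <sha>.
        # Comment lines ("# good: [...]") and "git bisect start" are skipped.
        $1 == "git" && $2 == "bisect" && ($3 == "good" || $3 == "bad") {
            print $3, $4
            count[$3]++
        }
        END {
            printf "good=%d bad=%d\n", count["good"], count["bad"]
        }
    ' "$1"
}
```

With the log saved via `git bisect log > bisect.log`, running `summarize_bisect_log bisect.log` lists each marked commit in order. The same file can be fed to `git bisect replay bisect.log` to reproduce the session in a fresh clone.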
> >>>>> Could you pls retest and verify again that > >>>>> cf8d2c11cb77f129675478792122f50827e5b0ae > >>>>> is good and c02d7adf8c5429727a98bad1d039bccad4c61c50 is bad? > >>>>> > >>>>> -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a> ^ permalink raw reply [flat|nested] 62+ messages in thread
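The good/bad marking described in the bisect report above could be made repeatable with a small classifier. The following is only a sketch under stated assumptions: classify_log is a hypothetical helper that is not part of this thread, and it assumes the guest's console or dmesg output has been captured to a file. It maps the two failure signatures mentioned in the report onto the exit codes that "git bisect run" interprets (0 = good, 1 = bad, 125 = skip).

```shell
#!/bin/sh
# classify_log: hypothetical helper, not taken from the thread.
# Reads a captured guest console/dmesg log and returns an exit status
# that `git bisect run` understands:
#   0   -> good (no failure signature found)
#   1   -> bad  (soft lockup or page allocation failure seen)
#   125 -> skip (log missing or empty, e.g. the kernel did not boot)
classify_log() {
    log="$1"
    if [ ! -s "$log" ]; then
        return 125
    fi
    if grep -Eq 'soft lockup|page allocation failure' "$log"; then
        return 1
    fi
    return 0
}
```

A wrapper script that builds the kernel, boots the guest, saves its log and ends with a call to classify_log could then be handed to "git bisect run"; the build and boot steps depend on the local setup and are not shown here.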
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-25 20:49 ` Michael S. Tsirkin
  (?)
@ 2010-04-26 12:15 ` Trond Myklebust
  -1 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2010-04-26 12:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Robert Wimmer, Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
      Rusty Russell, Mel Gorman, linux-nfs, linux-kernel

On Sun, 2010-04-25 at 23:49 +0300, Michael S. Tsirkin wrote:
> So, it's an NFS-related regression, which is consistent with the bisect
> results. I guess someone who knows about NFS will have to look at it...
> BTW, you probably want to label the bug as regression.
>
> On Sun, Apr 25, 2010 at 10:41:59PM +0200, Robert Wimmer wrote:
> > I've added CONFIG_KALLSYMS and CONFIG_KALLSYMS_ALL
> > to my .config. I've uploaded the dmesg output. Maybe it
> > helps a little bit:
> >
> > https://bugzilla.kernel.org/attachment.cgi?id=26138
> >
> > - Robert
> >

That last trace is just saying that the NFSv4 reboot recovery code is
crashing (which is hardly surprising if the memory management is hosed).

The initial bisection makes little sense to me: it is basically blaming
a page allocation problem on a change to the NFSv4 mount code. The only
way I can see that possibly happen is if you are hitting a stack
overflow.
So 2 questions:

  - Are you able to reproduce the bug when using NFSv3 instead?
  - Have you tried running with stack tracing enabled?

Cheers
  Trond

^ permalink raw reply [flat|nested] 62+ messages in thread
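When comparing NFSv3 against NFSv4 as asked above, it may help to confirm which protocol version a given mount actually ended up with. The sketch below is an illustration only: nfs_mount_versions is a hypothetical helper, and it assumes the kernel lists a vers= (or nfsvers=) option for NFS entries in /proc/mounts.

```shell
#!/bin/sh
# nfs_mount_versions: hypothetical helper, not taken from the thread.
# Prints "<mountpoint> <fstype> <version>" for every NFS mount listed in
# a mounts table (default: /proc/mounts). The version is taken from the
# vers= / nfsvers= mount option; "unknown" is printed if none is listed.
nfs_mount_versions() {
    mounts="${1:-/proc/mounts}"
    awk '$3 == "nfs" || $3 == "nfs4" {
        vers = "unknown"
        n = split($4, opts, ",")
        for (i = 1; i <= n; i++) {
            if (opts[i] ~ /^(nfs)?vers=/) {
                sub(/^(nfs)?vers=/, "", opts[i])
                vers = opts[i]
            }
        }
        print $2, $3, vers
    }' "$mounts"
}
```

For the NFSv3 run itself, a mount along the lines of "mount -t nfs -o vers=3 server:/export /mnt" would request version 3; the server name, export path and mount point here are placeholders, not values from the report.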
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-26 12:15 ` Trond Myklebust
@ 2010-04-26 20:25 ` Robert Wimmer
  -1 siblings, 0 replies; 62+ messages in thread
From: Robert Wimmer @ 2010-04-26 20:25 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Michael S. Tsirkin, Avi Kivity, Andrew Morton, linux-mm,
      bugzilla-daemon, Rusty Russell, Mel Gorman, linux-nfs, linux-kernel

>>> I've added CONFIG_KALLSYMS and CONFIG_KALLSYMS_ALL
>>> to my .config. I've uploaded the dmesg output. Maybe it
>>> helps a little bit:
>>>
>>> https://bugzilla.kernel.org/attachment.cgi?id=26138
>>>
>>> - Robert
>>>
> That last trace is just saying that the NFSv4 reboot recovery code is
> crashing (which is hardly surprising if the memory management is hosed).
>
> The initial bisection makes little sense to me: it is basically blaming
> a page allocation problem on a change to the NFSv4 mount code. The only
> way I can see that possibly happen is if you are hitting a stack
> overflow.
> So 2 questions:
>
>   - Are you able to reproduce the bug when using NFSv3 instead?
>

I've tried with NFSv3 now. With v4 the error normally occurs
within 5 minutes. The VM is now running for one hour and no
soft lockup so far. So I would say it can't be reproduced with
v3.

>   - Have you tried running with stack tracing enabled?
>

Can you explain this a little bit more please? CONFIG_STACKTRACE=y
was already enabled. I've now enabled

CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FTRACE_NMI_ENTER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_FTRACE_SYSCALLS=y
CONFIG_FTRACE_NMI_ENTER=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_GENERIC_TRACER=y
CONFIG_FTRACE=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_STACK_TRACER=y
CONFIG_KMEMTRACE=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_MMIOTRACE_SUPPORT=y

and run

echo 1 > /proc/sys/kernel/stack_tracer_enabled

But the output is mostly the same in dmesg and
/var/log/messages. Can you please guide me how I can
enable the stack tracing you need?

Thanks!
Robert

^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
@ 2010-04-26 21:04 ` Trond Myklebust
  0 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2010-04-26 21:04 UTC (permalink / raw)
  To: Robert Wimmer
  Cc: Michael S. Tsirkin, Avi Kivity, Andrew Morton, linux-mm,
      bugzilla-daemon, Rusty Russell, Mel Gorman, linux-nfs, linux-kernel

On Mon, 2010-04-26 at 22:25 +0200, Robert Wimmer wrote:
> I've tried with NFSv3 now. With v4 the error normally occurs
> within 5 minutes. The VM is now running for one hour and no
> soft lockup so far. So I would say it can't be reproduced with
> v3.

Thanks! That's useful info.

> > - Have you tried running with stack tracing enabled?
> >
>
> Can you explain this a little bit more please? CONFIG_STACKTRACE=y
> was already enabled. I've now enabled
>
> CONFIG_USER_STACKTRACE_SUPPORT=y
> CONFIG_NOP_TRACER=y
> CONFIG_HAVE_FTRACE_NMI_ENTER=y
> CONFIG_HAVE_FUNCTION_TRACER=y
> CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
> CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
> CONFIG_HAVE_DYNAMIC_FTRACE=y
> CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
> CONFIG_HAVE_FTRACE_SYSCALLS=y
> CONFIG_FTRACE_NMI_ENTER=y
> CONFIG_CONTEXT_SWITCH_TRACER=y
> CONFIG_GENERIC_TRACER=y
> CONFIG_FTRACE=y
> CONFIG_FUNCTION_TRACER=y
> CONFIG_FUNCTION_GRAPH_TRACER=y
> CONFIG_FTRACE_SYSCALLS=y
> CONFIG_STACK_TRACER=y
> CONFIG_KMEMTRACE=y
> CONFIG_DYNAMIC_FTRACE=y
> CONFIG_FTRACE_MCOUNT_RECORD=y
> CONFIG_HAVE_MMIOTRACE_SUPPORT=y
>
> and run
>
> echo 1 > /proc/sys/kernel/stack_tracer_enabled
>
> But the output is mostly the same in dmesg and
> /var/log/messages. Can you please guide me how I can
> enable the stack tracing you need?

Sure. In addition to what you did above, please do

mount -t debugfs none /sys/kernel/debug

and then cat the contents of the pseudofile at

/sys/kernel/debug/tracing/stack_trace

Please do this more or less immediately after you've finished mounting
the NFSv4 client.

Does your server have the 'crossmnt' or 'nohide' flags set, or does it
use the 'refer' export option anywhere? If so, then we might have to
test further, since those may trigger the NFSv4 submount feature.

Cheers
  Trond

^ permalink raw reply [flat|nested] 62+ messages in thread
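The stack tracer steps requested above can be gathered into one helper. This is a minimal sketch, not something posted in the thread: dump_stack_trace is a hypothetical name, and both paths are parameters only so that the function can be tried against ordinary files. The defaults are the locations named in the messages above, plus stack_max_size, which sits next to stack_trace in the same tracing directory.

```shell
#!/bin/sh
# dump_stack_trace: hypothetical helper wrapping the steps from the
# message above. Arguments (both optional):
#   $1  tracing directory   (default: /sys/kernel/debug/tracing)
#   $2  enable switch file  (default: /proc/sys/kernel/stack_tracer_enabled)
dump_stack_trace() {
    tracing_dir="${1:-/sys/kernel/debug/tracing}"
    enable_file="${2:-/proc/sys/kernel/stack_tracer_enabled}"

    if [ ! -r "$tracing_dir/stack_trace" ]; then
        echo "no stack_trace under $tracing_dir (debugfs mounted? CONFIG_STACK_TRACER=y?)" >&2
        return 1
    fi
    if [ "$(cat "$enable_file" 2>/dev/null)" != "1" ]; then
        echo "warning: stack tracer does not appear to be enabled" >&2
    fi
    # stack_max_size holds the deepest stack usage recorded so far, in bytes.
    echo "max stack size: $(cat "$tracing_dir/stack_max_size") bytes"
    cat "$tracing_dir/stack_trace"
}
```

On the guest this would be run as root after "echo 1 > /proc/sys/kernel/stack_tracer_enabled" and "mount -t debugfs none /sys/kernel/debug", right after the NFSv4 mount completes, for example as "dump_stack_trace > stack-after-mount.txt" (the output file name is arbitrary).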
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-26 21:04 ` Trond Myklebust
@ 2010-04-26 22:18 ` Robert Wimmer
  -1 siblings, 0 replies; 62+ messages in thread
From: Robert Wimmer @ 2010-04-26 22:18 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Michael S. Tsirkin, Avi Kivity, Andrew Morton, linux-mm,
      bugzilla-daemon, Rusty Russell, Mel Gorman, linux-nfs, linux-kernel

> Sure. In addition to what you did above, please do
>
> mount -t debugfs none /sys/kernel/debug
>
> and then cat the contents of the pseudofile at
>
> /sys/kernel/debug/tracing/stack_trace
>
> Please do this more or less immediately after you've finished mounting
> the NFSv4 client.
>

I've uploaded the stack trace. It was generated
directly after mounting. Here are the stacks:

After mounting:
https://bugzilla.kernel.org/attachment.cgi?id=26153
After the soft lockup:
https://bugzilla.kernel.org/attachment.cgi?id=26154
The dmesg output of the soft lockup:
https://bugzilla.kernel.org/attachment.cgi?id=26155

> Does your server have the 'crossmnt' or 'nohide' flags set, or does it
> use the 'refer' export option anywhere? If so, then we might have to
> test further, since those may trigger the NFSv4 submount feature.
>

The server has the following settings:
rw,nohide,insecure,async,no_subtree_check,no_root_squash

Thanks!
Robert

^ permalink raw reply [flat|nested] 62+ messages in thread
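The export flags asked about above could be checked on the server with a one-line search. This is a sketch only: exports_submount_flags is a hypothetical helper, and it assumes the exports are defined in the usual /etc/exports syntax with options in parentheses.

```shell
#!/bin/sh
# exports_submount_flags: hypothetical helper, not taken from the thread.
# Prints (with line numbers) every non-comment line of an exports file
# that sets crossmnt, nohide or refer=, the options named in the question
# above. Returns non-zero if no such line exists (grep semantics).
exports_submount_flags() {
    exports="${1:-/etc/exports}"
    grep -nE '^[^#]*[(,](crossmnt|nohide|refer=)' "$exports"
}
```

With the option list quoted in the reply above, a line such as "/export client(rw,nohide,insecure,async,no_subtree_check,no_root_squash)" would be reported; the path and client name in that example are placeholders.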
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-26 22:18 ` Robert Wimmer
  (?)
@ 2010-04-26 23:28 ` Trond Myklebust
  2010-04-27 22:56 ` Robert Wimmer
  -1 siblings, 1 reply; 62+ messages in thread
From: Trond Myklebust @ 2010-04-26 23:28 UTC (permalink / raw)
  To: Robert Wimmer
  Cc: Michael S. Tsirkin, Avi Kivity, Andrew Morton, linux-mm,
      bugzilla-daemon, Rusty Russell, Mel Gorman, linux-nfs, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1532 bytes --]

On Tue, 2010-04-27 at 00:18 +0200, Robert Wimmer wrote:
> > Sure. In addition to what you did above, please do
> >
> > mount -t debugfs none /sys/kernel/debug
> >
> > and then cat the contents of the pseudofile at
> >
> > /sys/kernel/debug/tracing/stack_trace
> >
> > Please do this more or less immediately after you've finished mounting
> > the NFSv4 client.
> >
>
> I've uploaded the stack trace. It was generated
> directly after mounting. Here are the stacks:
>
> After mounting:
> https://bugzilla.kernel.org/attachment.cgi?id=26153
> After the soft lockup:
> https://bugzilla.kernel.org/attachment.cgi?id=26154
> The dmesg output of the soft lockup:
> https://bugzilla.kernel.org/attachment.cgi?id=26155
>
> > Does your server have the 'crossmnt' or 'nohide' flags set, or does it
> > use the 'refer' export option anywhere? If so, then we might have to
> > test further, since those may trigger the NFSv4 submount feature.
> >
> The server has the following settings:
> rw,nohide,insecure,async,no_subtree_check,no_root_squash
>
> Thanks!
> Robert
>

That second trace is more than 5.5K deep, more than half of which is
socket overhead :-(((.

The process stack does not appear to have overflowed, however that trace
doesn't include any IRQ stack overhead.

OK... So what happens if we get rid of half of that trace by forcing
asynchronous tasks such as this to run entirely in rpciod instead of
first trying to run in the process context?

See the attachment...

[-- Attachment #2: linux-2.6.34-000-reduce_async_rpc_stack_usage.dif --]
[-- Type: text/plain, Size: 856 bytes --]

SUNRPC: Reduce asynchronous RPC task stack usage

From: Trond Myklebust <Trond.Myklebust@netapp.com>

We should just farm out asynchronous RPC tasks immediately to rpciod...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 net/sunrpc/sched.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)

diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index c8979ce..22a097f 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -720,7 +720,12 @@ void rpc_execute(struct rpc_task *task)
 {
 	rpc_set_active(task);
 	rpc_set_running(task);
-	__rpc_execute(task);
+	if (RPC_IS_ASYNC(task)) {
+		INIT_WORK(&task->u.tk_work, rpc_async_schedule);
+		queue_work(rpciod_workqueue, &task->u.tk_work);
+
+	} else
+		__rpc_execute(task);
 }
 
 static void rpc_async_schedule(struct work_struct *work)

^ permalink raw reply related [flat|nested] 62+ messages in thread
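The observation above that more than half of the 5.5K trace is socket overhead can be checked mechanically against a saved stack_trace file. The sketch below is an assumption-laden illustration: stack_share is a hypothetical helper, it assumes the usual stack tracer layout of "index) depth size location" rows, and which function names count as "socket overhead" is left to the caller's pattern rather than taken from the thread.

```shell
#!/bin/sh
# stack_share: hypothetical helper, not taken from the thread.
# Sums the per-frame Size column of a stack tracer dump and reports how
# many of those bytes belong to frames whose location matches a pattern.
#   $1  extended regular expression matched against the Location column
#   $2  trace file (default: /sys/kernel/debug/tracing/stack_trace)
stack_share() {
    pattern="$1"
    trace="${2:-/sys/kernel/debug/tracing/stack_trace}"
    awk -v pat="$pattern" '
        # Frame rows look like "  0)   5640   112   func+0x12/0x34";
        # header rows do not start with "<number>)" and are skipped.
        $1 ~ /^[0-9]+\)$/ {
            total += $3
            if ($4 ~ pat) matched += $3
        }
        END { printf "%d of %d bytes\n", matched, total }
    ' "$trace"
}
```

For example, "stack_share 'tcp_|sock_|inet_' saved-trace.txt" would report the bytes used by frames with those prefixes; the prefixes and the file name are illustrative choices only.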
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-26 23:28 ` Trond Myklebust
@ 2010-04-27 22:56 ` Robert Wimmer
  0 siblings, 0 replies; 62+ messages in thread
From: Robert Wimmer @ 2010-04-27 22:56 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Michael S. Tsirkin, Avi Kivity, Andrew Morton, linux-mm,
      bugzilla-daemon, Rusty Russell, Mel Gorman, linux-nfs, linux-kernel

I've applied the patch to the kernel which I got
from "git clone ....", which resulted in a 2.6.34-rc5 kernel.

The stack trace after mounting NFS is here:
https://bugzilla.kernel.org/attachment.cgi?id=26166
/var/log/messages after soft lockup:
https://bugzilla.kernel.org/attachment.cgi?id=26167

I hope that there is some useful information in there.

Thanks!
Robert

On 04/27/10 01:28, Trond Myklebust wrote:
> On Tue, 2010-04-27 at 00:18 +0200, Robert Wimmer wrote:
>
>>> Sure. In addition to what you did above, please do
>>>
>>> mount -t debugfs none /sys/kernel/debug
>>>
>>> and then cat the contents of the pseudofile at
>>>
>>> /sys/kernel/debug/tracing/stack_trace
>>>
>>> Please do this more or less immediately after you've finished mounting
>>> the NFSv4 client.
>>>
>> I've uploaded the stack trace. It was generated
>> directly after mounting. Here are the stacks:
>>
>> After mounting:
>> https://bugzilla.kernel.org/attachment.cgi?id=26153
>> After the soft lockup:
>> https://bugzilla.kernel.org/attachment.cgi?id=26154
>> The dmesg output of the soft lockup:
>> https://bugzilla.kernel.org/attachment.cgi?id=26155
>>
>>> Does your server have the 'crossmnt' or 'nohide' flags set, or does it
>>> use the 'refer' export option anywhere? If so, then we might have to
>>> test further, since those may trigger the NFSv4 submount feature.
>>>
>> The server has the following settings:
>> rw,nohide,insecure,async,no_subtree_check,no_root_squash
>>
>> Thanks!
>> Robert
>>
> That second trace is more than 5.5K deep, more than half of which is
> socket overhead :-(((.
>
> The process stack does not appear to have overflowed, however that trace
> doesn't include any IRQ stack overhead.
>
> OK... So what happens if we get rid of half of that trace by forcing
> asynchronous tasks such as this to run entirely in rpciod instead of
> first trying to run in the process context?
>
> See the attachment...
>

^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-27 22:56 ` Robert Wimmer
@ 2010-05-03 8:11 ` kernel
  -1 siblings, 0 replies; 62+ messages in thread
From: kernel @ 2010-05-03 8:11 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: "Michael S. Tsirkin" <mst@redhat.com>, Avi Kivity <avi@redhat.com>,
      Andrew Morton <akpm@linux-foundation.org>, linux-mm@kvack.org,
      bugzilla-daemon@bugzilla.kernel.org, Rusty Russell <rusty@rustcorp.com.au>,
      Mel Gorman <mel@csn.ul.ie>, linux-nfs@vger.kernel.org, linux-kernel

Anything we can do to investigate this further?

Thanks!
Robert


On Wed, 28 Apr 2010 00:56:01 +0200, Robert Wimmer <kernel@tauceti.net> wrote:
> I've applied the patch against the kernel which I got
> from "git clone ...." resulted in a kernel 2.6.34-rc5.
>
> The stack trace after mounting NFS is here:
> https://bugzilla.kernel.org/attachment.cgi?id=26166
> /var/log/messages after soft lockup:
> https://bugzilla.kernel.org/attachment.cgi?id=26167
>
> I hope that there is any useful information in there.
>
> Thanks!
> Robert
>
> On 04/27/10 01:28, Trond Myklebust wrote:
>> On Tue, 2010-04-27 at 00:18 +0200, Robert Wimmer wrote:
>>>> Sure. In addition to what you did above, please do
>>>>
>>>> mount -t debugfs none /sys/kernel/debug
>>>>
>>>> and then cat the contents of the pseudofile at
>>>>
>>>> /sys/kernel/debug/tracing/stack_trace
>>>>
>>>> Please do this more or less immediately after you've finished mounting
>>>> the NFSv4 client.
>>>
>>> I've uploaded the stack trace. It was generated
>>> directly after mounting. Here are the stacks:
>>>
>>> After mounting:
>>> https://bugzilla.kernel.org/attachment.cgi?id=26153
>>> After the soft lockup:
>>> https://bugzilla.kernel.org/attachment.cgi?id=26154
>>> The dmesg output of the soft lockup:
>>> https://bugzilla.kernel.org/attachment.cgi?id=26155
>>>
>>>> Does your server have the 'crossmnt' or 'nohide' flags set, or does it
>>>> use the 'refer' export option anywhere? If so, then we might have to
>>>> test further, since those may trigger the NFSv4 submount feature.
>>>
>>> The server has the following settings:
>>> rw,nohide,insecure,async,no_subtree_check,no_root_squash
>>>
>>> Thanks!
>>> Robert
>>
>> That second trace is more than 5.5K deep, more than half of which is
>> socket overhead :-(((.
>>
>> The process stack does not appear to have overflowed, however that trace
>> doesn't include any IRQ stack overhead.
>>
>> OK... So what happens if we get rid of half of that trace by forcing
>> asynchronous tasks such as this to run entirely in rpciod instead of
>> first trying to run in the process context?
>>
>> See the attachment...

^ permalink raw reply [flat|nested] 62+ messages in thread
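For anyone reproducing this, the stack-trace collection Trond asks for above could be scripted roughly as follows. This is only a minimal sketch: it assumes the ftrace stack tracer is already enabled (which "what you did above" appears to refer to), and the function name `snapshot_stack_trace`, the `TRACE_FILE` variable, and the output file naming are hypothetical, not something from the thread.

```shell
#!/bin/sh
# Sketch: save the stack tracer's report to a labelled file so that the
# "after mounting" and "after the soft lockup" traces can be compared.
# TRACE_FILE and the output naming scheme are illustrative assumptions.
TRACE_FILE="${TRACE_FILE:-/sys/kernel/debug/tracing/stack_trace}"

snapshot_stack_trace() {
    # $1: label for the snapshot, e.g. "after-mount" or "after-lockup"
    # $2: directory to write into (defaults to the current directory)
    label="$1"
    outdir="${2:-.}"
    if [ ! -r "$TRACE_FILE" ]; then
        echo "cannot read $TRACE_FILE (is debugfs mounted?)" >&2
        return 1
    fi
    out="$outdir/stack_trace-$label.txt"
    # Copy the pseudofile and print the name of the snapshot on success.
    cat "$TRACE_FILE" > "$out" && echo "$out"
}
```

Typical use, as root, would be to run `mount -t debugfs none /sys/kernel/debug` once, then call `snapshot_stack_trace after-mount` right after mounting the NFSv4 share and `snapshot_stack_trace after-lockup` once the problem has appeared.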
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
@ 2010-05-06 21:19 ` Robert Wimmer
  0 siblings, 0 replies; 62+ messages in thread
From: Robert Wimmer @ 2010-05-06 21:19 UTC (permalink / raw)
  To: Trond Myklebust, mst
  Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, Rusty Russell,
      Mel Gorman, linux-nfs, linux-kernel

I don't know if anyone is still interested in this, but I suspect
Trond has lost interest because the last error was of course a
"page allocation failure" and not the "soft lockup" that Trond was
trying to solve. The patch was for 2.6.34, while the "soft lockup"
shows up only with some 2.6.30 and maybe some 2.6.31 kernel versions.
The first error I reported, however, was a "page allocation failure",
which all kernels >= 2.6.32 produce with the configuration I use
(NFSv4).

Michael suggested solving the "soft lockup" first, before
investigating the "page allocation failure" further. We know that
the "soft lockup" only pops up with NFSv4 and not with v3. I really
want to use v4, but since I'm not a kernel hacker, someone has to
guide me on what to try next.

I know that you all have a lot of other work to do, but if there
are no ideas left on what to do next, it's maybe best to close the
bug for now. I'll stay with kernel 2.6.30, or go back to NFS v3 if
I upgrade to a newer kernel. Maybe the error will be fixed
"by accident" in >= 2.6.35 ;-)

Thanks!
Robert


On 05/03/10 10:11, kernel@tauceti.net wrote:
> Anything we can do to investigate this further?
>
> Thanks!
> Robert
>
>
> On Wed, 28 Apr 2010 00:56:01 +0200, Robert Wimmer <kernel@tauceti.net> wrote:
>> I've applied the patch against the kernel which I got
>> from "git clone ...." resulted in a kernel 2.6.34-rc5.
>>
>> The stack trace after mounting NFS is here:
>> https://bugzilla.kernel.org/attachment.cgi?id=26166
>> /var/log/messages after soft lockup:
>> https://bugzilla.kernel.org/attachment.cgi?id=26167
>>
>> I hope that there is any useful information in there.
>>
>> Thanks!
>> Robert
>>
>> On 04/27/10 01:28, Trond Myklebust wrote:
>>> On Tue, 2010-04-27 at 00:18 +0200, Robert Wimmer wrote:
>>>>> Sure. In addition to what you did above, please do
>>>>>
>>>>> mount -t debugfs none /sys/kernel/debug
>>>>>
>>>>> and then cat the contents of the pseudofile at
>>>>>
>>>>> /sys/kernel/debug/tracing/stack_trace
>>>>>
>>>>> Please do this more or less immediately after you've finished mounting
>>>>> the NFSv4 client.
>>>>
>>>> I've uploaded the stack trace. It was generated
>>>> directly after mounting. Here are the stacks:
>>>>
>>>> After mounting:
>>>> https://bugzilla.kernel.org/attachment.cgi?id=26153
>>>> After the soft lockup:
>>>> https://bugzilla.kernel.org/attachment.cgi?id=26154
>>>> The dmesg output of the soft lockup:
>>>> https://bugzilla.kernel.org/attachment.cgi?id=26155
>>>>
>>>>> Does your server have the 'crossmnt' or 'nohide' flags set, or does it
>>>>> use the 'refer' export option anywhere? If so, then we might have to
>>>>> test further, since those may trigger the NFSv4 submount feature.
>>>>
>>>> The server has the following settings:
>>>> rw,nohide,insecure,async,no_subtree_check,no_root_squash
>>>>
>>>> Thanks!
>>>> Robert
>>>
>>> That second trace is more than 5.5K deep, more than half of which is
>>> socket overhead :-(((.
>>>
>>> The process stack does not appear to have overflowed, however that trace
>>> doesn't include any IRQ stack overhead.
>>>
>>> OK... So what happens if we get rid of half of that trace by forcing
>>> asynchronous tasks such as this to run entirely in rpciod instead of
>>> first trying to run in the process context?
>>>
>>> See the attachment...

^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
@ 2010-05-06 21:30 ` Trond Myklebust
  0 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2010-05-06 21:30 UTC (permalink / raw)
  To: Robert Wimmer
  Cc: mst, Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
      Rusty Russell, Mel Gorman, linux-nfs, linux-kernel

Sorry. I've been caught up in work in the past few days.

I can certainly help with the soft lockup if you are able to supply
either a dump that includes all threads stuck in the NFS, or a (binary)
wireshark dump that shows the NFSv4 traffic between the client and
server around the time of the hang.

Cheers
  Trond

On Thu, 2010-05-06 at 23:19 +0200, Robert Wimmer wrote:
> I don't know if someone is still interested in this
> but I think Trond isn't further interested because
> the last error was of course a "page allocation
> failure" and not a "soft lockup" which Trond was
> trying to solve. But the patch was for 2.6.34 and
> the "soft lockup" comes up only with some 2.6.30 and
> maybe some 2.6.31 kernel versions. But the first error
> I reported was a "page allocation failure" which
> all kernels >= 2.6.32 produce with this configuration
> I use (NFSv4).
>
> Michael suggested to first solve the "soft lockup"
> before further investigating the "page allocation
> failure". We know that the "soft lockup" only
> pops up with NFSv4 and not v3. I really want to
> use v4 but since I'm not a kernel hacker someone
> must guide me what to try next.
>
> I know that you all have a lot of other work to
> do but if there are no ideas left what to do next
> it's maybe best to close the bug for now and I stay with
> kernel 2.6.30 for now or go back to NFS v3 if I
> upgrade to a newer kernel. Maybe the error will
> be fixed "by accident" in >= 2.6.35 ;-)
>
> Thanks!
> Robert
>
>
> On 05/03/10 10:11, kernel@tauceti.net wrote:
> > Anything we can do to investigate this further?
> >
> > Thanks!
> > Robert
> >
> >
> > On Wed, 28 Apr 2010 00:56:01 +0200, Robert Wimmer <kernel@tauceti.net> wrote:
> >> I've applied the patch against the kernel which I got
> >> from "git clone ...." resulted in a kernel 2.6.34-rc5.
> >>
> >> The stack trace after mounting NFS is here:
> >> https://bugzilla.kernel.org/attachment.cgi?id=26166
> >> /var/log/messages after soft lockup:
> >> https://bugzilla.kernel.org/attachment.cgi?id=26167
> >>
> >> I hope that there is any useful information in there.
> >>
> >> Thanks!
> >> Robert
> >>
> >> On 04/27/10 01:28, Trond Myklebust wrote:
> >>> On Tue, 2010-04-27 at 00:18 +0200, Robert Wimmer wrote:
> >>>>> Sure. In addition to what you did above, please do
> >>>>>
> >>>>> mount -t debugfs none /sys/kernel/debug
> >>>>>
> >>>>> and then cat the contents of the pseudofile at
> >>>>>
> >>>>> /sys/kernel/debug/tracing/stack_trace
> >>>>>
> >>>>> Please do this more or less immediately after you've finished mounting
> >>>>> the NFSv4 client.
> >>>>
> >>>> I've uploaded the stack trace. It was generated
> >>>> directly after mounting. Here are the stacks:
> >>>>
> >>>> After mounting:
> >>>> https://bugzilla.kernel.org/attachment.cgi?id=26153
> >>>> After the soft lockup:
> >>>> https://bugzilla.kernel.org/attachment.cgi?id=26154
> >>>> The dmesg output of the soft lockup:
> >>>> https://bugzilla.kernel.org/attachment.cgi?id=26155
> >>>>
> >>>>> Does your server have the 'crossmnt' or 'nohide' flags set, or does it
> >>>>> use the 'refer' export option anywhere? If so, then we might have to
> >>>>> test further, since those may trigger the NFSv4 submount feature.
> >>>>
> >>>> The server has the following settings:
> >>>> rw,nohide,insecure,async,no_subtree_check,no_root_squash
> >>>>
> >>>> Thanks!
> >>>> Robert
> >>>
> >>> That second trace is more than 5.5K deep, more than half of which is
> >>> socket overhead :-(((.
> >>>
> >>> The process stack does not appear to have overflowed, however that trace
> >>> doesn't include any IRQ stack overhead.
> >>>
> >>> OK... So what happens if we get rid of half of that trace by forcing
> >>> asynchronous tasks such as this to run entirely in rpciod instead of
> >>> first trying to run in the process context?
> >>>
> >>> See the attachment...

^ permalink raw reply [flat|nested] 62+ messages in thread
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-05-06 21:30 ` Trond Myklebust
@ 2010-05-13 21:08 ` Robert Wimmer
From: Robert Wimmer @ 2010-05-13 21:08 UTC
To: Trond Myklebust
Cc: mst, Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, Rusty Russell, Mel Gorman, linux-nfs, linux-kernel

Finally I've had some time to do the next test.
Here is a wireshark dump (~750 MByte):
http://213.252.12.93/2.6.34-rc5.cap.gz

dmesg output after page allocation failure:
https://bugzilla.kernel.org/attachment.cgi?id=26371

stack trace before page allocation failure:
https://bugzilla.kernel.org/attachment.cgi?id=26369

stack trace after page allocation failure:
https://bugzilla.kernel.org/attachment.cgi?id=26370

I hope the wireshark dump is not too big to download.
It was created with

  tshark -f "tcp port 2049" -i eth0 -w 2.6.34-rc5.cap

Thanks!
Robert

On 05/06/10 23:30, Trond Myklebust wrote:
> Sorry. I've been caught up in work in the past few days.
>
> I can certainly help with the soft lockup if you are able to supply
> either a dump that includes all threads stuck in the NFS, or a (binary)
> wireshark dump that shows the NFSv4 traffic between the client and
> server around the time of the hang.
>
> Cheers
>   Trond
>
> On Thu, 2010-05-06 at 23:19 +0200, Robert Wimmer wrote:
>> I don't know if anyone is still interested in this,
>> but I think Trond isn't interested any further because
>> the last error was of course a "page allocation
>> failure" and not the "soft lockup" which Trond was
>> trying to solve. But the patch was for 2.6.34 and
>> the "soft lockup" comes up only with some 2.6.30 and
>> maybe some 2.6.31 kernel versions. But the first error
>> I reported was a "page allocation failure", which
>> all kernels >= 2.6.32 produce with the configuration
>> I use (NFSv4).
>>
>> Michael suggested first solving the "soft lockup"
>> before investigating the "page allocation
>> failure" further. We know that the "soft lockup" only
>> pops up with NFSv4 and not v3. I really want to
>> use v4, but since I'm not a kernel hacker someone
>> must guide me on what to try next.
>>
>> I know that you all have a lot of other work to
>> do, but if there are no ideas left about what to do next
>> it's maybe best to close the bug for now. I'll stay with
>> kernel 2.6.30 or go back to NFS v3 if I
>> upgrade to a newer kernel. Maybe the error will
>> be fixed "by accident" in >= 2.6.35 ;-)
>>
>> Thanks!
>> Robert
>>
>> On 05/03/10 10:11, kernel@tauceti.net wrote:
>>> Anything we can do to investigate this further?
>>>
>>> Thanks!
>>> Robert
>>>
>>> On Wed, 28 Apr 2010 00:56:01 +0200, Robert Wimmer <kernel@tauceti.net>
>>> wrote:
>>>> I've applied the patch against the kernel which I got
>>>> from "git clone ....", which resulted in a kernel 2.6.34-rc5.
>>>>
>>>> The stack trace after mounting NFS is here:
>>>> https://bugzilla.kernel.org/attachment.cgi?id=26166
>>>> /var/log/messages after soft lockup:
>>>> https://bugzilla.kernel.org/attachment.cgi?id=26167
>>>>
>>>> I hope that there is some useful information in there.
>>>>
>>>> Thanks!
>>>> Robert
>>>>
>>>> On 04/27/10 01:28, Trond Myklebust wrote:
>>>>> On Tue, 2010-04-27 at 00:18 +0200, Robert Wimmer wrote:
>>>>>>> Sure. In addition to what you did above, please do
>>>>>>>
>>>>>>>   mount -t debugfs none /sys/kernel/debug
>>>>>>>
>>>>>>> and then cat the contents of the pseudofile at
>>>>>>>
>>>>>>>   /sys/kernel/debug/tracing/stack_trace
>>>>>>>
>>>>>>> Please do this more or less immediately after you've finished
>>>>>>> mounting the NFSv4 client.
>>>>>>
>>>>>> I've uploaded the stack trace. It was generated
>>>>>> directly after mounting. Here are the stacks:
>>>>>>
>>>>>> After mounting:
>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=26153
>>>>>> After the soft lockup:
>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=26154
>>>>>> The dmesg output of the soft lockup:
>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=26155
>>>>>>
>>>>>>> Does your server have the 'crossmnt' or 'nohide' flags set, or does
>>>>>>> it use the 'refer' export option anywhere? If so, then we might have to
>>>>>>> test further, since those may trigger the NFSv4 submount feature.
>>>>>>
>>>>>> The server has the following settings:
>>>>>> rw,nohide,insecure,async,no_subtree_check,no_root_squash
>>>>>>
>>>>>> Thanks!
>>>>>> Robert
>>>>>
>>>>> That second trace is more than 5.5K deep, more than half of which is
>>>>> socket overhead :-(((.
>>>>>
>>>>> The process stack does not appear to have overflowed, however that
>>>>> trace doesn't include any IRQ stack overhead.
>>>>>
>>>>> OK... So what happens if we get rid of half of that trace by forcing
>>>>> asynchronous tasks such as this to run entirely in rpciod instead of
>>>>> first trying to run in the process context?
>>>>>
>>>>> See the attachment...
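The question quoted above about 'crossmnt', 'nohide' and 'refer' can be checked mechanically against an export option string. The following is only a minimal sketch: the function name `submount_flags` is made up for illustration, and it merely scans a comma-separated option list such as the one Robert posted. It does not query the NFS server.

```shell
#!/bin/sh
# Hypothetical helper (not part of nfs-utils): print, one per line, each of
# the export options named in the thread (crossmnt, nohide, refer) that
# occurs in a comma-separated option string.
submount_flags() {
    opts=$1
    for flag in crossmnt nohide refer; do
        # Wrap the list in commas so that only whole option names match;
        # the second pattern covers the "refer=..." form that takes a value.
        case ",$opts," in
            *",$flag,"*|*",$flag="*) printf '%s\n' "$flag" ;;
        esac
    done
}

# The option string reported in the thread:
submount_flags "rw,nohide,insecure,async,no_subtree_check,no_root_squash"
# prints: nohide
```

For the reported settings only 'nohide' matches, which is one of the options said above to possibly trigger the NFSv4 submount feature.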
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
@ 2010-05-13 21:13 ` Trond Myklebust
From: Trond Myklebust @ 2010-05-13 21:13 UTC
To: Robert Wimmer
Cc: mst, Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, Rusty Russell, Mel Gorman, linux-nfs, linux-kernel

On Thu, 2010-05-13 at 23:08 +0200, Robert Wimmer wrote:
> Finally I've had some time to do the next test.
> Here is a wireshark dump (~750 MByte):
> http://213.252.12.93/2.6.34-rc5.cap.gz
>
> dmesg output after page allocation failure:
> https://bugzilla.kernel.org/attachment.cgi?id=26371
>
> stack trace before page allocation failure:
> https://bugzilla.kernel.org/attachment.cgi?id=26369
>
> stack trace after page allocation failure:
> https://bugzilla.kernel.org/attachment.cgi?id=26370
>
> I hope the wireshark dump is not too big to download.
> It was created with
>   tshark -f "tcp port 2049" -i eth0 -w 2.6.34-rc5.cap
>
> Thanks!
> Robert

Hi Robert,

I tried the above wireshark dump URL, but it appears to point to an
empty file.

Cheers
  Trond
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-05-13 21:13 ` Trond Myklebust
@ 2010-05-14 5:42 ` Robert Wimmer
From: Robert Wimmer @ 2010-05-14 5:42 UTC
To: Trond Myklebust
Cc: mst, Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, Rusty Russell, Mel Gorman, linux-nfs, linux-kernel

Hi Trond,

I'm sorry. There was a Varnish cache in front of that web server,
which doesn't cope well with files that large ;-)

Please try this URL:
http://213.252.12.34/2.6.34-rc5.cap.gz

It works for me.

Thanks!
Robert

On 05/13/10 23:13, Trond Myklebust wrote:
> On Thu, 2010-05-13 at 23:08 +0200, Robert Wimmer wrote:
>> Finally I've had some time to do the next test.
>> Here is a wireshark dump (~750 MByte):
>> http://213.252.12.93/2.6.34-rc5.cap.gz
>>
>> dmesg output after page allocation failure:
>> https://bugzilla.kernel.org/attachment.cgi?id=26371
>>
>> stack trace before page allocation failure:
>> https://bugzilla.kernel.org/attachment.cgi?id=26369
>>
>> stack trace after page allocation failure:
>> https://bugzilla.kernel.org/attachment.cgi?id=26370
>>
>> I hope the wireshark dump is not too big to download.
>> It was created with
>>   tshark -f "tcp port 2049" -i eth0 -w 2.6.34-rc5.cap
>>
>> Thanks!
>> Robert
>
> Hi Robert,
>
> I tried the above wireshark dump URL, but it appears to point to an
> empty file.
>
> Cheers
>   Trond
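An empty download like the one reported above can be caught before a link is passed on. This is a rough sketch only, assuming the capture has been fetched to a local file first; the function name `check_capture` is invented here, and it relies on nothing beyond the standard `test -s` and `gzip -t` checks.

```shell
#!/bin/sh
# Hypothetical helper: verify that a downloaded capture is non-empty and
# is a valid gzip stream. Prints a one-line verdict and returns non-zero
# on failure.
check_capture() {
    file=$1
    # -s is true only if the file exists and has a size greater than zero.
    [ -s "$file" ] || { echo "empty or missing: $file"; return 1; }
    # gzip -t tests the integrity of the compressed data without extracting it.
    gzip -t "$file" 2>/dev/null || { echo "corrupt gzip: $file"; return 1; }
    echo "ok: $file"
}

# Example use (file name taken from the thread, local path assumed):
#   check_capture 2.6.34-rc5.cap.gz
```

Running the check on a copy fetched through the same public URL, rather than on the original file on the server, is what would expose a proxy that truncates large responses.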
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
@ 2010-05-20 7:39 ` kernel
From: kernel @ 2010-05-20 7:39 UTC
To: Trond Myklebust
Cc: mst, Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, Rusty Russell, Mel Gorman, linux-nfs, linux-kernel

Hi Trond,

have you had some time to download the wireshark dump?

Thanks!
Robert

On Thu, 13 May 2010 17:13:54 -0400, Trond Myklebust
<Trond.Myklebust@netapp.com> wrote:
> On Thu, 2010-05-13 at 23:08 +0200, Robert Wimmer wrote:
>> Finally I've had some time to do the next test.
>> Here is a wireshark dump (~750 MByte):
>> http://213.252.12.93/2.6.34-rc5.cap.gz
>>
>> dmesg output after page allocation failure:
>> https://bugzilla.kernel.org/attachment.cgi?id=26371
>>
>> stack trace before page allocation failure:
>> https://bugzilla.kernel.org/attachment.cgi?id=26369
>>
>> stack trace after page allocation failure:
>> https://bugzilla.kernel.org/attachment.cgi?id=26370
>>
>> I hope the wireshark dump is not too big to download.
>> It was created with
>>   tshark -f "tcp port 2049" -i eth0 -w 2.6.34-rc5.cap
>>
>> Thanks!
>> Robert
>
> Hi Robert,
>
> I tried the above wireshark dump URL, but it appears to point to an
> empty file.
>
> Cheers
>   Trond
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
@ 2010-05-25 20:01 ` Robert Wimmer
From: Robert Wimmer @ 2010-05-25 20:01 UTC
To: Trond Myklebust
Cc: mst, Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, Rusty Russell, Mel Gorman, linux-nfs, linux-kernel

Hi Trond,

just a little reminder ;-)

Thanks!
Robert

On 05/20/10 09:39, kernel@tauceti.net wrote:
> Hi Trond,
>
> have you had some time to download the wireshark dump?
>
> Thanks!
> Robert
>
> On Thu, 13 May 2010 17:13:54 -0400, Trond Myklebust
> <Trond.Myklebust@netapp.com> wrote:
>> On Thu, 2010-05-13 at 23:08 +0200, Robert Wimmer wrote:
>>> Finally I've had some time to do the next test.
>>> Here is a wireshark dump (~750 MByte):
>>> http://213.252.12.93/2.6.34-rc5.cap.gz
>>>
>>> dmesg output after page allocation failure:
>>> https://bugzilla.kernel.org/attachment.cgi?id=26371
>>>
>>> stack trace before page allocation failure:
>>> https://bugzilla.kernel.org/attachment.cgi?id=26369
>>>
>>> stack trace after page allocation failure:
>>> https://bugzilla.kernel.org/attachment.cgi?id=26370
>>>
>>> I hope the wireshark dump is not too big to download.
>>> It was created with
>>>   tshark -f "tcp port 2049" -i eth0 -w 2.6.34-rc5.cap
>>>
>>> Thanks!
>>> Robert
>>
>> Hi Robert,
>>
>> I tried the above wireshark dump URL, but it appears to point to an
>> empty file.
>>
>> Cheers
>>   Trond
* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-05-25 20:01 ` Robert Wimmer
@ 2010-06-02 11:56 ` kernel
From: kernel @ 2010-06-02 11:56 UTC
To: Robert Wimmer
Cc: Trond Myklebust, mst, Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon, Rusty Russell, Mel Gorman, linux-nfs, linux-kernel

Hi Trond,

currently it seems that the problem was fixed by accident... ;-)
Since 2.6.34 is now in Gentoo portage I thought I should give it a
try. Using my 2.6.35-r5 .config, the 2.6.34 release has now been
running for 4 hours (instead of 5-10 minutes before). Hmmm...
Hopefully it will run for some more hours and days now. Since I've
definitely changed nothing besides the kernel, it must have been
fixed (hopefully) in one of the 2.6.34-rc's. If it's still running
tomorrow I'll close the bug.

Greetings
Robert

On Tue, 25 May 2010 22:01:54 +0200, Robert Wimmer <kernel@tauceti.net>
wrote:
> Hi Trond,
>
> just a little reminder ;-)
>
> Thanks!
> Robert
>
> On 05/20/10 09:39, kernel@tauceti.net wrote:
>> Hi Trond,
>>
>> have you had some time to download the wireshark dump?
>>
>> Thanks!
>> Robert
>>
>> On Thu, 13 May 2010 17:13:54 -0400, Trond Myklebust
>> <Trond.Myklebust@netapp.com> wrote:
>>> On Thu, 2010-05-13 at 23:08 +0200, Robert Wimmer wrote:
>>>> Finally I've had some time to do the next test.
>>>> Here is a wireshark dump (~750 MByte):
>>>> http://213.252.12.93/2.6.34-rc5.cap.gz
>>>>
>>>> dmesg output after page allocation failure:
>>>> https://bugzilla.kernel.org/attachment.cgi?id=26371
>>>>
>>>> stack trace before page allocation failure:
>>>> https://bugzilla.kernel.org/attachment.cgi?id=26369
>>>>
>>>> stack trace after page allocation failure:
>>>> https://bugzilla.kernel.org/attachment.cgi?id=26370
>>>>
>>>> I hope the wireshark dump is not too big to download.
>>>> It was created with
>>>>   tshark -f "tcp port 2049" -i eth0 -w 2.6.34-rc5.cap
>>>>
>>>> Thanks!
>>>> Robert
>>>
>>> Hi Robert,
>>>
>>> I tried the above wireshark dump URL, but it appears to point to an
>>> empty file.
>>>
>>> Cheers
>>>   Trond