* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
       [not found] <bug-15709-10286@https.bugzilla.kernel.org/>
@ 2010-04-08 19:34 ` Andrew Morton
  2010-04-08 19:39   ` Avi Kivity
  0 siblings, 1 reply; 62+ messages in thread
From: Andrew Morton @ 2010-04-08 19:34 UTC (permalink / raw)
  To: linux-mm
  Cc: bugzilla-daemon, bugme-daemon, Avi Kivity, Rusty Russell, kernel,
	Mel Gorman


(switched to email.  Please respond via emailed reply-to-all, not via the
bugzilla web interface).

On Wed, 7 Apr 2010 10:29:20 GMT
bugzilla-daemon@bugzilla.kernel.org wrote:

> https://bugzilla.kernel.org/show_bug.cgi?id=15709
> 
>            Summary: swapper page allocation failure
>            Product: Memory Management
>            Version: 2.5
>     Kernel Version: 2.6.32 and 2.6.33
>           Platform: All
>         OS/Version: Linux
>               Tree: Mainline
>             Status: NEW
>           Severity: normal
>           Priority: P1
>          Component: Slab Allocator
>         AssignedTo: akpm@linux-foundation.org
>         ReportedBy: kernel@tauceti.net
>         Regression: No
> 
> 
> Created an attachment (id=25903)
>  --> (https://bugzilla.kernel.org/attachment.cgi?id=25903)
> dmesg output
> 
> I've been getting "swapper page allocation failure" errors since upgrading
> from kernel 2.6.30 to 2.6.32/2.6.33. The problems occur inside a kernel virtual
> machine (KVM). The host runs Gentoo with kernel 2.6.32 and works fine. As
> long as kernel 2.6.30 is used as the guest kernel, the guest runs fine. But after
> upgrading to 2.6.32 and 2.6.33 I get "swapper page allocation failure" errors (see
> the attached dmesg output). The guest only runs an Apache webserver and
> serves files from an NFS share. It has 1 GB RAM and 2 virtual CPUs. I've tried
> different kernel configurations (e.g. an unmodified version from the Sabayon Linux
> distribution), but that doesn't help. The load on the guest (and host) is very low.
> Network traffic is about 20-50 MBit/s.
> 

hm, this is a regression.

: [  454.006706] users: page allocation failure. order:0, mode:0x20
: [  454.006712] Pid: 7992, comm: users Not tainted 2.6.34-rc3-git6 #2
: [  454.006714] Call Trace:
: [  454.006717]  <IRQ>  [<ffffffff8109dff7>] __alloc_pages_nodemask+0x5c8/0x615
: [  454.006796]  [<ffffffff817860ce>] ? ip_local_deliver+0x65/0x6d
: [  454.006820]  [<ffffffff810c39c4>] alloc_pages_current+0x96/0x9f
: [  454.006842]  [<ffffffff8167f2c7>] try_fill_recv+0x5e/0x20f
: [  454.006846]  [<ffffffff8167fe13>] virtnet_poll+0x52a/0x5c7
: [  454.006858]  [<ffffffff8104fe74>] ? run_timer_softirq+0x1dc/0x1f4
: [  454.006873]  [<ffffffff8176035d>] net_rx_action+0xad/0x1a5
: [  454.006882]  [<ffffffff8104b6cd>] __do_softirq+0x9c/0x127
: [  454.006897]  [<ffffffff81008ffc>] call_softirq+0x1c/0x30
: [  454.006901]  [<ffffffff8100af01>] do_softirq+0x41/0x7e
: [  454.006904]  [<ffffffff8104b3e3>] irq_exit+0x36/0x75
: [  454.006907]  [<ffffffff8100a5ee>] do_IRQ+0xaa/0xc1
: [  454.006926]  [<ffffffff8183bc13>] ret_from_intr+0x0/0x11
: [  454.006928]  <EOI>  [<ffffffff81026b25>] ? kvm_deferred_mmu_op+0x5e/0xe7
: [  454.006942]  [<ffffffff81026b19>] ? kvm_deferred_mmu_op+0x52/0xe7
: [  454.006946]  [<ffffffff81026c03>] kvm_mmu_write+0x2e/0x35
: [  454.006949]  [<ffffffff81026c7d>] kvm_set_pte_at+0x19/0x1b
: [  454.006953]  [<ffffffff810aba67>] __do_fault+0x3c4/0x492
: [  454.006957]  [<ffffffff810adcf4>] handle_mm_fault+0x478/0x9d8
: [  454.006966]  [<ffffffff810deb59>] ? path_put+0x2c/0x30
: [  454.006975]  [<ffffffff8102f162>] do_page_fault+0x2f6/0x31a
: [  454.006979]  [<ffffffff8183b81e>] ? _raw_spin_lock+0x9/0xd
: [  454.006982]  [<ffffffff8183bef5>] page_fault+0x25/0x30
: [  454.006985] Mem-Info:
: [  454.006987] Node 0 DMA per-cpu:
: [  454.006990] CPU    0: hi:    0, btch:   1 usd:   0
: [  454.006992] CPU    1: hi:    0, btch:   1 usd:   0
: [  454.006993] Node 0 DMA32 per-cpu:
: [  454.006996] CPU    0: hi:  186, btch:  31 usd: 185
: [  454.006998] CPU    1: hi:  186, btch:  31 usd: 112
: [  454.007003] active_anon:8308 inactive_anon:8544 isolated_anon:0
: [  454.007005]  active_file:4882 inactive_file:205902 isolated_file:0
: [  454.007006]  unevictable:0 dirty:11 writeback:0 unstable:0
: [  454.007007]  free:1385 slab_reclaimable:2445 slab_unreclaimable:4466
: [  454.007008]  mapped:1895 shmem:113 pagetables:1370 bounce:0
: [  454.007010] Node 0 DMA free:4000kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:11844kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15768kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:64kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
: [  454.007021] lowmem_reserve[]: 0 994 994 994
: [  454.007025] Node 0 DMA32 free:1540kB min:4000kB low:5000kB high:6000kB active_anon:33232kB inactive_anon:34176kB active_file:19528kB inactive_file:811764kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1018068kB mlocked:0kB dirty:44kB writeback:0kB mapped:7580kB shmem:452kB slab_reclaimable:9716kB slab_unreclaimable:17832kB kernel_stack:1144kB pagetables:5480kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
: [  454.007036] lowmem_reserve[]: 0 0 0 0
: [  454.007040] Node 0 DMA: 0*4kB 4*8kB 6*16kB 5*32kB 6*64kB 4*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 4000kB
: [  454.007050] Node 0 DMA32: 13*4kB 2*8kB 3*16kB 1*32kB 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1556kB
: [  454.007059] 210914 total pagecache pages
: [  454.007061] 0 pages in swap cache
: [  454.007063] Swap cache stats: add 0, delete 0, find 0/0
: [  454.007065] Free swap  = 1959924kB
: [  454.007067] Total swap = 1959924kB
: [  454.014238] 262140 pages RAM
: [  454.014241] 7489 pages reserved
: [  454.014242] 21430 pages shared
: [  454.014244] 247174 pages non-shared

Either page reclaim got worse or kvm/virtio-net got more aggressive.  

Avi, Rusty: can you think of any changes in the KVM/virtio area in the
2.6.30 -> 2.6.32 timeframe which may have increased the GFP_ATOMIC
demands upon the page allocator?

Thanks.
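
For anyone reading the Mem-Info dump above by hand, here is a minimal sketch of checking its numbers. It sums a buddy free-list line and compares a zone's free figure with its min watermark. The helper names (buddy_total_kb, below_min) are made up for illustration and are not kernel tools; the input strings are copied from the trace. The failing request is order:0 with mode:0x20, which this message treats as a GFP_ATOMIC allocation.

```shell
#!/bin/sh
# Sum a buddy free-list line such as "13*4kB 2*8kB ... = 1556kB".
# Hypothetical helper; the argument is the text after "Node 0 DMA32:".
buddy_total_kb() {
    printf '%s\n' "$1" | awk '{
        total = 0
        for (i = 1; i <= NF; i++) {
            if ($i == "=") break                 # stop before the printed total
            star = index($i, "*")                # "13*4kB" -> count 13, size 4kB
            total += substr($i, 1, star - 1) * (substr($i, star + 1) + 0)
        }
        print total
    }'
}

# Report whether a zone's free memory is under its min watermark (both in kB).
below_min() {
    if [ "$1" -lt "$2" ]; then echo "below min"; else echo "ok"; fi
}

dma32="13*4kB 2*8kB 3*16kB 1*32kB 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1556kB"
echo "DMA32 free list total: $(buddy_total_kb "$dma32") kB"
echo "DMA32 free:1540kB vs min:4000kB -> $(below_min 1540 4000)"
```

The DMA32 zone reports free:1540kB against min:4000kB, so it sits below its min watermark at the moment of the failure, while the small DMA zone is still above its own.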

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-08 19:34 ` [Bugme-new] [Bug 15709] New: swapper page allocation failure Andrew Morton
@ 2010-04-08 19:39   ` Avi Kivity
  2010-04-08 20:04     ` Michael S. Tsirkin
  0 siblings, 1 reply; 62+ messages in thread
From: Avi Kivity @ 2010-04-08 19:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, bugzilla-daemon, bugme-daemon, Rusty Russell, kernel,
	Mel Gorman, Michael S. Tsirkin

cc: mst



-- 
I have a truly marvellous patch that fixes the bug which this
signature is too narrow to contain.


* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-08 19:39   ` Avi Kivity
@ 2010-04-08 20:04     ` Michael S. Tsirkin
  2010-04-09 10:15       ` Robert Wimmer
  0 siblings, 1 reply; 62+ messages in thread
From: Michael S. Tsirkin @ 2010-04-08 20:04 UTC (permalink / raw)
  To: Avi Kivity
  Cc: Andrew Morton, linux-mm, bugzilla-daemon, bugme-daemon,
	Rusty Russell, kernel, Mel Gorman

On Thu, Apr 08, 2010 at 10:39:34PM +0300, Avi Kivity wrote:
> cc: mst
>
> On 04/08/2010 10:34 PM, Andrew Morton wrote:
>> Either page reclaim got worse or kvm/virtio-net got more aggressive.
>>
>> Avi, Rusty: can you think of any changes in the KVM/virtio area in the
>> 2.6.30 ->  2.6.32 timeframe which may have increased the GFP_ATOMIC
>> demands upon the page allocator?
>>
>> Thanks.
>>    

On the contrary, with commit
3161e453e496eb5643faad30fff5a5ab183da0fe
we should be using GFP_ATOMIC less.
But maybe there's a bug and it has the reverse effect somehow ...

Robert, could you please try 3161e453e496eb5643faad30fff5a5ab183da0fe
and, if that *does* have the problem,
0b4f2928f14c4a9770b0866923fc81beb7f4aa57?

-- 
MST


* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-08 20:04     ` Michael S. Tsirkin
@ 2010-04-09 10:15       ` Robert Wimmer
  2010-04-11 11:03         ` Michael S. Tsirkin
  0 siblings, 1 reply; 62+ messages in thread
From: Robert Wimmer @ 2010-04-09 10:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
	bugme-daemon, Rusty Russell, Mel Gorman

I'm not really a git hero, so here is what I've done:

cd /usr/src
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
cd linux
git checkout -b mykernel 0b4f2928f14c4a9770b0866923fc81beb7f4aa57

Then I checked

drivers/net/virtio_net.c
drivers/net/smc91x.c

to confirm that the committed changes were not in there.
Next I built my kernel as usual. I used my .config
from 2.6.30 (which works fine in several guests;
for the .config see here:
https://bugzilla.kernel.org/attachment.cgi?id=25925)
and built the kernel with

genkernel --menuconfig --lvm --oldconfig all

which finally gave me a 2.6.31-rc5. I should mention
that 2.6.30 was using SLUB. So here is the output
from the 2.6.31-rc5 kernel after running for about 20 min.:
https://bugzilla.kernel.org/attachment.cgi?id=25926

That doesn't seem very useful to me. I'm currently compiling
the same kernel with SLAB.

Please let me know if the git commands above are
right and/or if you need other kernel options enabled.

Thanks!
Robert
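
As an alternative to inspecting the source files by eye, a rough sketch of asking git whether a checkout contains a given commit. The helper name commit_in_head is invented for this note; it relies on "git merge-base --is-ancestor" and "git -C", which need a reasonably recent git. The tree path is the one used in the commands above and is only consulted if it exists.

```shell
#!/bin/sh
# Print "present" if commit $2 is reachable from HEAD in repository $1,
# "absent" otherwise (also "absent" if the commit is unknown to the repo).
# Hypothetical helper name; the git subcommands themselves are standard.
commit_in_head() {
    if git -C "$1" merge-base --is-ancestor "$2" HEAD 2>/dev/null; then
        echo present
    else
        echo absent
    fi
}

# Example against the tree cloned above; skipped when that checkout
# does not exist on this machine.
tree=/usr/src/linux
if [ -d "$tree/.git" ]; then
    git -C "$tree" describe --tags HEAD
    commit_in_head "$tree" 3161e453e496eb5643faad30fff5a5ab183da0fe
fi
```

With the mykernel branch from above checked out, "absent" for 3161e453... would agree with the manual check of the two driver files.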

On 04/08/10 22:04, Michael S. Tsirkin wrote:
> On the contrary, with commit
> 3161e453e496eb5643faad30fff5a5ab183da0fe
> we should be using GFP_ATOMIC less.
> But maybe there's a bug and it has the reverse effect somehow ...
>
> Robert, could you pls try 3161e453e496eb5643faad30fff5a5ab183da0fe
> and if that *does* have the problem,
> 0b4f2928f14c4a9770b0866923fc81beb7f4aa57?
>
>   


* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-09 10:15       ` Robert Wimmer
@ 2010-04-11 11:03         ` Michael S. Tsirkin
  2010-04-12  9:25           ` Robert Wimmer
  0 siblings, 1 reply; 62+ messages in thread
From: Michael S. Tsirkin @ 2010-04-11 11:03 UTC (permalink / raw)
  To: Robert Wimmer
  Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
	bugme-daemon, Rusty Russell, Mel Gorman

On Fri, Apr 09, 2010 at 12:15:01PM +0200, Robert Wimmer wrote:
> I'm not really a git hero so here is what I've done:
> 
> cd /usr/src
> git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
> cd linux
> git checkout -b mykernel 0b4f2928f14c4a9770b0866923fc81beb7f4aa57

Looks right.

> Then I've checked
> 
> drivers/net/virtio_net.c
> drivers/net/smc91x.c
> 
> to confirm that the committed changes were not in there.
> Next I built my kernel as usual. I used my .config
> from 2.6.30 (which works fine in several
> guests; for the .config see here:
> https://bugzilla.kernel.org/attachment.cgi?id=25925)
> and built the kernel with
> 
> genkernel --menuconfig --lvm --oldconfig all
> 
> which finally gave me a 2.6.31-rc5.

That's right.

> I should mention
> that 2.6.30 was using SLUB. So here is the output
> from the 2.6.31-rc5 kernel running about 20 min.:
> https://bugzilla.kernel.org/attachment.cgi?id=25926

Hmm, so we see the error here as well?

> That doesn't seem very useful to me. I'm currently compiling
> the same kernel with SLAB.
> 
> Please let me know if the git commands above are
> right and/or if you need other kernel options enabled.

Looks right. You don't have to add -b flag if you don't
want to.

> Thanks!
> Robert

Hmm, I do not see anything else that seems related.
Could you please try to bisect?

git bisect start v2.6.31 v2.6.30 -- drivers/virtio/ drivers/net/virtio_net.c

should help assuming the change that triggers this is in virtio.
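
To show the shape of that bisect loop without a kernel tree, here is a toy sketch on a throwaway repository. The repository, the tags good-base and bad-tip, and the "BUG" marker are all invented for the demo. On the real tree each step means building the kernel, booting the guest, and marking the result by hand with "git bisect good" or "git bisect bad" rather than using "git bisect run".

```shell
#!/bin/sh
# Toy demonstration of path-limited bisection; nothing here touches a
# real kernel tree.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name demo
git config user.email demo@example.com

mkdir -p drivers/net
i=1
while [ "$i" -le 8 ]; do
    echo "rev $i" > drivers/net/virtio_net.c
    # "change 6" introduces the simulated regression
    if [ "$i" -ge 6 ]; then echo "BUG" >> drivers/net/virtio_net.c; fi
    git add drivers/net/virtio_net.c
    git commit -q -m "change $i"
    if [ "$i" -eq 1 ]; then git tag good-base; fi
    i=$((i + 1))
done
git tag bad-tip

# Same argument order as the command above: bad revision, good revision,
# then "--" and the paths that limit which commits are considered.
git bisect start bad-tip good-base -- drivers/net/virtio_net.c >/dev/null
# The test command exits 0 for "good" and 1 for "bad" (marker present).
result=$(git bisect run sh -c '! grep -q BUG drivers/net/virtio_net.c')
git bisect reset >/dev/null 2>&1

sha=$(echo "$result" | awk '/is the first bad commit/ {print $1}')
first_bad=$(git log -1 --format=%s "$sha")
echo "first bad commit: $first_bad"

cd /
rm -rf "$repo"
```

The demo plants the regression in "change 6", so that is the commit the bisection should land on. If the culprit lies outside the listed paths, the path limit has to be dropped, which is the assumption stated above.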


> >>> : [  454.006901]  [<ffffffff8100af01>] do_softirq+0x41/0x7e
> >>> : [  454.006904]  [<ffffffff8104b3e3>] irq_exit+0x36/0x75
> >>> : [  454.006907]  [<ffffffff8100a5ee>] do_IRQ+0xaa/0xc1
> >>> : [  454.006926]  [<ffffffff8183bc13>] ret_from_intr+0x0/0x11
> >>> : [  454.006928]<EOI>   [<ffffffff81026b25>] ? kvm_deferred_mmu_op+0x5e/0xe7
> >>> : [  454.006942]  [<ffffffff81026b19>] ? kvm_deferred_mmu_op+0x52/0xe7
> >>> : [  454.006946]  [<ffffffff81026c03>] kvm_mmu_write+0x2e/0x35
> >>> : [  454.006949]  [<ffffffff81026c7d>] kvm_set_pte_at+0x19/0x1b
> >>> : [  454.006953]  [<ffffffff810aba67>] __do_fault+0x3c4/0x492
> >>> : [  454.006957]  [<ffffffff810adcf4>] handle_mm_fault+0x478/0x9d8
> >>> : [  454.006966]  [<ffffffff810deb59>] ? path_put+0x2c/0x30
> >>> : [  454.006975]  [<ffffffff8102f162>] do_page_fault+0x2f6/0x31a
> >>> : [  454.006979]  [<ffffffff8183b81e>] ? _raw_spin_lock+0x9/0xd
> >>> : [  454.006982]  [<ffffffff8183bef5>] page_fault+0x25/0x30
> >>> : [  454.006985] Mem-Info:
> >>> : [  454.006987] Node 0 DMA per-cpu:
> >>> : [  454.006990] CPU    0: hi:    0, btch:   1 usd:   0
> >>> : [  454.006992] CPU    1: hi:    0, btch:   1 usd:   0
> >>> : [  454.006993] Node 0 DMA32 per-cpu:
> >>> : [  454.006996] CPU    0: hi:  186, btch:  31 usd: 185
> >>> : [  454.006998] CPU    1: hi:  186, btch:  31 usd: 112
> >>> : [  454.007003] active_anon:8308 inactive_anon:8544 isolated_anon:0
> >>> : [  454.007005]  active_file:4882 inactive_file:205902 isolated_file:0
> >>> : [  454.007006]  unevictable:0 dirty:11 writeback:0 unstable:0
> >>> : [  454.007007]  free:1385 slab_reclaimable:2445 slab_unreclaimable:4466
> >>> : [  454.007008]  mapped:1895 shmem:113 pagetables:1370 bounce:0
> >>> : [  454.007010] Node 0 DMA free:4000kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:11844kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15768kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:64kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> >>> : [  454.007021] lowmem_reserve[]: 0 994 994 994
> >>> : [  454.007025] Node 0 DMA32 free:1540kB min:4000kB low:5000kB high:6000kB active_anon:33232kB inactive_anon:34176kB active_file:19528kB inactive_file:811764kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1018068kB mlocked:0kB dirty:44kB writeback:0kB mapped:7580kB shmem:452kB slab_reclaimable:9716kB slab_unreclaimable:17832kB kernel_stack:1144kB pagetables:5480kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> >>> : [  454.007036] lowmem_reserve[]: 0 0 0 0
> >>> : [  454.007040] Node 0 DMA: 0*4kB 4*8kB 6*16kB 5*32kB 6*64kB 4*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 4000kB
> >>> : [  454.007050] Node 0 DMA32: 13*4kB 2*8kB 3*16kB 1*32kB 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1556kB
> >>> : [  454.007059] 210914 total pagecache pages
> >>> : [  454.007061] 0 pages in swap cache
> >>> : [  454.007063] Swap cache stats: add 0, delete 0, find 0/0
> >>> : [  454.007065] Free swap  = 1959924kB
> >>> : [  454.007067] Total swap = 1959924kB
> >>> : [  454.014238] 262140 pages RAM
> >>> : [  454.014241] 7489 pages reserved
> >>> : [  454.014242] 21430 pages shared
> >>> : [  454.014244] 247174 pages non-shared
> >>>
> >>> Either page reclaim got worse or kvm/virtio-net got more aggressive.
> >>>
> >>> Avi, Rusty: can you think of any changes in the KVM/virtio area in the
> >>> 2.6.30 ->  2.6.32 timeframe which may have increased the GFP_ATOMIC
> >>> demands upon the page allocator?
> >>>
> >>> Thanks.
> >>>    
> >>>       
> > On the contrary, with commit
> > 3161e453e496eb5643faad30fff5a5ab183da0fe
> > we should be using GFP_ATOMIC less.
> > But maybe there's a bug and it has the reverse effect somehow ...
> >
> > Robert, could you pls try 3161e453e496eb5643faad30fff5a5ab183da0fe
> > and if that *does* have the problem,
> > 0b4f2928f14c4a9770b0866923fc81beb7f4aa57?
> >
> >   

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-11 11:03         ` Michael S. Tsirkin
@ 2010-04-12  9:25           ` Robert Wimmer
  2010-04-12 11:23             ` Michael S. Tsirkin
  0 siblings, 1 reply; 62+ messages in thread
From: Robert Wimmer @ 2010-04-12  9:25 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
	bugme-daemon, Rusty Russell, Mel Gorman

server10:/usr/src/linux # git bisect start v2.6.31 v2.6.30 -- drivers/virtio/ drivers/net/virtio_net.c
Bisecting: 12 revisions left to test after this (roughly 4 steps)
[e3353853730eb99c56b7b0aed1667d51c0e3699a] virtio: enhance id_matching for virtio drivers

Today I've upgraded to qemu-kvm-0.12.3-r1 (Gentoo package)
but it doesn't help. I'm still getting "page allocation failure" with
2.6.31-rc5.

Does it make sense to use the same 2.6.31-rc5 kernel
in the host and guest for testing? Currently I'm still using 2.6.32
in the host and testing 2.6.31-rc5 in the guest until it "crashes".
Then I start the guest with 2.6.30 again, which works
without trouble with 2.6.32 as host.

This is really strange. I have hosts with 2.6.32 running
guests with 2.6.32 which work perfectly. These hosts
and guests are running on HP DL 380 G6 with Intel Xeon X5560.
The guests which don't work with 2.6.32 (and 2.6.32
as host) are running on HP DL 380 G5 with Intel Xeon L5420.
(All guests) and (all hosts) have the same packages
and the same versions installed and the same kernel
configs (hosts and guests use different .configs but the
difference is very small, e.g. CONFIG_PARAVIRT_SPINLOCKS=y and
CONFIG_PARAVIRT_GUEST=y in the guests' but not in the hosts'
.config).

I've had problems with qemu-kvm 0.12.2 with high network
traffic which was solved by a patch submitted by Tom
Lendacky:

"Fix a race condition where qemu finds that there are not enough virtio
ring buffers available and the guest make more buffers available before
qemu can enable notifications."
http://www.mail-archive.com/kvm@vger.kernel.org/msg28667.html

It was a real lifesaver for the HP DL 380 G6 mentioned
above but maybe this is now causing the problems with the G5 machines.
The symptoms are the same. I can still log into the guest
via VNC but the network is down.

Thanks!
Robert


On 04/11/10 13:03, Michael S. Tsirkin wrote:
> On Fri, Apr 09, 2010 at 12:15:01PM +0200, Robert Wimmer wrote:
>   
>> I'm not really a git hero so here is what I've done:
>>
>> cd /usr/src
>> git clone
>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
>> cd linux
>> git checkout -b mykernel 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
>>     
> Looks right.
>
>   
>> Then I've checked
>>
>> drivers/net/virtio_net.c
>> drivers/net/smc91x.c
>>
>> if the changes commited where not in there.
>> Next I build my kernel as usual. I used my .config
>> from 2.6.30 (which is working fine in a several
>> guests / .config see here:
>> https://bugzilla.kernel.org/attachment.cgi?id=25925)
>> and build the kernel
>>
>> genkernel --menuconfig --lvm --oldconfig all
>>
>> which finally gave me a 2.6.31-rc5.
>>     
> That's right.
>
>   
>> I should mention
>> that 2.6.30 was using SLUB. So here is the output
>> from the 2.6.31-rc5 kernel running about 20 min.:
>> https://bugzilla.kernel.org/attachment.cgi?id=25926
>>     
> Hmm, so we see the error here as well?
>
>   
>> Seems not very usefull to me. I'm currently compiling
>> the same kernel with SLAB.
>>
>> Please let me know if the git commands above are
>> right and/or if you need other kernel options enabled.
>>     
> Looks right. You don't have to add -b flag if you don't
> want to.
>
>   
>> Thanks!
>> Robert
>>     
> Hmm, I do not see anything else that seems related.
> Could you please try to bisect?
>
> git bisect start v2.6.31 v2.6.30 -- drivers/virtio/ drivers/net/virtio_net.c
>
> should help assuming the change that triggers this is in virtio.
>
>
>   
>> On 04/08/10 22:04, Michael S. Tsirkin wrote:
>>     
>>> On Thu, Apr 08, 2010 at 10:39:34PM +0300, Avi Kivity wrote:
>>>   
>>>       
>>>> cc: mst
>>>>
>>>> On 04/08/2010 10:34 PM, Andrew Morton wrote:
>>>>     
>>>>         
>>>>> (switched to email.  Please respond via emailed reply-to-all, not via the
>>>>> bugzilla web interface).
>>>>>
>>>>> On Wed, 7 Apr 2010 10:29:20 GMT
>>>>> bugzilla-daemon@bugzilla.kernel.org wrote:
>>>>>
>>>>>    
>>>>>       
>>>>>           
>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=15709
>>>>>>
>>>>>>             Summary: swapper page allocation failure
>>>>>>             Product: Memory Management
>>>>>>             Version: 2.5
>>>>>>      Kernel Version: 2.6.32 and 2.6.33
>>>>>>            Platform: All
>>>>>>          OS/Version: Linux
>>>>>>                Tree: Mainline
>>>>>>              Status: NEW
>>>>>>            Severity: normal
>>>>>>            Priority: P1
>>>>>>           Component: Slab Allocator
>>>>>>          AssignedTo: akpm@linux-foundation.org
>>>>>>          ReportedBy: kernel@tauceti.net
>>>>>>          Regression: No
>>>>>>
>>>>>>
>>>>>> Created an attachment (id=25903)
>>>>>>   -->  (https://bugzilla.kernel.org/attachment.cgi?id=25903)
>>>>>> dmesg output
>>>>>>
>>>>>> I'm having problems with "swapper page allocation failure's" since upgrading
>>>>>> from kernel 2.6.30 to 2.6.32/2.6.33. The problems occur inside a kernel virtual
>>>>>> maschine (KVM). Running Gentoo with kernel 2.6.32 as host which works fine. As
>>>>>> long as kernel 2.6.30 is used as guest kernel the guest runs fine. But after
>>>>>> upgrading to 2.6.32 and 2.6.33 I get "swapper page allocation failure's" (see
>>>>>> attachment of dmesg output). The guest is only running a Apache webserver and
>>>>>> serves files from a NFS share. It has 1 GB RAM and 2 virtual CPUs. I've tried
>>>>>> different kernel configurations (e.g. a unmodified version from Sabayon Linux
>>>>>> Distribution) but doesn't help. Load of the guest (and host) is very low.
>>>>>> Network traffic is about 20-50 MBit/s.
>>>>>>
>>>>>>      
>>>>>>         
>>>>>>             
>>>>> hm, this is a regression.
>>>>>
>>>>> : [  454.006706] users: page allocation failure. order:0, mode:0x20
>>>>> : [  454.006712] Pid: 7992, comm: users Not tainted 2.6.34-rc3-git6 #2
>>>>> : [  454.006714] Call Trace:
>>>>> : [  454.006717]<IRQ>   [<ffffffff8109dff7>] __alloc_pages_nodemask+0x5c8/0x615
>>>>> : [  454.006796]  [<ffffffff817860ce>] ? ip_local_deliver+0x65/0x6d
>>>>> : [  454.006820]  [<ffffffff810c39c4>] alloc_pages_current+0x96/0x9f
>>>>> : [  454.006842]  [<ffffffff8167f2c7>] try_fill_recv+0x5e/0x20f
>>>>> : [  454.006846]  [<ffffffff8167fe13>] virtnet_poll+0x52a/0x5c7
>>>>> : [  454.006858]  [<ffffffff8104fe74>] ? run_timer_softirq+0x1dc/0x1f4
>>>>> : [  454.006873]  [<ffffffff8176035d>] net_rx_action+0xad/0x1a5
>>>>> : [  454.006882]  [<ffffffff8104b6cd>] __do_softirq+0x9c/0x127
>>>>> : [  454.006897]  [<ffffffff81008ffc>] call_softirq+0x1c/0x30
>>>>> : [  454.006901]  [<ffffffff8100af01>] do_softirq+0x41/0x7e
>>>>> : [  454.006904]  [<ffffffff8104b3e3>] irq_exit+0x36/0x75
>>>>> : [  454.006907]  [<ffffffff8100a5ee>] do_IRQ+0xaa/0xc1
>>>>> : [  454.006926]  [<ffffffff8183bc13>] ret_from_intr+0x0/0x11
>>>>> : [  454.006928]<EOI>   [<ffffffff81026b25>] ? kvm_deferred_mmu_op+0x5e/0xe7
>>>>> : [  454.006942]  [<ffffffff81026b19>] ? kvm_deferred_mmu_op+0x52/0xe7
>>>>> : [  454.006946]  [<ffffffff81026c03>] kvm_mmu_write+0x2e/0x35
>>>>> : [  454.006949]  [<ffffffff81026c7d>] kvm_set_pte_at+0x19/0x1b
>>>>> : [  454.006953]  [<ffffffff810aba67>] __do_fault+0x3c4/0x492
>>>>> : [  454.006957]  [<ffffffff810adcf4>] handle_mm_fault+0x478/0x9d8
>>>>> : [  454.006966]  [<ffffffff810deb59>] ? path_put+0x2c/0x30
>>>>> : [  454.006975]  [<ffffffff8102f162>] do_page_fault+0x2f6/0x31a
>>>>> : [  454.006979]  [<ffffffff8183b81e>] ? _raw_spin_lock+0x9/0xd
>>>>> : [  454.006982]  [<ffffffff8183bef5>] page_fault+0x25/0x30
>>>>> : [  454.006985] Mem-Info:
>>>>> : [  454.006987] Node 0 DMA per-cpu:
>>>>> : [  454.006990] CPU    0: hi:    0, btch:   1 usd:   0
>>>>> : [  454.006992] CPU    1: hi:    0, btch:   1 usd:   0
>>>>> : [  454.006993] Node 0 DMA32 per-cpu:
>>>>> : [  454.006996] CPU    0: hi:  186, btch:  31 usd: 185
>>>>> : [  454.006998] CPU    1: hi:  186, btch:  31 usd: 112
>>>>> : [  454.007003] active_anon:8308 inactive_anon:8544 isolated_anon:0
>>>>> : [  454.007005]  active_file:4882 inactive_file:205902 isolated_file:0
>>>>> : [  454.007006]  unevictable:0 dirty:11 writeback:0 unstable:0
>>>>> : [  454.007007]  free:1385 slab_reclaimable:2445 slab_unreclaimable:4466
>>>>> : [  454.007008]  mapped:1895 shmem:113 pagetables:1370 bounce:0
>>>>> : [  454.007010] Node 0 DMA free:4000kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:11844kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15768kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:64kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>>>>> : [  454.007021] lowmem_reserve[]: 0 994 994 994
>>>>> : [  454.007025] Node 0 DMA32 free:1540kB min:4000kB low:5000kB high:6000kB active_anon:33232kB inactive_anon:34176kB active_file:19528kB inactive_file:811764kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1018068kB mlocked:0kB dirty:44kB writeback:0kB mapped:7580kB shmem:452kB slab_reclaimable:9716kB slab_unreclaimable:17832kB kernel_stack:1144kB pagetables:5480kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>>>>> : [  454.007036] lowmem_reserve[]: 0 0 0 0
>>>>> : [  454.007040] Node 0 DMA: 0*4kB 4*8kB 6*16kB 5*32kB 6*64kB 4*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 4000kB
>>>>> : [  454.007050] Node 0 DMA32: 13*4kB 2*8kB 3*16kB 1*32kB 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1556kB
>>>>> : [  454.007059] 210914 total pagecache pages
>>>>> : [  454.007061] 0 pages in swap cache
>>>>> : [  454.007063] Swap cache stats: add 0, delete 0, find 0/0
>>>>> : [  454.007065] Free swap  = 1959924kB
>>>>> : [  454.007067] Total swap = 1959924kB
>>>>> : [  454.014238] 262140 pages RAM
>>>>> : [  454.014241] 7489 pages reserved
>>>>> : [  454.014242] 21430 pages shared
>>>>> : [  454.014244] 247174 pages non-shared
>>>>>
>>>>> Either page reclaim got worse or kvm/virtio-net got more aggressive.
>>>>>
>>>>> Avi, Rusty: can you think of any changes in the KVM/virtio area in the
>>>>> 2.6.30 ->  2.6.32 timeframe which may have increased the GFP_ATOMIC
>>>>> demands upon the page allocator?
>>>>>
>>>>> Thanks.
>>>>>    
>>>>>       
>>>>>           
>>> On the contrary, with commit
>>> 3161e453e496eb5643faad30fff5a5ab183da0fe
>>> we should be using GFP_ATOMIC less.
>>> But maybe there's a bug and it has the reverse effect somehow ...
>>>
>>> Robert, could you pls try 3161e453e496eb5643faad30fff5a5ab183da0fe
>>> and if that *does* have the problem,
>>> 0b4f2928f14c4a9770b0866923fc81beb7f4aa57?
>>>
>>>   
>>>       



* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-12  9:25           ` Robert Wimmer
@ 2010-04-12 11:23             ` Michael S. Tsirkin
  2010-04-12 13:50               ` Robert Wimmer
  0 siblings, 1 reply; 62+ messages in thread
From: Michael S. Tsirkin @ 2010-04-12 11:23 UTC (permalink / raw)
  To: Robert Wimmer
  Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
	bugme-daemon, Rusty Russell, Mel Gorman

On Mon, Apr 12, 2010 at 11:25:26AM +0200, Robert Wimmer wrote:
> server10:/usr/src/linux # git bisect start v2.6.31 v2.6.30 --
> drivers/virtio/ drivers/net/virtio_net.c
> Bisecting: 12 revisions left to test after this (roughly 4 steps)
> [e3353853730eb99c56b7b0aed1667d51c0e3699a] virtio: enhance id_matching
> for virtio drivers
> 

Sorry I wasn't clear. The way to use bisect is as follows:
- first start as you did now.
1. now build kernel, install and test
2. if bug is there, type 'git bisect bad'
3. if bug is not there, type 'git bisect good'
4. The above will give you another kernel version to test;
   if so, go back to step 1
5. this will be repeated about 4 times (the number of steps
   git printed above)
6. in the end you will get the first revision which has the
   problem. Let's assume it is revision ABCDEF.

   Type git bisect log to see your history.

7. Now git reset --hard ABCDEF~1 and try again.

If you see the problem with ABCDEF but not ABCDEF~1
then we will have a good guess at the culprit.

Some more tips here:
http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html


> Today I've upgraded to qemu-kvm-0.12.3-r1 (Gentoo package)
> but doesn't help. Still getting "page allocation failure" with
> 2.6.31-rc5.
> 
> Does it makes sense to use the same 2.6.31-rc5 kernel
> in the host and guest for testing? Currently I'm still using 2.6.32
> in host and testing 2.6.31-rc5 in guest until "crashes".
> Then I start the guest with 2.6.30 again which works
> without trouble with 2.6.32 as host.
> 
> This is really strange. I have hosts with 2.6.32 running
> guests with 2.6.32 which works perfectly. These hosts
> and guests running on HP DL 380 G6 with Intel Xeon X5560.
> The guests which don't work with 2.6.32 (and 2.6.32
> as host) running on HP DL 380 G5 with Intel Xeon L5420.

Hmm. Some subtle race?

> (All guests) and (all hosts) have the same packages
> and the same versions installed and the same kernel
> configs (hosts and guests using different .config but the
> difference is very small e.g. CONFIG_PARAVIRT_SPINLOCKS=y,
> CONFIG_PARAVIRT_GUEST=y in guests but not in hosts
> .config).
> 
> I've had problems with qemu-kvm 0.12.2 with high network
> traffic which was solved by a patch submitted by Tom
> Lendacky:
> 
> "Fix a race condition where qemu finds that there are not enough virtio
> ring buffers available and the guest make more buffers available before
> qemu can enable notifications."
> http://www.mail-archive.com/kvm@vger.kernel.org/msg28667.html
> 
> It was a real lifesaver for the HP DL 380 G6 mentioned
> above but maybe this is now causing the problems with the G5 machines.
> The symptoms are the same. I can still log into the guest
> via VNC but the network is down.
> 
> Thanks!
> Robert
> 

For now the only thing we seem to know for sure is that on
specific hardware there's a regression between 2.6.30 and
2.6.31-rc5. Yes, it is possible that all it does
is expose a qemu bug, but it's hard to say.
Let's find out which change
does that; this should give us a hint.

> On 04/11/10 13:03, Michael S. Tsirkin wrote:
> > On Fri, Apr 09, 2010 at 12:15:01PM +0200, Robert Wimmer wrote:
> >   
> >> I'm not really a git hero so here is what I've done:
> >>
> >> cd /usr/src
> >> git clone
> >> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
> >> cd linux
> >> git checkout -b mykernel 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
> >>     
> > Looks right.
> >
> >   
> >> Then I've checked
> >>
> >> drivers/net/virtio_net.c
> >> drivers/net/smc91x.c
> >>
> >> if the changes commited where not in there.
> >> Next I build my kernel as usual. I used my .config
> >> from 2.6.30 (which is working fine in a several
> >> guests / .config see here:
> >> https://bugzilla.kernel.org/attachment.cgi?id=25925)
> >> and build the kernel
> >>
> >> genkernel --menuconfig --lvm --oldconfig all
> >>
> >> which finally gave me a 2.6.31-rc5.
> >>     
> > That's right.
> >
> >   
> >> I should mention
> >> that 2.6.30 was using SLUB. So here is the output
> >> from the 2.6.31-rc5 kernel running about 20 min.:
> >> https://bugzilla.kernel.org/attachment.cgi?id=25926
> >>     
> > Hmm, so we see the error here as well?
> >
> >   
> >> Seems not very usefull to me. I'm currently compiling
> >> the same kernel with SLAB.
> >>
> >> Please let me know if the git commands above are
> >> right and/or if you need other kernel options enabled.
> >>     
> > Looks right. You don't have to add -b flag if you don't
> > want to.
> >
> >   
> >> Thanks!
> >> Robert
> >>     
> > Hmm, I do not see anything else that seems related.
> > Could you please try to bisect?
> >
> > git bisect start v2.6.31 v2.6.30 -- drivers/virtio/ drivers/net/virtio_net.c
> >
> > should help assuming the change that triggers this is in virtio.
> >
> >
> >   
> >> On 04/08/10 22:04, Michael S. Tsirkin wrote:
> >>     
> >>> On Thu, Apr 08, 2010 at 10:39:34PM +0300, Avi Kivity wrote:
> >>>   
> >>>       
> >>>> cc: mst
> >>>>
> >>>> On 04/08/2010 10:34 PM, Andrew Morton wrote:
> >>>>     
> >>>>         
> >>>>> (switched to email.  Please respond via emailed reply-to-all, not via the
> >>>>> bugzilla web interface).
> >>>>>
> >>>>> On Wed, 7 Apr 2010 10:29:20 GMT
> >>>>> bugzilla-daemon@bugzilla.kernel.org wrote:
> >>>>>
> >>>>>    
> >>>>>       
> >>>>>           
> >>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=15709
> >>>>>>
> >>>>>>             Summary: swapper page allocation failure
> >>>>>>             Product: Memory Management
> >>>>>>             Version: 2.5
> >>>>>>      Kernel Version: 2.6.32 and 2.6.33
> >>>>>>            Platform: All
> >>>>>>          OS/Version: Linux
> >>>>>>                Tree: Mainline
> >>>>>>              Status: NEW
> >>>>>>            Severity: normal
> >>>>>>            Priority: P1
> >>>>>>           Component: Slab Allocator
> >>>>>>          AssignedTo: akpm@linux-foundation.org
> >>>>>>          ReportedBy: kernel@tauceti.net
> >>>>>>          Regression: No
> >>>>>>
> >>>>>>
> >>>>>> Created an attachment (id=25903)
> >>>>>>   -->  (https://bugzilla.kernel.org/attachment.cgi?id=25903)
> >>>>>> dmesg output
> >>>>>>
> >>>>>> I'm having problems with "swapper page allocation failure's" since upgrading
> >>>>>> from kernel 2.6.30 to 2.6.32/2.6.33. The problems occur inside a kernel virtual
> >>>>>> maschine (KVM). Running Gentoo with kernel 2.6.32 as host which works fine. As
> >>>>>> long as kernel 2.6.30 is used as guest kernel the guest runs fine. But after
> >>>>>> upgrading to 2.6.32 and 2.6.33 I get "swapper page allocation failure's" (see
> >>>>>> attachment of dmesg output). The guest is only running a Apache webserver and
> >>>>>> serves files from a NFS share. It has 1 GB RAM and 2 virtual CPUs. I've tried
> >>>>>> different kernel configurations (e.g. a unmodified version from Sabayon Linux
> >>>>>> Distribution) but doesn't help. Load of the guest (and host) is very low.
> >>>>>> Network traffic is about 20-50 MBit/s.
> >>>>>>
> >>>>>>      
> >>>>>>         
> >>>>>>             
> >>>>> hm, this is a regression.
> >>>>>
> >>>>> : [  454.006706] users: page allocation failure. order:0, mode:0x20
> >>>>> : [  454.006712] Pid: 7992, comm: users Not tainted 2.6.34-rc3-git6 #2
> >>>>> : [  454.006714] Call Trace:
> >>>>> : [  454.006717]<IRQ>   [<ffffffff8109dff7>] __alloc_pages_nodemask+0x5c8/0x615
> >>>>> : [  454.006796]  [<ffffffff817860ce>] ? ip_local_deliver+0x65/0x6d
> >>>>> : [  454.006820]  [<ffffffff810c39c4>] alloc_pages_current+0x96/0x9f
> >>>>> : [  454.006842]  [<ffffffff8167f2c7>] try_fill_recv+0x5e/0x20f
> >>>>> : [  454.006846]  [<ffffffff8167fe13>] virtnet_poll+0x52a/0x5c7
> >>>>> : [  454.006858]  [<ffffffff8104fe74>] ? run_timer_softirq+0x1dc/0x1f4
> >>>>> : [  454.006873]  [<ffffffff8176035d>] net_rx_action+0xad/0x1a5
> >>>>> : [  454.006882]  [<ffffffff8104b6cd>] __do_softirq+0x9c/0x127
> >>>>> : [  454.006897]  [<ffffffff81008ffc>] call_softirq+0x1c/0x30
> >>>>> : [  454.006901]  [<ffffffff8100af01>] do_softirq+0x41/0x7e
> >>>>> : [  454.006904]  [<ffffffff8104b3e3>] irq_exit+0x36/0x75
> >>>>> : [  454.006907]  [<ffffffff8100a5ee>] do_IRQ+0xaa/0xc1
> >>>>> : [  454.006926]  [<ffffffff8183bc13>] ret_from_intr+0x0/0x11
> >>>>> : [  454.006928]<EOI>   [<ffffffff81026b25>] ? kvm_deferred_mmu_op+0x5e/0xe7
> >>>>> : [  454.006942]  [<ffffffff81026b19>] ? kvm_deferred_mmu_op+0x52/0xe7
> >>>>> : [  454.006946]  [<ffffffff81026c03>] kvm_mmu_write+0x2e/0x35
> >>>>> : [  454.006949]  [<ffffffff81026c7d>] kvm_set_pte_at+0x19/0x1b
> >>>>> : [  454.006953]  [<ffffffff810aba67>] __do_fault+0x3c4/0x492
> >>>>> : [  454.006957]  [<ffffffff810adcf4>] handle_mm_fault+0x478/0x9d8
> >>>>> : [  454.006966]  [<ffffffff810deb59>] ? path_put+0x2c/0x30
> >>>>> : [  454.006975]  [<ffffffff8102f162>] do_page_fault+0x2f6/0x31a
> >>>>> : [  454.006979]  [<ffffffff8183b81e>] ? _raw_spin_lock+0x9/0xd
> >>>>> : [  454.006982]  [<ffffffff8183bef5>] page_fault+0x25/0x30
> >>>>> : [  454.006985] Mem-Info:
> >>>>> : [  454.006987] Node 0 DMA per-cpu:
> >>>>> : [  454.006990] CPU    0: hi:    0, btch:   1 usd:   0
> >>>>> : [  454.006992] CPU    1: hi:    0, btch:   1 usd:   0
> >>>>> : [  454.006993] Node 0 DMA32 per-cpu:
> >>>>> : [  454.006996] CPU    0: hi:  186, btch:  31 usd: 185
> >>>>> : [  454.006998] CPU    1: hi:  186, btch:  31 usd: 112
> >>>>> : [  454.007003] active_anon:8308 inactive_anon:8544 isolated_anon:0
> >>>>> : [  454.007005]  active_file:4882 inactive_file:205902 isolated_file:0
> >>>>> : [  454.007006]  unevictable:0 dirty:11 writeback:0 unstable:0
> >>>>> : [  454.007007]  free:1385 slab_reclaimable:2445 slab_unreclaimable:4466
> >>>>> : [  454.007008]  mapped:1895 shmem:113 pagetables:1370 bounce:0
> >>>>> : [  454.007010] Node 0 DMA free:4000kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:11844kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15768kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:64kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> >>>>> : [  454.007021] lowmem_reserve[]: 0 994 994 994
> >>>>> : [  454.007025] Node 0 DMA32 free:1540kB min:4000kB low:5000kB high:6000kB active_anon:33232kB inactive_anon:34176kB active_file:19528kB inactive_file:811764kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1018068kB mlocked:0kB dirty:44kB writeback:0kB mapped:7580kB shmem:452kB slab_reclaimable:9716kB slab_unreclaimable:17832kB kernel_stack:1144kB pagetables:5480kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> >>>>> : [  454.007036] lowmem_reserve[]: 0 0 0 0
> >>>>> : [  454.007040] Node 0 DMA: 0*4kB 4*8kB 6*16kB 5*32kB 6*64kB 4*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 4000kB
> >>>>> : [  454.007050] Node 0 DMA32: 13*4kB 2*8kB 3*16kB 1*32kB 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1556kB
> >>>>> : [  454.007059] 210914 total pagecache pages
> >>>>> : [  454.007061] 0 pages in swap cache
> >>>>> : [  454.007063] Swap cache stats: add 0, delete 0, find 0/0
> >>>>> : [  454.007065] Free swap  = 1959924kB
> >>>>> : [  454.007067] Total swap = 1959924kB
> >>>>> : [  454.014238] 262140 pages RAM
> >>>>> : [  454.014241] 7489 pages reserved
> >>>>> : [  454.014242] 21430 pages shared
> >>>>> : [  454.014244] 247174 pages non-shared
> >>>>>
> >>>>> Either page reclaim got worse or kvm/virtio-net got more aggressive.
> >>>>>
> >>>>> Avi, Rusty: can you think of any changes in the KVM/virtio area in the
> >>>>> 2.6.30 ->  2.6.32 timeframe which may have increased the GFP_ATOMIC
> >>>>> demands upon the page allocator?
> >>>>>
> >>>>> Thanks.
> >>>>>    
> >>>>>       
> >>>>>           
> >>> On the contrary, with commit
> >>> 3161e453e496eb5643faad30fff5a5ab183da0fe
> >>> we should be using GFP_ATOMIC less.
> >>> But maybe there's a bug and it has the reverse effect somehow ...
> >>>
> >>> Robert, could you pls try 3161e453e496eb5643faad30fff5a5ab183da0fe
> >>> and if that *does* have the problem,
> >>> 0b4f2928f14c4a9770b0866923fc81beb7f4aa57?
> >>>
> >>>   
> >>>       



* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-12 11:23             ` Michael S. Tsirkin
@ 2010-04-12 13:50               ` Robert Wimmer
  2010-04-12 13:52                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 62+ messages in thread
From: Robert Wimmer @ 2010-04-12 13:50 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
	bugme-daemon, Rusty Russell, Mel Gorman

Sorry but I need some more git help. Here is what I've done.
Started with a fresh clone of the kernel:

cd /usr/src
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
cd linux
git checkout 0b4f2928f14c4a9770b0866923fc81beb7f4aa57

Since I already knew that this commit wasn't good I did

git bisect start
git bisect bad

compiled and started over. As expected the problem returns.
So I've done another

git bisect bad

but I always get the same commit:

kabul:/usr/src/linux # git bisect log
git bisect start
# bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix compilation on SMP
git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
# bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix compilation on SMP
git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
# bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix compilation on SMP
git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57

I expected that each "git bisect bad" would give me the commit
just before the "bad" one. How can I get the previous commit?
The bisect documentation didn't help me.

Thanks!
Robert



On 04/12/10 13:23, Michael S. Tsirkin wrote:
> On Mon, Apr 12, 2010 at 11:25:26AM +0200, Robert Wimmer wrote:
>   
>> server10:/usr/src/linux # git bisect start v2.6.31 v2.6.30 --
>> drivers/virtio/ drivers/net/virtio_net.c
>> Bisecting: 12 revisions left to test after this (roughly 4 steps)
>> [e3353853730eb99c56b7b0aed1667d51c0e3699a] virtio: enhance id_matching
>> for virtio drivers
>>
>>     
> Sorry I wasn't clear. the way to use bisect is as follows:
> - first start as you did now.
> 1. now build kernel, install and test
> 2. if bug is there, type 'git bisect bad'
> 3. if bug is not there, type 'git bisect good'
> 4. The above will give you another kernel version to test
>    if so go back to step 1
> 6. this will be repeated about 4 times (number of steps above)
> 7. in the end you will get the first revision which has the
>    problem. Let's assume it is revision ABCDEF.
>
>    Type git bisect log to see your history.
>
> 8. Now git reset --hard ABCDEF~1 and try again.
>
> If you see the problem with ABCDEF but not ABCDEF~1
> then we will have a good guess at the culprit.
>
> Some more tips here:
> http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html
>
>
>   
>> Today I've upgraded to qemu-kvm-0.12.3-r1 (Gentoo package)
>> but doesn't help. Still getting "page allocation failure" with
>> 2.6.31-rc5.
>>
>> Does it makes sense to use the same 2.6.31-rc5 kernel
>> in the host and guest for testing? Currently I'm still using 2.6.32
>> in host and testing 2.6.31-rc5 in guest until "crashes".
>> Then I start the guest with 2.6.30 again which works
>> without trouble with 2.6.32 as host.
>>
>> This is really strange. I have hosts with 2.6.32 running
>> guests with 2.6.32 which works perfectly. These hosts
>> and guests running on HP DL 380 G6 with Intel Xeon X5560.
>> The guests which don't work with 2.6.32 (and 2.6.32
>> as host) running on HP DL 380 G5 with Intel Xeon L5420.
>>     
> Hmm. Some subtle race?
>
>   
>> (All guests) and (all hosts) have the same packages
>> and the same versions installed and the same kernel
>> configs (hosts and guests using different .config but the
>> difference is very small e.g. CONFIG_PARAVIRT_SPINLOCKS=y,
>> CONFIG_PARAVIRT_GUEST=y in guests but not in hosts
>> .config).
>>
>> I've had problems with qemu-kvm 0.12.2 with high network
>> traffic which was solved by a patch submitted by Tom
>> Lendacky:
>>
>> "Fix a race condition where qemu finds that there are not enough virtio
>> ring buffers available and the guest make more buffers available before
>> qemu can enable notifications."
>> http://www.mail-archive.com/kvm@vger.kernel.org/msg28667.html
>>
>> It was a real lifesaver for the HP DL 380 G6 mentioned
>> above but maybe this is now causing the problems with the G5 machines.
>> The symptoms are the same. I can still log into the guest
>> via VNC but the network is down.
>>
>> Thanks!
>> Robert
>>
>>     
> For now the only thing we seem to know for sure is that on
> specific hardware there's a regression between 2.6.30 and
> 2.6.31-rc5. Yes, it is possible that all it does
> is expose a qemu bug, but it's hard to say.
> Let's find out what change
> does that, this should give us a hint.
>
>   
>> On 04/11/10 13:03, Michael S. Tsirkin wrote:
>>     
>>> On Fri, Apr 09, 2010 at 12:15:01PM +0200, Robert Wimmer wrote:
>>>   
>>>       
>>>> I'm not really a git hero so here is what I've done:
>>>>
>>>> cd /usr/src
>>>> git clone
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
>>>> cd linux
>>>> git checkout -b mykernel 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
>>>>     
>>>>         
>>> Looks right.
>>>
>>>   
>>>       
>>>> Then I've checked
>>>>
>>>> drivers/net/virtio_net.c
>>>> drivers/net/smc91x.c
>>>>
>>>> if the changes commited where not in there.
>>>> Next I build my kernel as usual. I used my .config
>>>> from 2.6.30 (which is working fine in a several
>>>> guests / .config see here:
>>>> https://bugzilla.kernel.org/attachment.cgi?id=25925)
>>>> and build the kernel
>>>>
>>>> genkernel --menuconfig --lvm --oldconfig all
>>>>
>>>> which finally gave me a 2.6.31-rc5.
>>>>     
>>>>         
>>> That's right.
>>>
>>>   
>>>       
>>>> I should mention
>>>> that 2.6.30 was using SLUB. So here is the output
>>>> from the 2.6.31-rc5 kernel running about 20 min.:
>>>> https://bugzilla.kernel.org/attachment.cgi?id=25926
>>>>     
>>>>         
>>> Hmm, so we see the error here as well?
>>>
>>>   
>>>       
>>>> Seems not very usefull to me. I'm currently compiling
>>>> the same kernel with SLAB.
>>>>
>>>> Please let me know if the git commands above are
>>>> right and/or if you need other kernel options enabled.
>>>>     
>>>>         
>>> Looks right. You don't have to add -b flag if you don't
>>> want to.
>>>
>>>   
>>>       
>>>> Thanks!
>>>> Robert
>>>>     
>>>>         
>>> Hmm, I do not see anything else that seems related.
>>> Could you please try to bisect?
>>>
>>> git bisect start v2.6.31 v2.6.30 -- drivers/virtio/ drivers/net/virtio_net.c
>>>
>>> should help assuming the change that triggers this is in virtio.
>>>
>>>
>>>   
>>>       
>>>> On 04/08/10 22:04, Michael S. Tsirkin wrote:
>>>>     
>>>>         
>>>>> On Thu, Apr 08, 2010 at 10:39:34PM +0300, Avi Kivity wrote:
>>>>>   
>>>>>       
>>>>>           
>>>>>> cc: mst
>>>>>>
>>>>>> On 04/08/2010 10:34 PM, Andrew Morton wrote:
>>>>>>     
>>>>>>         
>>>>>>             
>>>>>>> (switched to email.  Please respond via emailed reply-to-all, not via the
>>>>>>> bugzilla web interface).
>>>>>>>
>>>>>>> On Wed, 7 Apr 2010 10:29:20 GMT
>>>>>>> bugzilla-daemon@bugzilla.kernel.org wrote:
>>>>>>>
>>>>>>>    
>>>>>>>       
>>>>>>>           
>>>>>>>               
>>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=15709
>>>>>>>>
>>>>>>>>             Summary: swapper page allocation failure
>>>>>>>>             Product: Memory Management
>>>>>>>>             Version: 2.5
>>>>>>>>      Kernel Version: 2.6.32 and 2.6.33
>>>>>>>>            Platform: All
>>>>>>>>          OS/Version: Linux
>>>>>>>>                Tree: Mainline
>>>>>>>>              Status: NEW
>>>>>>>>            Severity: normal
>>>>>>>>            Priority: P1
>>>>>>>>           Component: Slab Allocator
>>>>>>>>          AssignedTo: akpm@linux-foundation.org
>>>>>>>>          ReportedBy: kernel@tauceti.net
>>>>>>>>          Regression: No
>>>>>>>>
>>>>>>>>
>>>>>>>> Created an attachment (id=25903)
>>>>>>>>   -->  (https://bugzilla.kernel.org/attachment.cgi?id=25903)
>>>>>>>> dmesg output
>>>>>>>>
>>>>>>>> I'm having problems with "swapper page allocation failure's" since upgrading
>>>>>>>> from kernel 2.6.30 to 2.6.32/2.6.33. The problems occur inside a kernel virtual
>>>>>>>> maschine (KVM). Running Gentoo with kernel 2.6.32 as host which works fine. As
>>>>>>>> long as kernel 2.6.30 is used as guest kernel the guest runs fine. But after
>>>>>>>> upgrading to 2.6.32 and 2.6.33 I get "swapper page allocation failure's" (see
>>>>>>>> attachment of dmesg output). The guest is only running a Apache webserver and
>>>>>>>> serves files from a NFS share. It has 1 GB RAM and 2 virtual CPUs. I've tried
>>>>>>>> different kernel configurations (e.g. a unmodified version from Sabayon Linux
>>>>>>>> Distribution) but doesn't help. Load of the guest (and host) is very low.
>>>>>>>> Network traffic is about 20-50 MBit/s.
>>>>>>>>
>>>>>>>>      
>>>>>>>>         
>>>>>>>>             
>>>>>>>>                 
>>>>>>> hm, this is a regression.
>>>>>>>
>>>>>>> : [  454.006706] users: page allocation failure. order:0, mode:0x20
>>>>>>> : [  454.006712] Pid: 7992, comm: users Not tainted 2.6.34-rc3-git6 #2
>>>>>>> : [  454.006714] Call Trace:
>>>>>>> : [  454.006717]<IRQ>   [<ffffffff8109dff7>] __alloc_pages_nodemask+0x5c8/0x615
>>>>>>> : [  454.006796]  [<ffffffff817860ce>] ? ip_local_deliver+0x65/0x6d
>>>>>>> : [  454.006820]  [<ffffffff810c39c4>] alloc_pages_current+0x96/0x9f
>>>>>>> : [  454.006842]  [<ffffffff8167f2c7>] try_fill_recv+0x5e/0x20f
>>>>>>> : [  454.006846]  [<ffffffff8167fe13>] virtnet_poll+0x52a/0x5c7
>>>>>>> : [  454.006858]  [<ffffffff8104fe74>] ? run_timer_softirq+0x1dc/0x1f4
>>>>>>> : [  454.006873]  [<ffffffff8176035d>] net_rx_action+0xad/0x1a5
>>>>>>> : [  454.006882]  [<ffffffff8104b6cd>] __do_softirq+0x9c/0x127
>>>>>>> : [  454.006897]  [<ffffffff81008ffc>] call_softirq+0x1c/0x30
>>>>>>> : [  454.006901]  [<ffffffff8100af01>] do_softirq+0x41/0x7e
>>>>>>> : [  454.006904]  [<ffffffff8104b3e3>] irq_exit+0x36/0x75
>>>>>>> : [  454.006907]  [<ffffffff8100a5ee>] do_IRQ+0xaa/0xc1
>>>>>>> : [  454.006926]  [<ffffffff8183bc13>] ret_from_intr+0x0/0x11
>>>>>>> : [  454.006928]<EOI>   [<ffffffff81026b25>] ? kvm_deferred_mmu_op+0x5e/0xe7
>>>>>>> : [  454.006942]  [<ffffffff81026b19>] ? kvm_deferred_mmu_op+0x52/0xe7
>>>>>>> : [  454.006946]  [<ffffffff81026c03>] kvm_mmu_write+0x2e/0x35
>>>>>>> : [  454.006949]  [<ffffffff81026c7d>] kvm_set_pte_at+0x19/0x1b
>>>>>>> : [  454.006953]  [<ffffffff810aba67>] __do_fault+0x3c4/0x492
>>>>>>> : [  454.006957]  [<ffffffff810adcf4>] handle_mm_fault+0x478/0x9d8
>>>>>>> : [  454.006966]  [<ffffffff810deb59>] ? path_put+0x2c/0x30
>>>>>>> : [  454.006975]  [<ffffffff8102f162>] do_page_fault+0x2f6/0x31a
>>>>>>> : [  454.006979]  [<ffffffff8183b81e>] ? _raw_spin_lock+0x9/0xd
>>>>>>> : [  454.006982]  [<ffffffff8183bef5>] page_fault+0x25/0x30
>>>>>>> : [  454.006985] Mem-Info:
>>>>>>> : [  454.006987] Node 0 DMA per-cpu:
>>>>>>> : [  454.006990] CPU    0: hi:    0, btch:   1 usd:   0
>>>>>>> : [  454.006992] CPU    1: hi:    0, btch:   1 usd:   0
>>>>>>> : [  454.006993] Node 0 DMA32 per-cpu:
>>>>>>> : [  454.006996] CPU    0: hi:  186, btch:  31 usd: 185
>>>>>>> : [  454.006998] CPU    1: hi:  186, btch:  31 usd: 112
>>>>>>> : [  454.007003] active_anon:8308 inactive_anon:8544 isolated_anon:0
>>>>>>> : [  454.007005]  active_file:4882 inactive_file:205902 isolated_file:0
>>>>>>> : [  454.007006]  unevictable:0 dirty:11 writeback:0 unstable:0
>>>>>>> : [  454.007007]  free:1385 slab_reclaimable:2445 slab_unreclaimable:4466
>>>>>>> : [  454.007008]  mapped:1895 shmem:113 pagetables:1370 bounce:0
>>>>>>> : [  454.007010] Node 0 DMA free:4000kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:11844kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15768kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:64kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>>>>>>> : [  454.007021] lowmem_reserve[]: 0 994 994 994
>>>>>>> : [  454.007025] Node 0 DMA32 free:1540kB min:4000kB low:5000kB high:6000kB active_anon:33232kB inactive_anon:34176kB active_file:19528kB inactive_file:811764kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1018068kB mlocked:0kB dirty:44kB writeback:0kB mapped:7580kB shmem:452kB slab_reclaimable:9716kB slab_unreclaimable:17832kB kernel_stack:1144kB pagetables:5480kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>>>>>>> : [  454.007036] lowmem_reserve[]: 0 0 0 0
>>>>>>> : [  454.007040] Node 0 DMA: 0*4kB 4*8kB 6*16kB 5*32kB 6*64kB 4*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 4000kB
>>>>>>> : [  454.007050] Node 0 DMA32: 13*4kB 2*8kB 3*16kB 1*32kB 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1556kB
>>>>>>> : [  454.007059] 210914 total pagecache pages
>>>>>>> : [  454.007061] 0 pages in swap cache
>>>>>>> : [  454.007063] Swap cache stats: add 0, delete 0, find 0/0
>>>>>>> : [  454.007065] Free swap  = 1959924kB
>>>>>>> : [  454.007067] Total swap = 1959924kB
>>>>>>> : [  454.014238] 262140 pages RAM
>>>>>>> : [  454.014241] 7489 pages reserved
>>>>>>> : [  454.014242] 21430 pages shared
>>>>>>> : [  454.014244] 247174 pages non-shared
>>>>>>>
>>>>>>> Either page reclaim got worse or kvm/virtio-net got more aggressive.
>>>>>>>
>>>>>>> Avi, Rusty: can you think of any changes in the KVM/virtio area in the
>>>>>>> 2.6.30 ->  2.6.32 timeframe which may have increased the GFP_ATOMIC
>>>>>>> demands upon the page allocator?
>>>>>>>
>>>>>>> Thanks.
>>>>>>>    
>>>>>>>       
>>>>>>>           
>>>>>>>               
>>>>> On the contrary, with commit
>>>>> 3161e453e496eb5643faad30fff5a5ab183da0fe
>>>>> we should be using GFP_ATOMIC less.
>>>>> But maybe there's a bug and it has the reverse effect somehow ...
>>>>>
>>>>> Robert, could you pls try 3161e453e496eb5643faad30fff5a5ab183da0fe
>>>>> and if that *does* have the problem,
>>>>> 0b4f2928f14c4a9770b0866923fc81beb7f4aa57?
>>>>>
>>>>>   
>>>>>       
>>>>>           


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-12 13:50               ` Robert Wimmer
@ 2010-04-12 13:52                 ` Michael S. Tsirkin
  2010-04-13  8:51                   ` Robert Wimmer
  0 siblings, 1 reply; 62+ messages in thread
From: Michael S. Tsirkin @ 2010-04-12 13:52 UTC (permalink / raw)
  To: Robert Wimmer
  Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
	bugme-daemon, Rusty Russell, Mel Gorman

On Mon, Apr 12, 2010 at 03:50:31PM +0200, Robert Wimmer wrote:
> Sorry but I need some more git help. Here is what I've done.
> Started with a fresh clone of the kernel:
> 
> cd /usr/src
> git clone
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
> cd linux
> git checkout 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
> 
> Since I already knew that this commit wasn't good I did
> 
> git bisect start
> git bisect bad

I think what you are missing is marking a good commit.
bisect does a binary search, so it needs to know
both a good and a bad commit to search the range between them.

Optionally, you can append '-- drivers/virtio/ drivers/net/virtio_net.c'.
This limits bisect to commits that touch the files in
question. That way you have far fewer tests to run
(about 4), but after you find the first problematic commit
you must verify that the commit just before it does not have the issue.

If the commit just before it does have the issue, you'll have to
fall back on a full bisect, and we will know that some
other change in the kernel triggered the regression.
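
For illustration only, here is a minimal sketch of that workflow in a
throwaway repository. Everything in it is made up for the example: the
file name driver.c, the commit messages, and the grep check that stands
in for "build, boot and test". For the real kernel you would test each
step by hand and type 'git bisect good' or 'git bisect bad' yourself
rather than use 'git bisect run'.

```shell
#!/bin/sh
# Toy demonstration: five commits, the "bug" (a line reading BUG in the
# hypothetical file driver.c) is introduced by the fourth commit.
set -e

repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.email "demo@example.invalid"
git config user.name "Bisect Demo"

for i in 1 2 3 4 5; do
    if [ "$i" -eq 4 ]; then
        echo "BUG" >> driver.c
    else
        echo "change $i" >> driver.c
    fi
    git add driver.c
    git commit -q -m "commit $i"
done

# Mark both ends of the range: the bad commit first, then the good one.
# The part after '--' limits the search to commits touching driver.c.
good=$(git rev-list --max-parents=0 HEAD)
git bisect start HEAD "$good" -- driver.c >/dev/null

# Stand-in for the manual test: exit status 0 means good, 1 means bad.
git bisect run sh -c '! grep -q BUG driver.c' >/dev/null

# refs/bisect/bad now points at the first bad commit.
first_bad=$(git rev-parse refs/bisect/bad)
first_bad_subject=$(git log -1 --format=%s "$first_bad")

# Verify that the commit just before it does not have the issue.
if git show "${first_bad}~1:driver.c" | grep -q BUG; then
    parent_state=bad
else
    parent_state=good
fi

git bisect reset >/dev/null 2>&1
echo "first bad commit: $first_bad_subject (parent is $parent_state)"
```

The point of the sketch is only that bisect needs one good and one bad
endpoint before it can pick a commit in between, and that the parent of
the reported commit should be checked separately.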


> compiled and started over. As expected the problem returns.
> So I've done another
> 
> git bisect bad
> 
> but I always get the same commit:
> 
> kabul:/usr/src/linux # git bisect log
> git bisect start
> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix
> compilation on SMP
> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix
> compilation on SMP
> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix
> compilation on SMP
> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
> 
> I've expected that after each "git bisect bad" I get the previous
> commit before the "bad" one. How can get the previous commit?
> The bisect documentation couldn't help me.
> 
> Thanks!
> Robert
> 
> 
> 
> On 04/12/10 13:23, Michael S. Tsirkin wrote:
> > On Mon, Apr 12, 2010 at 11:25:26AM +0200, Robert Wimmer wrote:
> >   
> >> server10:/usr/src/linux # git bisect start v2.6.31 v2.6.30 --
> >> drivers/virtio/ drivers/net/virtio_net.c
> >> Bisecting: 12 revisions left to test after this (roughly 4 steps)
> >> [e3353853730eb99c56b7b0aed1667d51c0e3699a] virtio: enhance id_matching
> >> for virtio drivers
> >>
> >>     
> > Sorry I wasn't clear. the way to use bisect is as follows:
> > - first start as you did now.
> > 1. now build kernel, install and test
> > 2. if bug is there, type 'git bisect bad'
> > 3. if bug is not there, type 'git bisect good'
> > 4. The above will give you another kernel version to test
> >    if so go back to step 1
> > 6. this will be repeated about 4 times (number of steps above)
> > 7. in the end you will get the first revision which has the
> >    problem. Let's assume it is revision ABCDEF.
> >
> >    Type git bisect log to see your history.
> >
> > 8. Now git reset --hard ABCDEF~1 and try again.
> >
> > If you see the problem with ABCDEF but not ABCDEF~1
> > then we will have a good guess at the culprit.
> >
> > Some more tips here:
> > http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html
> >
> >
> >   
> >> Today I've upgraded to qemu-kvm-0.12.3-r1 (Gentoo package)
> >> but doesn't help. Still getting "page allocation failure" with
> >> 2.6.31-rc5.
> >>
> >> Does it makes sense to use the same 2.6.31-rc5 kernel
> >> in the host and guest for testing? Currently I'm still using 2.6.32
> >> in host and testing 2.6.31-rc5 in guest until "crashes".
> >> Then I start the guest with 2.6.30 again which works
> >> without trouble with 2.6.32 as host.
> >>
> >> This is really strange. I have hosts with 2.6.32 running
> >> guests with 2.6.32 which works perfectly. These hosts
> >> and guests running on HP DL 380 G6 with Intel Xeon X5560.
> >> The guests which don't work with 2.6.32 (and 2.6.32
> >> as host) running on HP DL 380 G5 with Intel Xeon L5420.
> >>     
> > Hmm. Some subtle race?
> >
> >   
> >> (All guests) and (all hosts) have the same packages
> >> and the same versions installed and the same kernel
> >> configs (hosts and guests using different .config but the
> >> difference is very small e.g. CONFIG_PARAVIRT_SPINLOCKS=y,
> >> CONFIG_PARAVIRT_GUEST=y in guests but not in hosts
> >> .config).
> >>
> >> I've had problems with qemu-kvm 0.12.2 with high network
> >> traffic which was solved by a patch submitted by Tom
> >> Lendacky:
> >>
> >> "Fix a race condition where qemu finds that there are not enough virtio
> >> ring buffers available and the guest make more buffers available before
> >> qemu can enable notifications."
> >> http://www.mail-archive.com/kvm@vger.kernel.org/msg28667.html
> >>
> >> It was a real lifesaver for the HP DL 380 G6 mentioned
> >> above but maybe this is now causing the problems with the G5 machines.
> >> The symptoms are the same. I can still log into the guest
> >> via VNC but the network is down.
> >>
> >> Thanks!
> >> Robert
> >>
> >>     
> > For now the only thing we seem to know for sure is that on
> > specific hardware there's a regression between 2.6.30 and
> > 2.6.31-rc5. Yes, it is possible that all it does
> > is expose a qemu bug, but it's hard to say.
> > Let's find out what change
> > does that, this should give us a hint.
> >
> >   
> >> On 04/11/10 13:03, Michael S. Tsirkin wrote:
> >>     
> >>> On Fri, Apr 09, 2010 at 12:15:01PM +0200, Robert Wimmer wrote:
> >>>   
> >>>       
> >>>> I'm not really a git hero so here is what I've done:
> >>>>
> >>>> cd /usr/src
> >>>> git clone
> >>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
> >>>> cd linux
> >>>> git checkout -b mykernel 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
> >>>>     
> >>>>         
> >>> Looks right.
> >>>
> >>>   
> >>>       
> >>>> Then I've checked
> >>>>
> >>>> drivers/net/virtio_net.c
> >>>> drivers/net/smc91x.c
> >>>>
> >>>> if the changes commited where not in there.
> >>>> Next I build my kernel as usual. I used my .config
> >>>> from 2.6.30 (which is working fine in a several
> >>>> guests / .config see here:
> >>>> https://bugzilla.kernel.org/attachment.cgi?id=25925)
> >>>> and build the kernel
> >>>>
> >>>> genkernel --menuconfig --lvm --oldconfig all
> >>>>
> >>>> which finally gave me a 2.6.31-rc5.
> >>>>     
> >>>>         
> >>> That's right.
> >>>
> >>>   
> >>>       
> >>>> I should mention
> >>>> that 2.6.30 was using SLUB. So here is the output
> >>>> from the 2.6.31-rc5 kernel running about 20 min.:
> >>>> https://bugzilla.kernel.org/attachment.cgi?id=25926
> >>>>     
> >>>>         
> >>> Hmm, so we see the error here as well?
> >>>
> >>>   
> >>>       
> >>>> Seems not very usefull to me. I'm currently compiling
> >>>> the same kernel with SLAB.
> >>>>
> >>>> Please let me know if the git commands above are
> >>>> right and/or if you need other kernel options enabled.
> >>>>     
> >>>>         
> >>> Looks right. You don't have to add -b flag if you don't
> >>> want to.
> >>>
> >>>   
> >>>       
> >>>> Thanks!
> >>>> Robert
> >>>>     
> >>>>         
> >>> Hmm, I do not see anything else that seems related.
> >>> Could you please try to bisect?
> >>>
> >>> git bisect start v2.6.31 v2.6.30 -- drivers/virtio/ drivers/net/virtio_net.c
> >>>
> >>> should help assuming the change that triggers this is in virtio.
> >>>
> >>>
> >>>   
> >>>       
> >>>> On 04/08/10 22:04, Michael S. Tsirkin wrote:
> >>>>     
> >>>>         
> >>>>> On Thu, Apr 08, 2010 at 10:39:34PM +0300, Avi Kivity wrote:
> >>>>>   
> >>>>>       
> >>>>>           
> >>>>>> cc: mst
> >>>>>>
> >>>>>> On 04/08/2010 10:34 PM, Andrew Morton wrote:
> >>>>>>     
> >>>>>>         
> >>>>>>             
> >>>>>>> (switched to email.  Please respond via emailed reply-to-all, not via the
> >>>>>>> bugzilla web interface).
> >>>>>>>
> >>>>>>> On Wed, 7 Apr 2010 10:29:20 GMT
> >>>>>>> bugzilla-daemon@bugzilla.kernel.org wrote:
> >>>>>>>
> >>>>>>>    
> >>>>>>>       
> >>>>>>>           
> >>>>>>>               
> >>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=15709
> >>>>>>>>
> >>>>>>>>             Summary: swapper page allocation failure
> >>>>>>>>             Product: Memory Management
> >>>>>>>>             Version: 2.5
> >>>>>>>>      Kernel Version: 2.6.32 and 2.6.33
> >>>>>>>>            Platform: All
> >>>>>>>>          OS/Version: Linux
> >>>>>>>>                Tree: Mainline
> >>>>>>>>              Status: NEW
> >>>>>>>>            Severity: normal
> >>>>>>>>            Priority: P1
> >>>>>>>>           Component: Slab Allocator
> >>>>>>>>          AssignedTo: akpm@linux-foundation.org
> >>>>>>>>          ReportedBy: kernel@tauceti.net
> >>>>>>>>          Regression: No
> >>>>>>>>
> >>>>>>>>
> >>>>>>>> Created an attachment (id=25903)
> >>>>>>>>   -->  (https://bugzilla.kernel.org/attachment.cgi?id=25903)
> >>>>>>>> dmesg output
> >>>>>>>>
> >>>>>>>> I'm having problems with "swapper page allocation failure's" since upgrading
> >>>>>>>> from kernel 2.6.30 to 2.6.32/2.6.33. The problems occur inside a kernel virtual
> >>>>>>>> maschine (KVM). Running Gentoo with kernel 2.6.32 as host which works fine. As
> >>>>>>>> long as kernel 2.6.30 is used as guest kernel the guest runs fine. But after
> >>>>>>>> upgrading to 2.6.32 and 2.6.33 I get "swapper page allocation failure's" (see
> >>>>>>>> attachment of dmesg output). The guest is only running a Apache webserver and
> >>>>>>>> serves files from a NFS share. It has 1 GB RAM and 2 virtual CPUs. I've tried
> >>>>>>>> different kernel configurations (e.g. a unmodified version from Sabayon Linux
> >>>>>>>> Distribution) but doesn't help. Load of the guest (and host) is very low.
> >>>>>>>> Network traffic is about 20-50 MBit/s.
> >>>>>>>>
> >>>>>>>>      
> >>>>>>>>         
> >>>>>>>>             
> >>>>>>>>                 
> >>>>>>> hm, this is a regression.
> >>>>>>>
> >>>>>>> : [  454.006706] users: page allocation failure. order:0, mode:0x20
> >>>>>>> : [  454.006712] Pid: 7992, comm: users Not tainted 2.6.34-rc3-git6 #2
> >>>>>>> : [  454.006714] Call Trace:
> >>>>>>> : [  454.006717]<IRQ>   [<ffffffff8109dff7>] __alloc_pages_nodemask+0x5c8/0x615
> >>>>>>> : [  454.006796]  [<ffffffff817860ce>] ? ip_local_deliver+0x65/0x6d
> >>>>>>> : [  454.006820]  [<ffffffff810c39c4>] alloc_pages_current+0x96/0x9f
> >>>>>>> : [  454.006842]  [<ffffffff8167f2c7>] try_fill_recv+0x5e/0x20f
> >>>>>>> : [  454.006846]  [<ffffffff8167fe13>] virtnet_poll+0x52a/0x5c7
> >>>>>>> : [  454.006858]  [<ffffffff8104fe74>] ? run_timer_softirq+0x1dc/0x1f4
> >>>>>>> : [  454.006873]  [<ffffffff8176035d>] net_rx_action+0xad/0x1a5
> >>>>>>> : [  454.006882]  [<ffffffff8104b6cd>] __do_softirq+0x9c/0x127
> >>>>>>> : [  454.006897]  [<ffffffff81008ffc>] call_softirq+0x1c/0x30
> >>>>>>> : [  454.006901]  [<ffffffff8100af01>] do_softirq+0x41/0x7e
> >>>>>>> : [  454.006904]  [<ffffffff8104b3e3>] irq_exit+0x36/0x75
> >>>>>>> : [  454.006907]  [<ffffffff8100a5ee>] do_IRQ+0xaa/0xc1
> >>>>>>> : [  454.006926]  [<ffffffff8183bc13>] ret_from_intr+0x0/0x11
> >>>>>>> : [  454.006928]<EOI>   [<ffffffff81026b25>] ? kvm_deferred_mmu_op+0x5e/0xe7
> >>>>>>> : [  454.006942]  [<ffffffff81026b19>] ? kvm_deferred_mmu_op+0x52/0xe7
> >>>>>>> : [  454.006946]  [<ffffffff81026c03>] kvm_mmu_write+0x2e/0x35
> >>>>>>> : [  454.006949]  [<ffffffff81026c7d>] kvm_set_pte_at+0x19/0x1b
> >>>>>>> : [  454.006953]  [<ffffffff810aba67>] __do_fault+0x3c4/0x492
> >>>>>>> : [  454.006957]  [<ffffffff810adcf4>] handle_mm_fault+0x478/0x9d8
> >>>>>>> : [  454.006966]  [<ffffffff810deb59>] ? path_put+0x2c/0x30
> >>>>>>> : [  454.006975]  [<ffffffff8102f162>] do_page_fault+0x2f6/0x31a
> >>>>>>> : [  454.006979]  [<ffffffff8183b81e>] ? _raw_spin_lock+0x9/0xd
> >>>>>>> : [  454.006982]  [<ffffffff8183bef5>] page_fault+0x25/0x30
> >>>>>>> : [  454.006985] Mem-Info:
> >>>>>>> : [  454.006987] Node 0 DMA per-cpu:
> >>>>>>> : [  454.006990] CPU    0: hi:    0, btch:   1 usd:   0
> >>>>>>> : [  454.006992] CPU    1: hi:    0, btch:   1 usd:   0
> >>>>>>> : [  454.006993] Node 0 DMA32 per-cpu:
> >>>>>>> : [  454.006996] CPU    0: hi:  186, btch:  31 usd: 185
> >>>>>>> : [  454.006998] CPU    1: hi:  186, btch:  31 usd: 112
> >>>>>>> : [  454.007003] active_anon:8308 inactive_anon:8544 isolated_anon:0
> >>>>>>> : [  454.007005]  active_file:4882 inactive_file:205902 isolated_file:0
> >>>>>>> : [  454.007006]  unevictable:0 dirty:11 writeback:0 unstable:0
> >>>>>>> : [  454.007007]  free:1385 slab_reclaimable:2445 slab_unreclaimable:4466
> >>>>>>> : [  454.007008]  mapped:1895 shmem:113 pagetables:1370 bounce:0
> >>>>>>> : [  454.007010] Node 0 DMA free:4000kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:11844kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15768kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:64kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> >>>>>>> : [  454.007021] lowmem_reserve[]: 0 994 994 994
> >>>>>>> : [  454.007025] Node 0 DMA32 free:1540kB min:4000kB low:5000kB high:6000kB active_anon:33232kB inactive_anon:34176kB active_file:19528kB inactive_file:811764kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1018068kB mlocked:0kB dirty:44kB writeback:0kB mapped:7580kB shmem:452kB slab_reclaimable:9716kB slab_unreclaimable:17832kB kernel_stack:1144kB pagetables:5480kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> >>>>>>> : [  454.007036] lowmem_reserve[]: 0 0 0 0
> >>>>>>> : [  454.007040] Node 0 DMA: 0*4kB 4*8kB 6*16kB 5*32kB 6*64kB 4*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 4000kB
> >>>>>>> : [  454.007050] Node 0 DMA32: 13*4kB 2*8kB 3*16kB 1*32kB 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1556kB
> >>>>>>> : [  454.007059] 210914 total pagecache pages
> >>>>>>> : [  454.007061] 0 pages in swap cache
> >>>>>>> : [  454.007063] Swap cache stats: add 0, delete 0, find 0/0
> >>>>>>> : [  454.007065] Free swap  = 1959924kB
> >>>>>>> : [  454.007067] Total swap = 1959924kB
> >>>>>>> : [  454.014238] 262140 pages RAM
> >>>>>>> : [  454.014241] 7489 pages reserved
> >>>>>>> : [  454.014242] 21430 pages shared
> >>>>>>> : [  454.014244] 247174 pages non-shared
> >>>>>>>
> >>>>>>> Either page reclaim got worse or kvm/virtio-net got more aggressive.
> >>>>>>>
> >>>>>>> Avi, Rusty: can you think of any changes in the KVM/virtio area in the
> >>>>>>> 2.6.30 ->  2.6.32 timeframe which may have increased the GFP_ATOMIC
> >>>>>>> demands upon the page allocator?
> >>>>>>>
> >>>>>>> Thanks.
> >>>>>>>    
> >>>>>>>       
> >>>>>>>           
> >>>>>>>               
> >>>>> On the contrary, with commit
> >>>>> 3161e453e496eb5643faad30fff5a5ab183da0fe
> >>>>> we should be using GFP_ATOMIC less.
> >>>>> But maybe there's a bug and it has the reverse effect somehow ...
> >>>>>
> >>>>> Robert, could you pls try 3161e453e496eb5643faad30fff5a5ab183da0fe
> >>>>> and if that *does* have the problem,
> >>>>> 0b4f2928f14c4a9770b0866923fc81beb7f4aa57?
> >>>>>
> >>>>>   
> >>>>>       
> >>>>>           


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-12 13:52                 ` Michael S. Tsirkin
@ 2010-04-13  8:51                   ` Robert Wimmer
  2010-04-19 12:55                     ` Robert Wimmer
  0 siblings, 1 reply; 62+ messages in thread
From: Robert Wimmer @ 2010-04-13  8:51 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
	bugme-daemon, Rusty Russell, Mel Gorman

I've tried to do my very best. In general I can
say: all 2.6.30 versions work, all 2.6.31 versions fail. 2.6.31-rc3
fails with a "soft lockup" and is the only one which
doesn't show any "swapper page allocation failure".
But the end result is the same. 2.6.31-rc4
doesn't show "soft lockups" but does show "swapper page allocation failure".
Here is the dmesg output for 2.6.31-rc3:
https://bugzilla.kernel.org/attachment.cgi?id=25986

So here is what I've done. Started with a fresh tree
and my 2.6.30 .config:

rm -fr /usr/src/linux
cd /usr/src
git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
cd linux
git checkout 0b4f2928f14c4a9770b0866923fc81beb7f4aa57

Here is the "git bisect log" output:

# bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
# good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
git bisect start 'v2.6.31' 'v2.6.30' '--' 'drivers/virtio/' 'drivers/net/virtio_net.c'
# good: [e3353853730eb99c56b7b0aed1667d51c0e3699a] virtio: enhance id_matching for virtio drivers
git bisect good e3353853730eb99c56b7b0aed1667d51c0e3699a
# good: [9cbc1cb8cd46ce1f7645b9de249b2ce8460129bb] Merge branch 'master' of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
git bisect good 9cbc1cb8cd46ce1f7645b9de249b2ce8460129bb
# bad: [ff52c3fc7188855ede75d87b022271f0da309e5b] virtio: fix memory leak on device removal
git bisect bad ff52c3fc7188855ede75d87b022271f0da309e5b
# good: [31278e71471399beaff9280737e52b47db4dc345] net: group address list and its count
git bisect good 31278e71471399beaff9280737e52b47db4dc345
# bad: [4b892e6582e3a4fe01f623aea386907270d5bf83] virtio-pci: correctly unregister root device on error
git bisect bad 4b892e6582e3a4fe01f623aea386907270d5bf83

Hopefully this gives you some hints. The problem
for me is that I don't know which commits I should
consider good or bad. Should I consider the
commit with the "soft lockup" good because it
doesn't show the allocation failure? Currently it is
marked as bad (4b892e6582e3a4fe01f623aea386907270d5bf83).
What should I do next?

Thanks!
Robert

On 04/12/10 15:52, Michael S. Tsirkin wrote:
> On Mon, Apr 12, 2010 at 03:50:31PM +0200, Robert Wimmer wrote:
>   
>> Sorry but I need some more git help. Here is what I've done.
>> Started with a fresh clone of the kernel:
>>
>> cd /usr/src
>> git clone
>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
>> cd linux
>> git checkout 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
>>
>> Since I already knew that this commit wasn't good I did
>>
>> git bisect start
>> git bisect bad
>>     
> I think what you miss is marking the good commit.
> bisect does a binary search but it needs to know
> both good and bad commits to search in the range.
>
> Optionally, you can use '-- drivers/virtio/ drivers/net/virtio_net.c'
> what this does is limit bisect to commits that touch files in
> question. This way you get much less tests to run
> (about 4) but after you find a first problematic commit
> you must verify that a commit just before it does not have the issue.
>
> If this turns out not to be the case, you'll have to
> fallback on full bisect, and we will now this is some
> other change in kernel that triggered the regression.
>
>
>   
>> compiled and started over. As expected the problem returns.
>> So I've done another
>>
>> git bisect bad
>>
>> but I always get the same commit:
>>
>> kabul:/usr/src/linux # git bisect log
>> git bisect start
>> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix
>> compilation on SMP
>> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
>> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix
>> compilation on SMP
>> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
>> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix
>> compilation on SMP
>> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
>>
>> I've expected that after each "git bisect bad" I get the previous
>> commit before the "bad" one. How can get the previous commit?
>> The bisect documentation couldn't help me.
>>
>> Thanks!
>> Robert
>>
>>
>>
>> On 04/12/10 13:23, Michael S. Tsirkin wrote:
>>     
>>> On Mon, Apr 12, 2010 at 11:25:26AM +0200, Robert Wimmer wrote:
>>>   
>>>       
>>>> server10:/usr/src/linux # git bisect start v2.6.31 v2.6.30 --
>>>> drivers/virtio/ drivers/net/virtio_net.c
>>>> Bisecting: 12 revisions left to test after this (roughly 4 steps)
>>>> [e3353853730eb99c56b7b0aed1667d51c0e3699a] virtio: enhance id_matching
>>>> for virtio drivers
>>>>
>>>>     
>>>>         
>>> Sorry I wasn't clear. the way to use bisect is as follows:
>>> - first start as you did now.
>>> 1. now build kernel, install and test
>>> 2. if bug is there, type 'git bisect bad'
>>> 3. if bug is not there, type 'git bisect good'
>>> 4. The above will give you another kernel version to test
>>>    if so go back to step 1
>>> 6. this will be repeated about 4 times (number of steps above)
>>> 7. in the end you will get the first revision which has the
>>>    problem. Let's assume it is revision ABCDEF.
>>>
>>>    Type git bisect log to see your history.
>>>
>>> 8. Now git reset --hard ABCDEF~1 and try again.
>>>
>>> If you see the problem with ABCDEF but not ABCDEF~1
>>> then we will have a good guess at the culprit.
>>>
>>> Some more tips here:
>>> http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html
>>>
>>>
>>>   
>>>       
>>>> Today I've upgraded to qemu-kvm-0.12.3-r1 (Gentoo package)
>>>> but doesn't help. Still getting "page allocation failure" with
>>>> 2.6.31-rc5.
>>>>
>>>> Does it makes sense to use the same 2.6.31-rc5 kernel
>>>> in the host and guest for testing? Currently I'm still using 2.6.32
>>>> in host and testing 2.6.31-rc5 in guest until "crashes".
>>>> Then I start the guest with 2.6.30 again which works
>>>> without trouble with 2.6.32 as host.
>>>>
>>>> This is really strange. I have hosts with 2.6.32 running
>>>> guests with 2.6.32 which works perfectly. These hosts
>>>> and guests running on HP DL 380 G6 with Intel Xeon X5560.
>>>> The guests which don't work with 2.6.32 (and 2.6.32
>>>> as host) running on HP DL 380 G5 with Intel Xeon L5420.
>>>>     
>>>>         
>>> Hmm. Some subtle race?
>>>
>>>   
>>>       
>>>> (All guests) and (all hosts) have the same packages
>>>> and the same versions installed and the same kernel
>>>> configs (hosts and guests using different .config but the
>>>> difference is very small e.g. CONFIG_PARAVIRT_SPINLOCKS=y,
>>>> CONFIG_PARAVIRT_GUEST=y in guests but not in hosts
>>>> .config).
>>>>
>>>> I've had problems with qemu-kvm 0.12.2 with high network
>>>> traffic which was solved by a patch submitted by Tom
>>>> Lendacky:
>>>>
>>>> "Fix a race condition where qemu finds that there are not enough virtio
>>>> ring buffers available and the guest make more buffers available before
>>>> qemu can enable notifications."
>>>> http://www.mail-archive.com/kvm@vger.kernel.org/msg28667.html
>>>>
>>>> It was a real lifesaver for the HP DL 380 G6 mentioned
>>>> above but maybe this is now causing the problems with the G5 machines.
>>>> The symptoms are the same. I can still log into the guest
>>>> via VNC but the network is down.
>>>>
>>>> Thanks!
>>>> Robert
>>>>
>>>>     
>>>>         
>>> For now the only thing we seem to know for sure is that on
>>> specific hardware there's a regression between 2.6.30 and
>>> 2.6.31-rc5. Yes, it is possible that all it does
>>> is expose a qemu bug, but it's hard to say.
>>> Let's find out what change
>>> does that, this should give us a hint.
>>>
>>>   
>>>       
>>>> On 04/11/10 13:03, Michael S. Tsirkin wrote:
>>>>     
>>>>         
>>>>> On Fri, Apr 09, 2010 at 12:15:01PM +0200, Robert Wimmer wrote:
>>>>>   
>>>>>       
>>>>>           
>>>>>> I'm not really a git hero so here is what I've done:
>>>>>>
>>>>>> cd /usr/src
>>>>>> git clone
>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
>>>>>> cd linux
>>>>>> git checkout -b mykernel 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
>>>>>>     
>>>>>>         
>>>>>>             
>>>>> Looks right.
>>>>>
>>>>>   
>>>>>       
>>>>>           
>>>>>> Then I've checked
>>>>>>
>>>>>> drivers/net/virtio_net.c
>>>>>> drivers/net/smc91x.c
>>>>>>
>>>>>> if the changes commited where not in there.
>>>>>> Next I build my kernel as usual. I used my .config
>>>>>> from 2.6.30 (which is working fine in a several
>>>>>> guests / .config see here:
>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=25925)
>>>>>> and build the kernel
>>>>>>
>>>>>> genkernel --menuconfig --lvm --oldconfig all
>>>>>>
>>>>>> which finally gave me a 2.6.31-rc5.
>>>>>>     
>>>>>>         
>>>>>>             
>>>>> That's right.
>>>>>
>>>>>   
>>>>>       
>>>>>           
>>>>>> I should mention
>>>>>> that 2.6.30 was using SLUB. So here is the output
>>>>>> from the 2.6.31-rc5 kernel running about 20 min.:
>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=25926
>>>>>>     
>>>>>>         
>>>>>>             
>>>>> Hmm, so we see the error here as well?
>>>>>
>>>>>   
>>>>>       
>>>>>           
>>>>>> Seems not very usefull to me. I'm currently compiling
>>>>>> the same kernel with SLAB.
>>>>>>
>>>>>> Please let me know if the git commands above are
>>>>>> right and/or if you need other kernel options enabled.
>>>>>>     
>>>>>>         
>>>>>>             
>>>>> Looks right. You don't have to add -b flag if you don't
>>>>> want to.
>>>>>
>>>>>   
>>>>>       
>>>>>           
>>>>>> Thanks!
>>>>>> Robert
>>>>>>     
>>>>>>         
>>>>>>             
>>>>> Hmm, I do not see anything else that seems related.
>>>>> Could you please try to bisect?
>>>>>
>>>>> git bisect start v2.6.31 v2.6.30 -- drivers/virtio/ drivers/net/virtio_net.c
>>>>>
>>>>> should help assuming the change that triggers this is in virtio.
>>>>>
>>>>>
>>>>>   
>>>>>       
>>>>>           
>>>>>> On 04/08/10 22:04, Michael S. Tsirkin wrote:
>>>>>>     
>>>>>>         
>>>>>>             
>>>>>>> On Thu, Apr 08, 2010 at 10:39:34PM +0300, Avi Kivity wrote:
>>>>>>>   
>>>>>>>       
>>>>>>>           
>>>>>>>               
>>>>>>>> cc: mst
>>>>>>>>
>>>>>>>> On 04/08/2010 10:34 PM, Andrew Morton wrote:
>>>>>>>>     
>>>>>>>>         
>>>>>>>>             
>>>>>>>>                 
>>>>>>>>> (switched to email.  Please respond via emailed reply-to-all, not via the
>>>>>>>>> bugzilla web interface).
>>>>>>>>>
>>>>>>>>> On Wed, 7 Apr 2010 10:29:20 GMT
>>>>>>>>> bugzilla-daemon@bugzilla.kernel.org wrote:
>>>>>>>>>
>>>>>>>>>    
>>>>>>>>>       
>>>>>>>>>           
>>>>>>>>>               
>>>>>>>>>                   
>>>>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=15709
>>>>>>>>>>
>>>>>>>>>>             Summary: swapper page allocation failure
>>>>>>>>>>             Product: Memory Management
>>>>>>>>>>             Version: 2.5
>>>>>>>>>>      Kernel Version: 2.6.32 and 2.6.33
>>>>>>>>>>            Platform: All
>>>>>>>>>>          OS/Version: Linux
>>>>>>>>>>                Tree: Mainline
>>>>>>>>>>              Status: NEW
>>>>>>>>>>            Severity: normal
>>>>>>>>>>            Priority: P1
>>>>>>>>>>           Component: Slab Allocator
>>>>>>>>>>          AssignedTo: akpm@linux-foundation.org
>>>>>>>>>>          ReportedBy: kernel@tauceti.net
>>>>>>>>>>          Regression: No
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Created an attachment (id=25903)
>>>>>>>>>>   -->  (https://bugzilla.kernel.org/attachment.cgi?id=25903)
>>>>>>>>>> dmesg output
>>>>>>>>>>
>>>>>>>>>> I'm having problems with "swapper page allocation failure's" since upgrading
>>>>>>>>>> from kernel 2.6.30 to 2.6.32/2.6.33. The problems occur inside a kernel virtual
>>>>>>>>>> maschine (KVM). Running Gentoo with kernel 2.6.32 as host which works fine. As
>>>>>>>>>> long as kernel 2.6.30 is used as guest kernel the guest runs fine. But after
>>>>>>>>>> upgrading to 2.6.32 and 2.6.33 I get "swapper page allocation failure's" (see
>>>>>>>>>> attachment of dmesg output). The guest is only running a Apache webserver and
>>>>>>>>>> serves files from a NFS share. It has 1 GB RAM and 2 virtual CPUs. I've tried
>>>>>>>>>> different kernel configurations (e.g. a unmodified version from Sabayon Linux
>>>>>>>>>> Distribution) but doesn't help. Load of the guest (and host) is very low.
>>>>>>>>>> Network traffic is about 20-50 MBit/s.
>>>>>>>>>>
>>>>>>>>>>      
>>>>>>>>>>         
>>>>>>>>>>             
>>>>>>>>>>                 
>>>>>>>>>>                     
>>>>>>>>> hm, this is a regression.
>>>>>>>>>
>>>>>>>>> : [  454.006706] users: page allocation failure. order:0, mode:0x20
>>>>>>>>> : [  454.006712] Pid: 7992, comm: users Not tainted 2.6.34-rc3-git6 #2
>>>>>>>>> : [  454.006714] Call Trace:
>>>>>>>>> : [  454.006717]<IRQ>   [<ffffffff8109dff7>] __alloc_pages_nodemask+0x5c8/0x615
>>>>>>>>> : [  454.006796]  [<ffffffff817860ce>] ? ip_local_deliver+0x65/0x6d
>>>>>>>>> : [  454.006820]  [<ffffffff810c39c4>] alloc_pages_current+0x96/0x9f
>>>>>>>>> : [  454.006842]  [<ffffffff8167f2c7>] try_fill_recv+0x5e/0x20f
>>>>>>>>> : [  454.006846]  [<ffffffff8167fe13>] virtnet_poll+0x52a/0x5c7
>>>>>>>>> : [  454.006858]  [<ffffffff8104fe74>] ? run_timer_softirq+0x1dc/0x1f4
>>>>>>>>> : [  454.006873]  [<ffffffff8176035d>] net_rx_action+0xad/0x1a5
>>>>>>>>> : [  454.006882]  [<ffffffff8104b6cd>] __do_softirq+0x9c/0x127
>>>>>>>>> : [  454.006897]  [<ffffffff81008ffc>] call_softirq+0x1c/0x30
>>>>>>>>> : [  454.006901]  [<ffffffff8100af01>] do_softirq+0x41/0x7e
>>>>>>>>> : [  454.006904]  [<ffffffff8104b3e3>] irq_exit+0x36/0x75
>>>>>>>>> : [  454.006907]  [<ffffffff8100a5ee>] do_IRQ+0xaa/0xc1
>>>>>>>>> : [  454.006926]  [<ffffffff8183bc13>] ret_from_intr+0x0/0x11
>>>>>>>>> : [  454.006928]<EOI>   [<ffffffff81026b25>] ? kvm_deferred_mmu_op+0x5e/0xe7
>>>>>>>>> : [  454.006942]  [<ffffffff81026b19>] ? kvm_deferred_mmu_op+0x52/0xe7
>>>>>>>>> : [  454.006946]  [<ffffffff81026c03>] kvm_mmu_write+0x2e/0x35
>>>>>>>>> : [  454.006949]  [<ffffffff81026c7d>] kvm_set_pte_at+0x19/0x1b
>>>>>>>>> : [  454.006953]  [<ffffffff810aba67>] __do_fault+0x3c4/0x492
>>>>>>>>> : [  454.006957]  [<ffffffff810adcf4>] handle_mm_fault+0x478/0x9d8
>>>>>>>>> : [  454.006966]  [<ffffffff810deb59>] ? path_put+0x2c/0x30
>>>>>>>>> : [  454.006975]  [<ffffffff8102f162>] do_page_fault+0x2f6/0x31a
>>>>>>>>> : [  454.006979]  [<ffffffff8183b81e>] ? _raw_spin_lock+0x9/0xd
>>>>>>>>> : [  454.006982]  [<ffffffff8183bef5>] page_fault+0x25/0x30
>>>>>>>>> : [  454.006985] Mem-Info:
>>>>>>>>> : [  454.006987] Node 0 DMA per-cpu:
>>>>>>>>> : [  454.006990] CPU    0: hi:    0, btch:   1 usd:   0
>>>>>>>>> : [  454.006992] CPU    1: hi:    0, btch:   1 usd:   0
>>>>>>>>> : [  454.006993] Node 0 DMA32 per-cpu:
>>>>>>>>> : [  454.006996] CPU    0: hi:  186, btch:  31 usd: 185
>>>>>>>>> : [  454.006998] CPU    1: hi:  186, btch:  31 usd: 112
>>>>>>>>> : [  454.007003] active_anon:8308 inactive_anon:8544 isolated_anon:0
>>>>>>>>> : [  454.007005]  active_file:4882 inactive_file:205902 isolated_file:0
>>>>>>>>> : [  454.007006]  unevictable:0 dirty:11 writeback:0 unstable:0
>>>>>>>>> : [  454.007007]  free:1385 slab_reclaimable:2445 slab_unreclaimable:4466
>>>>>>>>> : [  454.007008]  mapped:1895 shmem:113 pagetables:1370 bounce:0
>>>>>>>>> : [  454.007010] Node 0 DMA free:4000kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:11844kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15768kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:64kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>>>>>>>>> : [  454.007021] lowmem_reserve[]: 0 994 994 994
>>>>>>>>> : [  454.007025] Node 0 DMA32 free:1540kB min:4000kB low:5000kB high:6000kB active_anon:33232kB inactive_anon:34176kB active_file:19528kB inactive_file:811764kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1018068kB mlocked:0kB dirty:44kB writeback:0kB mapped:7580kB shmem:452kB slab_reclaimable:9716kB slab_unreclaimable:17832kB kernel_stack:1144kB pagetables:5480kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>>>>>>>>> : [  454.007036] lowmem_reserve[]: 0 0 0 0
>>>>>>>>> : [  454.007040] Node 0 DMA: 0*4kB 4*8kB 6*16kB 5*32kB 6*64kB 4*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 4000kB
>>>>>>>>> : [  454.007050] Node 0 DMA32: 13*4kB 2*8kB 3*16kB 1*32kB 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1556kB
>>>>>>>>> : [  454.007059] 210914 total pagecache pages
>>>>>>>>> : [  454.007061] 0 pages in swap cache
>>>>>>>>> : [  454.007063] Swap cache stats: add 0, delete 0, find 0/0
>>>>>>>>> : [  454.007065] Free swap  = 1959924kB
>>>>>>>>> : [  454.007067] Total swap = 1959924kB
>>>>>>>>> : [  454.014238] 262140 pages RAM
>>>>>>>>> : [  454.014241] 7489 pages reserved
>>>>>>>>> : [  454.014242] 21430 pages shared
>>>>>>>>> : [  454.014244] 247174 pages non-shared
>>>>>>>>>
>>>>>>>>> Either page reclaim got worse or kvm/virtio-net got more aggressive.
>>>>>>>>>
>>>>>>>>> Avi, Rusty: can you think of any changes in the KVM/virtio area in the
>>>>>>>>> 2.6.30 ->  2.6.32 timeframe which may have increased the GFP_ATOMIC
>>>>>>>>> demands upon the page allocator?
>>>>>>>>>
>>>>>>>>> Thanks.
>>>>>>>>>    
>>>>>>>>>       
>>>>>>>>>           
>>>>>>>>>               
>>>>>>>>>                   
>>>>>>> On the contrary, with commit
>>>>>>> 3161e453e496eb5643faad30fff5a5ab183da0fe
>>>>>>> we should be using GFP_ATOMIC less.
>>>>>>> But maybe there's a bug and it has the reverse effect somehow ...
>>>>>>>
>>>>>>> Robert, could you pls try 3161e453e496eb5643faad30fff5a5ab183da0fe
>>>>>>> and if that *does* have the problem,
>>>>>>> 0b4f2928f14c4a9770b0866923fc81beb7f4aa57?
>>>>>>>
>>>>>>>   
>>>>>>>       
>>>>>>>           
>>>>>>>               

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-13  8:51                   ` Robert Wimmer
@ 2010-04-19 12:55                     ` Robert Wimmer
  2010-04-19 13:17                       ` Michael S. Tsirkin
  0 siblings, 1 reply; 62+ messages in thread
From: Robert Wimmer @ 2010-04-19 12:55 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
	Rusty Russell, Mel Gorman

Is there a way to track this down further?
For a few weeks I've had problems on two other KVM guests
which I think are related to this. The hosts for
these guests run kernel 2.6.32. Until today the guests were
also running 2.6.32. Inside the guests we're using GlusterFS,
NFSv4 and Apache with PHP. From time to time the
httpd processes "hang". When this happens
we see a lot of soft lockups. These
hosts are running Xeon X5560 processors. Until
today I suspected that these problems only happen
on older Xeons, but this doesn't seem to be true.
I've attached the output from /var/log/messages
(https://bugzilla.kernel.org/attachment.cgi?id=26048)
from one of the hosts with GlusterFS. I've now
downgraded to kernel 2.6.30 in the guests. But since
this problem also exists in 2.6.34-rc3 I suspect that
we will never be able to do a kernel update
in the guests when they're using NFS :-(

But what I can definitely say is that all the problems
only happen with guests running kernel >= 2.6.31
and with a remote file system (NFS, GlusterFS). Some
days ago another KVM guest had a network shutdown using
kernel 2.6.32 in host and guest + NFSv4. But this has only
happened once so far, and there isn't much
traffic running through the interfaces of that host.

All other guests with kernel 2.6.30 (about 80 guests on
18 hosts) with NFS and KVM 0.12.3 are really running
perfectly.

Thanks!
Robert



On 04/13/10 10:51, Robert Wimmer wrote:
> I've tried to do my very best. In general I can
> say: All 2.6.30 versions work, all 2.6.31 fail. 2.6.31-rc3
> fails with "soft lockup" and is the only one which
> don't show any "swapper page allocation failure".
> But the result is finally the same... 2.6.31-rc4
> don't show "soft lockups" but "swapper page allocation failure".
> Here is the dmesg output for 2.6.31-rc3:
> https://bugzilla.kernel.org/attachment.cgi?id=25986
>
> So here is what I've done. Started with a fresh tree
> and my 2.6.30 .config:
>
> rm -fr /usr/src/linux
> cd /usr/src
> git clone
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
> cd linux
> git checkout 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
>
> Here is the "git bisect log" output:
>
> # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
> # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
> git bisect start 'v2.6.31' 'v2.6.30' '--' 'drivers/virtio/'
> 'drivers/net/virtio_net.c'
> # good: [e3353853730eb99c56b7b0aed1667d51c0e3699a] virtio: enhance
> id_matching for virtio drivers
> git bisect good e3353853730eb99c56b7b0aed1667d51c0e3699a
> # good: [9cbc1cb8cd46ce1f7645b9de249b2ce8460129bb] Merge branch 'master'
> of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
> git bisect good 9cbc1cb8cd46ce1f7645b9de249b2ce8460129bb
> # bad: [ff52c3fc7188855ede75d87b022271f0da309e5b] virtio: fix memory
> leak on device removal
> git bisect bad ff52c3fc7188855ede75d87b022271f0da309e5b
> # good: [31278e71471399beaff9280737e52b47db4dc345] net: group address
> list and its count
> git bisect good 31278e71471399beaff9280737e52b47db4dc345
> # bad: [4b892e6582e3a4fe01f623aea386907270d5bf83] virtio-pci: correctly
> unregister root device on error
> git bisect bad 4b892e6582e3a4fe01f623aea386907270d5bf83
>
> Hopefully this gives you some hints. The problem
> for me is that I don't know what commit I should
> consider good or bad. Should I consider the
> commit with the "soft lockup" as good because it
> don't show the allocation failure? Currently it is
> marked as bad (4b892e6582e3a4fe01f623aea386907270d5bf83).
> What should I do next?
>
> Thanks!
> Robert
>
> On 04/12/10 15:52, Michael S. Tsirkin wrote:
>   
>> On Mon, Apr 12, 2010 at 03:50:31PM +0200, Robert Wimmer wrote:
>>   
>>     
>>> Sorry but I need some more git help. Here is what I've done.
>>> Started with a fresh clone of the kernel:
>>>
>>> cd /usr/src
>>> git clone
>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
>>> cd linux
>>> git checkout 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
>>>
>>> Since I already knew that this commit wasn't good I did
>>>
>>> git bisect start
>>> git bisect bad
>>>     
>>>       
>> I think what you miss is marking the good commit.
>> bisect does a binary search but it needs to know
>> both good and bad commits to search in the range.
>>
>> Optionally, you can use '-- drivers/virtio/ drivers/net/virtio_net.c'
>> what this does is limit bisect to commits that touch files in
>> question. This way you get much less tests to run
>> (about 4) but after you find a first problematic commit
>> you must verify that a commit just before it does not have the issue.
>>
>> If this turns out not to be the case, you'll have to
>> fallback on full bisect, and we will now this is some
>> other change in kernel that triggered the regression.
>>
>>
>>   
>>     
>>> compiled and started over. As expected the problem returns.
>>> So I've done another
>>>
>>> git bisect bad
>>>
>>> but I always get the same commit:
>>>
>>> kabul:/usr/src/linux # git bisect log
>>> git bisect start
>>> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix
>>> compilation on SMP
>>> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
>>> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix
>>> compilation on SMP
>>> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
>>> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix
>>> compilation on SMP
>>> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
>>>
>>> I've expected that after each "git bisect bad" I get the previous
>>> commit before the "bad" one. How can get the previous commit?
>>> The bisect documentation couldn't help me.
>>>
>>> Thanks!
>>> Robert
>>>
>>>
>>>
>>> On 04/12/10 13:23, Michael S. Tsirkin wrote:
>>>     
>>>       
>>>> On Mon, Apr 12, 2010 at 11:25:26AM +0200, Robert Wimmer wrote:
>>>>   
>>>>       
>>>>         
>>>>> server10:/usr/src/linux # git bisect start v2.6.31 v2.6.30 --
>>>>> drivers/virtio/ drivers/net/virtio_net.c
>>>>> Bisecting: 12 revisions left to test after this (roughly 4 steps)
>>>>> [e3353853730eb99c56b7b0aed1667d51c0e3699a] virtio: enhance id_matching
>>>>> for virtio drivers
>>>>>
>>>>>     
>>>>>         
>>>>>           
>>>> Sorry I wasn't clear. the way to use bisect is as follows:
>>>> - first start as you did now.
>>>> 1. now build kernel, install and test
>>>> 2. if bug is there, type 'git bisect bad'
>>>> 3. if bug is not there, type 'git bisect good'
>>>> 4. The above will give you another kernel version to test
>>>>    if so go back to step 1
>>>> 6. this will be repeated about 4 times (number of steps above)
>>>> 7. in the end you will get the first revision which has the
>>>>    problem. Let's assume it is revision ABCDEF.
>>>>
>>>>    Type git bisect log to see your history.
>>>>
>>>> 8. Now git reset --hard ABCDEF~1 and try again.
>>>>
>>>> If you see the problem with ABCDEF but not ABCDEF~1
>>>> then we will have a good guess at the culprit.
>>>>
>>>> Some more tips here:
>>>> http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html
>>>>
>>>>
>>>>   
>>>>       
>>>>         
>>>>> Today I've upgraded to qemu-kvm-0.12.3-r1 (Gentoo package)
>>>>> but doesn't help. Still getting "page allocation failure" with
>>>>> 2.6.31-rc5.
>>>>>
>>>>> Does it makes sense to use the same 2.6.31-rc5 kernel
>>>>> in the host and guest for testing? Currently I'm still using 2.6.32
>>>>> in host and testing 2.6.31-rc5 in guest until "crashes".
>>>>> Then I start the guest with 2.6.30 again which works
>>>>> without trouble with 2.6.32 as host.
>>>>>
>>>>> This is really strange. I have hosts with 2.6.32 running
>>>>> guests with 2.6.32 which works perfectly. These hosts
>>>>> and guests running on HP DL 380 G6 with Intel Xeon X5560.
>>>>> The guests which don't work with 2.6.32 (and 2.6.32
>>>>> as host) running on HP DL 380 G5 with Intel Xeon L5420.
>>>>>     
>>>>>         
>>>>>           
>>>> Hmm. Some subtle race?
>>>>
>>>>   
>>>>       
>>>>         
>>>>> (All guests) and (all hosts) have the same packages
>>>>> and the same versions installed and the same kernel
>>>>> configs (hosts and guests using different .config but the
>>>>> difference is very small e.g. CONFIG_PARAVIRT_SPINLOCKS=y,
>>>>> CONFIG_PARAVIRT_GUEST=y in guests but not in hosts
>>>>> .config).
>>>>>
>>>>> I've had problems with qemu-kvm 0.12.2 with high network
>>>>> traffic which was solved by a patch submitted by Tom
>>>>> Lendacky:
>>>>>
>>>>> "Fix a race condition where qemu finds that there are not enough virtio
>>>>> ring buffers available and the guest make more buffers available before
>>>>> qemu can enable notifications."
>>>>> http://www.mail-archive.com/kvm@vger.kernel.org/msg28667.html
>>>>>
>>>>> It was a real lifesaver for the HP DL 380 G6 mentioned
>>>>> above but maybe this is now causing the problems with the G5 machines.
>>>>> The symptoms are the same. I can still log into the guest
>>>>> via VNC but the network is down.
>>>>>
>>>>> Thanks!
>>>>> Robert
>>>>>
>>>>>     
>>>>>         
>>>>>           
>>>> For now the only thing we seem to know for sure is that on
>>>> specific hardware there's a regression between 2.6.30 and
>>>> 2.6.31-rc5. Yes, it is possible that all it does
>>>> is expose a qemu bug, but it's hard to say.
>>>> Let's find out what change
>>>> does that, this should give us a hint.
>>>>
>>>>   
>>>>       
>>>>         
>>>>> On 04/11/10 13:03, Michael S. Tsirkin wrote:
>>>>>     
>>>>>         
>>>>>           
>>>>>> On Fri, Apr 09, 2010 at 12:15:01PM +0200, Robert Wimmer wrote:
>>>>>>   
>>>>>>       
>>>>>>           
>>>>>>             
>>>>>>> I'm not really a git hero so here is what I've done:
>>>>>>>
>>>>>>> cd /usr/src
>>>>>>> git clone
>>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
>>>>>>> cd linux
>>>>>>> git checkout -b mykernel 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
>>>>>>>     
>>>>>>>         
>>>>>>>             
>>>>>>>               
>>>>>> Looks right.
>>>>>>
>>>>>>   
>>>>>>       
>>>>>>           
>>>>>>             
>>>>>>> Then I've checked
>>>>>>>
>>>>>>> drivers/net/virtio_net.c
>>>>>>> drivers/net/smc91x.c
>>>>>>>
>>>>>>> if the changes commited where not in there.
>>>>>>> Next I build my kernel as usual. I used my .config
>>>>>>> from 2.6.30 (which is working fine in a several
>>>>>>> guests / .config see here:
>>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=25925)
>>>>>>> and build the kernel
>>>>>>>
>>>>>>> genkernel --menuconfig --lvm --oldconfig all
>>>>>>>
>>>>>>> which finally gave me a 2.6.31-rc5.
>>>>>>>     
>>>>>>>         
>>>>>>>             
>>>>>>>               
>>>>>> That's right.
>>>>>>
>>>>>>   
>>>>>>       
>>>>>>           
>>>>>>             
>>>>>>> I should mention
>>>>>>> that 2.6.30 was using SLUB. So here is the output
>>>>>>> from the 2.6.31-rc5 kernel running about 20 min.:
>>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=25926
>>>>>>>     
>>>>>>>         
>>>>>>>             
>>>>>>>               
>>>>>> Hmm, so we see the error here as well?
>>>>>>
>>>>>>   
>>>>>>       
>>>>>>           
>>>>>>             
>>>>>>> Seems not very usefull to me. I'm currently compiling
>>>>>>> the same kernel with SLAB.
>>>>>>>
>>>>>>> Please let me know if the git commands above are
>>>>>>> right and/or if you need other kernel options enabled.
>>>>>>>     
>>>>>>>         
>>>>>>>             
>>>>>>>               
>>>>>> Looks right. You don't have to add -b flag if you don't
>>>>>> want to.
>>>>>>
>>>>>>   
>>>>>>       
>>>>>>           
>>>>>>             
>>>>>>> Thanks!
>>>>>>> Robert
>>>>>>>     
>>>>>>>         
>>>>>>>             
>>>>>>>               
>>>>>> Hmm, I do not see anything else that seems related.
>>>>>> Could you please try to bisect?
>>>>>>
>>>>>> git bisect start v2.6.31 v2.6.30 -- drivers/virtio/ drivers/net/virtio_net.c
>>>>>>
>>>>>> should help assuming the change that triggers this is in virtio.
>>>>>>
>>>>>>
>>>>>>   
>>>>>>       
>>>>>>           
>>>>>>             
>>>>>>> On 04/08/10 22:04, Michael S. Tsirkin wrote:
>>>>>>>     
>>>>>>>         
>>>>>>>             
>>>>>>>               
>>>>>>>> On Thu, Apr 08, 2010 at 10:39:34PM +0300, Avi Kivity wrote:
>>>>>>>>   
>>>>>>>>       
>>>>>>>>           
>>>>>>>>               
>>>>>>>>                 
>>>>>>>>> cc: mst
>>>>>>>>>
>>>>>>>>> On 04/08/2010 10:34 PM, Andrew Morton wrote:
>>>>>>>>>     
>>>>>>>>>         
>>>>>>>>>             
>>>>>>>>>                 
>>>>>>>>>                   
>>>>>>>>>> (switched to email.  Please respond via emailed reply-to-all, not via the
>>>>>>>>>> bugzilla web interface).
>>>>>>>>>>
>>>>>>>>>> On Wed, 7 Apr 2010 10:29:20 GMT
>>>>>>>>>> bugzilla-daemon@bugzilla.kernel.org wrote:
>>>>>>>>>>
>>>>>>>>>>    
>>>>>>>>>>       
>>>>>>>>>>           
>>>>>>>>>>               
>>>>>>>>>>                   
>>>>>>>>>>                     
>>>>>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=15709
>>>>>>>>>>>
>>>>>>>>>>>             Summary: swapper page allocation failure
>>>>>>>>>>>             Product: Memory Management
>>>>>>>>>>>             Version: 2.5
>>>>>>>>>>>      Kernel Version: 2.6.32 and 2.6.33
>>>>>>>>>>>            Platform: All
>>>>>>>>>>>          OS/Version: Linux
>>>>>>>>>>>                Tree: Mainline
>>>>>>>>>>>              Status: NEW
>>>>>>>>>>>            Severity: normal
>>>>>>>>>>>            Priority: P1
>>>>>>>>>>>           Component: Slab Allocator
>>>>>>>>>>>          AssignedTo: akpm@linux-foundation.org
>>>>>>>>>>>          ReportedBy: kernel@tauceti.net
>>>>>>>>>>>          Regression: No
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> Created an attachment (id=25903)
>>>>>>>>>>>   -->  (https://bugzilla.kernel.org/attachment.cgi?id=25903)
>>>>>>>>>>> dmesg output
>>>>>>>>>>>
>>>>>>>>>>> I'm having problems with "swapper page allocation failure's" since upgrading
>>>>>>>>>>> from kernel 2.6.30 to 2.6.32/2.6.33. The problems occur inside a kernel virtual
>>>>>>>>>>> maschine (KVM). Running Gentoo with kernel 2.6.32 as host which works fine. As
>>>>>>>>>>> long as kernel 2.6.30 is used as guest kernel the guest runs fine. But after
>>>>>>>>>>> upgrading to 2.6.32 and 2.6.33 I get "swapper page allocation failure's" (see
>>>>>>>>>>> attachment of dmesg output). The guest is only running a Apache webserver and
>>>>>>>>>>> serves files from a NFS share. It has 1 GB RAM and 2 virtual CPUs. I've tried
>>>>>>>>>>> different kernel configurations (e.g. a unmodified version from Sabayon Linux
>>>>>>>>>>> Distribution) but doesn't help. Load of the guest (and host) is very low.
>>>>>>>>>>> Network traffic is about 20-50 MBit/s.
>>>>>>>>>>>
>>>>>>>>>>>      
>>>>>>>>>>>         
>>>>>>>>>>>             
>>>>>>>>>>>                 
>>>>>>>>>>>                     
>>>>>>>>>>>                       
>>>>>>>>>> hm, this is a regression.
>>>>>>>>>>
>>>>>>>>>> : [  454.006706] users: page allocation failure. order:0, mode:0x20
>>>>>>>>>> : [  454.006712] Pid: 7992, comm: users Not tainted 2.6.34-rc3-git6 #2
>>>>>>>>>> : [  454.006714] Call Trace:
>>>>>>>>>> : [  454.006717]<IRQ>   [<ffffffff8109dff7>] __alloc_pages_nodemask+0x5c8/0x615
>>>>>>>>>> : [  454.006796]  [<ffffffff817860ce>] ? ip_local_deliver+0x65/0x6d
>>>>>>>>>> : [  454.006820]  [<ffffffff810c39c4>] alloc_pages_current+0x96/0x9f
>>>>>>>>>> : [  454.006842]  [<ffffffff8167f2c7>] try_fill_recv+0x5e/0x20f
>>>>>>>>>> : [  454.006846]  [<ffffffff8167fe13>] virtnet_poll+0x52a/0x5c7
>>>>>>>>>> : [  454.006858]  [<ffffffff8104fe74>] ? run_timer_softirq+0x1dc/0x1f4
>>>>>>>>>> : [  454.006873]  [<ffffffff8176035d>] net_rx_action+0xad/0x1a5
>>>>>>>>>> : [  454.006882]  [<ffffffff8104b6cd>] __do_softirq+0x9c/0x127
>>>>>>>>>> : [  454.006897]  [<ffffffff81008ffc>] call_softirq+0x1c/0x30
>>>>>>>>>> : [  454.006901]  [<ffffffff8100af01>] do_softirq+0x41/0x7e
>>>>>>>>>> : [  454.006904]  [<ffffffff8104b3e3>] irq_exit+0x36/0x75
>>>>>>>>>> : [  454.006907]  [<ffffffff8100a5ee>] do_IRQ+0xaa/0xc1
>>>>>>>>>> : [  454.006926]  [<ffffffff8183bc13>] ret_from_intr+0x0/0x11
>>>>>>>>>> : [  454.006928]<EOI>   [<ffffffff81026b25>] ? kvm_deferred_mmu_op+0x5e/0xe7
>>>>>>>>>> : [  454.006942]  [<ffffffff81026b19>] ? kvm_deferred_mmu_op+0x52/0xe7
>>>>>>>>>> : [  454.006946]  [<ffffffff81026c03>] kvm_mmu_write+0x2e/0x35
>>>>>>>>>> : [  454.006949]  [<ffffffff81026c7d>] kvm_set_pte_at+0x19/0x1b
>>>>>>>>>> : [  454.006953]  [<ffffffff810aba67>] __do_fault+0x3c4/0x492
>>>>>>>>>> : [  454.006957]  [<ffffffff810adcf4>] handle_mm_fault+0x478/0x9d8
>>>>>>>>>> : [  454.006966]  [<ffffffff810deb59>] ? path_put+0x2c/0x30
>>>>>>>>>> : [  454.006975]  [<ffffffff8102f162>] do_page_fault+0x2f6/0x31a
>>>>>>>>>> : [  454.006979]  [<ffffffff8183b81e>] ? _raw_spin_lock+0x9/0xd
>>>>>>>>>> : [  454.006982]  [<ffffffff8183bef5>] page_fault+0x25/0x30
>>>>>>>>>> : [  454.006985] Mem-Info:
>>>>>>>>>> : [  454.006987] Node 0 DMA per-cpu:
>>>>>>>>>> : [  454.006990] CPU    0: hi:    0, btch:   1 usd:   0
>>>>>>>>>> : [  454.006992] CPU    1: hi:    0, btch:   1 usd:   0
>>>>>>>>>> : [  454.006993] Node 0 DMA32 per-cpu:
>>>>>>>>>> : [  454.006996] CPU    0: hi:  186, btch:  31 usd: 185
>>>>>>>>>> : [  454.006998] CPU    1: hi:  186, btch:  31 usd: 112
>>>>>>>>>> : [  454.007003] active_anon:8308 inactive_anon:8544 isolated_anon:0
>>>>>>>>>> : [  454.007005]  active_file:4882 inactive_file:205902 isolated_file:0
>>>>>>>>>> : [  454.007006]  unevictable:0 dirty:11 writeback:0 unstable:0
>>>>>>>>>> : [  454.007007]  free:1385 slab_reclaimable:2445 slab_unreclaimable:4466
>>>>>>>>>> : [  454.007008]  mapped:1895 shmem:113 pagetables:1370 bounce:0
>>>>>>>>>> : [  454.007010] Node 0 DMA free:4000kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:11844kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15768kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:64kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>>>>>>>>>> : [  454.007021] lowmem_reserve[]: 0 994 994 994
>>>>>>>>>> : [  454.007025] Node 0 DMA32 free:1540kB min:4000kB low:5000kB high:6000kB active_anon:33232kB inactive_anon:34176kB active_file:19528kB inactive_file:811764kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1018068kB mlocked:0kB dirty:44kB writeback:0kB mapped:7580kB shmem:452kB slab_reclaimable:9716kB slab_unreclaimable:17832kB kernel_stack:1144kB pagetables:5480kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>>>>>>>>>> : [  454.007036] lowmem_reserve[]: 0 0 0 0
>>>>>>>>>> : [  454.007040] Node 0 DMA: 0*4kB 4*8kB 6*16kB 5*32kB 6*64kB 4*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 4000kB
>>>>>>>>>> : [  454.007050] Node 0 DMA32: 13*4kB 2*8kB 3*16kB 1*32kB 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1556kB
>>>>>>>>>> : [  454.007059] 210914 total pagecache pages
>>>>>>>>>> : [  454.007061] 0 pages in swap cache
>>>>>>>>>> : [  454.007063] Swap cache stats: add 0, delete 0, find 0/0
>>>>>>>>>> : [  454.007065] Free swap  = 1959924kB
>>>>>>>>>> : [  454.007067] Total swap = 1959924kB
>>>>>>>>>> : [  454.014238] 262140 pages RAM
>>>>>>>>>> : [  454.014241] 7489 pages reserved
>>>>>>>>>> : [  454.014242] 21430 pages shared
>>>>>>>>>> : [  454.014244] 247174 pages non-shared
>>>>>>>>>>
>>>>>>>>>> Either page reclaim got worse or kvm/virtio-net got more aggressive.
>>>>>>>>>>
>>>>>>>>>> Avi, Rusty: can you think of any changes in the KVM/virtio area in the
>>>>>>>>>> 2.6.30 ->  2.6.32 timeframe which may have increased the GFP_ATOMIC
>>>>>>>>>> demands upon the page allocator?
>>>>>>>>>>
>>>>>>>>>> Thanks.
>>>>>>>> On the contrary, with commit
>>>>>>>> 3161e453e496eb5643faad30fff5a5ab183da0fe
>>>>>>>> we should be using GFP_ATOMIC less.
>>>>>>>> But maybe there's a bug and it has the reverse effect somehow ...
>>>>>>>>
>>>>>>>> Robert, could you pls try 3161e453e496eb5643faad30fff5a5ab183da0fe
>>>>>>>> and if that *does* have the problem,
>>>>>>>> 0b4f2928f14c4a9770b0866923fc81beb7f4aa57?
>>>>>>>>
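As a cross-check of the Mem-Info dump quoted above, here is a minimal
Python sketch that re-adds the DMA32 buddy list and compares the zone's
free counter with its min watermark. The helper name buddy_total_kb and
the constants are illustrative only; the numbers are copied from the
quoted dump. It only restates what the dump reports (free is below min
while most of the zone is inactive page cache) and is not a diagnosis.

```python
import re


def buddy_total_kb(line):
    """Sum the 'count*sizekB' terms of one buddy-list line from Mem-Info."""
    return sum(int(count) * int(size)
               for count, size in re.findall(r"(\d+)\*(\d+)kB", line))


# Buddy-list line for the DMA32 zone, copied from the quoted dmesg output.
DMA32_BUDDY = ("Node 0 DMA32: 13*4kB 2*8kB 3*16kB 1*32kB 2*64kB 0*128kB "
               "1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1556kB")

# Values copied from the "Node 0 DMA32 free:... min:..." line of the dump.
DMA32_FREE_KB = 1540
DMA32_MIN_KB = 4000

print(buddy_total_kb(DMA32_BUDDY))   # 1556, matching the total in the dump
print(DMA32_FREE_KB < DMA32_MIN_KB)  # True: free is below the min watermark
```

The small gap between 1540kB and 1556kB is in the dump itself; the two
lines are printed a few microseconds apart.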

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-19 12:55                     ` Robert Wimmer
@ 2010-04-19 13:17                       ` Michael S. Tsirkin
  2010-04-21 11:23                         ` kernel
  0 siblings, 1 reply; 62+ messages in thread
From: Michael S. Tsirkin @ 2010-04-19 13:17 UTC (permalink / raw)
  To: Robert Wimmer
  Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
	Rusty Russell, Mel Gorman

So it seems the change that created the problem was not
specific to virtio.

To track this down further, I think the thing to try
would be a full bisect.

That is, instead of

git bisect start 'v2.6.31' 'v2.6.30' '--' 'drivers/virtio/' 'drivers/net/virtio_net.c'

do

git bisect start 'v2.6.31' 'v2.6.30'

and then test kernel versions as they are generated.
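
For intuition on why the full bisect needs more build-and-test rounds
than the path-limited one, here is a minimal Python sketch of the
search. It assumes a linear list of revisions in which every revision
after the first bad one is also bad; real git history is a graph, so
this is a simplification, and first_bad and is_bad are made-up names
for illustration.

```python
def first_bad(revisions, is_bad):
    """Return (index of the first bad revision, number of tests run).

    Assumes revisions[0] is known good, revisions[-1] is known bad, and
    that once a revision is bad every later revision is bad as well.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    tests = 0
    while hi - lo > 1:
        mid = (lo + hi) // 2
        tests += 1                  # one kernel build + test per round
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return hi, tests


# 12 untested revisions between a good and a bad endpoint.
revs = list(range(14))
print(first_bad(revs, lambda r: r >= 9))  # (9, 4)
```

The number of rounds grows with the logarithm of the number of
candidate revisions, so dropping the path filter adds rounds but keeps
the search tractable.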


On Mon, Apr 19, 2010 at 02:55:21PM +0200, Robert Wimmer wrote:
> Is there a way to track this down further?
> For a few weeks I've had problems on two other KVM guests
> which I think are related to this. The hosts for
> these guests run kernel 2.6.32. Until today the guests were
> also running 2.6.32. Inside the guests we're using GlusterFS,
> NFSv4 and Apache with PHP. From time to time the
> httpd processes are "hanging". When this happens
> we see a lot of soft lockups. These
> hosts are running Xeon X5560 processors. Until
> today I suspected that these problems only happen
> on older Xeons, but this doesn't seem to be true.
> I've attached the output from /var/log/messages
> (https://bugzilla.kernel.org/attachment.cgi?id=26048)
> from one of the hosts with GlusterFS. I've now
> downgraded to kernel 2.6.30 in the guests. But since
> this problem also exists in 2.6.34-rc3 I suspect that
> we will never be able to do a kernel update
> in the guests when they're using NFS :-(
> 
> But what I can definitely say is that all the problems
> only happen with guests running kernel >= 2.6.31
> and with a remote file system (NFS, GlusterFS). Some
> days ago another KVM guest had a network shutdown using
> kernel 2.6.32 in host and guest + NFSv4. But this has only
> happened once so far, and there isn't much
> traffic running through the interfaces of that host.
> 
> All other guests with kernel 2.6.30 (about 80 guests on
> 18 hosts) with NFS and KVM 0.12.3 are running
> perfectly.
> 
> Thanks!
> Robert
> 
> 
> 
> On 04/13/10 10:51, Robert Wimmer wrote:
> > I've tried to do my very best. In general I can
> > say: All 2.6.30 versions work, all 2.6.31 fail. 2.6.31-rc3
> > fails with "soft lockup" and is the only one which
> > doesn't show any "swapper page allocation failure".
> > But the result is ultimately the same... 2.6.31-rc4
> > doesn't show "soft lockups" but "swapper page allocation failure".
> > Here is the dmesg output for 2.6.31-rc3:
> > https://bugzilla.kernel.org/attachment.cgi?id=25986
> >
> > So here is what I've done. Started with a fresh tree
> > and my 2.6.30 .config:
> >
> > rm -fr /usr/src/linux
> > cd /usr/src
> > git clone
> > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
> > cd linux
> > git checkout 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
> >
> > Here is the "git bisect log" output:
> >
> > # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
> > # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
> > git bisect start 'v2.6.31' 'v2.6.30' '--' 'drivers/virtio/'
> > 'drivers/net/virtio_net.c'
> > # good: [e3353853730eb99c56b7b0aed1667d51c0e3699a] virtio: enhance
> > id_matching for virtio drivers
> > git bisect good e3353853730eb99c56b7b0aed1667d51c0e3699a
> > # good: [9cbc1cb8cd46ce1f7645b9de249b2ce8460129bb] Merge branch 'master'
> > of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
> > git bisect good 9cbc1cb8cd46ce1f7645b9de249b2ce8460129bb
> > # bad: [ff52c3fc7188855ede75d87b022271f0da309e5b] virtio: fix memory
> > leak on device removal
> > git bisect bad ff52c3fc7188855ede75d87b022271f0da309e5b
> > # good: [31278e71471399beaff9280737e52b47db4dc345] net: group address
> > list and its count
> > git bisect good 31278e71471399beaff9280737e52b47db4dc345
> > # bad: [4b892e6582e3a4fe01f623aea386907270d5bf83] virtio-pci: correctly
> > unregister root device on error
> > git bisect bad 4b892e6582e3a4fe01f623aea386907270d5bf83
> >
> > Hopefully this gives you some hints. The problem
> > for me is that I don't know which commit I should
> > consider good or bad. Should I consider the
> > commit with the "soft lockup" as good because it
> > doesn't show the allocation failure? Currently it is
> > marked as bad (4b892e6582e3a4fe01f623aea386907270d5bf83).
> > What should I do next?
> >
> > Thanks!
> > Robert
> >
> > On 04/12/10 15:52, Michael S. Tsirkin wrote:
> >> On Mon, Apr 12, 2010 at 03:50:31PM +0200, Robert Wimmer wrote:
> >>> Sorry but I need some more git help. Here is what I've done.
> >>> Started with a fresh clone of the kernel:
> >>>
> >>> cd /usr/src
> >>> git clone
> >>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
> >>> cd linux
> >>> git checkout 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
> >>>
> >>> Since I already knew that this commit wasn't good I did
> >>>
> >>> git bisect start
> >>> git bisect bad
> >> I think what you missed is marking the good commit.
> >> bisect does a binary search, but it needs to know
> >> both good and bad commits to search in the range.
> >>
> >> Optionally, you can use '-- drivers/virtio/ drivers/net/virtio_net.c'.
> >> What this does is limit bisect to commits that touch the files in
> >> question. This way you get far fewer tests to run
> >> (about 4), but after you find the first problematic commit
> >> you must verify that the commit just before it does not have the issue.
> >>
> >> If this turns out not to be the case, you'll have to
> >> fall back on a full bisect, and we will know that this is some
> >> other change in the kernel that triggered the regression.
> >>
> >>
> >>> compiled and started over. As expected the problem returns.
> >>> So I've done another
> >>>
> >>> git bisect bad
> >>>
> >>> but I always get the same commit:
> >>>
> >>> kabul:/usr/src/linux # git bisect log
> >>> git bisect start
> >>> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix
> >>> compilation on SMP
> >>> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
> >>> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix
> >>> compilation on SMP
> >>> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
> >>> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix
> >>> compilation on SMP
> >>> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
> >>>
> >>> I expected that after each "git bisect bad" I would get the
> >>> commit before the "bad" one. How can I get the previous commit?
> >>> The bisect documentation couldn't help me.
> >>>
> >>> Thanks!
> >>> Robert
> >>>
> >>>
> >>>
> >>> On 04/12/10 13:23, Michael S. Tsirkin wrote:
> >>>> On Mon, Apr 12, 2010 at 11:25:26AM +0200, Robert Wimmer wrote:
> >>>>> server10:/usr/src/linux # git bisect start v2.6.31 v2.6.30 --
> >>>>> drivers/virtio/ drivers/net/virtio_net.c
> >>>>> Bisecting: 12 revisions left to test after this (roughly 4 steps)
> >>>>> [e3353853730eb99c56b7b0aed1667d51c0e3699a] virtio: enhance id_matching
> >>>>> for virtio drivers
> >>>>>
> >>>> Sorry I wasn't clear. The way to use bisect is as follows:
> >>>> - first start as you did now.
> >>>> 1. now build kernel, install and test
> >>>> 2. if bug is there, type 'git bisect bad'
> >>>> 3. if bug is not there, type 'git bisect good'
> >>>> 4. The above will give you another kernel version to test;
> >>>>    if so, go back to step 1
> >>>> 5. this will be repeated about 4 times (the number of steps git reported above)
> >>>> 6. in the end you will get the first revision which has the
> >>>>    problem. Let's assume it is revision ABCDEF.
> >>>>
> >>>>    Type git bisect log to see your history.
> >>>>
> >>>> 7. Now git reset --hard ABCDEF~1 and try again.
> >>>>
> >>>> If you see the problem with ABCDEF but not ABCDEF~1
> >>>> then we will have a good guess at the culprit.
> >>>>
> >>>> Some more tips here:
> >>>> http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html
> >>>>
> >>>>
> >>>>> Today I've upgraded to qemu-kvm-0.12.3-r1 (Gentoo package)
> >>>>> but that doesn't help. Still getting "page allocation failure" with
> >>>>> 2.6.31-rc5.
> >>>>>
> >>>>> Does it make sense to use the same 2.6.31-rc5 kernel
> >>>>> in the host and guest for testing? Currently I'm still using 2.6.32
> >>>>> in the host and testing 2.6.31-rc5 in the guest until it "crashes".
> >>>>> Then I start the guest with 2.6.30 again, which works
> >>>>> without trouble with 2.6.32 as host.
> >>>>>
> >>>>> This is really strange. I have hosts with 2.6.32 running
> >>>>> guests with 2.6.32 which work perfectly. These hosts
> >>>>> and guests are running on HP DL 380 G6 with Intel Xeon X5560.
> >>>>> The guests which don't work with 2.6.32 (and 2.6.32
> >>>>> as host) are running on HP DL 380 G5 with Intel Xeon L5420.
> >>>> Hmm. Some subtle race?
> >>>>
> >>>>> (All guests) and (all hosts) have the same packages
> >>>>> and the same versions installed and the same kernel
> >>>>> configs (hosts and guests using different .config but the
> >>>>> difference is very small e.g. CONFIG_PARAVIRT_SPINLOCKS=y,
> >>>>> CONFIG_PARAVIRT_GUEST=y in guests but not in hosts
> >>>>> .config).
> >>>>>
> >>>>> I've had problems with qemu-kvm 0.12.2 with high network
> >>>>> traffic which was solved by a patch submitted by Tom
> >>>>> Lendacky:
> >>>>>
> >>>>> "Fix a race condition where qemu finds that there are not enough virtio
> >>>>> ring buffers available and the guest make more buffers available before
> >>>>> qemu can enable notifications."
> >>>>> http://www.mail-archive.com/kvm@vger.kernel.org/msg28667.html
> >>>>>
> >>>>> It was a real lifesaver for the HP DL 380 G6 mentioned
> >>>>> above but maybe this is now causing the problems with the G5 machines.
> >>>>> The symptoms are the same. I can still log into the guest
> >>>>> via VNC but the network is down.
> >>>>>
> >>>>> Thanks!
> >>>>> Robert
> >>>>>
> >>>> For now the only thing we seem to know for sure is that on
> >>>> specific hardware there's a regression between 2.6.30 and
> >>>> 2.6.31-rc5. Yes, it is possible that all it does
> >>>> is expose a qemu bug, but it's hard to say.
> >>>> Let's find out what change
> >>>> does that, this should give us a hint.
> >>>>
> >>>>> On 04/11/10 13:03, Michael S. Tsirkin wrote:
> >>>>>> On Fri, Apr 09, 2010 at 12:15:01PM +0200, Robert Wimmer wrote:
> >>>>>>> I'm not really a git hero so here is what I've done:
> >>>>>>>
> >>>>>>> cd /usr/src
> >>>>>>> git clone
> >>>>>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
> >>>>>>> cd linux
> >>>>>>> git checkout -b mykernel 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
> >>>>>> Looks right.
> >>>>>>
> >>>>>>> Then I checked
> >>>>>>>
> >>>>>>> drivers/net/virtio_net.c
> >>>>>>> drivers/net/smc91x.c
> >>>>>>>
> >>>>>>> to confirm that the committed changes were not in there.
> >>>>>>> Next I built my kernel as usual. I used my .config
> >>>>>>> from 2.6.30 (which is working fine in several
> >>>>>>> guests; for the .config see here:
> >>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=25925)
> >>>>>>> and built the kernel
> >>>>>>>
> >>>>>>> genkernel --menuconfig --lvm --oldconfig all
> >>>>>>>
> >>>>>>> which finally gave me a 2.6.31-rc5.
> >>>>>> That's right.
> >>>>>>
> >>>>>>> I should mention
> >>>>>>> that 2.6.30 was using SLUB. So here is the output
> >>>>>>> from the 2.6.31-rc5 kernel running about 20 min.:
> >>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=25926
> >>>>>> Hmm, so we see the error here as well?
> >>>>>>
> >>>>>>> That doesn't seem very useful to me. I'm currently compiling
> >>>>>>> the same kernel with SLAB.
> >>>>>>>
> >>>>>>> Please let me know if the git commands above are
> >>>>>>> right and/or if you need other kernel options enabled.
> >>>>>> Looks right. You don't have to add -b flag if you don't
> >>>>>> want to.
> >>>>>>
> >>>>>>> Thanks!
> >>>>>>> Robert
> >>>>>> Hmm, I do not see anything else that seems related.
> >>>>>> Could you please try to bisect?
> >>>>>>
> >>>>>> git bisect start v2.6.31 v2.6.30 -- drivers/virtio/ drivers/net/virtio_net.c
> >>>>>>
> >>>>>> should help assuming the change that triggers this is in virtio.
> >>>>>>
> >>>>>>
> >>>>>>> On 04/08/10 22:04, Michael S. Tsirkin wrote:
> >>>>>>>> On Thu, Apr 08, 2010 at 10:39:34PM +0300, Avi Kivity wrote:
> >>>>>>>>> cc: mst
> >>>>>>>>>
> >>>>>>>>> On 04/08/2010 10:34 PM, Andrew Morton wrote:
> >>>>>>>>>> (switched to email.  Please respond via emailed reply-to-all, not via the
> >>>>>>>>>> bugzilla web interface).
> >>>>>>>>>>
> >>>>>>>>>> On Wed, 7 Apr 2010 10:29:20 GMT
> >>>>>>>>>> bugzilla-daemon@bugzilla.kernel.org wrote:
> >>>>>>>>>>
> >>>>>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=15709
> >>>>>>>>>>>
> >>>>>>>>>>>             Summary: swapper page allocation failure
> >>>>>>>>>>>             Product: Memory Management
> >>>>>>>>>>>             Version: 2.5
> >>>>>>>>>>>      Kernel Version: 2.6.32 and 2.6.33
> >>>>>>>>>>>            Platform: All
> >>>>>>>>>>>          OS/Version: Linux
> >>>>>>>>>>>                Tree: Mainline
> >>>>>>>>>>>              Status: NEW
> >>>>>>>>>>>            Severity: normal
> >>>>>>>>>>>            Priority: P1
> >>>>>>>>>>>           Component: Slab Allocator
> >>>>>>>>>>>          AssignedTo: akpm@linux-foundation.org
> >>>>>>>>>>>          ReportedBy: kernel@tauceti.net
> >>>>>>>>>>>          Regression: No
> >>>>>>>>>>>
> >>>>>>>>>>>
> >>>>>>>>>>> Created an attachment (id=25903)
> >>>>>>>>>>>   -->  (https://bugzilla.kernel.org/attachment.cgi?id=25903)
> >>>>>>>>>>> dmesg output
> >>>>>>>>>>>
> >>>>>>>>>>> I'm having problems with "swapper page allocation failures" since upgrading
> >>>>>>>>>>> from kernel 2.6.30 to 2.6.32/2.6.33. The problems occur inside a kernel virtual
> >>>>>>>>>>> machine (KVM). The host runs Gentoo with kernel 2.6.32, which works fine. As
> >>>>>>>>>>> long as kernel 2.6.30 is used as guest kernel the guest runs fine. But after
> >>>>>>>>>>> upgrading to 2.6.32 and 2.6.33 I get "swapper page allocation failures" (see
> >>>>>>>>>>> attachment of dmesg output). The guest is only running an Apache webserver and
> >>>>>>>>>>> serves files from an NFS share. It has 1 GB RAM and 2 virtual CPUs. I've tried
> >>>>>>>>>>> different kernel configurations (e.g. an unmodified version from Sabayon Linux
> >>>>>>>>>>> Distribution) but that doesn't help. Load of the guest (and host) is very low.
> >>>>>>>>>>> Network traffic is about 20-50 MBit/s.
> >>>>>>>>>>>
> >>>>>>>>>> hm, this is a regression.
> >>>>>>>>>>
> >>>>>>>>>> : [  454.006706] users: page allocation failure. order:0, mode:0x20
> >>>>>>>>>> : [  454.006712] Pid: 7992, comm: users Not tainted 2.6.34-rc3-git6 #2
> >>>>>>>>>> : [  454.006714] Call Trace:
> >>>>>>>>>> : [  454.006717]<IRQ>   [<ffffffff8109dff7>] __alloc_pages_nodemask+0x5c8/0x615
> >>>>>>>>>> : [  454.006796]  [<ffffffff817860ce>] ? ip_local_deliver+0x65/0x6d
> >>>>>>>>>> : [  454.006820]  [<ffffffff810c39c4>] alloc_pages_current+0x96/0x9f
> >>>>>>>>>> : [  454.006842]  [<ffffffff8167f2c7>] try_fill_recv+0x5e/0x20f
> >>>>>>>>>> : [  454.006846]  [<ffffffff8167fe13>] virtnet_poll+0x52a/0x5c7
> >>>>>>>>>> : [  454.006858]  [<ffffffff8104fe74>] ? run_timer_softirq+0x1dc/0x1f4
> >>>>>>>>>> : [  454.006873]  [<ffffffff8176035d>] net_rx_action+0xad/0x1a5
> >>>>>>>>>> : [  454.006882]  [<ffffffff8104b6cd>] __do_softirq+0x9c/0x127
> >>>>>>>>>> : [  454.006897]  [<ffffffff81008ffc>] call_softirq+0x1c/0x30
> >>>>>>>>>> : [  454.006901]  [<ffffffff8100af01>] do_softirq+0x41/0x7e
> >>>>>>>>>> : [  454.006904]  [<ffffffff8104b3e3>] irq_exit+0x36/0x75
> >>>>>>>>>> : [  454.006907]  [<ffffffff8100a5ee>] do_IRQ+0xaa/0xc1
> >>>>>>>>>> : [  454.006926]  [<ffffffff8183bc13>] ret_from_intr+0x0/0x11
> >>>>>>>>>> : [  454.006928]<EOI>   [<ffffffff81026b25>] ? kvm_deferred_mmu_op+0x5e/0xe7
> >>>>>>>>>> : [  454.006942]  [<ffffffff81026b19>] ? kvm_deferred_mmu_op+0x52/0xe7
> >>>>>>>>>> : [  454.006946]  [<ffffffff81026c03>] kvm_mmu_write+0x2e/0x35
> >>>>>>>>>> : [  454.006949]  [<ffffffff81026c7d>] kvm_set_pte_at+0x19/0x1b
> >>>>>>>>>> : [  454.006953]  [<ffffffff810aba67>] __do_fault+0x3c4/0x492
> >>>>>>>>>> : [  454.006957]  [<ffffffff810adcf4>] handle_mm_fault+0x478/0x9d8
> >>>>>>>>>> : [  454.006966]  [<ffffffff810deb59>] ? path_put+0x2c/0x30
> >>>>>>>>>> : [  454.006975]  [<ffffffff8102f162>] do_page_fault+0x2f6/0x31a
> >>>>>>>>>> : [  454.006979]  [<ffffffff8183b81e>] ? _raw_spin_lock+0x9/0xd
> >>>>>>>>>> : [  454.006982]  [<ffffffff8183bef5>] page_fault+0x25/0x30
> >>>>>>>>>> : [  454.006985] Mem-Info:
> >>>>>>>>>> : [  454.006987] Node 0 DMA per-cpu:
> >>>>>>>>>> : [  454.006990] CPU    0: hi:    0, btch:   1 usd:   0
> >>>>>>>>>> : [  454.006992] CPU    1: hi:    0, btch:   1 usd:   0
> >>>>>>>>>> : [  454.006993] Node 0 DMA32 per-cpu:
> >>>>>>>>>> : [  454.006996] CPU    0: hi:  186, btch:  31 usd: 185
> >>>>>>>>>> : [  454.006998] CPU    1: hi:  186, btch:  31 usd: 112
> >>>>>>>>>> : [  454.007003] active_anon:8308 inactive_anon:8544 isolated_anon:0
> >>>>>>>>>> : [  454.007005]  active_file:4882 inactive_file:205902 isolated_file:0
> >>>>>>>>>> : [  454.007006]  unevictable:0 dirty:11 writeback:0 unstable:0
> >>>>>>>>>> : [  454.007007]  free:1385 slab_reclaimable:2445 slab_unreclaimable:4466
> >>>>>>>>>> : [  454.007008]  mapped:1895 shmem:113 pagetables:1370 bounce:0
> >>>>>>>>>> : [  454.007010] Node 0 DMA free:4000kB min:60kB low:72kB high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB inactive_file:11844kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:15768kB mlocked:0kB dirty:0kB writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:64kB slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> >>>>>>>>>> : [  454.007021] lowmem_reserve[]: 0 994 994 994
> >>>>>>>>>> : [  454.007025] Node 0 DMA32 free:1540kB min:4000kB low:5000kB high:6000kB active_anon:33232kB inactive_anon:34176kB active_file:19528kB inactive_file:811764kB unevictable:0kB isolated(anon):0kB isolated(file):0kB present:1018068kB mlocked:0kB dirty:44kB writeback:0kB mapped:7580kB shmem:452kB slab_reclaimable:9716kB slab_unreclaimable:17832kB kernel_stack:1144kB pagetables:5480kB unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
> >>>>>>>>>> : [  454.007036] lowmem_reserve[]: 0 0 0 0
> >>>>>>>>>> : [  454.007040] Node 0 DMA: 0*4kB 4*8kB 6*16kB 5*32kB 6*64kB 4*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 4000kB
> >>>>>>>>>> : [  454.007050] Node 0 DMA32: 13*4kB 2*8kB 3*16kB 1*32kB 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB = 1556kB
> >>>>>>>>>> : [  454.007059] 210914 total pagecache pages
> >>>>>>>>>> : [  454.007061] 0 pages in swap cache
> >>>>>>>>>> : [  454.007063] Swap cache stats: add 0, delete 0, find 0/0
> >>>>>>>>>> : [  454.007065] Free swap  = 1959924kB
> >>>>>>>>>> : [  454.007067] Total swap = 1959924kB
> >>>>>>>>>> : [  454.014238] 262140 pages RAM
> >>>>>>>>>> : [  454.014241] 7489 pages reserved
> >>>>>>>>>> : [  454.014242] 21430 pages shared
> >>>>>>>>>> : [  454.014244] 247174 pages non-shared
> >>>>>>>>>>
> >>>>>>>>>> Either page reclaim got worse or kvm/virtio-net got more aggressive.
> >>>>>>>>>>
> >>>>>>>>>> Avi, Rusty: can you think of any changes in the KVM/virtio area in the
> >>>>>>>>>> 2.6.30 ->  2.6.32 timeframe which may have increased the GFP_ATOMIC
> >>>>>>>>>> demands upon the page allocator?
> >>>>>>>>>>
> >>>>>>>>>> Thanks.
> >>>>>>>> On the contrary, with commit
> >>>>>>>> 3161e453e496eb5643faad30fff5a5ab183da0fe
> >>>>>>>> we should be using GFP_ATOMIC less.
> >>>>>>>> But maybe there's a bug and it has the reverse effect somehow ...
> >>>>>>>>
> >>>>>>>> Robert, could you pls try 3161e453e496eb5643faad30fff5a5ab183da0fe
> >>>>>>>> and if that *does* have the problem,
> >>>>>>>> 0b4f2928f14c4a9770b0866923fc81beb7f4aa57?
> >>>>>>>>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-21 11:23                         ` kernel
@ 2010-04-21  9:42                           ` Michael S. Tsirkin
  2010-04-22 11:31                             ` kernel
  0 siblings, 1 reply; 62+ messages in thread
From: Michael S. Tsirkin @ 2010-04-21  9:42 UTC (permalink / raw)
  To: kernel
  Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
	Rusty Russell, Mel Gorman

On Wed, Apr 21, 2010 at 01:23:12PM +0200, kernel wrote:
> So after the compiler was running hot I've now the following result:
> 
> server10:/usr/src/linux # git bisect log 
> # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
> # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
> git bisect start 'v2.6.31' 'v2.6.30'
> # good: [925d74ae717c9a12d3618eb4b36b9fb632e2cef3] V4L/DVB (11736):
> videobuf: modify return value of VIDIOC_REQBUFS ioctl
> git bisect good 925d74ae717c9a12d3618eb4b36b9fb632e2cef3
> # bad: [a380137900fca5c79e6daa9500bdb6ea5649188e] ixgbe: Fix device
> capabilities of 82599 single speed fiber NICs.
> git bisect bad a380137900fca5c79e6daa9500bdb6ea5649188e
> # good: [1dbb5765acc7a6fe4bc1957c001037cc9d02ae03] Staging: android:
> lowmemorykiller: fix up remaining checkpatch warnings
> git bisect good 1dbb5765acc7a6fe4bc1957c001037cc9d02ae03
> # good: [df36b439c5fedefe013d4449cb6a50d15e2f4d70] Merge branch
> 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
> git bisect good df36b439c5fedefe013d4449cb6a50d15e2f4d70
> # bad: [a800faec1b21d7133b5f0c8c6dac593b7c4e118d] Merge branch 'for-linus'
> of git://www.jni.nu/cris
> git bisect bad a800faec1b21d7133b5f0c8c6dac593b7c4e118d
> # good: [ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2] Merge
> git://git.infradead.org/mtd-2.6
> git bisect good ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2
> # bad: [37c6dbe290c05023b47f52528e30ce51336b93eb] V4L/DVB (12091):
> gspca_sonixj: Add light frequency control
> git bisect bad 37c6dbe290c05023b47f52528e30ce51336b93eb
> # bad: [687d680985b1438360a9ba470ece8b57cd205c3b] Merge
> git://git.infradead.org/~dwmw2/iommu-2.6.31
> git bisect bad 687d680985b1438360a9ba470ece8b57cd205c3b
> # bad: [1053414068bad659479e6efa62a67403b8b1ec0a] Merge branch 'for-linus'
> of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
> git bisect bad 1053414068bad659479e6efa62a67403b8b1ec0a
> # good: [b01b4babbf204443b5a846a7494546501614cefc] firewire: net: fix card
> driver reloading
> git bisect good b01b4babbf204443b5a846a7494546501614cefc
> # bad: [c02d7adf8c5429727a98bad1d039bccad4c61c50] NFSv4: Replace
> nfs4_path_walk() with VFS path lookup in a private namespace
> git bisect bad c02d7adf8c5429727a98bad1d039bccad4c61c50
> # good: [616511d039af402670de8500d0e24495113a9cab] VFS: Uninline the
> function put_mnt_ns()
> git bisect good 616511d039af402670de8500d0e24495113a9cab
> # good: [cf8d2c11cb77f129675478792122f50827e5b0ae] VFS: Add VFS helper
> functions for setting up private namespaces
> git bisect good cf8d2c11cb77f129675478792122f50827e5b0ae
> 
> 
> The last "git bisect good" prints out:
> 
> server10:/usr/src/linux # git bisect good
> c02d7adf8c5429727a98bad1d039bccad4c61c50 is the first bad commit
> commit c02d7adf8c5429727a98bad1d039bccad4c61c50
> Author: Trond Myklebust <Trond.Myklebust@netapp.com>
> Date:   Mon Jun 22 15:09:14 2009 -0400
> 
>     NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private
> namespace
>     
>     As noted in the previous patch, the NFSv4 client mount code currently
>     has several limitations. If the mount path contains symlinks, or
>     referrals, or even if it just contains a '..', then the client code in
>     nfs4_path_walk() will fail with an error.
>     
>     This patch replaces the nfs4_path_walk()-based lookup with a helper
>     function that sets up a private namespace to represent the namespace
> on the
>     server, then uses the ordinary VFS and NFS path lookup code to walk
> down the
>     mount path in that namespace.
>     
>     Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
>     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> 
> :040000 040000 97a18818f26ab9a0987f157257eb6f399c3cc1cc
> 9ab6c712bb64f1349b5ac9f2020191abb5780ca0 M      fs
> 
> Does this help you any further?
> 
> Thanks!
> Robert

Looks suspiciously like some error in testing.
Could you pls retest and verify again that cf8d2c11cb77f129675478792122f50827e5b0ae
is good and c02d7adf8c5429727a98bad1d039bccad4c61c50 is bad?
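
To make the two verdicts in question easy to pick out of a long
"git bisect log", here is a small Python sketch that maps each commit id
to the verdict recorded for it. The function name bisect_verdicts is an
illustrative assumption, and LOG_TAIL holds only the last three verdict
lines of the quoted log (without the "> " quoting).

```python
import re


def bisect_verdicts(log_text):
    """Map each commit id in a 'git bisect log' to its recorded verdict."""
    pattern = re.compile(r"^git bisect (good|bad) ([0-9a-f]{40})$",
                         re.MULTILINE)
    return {sha: verdict for verdict, sha in pattern.findall(log_text)}


# Last three verdict lines of the log quoted above.
LOG_TAIL = """\
git bisect bad c02d7adf8c5429727a98bad1d039bccad4c61c50
git bisect good 616511d039af402670de8500d0e24495113a9cab
git bisect good cf8d2c11cb77f129675478792122f50827e5b0ae
"""

verdicts = bisect_verdicts(LOG_TAIL)
print(verdicts["cf8d2c11cb77f129675478792122f50827e5b0ae"])  # good
print(verdicts["c02d7adf8c5429727a98bad1d039bccad4c61c50"])  # bad
```

Comment lines and the "git bisect start" line are skipped by the
pattern, so the full log can be fed in unchanged once the quoting is
stripped.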

-- 
MST

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-19 13:17                       ` Michael S. Tsirkin
@ 2010-04-21 11:23                         ` kernel
  2010-04-21  9:42                           ` Michael S. Tsirkin
  0 siblings, 1 reply; 62+ messages in thread
From: kernel @ 2010-04-21 11:23 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
	Rusty Russell, Mel Gorman

So after the compiler was running hot, I now have the following result:

server10:/usr/src/linux # git bisect log 
# bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
# good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
git bisect start 'v2.6.31' 'v2.6.30'
# good: [925d74ae717c9a12d3618eb4b36b9fb632e2cef3] V4L/DVB (11736): videobuf: modify return value of VIDIOC_REQBUFS ioctl
git bisect good 925d74ae717c9a12d3618eb4b36b9fb632e2cef3
# bad: [a380137900fca5c79e6daa9500bdb6ea5649188e] ixgbe: Fix device capabilities of 82599 single speed fiber NICs.
git bisect bad a380137900fca5c79e6daa9500bdb6ea5649188e
# good: [1dbb5765acc7a6fe4bc1957c001037cc9d02ae03] Staging: android: lowmemorykiller: fix up remaining checkpatch warnings
git bisect good 1dbb5765acc7a6fe4bc1957c001037cc9d02ae03
# good: [df36b439c5fedefe013d4449cb6a50d15e2f4d70] Merge branch 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
git bisect good df36b439c5fedefe013d4449cb6a50d15e2f4d70
# bad: [a800faec1b21d7133b5f0c8c6dac593b7c4e118d] Merge branch 'for-linus' of git://www.jni.nu/cris
git bisect bad a800faec1b21d7133b5f0c8c6dac593b7c4e118d
# good: [ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2] Merge git://git.infradead.org/mtd-2.6
git bisect good ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2
# bad: [37c6dbe290c05023b47f52528e30ce51336b93eb] V4L/DVB (12091): gspca_sonixj: Add light frequency control
git bisect bad 37c6dbe290c05023b47f52528e30ce51336b93eb
# bad: [687d680985b1438360a9ba470ece8b57cd205c3b] Merge git://git.infradead.org/~dwmw2/iommu-2.6.31
git bisect bad 687d680985b1438360a9ba470ece8b57cd205c3b
# bad: [1053414068bad659479e6efa62a67403b8b1ec0a] Merge branch 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
git bisect bad 1053414068bad659479e6efa62a67403b8b1ec0a
# good: [b01b4babbf204443b5a846a7494546501614cefc] firewire: net: fix card driver reloading
git bisect good b01b4babbf204443b5a846a7494546501614cefc
# bad: [c02d7adf8c5429727a98bad1d039bccad4c61c50] NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private namespace
git bisect bad c02d7adf8c5429727a98bad1d039bccad4c61c50
# good: [616511d039af402670de8500d0e24495113a9cab] VFS: Uninline the function put_mnt_ns()
git bisect good 616511d039af402670de8500d0e24495113a9cab
# good: [cf8d2c11cb77f129675478792122f50827e5b0ae] VFS: Add VFS helper functions for setting up private namespaces
git bisect good cf8d2c11cb77f129675478792122f50827e5b0ae


The last "git bisect good" prints out:

server10:/usr/src/linux # git bisect good
c02d7adf8c5429727a98bad1d039bccad4c61c50 is the first bad commit
commit c02d7adf8c5429727a98bad1d039bccad4c61c50
Author: Trond Myklebust <Trond.Myklebust@netapp.com>
Date:   Mon Jun 22 15:09:14 2009 -0400

    NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private namespace
    
    As noted in the previous patch, the NFSv4 client mount code currently
    has several limitations. If the mount path contains symlinks, or
    referrals, or even if it just contains a '..', then the client code in
    nfs4_path_walk() will fail with an error.
    
    This patch replaces the nfs4_path_walk()-based lookup with a helper
    function that sets up a private namespace to represent the namespace on the
    server, then uses the ordinary VFS and NFS path lookup code to walk down the
    mount path in that namespace.
    
    Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
    Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>

:040000 040000 97a18818f26ab9a0987f157257eb6f399c3cc1cc 9ab6c712bb64f1349b5ac9f2020191abb5780ca0 M      fs

Does this help you any further?

Thanks!
Robert


On Mon, 19 Apr 2010 16:17:18 +0300, "Michael S. Tsirkin" <mst@redhat.com>
wrote:
> So it seems the change that created the problem was not
> specific to virtio.
> 
> To track this further down, I think the thing to try
> would be to do a full bisect.
> 
> That is instead of git bisect start 'v2.6.31' 'v2.6.30' '--'
> 'drivers/virtio/'
> 'drivers/net/virtio_net.c'
> 
> do
> 
> git bisect start 'v2.6.31' 'v2.6.30'
> 
> and then test kernel versions as they are generated.
> 
> 
> On Mon, Apr 19, 2010 at 02:55:21PM +0200, Robert Wimmer wrote:
>> Is there a possibility to track this further down?
>> I've had problems on two other KVM guests for a few weeks
>> which I think are related to this. The hosts for
>> these guests run kernel 2.6.32. Until today the guests were
>> also running 2.6.32. Inside the guests we're using GlusterFS,
>> NFSv4 and Apache with PHP. From time to time the
>> httpd processes are "hanging". When this happens
>> we see a lot of soft lockups. These
>> hosts are running Xeon X5560 processors. Until
>> today I suspected that these problems only happen
>> on older Xeons but this doesn't seem to be true.
>> I've attached the output from /var/log/messages
>> (https://bugzilla.kernel.org/attachment.cgi?id=26048)
>> from one of the hosts with GlusterFS. I've now
>> downgraded to kernel 2.6.30 in the guests. But since
>> this problem also exists in 2.6.34-rc3 I suspect that
>> we'll never be able to do a kernel update
>> in the guests when they're using NFS :-(
>> 
>> But what I can definitely say is that all the problems
>> only happen with guests running kernel >= 2.6.31
>> and with a remote file system (NFS, GlusterFS). Some
>> days ago another KVM guest had a network shutdown using
>> kernel 2.6.32 in host and guest + NFSv4. But this has only
>> happened once so far and there isn't much
>> traffic running through the interfaces of that host.
>> 
>> All other guests with kernel 2.6.30 (about 80 guests on
>> 18 hosts) with NFS and KVM 0.12.3 are running
>> perfectly.
>> 
>> Thanks!
>> Robert
>> 
>> 
>> 
>> On 04/13/10 10:51, Robert Wimmer wrote:
>> > I've tried to do my very best. In general I can
>> > say: All 2.6.30 versions work, all 2.6.31 fail. 2.6.31-rc3
>> > fails with "soft lockup" and is the only one which
>> > doesn't show any "swapper page allocation failure".
>> > But the result is ultimately the same... 2.6.31-rc4
>> > doesn't show "soft lockups" but "swapper page allocation failure".
>> > Here is the dmesg output for 2.6.31-rc3:
>> > https://bugzilla.kernel.org/attachment.cgi?id=25986
>> >
>> > So here is what I've done. Started with a fresh tree
>> > and my 2.6.30 .config:
>> >
>> > rm -fr /usr/src/linux
>> > cd /usr/src
>> > git clone
>> > git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git
>> > linux
>> > cd linux
>> > git checkout 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
>> >
>> > Here is the "git bisect log" output:
>> >
>> > # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
>> > # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
>> > git bisect start 'v2.6.31' 'v2.6.30' '--' 'drivers/virtio/'
>> > 'drivers/net/virtio_net.c'
>> > # good: [e3353853730eb99c56b7b0aed1667d51c0e3699a] virtio: enhance
>> > id_matching for virtio drivers
>> > git bisect good e3353853730eb99c56b7b0aed1667d51c0e3699a
>> > # good: [9cbc1cb8cd46ce1f7645b9de249b2ce8460129bb] Merge branch
>> > 'master'
>> > of master.kernel.org:/pub/scm/linux/kernel/git/torvalds/linux-2.6
>> > git bisect good 9cbc1cb8cd46ce1f7645b9de249b2ce8460129bb
>> > # bad: [ff52c3fc7188855ede75d87b022271f0da309e5b] virtio: fix memory
>> > leak on device removal
>> > git bisect bad ff52c3fc7188855ede75d87b022271f0da309e5b
>> > # good: [31278e71471399beaff9280737e52b47db4dc345] net: group address
>> > list and its count
>> > git bisect good 31278e71471399beaff9280737e52b47db4dc345
>> > # bad: [4b892e6582e3a4fe01f623aea386907270d5bf83] virtio-pci: correctly
>> > unregister root device on error
>> > git bisect bad 4b892e6582e3a4fe01f623aea386907270d5bf83
>> >
>> > Hopefully this gives you some hints. The problem
>> > for me is that I don't know which commits I should
>> > consider good or bad. Should I consider the
>> > commit with the "soft lockup" as good because it
>> > doesn't show the allocation failure? Currently it is
>> > marked as bad (4b892e6582e3a4fe01f623aea386907270d5bf83).
>> > What should I do next?
>> >
>> > Thanks!
>> > Robert
>> >
>> > On 04/12/10 15:52, Michael S. Tsirkin wrote:
>> >   
>> >> On Mon, Apr 12, 2010 at 03:50:31PM +0200, Robert Wimmer wrote:
>> >>   
>> >>     
>> >>> Sorry but I need some more git help. Here is what I've done.
>> >>> Started with a fresh clone of the kernel:
>> >>>
>> >>> cd /usr/src
>> >>> git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
>> >>> cd linux
>> >>> git checkout 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
>> >>>
>> >>> Since I already knew that this commit wasn't good I did
>> >>>
>> >>> git bisect start
>> >>> git bisect bad
>> >>>     
>> >>>       
>> >> I think what you miss is marking the good commit.
>> >> bisect does a binary search but it needs to know
>> >> both good and bad commits to search in the range.
>> >>
>> >> Optionally, you can use
>> >> '-- drivers/virtio/ drivers/net/virtio_net.c'.
>> >> What this does is limit bisect to commits that touch the files in
>> >> question. This way you get far fewer tests to run
>> >> (about 4), but after you find the first problematic commit
>> >> you must verify that the commit just before it does not have the
>> >> issue.
>> >>
>> >> If this turns out not to be the case, you'll have to
>> >> fall back on a full bisect, and we will know that it is some
>> >> other change in the kernel that triggered the regression.
>> >>
>> >>
>> >>   
>> >>     
>> >>> compiled and started over. As expected the problem returns.
>> >>> So I've done another
>> >>>
>> >>> git bisect bad
>> >>>
>> >>> but I always get the same commit:
>> >>>
>> >>> kabul:/usr/src/linux # git bisect log
>> >>> git bisect start
>> >>> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix
>> >>> compilation on SMP
>> >>> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
>> >>> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix
>> >>> compilation on SMP
>> >>> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
>> >>> # bad: [0b4f2928f14c4a9770b0866923fc81beb7f4aa57] smc91x: fix
>> >>> compilation on SMP
>> >>> git bisect bad 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
>> >>>
>> >>> I expected that after each "git bisect bad" I would get the
>> >>> commit before the "bad" one. How can I get the previous commit?
>> >>> The bisect documentation couldn't help me.
>> >>>
>> >>> Thanks!
>> >>> Robert
>> >>>
>> >>>
>> >>>
>> >>> On 04/12/10 13:23, Michael S. Tsirkin wrote:
>> >>>     
>> >>>       
>> >>>> On Mon, Apr 12, 2010 at 11:25:26AM +0200, Robert Wimmer wrote:
>> >>>>   
>> >>>>       
>> >>>>         
>> >>>>> server10:/usr/src/linux # git bisect start v2.6.31 v2.6.30 --
>> >>>>> drivers/virtio/ drivers/net/virtio_net.c
>> >>>>> Bisecting: 12 revisions left to test after this (roughly 4 steps)
>> >>>>> [e3353853730eb99c56b7b0aed1667d51c0e3699a] virtio: enhance
>> >>>>> id_matching
>> >>>>> for virtio drivers
>> >>>>>
>> >>>>>     
>> >>>>>         
>> >>>>>           
>> >>>> Sorry I wasn't clear. the way to use bisect is as follows:
>> >>>> - first start as you did now.
>> >>>> 1. now build kernel, install and test
>> >>>> 2. if bug is there, type 'git bisect bad'
>> >>>> 3. if bug is not there, type 'git bisect good'
>> >>>> 4. The above will give you another kernel version to test
>> >>>>    if so go back to step 1
>> >>>> 5. this will be repeated about 4 times (number of steps above)
>> >>>> 6. in the end you will get the first revision which has the
>> >>>>    problem. Let's assume it is revision ABCDEF.
>> >>>>
>> >>>>    Type git bisect log to see your history.
>> >>>>
>> >>>> 7. Now git reset --hard ABCDEF~1 and try again.
>> >>>>
>> >>>> If you see the problem with ABCDEF but not ABCDEF~1
>> >>>> then we will have a good guess at the culprit.
>> >>>>
>> >>>> Some more tips here:
>> >>>> http://www.kernel.org/pub/software/scm/git/docs/git-bisect.html
>> >>>>
>> >>>>
>> >>>>   
>> >>>>       
>> >>>>         
>> >>>>> Today I've upgraded to qemu-kvm-0.12.3-r1 (Gentoo package)
>> >>>>> but doesn't help. Still getting "page allocation failure" with
>> >>>>> 2.6.31-rc5.
>> >>>>>
>> >>>>> Does it make sense to use the same 2.6.31-rc5 kernel
>> >>>>> in the host and guest for testing? Currently I'm still using 2.6.32
>> >>>>> in the host and testing 2.6.31-rc5 in the guest until it "crashes".
>> >>>>> Then I start the guest with 2.6.30 again which works
>> >>>>> without trouble with 2.6.32 as host.
>> >>>>>
>> >>>>> This is really strange. I have hosts with 2.6.32 running
>> >>>>> guests with 2.6.32 which works perfectly. These hosts
>> >>>>> and guests running on HP DL 380 G6 with Intel Xeon X5560.
>> >>>>> The guests which don't work with 2.6.32 (and 2.6.32
>> >>>>> as host) running on HP DL 380 G5 with Intel Xeon L5420.
>> >>>>>     
>> >>>>>         
>> >>>>>           
>> >>>> Hmm. Some subtle race?
>> >>>>
>> >>>>   
>> >>>>       
>> >>>>         
>> >>>>> (All guests) and (all hosts) have the same packages
>> >>>>> and the same versions installed and the same kernel
>> >>>>> configs (hosts and guests using different .config but the
>> >>>>> difference is very small e.g. CONFIG_PARAVIRT_SPINLOCKS=y,
>> >>>>> CONFIG_PARAVIRT_GUEST=y in guests but not in hosts
>> >>>>> .config).
>> >>>>>
>> >>>>> I've had problems with qemu-kvm 0.12.2 with high network
>> >>>>> traffic which was solved by a patch submitted by Tom
>> >>>>> Lendacky:
>> >>>>>
>> >>>>> "Fix a race condition where qemu finds that there are not enough
>> >>>>> virtio
>> >>>>> ring buffers available and the guest make more buffers available
>> >>>>> before
>> >>>>> qemu can enable notifications."
>> >>>>> http://www.mail-archive.com/kvm@vger.kernel.org/msg28667.html
>> >>>>>
>> >>>>> It was a real lifesaver for the HP DL 380 G6 mentioned
>> >>>>> above but maybe this is now causing the problems with the G5
>> >>>>> machines.
>> >>>>> The symptoms are the same. I can still log into the guest
>> >>>>> via VNC but the network is down.
>> >>>>>
>> >>>>> Thanks!
>> >>>>> Robert
>> >>>>>
>> >>>>>     
>> >>>>>         
>> >>>>>           
>> >>>> For now the only thing we seem to know for sure is that on
>> >>>> specific hardware there's a regression between 2.6.30 and
>> >>>> 2.6.31-rc5. Yes, it is possible that all it does
>> >>>> is expose a qemu bug, but it's hard to say.
>> >>>> Let's find out what change
>> >>>> does that, this should give us a hint.
>> >>>>
>> >>>>   
>> >>>>       
>> >>>>         
>> >>>>> On 04/11/10 13:03, Michael S. Tsirkin wrote:
>> >>>>>     
>> >>>>>         
>> >>>>>           
>> >>>>>> On Fri, Apr 09, 2010 at 12:15:01PM +0200, Robert Wimmer wrote:
>> >>>>>>   
>> >>>>>>       
>> >>>>>>           
>> >>>>>>             
>> >>>>>>> I'm not really a git hero so here is what I've done:
>> >>>>>>>
>> >>>>>>> cd /usr/src
>> >>>>>>> git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
>> >>>>>>> cd linux
>> >>>>>>> git checkout -b mykernel 0b4f2928f14c4a9770b0866923fc81beb7f4aa57
>> >>>>>>>     
>> >>>>>>>         
>> >>>>>>>             
>> >>>>>>>               
>> >>>>>> Looks right.
>> >>>>>>
>> >>>>>>   
>> >>>>>>       
>> >>>>>>           
>> >>>>>>             
>> >>>>>>> Then I've checked
>> >>>>>>>
>> >>>>>>> drivers/net/virtio_net.c
>> >>>>>>> drivers/net/smc91x.c
>> >>>>>>>
>> >>>>>>> to confirm that the committed changes were not in there.
>> >>>>>>> Next I build my kernel as usual. I used my .config
>> >>>>>>> from 2.6.30 (which is working fine in a several
>> >>>>>>> guests / .config see here:
>> >>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=25925)
>> >>>>>>> and build the kernel
>> >>>>>>>
>> >>>>>>> genkernel --menuconfig --lvm --oldconfig all
>> >>>>>>>
>> >>>>>>> which finally gave me a 2.6.31-rc5.
>> >>>>>>>     
>> >>>>>>>         
>> >>>>>>>             
>> >>>>>>>               
>> >>>>>> That's right.
>> >>>>>>
>> >>>>>>   
>> >>>>>>       
>> >>>>>>           
>> >>>>>>             
>> >>>>>>> I should mention
>> >>>>>>> that 2.6.30 was using SLUB. So here is the output
>> >>>>>>> from the 2.6.31-rc5 kernel running about 20 min.:
>> >>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=25926
>> >>>>>>>     
>> >>>>>>>         
>> >>>>>>>             
>> >>>>>>>               
>> >>>>>> Hmm, so we see the error here as well?
>> >>>>>>
>> >>>>>>   
>> >>>>>>       
>> >>>>>>           
>> >>>>>>             
>> >>>>>>> That doesn't seem very useful to me. I'm currently compiling
>> >>>>>>> the same kernel with SLAB.
>> >>>>>>>
>> >>>>>>> Please let me know if the git commands above are
>> >>>>>>> right and/or if you need other kernel options enabled.
>> >>>>>>>     
>> >>>>>>>         
>> >>>>>>>             
>> >>>>>>>               
>> >>>>>> Looks right. You don't have to add -b flag if you don't
>> >>>>>> want to.
>> >>>>>>
>> >>>>>>   
>> >>>>>>       
>> >>>>>>           
>> >>>>>>             
>> >>>>>>> Thanks!
>> >>>>>>> Robert
>> >>>>>>>     
>> >>>>>>>         
>> >>>>>>>             
>> >>>>>>>               
>> >>>>>> Hmm, I do not see anything else that seems related.
>> >>>>>> Could you please try to bisect?
>> >>>>>>
>> >>>>>> git bisect start v2.6.31 v2.6.30 -- drivers/virtio/
>> >>>>>> drivers/net/virtio_net.c
>> >>>>>>
>> >>>>>> should help assuming the change that triggers this is in virtio.
>> >>>>>>
>> >>>>>>
>> >>>>>>   
>> >>>>>>       
>> >>>>>>           
>> >>>>>>             
>> >>>>>>> On 04/08/10 22:04, Michael S. Tsirkin wrote:
>> >>>>>>>     
>> >>>>>>>         
>> >>>>>>>             
>> >>>>>>>               
>> >>>>>>>> On Thu, Apr 08, 2010 at 10:39:34PM +0300, Avi Kivity wrote:
>> >>>>>>>>   
>> >>>>>>>>       
>> >>>>>>>>           
>> >>>>>>>>               
>> >>>>>>>>                 
>> >>>>>>>>> cc: mst
>> >>>>>>>>>
>> >>>>>>>>> On 04/08/2010 10:34 PM, Andrew Morton wrote:
>> >>>>>>>>>     
>> >>>>>>>>>         
>> >>>>>>>>>             
>> >>>>>>>>>                 
>> >>>>>>>>>                   
>> >>>>>>>>>> (switched to email.  Please respond via emailed reply-to-all,
>> >>>>>>>>>> not via the bugzilla web interface).
>> >>>>>>>>>>
>> >>>>>>>>>> On Wed, 7 Apr 2010 10:29:20 GMT
>> >>>>>>>>>> bugzilla-daemon@bugzilla.kernel.org wrote:
>> >>>>>>>>>>
>> >>>>>>>>>>    
>> >>>>>>>>>>       
>> >>>>>>>>>>           
>> >>>>>>>>>>               
>> >>>>>>>>>>                   
>> >>>>>>>>>>                     
>> >>>>>>>>>>> https://bugzilla.kernel.org/show_bug.cgi?id=15709
>> >>>>>>>>>>>
>> >>>>>>>>>>>             Summary: swapper page allocation failure
>> >>>>>>>>>>>             Product: Memory Management
>> >>>>>>>>>>>             Version: 2.5
>> >>>>>>>>>>>      Kernel Version: 2.6.32 and 2.6.33
>> >>>>>>>>>>>            Platform: All
>> >>>>>>>>>>>          OS/Version: Linux
>> >>>>>>>>>>>                Tree: Mainline
>> >>>>>>>>>>>              Status: NEW
>> >>>>>>>>>>>            Severity: normal
>> >>>>>>>>>>>            Priority: P1
>> >>>>>>>>>>>           Component: Slab Allocator
>> >>>>>>>>>>>          AssignedTo: akpm@linux-foundation.org
>> >>>>>>>>>>>          ReportedBy: kernel@tauceti.net
>> >>>>>>>>>>>          Regression: No
>> >>>>>>>>>>>
>> >>>>>>>>>>>
>> >>>>>>>>>>> Created an attachment (id=25903)
>> >>>>>>>>>>>   --> (https://bugzilla.kernel.org/attachment.cgi?id=25903)
>> >>>>>>>>>>> dmesg output
>> >>>>>>>>>>>
>> >>>>>>>>>>> I'm having problems with "swapper page allocation failures"
>> >>>>>>>>>>> since upgrading from kernel 2.6.30 to 2.6.32/2.6.33. The
>> >>>>>>>>>>> problems occur inside a kernel virtual machine (KVM). The host
>> >>>>>>>>>>> runs Gentoo with kernel 2.6.32, which works fine. As long as
>> >>>>>>>>>>> kernel 2.6.30 is used as guest kernel the guest runs fine. But
>> >>>>>>>>>>> after upgrading to 2.6.32 and 2.6.33 I get "swapper page
>> >>>>>>>>>>> allocation failures" (see attachment of dmesg output). The
>> >>>>>>>>>>> guest is only running an Apache webserver and serves files
>> >>>>>>>>>>> from an NFS share. It has 1 GB RAM and 2 virtual CPUs. I've
>> >>>>>>>>>>> tried different kernel configurations (e.g. an unmodified
>> >>>>>>>>>>> version from Sabayon Linux Distribution) but that doesn't
>> >>>>>>>>>>> help. Load of the guest (and host) is very low.
>> >>>>>>>>>>> Network traffic is about 20-50 MBit/s.
>> >>>>>>>>>>>
>> >>>>>>>>>>>      
>> >>>>>>>>>>>         
>> >>>>>>>>>>>             
>> >>>>>>>>>>>                 
>> >>>>>>>>>>>                     
>> >>>>>>>>>>>                       
>> >>>>>>>>>> hm, this is a regression.
>> >>>>>>>>>>
>> >>>>>>>>>> : [  454.006706] users: page allocation failure. order:0,
>> >>>>>>>>>> mode:0x20
>> >>>>>>>>>> : [  454.006712] Pid: 7992, comm: users Not tainted
>> >>>>>>>>>> 2.6.34-rc3-git6 #2
>> >>>>>>>>>> : [  454.006714] Call Trace:
>> >>>>>>>>>> : [  454.006717]<IRQ>   [<ffffffff8109dff7>]
>> >>>>>>>>>> __alloc_pages_nodemask+0x5c8/0x615
>> >>>>>>>>>> : [  454.006796]  [<ffffffff817860ce>] ?
>> >>>>>>>>>> ip_local_deliver+0x65/0x6d
>> >>>>>>>>>> : [  454.006820]  [<ffffffff810c39c4>]
>> >>>>>>>>>> alloc_pages_current+0x96/0x9f
>> >>>>>>>>>> : [  454.006842]  [<ffffffff8167f2c7>]
>> >>>>>>>>>> try_fill_recv+0x5e/0x20f
>> >>>>>>>>>> : [  454.006846]  [<ffffffff8167fe13>]
>> >>>>>>>>>> virtnet_poll+0x52a/0x5c7
>> >>>>>>>>>> : [  454.006858]  [<ffffffff8104fe74>] ?
>> >>>>>>>>>> run_timer_softirq+0x1dc/0x1f4
>> >>>>>>>>>> : [  454.006873]  [<ffffffff8176035d>]
>> >>>>>>>>>> net_rx_action+0xad/0x1a5
>> >>>>>>>>>> : [  454.006882]  [<ffffffff8104b6cd>]
__do_softirq+0x9c/0x127
>> >>>>>>>>>> : [  454.006897]  [<ffffffff81008ffc>]
call_softirq+0x1c/0x30
>> >>>>>>>>>> : [  454.006901]  [<ffffffff8100af01>] do_softirq+0x41/0x7e
>> >>>>>>>>>> : [  454.006904]  [<ffffffff8104b3e3>] irq_exit+0x36/0x75
>> >>>>>>>>>> : [  454.006907]  [<ffffffff8100a5ee>] do_IRQ+0xaa/0xc1
>> >>>>>>>>>> : [  454.006926]  [<ffffffff8183bc13>]
ret_from_intr+0x0/0x11
>> >>>>>>>>>> : [  454.006928]<EOI>   [<ffffffff81026b25>] ?
>> >>>>>>>>>> kvm_deferred_mmu_op+0x5e/0xe7
>> >>>>>>>>>> : [  454.006942]  [<ffffffff81026b19>] ?
>> >>>>>>>>>> kvm_deferred_mmu_op+0x52/0xe7
>> >>>>>>>>>> : [  454.006946]  [<ffffffff81026c03>]
kvm_mmu_write+0x2e/0x35
>> >>>>>>>>>> : [  454.006949]  [<ffffffff81026c7d>]
>> >>>>>>>>>> kvm_set_pte_at+0x19/0x1b
>> >>>>>>>>>> : [  454.006953]  [<ffffffff810aba67>]
__do_fault+0x3c4/0x492
>> >>>>>>>>>> : [  454.006957]  [<ffffffff810adcf4>]
>> >>>>>>>>>> handle_mm_fault+0x478/0x9d8
>> >>>>>>>>>> : [  454.006966]  [<ffffffff810deb59>] ? path_put+0x2c/0x30
>> >>>>>>>>>> : [  454.006975]  [<ffffffff8102f162>]
>> >>>>>>>>>> do_page_fault+0x2f6/0x31a
>> >>>>>>>>>> : [  454.006979]  [<ffffffff8183b81e>] ?
>> >>>>>>>>>> _raw_spin_lock+0x9/0xd
>> >>>>>>>>>> : [  454.006982]  [<ffffffff8183bef5>] page_fault+0x25/0x30
>> >>>>>>>>>> : [  454.006985] Mem-Info:
>> >>>>>>>>>> : [  454.006987] Node 0 DMA per-cpu:
>> >>>>>>>>>> : [  454.006990] CPU    0: hi:    0, btch:   1 usd:   0
>> >>>>>>>>>> : [  454.006992] CPU    1: hi:    0, btch:   1 usd:   0
>> >>>>>>>>>> : [  454.006993] Node 0 DMA32 per-cpu:
>> >>>>>>>>>> : [  454.006996] CPU    0: hi:  186, btch:  31 usd: 185
>> >>>>>>>>>> : [  454.006998] CPU    1: hi:  186, btch:  31 usd: 112
>> >>>>>>>>>> : [  454.007003] active_anon:8308 inactive_anon:8544
>> >>>>>>>>>> isolated_anon:0
>> >>>>>>>>>> : [  454.007005]  active_file:4882 inactive_file:205902
>> >>>>>>>>>> isolated_file:0
>> >>>>>>>>>> : [  454.007006]  unevictable:0 dirty:11 writeback:0
>> >>>>>>>>>> unstable:0
>> >>>>>>>>>> : [  454.007007]  free:1385 slab_reclaimable:2445
>> >>>>>>>>>> slab_unreclaimable:4466
>> >>>>>>>>>> : [  454.007008]  mapped:1895 shmem:113 pagetables:1370
>> >>>>>>>>>> bounce:0
>> >>>>>>>>>> : [  454.007010] Node 0 DMA free:4000kB min:60kB low:72kB
>> >>>>>>>>>> high:88kB active_anon:0kB inactive_anon:0kB active_file:0kB
>> >>>>>>>>>> inactive_file:11844kB unevictable:0kB isolated(anon):0kB
>> >>>>>>>>>> isolated(file):0kB present:15768kB mlocked:0kB dirty:0kB
>> >>>>>>>>>> writeback:0kB mapped:0kB shmem:0kB slab_reclaimable:64kB
>> >>>>>>>>>> slab_unreclaimable:32kB kernel_stack:0kB pagetables:0kB
>> >>>>>>>>>> unstable:0kB bounce:0kB writeback_tmp:0kB pages_scanned:0
>> >>>>>>>>>> all_unreclaimable? no
>> >>>>>>>>>> : [  454.007021] lowmem_reserve[]: 0 994 994 994
>> >>>>>>>>>> : [  454.007025] Node 0 DMA32 free:1540kB min:4000kB
>> >>>>>>>>>> low:5000kB high:6000kB active_anon:33232kB
>> >>>>>>>>>> inactive_anon:34176kB active_file:19528kB
>> >>>>>>>>>> inactive_file:811764kB unevictable:0kB isolated(anon):0kB
>> >>>>>>>>>> isolated(file):0kB present:1018068kB mlocked:0kB dirty:44kB
>> >>>>>>>>>> writeback:0kB mapped:7580kB shmem:452kB
>> >>>>>>>>>> slab_reclaimable:9716kB slab_unreclaimable:17832kB
>> >>>>>>>>>> kernel_stack:1144kB pagetables:5480kB unstable:0kB
bounce:0kB
>> >>>>>>>>>> writeback_tmp:0kB pages_scanned:0 all_unreclaimable? no
>> >>>>>>>>>> : [  454.007036] lowmem_reserve[]: 0 0 0 0
>> >>>>>>>>>> : [  454.007040] Node 0 DMA: 0*4kB 4*8kB 6*16kB 5*32kB
6*64kB
>> >>>>>>>>>> 4*128kB 1*256kB 1*512kB 0*1024kB 1*2048kB 0*4096kB = 4000kB
>> >>>>>>>>>> : [  454.007050] Node 0 DMA32: 13*4kB 2*8kB 3*16kB 1*32kB
>> >>>>>>>>>> 2*64kB 0*128kB 1*256kB 0*512kB 1*1024kB 0*2048kB 0*4096kB =
>> >>>>>>>>>> 1556kB
>> >>>>>>>>>> : [  454.007059] 210914 total pagecache pages
>> >>>>>>>>>> : [  454.007061] 0 pages in swap cache
>> >>>>>>>>>> : [  454.007063] Swap cache stats: add 0, delete 0, find 0/0
>> >>>>>>>>>> : [  454.007065] Free swap  = 1959924kB
>> >>>>>>>>>> : [  454.007067] Total swap = 1959924kB
>> >>>>>>>>>> : [  454.014238] 262140 pages RAM
>> >>>>>>>>>> : [  454.014241] 7489 pages reserved
>> >>>>>>>>>> : [  454.014242] 21430 pages shared
>> >>>>>>>>>> : [  454.014244] 247174 pages non-shared
>> >>>>>>>>>>
>> >>>>>>>>>> Either page reclaim got worse or kvm/virtio-net got more
>> >>>>>>>>>> aggressive.
>> >>>>>>>>>>
>> >>>>>>>>>> Avi, Rusty: can you think of any changes in the KVM/virtio
>> >>>>>>>>>> area in the
>> >>>>>>>>>> 2.6.30 ->  2.6.32 timeframe which may have increased the
>> >>>>>>>>>> GFP_ATOMIC
>> >>>>>>>>>> demands upon the page allocator?
>> >>>>>>>>>>
>> >>>>>>>>>> Thanks.
>> >>>>>>>>>>    
>> >>>>>>>>>>       
>> >>>>>>>>>>           
>> >>>>>>>>>>               
>> >>>>>>>>>>                   
>> >>>>>>>>>>                     
>> >>>>>>>> On the contrary, with commit
>> >>>>>>>> 3161e453e496eb5643faad30fff5a5ab183da0fe
>> >>>>>>>> we should be using GFP_ATOMIC less.
>> >>>>>>>> But maybe there's a bug and it has the reverse effect somehow
>> >>>>>>>> ...
>> >>>>>>>>
>> >>>>>>>> Robert, could you pls try
>> >>>>>>>> 3161e453e496eb5643faad30fff5a5ab183da0fe
>> >>>>>>>> and if that *does* have the problem,
>> >>>>>>>> 0b4f2928f14c4a9770b0866923fc81beb7f4aa57?
>> >>>>>>>>
>> >>>>>>>>   
>> >>>>>>>>       
>> >>>>>>>>           
>> >>>>>>>>               
>> >>>>>>>>                 
>> >


* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-22 11:31                             ` kernel
@ 2010-04-22 10:03                                 ` Michael S. Tsirkin
  0 siblings, 0 replies; 62+ messages in thread
From: Michael S. Tsirkin @ 2010-04-22 10:03 UTC (permalink / raw)
  To: kernel
  Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
	Rusty Russell, Mel Gorman, Trond Myklebust, linux-nfs,
	linux-kernel

On Thu, Apr 22, 2010 at 01:31:06PM +0200, kernel wrote:
> Maybe some comments to my former mail about what I've done:
> I started with a fresh clone (deleted the old /usr/src/linux
> of course). 
> 
> git clone
> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
> 
> Then I started bisect
> 
> git bisect start 'v2.6.31' 'v2.6.30'
> 
> and built the first kernel, then marked kernels which
> "crashed" with "soft lockup" or "swapper page allocation failure"
> as bad and the other ones as good. Before compiling
> a new kernel I always ran "make mrproper". I don't know
> if this is needed but thought it wouldn't hurt.
> 
> It was not clear to me whether I should have stopped
> testing after the first commit that came up with a "swapper
> page allocation failure". Only one commit caused
> the allocation failure. All the other commits marked as bad
> came up with a soft lockup. But I thought it is important to
> find the earliest commit which crashes. So should I find
> the commit with the allocation failure instead?

I think you did the right thing. We'll have to
figure out the soft lockup first; then, if the page allocation failure
turns out to be a different issue, look at that.
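
For what it's worth, the good/bad rule described above could be scripted
roughly as follows. This is only a sketch: the function name classify_log
and the idea of saving the guest's dmesg to a file are assumptions for
illustration; the two patterns are simply the symptoms named in this thread.

```shell
#!/bin/sh
# Sketch only: classify one saved guest dmesg dump as "bad" or "good"
# for bisecting. "bad" means the log contains either symptom reported
# in this thread. classify_log is a hypothetical helper name.
classify_log() {
    # $1 = path to a saved dmesg / messages file from the guest
    if grep -q -i -e 'soft lockup' -e 'page allocation failure' "$1"; then
        echo bad
    else
        echo good
    fi
}

# Example (the file name is an assumption):
# dmesg > /tmp/guest-dmesg.txt
# classify_log /tmp/guest-dmesg.txt
```

A "good" result only means neither message has shown up yet, so the
guest should run under load for a while before it is trusted.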

> As you requested, I've now done a
> 
> git checkout c02d7adf8c5429727a98bad1d039bccad4c61c50
> 
> which ended with a soft lockup within 3 min. after starting
> the VM (see
> https://bugzilla.kernel.org/attachment.cgi?id=26089&action=edit)
> with this kernel.

I'm not sure why the lockup backtrace does not show function names -
is the kernel stripped?
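
One thing that might be worth checking (a guess, I have not seen this
build): backtraces show only raw addresses when the kernel is built
without CONFIG_KALLSYMS. A rough sketch for checking a .config; the
function name kallsyms_enabled is hypothetical:

```shell
#!/bin/sh
# Sketch only: report whether a kernel .config enables CONFIG_KALLSYMS,
# which the kernel needs to print function names in backtraces.
# kallsyms_enabled is a hypothetical helper name.
kallsyms_enabled() {
    # $1 = path to the kernel .config used for the build
    if grep -q '^CONFIG_KALLSYMS=y$' "$1"; then
        echo yes
    else
        echo no
    fi
}

# Example, using the source tree path from this thread:
# kallsyms_enabled /usr/src/linux/.config
```

If this prints "no", enabling the option and rebuilding should give a
readable trace; if it prints "yes", the cause lies elsewhere.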

> 
> Then I've done a
> 
> git checkout cf8d2c11cb77f129675478792122f50827e5b0ae
> 
> compiled and restarted the VM with this kernel version
> (BTW: Of course I've always used the same .config for 
> all kernels I've built.). cf8d2c11cb77f129675478792122f50827e5b0ae
> is running fine.
> 
> Thanks!
> Robert

Well, so the soft lockup issue seems NFS-related?
Trond, commit c02d7adf8c5429727a98bad1d039bccad4c61c50 seems to
be causing problems on some old kernels (see bisect below). Any idea why?


> On Wed, 21 Apr 2010 12:42:49 +0300, "Michael S. Tsirkin" <mst@redhat.com>
> wrote:
> > On Wed, Apr 21, 2010 at 01:23:12PM +0200, kernel wrote:
> >> So after the compiler was running hot I've now the following result:
> >> 
> >> server10:/usr/src/linux # git bisect log 
> >> # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
> >> # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
> >> git bisect start 'v2.6.31' 'v2.6.30'
> >> # good: [925d74ae717c9a12d3618eb4b36b9fb632e2cef3] V4L/DVB (11736):
> >> videobuf: modify return value of VIDIOC_REQBUFS ioctl
> >> git bisect good 925d74ae717c9a12d3618eb4b36b9fb632e2cef3
> >> # bad: [a380137900fca5c79e6daa9500bdb6ea5649188e] ixgbe: Fix device
> >> capabilities of 82599 single speed fiber NICs.
> >> git bisect bad a380137900fca5c79e6daa9500bdb6ea5649188e
> >> # good: [1dbb5765acc7a6fe4bc1957c001037cc9d02ae03] Staging: android:
> >> lowmemorykiller: fix up remaining checkpatch warnings
> >> git bisect good 1dbb5765acc7a6fe4bc1957c001037cc9d02ae03
> >> # good: [df36b439c5fedefe013d4449cb6a50d15e2f4d70] Merge branch
> >> 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
> >> git bisect good df36b439c5fedefe013d4449cb6a50d15e2f4d70
> >> # bad: [a800faec1b21d7133b5f0c8c6dac593b7c4e118d] Merge branch
> >> 'for-linus'
> >> of git://www.jni.nu/cris
> >> git bisect bad a800faec1b21d7133b5f0c8c6dac593b7c4e118d
> >> # good: [ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2] Merge
> >> git://git.infradead.org/mtd-2.6
> >> git bisect good ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2
> >> # bad: [37c6dbe290c05023b47f52528e30ce51336b93eb] V4L/DVB (12091):
> >> gspca_sonixj: Add light frequency control
> >> git bisect bad 37c6dbe290c05023b47f52528e30ce51336b93eb
> >> # bad: [687d680985b1438360a9ba470ece8b57cd205c3b] Merge
> >> git://git.infradead.org/~dwmw2/iommu-2.6.31
> >> git bisect bad 687d680985b1438360a9ba470ece8b57cd205c3b
> >> # bad: [1053414068bad659479e6efa62a67403b8b1ec0a] Merge branch
> >> 'for-linus'
> >> of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
> >> git bisect bad 1053414068bad659479e6efa62a67403b8b1ec0a
> >> # good: [b01b4babbf204443b5a846a7494546501614cefc] firewire: net: fix
> >> card
> >> driver reloading
> >> git bisect good b01b4babbf204443b5a846a7494546501614cefc
> >> # bad: [c02d7adf8c5429727a98bad1d039bccad4c61c50] NFSv4: Replace
> >> nfs4_path_walk() with VFS path lookup in a private namespace
> >> git bisect bad c02d7adf8c5429727a98bad1d039bccad4c61c50
> >> # good: [616511d039af402670de8500d0e24495113a9cab] VFS: Uninline the
> >> function put_mnt_ns()
> >> git bisect good 616511d039af402670de8500d0e24495113a9cab
> >> # good: [cf8d2c11cb77f129675478792122f50827e5b0ae] VFS: Add VFS helper
> >> functions for setting up private namespaces
> >> git bisect good cf8d2c11cb77f129675478792122f50827e5b0ae
> >> 
> >> 
> >> The last "git bisect good" prints out:
> >> 
> >> server10:/usr/src/linux # git bisect good
> >> c02d7adf8c5429727a98bad1d039bccad4c61c50 is the first bad commit
> >> commit c02d7adf8c5429727a98bad1d039bccad4c61c50
> >> Author: Trond Myklebust <Trond.Myklebust@netapp.com>
> >> Date:   Mon Jun 22 15:09:14 2009 -0400
> >> 
> >>     NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private
> >> namespace
> >>     
> >>     As noted in the previous patch, the NFSv4 client mount code
> currently
> >>     has several limitations. If the mount path contains symlinks, or
> >>     referrals, or even if it just contains a '..', then the client code
> >>     in
> >>     nfs4_path_walk() will fail with an error.
> >>     
> >>     This patch replaces the nfs4_path_walk()-based lookup with a helper
> >>     function that sets up a private namespace to represent the
> namespace
> >> on the
> >>     server, then uses the ordinary VFS and NFS path lookup code to walk
> >> down the
> >>     mount path in that namespace.
> >>     
> >>     Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> >>     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> >> 
> >> :040000 040000 97a18818f26ab9a0987f157257eb6f399c3cc1cc
> >> 9ab6c712bb64f1349b5ac9f2020191abb5780ca0 M      fs
> >> 
> >> Does this help you any further?
> >> 
> >> Thanks!
> >> Robert
> > 
> > Looks suspiciously like some error in testing.
> > Could you pls retest and verify again that
> > cf8d2c11cb77f129675478792122f50827e5b0ae
> > is good and c02d7adf8c5429727a98bad1d039bccad4c61c50 is bad?

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-21  9:42                           ` Michael S. Tsirkin
@ 2010-04-22 11:31                             ` kernel
  2010-04-22 10:03                                 ` Michael S. Tsirkin
  0 siblings, 1 reply; 62+ messages in thread
From: kernel @ 2010-04-22 11:31 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
	Rusty Russell, Mel Gorman

Maybe some comments to my former mail about what I've done:
I started with a fresh clone (deleted the old /usr/src/linux
of course). 

git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux

Then I started bisect

git bisect start 'v2.6.31' 'v2.6.30'

and built the first kernel, then marked kernels which
"crashed" with a "soft lockup" or "swapper page allocation failure"
as bad and the others as good. Before compiling each
new kernel I always ran "make mrproper". I don't know
if this is needed but thought it wouldn't hurt.

It was not clear to me whether I should have stopped
testing after the first commit that came up with a "swapper
page allocation failure". Only one commit caused
the allocation failure; all the other commits marked as bad
came up with a soft lockup. But I thought it was important to
find the earliest commit which crashes. So should I track down
the commit with the allocation failure?

As you requested I've now done a

git checkout c02d7adf8c5429727a98bad1d039bccad4c61c50

which ended with a soft lockup within 3 min. after starting
the VM (see
https://bugzilla.kernel.org/attachment.cgi?id=26089&action=edit)
with this kernel.

Then I've done a

git checkout cf8d2c11cb77f129675478792122f50827e5b0ae

compiled it and restarted the VM with this kernel version
(BTW: of course I've always used the same .config for
all kernels I've built). cf8d2c11cb77f129675478792122f50827e5b0ae
is running fine.

Thanks!
Robert
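
The good/bad criterion described above can be sketched as a small
helper. This is only an illustration, assuming the guest's console or
dmesg output has been saved to a file; the function name classify_log
and the matched substrings are assumptions, not something used in this
thread.

```shell
#!/bin/sh
# classify_log FILE
# Prints "bad" if the saved log shows a soft lockup or a page
# allocation failure, otherwise "good". The substrings are assumed
# to match the messages seen in the attached dmesg output.
classify_log() {
    if grep -q -e 'soft lockup' -e 'page allocation failure' "$1"; then
        echo bad
    else
        echo good
    fi
}
```

The result could then be passed on by hand, for example
git bisect "$(classify_log guest-dmesg.txt)", where guest-dmesg.txt is
a hypothetical file name.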

On Wed, 21 Apr 2010 12:42:49 +0300, "Michael S. Tsirkin" <mst@redhat.com>
wrote:
> On Wed, Apr 21, 2010 at 01:23:12PM +0200, kernel wrote:
>> So after the compiler was running hot I've now the following result:
>> 
>> server10:/usr/src/linux # git bisect log 
>> # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
>> # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
>> git bisect start 'v2.6.31' 'v2.6.30'
>> # good: [925d74ae717c9a12d3618eb4b36b9fb632e2cef3] V4L/DVB (11736):
>> videobuf: modify return value of VIDIOC_REQBUFS ioctl
>> git bisect good 925d74ae717c9a12d3618eb4b36b9fb632e2cef3
>> # bad: [a380137900fca5c79e6daa9500bdb6ea5649188e] ixgbe: Fix device
>> capabilities of 82599 single speed fiber NICs.
>> git bisect bad a380137900fca5c79e6daa9500bdb6ea5649188e
>> # good: [1dbb5765acc7a6fe4bc1957c001037cc9d02ae03] Staging: android:
>> lowmemorykiller: fix up remaining checkpatch warnings
>> git bisect good 1dbb5765acc7a6fe4bc1957c001037cc9d02ae03
>> # good: [df36b439c5fedefe013d4449cb6a50d15e2f4d70] Merge branch
>> 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
>> git bisect good df36b439c5fedefe013d4449cb6a50d15e2f4d70
>> # bad: [a800faec1b21d7133b5f0c8c6dac593b7c4e118d] Merge branch
>> 'for-linus'
>> of git://www.jni.nu/cris
>> git bisect bad a800faec1b21d7133b5f0c8c6dac593b7c4e118d
>> # good: [ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2] Merge
>> git://git.infradead.org/mtd-2.6
>> git bisect good ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2
>> # bad: [37c6dbe290c05023b47f52528e30ce51336b93eb] V4L/DVB (12091):
>> gspca_sonixj: Add light frequency control
>> git bisect bad 37c6dbe290c05023b47f52528e30ce51336b93eb
>> # bad: [687d680985b1438360a9ba470ece8b57cd205c3b] Merge
>> git://git.infradead.org/~dwmw2/iommu-2.6.31
>> git bisect bad 687d680985b1438360a9ba470ece8b57cd205c3b
>> # bad: [1053414068bad659479e6efa62a67403b8b1ec0a] Merge branch
>> 'for-linus'
>> of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
>> git bisect bad 1053414068bad659479e6efa62a67403b8b1ec0a
>> # good: [b01b4babbf204443b5a846a7494546501614cefc] firewire: net: fix
>> card
>> driver reloading
>> git bisect good b01b4babbf204443b5a846a7494546501614cefc
>> # bad: [c02d7adf8c5429727a98bad1d039bccad4c61c50] NFSv4: Replace
>> nfs4_path_walk() with VFS path lookup in a private namespace
>> git bisect bad c02d7adf8c5429727a98bad1d039bccad4c61c50
>> # good: [616511d039af402670de8500d0e24495113a9cab] VFS: Uninline the
>> function put_mnt_ns()
>> git bisect good 616511d039af402670de8500d0e24495113a9cab
>> # good: [cf8d2c11cb77f129675478792122f50827e5b0ae] VFS: Add VFS helper
>> functions for setting up private namespaces
>> git bisect good cf8d2c11cb77f129675478792122f50827e5b0ae
>> 
>> 
>> The last "git bisect good" prints out:
>> 
>> server10:/usr/src/linux # git bisect good
>> c02d7adf8c5429727a98bad1d039bccad4c61c50 is the first bad commit
>> commit c02d7adf8c5429727a98bad1d039bccad4c61c50
>> Author: Trond Myklebust <Trond.Myklebust@netapp.com>
>> Date:   Mon Jun 22 15:09:14 2009 -0400
>> 
>>     NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private
>> namespace
>>     
>>     As noted in the previous patch, the NFSv4 client mount code
currently
>>     has several limitations. If the mount path contains symlinks, or
>>     referrals, or even if it just contains a '..', then the client code
>>     in
>>     nfs4_path_walk() will fail with an error.
>>     
>>     This patch replaces the nfs4_path_walk()-based lookup with a helper
>>     function that sets up a private namespace to represent the
namespace
>> on the
>>     server, then uses the ordinary VFS and NFS path lookup code to walk
>> down the
>>     mount path in that namespace.
>>     
>>     Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
>>     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>> 
>> :040000 040000 97a18818f26ab9a0987f157257eb6f399c3cc1cc
>> 9ab6c712bb64f1349b5ac9f2020191abb5780ca0 M      fs
>> 
>> Does this help you any further?
>> 
>> Thanks!
>> Robert
> 
> Looks suspiciously like some error in testing.
> Could you pls retest and verify again that
> cf8d2c11cb77f129675478792122f50827e5b0ae
> is good and c02d7adf8c5429727a98bad1d039bccad4c61c50 is bad?

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-22 10:03                                 ` Michael S. Tsirkin
@ 2010-04-23  5:26                                   ` Robert Wimmer
  -1 siblings, 0 replies; 62+ messages in thread
From: Robert Wimmer @ 2010-04-23  5:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
	Rusty Russell, Mel Gorman, Trond Myklebust, linux-nfs,
	linux-kernel

> I'm not sure why the lockup backtrace does not show function names -
> is the kernel stripped?

I always build the kernels with "genkernel", a Gentoo
helper program for kernel building. I've looked into
the genkernel log file and it mentions nothing about
stripping the kernel. A future release of genkernel
will support this, but that is currently not the case. Since
I haven't stripped the kernel, I would answer no. Maybe there is a
kernel option which should be enabled?

Thanks!
Robert




On 04/22/10 12:03, Michael S. Tsirkin wrote:
> On Thu, Apr 22, 2010 at 01:31:06PM +0200, kernel wrote:
>   
>> Maybe some comments to my former mail about what I've done:
>> I started with a fresh clone (deleted the old /usr/src/linux
>> of course). 
>>
>> git clone
>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
>>
>> Then I started bisect
>>
>> git bisect start 'v2.6.31' 'v2.6.30'
>>
>> and build the first kernel and then marked kernels which
>> "crashed" with "soft lockup" or "swapper page allocation failure"
>> as bad and the other ones as good. Before I've compiled
>> a new kernel I've always done a "make mrproper". I don't know
>> if this is needed but thought it wouldn't hurt.
>>
>> For me it was not clear that maybe I should have had stopped
>> testing after the first commit that came up with a "swapper
>> page allocation failure". It was only one commit which cased
>> the allocation failure. All the other commits marked as bad
>> came up with a soft lockup. But I thought it is important to
>> find the earliest commit which crashes. So should I find out
>> the commit with the allocation failure?
>>     
> I think you did the right thing. We'll have to
> figure out soft lockup thing, then if page allocation failure
> turns out to be a different issue, look at it.
>
>   
>> As you requested I've now done now a
>>
>> git checkout c02d7adf8c5429727a98bad1d039bccad4c61c50
>>
>> which ended with a soft lockup within 3 min. after starting
>> the VM (see
>> https://bugzilla.kernel.org/attachment.cgi?id=26089&action=edit)
>> with this kernel.
>>     
> I'm not sure why the lockup backtrace does not show function names -
> is the kernel stripped?
>
>   
>> Then I've done a
>>
>> git checkout cf8d2c11cb77f129675478792122f50827e5b0ae
>>
>> compiled and restarted the VM with this kernel version
>> (BTW: Of course I've always used the same .config for 
>> all kernels I've build.). cf8d2c11cb77f129675478792122f50827e5b0ae
>> is running fine.
>>
>> Thanks!
>> Robert
>>     
> Well, so the soft lockup issue seems NFS-related?
> Trond, commit cf8d2c11cb77f129675478792122f50827e5b0ae seems to
> be causing problems on some old kernels (See bisect below). Any idea why?
>
>
>   
>> On Wed, 21 Apr 2010 12:42:49 +0300, "Michael S. Tsirkin" <mst@redhat.com>
>> wrote:
>>     
>>> On Wed, Apr 21, 2010 at 01:23:12PM +0200, kernel wrote:
>>>       
>>>> So after the compiler was running hot I've now the following result:
>>>>
>>>> server10:/usr/src/linux # git bisect log 
>>>> # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
>>>> # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
>>>> git bisect start 'v2.6.31' 'v2.6.30'
>>>> # good: [925d74ae717c9a12d3618eb4b36b9fb632e2cef3] V4L/DVB (11736):
>>>> videobuf: modify return value of VIDIOC_REQBUFS ioctl
>>>> git bisect good 925d74ae717c9a12d3618eb4b36b9fb632e2cef3
>>>> # bad: [a380137900fca5c79e6daa9500bdb6ea5649188e] ixgbe: Fix device
>>>> capabilities of 82599 single speed fiber NICs.
>>>> git bisect bad a380137900fca5c79e6daa9500bdb6ea5649188e
>>>> # good: [1dbb5765acc7a6fe4bc1957c001037cc9d02ae03] Staging: android:
>>>> lowmemorykiller: fix up remaining checkpatch warnings
>>>> git bisect good 1dbb5765acc7a6fe4bc1957c001037cc9d02ae03
>>>> # good: [df36b439c5fedefe013d4449cb6a50d15e2f4d70] Merge branch
>>>> 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
>>>> git bisect good df36b439c5fedefe013d4449cb6a50d15e2f4d70
>>>> # bad: [a800faec1b21d7133b5f0c8c6dac593b7c4e118d] Merge branch
>>>> 'for-linus'
>>>> of git://www.jni.nu/cris
>>>> git bisect bad a800faec1b21d7133b5f0c8c6dac593b7c4e118d
>>>> # good: [ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2] Merge
>>>> git://git.infradead.org/mtd-2.6
>>>> git bisect good ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2
>>>> # bad: [37c6dbe290c05023b47f52528e30ce51336b93eb] V4L/DVB (12091):
>>>> gspca_sonixj: Add light frequency control
>>>> git bisect bad 37c6dbe290c05023b47f52528e30ce51336b93eb
>>>> # bad: [687d680985b1438360a9ba470ece8b57cd205c3b] Merge
>>>> git://git.infradead.org/~dwmw2/iommu-2.6.31
>>>> git bisect bad 687d680985b1438360a9ba470ece8b57cd205c3b
>>>> # bad: [1053414068bad659479e6efa62a67403b8b1ec0a] Merge branch
>>>> 'for-linus'
>>>> of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
>>>> git bisect bad 1053414068bad659479e6efa62a67403b8b1ec0a
>>>> # good: [b01b4babbf204443b5a846a7494546501614cefc] firewire: net: fix
>>>> card
>>>> driver reloading
>>>> git bisect good b01b4babbf204443b5a846a7494546501614cefc
>>>> # bad: [c02d7adf8c5429727a98bad1d039bccad4c61c50] NFSv4: Replace
>>>> nfs4_path_walk() with VFS path lookup in a private namespace
>>>> git bisect bad c02d7adf8c5429727a98bad1d039bccad4c61c50
>>>> # good: [616511d039af402670de8500d0e24495113a9cab] VFS: Uninline the
>>>> function put_mnt_ns()
>>>> git bisect good 616511d039af402670de8500d0e24495113a9cab
>>>> # good: [cf8d2c11cb77f129675478792122f50827e5b0ae] VFS: Add VFS helper
>>>> functions for setting up private namespaces
>>>> git bisect good cf8d2c11cb77f129675478792122f50827e5b0ae
>>>>
>>>>
>>>> The last "git bisect good" prints out:
>>>>
>>>> server10:/usr/src/linux # git bisect good
>>>> c02d7adf8c5429727a98bad1d039bccad4c61c50 is the first bad commit
>>>> commit c02d7adf8c5429727a98bad1d039bccad4c61c50
>>>> Author: Trond Myklebust <Trond.Myklebust@netapp.com>
>>>> Date:   Mon Jun 22 15:09:14 2009 -0400
>>>>
>>>>     NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private
>>>> namespace
>>>>     
>>>>     As noted in the previous patch, the NFSv4 client mount code
>>>>         
>> currently
>>     
>>>>     has several limitations. If the mount path contains symlinks, or
>>>>     referrals, or even if it just contains a '..', then the client code
>>>>     in
>>>>     nfs4_path_walk() will fail with an error.
>>>>     
>>>>     This patch replaces the nfs4_path_walk()-based lookup with a helper
>>>>     function that sets up a private namespace to represent the
>>>>         
>> namespace
>>     
>>>> on the
>>>>     server, then uses the ordinary VFS and NFS path lookup code to walk
>>>> down the
>>>>     mount path in that namespace.
>>>>     
>>>>     Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
>>>>     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>>>>
>>>> :040000 040000 97a18818f26ab9a0987f157257eb6f399c3cc1cc
>>>> 9ab6c712bb64f1349b5ac9f2020191abb5780ca0 M      fs
>>>>
>>>> Does this help you any further?
>>>>
>>>> Thanks!
>>>> Robert
>>>>         
>>> Looks suspiciously like some error in testing.
>>> Could you pls retest and verify again that
>>> cf8d2c11cb77f129675478792122f50827e5b0ae
>>> is good and c02d7adf8c5429727a98bad1d039bccad4c61c50 is bad?
>>>       


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
@ 2010-04-23  5:26                                   ` Robert Wimmer
  0 siblings, 0 replies; 62+ messages in thread
From: Robert Wimmer @ 2010-04-23  5:26 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
	Rusty Russell, Mel Gorman, Trond Myklebust, linux-nfs,
	linux-kernel

> I'm not sure why the lockup backtrace does not show function names -
> is the kernel stripped?

I'm building the kernels always with "genkernel" a Gentoo
helper programm for kernel building. But I've looked into
the log file of genkernel and there is nothing mentioned about
striping the kernel. There will be a future release of genkernel
which supports this but this is currently not the case. Since
I haven't stripped the kernel I would answer no. Maybe a
kernel option which should be enabled?

Thanks!
Robert




On 04/22/10 12:03, Michael S. Tsirkin wrote:
> On Thu, Apr 22, 2010 at 01:31:06PM +0200, kernel wrote:
>   
>> Maybe some comments to my former mail about what I've done:
>> I started with a fresh clone (deleted the old /usr/src/linux
>> of course). 
>>
>> git clone
>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
>>
>> Then I started bisect
>>
>> git bisect start 'v2.6.31' 'v2.6.30'
>>
>> and build the first kernel and then marked kernels which
>> "crashed" with "soft lockup" or "swapper page allocation failure"
>> as bad and the other ones as good. Before I've compiled
>> a new kernel I've always done a "make mrproper". I don't know
>> if this is needed but thought it wouldn't hurt.
>>
>> For me it was not clear that maybe I should have had stopped
>> testing after the first commit that came up with a "swapper
>> page allocation failure". It was only one commit which cased
>> the allocation failure. All the other commits marked as bad
>> came up with a soft lockup. But I thought it is important to
>> find the earliest commit which crashes. So should I find out
>> the commit with the allocation failure?
>>     
> I think you did the right thing. We'll have to
> figure out soft lockup thing, then if page allocation failure
> turns out to be a different issue, look at it.
>
>   
>> As you requested I've now done now a
>>
>> git checkout c02d7adf8c5429727a98bad1d039bccad4c61c50
>>
>> which ended with a soft lockup within 3 min. after starting
>> the VM (see
>> https://bugzilla.kernel.org/attachment.cgi?id=26089&action=edit)
>> with this kernel.
>>     
> I'm not sure why the lockup backtrace does not show function names -
> is the kernel stripped?
>
>   
>> Then I've done a
>>
>> git checkout cf8d2c11cb77f129675478792122f50827e5b0ae
>>
>> compiled and restarted the VM with this kernel version
>> (BTW: Of course I've always used the same .config for 
>> all kernels I've build.). cf8d2c11cb77f129675478792122f50827e5b0ae
>> is running fine.
>>
>> Thanks!
>> Robert
>>     
> Well, so the soft lockup issue seems NFS-related?
> Trond, commit cf8d2c11cb77f129675478792122f50827e5b0ae seems to
> be causing problems on some old kernels (See bisect below). Any idea why?
>
>
>   
>> On Wed, 21 Apr 2010 12:42:49 +0300, "Michael S. Tsirkin" <mst@redhat.com>
>> wrote:
>>     
>>> On Wed, Apr 21, 2010 at 01:23:12PM +0200, kernel wrote:
>>>       
>>>> So after the compiler was running hot I've now the following result:
>>>>
>>>> server10:/usr/src/linux # git bisect log 
>>>> # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
>>>> # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
>>>> git bisect start 'v2.6.31' 'v2.6.30'
>>>> # good: [925d74ae717c9a12d3618eb4b36b9fb632e2cef3] V4L/DVB (11736):
>>>> videobuf: modify return value of VIDIOC_REQBUFS ioctl
>>>> git bisect good 925d74ae717c9a12d3618eb4b36b9fb632e2cef3
>>>> # bad: [a380137900fca5c79e6daa9500bdb6ea5649188e] ixgbe: Fix device
>>>> capabilities of 82599 single speed fiber NICs.
>>>> git bisect bad a380137900fca5c79e6daa9500bdb6ea5649188e
>>>> # good: [1dbb5765acc7a6fe4bc1957c001037cc9d02ae03] Staging: android:
>>>> lowmemorykiller: fix up remaining checkpatch warnings
>>>> git bisect good 1dbb5765acc7a6fe4bc1957c001037cc9d02ae03
>>>> # good: [df36b439c5fedefe013d4449cb6a50d15e2f4d70] Merge branch
>>>> 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
>>>> git bisect good df36b439c5fedefe013d4449cb6a50d15e2f4d70
>>>> # bad: [a800faec1b21d7133b5f0c8c6dac593b7c4e118d] Merge branch
>>>> 'for-linus'
>>>> of git://www.jni.nu/cris
>>>> git bisect bad a800faec1b21d7133b5f0c8c6dac593b7c4e118d
>>>> # good: [ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2] Merge
>>>> git://git.infradead.org/mtd-2.6
>>>> git bisect good ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2
>>>> # bad: [37c6dbe290c05023b47f52528e30ce51336b93eb] V4L/DVB (12091):
>>>> gspca_sonixj: Add light frequency control
>>>> git bisect bad 37c6dbe290c05023b47f52528e30ce51336b93eb
>>>> # bad: [687d680985b1438360a9ba470ece8b57cd205c3b] Merge
>>>> git://git.infradead.org/~dwmw2/iommu-2.6.31
>>>> git bisect bad 687d680985b1438360a9ba470ece8b57cd205c3b
>>>> # bad: [1053414068bad659479e6efa62a67403b8b1ec0a] Merge branch
>>>> 'for-linus'
>>>> of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
>>>> git bisect bad 1053414068bad659479e6efa62a67403b8b1ec0a
>>>> # good: [b01b4babbf204443b5a846a7494546501614cefc] firewire: net: fix
>>>> card
>>>> driver reloading
>>>> git bisect good b01b4babbf204443b5a846a7494546501614cefc
>>>> # bad: [c02d7adf8c5429727a98bad1d039bccad4c61c50] NFSv4: Replace
>>>> nfs4_path_walk() with VFS path lookup in a private namespace
>>>> git bisect bad c02d7adf8c5429727a98bad1d039bccad4c61c50
>>>> # good: [616511d039af402670de8500d0e24495113a9cab] VFS: Uninline the
>>>> function put_mnt_ns()
>>>> git bisect good 616511d039af402670de8500d0e24495113a9cab
>>>> # good: [cf8d2c11cb77f129675478792122f50827e5b0ae] VFS: Add VFS helper
>>>> functions for setting up private namespaces
>>>> git bisect good cf8d2c11cb77f129675478792122f50827e5b0ae
>>>>
>>>>
>>>> The last "git bisect good" prints out:
>>>>
>>>> server10:/usr/src/linux # git bisect good
>>>> c02d7adf8c5429727a98bad1d039bccad4c61c50 is the first bad commit
>>>> commit c02d7adf8c5429727a98bad1d039bccad4c61c50
>>>> Author: Trond Myklebust <Trond.Myklebust@netapp.com>
>>>> Date:   Mon Jun 22 15:09:14 2009 -0400
>>>>
>>>>     NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private
>>>> namespace
>>>>     
>>>>     As noted in the previous patch, the NFSv4 client mount code
>>>>         
>> currently
>>     
>>>>     has several limitations. If the mount path contains symlinks, or
>>>>     referrals, or even if it just contains a '..', then the client code
>>>>     in
>>>>     nfs4_path_walk() will fail with an error.
>>>>     
>>>>     This patch replaces the nfs4_path_walk()-based lookup with a helper
>>>>     function that sets up a private namespace to represent the
>>>>         
>> namespace
>>     
>>>> on the
>>>>     server, then uses the ordinary VFS and NFS path lookup code to walk
>>>> down the
>>>>     mount path in that namespace.
>>>>     
>>>>     Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
>>>>     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>>>>
>>>> :040000 040000 97a18818f26ab9a0987f157257eb6f399c3cc1cc
>>>> 9ab6c712bb64f1349b5ac9f2020191abb5780ca0 M      fs
>>>>
>>>> Does this help you any further?
>>>>
>>>> Thanks!
>>>> Robert
>>>>         
>>> Looks suspiciously like some error in testing.
>>> Could you pls retest and verify again that
>>> cf8d2c11cb77f129675478792122f50827e5b0ae
>>> is good and c02d7adf8c5429727a98bad1d039bccad4c61c50 is bad?
>>>       


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-23  5:26                                   ` Robert Wimmer
@ 2010-04-25  9:18                                     ` Michael S. Tsirkin
  -1 siblings, 0 replies; 62+ messages in thread
From: Michael S. Tsirkin @ 2010-04-25  9:18 UTC (permalink / raw)
  To: Robert Wimmer
  Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
	Rusty Russell, Mel Gorman, Trond Myklebust, linux-nfs,
	linux-kernel

On Fri, Apr 23, 2010 at 07:26:52AM +0200, Robert Wimmer wrote:
> > I'm not sure why the lockup backtrace does not show function names -
> > is the kernel stripped?
> 
> I always build the kernels with "genkernel", a Gentoo
> helper program for kernel building. But I've looked into
> the log file of genkernel and there is nothing mentioned about
> stripping the kernel. There will be a future release of genkernel
> which supports this but this is currently not the case. Since
> I haven't stripped the kernel I would answer no. Maybe there is a
> kernel option which should be enabled?
> 
> Thanks!
> Robert
> 

Hmm. I have these
CONFIG_KALLSYMS=y
CONFIG_KALLSYMS_ALL=y
CONFIG_KALLSYMS_EXTRA_PASS=y
# CONFIG_STRIP_ASM_SYMS is not set
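
To compare another .config against the options listed above, something
like the following could be used. This is only a minimal sketch: the
function name check_symbol_config is hypothetical, and it assumes the
usual .config line forms ("OPT=value" and "# OPT is not set").

```shell
# Print the state of the symbol-related options in a kernel .config.
# Usage: check_symbol_config [path/to/.config]
check_symbol_config() {
    config=${1:-.config}
    for opt in CONFIG_KALLSYMS CONFIG_KALLSYMS_ALL \
               CONFIG_KALLSYMS_EXTRA_PASS CONFIG_STRIP_ASM_SYMS; do
        # Match either "OPT=value" or the "# OPT is not set" form.
        line=$(grep -E "^(${opt}=|# ${opt} is not set\$)" "$config" || true)
        if [ -n "$line" ]; then
            printf '%s\n' "$line"
        else
            # Options unknown to this kernel version do not appear at all.
            printf '%s: not present\n' "$opt"
        fi
    done
}
```

Running it in the kernel source directory (check_symbol_config .config)
prints one line per option, so two configurations can be compared by eye.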


> 
> 
> On 04/22/10 12:03, Michael S. Tsirkin wrote:
> > On Thu, Apr 22, 2010 at 01:31:06PM +0200, kernel wrote:
> >   
> >> Maybe some comments to my former mail about what I've done:
> >> I started with a fresh clone (deleted the old /usr/src/linux
> >> of course). 
> >>
> >> git clone
> >> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
> >>
> >> Then I started bisect
> >>
> >> git bisect start 'v2.6.31' 'v2.6.30'
> >>
> >> and built the first kernel, then marked kernels which
> >> "crashed" with "soft lockup" or "swapper page allocation failure"
> >> as bad and the other ones as good. Before compiling
> >> a new kernel I've always done a "make mrproper". I don't know
> >> if this is needed but thought it wouldn't hurt.
> >>
> >> It was not clear to me whether I should have stopped
> >> testing after the first commit that came up with a "swapper
> >> page allocation failure". Only one commit caused
> >> the allocation failure. All the other commits marked as bad
> >> came up with a soft lockup. But I thought it was important to
> >> find the earliest commit which crashes. So should I find
> >> the commit with the allocation failure?
> >>     
> > I think you did the right thing. We'll have to
> > figure out soft lockup thing, then if page allocation failure
> > turns out to be a different issue, look at it.
> >
> >   
> >> As you requested I've now done a
> >>
> >> git checkout c02d7adf8c5429727a98bad1d039bccad4c61c50
> >>
> >> which ended with a soft lockup within 3 min. after starting
> >> the VM (see
> >> https://bugzilla.kernel.org/attachment.cgi?id=26089&action=edit)
> >> with this kernel.
> >>     
> > I'm not sure why the lockup backtrace does not show function names -
> > is the kernel stripped?
> >
> >   
> >> Then I've done a
> >>
> >> git checkout cf8d2c11cb77f129675478792122f50827e5b0ae
> >>
> >> compiled and restarted the VM with this kernel version
> >> (BTW: Of course I've always used the same .config for
> >> all kernels I've built.). cf8d2c11cb77f129675478792122f50827e5b0ae
> >> is running fine.
> >>
> >> Thanks!
> >> Robert
> >>     
> > Well, so the soft lockup issue seems NFS-related?
> > Trond, commit c02d7adf8c5429727a98bad1d039bccad4c61c50 seems to
> > be causing problems on some old kernels (See bisect below). Any idea why?
> >
> >
> >   
> >> On Wed, 21 Apr 2010 12:42:49 +0300, "Michael S. Tsirkin" <mst@redhat.com>
> >> wrote:
> >>     
> >>> On Wed, Apr 21, 2010 at 01:23:12PM +0200, kernel wrote:
> >>>       
> >>>> So after the compiler was running hot I've now the following result:
> >>>>
> >>>> server10:/usr/src/linux # git bisect log 
> >>>> # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
> >>>> # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
> >>>> git bisect start 'v2.6.31' 'v2.6.30'
> >>>> # good: [925d74ae717c9a12d3618eb4b36b9fb632e2cef3] V4L/DVB (11736):
> >>>> videobuf: modify return value of VIDIOC_REQBUFS ioctl
> >>>> git bisect good 925d74ae717c9a12d3618eb4b36b9fb632e2cef3
> >>>> # bad: [a380137900fca5c79e6daa9500bdb6ea5649188e] ixgbe: Fix device
> >>>> capabilities of 82599 single speed fiber NICs.
> >>>> git bisect bad a380137900fca5c79e6daa9500bdb6ea5649188e
> >>>> # good: [1dbb5765acc7a6fe4bc1957c001037cc9d02ae03] Staging: android:
> >>>> lowmemorykiller: fix up remaining checkpatch warnings
> >>>> git bisect good 1dbb5765acc7a6fe4bc1957c001037cc9d02ae03
> >>>> # good: [df36b439c5fedefe013d4449cb6a50d15e2f4d70] Merge branch
> >>>> 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
> >>>> git bisect good df36b439c5fedefe013d4449cb6a50d15e2f4d70
> >>>> # bad: [a800faec1b21d7133b5f0c8c6dac593b7c4e118d] Merge branch
> >>>> 'for-linus'
> >>>> of git://www.jni.nu/cris
> >>>> git bisect bad a800faec1b21d7133b5f0c8c6dac593b7c4e118d
> >>>> # good: [ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2] Merge
> >>>> git://git.infradead.org/mtd-2.6
> >>>> git bisect good ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2
> >>>> # bad: [37c6dbe290c05023b47f52528e30ce51336b93eb] V4L/DVB (12091):
> >>>> gspca_sonixj: Add light frequency control
> >>>> git bisect bad 37c6dbe290c05023b47f52528e30ce51336b93eb
> >>>> # bad: [687d680985b1438360a9ba470ece8b57cd205c3b] Merge
> >>>> git://git.infradead.org/~dwmw2/iommu-2.6.31
> >>>> git bisect bad 687d680985b1438360a9ba470ece8b57cd205c3b
> >>>> # bad: [1053414068bad659479e6efa62a67403b8b1ec0a] Merge branch
> >>>> 'for-linus'
> >>>> of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
> >>>> git bisect bad 1053414068bad659479e6efa62a67403b8b1ec0a
> >>>> # good: [b01b4babbf204443b5a846a7494546501614cefc] firewire: net: fix
> >>>> card
> >>>> driver reloading
> >>>> git bisect good b01b4babbf204443b5a846a7494546501614cefc
> >>>> # bad: [c02d7adf8c5429727a98bad1d039bccad4c61c50] NFSv4: Replace
> >>>> nfs4_path_walk() with VFS path lookup in a private namespace
> >>>> git bisect bad c02d7adf8c5429727a98bad1d039bccad4c61c50
> >>>> # good: [616511d039af402670de8500d0e24495113a9cab] VFS: Uninline the
> >>>> function put_mnt_ns()
> >>>> git bisect good 616511d039af402670de8500d0e24495113a9cab
> >>>> # good: [cf8d2c11cb77f129675478792122f50827e5b0ae] VFS: Add VFS helper
> >>>> functions for setting up private namespaces
> >>>> git bisect good cf8d2c11cb77f129675478792122f50827e5b0ae
> >>>>
> >>>>
> >>>> The last "git bisect good" prints out:
> >>>>
> >>>> server10:/usr/src/linux # git bisect good
> >>>> c02d7adf8c5429727a98bad1d039bccad4c61c50 is the first bad commit
> >>>> commit c02d7adf8c5429727a98bad1d039bccad4c61c50
> >>>> Author: Trond Myklebust <Trond.Myklebust@netapp.com>
> >>>> Date:   Mon Jun 22 15:09:14 2009 -0400
> >>>>
> >>>>     NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private
> >>>> namespace
> >>>>     
> >>>>     As noted in the previous patch, the NFSv4 client mount code
> >>>>         
> >> currently
> >>     
> >>>>     has several limitations. If the mount path contains symlinks, or
> >>>>     referrals, or even if it just contains a '..', then the client code
> >>>>     in
> >>>>     nfs4_path_walk() will fail with an error.
> >>>>     
> >>>>     This patch replaces the nfs4_path_walk()-based lookup with a helper
> >>>>     function that sets up a private namespace to represent the
> >>>>         
> >> namespace
> >>     
> >>>> on the
> >>>>     server, then uses the ordinary VFS and NFS path lookup code to walk
> >>>> down the
> >>>>     mount path in that namespace.
> >>>>     
> >>>>     Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> >>>>     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> >>>>
> >>>> :040000 040000 97a18818f26ab9a0987f157257eb6f399c3cc1cc
> >>>> 9ab6c712bb64f1349b5ac9f2020191abb5780ca0 M      fs
> >>>>
> >>>> Does this help you any further?
> >>>>
> >>>> Thanks!
> >>>> Robert
> >>>>         
> >>> Looks suspiciously like some error in testing.
> >>> Could you pls retest and verify again that
> >>> cf8d2c11cb77f129675478792122f50827e5b0ae
> >>> is good and c02d7adf8c5429727a98bad1d039bccad4c61c50 is bad?
> >>>       
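
When re-checking a bisection like the one quoted above, it can help to
reduce the log to just the verdicts. The following is a minimal sketch,
not something taken from the thread: bisect_verdicts is a hypothetical
helper name, and it assumes the "git bisect log" output was saved to a
file, possibly still carrying mail quote prefixes such as "> >>>> ".

```shell
# Print "good <sha12>" / "bad <sha12>" for every verdict in a saved
# "git bisect log". Only the "git bisect good|bad <sha>" command lines are
# used, because the "# good:/# bad:" comment lines tend to get wrapped
# by mail clients.
bisect_verdicts() {
    sed -n -E \
        's/^[> ]*git bisect (good|bad) ([0-9a-f]{12})[0-9a-f]{28}[[:space:]]*$/\1 \2/p' \
        "$1"
}
```

The last "good" and the first "bad" entry in that output are the two
commits worth rebuilding and retesting by hand.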

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-25  9:18                                     ` Michael S. Tsirkin
@ 2010-04-25 20:41                                       ` Robert Wimmer
  -1 siblings, 0 replies; 62+ messages in thread
From: Robert Wimmer @ 2010-04-25 20:41 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
	Rusty Russell, Mel Gorman, Trond Myklebust, linux-nfs,
	linux-kernel

I've added CONFIG_KALLSYMS and CONFIG_KALLSYMS_ALL
to my .config and uploaded the resulting dmesg output.
Maybe it helps a little bit:

https://bugzilla.kernel.org/attachment.cgi?id=26138

- Robert
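
For anyone repeating this, a minimal sketch of switching such options on
in an existing .config is shown below. The helper name
enable_config_option is hypothetical and this is not necessarily how the
options were set here; it only rewrites the text file, so "make oldconfig"
should be run afterwards to let Kconfig re-check dependencies.

```shell
# Set OPT=y in a kernel .config, replacing any earlier setting of OPT.
# Usage: enable_config_option path/to/.config CONFIG_SOMETHING
enable_config_option() {
    _cfg=$1
    _opt=$2
    [ -f "$_cfg" ] || return 1
    _tmp=$(mktemp) || return 1
    # Drop "OPT=..." and "# OPT is not set", keep everything else.
    grep -v -e "^${_opt}=" -e "^# ${_opt} is not set\$" "$_cfg" > "$_tmp" || true
    printf '%s=y\n' "$_opt" >> "$_tmp"
    # Write back in place so the file keeps its owner and permissions.
    cat "$_tmp" > "$_cfg"
    rm -f "$_tmp"
}
```

For example: enable_config_option .config CONFIG_KALLSYMS, then the same
for CONFIG_KALLSYMS_ALL, followed by "make oldconfig".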


On 04/25/10 11:18, Michael S. Tsirkin wrote:
> On Fri, Apr 23, 2010 at 07:26:52AM +0200, Robert Wimmer wrote:
>   
>>> I'm not sure why the lockup backtrace does not show function names -
>>> is the kernel stripped?
>>>       
>> I always build the kernels with "genkernel", a Gentoo
>> helper program for kernel building. But I've looked into
>> the log file of genkernel and there is nothing mentioned about
>> stripping the kernel. There will be a future release of genkernel
>> which supports this but this is currently not the case. Since
>> I haven't stripped the kernel I would answer no. Maybe there is a
>> kernel option which should be enabled?
>>
>> Thanks!
>> Robert
>>
>>     
> Hmm. I have these
> CONFIG_KALLSYMS=y
> CONFIG_KALLSYMS_ALL=y
> CONFIG_KALLSYMS_EXTRA_PASS=y
> # CONFIG_STRIP_ASM_SYMS is not set
>
>
>   
>>
>> On 04/22/10 12:03, Michael S. Tsirkin wrote:
>>     
>>> On Thu, Apr 22, 2010 at 01:31:06PM +0200, kernel wrote:
>>>   
>>>       
>>>> Maybe some comments to my former mail about what I've done:
>>>> I started with a fresh clone (deleted the old /usr/src/linux
>>>> of course). 
>>>>
>>>> git clone
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
>>>>
>>>> Then I started bisect
>>>>
>>>> git bisect start 'v2.6.31' 'v2.6.30'
>>>>
>>>> and built the first kernel, then marked kernels which
>>>> "crashed" with "soft lockup" or "swapper page allocation failure"
>>>> as bad and the other ones as good. Before compiling
>>>> a new kernel I've always done a "make mrproper". I don't know
>>>> if this is needed but thought it wouldn't hurt.
>>>>
>>>> It was not clear to me whether I should have stopped
>>>> testing after the first commit that came up with a "swapper
>>>> page allocation failure". Only one commit caused
>>>> the allocation failure. All the other commits marked as bad
>>>> came up with a soft lockup. But I thought it was important to
>>>> find the earliest commit which crashes. So should I find
>>>> the commit with the allocation failure?
>>>>     
>>>>         
>>> I think you did the right thing. We'll have to
>>> figure out soft lockup thing, then if page allocation failure
>>> turns out to be a different issue, look at it.
>>>
>>>   
>>>       
>>>> As you requested I've now done a
>>>>
>>>> git checkout c02d7adf8c5429727a98bad1d039bccad4c61c50
>>>>
>>>> which ended with a soft lockup within 3 min. after starting
>>>> the VM (see
>>>> https://bugzilla.kernel.org/attachment.cgi?id=26089&action=edit)
>>>> with this kernel.
>>>>     
>>>>         
>>> I'm not sure why the lockup backtrace does not show function names -
>>> is the kernel stripped?
>>>
>>>   
>>>       
>>>> Then I've done a
>>>>
>>>> git checkout cf8d2c11cb77f129675478792122f50827e5b0ae
>>>>
>>>> compiled and restarted the VM with this kernel version
>>>> (BTW: Of course I've always used the same .config for
>>>> all kernels I've built.). cf8d2c11cb77f129675478792122f50827e5b0ae
>>>> is running fine.
>>>>
>>>> Thanks!
>>>> Robert
>>>>     
>>>>         
>>> Well, so the soft lockup issue seems NFS-related?
>>> Trond, commit c02d7adf8c5429727a98bad1d039bccad4c61c50 seems to
>>> be causing problems on some old kernels (See bisect below). Any idea why?
>>>
>>>
>>>   
>>>       
>>>> On Wed, 21 Apr 2010 12:42:49 +0300, "Michael S. Tsirkin" <mst@redhat.com>
>>>> wrote:
>>>>     
>>>>         
>>>>> On Wed, Apr 21, 2010 at 01:23:12PM +0200, kernel wrote:
>>>>>       
>>>>>           
>>>>>> So after the compiler was running hot I've now the following result:
>>>>>>
>>>>>> server10:/usr/src/linux # git bisect log 
>>>>>> # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
>>>>>> # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
>>>>>> git bisect start 'v2.6.31' 'v2.6.30'
>>>>>> # good: [925d74ae717c9a12d3618eb4b36b9fb632e2cef3] V4L/DVB (11736):
>>>>>> videobuf: modify return value of VIDIOC_REQBUFS ioctl
>>>>>> git bisect good 925d74ae717c9a12d3618eb4b36b9fb632e2cef3
>>>>>> # bad: [a380137900fca5c79e6daa9500bdb6ea5649188e] ixgbe: Fix device
>>>>>> capabilities of 82599 single speed fiber NICs.
>>>>>> git bisect bad a380137900fca5c79e6daa9500bdb6ea5649188e
>>>>>> # good: [1dbb5765acc7a6fe4bc1957c001037cc9d02ae03] Staging: android:
>>>>>> lowmemorykiller: fix up remaining checkpatch warnings
>>>>>> git bisect good 1dbb5765acc7a6fe4bc1957c001037cc9d02ae03
>>>>>> # good: [df36b439c5fedefe013d4449cb6a50d15e2f4d70] Merge branch
>>>>>> 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
>>>>>> git bisect good df36b439c5fedefe013d4449cb6a50d15e2f4d70
>>>>>> # bad: [a800faec1b21d7133b5f0c8c6dac593b7c4e118d] Merge branch
>>>>>> 'for-linus'
>>>>>> of git://www.jni.nu/cris
>>>>>> git bisect bad a800faec1b21d7133b5f0c8c6dac593b7c4e118d
>>>>>> # good: [ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2] Merge
>>>>>> git://git.infradead.org/mtd-2.6
>>>>>> git bisect good ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2
>>>>>> # bad: [37c6dbe290c05023b47f52528e30ce51336b93eb] V4L/DVB (12091):
>>>>>> gspca_sonixj: Add light frequency control
>>>>>> git bisect bad 37c6dbe290c05023b47f52528e30ce51336b93eb
>>>>>> # bad: [687d680985b1438360a9ba470ece8b57cd205c3b] Merge
>>>>>> git://git.infradead.org/~dwmw2/iommu-2.6.31
>>>>>> git bisect bad 687d680985b1438360a9ba470ece8b57cd205c3b
>>>>>> # bad: [1053414068bad659479e6efa62a67403b8b1ec0a] Merge branch
>>>>>> 'for-linus'
>>>>>> of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
>>>>>> git bisect bad 1053414068bad659479e6efa62a67403b8b1ec0a
>>>>>> # good: [b01b4babbf204443b5a846a7494546501614cefc] firewire: net: fix
>>>>>> card
>>>>>> driver reloading
>>>>>> git bisect good b01b4babbf204443b5a846a7494546501614cefc
>>>>>> # bad: [c02d7adf8c5429727a98bad1d039bccad4c61c50] NFSv4: Replace
>>>>>> nfs4_path_walk() with VFS path lookup in a private namespace
>>>>>> git bisect bad c02d7adf8c5429727a98bad1d039bccad4c61c50
>>>>>> # good: [616511d039af402670de8500d0e24495113a9cab] VFS: Uninline the
>>>>>> function put_mnt_ns()
>>>>>> git bisect good 616511d039af402670de8500d0e24495113a9cab
>>>>>> # good: [cf8d2c11cb77f129675478792122f50827e5b0ae] VFS: Add VFS helper
>>>>>> functions for setting up private namespaces
>>>>>> git bisect good cf8d2c11cb77f129675478792122f50827e5b0ae
>>>>>>
>>>>>>
>>>>>> The last "git bisect good" prints out:
>>>>>>
>>>>>> server10:/usr/src/linux # git bisect good
>>>>>> c02d7adf8c5429727a98bad1d039bccad4c61c50 is the first bad commit
>>>>>> commit c02d7adf8c5429727a98bad1d039bccad4c61c50
>>>>>> Author: Trond Myklebust <Trond.Myklebust@netapp.com>
>>>>>> Date:   Mon Jun 22 15:09:14 2009 -0400
>>>>>>
>>>>>>     NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private
>>>>>> namespace
>>>>>>     
>>>>>>     As noted in the previous patch, the NFSv4 client mount code
>>>>>>         
>>>>>>             
>>>> currently
>>>>     
>>>>         
>>>>>>     has several limitations. If the mount path contains symlinks, or
>>>>>>     referrals, or even if it just contains a '..', then the client code
>>>>>>     in
>>>>>>     nfs4_path_walk() will fail with an error.
>>>>>>     
>>>>>>     This patch replaces the nfs4_path_walk()-based lookup with a helper
>>>>>>     function that sets up a private namespace to represent the
>>>>>>         
>>>>>>             
>>>> namespace
>>>>     
>>>>         
>>>>>> on the
>>>>>>     server, then uses the ordinary VFS and NFS path lookup code to walk
>>>>>> down the
>>>>>>     mount path in that namespace.
>>>>>>     
>>>>>>     Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
>>>>>>     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>>>>>>
>>>>>> :040000 040000 97a18818f26ab9a0987f157257eb6f399c3cc1cc
>>>>>> 9ab6c712bb64f1349b5ac9f2020191abb5780ca0 M      fs
>>>>>>
>>>>>> Does this help you any further?
>>>>>>
>>>>>> Thanks!
>>>>>> Robert
>>>>>>         
>>>>>>             
>>>>> Looks suspiciously like some error in testing.
>>>>> Could you pls retest and verify again that
>>>>> cf8d2c11cb77f129675478792122f50827e5b0ae
>>>>> is good and c02d7adf8c5429727a98bad1d039bccad4c61c50 is bad?
>>>>>       
>>>>>           


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
@ 2010-04-25 20:41                                       ` Robert Wimmer
  0 siblings, 0 replies; 62+ messages in thread
From: Robert Wimmer @ 2010-04-25 20:41 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
	Rusty Russell, Mel Gorman, Trond Myklebust, linux-nfs,
	linux-kernel

I've added CONFIG_KALLSYMS and CONFIG_KALLSYMS_ALL
to my .config. I've uploaded the dmesg output. Maybe it
helps a little bit:

https://bugzilla.kernel.org/attachment.cgi?id=26138

- Robert


On 04/25/10 11:18, Michael S. Tsirkin wrote:
> On Fri, Apr 23, 2010 at 07:26:52AM +0200, Robert Wimmer wrote:
>   
>>> I'm not sure why the lockup backtrace does not show function names -
>>> is the kernel stripped?
>>>       
>> I always build the kernels with "genkernel", a Gentoo
>> helper program for kernel building. But I've looked into
>> the log file of genkernel and there is nothing mentioned about
>> stripping the kernel. There will be a future release of genkernel
>> which supports this but this is currently not the case. Since
>> I haven't stripped the kernel I would answer no. Maybe there is a
>> kernel option which should be enabled?
>>
>> Thanks!
>> Robert
>>
>>     
> Hmm. I have these
> CONFIG_KALLSYMS=y
> CONFIG_KALLSYMS_ALL=y
> CONFIG_KALLSYMS_EXTRA_PASS=y
> # CONFIG_STRIP_ASM_SYMS is not set
>
>
>   
>>
>> On 04/22/10 12:03, Michael S. Tsirkin wrote:
>>     
>>> On Thu, Apr 22, 2010 at 01:31:06PM +0200, kernel wrote:
>>>   
>>>       
>>>> Maybe some comments to my former mail about what I've done:
>>>> I started with a fresh clone (deleted the old /usr/src/linux
>>>> of course). 
>>>>
>>>> git clone
>>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
>>>>
>>>> Then I started bisect
>>>>
>>>> git bisect start 'v2.6.31' 'v2.6.30'
>>>>
>>>> and built the first kernel, then marked kernels which
>>>> "crashed" with "soft lockup" or "swapper page allocation failure"
>>>> as bad and the other ones as good. Before compiling
>>>> a new kernel I've always done a "make mrproper". I don't know
>>>> if this is needed but thought it wouldn't hurt.
>>>>
>>>> It was not clear to me whether I should have stopped
>>>> testing after the first commit that came up with a "swapper
>>>> page allocation failure". Only one commit caused
>>>> the allocation failure. All the other commits marked as bad
>>>> came up with a soft lockup. But I thought it was important to
>>>> find the earliest commit which crashes. So should I find
>>>> the commit with the allocation failure?
>>>>     
>>>>         
>>> I think you did the right thing. We'll have to
>>> figure out soft lockup thing, then if page allocation failure
>>> turns out to be a different issue, look at it.
>>>
>>>   
>>>       
>>>> As you requested I've now done a
>>>>
>>>> git checkout c02d7adf8c5429727a98bad1d039bccad4c61c50
>>>>
>>>> which ended with a soft lockup within 3 min. after starting
>>>> the VM (see
>>>> https://bugzilla.kernel.org/attachment.cgi?id=26089&action=edit)
>>>> with this kernel.
>>>>     
>>>>         
>>> I'm not sure why the lockup backtrace does not show function names -
>>> is the kernel stripped?
>>>
>>>   
>>>       
>>>> Then I've done a
>>>>
>>>> git checkout cf8d2c11cb77f129675478792122f50827e5b0ae
>>>>
>>>> compiled and restarted the VM with this kernel version
>>>> (BTW: Of course I've always used the same .config for
>>>> all kernels I've built.). cf8d2c11cb77f129675478792122f50827e5b0ae
>>>> is running fine.
>>>>
>>>> Thanks!
>>>> Robert
>>>>     
>>>>         
>>> Well, so the soft lockup issue seems NFS-related?
>>> Trond, commit c02d7adf8c5429727a98bad1d039bccad4c61c50 seems to
>>> be causing problems on some old kernels (See bisect below). Any idea why?
>>>
>>>
>>>   
>>>       
>>>> On Wed, 21 Apr 2010 12:42:49 +0300, "Michael S. Tsirkin" <mst@redhat.com>
>>>> wrote:
>>>>     
>>>>         
>>>>> On Wed, Apr 21, 2010 at 01:23:12PM +0200, kernel wrote:
>>>>>       
>>>>>           
>>>>>> So after the compiler was running hot I've now the following result:
>>>>>>
>>>>>> server10:/usr/src/linux # git bisect log 
>>>>>> # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
>>>>>> # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
>>>>>> git bisect start 'v2.6.31' 'v2.6.30'
>>>>>> # good: [925d74ae717c9a12d3618eb4b36b9fb632e2cef3] V4L/DVB (11736):
>>>>>> videobuf: modify return value of VIDIOC_REQBUFS ioctl
>>>>>> git bisect good 925d74ae717c9a12d3618eb4b36b9fb632e2cef3
>>>>>> # bad: [a380137900fca5c79e6daa9500bdb6ea5649188e] ixgbe: Fix device
>>>>>> capabilities of 82599 single speed fiber NICs.
>>>>>> git bisect bad a380137900fca5c79e6daa9500bdb6ea5649188e
>>>>>> # good: [1dbb5765acc7a6fe4bc1957c001037cc9d02ae03] Staging: android:
>>>>>> lowmemorykiller: fix up remaining checkpatch warnings
>>>>>> git bisect good 1dbb5765acc7a6fe4bc1957c001037cc9d02ae03
>>>>>> # good: [df36b439c5fedefe013d4449cb6a50d15e2f4d70] Merge branch
>>>>>> 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
>>>>>> git bisect good df36b439c5fedefe013d4449cb6a50d15e2f4d70
>>>>>> # bad: [a800faec1b21d7133b5f0c8c6dac593b7c4e118d] Merge branch
>>>>>> 'for-linus'
>>>>>> of git://www.jni.nu/cris
>>>>>> git bisect bad a800faec1b21d7133b5f0c8c6dac593b7c4e118d
>>>>>> # good: [ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2] Merge
>>>>>> git://git.infradead.org/mtd-2.6
>>>>>> git bisect good ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2
>>>>>> # bad: [37c6dbe290c05023b47f52528e30ce51336b93eb] V4L/DVB (12091):
>>>>>> gspca_sonixj: Add light frequency control
>>>>>> git bisect bad 37c6dbe290c05023b47f52528e30ce51336b93eb
>>>>>> # bad: [687d680985b1438360a9ba470ece8b57cd205c3b] Merge
>>>>>> git://git.infradead.org/~dwmw2/iommu-2.6.31
>>>>>> git bisect bad 687d680985b1438360a9ba470ece8b57cd205c3b
>>>>>> # bad: [1053414068bad659479e6efa62a67403b8b1ec0a] Merge branch
>>>>>> 'for-linus'
>>>>>> of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
>>>>>> git bisect bad 1053414068bad659479e6efa62a67403b8b1ec0a
>>>>>> # good: [b01b4babbf204443b5a846a7494546501614cefc] firewire: net: fix
>>>>>> card
>>>>>> driver reloading
>>>>>> git bisect good b01b4babbf204443b5a846a7494546501614cefc
>>>>>> # bad: [c02d7adf8c5429727a98bad1d039bccad4c61c50] NFSv4: Replace
>>>>>> nfs4_path_walk() with VFS path lookup in a private namespace
>>>>>> git bisect bad c02d7adf8c5429727a98bad1d039bccad4c61c50
>>>>>> # good: [616511d039af402670de8500d0e24495113a9cab] VFS: Uninline the
>>>>>> function put_mnt_ns()
>>>>>> git bisect good 616511d039af402670de8500d0e24495113a9cab
>>>>>> # good: [cf8d2c11cb77f129675478792122f50827e5b0ae] VFS: Add VFS helper
>>>>>> functions for setting up private namespaces
>>>>>> git bisect good cf8d2c11cb77f129675478792122f50827e5b0ae
>>>>>>
>>>>>>
>>>>>> The last "git bisect good" prints out:
>>>>>>
>>>>>> server10:/usr/src/linux # git bisect good
>>>>>> c02d7adf8c5429727a98bad1d039bccad4c61c50 is the first bad commit
>>>>>> commit c02d7adf8c5429727a98bad1d039bccad4c61c50
>>>>>> Author: Trond Myklebust <Trond.Myklebust@netapp.com>
>>>>>> Date:   Mon Jun 22 15:09:14 2009 -0400
>>>>>>
>>>>>>     NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private
>>>>>>     namespace
>>>>>>
>>>>>>     As noted in the previous patch, the NFSv4 client mount code currently
>>>>>>     has several limitations. If the mount path contains symlinks, or
>>>>>>     referrals, or even if it just contains a '..', then the client code in
>>>>>>     nfs4_path_walk() will fail with an error.
>>>>>>
>>>>>>     This patch replaces the nfs4_path_walk()-based lookup with a helper
>>>>>>     function that sets up a private namespace to represent the namespace
>>>>>>     on the server, then uses the ordinary VFS and NFS path lookup code to
>>>>>>     walk down the mount path in that namespace.
>>>>>>     
>>>>>>     Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
>>>>>>     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
>>>>>>
>>>>>> :040000 040000 97a18818f26ab9a0987f157257eb6f399c3cc1cc
>>>>>> 9ab6c712bb64f1349b5ac9f2020191abb5780ca0 M      fs
>>>>>>
>>>>>> Does this help you any further?
>>>>>>
>>>>>> Thanks!
>>>>>> Robert
>>>>>>         
>>>>>>             
>>>>> Looks suspiciously like some error in testing.
>>>>> Could you pls retest and verify again that
>>>>> cf8d2c11cb77f129675478792122f50827e5b0ae
>>>>> is good and c02d7adf8c5429727a98bad1d039bccad4c61c50 is bad?
>>>>>       
>>>>>           
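
The good/bad verdicts in the quoted `git bisect log` can be pulled out
mechanically, which makes it easier to see that the last commit marked
bad (c02d7adf...) sits directly after the last one marked good
(cf8d2c11...). A minimal sketch, assuming the log is available as plain
text; the function names parse_bisect_log and last_marked are
hypothetical helpers, not part of git:

```python
import re

# Matches the comment lines that `git bisect log` emits, e.g.
#   "# bad: [74fca6a4...] Linux 2.6.31"
VERDICT_RE = re.compile(r"^#\s*(good|bad):\s*\[([0-9a-f]{40})\]\s*(.*)$")


def parse_bisect_log(text):
    """Return a list of (verdict, sha, subject) tuples from a bisect log."""
    verdicts = []
    for line in text.splitlines():
        # Drop any leading e-mail quote markers ("> >>>>>> ") before matching.
        line = re.sub(r"^[>\s]+", "", line)
        m = VERDICT_RE.match(line)
        if m:
            # Only the first line of a wrapped subject is captured.
            verdicts.append((m.group(1), m.group(2), m.group(3)))
    return verdicts


def last_marked(verdicts, verdict):
    """Return the sha most recently marked with `verdict`, or None."""
    for v, sha, _ in reversed(verdicts):
        if v == verdict:
            return sha
    return None
```

Once a bisection has converged, last_marked(..., "bad") should be the
commit that git reports as the first bad commit.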

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-25 20:41                                       ` Robert Wimmer
@ 2010-04-25 20:49                                         ` Michael S. Tsirkin
  -1 siblings, 0 replies; 62+ messages in thread
From: Michael S. Tsirkin @ 2010-04-25 20:49 UTC (permalink / raw)
  To: Robert Wimmer
  Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
	Rusty Russell, Mel Gorman, Trond Myklebust, linux-nfs,
	linux-kernel

So, it's an NFS-related regression, which is consistent with the bisect
results. I guess someone who knows about NFS will have to look at it...
BTW, you probably want to label the bug as regression.

On Sun, Apr 25, 2010 at 10:41:59PM +0200, Robert Wimmer wrote:
> I've added CONFIG_KALLSYMS and CONFIG_KALLSYMS_ALL
> to my .config. I've uploaded the dmesg output. Maybe it
> helps a little bit:
> 
> https://bugzilla.kernel.org/attachment.cgi?id=26138
> 
> - Robert
> 
> 
> On 04/25/10 11:18, Michael S. Tsirkin wrote:
> > On Fri, Apr 23, 2010 at 07:26:52AM +0200, Robert Wimmer wrote:
> >   
> >>> I'm not sure why the lockup backtrace does not show function names -
> >>> is the kernel stripped?
> >>>       
> >> I always build the kernels with "genkernel", a Gentoo
> >> helper program for kernel building. But I've looked into
> >> the log file of genkernel and there is nothing mentioned about
> >> stripping the kernel. There will be a future release of genkernel
> >> which supports this but this is currently not the case. Since
> >> I haven't stripped the kernel I would answer no. Maybe a
> >> kernel option which should be enabled?
> >>
> >> Thanks!
> >> Robert
> >>
> >>     
> > Hmm. I have these
> > CONFIG_KALLSYMS=y
> > CONFIG_KALLSYMS_ALL=y
> > CONFIG_KALLSYMS_EXTRA_PASS=y
> > # CONFIG_STRIP_ASM_SYMS is not set
> >
> >
> >   
> >>
> >> On 04/22/10 12:03, Michael S. Tsirkin wrote:
> >>     
> >>> On Thu, Apr 22, 2010 at 01:31:06PM +0200, kernel wrote:
> >>>   
> >>>       
> >>>> Maybe some comments to my former mail about what I've done:
> >>>> I started with a fresh clone (deleted the old /usr/src/linux
> >>>> of course). 
> >>>>
> >>>> git clone
> >>>> git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux-2.6.git linux
> >>>>
> >>>> Then I started bisect
> >>>>
> >>>> git bisect start 'v2.6.31' 'v2.6.30'
> >>>>
> >>>> and built the first kernel and then marked kernels which
> >>>> "crashed" with "soft lockup" or "swapper page allocation failure"
> >>>> as bad and the other ones as good. Before compiling
> >>>> a new kernel I've always run "make mrproper". I don't know
> >>>> if this is needed but thought it wouldn't hurt.
> >>>>
> >>>> For me it was not clear that maybe I should have stopped
> >>>> testing after the first commit that came up with a "swapper
> >>>> page allocation failure". It was only one commit which caused
> >>>> the allocation failure. All the other commits marked as bad
> >>>> came up with a soft lockup. But I thought it is important to
> >>>> find the earliest commit which crashes. So should I find out
> >>>> the commit with the allocation failure?
> >>>>     
> >>>>         
> >>> I think you did the right thing. We'll have to
> >>> figure out soft lockup thing, then if page allocation failure
> >>> turns out to be a different issue, look at it.
> >>>
> >>>   
> >>>       
> >>>> As you requested I've now done a
> >>>>
> >>>> git checkout c02d7adf8c5429727a98bad1d039bccad4c61c50
> >>>>
> >>>> which ended with a soft lockup within 3 min. after starting
> >>>> the VM (see
> >>>> https://bugzilla.kernel.org/attachment.cgi?id=26089&action=edit)
> >>>> with this kernel.
> >>>>     
> >>>>         
> >>> I'm not sure why the lockup backtrace does not show function names -
> >>> is the kernel stripped?
> >>>
> >>>   
> >>>       
> >>>> Then I've done a
> >>>>
> >>>> git checkout cf8d2c11cb77f129675478792122f50827e5b0ae
> >>>>
> >>>> compiled and restarted the VM with this kernel version
> >>>> (BTW: Of course I've always used the same .config for
> >>>> all kernels I've built.) cf8d2c11cb77f129675478792122f50827e5b0ae
> >>>> is running fine.
> >>>>
> >>>> Thanks!
> >>>> Robert
> >>>>     
> >>>>         
> >>> Well, so the soft lockup issue seems NFS-related?
> >>> Trond, commit c02d7adf8c5429727a98bad1d039bccad4c61c50 seems to
> >>> be causing problems on some old kernels (See bisect below). Any idea why?
> >>>
> >>>
> >>>   
> >>>       
> >>>> On Wed, 21 Apr 2010 12:42:49 +0300, "Michael S. Tsirkin" <mst@redhat.com>
> >>>> wrote:
> >>>>     
> >>>>         
> >>>>> On Wed, Apr 21, 2010 at 01:23:12PM +0200, kernel wrote:
> >>>>>       
> >>>>>           
> >>>>>> So after the compiler was running hot I've now the following result:
> >>>>>>
> >>>>>> server10:/usr/src/linux # git bisect log 
> >>>>>> # bad: [74fca6a42863ffacaf7ba6f1936a9f228950f657] Linux 2.6.31
> >>>>>> # good: [07a2039b8eb0af4ff464efd3dfd95de5c02648c6] Linux 2.6.30
> >>>>>> git bisect start 'v2.6.31' 'v2.6.30'
> >>>>>> # good: [925d74ae717c9a12d3618eb4b36b9fb632e2cef3] V4L/DVB (11736):
> >>>>>> videobuf: modify return value of VIDIOC_REQBUFS ioctl
> >>>>>> git bisect good 925d74ae717c9a12d3618eb4b36b9fb632e2cef3
> >>>>>> # bad: [a380137900fca5c79e6daa9500bdb6ea5649188e] ixgbe: Fix device
> >>>>>> capabilities of 82599 single speed fiber NICs.
> >>>>>> git bisect bad a380137900fca5c79e6daa9500bdb6ea5649188e
> >>>>>> # good: [1dbb5765acc7a6fe4bc1957c001037cc9d02ae03] Staging: android:
> >>>>>> lowmemorykiller: fix up remaining checkpatch warnings
> >>>>>> git bisect good 1dbb5765acc7a6fe4bc1957c001037cc9d02ae03
> >>>>>> # good: [df36b439c5fedefe013d4449cb6a50d15e2f4d70] Merge branch
> >>>>>> 'for-2.6.31' of git://git.linux-nfs.org/projects/trondmy/nfs-2.6
> >>>>>> git bisect good df36b439c5fedefe013d4449cb6a50d15e2f4d70
> >>>>>> # bad: [a800faec1b21d7133b5f0c8c6dac593b7c4e118d] Merge branch
> >>>>>> 'for-linus'
> >>>>>> of git://www.jni.nu/cris
> >>>>>> git bisect bad a800faec1b21d7133b5f0c8c6dac593b7c4e118d
> >>>>>> # good: [ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2] Merge
> >>>>>> git://git.infradead.org/mtd-2.6
> >>>>>> git bisect good ac1b7c378ef26fba6694d5f118fe7fc16fee2fe2
> >>>>>> # bad: [37c6dbe290c05023b47f52528e30ce51336b93eb] V4L/DVB (12091):
> >>>>>> gspca_sonixj: Add light frequency control
> >>>>>> git bisect bad 37c6dbe290c05023b47f52528e30ce51336b93eb
> >>>>>> # bad: [687d680985b1438360a9ba470ece8b57cd205c3b] Merge
> >>>>>> git://git.infradead.org/~dwmw2/iommu-2.6.31
> >>>>>> git bisect bad 687d680985b1438360a9ba470ece8b57cd205c3b
> >>>>>> # bad: [1053414068bad659479e6efa62a67403b8b1ec0a] Merge branch
> >>>>>> 'for-linus'
> >>>>>> of git://git.kernel.org/pub/scm/linux/kernel/git/ieee1394/linux1394-2.6
> >>>>>> git bisect bad 1053414068bad659479e6efa62a67403b8b1ec0a
> >>>>>> # good: [b01b4babbf204443b5a846a7494546501614cefc] firewire: net: fix
> >>>>>> card
> >>>>>> driver reloading
> >>>>>> git bisect good b01b4babbf204443b5a846a7494546501614cefc
> >>>>>> # bad: [c02d7adf8c5429727a98bad1d039bccad4c61c50] NFSv4: Replace
> >>>>>> nfs4_path_walk() with VFS path lookup in a private namespace
> >>>>>> git bisect bad c02d7adf8c5429727a98bad1d039bccad4c61c50
> >>>>>> # good: [616511d039af402670de8500d0e24495113a9cab] VFS: Uninline the
> >>>>>> function put_mnt_ns()
> >>>>>> git bisect good 616511d039af402670de8500d0e24495113a9cab
> >>>>>> # good: [cf8d2c11cb77f129675478792122f50827e5b0ae] VFS: Add VFS helper
> >>>>>> functions for setting up private namespaces
> >>>>>> git bisect good cf8d2c11cb77f129675478792122f50827e5b0ae
> >>>>>>
> >>>>>>
> >>>>>> The last "git bisect good" prints out:
> >>>>>>
> >>>>>> server10:/usr/src/linux # git bisect good
> >>>>>> c02d7adf8c5429727a98bad1d039bccad4c61c50 is the first bad commit
> >>>>>> commit c02d7adf8c5429727a98bad1d039bccad4c61c50
> >>>>>> Author: Trond Myklebust <Trond.Myklebust@netapp.com>
> >>>>>> Date:   Mon Jun 22 15:09:14 2009 -0400
> >>>>>>
> >>>>>>     NFSv4: Replace nfs4_path_walk() with VFS path lookup in a private
> >>>>>>     namespace
> >>>>>>
> >>>>>>     As noted in the previous patch, the NFSv4 client mount code currently
> >>>>>>     has several limitations. If the mount path contains symlinks, or
> >>>>>>     referrals, or even if it just contains a '..', then the client code in
> >>>>>>     nfs4_path_walk() will fail with an error.
> >>>>>>
> >>>>>>     This patch replaces the nfs4_path_walk()-based lookup with a helper
> >>>>>>     function that sets up a private namespace to represent the namespace
> >>>>>>     on the server, then uses the ordinary VFS and NFS path lookup code to
> >>>>>>     walk down the mount path in that namespace.
> >>>>>>     
> >>>>>>     Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
> >>>>>>     Signed-off-by: Linus Torvalds <torvalds@linux-foundation.org>
> >>>>>>
> >>>>>> :040000 040000 97a18818f26ab9a0987f157257eb6f399c3cc1cc
> >>>>>> 9ab6c712bb64f1349b5ac9f2020191abb5780ca0 M      fs
> >>>>>>
> >>>>>> Does this help you any further?
> >>>>>>
> >>>>>> Thanks!
> >>>>>> Robert
> >>>>>>         
> >>>>>>             
> >>>>> Looks suspiciously like some error in testing.
> >>>>> Could you pls retest and verify again that
> >>>>> cf8d2c11cb77f129675478792122f50827e5b0ae
> >>>>> is good and c02d7adf8c5429727a98bad1d039bccad4c61c50 is bad?
> >>>>>       
> >>>>>           
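
Whether a backtrace shows function names depends on the symbol-table
options quoted above (CONFIG_KALLSYMS and friends) being set in the
.config that was actually used for the build. A small sketch for
checking that, assuming the .config contents are read into a string;
missing_config_symbols is a hypothetical helper name:

```python
def missing_config_symbols(config_text, required):
    """Return the symbols from `required` that are not set to 'y'."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        # "# CONFIG_FOO is not set" is a comment and counts as disabled.
        if line.startswith("#") or "=" not in line:
            continue
        name, value = line.split("=", 1)
        if value == "y":
            enabled.add(name)
    # Preserve the caller's order so the report is easy to read.
    return [sym for sym in required if sym not in enabled]
```

For the case discussed here one might call it with
["CONFIG_KALLSYMS", "CONFIG_KALLSYMS_ALL"] and expect an empty list.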

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-25 20:49                                         ` Michael S. Tsirkin
  (?)
@ 2010-04-26 12:15                                           ` Trond Myklebust
  -1 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2010-04-26 12:15 UTC (permalink / raw)
  To: Michael S. Tsirkin
  Cc: Robert Wimmer, Avi Kivity, Andrew Morton, linux-mm,
	bugzilla-daemon, Rusty Russell, Mel Gorman, linux-nfs,
	linux-kernel

On Sun, 2010-04-25 at 23:49 +0300, Michael S. Tsirkin wrote: 
> So, it's an NFS-related regression, which is consistent with the bisect
> results. I guess someone who knows about NFS will have to look at it...
> BTW, you probably want to label the bug as regression.
> 
> On Sun, Apr 25, 2010 at 10:41:59PM +0200, Robert Wimmer wrote:
> > I've added CONFIG_KALLSYMS and CONFIG_KALLSYMS_ALL
> > to my .config. I've uploaded the dmesg output. Maybe it
> > helps a little bit:
> > 
> > https://bugzilla.kernel.org/attachment.cgi?id=26138
> > 
> > - Robert
> > 

That last trace is just saying that the NFSv4 reboot recovery code is
crashing (which is hardly surprising if the memory management is hosed).

The initial bisection makes little sense to me: it is basically blaming
a page allocation problem on a change to the NFSv4 mount code. The only
way I can see that possibly happening is if you are hitting a stack
overflow.
So 2 questions:

  - Are you able to reproduce the bug when using NFSv3 instead?
  - Have you tried running with stack tracing enabled?

Cheers
  Trond

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-26 12:15                                           ` Trond Myklebust
@ 2010-04-26 20:25                                             ` Robert Wimmer
  -1 siblings, 0 replies; 62+ messages in thread
From: Robert Wimmer @ 2010-04-26 20:25 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Michael S. Tsirkin, Avi Kivity, Andrew Morton, linux-mm,
	bugzilla-daemon, Rusty Russell, Mel Gorman, linux-nfs,
	linux-kernel


>>> I've added CONFIG_KALLSYMS and CONFIG_KALLSYMS_ALL
>>> to my .config. I've uploaded the dmesg output. Maybe it
>>> helps a little bit:
>>>
>>> https://bugzilla.kernel.org/attachment.cgi?id=26138
>>>
>>> - Robert
>>>
>>>       
> That last trace is just saying that the NFSv4 reboot recovery code is
> crashing (which is hardly surprising if the memory management is hosed).
>
> The initial bisection makes little sense to me: it is basically blaming
> a page allocation problem on a change to the NFSv4 mount code. The only
> way I can see that possibly happen is if you are hitting a stack
> overflow.
> So 2 questions:
>
>   - Are you able to reproduce the bug when using NFSv3 instead?
>   

I've tried with NFSv3 now. With v4 the error normally occurs
within 5 minutes. The VM has now been running for one hour with no
soft lockup so far. So I would say it can't be reproduced with
v3.

>   - Have you tried running with stack tracing enabled?
>   

Can you explain this a little bit more please? CONFIG_STACKTRACE=y
was already enabled. I've now enabled

CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FTRACE_NMI_ENTER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_FTRACE_SYSCALLS=y
CONFIG_FTRACE_NMI_ENTER=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_GENERIC_TRACER=y
CONFIG_FTRACE=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_STACK_TRACER=y
CONFIG_KMEMTRACE=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_MMIOTRACE_SUPPORT=y

and run

echo 1 > /proc/sys/kernel/stack_tracer_enabled

But the output in dmesg and /var/log/messages is mostly
the same. Can you please guide me how I can
enable the stack tracing you need?

Thanks!
Robert
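
With CONFIG_STACK_TRACER and stack_tracer_enabled set to 1, the results
are not printed to dmesg; the ftrace stack tracer exposes them as the
files stack_max_size and stack_trace in the tracing directory of
debugfs. A minimal sketch for reading them, assuming debugfs is mounted
at /sys/kernel/debug and that kernel stacks are 8 KiB (both are
assumptions to adjust for the system under test; the function names are
hypothetical):

```python
import os

# Assumed default location; debugfs must be mounted there first
# (for example with: mount -t debugfs nodev /sys/kernel/debug).
TRACING_DIR = "/sys/kernel/debug/tracing"


def read_stack_usage(tracing_dir=TRACING_DIR):
    """Return (max_bytes, trace_text) as recorded by the stack tracer."""
    with open(os.path.join(tracing_dir, "stack_max_size")) as f:
        max_bytes = int(f.read().strip())
    with open(os.path.join(tracing_dir, "stack_trace")) as f:
        trace_text = f.read()
    return max_bytes, trace_text


def near_overflow(max_bytes, thread_size=8192, margin=1024):
    """True if the deepest stack seen is within `margin` of `thread_size`.

    thread_size=8192 is an assumption (8 KiB kernel stacks).
    """
    return max_bytes >= thread_size - margin
```

A stack_max_size close to the stack size while NFSv4 is mounted would
support the stack overflow theory; a small value would argue against it.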


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
@ 2010-04-26 20:25                                             ` Robert Wimmer
  0 siblings, 0 replies; 62+ messages in thread
From: Robert Wimmer @ 2010-04-26 20:25 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Michael S. Tsirkin, Avi Kivity, Andrew Morton, linux-mm,
	bugzilla-daemon, Rusty Russell, Mel Gorman, linux-nfs,
	linux-kernel


>>> I've added CONFIG_KALLSYMS and CONFIG_KALLSYMS_ALL
>>> to my .config. I've uploaded the dmesg output. Maybe it
>>> helps a little bit:
>>>
>>> https://bugzilla.kernel.org/attachment.cgi?id=26138
>>>
>>> - Robert
>>>
>>>       
> That last trace is just saying that the NFSv4 reboot recovery code is
> crashing (which is hardly surprising if the memory management is hosed).
>
> The initial bisection makes little sense to me: it is basically blaming
> a page allocation problem on a change to the NFSv4 mount code. The only
> way I can see that possibly happen is if you are hitting a stack
> overflow.
> So 2 questions:
>
>   - Are you able to reproduce the bug when using NFSv3 instead?
>   

I've tried with NFSv3 now. With v4 the error normally occur
within 5 minutes. The VM is now running for one hour and no
soft lockup so far. So I would say it can't be reproduced with
v3.

>   - Have you tried running with stack tracing enabled?
>   

Can you explain this a little bit more please? CONFIG_STACKTRACE=y
was already enabled. I've now enabled

CONFIG_USER_STACKTRACE_SUPPORT=y
CONFIG_NOP_TRACER=y
CONFIG_HAVE_FTRACE_NMI_ENTER=y
CONFIG_HAVE_FUNCTION_TRACER=y
CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
CONFIG_HAVE_DYNAMIC_FTRACE=y
CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_FTRACE_SYSCALLS=y
CONFIG_FTRACE_NMI_ENTER=y
CONFIG_CONTEXT_SWITCH_TRACER=y
CONFIG_GENERIC_TRACER=y
CONFIG_FTRACE=y
CONFIG_FUNCTION_TRACER=y
CONFIG_FUNCTION_GRAPH_TRACER=y
CONFIG_FTRACE_SYSCALLS=y
CONFIG_STACK_TRACER=y
CONFIG_KMEMTRACE=y
CONFIG_DYNAMIC_FTRACE=y
CONFIG_FTRACE_MCOUNT_RECORD=y
CONFIG_HAVE_MMIOTRACE_SUPPORT=y

and run

echo 1 > /proc/sys/kernel/stack_tracer_enabled

But the output in dmesg and /var/log/messages is mostly the
same. Can you please guide me on how to enable the stack
tracing you need?
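
A quick way to confirm that an option from the list above really ended up
in the kernel being booted is to grep the config file it was built from.
This is only a sketch: config_enabled is a hypothetical helper name, and
the config file location is an assumption (for example, /proc/config.gz
exists only when CONFIG_IKCONFIG_PROC is set).

```shell
#!/bin/sh
# Sketch: succeed if kernel option $1 is set to "y" in config file $2.
# config_enabled is a hypothetical helper, not an existing tool.
config_enabled() {
    # Anchor both ends so CONFIG_FOO does not match CONFIG_FOO_BAR,
    # and so "# CONFIG_FOO is not set" and "=m" lines do not match.
    grep -q "^$1=y\$" "$2"
}
```

For example, "config_enabled CONFIG_STACK_TRACER .config && echo ok" run
in the build directory would print "ok" only if the stack tracer was
built in.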

Thanks!
Robert

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
@ 2010-04-26 21:04                                               ` Trond Myklebust
  0 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2010-04-26 21:04 UTC (permalink / raw)
  To: Robert Wimmer
  Cc: Michael S. Tsirkin, Avi Kivity, Andrew Morton, linux-mm,
	bugzilla-daemon, Rusty Russell, Mel Gorman, linux-nfs,
	linux-kernel

On Mon, 2010-04-26 at 22:25 +0200, Robert Wimmer wrote: 
> I've tried with NFSv3 now. With v4 the error normally occurs
> within 5 minutes. The VM has now been running for one hour with
> no soft lockup so far, so I would say it can't be reproduced
> with v3.

Thanks! That's useful info.

> >   - Have you tried running with stack tracing enabled?
> >   
> 
> Can you explain this a little bit more please? CONFIG_STACKTRACE=y
> was already enabled. I've now enabled
> 
> CONFIG_USER_STACKTRACE_SUPPORT=y
> CONFIG_NOP_TRACER=y
> CONFIG_HAVE_FTRACE_NMI_ENTER=y
> CONFIG_HAVE_FUNCTION_TRACER=y
> CONFIG_HAVE_FUNCTION_GRAPH_TRACER=y
> CONFIG_HAVE_FUNCTION_TRACE_MCOUNT_TEST=y
> CONFIG_HAVE_DYNAMIC_FTRACE=y
> CONFIG_HAVE_FTRACE_MCOUNT_RECORD=y
> CONFIG_HAVE_FTRACE_SYSCALLS=y
> CONFIG_FTRACE_NMI_ENTER=y
> CONFIG_CONTEXT_SWITCH_TRACER=y
> CONFIG_GENERIC_TRACER=y
> CONFIG_FTRACE=y
> CONFIG_FUNCTION_TRACER=y
> CONFIG_FUNCTION_GRAPH_TRACER=y
> CONFIG_FTRACE_SYSCALLS=y
> CONFIG_STACK_TRACER=y
> CONFIG_KMEMTRACE=y
> CONFIG_DYNAMIC_FTRACE=y
> CONFIG_FTRACE_MCOUNT_RECORD=y
> CONFIG_HAVE_MMIOTRACE_SUPPORT=y
> 
> and run
> 
> echo 1 > /proc/sys/kernel/stack_tracer_enabled
> 
> But the output in dmesg and /var/log/messages is mostly the
> same. Can you please guide me on how to enable the stack
> tracing you need?

Sure. In addition to what you did above, please do

mount -t debugfs none /sys/kernel/debug

and then cat the contents of the pseudofile at

/sys/kernel/debug/tracing/stack_trace

Please do this more or less immediately after you've finished mounting
the NFSv4 client.
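
The steps above can be collected into one small function. This is a
minimal sketch, not a tested procedure: dump_stack_trace, TRACING_DIR and
SYSCTL_FILE are hypothetical names introduced here so the function can be
tried against ordinary files; the default paths are the ones given in
this thread, and writing to them requires root and CONFIG_STACK_TRACER=y.

```shell
#!/bin/sh
# Sketch: enable the ftrace stack tracer and print the deepest stack seen.
# Both paths can be overridden; the defaults are the ones named above.
TRACING_DIR="${TRACING_DIR:-/sys/kernel/debug/tracing}"
SYSCTL_FILE="${SYSCTL_FILE:-/proc/sys/kernel/stack_tracer_enabled}"

dump_stack_trace() {
    # Mount debugfs only if the tracing directory is not visible yet.
    if [ ! -d "$TRACING_DIR" ]; then
        mount -t debugfs none /sys/kernel/debug || return 1
    fi
    # Switch the stack tracer on.
    echo 1 > "$SYSCTL_FILE" || return 1
    # Print the deepest kernel stack recorded so far.
    cat "$TRACING_DIR/stack_trace"
}
```

Calling dump_stack_trace right after the NFSv4 mount has finished, and
again after the problem shows up, gives two traces that can be compared.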

Does your server have the 'crossmnt' or 'nohide' flags set, or does it
use the 'refer' export option anywhere? If so, then we might have to
test further, since those may trigger the NFSv4 submount feature.

Cheers
  Trond

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-26 21:04                                               ` Trond Myklebust
@ 2010-04-26 22:18                                                 ` Robert Wimmer
  -1 siblings, 0 replies; 62+ messages in thread
From: Robert Wimmer @ 2010-04-26 22:18 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Michael S. Tsirkin, Avi Kivity, Andrew Morton, linux-mm,
	bugzilla-daemon, Rusty Russell, Mel Gorman, linux-nfs,
	linux-kernel


> Sure. In addition to what you did above, please do
>
> mount -t debugfs none /sys/kernel/debug
>
> and then cat the contents of the pseudofile at
>
> /sys/kernel/debug/tracing/stack_trace
>
> Please do this more or less immediately after you've finished mounting
> the NFSv4 client.
>   

I've uploaded the stack trace. It was generated
directly after mounting. Here are the stacks:

After mounting:
https://bugzilla.kernel.org/attachment.cgi?id=26153
After the soft lockup:
https://bugzilla.kernel.org/attachment.cgi?id=26154
The dmesg output of the soft lockup:
https://bugzilla.kernel.org/attachment.cgi?id=26155

> Does your server have the 'crossmnt' or 'nohide' flags set, or does it
> use the 'refer' export option anywhere? If so, then we might have to
> test further, since those may trigger the NFSv4 submount feature.
>   
The server has the following settings:
rw,nohide,insecure,async,no_subtree_check,no_root_squash
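
To pick the relevant flags out of such an option list, a sketch like the
following can be used. submount_flags is a hypothetical helper name; it
only looks for the three options asked about above (crossmnt, nohide and
refer) and knows nothing else about the exports format.

```shell
#!/bin/sh
# Sketch: print the options in a comma-separated export option list that
# were asked about above (crossmnt, nohide, refer or refer=...).
submount_flags() {
    # One option per line, then keep only exact matches.
    printf '%s\n' "$1" | tr ',' '\n' | grep -E '^(crossmnt|nohide|refer(=.*)?)$'
}

submount_flags "rw,nohide,insecure,async,no_subtree_check,no_root_squash"
```

For the settings listed here it prints only "nohide".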

Thanks!
Robert



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-26 22:18                                                 ` Robert Wimmer
  (?)
@ 2010-04-26 23:28                                                 ` Trond Myklebust
  2010-04-27 22:56                                                     ` Robert Wimmer
  -1 siblings, 1 reply; 62+ messages in thread
From: Trond Myklebust @ 2010-04-26 23:28 UTC (permalink / raw)
  To: Robert Wimmer
  Cc: Michael S. Tsirkin, Avi Kivity, Andrew Morton, linux-mm,
	bugzilla-daemon, Rusty Russell, Mel Gorman, linux-nfs,
	linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1532 bytes --]

On Tue, 2010-04-27 at 00:18 +0200, Robert Wimmer wrote: 
> > Sure. In addition to what you did above, please do
> >
> > mount -t debugfs none /sys/kernel/debug
> >
> > and then cat the contents of the pseudofile at
> >
> > /sys/kernel/debug/tracing/stack_trace
> >
> > Please do this more or less immediately after you've finished mounting
> > the NFSv4 client.
> >   
> 
> I've uploaded the stack trace. It was generated
> directly after mounting. Here are the stacks:
> 
> After mounting:
> https://bugzilla.kernel.org/attachment.cgi?id=26153
> After the soft lockup:
> https://bugzilla.kernel.org/attachment.cgi?id=26154
> The dmesg output of the soft lockup:
> https://bugzilla.kernel.org/attachment.cgi?id=26155
> 
> > Does your server have the 'crossmnt' or 'nohide' flags set, or does it
> > use the 'refer' export option anywhere? If so, then we might have to
> > test further, since those may trigger the NFSv4 submount feature.
> >   
> The server has the following settings:
> rw,nohide,insecure,async,no_subtree_check,no_root_squash
> 
> Thanks!
> Robert
> 
> 

That second trace is more than 5.5K deep, more than half of which is
socket overhead :-(((.

The process stack does not appear to have overflowed; however, that trace
doesn't include any IRQ stack overhead.
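
An estimate like "more than half is socket overhead" can be checked
against a saved stack_trace file with a short awk sketch. It assumes the
usual ftrace layout of "<index>) <depth> <size> <location>" per frame;
stack_bytes_matching and the symbol pattern passed to it are hypothetical
choices for illustration, not necessarily how the figure above was
obtained.

```shell
#!/bin/sh
# Sketch: sum the per-frame sizes in an ftrace stack_trace dump.
# $1: extended regex matched against the location column
# $2: file holding a copy of /sys/kernel/debug/tracing/stack_trace
# Prints "<bytes in matching frames> <bytes in all frames>".
stack_bytes_matching() {
    awk -v pat="$1" '
        # Frame lines start with an index such as "12)"; headers do not.
        $1 ~ /^[0-9]+[)]$/ {
            total += $3
            if ($4 ~ pat) hit += $3
        }
        END { printf "%d %d\n", hit, total }
    ' "$2"
}
```

For example, "stack_bytes_matching 'tcp_|sock_|inet_' trace.txt" would
report how many bytes belong to frames whose symbol names match that
pattern, next to the total for the whole trace.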

OK... So what happens if we get rid of half of that trace by forcing
asynchronous tasks such as this to run entirely in rpciod instead of
first trying to run in the process context?

See the attachment...

[-- Attachment #2: linux-2.6.34-000-reduce_async_rpc_stack_usage.dif --]
[-- Type: text/plain, Size: 856 bytes --]

SUNRPC: Reduce asynchronous RPC task stack usage

From: Trond Myklebust <Trond.Myklebust@netapp.com>

We should just farm out asynchronous RPC tasks immediately to rpciod...

Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
---

 net/sunrpc/sched.c |    7 ++++++-
 1 files changed, 6 insertions(+), 1 deletions(-)


diff --git a/net/sunrpc/sched.c b/net/sunrpc/sched.c
index c8979ce..22a097f 100644
--- a/net/sunrpc/sched.c
+++ b/net/sunrpc/sched.c
@@ -720,7 +720,12 @@ void rpc_execute(struct rpc_task *task)
 {
 	rpc_set_active(task);
 	rpc_set_running(task);
-	__rpc_execute(task);
+	if (RPC_IS_ASYNC(task)) {
+		INIT_WORK(&task->u.tk_work, rpc_async_schedule);
+		queue_work(rpciod_workqueue, &task->u.tk_work);
+
+	} else
+		__rpc_execute(task);
 }
 
 static void rpc_async_schedule(struct work_struct *work)

^ permalink raw reply related	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-26 23:28                                                 ` Trond Myklebust
@ 2010-04-27 22:56                                                     ` Robert Wimmer
  0 siblings, 0 replies; 62+ messages in thread
From: Robert Wimmer @ 2010-04-27 22:56 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: Michael S. Tsirkin, Avi Kivity, Andrew Morton, linux-mm,
	bugzilla-daemon, Rusty Russell, Mel Gorman, linux-nfs,
	linux-kernel

I've applied the patch against the kernel that I got
from "git clone ....", which resulted in a 2.6.34-rc5 kernel.

The stack trace after mounting NFS is here:
https://bugzilla.kernel.org/attachment.cgi?id=26166
/var/log/messages after soft lockup:
https://bugzilla.kernel.org/attachment.cgi?id=26167

I hope there is some useful information in there.

Thanks!
Robert

On 04/27/10 01:28, Trond Myklebust wrote:
> On Tue, 2010-04-27 at 00:18 +0200, Robert Wimmer wrote: 
>   
>>> Sure. In addition to what you did above, please do
>>>
>>> mount -t debugfs none /sys/kernel/debug
>>>
>>> and then cat the contents of the pseudofile at
>>>
>>> /sys/kernel/debug/tracing/stack_trace
>>>
>>> Please do this more or less immediately after you've finished mounting
>>> the NFSv4 client.
>>>   
>>>       
>> I've uploaded the stack trace. It was generated
>> directly after mounting. Here are the stacks:
>>
>> After mounting:
>> https://bugzilla.kernel.org/attachment.cgi?id=26153
>> After the soft lockup:
>> https://bugzilla.kernel.org/attachment.cgi?id=26154
>> The dmesg output of the soft lockup:
>> https://bugzilla.kernel.org/attachment.cgi?id=26155
>>
>>     
>>> Does your server have the 'crossmnt' or 'nohide' flags set, or does it
>>> use the 'refer' export option anywhere? If so, then we might have to
>>> test further, since those may trigger the NFSv4 submount feature.
>>>   
>>>       
>> The server has the following settings:
>> rw,nohide,insecure,async,no_subtree_check,no_root_squash
>>
>> Thanks!
>> Robert
>>
>>
>>     
> That second trace is more than 5.5K deep, more than half of which is
> socket overhead :-(((.
>
> The process stack does not appear to have overflowed, however that trace
> doesn't include any IRQ stack overhead.
>
> OK... So what happens if we get rid of half of that trace by forcing
> asynchronous tasks such as this to run entirely in rpciod instead of
> first trying to run in the process context?
>
> See the attachment...
>   


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-04-27 22:56                                                     ` Robert Wimmer
@ 2010-05-03  8:11                                                       ` kernel
  -1 siblings, 0 replies; 62+ messages in thread
From: kernel @ 2010-05-03  8:11 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: "Michael S. Tsirkin" <mst@redhat.com>,
	Avi Kivity <avi@redhat.com>,
	Andrew Morton <akpm@linux-foundation.org>,
	linux-mm@kvack.org, bugzilla-daemon@bugzilla.kernel.org,
	Rusty Russell <rusty@rustcorp.com.au>,
	Mel Gorman <mel@csn.ul.ie>, linux-nfs@vger.kernel.org,
	linux-kernel

Anything we can do to investigate this further?

Thanks!
Robert


On Wed, 28 Apr 2010 00:56:01 +0200, Robert Wimmer <kernel@tauceti.net>
wrote:
> I've applied the patch against the kernel that I got
> from "git clone ....", which resulted in a 2.6.34-rc5 kernel.
> 
> The stack trace after mounting NFS is here:
> https://bugzilla.kernel.org/attachment.cgi?id=26166
> /var/log/messages after soft lockup:
> https://bugzilla.kernel.org/attachment.cgi?id=26167
> 
> I hope there is some useful information in there.
> 
> Thanks!
> Robert
> 
> On 04/27/10 01:28, Trond Myklebust wrote:
>> On Tue, 2010-04-27 at 00:18 +0200, Robert Wimmer wrote: 
>>   
>>>> Sure. In addition to what you did above, please do
>>>>
>>>> mount -t debugfs none /sys/kernel/debug
>>>>
>>>> and then cat the contents of the pseudofile at
>>>>
>>>> /sys/kernel/debug/tracing/stack_trace
>>>>
>>>> Please do this more or less immediately after you've finished mounting
>>>> the NFSv4 client.
>>>>   
>>>>       
>>> I've uploaded the stack trace. It was generated
>>> directly after mounting. Here are the stacks:
>>>
>>> After mounting:
>>> https://bugzilla.kernel.org/attachment.cgi?id=26153
>>> After the soft lockup:
>>> https://bugzilla.kernel.org/attachment.cgi?id=26154
>>> The dmesg output of the soft lockup:
>>> https://bugzilla.kernel.org/attachment.cgi?id=26155
>>>
>>>     
>>>> Does your server have the 'crossmnt' or 'nohide' flags set, or does it
>>>> use the 'refer' export option anywhere? If so, then we might have to
>>>> test further, since those may trigger the NFSv4 submount feature.
>>>>   
>>>>       
>>> The server has the following settings:
>>> rw,nohide,insecure,async,no_subtree_check,no_root_squash
>>>
>>> Thanks!
>>> Robert
>>>
>>>
>>>     
>> That second trace is more than 5.5K deep, more than half of which is
>> socket overhead :-(((.
>>
>> The process stack does not appear to have overflowed, however that trace
>> doesn't include any IRQ stack overhead.
>>
>> OK... So what happens if we get rid of half of that trace by forcing
>> asynchronous tasks such as this to run entirely in rpciod instead of
>> first trying to run in the process context?
>>
>> See the attachment...
>>

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
@ 2010-05-06 21:19                                                         ` Robert Wimmer
  0 siblings, 0 replies; 62+ messages in thread
From: Robert Wimmer @ 2010-05-06 21:19 UTC (permalink / raw)
  To: Trond Myklebust, mst
  Cc: Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
	Rusty Russell, Mel Gorman, linux-nfs, linux-kernel

I don't know if anyone is still interested in this,
but I think Trond isn't interested any further because
the last error was of course a "page allocation
failure" and not the "soft lockup" that Trond was
trying to solve. But the patch was for 2.6.34 and
the "soft lockup" comes up only with some 2.6.30 and
maybe some 2.6.31 kernel versions. But the first error
I reported was a "page allocation failure", which
all kernels >= 2.6.32 produce with the configuration
I use (NFSv4).

Michael suggested solving the "soft lockup" first,
before further investigating the "page allocation
failure". We know that the "soft lockup" only
pops up with NFSv4 and not with v3. I really want to
use v4, but since I'm not a kernel hacker someone
must guide me on what to try next.

I know that you all have a lot of other work to
do, but if there are no ideas left about what to do next,
it's maybe best to close the bug for now. I will stay with
kernel 2.6.30 for now, or go back to NFSv3 if I
upgrade to a newer kernel. Maybe the error will
be fixed "by accident" in >= 2.6.35 ;-)

Thanks!
Robert



On 05/03/10 10:11, kernel@tauceti.net wrote:
> Anything we can do to investigate this further?
>
> Thanks!
> Robert
>
>
> On Wed, 28 Apr 2010 00:56:01 +0200, Robert Wimmer <kernel@tauceti.net>
> wrote:
>   
>> I've applied the patch against the kernel that I got
>> from "git clone ....", which resulted in a 2.6.34-rc5 kernel.
>>
>> The stack trace after mounting NFS is here:
>> https://bugzilla.kernel.org/attachment.cgi?id=26166
>> /var/log/messages after soft lockup:
>> https://bugzilla.kernel.org/attachment.cgi?id=26167
>>
>> I hope there is some useful information in there.
>>
>> Thanks!
>> Robert
>>
>> On 04/27/10 01:28, Trond Myklebust wrote:
>>     
>>> On Tue, 2010-04-27 at 00:18 +0200, Robert Wimmer wrote: 
>>>   
>>>       
>>>>> Sure. In addition to what you did above, please do
>>>>>
>>>>> mount -t debugfs none /sys/kernel/debug
>>>>>
>>>>> and then cat the contents of the pseudofile at
>>>>>
>>>>> /sys/kernel/debug/tracing/stack_trace
>>>>>
>>>>> Please do this more or less immediately after you've finished mounting
>>>>> the NFSv4 client.
>>>>>   
>>>>>       
>>>>>           
>>>> I've uploaded the stack trace. It was generated
>>>> directly after mounting. Here are the stacks:
>>>>
>>>> After mounting:
>>>> https://bugzilla.kernel.org/attachment.cgi?id=26153
>>>> After the soft lockup:
>>>> https://bugzilla.kernel.org/attachment.cgi?id=26154
>>>> The dmesg output of the soft lockup:
>>>> https://bugzilla.kernel.org/attachment.cgi?id=26155
>>>>
>>>>     
>>>>         
>>>>> Does your server have the 'crossmnt' or 'nohide' flags set, or does it
>>>>> use the 'refer' export option anywhere? If so, then we might have to
>>>>> test further, since those may trigger the NFSv4 submount feature.
>>>>>   
>>>>>       
>>>>>           
>>>> The server has the following settings:
>>>> rw,nohide,insecure,async,no_subtree_check,no_root_squash
>>>>
>>>> Thanks!
>>>> Robert
>>>>
>>>>
>>>>     
>>>>         
>>> That second trace is more than 5.5K deep, more than half of which is
>>> socket overhead :-(((.
>>>
>>> The process stack does not appear to have overflowed, however that trace
>>> doesn't include any IRQ stack overhead.
>>>
>>> OK... So what happens if we get rid of half of that trace by forcing
>>> asynchronous tasks such as this to run entirely in rpciod instead of
>>> first trying to run in the process context?
>>>
>>> See the attachment...
>>>
>>>       


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
@ 2010-05-06 21:19                                                         ` Robert Wimmer
  0 siblings, 0 replies; 62+ messages in thread
From: Robert Wimmer @ 2010-05-06 21:19 UTC (permalink / raw)
  To: Trond Myklebust, mst
  Cc: Avi Kivity, Andrew Morton, linux-mm,
	bugzilla-daemon-590EEB7GvNiWaY/ihj7yzEB+6BGkLq7r, Rusty Russell,
	Mel Gorman, linux-nfs, linux-kernel

I don't know if someone is still interested in this
but I think Trond isn't further interested because
the last error was of cource a "page allocation
failure" and not a "soft lookup" which Trond was
trying to solve. But the patch was for 2.6.34 and
the "soft lookup" comes up only with some 2.6.30 and
maybe some 2.6.31 kernel versions. But the first error
I reported was a "page allocation failure" which
all kernels >= 2.6.32 produces with this configuration
I use (NFSv4).

Michael suggested to first solve the "soft lookup"
before further investigating the "page allocation
failure". We know that the "soft lookup" only
pop's up with NFSv4 and not v3. I really want to
use v4 but since I'm not a kernel hacker someone
must guide me what to try next.

I know that you're all have a lot of other work to
do but if there're no ideas left what to do next
it's maybe best to close the bug for now and I stay with
kernel 2.6.30 for now or go back to NFS v3 if I
upgrade to a newer kernel. Maybe the error will
be fixed "by accident" in >= 2.6.35 ;-) 

Thanks!
Robert



On 05/03/10 10:11, kernel-PAwl83ecUlHR7s880joybQ@public.gmane.org wrote:
> Anything we can do to investigate this further?
>
> Thanks!
> Robert
>
>
> On Wed, 28 Apr 2010 00:56:01 +0200, Robert Wimmer <kernel-PAwl83ecUlHR7s880joybQ@public.gmane.org>
> wrote:
>   
>> I've applied the patch against the kernel which I got
>> from "git clone ...." resulted in a kernel 2.6.34-rc5.
>>
>> The stack trace after mounting NFS is here:
>> https://bugzilla.kernel.org/attachment.cgi?id=26166
>> /var/log/messages after soft lockup:
>> https://bugzilla.kernel.org/attachment.cgi?id=26167
>>
>> I hope that there is any usefull information in there.
>>
>> Thanks!
>> Robert
>>
>> On 04/27/10 01:28, Trond Myklebust wrote:
>>> On Tue, 2010-04-27 at 00:18 +0200, Robert Wimmer wrote:
>>>>> Sure. In addition to what you did above, please do
>>>>>
>>>>> mount -t debugfs none /sys/kernel/debug
>>>>>
>>>>> and then cat the contents of the pseudofile at
>>>>>
>>>>> /sys/kernel/debug/tracing/stack_trace
>>>>>
>>>>> Please do this more or less immediately after you've finished mounting
>>>>> the NFSv4 client.
>>>>>
>>>> I've uploaded the stack trace. It was generated
>>>> directly after mounting. Here are the stacks:
>>>>
>>>> After mounting:
>>>> https://bugzilla.kernel.org/attachment.cgi?id=26153
>>>> After the soft lockup:
>>>> https://bugzilla.kernel.org/attachment.cgi?id=26154
>>>> The dmesg output of the soft lockup:
>>>> https://bugzilla.kernel.org/attachment.cgi?id=26155
>>>>
>>>>> Does your server have the 'crossmnt' or 'nohide' flags set, or does it
>>>>> use the 'refer' export option anywhere? If so, then we might have to
>>>>> test further, since those may trigger the NFSv4 submount feature.
>>>>>
>>>> The server has the following settings:
>>>> rw,nohide,insecure,async,no_subtree_check,no_root_squash
>>>>
>>>> Thanks!
>>>> Robert
>>>>
>>> That second trace is more than 5.5K deep, more than half of which is
>>> socket overhead :-(((.
>>>
>>> The process stack does not appear to have overflowed, however that trace
>>> doesn't include any IRQ stack overhead.
>>>
>>> OK... So what happens if we get rid of half of that trace by forcing
>>> asynchronous tasks such as this to run entirely in rpciod instead of
>>> first trying to run in the process context?
>>>
>>> See the attachment...
>>>

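For reference, the stack-tracer steps quoted above (mount debugfs, then read
the stack_trace pseudofile) could be wrapped in a small script so that the
"after mounting" and "after the soft lockup" snapshots are taken the same way
each time. This is only a minimal sketch: the function names, the
TRACING_DIR and STACK_TRACER_SWITCH variables, and the output file naming are
illustrative and not from this thread. It assumes a kernel built with
CONFIG_STACK_TRACER.

```shell
#!/bin/sh
# Sketch only: TRACING_DIR and STACK_TRACER_SWITCH are illustrative variables,
# overridable so the functions can be tried without root.
TRACING_DIR=${TRACING_DIR:-/sys/kernel/debug/tracing}
STACK_TRACER_SWITCH=${STACK_TRACER_SWITCH:-/proc/sys/kernel/stack_tracer_enabled}

# Turn the stack tracer on (needs root on a real system).
enable_stack_tracer() {
    echo 1 > "$STACK_TRACER_SWITCH"
}

# Save the deepest stack recorded so far, preceded by its size in bytes,
# into stack-<label>.txt in the current directory.
snapshot_stack() {
    label=$1
    {
        printf 'max stack size: '
        cat "$TRACING_DIR/stack_max_size"
        cat "$TRACING_DIR/stack_trace"
    } > "stack-$label.txt"
}
```

Typical use, as root and after "mount -t debugfs none /sys/kernel/debug",
would be to call enable_stack_tracer, mount the NFSv4 share, run
"snapshot_stack after-mount", and then run "snapshot_stack after-lockup" once
the problem shows up.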

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
@ 2010-05-06 21:30                                                           ` Trond Myklebust
  0 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2010-05-06 21:30 UTC (permalink / raw)
  To: Robert Wimmer
  Cc: mst, Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
	Rusty Russell, Mel Gorman, linux-nfs, linux-kernel

Sorry. I've been caught up in work in the past few days.

I can certainly help with the soft lockup if you are able to supply
either a dump that includes all threads stuck in the NFS, or a (binary)
wireshark dump that shows the NFSv4 traffic between the client and
server around the time of the hang.
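One common way to get such a dump of all threads is the magic SysRq
interface, which writes every task's state and stack to the kernel log. The
following is a minimal sketch under that assumption (it needs root and a
kernel built with CONFIG_MAGIC_SYSRQ); the dump_tasks helper and the
SYSRQ_TRIGGER variable are illustrative names, not something from this
thread.

```shell
#!/bin/sh
# Sketch only: SYSRQ_TRIGGER is an illustrative variable, overridable for dry runs.
SYSRQ_TRIGGER=${SYSRQ_TRIGGER:-/proc/sysrq-trigger}

# Ask the kernel to log the state and stack of every task ('t', the default),
# or only of tasks blocked in uninterruptible sleep ('w').
dump_tasks() {
    key=${1:-t}
    case $key in
        t|w) printf '%s\n' "$key" > "$SYSRQ_TRIGGER" ;;
        *)   echo "usage: dump_tasks [t|w]" >&2; return 1 ;;
    esac
}
```

While the hang is in progress, running "dump_tasks t" followed by
"dmesg > all-tasks.txt" should capture the result. On a busy system the
kernel log buffer may be too small to hold every task, in which case
/var/log/messages is the better place to look.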

Cheers
  Trond

On Thu, 2010-05-06 at 23:19 +0200, Robert Wimmer wrote: 
> I don't know if anyone is still interested in this,
> but I think Trond is no longer interested because
> the last error was of course a "page allocation
> failure" and not the "soft lockup" which Trond was
> trying to solve. But the patch was for 2.6.34, and
> the "soft lockup" comes up only with some 2.6.30 and
> maybe some 2.6.31 kernel versions. But the first error
> I reported was a "page allocation failure", which
> all kernels >= 2.6.32 produce with the configuration
> I use (NFSv4).
>
> Michael suggested solving the "soft lockup" first,
> before further investigating the "page allocation
> failure". We know that the "soft lockup" only
> shows up with NFSv4 and not with v3. I really want to
> use v4, but since I'm not a kernel hacker, someone
> will have to guide me on what to try next.
>
> I know that you all have a lot of other work to
> do, but if there are no ideas left about what to try next,
> it may be best to close the bug for now. I would then stay
> with kernel 2.6.30, or go back to NFSv3 if I
> upgrade to a newer kernel. Maybe the error will
> be fixed "by accident" in >= 2.6.35 ;-)
> 
> Thanks!
> Robert
> 
> 
> 
> On 05/03/10 10:11, kernel@tauceti.net wrote:
> > Anything we can do to investigate this further?
> >
> > Thanks!
> > Robert
> >
> >
> > On Wed, 28 Apr 2010 00:56:01 +0200, Robert Wimmer <kernel@tauceti.net>
> > wrote:
> >   
> >> I've applied the patch to the kernel which I got
> >> from "git clone ....", which resulted in kernel 2.6.34-rc5.
> >>
> >> The stack trace after mounting NFS is here:
> >> https://bugzilla.kernel.org/attachment.cgi?id=26166
> >> /var/log/messages after soft lockup:
> >> https://bugzilla.kernel.org/attachment.cgi?id=26167
> >>
> >> I hope there is some useful information in there.
> >>
> >> Thanks!
> >> Robert
> >>
> >> On 04/27/10 01:28, Trond Myklebust wrote:
> >>> On Tue, 2010-04-27 at 00:18 +0200, Robert Wimmer wrote:
> >>>>> Sure. In addition to what you did above, please do
> >>>>>
> >>>>> mount -t debugfs none /sys/kernel/debug
> >>>>>
> >>>>> and then cat the contents of the pseudofile at
> >>>>>
> >>>>> /sys/kernel/debug/tracing/stack_trace
> >>>>>
> >>>>> Please do this more or less immediately after you've finished mounting
> >>>>> the NFSv4 client.
> >>>>>
> >>>> I've uploaded the stack trace. It was generated
> >>>> directly after mounting. Here are the stacks:
> >>>>
> >>>> After mounting:
> >>>> https://bugzilla.kernel.org/attachment.cgi?id=26153
> >>>> After the soft lockup:
> >>>> https://bugzilla.kernel.org/attachment.cgi?id=26154
> >>>> The dmesg output of the soft lockup:
> >>>> https://bugzilla.kernel.org/attachment.cgi?id=26155
> >>>>
> >>>>> Does your server have the 'crossmnt' or 'nohide' flags set, or does it
> >>>>> use the 'refer' export option anywhere? If so, then we might have to
> >>>>> test further, since those may trigger the NFSv4 submount feature.
> >>>>>
> >>>> The server has the following settings:
> >>>> rw,nohide,insecure,async,no_subtree_check,no_root_squash
> >>>>
> >>>> Thanks!
> >>>> Robert
> >>>>
> >>> That second trace is more than 5.5K deep, more than half of which is
> >>> socket overhead :-(((.
> >>>
> >>> The process stack does not appear to have overflowed, however that trace
> >>> doesn't include any IRQ stack overhead.
> >>>
> >>> OK... So what happens if we get rid of half of that trace by forcing
> >>> asynchronous tasks such as this to run entirely in rpciod instead of
> >>> first trying to run in the process context?
> >>>
> >>> See the attachment...
> >>>
> 



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-05-06 21:30                                                           ` Trond Myklebust
@ 2010-05-13 21:08                                                             ` Robert Wimmer
  -1 siblings, 0 replies; 62+ messages in thread
From: Robert Wimmer @ 2010-05-13 21:08 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: mst, Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
	Rusty Russell, Mel Gorman, linux-nfs, linux-kernel

Finally I've had some time to do the next test.
Here is a wireshark dump (~750 MByte):
http://213.252.12.93/2.6.34-rc5.cap.gz

dmesg output after page allocation failure:
https://bugzilla.kernel.org/attachment.cgi?id=26371

stack trace before page allocation failure:
https://bugzilla.kernel.org/attachment.cgi?id=26369

stack trace after page allocation failure:
https://bugzilla.kernel.org/attachment.cgi?id=26370

I hope the wireshark dump is not too big to download.
It was created with
tshark -f "tcp port 2049" -i eth0 -w 2.6.34-rc5.cap
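If the size turns out to be a problem, tshark's ring-buffer options (-b) can
bound how much is kept on disk while still retaining the traffic closest to
the failure. A minimal sketch, assuming the same interface and port filter
as above; the nfs_capture_cmd helper name and the file size and count are
illustrative choices, not tested values. It only prints the command line so
the options can be reviewed before running it as root.

```shell
#!/bin/sh
# Sketch only: print a tshark command line instead of running it.
# The sizes below are illustrative, not tuned.
nfs_capture_cmd() {
    iface=$1      # capture interface, e.g. eth0
    outfile=$2    # base name; tshark adds a sequence number and timestamp
    # -b filesize:102400  start a new file after roughly 100 MB (value is in kB)
    # -b files:4          keep only the 4 newest files, deleting older ones
    printf 'tshark -i %s -f "tcp port 2049" -b filesize:102400 -b files:4 -w %s\n' \
        "$iface" "$outfile"
}

nfs_capture_cmd eth0 2.6.34-rc5.cap
```

With these settings at most about 400 MB would be kept, and stopping the
capture right after the failure leaves the most recent traffic in the
newest files.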

Thanks!
Robert



On 05/06/10 23:30, Trond Myklebust wrote:
> Sorry. I've been caught up in work in the past few days.
>
> I can certainly help with the soft lockup if you are able to supply
> either a dump that includes all threads stuck in the NFS, or a (binary)
> wireshark dump that shows the NFSv4 traffic between the client and
> server around the time of the hang.
>
> Cheers
>   Trond
>
> On Thu, 2010-05-06 at 23:19 +0200, Robert Wimmer wrote: 
>   
>> I don't know if anyone is still interested in this,
>> but I think Trond is no longer interested because
>> the last error was of course a "page allocation
>> failure" and not the "soft lockup" which Trond was
>> trying to solve. But the patch was for 2.6.34, and
>> the "soft lockup" comes up only with some 2.6.30 and
>> maybe some 2.6.31 kernel versions. But the first error
>> I reported was a "page allocation failure", which
>> all kernels >= 2.6.32 produce with the configuration
>> I use (NFSv4).
>>
>> Michael suggested solving the "soft lockup" first,
>> before further investigating the "page allocation
>> failure". We know that the "soft lockup" only
>> shows up with NFSv4 and not with v3. I really want to
>> use v4, but since I'm not a kernel hacker, someone
>> will have to guide me on what to try next.
>>
>> I know that you all have a lot of other work to
>> do, but if there are no ideas left about what to try next,
>> it may be best to close the bug for now. I would then stay
>> with kernel 2.6.30, or go back to NFSv3 if I
>> upgrade to a newer kernel. Maybe the error will
>> be fixed "by accident" in >= 2.6.35 ;-)
>>
>> Thanks!
>> Robert
>>
>>
>>
>> On 05/03/10 10:11, kernel@tauceti.net wrote:
>>     
>>> Anything we can do to investigate this further?
>>>
>>> Thanks!
>>> Robert
>>>
>>>
>>> On Wed, 28 Apr 2010 00:56:01 +0200, Robert Wimmer <kernel@tauceti.net>
>>> wrote:
>>>   
>>>       
>>>> I've applied the patch to the kernel which I got
>>>> from "git clone ....", which resulted in kernel 2.6.34-rc5.
>>>>
>>>> The stack trace after mounting NFS is here:
>>>> https://bugzilla.kernel.org/attachment.cgi?id=26166
>>>> /var/log/messages after soft lockup:
>>>> https://bugzilla.kernel.org/attachment.cgi?id=26167
>>>>
>>>> I hope there is some useful information in there.
>>>>
>>>> Thanks!
>>>> Robert
>>>>
>>>> On 04/27/10 01:28, Trond Myklebust wrote:
>>>>> On Tue, 2010-04-27 at 00:18 +0200, Robert Wimmer wrote:
>>>>>>> Sure. In addition to what you did above, please do
>>>>>>>
>>>>>>> mount -t debugfs none /sys/kernel/debug
>>>>>>>
>>>>>>> and then cat the contents of the pseudofile at
>>>>>>>
>>>>>>> /sys/kernel/debug/tracing/stack_trace
>>>>>>>
>>>>>>> Please do this more or less immediately after you've finished mounting
>>>>>>> the NFSv4 client.
>>>>>>>
>>>>>> I've uploaded the stack trace. It was generated
>>>>>> directly after mounting. Here are the stacks:
>>>>>>
>>>>>> After mounting:
>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=26153
>>>>>> After the soft lockup:
>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=26154
>>>>>> The dmesg output of the soft lockup:
>>>>>> https://bugzilla.kernel.org/attachment.cgi?id=26155
>>>>>>
>>>>>>> Does your server have the 'crossmnt' or 'nohide' flags set, or does it
>>>>>>> use the 'refer' export option anywhere? If so, then we might have to
>>>>>>> test further, since those may trigger the NFSv4 submount feature.
>>>>>>>
>>>>>> The server has the following settings:
>>>>>> rw,nohide,insecure,async,no_subtree_check,no_root_squash
>>>>>>
>>>>>> Thanks!
>>>>>> Robert
>>>>>>
>>>>> That second trace is more than 5.5K deep, more than half of which is
>>>>> socket overhead :-(((.
>>>>>
>>>>> The process stack does not appear to have overflowed, however that trace
>>>>> doesn't include any IRQ stack overhead.
>>>>>
>>>>> OK... So what happens if we get rid of half of that trace by forcing
>>>>> asynchronous tasks such as this to run entirely in rpciod instead of
>>>>> first trying to run in the process context?
>>>>>
>>>>> See the attachment...
>>>>>


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
@ 2010-05-13 21:13                                                               ` Trond Myklebust
  0 siblings, 0 replies; 62+ messages in thread
From: Trond Myklebust @ 2010-05-13 21:13 UTC (permalink / raw)
  To: Robert Wimmer
  Cc: mst, Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
	Rusty Russell, Mel Gorman, linux-nfs, linux-kernel

On Thu, 2010-05-13 at 23:08 +0200, Robert Wimmer wrote: 
> Finally I've had some time to do the next test.
> Here is a wireshark dump (~750 MByte):
> http://213.252.12.93/2.6.34-rc5.cap.gz
> 
> dmesg output after page allocation failure:
> https://bugzilla.kernel.org/attachment.cgi?id=26371
> 
> stack trace before page allocation failure:
> https://bugzilla.kernel.org/attachment.cgi?id=26369
> 
> stack trace after page allocation failure:
> https://bugzilla.kernel.org/attachment.cgi?id=26370
> 
> I hope the wireshark dump is not too big to download.
> It was created with
> tshark -f "tcp port 2049" -i eth0 -w 2.6.34-rc5.cap
> 
> Thanks!
> Robert

Hi Robert,

I tried the above wireshark dump URL, but it appears to point to an
empty file.

Cheers
  Trond

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-05-13 21:13                                                               ` Trond Myklebust
  (?)
@ 2010-05-14  5:42                                                                 ` Robert Wimmer
  -1 siblings, 0 replies; 62+ messages in thread
From: Robert Wimmer @ 2010-05-14  5:42 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: mst, Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
	Rusty Russell, Mel Gorman, linux-nfs, linux-kernel

Hi Trond,

I'm sorry. There was a Varnish cache in front of that
webserver, which doesn't like such big files ;-)
Please try this URL: http://213.252.12.34/2.6.34-rc5.cap.gz
It works for me.

Thanks!
Robert


On 05/13/10 23:13, Trond Myklebust wrote:
> On Thu, 2010-05-13 at 23:08 +0200, Robert Wimmer wrote: 
>   
>> Finally I've had some time to do the next test.
>> Here is a wireshark dump (~750 MByte):
>> http://213.252.12.93/2.6.34-rc5.cap.gz
>>
>> dmesg output after page allocation failure:
>> https://bugzilla.kernel.org/attachment.cgi?id=26371
>>
>> stack trace before page allocation failure:
>> https://bugzilla.kernel.org/attachment.cgi?id=26369
>>
>> stack trace after page allocation failure:
>> https://bugzilla.kernel.org/attachment.cgi?id=26370
>>
>> I hope the wireshark dump is not too big to download.
>> It was created with
>> tshark -f "tcp port 2049" -i eth0 -w 2.6.34-rc5.cap
>>
>> Thanks!
>> Robert
>>     
> Hi Robert,
>
> I tried the above wireshark dump URL, but it appears to point to an
> empty file.
>
> Cheers
>   Trond
>   


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
@ 2010-05-20  7:39                                                                 ` kernel
  0 siblings, 0 replies; 62+ messages in thread
From: kernel @ 2010-05-20  7:39 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: mst, Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
	Rusty Russell, Mel Gorman, linux-nfs, linux-kernel

Hi Trond,

have you had some time to download the wireshark dump?

Thanks!
Robert

On Thu, 13 May 2010 17:13:54 -0400, Trond Myklebust
<Trond.Myklebust@netapp.com> wrote:
> On Thu, 2010-05-13 at 23:08 +0200, Robert Wimmer wrote: 
>> Finally I've had some time to do the next test.
>> Here is a wireshark dump (~750 MByte):
>> http://213.252.12.93/2.6.34-rc5.cap.gz
>> 
>> dmesg output after page allocation failure:
>> https://bugzilla.kernel.org/attachment.cgi?id=26371
>> 
>> stack trace before page allocation failure:
>> https://bugzilla.kernel.org/attachment.cgi?id=26369
>> 
>> stack trace after page allocation failure:
>> https://bugzilla.kernel.org/attachment.cgi?id=26370
>> 
>> I hope the wireshark dump is not too big to download.
>> It was created with
>> tshark -f "tcp port 2049" -i eth0 -w 2.6.34-rc5.cap
>> 
>> Thanks!
>> Robert
> 
> Hi Robert,
> 
> I tried the above wireshark dump URL, but it appears to point to an
> empty file.
> 
> Cheers
>   Trond

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
@ 2010-05-25 20:01                                                                   ` Robert Wimmer
  0 siblings, 0 replies; 62+ messages in thread
From: Robert Wimmer @ 2010-05-25 20:01 UTC (permalink / raw)
  To: Trond Myklebust
  Cc: mst, Avi Kivity, Andrew Morton, linux-mm, bugzilla-daemon,
	Rusty Russell, Mel Gorman, linux-nfs, linux-kernel

Hi Trond,

just a little reminder ;-)

Thanks!
Robert

On 05/20/10 09:39, kernel@tauceti.net wrote:
> Hi Trond,
>
> have you had some time to download the wireshark dump?
>
> Thanks!
> Robert
>
> On Thu, 13 May 2010 17:13:54 -0400, Trond Myklebust
> <Trond.Myklebust@netapp.com> wrote:
>   
>> On Thu, 2010-05-13 at 23:08 +0200, Robert Wimmer wrote: 
>>     
>>> Finally I've had some time to do the next test.
>>> Here is a wireshark dump (~750 MByte):
>>> http://213.252.12.93/2.6.34-rc5.cap.gz
>>>
>>> dmesg output after page allocation failure:
>>> https://bugzilla.kernel.org/attachment.cgi?id=26371
>>>
>>> stack trace before page allocation failure:
>>> https://bugzilla.kernel.org/attachment.cgi?id=26369
>>>
>>> stack trace after page allocation failure:
>>> https://bugzilla.kernel.org/attachment.cgi?id=26370
>>>
>>> I hope the wireshark dump is not too big to download.
>>> It was created with
>>> tshark -f "tcp port 2049" -i eth0 -w 2.6.34-rc5.cap
>>>
>>> Thanks!
>>> Robert
>>>       
>> Hi Robert,
>>
>> I tried the above wireshark dump URL, but it appears to point to an
>> empty file.
>>
>> Cheers
>>   Trond
>>     


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: [Bugme-new] [Bug 15709] New: swapper page allocation failure
  2010-05-25 20:01                                                                   ` Robert Wimmer
@ 2010-06-02 11:56                                                                     ` kernel
  -1 siblings, 0 replies; 62+ messages in thread
From: kernel @ 2010-06-02 11:56 UTC (permalink / raw)
  To: Robert Wimmer
  Cc: Trond Myklebust, mst, Avi Kivity, Andrew Morton, linux-mm,
	bugzilla-daemon, Rusty Russell, Mel Gorman, linux-nfs,
	linux-kernel

Hi Trond,

Currently it seems that the problem was
fixed by accident... ;-) Since 2.6.34 is now
in Gentoo Portage, I thought I should give
it a try. Using my 2.6.35-r5 .config,
the 2.6.34 release has now been running for 4 hours
(instead of 5-10 minutes before). Hmmm...
Hopefully it will keep running for some more hours
and days. Since I've definitely changed
nothing besides the kernel, it must have been
fixed (hopefully) in one of the 2.6.34-rcs.

If it's still running tomorrow I'll close
the bug.

Greetings
Robert

On Tue, 25 May 2010 22:01:54 +0200, Robert Wimmer <kernel@tauceti.net>
wrote:
> Hi Trond,
> 
> just a little reminder ;-)
> 
> Thanks!
> Robert
> 
> On 05/20/10 09:39, kernel@tauceti.net wrote:
>> Hi Trond,
>>
>> have you had some time to download the wireshark dump?
>>
>> Thanks!
>> Robert
>>
>> On Thu, 13 May 2010 17:13:54 -0400, Trond Myklebust
>> <Trond.Myklebust@netapp.com> wrote:
>>   
>>> On Thu, 2010-05-13 at 23:08 +0200, Robert Wimmer wrote: 
>>>     
>>>> Finally I've had some time to do the next test.
>>>> Here is a wireshark dump (~750 MByte):
>>>> http://213.252.12.93/2.6.34-rc5.cap.gz
>>>>
>>>> dmesg output after page allocation failure:
>>>> https://bugzilla.kernel.org/attachment.cgi?id=26371
>>>>
>>>> stack trace before page allocation failure:
>>>> https://bugzilla.kernel.org/attachment.cgi?id=26369
>>>>
>>>> stack trace after page allocation failure:
>>>> https://bugzilla.kernel.org/attachment.cgi?id=26370
>>>>
>>>> I hope the wireshark dump is not too big to download.
>>>> It was created with
>>>> tshark -f "tcp port 2049" -i eth0 -w 2.6.34-rc5.cap
>>>>
>>>> Thanks!
>>>> Robert
>>>>       
>>> Hi Robert,
>>>
>>> I tried the above wireshark dump URL, but it appears to point to an
>>> empty file.
>>>
>>> Cheers
>>>   Trond
>>>

^ permalink raw reply	[flat|nested] 62+ messages in thread
