linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Oops in 3.7-rc8 isolate_free_pages_block()
@ 2012-12-06  9:17 Henrik Rydberg
  2012-12-06 14:48 ` Jan Kara
  0 siblings, 1 reply; 19+ messages in thread
From: Henrik Rydberg @ 2012-12-06  9:17 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Linux Kernel Mailing List

Hi Linus,

This is the third time I encounter this oops in 3.7, but the first
time I managed to get a decent screenshot:

http://bitmath.org/test/oops-3.7-rc8.jpg

It seems to have to do with page migration. I run with transparent
hugepages configured, just for the fun of it.

I am happy to test any suggestions.

Thanks,
Henrik

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops in 3.7-rc8 isolate_free_pages_block()
  2012-12-06  9:17 Oops in 3.7-rc8 isolate_free_pages_block() Henrik Rydberg
@ 2012-12-06 14:48 ` Jan Kara
  2012-12-06 15:22   ` Henrik Rydberg
  2012-12-06 16:19   ` Mel Gorman
  0 siblings, 2 replies; 19+ messages in thread
From: Jan Kara @ 2012-12-06 14:48 UTC (permalink / raw)
  To: Henrik Rydberg
  Cc: Linus Torvalds, mgorman, linux-mm, Linux Kernel Mailing List

On Thu 06-12-12 10:17:44, Henrik Rydberg wrote:
> Hi Linus,
> 
> This is the third time I encounter this oops in 3.7, but the first
> time I managed to get a decent screenshot:
> 
> http://bitmath.org/test/oops-3.7-rc8.jpg
> 
> It seems to have to do with page migration. I run with transparent
> hugepages configured, just for the fun of it.
> 
> I am happy to test any suggestions.
  Adding linux-mm and Mel as an author of compaction in particular to CC...
It seems that while traversing struct page structures, we entered into a new
huge page (note that RBX is 0xffffea0001c00000 - just the beginning of
a huge page) and oopsed on PageBuddy test (_mapcount is at offset 0x18 in
struct page). It might be useful if you provide disassembly of
isolate_freepages_block() function in your kernel so that we can guess more
from other register contents...

								Honza

								Honza
-- 
Jan Kara <jack@suse.cz>
SUSE Labs, CR

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops in 3.7-rc8 isolate_free_pages_block()
  2012-12-06 14:48 ` Jan Kara
@ 2012-12-06 15:22   ` Henrik Rydberg
  2012-12-06 16:10     ` Linus Torvalds
  2012-12-06 16:19   ` Mel Gorman
  1 sibling, 1 reply; 19+ messages in thread
From: Henrik Rydberg @ 2012-12-06 15:22 UTC (permalink / raw)
  To: Jan Kara; +Cc: Linus Torvalds, mgorman, linux-mm, Linux Kernel Mailing List

Hi Jan,

> > http://bitmath.org/test/oops-3.7-rc8.jpg
> > 
> > It seems to have to do with page migration. I run with transparent
> > hugepages configured, just for the fun of it.
> > 
> > I am happy to test any suggestions.
>
> Adding linux-mm and Mel as an author of compaction in particular to CC...
>
> It seems that while traversing struct page structures, we entered into a new
> huge page (note that RBX is 0xffffea0001c00000 - just the beginning of
> a huge page) and oopsed on PageBuddy test (_mapcount is at offset 0x18 in
> struct page). It might be useful if you provide disassembly of
> isolate_freepages_block() function in your kernel so that we can guess more
> from other register contents...

I had to recreate the vmlinux file, but it seems be at the right address, so here we go:

ffffffff810a6d00 <isolate_freepages_block>:
ffffffff810a6d00:	48 b8 00 00 00 00 00 	movabs $0xffffea0000000000,%rax
ffffffff810a6d07:	ea ff ff 
ffffffff810a6d0a:	41 57                	push   %r15
ffffffff810a6d0c:	41 56                	push   %r14
ffffffff810a6d0e:	49 89 fe             	mov    %rdi,%r14
ffffffff810a6d11:	41 55                	push   %r13
ffffffff810a6d13:	49 89 d5             	mov    %rdx,%r13
ffffffff810a6d16:	41 54                	push   %r12
ffffffff810a6d18:	55                   	push   %rbp
ffffffff810a6d19:	53                   	push   %rbx
ffffffff810a6d1a:	48 89 f3             	mov    %rsi,%rbx
ffffffff810a6d1d:	48 c1 e3 06          	shl    $0x6,%rbx
ffffffff810a6d21:	48 83 ec 58          	sub    $0x58,%rsp
ffffffff810a6d25:	48 01 c3             	add    %rax,%rbx
ffffffff810a6d28:	48 39 f2             	cmp    %rsi,%rdx
ffffffff810a6d2b:	48 89 74 24 30       	mov    %rsi,0x30(%rsp)
ffffffff810a6d30:	44 88 44 24 3b       	mov    %r8b,0x3b(%rsp)
ffffffff810a6d35:	0f 86 15 02 00 00    	jbe    ffffffff810a6f50 <isolate_freepages_block+0x250>
ffffffff810a6d3b:	48 8d 47 58          	lea    0x58(%rdi),%rax
ffffffff810a6d3f:	31 d2                	xor    %edx,%edx
ffffffff810a6d41:	48 8b 6c 24 30       	mov    0x30(%rsp),%rbp
ffffffff810a6d46:	48 89 44 24 20       	mov    %rax,0x20(%rsp)
ffffffff810a6d4b:	48 8d 47 40          	lea    0x40(%rdi),%rax
ffffffff810a6d4f:	49 89 dc             	mov    %rbx,%r12
ffffffff810a6d52:	c7 44 24 3c 00 00 00 	movl   $0x0,0x3c(%rsp)
ffffffff810a6d59:	00 
ffffffff810a6d5a:	49 89 ce             	mov    %rcx,%r14
ffffffff810a6d5d:	41 89 d7             	mov    %edx,%r15d
ffffffff810a6d60:	48 89 44 24 28       	mov    %rax,0x28(%rsp)
ffffffff810a6d65:	48 89 7c 24 18       	mov    %rdi,0x18(%rsp)
ffffffff810a6d6a:	eb 1c                	jmp    ffffffff810a6d88 <isolate_freepages_block+0x88>
ffffffff810a6d6c:	0f 1f 40 00          	nopl   0x0(%rax)
ffffffff810a6d70:	48 83 c5 01          	add    $0x1,%rbp
ffffffff810a6d74:	48 83 c3 40          	add    $0x40,%rbx
ffffffff810a6d78:	49 39 ed             	cmp    %rbp,%r13
ffffffff810a6d7b:	0f 86 cf 00 00 00    	jbe    ffffffff810a6e50 <isolate_freepages_block+0x150>
ffffffff810a6d81:	4d 85 e4             	test   %r12,%r12
ffffffff810a6d84:	4c 0f 44 e3          	cmove  %rbx,%r12
ffffffff810a6d88:	8b 43 18             	mov    0x18(%rbx),%eax
ffffffff810a6d8b:	83 f8 80             	cmp    $0xffffff80,%eax
ffffffff810a6d8e:	75 e0                	jne    ffffffff810a6d70 <isolate_freepages_block+0x70>
ffffffff810a6d90:	48 8b 44 24 18       	mov    0x18(%rsp),%rax
ffffffff810a6d95:	48 8d 74 24 48       	lea    0x48(%rsp),%rsi
ffffffff810a6d9a:	41 0f b6 d7          	movzbl %r15b,%edx
ffffffff810a6d9e:	4c 8b 44 24 20       	mov    0x20(%rsp),%r8
ffffffff810a6da3:	48 8b 4c 24 28       	mov    0x28(%rsp),%rcx
ffffffff810a6da8:	48 8b 40 50          	mov    0x50(%rax),%rax
ffffffff810a6dac:	48 89 c7             	mov    %rax,%rdi
ffffffff810a6daf:	48 83 c7 50          	add    $0x50,%rdi
ffffffff810a6db3:	e8 a8 fe ff ff       	callq  ffffffff810a6c60 <compact_checklock_irqsave.isra.13>
ffffffff810a6db8:	84 c0                	test   %al,%al
ffffffff810a6dba:	41 89 c7             	mov    %eax,%r15d
ffffffff810a6dbd:	0f 84 8d 00 00 00    	je     ffffffff810a6e50 <isolate_freepages_block+0x150>
ffffffff810a6dc3:	80 7c 24 3b 00       	cmpb   $0x0,0x3b(%rsp)
ffffffff810a6dc8:	0f 84 c2 00 00 00    	je     ffffffff810a6e90 <isolate_freepages_block+0x190>
ffffffff810a6dce:	8b 43 18             	mov    0x18(%rbx),%eax
ffffffff810a6dd1:	83 f8 80             	cmp    $0xffffff80,%eax
ffffffff810a6dd4:	75 9a                	jne    ffffffff810a6d70 <isolate_freepages_block+0x70>
ffffffff810a6dd6:	48 89 df             	mov    %rbx,%rdi
ffffffff810a6dd9:	e8 32 db fe ff       	callq  ffffffff81094910 <split_free_page>
ffffffff810a6dde:	85 c0                	test   %eax,%eax
ffffffff810a6de0:	0f 84 81 01 00 00    	je     ffffffff810a6f67 <isolate_freepages_block+0x267>
ffffffff810a6de6:	01 44 24 3c          	add    %eax,0x3c(%rsp)
ffffffff810a6dea:	83 f8 00             	cmp    $0x0,%eax
ffffffff810a6ded:	0f 8e 48 01 00 00    	jle    ffffffff810a6f3b <isolate_freepages_block+0x23b>
ffffffff810a6df3:	4d 8b 06             	mov    (%r14),%r8
ffffffff810a6df6:	48 89 d9             	mov    %rbx,%rcx
ffffffff810a6df9:	31 ff                	xor    %edi,%edi
ffffffff810a6dfb:	4c 8d 5b 20          	lea    0x20(%rbx),%r11
ffffffff810a6dff:	90                   	nop
ffffffff810a6e00:	48 8d 51 20          	lea    0x20(%rcx),%rdx
ffffffff810a6e04:	48 89 ce             	mov    %rcx,%rsi
ffffffff810a6e07:	83 c7 01             	add    $0x1,%edi
ffffffff810a6e0a:	48 29 de             	sub    %rbx,%rsi
ffffffff810a6e0d:	48 83 c1 40          	add    $0x40,%rcx
ffffffff810a6e11:	39 c7                	cmp    %eax,%edi
ffffffff810a6e13:	49 89 50 08          	mov    %rdx,0x8(%r8)
ffffffff810a6e17:	4e 89 04 1e          	mov    %r8,(%rsi,%r11,1)
ffffffff810a6e1b:	49 89 d0             	mov    %rdx,%r8
ffffffff810a6e1e:	4e 89 74 1e 08       	mov    %r14,0x8(%rsi,%r11,1)
ffffffff810a6e23:	49 89 16             	mov    %rdx,(%r14)
ffffffff810a6e26:	75 d8                	jne    ffffffff810a6e00 <isolate_freepages_block+0x100>
ffffffff810a6e28:	8d 48 ff             	lea    -0x1(%rax),%ecx
ffffffff810a6e2b:	48 98                	cltq   
ffffffff810a6e2d:	48 83 e8 01          	sub    $0x1,%rax
ffffffff810a6e31:	48 63 c9             	movslq %ecx,%rcx
ffffffff810a6e34:	48 01 cd             	add    %rcx,%rbp
ffffffff810a6e37:	48 c1 e0 06          	shl    $0x6,%rax
ffffffff810a6e3b:	48 01 c3             	add    %rax,%rbx
ffffffff810a6e3e:	48 83 c5 01          	add    $0x1,%rbp
ffffffff810a6e42:	48 83 c3 40          	add    $0x40,%rbx
ffffffff810a6e46:	49 39 ed             	cmp    %rbp,%r13
ffffffff810a6e49:	0f 87 32 ff ff ff    	ja     ffffffff810a6d81 <isolate_freepages_block+0x81>
ffffffff810a6e4f:	90                   	nop
ffffffff810a6e50:	4c 8b 74 24 18       	mov    0x18(%rsp),%r14
ffffffff810a6e55:	44 89 fa             	mov    %r15d,%edx
ffffffff810a6e58:	48 63 44 24 3c       	movslq 0x3c(%rsp),%rax
ffffffff810a6e5d:	80 7c 24 3b 00       	cmpb   $0x0,0x3b(%rsp)
ffffffff810a6e62:	74 11                	je     ffffffff810a6e75 <isolate_freepages_block+0x175>
ffffffff810a6e64:	4c 89 e9             	mov    %r13,%rcx
ffffffff810a6e67:	48 2b 4c 24 30       	sub    0x30(%rsp),%rcx
ffffffff810a6e6c:	48 39 c1             	cmp    %rax,%rcx
ffffffff810a6e6f:	0f 87 b7 00 00 00    	ja     ffffffff810a6f2c <isolate_freepages_block+0x22c>
ffffffff810a6e75:	84 d2                	test   %dl,%dl
ffffffff810a6e77:	75 31                	jne    ffffffff810a6eaa <isolate_freepages_block+0x1aa>
ffffffff810a6e79:	4c 39 ed             	cmp    %r13,%rbp
ffffffff810a6e7c:	74 4d                	je     ffffffff810a6ecb <isolate_freepages_block+0x1cb>
ffffffff810a6e7e:	48 83 c4 58          	add    $0x58,%rsp
ffffffff810a6e82:	5b                   	pop    %rbx
ffffffff810a6e83:	5d                   	pop    %rbp
ffffffff810a6e84:	41 5c                	pop    %r12
ffffffff810a6e86:	41 5d                	pop    %r13
ffffffff810a6e88:	41 5e                	pop    %r14
ffffffff810a6e8a:	41 5f                	pop    %r15
ffffffff810a6e8c:	c3                   	retq   
ffffffff810a6e8d:	0f 1f 00             	nopl   (%rax)
ffffffff810a6e90:	48 89 df             	mov    %rbx,%rdi
ffffffff810a6e93:	e8 78 fd ff ff       	callq  ffffffff810a6c10 <suitable_migration_target>
ffffffff810a6e98:	84 c0                	test   %al,%al
ffffffff810a6e9a:	0f 85 2e ff ff ff    	jne    ffffffff810a6dce <isolate_freepages_block+0xce>
ffffffff810a6ea0:	4c 8b 74 24 18       	mov    0x18(%rsp),%r14
ffffffff810a6ea5:	48 63 44 24 3c       	movslq 0x3c(%rsp),%rax
ffffffff810a6eaa:	49 8b 7e 50          	mov    0x50(%r14),%rdi
ffffffff810a6eae:	48 89 44 24 08       	mov    %rax,0x8(%rsp)
ffffffff810a6eb3:	48 8b 74 24 48       	mov    0x48(%rsp),%rsi
ffffffff810a6eb8:	48 83 c7 50          	add    $0x50,%rdi
ffffffff810a6ebc:	e8 1f 14 60 00       	callq  ffffffff816a82e0 <_raw_spin_unlock_irqrestore>
ffffffff810a6ec1:	4c 39 ed             	cmp    %r13,%rbp
ffffffff810a6ec4:	48 8b 44 24 08       	mov    0x8(%rsp),%rax
ffffffff810a6ec9:	75 b3                	jne    ffffffff810a6e7e <isolate_freepages_block+0x17e>
ffffffff810a6ecb:	4d 85 e4             	test   %r12,%r12
ffffffff810a6ece:	49 8b 5e 50          	mov    0x50(%r14),%rbx
ffffffff810a6ed2:	74 aa                	je     ffffffff810a6e7e <isolate_freepages_block+0x17e>
ffffffff810a6ed4:	8b 4c 24 3c          	mov    0x3c(%rsp),%ecx
ffffffff810a6ed8:	85 c9                	test   %ecx,%ecx
ffffffff810a6eda:	75 a2                	jne    ffffffff810a6e7e <isolate_freepages_block+0x17e>
ffffffff810a6edc:	b9 03 00 00 00       	mov    $0x3,%ecx
ffffffff810a6ee1:	ba 03 00 00 00       	mov    $0x3,%edx
ffffffff810a6ee6:	be 01 00 00 00       	mov    $0x1,%esi
ffffffff810a6eeb:	4c 89 e7             	mov    %r12,%rdi
ffffffff810a6eee:	48 89 44 24 08       	mov    %rax,0x8(%rsp)
ffffffff810a6ef3:	e8 18 d1 fe ff       	callq  ffffffff81094010 <set_pageblock_flags_group>
ffffffff810a6ef8:	41 80 7e 42 00       	cmpb   $0x0,0x42(%r14)
ffffffff810a6efd:	48 8b 44 24 08       	mov    0x8(%rsp),%rax
ffffffff810a6f02:	0f 85 76 ff ff ff    	jne    ffffffff810a6e7e <isolate_freepages_block+0x17e>
ffffffff810a6f08:	48 ba 00 00 00 00 00 	movabs $0x160000000000,%rdx
ffffffff810a6f0f:	16 00 00 
ffffffff810a6f12:	49 01 d4             	add    %rdx,%r12
ffffffff810a6f15:	49 c1 fc 06          	sar    $0x6,%r12
ffffffff810a6f19:	4c 3b 63 60          	cmp    0x60(%rbx),%r12
ffffffff810a6f1d:	0f 83 5b ff ff ff    	jae    ffffffff810a6e7e <isolate_freepages_block+0x17e>
ffffffff810a6f23:	4c 89 63 60          	mov    %r12,0x60(%rbx)
ffffffff810a6f27:	e9 52 ff ff ff       	jmpq   ffffffff810a6e7e <isolate_freepages_block+0x17e>
ffffffff810a6f2c:	31 c0                	xor    %eax,%eax
ffffffff810a6f2e:	c7 44 24 3c 00 00 00 	movl   $0x0,0x3c(%rsp)
ffffffff810a6f35:	00 
ffffffff810a6f36:	e9 3a ff ff ff       	jmpq   ffffffff810a6e75 <isolate_freepages_block+0x175>
ffffffff810a6f3b:	0f 84 2f fe ff ff    	je     ffffffff810a6d70 <isolate_freepages_block+0x70>
ffffffff810a6f41:	e9 e2 fe ff ff       	jmpq   ffffffff810a6e28 <isolate_freepages_block+0x128>
ffffffff810a6f46:	66 2e 0f 1f 84 00 00 	nopw   %cs:0x0(%rax,%rax,1)
ffffffff810a6f4d:	00 00 00 
ffffffff810a6f50:	48 89 f5             	mov    %rsi,%rbp
ffffffff810a6f53:	31 c0                	xor    %eax,%eax
ffffffff810a6f55:	c7 44 24 3c 00 00 00 	movl   $0x0,0x3c(%rsp)
ffffffff810a6f5c:	00 
ffffffff810a6f5d:	31 d2                	xor    %edx,%edx
ffffffff810a6f5f:	45 31 e4             	xor    %r12d,%r12d
ffffffff810a6f62:	e9 f6 fe ff ff       	jmpq   ffffffff810a6e5d <isolate_freepages_block+0x15d>
ffffffff810a6f67:	80 7c 24 3b 00       	cmpb   $0x0,0x3b(%rsp)
ffffffff810a6f6c:	0f 84 74 fe ff ff    	je     ffffffff810a6de6 <isolate_freepages_block+0xe6>
ffffffff810a6f72:	44 89 fa             	mov    %r15d,%edx
ffffffff810a6f75:	4c 8b 74 24 18       	mov    0x18(%rsp),%r14
ffffffff810a6f7a:	48 63 44 24 3c       	movslq 0x3c(%rsp),%rax
ffffffff810a6f7f:	e9 e0 fe ff ff       	jmpq   ffffffff810a6e64 <isolate_freepages_block+0x164>
ffffffff810a6f84:	66 66 66 2e 0f 1f 84 	data32 data32 nopw %cs:0x0(%rax,%rax,1)
ffffffff810a6f8b:	00 00 00 00 00 

Thanks,
Henrik

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops in 3.7-rc8 isolate_free_pages_block()
  2012-12-06 15:22   ` Henrik Rydberg
@ 2012-12-06 16:10     ` Linus Torvalds
  2012-12-06 16:35       ` Mel Gorman
  0 siblings, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2012-12-06 16:10 UTC (permalink / raw)
  To: Henrik Rydberg
  Cc: Jan Kara, Mel Gorman, linux-mm, Linux Kernel Mailing List, Andrew Morton

Ok, so it's isolate_freepages_block+0x88, and as Jan Kara already
guessed from just the offset, that is indeed likely the PageBuddy()
test.

On Thu, Dec 6, 2012 at 7:22 AM, Henrik Rydberg <rydberg@euromail.se> wrote:
>
>  http://bitmath.org/test/oops-3.7-rc8.jpg
>
> ffffffff810a6d6a:       eb 1c                   jmp    ffffffff810a6d88 <isolate_freepages_block+0x88>
> ffffffff810a6d6c:       0f 1f 40 00             nopl   0x0(%rax)

On the first entry to the loop, we jump *into* the loop, over the end
condition (the compiler has basically turned. And we jump directly to
the faulting instruction. Looking at the register state, though, we're
not at the first iteration of the loop, so we don't have to worry
about that case. The loop itself then starts with:

> ffffffff810a6d70:       48 83 c5 01             add    $0x1,%rbp
> ffffffff810a6d74:       48 83 c3 40             add    $0x40,%rbx

The above is the "blockpfn++, cursor++" part of the loop, while the
test below is the loop condition ("blockpfn < end_pfn"):

> ffffffff810a6d78:       49 39 ed                cmp    %rbp,%r13
> ffffffff810a6d7b:       0f 86 cf 00 00 00       jbe    ffffffff810a6e50 <isolate_freepages_block+0x150>

>From your image, %rbp is 0x070000 and %r13 is 0x0702f9.

The "pfn_valid_within()" test is a no-op because we don't have holes
in zones on x86, so then we have

                if (!valid_page)
                        valid_page = page;

which generates a test+cmove:

> ffffffff810a6d81:       4d 85 e4                test   %r12,%r12
> ffffffff810a6d84:       4c 0f 44 e3             cmove  %rbx,%r12

(which is how we can tell we're not at the beginning: 'valid_page' is
0xffffea0001bfbe40, while the current page is 0xffffea0001c00000).

.. and finally the oopsing instruction from PageBuddy(), which is the
read of the 'page->_mapcount'

> ffffffff810a6d88:       8b 43 18                mov    0x18(%rbx),%eax
> ffffffff810a6d8b:       83 f8 80                cmp    $0xffffff80,%eax
> ffffffff810a6d8e:       75 e0                   jne    ffffffff810a6d70 <isolate_freepages_block+0x70>

So yeah, that loop has apparently wandered into la-la-land. end_pfn
must be somehow wrong.

Mel, does any of this ring a bell (Andrew also added to the cc, since
the patches came through him).

                  Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops in 3.7-rc8 isolate_free_pages_block()
  2012-12-06 14:48 ` Jan Kara
  2012-12-06 15:22   ` Henrik Rydberg
@ 2012-12-06 16:19   ` Mel Gorman
  2012-12-06 16:50     ` Linus Torvalds
                       ` (2 more replies)
  1 sibling, 3 replies; 19+ messages in thread
From: Mel Gorman @ 2012-12-06 16:19 UTC (permalink / raw)
  To: Jan Kara
  Cc: Henrik Rydberg, Linus Torvalds, linux-mm, Linux Kernel Mailing List

On Thu, Dec 06, 2012 at 03:48:21PM +0100, Jan Kara wrote:
> On Thu 06-12-12 10:17:44, Henrik Rydberg wrote:
> > Hi Linus,
> > 
> > This is the third time I encounter this oops in 3.7, but the first
> > time I managed to get a decent screenshot:
> > 
> > http://bitmath.org/test/oops-3.7-rc8.jpg
> > 
> > It seems to have to do with page migration. I run with transparent
> > hugepages configured, just for the fun of it.
> > 
> > I am happy to test any suggestions.
>   Adding linux-mm and Mel as an author of compaction in particular to CC...
> It seems that while traversing struct page structures, we entered into a new
> huge page (note that RBX is 0xffffea0001c00000 - just the beginning of
> a huge page) and oopsed on PageBuddy test (_mapcount is at offset 0x18 in
> struct page). It might be useful if you provide disassembly of
> isolate_freepages_block() function in your kernel so that we can guess more
> from other register contents...
> 

Still travelling and am not in a position to test this properly :(.
However, this bug feels very similar to a bug in the migration scanner where
a pfn_valid check is missed because the start is not aligned.  Henrik, when
did this start happening? I would be a little surprised if it started between
3.6 and 3.7-rcX but maybe it's just easier to hit now for some reason. How
reproducible is this? Is there anything in particular you do to trigger the
oops? Does the following patch help any? It's only compile tested I'm afraid.

---8<---
mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for free

Commit 0bf380bc (mm: compaction: check pfn_valid when entering a new
MAX_ORDER_NR_PAGES block during isolation for migration) added a check
for pfn_valid() when isolating pages for migration as the scanner does
not necessarily start pageblock-aligned. However, the free scanner has
the same problem. If it encounters a hole, it can also trigger an oops
when is calls PageBuddy(page) on a page that is within an hole.

Reported-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Mel Gorman <mgorman@suse.de>
Cc: stable@vger.kernel.org
---
 mm/compaction.c |   10 ++++++++++
 1 files changed, 10 insertions(+), 0 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 9eef558..7d85ad485 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -298,6 +298,16 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
 			continue;
 		if (!valid_page)
 			valid_page = page;
+
+		/*
+		 * As blockpfn may not start aligned, blockpfn->end_pfn
+		 * may cross a MAX_ORDER_NR_PAGES boundary and a pfn_valid
+		 * check is necessary. If the pfn is not valid, stop
+		 * isolation.
+		 */
+		if ((blockpfn & (MAX_ORDER_NR_PAGES - 1)) == 0 &&
+		    !pfn_valid(blockpfn))
+			break;
 		if (!PageBuddy(page))
 			continue;
 

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Oops in 3.7-rc8 isolate_free_pages_block()
  2012-12-06 16:10     ` Linus Torvalds
@ 2012-12-06 16:35       ` Mel Gorman
  0 siblings, 0 replies; 19+ messages in thread
From: Mel Gorman @ 2012-12-06 16:35 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Henrik Rydberg, Jan Kara, linux-mm, Linux Kernel Mailing List,
	Andrew Morton

On Thu, Dec 06, 2012 at 08:10:13AM -0800, Linus Torvalds wrote:
> Ok, so it's isolate_freepages_block+0x88, and as Jan Kara already
> guessed from just the offset, that is indeed likely the PageBuddy()
> test.
> 
> On Thu, Dec 6, 2012 at 7:22 AM, Henrik Rydberg <rydberg@euromail.se> wrote:
> >
> >  http://bitmath.org/test/oops-3.7-rc8.jpg
> >
> > ffffffff810a6d6a:       eb 1c                   jmp    ffffffff810a6d88 <isolate_freepages_block+0x88>
> > ffffffff810a6d6c:       0f 1f 40 00             nopl   0x0(%rax)
> 
> On the first entry to the loop, we jump *into* the loop, over the end
> condition (the compiler has basically turned. And we jump directly to
> the faulting instruction. Looking at the register state, though, we're
> not at the first iteration of the loop, so we don't have to worry
> about that case. The loop itself then starts with:
> 
> > ffffffff810a6d70:       48 83 c5 01             add    $0x1,%rbp
> > ffffffff810a6d74:       48 83 c3 40             add    $0x40,%rbx
> 
> The above is the "blockpfn++, cursor++" part of the loop, while the
> test below is the loop condition ("blockpfn < end_pfn"):
> 
> > ffffffff810a6d78:       49 39 ed                cmp    %rbp,%r13
> > ffffffff810a6d7b:       0f 86 cf 00 00 00       jbe    ffffffff810a6e50 <isolate_freepages_block+0x150>
> 
> From your image, %rbp is 0x070000 and %r13 is 0x0702f9.
> 
> The "pfn_valid_within()" test is a no-op because we don't have holes
> in zones on x86, so then we have
> 

That thing is not about holes in zones, it's about holes within a
MAX_ORDER_NR_PAGES block but either way it's a no-op x86 and we're not doing
a pfn_valid check in this loop. I didn't look back in time but I have a
vague recollection that this used to be always start with an aligned PFN
but with large amounts of churn since, it's no longer true.

>                 if (!valid_page)
>                         valid_page = page;
> 
> which generates a test+cmove:
> 
> > ffffffff810a6d81:       4d 85 e4                test   %r12,%r12
> > ffffffff810a6d84:       4c 0f 44 e3             cmove  %rbx,%r12
> 
> (which is how we can tell we're not at the beginning: 'valid_page' is
> 0xffffea0001bfbe40, while the current page is 0xffffea0001c00000).
> 
> .. and finally the oopsing instruction from PageBuddy(), which is the
> read of the 'page->_mapcount'
> 
> > ffffffff810a6d88:       8b 43 18                mov    0x18(%rbx),%eax
> > ffffffff810a6d8b:       83 f8 80                cmp    $0xffffff80,%eax
> > ffffffff810a6d8e:       75 e0                   jne    ffffffff810a6d70 <isolate_freepages_block+0x70>
> 
> So yeah, that loop has apparently wandered into la-la-land. end_pfn
> must be somehow wrong.
> 

I think we wandered into a hole where there is no valid struct page.

> Mel, does any of this ring a bell (Andrew also added to the cc, since
> the patches came through him).
> 

It reminded me of a similar bug in the migration scanner which I mentioned
in the patch elsewhere in the thread but carelessly failed to cc Andrew.

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops in 3.7-rc8 isolate_free_pages_block()
  2012-12-06 16:19   ` Mel Gorman
@ 2012-12-06 16:50     ` Linus Torvalds
  2012-12-06 17:55       ` Mel Gorman
  2012-12-06 16:58     ` Henrik Rydberg
  2012-12-06 17:22     ` Henrik Rydberg
  2 siblings, 1 reply; 19+ messages in thread
From: Linus Torvalds @ 2012-12-06 16:50 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Jan Kara, Henrik Rydberg, linux-mm, Linux Kernel Mailing List

On Thu, Dec 6, 2012 at 8:19 AM, Mel Gorman <mgorman@suse.de> wrote:
>
> Still travelling and am not in a position to test this properly :(.
> However, this bug feels very similar to a bug in the migration scanner where
> a pfn_valid check is missed because the start is not aligned.

Ugh. This patch makes my eyes bleed.

Is there no way to do this nicely in the caller? IOW, fix the
'end_pfn' logic way upstream where it is computed, and just cap it at
the MAX_ORDER_NR_PAGES boundary?

For example, isolate_freepages_range() seems to have this *other*
end-point alignment thing going on, and does it in a loop. Wouldn't it
be much better to have a separate loop that looped up to the next
MAX_ORDER_NR_PAGES boundary instead of having this kind of very random
test in the middle of a loop.

Even the name ("isolate_freepages_block") implies that we have a
"block" of pages. Having to have a random "oops, this block can have
other blocks inside of it that aren't mapped" test in the middle of
that function really makes me go "Uhh, no".

Plus, is it even guaranteed that the *first* pfn (that we get called
with) is pfnvalid to begin with?

So I guess this patch fixes things, but it does make me go "That's
really *really* ugly".

                  Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops in 3.7-rc8 isolate_free_pages_block()
  2012-12-06 16:19   ` Mel Gorman
  2012-12-06 16:50     ` Linus Torvalds
@ 2012-12-06 16:58     ` Henrik Rydberg
  2012-12-06 17:22     ` Henrik Rydberg
  2 siblings, 0 replies; 19+ messages in thread
From: Henrik Rydberg @ 2012-12-06 16:58 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Jan Kara, Linus Torvalds, linux-mm, Linux Kernel Mailing List

Hi Mel,

> Still travelling and am not in a position to test this properly :(.
> However, this bug feels very similar to a bug in the migration scanner where
> a pfn_valid check is missed because the start is not aligned.  Henrik, when
> did this start happening? I would be a little surprised if it started between
> 3.6 and 3.7-rcX but maybe it's just easier to hit now for some reason.

I started using transparent hugepages when moving to 3.7-rc1, so it is
quite possible that the problem was there already in 3.6.

> How reproducible is this? Is there anything in particular you do to
> trigger the oops?

Unfortunately nothing special, and it is rare. IIRC, it has happened
after a long uptime, but I guess that only means the probability of
the oops is higher then.

> Does the following patch help any? It's only compile tested I'm afraid.
> 
> ---8<---
> mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for free
> 
> Commit 0bf380bc (mm: compaction: check pfn_valid when entering a new
> MAX_ORDER_NR_PAGES block during isolation for migration) added a check
> for pfn_valid() when isolating pages for migration as the scanner does
> not necessarily start pageblock-aligned. However, the free scanner has
> the same problem. If it encounters a hole, it can also trigger an oops
> when is calls PageBuddy(page) on a page that is within an hole.
> 
> Reported-by: Henrik Rydberg <rydberg@euromail.se>
> Signed-off-by: Mel Gorman <mgorman@suse.de>
> Cc: stable@vger.kernel.org
> ---
>  mm/compaction.c |   10 ++++++++++
>  1 files changed, 10 insertions(+), 0 deletions(-)
> 
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 9eef558..7d85ad485 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -298,6 +298,16 @@ static unsigned long isolate_freepages_block(struct compact_control *cc,
>  			continue;
>  		if (!valid_page)
>  			valid_page = page;
> +
> +		/*
> +		 * As blockpfn may not start aligned, blockpfn->end_pfn
> +		 * may cross a MAX_ORDER_NR_PAGES boundary and a pfn_valid
> +		 * check is necessary. If the pfn is not valid, stop
> +		 * isolation.
> +		 */
> +		if ((blockpfn & (MAX_ORDER_NR_PAGES - 1)) == 0 &&
> +		    !pfn_valid(blockpfn))
> +			break;
>  		if (!PageBuddy(page))
>  			continue;
>  

I am running with it now, adding a printout to see if the case happens
at all. Might take a while, will try to stress the machine a bit.

Thanks,
Henrik

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops in 3.7-rc8 isolate_free_pages_block()
  2012-12-06 16:19   ` Mel Gorman
  2012-12-06 16:50     ` Linus Torvalds
  2012-12-06 16:58     ` Henrik Rydberg
@ 2012-12-06 17:22     ` Henrik Rydberg
  2 siblings, 0 replies; 19+ messages in thread
From: Henrik Rydberg @ 2012-12-06 17:22 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Jan Kara, Linus Torvalds, linux-mm, Linux Kernel Mailing List

> Still travelling and am not in a position to test this properly :(.
> However, this bug feels very similar to a bug in the migration scanner where
> a pfn_valid check is missed because the start is not aligned.  Henrik, when
> did this start happening? I would be a little surprised if it started between
> 3.6 and 3.7-rcX but maybe it's just easier to hit now for some reason. How
> reproducible is this? Is there anything in particular you do to trigger the
> oops? Does the following patch help any? It's only compile tested I'm afraid.

I managed to trigger the path several times with a small
memory-intensive program, and since I am still here,

    Tested-by: Henrik Rydberg <rydberg@euromail.se>

Thanks!
Henrik

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops in 3.7-rc8 isolate_free_pages_block()
  2012-12-06 16:50     ` Linus Torvalds
@ 2012-12-06 17:55       ` Mel Gorman
  2012-12-06 18:19         ` Linus Torvalds
  0 siblings, 1 reply; 19+ messages in thread
From: Mel Gorman @ 2012-12-06 17:55 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jan Kara, Henrik Rydberg, linux-mm, Linux Kernel Mailing List

On Thu, Dec 06, 2012 at 08:50:54AM -0800, Linus Torvalds wrote:
> On Thu, Dec 6, 2012 at 8:19 AM, Mel Gorman <mgorman@suse.de> wrote:
> >
> > Still travelling and am not in a position to test this properly :(.
> > However, this bug feels very similar to a bug in the migration scanner where
> > a pfn_valid check is missed because the start is not aligned.
> 
> Ugh. This patch makes my eyes bleed.
> 

Yeah. I was listening to a talk while I was writing it, a bit cranky and
didn't see why I should suffer alone.

> Is there no way to do this nicely in the caller? IOW, fix the
> 'end_pfn' logic way upstream where it is computed, and just cap it at
> the MAX_ORDER_NR_PAGES boundary?
> 

Easily done in the caller, but not on the MAX_ORDER_NR_PAGES boundary.
The caller is striding by pageblock so a MAX_ORDER_NR_PAGES alignment will
not work out.

> For example, isolate_freepages_range() seems to have this *other*
> end-point alignment thing going on, and does it in a loop. Wouldn't it
> be much better to have a separate loop that looped up to the next
> MAX_ORDER_NR_PAGES boundary instead of having this kind of very random
> test in the middle of a loop.
> 
> Even the name ("isolate_freepages_block") implies that we have a
> "block" of pages. Having to have a random "oops, this block can have
> other blocks inside of it that aren't mapped" test in the middle of
> that function really makes me go "Uhh, no".
> 

The block in the name is related to pageblocks.

> Plus, is it even guaranteed that the *first* pfn (that we get called
> with) is pfnvalid to begin with?
> 

Yes, the caller has already checked pfn_valid() and it used to be the
case that this pfn was pageblock-aligned but not since commit c89511ab
(mm: compaction: Restart compaction from near where it left off).

> So I guess this patch fixes things, but it does make me go "That's
> really *really* ugly".
> 

Quasimoto strikes again

---8<---
mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for free

Commit 0bf380bc (mm: compaction: check pfn_valid when entering a new
MAX_ORDER_NR_PAGES block during isolation for migration) added a check
for pfn_valid() when isolating pages for migration as the scanner does not
necessarily start pageblock-aligned. Since commit c89511ab (mm: compaction:
Restart compaction from near where it left off), the free scanner has
the same problem. This patch makes sure that the pfn range passed to
isolate_freepages_block() is within the same block so that pfn_valid()
checks are unnecessary.

Reported-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Mel Gorman <mgorman@suse.de>

diff --git a/mm/compaction.c b/mm/compaction.c
index 9eef558..c23fa55 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -713,7 +713,15 @@ static void isolate_freepages(struct zone *zone,
 
 		/* Found a block suitable for isolating free pages from */
 		isolated = 0;
-		end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
+
+		/*
+		 * As pfn may not start aligned, pfn+pageblock_nr_page
+		 * may cross a MAX_ORDER_NR_PAGES boundary and miss
+		 * a pfn_valid check. Ensure isolate_freepages_block()
+		 * only scans within a pageblock.
+		 */
+		end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages);
+		end_pfn = min(end_pfn, end_pfn);
 		isolated = isolate_freepages_block(cc, pfn, end_pfn,
 						   freelist, false);
 		nr_freepages += isolated;

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Oops in 3.7-rc8 isolate_free_pages_block()
  2012-12-06 17:55       ` Mel Gorman
@ 2012-12-06 18:19         ` Linus Torvalds
  2012-12-06 18:21           ` Mel Gorman
  2012-12-06 18:32           ` Henrik Rydberg
  0 siblings, 2 replies; 19+ messages in thread
From: Linus Torvalds @ 2012-12-06 18:19 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Jan Kara, Henrik Rydberg, linux-mm, Linux Kernel Mailing List

On Thu, Dec 6, 2012 at 9:55 AM, Mel Gorman <mgorman@suse.de> wrote:
>
> Yeah. I was listening to a talk while I was writing it, a bit cranky and
> didn't see why I should suffer alone.

Makes sense.

> Quasimoto strikes again

Is that Quasimodo's Japanese cousin?

> -               end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
> +
> +               /*
> +                * As pfn may not start aligned, pfn+pageblock_nr_page
> +                * may cross a MAX_ORDER_NR_PAGES boundary and miss
> +                * a pfn_valid check. Ensure isolate_freepages_block()
> +                * only scans within a pageblock.
> +                */
> +               end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages);
> +               end_pfn = min(end_pfn, end_pfn);

Ok, this looks much nicer, except it's obviously buggy. The
min(end_pfn, end_pfn) thing is insane, and I'm sure you meant for that
line to be

+               end_pfn = min(end_pfn, zone_end_pfn);

Henrik, does that - corrected - patch (*instead* of the previous one,
not in addition to) also fix your issue?

                  Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops in 3.7-rc8 isolate_free_pages_block()
  2012-12-06 18:19         ` Linus Torvalds
@ 2012-12-06 18:21           ` Mel Gorman
  2012-12-06 18:32           ` Henrik Rydberg
  1 sibling, 0 replies; 19+ messages in thread
From: Mel Gorman @ 2012-12-06 18:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Jan Kara, Henrik Rydberg, linux-mm, Linux Kernel Mailing List

On Thu, Dec 06, 2012 at 10:19:35AM -0800, Linus Torvalds wrote:
> On Thu, Dec 6, 2012 at 9:55 AM, Mel Gorman <mgorman@suse.de> wrote:
> >
> > Yeah. I was listening to a talk while I was writing it, a bit cranky and
> > didn't see why I should suffer alone.
> 
> Makes sense.
> 
> > Quasimoto strikes again
> 
> Is that Quasimodo's Japanese cousin?
> 

Yes, he's tried to escape his terrible legacy with a name change.

> > -               end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
> > +
> > +               /*
> > +                * As pfn may not start aligned, pfn+pageblock_nr_page
> > +                * may cross a MAX_ORDER_NR_PAGES boundary and miss
> > +                * a pfn_valid check. Ensure isolate_freepages_block()
> > +                * only scans within a pageblock.
> > +                */
> > +               end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages);
> > +               end_pfn = min(end_pfn, end_pfn);
> 
> Ok, this looks much nicer, except it's obviously buggy. The
> min(end_pfn, end_pfn) thing is insane, and I'm sure you meant for that
> line to be
> 
> +               end_pfn = min(end_pfn, zone_end_pfn);
> 

*sigh* Yes, I did. Thanks.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops in 3.7-rc8 isolate_free_pages_block()
  2012-12-06 18:19         ` Linus Torvalds
  2012-12-06 18:21           ` Mel Gorman
@ 2012-12-06 18:32           ` Henrik Rydberg
  2012-12-06 18:41             ` Linus Torvalds
  1 sibling, 1 reply; 19+ messages in thread
From: Henrik Rydberg @ 2012-12-06 18:32 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Mel Gorman, Jan Kara, linux-mm, Linux Kernel Mailing List

On Thu, Dec 06, 2012 at 10:19:35AM -0800, Linus Torvalds wrote:
> On Thu, Dec 6, 2012 at 9:55 AM, Mel Gorman <mgorman@suse.de> wrote:
> >
> > Yeah. I was listening to a talk while I was writing it, a bit cranky and
> > didn't see why I should suffer alone.
> 
> Makes sense.
> 
> > Quasimoto strikes again
> 
> Is that Quasimodo's Japanese cousin?
> 
> > -               end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
> > +
> > +               /*
> > +                * As pfn may not start aligned, pfn+pageblock_nr_page
> > +                * may cross a MAX_ORDER_NR_PAGES boundary and miss
> > +                * a pfn_valid check. Ensure isolate_freepages_block()
> > +                * only scans within a pageblock.
> > +                */
> > +               end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages);
> > +               end_pfn = min(end_pfn, end_pfn);
> 
> Ok, this looks much nicer, except it's obviously buggy. The
> min(end_pfn, end_pfn) thing is insane, and I'm sure you meant for that
> line to be
> 
> +               end_pfn = min(end_pfn, zone_end_pfn);
> 
> Henrik, does that - corrected - patch (*instead* of the previous one,
> not in addition to) also fix your issue?

Yes - I can no longer trigger the failpath, so it seems to work. Mel,
enjoy the rest of the talk. ;-)

Generally, I am a bit surprised that noone hit this before, given that
it was quite easy to trigger. I will check 3.6 as well.

Thanks,
Henrik


^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops in 3.7-rc8 isolate_free_pages_block()
  2012-12-06 18:32           ` Henrik Rydberg
@ 2012-12-06 18:41             ` Linus Torvalds
  2012-12-06 19:01               ` Mel Gorman
  2012-12-06 19:28               ` Henrik Rydberg
  0 siblings, 2 replies; 19+ messages in thread
From: Linus Torvalds @ 2012-12-06 18:41 UTC (permalink / raw)
  To: Henrik Rydberg; +Cc: Mel Gorman, Jan Kara, linux-mm, Linux Kernel Mailing List

On Thu, Dec 6, 2012 at 10:32 AM, Henrik Rydberg <rydberg@euromail.se> wrote:
>>
>> Henrik, does that - corrected - patch (*instead* of the previous one,
>> not in addition to) also fix your issue?
>
> Yes - I can no longer trigger the failpath, so it seems to work. Mel,
> enjoy the rest of the talk. ;-)
>
> Generally, I am a bit surprised that noone hit this before, given that
> it was quite easy to trigger. I will check 3.6 as well.

Actually, looking at it some more, I think that two-liner patch had
*ANOTHER* bug.

Because the other line seems buggy as well.

Instead of

        end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages);

I think it should be

        end_pfn = ALIGN(pfn+1, pageblock_nr_pages);

instead. ALIGN() already aligns upwards (but the "+1" is needed in
case pfn is already at a pageblock_nr_pages boundary, at which point
ALIGN() would have just returned that same boundary.

Hmm? Mel, please confirm. And Henrik, it might be good to test that
doubly-fixed patch. Because reading the patch and trying to fix bugs
in it that way is *not* the same as actually verifying it ;)

                 Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops in 3.7-rc8 isolate_free_pages_block()
  2012-12-06 18:41             ` Linus Torvalds
@ 2012-12-06 19:01               ` Mel Gorman
  2012-12-06 19:28               ` Henrik Rydberg
  1 sibling, 0 replies; 19+ messages in thread
From: Mel Gorman @ 2012-12-06 19:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Henrik Rydberg, Jan Kara, linux-mm, Linux Kernel Mailing List

On Thu, Dec 06, 2012 at 10:41:14AM -0800, Linus Torvalds wrote:
> On Thu, Dec 6, 2012 at 10:32 AM, Henrik Rydberg <rydberg@euromail.se> wrote:
> >>
> >> Henrik, does that - corrected - patch (*instead* of the previous one,
> >> not in addition to) also fix your issue?
> >
> > Yes - I can no longer trigger the failpath, so it seems to work. Mel,
> > enjoy the rest of the talk. ;-)
> >
> > Generally, I am a bit surprised that noone hit this before, given that
> > it was quite easy to trigger. I will check 3.6 as well.
> 
> Actually, looking at it some more, I think that two-liner patch had
> *ANOTHER* bug.
> 
> Because the other line seems buggy as well.
> 
> Instead of
> 
>         end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages);
> 
> I think it should be
> 
>         end_pfn = ALIGN(pfn+1, pageblock_nr_pages);
> 
> instead. ALIGN() already aligns upwards (but the "+1" is needed in
> case pfn is already at a pageblock_nr_pages boundary, at which point
> ALIGN() would have just returned that same boundary.
> 
> Hmm? Mel, please confirm.

FFS. Yes, confirmed.

In answer to Henrik's wondering why others have reported this -- reproducing
this requires a large enough hole with the right aligment to have compaction
walk into a PFN range with no memmap. Size and alignment depends in the
memory model - 4M for FLATMEM and 128M for SPARSEMEM on x86. It needs a
"lucky" machine.

---8<---
mm: compaction: check pfn_valid when entering a new MAX_ORDER_NR_PAGES block during isolation for free

Commit 0bf380bc (mm: compaction: check pfn_valid when entering a new
MAX_ORDER_NR_PAGES block during isolation for migration) added a check
for pfn_valid() when isolating pages for migration as the scanner does not
necessarily start pageblock-aligned. Since commit c89511ab (mm: compaction:
Restart compaction from near where it left off), the free scanner has
the same problem. This patch makes sure that the pfn range passed to
isolate_freepages_block() is within the same block so that pfn_valid()
checks are unnecessary.

Reported-by: Henrik Rydberg <rydberg@euromail.se>
Signed-off-by: Mel Gorman <mgorman@suse.de>
---
 mm/compaction.c |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 9eef558..694eaab 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -713,7 +713,15 @@ static void isolate_freepages(struct zone *zone,
 
 		/* Found a block suitable for isolating free pages from */
 		isolated = 0;
-		end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
+
+		/*
+		 * As pfn may not start aligned, pfn+pageblock_nr_page
+		 * may cross a MAX_ORDER_NR_PAGES boundary and miss
+		 * a pfn_valid check. Ensure isolate_freepages_block()
+		 * only scans within a pageblock
+		 */
+		end_pfn = ALIGN(pfn + 1, pageblock_nr_pages);
+		end_pfn = min(end_pfn, zone_end_pfn);
 		isolated = isolate_freepages_block(cc, pfn, end_pfn,
 						   freelist, false);
 		nr_freepages += isolated;

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Oops in 3.7-rc8 isolate_free_pages_block()
  2012-12-06 18:41             ` Linus Torvalds
  2012-12-06 19:01               ` Mel Gorman
@ 2012-12-06 19:28               ` Henrik Rydberg
  2012-12-06 19:38                 ` Linus Torvalds
  1 sibling, 1 reply; 19+ messages in thread
From: Henrik Rydberg @ 2012-12-06 19:28 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Mel Gorman, Jan Kara, linux-mm, Linux Kernel Mailing List

> Actually, looking at it some more, I think that two-liner patch had
> *ANOTHER* bug.
> 
> Because the other line seems buggy as well.
> 
> Instead of
> 
>         end_pfn = ALIGN(pfn + pageblock_nr_pages, pageblock_nr_pages);
> 
> I think it should be
> 
>         end_pfn = ALIGN(pfn+1, pageblock_nr_pages);
> 
> instead. ALIGN() already aligns upwards (but the "+1" is needed in
> case pfn is already at a pageblock_nr_pages boundary, at which point
> ALIGN() would have just returned that same boundary.

Ah, and now the two callers treat the pointers the same way.

> Hmm? Mel, please confirm. And Henrik, it might be good to test that
> doubly-fixed patch. Because reading the patch and trying to fix bugs
> in it that way is *not* the same as actually verifying it ;)

Confirmed, working. I also checked 3.6, but could not trigger the
original problem there. The code also looks different, so it makes
sense. To be explicit, this is what I tested on top of v3.7-rc8:

---
 mm/compaction.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/mm/compaction.c b/mm/compaction.c
index 9eef558..ff1c483 100644
--- a/mm/compaction.c
+++ b/mm/compaction.c
@@ -713,7 +713,15 @@ static void isolate_freepages(struct zone *zone,
 
 		/* Found a block suitable for isolating free pages from */
 		isolated = 0;
-		end_pfn = min(pfn + pageblock_nr_pages, zone_end_pfn);
+
+		/*
+		 * As pfn may not start aligned, pfn+pageblock_nr_page
+		 * may cross a MAX_ORDER_NR_PAGES boundary and miss
+		 * a pfn_valid check. Ensure isolate_freepages_block()
+		 * only scans within a pageblock.
+		 */
+		end_pfn = ALIGN(pfn + 1, pageblock_nr_pages);
+		end_pfn = min(end_pfn, zone_end_pfn);
 		isolated = isolate_freepages_block(cc, pfn, end_pfn,
 						   freelist, false);
 		nr_freepages += isolated;
-- 
1.8.0.1

Hopefully, that's a wrap. :-)

Henrik

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: Oops in 3.7-rc8 isolate_free_pages_block()
  2012-12-06 19:28               ` Henrik Rydberg
@ 2012-12-06 19:38                 ` Linus Torvalds
  2012-12-06 21:39                   ` Henrik Rydberg
  2012-12-07  8:32                   ` Mel Gorman
  0 siblings, 2 replies; 19+ messages in thread
From: Linus Torvalds @ 2012-12-06 19:38 UTC (permalink / raw)
  To: Henrik Rydberg; +Cc: Mel Gorman, Jan Kara, linux-mm, Linux Kernel Mailing List

Ok, I've applied the patch.

Mel, some grepping shows that there is an old line that does

    end_pfn = ALIGN(low_pfn + pageblock_nr_pages, pageblock_nr_pages);

which looks bogus. That should probably also use "+ 1" instead. But
I'll consider that an independent issue, so I applied the one patch
regardless.

There is also a

    low_pfn += pageblock_nr_pages;
    low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;

that looks suspicious for similar reasons. Maybe

    low_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages) - 1;

instead? Although that *can* result in the same low_pfn in the end, so
maybe that one was correct after all? I just did some grepping, no
actual semantic analysis...

                  Linus

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops in 3.7-rc8 isolate_free_pages_block()
  2012-12-06 19:38                 ` Linus Torvalds
@ 2012-12-06 21:39                   ` Henrik Rydberg
  2012-12-07  8:32                   ` Mel Gorman
  1 sibling, 0 replies; 19+ messages in thread
From: Henrik Rydberg @ 2012-12-06 21:39 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Mel Gorman, Jan Kara, linux-mm, Linux Kernel Mailing List

> There is also a
> 
>     low_pfn += pageblock_nr_pages;
>     low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
> 
> that looks suspicious for similar reasons. Maybe
> 
>     low_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages) - 1;
> 
> instead? Although that *can* result in the same low_pfn in the end, so
> maybe that one was correct after all? I just did some grepping, no
> actual semantic analysis...

Here is a totally obscure version:

	low_pfn |= pageblock_nr_pages - 1;

It simply moves to the very end of the block, which seems to be what
was intended.

Henrik

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: Oops in 3.7-rc8 isolate_free_pages_block()
  2012-12-06 19:38                 ` Linus Torvalds
  2012-12-06 21:39                   ` Henrik Rydberg
@ 2012-12-07  8:32                   ` Mel Gorman
  1 sibling, 0 replies; 19+ messages in thread
From: Mel Gorman @ 2012-12-07  8:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Henrik Rydberg, Jan Kara, linux-mm, Linux Kernel Mailing List

On Thu, Dec 06, 2012 at 11:38:47AM -0800, Linus Torvalds wrote:
> Ok, I've applied the patch.
> 

Thanks.

> Mel, some grepping shows that there is an old line that does
> 
>     end_pfn = ALIGN(low_pfn + pageblock_nr_pages, pageblock_nr_pages);
> 
> which looks bogus.

It's bogus. The impact is that multiple compaction attempts may be needed
to clear a particular block for allocation. THP allocation success rate
under stress will be lower and the latency before a range of pages is
collapsed by khugepaged to a huge page will be higher. The impact of this
is less and it should not result in a bug like Henrik's

An attentive reviewer is going to exclaim that GFP_ATOMIC allocations for
jumbo frames is impacted by this but it isn't. Even with this bogus walk,
compaction will be clearing SWAP_CLUSTER_MAX contiguous chunks which is
enough for jumbo frames.

> That should probably also use "+ 1" instead. But
> I'll consider that an independent issue, so I applied the one patch
> regardless.
> 
> There is also a
> 
>     low_pfn += pageblock_nr_pages;
>     low_pfn = ALIGN(low_pfn, pageblock_nr_pages) - 1;
> 
> that looks suspicious for similar reasons. Maybe
> 
>     low_pfn = ALIGN(low_pfn + 1, pageblock_nr_pages) - 1;
> 

This one is working by co-incidence because the low_pfn will be aligned
in most cases. If it was outright broken then CMA would never work either.

> instead? Although that *can* result in the same low_pfn in the end, so
> maybe that one was correct after all? I just did some grepping, no
> actual semantic analysis...
> 

They need fixing but the impact is much less severe and does not justify
delaying 3.8 over unlike the other last-minute fixes. My performance
writing patches during talks was less than stellar yesterday so I'll avoid
a repeat performance and follow up with Andrew early next week with a cc
to -stable. It'll also give me a chance to run the patches through the
highalloc stress tests and confirm that allocation success rate is higher
and latency lower as would be expected by such a fix.

Thanks.

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2012-12-07  8:41 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-12-06  9:17 Oops in 3.7-rc8 isolate_free_pages_block() Henrik Rydberg
2012-12-06 14:48 ` Jan Kara
2012-12-06 15:22   ` Henrik Rydberg
2012-12-06 16:10     ` Linus Torvalds
2012-12-06 16:35       ` Mel Gorman
2012-12-06 16:19   ` Mel Gorman
2012-12-06 16:50     ` Linus Torvalds
2012-12-06 17:55       ` Mel Gorman
2012-12-06 18:19         ` Linus Torvalds
2012-12-06 18:21           ` Mel Gorman
2012-12-06 18:32           ` Henrik Rydberg
2012-12-06 18:41             ` Linus Torvalds
2012-12-06 19:01               ` Mel Gorman
2012-12-06 19:28               ` Henrik Rydberg
2012-12-06 19:38                 ` Linus Torvalds
2012-12-06 21:39                   ` Henrik Rydberg
2012-12-07  8:32                   ` Mel Gorman
2012-12-06 16:58     ` Henrik Rydberg
2012-12-06 17:22     ` Henrik Rydberg

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).