All of lore.kernel.org
 help / color / mirror / Atom feed
* shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
@ 2012-10-25  2:37 ` Dave Jones
  0 siblings, 0 replies; 69+ messages in thread
From: Dave Jones @ 2012-10-25  2:37 UTC (permalink / raw)
  To: linux-mm; +Cc: Linux Kernel

Machine under significant load (4gb memory used, swap usage fluctuating)
triggered this...

WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
Call Trace:
 [<ffffffff8107100f>] warn_slowpath_common+0x7f/0xc0
 [<ffffffff8107106a>] warn_slowpath_null+0x1a/0x20
 [<ffffffff811903fc>] shmem_getpage_gfp+0xa5c/0xa70
 [<ffffffff8118fc3e>] ? shmem_getpage_gfp+0x29e/0xa70
 [<ffffffff81190e4f>] shmem_fault+0x4f/0xa0
 [<ffffffff8119f391>] __do_fault+0x71/0x5c0
 [<ffffffff810e1ac6>] ? __lock_acquire+0x306/0x1ba0
 [<ffffffff810b6ff9>] ? local_clock+0x89/0xa0
 [<ffffffff811a2767>] handle_pte_fault+0x97/0xae0
 [<ffffffff816d1069>] ? sub_preempt_count+0x79/0xd0
 [<ffffffff8136d68e>] ? delay_tsc+0xae/0x120
 [<ffffffff8136d578>] ? __const_udelay+0x28/0x30
 [<ffffffff811a4a39>] handle_mm_fault+0x289/0x350
 [<ffffffff816d091e>] __do_page_fault+0x18e/0x530
 [<ffffffff810b6ff9>] ? local_clock+0x89/0xa0
 [<ffffffff810b0e51>] ? get_parent_ip+0x11/0x50
 [<ffffffff810b0e51>] ? get_parent_ip+0x11/0x50
 [<ffffffff816d1069>] ? sub_preempt_count+0x79/0xd0
 [<ffffffff8112d389>] ? rcu_user_exit+0xc9/0xf0
 [<ffffffff816d0ceb>] do_page_fault+0x2b/0x50
 [<ffffffff816cd3b8>] page_fault+0x28/0x30
 [<ffffffff8136d259>] ? copy_user_enhanced_fast_string+0x9/0x20
 [<ffffffff8121c181>] ? sys_futimesat+0x41/0xe0
 [<ffffffff8102bf35>] ? syscall_trace_enter+0x25/0x2c0
 [<ffffffff816d5625>] ? tracesys+0x7e/0xe6
 [<ffffffff816d5688>] tracesys+0xe1/0xe6



1148                         error = shmem_add_to_page_cache(page, mapping, index,
1149                                                 gfp, swp_to_radix_entry(swap));
1150                         /* We already confirmed swap, and make no allocation */
1151                         VM_BUG_ON(error);
1152                 }


             total       used       free     shared    buffers     cached
Mem:       3885528    2854064    1031464          0       9624      19208
-/+ buffers/cache:    2825232    1060296
Swap:      6029308      30656    5998652



^ permalink raw reply	[flat|nested] 69+ messages in thread

* shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
@ 2012-10-25  2:37 ` Dave Jones
  0 siblings, 0 replies; 69+ messages in thread
From: Dave Jones @ 2012-10-25  2:37 UTC (permalink / raw)
  To: linux-mm; +Cc: Linux Kernel

Machine under significant load (4gb memory used, swap usage fluctuating)
triggered this...

WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
Call Trace:
 [<ffffffff8107100f>] warn_slowpath_common+0x7f/0xc0
 [<ffffffff8107106a>] warn_slowpath_null+0x1a/0x20
 [<ffffffff811903fc>] shmem_getpage_gfp+0xa5c/0xa70
 [<ffffffff8118fc3e>] ? shmem_getpage_gfp+0x29e/0xa70
 [<ffffffff81190e4f>] shmem_fault+0x4f/0xa0
 [<ffffffff8119f391>] __do_fault+0x71/0x5c0
 [<ffffffff810e1ac6>] ? __lock_acquire+0x306/0x1ba0
 [<ffffffff810b6ff9>] ? local_clock+0x89/0xa0
 [<ffffffff811a2767>] handle_pte_fault+0x97/0xae0
 [<ffffffff816d1069>] ? sub_preempt_count+0x79/0xd0
 [<ffffffff8136d68e>] ? delay_tsc+0xae/0x120
 [<ffffffff8136d578>] ? __const_udelay+0x28/0x30
 [<ffffffff811a4a39>] handle_mm_fault+0x289/0x350
 [<ffffffff816d091e>] __do_page_fault+0x18e/0x530
 [<ffffffff810b6ff9>] ? local_clock+0x89/0xa0
 [<ffffffff810b0e51>] ? get_parent_ip+0x11/0x50
 [<ffffffff810b0e51>] ? get_parent_ip+0x11/0x50
 [<ffffffff816d1069>] ? sub_preempt_count+0x79/0xd0
 [<ffffffff8112d389>] ? rcu_user_exit+0xc9/0xf0
 [<ffffffff816d0ceb>] do_page_fault+0x2b/0x50
 [<ffffffff816cd3b8>] page_fault+0x28/0x30
 [<ffffffff8136d259>] ? copy_user_enhanced_fast_string+0x9/0x20
 [<ffffffff8121c181>] ? sys_futimesat+0x41/0xe0
 [<ffffffff8102bf35>] ? syscall_trace_enter+0x25/0x2c0
 [<ffffffff816d5625>] ? tracesys+0x7e/0xe6
 [<ffffffff816d5688>] tracesys+0xe1/0xe6



1148                         error = shmem_add_to_page_cache(page, mapping, index,
1149                                                 gfp, swp_to_radix_entry(swap));
1150                         /* We already confirmed swap, and make no allocation */
1151                         VM_BUG_ON(error);
1152                 }


             total       used       free     shared    buffers     cached
Mem:       3885528    2854064    1031464          0       9624      19208
-/+ buffers/cache:    2825232    1060296
Swap:      6029308      30656    5998652


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
  2012-10-25  2:37 ` Dave Jones
@ 2012-10-25  4:36   ` Hugh Dickins
  -1 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-10-25  4:36 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-mm, linux-kernel

On Wed, 24 Oct 2012, Dave Jones wrote:

> Machine under significant load (4gb memory used, swap usage fluctuating)
> triggered this...
> 
> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
> Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
> Call Trace:
>  [<ffffffff8107100f>] warn_slowpath_common+0x7f/0xc0
>  [<ffffffff8107106a>] warn_slowpath_null+0x1a/0x20
>  [<ffffffff811903fc>] shmem_getpage_gfp+0xa5c/0xa70
>  [<ffffffff8118fc3e>] ? shmem_getpage_gfp+0x29e/0xa70
>  [<ffffffff81190e4f>] shmem_fault+0x4f/0xa0
>  [<ffffffff8119f391>] __do_fault+0x71/0x5c0
>  [<ffffffff810e1ac6>] ? __lock_acquire+0x306/0x1ba0
>  [<ffffffff810b6ff9>] ? local_clock+0x89/0xa0
>  [<ffffffff811a2767>] handle_pte_fault+0x97/0xae0
>  [<ffffffff816d1069>] ? sub_preempt_count+0x79/0xd0
>  [<ffffffff8136d68e>] ? delay_tsc+0xae/0x120
>  [<ffffffff8136d578>] ? __const_udelay+0x28/0x30
>  [<ffffffff811a4a39>] handle_mm_fault+0x289/0x350
>  [<ffffffff816d091e>] __do_page_fault+0x18e/0x530
>  [<ffffffff810b6ff9>] ? local_clock+0x89/0xa0
>  [<ffffffff810b0e51>] ? get_parent_ip+0x11/0x50
>  [<ffffffff810b0e51>] ? get_parent_ip+0x11/0x50
>  [<ffffffff816d1069>] ? sub_preempt_count+0x79/0xd0
>  [<ffffffff8112d389>] ? rcu_user_exit+0xc9/0xf0
>  [<ffffffff816d0ceb>] do_page_fault+0x2b/0x50
>  [<ffffffff816cd3b8>] page_fault+0x28/0x30
>  [<ffffffff8136d259>] ? copy_user_enhanced_fast_string+0x9/0x20
>  [<ffffffff8121c181>] ? sys_futimesat+0x41/0xe0
>  [<ffffffff8102bf35>] ? syscall_trace_enter+0x25/0x2c0
>  [<ffffffff816d5625>] ? tracesys+0x7e/0xe6
>  [<ffffffff816d5688>] tracesys+0xe1/0xe6
> 
> 
> 
> 1148                         error = shmem_add_to_page_cache(page, mapping, index,
> 1149                                                 gfp, swp_to_radix_entry(swap));
> 1150                         /* We already confirmed swap, and make no allocation */
> 1151                         VM_BUG_ON(error);
> 1152                 }

That's very surprising.  Easy enough to handle an error there, but
of course I made it a VM_BUG_ON because it violates my assumptions:
I rather need to understand how this can be, and I've no idea.

Clutching at straws, I expect this is entirely irrelevant, but:
there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor
in current linux.git; rather, there's a VM_BUG_ON on line 1149.

So you've inserted a couple of lines for some reason (more useful
trinity behaviour, perhaps)?  And have some config option I'm
unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning?

Hugh

> 
> 
>              total       used       free     shared    buffers     cached
> Mem:       3885528    2854064    1031464          0       9624      19208
> -/+ buffers/cache:    2825232    1060296
> Swap:      6029308      30656    5998652

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
@ 2012-10-25  4:36   ` Hugh Dickins
  0 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-10-25  4:36 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-mm, linux-kernel

On Wed, 24 Oct 2012, Dave Jones wrote:

> Machine under significant load (4gb memory used, swap usage fluctuating)
> triggered this...
> 
> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
> Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
> Call Trace:
>  [<ffffffff8107100f>] warn_slowpath_common+0x7f/0xc0
>  [<ffffffff8107106a>] warn_slowpath_null+0x1a/0x20
>  [<ffffffff811903fc>] shmem_getpage_gfp+0xa5c/0xa70
>  [<ffffffff8118fc3e>] ? shmem_getpage_gfp+0x29e/0xa70
>  [<ffffffff81190e4f>] shmem_fault+0x4f/0xa0
>  [<ffffffff8119f391>] __do_fault+0x71/0x5c0
>  [<ffffffff810e1ac6>] ? __lock_acquire+0x306/0x1ba0
>  [<ffffffff810b6ff9>] ? local_clock+0x89/0xa0
>  [<ffffffff811a2767>] handle_pte_fault+0x97/0xae0
>  [<ffffffff816d1069>] ? sub_preempt_count+0x79/0xd0
>  [<ffffffff8136d68e>] ? delay_tsc+0xae/0x120
>  [<ffffffff8136d578>] ? __const_udelay+0x28/0x30
>  [<ffffffff811a4a39>] handle_mm_fault+0x289/0x350
>  [<ffffffff816d091e>] __do_page_fault+0x18e/0x530
>  [<ffffffff810b6ff9>] ? local_clock+0x89/0xa0
>  [<ffffffff810b0e51>] ? get_parent_ip+0x11/0x50
>  [<ffffffff810b0e51>] ? get_parent_ip+0x11/0x50
>  [<ffffffff816d1069>] ? sub_preempt_count+0x79/0xd0
>  [<ffffffff8112d389>] ? rcu_user_exit+0xc9/0xf0
>  [<ffffffff816d0ceb>] do_page_fault+0x2b/0x50
>  [<ffffffff816cd3b8>] page_fault+0x28/0x30
>  [<ffffffff8136d259>] ? copy_user_enhanced_fast_string+0x9/0x20
>  [<ffffffff8121c181>] ? sys_futimesat+0x41/0xe0
>  [<ffffffff8102bf35>] ? syscall_trace_enter+0x25/0x2c0
>  [<ffffffff816d5625>] ? tracesys+0x7e/0xe6
>  [<ffffffff816d5688>] tracesys+0xe1/0xe6
> 
> 
> 
> 1148                         error = shmem_add_to_page_cache(page, mapping, index,
> 1149                                                 gfp, swp_to_radix_entry(swap));
> 1150                         /* We already confirmed swap, and make no allocation */
> 1151                         VM_BUG_ON(error);
> 1152                 }

That's very surprising.  Easy enough to handle an error there, but
of course I made it a VM_BUG_ON because it violates my assumptions:
I rather need to understand how this can be, and I've no idea.

Clutching at straws, I expect this is entirely irrelevant, but:
there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor
in current linux.git; rather, there's a VM_BUG_ON on line 1149.

So you've inserted a couple of lines for some reason (more useful
trinity behaviour, perhaps)?  And have some config option I'm
unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning?

Hugh

> 
> 
>              total       used       free     shared    buffers     cached
> Mem:       3885528    2854064    1031464          0       9624      19208
> -/+ buffers/cache:    2825232    1060296
> Swap:      6029308      30656    5998652

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
  2012-10-25  4:36   ` Hugh Dickins
@ 2012-10-25  4:50     ` Ni zhan Chen
  -1 siblings, 0 replies; 69+ messages in thread
From: Ni zhan Chen @ 2012-10-25  4:50 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Dave Jones, linux-mm, linux-kernel

On 10/25/2012 12:36 PM, Hugh Dickins wrote:
> On Wed, 24 Oct 2012, Dave Jones wrote:
>
>> Machine under significant load (4gb memory used, swap usage fluctuating)
>> triggered this...
>>
>> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
>> Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
>> Call Trace:
>>   [<ffffffff8107100f>] warn_slowpath_common+0x7f/0xc0
>>   [<ffffffff8107106a>] warn_slowpath_null+0x1a/0x20
>>   [<ffffffff811903fc>] shmem_getpage_gfp+0xa5c/0xa70
>>   [<ffffffff8118fc3e>] ? shmem_getpage_gfp+0x29e/0xa70
>>   [<ffffffff81190e4f>] shmem_fault+0x4f/0xa0
>>   [<ffffffff8119f391>] __do_fault+0x71/0x5c0
>>   [<ffffffff810e1ac6>] ? __lock_acquire+0x306/0x1ba0
>>   [<ffffffff810b6ff9>] ? local_clock+0x89/0xa0
>>   [<ffffffff811a2767>] handle_pte_fault+0x97/0xae0
>>   [<ffffffff816d1069>] ? sub_preempt_count+0x79/0xd0
>>   [<ffffffff8136d68e>] ? delay_tsc+0xae/0x120
>>   [<ffffffff8136d578>] ? __const_udelay+0x28/0x30
>>   [<ffffffff811a4a39>] handle_mm_fault+0x289/0x350
>>   [<ffffffff816d091e>] __do_page_fault+0x18e/0x530
>>   [<ffffffff810b6ff9>] ? local_clock+0x89/0xa0
>>   [<ffffffff810b0e51>] ? get_parent_ip+0x11/0x50
>>   [<ffffffff810b0e51>] ? get_parent_ip+0x11/0x50
>>   [<ffffffff816d1069>] ? sub_preempt_count+0x79/0xd0
>>   [<ffffffff8112d389>] ? rcu_user_exit+0xc9/0xf0
>>   [<ffffffff816d0ceb>] do_page_fault+0x2b/0x50
>>   [<ffffffff816cd3b8>] page_fault+0x28/0x30
>>   [<ffffffff8136d259>] ? copy_user_enhanced_fast_string+0x9/0x20
>>   [<ffffffff8121c181>] ? sys_futimesat+0x41/0xe0
>>   [<ffffffff8102bf35>] ? syscall_trace_enter+0x25/0x2c0
>>   [<ffffffff816d5625>] ? tracesys+0x7e/0xe6
>>   [<ffffffff816d5688>] tracesys+0xe1/0xe6
>>
>>
>>
>> 1148                         error = shmem_add_to_page_cache(page, mapping, index,
>> 1149                                                 gfp, swp_to_radix_entry(swap));
>> 1150                         /* We already confirmed swap, and make no allocation */
>> 1151                         VM_BUG_ON(error);
>> 1152                 }
> That's very surprising.  Easy enough to handle an error there, but
> of course I made it a VM_BUG_ON because it violates my assumptions:
> I rather need to understand how this can be, and I've no idea.
>
> Clutching at straws, I expect this is entirely irrelevant, but:
> there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor
> in current linux.git; rather, there's a VM_BUG_ON on line 1149.
>
> So you've inserted a couple of lines for some reason (more useful
> trinity behaviour, perhaps)?  And have some config option I'm
> unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning?

Hi Hugh,

I think it maybe caused by your commit [d189922862e03ce: shmem: fix 
negative rss in memcg memory.stat], one question:

if function shmem_confirm_swap confirm the entry has already brought 
back from swap by a racing thread, then why call shmem_add_to_page_cache 
to add page from swapcache to pagecache again? otherwise, will goto 
unlock and then go to repeat? where I miss?

Regards,
Chen

>
> Hugh
>
>>
>>               total       used       free     shared    buffers     cached
>> Mem:       3885528    2854064    1031464          0       9624      19208
>> -/+ buffers/cache:    2825232    1060296
>> Swap:      6029308      30656    5998652
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
@ 2012-10-25  4:50     ` Ni zhan Chen
  0 siblings, 0 replies; 69+ messages in thread
From: Ni zhan Chen @ 2012-10-25  4:50 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Dave Jones, linux-mm, linux-kernel

On 10/25/2012 12:36 PM, Hugh Dickins wrote:
> On Wed, 24 Oct 2012, Dave Jones wrote:
>
>> Machine under significant load (4gb memory used, swap usage fluctuating)
>> triggered this...
>>
>> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
>> Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
>> Call Trace:
>>   [<ffffffff8107100f>] warn_slowpath_common+0x7f/0xc0
>>   [<ffffffff8107106a>] warn_slowpath_null+0x1a/0x20
>>   [<ffffffff811903fc>] shmem_getpage_gfp+0xa5c/0xa70
>>   [<ffffffff8118fc3e>] ? shmem_getpage_gfp+0x29e/0xa70
>>   [<ffffffff81190e4f>] shmem_fault+0x4f/0xa0
>>   [<ffffffff8119f391>] __do_fault+0x71/0x5c0
>>   [<ffffffff810e1ac6>] ? __lock_acquire+0x306/0x1ba0
>>   [<ffffffff810b6ff9>] ? local_clock+0x89/0xa0
>>   [<ffffffff811a2767>] handle_pte_fault+0x97/0xae0
>>   [<ffffffff816d1069>] ? sub_preempt_count+0x79/0xd0
>>   [<ffffffff8136d68e>] ? delay_tsc+0xae/0x120
>>   [<ffffffff8136d578>] ? __const_udelay+0x28/0x30
>>   [<ffffffff811a4a39>] handle_mm_fault+0x289/0x350
>>   [<ffffffff816d091e>] __do_page_fault+0x18e/0x530
>>   [<ffffffff810b6ff9>] ? local_clock+0x89/0xa0
>>   [<ffffffff810b0e51>] ? get_parent_ip+0x11/0x50
>>   [<ffffffff810b0e51>] ? get_parent_ip+0x11/0x50
>>   [<ffffffff816d1069>] ? sub_preempt_count+0x79/0xd0
>>   [<ffffffff8112d389>] ? rcu_user_exit+0xc9/0xf0
>>   [<ffffffff816d0ceb>] do_page_fault+0x2b/0x50
>>   [<ffffffff816cd3b8>] page_fault+0x28/0x30
>>   [<ffffffff8136d259>] ? copy_user_enhanced_fast_string+0x9/0x20
>>   [<ffffffff8121c181>] ? sys_futimesat+0x41/0xe0
>>   [<ffffffff8102bf35>] ? syscall_trace_enter+0x25/0x2c0
>>   [<ffffffff816d5625>] ? tracesys+0x7e/0xe6
>>   [<ffffffff816d5688>] tracesys+0xe1/0xe6
>>
>>
>>
>> 1148                         error = shmem_add_to_page_cache(page, mapping, index,
>> 1149                                                 gfp, swp_to_radix_entry(swap));
>> 1150                         /* We already confirmed swap, and make no allocation */
>> 1151                         VM_BUG_ON(error);
>> 1152                 }
> That's very surprising.  Easy enough to handle an error there, but
> of course I made it a VM_BUG_ON because it violates my assumptions:
> I rather need to understand how this can be, and I've no idea.
>
> Clutching at straws, I expect this is entirely irrelevant, but:
> there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor
> in current linux.git; rather, there's a VM_BUG_ON on line 1149.
>
> So you've inserted a couple of lines for some reason (more useful
> trinity behaviour, perhaps)?  And have some config option I'm
> unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning?

Hi Hugh,

I think it maybe caused by your commit [d189922862e03ce: shmem: fix 
negative rss in memcg memory.stat], one question:

if function shmem_confirm_swap confirm the entry has already brought 
back from swap by a racing thread, then why call shmem_add_to_page_cache 
to add page from swapcache to pagecache again? otherwise, will goto 
unlock and then go to repeat? where I miss?

Regards,
Chen

>
> Hugh
>
>>
>>               total       used       free     shared    buffers     cached
>> Mem:       3885528    2854064    1031464          0       9624      19208
>> -/+ buffers/cache:    2825232    1060296
>> Swap:      6029308      30656    5998652
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
  2012-10-25  4:50     ` Ni zhan Chen
@ 2012-10-25  6:59       ` Hugh Dickins
  -1 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-10-25  6:59 UTC (permalink / raw)
  To: Ni zhan Chen; +Cc: Dave Jones, linux-mm, linux-kernel

On Thu, 25 Oct 2012, Ni zhan Chen wrote:
> On 10/25/2012 12:36 PM, Hugh Dickins wrote:
> > On Wed, 24 Oct 2012, Dave Jones wrote:
> > 
> > > Machine under significant load (4gb memory used, swap usage fluctuating)
> > > triggered this...
> > > 
> > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
> > > Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
> > > 
> > > 1148                         error = shmem_add_to_page_cache(page,
> > > mapping, index,
> > > 1149                                                 gfp,
> > > swp_to_radix_entry(swap));
> > > 1150                         /* We already confirmed swap, and make no
> > > allocation */
> > > 1151                         VM_BUG_ON(error);
> > > 1152                 }
> > That's very surprising.  Easy enough to handle an error there, but
> > of course I made it a VM_BUG_ON because it violates my assumptions:
> > I rather need to understand how this can be, and I've no idea.
> > 
> > Clutching at straws, I expect this is entirely irrelevant, but:
> > there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor
> > in current linux.git; rather, there's a VM_BUG_ON on line 1149.
> > 
> > So you've inserted a couple of lines for some reason (more useful
> > trinity behaviour, perhaps)?  And have some config option I'm
> > unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning?
> 
> Hi Hugh,
> 
> I think it maybe caused by your commit [d189922862e03ce: shmem: fix negative
> rss in memcg memory.stat], one question:

Well, yes, I added the VM_BUG_ON in that commit.

> 
> if function shmem_confirm_swap confirm the entry has already brought back
> from swap by a racing thread,

The reverse: true confirms that the swap entry has not been brought back
from swap by a racing thread; false indicates that there has been a race.

> then why call shmem_add_to_page_cache to add
> page from swapcache to pagecache again?

Adding it to pagecache again, after such a race, would set error to
-EEXIST (originating from radix_tree_insert); but we don't do that,
we add it to pagecache when it has not already been added.

Or that's the intention: but Dave seems to have found an unexpected
exception, despite us holding the page lock across all this.

(But if it weren't for the memcg and replace_page issues, I'd much
prefer to let shmem_add_to_page_cache discover the race as before.)

Hugh

> otherwise, will goto unlock and then go to repeat? where I miss?
> 
> Regards,
> Chen

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
@ 2012-10-25  6:59       ` Hugh Dickins
  0 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-10-25  6:59 UTC (permalink / raw)
  To: Ni zhan Chen; +Cc: Dave Jones, linux-mm, linux-kernel

On Thu, 25 Oct 2012, Ni zhan Chen wrote:
> On 10/25/2012 12:36 PM, Hugh Dickins wrote:
> > On Wed, 24 Oct 2012, Dave Jones wrote:
> > 
> > > Machine under significant load (4gb memory used, swap usage fluctuating)
> > > triggered this...
> > > 
> > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
> > > Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
> > > 
> > > 1148                         error = shmem_add_to_page_cache(page,
> > > mapping, index,
> > > 1149                                                 gfp,
> > > swp_to_radix_entry(swap));
> > > 1150                         /* We already confirmed swap, and make no
> > > allocation */
> > > 1151                         VM_BUG_ON(error);
> > > 1152                 }
> > That's very surprising.  Easy enough to handle an error there, but
> > of course I made it a VM_BUG_ON because it violates my assumptions:
> > I rather need to understand how this can be, and I've no idea.
> > 
> > Clutching at straws, I expect this is entirely irrelevant, but:
> > there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor
> > in current linux.git; rather, there's a VM_BUG_ON on line 1149.
> > 
> > So you've inserted a couple of lines for some reason (more useful
> > trinity behaviour, perhaps)?  And have some config option I'm
> > unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning?
> 
> Hi Hugh,
> 
> I think it maybe caused by your commit [d189922862e03ce: shmem: fix negative
> rss in memcg memory.stat], one question:

Well, yes, I added the VM_BUG_ON in that commit.

> 
> if function shmem_confirm_swap confirm the entry has already brought back
> from swap by a racing thread,

The reverse: true confirms that the swap entry has not been brought back
from swap by a racing thread; false indicates that there has been a race.

> then why call shmem_add_to_page_cache to add
> page from swapcache to pagecache again?

Adding it to pagecache again, after such a race, would set error to
-EEXIST (originating from radix_tree_insert); but we don't do that,
we add it to pagecache when it has not already been added.

Or that's the intention: but Dave seems to have found an unexpected
exception, despite us holding the page lock across all this.

(But if it weren't for the memcg and replace_page issues, I'd much
prefer to let shmem_add_to_page_cache discover the race as before.)

Hugh

> otherwise, will goto unlock and then go to repeat? where I miss?
> 
> Regards,
> Chen

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
  2012-10-25  6:59       ` Hugh Dickins
  (?)
@ 2012-10-25  9:53       ` Ni zhan Chen
  -1 siblings, 0 replies; 69+ messages in thread
From: Ni zhan Chen @ 2012-10-25  9:53 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Dave Jones, linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2867 bytes --]

On 10/25/2012 02:59 PM, Hugh Dickins wrote:
> On Thu, 25 Oct 2012, Ni zhan Chen wrote:
>> On 10/25/2012 12:36 PM, Hugh Dickins wrote:
>>> On Wed, 24 Oct 2012, Dave Jones wrote:
>>>
>>>> Machine under significant load (4gb memory used, swap usage fluctuating)
>>>> triggered this...
>>>>
>>>> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
>>>> Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
>>>>
>>>> 1148                         error = shmem_add_to_page_cache(page,
>>>> mapping, index,
>>>> 1149                                                 gfp,
>>>> swp_to_radix_entry(swap));
>>>> 1150                         /* We already confirmed swap, and make no
>>>> allocation */
>>>> 1151                         VM_BUG_ON(error);
>>>> 1152                 }
>>> That's very surprising.  Easy enough to handle an error there, but
>>> of course I made it a VM_BUG_ON because it violates my assumptions:
>>> I rather need to understand how this can be, and I've no idea.
>>>
>>> Clutching at straws, I expect this is entirely irrelevant, but:
>>> there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor
>>> in current linux.git; rather, there's a VM_BUG_ON on line 1149.
>>>
>>> So you've inserted a couple of lines for some reason (more useful
>>> trinity behaviour, perhaps)?  And have some config option I'm
>>> unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning?
>> Hi Hugh,
>>
>> I think it maybe caused by your commit [d189922862e03ce: shmem: fix negative
>> rss in memcg memory.stat], one question:
> Well, yes, I added the VM_BUG_ON in that commit.
>
>> if function shmem_confirm_swap confirm the entry has already brought back
>> from swap by a racing thread,
> The reverse: true confirms that the swap entry has not been brought back
> from swap by a racing thread; false indicates that there has been a race.
>
>> then why call shmem_add_to_page_cache to add
>> page from swapcache to pagecache again?
> Adding it to pagecache again, after such a race, would set error to
> -EEXIST (originating from radix_tree_insert); but we don't do that,
> we add it to pagecache when it has not already been added.
>
> Or that's the intention: but Dave seems to have found an unexpected
> exception, despite us holding the page lock across all this.
>
> (But if it weren't for the memcg and replace_page issues, I'd much
> prefer to let shmem_add_to_page_cache discover the race as before.)
>
> Hugh

Hi Hugh

Thanks for your response. You mean the -EEXIST originating from 
radix_tree_insert, in radix_tree_insert:
if (slot != NULL)
     return -EEXIST;
But why slot should be NULL? if no race, the pagecache related radix 
tree entry should be RADIX_TREE_EXCEPTIONAL_ENTRY+swap_entry_t.val, 
where I miss?

Regards,
Chen

>
>> otherwise, will goto unlock and then go to repeat? where I miss?
>>
>> Regards,
>> Chen


[-- Attachment #2: Type: text/html, Size: 4576 bytes --]

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
  2012-10-25  6:59       ` Hugh Dickins
@ 2012-10-25 10:21         ` Ni zhan Chen
  -1 siblings, 0 replies; 69+ messages in thread
From: Ni zhan Chen @ 2012-10-25 10:21 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Dave Jones, linux-mm, linux-kernel

On 10/25/2012 02:59 PM, Hugh Dickins wrote:
> On Thu, 25 Oct 2012, Ni zhan Chen wrote:
>> On 10/25/2012 12:36 PM, Hugh Dickins wrote:
>>> On Wed, 24 Oct 2012, Dave Jones wrote:
>>>
>>>> Machine under significant load (4gb memory used, swap usage fluctuating)
>>>> triggered this...
>>>>
>>>> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
>>>> Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
>>>>
>>>> 1148                         error = shmem_add_to_page_cache(page,
>>>> mapping, index,
>>>> 1149                                                 gfp,
>>>> swp_to_radix_entry(swap));
>>>> 1150                         /* We already confirmed swap, and make no
>>>> allocation */
>>>> 1151                         VM_BUG_ON(error);
>>>> 1152                 }
>>> That's very surprising.  Easy enough to handle an error there, but
>>> of course I made it a VM_BUG_ON because it violates my assumptions:
>>> I rather need to understand how this can be, and I've no idea.
>>>
>>> Clutching at straws, I expect this is entirely irrelevant, but:
>>> there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor
>>> in current linux.git; rather, there's a VM_BUG_ON on line 1149.
>>>
>>> So you've inserted a couple of lines for some reason (more useful
>>> trinity behaviour, perhaps)?  And have some config option I'm
>>> unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning?
>> Hi Hugh,
>>
>> I think it maybe caused by your commit [d189922862e03ce: shmem: fix negative
>> rss in memcg memory.stat], one question:
> Well, yes, I added the VM_BUG_ON in that commit.
>
>> if function shmem_confirm_swap confirm the entry has already brought back
>> from swap by a racing thread,
> The reverse: true confirms that the swap entry has not been brought back
> from swap by a racing thread; false indicates that there has been a race.
>
>> then why call shmem_add_to_page_cache to add
>> page from swapcache to pagecache again?
> Adding it to pagecache again, after such a race, would set error to
> -EEXIST (originating from radix_tree_insert); but we don't do that,
> we add it to pagecache when it has not already been added.
>
> Or that's the intention: but Dave seems to have found an unexpected
> exception, despite us holding the page lock across all this.
>
> (But if it weren't for the memcg and replace_page issues, I'd much
> prefer to let shmem_add_to_page_cache discover the race as before.)
>
> Hugh

Hi Hugh

Thanks for your response. You mean the -EEXIST originating from 
radix_tree_insert, in radix_tree_insert:
if (slot != NULL)
     return -EEXIST;
But why slot should be NULL? if no race, the pagecache related radix 
tree entry should be RADIX_TREE_EXCEPTIONAL_ENTRY+swap_entry_t.val, 
where I miss?

Regards,
Chen

>
>> otherwise, will goto unlock and then go to repeat? where I miss?
>>
>> Regards,
>> Chen


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
@ 2012-10-25 10:21         ` Ni zhan Chen
  0 siblings, 0 replies; 69+ messages in thread
From: Ni zhan Chen @ 2012-10-25 10:21 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Dave Jones, linux-mm, linux-kernel

On 10/25/2012 02:59 PM, Hugh Dickins wrote:
> On Thu, 25 Oct 2012, Ni zhan Chen wrote:
>> On 10/25/2012 12:36 PM, Hugh Dickins wrote:
>>> On Wed, 24 Oct 2012, Dave Jones wrote:
>>>
>>>> Machine under significant load (4gb memory used, swap usage fluctuating)
>>>> triggered this...
>>>>
>>>> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
>>>> Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
>>>>
>>>> 1148                         error = shmem_add_to_page_cache(page,
>>>> mapping, index,
>>>> 1149                                                 gfp,
>>>> swp_to_radix_entry(swap));
>>>> 1150                         /* We already confirmed swap, and make no
>>>> allocation */
>>>> 1151                         VM_BUG_ON(error);
>>>> 1152                 }
>>> That's very surprising.  Easy enough to handle an error there, but
>>> of course I made it a VM_BUG_ON because it violates my assumptions:
>>> I rather need to understand how this can be, and I've no idea.
>>>
>>> Clutching at straws, I expect this is entirely irrelevant, but:
>>> there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor
>>> in current linux.git; rather, there's a VM_BUG_ON on line 1149.
>>>
>>> So you've inserted a couple of lines for some reason (more useful
>>> trinity behaviour, perhaps)?  And have some config option I'm
>>> unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning?
>> Hi Hugh,
>>
>> I think it maybe caused by your commit [d189922862e03ce: shmem: fix negative
>> rss in memcg memory.stat], one question:
> Well, yes, I added the VM_BUG_ON in that commit.
>
>> if function shmem_confirm_swap confirm the entry has already brought back
>> from swap by a racing thread,
> The reverse: true confirms that the swap entry has not been brought back
> from swap by a racing thread; false indicates that there has been a race.
>
>> then why call shmem_add_to_page_cache to add
>> page from swapcache to pagecache again?
> Adding it to pagecache again, after such a race, would set error to
> -EEXIST (originating from radix_tree_insert); but we don't do that,
> we add it to pagecache when it has not already been added.
>
> Or that's the intention: but Dave seems to have found an unexpected
> exception, despite us holding the page lock across all this.
>
> (But if it weren't for the memcg and replace_page issues, I'd much
> prefer to let shmem_add_to_page_cache discover the race as before.)
>
> Hugh

Hi Hugh

Thanks for your response. You mean the -EEXIST originating from 
radix_tree_insert, in radix_tree_insert:
if (slot != NULL)
     return -EEXIST;
But why slot should be NULL? if no race, the pagecache related radix 
tree entry should be RADIX_TREE_EXCEPTIONAL_ENTRY+swap_entry_t.val, 
where I miss?

Regards,
Chen

>
>> otherwise, will goto unlock and then go to repeat? where I miss?
>>
>> Regards,
>> Chen

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
  2012-10-25  4:36   ` Hugh Dickins
@ 2012-10-25 11:14     ` Dave Jones
  -1 siblings, 0 replies; 69+ messages in thread
From: Dave Jones @ 2012-10-25 11:14 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linux-mm, linux-kernel

On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote:

 > > 1148                         error = shmem_add_to_page_cache(page, mapping, index,
 > > 1149                                                 gfp, swp_to_radix_entry(swap));
 > > 1150                         /* We already confirmed swap, and make no allocation */
 > > 1151                         VM_BUG_ON(error);
 > > 1152                 }
 > 
 > That's very surprising.  Easy enough to handle an error there, but
 > of course I made it a VM_BUG_ON because it violates my assumptions:
 > I rather need to understand how this can be, and I've no idea.
 > 
 > Clutching at straws, I expect this is entirely irrelevant, but:
 > there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor
 > in current linux.git; rather, there's a VM_BUG_ON on line 1149.
 > 
 > So you've inserted a couple of lines for some reason (more useful
 > trinity behaviour, perhaps)? 

detritus from the recent mpol_to_str bug that I was chasing.
Shouldn't be relevant...

diff -durpN '--exclude-from=/home/davej/.exclude' src/git-trees/kernel/linux/mm>
--- src/git-trees/kernel/linux/mm/shmem.c       2012-10-12 10:01:46.613408580 ->
+++ linux-dj/mm/shmem.c 2012-10-15 12:31:32.979653309 -0400
@@ -885,13 +885,15 @@ redirty:
 static void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol)
 {
        char buffer[64];
+       int ret;
 
        if (!mpol || mpol->mode == MPOL_DEFAULT)
                return;         /* show nothing */
 
-       mpol_to_str(buffer, sizeof(buffer), mpol, 1);
-
-       seq_printf(seq, ",mpol=%s", buffer);
+       memset(buffer, 0, sizeof(buffer));
+       ret = mpol_to_str(buffer, sizeof(buffer), mpol, 1);
+       if (ret > 0)
+               seq_printf(seq, ",mpol=%s", buffer);
 }


 > And have some config option I'm
 > unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning?

Yes, I do have this..

-#define VM_BUG_ON(cond) BUG_ON(cond)
+#define VM_BUG_ON(cond) WARN_ON(cond)

because I got tired of things not going over my usb serial port when I hit them
a while ago. BUG_ON is pretty unfriendly to bug finding.

	Dave


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
@ 2012-10-25 11:14     ` Dave Jones
  0 siblings, 0 replies; 69+ messages in thread
From: Dave Jones @ 2012-10-25 11:14 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linux-mm, linux-kernel

On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote:

 > > 1148                         error = shmem_add_to_page_cache(page, mapping, index,
 > > 1149                                                 gfp, swp_to_radix_entry(swap));
 > > 1150                         /* We already confirmed swap, and make no allocation */
 > > 1151                         VM_BUG_ON(error);
 > > 1152                 }
 > 
 > That's very surprising.  Easy enough to handle an error there, but
 > of course I made it a VM_BUG_ON because it violates my assumptions:
 > I rather need to understand how this can be, and I've no idea.
 > 
 > Clutching at straws, I expect this is entirely irrelevant, but:
 > there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor
 > in current linux.git; rather, there's a VM_BUG_ON on line 1149.
 > 
 > So you've inserted a couple of lines for some reason (more useful
 > trinity behaviour, perhaps)? 

detritus from the recent mpol_to_str bug that I was chasing.
Shouldn't be relevant...

diff -durpN '--exclude-from=/home/davej/.exclude' src/git-trees/kernel/linux/mm>
--- src/git-trees/kernel/linux/mm/shmem.c       2012-10-12 10:01:46.613408580 ->
+++ linux-dj/mm/shmem.c 2012-10-15 12:31:32.979653309 -0400
@@ -885,13 +885,15 @@ redirty:
 static void shmem_show_mpol(struct seq_file *seq, struct mempolicy *mpol)
 {
        char buffer[64];
+       int ret;
 
        if (!mpol || mpol->mode == MPOL_DEFAULT)
                return;         /* show nothing */
 
-       mpol_to_str(buffer, sizeof(buffer), mpol, 1);
-
-       seq_printf(seq, ",mpol=%s", buffer);
+       memset(buffer, 0, sizeof(buffer));
+       ret = mpol_to_str(buffer, sizeof(buffer), mpol, 1);
+       if (ret > 0)
+               seq_printf(seq, ",mpol=%s", buffer);
 }


 > And have some config option I'm
 > unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning?

Yes, I do have this..

-#define VM_BUG_ON(cond) BUG_ON(cond)
+#define VM_BUG_ON(cond) WARN_ON(cond)

because I got tired of things not going over my usb serial port when I hit them
a while ago. BUG_ON is pretty unfriendly to bug finding.

	Dave

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
  2012-10-25  4:36   ` Hugh Dickins
@ 2012-10-25 20:52     ` Johannes Weiner
  -1 siblings, 0 replies; 69+ messages in thread
From: Johannes Weiner @ 2012-10-25 20:52 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Dave Jones, linux-mm, linux-kernel

On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote:
> On Wed, 24 Oct 2012, Dave Jones wrote:
> 
> > Machine under significant load (4gb memory used, swap usage fluctuating)
> > triggered this...
> > 
> > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
> > Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
> > Call Trace:
> >  [<ffffffff8107100f>] warn_slowpath_common+0x7f/0xc0
> >  [<ffffffff8107106a>] warn_slowpath_null+0x1a/0x20
> >  [<ffffffff811903fc>] shmem_getpage_gfp+0xa5c/0xa70
> >  [<ffffffff8118fc3e>] ? shmem_getpage_gfp+0x29e/0xa70
> >  [<ffffffff81190e4f>] shmem_fault+0x4f/0xa0
> >  [<ffffffff8119f391>] __do_fault+0x71/0x5c0
> >  [<ffffffff810e1ac6>] ? __lock_acquire+0x306/0x1ba0
> >  [<ffffffff810b6ff9>] ? local_clock+0x89/0xa0
> >  [<ffffffff811a2767>] handle_pte_fault+0x97/0xae0
> >  [<ffffffff816d1069>] ? sub_preempt_count+0x79/0xd0
> >  [<ffffffff8136d68e>] ? delay_tsc+0xae/0x120
> >  [<ffffffff8136d578>] ? __const_udelay+0x28/0x30
> >  [<ffffffff811a4a39>] handle_mm_fault+0x289/0x350
> >  [<ffffffff816d091e>] __do_page_fault+0x18e/0x530
> >  [<ffffffff810b6ff9>] ? local_clock+0x89/0xa0
> >  [<ffffffff810b0e51>] ? get_parent_ip+0x11/0x50
> >  [<ffffffff810b0e51>] ? get_parent_ip+0x11/0x50
> >  [<ffffffff816d1069>] ? sub_preempt_count+0x79/0xd0
> >  [<ffffffff8112d389>] ? rcu_user_exit+0xc9/0xf0
> >  [<ffffffff816d0ceb>] do_page_fault+0x2b/0x50
> >  [<ffffffff816cd3b8>] page_fault+0x28/0x30
> >  [<ffffffff8136d259>] ? copy_user_enhanced_fast_string+0x9/0x20
> >  [<ffffffff8121c181>] ? sys_futimesat+0x41/0xe0
> >  [<ffffffff8102bf35>] ? syscall_trace_enter+0x25/0x2c0
> >  [<ffffffff816d5625>] ? tracesys+0x7e/0xe6
> >  [<ffffffff816d5688>] tracesys+0xe1/0xe6
> > 
> > 
> > 
> > 1148                         error = shmem_add_to_page_cache(page, mapping, index,
> > 1149                                                 gfp, swp_to_radix_entry(swap));
> > 1150                         /* We already confirmed swap, and make no allocation */
> > 1151                         VM_BUG_ON(error);
> > 1152                 }
> 
> That's very surprising.  Easy enough to handle an error there, but
> of course I made it a VM_BUG_ON because it violates my assumptions:
> I rather need to understand how this can be, and I've no idea.

Could it be concurrent truncation clearing out the entry between
shmem_confirm_swap() and shmem_add_to_page_cache()?  I don't see
anything preventing that.

The empty slot would not match the expected swap entry this call
passes in and the returned error would be -ENOENT.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
@ 2012-10-25 20:52     ` Johannes Weiner
  0 siblings, 0 replies; 69+ messages in thread
From: Johannes Weiner @ 2012-10-25 20:52 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Dave Jones, linux-mm, linux-kernel

On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote:
> On Wed, 24 Oct 2012, Dave Jones wrote:
> 
> > Machine under significant load (4gb memory used, swap usage fluctuating)
> > triggered this...
> > 
> > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
> > Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
> > Call Trace:
> >  [<ffffffff8107100f>] warn_slowpath_common+0x7f/0xc0
> >  [<ffffffff8107106a>] warn_slowpath_null+0x1a/0x20
> >  [<ffffffff811903fc>] shmem_getpage_gfp+0xa5c/0xa70
> >  [<ffffffff8118fc3e>] ? shmem_getpage_gfp+0x29e/0xa70
> >  [<ffffffff81190e4f>] shmem_fault+0x4f/0xa0
> >  [<ffffffff8119f391>] __do_fault+0x71/0x5c0
> >  [<ffffffff810e1ac6>] ? __lock_acquire+0x306/0x1ba0
> >  [<ffffffff810b6ff9>] ? local_clock+0x89/0xa0
> >  [<ffffffff811a2767>] handle_pte_fault+0x97/0xae0
> >  [<ffffffff816d1069>] ? sub_preempt_count+0x79/0xd0
> >  [<ffffffff8136d68e>] ? delay_tsc+0xae/0x120
> >  [<ffffffff8136d578>] ? __const_udelay+0x28/0x30
> >  [<ffffffff811a4a39>] handle_mm_fault+0x289/0x350
> >  [<ffffffff816d091e>] __do_page_fault+0x18e/0x530
> >  [<ffffffff810b6ff9>] ? local_clock+0x89/0xa0
> >  [<ffffffff810b0e51>] ? get_parent_ip+0x11/0x50
> >  [<ffffffff810b0e51>] ? get_parent_ip+0x11/0x50
> >  [<ffffffff816d1069>] ? sub_preempt_count+0x79/0xd0
> >  [<ffffffff8112d389>] ? rcu_user_exit+0xc9/0xf0
> >  [<ffffffff816d0ceb>] do_page_fault+0x2b/0x50
> >  [<ffffffff816cd3b8>] page_fault+0x28/0x30
> >  [<ffffffff8136d259>] ? copy_user_enhanced_fast_string+0x9/0x20
> >  [<ffffffff8121c181>] ? sys_futimesat+0x41/0xe0
> >  [<ffffffff8102bf35>] ? syscall_trace_enter+0x25/0x2c0
> >  [<ffffffff816d5625>] ? tracesys+0x7e/0xe6
> >  [<ffffffff816d5688>] tracesys+0xe1/0xe6
> > 
> > 
> > 
> > 1148                         error = shmem_add_to_page_cache(page, mapping, index,
> > 1149                                                 gfp, swp_to_radix_entry(swap));
> > 1150                         /* We already confirmed swap, and make no allocation */
> > 1151                         VM_BUG_ON(error);
> > 1152                 }
> 
> That's very surprising.  Easy enough to handle an error there, but
> of course I made it a VM_BUG_ON because it violates my assumptions:
> I rather need to understand how this can be, and I've no idea.

Could it be concurrent truncation clearing out the entry between
shmem_confirm_swap() and shmem_add_to_page_cache()?  I don't see
anything preventing that.

The empty slot would not match the expected swap entry this call
passes in and the returned error would be -ENOENT.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
  2012-10-25 10:21         ` Ni zhan Chen
@ 2012-10-25 21:27           ` Hugh Dickins
  -1 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-10-25 21:27 UTC (permalink / raw)
  To: Ni zhan Chen; +Cc: Dave Jones, linux-mm, linux-kernel

On Thu, 25 Oct 2012, Ni zhan Chen wrote:
> On 10/25/2012 02:59 PM, Hugh Dickins wrote:
> > On Thu, 25 Oct 2012, Ni zhan Chen wrote:
> > > 
> > > I think it maybe caused by your commit [d189922862e03ce: shmem: fix
> > > negative
> > > rss in memcg memory.stat], one question:
> > Well, yes, I added the VM_BUG_ON in that commit.
> > 
> > > if function shmem_confirm_swap confirm the entry has already brought back
> > > from swap by a racing thread,
> > The reverse: true confirms that the swap entry has not been brought back
> > from swap by a racing thread; false indicates that there has been a race.
> > 
> > > then why call shmem_add_to_page_cache to add
> > > page from swapcache to pagecache again?
> > Adding it to pagecache again, after such a race, would set error to
> > -EEXIST (originating from radix_tree_insert); but we don't do that,
> > we add it to pagecache when it has not already been added.
> > 
> > Or that's the intention: but Dave seems to have found an unexpected
> > exception, despite us holding the page lock across all this.
> > 
> > (But if it weren't for the memcg and replace_page issues, I'd much
> > prefer to let shmem_add_to_page_cache discover the race as before.)
> > 
> > Hugh
> 
> Hi Hugh
> 
> Thanks for your response. You mean the -EEXIST originating from
> radix_tree_insert, in radix_tree_insert:
> if (slot != NULL)
>     return -EEXIST;
> But why slot should be NULL? if no race, the pagecache related radix tree
> entry should be RADIX_TREE_EXCEPTIONAL_ENTRY+swap_entry_t.val, where I miss?

I was describing what would happen in a case that should not exist,
that you had thought the common case.  In actuality, the entry should
not be NULL, it should be as you say there.

Hugh

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
@ 2012-10-25 21:27           ` Hugh Dickins
  0 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-10-25 21:27 UTC (permalink / raw)
  To: Ni zhan Chen; +Cc: Dave Jones, linux-mm, linux-kernel

On Thu, 25 Oct 2012, Ni zhan Chen wrote:
> On 10/25/2012 02:59 PM, Hugh Dickins wrote:
> > On Thu, 25 Oct 2012, Ni zhan Chen wrote:
> > > 
> > > I think it maybe caused by your commit [d189922862e03ce: shmem: fix
> > > negative
> > > rss in memcg memory.stat], one question:
> > Well, yes, I added the VM_BUG_ON in that commit.
> > 
> > > if function shmem_confirm_swap confirm the entry has already brought back
> > > from swap by a racing thread,
> > The reverse: true confirms that the swap entry has not been brought back
> > from swap by a racing thread; false indicates that there has been a race.
> > 
> > > then why call shmem_add_to_page_cache to add
> > > page from swapcache to pagecache again?
> > Adding it to pagecache again, after such a race, would set error to
> > -EEXIST (originating from radix_tree_insert); but we don't do that,
> > we add it to pagecache when it has not already been added.
> > 
> > Or that's the intention: but Dave seems to have found an unexpected
> > exception, despite us holding the page lock across all this.
> > 
> > (But if it weren't for the memcg and replace_page issues, I'd much
> > prefer to let shmem_add_to_page_cache discover the race as before.)
> > 
> > Hugh
> 
> Hi Hugh
> 
> Thanks for your response. You mean the -EEXIST originating from
> radix_tree_insert, in radix_tree_insert:
> if (slot != NULL)
>     return -EEXIST;
> But why slot should be NULL? if no race, the pagecache related radix tree
> entry should be RADIX_TREE_EXCEPTIONAL_ENTRY+swap_entry_t.val, where I miss?

I was describing what would happen in a case that should not exist,
that you had thought the common case.  In actuality, the entry should
not be NULL, it should be as you say there.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
  2012-10-25 11:14     ` Dave Jones
@ 2012-10-25 21:28       ` Hugh Dickins
  -1 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-10-25 21:28 UTC (permalink / raw)
  To: Dave Jones, Hugh Dickins, linux-mm, linux-kernel

On Thu, 25 Oct 2012, Dave Jones wrote:
> On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote:
> 
>  > Clutching at straws, I expect this is entirely irrelevant, but:
>  > there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor
>  > in current linux.git; rather, there's a VM_BUG_ON on line 1149.
>  > 
>  > So you've inserted a couple of lines for some reason (more useful
>  > trinity behaviour, perhaps)? 
> 
> detritus from the recent mpol_to_str bug that I was chasing.
> Shouldn't be relevant...
> 
>  > And have some config option I'm
>  > unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning?
> 
> Yes, I do have this..
> 
> -#define VM_BUG_ON(cond) BUG_ON(cond)
> +#define VM_BUG_ON(cond) WARN_ON(cond)
> 
> because I got tired of things not going over my usb serial port when I hit them
> a while ago. BUG_ON is pretty unfriendly to bug finding.

Makes sense, thanks for the reassurance.

Hugh

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
@ 2012-10-25 21:28       ` Hugh Dickins
  0 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-10-25 21:28 UTC (permalink / raw)
  To: Dave Jones, Hugh Dickins, linux-mm, linux-kernel

On Thu, 25 Oct 2012, Dave Jones wrote:
> On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote:
> 
>  > Clutching at straws, I expect this is entirely irrelevant, but:
>  > there isn't a warning on line 1151 of mm/shmem.c in 3.7.0-rc2 nor
>  > in current linux.git; rather, there's a VM_BUG_ON on line 1149.
>  > 
>  > So you've inserted a couple of lines for some reason (more useful
>  > trinity behaviour, perhaps)? 
> 
> detritus from the recent mpol_to_str bug that I was chasing.
> Shouldn't be relevant...
> 
>  > And have some config option I'm
>  > unfamiliar with, that mutates a BUG_ON or VM_BUG_ON into a warning?
> 
> Yes, I do have this..
> 
> -#define VM_BUG_ON(cond) BUG_ON(cond)
> +#define VM_BUG_ON(cond) WARN_ON(cond)
> 
> because I got tired of things not going over my usb serial port when I hit them
> a while ago. BUG_ON is pretty unfriendly to bug finding.

Makes sense, thanks for the reassurance.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
  2012-10-25 20:52     ` Johannes Weiner
@ 2012-10-25 21:48       ` Hugh Dickins
  -1 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-10-25 21:48 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: Dave Jones, linux-mm, linux-kernel

On Thu, 25 Oct 2012, Johannes Weiner wrote:
> On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote:
> > On Wed, 24 Oct 2012, Dave Jones wrote:
> > 
> > > Machine under significant load (4gb memory used, swap usage fluctuating)
> > > triggered this...
> > > 
> > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
> > > Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
> > > 
> > > 1148                         error = shmem_add_to_page_cache(page, mapping, index,
> > > 1149                                                 gfp, swp_to_radix_entry(swap));
> > > 1150                         /* We already confirmed swap, and make no allocation */
> > > 1151                         VM_BUG_ON(error);
> > > 1152                 }
> > 
> > That's very surprising.  Easy enough to handle an error there, but
> > of course I made it a VM_BUG_ON because it violates my assumptions:
> > I rather need to understand how this can be, and I've no idea.
> 
> Could it be concurrent truncation clearing out the entry between
> shmem_confirm_swap() and shmem_add_to_page_cache()?  I don't see
> anything preventing that.
> 
> The empty slot would not match the expected swap entry this call
> passes in and the returned error would be -ENOENT.

Excellent notion, many thanks Hannes, I believe you've got it.

I've hit that truncation problem in swapoff (and commented on it
in shmem_unuse_inode), but never hit it or considered it here.
I think of the page lock as holding it stable, but truncation's
free_swap_and_cache only does a trylock on the swapcache page,
so we're not secured against that possibility.

So I'd like to change it to VM_BUG_ON(error && error != -ENOENT),
but there's a little tidying up to do in the -ENOENT case, which
needs more thought.  A delete_from_swap_cache(page) - though we
can be lazy and leave that to reclaim for such a rare occurrence -
and probably a mem_cgroup uncharge; but the memcg hooks are always
the hardest to get right, I'll have think about that one carefully.

Hugh

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
@ 2012-10-25 21:48       ` Hugh Dickins
  0 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-10-25 21:48 UTC (permalink / raw)
  To: Johannes Weiner; +Cc: Dave Jones, linux-mm, linux-kernel

On Thu, 25 Oct 2012, Johannes Weiner wrote:
> On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote:
> > On Wed, 24 Oct 2012, Dave Jones wrote:
> > 
> > > Machine under significant load (4gb memory used, swap usage fluctuating)
> > > triggered this...
> > > 
> > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
> > > Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
> > > 
> > > 1148                         error = shmem_add_to_page_cache(page, mapping, index,
> > > 1149                                                 gfp, swp_to_radix_entry(swap));
> > > 1150                         /* We already confirmed swap, and make no allocation */
> > > 1151                         VM_BUG_ON(error);
> > > 1152                 }
> > 
> > That's very surprising.  Easy enough to handle an error there, but
> > of course I made it a VM_BUG_ON because it violates my assumptions:
> > I rather need to understand how this can be, and I've no idea.
> 
> Could it be concurrent truncation clearing out the entry between
> shmem_confirm_swap() and shmem_add_to_page_cache()?  I don't see
> anything preventing that.
> 
> The empty slot would not match the expected swap entry this call
> passes in and the returned error would be -ENOENT.

Excellent notion, many thanks Hannes, I believe you've got it.

I've hit that truncation problem in swapoff (and commented on it
in shmem_unuse_inode), but never hit it or considered it here.
I think of the page lock as holding it stable, but truncation's
free_swap_and_cache only does a trylock on the swapcache page,
so we're not secured against that possibility.

So I'd like to change it to VM_BUG_ON(error && error != -ENOENT),
but there's a little tidying up to do in the -ENOENT case, which
needs more thought.  A delete_from_swap_cache(page) - though we
can be lazy and leave that to reclaim for such a rare occurrence -
and probably a mem_cgroup uncharge; but the memcg hooks are always
the hardest to get right, I'll have think about that one carefully.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
  2012-10-25 21:27           ` Hugh Dickins
@ 2012-10-26  1:48             ` Ni zhan Chen
  -1 siblings, 0 replies; 69+ messages in thread
From: Ni zhan Chen @ 2012-10-26  1:48 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Dave Jones, linux-mm, linux-kernel

On 10/26/2012 05:27 AM, Hugh Dickins wrote:
> On Thu, 25 Oct 2012, Ni zhan Chen wrote:
>> On 10/25/2012 02:59 PM, Hugh Dickins wrote:
>>> On Thu, 25 Oct 2012, Ni zhan Chen wrote:
>>>> I think it maybe caused by your commit [d189922862e03ce: shmem: fix
>>>> negative
>>>> rss in memcg memory.stat], one question:
>>> Well, yes, I added the VM_BUG_ON in that commit.
>>>
>>>> if function shmem_confirm_swap confirm the entry has already brought back
>>>> from swap by a racing thread,
>>> The reverse: true confirms that the swap entry has not been brought back
>>> from swap by a racing thread; false indicates that there has been a race.
>>>
>>>> then why call shmem_add_to_page_cache to add
>>>> page from swapcache to pagecache again?
>>> Adding it to pagecache again, after such a race, would set error to
>>> -EEXIST (originating from radix_tree_insert); but we don't do that,
>>> we add it to pagecache when it has not already been added.
>>>
>>> Or that's the intention: but Dave seems to have found an unexpected
>>> exception, despite us holding the page lock across all this.
>>>
>>> (But if it weren't for the memcg and replace_page issues, I'd much
>>> prefer to let shmem_add_to_page_cache discover the race as before.)
>>>
>>> Hugh
>> Hi Hugh
>>
>> Thanks for your response. You mean the -EEXIST originating from
>> radix_tree_insert, in radix_tree_insert:
>> if (slot != NULL)
>>      return -EEXIST;
>> But why slot should be NULL? if no race, the pagecache related radix tree
>> entry should be RADIX_TREE_EXCEPTIONAL_ENTRY+swap_entry_t.val, where I miss?
> I was describing what would happen in a case that should not exist,
> that you had thought the common case.  In actuality, the entry should
> not be NULL, it should be as you say there.

Thanks for your patience. So in the common case, the entry should be the 
value I mentioned, then why has this check?
if (slot != NULL)
     return -EEXIST;

the common case will return -EEXIST.

>
> Hugh
>


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
@ 2012-10-26  1:48             ` Ni zhan Chen
  0 siblings, 0 replies; 69+ messages in thread
From: Ni zhan Chen @ 2012-10-26  1:48 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Dave Jones, linux-mm, linux-kernel

On 10/26/2012 05:27 AM, Hugh Dickins wrote:
> On Thu, 25 Oct 2012, Ni zhan Chen wrote:
>> On 10/25/2012 02:59 PM, Hugh Dickins wrote:
>>> On Thu, 25 Oct 2012, Ni zhan Chen wrote:
>>>> I think it maybe caused by your commit [d189922862e03ce: shmem: fix
>>>> negative
>>>> rss in memcg memory.stat], one question:
>>> Well, yes, I added the VM_BUG_ON in that commit.
>>>
>>>> if function shmem_confirm_swap confirm the entry has already brought back
>>>> from swap by a racing thread,
>>> The reverse: true confirms that the swap entry has not been brought back
>>> from swap by a racing thread; false indicates that there has been a race.
>>>
>>>> then why call shmem_add_to_page_cache to add
>>>> page from swapcache to pagecache again?
>>> Adding it to pagecache again, after such a race, would set error to
>>> -EEXIST (originating from radix_tree_insert); but we don't do that,
>>> we add it to pagecache when it has not already been added.
>>>
>>> Or that's the intention: but Dave seems to have found an unexpected
>>> exception, despite us holding the page lock across all this.
>>>
>>> (But if it weren't for the memcg and replace_page issues, I'd much
>>> prefer to let shmem_add_to_page_cache discover the race as before.)
>>>
>>> Hugh
>> Hi Hugh
>>
>> Thanks for your response. You mean the -EEXIST originating from
>> radix_tree_insert, in radix_tree_insert:
>> if (slot != NULL)
>>      return -EEXIST;
>> But why slot should be NULL? if no race, the pagecache related radix tree
>> entry should be RADIX_TREE_EXCEPTIONAL_ENTRY+swap_entry_t.val, where I miss?
> I was describing what would happen in a case that should not exist,
> that you had thought the common case.  In actuality, the entry should
> not be NULL, it should be as you say there.

Thanks for your patience. So in the common case, the entry should be the 
value I mentioned, then why has this check?
if (slot != NULL)
     return -EEXIST;

the common case will return -EEXIST.

>
> Hugh
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
  2012-10-25 21:48       ` Hugh Dickins
@ 2012-10-26  2:15         ` Ni zhan Chen
  -1 siblings, 0 replies; 69+ messages in thread
From: Ni zhan Chen @ 2012-10-26  2:15 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Johannes Weiner, Dave Jones, linux-mm, linux-kernel

On 10/26/2012 05:48 AM, Hugh Dickins wrote:
> On Thu, 25 Oct 2012, Johannes Weiner wrote:
>> On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote:
>>> On Wed, 24 Oct 2012, Dave Jones wrote:
>>>
>>>> Machine under significant load (4gb memory used, swap usage fluctuating)
>>>> triggered this...
>>>>
>>>> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
>>>> Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
>>>>
>>>> 1148                         error = shmem_add_to_page_cache(page, mapping, index,
>>>> 1149                                                 gfp, swp_to_radix_entry(swap));
>>>> 1150                         /* We already confirmed swap, and make no allocation */
>>>> 1151                         VM_BUG_ON(error);
>>>> 1152                 }
>>> That's very surprising.  Easy enough to handle an error there, but
>>> of course I made it a VM_BUG_ON because it violates my assumptions:
>>> I rather need to understand how this can be, and I've no idea.
>> Could it be concurrent truncation clearing out the entry between
>> shmem_confirm_swap() and shmem_add_to_page_cache()?  I don't see
>> anything preventing that.
>>
>> The empty slot would not match the expected swap entry this call
>> passes in and the returned error would be -ENOENT.
> Excellent notion, many thanks Hannes, I believe you've got it.
>
> I've hit that truncation problem in swapoff (and commented on it
> in shmem_unuse_inode), but never hit it or considered it here.
> I think of the page lock as holding it stable, but truncation's
> free_swap_and_cache only does a trylock on the swapcache page,
> so we're not secured against that possibility.

Hi Hugh,

Even though free_swap_and_cache only does a trylock on the swapcache 
page, but it doens't call delete_from_swap_cache and the associated 
entry should still be there, I am interested in what you have already 
introduce to protect it?

>
> So I'd like to change it to VM_BUG_ON(error && error != -ENOENT),
> but there's a little tidying up to do in the -ENOENT case, which

Do you mean radix_tree_insert will return -ENOENT if the associated 
entry is not present? Why I can't find this return value in the function 
radix_tree_insert?

> needs more thought.  A delete_from_swap_cache(page) - though we
> can be lazy and leave that to reclaim for such a rare occurrence -
> and probably a mem_cgroup uncharge; but the memcg hooks are always
> the hardest to get right, I'll have think about that one carefully.
>
> Hugh
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
@ 2012-10-26  2:15         ` Ni zhan Chen
  0 siblings, 0 replies; 69+ messages in thread
From: Ni zhan Chen @ 2012-10-26  2:15 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Johannes Weiner, Dave Jones, linux-mm, linux-kernel

On 10/26/2012 05:48 AM, Hugh Dickins wrote:
> On Thu, 25 Oct 2012, Johannes Weiner wrote:
>> On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote:
>>> On Wed, 24 Oct 2012, Dave Jones wrote:
>>>
>>>> Machine under significant load (4gb memory used, swap usage fluctuating)
>>>> triggered this...
>>>>
>>>> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
>>>> Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
>>>>
>>>> 1148                         error = shmem_add_to_page_cache(page, mapping, index,
>>>> 1149                                                 gfp, swp_to_radix_entry(swap));
>>>> 1150                         /* We already confirmed swap, and make no allocation */
>>>> 1151                         VM_BUG_ON(error);
>>>> 1152                 }
>>> That's very surprising.  Easy enough to handle an error there, but
>>> of course I made it a VM_BUG_ON because it violates my assumptions:
>>> I rather need to understand how this can be, and I've no idea.
>> Could it be concurrent truncation clearing out the entry between
>> shmem_confirm_swap() and shmem_add_to_page_cache()?  I don't see
>> anything preventing that.
>>
>> The empty slot would not match the expected swap entry this call
>> passes in and the returned error would be -ENOENT.
> Excellent notion, many thanks Hannes, I believe you've got it.
>
> I've hit that truncation problem in swapoff (and commented on it
> in shmem_unuse_inode), but never hit it or considered it here.
> I think of the page lock as holding it stable, but truncation's
> free_swap_and_cache only does a trylock on the swapcache page,
> so we're not secured against that possibility.

Hi Hugh,

Even though free_swap_and_cache only does a trylock on the swapcache 
page, but it doens't call delete_from_swap_cache and the associated 
entry should still be there, I am interested in what you have already 
introduce to protect it?

>
> So I'd like to change it to VM_BUG_ON(error && error != -ENOENT),
> but there's a little tidying up to do in the -ENOENT case, which

Do you mean radix_tree_insert will return -ENOENT if the associated 
entry is not present? Why I can't find this return value in the function 
radix_tree_insert?

> needs more thought.  A delete_from_swap_cache(page) - though we
> can be lazy and leave that to reclaim for such a rare occurrence -
> and probably a mem_cgroup uncharge; but the memcg hooks are always
> the hardest to get right, I'll have think about that one carefully.
>
> Hugh
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
  2012-10-25  4:36   ` Hugh Dickins
@ 2012-11-01 19:10     ` Dave Jones
  -1 siblings, 0 replies; 69+ messages in thread
From: Dave Jones @ 2012-11-01 19:10 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linux-mm, linux-kernel

On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote:
 > On Wed, 24 Oct 2012, Dave Jones wrote:
 > 
 > > Machine under significant load (4gb memory used, swap usage fluctuating)
 > > triggered this...
 > > 
 > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
 > > Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
 > > Call Trace:
 > >  [<ffffffff8107100f>] warn_slowpath_common+0x7f/0xc0
 > >  [<ffffffff8107106a>] warn_slowpath_null+0x1a/0x20
 > >  [<ffffffff811903fc>] shmem_getpage_gfp+0xa5c/0xa70
 > >  [<ffffffff8118fc3e>] ? shmem_getpage_gfp+0x29e/0xa70
 > >  [<ffffffff81190e4f>] shmem_fault+0x4f/0xa0
 > >  [<ffffffff8119f391>] __do_fault+0x71/0x5c0
 > >  [<ffffffff810e1ac6>] ? __lock_acquire+0x306/0x1ba0
 > >  [<ffffffff810b6ff9>] ? local_clock+0x89/0xa0
 > >  [<ffffffff811a2767>] handle_pte_fault+0x97/0xae0
 > >  [<ffffffff816d1069>] ? sub_preempt_count+0x79/0xd0
 > >  [<ffffffff8136d68e>] ? delay_tsc+0xae/0x120
 > >  [<ffffffff8136d578>] ? __const_udelay+0x28/0x30
 > >  [<ffffffff811a4a39>] handle_mm_fault+0x289/0x350
 > >  [<ffffffff816d091e>] __do_page_fault+0x18e/0x530
 > >  [<ffffffff810b6ff9>] ? local_clock+0x89/0xa0
 > >  [<ffffffff810b0e51>] ? get_parent_ip+0x11/0x50
 > >  [<ffffffff810b0e51>] ? get_parent_ip+0x11/0x50
 > >  [<ffffffff816d1069>] ? sub_preempt_count+0x79/0xd0
 > >  [<ffffffff8112d389>] ? rcu_user_exit+0xc9/0xf0
 > >  [<ffffffff816d0ceb>] do_page_fault+0x2b/0x50
 > >  [<ffffffff816cd3b8>] page_fault+0x28/0x30
 > >  [<ffffffff8136d259>] ? copy_user_enhanced_fast_string+0x9/0x20
 > >  [<ffffffff8121c181>] ? sys_futimesat+0x41/0xe0
 > >  [<ffffffff8102bf35>] ? syscall_trace_enter+0x25/0x2c0
 > >  [<ffffffff816d5625>] ? tracesys+0x7e/0xe6
 > >  [<ffffffff816d5688>] tracesys+0xe1/0xe6
 > > 
 > > 
 > > 
 > > 1148                         error = shmem_add_to_page_cache(page, mapping, index,
 > > 1149                                                 gfp, swp_to_radix_entry(swap));
 > > 1150                         /* We already confirmed swap, and make no allocation */
 > > 1151                         VM_BUG_ON(error);
 > > 1152                 }
 > 
 > That's very surprising.  Easy enough to handle an error there, but
 > of course I made it a VM_BUG_ON because it violates my assumptions:
 > I rather need to understand how this can be, and I've no idea.

I just noticed we had a user report hitting this same warning, but
with a different trace..

: [<ffffffff8105b84f>] warn_slowpath_common+0x7f/0xc0
: [<ffffffff8105b8aa>] warn_slowpath_null+0x1a/0x20
: [<ffffffff81143c73>] shmem_getpage_gfp+0x7f3/0x830
: [<ffffffff81158c9d>] ? vma_adjust+0x3ed/0x620
: [<ffffffff81143f02>] shmem_file_aio_read+0x1f2/0x380
: [<ffffffff8118e487>] do_sync_read+0xa7/0xe0
: [<ffffffff8118eda9>] vfs_read+0xa9/0x180
: [<ffffffff8118eeca>] sys_read+0x4a/0x90
: [<ffffffff816226e9>] system_call_fastpath+0x16/0x1b

	Dave
 

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
@ 2012-11-01 19:10     ` Dave Jones
  0 siblings, 0 replies; 69+ messages in thread
From: Dave Jones @ 2012-11-01 19:10 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linux-mm, linux-kernel

On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote:
 > On Wed, 24 Oct 2012, Dave Jones wrote:
 > 
 > > Machine under significant load (4gb memory used, swap usage fluctuating)
 > > triggered this...
 > > 
 > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
 > > Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
 > > Call Trace:
 > >  [<ffffffff8107100f>] warn_slowpath_common+0x7f/0xc0
 > >  [<ffffffff8107106a>] warn_slowpath_null+0x1a/0x20
 > >  [<ffffffff811903fc>] shmem_getpage_gfp+0xa5c/0xa70
 > >  [<ffffffff8118fc3e>] ? shmem_getpage_gfp+0x29e/0xa70
 > >  [<ffffffff81190e4f>] shmem_fault+0x4f/0xa0
 > >  [<ffffffff8119f391>] __do_fault+0x71/0x5c0
 > >  [<ffffffff810e1ac6>] ? __lock_acquire+0x306/0x1ba0
 > >  [<ffffffff810b6ff9>] ? local_clock+0x89/0xa0
 > >  [<ffffffff811a2767>] handle_pte_fault+0x97/0xae0
 > >  [<ffffffff816d1069>] ? sub_preempt_count+0x79/0xd0
 > >  [<ffffffff8136d68e>] ? delay_tsc+0xae/0x120
 > >  [<ffffffff8136d578>] ? __const_udelay+0x28/0x30
 > >  [<ffffffff811a4a39>] handle_mm_fault+0x289/0x350
 > >  [<ffffffff816d091e>] __do_page_fault+0x18e/0x530
 > >  [<ffffffff810b6ff9>] ? local_clock+0x89/0xa0
 > >  [<ffffffff810b0e51>] ? get_parent_ip+0x11/0x50
 > >  [<ffffffff810b0e51>] ? get_parent_ip+0x11/0x50
 > >  [<ffffffff816d1069>] ? sub_preempt_count+0x79/0xd0
 > >  [<ffffffff8112d389>] ? rcu_user_exit+0xc9/0xf0
 > >  [<ffffffff816d0ceb>] do_page_fault+0x2b/0x50
 > >  [<ffffffff816cd3b8>] page_fault+0x28/0x30
 > >  [<ffffffff8136d259>] ? copy_user_enhanced_fast_string+0x9/0x20
 > >  [<ffffffff8121c181>] ? sys_futimesat+0x41/0xe0
 > >  [<ffffffff8102bf35>] ? syscall_trace_enter+0x25/0x2c0
 > >  [<ffffffff816d5625>] ? tracesys+0x7e/0xe6
 > >  [<ffffffff816d5688>] tracesys+0xe1/0xe6
 > > 
 > > 
 > > 
 > > 1148                         error = shmem_add_to_page_cache(page, mapping, index,
 > > 1149                                                 gfp, swp_to_radix_entry(swap));
 > > 1150                         /* We already confirmed swap, and make no allocation */
 > > 1151                         VM_BUG_ON(error);
 > > 1152                 }
 > 
 > That's very surprising.  Easy enough to handle an error there, but
 > of course I made it a VM_BUG_ON because it violates my assumptions:
 > I rather need to understand how this can be, and I've no idea.

I just noticed we had a user report hitting this same warning, but
with a different trace..

: [<ffffffff8105b84f>] warn_slowpath_common+0x7f/0xc0
: [<ffffffff8105b8aa>] warn_slowpath_null+0x1a/0x20
: [<ffffffff81143c73>] shmem_getpage_gfp+0x7f3/0x830
: [<ffffffff81158c9d>] ? vma_adjust+0x3ed/0x620
: [<ffffffff81143f02>] shmem_file_aio_read+0x1f2/0x380
: [<ffffffff8118e487>] do_sync_read+0xa7/0xe0
: [<ffffffff8118eda9>] vfs_read+0xa9/0x180
: [<ffffffff8118eeca>] sys_read+0x4a/0x90
: [<ffffffff816226e9>] system_call_fastpath+0x16/0x1b

	Dave
 

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
  2012-11-01 19:10     ` Dave Jones
@ 2012-11-01 23:03       ` Hugh Dickins
  -1 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-11-01 23:03 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-mm, linux-kernel

On Thu, 1 Nov 2012, Dave Jones wrote:
> On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote:
>  > On Wed, 24 Oct 2012, Dave Jones wrote:
>  > 
>  > > Machine under significant load (4gb memory used, swap usage fluctuating)
>  > > triggered this...
>  > > 
>  > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
>  > > 
>  > > 1148                         error = shmem_add_to_page_cache(page, mapping, index,
>  > > 1149                                                 gfp, swp_to_radix_entry(swap));
>  > > 1150                         /* We already confirmed swap, and make no allocation */
>  > > 1151                         VM_BUG_ON(error);
>  > > 1152                 }
>  > 
>  > That's very surprising.  Easy enough to handle an error there, but
>  > of course I made it a VM_BUG_ON because it violates my assumptions:
>  > I rather need to understand how this can be, and I've no idea.
> 
> I just noticed we had a user report hitting this same warning, but
> with a different trace..
> 
> : [<ffffffff8105b84f>] warn_slowpath_common+0x7f/0xc0
> : [<ffffffff8105b8aa>] warn_slowpath_null+0x1a/0x20
> : [<ffffffff81143c73>] shmem_getpage_gfp+0x7f3/0x830
> : [<ffffffff81158c9d>] ? vma_adjust+0x3ed/0x620
> : [<ffffffff81143f02>] shmem_file_aio_read+0x1f2/0x380
> : [<ffffffff8118e487>] do_sync_read+0xa7/0xe0
> : [<ffffffff8118eda9>] vfs_read+0xa9/0x180
> : [<ffffffff8118eeca>] sys_read+0x4a/0x90
> : [<ffffffff816226e9>] system_call_fastpath+0x16/0x1b

Equally explicable by Hannes's hypothesis;
but useful supporting evidence, thank you.

Except... earlier in the thread you explained how you hacked
#define VM_BUG_ON(cond) WARN_ON(cond)
to get this to come out as a warning instead of a bug,
and now it looks as if "a user" has here done the same.

Which is very much a user's right, of course; but does
make me wonder whether that user might actually be davej ;)

Never mind, whatever, it's more justification for the fix - which
I've honestly not forgotten, but somehow not got around to sending
(with a couple of others even longer outstanding).  On its way
shortly, for some unpredictable value of shortly.

Hugh

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
@ 2012-11-01 23:03       ` Hugh Dickins
  0 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-11-01 23:03 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-mm, linux-kernel

On Thu, 1 Nov 2012, Dave Jones wrote:
> On Wed, Oct 24, 2012 at 09:36:27PM -0700, Hugh Dickins wrote:
>  > On Wed, 24 Oct 2012, Dave Jones wrote:
>  > 
>  > > Machine under significant load (4gb memory used, swap usage fluctuating)
>  > > triggered this...
>  > > 
>  > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
>  > > 
>  > > 1148                         error = shmem_add_to_page_cache(page, mapping, index,
>  > > 1149                                                 gfp, swp_to_radix_entry(swap));
>  > > 1150                         /* We already confirmed swap, and make no allocation */
>  > > 1151                         VM_BUG_ON(error);
>  > > 1152                 }
>  > 
>  > That's very surprising.  Easy enough to handle an error there, but
>  > of course I made it a VM_BUG_ON because it violates my assumptions:
>  > I rather need to understand how this can be, and I've no idea.
> 
> I just noticed we had a user report hitting this same warning, but
> with a different trace..
> 
> : [<ffffffff8105b84f>] warn_slowpath_common+0x7f/0xc0
> : [<ffffffff8105b8aa>] warn_slowpath_null+0x1a/0x20
> : [<ffffffff81143c73>] shmem_getpage_gfp+0x7f3/0x830
> : [<ffffffff81158c9d>] ? vma_adjust+0x3ed/0x620
> : [<ffffffff81143f02>] shmem_file_aio_read+0x1f2/0x380
> : [<ffffffff8118e487>] do_sync_read+0xa7/0xe0
> : [<ffffffff8118eda9>] vfs_read+0xa9/0x180
> : [<ffffffff8118eeca>] sys_read+0x4a/0x90
> : [<ffffffff816226e9>] system_call_fastpath+0x16/0x1b

Equally explicable by Hannes's hypothesis;
but useful supporting evidence, thank you.

Except... earlier in the thread you explained how you hacked
#define VM_BUG_ON(cond) WARN_ON(cond)
to get this to come out as a warning instead of a bug,
and now it looks as if "a user" has here done the same.

Which is very much a user's right, of course; but does
make me wonder whether that user might actually be davej ;)

Never mind, whatever, it's more justification for the fix - which
I've honestly not forgotten, but somehow not got around to sending
(with a couple of others even longer outstanding).  On its way
shortly, for some unpredictable value of shortly.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
  2012-11-01 23:03       ` Hugh Dickins
@ 2012-11-01 23:20         ` Dave Jones
  -1 siblings, 0 replies; 69+ messages in thread
From: Dave Jones @ 2012-11-01 23:20 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linux-mm, linux-kernel

On Thu, Nov 01, 2012 at 04:03:40PM -0700, Hugh Dickins wrote:
 > > I just noticed we had a user report hitting this same warning, but
 > > with a different trace..
 > > 
 > > : [<ffffffff8105b84f>] warn_slowpath_common+0x7f/0xc0
 > > : [<ffffffff8105b8aa>] warn_slowpath_null+0x1a/0x20
 > > : [<ffffffff81143c73>] shmem_getpage_gfp+0x7f3/0x830
 > > : [<ffffffff81158c9d>] ? vma_adjust+0x3ed/0x620
 > > : [<ffffffff81143f02>] shmem_file_aio_read+0x1f2/0x380
 > > : [<ffffffff8118e487>] do_sync_read+0xa7/0xe0
 > > : [<ffffffff8118eda9>] vfs_read+0xa9/0x180
 > > : [<ffffffff8118eeca>] sys_read+0x4a/0x90
 > > : [<ffffffff816226e9>] system_call_fastpath+0x16/0x1b
 > 
 > Equally explicable by Hannes's hypothesis;
 > but useful supporting evidence, thank you.
 > 
 > Except... earlier in the thread you explained how you hacked
 > #define VM_BUG_ON(cond) WARN_ON(cond)
 > to get this to come out as a warning instead of a bug,
 > and now it looks as if "a user" has here done the same.
 > 
 > Which is very much a user's right, of course; but does
 > make me wonder whether that user might actually be davej ;)

indirectly. I made the same change in the Fedora kernel a while ago
to test a hypothesis that we weren't getting any VM_BUG_ON reports.

	Dave


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
@ 2012-11-01 23:20         ` Dave Jones
  0 siblings, 0 replies; 69+ messages in thread
From: Dave Jones @ 2012-11-01 23:20 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linux-mm, linux-kernel

On Thu, Nov 01, 2012 at 04:03:40PM -0700, Hugh Dickins wrote:
 > > I just noticed we had a user report hitting this same warning, but
 > > with a different trace..
 > > 
 > > : [<ffffffff8105b84f>] warn_slowpath_common+0x7f/0xc0
 > > : [<ffffffff8105b8aa>] warn_slowpath_null+0x1a/0x20
 > > : [<ffffffff81143c73>] shmem_getpage_gfp+0x7f3/0x830
 > > : [<ffffffff81158c9d>] ? vma_adjust+0x3ed/0x620
 > > : [<ffffffff81143f02>] shmem_file_aio_read+0x1f2/0x380
 > > : [<ffffffff8118e487>] do_sync_read+0xa7/0xe0
 > > : [<ffffffff8118eda9>] vfs_read+0xa9/0x180
 > > : [<ffffffff8118eeca>] sys_read+0x4a/0x90
 > > : [<ffffffff816226e9>] system_call_fastpath+0x16/0x1b
 > 
 > Equally explicable by Hannes's hypothesis;
 > but useful supporting evidence, thank you.
 > 
 > Except... earlier in the thread you explained how you hacked
 > #define VM_BUG_ON(cond) WARN_ON(cond)
 > to get this to come out as a warning instead of a bug,
 > and now it looks as if "a user" has here done the same.
 > 
 > Which is very much a user's right, of course; but does
 > make me wonder whether that user might actually be davej ;)

indirectly. I made the same change in the Fedora kernel a while ago
to test a hypothesis that we weren't getting any VM_BUG_ON reports.

	Dave

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
  2012-11-01 23:20         ` Dave Jones
@ 2012-11-01 23:48           ` Hugh Dickins
  -1 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-11-01 23:48 UTC (permalink / raw)
  To: Dave Jones, linux-mm, linux-kernel

On Thu, 1 Nov 2012, Dave Jones wrote:
> On Thu, Nov 01, 2012 at 04:03:40PM -0700, Hugh Dickins wrote:
>  > 
>  > Except... earlier in the thread you explained how you hacked
>  > #define VM_BUG_ON(cond) WARN_ON(cond)
>  > to get this to come out as a warning instead of a bug,
>  > and now it looks as if "a user" has here done the same.
>  > 
>  > Which is very much a user's right, of course; but does
>  > make me wonder whether that user might actually be davej ;)
> 
> indirectly. I made the same change in the Fedora kernel a while ago
> to test a hypothesis that we weren't getting any VM_BUG_ON reports.

Fedora turns on CONFIG_DEBUG_VM?

All mm developers should thank you for the wider testing exposure;
but I'm not so sure that Fedora users should thank you for turning
it on - really it's for mm developers to wrap around !assertions or
more expensive checks (e.g. checking calls) in their development.

Or did I read a few months ago that some change had been made to
such definitions, and VM_BUG_ON(contents) are evaluated even when
the config option is off?  I do hope I'm mistaken on that.

Hugh

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
@ 2012-11-01 23:48           ` Hugh Dickins
  0 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-11-01 23:48 UTC (permalink / raw)
  To: Dave Jones, linux-mm, linux-kernel

On Thu, 1 Nov 2012, Dave Jones wrote:
> On Thu, Nov 01, 2012 at 04:03:40PM -0700, Hugh Dickins wrote:
>  > 
>  > Except... earlier in the thread you explained how you hacked
>  > #define VM_BUG_ON(cond) WARN_ON(cond)
>  > to get this to come out as a warning instead of a bug,
>  > and now it looks as if "a user" has here done the same.
>  > 
>  > Which is very much a user's right, of course; but does
>  > make me wonder whether that user might actually be davej ;)
> 
> indirectly. I made the same change in the Fedora kernel a while ago
> to test a hypothesis that we weren't getting any VM_BUG_ON reports.

Fedora turns on CONFIG_DEBUG_VM?

All mm developers should thank you for the wider testing exposure;
but I'm not so sure that Fedora users should thank you for turning
it on - really it's for mm developers to wrap around !assertions or
more expensive checks (e.g. checking calls) in their development.

Or did I read a few months ago that some change had been made to
such definitions, and VM_BUG_ON(contents) are evaluated even when
the config option is off?  I do hope I'm mistaken on that.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
  2012-11-01 23:48           ` Hugh Dickins
@ 2012-11-02  1:43             ` Dave Jones
  -1 siblings, 0 replies; 69+ messages in thread
From: Dave Jones @ 2012-11-02  1:43 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linux-mm, linux-kernel

On Thu, Nov 01, 2012 at 04:48:41PM -0700, Hugh Dickins wrote:
 > On Thu, 1 Nov 2012, Dave Jones wrote:
 > > On Thu, Nov 01, 2012 at 04:03:40PM -0700, Hugh Dickins wrote:
 > >  > 
 > >  > Except... earlier in the thread you explained how you hacked
 > >  > #define VM_BUG_ON(cond) WARN_ON(cond)
 > >  > to get this to come out as a warning instead of a bug,
 > >  > and now it looks as if "a user" has here done the same.
 > >  > 
 > >  > Which is very much a user's right, of course; but does
 > >  > make me wonder whether that user might actually be davej ;)
 > > 
 > > indirectly. I made the same change in the Fedora kernel a while ago
 > > to test a hypothesis that we weren't getting any VM_BUG_ON reports.
 > 
 > Fedora turns on CONFIG_DEBUG_VM?

Yes.
 
 > All mm developers should thank you for the wider testing exposure;
 > but I'm not so sure that Fedora users should thank you for turning
 > it on - really it's for mm developers to wrap around !assertions or
 > more expensive checks (e.g. checking calls) in their development.

The last time I did some benchmarking the impact wasn't as ridiculous
as say lockdep, or spinlock debug. Maybe the benchmarks I was using
weren't pushing the VM very hard, but it seemed to me that the value
in getting info in potential problems early was higher than a small
performance increase.

 > Or did I read a few months ago that some change had been made to
 > such definitions, and VM_BUG_ON(contents) are evaluated even when
 > the config option is off?  I do hope I'm mistaken on that.

Pretty sure that isn't the case. I remember Andrew chastising people
a few times for putting checks in VM_BUG_ON's that needed to stay around 
even when the config option was off. Perhaps you were thinking of one
of those incidents ?

	Dave

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
@ 2012-11-02  1:43             ` Dave Jones
  0 siblings, 0 replies; 69+ messages in thread
From: Dave Jones @ 2012-11-02  1:43 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: linux-mm, linux-kernel

On Thu, Nov 01, 2012 at 04:48:41PM -0700, Hugh Dickins wrote:
 > On Thu, 1 Nov 2012, Dave Jones wrote:
 > > On Thu, Nov 01, 2012 at 04:03:40PM -0700, Hugh Dickins wrote:
 > >  > 
 > >  > Except... earlier in the thread you explained how you hacked
 > >  > #define VM_BUG_ON(cond) WARN_ON(cond)
 > >  > to get this to come out as a warning instead of a bug,
 > >  > and now it looks as if "a user" has here done the same.
 > >  > 
 > >  > Which is very much a user's right, of course; but does
 > >  > make me wonder whether that user might actually be davej ;)
 > > 
 > > indirectly. I made the same change in the Fedora kernel a while ago
 > > to test a hypothesis that we weren't getting any VM_BUG_ON reports.
 > 
 > Fedora turns on CONFIG_DEBUG_VM?

Yes.
 
 > All mm developers should thank you for the wider testing exposure;
 > but I'm not so sure that Fedora users should thank you for turning
 > it on - really it's for mm developers to wrap around !assertions or
 > more expensive checks (e.g. checking calls) in their development.

The last time I did some benchmarking the impact wasn't as ridiculous
as say lockdep, or spinlock debug. Maybe the benchmarks I was using
weren't pushing the VM very hard, but it seemed to me that the value
in getting info in potential problems early was higher than a small
performance increase.

 > Or did I read a few months ago that some change had been made to
 > such definitions, and VM_BUG_ON(contents) are evaluated even when
 > the config option is off?  I do hope I'm mistaken on that.

Pretty sure that isn't the case. I remember Andrew chastising people
a few times for putting checks in VM_BUG_ON's that needed to stay around 
even when the config option was off. Perhaps you were thinking of one
of those incidents ?

	Dave

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
  2012-11-02  1:43             ` Dave Jones
@ 2012-11-02 23:26               ` Hugh Dickins
  -1 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-11-02 23:26 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-mm, linux-kernel

On Thu, 1 Nov 2012, Dave Jones wrote:
> On Thu, Nov 01, 2012 at 04:48:41PM -0700, Hugh Dickins wrote:
>  > 
>  > Fedora turns on CONFIG_DEBUG_VM?
> 
> Yes.
>  
>  > All mm developers should thank you for the wider testing exposure;
>  > but I'm not so sure that Fedora users should thank you for turning
>  > it on - really it's for mm developers to wrap around !assertions or
>  > more expensive checks (e.g. checking calls) in their development.
> 
> The last time I did some benchmarking the impact wasn't as ridiculous
> as say lockdep, or spinlock debug.

I think you're safe to assume that (outside of an individual developer's
private tree) it will never be nearly as heavy as lockdep or debug
pagealloc.  I hadn't thought of spinlock debug as a heavy one, but
yes, I guess it would be heavier than almost all VM_BUG_ON()s.

> Maybe the benchmarks I was using
> weren't pushing the VM very hard, but it seemed to me that the value
> in getting info in potential problems early was higher than a small
> performance increase.

We thank you.  I may have been over-estimating how much we put inside
those VM_BUG_ON()s, sorry.  Just so long as you're aware that there's
a danger that one day we might slip something heavier in there.

Those few explicit #ifdef CONFIG_DEBUG_VMs sometimes found in mm/
are probably the worst: you might want to check on the current crop.

> 
>  > Or did I read a few months ago that some change had been made to
>  > such definitions, and VM_BUG_ON(contents) are evaluated even when
>  > the config option is off?  I do hope I'm mistaken on that.
> 
> Pretty sure that isn't the case. I remember Andrew chastising people
> a few times for putting checks in VM_BUG_ON's that needed to stay around 
> even when the config option was off. Perhaps you were thinking of one
> of those incidents ?

Avoiding side-effects in BUG_ON and VM_BUG_ON.  Yes, that comes up
from time to time, and I'm a believer on that.  I think the discussion
I'm mis/remembering sprung out of one of those: someone was surprised
by the disassembly they found when it was configured off.

The correct answer is to try it for myself and see.  Not today.

Hugh

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2]
@ 2012-11-02 23:26               ` Hugh Dickins
  0 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-11-02 23:26 UTC (permalink / raw)
  To: Dave Jones; +Cc: linux-mm, linux-kernel

On Thu, 1 Nov 2012, Dave Jones wrote:
> On Thu, Nov 01, 2012 at 04:48:41PM -0700, Hugh Dickins wrote:
>  > 
>  > Fedora turns on CONFIG_DEBUG_VM?
> 
> Yes.
>  
>  > All mm developers should thank you for the wider testing exposure;
>  > but I'm not so sure that Fedora users should thank you for turning
>  > it on - really it's for mm developers to wrap around !assertions or
>  > more expensive checks (e.g. checking calls) in their development.
> 
> The last time I did some benchmarking the impact wasn't as ridiculous
> as say lockdep, or spinlock debug.

I think you're safe to assume that (outside of an individual developer's
private tree) it will never be nearly as heavy as lockdep or debug
pagealloc.  I hadn't thought of spinlock debug as a heavy one, but
yes, I guess it would be heavier than almost all VM_BUG_ON()s.

> Maybe the benchmarks I was using
> weren't pushing the VM very hard, but it seemed to me that the value
> in getting info in potential problems early was higher than a small
> performance increase.

We thank you.  I may have been over-estimating how much we put inside
those VM_BUG_ON()s, sorry.  Just so long as you're aware that there's
a danger that one day we might slip something heavier in there.

Those few explicit #ifdef CONFIG_DEBUG_VMs sometimes found in mm/
are probably the worst: you might want to check on the current crop.

> 
>  > Or did I read a few months ago that some change had been made to
>  > such definitions, and VM_BUG_ON(contents) are evaluated even when
>  > the config option is off?  I do hope I'm mistaken on that.
> 
> Pretty sure that isn't the case. I remember Andrew chastising people
> a few times for putting checks in VM_BUG_ON's that needed to stay around 
> even when the config option was off. Perhaps you were thinking of one
> of those incidents ?

Avoiding side-effects in BUG_ON and VM_BUG_ON.  Yes, that comes up
from time to time, and I'm a believer on that.  I think the discussion
I'm mis/remembering sprung out of one of those: someone was surprised
by the disassembly they found when it was configured off.

The correct answer is to try it for myself and see.  Not today.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
  2012-11-02 23:26               ` Hugh Dickins
@ 2012-11-06  1:32                 ` Hugh Dickins
  -1 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-11-06  1:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Dave Jones, Johannes Weiner, linux-mm, linux-kernel

Fuzzing with trinity hit the "impossible" VM_BUG_ON(error)
(which Fedora has converted to WARNING) in shmem_getpage_gfp():

WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
Call Trace:
 [<ffffffff8107100f>] warn_slowpath_common+0x7f/0xc0
 [<ffffffff8107106a>] warn_slowpath_null+0x1a/0x20
 [<ffffffff811903fc>] shmem_getpage_gfp+0xa5c/0xa70
 [<ffffffff81190e4f>] shmem_fault+0x4f/0xa0
 [<ffffffff8119f391>] __do_fault+0x71/0x5c0
 [<ffffffff811a2767>] handle_pte_fault+0x97/0xae0
 [<ffffffff811a4a39>] handle_mm_fault+0x289/0x350
 [<ffffffff816d091e>] __do_page_fault+0x18e/0x530
 [<ffffffff816d0ceb>] do_page_fault+0x2b/0x50
 [<ffffffff816cd3b8>] page_fault+0x28/0x30
 [<ffffffff816d5688>] tracesys+0xe1/0xe6

Thanks to Johannes for pointing to truncation: free_swap_and_cache()
only does a trylock on the page, so the page lock we've held since
before confirming swap is not enough to protect against truncation.

What cleanup is needed in this case?  Just delete_from_swap_cache(),
which takes care of the memcg uncharge.

Reported-by: Dave Jones <davej@redhat.com>
Hypothesis-by: Johannes Weiner <hannes@cmpxchg.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org
---

 mm/shmem.c |   18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

--- 3.7-rc4/mm/shmem.c	2012-10-14 16:16:58.361309122 -0700
+++ linux/mm/shmem.c	2012-11-01 14:31:04.288185742 -0700
@@ -1145,8 +1145,22 @@ repeat:
 		if (!error) {
 			error = shmem_add_to_page_cache(page, mapping, index,
 						gfp, swp_to_radix_entry(swap));
-			/* We already confirmed swap, and make no allocation */
-			VM_BUG_ON(error);
+			/*
+			 * We already confirmed swap under page lock, and make
+			 * no memory allocation here, so usually no possibility
+			 * of error; but free_swap_and_cache() only trylocks a
+			 * page, so it is just possible that the entry has been
+			 * truncated or holepunched since swap was confirmed.
+			 * shmem_undo_range() will have done some of the
+			 * unaccounting, now delete_from_swap_cache() will do
+			 * the rest (including mem_cgroup_uncharge_swapcache).
+			 * Reset swap.val? No, leave it so "failed" goes back to
+			 * "repeat": reading a hole and writing should succeed.
+			 */
+			if (error) {
+				VM_BUG_ON(error != -ENOENT);
+				delete_from_swap_cache(page);
+			}
 		}
 		if (error)
 			goto failed;

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
@ 2012-11-06  1:32                 ` Hugh Dickins
  0 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-11-06  1:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Dave Jones, Johannes Weiner, linux-mm, linux-kernel

Fuzzing with trinity hit the "impossible" VM_BUG_ON(error)
(which Fedora has converted to WARNING) in shmem_getpage_gfp():

WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
Pid: 29795, comm: trinity-child4 Not tainted 3.7.0-rc2+ #49
Call Trace:
 [<ffffffff8107100f>] warn_slowpath_common+0x7f/0xc0
 [<ffffffff8107106a>] warn_slowpath_null+0x1a/0x20
 [<ffffffff811903fc>] shmem_getpage_gfp+0xa5c/0xa70
 [<ffffffff81190e4f>] shmem_fault+0x4f/0xa0
 [<ffffffff8119f391>] __do_fault+0x71/0x5c0
 [<ffffffff811a2767>] handle_pte_fault+0x97/0xae0
 [<ffffffff811a4a39>] handle_mm_fault+0x289/0x350
 [<ffffffff816d091e>] __do_page_fault+0x18e/0x530
 [<ffffffff816d0ceb>] do_page_fault+0x2b/0x50
 [<ffffffff816cd3b8>] page_fault+0x28/0x30
 [<ffffffff816d5688>] tracesys+0xe1/0xe6

Thanks to Johannes for pointing to truncation: free_swap_and_cache()
only does a trylock on the page, so the page lock we've held since
before confirming swap is not enough to protect against truncation.

What cleanup is needed in this case?  Just delete_from_swap_cache(),
which takes care of the memcg uncharge.

Reported-by: Dave Jones <davej@redhat.com>
Hypothesis-by: Johannes Weiner <hannes@cmpxchg.com>
Signed-off-by: Hugh Dickins <hughd@google.com>
Cc: stable@vger.kernel.org
---

 mm/shmem.c |   18 ++++++++++++++++--
 1 file changed, 16 insertions(+), 2 deletions(-)

--- 3.7-rc4/mm/shmem.c	2012-10-14 16:16:58.361309122 -0700
+++ linux/mm/shmem.c	2012-11-01 14:31:04.288185742 -0700
@@ -1145,8 +1145,22 @@ repeat:
 		if (!error) {
 			error = shmem_add_to_page_cache(page, mapping, index,
 						gfp, swp_to_radix_entry(swap));
-			/* We already confirmed swap, and make no allocation */
-			VM_BUG_ON(error);
+			/*
+			 * We already confirmed swap under page lock, and make
+			 * no memory allocation here, so usually no possibility
+			 * of error; but free_swap_and_cache() only trylocks a
+			 * page, so it is just possible that the entry has been
+			 * truncated or holepunched since swap was confirmed.
+			 * shmem_undo_range() will have done some of the
+			 * unaccounting, now delete_from_swap_cache() will do
+			 * the rest (including mem_cgroup_uncharge_swapcache).
+			 * Reset swap.val? No, leave it so "failed" goes back to
+			 * "repeat": reading a hole and writing should succeed.
+			 */
+			if (error) {
+				VM_BUG_ON(error != -ENOENT);
+				delete_from_swap_cache(page);
+			}
 		}
 		if (error)
 			goto failed;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
  2012-11-06  1:32                 ` Hugh Dickins
@ 2012-11-06 13:54                   ` Dave Jones
  -1 siblings, 0 replies; 69+ messages in thread
From: Dave Jones @ 2012-11-06 13:54 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

On Mon, Nov 05, 2012 at 05:32:41PM -0800, Hugh Dickins wrote:

 > -			/* We already confirmed swap, and make no allocation */
 > -			VM_BUG_ON(error);
 > +			/*
 > +			 * We already confirmed swap under page lock, and make
 > +			 * no memory allocation here, so usually no possibility
 > +			 * of error; but free_swap_and_cache() only trylocks a
 > +			 * page, so it is just possible that the entry has been
 > +			 * truncated or holepunched since swap was confirmed.
 > +			 * shmem_undo_range() will have done some of the
 > +			 * unaccounting, now delete_from_swap_cache() will do
 > +			 * the rest (including mem_cgroup_uncharge_swapcache).
 > +			 * Reset swap.val? No, leave it so "failed" goes back to
 > +			 * "repeat": reading a hole and writing should succeed.
 > +			 */
 > +			if (error) {
 > +				VM_BUG_ON(error != -ENOENT);
 > +				delete_from_swap_cache(page);
 > +			}
 >  		}

I ran with this overnight, and still hit the (new!) VM_BUG_ON

Perhaps we should print out what 'error' was too ?  I'll rebuild with that..

------------[ cut here ]------------
WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
Hardware name: 2012 Client Platform
Modules linked in: fuse ipt_ULOG scsi_transport_iscsi binfmt_misc dn_rtmsg nfnetlink nfc caif_socket caif af_802154 phonet af_rxrpc can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key decnet rose x25 atm netrom appletalk ipx p8023 p8022 psnap llc ax25 lockd sunrpc bluetooth rfkill ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables gspca_ov519 gspca_main videodev kvm_intel usb_debug kvm crc32c_intel ghash_clmulni_intel microcode pcspkr i2c_i801 e1000e uinput i915 video i2c_algo_bit drm_kms_helper drm i2c_core
Pid: 21798, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54
Call Trace:
 [<ffffffff8107105f>] warn_slowpath_common+0x7f/0xc0
 [<ffffffff810710ba>] warn_slowpath_null+0x1a/0x20
 [<ffffffff8119050c>] shmem_getpage_gfp+0xa5c/0xa70
 [<ffffffff8118fd4e>] ? shmem_getpage_gfp+0x29e/0xa70
 [<ffffffff81190f5f>] shmem_fault+0x4f/0xa0
 [<ffffffff8119f4a1>] __do_fault+0x71/0x5c0
 [<ffffffff810e1b26>] ? __lock_acquire+0x306/0x1ba0
 [<ffffffff810b7059>] ? local_clock+0x89/0xa0
 [<ffffffff811a2877>] handle_pte_fault+0x97/0xae0
 [<ffffffff816d1969>] ? sub_preempt_count+0x79/0xd0
 [<ffffffff8136dbfe>] ? delay_tsc+0xae/0x120
 [<ffffffff8136dae8>] ? __const_udelay+0x28/0x30
 [<ffffffff811a4b49>] handle_mm_fault+0x289/0x350
 [<ffffffff816d121e>] __do_page_fault+0x18e/0x530
 [<ffffffff811f2e70>] ? getname_flags.part.32+0x30/0x150
 [<ffffffff811f2e70>] ? getname_flags.part.32+0x30/0x150
 [<ffffffff811c9c5c>] ? set_track+0x8c/0x1a0
 [<ffffffff816c3dd8>] ? __slab_alloc+0x531/0x59e
 [<ffffffff810e471d>] ? trace_hardirqs_on_caller+0x15d/0x1e0
 [<ffffffff8112d469>] ? rcu_user_exit+0xc9/0xf0
 [<ffffffff816d15eb>] do_page_fault+0x2b/0x50
 [<ffffffff816cdcb8>] page_fault+0x28/0x30
 [<ffffffff81388e2c>] ? strncpy_from_user+0x6c/0x120
 [<ffffffff811f2ec6>] getname_flags.part.32+0x86/0x150
 [<ffffffff811f2fca>] getname+0x3a/0x60
 [<ffffffff811f7aa4>] sys_symlinkat+0x24/0x90
 [<ffffffff816d5f25>] ? tracesys+0x7e/0xe6
 [<ffffffff816d5f88>] tracesys+0xe1/0xe6
---[ end trace 4ba438264ea16e97 ]---



	Dave

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
@ 2012-11-06 13:54                   ` Dave Jones
  0 siblings, 0 replies; 69+ messages in thread
From: Dave Jones @ 2012-11-06 13:54 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

On Mon, Nov 05, 2012 at 05:32:41PM -0800, Hugh Dickins wrote:

 > -			/* We already confirmed swap, and make no allocation */
 > -			VM_BUG_ON(error);
 > +			/*
 > +			 * We already confirmed swap under page lock, and make
 > +			 * no memory allocation here, so usually no possibility
 > +			 * of error; but free_swap_and_cache() only trylocks a
 > +			 * page, so it is just possible that the entry has been
 > +			 * truncated or holepunched since swap was confirmed.
 > +			 * shmem_undo_range() will have done some of the
 > +			 * unaccounting, now delete_from_swap_cache() will do
 > +			 * the rest (including mem_cgroup_uncharge_swapcache).
 > +			 * Reset swap.val? No, leave it so "failed" goes back to
 > +			 * "repeat": reading a hole and writing should succeed.
 > +			 */
 > +			if (error) {
 > +				VM_BUG_ON(error != -ENOENT);
 > +				delete_from_swap_cache(page);
 > +			}
 >  		}

I ran with this overnight, and still hit the (new!) VM_BUG_ON

Perhaps we should print out what 'error' was too ?  I'll rebuild with that..

------------[ cut here ]------------
WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
Hardware name: 2012 Client Platform
Modules linked in: fuse ipt_ULOG scsi_transport_iscsi binfmt_misc dn_rtmsg nfnetlink nfc caif_socket caif af_802154 phonet af_rxrpc can llc2 pppoe pppox ppp_generic slhc irda crc_ccitt rds af_key decnet rose x25 atm netrom appletalk ipx p8023 p8022 psnap llc ax25 lockd sunrpc bluetooth rfkill ip6t_REJECT nf_conntrack_ipv6 nf_defrag_ipv6 xt_state nf_conntrack ip6table_filter ip6_tables gspca_ov519 gspca_main videodev kvm_intel usb_debug kvm crc32c_intel ghash_clmulni_intel microcode pcspkr i2c_i801 e1000e uinput i915 video i2c_algo_bit drm_kms_helper drm i2c_core
Pid: 21798, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54
Call Trace:
 [<ffffffff8107105f>] warn_slowpath_common+0x7f/0xc0
 [<ffffffff810710ba>] warn_slowpath_null+0x1a/0x20
 [<ffffffff8119050c>] shmem_getpage_gfp+0xa5c/0xa70
 [<ffffffff8118fd4e>] ? shmem_getpage_gfp+0x29e/0xa70
 [<ffffffff81190f5f>] shmem_fault+0x4f/0xa0
 [<ffffffff8119f4a1>] __do_fault+0x71/0x5c0
 [<ffffffff810e1b26>] ? __lock_acquire+0x306/0x1ba0
 [<ffffffff810b7059>] ? local_clock+0x89/0xa0
 [<ffffffff811a2877>] handle_pte_fault+0x97/0xae0
 [<ffffffff816d1969>] ? sub_preempt_count+0x79/0xd0
 [<ffffffff8136dbfe>] ? delay_tsc+0xae/0x120
 [<ffffffff8136dae8>] ? __const_udelay+0x28/0x30
 [<ffffffff811a4b49>] handle_mm_fault+0x289/0x350
 [<ffffffff816d121e>] __do_page_fault+0x18e/0x530
 [<ffffffff811f2e70>] ? getname_flags.part.32+0x30/0x150
 [<ffffffff811f2e70>] ? getname_flags.part.32+0x30/0x150
 [<ffffffff811c9c5c>] ? set_track+0x8c/0x1a0
 [<ffffffff816c3dd8>] ? __slab_alloc+0x531/0x59e
 [<ffffffff810e471d>] ? trace_hardirqs_on_caller+0x15d/0x1e0
 [<ffffffff8112d469>] ? rcu_user_exit+0xc9/0xf0
 [<ffffffff816d15eb>] do_page_fault+0x2b/0x50
 [<ffffffff816cdcb8>] page_fault+0x28/0x30
 [<ffffffff81388e2c>] ? strncpy_from_user+0x6c/0x120
 [<ffffffff811f2ec6>] getname_flags.part.32+0x86/0x150
 [<ffffffff811f2fca>] getname+0x3a/0x60
 [<ffffffff811f7aa4>] sys_symlinkat+0x24/0x90
 [<ffffffff816d5f25>] ? tracesys+0x7e/0xe6
 [<ffffffff816d5f88>] tracesys+0xe1/0xe6
---[ end trace 4ba438264ea16e97 ]---



	Dave

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
  2012-11-06 13:54                   ` Dave Jones
@ 2012-11-06 23:48                     ` Hugh Dickins
  -1 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-11-06 23:48 UTC (permalink / raw)
  To: Dave Jones; +Cc: Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

On Tue, 6 Nov 2012, Dave Jones wrote:
> On Mon, Nov 05, 2012 at 05:32:41PM -0800, Hugh Dickins wrote:
> 
>  > -			/* We already confirmed swap, and make no allocation */
>  > -			VM_BUG_ON(error);
>  > +			/*
>  > +			 * We already confirmed swap under page lock, and make
>  > +			 * no memory allocation here, so usually no possibility
>  > +			 * of error; but free_swap_and_cache() only trylocks a
>  > +			 * page, so it is just possible that the entry has been
>  > +			 * truncated or holepunched since swap was confirmed.
>  > +			 * shmem_undo_range() will have done some of the
>  > +			 * unaccounting, now delete_from_swap_cache() will do
>  > +			 * the rest (including mem_cgroup_uncharge_swapcache).
>  > +			 * Reset swap.val? No, leave it so "failed" goes back to
>  > +			 * "repeat": reading a hole and writing should succeed.
>  > +			 */
>  > +			if (error) {
>  > +				VM_BUG_ON(error != -ENOENT);
>  > +				delete_from_swap_cache(page);
>  > +			}
>  >  		}
> 
> I ran with this overnight,

Thanks a lot...

> and still hit the (new!) VM_BUG_ON

... but that's even more surprising than your original report.

> 
> Perhaps we should print out what 'error' was too ?  I'll rebuild with that..

Thanks; though I thought the error was going to turn out too boring,
and was preparing a debug patch for you to show the expected and found
values too.  But then got very puzzled...
 
> 
> ------------[ cut here ]------------
> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
> Hardware name: 2012 Client Platform
> Pid: 21798, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54

That's the very same line number as in your original report, despite
the long comment which the patch adds.  Are you sure that kernel was
built with the patch in?

I wouldn't usually question you, but I'm going mad trying to understand
how the VM_BUG_ON(error != -ENOENT) fires.  At the time I wrote that
line, and when I was preparing the debug patch, I was thinking that an
error from shmem_radix_tree_replace could also be -EEXIST, for when a
different something rather than nothing is found [*].  But that's not
the case, shmem_radix_tree_replace returns either 0 or -ENOENT.

So if error != -ENOENT, that means shmem_add_to_page_cache went the
radix_tree_insert route instead of the shmem_radix_tree_replace route;
which means that its 'expected' is NULL, so swp_to_radix_entry(swap)
is NULL; but swp_to_radix_entry() does an "| 2", so however corrupt
the radix_tree might be, I do not understand the new VM_BUG_ON firing.

Please tell me it was the wrong kernel!
Hugh

[*] But in thinking it over, I realize that if shmem_radix_tree_replace
had returned -EEXIST for the "wrong something" case, I would have been
wrong to BUG on that; because just as truncation could remove an entry,
something else could immediately after instantiate a new page there.

So although I believe my VM_BUG_ON(error != -ENOENT) is safe, it's
not saying what I had intended to say with it, and would have been
wrong to say that anyway.  It just looks stupid to me now, rather
like inserting a VM_BUG_ON(false) - but that does become interesting
when you report that you've hit it.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
@ 2012-11-06 23:48                     ` Hugh Dickins
  0 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-11-06 23:48 UTC (permalink / raw)
  To: Dave Jones; +Cc: Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

On Tue, 6 Nov 2012, Dave Jones wrote:
> On Mon, Nov 05, 2012 at 05:32:41PM -0800, Hugh Dickins wrote:
> 
>  > -			/* We already confirmed swap, and make no allocation */
>  > -			VM_BUG_ON(error);
>  > +			/*
>  > +			 * We already confirmed swap under page lock, and make
>  > +			 * no memory allocation here, so usually no possibility
>  > +			 * of error; but free_swap_and_cache() only trylocks a
>  > +			 * page, so it is just possible that the entry has been
>  > +			 * truncated or holepunched since swap was confirmed.
>  > +			 * shmem_undo_range() will have done some of the
>  > +			 * unaccounting, now delete_from_swap_cache() will do
>  > +			 * the rest (including mem_cgroup_uncharge_swapcache).
>  > +			 * Reset swap.val? No, leave it so "failed" goes back to
>  > +			 * "repeat": reading a hole and writing should succeed.
>  > +			 */
>  > +			if (error) {
>  > +				VM_BUG_ON(error != -ENOENT);
>  > +				delete_from_swap_cache(page);
>  > +			}
>  >  		}
> 
> I ran with this overnight,

Thanks a lot...

> and still hit the (new!) VM_BUG_ON

... but that's even more surprising than your original report.

> 
> Perhaps we should print out what 'error' was too ?  I'll rebuild with that..

Thanks; though I thought the error was going to turn out too boring,
and was preparing a debug patch for you to show the expected and found
values too.  But then got very puzzled...
 
> 
> ------------[ cut here ]------------
> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
> Hardware name: 2012 Client Platform
> Pid: 21798, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54

That's the very same line number as in your original report, despite
the long comment which the patch adds.  Are you sure that kernel was
built with the patch in?

I wouldn't usually question you, but I'm going mad trying to understand
how the VM_BUG_ON(error != -ENOENT) fires.  At the time I wrote that
line, and when I was preparing the debug patch, I was thinking that an
error from shmem_radix_tree_replace could also be -EEXIST, for when a
different something rather than nothing is found [*].  But that's not
the case, shmem_radix_tree_replace returns either 0 or -ENOENT.

So if error != -ENOENT, that means shmem_add_to_page_cache went the
radix_tree_insert route instead of the shmem_radix_tree_replace route;
which means that its 'expected' is NULL, so swp_to_radix_entry(swap)
is NULL; but swp_to_radix_entry() does an "| 2", so however corrupt
the radix_tree might be, I do not understand the new VM_BUG_ON firing.

Please tell me it was the wrong kernel!
Hugh

[*] But in thinking it over, I realize that if shmem_radix_tree_replace
had returned -EEXIST for the "wrong something" case, I would have been
wrong to BUG on that; because just as truncation could remove an entry,
something else could immediately after instantiate a new page there.

So although I believe my VM_BUG_ON(error != -ENOENT) is safe, it's
not saying what I had intended to say with it, and would have been
wrong to say that anyway.  It just looks stupid to me now, rather
like inserting a VM_BUG_ON(false) - but that does become interesting
when you report that you've hit it.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
  2012-11-06 23:48                     ` Hugh Dickins
@ 2012-11-07 22:38                       ` Dave Jones
  -1 siblings, 0 replies; 69+ messages in thread
From: Dave Jones @ 2012-11-07 22:38 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

On Tue, Nov 06, 2012 at 03:48:20PM -0800, Hugh Dickins wrote:
 
 > > ------------[ cut here ]------------
 > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
 > > Hardware name: 2012 Client Platform
 > > Pid: 21798, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54
 > 
 > That's the very same line number as in your original report, despite
 > the long comment which the patch adds.  Are you sure that kernel was
 > built with the patch in?

I just changed the code by hand, and opted not to paste the comment in.

It is plausible that I built that kernel and forgot to reboot into it,
but I'm 99.9% sure that that wasn't the case.

Unfortunatly I can't check immediately, as that machine for reasons
unknown no longer wants to get past the BIOS POST check.

I'll see if I can reproduce it on a different test box until I get
that one back up.

	Dave


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
@ 2012-11-07 22:38                       ` Dave Jones
  0 siblings, 0 replies; 69+ messages in thread
From: Dave Jones @ 2012-11-07 22:38 UTC (permalink / raw)
  To: Hugh Dickins; +Cc: Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

On Tue, Nov 06, 2012 at 03:48:20PM -0800, Hugh Dickins wrote:
 
 > > ------------[ cut here ]------------
 > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
 > > Hardware name: 2012 Client Platform
 > > Pid: 21798, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54
 > 
 > That's the very same line number as in your original report, despite
 > the long comment which the patch adds.  Are you sure that kernel was
 > built with the patch in?

I just changed the code by hand, and opted not to paste the comment in.

It is plausible that I built that kernel and forgot to reboot into it,
but I'm 99.9% sure that that wasn't the case.

Unfortunatly I can't check immediately, as that machine for reasons
unknown no longer wants to get past the BIOS POST check.

I'll see if I can reproduce it on a different test box until I get
that one back up.

	Dave

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON fix
  2012-11-07 22:38                       ` Dave Jones
@ 2012-11-14  1:36                         ` Hugh Dickins
  -1 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-11-14  1:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Dave Jones, Johannes Weiner, linux-mm, linux-kernel

We're still hoping to hear back from Dave Jones: but either way,
please fold this patch into the earlier fix for 3.7 and -stable.

Remove its VM_BUG_ON: because either it's as I believe, a tautology
which cannot happen, and does not assert what I'd intended when I put
it in, and would even be wrong if it did (a non-NULL entry can validly
materialize there); or Dave actually hit it on his updated kernel,
in which case more research will be needed, but for upstream we
do not want a user to BUG there.

Signed-off-by: Hugh Dickins <hughd@google.com>
---

 mm/shmem.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

--- mmotm/mm/shmem.c	2012-11-09 09:43:46.908046342 -0800
+++ linux/mm/shmem.c	2012-11-13 17:16:38.532528959 -0800
@@ -1158,10 +1158,8 @@ repeat:
 			 * Reset swap.val? No, leave it so "failed" goes back to
 			 * "repeat": reading a hole and writing should succeed.
 			 */
-			if (error) {
-				VM_BUG_ON(error != -ENOENT);
+			if (error)
 				delete_from_swap_cache(page);
-			}
 		}
 		if (error)
 			goto failed;

^ permalink raw reply	[flat|nested] 69+ messages in thread

* [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON fix
@ 2012-11-14  1:36                         ` Hugh Dickins
  0 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-11-14  1:36 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Dave Jones, Johannes Weiner, linux-mm, linux-kernel

We're still hoping to hear back from Dave Jones: but either way,
please fold this patch into the earlier fix for 3.7 and -stable.

Remove its VM_BUG_ON: because either it's as I believe, a tautology
which cannot happen, and does not assert what I'd intended when I put
it in, and would even be wrong if it did (a non-NULL entry can validly
materialize there); or Dave actually hit it on his updated kernel,
in which case more research will be needed, but for upstream we
do not want a user to BUG there.

Signed-off-by: Hugh Dickins <hughd@google.com>
---

 mm/shmem.c |    4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

--- mmotm/mm/shmem.c	2012-11-09 09:43:46.908046342 -0800
+++ linux/mm/shmem.c	2012-11-13 17:16:38.532528959 -0800
@@ -1158,10 +1158,8 @@ repeat:
 			 * Reset swap.val? No, leave it so "failed" goes back to
 			 * "repeat": reading a hole and writing should succeed.
 			 */
-			if (error) {
-				VM_BUG_ON(error != -ENOENT);
+			if (error)
 				delete_from_swap_cache(page);
-			}
 		}
 		if (error)
 			goto failed;

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
  2012-11-06 23:48                     ` Hugh Dickins
@ 2012-11-14  3:07                       ` Jaegeuk Hanse
  -1 siblings, 0 replies; 69+ messages in thread
From: Jaegeuk Hanse @ 2012-11-14  3:07 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dave Jones, Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

On 11/07/2012 07:48 AM, Hugh Dickins wrote:
> On Tue, 6 Nov 2012, Dave Jones wrote:
>> On Mon, Nov 05, 2012 at 05:32:41PM -0800, Hugh Dickins wrote:
>>
>>   > -			/* We already confirmed swap, and make no allocation */
>>   > -			VM_BUG_ON(error);
>>   > +			/*
>>   > +			 * We already confirmed swap under page lock, and make
>>   > +			 * no memory allocation here, so usually no possibility
>>   > +			 * of error; but free_swap_and_cache() only trylocks a
>>   > +			 * page, so it is just possible that the entry has been
>>   > +			 * truncated or holepunched since swap was confirmed.
>>   > +			 * shmem_undo_range() will have done some of the
>>   > +			 * unaccounting, now delete_from_swap_cache() will do
>>   > +			 * the rest (including mem_cgroup_uncharge_swapcache).
>>   > +			 * Reset swap.val? No, leave it so "failed" goes back to
>>   > +			 * "repeat": reading a hole and writing should succeed.
>>   > +			 */
>>   > +			if (error) {
>>   > +				VM_BUG_ON(error != -ENOENT);
>>   > +				delete_from_swap_cache(page);
>>   > +			}
>>   >  		}
>>
>> I ran with this overnight,
> Thanks a lot...
>
>> and still hit the (new!) VM_BUG_ON
> ... but that's even more surprising than your original report.
>
>> Perhaps we should print out what 'error' was too ?  I'll rebuild with that..
> Thanks; though I thought the error was going to turn out too boring,
> and was preparing a debug patch for you to show the expected and found
> values too.  But then got very puzzled...
>   
>> ------------[ cut here ]------------
>> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
>> Hardware name: 2012 Client Platform
>> Pid: 21798, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54
> That's the very same line number as in your original report, despite
> the long comment which the patch adds.  Are you sure that kernel was
> built with the patch in?
>
> I wouldn't usually question you, but I'm going mad trying to understand
> how the VM_BUG_ON(error != -ENOENT) fires.  At the time I wrote that
> line, and when I was preparing the debug patch, I was thinking that an
> error from shmem_radix_tree_replace could also be -EEXIST, for when a
> different something rather than nothing is found [*].  But that's not
> the case, shmem_radix_tree_replace returns either 0 or -ENOENT.
>
> So if error != -ENOENT, that means shmem_add_to_page_cache went the
> radix_tree_insert route instead of the shmem_radix_tree_replace route;
> which means that its 'expected' is NULL, so swp_to_radix_entry(swap)
> is NULL; but swp_to_radix_entry() does an "| 2", so however corrupt
> the radix_tree might be, I do not understand the new VM_BUG_ON firing.
>
> Please tell me it was the wrong kernel!
> Hugh
>
> [*] But in thinking it over, I realize that if shmem_radix_tree_replace
> had returned -EEXIST for the "wrong something" case, I would have been
> wrong to BUG on that; because just as truncation could remove an entry,
> something else could immediately after instantiate a new page there.

Hi Hugh,

As you said, swp_to_radix_entry() does an "| 2", so even if truncation 
could remove an entry and something else could immediately after 
instantiate a new page there, but the expected parameter will not be 
NULL, the result is radix_tree_insert will not be called and 
shmem_add_to_page_cache will not return -EEXIST, then why trigger BUG_ON ?

Regards,
Jaegeuk

> So although I believe my VM_BUG_ON(error != -ENOENT) is safe, it's
> not saying what I had intended to say with it, and would have been
> wrong to say that anyway.  It just looks stupid to me now, rather
> like inserting a VM_BUG_ON(false) - but that does become interesting
> when you report that you've hit it.
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
@ 2012-11-14  3:07                       ` Jaegeuk Hanse
  0 siblings, 0 replies; 69+ messages in thread
From: Jaegeuk Hanse @ 2012-11-14  3:07 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dave Jones, Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

On 11/07/2012 07:48 AM, Hugh Dickins wrote:
> On Tue, 6 Nov 2012, Dave Jones wrote:
>> On Mon, Nov 05, 2012 at 05:32:41PM -0800, Hugh Dickins wrote:
>>
>>   > -			/* We already confirmed swap, and make no allocation */
>>   > -			VM_BUG_ON(error);
>>   > +			/*
>>   > +			 * We already confirmed swap under page lock, and make
>>   > +			 * no memory allocation here, so usually no possibility
>>   > +			 * of error; but free_swap_and_cache() only trylocks a
>>   > +			 * page, so it is just possible that the entry has been
>>   > +			 * truncated or holepunched since swap was confirmed.
>>   > +			 * shmem_undo_range() will have done some of the
>>   > +			 * unaccounting, now delete_from_swap_cache() will do
>>   > +			 * the rest (including mem_cgroup_uncharge_swapcache).
>>   > +			 * Reset swap.val? No, leave it so "failed" goes back to
>>   > +			 * "repeat": reading a hole and writing should succeed.
>>   > +			 */
>>   > +			if (error) {
>>   > +				VM_BUG_ON(error != -ENOENT);
>>   > +				delete_from_swap_cache(page);
>>   > +			}
>>   >  		}
>>
>> I ran with this overnight,
> Thanks a lot...
>
>> and still hit the (new!) VM_BUG_ON
> ... but that's even more surprising than your original report.
>
>> Perhaps we should print out what 'error' was too ?  I'll rebuild with that..
> Thanks; though I thought the error was going to turn out too boring,
> and was preparing a debug patch for you to show the expected and found
> values too.  But then got very puzzled...
>   
>> ------------[ cut here ]------------
>> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
>> Hardware name: 2012 Client Platform
>> Pid: 21798, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54
> That's the very same line number as in your original report, despite
> the long comment which the patch adds.  Are you sure that kernel was
> built with the patch in?
>
> I wouldn't usually question you, but I'm going mad trying to understand
> how the VM_BUG_ON(error != -ENOENT) fires.  At the time I wrote that
> line, and when I was preparing the debug patch, I was thinking that an
> error from shmem_radix_tree_replace could also be -EEXIST, for when a
> different something rather than nothing is found [*].  But that's not
> the case, shmem_radix_tree_replace returns either 0 or -ENOENT.
>
> So if error != -ENOENT, that means shmem_add_to_page_cache went the
> radix_tree_insert route instead of the shmem_radix_tree_replace route;
> which means that its 'expected' is NULL, so swp_to_radix_entry(swap)
> is NULL; but swp_to_radix_entry() does an "| 2", so however corrupt
> the radix_tree might be, I do not understand the new VM_BUG_ON firing.
>
> Please tell me it was the wrong kernel!
> Hugh
>
> [*] But in thinking it over, I realize that if shmem_radix_tree_replace
> had returned -EEXIST for the "wrong something" case, I would have been
> wrong to BUG on that; because just as truncation could remove an entry,
> something else could immediately after instantiate a new page there.

Hi Hugh,

As you said, swp_to_radix_entry() does an "| 2", so even if truncation 
could remove an entry and something else could immediately after 
instantiate a new page there, but the expected parameter will not be 
NULL, the result is radix_tree_insert will not be called and 
shmem_add_to_page_cache will not return -EEXIST, then why trigger BUG_ON ?

Regards,
Jaegeuk

> So although I believe my VM_BUG_ON(error != -ENOENT) is safe, it's
> not saying what I had intended to say with it, and would have been
> wrong to say that anyway.  It just looks stupid to me now, rather
> like inserting a VM_BUG_ON(false) - but that does become interesting
> when you report that you've hit it.
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
  2012-11-14  3:07                       ` Jaegeuk Hanse
@ 2012-11-14  3:50                         ` Hugh Dickins
  -1 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-11-14  3:50 UTC (permalink / raw)
  To: Jaegeuk Hanse
  Cc: Dave Jones, Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

On Wed, 14 Nov 2012, Jaegeuk Hanse wrote:
> On 11/07/2012 07:48 AM, Hugh Dickins wrote:
> > On Tue, 6 Nov 2012, Dave Jones wrote:
> > > On Mon, Nov 05, 2012 at 05:32:41PM -0800, Hugh Dickins wrote:
> > > 
> > >   > -			/* We already confirmed swap, and make no
> > > allocation */
> > >   > -			VM_BUG_ON(error);
> > >   > +			/*
> > >   > +			 * We already confirmed swap under page lock,
> > > and make
> > >   > +			 * no memory allocation here, so usually no
> > > possibility
> > >   > +			 * of error; but free_swap_and_cache() only
> > > trylocks a
> > >   > +			 * page, so it is just possible that the
> > > entry has been
> > >   > +			 * truncated or holepunched since swap was
> > > confirmed.
> > >   > +			 * shmem_undo_range() will have done some of
> > > the
> > >   > +			 * unaccounting, now delete_from_swap_cache()
> > > will do
> > >   > +			 * the rest (including
> > > mem_cgroup_uncharge_swapcache).
> > >   > +			 * Reset swap.val? No, leave it so "failed"
> > > goes back to
> > >   > +			 * "repeat": reading a hole and writing
> > > should succeed.
> > >   > +			 */
> > >   > +			if (error) {
> > >   > +				VM_BUG_ON(error != -ENOENT);
> > >   > +				delete_from_swap_cache(page);
> > >   > +			}
> > >   >  		}
> > > 
> > > I ran with this overnight,
> > Thanks a lot...
> > 
> > > and still hit the (new!) VM_BUG_ON
> > ... but that's even more surprising than your original report.
> > 
> > > Perhaps we should print out what 'error' was too ?  I'll rebuild with
> > > that..
> > Thanks; though I thought the error was going to turn out too boring,
> > and was preparing a debug patch for you to show the expected and found
> > values too.  But then got very puzzled...
> >   
> > > ------------[ cut here ]------------
> > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
> > > Hardware name: 2012 Client Platform
> > > Pid: 21798, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54
> > That's the very same line number as in your original report, despite
> > the long comment which the patch adds.  Are you sure that kernel was
> > built with the patch in?
> > 
> > I wouldn't usually question you, but I'm going mad trying to understand
> > how the VM_BUG_ON(error != -ENOENT) fires.  At the time I wrote that
> > line, and when I was preparing the debug patch, I was thinking that an
> > error from shmem_radix_tree_replace could also be -EEXIST, for when a
> > different something rather than nothing is found [*].  But that's not
> > the case, shmem_radix_tree_replace returns either 0 or -ENOENT.
> > 
> > So if error != -ENOENT, that means shmem_add_to_page_cache went the
> > radix_tree_insert route instead of the shmem_radix_tree_replace route;
> > which means that its 'expected' is NULL, so swp_to_radix_entry(swap)
> > is NULL; but swp_to_radix_entry() does an "| 2", so however corrupt
> > the radix_tree might be, I do not understand the new VM_BUG_ON firing.
> > 
> > Please tell me it was the wrong kernel!
> > Hugh
> > 
> > [*] But in thinking it over, I realize that if shmem_radix_tree_replace
> > had returned -EEXIST for the "wrong something" case, I would have been
> > wrong to BUG on that; because just as truncation could remove an entry,
> > something else could immediately after instantiate a new page there.
> 
> Hi Hugh,
> 
> As you said, swp_to_radix_entry() does an "| 2", so even if truncation could
> remove an entry and something else could immediately after instantiate a new
> page there, but the expected parameter will not be NULL, the result is
> radix_tree_insert will not be called and shmem_add_to_page_cache will not
> return -EEXIST, then why trigger BUG_ON ?

Why insert the VM_BUG_ON?  Because at the time I thought that it
asserted something useful; but I was mistaken, as explained above.

How can the VM_BUG_ON trigger (without stack corruption, or something
of that kind)?  I have no idea.

We are in agreement: I now think that VM_BUG_ON is misleading and silly,
and sent Andrew a further patch to remove it a just couple of hours ago.

Originally I was waiting to hear further from Dave; but his test
machine was giving trouble, and it occurred to me that, never mind
whether he says he has hit it again, or he has not hit it again,
the answer is the same: don't send that VM_BUG_ON upstream.

Hugh

> 
> Regards,
> Jaegeuk
> 
> > So although I believe my VM_BUG_ON(error != -ENOENT) is safe, it's
> > not saying what I had intended to say with it, and would have been
> > wrong to say that anyway.  It just looks stupid to me now, rather
> > like inserting a VM_BUG_ON(false) - but that does become interesting
> > when you report that you've hit it.

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
@ 2012-11-14  3:50                         ` Hugh Dickins
  0 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-11-14  3:50 UTC (permalink / raw)
  To: Jaegeuk Hanse
  Cc: Dave Jones, Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

On Wed, 14 Nov 2012, Jaegeuk Hanse wrote:
> On 11/07/2012 07:48 AM, Hugh Dickins wrote:
> > On Tue, 6 Nov 2012, Dave Jones wrote:
> > > On Mon, Nov 05, 2012 at 05:32:41PM -0800, Hugh Dickins wrote:
> > > 
> > >   > -			/* We already confirmed swap, and make no
> > > allocation */
> > >   > -			VM_BUG_ON(error);
> > >   > +			/*
> > >   > +			 * We already confirmed swap under page lock,
> > > and make
> > >   > +			 * no memory allocation here, so usually no
> > > possibility
> > >   > +			 * of error; but free_swap_and_cache() only
> > > trylocks a
> > >   > +			 * page, so it is just possible that the
> > > entry has been
> > >   > +			 * truncated or holepunched since swap was
> > > confirmed.
> > >   > +			 * shmem_undo_range() will have done some of
> > > the
> > >   > +			 * unaccounting, now delete_from_swap_cache()
> > > will do
> > >   > +			 * the rest (including
> > > mem_cgroup_uncharge_swapcache).
> > >   > +			 * Reset swap.val? No, leave it so "failed"
> > > goes back to
> > >   > +			 * "repeat": reading a hole and writing
> > > should succeed.
> > >   > +			 */
> > >   > +			if (error) {
> > >   > +				VM_BUG_ON(error != -ENOENT);
> > >   > +				delete_from_swap_cache(page);
> > >   > +			}
> > >   >  		}
> > > 
> > > I ran with this overnight,
> > Thanks a lot...
> > 
> > > and still hit the (new!) VM_BUG_ON
> > ... but that's even more surprising than your original report.
> > 
> > > Perhaps we should print out what 'error' was too ?  I'll rebuild with
> > > that..
> > Thanks; though I thought the error was going to turn out too boring,
> > and was preparing a debug patch for you to show the expected and found
> > values too.  But then got very puzzled...
> >   
> > > ------------[ cut here ]------------
> > > WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
> > > Hardware name: 2012 Client Platform
> > > Pid: 21798, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54
> > That's the very same line number as in your original report, despite
> > the long comment which the patch adds.  Are you sure that kernel was
> > built with the patch in?
> > 
> > I wouldn't usually question you, but I'm going mad trying to understand
> > how the VM_BUG_ON(error != -ENOENT) fires.  At the time I wrote that
> > line, and when I was preparing the debug patch, I was thinking that an
> > error from shmem_radix_tree_replace could also be -EEXIST, for when a
> > different something rather than nothing is found [*].  But that's not
> > the case, shmem_radix_tree_replace returns either 0 or -ENOENT.
> > 
> > So if error != -ENOENT, that means shmem_add_to_page_cache went the
> > radix_tree_insert route instead of the shmem_radix_tree_replace route;
> > which means that its 'expected' is NULL, so swp_to_radix_entry(swap)
> > is NULL; but swp_to_radix_entry() does an "| 2", so however corrupt
> > the radix_tree might be, I do not understand the new VM_BUG_ON firing.
> > 
> > Please tell me it was the wrong kernel!
> > Hugh
> > 
> > [*] But in thinking it over, I realize that if shmem_radix_tree_replace
> > had returned -EEXIST for the "wrong something" case, I would have been
> > wrong to BUG on that; because just as truncation could remove an entry,
> > something else could immediately after instantiate a new page there.
> 
> Hi Hugh,
> 
> As you said, swp_to_radix_entry() does an "| 2", so even if truncation could
> remove an entry and something else could immediately after instantiate a new
> page there, but the expected parameter will not be NULL, the result is
> radix_tree_insert will not be called and shmem_add_to_page_cache will not
> return -EEXIST, then why trigger BUG_ON ?

Why insert the VM_BUG_ON?  Because at the time I thought that it
asserted something useful; but I was mistaken, as explained above.

How can the VM_BUG_ON trigger (without stack corruption, or something
of that kind)?  I have no idea.

We are in agreement: I now think that VM_BUG_ON is misleading and silly,
and sent Andrew a further patch to remove it a just couple of hours ago.

Originally I was waiting to hear further from Dave; but his test
machine was giving trouble, and it occurred to me that, never mind
whether he says he has hit it again, or he has not hit it again,
the answer is the same: don't send that VM_BUG_ON upstream.

Hugh

> 
> Regards,
> Jaegeuk
> 
> > So although I believe my VM_BUG_ON(error != -ENOENT) is safe, it's
> > not saying what I had intended to say with it, and would have been
> > wrong to say that anyway.  It just looks stupid to me now, rather
> > like inserting a VM_BUG_ON(false) - but that does become interesting
> > when you report that you've hit it.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
  2012-11-14  3:50                         ` Hugh Dickins
@ 2012-11-14  6:14                           ` Dave Jones
  -1 siblings, 0 replies; 69+ messages in thread
From: Dave Jones @ 2012-11-14  6:14 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Jaegeuk Hanse, Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

On Tue, Nov 13, 2012 at 07:50:25PM -0800, Hugh Dickins wrote:
 
 > Originally I was waiting to hear further from Dave; but his test
 > machine was giving trouble, and it occurred to me that, never mind
 > whether he says he has hit it again, or he has not hit it again,
 > the answer is the same: don't send that VM_BUG_ON upstream.
 
Sorry, I'm supposedly on vacation. 
That said, a replacement test box has been running tests since last Friday
without hitting that case.  Maybe it was the last death throes of
that other machine before it gave up the ghost completely.

Does sound like an awful coincidence though.

	Dave


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
@ 2012-11-14  6:14                           ` Dave Jones
  0 siblings, 0 replies; 69+ messages in thread
From: Dave Jones @ 2012-11-14  6:14 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Jaegeuk Hanse, Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

On Tue, Nov 13, 2012 at 07:50:25PM -0800, Hugh Dickins wrote:
 
 > Originally I was waiting to hear further from Dave; but his test
 > machine was giving trouble, and it occurred to me that, never mind
 > whether he says he has hit it again, or he has not hit it again,
 > the answer is the same: don't send that VM_BUG_ON upstream.
 
Sorry, I'm supposedly on vacation. 
That said, a replacement test box has been running tests since last Friday
without hitting that case.  Maybe it was the last death throes of
that other machine before it gave up the ghost completely.

Does sound like an awful coincidence though.

	Dave

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
  2012-11-14  6:14                           ` Dave Jones
@ 2012-11-14 10:06                             ` Hugh Dickins
  -1 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-11-14 10:06 UTC (permalink / raw)
  To: Dave Jones
  Cc: Jaegeuk Hanse, Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

On Wed, 14 Nov 2012, Dave Jones wrote:
> On Tue, Nov 13, 2012 at 07:50:25PM -0800, Hugh Dickins wrote:
>  
>  > Originally I was waiting to hear further from Dave; but his test
>  > machine was giving trouble, and it occurred to me that, never mind
>  > whether he says he has hit it again, or he has not hit it again,
>  > the answer is the same: don't send that VM_BUG_ON upstream.
>  
> Sorry, I'm supposedly on vacation. 

Sorry for breaking in upon that, and thank you for responding even so.

> That said, a replacement test box has been running tests since last Friday
> without hitting that case.  Maybe it was the last death throes of
> that other machine before it gave up the ghost completely.
> 
> Does sound like an awful coincidence though.

I'm still clinging to your 0.1% possibility that it was not the
intended kernel running.

Anyway, I'm not going to worry about it further, until we see another
hit - please do keep the VM_BUG_ON in your test kernel (i.e. resist
that temptation to race in from your vacation to apply today's patch!),
even though the right thing for 3.7 was to remove it.

Thanks,
Hugh

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
@ 2012-11-14 10:06                             ` Hugh Dickins
  0 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-11-14 10:06 UTC (permalink / raw)
  To: Dave Jones
  Cc: Jaegeuk Hanse, Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

On Wed, 14 Nov 2012, Dave Jones wrote:
> On Tue, Nov 13, 2012 at 07:50:25PM -0800, Hugh Dickins wrote:
>  
>  > Originally I was waiting to hear further from Dave; but his test
>  > machine was giving trouble, and it occurred to me that, never mind
>  > whether he says he has hit it again, or he has not hit it again,
>  > the answer is the same: don't send that VM_BUG_ON upstream.
>  
> Sorry, I'm supposedly on vacation. 

Sorry for breaking in upon that, and thank you for responding even so.

> That said, a replacement test box has been running tests since last Friday
> without hitting that case.  Maybe it was the last death throes of
> that other machine before it gave up the ghost completely.
> 
> Does sound like an awful coincidence though.

I'm still clinging to your 0.1% possibility that it was not the
intended kernel running.

Anyway, I'm not going to worry about it further, until we see another
hit - please do keep the VM_BUG_ON in your test kernel (i.e. resist
that temptation to race in from your vacation to apply today's patch!),
even though the right thing for 3.7 was to remove it.

Thanks,
Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
  2012-11-14  3:50                         ` Hugh Dickins
@ 2012-11-15  7:39                           ` Jaegeuk Hanse
  -1 siblings, 0 replies; 69+ messages in thread
From: Jaegeuk Hanse @ 2012-11-15  7:39 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dave Jones, Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

On 11/14/2012 11:50 AM, Hugh Dickins wrote:
> On Wed, 14 Nov 2012, Jaegeuk Hanse wrote:
>> On 11/07/2012 07:48 AM, Hugh Dickins wrote:
>>> On Tue, 6 Nov 2012, Dave Jones wrote:
>>>> On Mon, Nov 05, 2012 at 05:32:41PM -0800, Hugh Dickins wrote:
>>>>
>>>>    > -			/* We already confirmed swap, and make no
>>>> allocation */
>>>>    > -			VM_BUG_ON(error);
>>>>    > +			/*
>>>>    > +			 * We already confirmed swap under page lock,
>>>> and make
>>>>    > +			 * no memory allocation here, so usually no
>>>> possibility
>>>>    > +			 * of error; but free_swap_and_cache() only
>>>> trylocks a
>>>>    > +			 * page, so it is just possible that the
>>>> entry has been
>>>>    > +			 * truncated or holepunched since swap was
>>>> confirmed.
>>>>    > +			 * shmem_undo_range() will have done some of
>>>> the
>>>>    > +			 * unaccounting, now delete_from_swap_cache()
>>>> will do
>>>>    > +			 * the rest (including
>>>> mem_cgroup_uncharge_swapcache).
>>>>    > +			 * Reset swap.val? No, leave it so "failed"
>>>> goes back to
>>>>    > +			 * "repeat": reading a hole and writing
>>>> should succeed.
>>>>    > +			 */
>>>>    > +			if (error) {
>>>>    > +				VM_BUG_ON(error != -ENOENT);
>>>>    > +				delete_from_swap_cache(page);
>>>>    > +			}
>>>>    >  		}
>>>>
>>>> I ran with this overnight,
>>> Thanks a lot...
>>>
>>>> and still hit the (new!) VM_BUG_ON
>>> ... but that's even more surprising than your original report.
>>>
>>>> Perhaps we should print out what 'error' was too ?  I'll rebuild with
>>>> that..
>>> Thanks; though I thought the error was going to turn out too boring,
>>> and was preparing a debug patch for you to show the expected and found
>>> values too.  But then got very puzzled...
>>>    
>>>> ------------[ cut here ]------------
>>>> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
>>>> Hardware name: 2012 Client Platform
>>>> Pid: 21798, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54
>>> That's the very same line number as in your original report, despite
>>> the long comment which the patch adds.  Are you sure that kernel was
>>> built with the patch in?
>>>
>>> I wouldn't usually question you, but I'm going mad trying to understand
>>> how the VM_BUG_ON(error != -ENOENT) fires.  At the time I wrote that
>>> line, and when I was preparing the debug patch, I was thinking that an
>>> error from shmem_radix_tree_replace could also be -EEXIST, for when a
>>> different something rather than nothing is found [*].  But that's not
>>> the case, shmem_radix_tree_replace returns either 0 or -ENOENT.
>>>
>>> So if error != -ENOENT, that means shmem_add_to_page_cache went the
>>> radix_tree_insert route instead of the shmem_radix_tree_replace route;
>>> which means that its 'expected' is NULL, so swp_to_radix_entry(swap)
>>> is NULL; but swp_to_radix_entry() does an "| 2", so however corrupt
>>> the radix_tree might be, I do not understand the new VM_BUG_ON firing.
>>>
>>> Please tell me it was the wrong kernel!
>>> Hugh
>>>
>>> [*] But in thinking it over, I realize that if shmem_radix_tree_replace
>>> had returned -EEXIST for the "wrong something" case, I would have been
>>> wrong to BUG on that; because just as truncation could remove an entry,
>>> something else could immediately after instantiate a new page there.
>> Hi Hugh,
>>
>> As you said, swp_to_radix_entry() does an "| 2", so even if truncation could
>> remove an entry and something else could immediately after instantiate a new
>> page there, but the expected parameter will not be NULL, the result is
>> radix_tree_insert will not be called and shmem_add_to_page_cache will not
>> return -EEXIST, then why trigger BUG_ON ?
> Why insert the VM_BUG_ON?  Because at the time I thought that it
> asserted something useful; but I was mistaken, as explained above.
>
> How can the VM_BUG_ON trigger (without stack corruption, or something
> of that kind)?  I have no idea.
>
> We are in agreement: I now think that VM_BUG_ON is misleading and silly,
> and sent Andrew a further patch to remove it a just couple of hours ago.
>
> Originally I was waiting to hear further from Dave; but his test
> machine was giving trouble, and it occurred to me that, never mind
> whether he says he has hit it again, or he has not hit it again,
> the answer is the same: don't send that VM_BUG_ON upstream.
>
> Hugh

Thanks Hugh.

Another question. Why the function shmem_fallocate which you add to 
kernel need call shmem_getpage?

Regards,
Jaegeuk

>
>> Regards,
>> Jaegeuk
>>
>>> So although I believe my VM_BUG_ON(error != -ENOENT) is safe, it's
>>> not saying what I had intended to say with it, and would have been
>>> wrong to say that anyway.  It just looks stupid to me now, rather
>>> like inserting a VM_BUG_ON(false) - but that does become interesting
>>> when you report that you've hit it.


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
@ 2012-11-15  7:39                           ` Jaegeuk Hanse
  0 siblings, 0 replies; 69+ messages in thread
From: Jaegeuk Hanse @ 2012-11-15  7:39 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dave Jones, Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

On 11/14/2012 11:50 AM, Hugh Dickins wrote:
> On Wed, 14 Nov 2012, Jaegeuk Hanse wrote:
>> On 11/07/2012 07:48 AM, Hugh Dickins wrote:
>>> On Tue, 6 Nov 2012, Dave Jones wrote:
>>>> On Mon, Nov 05, 2012 at 05:32:41PM -0800, Hugh Dickins wrote:
>>>>
>>>>    > -			/* We already confirmed swap, and make no
>>>> allocation */
>>>>    > -			VM_BUG_ON(error);
>>>>    > +			/*
>>>>    > +			 * We already confirmed swap under page lock,
>>>> and make
>>>>    > +			 * no memory allocation here, so usually no
>>>> possibility
>>>>    > +			 * of error; but free_swap_and_cache() only
>>>> trylocks a
>>>>    > +			 * page, so it is just possible that the
>>>> entry has been
>>>>    > +			 * truncated or holepunched since swap was
>>>> confirmed.
>>>>    > +			 * shmem_undo_range() will have done some of
>>>> the
>>>>    > +			 * unaccounting, now delete_from_swap_cache()
>>>> will do
>>>>    > +			 * the rest (including
>>>> mem_cgroup_uncharge_swapcache).
>>>>    > +			 * Reset swap.val? No, leave it so "failed"
>>>> goes back to
>>>>    > +			 * "repeat": reading a hole and writing
>>>> should succeed.
>>>>    > +			 */
>>>>    > +			if (error) {
>>>>    > +				VM_BUG_ON(error != -ENOENT);
>>>>    > +				delete_from_swap_cache(page);
>>>>    > +			}
>>>>    >  		}
>>>>
>>>> I ran with this overnight,
>>> Thanks a lot...
>>>
>>>> and still hit the (new!) VM_BUG_ON
>>> ... but that's even more surprising than your original report.
>>>
>>>> Perhaps we should print out what 'error' was too ?  I'll rebuild with
>>>> that..
>>> Thanks; though I thought the error was going to turn out too boring,
>>> and was preparing a debug patch for you to show the expected and found
>>> values too.  But then got very puzzled...
>>>    
>>>> ------------[ cut here ]------------
>>>> WARNING: at mm/shmem.c:1151 shmem_getpage_gfp+0xa5c/0xa70()
>>>> Hardware name: 2012 Client Platform
>>>> Pid: 21798, comm: trinity-child4 Not tainted 3.7.0-rc4+ #54
>>> That's the very same line number as in your original report, despite
>>> the long comment which the patch adds.  Are you sure that kernel was
>>> built with the patch in?
>>>
>>> I wouldn't usually question you, but I'm going mad trying to understand
>>> how the VM_BUG_ON(error != -ENOENT) fires.  At the time I wrote that
>>> line, and when I was preparing the debug patch, I was thinking that an
>>> error from shmem_radix_tree_replace could also be -EEXIST, for when a
>>> different something rather than nothing is found [*].  But that's not
>>> the case, shmem_radix_tree_replace returns either 0 or -ENOENT.
>>>
>>> So if error != -ENOENT, that means shmem_add_to_page_cache went the
>>> radix_tree_insert route instead of the shmem_radix_tree_replace route;
>>> which means that its 'expected' is NULL, so swp_to_radix_entry(swap)
>>> is NULL; but swp_to_radix_entry() does an "| 2", so however corrupt
>>> the radix_tree might be, I do not understand the new VM_BUG_ON firing.
>>>
>>> Please tell me it was the wrong kernel!
>>> Hugh
>>>
>>> [*] But in thinking it over, I realize that if shmem_radix_tree_replace
>>> had returned -EEXIST for the "wrong something" case, I would have been
>>> wrong to BUG on that; because just as truncation could remove an entry,
>>> something else could immediately after instantiate a new page there.
>> Hi Hugh,
>>
>> As you said, swp_to_radix_entry() does an "| 2", so even if truncation could
>> remove an entry and something else could immediately after instantiate a new
>> page there, but the expected parameter will not be NULL, the result is
>> radix_tree_insert will not be called and shmem_add_to_page_cache will not
>> return -EEXIST, then why trigger BUG_ON ?
> Why insert the VM_BUG_ON?  Because at the time I thought that it
> asserted something useful; but I was mistaken, as explained above.
>
> How can the VM_BUG_ON trigger (without stack corruption, or something
> of that kind)?  I have no idea.
>
> We are in agreement: I now think that VM_BUG_ON is misleading and silly,
> and sent Andrew a further patch to remove it a just couple of hours ago.
>
> Originally I was waiting to hear further from Dave; but his test
> machine was giving trouble, and it occurred to me that, never mind
> whether he says he has hit it again, or he has not hit it again,
> the answer is the same: don't send that VM_BUG_ON upstream.
>
> Hugh

Thanks Hugh.

Another question. Why the function shmem_fallocate which you add to 
kernel need call shmem_getpage?

Regards,
Jaegeuk

>
>> Regards,
>> Jaegeuk
>>
>>> So although I believe my VM_BUG_ON(error != -ENOENT) is safe, it's
>>> not saying what I had intended to say with it, and would have been
>>> wrong to say that anyway.  It just looks stupid to me now, rather
>>> like inserting a VM_BUG_ON(false) - but that does become interesting
>>> when you report that you've hit it.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
  2012-11-15  7:39                           ` Jaegeuk Hanse
@ 2012-11-15 19:56                             ` Hugh Dickins
  -1 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-11-15 19:56 UTC (permalink / raw)
  To: Jaegeuk Hanse
  Cc: Dave Jones, Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

Offtopic...

On Thu, 15 Nov 2012, Jaegeuk Hanse wrote:
> 
> Another question. Why the function shmem_fallocate which you add to kernel
> need call shmem_getpage?

Because shmem_getpage(_gfp) is where shmem's
page lookup and allocation complexities are handled.

I assume the question behind your question is: why does shmem actually
allocate pages for its fallocate, instead of just reserving the space?

I did play with just reserving the space, with more special entries in
the radix_tree to note the reservations made.  It should be doable for
the vm_enough_memory and sbinfo->used_blocks reservations.

What absolutely deterred me from taking that path was the mem_cgroup
case: shmem and swap and memcg are not easy to get working right together,
and nobody would thank me for complicating memcg just for shmem_fallocate.

By allocating pages, the pre-existing memcg code just works; if we used
reservations instead, we would have to track their memcg charges in some
additional new way.  I see no justification for that complication.

Hugh

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
@ 2012-11-15 19:56                             ` Hugh Dickins
  0 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-11-15 19:56 UTC (permalink / raw)
  To: Jaegeuk Hanse
  Cc: Dave Jones, Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

Offtopic...

On Thu, 15 Nov 2012, Jaegeuk Hanse wrote:
> 
> Another question. Why the function shmem_fallocate which you add to kernel
> need call shmem_getpage?

Because shmem_getpage(_gfp) is where shmem's
page lookup and allocation complexities are handled.

I assume the question behind your question is: why does shmem actually
allocate pages for its fallocate, instead of just reserving the space?

I did play with just reserving the space, with more special entries in
the radix_tree to note the reservations made.  It should be doable for
the vm_enough_memory and sbinfo->used_blocks reservations.

What absolutely deterred me from taking that path was the mem_cgroup
case: shmem and swap and memcg are not easy to get working right together,
and nobody would thank me for complicating memcg just for shmem_fallocate.

By allocating pages, the pre-existing memcg code just works; if we used
reservations instead, we would have to track their memcg charges in some
additional new way.  I see no justification for that complication.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
  2012-11-15 19:56                             ` Hugh Dickins
@ 2012-11-16  0:40                               ` Jaegeuk Hanse
  -1 siblings, 0 replies; 69+ messages in thread
From: Jaegeuk Hanse @ 2012-11-16  0:40 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dave Jones, Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

On 11/16/2012 03:56 AM, Hugh Dickins wrote:
> Offtopic...
>
> On Thu, 15 Nov 2012, Jaegeuk Hanse wrote:
>> Another question. Why the function shmem_fallocate which you add to kernel
>> need call shmem_getpage?
> Because shmem_getpage(_gfp) is where shmem's
> page lookup and allocation complexities are handled.
>
> I assume the question behind your question is: why does shmem actually
> allocate pages for its fallocate, instead of just reserving the space?

Yeah, this is what I want to know.

>
> I did play with just reserving the space, with more special entries in
> the radix_tree to note the reservations made.  It should be doable for
> the vm_enough_memory and sbinfo->used_blocks reservations.
>
> What absolutely deterred me from taking that path was the mem_cgroup
> case: shmem and swap and memcg are not easy to get working right together,
> and nobody would thank me for complicating memcg just for shmem_fallocate.
>
> By allocating pages, the pre-existing memcg code just works; if we used
> reservations instead, we would have to track their memcg charges in some
> additional new way.  I see no justification for that complication.

Oh, I see, thanks Hugh. :-)

> Hugh


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
@ 2012-11-16  0:40                               ` Jaegeuk Hanse
  0 siblings, 0 replies; 69+ messages in thread
From: Jaegeuk Hanse @ 2012-11-16  0:40 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dave Jones, Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

On 11/16/2012 03:56 AM, Hugh Dickins wrote:
> Offtopic...
>
> On Thu, 15 Nov 2012, Jaegeuk Hanse wrote:
>> Another question. Why the function shmem_fallocate which you add to kernel
>> need call shmem_getpage?
> Because shmem_getpage(_gfp) is where shmem's
> page lookup and allocation complexities are handled.
>
> I assume the question behind your question is: why does shmem actually
> allocate pages for its fallocate, instead of just reserving the space?

Yeah, this is what I want to know.

>
> I did play with just reserving the space, with more special entries in
> the radix_tree to note the reservations made.  It should be doable for
> the vm_enough_memory and sbinfo->used_blocks reservations.
>
> What absolutely deterred me from taking that path was the mem_cgroup
> case: shmem and swap and memcg are not easy to get working right together,
> and nobody would thank me for complicating memcg just for shmem_fallocate.
>
> By allocating pages, the pre-existing memcg code just works; if we used
> reservations instead, we would have to track their memcg charges in some
> additional new way.  I see no justification for that complication.

Oh, I see, thanks Hugh. :-)

> Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
  2012-11-15 19:56                             ` Hugh Dickins
@ 2012-11-16  9:34                               ` Jaegeuk Hanse
  -1 siblings, 0 replies; 69+ messages in thread
From: Jaegeuk Hanse @ 2012-11-16  9:34 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dave Jones, Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

On 11/16/2012 03:56 AM, Hugh Dickins wrote:
> Offtopic...
>
> On Thu, 15 Nov 2012, Jaegeuk Hanse wrote:
>> Another question. Why the function shmem_fallocate which you add to kernel
>> need call shmem_getpage?
> Because shmem_getpage(_gfp) is where shmem's
> page lookup and allocation complexities are handled.
>
> I assume the question behind your question is: why does shmem actually
> allocate pages for its fallocate, instead of just reserving the space?
>
> I did play with just reserving the space, with more special entries in
> the radix_tree to note the reservations made.  It should be doable for
> the vm_enough_memory and sbinfo->used_blocks reservations.
>
> What absolutely deterred me from taking that path was the mem_cgroup
> case: shmem and swap and memcg are not easy to get working right together,
> and nobody would thank me for complicating memcg just for shmem_fallocate.
>
> By allocating pages, the pre-existing memcg code just works; if we used
> reservations instead, we would have to track their memcg charges in some
> additional new way.  I see no justification for that complication.

Hi Hugh

Some questions about your shmem/tmpfs: misc and fallocate patchset.

- Since shmem_setattr can truncate tmpfs files, why need add another 
similar codes in function shmem_fallocate? What's the trick?
- in tmpfs: support fallocate preallocation patch changelog:
   "Christoph Hellwig: What for exactly?  Please explain why 
preallocating on tmpfs would make any sense.
   Kay Sievers: To be able to safely use mmap(), regarding SIGBUS, on 
files on the /dev/shm filesystem.  The glibc fallback loop for -ENOSYS 
[or -EOPNOTSUPP] on fallocate is just ugly."
   Could shmem/tmpfs fallocate prevent one process truncate the file 
which the second process mmap() and get SIGBUS when the second process 
access mmap but out of current size of file?

Regards,
Jaegeuk

> Hugh


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
@ 2012-11-16  9:34                               ` Jaegeuk Hanse
  0 siblings, 0 replies; 69+ messages in thread
From: Jaegeuk Hanse @ 2012-11-16  9:34 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dave Jones, Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

On 11/16/2012 03:56 AM, Hugh Dickins wrote:
> Offtopic...
>
> On Thu, 15 Nov 2012, Jaegeuk Hanse wrote:
>> Another question. Why the function shmem_fallocate which you add to kernel
>> need call shmem_getpage?
> Because shmem_getpage(_gfp) is where shmem's
> page lookup and allocation complexities are handled.
>
> I assume the question behind your question is: why does shmem actually
> allocate pages for its fallocate, instead of just reserving the space?
>
> I did play with just reserving the space, with more special entries in
> the radix_tree to note the reservations made.  It should be doable for
> the vm_enough_memory and sbinfo->used_blocks reservations.
>
> What absolutely deterred me from taking that path was the mem_cgroup
> case: shmem and swap and memcg are not easy to get working right together,
> and nobody would thank me for complicating memcg just for shmem_fallocate.
>
> By allocating pages, the pre-existing memcg code just works; if we used
> reservations instead, we would have to track their memcg charges in some
> additional new way.  I see no justification for that complication.

Hi Hugh

Some questions about your shmem/tmpfs: misc and fallocate patchset.

- Since shmem_setattr can truncate tmpfs files, why need add another 
similar codes in function shmem_fallocate? What's the trick?
- in tmpfs: support fallocate preallocation patch changelog:
   "Christoph Hellwig: What for exactly?  Please explain why 
preallocating on tmpfs would make any sense.
   Kay Sievers: To be able to safely use mmap(), regarding SIGBUS, on 
files on the /dev/shm filesystem.  The glibc fallback loop for -ENOSYS 
[or -EOPNOTSUPP] on fallocate is just ugly."
   Could shmem/tmpfs fallocate prevent one process truncate the file 
which the second process mmap() and get SIGBUS when the second process 
access mmap but out of current size of file?

Regards,
Jaegeuk

> Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
  2012-11-16  9:34                               ` Jaegeuk Hanse
@ 2012-11-17  4:48                                 ` Hugh Dickins
  -1 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-11-17  4:48 UTC (permalink / raw)
  To: Jaegeuk Hanse
  Cc: Dave Jones, Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

Further offtopic..

On Fri, 16 Nov 2012, Jaegeuk Hanse wrote:
> Some questions about your shmem/tmpfs: misc and fallocate patchset.
> 
> - Since shmem_setattr can truncate tmpfs files, why need add another similar
> codes in function shmem_fallocate? What's the trick?

I don't know if I understand you.  In general, hole-punching is different
from truncation.  Supporting the hole-punch mode of the fallocate system
call is different from supporting truncation.  They're closely related,
and share code, but meet different specifications.

> - in tmpfs: support fallocate preallocation patch changelog:
>   "Christoph Hellwig: What for exactly?  Please explain why preallocating on
> tmpfs would make any sense.
>   Kay Sievers: To be able to safely use mmap(), regarding SIGBUS, on files on
> the /dev/shm filesystem.  The glibc fallback loop for -ENOSYS [or
> -EOPNOTSUPP] on fallocate is just ugly."
>   Could shmem/tmpfs fallocate prevent one process truncate the file which the
> second process mmap() and get SIGBUS when the second process access mmap but
> out of current size of file?

Again, I don't know if I understand you.  fallocate does not prevent
truncation or races or SIGBUS.  I believe that Kay meant that without
using fallocate to allocate the memory in advance, systemd found it hard
to protect itself from the possibility of getting a SIGBUS, if access to
a shmem mapping happened to run out of memory/space in the middle.

I never grasped why writing the file in advance was not good enough:
fallocate happened to be what they hoped to use, and it was hard to
deny it, given that tmpfs already supported hole-punching, and was
about to convert to the fallocate interface for that.

Hugh

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
@ 2012-11-17  4:48                                 ` Hugh Dickins
  0 siblings, 0 replies; 69+ messages in thread
From: Hugh Dickins @ 2012-11-17  4:48 UTC (permalink / raw)
  To: Jaegeuk Hanse
  Cc: Dave Jones, Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

Further offtopic..

On Fri, 16 Nov 2012, Jaegeuk Hanse wrote:
> Some questions about your shmem/tmpfs: misc and fallocate patchset.
> 
> - Since shmem_setattr can truncate tmpfs files, why need add another similar
> codes in function shmem_fallocate? What's the trick?

I don't know if I understand you.  In general, hole-punching is different
from truncation.  Supporting the hole-punch mode of the fallocate system
call is different from supporting truncation.  They're closely related,
and share code, but meet different specifications.

> - in tmpfs: support fallocate preallocation patch changelog:
>   "Christoph Hellwig: What for exactly?  Please explain why preallocating on
> tmpfs would make any sense.
>   Kay Sievers: To be able to safely use mmap(), regarding SIGBUS, on files on
> the /dev/shm filesystem.  The glibc fallback loop for -ENOSYS [or
> -EOPNOTSUPP] on fallocate is just ugly."
>   Could shmem/tmpfs fallocate prevent one process truncate the file which the
> second process mmap() and get SIGBUS when the second process access mmap but
> out of current size of file?

Again, I don't know if I understand you.  fallocate does not prevent
truncation or races or SIGBUS.  I believe that Kay meant that without
using fallocate to allocate the memory in advance, systemd found it hard
to protect itself from the possibility of getting a SIGBUS, if access to
a shmem mapping happened to run out of memory/space in the middle.

I never grasped why writing the file in advance was not good enough:
fallocate happened to be what they hoped to use, and it was hard to
deny it, given that tmpfs already supported hole-punching, and was
about to convert to the fallocate interface for that.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
  2012-11-17  4:48                                 ` Hugh Dickins
@ 2012-11-18  0:57                                   ` Jaegeuk Hanse
  -1 siblings, 0 replies; 69+ messages in thread
From: Jaegeuk Hanse @ 2012-11-18  0:57 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dave Jones, Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

On 11/17/2012 12:48 PM, Hugh Dickins wrote:
> Further offtopic..

Thanks for your explanation, Hugh. :-)

>
> On Fri, 16 Nov 2012, Jaegeuk Hanse wrote:
>> Some questions about your shmem/tmpfs: misc and fallocate patchset.
>>
>> - Since shmem_setattr can truncate tmpfs files, why need add another similar
>> codes in function shmem_fallocate? What's the trick?
> I don't know if I understand you.  In general, hole-punching is different
> from truncation.  Supporting the hole-punch mode of the fallocate system
> call is different from supporting truncation.  They're closely related,
> and share code, but meet different specifications.

What's the different between shmem/tmpfs hole-punching and 
truncate_setsize/truncate_pagecache?
Do you mean one is punch hole in the file and the other one is shrink or 
extent the size of a file?

>> - in tmpfs: support fallocate preallocation patch changelog:
>>    "Christoph Hellwig: What for exactly?  Please explain why preallocating on
>> tmpfs would make any sense.
>>    Kay Sievers: To be able to safely use mmap(), regarding SIGBUS, on files on
>> the /dev/shm filesystem.  The glibc fallback loop for -ENOSYS [or
>> -EOPNOTSUPP] on fallocate is just ugly."
>>    Could shmem/tmpfs fallocate prevent one process truncate the file which the
>> second process mmap() and get SIGBUS when the second process access mmap but
>> out of current size of file?
> Again, I don't know if I understand you.  fallocate does not prevent
> truncation or races or SIGBUS.  I believe that Kay meant that without
> using fallocate to allocate the memory in advance, systemd found it hard
> to protect itself from the possibility of getting a SIGBUS, if access to
> a shmem mapping happened to run out of memory/space in the middle.

IIUC, it will return VM_xxx_OOM instead of SIGBUS if run out of memory. 
Then how can get SIGBUS in this scene?

Regards,
Jaegeuk

> I never grasped why writing the file in advance was not good enough:
> fallocate happened to be what they hoped to use, and it was hard to
> deny it, given that tmpfs already supported hole-punching, and was
> about to convert to the fallocate interface for that.
> Hugh


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
@ 2012-11-18  0:57                                   ` Jaegeuk Hanse
  0 siblings, 0 replies; 69+ messages in thread
From: Jaegeuk Hanse @ 2012-11-18  0:57 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dave Jones, Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

On 11/17/2012 12:48 PM, Hugh Dickins wrote:
> Further offtopic..

Thanks for your explanation, Hugh. :-)

>
> On Fri, 16 Nov 2012, Jaegeuk Hanse wrote:
>> Some questions about your shmem/tmpfs: misc and fallocate patchset.
>>
>> - Since shmem_setattr can truncate tmpfs files, why need add another similar
>> codes in function shmem_fallocate? What's the trick?
> I don't know if I understand you.  In general, hole-punching is different
> from truncation.  Supporting the hole-punch mode of the fallocate system
> call is different from supporting truncation.  They're closely related,
> and share code, but meet different specifications.

What's the different between shmem/tmpfs hole-punching and 
truncate_setsize/truncate_pagecache?
Do you mean one is punch hole in the file and the other one is shrink or 
extent the size of a file?

>> - in tmpfs: support fallocate preallocation patch changelog:
>>    "Christoph Hellwig: What for exactly?  Please explain why preallocating on
>> tmpfs would make any sense.
>>    Kay Sievers: To be able to safely use mmap(), regarding SIGBUS, on files on
>> the /dev/shm filesystem.  The glibc fallback loop for -ENOSYS [or
>> -EOPNOTSUPP] on fallocate is just ugly."
>>    Could shmem/tmpfs fallocate prevent one process truncate the file which the
>> second process mmap() and get SIGBUS when the second process access mmap but
>> out of current size of file?
> Again, I don't know if I understand you.  fallocate does not prevent
> truncation or races or SIGBUS.  I believe that Kay meant that without
> using fallocate to allocate the memory in advance, systemd found it hard
> to protect itself from the possibility of getting a SIGBUS, if access to
> a shmem mapping happened to run out of memory/space in the middle.

IIUC, it will return VM_xxx_OOM instead of SIGBUS if run out of memory. 
Then how can get SIGBUS in this scene?

Regards,
Jaegeuk

> I never grasped why writing the file in advance was not good enough:
> fallocate happened to be what they hoped to use, and it was hard to
> deny it, given that tmpfs already supported hole-punching, and was
> about to convert to the fallocate interface for that.
> Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
  2012-11-17  4:48                                 ` Hugh Dickins
@ 2012-11-18  1:48                                   ` Jaegeuk Hanse
  -1 siblings, 0 replies; 69+ messages in thread
From: Jaegeuk Hanse @ 2012-11-18  1:48 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dave Jones, Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

On 11/17/2012 12:48 PM, Hugh Dickins wrote:
> Further offtopic..

Hi Hugh,

- I see you add this in vfs.txt:
   +  fallocate: called by the VFS to preallocate blocks or punch a hole.
   I want to know if it's necessary to add it to man page since users 
still don't know fallocate can punch a hole from man fallocate.
- in function shmem_fallocate:
+               else if (shmem_falloc.nr_unswapped > 
shmem_falloc.nr_falloced)
+                       error = -ENOMEM;
If this changelog "shmem_fallocate() compare counts and give up once the 
reactivated pages have started to coming back to writepage 
(approximately: some zones would in fact recycle faster than others)." 
describe why need this change? If the answer is yes, I have two questions.
1) how can guarantee it really don't need preallocation if just one or a 
few pages always reactivated, in this scene, nr_unswapped maybe grow 
bigger enough than shmem_falloc.nr_falloced
2) why return -ENOMEM, it's not really OOM, is it a trick or ...?

Regards,
Jaegeuk

>
> On Fri, 16 Nov 2012, Jaegeuk Hanse wrote:
>> Some questions about your shmem/tmpfs: misc and fallocate patchset.
>>
>> - Since shmem_setattr can truncate tmpfs files, why need add another similar
>> codes in function shmem_fallocate? What's the trick?
> I don't know if I understand you.  In general, hole-punching is different
> from truncation.  Supporting the hole-punch mode of the fallocate system
> call is different from supporting truncation.  They're closely related,
> and share code, but meet different specifications.
>
>> - in tmpfs: support fallocate preallocation patch changelog:
>>    "Christoph Hellwig: What for exactly?  Please explain why preallocating on
>> tmpfs would make any sense.
>>    Kay Sievers: To be able to safely use mmap(), regarding SIGBUS, on files on
>> the /dev/shm filesystem.  The glibc fallback loop for -ENOSYS [or
>> -EOPNOTSUPP] on fallocate is just ugly."
>>    Could shmem/tmpfs fallocate prevent one process truncate the file which the
>> second process mmap() and get SIGBUS when the second process access mmap but
>> out of current size of file?
> Again, I don't know if I understand you.  fallocate does not prevent
> truncation or races or SIGBUS.  I believe that Kay meant that without
> using fallocate to allocate the memory in advance, systemd found it hard
> to protect itself from the possibility of getting a SIGBUS, if access to
> a shmem mapping happened to run out of memory/space in the middle.
>
> I never grasped why writing the file in advance was not good enough:
> fallocate happened to be what they hoped to use, and it was hard to
> deny it, given that tmpfs already supported hole-punching, and was
> about to convert to the fallocate interface for that.
>
> Hugh


^ permalink raw reply	[flat|nested] 69+ messages in thread

* Re: [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON
@ 2012-11-18  1:48                                   ` Jaegeuk Hanse
  0 siblings, 0 replies; 69+ messages in thread
From: Jaegeuk Hanse @ 2012-11-18  1:48 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Dave Jones, Andrew Morton, Johannes Weiner, linux-mm, linux-kernel

On 11/17/2012 12:48 PM, Hugh Dickins wrote:
> Further offtopic..

Hi Hugh,

- I see you add this in vfs.txt:
   +  fallocate: called by the VFS to preallocate blocks or punch a hole.
   I want to know if it's necessary to add it to man page since users 
still don't know fallocate can punch a hole from man fallocate.
- in function shmem_fallocate:
+               else if (shmem_falloc.nr_unswapped > 
shmem_falloc.nr_falloced)
+                       error = -ENOMEM;
If this changelog "shmem_fallocate() compare counts and give up once the 
reactivated pages have started to coming back to writepage 
(approximately: some zones would in fact recycle faster than others)." 
describe why need this change? If the answer is yes, I have two questions.
1) how can guarantee it really don't need preallocation if just one or a 
few pages always reactivated, in this scene, nr_unswapped maybe grow 
bigger enough than shmem_falloc.nr_falloced
2) why return -ENOMEM, it's not really OOM, is it a trick or ...?

Regards,
Jaegeuk

>
> On Fri, 16 Nov 2012, Jaegeuk Hanse wrote:
>> Some questions about your shmem/tmpfs: misc and fallocate patchset.
>>
>> - Since shmem_setattr can truncate tmpfs files, why need add another similar
>> codes in function shmem_fallocate? What's the trick?
> I don't know if I understand you.  In general, hole-punching is different
> from truncation.  Supporting the hole-punch mode of the fallocate system
> call is different from supporting truncation.  They're closely related,
> and share code, but meet different specifications.
>
>> - in tmpfs: support fallocate preallocation patch changelog:
>>    "Christoph Hellwig: What for exactly?  Please explain why preallocating on
>> tmpfs would make any sense.
>>    Kay Sievers: To be able to safely use mmap(), regarding SIGBUS, on files on
>> the /dev/shm filesystem.  The glibc fallback loop for -ENOSYS [or
>> -EOPNOTSUPP] on fallocate is just ugly."
>>    Could shmem/tmpfs fallocate prevent one process truncate the file which the
>> second process mmap() and get SIGBUS when the second process access mmap but
>> out of current size of file?
> Again, I don't know if I understand you.  fallocate does not prevent
> truncation or races or SIGBUS.  I believe that Kay meant that without
> using fallocate to allocate the memory in advance, systemd found it hard
> to protect itself from the possibility of getting a SIGBUS, if access to
> a shmem mapping happened to run out of memory/space in the middle.
>
> I never grasped why writing the file in advance was not good enough:
> fallocate happened to be what they hoped to use, and it was hard to
> deny it, given that tmpfs already supported hole-punching, and was
> about to convert to the fallocate interface for that.
>
> Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 69+ messages in thread

end of thread, other threads:[~2012-11-18  1:48 UTC | newest]

Thread overview: 69+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2012-10-25  2:37 shmem_getpage_gfp VM_BUG_ON triggered. [3.7rc2] Dave Jones
2012-10-25  2:37 ` Dave Jones
2012-10-25  4:36 ` Hugh Dickins
2012-10-25  4:36   ` Hugh Dickins
2012-10-25  4:50   ` Ni zhan Chen
2012-10-25  4:50     ` Ni zhan Chen
2012-10-25  6:59     ` Hugh Dickins
2012-10-25  6:59       ` Hugh Dickins
2012-10-25  9:53       ` Ni zhan Chen
2012-10-25 10:21       ` Ni zhan Chen
2012-10-25 10:21         ` Ni zhan Chen
2012-10-25 21:27         ` Hugh Dickins
2012-10-25 21:27           ` Hugh Dickins
2012-10-26  1:48           ` Ni zhan Chen
2012-10-26  1:48             ` Ni zhan Chen
2012-10-25 11:14   ` Dave Jones
2012-10-25 11:14     ` Dave Jones
2012-10-25 21:28     ` Hugh Dickins
2012-10-25 21:28       ` Hugh Dickins
2012-10-25 20:52   ` Johannes Weiner
2012-10-25 20:52     ` Johannes Weiner
2012-10-25 21:48     ` Hugh Dickins
2012-10-25 21:48       ` Hugh Dickins
2012-10-26  2:15       ` Ni zhan Chen
2012-10-26  2:15         ` Ni zhan Chen
2012-11-01 19:10   ` Dave Jones
2012-11-01 19:10     ` Dave Jones
2012-11-01 23:03     ` Hugh Dickins
2012-11-01 23:03       ` Hugh Dickins
2012-11-01 23:20       ` Dave Jones
2012-11-01 23:20         ` Dave Jones
2012-11-01 23:48         ` Hugh Dickins
2012-11-01 23:48           ` Hugh Dickins
2012-11-02  1:43           ` Dave Jones
2012-11-02  1:43             ` Dave Jones
2012-11-02 23:26             ` Hugh Dickins
2012-11-02 23:26               ` Hugh Dickins
2012-11-06  1:32               ` [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON Hugh Dickins
2012-11-06  1:32                 ` Hugh Dickins
2012-11-06 13:54                 ` Dave Jones
2012-11-06 13:54                   ` Dave Jones
2012-11-06 23:48                   ` Hugh Dickins
2012-11-06 23:48                     ` Hugh Dickins
2012-11-07 22:38                     ` Dave Jones
2012-11-07 22:38                       ` Dave Jones
2012-11-14  1:36                       ` [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON fix Hugh Dickins
2012-11-14  1:36                         ` Hugh Dickins
2012-11-14  3:07                     ` [PATCH] tmpfs: fix shmem_getpage_gfp VM_BUG_ON Jaegeuk Hanse
2012-11-14  3:07                       ` Jaegeuk Hanse
2012-11-14  3:50                       ` Hugh Dickins
2012-11-14  3:50                         ` Hugh Dickins
2012-11-14  6:14                         ` Dave Jones
2012-11-14  6:14                           ` Dave Jones
2012-11-14 10:06                           ` Hugh Dickins
2012-11-14 10:06                             ` Hugh Dickins
2012-11-15  7:39                         ` Jaegeuk Hanse
2012-11-15  7:39                           ` Jaegeuk Hanse
2012-11-15 19:56                           ` Hugh Dickins
2012-11-15 19:56                             ` Hugh Dickins
2012-11-16  0:40                             ` Jaegeuk Hanse
2012-11-16  0:40                               ` Jaegeuk Hanse
2012-11-16  9:34                             ` Jaegeuk Hanse
2012-11-16  9:34                               ` Jaegeuk Hanse
2012-11-17  4:48                               ` Hugh Dickins
2012-11-17  4:48                                 ` Hugh Dickins
2012-11-18  0:57                                 ` Jaegeuk Hanse
2012-11-18  0:57                                   ` Jaegeuk Hanse
2012-11-18  1:48                                 ` Jaegeuk Hanse
2012-11-18  1:48                                   ` Jaegeuk Hanse

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.