Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
* Re: page cache: Store only head pages in i_pages
       [not found] <1553285568.26196.24.camel@lca.pw>
@ 2019-03-23  3:38 ` Matthew Wilcox
  2019-03-23 23:50   ` Qian Cai
  0 siblings, 1 reply; 21+ messages in thread
From: Matthew Wilcox @ 2019-03-23  3:38 UTC (permalink / raw)
  To: Qian Cai; +Cc: Huang Ying, linux-mm

On Fri, Mar 22, 2019 at 04:12:48PM -0400, Qian Cai wrote:
> FYI, every thing involve swapping seems triggered a panic now since this patch.

Thanks for the report!  Does this fix it?

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 41858a3744b4..975aea9a49a5 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -335,6 +335,8 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
 static inline struct page *find_subpage(struct page *page, pgoff_t offset)
 {
 	VM_BUG_ON_PAGE(PageTail(page), page);
+	if (unlikely(PageSwapCache(page)))
+		return page;
 	VM_BUG_ON_PAGE(page->index > offset, page);
 	VM_BUG_ON_PAGE(page->index + compound_nr(page) <= offset, page);
 	return page - page->index + offset;

Huang, I'm pretty sure this isn't right for CONFIG_THP_SWAP, but I'm
not sure what the right answer is.  The patch this is on top of includes:

@@ -132,7 +132,7 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp)
                for (i = 0; i < nr; i++) {
                        VM_BUG_ON_PAGE(xas.xa_index != idx + i, page);
                        set_page_private(page + i, entry.val + i);
-                       xas_store(&xas, page + i);
+                       xas_store(&xas, page);
                        xas_next(&xas);
                }
                address_space->nrpages += nr;

so if we've added a THP page, we're going to find the head page.  I'm not
sure I understand how to get from the head page to the right subpage.  Is
it as simple as:

+	if (unlikely(PageSwapCache(page)))
+		return page + (offset & (compound_nr(page) - 1));

or are they not stored at an aligned location?

> [11653.484481] page:ffffea0006ef7080 count:2 mapcount:0 mapping:0000000000000000
> index:0x0
> [11653.525397] swap_aops 
> [11653.525404] flags:
> 0x5fffe000080454(uptodate|lru|workingset|owner_priv_1|swapbacked)
> [11653.573631] raw: 005fffe000080454 ffffea0006ef7048 ffffea0007c9c7c8
> 0000000000000000
> [11653.608547] raw: 0000000000000000 0000000000001afd 00000002ffffffff
> 0000000000000000
> [11653.643436] page dumped because: VM_BUG_ON_PAGE(page->index + (1 <<
> compound_order(page)) <= offset)
> [11653.684322] page allocated via order 0, migratetype Movable, gfp_mask
> 0x100cca(GFP_HIGHUSER_MOVABLE)
> [11653.725462]  prep_new_page+0x3a4/0x4d0
> [11653.742373]  get_page_from_freelist+0xcde/0x3550
> [11653.763449]  __alloc_pages_nodemask+0x859/0x2ab0
> [11653.784105]  alloc_pages_vma+0xb2/0x430
> [11653.801248]  __read_swap_cache_async+0x49c/0xc30
> [11653.821943]  swap_cluster_readahead+0x4a1/0x8b0
> [11653.842224]  swapin_readahead+0xb6/0xc3e
> [11653.859894]  do_swap_page+0xc87/0x24b0
> [11653.876664]  __handle_mm_fault+0x1601/0x3bc0
> [11653.895818]  handle_mm_fault+0x326/0x6cf
> [11653.913443]  __do_page_fault+0x333/0x8d0
> [11653.931068]  do_page_fault+0x75/0x48e
> [11653.947560]  page_fault+0x1b/0x20
> [11653.962558] ------------[ cut here ]------------
> [11653.983290] kernel BUG at include/linux/pagemap.h:341!
> [11654.006336] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
> [11654.036835] CPU: 12 PID: 14006 Comm: in:imjournal Kdump: loaded Tainted:
> G        W         5.1.0-rc1-mm1+ #17
> [11654.084191] Hardware name: HP ProLiant DL80 Gen9/ProLiant DL80 Gen9, BIOS U15
> 09/12/2016
> [11654.120401] RIP: 0010:find_get_entry+0x618/0x7f0
> [11654.141171] Code: c6 60 ae e9 97 4c 89 ff e8 c5 b2 0c 00 0f 0b 48 c7 c7 20 3b
> 42 98 e8 3c 3c 57 00 48 c7 c6 e0 b0 e9 97 4c 89 ff e8 a8 b2 0c 00 <0f> 0b 48 c7
> c7 e0 3a 42 98 e8 1f 3c 57 00 48 c7 c7 88 62 7e 98 e8
> [11654.226539] RSP: 0000:ffff8882095b7628 EFLAGS: 00010286
> [11654.249867] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff9735700e
> [11654.281925] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff888213e4d332
> [11654.313411] RBP: ffff8882095b7738 R08: ffffed1042d86a89 R09: ffffed1042d86a88
> [11654.346314] R10: ffffed1042d86a88 R11: ffff888216c35447 R12: 0000000000000000
> [11654.379144] R13: ffffea0006ef70a0 R14: ffff8881e5c16fc8 R15: ffffea0006ef7080
> [11654.411410] FS:  00007fb8daf56700(0000) GS:ffff888216c00000(0000)
> knlGS:0000000000000000
> [11654.447618] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [11654.473481] CR2: 00007f74c0647000 CR3: 0000000204860003 CR4: 00000000001606a0
> [11654.506282] Call Trace:
> [11654.517340]  ? __filemap_set_wb_err+0x1f0/0x1f0
> [11654.538684]  ? generic_make_request+0x283/0xc50
> [11654.562239]  ? mem_cgroup_uncharge+0x150/0x150
> [11654.582551]  pagecache_get_page+0x4a/0xb70
> [11654.600807]  ? release_pages+0xada/0x1750
> [11654.618847]  __read_swap_cache_async+0x1a8/0xc30
> [11654.640167]  ? lookup_swap_cache+0x570/0x570
> [11654.659274]  read_swap_cache_async+0x69/0xd0
> [11654.678354]  ? __read_swap_cache_async+0xc30/0xc30
> [11654.699775]  ? lru_add_drain_cpu+0x239/0x4e0
> [11654.718855]  swap_cluster_readahead+0x386/0x8b0
> [11654.739171]  ? read_swap_cache_async+0xd0/0xd0
> [11654.760517]  ? xas_load+0x8b/0xf0
> [11654.775357]  ? find_get_entry+0x39e/0x7f0
> [11654.793267]  swapin_readahead+0xb6/0xc3e
> [11654.810391]  ? exit_swap_address_space+0x1b0/0x1b0
> [11654.831816]  ? lookup_swap_cache+0x114/0x570
> [11654.850972]  ? xas_find+0x141/0x530
> [11654.866616]  ? free_pages_and_swap_cache+0x2f0/0x2f0
> [11654.889495]  ? swapcache_prepare+0x20/0x20
> [11654.907786]  ? filemap_map_pages+0x3af/0xec0
> [11654.926919]  do_swap_page+0xc87/0x24b0
> [11654.943758]  ? unmap_mapping_range+0x30/0x30
> [11654.962918]  ? kasan_check_read+0x11/0x20
> [11654.980900]  ? do_raw_spin_unlock+0x59/0x250
> [11655.000027]  __handle_mm_fault+0x1601/0x3bc0
> [11655.020130]  ? __lock_acquire.isra.14+0x7d7/0x2130
> [11655.041845]  ? vmf_insert_mixed_mkwrite+0x20/0x20
> [11655.066529]  ? lock_acquire+0x169/0x360
> [11655.085380]  handle_mm_fault+0x326/0x6cf
> [11655.102928]  __do_page_fault+0x333/0x8d0
> [11655.120429]  ? task_work_run+0xdd/0x190
> [11655.138322]  do_page_fault+0x75/0x48e
> [11655.154693]  ? page_fault+0x5/0x20
> [11655.170168]  page_fault+0x1b/0x20
> [11655.185007] RIP: 0033:0x7fb8dde3ae14
> [11655.201007] Code: 00 0f 1f 44 00 00 f3 0f 1e fa f2 ff 25 65 1a 29 00 0f 1f 44
> 00 00 f3 0f 1e fa f2 ff 25 5d 1a 29 00 0f 1f 44 00 00 f3 0f 1e fa <f2> ff 25 55
> 1a 29 00 0f 1f 44 00 00 f3 0f 1e fa f2 ff 25 4d 1a 29
> [11655.286038] RSP: 002b:00007fb8daf55b28 EFLAGS: 00010246
> [11655.309443] RAX: 0000000000000000 RBX: 0000562a76a844f0 RCX: 0000000000000000
> [11655.341503] RDX: 00007fb8daf55ac0 RSI: 0000000000000001 RDI: 0000562a76a844f0
> [11655.373906] RBP: 00000000000dbba0 R08: 0000000000000008 R09: 0000000000000000
> [11655.407498] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000000e
> [11655.439429] R13: 00007fb8da036760 R14: 00007fb8daf55be0 R15: 00007fb8daf55bd0
> [11655.471484] Modules linked in: brd nls_iso8859_1 nls_cp437 vfat fat ext4
> crc16 mbcache jbd2 overlay loop kvm_intel kvm irqbypass ip_tables x_tables xfs
> sd_mod igb ahci i2c_algo_bit libahci libata i2c_core dm_mirror dm_region_hash
> dm_log dm_mod


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: page cache: Store only head pages in i_pages
  2019-03-23  3:38 ` page cache: Store only head pages in i_pages Matthew Wilcox
@ 2019-03-23 23:50   ` Qian Cai
  2019-03-24  2:06     ` Matthew Wilcox
  0 siblings, 1 reply; 21+ messages in thread
From: Qian Cai @ 2019-03-23 23:50 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Huang Ying, linux-mm



On 3/22/19 11:38 PM, Matthew Wilcox wrote:
> On Fri, Mar 22, 2019 at 04:12:48PM -0400, Qian Cai wrote:
>> FYI, every thing involve swapping seems triggered a panic now since this patch.
> 
> Thanks for the report!  Does this fix it?
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 41858a3744b4..975aea9a49a5 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -335,6 +335,8 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
>  static inline struct page *find_subpage(struct page *page, pgoff_t offset)
>  {
>  	VM_BUG_ON_PAGE(PageTail(page), page);
> +	if (unlikely(PageSwapCache(page)))
> +		return page;
>  	VM_BUG_ON_PAGE(page->index > offset, page);
>  	VM_BUG_ON_PAGE(page->index + compound_nr(page) <= offset, page);
>  	return page - page->index + offset;

Yes, it works.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: page cache: Store only head pages in i_pages
  2019-03-23 23:50   ` Qian Cai
@ 2019-03-24  2:06     ` Matthew Wilcox
  2019-03-24  2:52       ` Qian Cai
  0 siblings, 1 reply; 21+ messages in thread
From: Matthew Wilcox @ 2019-03-24  2:06 UTC (permalink / raw)
  To: Qian Cai; +Cc: Huang Ying, linux-mm

On Sat, Mar 23, 2019 at 07:50:15PM -0400, Qian Cai wrote:
> On 3/22/19 11:38 PM, Matthew Wilcox wrote:
> > On Fri, Mar 22, 2019 at 04:12:48PM -0400, Qian Cai wrote:
> >> FYI, every thing involve swapping seems triggered a panic now since this patch.
> 
> Yes, it works.

Thanks for testing.  Kirill suggests this would be a better fix:

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 41858a3744b4..9718393ae45b 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -334,10 +334,12 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
 
 static inline struct page *find_subpage(struct page *page, pgoff_t offset)
 {
+	unsigned long index = page_index(page);
+
 	VM_BUG_ON_PAGE(PageTail(page), page);
-	VM_BUG_ON_PAGE(page->index > offset, page);
-	VM_BUG_ON_PAGE(page->index + compound_nr(page) <= offset, page);
-	return page - page->index + offset;
+	VM_BUG_ON_PAGE(index > offset, page);
+	VM_BUG_ON_PAGE(index + compound_nr(page) <= offset, page);
+	return page - index + offset;
 }
 
 struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: page cache: Store only head pages in i_pages
  2019-03-24  2:06     ` Matthew Wilcox
@ 2019-03-24  2:52       ` Qian Cai
  2019-03-24  3:04         ` Matthew Wilcox
  0 siblings, 1 reply; 21+ messages in thread
From: Qian Cai @ 2019-03-24  2:52 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Huang Ying, linux-mm



On 3/23/19 10:06 PM, Matthew Wilcox wrote:
> Thanks for testing.  Kirill suggests this would be a better fix:
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 41858a3744b4..9718393ae45b 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -334,10 +334,12 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
>  
>  static inline struct page *find_subpage(struct page *page, pgoff_t offset)
>  {
> +	unsigned long index = page_index(page);
> +
>  	VM_BUG_ON_PAGE(PageTail(page), page);
> -	VM_BUG_ON_PAGE(page->index > offset, page);
> -	VM_BUG_ON_PAGE(page->index + compound_nr(page) <= offset, page);
> -	return page - page->index + offset;
> +	VM_BUG_ON_PAGE(index > offset, page);
> +	VM_BUG_ON_PAGE(index + compound_nr(page) <= offset, page);
> +	return page - index + offset;
>  }
>  
>  struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);

This is not even compiled.

If "s/compound_nr/compound_order/", it failed to boot here,

[   56.843236] Unpacking initramfs...
[   56.881979] page:ffff7fe022eb19c0 count:3 mapcount:0 mapping:38ff80080099c008
index:0x0
[   56.890007] ramfs_aops
[   56.890011] name:"lvm.conf"
[   56.892465] flags: 0x17fffffa400000c(uptodate|dirty)
[   56.900318] raw: 017fffffa400000c dead000000000100 dead000000000200
38ff80080099c008
[   56.908066] raw: 0000000000000000 0000000000000000 00000003ffffffff
7bff8008203bcc80
[   56.915812] page dumped because: VM_BUG_ON_PAGE(index + compound_order(page)
<= offset)
[   56.923818] page->mem_cgroup:7bff8008203bcc80
[   56.928180] page allocated via order 0, migratetype Unmovable, gfp_mask
0x100cc2(GFP_HIGHUSER)
[   56.936800]  prep_new_page+0x4e0/0x5e0
[   56.940556]  get_page_from_freelist+0x4cf4/0x50e0
[   56.945265]  __alloc_pages_nodemask+0x738/0x38b8
[   56.949888]  alloc_page_interleave+0x34/0x2f0
[   56.954249]  alloc_pages_current+0xc0/0x150
[   56.958439]  __page_cache_alloc+0x70/0x2f4
[   56.962541]  pagecache_get_page+0x5e4/0xaf0
[   56.966729]  grab_cache_page_write_begin+0x6c/0x98
[   56.971526]  simple_write_begin+0x40/0x308
[   56.975627]  generic_perform_write+0x1d4/0x4e0
[   56.980076]  __generic_file_write_iter+0x294/0x504
[   56.984872]  generic_file_write_iter+0x354/0x594
[   56.989496]  __vfs_write+0x72c/0x8a0
[   56.993076]  vfs_write+0x1ec/0x424
[   56.996481]  ksys_write+0xbc/0x190
[   56.999890]  xwrite+0x38/0x84
[   57.002869] ------------[ cut here ]------------
[   57.007478] kernel BUG at ./include/linux/pagemap.h:342!
[   57.012801] Internal error: Oops - BUG: 0 [#1] SMP
[   57.017584] Modules linked in:
[   57.020636] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.1.0-rc1-mm1+ #7
[   57.027239] Hardware name: HPE Apollo 70             /C01_APACHE_MB         ,
BIOS L50_5.13_1.0.6 07/10/2018
[   57.037057] pstate: 60400009 (nZCv daif +PAN -UAO)
[   57.041839] pc : find_get_entry+0x948/0x950
[   57.046013] lr : find_get_entry+0x940/0x950
[   57.050185] sp : 10ff80082600f420
[   57.053489] x29: 10ff80082600f4d0 x28: efff100000000000
[   57.058792] x27: ffff7fe022eb19c8 x26: 0000000000000010
[   57.064095] x25: 0000000000000035 x24: 0000000000000003
[   57.069397] x23: 00000000000000ff x22: 10ff80082600f460
[   57.074700] x21: ffff7fe022eb19c0 x20: 35ff800825f9a050
[   57.080002] x19: 0000000000000000 x18: 0000000000000000
[   57.085304] x17: 0000000000000000 x16: 000000000000000a
[   57.090606] x15: 35ff800825f9a0b8 x14: 0000000000000000
[   57.095908] x13: ffff800825f9a050 x12: 00000000ffffffff
[   57.101210] x11: 0000000000000003 x10: 00000000000000ff
[   57.106512] x9 : e071b95619aca700 x8 : e071b95619aca700
[   57.111814] x7 : 0000000000000000 x6 : ffff1000102d01f4
[   57.117116] x5 : 0000000000000000 x4 : 0000000000000080
[   57.122418] x3 : ffff1000102b84c8 x2 : 0000000000000000
[   57.127720] x1 : 0000000000000004 x0 : ffff100013316b10
[   57.133024] Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
[   57.139715] Call trace:
[   57.142153]  find_get_entry+0x948/0x950
[   57.145979]  pagecache_get_page+0x68/0xaf0
[   57.150066]  grab_cache_page_write_begin+0x6c/0x98
[   57.154847]  simple_write_begin+0x40/0x308
[   57.158934]  generic_perform_write+0x1d4/0x4e0
[   57.163368]  __generic_file_write_iter+0x294/0x504
[   57.168150]  generic_file_write_iter+0x354/0x594
[   57.172757]  __vfs_write+0x72c/0x8a0
[   57.176323]  vfs_write+0x1ec/0x424
[   57.179715]  ksys_write+0xbc/0x190
[   57.183107]  xwrite+0x38/0x84
[   57.186066]  do_copy+0x110/0x898
[   57.189284]  write_buffer+0x148/0x1cc
[   57.192937]  flush_buffer+0x94/0x240
[   57.196505]  __gunzip+0x738/0x8f0
[   57.199810]  gunzip+0x18/0x20
[   57.202768]  unpack_to_rootfs+0x358/0x968
[   57.206769]  populate_rootfs+0x120/0x198
[   57.210684]  do_one_initcall+0x544/0xd00
[   57.214597]  do_initcall_level+0x660/0x814
[   57.218684]  do_basic_setup+0x38/0x50
[   57.222337]  kernel_init_freeable+0x25c/0x444
[   57.226686]  kernel_init+0x1c/0x548
[   57.230165]  ret_from_fork+0x10/0x18
[   57.233733] Code: aa1503e0 94034d79 b0016fe0 912c4000 (d4210000)
[   57.240055] ---[ end trace d7c5c3c62a7fa743 ]---
[   57.244664] Kernel panic - not syncing: Fatal exception
[   57.249997] SMP: stopping secondary CPUs
[   57.254417] ---[ end Kernel panic - not syncing: Fatal exception ]---


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: page cache: Store only head pages in i_pages
  2019-03-24  2:52       ` Qian Cai
@ 2019-03-24  3:04         ` Matthew Wilcox
  2019-03-24 15:42           ` Qian Cai
                             ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Matthew Wilcox @ 2019-03-24  3:04 UTC (permalink / raw)
  To: Qian Cai; +Cc: Huang Ying, linux-mm

On Sat, Mar 23, 2019 at 10:52:49PM -0400, Qian Cai wrote:
> On 3/23/19 10:06 PM, Matthew Wilcox wrote:
> > Thanks for testing.  Kirill suggests this would be a better fix:
> > 
> > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> > index 41858a3744b4..9718393ae45b 100644
> > --- a/include/linux/pagemap.h
> > +++ b/include/linux/pagemap.h
> > @@ -334,10 +334,12 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
> >  
> >  static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> >  {
> > +	unsigned long index = page_index(page);
> > +
> >  	VM_BUG_ON_PAGE(PageTail(page), page);
> > -	VM_BUG_ON_PAGE(page->index > offset, page);
> > -	VM_BUG_ON_PAGE(page->index + compound_nr(page) <= offset, page);
> > -	return page - page->index + offset;
> > +	VM_BUG_ON_PAGE(index > offset, page);
> > +	VM_BUG_ON_PAGE(index + compound_nr(page) <= offset, page);
> > +	return page - index + offset;
> >  }
> >  
> >  struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
> 
> This is not even compiled.
> 
> If "s/compound_nr/compound_order/", it failed to boot here,

Oh, sorry.  I have another patch in that tree.

-       VM_BUG_ON_PAGE(page->index + (1 << compound_order(page)) <= offset,
-                       page);
+       VM_BUG_ON_PAGE(page->index + compound_nr(page) <= offset, page);

The patch for you should have looked like this:

@@ -335,11 +335,12 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
 
 static inline struct page *find_subpage(struct page *page, pgoff_t offset)
 {
+       unsigned long index = page_index(page);
+
        VM_BUG_ON_PAGE(PageTail(page), page);
-       VM_BUG_ON_PAGE(page->index > offset, page);
-       VM_BUG_ON_PAGE(page->index + (1 << compound_order(page)) <= offset,
-                       page);
-       return page - page->index + offset;
+       VM_BUG_ON_PAGE(index > offset, page);
+       VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
+       return page - index + offset;
 }


> [   56.915812] page dumped because: VM_BUG_ON_PAGE(index + compound_order(page)
> <= offset)

Yeah, you were missing the '1 <<' part.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: page cache: Store only head pages in i_pages
  2019-03-24  3:04         ` Matthew Wilcox
@ 2019-03-24 15:42           ` Qian Cai
  2019-03-27 10:48           ` William Kucharski
  2019-03-29  1:43           ` Qian Cai
  2 siblings, 0 replies; 21+ messages in thread
From: Qian Cai @ 2019-03-24 15:42 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Huang Ying, linux-mm



On 3/23/19 11:04 PM, Matthew Wilcox wrote:
> The patch for you should have looked like this:
> 
> @@ -335,11 +335,12 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
>  
>  static inline struct page *find_subpage(struct page *page, pgoff_t offset)
>  {
> +       unsigned long index = page_index(page);
> +
>         VM_BUG_ON_PAGE(PageTail(page), page);
> -       VM_BUG_ON_PAGE(page->index > offset, page);
> -       VM_BUG_ON_PAGE(page->index + (1 << compound_order(page)) <= offset,
> -                       page);
> -       return page - page->index + offset;
> +       VM_BUG_ON_PAGE(index > offset, page);
> +       VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> +       return page - index + offset;
>  }

It works great.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: page cache: Store only head pages in i_pages
  2019-03-24  3:04         ` Matthew Wilcox
  2019-03-24 15:42           ` Qian Cai
@ 2019-03-27 10:48           ` William Kucharski
  2019-03-27 11:50             ` Matthew Wilcox
  2019-03-29  1:43           ` Qian Cai
  2 siblings, 1 reply; 21+ messages in thread
From: William Kucharski @ 2019-03-27 10:48 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Qian Cai, Huang Ying, linux-mm



> On Mar 23, 2019, at 9:04 PM, Matthew Wilcox <willy@infradead.org> wrote:
> 
> 
> static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> {
> +       unsigned long index = page_index(page);
> +
>        VM_BUG_ON_PAGE(PageTail(page), page);
> -       VM_BUG_ON_PAGE(page->index > offset, page);
> -       VM_BUG_ON_PAGE(page->index + (1 << compound_order(page)) <= offset,
> -                       page);
> -       return page - page->index + offset;
> +       VM_BUG_ON_PAGE(index > offset, page);
> +       VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> +       return page - index + offset;
> }
> 
> 
>> [   56.915812] page dumped because: VM_BUG_ON_PAGE(index + compound_order(page)
>> <= offset)
> 
> Yeah, you were missing the '1 <<' part.
> 

Is a V5 patch coming incorporating these?

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: page cache: Store only head pages in i_pages
  2019-03-27 10:48           ` William Kucharski
@ 2019-03-27 11:50             ` Matthew Wilcox
  0 siblings, 0 replies; 21+ messages in thread
From: Matthew Wilcox @ 2019-03-27 11:50 UTC (permalink / raw)
  To: William Kucharski; +Cc: Qian Cai, Huang Ying, linux-mm

On Wed, Mar 27, 2019 at 04:48:42AM -0600, William Kucharski wrote:
> > On Mar 23, 2019, at 9:04 PM, Matthew Wilcox <willy@infradead.org> wrote:
> > static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> > {
> > +       unsigned long index = page_index(page);
> > +
> >        VM_BUG_ON_PAGE(PageTail(page), page);
> > -       VM_BUG_ON_PAGE(page->index > offset, page);
> > -       VM_BUG_ON_PAGE(page->index + (1 << compound_order(page)) <= offset,
> > -                       page);
> > -       return page - page->index + offset;
> > +       VM_BUG_ON_PAGE(index > offset, page);
> > +       VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> > +       return page - index + offset;
> > }
> > 
> > 
> >> [   56.915812] page dumped because: VM_BUG_ON_PAGE(index + compound_order(page)
> >> <= offset)
> 
> Is a V5 patch coming incorporating these?

No; Andrew prefers to accumulate fixes as separate patches, so that's
what I sent on March 24th; Andrew added it to -mm yesterday.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: page cache: Store only head pages in i_pages
  2019-03-24  3:04         ` Matthew Wilcox
  2019-03-24 15:42           ` Qian Cai
  2019-03-27 10:48           ` William Kucharski
@ 2019-03-29  1:43           ` Qian Cai
  2019-03-29 19:59             ` Matthew Wilcox
  2 siblings, 1 reply; 21+ messages in thread
From: Qian Cai @ 2019-03-29  1:43 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Huang Ying, linux-mm

On 3/23/19 11:04 PM, Matthew Wilcox wrote> @@ -335,11 +335,12 @@ static inline
struct page *grab_cache_page_nowait(struct address_space *mapping,
>  
>  static inline struct page *find_subpage(struct page *page, pgoff_t offset)
>  {
> +       unsigned long index = page_index(page);
> +
>         VM_BUG_ON_PAGE(PageTail(page), page);
> -       VM_BUG_ON_PAGE(page->index > offset, page);
> -       VM_BUG_ON_PAGE(page->index + (1 << compound_order(page)) <= offset,
> -                       page);
> -       return page - page->index + offset;
> +       VM_BUG_ON_PAGE(index > offset, page);
> +       VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> +       return page - index + offset;
>  }

Even with this patch, it is still able to trigger a panic below by running LTP
mm tests. Always triggered by oom02 (or oom04) at the end.

# /opt/ltp/runltp -f mm

The problem is that in scan_swap_map_slots(),

/* reuse swap entry of cache-only swap if not busy. */
	if (vm_swap_full() && si->swap_map[offset] == SWAP_HAS_CACHE) {
		int swap_was_freed;
		unlock_cluster(ci);
		spin_unlock(&si->lock);
		swap_was_freed = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY);

but that swap entry has already been freed, and the page has PageSwapCache
cleared and page->private is 0.

swp_entry_t entry = swp_entry(si->type, offset)

and then in find_subpage(),

its page->index has a different meaning again and the calculation is now all wrong.

return page - page->index + offset;

[ 7439.033573] oom_reaper: reaped process 47172 (oom02), now anon-rss:0kB,
file-rss:0kB, shmem-rss:0kB
[ 7456.445737] LTP: starting oom03
[ 7456.535940] LTP: starting oom04
[ 7493.077222] page:ffffea00877a13c0 count:1 mapcount:0 mapping:ffff88a79061d009
index:0x7fa81584f
[ 7493.086963] anon
[ 7493.086968] flags: 0x15fffe00008005c(uptodate|dirty|lru|workingset|swapbacked)
[ 7493.097201] raw: 015fffe00008005c ffffea00b4bf9508 ffffea007f45efc8
ffff88a79061d009
[ 7493.105853] raw: 00000007fa81584f 0000000000000000 00000001ffffffff
ffff888f18278008
[ 7493.114504] page dumped because: VM_BUG_ON_PAGE(index + (1 <<
compound_order(page)) <= offset)
[ 7493.124126] page->mem_cgroup:ffff888f18278008
[ 7493.129036] page_owner info is not active (free page?)
[ 7493.134782] ------------[ cut here ]------------
[ 7493.139937] kernel BUG at include/linux/pagemap.h:342!
[ 7493.145682] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 7493.152679] CPU: 5 PID: 47308 Comm: oom04 Kdump: loaded Tainted: G        W
      5.1.0-rc2-mm1+ #13
[ 7493.163068] Hardware name: Lenovo ThinkSystem SR530
-[7X07RCZ000]-/-[7X07RCZ000]-, BIOS -[TEE113T-1.00]- 07/07/2017
[ 7493.174721] RIP: 0010:find_get_entry+0x751/0x9b0
[ 7493.179876] Code: c6 e0 aa a9 8d 4c 89 ff e8 3c 18 0d 00 0f 0b 48 c7 c7 20 40
02 8e e8 a3 17 58 00 48 c7 c6 40 ad a9 8d 4c 89 ff e8 1f 18 0d 00 <0f> 0b 48 c7
c7 e0 3f 02 8e e8 86 17 58 00 48 c7 c7 68 11 3f 8e e8
[ 7493.200834] RSP: 0000:ffff888d50536ba8 EFLAGS: 00010282
[ 7493.206666] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff8cd6401e
[ 7493.214632] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88979e8b5480
[ 7493.222599] RBP: ffff888d50536cb8 R08: ffffed12f3d16a91 R09: ffffed12f3d16a90
[ 7493.230566] R10: ffffed12f3d16a90 R11: ffff88979e8b5487 R12: ffffea00877a13c0
[ 7493.238531] R13: ffffea00877a13c8 R14: ffffea00877a13c8 R15: ffffea00877a13c0
[ 7493.246496] FS:  00007f248398c700(0000) GS:ffff88979e880000(0000)
knlGS:0000000000000000
[ 7493.255527] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7493.261942] CR2: 00007f3fde110000 CR3: 00000011b2fcc003 CR4: 00000000001606a0
[ 7493.269900] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7493.277864] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7493.285830] Call Trace:
[ 7493.288555]  ? queued_spin_lock_slowpath+0x571/0x9e0
[ 7493.294097]  ? __filemap_set_wb_err+0x1f0/0x1f0
[ 7493.299154]  pagecache_get_page+0x4a/0xb70
[ 7493.303729]  __try_to_reclaim_swap+0xa3/0x400
[ 7493.308593]  scan_swap_map_slots+0xc05/0x1850
[ 7493.313447]  ? __try_to_reclaim_swap+0x400/0x400
[ 7493.318600]  ? do_raw_spin_lock+0x128/0x280
[ 7493.323269]  ? rwlock_bug.part.0+0x90/0x90
[ 7493.327840]  ? get_swap_pages+0x195/0x730
[ 7493.332316]  get_swap_pages+0x386/0x730
[ 7493.336590]  get_swap_page+0x2b2/0x643
[ 7493.340774]  ? rmap_walk+0x140/0x140
[ 7493.344765]  ? free_swap_slot+0x3c0/0x3c0
[ 7493.349232]  ? anon_vma_ctor+0xe0/0xe0
[ 7493.353407]  ? page_get_anon_vma+0x280/0x280
[ 7493.358173]  add_to_swap+0x10b/0x230
[ 7493.362164]  shrink_page_list+0x29d8/0x4960
[ 7493.366822]  ? page_evictable+0x11b/0x1d0
[ 7493.371296]  ? page_evictable+0x1d0/0x1d0
[ 7493.375769]  ? __isolate_lru_page+0x880/0x880
[ 7493.380631]  ? __lock_acquire.isra.14+0x7d7/0x2130
[ 7493.385977]  ? shrink_inactive_list+0x484/0x13b0
[ 7493.391130]  ? lock_downgrade+0x760/0x760
[ 7493.395608]  ? kasan_check_read+0x11/0x20
[ 7493.400082]  ? do_raw_spin_unlock+0x59/0x250
[ 7493.404848]  shrink_inactive_list+0x4bf/0x13b0
[ 7493.409823]  ? move_pages_to_lru+0x1c90/0x1c90
[ 7493.414795]  ? kasan_check_read+0x11/0x20
[ 7493.419261]  ? lruvec_lru_size+0xef/0x4c0
[ 7493.423738]  ? call_function_interrupt+0xa/0x20
[ 7493.428800]  ? rcu_all_qs+0x11/0xc0
[ 7493.432692]  shrink_node_memcg+0x66a/0x1ee0
[ 7493.437361]  ? shrink_active_list+0x1150/0x1150
[ 7493.442417]  ? lock_downgrade+0x760/0x760
[ 7493.446891]  ? lock_acquire+0x169/0x360
[ 7493.451177]  ? mem_cgroup_iter+0x210/0xca0
[ 7493.455747]  ? kasan_check_read+0x11/0x20
[ 7493.460221]  ? mem_cgroup_protected+0x94/0x450
[ 7493.465179]  shrink_node+0x266/0x13c0
[ 7493.469267]  ? shrink_node_memcg+0x1ee0/0x1ee0
[ 7493.474230]  ? ktime_get+0xab/0x140
[ 7493.478122]  ? zone_reclaimable_pages+0x553/0x8d0
[ 7493.483371]  do_try_to_free_pages+0x349/0x11e0
[ 7493.488333]  ? allow_direct_reclaim.part.6+0xc3/0x240
[ 7493.493971]  ? shrink_node+0x13c0/0x13c0
[ 7493.498352]  ? queue_delayed_work_on+0x30/0x30
[ 7493.503313]  try_to_free_pages+0x277/0x740
[ 7493.507884]  ? __lock_acquire.isra.14+0x7d7/0x2130
[ 7493.513232]  ? do_try_to_free_pages+0x11e0/0x11e0
[ 7493.518482]  __alloc_pages_nodemask+0xc37/0x2ab0
[ 7493.523635]  ? gfp_pfmemalloc_allowed+0x150/0x150
[ 7493.528886]  ? __lock_acquire.isra.14+0x7d7/0x2130
[ 7493.534226]  ? __lock_acquire.isra.14+0x7d7/0x2130
[ 7493.539566]  ? do_anonymous_page+0x450/0x1e00
[ 7493.544419]  ? lock_downgrade+0x760/0x760
[ 7493.548896]  ? __lru_cache_add+0xc2/0x240
[ 7493.553372]  alloc_pages_vma+0xb2/0x430
[ 7493.557652]  do_anonymous_page+0x50a/0x1e00
[ 7493.562324]  ? put_prev_task_fair+0x27c/0x720
[ 7493.567189]  ? finish_fault+0x290/0x290
[ 7493.571471]  __handle_mm_fault+0x1688/0x3bc0
[ 7493.576227]  ? __lock_acquire.isra.14+0x7d7/0x2130
[ 7493.581574]  ? vmf_insert_mixed_mkwrite+0x20/0x20
[ 7493.586824]  handle_mm_fault+0x326/0x6cf
[ 7493.591203]  __do_page_fault+0x333/0x8d0
[ 7493.595571]  do_page_fault+0x75/0x48e
[ 7493.599660]  ? page_fault+0x5/0x20
[ 7493.603458]  page_fault+0x1b/0x20
[ 7493.607156] RIP: 0033:0x410930
[ 7493.610564] Code: 89 de e8 53 26 ff ff 48 83 f8 ff 0f 84 86 00 00 00 48 89 c5
41 83 fc 02 74 28 41 83 fc 03 74 62 e8 05 2c ff ff 31 d2 48 98 90 <c6> 44 15 00
07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
[ 7493.631521] RSP: 002b:00007f248398bec0 EFLAGS: 00010206
[ 7493.637352] RAX: 0000000000001000 RBX: 00000000c0000000 RCX: 00007f42982ad497
[ 7493.645316] RDX: 000000002a9a3000 RSI: 00000000c0000000 RDI: 0000000000000000
[ 7493.653281] RBP: 00007f230298b000 R08: 00000000ffffffff R09: 0000000000000000
[ 7493.661245] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000001
[ 7493.669209] R13: 00007ffc5b8a54ef R14: 0000000000000000 R15: 00007f248398bfc0
[ 7493.677176] Modules linked in: brd ext4 crc16 mbcache jbd2 overlay loop
nls_iso8859_1 nls_cp437 vfat fat kvm_intel kvm irqbypass efivars ip_tables
x_tables xfs sd_mod i40e ahci libahci megaraid_sas libata dm_mirror
dm_region_hash dm_log dm_mod efivarfs


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: page cache: Store only head pages in i_pages
  2019-03-29  1:43           ` Qian Cai
@ 2019-03-29 19:59             ` Matthew Wilcox
  2019-03-29 21:25               ` Qian Cai
  0 siblings, 1 reply; 21+ messages in thread
From: Matthew Wilcox @ 2019-03-29 19:59 UTC (permalink / raw)
  To: Qian Cai; +Cc: Huang Ying, linux-mm, Kirill A. Shutemov

On Thu, Mar 28, 2019 at 09:43:29PM -0400, Qian Cai wrote:
> On 3/23/19 11:04 PM, Matthew Wilcox wrote> @@ -335,11 +335,12 @@ static inline
> struct page *grab_cache_page_nowait(struct address_space *mapping,
> >  
> >  static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> >  {
> > +       unsigned long index = page_index(page);
> > +
> >         VM_BUG_ON_PAGE(PageTail(page), page);
> > -       VM_BUG_ON_PAGE(page->index > offset, page);
> > -       VM_BUG_ON_PAGE(page->index + (1 << compound_order(page)) <= offset,
> > -                       page);
> > -       return page - page->index + offset;
> > +       VM_BUG_ON_PAGE(index > offset, page);
> > +       VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> > +       return page - index + offset;
> >  }
> 
> Even with this patch, it is still able to trigger a panic below by running LTP
> mm tests. Always triggered by oom02 (or oom04) at the end.
> 
> # /opt/ltp/runltp -f mm
> 
> The problem is that in scan_swap_map_slots(),
> 
> /* reuse swap entry of cache-only swap if not busy. */
> 	if (vm_swap_full() && si->swap_map[offset] == SWAP_HAS_CACHE) {
> 		int swap_was_freed;
> 		unlock_cluster(ci);
> 		spin_unlock(&si->lock);
> 		swap_was_freed = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY);
> 
> but that swap entry has already been freed, and the page has PageSwapCache
> cleared and page->private is 0.

I don't understand how we get to this situation.  We SetPageSwapCache()
in add_to_swap_cache() right before we store the page in i_pages.
We ClearPageSwapCache() in __delete_from_swap_cache() right after
removing the page from the array.  So how do we find a page in a swap
address space that has PageSwapCache cleared?

Indeed, we have a check which should trigger ...

        VM_BUG_ON_PAGE(!PageSwapCache(page), page);

in __delete_from_swap_cache().

Oh ... is it a race?

 * Its ok to check for PageSwapCache without the page lock
 * here because we are going to recheck again inside
 * try_to_free_swap() _with_ the lock.

so CPU A does:

page = find_get_page(swap_address_space(entry), offset)
        page = find_subpage(page, offset);
trylock_page(page);

while CPU B does:

xa_lock_irq(&address_space->i_pages);
__delete_from_swap_cache(page, entry);
        xas_store(&xas, NULL);
        ClearPageSwapCache(page);
xa_unlock_irq(&address_space->i_pages);

and if the ClearPageSwapCache happens between the xas_load() and the
find_subpage(), we're stuffed.  CPU A has a reference to the page, but
not a lock, and find_get_page is running under RCU.

I suppose we could fix this by taking the i_pages xa_lock around the
call to find_get_pages().  If indeed, that's what this problem is.
Want to try this patch?

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 2b8d9c3fbb47..ed8e42be88b5 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -127,10 +127,14 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
 				 unsigned long offset, unsigned long flags)
 {
 	swp_entry_t entry = swp_entry(si->type, offset);
+	struct address_space *mapping = swap_address_space(entry);
+	unsigned long irq_flags;
 	struct page *page;
 	int ret = 0;
 
-	page = find_get_page(swap_address_space(entry), offset);
+	xa_lock_irqsave(&mapping->i_pages, irq_flags);
+	page = find_get_page(mapping, offset);
+	xa_unlock_irqrestore(&mapping->i_pages, irq_flags);
 	if (!page)
 		return 0;
 	/*

> swp_entry_t entry = swp_entry(si->type, offset)
> 
> and then in find_subpage(),
> 
> its page->index has a different meaning again and the calculation is now all wrong.
> 
> return page - page->index + offset;
> 
> [ 7439.033573] oom_reaper: reaped process 47172 (oom02), now anon-rss:0kB,
> file-rss:0kB, shmem-rss:0kB
> [ 7456.445737] LTP: starting oom03
> [ 7456.535940] LTP: starting oom04
> [ 7493.077222] page:ffffea00877a13c0 count:1 mapcount:0 mapping:ffff88a79061d009
> index:0x7fa81584f
> [ 7493.086963] anon
> [ 7493.086968] flags: 0x15fffe00008005c(uptodate|dirty|lru|workingset|swapbacked)
> [ 7493.097201] raw: 015fffe00008005c ffffea00b4bf9508 ffffea007f45efc8
> ffff88a79061d009
> [ 7493.105853] raw: 00000007fa81584f 0000000000000000 00000001ffffffff
> ffff888f18278008
> [ 7493.114504] page dumped because: VM_BUG_ON_PAGE(index + (1 <<
> compound_order(page)) <= offset)
> [ 7493.124126] page->mem_cgroup:ffff888f18278008
> [ 7493.129036] page_owner info is not active (free page?)
> [ 7493.134782] ------------[ cut here ]------------
> [ 7493.139937] kernel BUG at include/linux/pagemap.h:342!
> [ 7493.145682] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
> [ 7493.152679] CPU: 5 PID: 47308 Comm: oom04 Kdump: loaded Tainted: G        W
>       5.1.0-rc2-mm1+ #13
> [ 7493.163068] Hardware name: Lenovo ThinkSystem SR530
> -[7X07RCZ000]-/-[7X07RCZ000]-, BIOS -[TEE113T-1.00]- 07/07/2017
> [ 7493.174721] RIP: 0010:find_get_entry+0x751/0x9b0
> [ 7493.179876] Code: c6 e0 aa a9 8d 4c 89 ff e8 3c 18 0d 00 0f 0b 48 c7 c7 20 40
> 02 8e e8 a3 17 58 00 48 c7 c6 40 ad a9 8d 4c 89 ff e8 1f 18 0d 00 <0f> 0b 48 c7
> c7 e0 3f 02 8e e8 86 17 58 00 48 c7 c7 68 11 3f 8e e8
> [ 7493.200834] RSP: 0000:ffff888d50536ba8 EFLAGS: 00010282
> [ 7493.206666] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff8cd6401e
> [ 7493.214632] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88979e8b5480
> [ 7493.222599] RBP: ffff888d50536cb8 R08: ffffed12f3d16a91 R09: ffffed12f3d16a90
> [ 7493.230566] R10: ffffed12f3d16a90 R11: ffff88979e8b5487 R12: ffffea00877a13c0
> [ 7493.238531] R13: ffffea00877a13c8 R14: ffffea00877a13c8 R15: ffffea00877a13c0
> [ 7493.246496] FS:  00007f248398c700(0000) GS:ffff88979e880000(0000)
> knlGS:0000000000000000
> [ 7493.255527] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 7493.261942] CR2: 00007f3fde110000 CR3: 00000011b2fcc003 CR4: 00000000001606a0
> [ 7493.269900] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 7493.277864] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 7493.285830] Call Trace:
> [ 7493.288555]  ? queued_spin_lock_slowpath+0x571/0x9e0
> [ 7493.294097]  ? __filemap_set_wb_err+0x1f0/0x1f0
> [ 7493.299154]  pagecache_get_page+0x4a/0xb70
> [ 7493.303729]  __try_to_reclaim_swap+0xa3/0x400
> [ 7493.308593]  scan_swap_map_slots+0xc05/0x1850
> [ 7493.313447]  ? __try_to_reclaim_swap+0x400/0x400
> [ 7493.318600]  ? do_raw_spin_lock+0x128/0x280
> [ 7493.323269]  ? rwlock_bug.part.0+0x90/0x90
> [ 7493.327840]  ? get_swap_pages+0x195/0x730
> [ 7493.332316]  get_swap_pages+0x386/0x730
> [ 7493.336590]  get_swap_page+0x2b2/0x643
> [ 7493.340774]  ? rmap_walk+0x140/0x140
> [ 7493.344765]  ? free_swap_slot+0x3c0/0x3c0
> [ 7493.349232]  ? anon_vma_ctor+0xe0/0xe0
> [ 7493.353407]  ? page_get_anon_vma+0x280/0x280
> [ 7493.358173]  add_to_swap+0x10b/0x230
> [ 7493.362164]  shrink_page_list+0x29d8/0x4960
> [ 7493.366822]  ? page_evictable+0x11b/0x1d0
> [ 7493.371296]  ? page_evictable+0x1d0/0x1d0
> [ 7493.375769]  ? __isolate_lru_page+0x880/0x880
> [ 7493.380631]  ? __lock_acquire.isra.14+0x7d7/0x2130
> [ 7493.385977]  ? shrink_inactive_list+0x484/0x13b0
> [ 7493.391130]  ? lock_downgrade+0x760/0x760
> [ 7493.395608]  ? kasan_check_read+0x11/0x20
> [ 7493.400082]  ? do_raw_spin_unlock+0x59/0x250
> [ 7493.404848]  shrink_inactive_list+0x4bf/0x13b0
> [ 7493.409823]  ? move_pages_to_lru+0x1c90/0x1c90
> [ 7493.414795]  ? kasan_check_read+0x11/0x20
> [ 7493.419261]  ? lruvec_lru_size+0xef/0x4c0
> [ 7493.423738]  ? call_function_interrupt+0xa/0x20
> [ 7493.428800]  ? rcu_all_qs+0x11/0xc0
> [ 7493.432692]  shrink_node_memcg+0x66a/0x1ee0
> [ 7493.437361]  ? shrink_active_list+0x1150/0x1150
> [ 7493.442417]  ? lock_downgrade+0x760/0x760
> [ 7493.446891]  ? lock_acquire+0x169/0x360
> [ 7493.451177]  ? mem_cgroup_iter+0x210/0xca0
> [ 7493.455747]  ? kasan_check_read+0x11/0x20
> [ 7493.460221]  ? mem_cgroup_protected+0x94/0x450
> [ 7493.465179]  shrink_node+0x266/0x13c0
> [ 7493.469267]  ? shrink_node_memcg+0x1ee0/0x1ee0
> [ 7493.474230]  ? ktime_get+0xab/0x140
> [ 7493.478122]  ? zone_reclaimable_pages+0x553/0x8d0
> [ 7493.483371]  do_try_to_free_pages+0x349/0x11e0
> [ 7493.488333]  ? allow_direct_reclaim.part.6+0xc3/0x240
> [ 7493.493971]  ? shrink_node+0x13c0/0x13c0
> [ 7493.498352]  ? queue_delayed_work_on+0x30/0x30
> [ 7493.503313]  try_to_free_pages+0x277/0x740
> [ 7493.507884]  ? __lock_acquire.isra.14+0x7d7/0x2130
> [ 7493.513232]  ? do_try_to_free_pages+0x11e0/0x11e0
> [ 7493.518482]  __alloc_pages_nodemask+0xc37/0x2ab0
> [ 7493.523635]  ? gfp_pfmemalloc_allowed+0x150/0x150
> [ 7493.528886]  ? __lock_acquire.isra.14+0x7d7/0x2130
> [ 7493.534226]  ? __lock_acquire.isra.14+0x7d7/0x2130
> [ 7493.539566]  ? do_anonymous_page+0x450/0x1e00
> [ 7493.544419]  ? lock_downgrade+0x760/0x760
> [ 7493.548896]  ? __lru_cache_add+0xc2/0x240
> [ 7493.553372]  alloc_pages_vma+0xb2/0x430
> [ 7493.557652]  do_anonymous_page+0x50a/0x1e00
> [ 7493.562324]  ? put_prev_task_fair+0x27c/0x720
> [ 7493.567189]  ? finish_fault+0x290/0x290
> [ 7493.571471]  __handle_mm_fault+0x1688/0x3bc0
> [ 7493.576227]  ? __lock_acquire.isra.14+0x7d7/0x2130
> [ 7493.581574]  ? vmf_insert_mixed_mkwrite+0x20/0x20
> [ 7493.586824]  handle_mm_fault+0x326/0x6cf
> [ 7493.591203]  __do_page_fault+0x333/0x8d0
> [ 7493.595571]  do_page_fault+0x75/0x48e
> [ 7493.599660]  ? page_fault+0x5/0x20
> [ 7493.603458]  page_fault+0x1b/0x20
> [ 7493.607156] RIP: 0033:0x410930
> [ 7493.610564] Code: 89 de e8 53 26 ff ff 48 83 f8 ff 0f 84 86 00 00 00 48 89 c5
> 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 05 2c ff ff 31 d2 48 98 90 <c6> 44 15 00
> 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
> [ 7493.631521] RSP: 002b:00007f248398bec0 EFLAGS: 00010206
> [ 7493.637352] RAX: 0000000000001000 RBX: 00000000c0000000 RCX: 00007f42982ad497
> [ 7493.645316] RDX: 000000002a9a3000 RSI: 00000000c0000000 RDI: 0000000000000000
> [ 7493.653281] RBP: 00007f230298b000 R08: 00000000ffffffff R09: 0000000000000000
> [ 7493.661245] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000001
> [ 7493.669209] R13: 00007ffc5b8a54ef R14: 0000000000000000 R15: 00007f248398bfc0
> [ 7493.677176] Modules linked in: brd ext4 crc16 mbcache jbd2 overlay loop
> nls_iso8859_1 nls_cp437 vfat fat kvm_intel kvm irqbypass efivars ip_tables
> x_tables xfs sd_mod i40e ahci libahci megaraid_sas libata dm_mirror
> dm_region_hash dm_log dm_mod efivarfs


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: page cache: Store only head pages in i_pages
  2019-03-29 19:59             ` Matthew Wilcox
@ 2019-03-29 21:25               ` Qian Cai
  2019-03-30  3:04                 ` Matthew Wilcox
  0 siblings, 1 reply; 21+ messages in thread
From: Qian Cai @ 2019-03-29 21:25 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Huang Ying, linux-mm, Kirill A. Shutemov

On Fri, 2019-03-29 at 12:59 -0700, Matthew Wilcox wrote:
> I don't understand how we get to this situation.  We SetPageSwapCache()
> in add_to_swap_cache() right before we store the page in i_pages.
> We ClearPageSwapCache() in __delete_from_swap_cache() right after
> removing the page from the array.  So how do we find a page in a swap
> address space that has PageSwapCache cleared?
> 
> Indeed, we have a check which should trigger ...
> 
>         VM_BUG_ON_PAGE(!PageSwapCache(page), page);
> 
> in __delete_from_swap_cache().
> 
> Oh ... is it a race?
> 
>  * Its ok to check for PageSwapCache without the page lock
>  * here because we are going to recheck again inside
>  * try_to_free_swap() _with_ the lock.
> 
> so CPU A does:
> 
> page = find_get_page(swap_address_space(entry), offset)
>         page = find_subpage(page, offset);
> trylock_page(page);
> 
> while CPU B does:
> 
> xa_lock_irq(&address_space->i_pages);
> __delete_from_swap_cache(page, entry);
>         xas_store(&xas, NULL);
>         ClearPageSwapCache(page);
> xa_unlock_irq(&address_space->i_pages);
> 
> and if the ClearPageSwapCache happens between the xas_load() and the
> find_subpage(), we're stuffed.  CPU A has a reference to the page, but
> not a lock, and find_get_page is running under RCU.
> 
> I suppose we could fix this by taking the i_pages xa_lock around the
> call to find_get_pages().  If indeed, that's what this problem is.
> Want to try this patch?

Confirmed. Well spotted!

> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 2b8d9c3fbb47..ed8e42be88b5 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -127,10 +127,14 @@ static int __try_to_reclaim_swap(struct swap_info_struct
> *si,
>  				 unsigned long offset, unsigned long flags)
>  {
>  	swp_entry_t entry = swp_entry(si->type, offset);
> +	struct address_space *mapping = swap_address_space(entry);
> +	unsigned long irq_flags;
>  	struct page *page;
>  	int ret = 0;
>  
> -	page = find_get_page(swap_address_space(entry), offset);
> +	xa_lock_irqsave(&mapping->i_pages, irq_flags);
> +	page = find_get_page(mapping, offset);
> +	xa_unlock_irqrestore(&mapping->i_pages, irq_flags);
>  	if (!page)
>  		return 0;
>  	/*


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: page cache: Store only head pages in i_pages
  2019-03-29 21:25               ` Qian Cai
@ 2019-03-30  3:04                 ` Matthew Wilcox
  2019-03-30 14:10                   ` Matthew Wilcox
  0 siblings, 1 reply; 21+ messages in thread
From: Matthew Wilcox @ 2019-03-30  3:04 UTC (permalink / raw)
  To: Qian Cai; +Cc: Huang Ying, linux-mm, Kirill A. Shutemov

On Fri, Mar 29, 2019 at 05:25:34PM -0400, Qian Cai wrote:
> On Fri, 2019-03-29 at 12:59 -0700, Matthew Wilcox wrote:
> > Oh ... is it a race?
> > 
> > so CPU A does:
> > 
> > page = find_get_page(swap_address_space(entry), offset)
> >         page = find_subpage(page, offset);
> > trylock_page(page);
> > 
> > while CPU B does:
> > 
> > xa_lock_irq(&address_space->i_pages);
> > __delete_from_swap_cache(page, entry);
> >         xas_store(&xas, NULL);
> >         ClearPageSwapCache(page);
> > xa_unlock_irq(&address_space->i_pages);
> > 
> > and if the ClearPageSwapCache happens between the xas_load() and the
> > find_subpage(), we're stuffed.  CPU A has a reference to the page, but
> > not a lock, and find_get_page is running under RCU.
> > 
> > I suppose we could fix this by taking the i_pages xa_lock around the
> > call to find_get_pages().  If indeed, that's what this problem is.
> > Want to try this patch?
> 
> Confirmed. Well spotted!

Excellent!  I'm not comfortable with the rule that you have to be holding
the i_pages lock in order to call find_get_page() on a swap address_space.
How does this look to the various smart people who know far more about the
MM than I do?

The idea is to ensure that if this race does happen, the page will be
handled the same way as a pagecache page.  If __delete_from_swap_cache()
can be called while the page is still part of a VMA, then this patch
will break page_to_pgoff().  But I don't think that can happen ... ?

(also, is this the right memory barrier to use to ensure that the old
value of page->index cannot be observed if the PageSwapCache bit is
obseved as having been cleared?)

diff --git a/mm/swap_state.c b/mm/swap_state.c
index 2e15cc335966..a715efcf0991 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -165,13 +165,16 @@ void __delete_from_swap_cache(struct page *page, swp_entry_t entry)
 	VM_BUG_ON_PAGE(!PageSwapCache(page), page);
 	VM_BUG_ON_PAGE(PageWriteback(page), page);
 
+	page->index = idx;
+	smp_mb__before_atomic();
+	ClearPageSwapCache(page);
+
 	for (i = 0; i < nr; i++) {
 		void *entry = xas_store(&xas, NULL);
 		VM_BUG_ON_PAGE(entry != page, entry);
 		set_page_private(page + i, 0);
 		xas_next(&xas);
 	}
-	ClearPageSwapCache(page);
 	address_space->nrpages -= nr;
 	__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, -nr);
 	ADD_CACHE_INFO(del_total, nr);


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: page cache: Store only head pages in i_pages
  2019-03-30  3:04                 ` Matthew Wilcox
@ 2019-03-30 14:10                   ` Matthew Wilcox
  2019-03-31  3:23                     ` Matthew Wilcox
  0 siblings, 1 reply; 21+ messages in thread
From: Matthew Wilcox @ 2019-03-30 14:10 UTC (permalink / raw)
  To: Qian Cai; +Cc: Huang Ying, linux-mm, Kirill A. Shutemov

On Fri, Mar 29, 2019 at 08:04:32PM -0700, Matthew Wilcox wrote:
> Excellent!  I'm not comfortable with the rule that you have to be holding
> the i_pages lock in order to call find_get_page() on a swap address_space.
> How does this look to the various smart people who know far more about the
> MM than I do?
> 
> The idea is to ensure that if this race does happen, the page will be
> handled the same way as a pagecache page.  If __delete_from_swap_cache()
> can be called while the page is still part of a VMA, then this patch
> will break page_to_pgoff().  But I don't think that can happen ... ?

Oh, blah, that can totally happen.  reuse_swap_page() calls
delete_from_swap_cache().  Need a new plan.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: page cache: Store only head pages in i_pages
  2019-03-30 14:10                   ` Matthew Wilcox
@ 2019-03-31  3:23                     ` Matthew Wilcox
  2019-04-01  9:18                       ` Kirill A. Shutemov
  0 siblings, 1 reply; 21+ messages in thread
From: Matthew Wilcox @ 2019-03-31  3:23 UTC (permalink / raw)
  To: Qian Cai; +Cc: Huang Ying, linux-mm, Kirill A. Shutemov

On Sat, Mar 30, 2019 at 07:10:52AM -0700, Matthew Wilcox wrote:
> On Fri, Mar 29, 2019 at 08:04:32PM -0700, Matthew Wilcox wrote:
> > Excellent!  I'm not comfortable with the rule that you have to be holding
> > the i_pages lock in order to call find_get_page() on a swap address_space.
> > How does this look to the various smart people who know far more about the
> > MM than I do?
> > 
> > The idea is to ensure that if this race does happen, the page will be
> > handled the same way as a pagecache page.  If __delete_from_swap_cache()
> > can be called while the page is still part of a VMA, then this patch
> > will break page_to_pgoff().  But I don't think that can happen ... ?
> 
> Oh, blah, that can totally happen.  reuse_swap_page() calls
> delete_from_swap_cache().  Need a new plan.

I don't see a good solution here that doesn't involve withdrawing this
patch and starting over.  Bad solutions:

 - Take the i_pages lock around each page lookup call in the swap code
   (not just the one you found; there are others like mc_handle_swap_pte()
   in memcontrol.c)
 - Call synchronize_rcu() in __delete_from_swap_cache()
 - Swap the roles of ->index and ->private for swap pages, and then don't
   clear ->index when deleting a page from the swap cache

The first two would be slow and non-scalable.  The third is still prone
to a race where the page is looked up on one CPU, while another CPU
removes it from one swap file then moves it to a different location,
potentially in a different swap file.  Hard to hit, but not a race we
want to introduce.

I believe that the swap code actually never wants to see subpages.  So if
we start again, introducing APIs (eg find_get_head()) which return the
head page, then convert the swap code over to use those APIs, we don't
need to solve the problem of finding the subpage of a swap page while
not holding the page lock.

I'm obviously reluctant to withdraw the patch, but I don't see a better
option.  Your testing has revealed a problem that needs a deeper solution
than just adding a fix patch.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: page cache: Store only head pages in i_pages
  2019-03-31  3:23                     ` Matthew Wilcox
@ 2019-04-01  9:18                       ` Kirill A. Shutemov
  2019-04-01  9:27                         ` Kirill A. Shutemov
  0 siblings, 1 reply; 21+ messages in thread
From: Kirill A. Shutemov @ 2019-04-01  9:18 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Qian Cai, Huang Ying, linux-mm

On Sat, Mar 30, 2019 at 08:23:26PM -0700, Matthew Wilcox wrote:
> On Sat, Mar 30, 2019 at 07:10:52AM -0700, Matthew Wilcox wrote:
> > On Fri, Mar 29, 2019 at 08:04:32PM -0700, Matthew Wilcox wrote:
> > > Excellent!  I'm not comfortable with the rule that you have to be holding
> > > the i_pages lock in order to call find_get_page() on a swap address_space.
> > > How does this look to the various smart people who know far more about the
> > > MM than I do?
> > > 
> > > The idea is to ensure that if this race does happen, the page will be
> > > handled the same way as a pagecache page.  If __delete_from_swap_cache()
> > > can be called while the page is still part of a VMA, then this patch
> > > will break page_to_pgoff().  But I don't think that can happen ... ?
> > 
> > Oh, blah, that can totally happen.  reuse_swap_page() calls
> > delete_from_swap_cache().  Need a new plan.
> 
> I don't see a good solution here that doesn't involve withdrawing this
> patch and starting over.  Bad solutions:
> 
>  - Take the i_pages lock around each page lookup call in the swap code
>    (not just the one you found; there are others like mc_handle_swap_pte()
>    in memcontrol.c)
>  - Call synchronize_rcu() in __delete_from_swap_cache()
>  - Swap the roles of ->index and ->private for swap pages, and then don't
>    clear ->index when deleting a page from the swap cache
> 
> The first two would be slow and non-scalable.  The third is still prone
> to a race where the page is looked up on one CPU, while another CPU
> removes it from one swap file then moves it to a different location,
> potentially in a different swap file.  Hard to hit, but not a race we
> want to introduce.
> 
> I believe that the swap code actually never wants to see subpages.  So if
> we start again, introducing APIs (eg find_get_head()) which return the
> head page, then convert the swap code over to use those APIs, we don't
> need to solve the problem of finding the subpage of a swap page while
> not holding the page lock.
> 
> I'm obviously reluctant to withdraw the patch, but I don't see a better
> option.  Your testing has revealed a problem that needs a deeper solution
> than just adding a fix patch.

Hm. Isn't the problem with VM_BUGs themself? I mean find_subpage()
produces right result (or am I wrong here?), but VM_BUGs flags it as wrong.

Maybe we should relax the VM_BUGs?

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: page cache: Store only head pages in i_pages
  2019-04-01  9:18                       ` Kirill A. Shutemov
@ 2019-04-01  9:27                         ` Kirill A. Shutemov
  2019-04-04 13:10                           ` Qian Cai
  0 siblings, 1 reply; 21+ messages in thread
From: Kirill A. Shutemov @ 2019-04-01  9:27 UTC (permalink / raw)
  To: Matthew Wilcox; +Cc: Qian Cai, Huang Ying, linux-mm

On Mon, Apr 01, 2019 at 12:18:58PM +0300, Kirill A. Shutemov wrote:
> On Sat, Mar 30, 2019 at 08:23:26PM -0700, Matthew Wilcox wrote:
> > On Sat, Mar 30, 2019 at 07:10:52AM -0700, Matthew Wilcox wrote:
> > > On Fri, Mar 29, 2019 at 08:04:32PM -0700, Matthew Wilcox wrote:
> > > > Excellent!  I'm not comfortable with the rule that you have to be holding
> > > > the i_pages lock in order to call find_get_page() on a swap address_space.
> > > > How does this look to the various smart people who know far more about the
> > > > MM than I do?
> > > > 
> > > > The idea is to ensure that if this race does happen, the page will be
> > > > handled the same way as a pagecache page.  If __delete_from_swap_cache()
> > > > can be called while the page is still part of a VMA, then this patch
> > > > will break page_to_pgoff().  But I don't think that can happen ... ?
> > > 
> > > Oh, blah, that can totally happen.  reuse_swap_page() calls
> > > delete_from_swap_cache().  Need a new plan.
> > 
> > I don't see a good solution here that doesn't involve withdrawing this
> > patch and starting over.  Bad solutions:
> > 
> >  - Take the i_pages lock around each page lookup call in the swap code
> >    (not just the one you found; there are others like mc_handle_swap_pte()
> >    in memcontrol.c)
> >  - Call synchronize_rcu() in __delete_from_swap_cache()
> >  - Swap the roles of ->index and ->private for swap pages, and then don't
> >    clear ->index when deleting a page from the swap cache
> > 
> > The first two would be slow and non-scalable.  The third is still prone
> > to a race where the page is looked up on one CPU, while another CPU
> > removes it from one swap file then moves it to a different location,
> > potentially in a different swap file.  Hard to hit, but not a race we
> > want to introduce.
> > 
> > I believe that the swap code actually never wants to see subpages.  So if
> > we start again, introducing APIs (eg find_get_head()) which return the
> > head page, then convert the swap code over to use those APIs, we don't
> > need to solve the problem of finding the subpage of a swap page while
> > not holding the page lock.
> > 
> > I'm obviously reluctant to withdraw the patch, but I don't see a better
> > option.  Your testing has revealed a problem that needs a deeper solution
> > than just adding a fix patch.
> 
> Hm. Isn't the problem with VM_BUGs themself? I mean find_subpage()
> produces right result (or am I wrong here?), but VM_BUGs flags it as wrong.

Yeah, I'm wrong. :P

What about patch like this? (completely untested)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index f939e004c5d1..e3b9bf843dcb 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -335,12 +335,12 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
 
 static inline struct page *find_subpage(struct page *page, pgoff_t offset)
 {
-	unsigned long index = page_index(page);
+	unsigned long mask;
 
 	VM_BUG_ON_PAGE(PageTail(page), page);
-	VM_BUG_ON_PAGE(index > offset, page);
-	VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
-	return page - index + offset;
+
+	mask = (1UL << compound_order(page)) - 1;
+	return page + (offset & mask);
 }
 
 struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: page cache: Store only head pages in i_pages
  2019-04-01  9:27                         ` Kirill A. Shutemov
@ 2019-04-04 13:10                           ` Qian Cai
  2019-04-04 13:45                             ` Kirill A. Shutemov
  0 siblings, 1 reply; 21+ messages in thread
From: Qian Cai @ 2019-04-04 13:10 UTC (permalink / raw)
  To: Kirill A. Shutemov, Matthew Wilcox; +Cc: Huang Ying, linux-mm

On Mon, 2019-04-01 at 12:27 +0300, Kirill A. Shutemov wrote:
> What about patch like this? (completely untested)
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index f939e004c5d1..e3b9bf843dcb 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -335,12 +335,12 @@ static inline struct page *grab_cache_page_nowait(struct
> address_space *mapping,
>  
>  static inline struct page *find_subpage(struct page *page, pgoff_t offset)
>  {
> -	unsigned long index = page_index(page);
> +	unsigned long mask;
>  
>  	VM_BUG_ON_PAGE(PageTail(page), page);
> -	VM_BUG_ON_PAGE(index > offset, page);
> -	VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> -	return page - index + offset;
> +
> +	mask = (1UL << compound_order(page)) - 1;
> +	return page + (offset & mask);
>  }
>  
>  struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);

No, this then leads to a panic below by LTP hugemmap05.  Still reverting the
whole "mm: page cache: store only head pages in i_pages" commit fixed the
problem.

# /opt/ltp/testcases/bin/hugemmap05
tst_test.c:1096: INFO: Timeout per run is 0h 05m 00s
hugemmap05.c:235: INFO: original nr_hugepages is 0
hugemmap05.c:248: INFO: original nr_overcommit_hugepages is 0
hugemmap05.c:116: INFO: check /proc/meminfo before allocation.
hugemmap05.c:297: INFO: HugePages_Total is 192.
hugemmap05.c:297: INFO: HugePages_Free is 192.
hugemmap05.c:297: INFO: HugePages_Surp is 64.
hugemmap05.c:297: INFO: HugePages_Rsvd is 192.
hugemmap05.c:272: INFO: First hex is 7070707
hugemmap05.c:151: INFO: check /proc/meminfo.
hugemmap05.c:297: INFO: HugePages_Total is 192.
hugemmap05.c:297: INFO: HugePages_Free is 0.
hugemmap05.c:297: INFO: HugePages_Surp is 64.
hugemmap05.c:297: INFO: HugePages_Rsvd is 0.


[10022.547977] ------------[ cut here ]------------ 
[10022.571941] kernel BUG at fs/hugetlbfs/inode.c:475! 
[10022.598304] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI 
[10022.626383] CPU: 39 PID: 13074 Comm: hugemmap05 Kdump: loaded Tainted:
G        W         5.1.0-rc3-next-20190403+ #16 
[10022.674421] Hardware name: HP ProLiant XL420 Gen9/ProLiant XL420 Gen9, BIOS
U19 12/27/2015 
[10022.711990] RIP: 0010:remove_inode_hugepages+0x706/0xa60 
[10022.735997] Code: fd ff ff e8 9c a0 99 ff e9 bc fc ff ff 48 c7 c6 40 ae 50 9f
4c 89 f7 e8 c8 3f ca ff 0f 0b 48 c7 c7 80 18 ba 9f e8 2f 63 15 00 <0f> 0b 48 c7
c7 40 18 ba 9f e8 21 63 15 00 48 8b bd 88 fd ff ff e8 
[10022.820547] RSP: 0018:ffff88883ea5f920 EFLAGS: 00010202 
[10022.844039] RAX: 015fffe000002000 RBX: 0000000000000001 RCX:
ffffffff9e2adf5c 
[10022.876130] RDX: 0000000000000001 RSI: 00000000000001df RDI:
ffffea001a0f8048 
[10022.908202] RBP: ffff88883ea5fbf8 R08: fffff9400341f00b R09:
fffff9400341f00a 
[10022.940369] R10: fffff9400341f00a R11: ffffea001a0f8057 R12:
0000000000000001 
[10022.972615] R13: ffff88883ea5fbd0 R14: ffffea001a0f8040 R15:
dffffc0000000000 
[10023.004633] FS:  00007ff5964d7740(0000) GS:ffff888847b80000(0000)
knlGS:0000000000000000 
[10023.040462] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033 
[10023.066242] CR2: 00007ff595800000 CR3: 00000004be5d0006 CR4:
00000000001606a0 
[10023.103426] Call Trace: 
[10023.114997]  ? hugetlbfs_size_to_hpages+0xe0/0xe0 
[10023.136032]  ? fsnotify_grab_connector+0x9f/0x130 
[10023.157131]  ? __lock_acquire.isra.14+0x7d7/0x2130 
[10023.178540]  ? kasan_check_read+0x11/0x20 
[10023.196471]  ? do_raw_spin_unlock+0x59/0x250 
[10023.215893]  hugetlbfs_evict_inode+0x20/0x90 
[10023.235249]  evict+0x2a4/0x5c0 
[10023.249393]  ? do_raw_spin_unlock+0x59/0x250 
[10023.268885]  iput+0x3d9/0x790 
[10023.282210]  do_unlinkat+0x461/0x650 
[10023.298318]  ? __x64_sys_rmdir+0x40/0x40 
[10023.316058]  ? __check_object_size+0x4b4/0x7f1 
[10023.336241]  ? __kasan_kmalloc.constprop.1+0xac/0xc0 
[10023.358681]  ? blkcg_exit_queue+0x1a0/0x1a0 
[10023.377428]  ? getname_flags+0x90/0x400 
[10023.394859]  __x64_sys_unlink+0x3e/0x50 
[10023.411987]  do_syscall_64+0xeb/0xb78 
[10023.428386]  ? syscall_return_slowpath+0x160/0x160 
[10023.449987]  ? __do_page_fault+0x583/0x8d0 
[10023.468333]  ? schedule+0x81/0x180 
[10023.483515]  ? exit_to_usermode_loop+0xab/0x100 
[10023.503763]  entry_SYSCALL_64_after_hwframe+0x44/0xa9 
[10023.526369] RIP: 0033:0x7ff595bbcedb 


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: page cache: Store only head pages in i_pages
  2019-04-04 13:10                           ` Qian Cai
@ 2019-04-04 13:45                             ` Kirill A. Shutemov
  2019-04-04 21:28                               ` Qian Cai
  0 siblings, 1 reply; 21+ messages in thread
From: Kirill A. Shutemov @ 2019-04-04 13:45 UTC (permalink / raw)
  To: Qian Cai; +Cc: Matthew Wilcox, Huang Ying, linux-mm

On Thu, Apr 04, 2019 at 09:10:10AM -0400, Qian Cai wrote:
> On Mon, 2019-04-01 at 12:27 +0300, Kirill A. Shutemov wrote:
> > What about patch like this? (completely untested)
> > 
> > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> > index f939e004c5d1..e3b9bf843dcb 100644
> > --- a/include/linux/pagemap.h
> > +++ b/include/linux/pagemap.h
> > @@ -335,12 +335,12 @@ static inline struct page *grab_cache_page_nowait(struct
> > address_space *mapping,
> >  
> >  static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> >  {
> > -	unsigned long index = page_index(page);
> > +	unsigned long mask;
> >  
> >  	VM_BUG_ON_PAGE(PageTail(page), page);
> > -	VM_BUG_ON_PAGE(index > offset, page);
> > -	VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> > -	return page - index + offset;
> > +
> > +	mask = (1UL << compound_order(page)) - 1;
> > +	return page + (offset & mask);
> >  }
> >  
> >  struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
> 
> No, this then leads to a panic below by LTP hugemmap05.  Still reverting the
> whole "mm: page cache: store only head pages in i_pages" commit fixed the
> problem.

Ughh... hugetlb stores pages in page cache differently.

What about this:

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index f939e004c5d1..2e8438a1216a 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -335,12 +335,15 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
 
 static inline struct page *find_subpage(struct page *page, pgoff_t offset)
 {
-	unsigned long index = page_index(page);
+	unsigned long mask;
+
+	if (PageHuge(page))
+		return page;
 
 	VM_BUG_ON_PAGE(PageTail(page), page);
-	VM_BUG_ON_PAGE(index > offset, page);
-	VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
-	return page - index + offset;
+
+	mask = (1UL << compound_order(page)) - 1;
+	return page + (offset & mask);
 }
 
 struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: page cache: Store only head pages in i_pages
  2019-04-04 13:45                             ` Kirill A. Shutemov
@ 2019-04-04 21:28                               ` Qian Cai
  2019-04-05 13:37                                 ` Kirill A. Shutemov
  0 siblings, 1 reply; 21+ messages in thread
From: Qian Cai @ 2019-04-04 21:28 UTC (permalink / raw)
  To: Kirill A. Shutemov; +Cc: Matthew Wilcox, Huang Ying, linux-mm

On Thu, 2019-04-04 at 16:45 +0300, Kirill A. Shutemov wrote:
> What about this:
> 
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index f939e004c5d1..2e8438a1216a 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -335,12 +335,15 @@ static inline struct page *grab_cache_page_nowait(struct
> address_space *mapping,
>  
>  static inline struct page *find_subpage(struct page *page, pgoff_t offset)
>  {
> -	unsigned long index = page_index(page);
> +	unsigned long mask;
> +
> +	if (PageHuge(page))
> +		return page;
>  
>  	VM_BUG_ON_PAGE(PageTail(page), page);
> -	VM_BUG_ON_PAGE(index > offset, page);
> -	VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> -	return page - index + offset;
> +
> +	mask = (1UL << compound_order(page)) - 1;
> +	return page + (offset & mask);
>  }
>  
>  struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);

It works fine.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: page cache: Store only head pages in i_pages
  2019-04-04 21:28                               ` Qian Cai
@ 2019-04-05 13:37                                 ` Kirill A. Shutemov
  2019-04-05 13:51                                   ` Matthew Wilcox
  0 siblings, 1 reply; 21+ messages in thread
From: Kirill A. Shutemov @ 2019-04-05 13:37 UTC (permalink / raw)
  To: Qian Cai; +Cc: Matthew Wilcox, Huang Ying, linux-mm

On Thu, Apr 04, 2019 at 05:28:02PM -0400, Qian Cai wrote:
> On Thu, 2019-04-04 at 16:45 +0300, Kirill A. Shutemov wrote:
> > What about this:
> > 
> > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> > index f939e004c5d1..2e8438a1216a 100644
> > --- a/include/linux/pagemap.h
> > +++ b/include/linux/pagemap.h
> > @@ -335,12 +335,15 @@ static inline struct page *grab_cache_page_nowait(struct
> > address_space *mapping,
> >  
> >  static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> >  {
> > -	unsigned long index = page_index(page);
> > +	unsigned long mask;
> > +
> > +	if (PageHuge(page))
> > +		return page;
> >  
> >  	VM_BUG_ON_PAGE(PageTail(page), page);
> > -	VM_BUG_ON_PAGE(index > offset, page);
> > -	VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> > -	return page - index + offset;
> > +
> > +	mask = (1UL << compound_order(page)) - 1;
> > +	return page + (offset & mask);
> >  }
> >  
> >  struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
> 
> It works fine.

Nice.

Matthew, does it look fine to you?

-- 
 Kirill A. Shutemov


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: page cache: Store only head pages in i_pages
  2019-04-05 13:37                                 ` Kirill A. Shutemov
@ 2019-04-05 13:51                                   ` Matthew Wilcox
  0 siblings, 0 replies; 21+ messages in thread
From: Matthew Wilcox @ 2019-04-05 13:51 UTC (permalink / raw)
  To: Kirill A. Shutemov, Andrew Morton; +Cc: Qian Cai, Huang Ying, linux-mm

On Fri, Apr 05, 2019 at 04:37:42PM +0300, Kirill A. Shutemov wrote:
> On Thu, Apr 04, 2019 at 05:28:02PM -0400, Qian Cai wrote:
> > On Thu, 2019-04-04 at 16:45 +0300, Kirill A. Shutemov wrote:
> > > What about this:
> > > 
> > > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> > > index f939e004c5d1..2e8438a1216a 100644
> > > --- a/include/linux/pagemap.h
> > > +++ b/include/linux/pagemap.h
> > > @@ -335,12 +335,15 @@ static inline struct page *grab_cache_page_nowait(struct
> > > address_space *mapping,
> > >  
> > >  static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> > >  {
> > > -	unsigned long index = page_index(page);
> > > +	unsigned long mask;
> > > +
> > > +	if (PageHuge(page))
> > > +		return page;
> > >  
> > >  	VM_BUG_ON_PAGE(PageTail(page), page);
> > > -	VM_BUG_ON_PAGE(index > offset, page);
> > > -	VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> > > -	return page - index + offset;
> > > +
> > > +	mask = (1UL << compound_order(page)) - 1;
> > > +	return page + (offset & mask);
> > >  }
> > >  
> > >  struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
> > 
> > It works fine.
> 
> Nice.
> 
> Matthew, does it look fine to you?

Yes, this looks good to me.  I think we'll have opportunities to improve
it later (eg when unifying THP and hugetlbfs uses of the page cache).
But for now, Andrew can you add this version as -fix-fix?


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, back to index

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <1553285568.26196.24.camel@lca.pw>
2019-03-23  3:38 ` page cache: Store only head pages in i_pages Matthew Wilcox
2019-03-23 23:50   ` Qian Cai
2019-03-24  2:06     ` Matthew Wilcox
2019-03-24  2:52       ` Qian Cai
2019-03-24  3:04         ` Matthew Wilcox
2019-03-24 15:42           ` Qian Cai
2019-03-27 10:48           ` William Kucharski
2019-03-27 11:50             ` Matthew Wilcox
2019-03-29  1:43           ` Qian Cai
2019-03-29 19:59             ` Matthew Wilcox
2019-03-29 21:25               ` Qian Cai
2019-03-30  3:04                 ` Matthew Wilcox
2019-03-30 14:10                   ` Matthew Wilcox
2019-03-31  3:23                     ` Matthew Wilcox
2019-04-01  9:18                       ` Kirill A. Shutemov
2019-04-01  9:27                         ` Kirill A. Shutemov
2019-04-04 13:10                           ` Qian Cai
2019-04-04 13:45                             ` Kirill A. Shutemov
2019-04-04 21:28                               ` Qian Cai
2019-04-05 13:37                                 ` Kirill A. Shutemov
2019-04-05 13:51                                   ` Matthew Wilcox

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git