* Re: page cache: Store only head pages in i_pages
From: Matthew Wilcox @ 2019-03-23  3:38 UTC
To: Qian Cai; +Cc: Huang Ying, linux-mm

On Fri, Mar 22, 2019 at 04:12:48PM -0400, Qian Cai wrote:
> FYI, everything that involves swapping seems to trigger a panic now since this patch.

Thanks for the report!  Does this fix it?

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 41858a3744b4..975aea9a49a5 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -335,6 +335,8 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
 static inline struct page *find_subpage(struct page *page, pgoff_t offset)
 {
 	VM_BUG_ON_PAGE(PageTail(page), page);
+	if (unlikely(PageSwapCache(page)))
+		return page;
 	VM_BUG_ON_PAGE(page->index > offset, page);
 	VM_BUG_ON_PAGE(page->index + compound_nr(page) <= offset, page);
 	return page - page->index + offset;

Huang, I'm pretty sure this isn't right for CONFIG_THP_SWAP, but I'm not
sure what the right answer is.  The patch this is on top of includes:

@@ -132,7 +132,7 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp)
 	for (i = 0; i < nr; i++) {
 		VM_BUG_ON_PAGE(xas.xa_index != idx + i, page);
 		set_page_private(page + i, entry.val + i);
-		xas_store(&xas, page + i);
+		xas_store(&xas, page);
 		xas_next(&xas);
 	}
 	address_space->nrpages += nr;

so if we've added a THP page, we're going to find the head page.  I'm not
sure I understand how to get from the head page to the right subpage.
Is it as simple as:

+	if (unlikely(PageSwapCache(page)))
+		return page + (offset & (compound_nr(page) - 1));

or are they not stored at an aligned location?
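The masking question above can be checked with a tiny user-space model. Everything here is illustrative and not kernel API: the struct, the function name, and (crucially) the assumption that the swap cluster holding the THP starts at a naturally aligned offset, which is exactly the point in doubt.

```c
#include <assert.h>

struct page { int id; };

/* Model: the subpages of one compound page are adjacent array slots,
 * and the head page is stored at a swap offset that is a multiple of
 * compound_nr.  Under that assumption, the low bits of the lookup
 * offset select the subpage. */
static struct page *find_subpage_masked(struct page *head,
					unsigned long compound_nr,
					unsigned long offset)
{
	return head + (offset & (compound_nr - 1));
}

/* Head page of a 4-subpage THP stored at aligned swap offsets 4..7;
 * looking up offset 6 should land on subpage 2. */
static int demo_masked_lookup(void)
{
	static struct page thp[4];

	return (int)(find_subpage_masked(thp, 4, 6) - thp);
}
```

If the cluster were not naturally aligned, the mask would select the wrong subpage, which is why the question "or are they not stored at an aligned location?" matters.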
> [11653.484481] page:ffffea0006ef7080 count:2 mapcount:0 mapping:0000000000000000 index:0x0
> [11653.525397] swap_aops
> [11653.525404] flags: 0x5fffe000080454(uptodate|lru|workingset|owner_priv_1|swapbacked)
> [11653.573631] raw: 005fffe000080454 ffffea0006ef7048 ffffea0007c9c7c8 0000000000000000
> [11653.608547] raw: 0000000000000000 0000000000001afd 00000002ffffffff 0000000000000000
> [11653.643436] page dumped because: VM_BUG_ON_PAGE(page->index + (1 << compound_order(page)) <= offset)
> [11653.684322] page allocated via order 0, migratetype Movable, gfp_mask 0x100cca(GFP_HIGHUSER_MOVABLE)
> [11653.725462]  prep_new_page+0x3a4/0x4d0
> [11653.742373]  get_page_from_freelist+0xcde/0x3550
> [11653.763449]  __alloc_pages_nodemask+0x859/0x2ab0
> [11653.784105]  alloc_pages_vma+0xb2/0x430
> [11653.801248]  __read_swap_cache_async+0x49c/0xc30
> [11653.821943]  swap_cluster_readahead+0x4a1/0x8b0
> [11653.842224]  swapin_readahead+0xb6/0xc3e
> [11653.859894]  do_swap_page+0xc87/0x24b0
> [11653.876664]  __handle_mm_fault+0x1601/0x3bc0
> [11653.895818]  handle_mm_fault+0x326/0x6cf
> [11653.913443]  __do_page_fault+0x333/0x8d0
> [11653.931068]  do_page_fault+0x75/0x48e
> [11653.947560]  page_fault+0x1b/0x20
> [11653.962558] ------------[ cut here ]------------
> [11653.983290] kernel BUG at include/linux/pagemap.h:341!
> [11654.006336] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
> [11654.036835] CPU: 12 PID: 14006 Comm: in:imjournal Kdump: loaded Tainted: G        W  5.1.0-rc1-mm1+ #17
> [11654.084191] Hardware name: HP ProLiant DL80 Gen9/ProLiant DL80 Gen9, BIOS U15 09/12/2016
> [11654.120401] RIP: 0010:find_get_entry+0x618/0x7f0
> [11654.141171] Code: c6 60 ae e9 97 4c 89 ff e8 c5 b2 0c 00 0f 0b 48 c7 c7 20 3b 42 98 e8 3c 3c 57 00 48 c7 c6 e0 b0 e9 97 4c 89 ff e8 a8 b2 0c 00 <0f> 0b 48 c7 c7 e0 3a 42 98 e8 1f 3c 57 00 48 c7 c7 88 62 7e 98 e8
> [11654.226539] RSP: 0000:ffff8882095b7628 EFLAGS: 00010286
> [11654.249867] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff9735700e
> [11654.281925] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff888213e4d332
> [11654.313411] RBP: ffff8882095b7738 R08: ffffed1042d86a89 R09: ffffed1042d86a88
> [11654.346314] R10: ffffed1042d86a88 R11: ffff888216c35447 R12: 0000000000000000
> [11654.379144] R13: ffffea0006ef70a0 R14: ffff8881e5c16fc8 R15: ffffea0006ef7080
> [11654.411410] FS:  00007fb8daf56700(0000) GS:ffff888216c00000(0000) knlGS:0000000000000000
> [11654.447618] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [11654.473481] CR2: 00007f74c0647000 CR3: 0000000204860003 CR4: 00000000001606a0
> [11654.506282] Call Trace:
> [11654.517340]  ? __filemap_set_wb_err+0x1f0/0x1f0
> [11654.538684]  ? generic_make_request+0x283/0xc50
> [11654.562239]  ? mem_cgroup_uncharge+0x150/0x150
> [11654.582551]  pagecache_get_page+0x4a/0xb70
> [11654.600807]  ? release_pages+0xada/0x1750
> [11654.618847]  __read_swap_cache_async+0x1a8/0xc30
> [11654.640167]  ? lookup_swap_cache+0x570/0x570
> [11654.659274]  read_swap_cache_async+0x69/0xd0
> [11654.678354]  ? __read_swap_cache_async+0xc30/0xc30
> [11654.699775]  ? lru_add_drain_cpu+0x239/0x4e0
> [11654.718855]  swap_cluster_readahead+0x386/0x8b0
> [11654.739171]  ? read_swap_cache_async+0xd0/0xd0
> [11654.760517]  ? xas_load+0x8b/0xf0
> [11654.775357]  ? find_get_entry+0x39e/0x7f0
> [11654.793267]  swapin_readahead+0xb6/0xc3e
> [11654.810391]  ? exit_swap_address_space+0x1b0/0x1b0
> [11654.831816]  ? lookup_swap_cache+0x114/0x570
> [11654.850972]  ? xas_find+0x141/0x530
> [11654.866616]  ? free_pages_and_swap_cache+0x2f0/0x2f0
> [11654.889495]  ? swapcache_prepare+0x20/0x20
> [11654.907786]  ? filemap_map_pages+0x3af/0xec0
> [11654.926919]  do_swap_page+0xc87/0x24b0
> [11654.943758]  ? unmap_mapping_range+0x30/0x30
> [11654.962918]  ? kasan_check_read+0x11/0x20
> [11654.980900]  ? do_raw_spin_unlock+0x59/0x250
> [11655.000027]  __handle_mm_fault+0x1601/0x3bc0
> [11655.020130]  ? __lock_acquire.isra.14+0x7d7/0x2130
> [11655.041845]  ? vmf_insert_mixed_mkwrite+0x20/0x20
> [11655.066529]  ? lock_acquire+0x169/0x360
> [11655.085380]  handle_mm_fault+0x326/0x6cf
> [11655.102928]  __do_page_fault+0x333/0x8d0
> [11655.120429]  ? task_work_run+0xdd/0x190
> [11655.138322]  do_page_fault+0x75/0x48e
> [11655.154693]  ? page_fault+0x5/0x20
> [11655.170168]  page_fault+0x1b/0x20
> [11655.185007] RIP: 0033:0x7fb8dde3ae14
> [11655.201007] Code: 00 0f 1f 44 00 00 f3 0f 1e fa f2 ff 25 65 1a 29 00 0f 1f 44 00 00 f3 0f 1e fa f2 ff 25 5d 1a 29 00 0f 1f 44 00 00 f3 0f 1e fa <f2> ff 25 55 1a 29 00 0f 1f 44 00 00 f3 0f 1e fa f2 ff 25 4d 1a 29
> [11655.286038] RSP: 002b:00007fb8daf55b28 EFLAGS: 00010246
> [11655.309443] RAX: 0000000000000000 RBX: 0000562a76a844f0 RCX: 0000000000000000
> [11655.341503] RDX: 00007fb8daf55ac0 RSI: 0000000000000001 RDI: 0000562a76a844f0
> [11655.373906] RBP: 00000000000dbba0 R08: 0000000000000008 R09: 0000000000000000
> [11655.407498] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000000e
> [11655.439429] R13: 00007fb8da036760 R14: 00007fb8daf55be0 R15: 00007fb8daf55bd0
> [11655.471484] Modules linked in: brd nls_iso8859_1 nls_cp437 vfat fat ext4 crc16 mbcache jbd2 overlay loop kvm_intel kvm irqbypass ip_tables x_tables xfs sd_mod igb ahci i2c_algo_bit libahci libata i2c_core dm_mirror dm_region_hash dm_log dm_mod

^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: page cache: Store only head pages in i_pages
From: Qian Cai @ 2019-03-23 23:50 UTC
To: Matthew Wilcox; +Cc: Huang Ying, linux-mm

On 3/22/19 11:38 PM, Matthew Wilcox wrote:
> On Fri, Mar 22, 2019 at 04:12:48PM -0400, Qian Cai wrote:
>> FYI, everything that involves swapping seems to trigger a panic now since this patch.
>
> Thanks for the report!  Does this fix it?
>
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 41858a3744b4..975aea9a49a5 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -335,6 +335,8 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
>  static inline struct page *find_subpage(struct page *page, pgoff_t offset)
>  {
>  	VM_BUG_ON_PAGE(PageTail(page), page);
> +	if (unlikely(PageSwapCache(page)))
> +		return page;
>  	VM_BUG_ON_PAGE(page->index > offset, page);
>  	VM_BUG_ON_PAGE(page->index + compound_nr(page) <= offset, page);
>  	return page - page->index + offset;

Yes, it works.
* Re: page cache: Store only head pages in i_pages
From: Matthew Wilcox @ 2019-03-24  2:06 UTC
To: Qian Cai; +Cc: Huang Ying, linux-mm

On Sat, Mar 23, 2019 at 07:50:15PM -0400, Qian Cai wrote:
> On 3/22/19 11:38 PM, Matthew Wilcox wrote:
> > On Fri, Mar 22, 2019 at 04:12:48PM -0400, Qian Cai wrote:
> >> FYI, everything that involves swapping seems to trigger a panic now since this patch.
>
> Yes, it works.

Thanks for testing.  Kirill suggests this would be a better fix:

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 41858a3744b4..9718393ae45b 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -334,10 +334,12 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
 
 static inline struct page *find_subpage(struct page *page, pgoff_t offset)
 {
+	unsigned long index = page_index(page);
+
 	VM_BUG_ON_PAGE(PageTail(page), page);
-	VM_BUG_ON_PAGE(page->index > offset, page);
-	VM_BUG_ON_PAGE(page->index + compound_nr(page) <= offset, page);
-	return page - page->index + offset;
+	VM_BUG_ON_PAGE(index > offset, page);
+	VM_BUG_ON_PAGE(index + compound_nr(page) <= offset, page);
+	return page - index + offset;
 }
 
 struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
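The reason `page_index()` is the better fix: for a page in the swap cache, the kernel derives the cache index from the swap entry stored in `page_private(page)`, while `page->index` holds something else. The sketch below is a simplified user-space model, not the real `struct page` (in particular, it stores the swap offset directly in `private` instead of decoding a packed swap entry):

```c
#include <assert.h>

struct page {
	unsigned long index;	/* meaningful for file/anon pages */
	unsigned long private;	/* models the stored swap offset */
	int swapcache;		/* models PageSwapCache() */
};

/* Modeled on the kernel's page_index(): swap-cache pages get their
 * index from the swap entry, everything else from page->index. */
static unsigned long page_index(const struct page *page)
{
	return page->swapcache ? page->private : page->index;
}

/* A swap-cache page whose page->index is 0 but whose swap offset is
 * 0x1afd: raw page->index would send find_subpage() to the wrong
 * place, page_index() does not. */
static unsigned long demo_swapcache_index(void)
{
	struct page p = { .index = 0, .private = 0x1afd, .swapcache = 1 };

	return page_index(&p);
}
```

This is why the `PageSwapCache()` early return in the first fix and the `page_index()` version behave the same for order-0 swap pages, but only the latter keeps the bound checks meaningful.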
* Re: page cache: Store only head pages in i_pages
From: Qian Cai @ 2019-03-24  2:52 UTC
To: Matthew Wilcox; +Cc: Huang Ying, linux-mm

On 3/23/19 10:06 PM, Matthew Wilcox wrote:
> Thanks for testing.  Kirill suggests this would be a better fix:
>
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 41858a3744b4..9718393ae45b 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -334,10 +334,12 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
>
>  static inline struct page *find_subpage(struct page *page, pgoff_t offset)
>  {
> +	unsigned long index = page_index(page);
> +
>  	VM_BUG_ON_PAGE(PageTail(page), page);
> -	VM_BUG_ON_PAGE(page->index > offset, page);
> -	VM_BUG_ON_PAGE(page->index + compound_nr(page) <= offset, page);
> -	return page - page->index + offset;
> +	VM_BUG_ON_PAGE(index > offset, page);
> +	VM_BUG_ON_PAGE(index + compound_nr(page) <= offset, page);
> +	return page - index + offset;
>  }
>
>  struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);

This doesn't even compile.

With "s/compound_nr/compound_order/" applied, it fails to boot here:

[   56.843236] Unpacking initramfs...
[   56.881979] page:ffff7fe022eb19c0 count:3 mapcount:0 mapping:38ff80080099c008 index:0x0
[   56.890007] ramfs_aops
[   56.890011] name:"lvm.conf"
[   56.892465] flags: 0x17fffffa400000c(uptodate|dirty)
[   56.900318] raw: 017fffffa400000c dead000000000100 dead000000000200 38ff80080099c008
[   56.908066] raw: 0000000000000000 0000000000000000 00000003ffffffff 7bff8008203bcc80
[   56.915812] page dumped because: VM_BUG_ON_PAGE(index + compound_order(page) <= offset)
[   56.923818] page->mem_cgroup:7bff8008203bcc80
[   56.928180] page allocated via order 0, migratetype Unmovable, gfp_mask 0x100cc2(GFP_HIGHUSER)
[   56.936800]  prep_new_page+0x4e0/0x5e0
[   56.940556]  get_page_from_freelist+0x4cf4/0x50e0
[   56.945265]  __alloc_pages_nodemask+0x738/0x38b8
[   56.949888]  alloc_page_interleave+0x34/0x2f0
[   56.954249]  alloc_pages_current+0xc0/0x150
[   56.958439]  __page_cache_alloc+0x70/0x2f4
[   56.962541]  pagecache_get_page+0x5e4/0xaf0
[   56.966729]  grab_cache_page_write_begin+0x6c/0x98
[   56.971526]  simple_write_begin+0x40/0x308
[   56.975627]  generic_perform_write+0x1d4/0x4e0
[   56.980076]  __generic_file_write_iter+0x294/0x504
[   56.984872]  generic_file_write_iter+0x354/0x594
[   56.989496]  __vfs_write+0x72c/0x8a0
[   56.993076]  vfs_write+0x1ec/0x424
[   56.996481]  ksys_write+0xbc/0x190
[   56.999890]  xwrite+0x38/0x84
[   57.002869] ------------[ cut here ]------------
[   57.007478] kernel BUG at ./include/linux/pagemap.h:342!
[   57.012801] Internal error: Oops - BUG: 0 [#1] SMP
[   57.017584] Modules linked in:
[   57.020636] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.1.0-rc1-mm1+ #7
[   57.027239] Hardware name: HPE Apollo 70 /C01_APACHE_MB, BIOS L50_5.13_1.0.6 07/10/2018
[   57.037057] pstate: 60400009 (nZCv daif +PAN -UAO)
[   57.041839] pc : find_get_entry+0x948/0x950
[   57.046013] lr : find_get_entry+0x940/0x950
[   57.050185] sp : 10ff80082600f420
[   57.053489] x29: 10ff80082600f4d0 x28: efff100000000000
[   57.058792] x27: ffff7fe022eb19c8 x26: 0000000000000010
[   57.064095] x25: 0000000000000035 x24: 0000000000000003
[   57.069397] x23: 00000000000000ff x22: 10ff80082600f460
[   57.074700] x21: ffff7fe022eb19c0 x20: 35ff800825f9a050
[   57.080002] x19: 0000000000000000 x18: 0000000000000000
[   57.085304] x17: 0000000000000000 x16: 000000000000000a
[   57.090606] x15: 35ff800825f9a0b8 x14: 0000000000000000
[   57.095908] x13: ffff800825f9a050 x12: 00000000ffffffff
[   57.101210] x11: 0000000000000003 x10: 00000000000000ff
[   57.106512] x9 : e071b95619aca700 x8 : e071b95619aca700
[   57.111814] x7 : 0000000000000000 x6 : ffff1000102d01f4
[   57.117116] x5 : 0000000000000000 x4 : 0000000000000080
[   57.122418] x3 : ffff1000102b84c8 x2 : 0000000000000000
[   57.127720] x1 : 0000000000000004 x0 : ffff100013316b10
[   57.133024] Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
[   57.139715] Call trace:
[   57.142153]  find_get_entry+0x948/0x950
[   57.145979]  pagecache_get_page+0x68/0xaf0
[   57.150066]  grab_cache_page_write_begin+0x6c/0x98
[   57.154847]  simple_write_begin+0x40/0x308
[   57.158934]  generic_perform_write+0x1d4/0x4e0
[   57.163368]  __generic_file_write_iter+0x294/0x504
[   57.168150]  generic_file_write_iter+0x354/0x594
[   57.172757]  __vfs_write+0x72c/0x8a0
[   57.176323]  vfs_write+0x1ec/0x424
[   57.179715]  ksys_write+0xbc/0x190
[   57.183107]  xwrite+0x38/0x84
[   57.186066]  do_copy+0x110/0x898
[   57.189284]  write_buffer+0x148/0x1cc
[   57.192937]  flush_buffer+0x94/0x240
[   57.196505]  __gunzip+0x738/0x8f0
[   57.199810]  gunzip+0x18/0x20
[   57.202768]  unpack_to_rootfs+0x358/0x968
[   57.206769]  populate_rootfs+0x120/0x198
[   57.210684]  do_one_initcall+0x544/0xd00
[   57.214597]  do_initcall_level+0x660/0x814
[   57.218684]  do_basic_setup+0x38/0x50
[   57.222337]  kernel_init_freeable+0x25c/0x444
[   57.226686]  kernel_init+0x1c/0x548
[   57.230165]  ret_from_fork+0x10/0x18
[   57.233733] Code: aa1503e0 94034d79 b0016fe0 912c4000 (d4210000)
[   57.240055] ---[ end trace d7c5c3c62a7fa743 ]---
[   57.244664] Kernel panic - not syncing: Fatal exception
[   57.249997] SMP: stopping secondary CPUs
[   57.254417] ---[ end Kernel panic - not syncing: Fatal exception ]---
* Re: page cache: Store only head pages in i_pages
From: Matthew Wilcox @ 2019-03-24  3:04 UTC
To: Qian Cai; +Cc: Huang Ying, linux-mm

On Sat, Mar 23, 2019 at 10:52:49PM -0400, Qian Cai wrote:
> On 3/23/19 10:06 PM, Matthew Wilcox wrote:
> > Thanks for testing.  Kirill suggests this would be a better fix:
> >
> > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> > index 41858a3744b4..9718393ae45b 100644
> > --- a/include/linux/pagemap.h
> > +++ b/include/linux/pagemap.h
> > @@ -334,10 +334,12 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
> >
> >  static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> >  {
> > +	unsigned long index = page_index(page);
> > +
> >  	VM_BUG_ON_PAGE(PageTail(page), page);
> > -	VM_BUG_ON_PAGE(page->index > offset, page);
> > -	VM_BUG_ON_PAGE(page->index + compound_nr(page) <= offset, page);
> > -	return page - page->index + offset;
> > +	VM_BUG_ON_PAGE(index > offset, page);
> > +	VM_BUG_ON_PAGE(index + compound_nr(page) <= offset, page);
> > +	return page - index + offset;
> >  }
> >
> >  struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
>
> This doesn't even compile.
>
> With "s/compound_nr/compound_order/" applied, it fails to boot here:

Oh, sorry.  I have another patch in that tree:

-	VM_BUG_ON_PAGE(page->index + (1 << compound_order(page)) <= offset,
-			page);
+	VM_BUG_ON_PAGE(page->index + compound_nr(page) <= offset, page);

The patch for you should have looked like this:

@@ -335,11 +335,12 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
 
 static inline struct page *find_subpage(struct page *page, pgoff_t offset)
 {
+	unsigned long index = page_index(page);
+
 	VM_BUG_ON_PAGE(PageTail(page), page);
-	VM_BUG_ON_PAGE(page->index > offset, page);
-	VM_BUG_ON_PAGE(page->index + (1 << compound_order(page)) <= offset,
-			page);
-	return page - page->index + offset;
+	VM_BUG_ON_PAGE(index > offset, page);
+	VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
+	return page - index + offset;
 }

> [   56.915812] page dumped because: VM_BUG_ON_PAGE(index + compound_order(page) <= offset)

Yeah, you were missing the '1 <<' part.
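The off-by-one is easy to model in user space (the function names below are mine, not kernel API): `compound_order()` is a log2, so an order-0 page contributes 0 rather than 1 to the upper bound, and the bound check fires spuriously for the very first lookup where `index == offset`.

```c
#include <assert.h>

/* Broken form of the upper-bound check: adds the order itself. */
static int bug_fires_with_order(unsigned long index, unsigned int order,
				unsigned long offset)
{
	return index + order <= offset;
}

/* Correct form: adds the number of subpages, 1 << order. */
static int bug_fires_with_nr(unsigned long index, unsigned int order,
			     unsigned long offset)
{
	return index + (1UL << order) <= offset;
}
```

For the order-0 ramfs page in the boot log (`index == offset == 0`), the broken form evaluates `0 + 0 <= 0`, which is true, so the VM_BUG_ON panics during initramfs unpacking; the correct form evaluates `0 + 1 <= 0` and passes.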
* Re: page cache: Store only head pages in i_pages
From: Qian Cai @ 2019-03-24 15:42 UTC
To: Matthew Wilcox; +Cc: Huang Ying, linux-mm

On 3/23/19 11:04 PM, Matthew Wilcox wrote:
> The patch for you should have looked like this:
>
> @@ -335,11 +335,12 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
>
>  static inline struct page *find_subpage(struct page *page, pgoff_t offset)
>  {
> +	unsigned long index = page_index(page);
> +
>  	VM_BUG_ON_PAGE(PageTail(page), page);
> -	VM_BUG_ON_PAGE(page->index > offset, page);
> -	VM_BUG_ON_PAGE(page->index + (1 << compound_order(page)) <= offset,
> -			page);
> -	return page - page->index + offset;
> +	VM_BUG_ON_PAGE(index > offset, page);
> +	VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> +	return page - index + offset;
>  }

It works great.
* Re: page cache: Store only head pages in i_pages
From: William Kucharski @ 2019-03-27 10:48 UTC
To: Matthew Wilcox; +Cc: Qian Cai, Huang Ying, linux-mm

> On Mar 23, 2019, at 9:04 PM, Matthew Wilcox <willy@infradead.org> wrote:
>
> static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> {
> +	unsigned long index = page_index(page);
> +
> 	VM_BUG_ON_PAGE(PageTail(page), page);
> -	VM_BUG_ON_PAGE(page->index > offset, page);
> -	VM_BUG_ON_PAGE(page->index + (1 << compound_order(page)) <= offset,
> -			page);
> -	return page - page->index + offset;
> +	VM_BUG_ON_PAGE(index > offset, page);
> +	VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> +	return page - index + offset;
> }
>
>> [   56.915812] page dumped because: VM_BUG_ON_PAGE(index + compound_order(page) <= offset)
>
> Yeah, you were missing the '1 <<' part.

Is a V5 patch coming incorporating these?
* Re: page cache: Store only head pages in i_pages
From: Matthew Wilcox @ 2019-03-27 11:50 UTC
To: William Kucharski; +Cc: Qian Cai, Huang Ying, linux-mm

On Wed, Mar 27, 2019 at 04:48:42AM -0600, William Kucharski wrote:
> > On Mar 23, 2019, at 9:04 PM, Matthew Wilcox <willy@infradead.org> wrote:
> > static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> > {
> > +	unsigned long index = page_index(page);
> > +
> > 	VM_BUG_ON_PAGE(PageTail(page), page);
> > -	VM_BUG_ON_PAGE(page->index > offset, page);
> > -	VM_BUG_ON_PAGE(page->index + (1 << compound_order(page)) <= offset,
> > -			page);
> > -	return page - page->index + offset;
> > +	VM_BUG_ON_PAGE(index > offset, page);
> > +	VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> > +	return page - index + offset;
> > }
> >
> >> [   56.915812] page dumped because: VM_BUG_ON_PAGE(index + compound_order(page) <= offset)
>
> Is a V5 patch coming incorporating these?

No; Andrew prefers to accumulate fixes as separate patches, so that's
what I sent on March 24th; Andrew added it to -mm yesterday.
* Re: page cache: Store only head pages in i_pages
From: Qian Cai @ 2019-03-29  1:43 UTC
To: Matthew Wilcox; +Cc: Huang Ying, linux-mm

On 3/23/19 11:04 PM, Matthew Wilcox wrote:
> @@ -335,11 +335,12 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
>
>  static inline struct page *find_subpage(struct page *page, pgoff_t offset)
>  {
> +	unsigned long index = page_index(page);
> +
>  	VM_BUG_ON_PAGE(PageTail(page), page);
> -	VM_BUG_ON_PAGE(page->index > offset, page);
> -	VM_BUG_ON_PAGE(page->index + (1 << compound_order(page)) <= offset,
> -			page);
> -	return page - page->index + offset;
> +	VM_BUG_ON_PAGE(index > offset, page);
> +	VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> +	return page - index + offset;
>  }

Even with this patch, it is still possible to trigger the panic below by
running the LTP mm tests.  It is always triggered by oom02 (or oom04) at
the end:

# /opt/ltp/runltp -f mm

The problem is that in scan_swap_map_slots(),

	/* reuse swap entry of cache-only swap if not busy. */
	if (vm_swap_full() && si->swap_map[offset] == SWAP_HAS_CACHE) {
		int swap_was_freed;
		unlock_cluster(ci);
		spin_unlock(&si->lock);
		swap_was_freed = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY);

that swap entry has already been freed, and the page has PageSwapCache
cleared and page->private is 0.

	swp_entry_t entry = swp_entry(si->type, offset)

and then in find_subpage(), page->index has a different meaning again
and the calculation is now all wrong:
	return page - page->index + offset;

[ 7439.033573] oom_reaper: reaped process 47172 (oom02), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
[ 7456.445737] LTP: starting oom03
[ 7456.535940] LTP: starting oom04
[ 7493.077222] page:ffffea00877a13c0 count:1 mapcount:0 mapping:ffff88a79061d009 index:0x7fa81584f
[ 7493.086963] anon
[ 7493.086968] flags: 0x15fffe00008005c(uptodate|dirty|lru|workingset|swapbacked)
[ 7493.097201] raw: 015fffe00008005c ffffea00b4bf9508 ffffea007f45efc8 ffff88a79061d009
[ 7493.105853] raw: 00000007fa81584f 0000000000000000 00000001ffffffff ffff888f18278008
[ 7493.114504] page dumped because: VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset)
[ 7493.124126] page->mem_cgroup:ffff888f18278008
[ 7493.129036] page_owner info is not active (free page?)
[ 7493.134782] ------------[ cut here ]------------
[ 7493.139937] kernel BUG at include/linux/pagemap.h:342!
[ 7493.145682] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 7493.152679] CPU: 5 PID: 47308 Comm: oom04 Kdump: loaded Tainted: G        W  5.1.0-rc2-mm1+ #13
[ 7493.163068] Hardware name: Lenovo ThinkSystem SR530 -[7X07RCZ000]-/-[7X07RCZ000]-, BIOS -[TEE113T-1.00]- 07/07/2017
[ 7493.174721] RIP: 0010:find_get_entry+0x751/0x9b0
[ 7493.179876] Code: c6 e0 aa a9 8d 4c 89 ff e8 3c 18 0d 00 0f 0b 48 c7 c7 20 40 02 8e e8 a3 17 58 00 48 c7 c6 40 ad a9 8d 4c 89 ff e8 1f 18 0d 00 <0f> 0b 48 c7 c7 e0 3f 02 8e e8 86 17 58 00 48 c7 c7 68 11 3f 8e e8
[ 7493.200834] RSP: 0000:ffff888d50536ba8 EFLAGS: 00010282
[ 7493.206666] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff8cd6401e
[ 7493.214632] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88979e8b5480
[ 7493.222599] RBP: ffff888d50536cb8 R08: ffffed12f3d16a91 R09: ffffed12f3d16a90
[ 7493.230566] R10: ffffed12f3d16a90 R11: ffff88979e8b5487 R12: ffffea00877a13c0
[ 7493.238531] R13: ffffea00877a13c8 R14: ffffea00877a13c8 R15: ffffea00877a13c0
[ 7493.246496] FS:  00007f248398c700(0000) GS:ffff88979e880000(0000) knlGS:0000000000000000
[ 7493.255527] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7493.261942] CR2: 00007f3fde110000 CR3: 00000011b2fcc003 CR4: 00000000001606a0
[ 7493.269900] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7493.277864] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7493.285830] Call Trace:
[ 7493.288555]  ? queued_spin_lock_slowpath+0x571/0x9e0
[ 7493.294097]  ? __filemap_set_wb_err+0x1f0/0x1f0
[ 7493.299154]  pagecache_get_page+0x4a/0xb70
[ 7493.303729]  __try_to_reclaim_swap+0xa3/0x400
[ 7493.308593]  scan_swap_map_slots+0xc05/0x1850
[ 7493.313447]  ? __try_to_reclaim_swap+0x400/0x400
[ 7493.318600]  ? do_raw_spin_lock+0x128/0x280
[ 7493.323269]  ? rwlock_bug.part.0+0x90/0x90
[ 7493.327840]  ? get_swap_pages+0x195/0x730
[ 7493.332316]  get_swap_pages+0x386/0x730
[ 7493.336590]  get_swap_page+0x2b2/0x643
[ 7493.340774]  ? rmap_walk+0x140/0x140
[ 7493.344765]  ? free_swap_slot+0x3c0/0x3c0
[ 7493.349232]  ? anon_vma_ctor+0xe0/0xe0
[ 7493.353407]  ? page_get_anon_vma+0x280/0x280
[ 7493.358173]  add_to_swap+0x10b/0x230
[ 7493.362164]  shrink_page_list+0x29d8/0x4960
[ 7493.366822]  ? page_evictable+0x11b/0x1d0
[ 7493.371296]  ? page_evictable+0x1d0/0x1d0
[ 7493.375769]  ? __isolate_lru_page+0x880/0x880
[ 7493.380631]  ? __lock_acquire.isra.14+0x7d7/0x2130
[ 7493.385977]  ? shrink_inactive_list+0x484/0x13b0
[ 7493.391130]  ? lock_downgrade+0x760/0x760
[ 7493.395608]  ? kasan_check_read+0x11/0x20
[ 7493.400082]  ? do_raw_spin_unlock+0x59/0x250
[ 7493.404848]  shrink_inactive_list+0x4bf/0x13b0
[ 7493.409823]  ? move_pages_to_lru+0x1c90/0x1c90
[ 7493.414795]  ? kasan_check_read+0x11/0x20
[ 7493.419261]  ? lruvec_lru_size+0xef/0x4c0
[ 7493.423738]  ? call_function_interrupt+0xa/0x20
[ 7493.428800]  ? rcu_all_qs+0x11/0xc0
[ 7493.432692]  shrink_node_memcg+0x66a/0x1ee0
[ 7493.437361]  ? shrink_active_list+0x1150/0x1150
[ 7493.442417]  ? lock_downgrade+0x760/0x760
[ 7493.446891]  ? lock_acquire+0x169/0x360
[ 7493.451177]  ? mem_cgroup_iter+0x210/0xca0
[ 7493.455747]  ? kasan_check_read+0x11/0x20
[ 7493.460221]  ? mem_cgroup_protected+0x94/0x450
[ 7493.465179]  shrink_node+0x266/0x13c0
[ 7493.469267]  ? shrink_node_memcg+0x1ee0/0x1ee0
[ 7493.474230]  ? ktime_get+0xab/0x140
[ 7493.478122]  ? zone_reclaimable_pages+0x553/0x8d0
[ 7493.483371]  do_try_to_free_pages+0x349/0x11e0
[ 7493.488333]  ? allow_direct_reclaim.part.6+0xc3/0x240
[ 7493.493971]  ? shrink_node+0x13c0/0x13c0
[ 7493.498352]  ? queue_delayed_work_on+0x30/0x30
[ 7493.503313]  try_to_free_pages+0x277/0x740
[ 7493.507884]  ? __lock_acquire.isra.14+0x7d7/0x2130
[ 7493.513232]  ? do_try_to_free_pages+0x11e0/0x11e0
[ 7493.518482]  __alloc_pages_nodemask+0xc37/0x2ab0
[ 7493.523635]  ? gfp_pfmemalloc_allowed+0x150/0x150
[ 7493.528886]  ? __lock_acquire.isra.14+0x7d7/0x2130
[ 7493.534226]  ? __lock_acquire.isra.14+0x7d7/0x2130
[ 7493.539566]  ? do_anonymous_page+0x450/0x1e00
[ 7493.544419]  ? lock_downgrade+0x760/0x760
[ 7493.548896]  ? __lru_cache_add+0xc2/0x240
[ 7493.553372]  alloc_pages_vma+0xb2/0x430
[ 7493.557652]  do_anonymous_page+0x50a/0x1e00
[ 7493.562324]  ? put_prev_task_fair+0x27c/0x720
[ 7493.567189]  ? finish_fault+0x290/0x290
[ 7493.571471]  __handle_mm_fault+0x1688/0x3bc0
[ 7493.576227]  ? __lock_acquire.isra.14+0x7d7/0x2130
[ 7493.581574]  ? vmf_insert_mixed_mkwrite+0x20/0x20
[ 7493.586824]  handle_mm_fault+0x326/0x6cf
[ 7493.591203]  __do_page_fault+0x333/0x8d0
[ 7493.595571]  do_page_fault+0x75/0x48e
[ 7493.599660]  ? page_fault+0x5/0x20
[ 7493.603458]  page_fault+0x1b/0x20
[ 7493.607156] RIP: 0033:0x410930
[ 7493.610564] Code: 89 de e8 53 26 ff ff 48 83 f8 ff 0f 84 86 00 00 00 48 89 c5 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 05 2c ff ff 31 d2 48 98 90 <c6> 44 15 00 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
[ 7493.631521] RSP: 002b:00007f248398bec0 EFLAGS: 00010206
[ 7493.637352] RAX: 0000000000001000 RBX: 00000000c0000000 RCX: 00007f42982ad497
[ 7493.645316] RDX: 000000002a9a3000 RSI: 00000000c0000000 RDI: 0000000000000000
[ 7493.653281] RBP: 00007f230298b000 R08: 00000000ffffffff R09: 0000000000000000
[ 7493.661245] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000001
[ 7493.669209] R13: 00007ffc5b8a54ef R14: 0000000000000000 R15: 00007f248398bfc0
[ 7493.677176] Modules linked in: brd ext4 crc16 mbcache jbd2 overlay loop nls_iso8859_1 nls_cp437 vfat fat kvm_intel kvm irqbypass efivars ip_tables x_tables xfs sd_mod i40e ahci libahci megaraid_sas libata dm_mirror dm_region_hash dm_log dm_mod efivarfs
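The `swp_entry(si->type, offset)` used in the report above packs the swap device type and the page offset into one word. A simplified user-space model of that packing (the real layout in the kernel's swapops.h differs per architecture; `DEMO_TYPE_SHIFT` and all names here are assumptions for illustration only):

```c
#include <assert.h>

#define DEMO_TYPE_SHIFT 58	/* made-up split, not the kernel's value */

typedef struct { unsigned long val; } swp_entry_t;

/* Pack a (type, offset) pair into one word, as the kernel does. */
static swp_entry_t swp_entry(int type, unsigned long offset)
{
	swp_entry_t e = { ((unsigned long)type << DEMO_TYPE_SHIFT) | offset };
	return e;
}

static int swp_type(swp_entry_t e)
{
	return (int)(e.val >> DEMO_TYPE_SHIFT);
}

static unsigned long swp_offset(swp_entry_t e)
{
	return e.val & ((1UL << DEMO_TYPE_SHIFT) - 1);
}
```

The point of the model: while a page sits in the swap cache, this packed value lives in `page_private()`, so the cache index can be recovered from it; once the entry is freed and `private` is cleared, nothing on the page identifies the slot any more, which is the state the report describes.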
* Re: page cache: Store only head pages in i_pages
From: Matthew Wilcox @ 2019-03-29 19:59 UTC
To: Qian Cai; +Cc: Huang Ying, linux-mm, Kirill A. Shutemov

On Thu, Mar 28, 2019 at 09:43:29PM -0400, Qian Cai wrote:
> On 3/23/19 11:04 PM, Matthew Wilcox wrote:
> > @@ -335,11 +335,12 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
> >
> >  static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> >  {
> > +	unsigned long index = page_index(page);
> > +
> >  	VM_BUG_ON_PAGE(PageTail(page), page);
> > -	VM_BUG_ON_PAGE(page->index > offset, page);
> > -	VM_BUG_ON_PAGE(page->index + (1 << compound_order(page)) <= offset,
> > -			page);
> > -	return page - page->index + offset;
> > +	VM_BUG_ON_PAGE(index > offset, page);
> > +	VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> > +	return page - index + offset;
> >  }
>
> Even with this patch, it is still possible to trigger the panic below by
> running the LTP mm tests.  It is always triggered by oom02 (or oom04) at
> the end:
>
> # /opt/ltp/runltp -f mm
>
> The problem is that in scan_swap_map_slots(),
>
> 	/* reuse swap entry of cache-only swap if not busy. */
> 	if (vm_swap_full() && si->swap_map[offset] == SWAP_HAS_CACHE) {
> 		int swap_was_freed;
> 		unlock_cluster(ci);
> 		spin_unlock(&si->lock);
> 		swap_was_freed = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY);
>
> but that swap entry has already been freed, and the page has PageSwapCache
> cleared and page->private is 0.

I don't understand how we get to this situation.  We SetPageSwapCache()
in add_to_swap_cache() right before we store the page in i_pages.
We ClearPageSwapCache() in __delete_from_swap_cache() right after
removing the page from the array.  So how do we find a page in a swap
address space that has PageSwapCache cleared?

Indeed, we have a check which should trigger ...

	VM_BUG_ON_PAGE(!PageSwapCache(page), page);

in __delete_from_swap_cache().

Oh ... is it a race?

	 * Its ok to check for PageSwapCache without the page lock
	 * here because we are going to recheck again inside
	 * try_to_free_swap() _with_ the lock.

so CPU A does:

	page = find_get_page(swap_address_space(entry), offset)
		page = find_subpage(page, offset);
	trylock_page(page);

while CPU B does:

	xa_lock_irq(&address_space->i_pages);
	__delete_from_swap_cache(page, entry);
		xas_store(&xas, NULL);
		ClearPageSwapCache(page);
	xa_unlock_irq(&address_space->i_pages);

and if the ClearPageSwapCache() happens between the xas_load() and the
find_subpage(), we're stuffed.  CPU A has a reference to the page, but
not a lock, and find_get_page() is running under RCU.

I suppose we could fix this by taking the i_pages xa_lock around the
call to find_get_page().  If indeed, that's what this problem is.
Want to try this patch?

diff --git a/mm/swapfile.c b/mm/swapfile.c
index 2b8d9c3fbb47..ed8e42be88b5 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -127,10 +127,14 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
 				unsigned long offset, unsigned long flags)
 {
 	swp_entry_t entry = swp_entry(si->type, offset);
+	struct address_space *mapping = swap_address_space(entry);
+	unsigned long irq_flags;
 	struct page *page;
 	int ret = 0;
 
-	page = find_get_page(swap_address_space(entry), offset);
+	xa_lock_irqsave(&mapping->i_pages, irq_flags);
+	page = find_get_page(mapping, offset);
+	xa_unlock_irqrestore(&mapping->i_pages, irq_flags);
 	if (!page)
 		return 0;
 	/*

> 	swp_entry_t entry = swp_entry(si->type, offset)
>
> and then in find_subpage(), page->index has a different meaning again
> and the calculation is now all wrong.
> > return page - page->index + offset; > > [ 7439.033573] oom_reaper: reaped process 47172 (oom02), now anon-rss:0kB, > file-rss:0kB, shmem-rss:0kB > [ 7456.445737] LTP: starting oom03 > [ 7456.535940] LTP: starting oom04 > [ 7493.077222] page:ffffea00877a13c0 count:1 mapcount:0 mapping:ffff88a79061d009 > index:0x7fa81584f > [ 7493.086963] anon > [ 7493.086968] flags: 0x15fffe00008005c(uptodate|dirty|lru|workingset|swapbacked) > [ 7493.097201] raw: 015fffe00008005c ffffea00b4bf9508 ffffea007f45efc8 > ffff88a79061d009 > [ 7493.105853] raw: 00000007fa81584f 0000000000000000 00000001ffffffff > ffff888f18278008 > [ 7493.114504] page dumped because: VM_BUG_ON_PAGE(index + (1 << > compound_order(page)) <= offset) > [ 7493.124126] page->mem_cgroup:ffff888f18278008 > [ 7493.129036] page_owner info is not active (free page?) > [ 7493.134782] ------------[ cut here ]------------ > [ 7493.139937] kernel BUG at include/linux/pagemap.h:342! > [ 7493.145682] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI > [ 7493.152679] CPU: 5 PID: 47308 Comm: oom04 Kdump: loaded Tainted: G W > 5.1.0-rc2-mm1+ #13 > [ 7493.163068] Hardware name: Lenovo ThinkSystem SR530 > -[7X07RCZ000]-/-[7X07RCZ000]-, BIOS -[TEE113T-1.00]- 07/07/2017 > [ 7493.174721] RIP: 0010:find_get_entry+0x751/0x9b0 > [ 7493.179876] Code: c6 e0 aa a9 8d 4c 89 ff e8 3c 18 0d 00 0f 0b 48 c7 c7 20 40 > 02 8e e8 a3 17 58 00 48 c7 c6 40 ad a9 8d 4c 89 ff e8 1f 18 0d 00 <0f> 0b 48 c7 > c7 e0 3f 02 8e e8 86 17 58 00 48 c7 c7 68 11 3f 8e e8 > [ 7493.200834] RSP: 0000:ffff888d50536ba8 EFLAGS: 00010282 > [ 7493.206666] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff8cd6401e > [ 7493.214632] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88979e8b5480 > [ 7493.222599] RBP: ffff888d50536cb8 R08: ffffed12f3d16a91 R09: ffffed12f3d16a90 > [ 7493.230566] R10: ffffed12f3d16a90 R11: ffff88979e8b5487 R12: ffffea00877a13c0 > [ 7493.238531] R13: ffffea00877a13c8 R14: ffffea00877a13c8 R15: ffffea00877a13c0 > [ 
7493.246496] FS: 00007f248398c700(0000) GS:ffff88979e880000(0000) > knlGS:0000000000000000 > [ 7493.255527] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 7493.261942] CR2: 00007f3fde110000 CR3: 00000011b2fcc003 CR4: 00000000001606a0 > [ 7493.269900] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > [ 7493.277864] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > [ 7493.285830] Call Trace: > [ 7493.288555] ? queued_spin_lock_slowpath+0x571/0x9e0 > [ 7493.294097] ? __filemap_set_wb_err+0x1f0/0x1f0 > [ 7493.299154] pagecache_get_page+0x4a/0xb70 > [ 7493.303729] __try_to_reclaim_swap+0xa3/0x400 > [ 7493.308593] scan_swap_map_slots+0xc05/0x1850 > [ 7493.313447] ? __try_to_reclaim_swap+0x400/0x400 > [ 7493.318600] ? do_raw_spin_lock+0x128/0x280 > [ 7493.323269] ? rwlock_bug.part.0+0x90/0x90 > [ 7493.327840] ? get_swap_pages+0x195/0x730 > [ 7493.332316] get_swap_pages+0x386/0x730 > [ 7493.336590] get_swap_page+0x2b2/0x643 > [ 7493.340774] ? rmap_walk+0x140/0x140 > [ 7493.344765] ? free_swap_slot+0x3c0/0x3c0 > [ 7493.349232] ? anon_vma_ctor+0xe0/0xe0 > [ 7493.353407] ? page_get_anon_vma+0x280/0x280 > [ 7493.358173] add_to_swap+0x10b/0x230 > [ 7493.362164] shrink_page_list+0x29d8/0x4960 > [ 7493.366822] ? page_evictable+0x11b/0x1d0 > [ 7493.371296] ? page_evictable+0x1d0/0x1d0 > [ 7493.375769] ? __isolate_lru_page+0x880/0x880 > [ 7493.380631] ? __lock_acquire.isra.14+0x7d7/0x2130 > [ 7493.385977] ? shrink_inactive_list+0x484/0x13b0 > [ 7493.391130] ? lock_downgrade+0x760/0x760 > [ 7493.395608] ? kasan_check_read+0x11/0x20 > [ 7493.400082] ? do_raw_spin_unlock+0x59/0x250 > [ 7493.404848] shrink_inactive_list+0x4bf/0x13b0 > [ 7493.409823] ? move_pages_to_lru+0x1c90/0x1c90 > [ 7493.414795] ? kasan_check_read+0x11/0x20 > [ 7493.419261] ? lruvec_lru_size+0xef/0x4c0 > [ 7493.423738] ? call_function_interrupt+0xa/0x20 > [ 7493.428800] ? rcu_all_qs+0x11/0xc0 > [ 7493.432692] shrink_node_memcg+0x66a/0x1ee0 > [ 7493.437361] ? 
shrink_active_list+0x1150/0x1150 > [ 7493.442417] ? lock_downgrade+0x760/0x760 > [ 7493.446891] ? lock_acquire+0x169/0x360 > [ 7493.451177] ? mem_cgroup_iter+0x210/0xca0 > [ 7493.455747] ? kasan_check_read+0x11/0x20 > [ 7493.460221] ? mem_cgroup_protected+0x94/0x450 > [ 7493.465179] shrink_node+0x266/0x13c0 > [ 7493.469267] ? shrink_node_memcg+0x1ee0/0x1ee0 > [ 7493.474230] ? ktime_get+0xab/0x140 > [ 7493.478122] ? zone_reclaimable_pages+0x553/0x8d0 > [ 7493.483371] do_try_to_free_pages+0x349/0x11e0 > [ 7493.488333] ? allow_direct_reclaim.part.6+0xc3/0x240 > [ 7493.493971] ? shrink_node+0x13c0/0x13c0 > [ 7493.498352] ? queue_delayed_work_on+0x30/0x30 > [ 7493.503313] try_to_free_pages+0x277/0x740 > [ 7493.507884] ? __lock_acquire.isra.14+0x7d7/0x2130 > [ 7493.513232] ? do_try_to_free_pages+0x11e0/0x11e0 > [ 7493.518482] __alloc_pages_nodemask+0xc37/0x2ab0 > [ 7493.523635] ? gfp_pfmemalloc_allowed+0x150/0x150 > [ 7493.528886] ? __lock_acquire.isra.14+0x7d7/0x2130 > [ 7493.534226] ? __lock_acquire.isra.14+0x7d7/0x2130 > [ 7493.539566] ? do_anonymous_page+0x450/0x1e00 > [ 7493.544419] ? lock_downgrade+0x760/0x760 > [ 7493.548896] ? __lru_cache_add+0xc2/0x240 > [ 7493.553372] alloc_pages_vma+0xb2/0x430 > [ 7493.557652] do_anonymous_page+0x50a/0x1e00 > [ 7493.562324] ? put_prev_task_fair+0x27c/0x720 > [ 7493.567189] ? finish_fault+0x290/0x290 > [ 7493.571471] __handle_mm_fault+0x1688/0x3bc0 > [ 7493.576227] ? __lock_acquire.isra.14+0x7d7/0x2130 > [ 7493.581574] ? vmf_insert_mixed_mkwrite+0x20/0x20 > [ 7493.586824] handle_mm_fault+0x326/0x6cf > [ 7493.591203] __do_page_fault+0x333/0x8d0 > [ 7493.595571] do_page_fault+0x75/0x48e > [ 7493.599660] ? 
page_fault+0x5/0x20 > [ 7493.603458] page_fault+0x1b/0x20 > [ 7493.607156] RIP: 0033:0x410930 > [ 7493.610564] Code: 89 de e8 53 26 ff ff 48 83 f8 ff 0f 84 86 00 00 00 48 89 c5 > 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 05 2c ff ff 31 d2 48 98 90 <c6> 44 15 00 > 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f > [ 7493.631521] RSP: 002b:00007f248398bec0 EFLAGS: 00010206 > [ 7493.637352] RAX: 0000000000001000 RBX: 00000000c0000000 RCX: 00007f42982ad497 > [ 7493.645316] RDX: 000000002a9a3000 RSI: 00000000c0000000 RDI: 0000000000000000 > [ 7493.653281] RBP: 00007f230298b000 R08: 00000000ffffffff R09: 0000000000000000 > [ 7493.661245] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000001 > [ 7493.669209] R13: 00007ffc5b8a54ef R14: 0000000000000000 R15: 00007f248398bfc0 > [ 7493.677176] Modules linked in: brd ext4 crc16 mbcache jbd2 overlay loop > nls_iso8859_1 nls_cp437 vfat fat kvm_intel kvm irqbypass efivars ip_tables > x_tables xfs sd_mod i40e ahci libahci megaraid_sas libata dm_mirror > dm_region_hash dm_log dm_mod efivarfs ^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: page cache: Store only head pages in i_pages 2019-03-29 19:59 ` Matthew Wilcox @ 2019-03-29 21:25 ` Qian Cai 2019-03-30 3:04 ` Matthew Wilcox 0 siblings, 1 reply; 21+ messages in thread From: Qian Cai @ 2019-03-29 21:25 UTC (permalink / raw) To: Matthew Wilcox; +Cc: Huang Ying, linux-mm, Kirill A. Shutemov On Fri, 2019-03-29 at 12:59 -0700, Matthew Wilcox wrote: > I don't understand how we get to this situation. We SetPageSwapCache() > in add_to_swap_cache() right before we store the page in i_pages. > We ClearPageSwapCache() in __delete_from_swap_cache() right after > removing the page from the array. So how do we find a page in a swap > address space that has PageSwapCache cleared? > > Indeed, we have a check which should trigger ... > > VM_BUG_ON_PAGE(!PageSwapCache(page), page); > > in __delete_from_swap_cache(). > > Oh ... is it a race? > > * Its ok to check for PageSwapCache without the page lock > * here because we are going to recheck again inside > * try_to_free_swap() _with_ the lock. > > so CPU A does: > > page = find_get_page(swap_address_space(entry), offset) > page = find_subpage(page, offset); > trylock_page(page); > > while CPU B does: > > xa_lock_irq(&address_space->i_pages); > __delete_from_swap_cache(page, entry); > xas_store(&xas, NULL); > ClearPageSwapCache(page); > xa_unlock_irq(&address_space->i_pages); > > and if the ClearPageSwapCache happens between the xas_load() and the > find_subpage(), we're stuffed. CPU A has a reference to the page, but > not a lock, and find_get_page is running under RCU. > > I suppose we could fix this by taking the i_pages xa_lock around the > call to find_get_pages(). If indeed, that's what this problem is. > Want to try this patch? Confirmed. Well spotted! 
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 2b8d9c3fbb47..ed8e42be88b5 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -127,10 +127,14 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
>  					unsigned long offset, unsigned long flags)
>  {
>  	swp_entry_t entry = swp_entry(si->type, offset);
> +	struct address_space *mapping = swap_address_space(entry);
> +	unsigned long irq_flags;
>  	struct page *page;
>  	int ret = 0;
>  
> -	page = find_get_page(swap_address_space(entry), offset);
> +	xa_lock_irqsave(&mapping->i_pages, irq_flags);
> +	page = find_get_page(mapping, offset);
> +	xa_unlock_irqrestore(&mapping->i_pages, irq_flags);
>  	if (!page)
>  		return 0;
>  	/*
* Re: page cache: Store only head pages in i_pages 2019-03-29 21:25 ` Qian Cai @ 2019-03-30 3:04 ` Matthew Wilcox 2019-03-30 14:10 ` Matthew Wilcox 0 siblings, 1 reply; 21+ messages in thread From: Matthew Wilcox @ 2019-03-30 3:04 UTC (permalink / raw) To: Qian Cai; +Cc: Huang Ying, linux-mm, Kirill A. Shutemov On Fri, Mar 29, 2019 at 05:25:34PM -0400, Qian Cai wrote: > On Fri, 2019-03-29 at 12:59 -0700, Matthew Wilcox wrote: > > Oh ... is it a race? > > > > so CPU A does: > > > > page = find_get_page(swap_address_space(entry), offset) > > page = find_subpage(page, offset); > > trylock_page(page); > > > > while CPU B does: > > > > xa_lock_irq(&address_space->i_pages); > > __delete_from_swap_cache(page, entry); > > xas_store(&xas, NULL); > > ClearPageSwapCache(page); > > xa_unlock_irq(&address_space->i_pages); > > > > and if the ClearPageSwapCache happens between the xas_load() and the > > find_subpage(), we're stuffed. CPU A has a reference to the page, but > > not a lock, and find_get_page is running under RCU. > > > > I suppose we could fix this by taking the i_pages xa_lock around the > > call to find_get_pages(). If indeed, that's what this problem is. > > Want to try this patch? > > Confirmed. Well spotted! Excellent! I'm not comfortable with the rule that you have to be holding the i_pages lock in order to call find_get_page() on a swap address_space. How does this look to the various smart people who know far more about the MM than I do? The idea is to ensure that if this race does happen, the page will be handled the same way as a pagecache page. If __delete_from_swap_cache() can be called while the page is still part of a VMA, then this patch will break page_to_pgoff(). But I don't think that can happen ... ? (also, is this the right memory barrier to use to ensure that the old value of page->index cannot be observed if the PageSwapCache bit is obseved as having been cleared?) 
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 2e15cc335966..a715efcf0991 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -165,13 +165,16 @@ void __delete_from_swap_cache(struct page *page, swp_entry_t entry)
 	VM_BUG_ON_PAGE(!PageSwapCache(page), page);
 	VM_BUG_ON_PAGE(PageWriteback(page), page);
 
+	page->index = idx;
+	smp_mb__before_atomic();
+	ClearPageSwapCache(page);
+
 	for (i = 0; i < nr; i++) {
 		void *entry = xas_store(&xas, NULL);
 		VM_BUG_ON_PAGE(entry != page, entry);
 		set_page_private(page + i, 0);
 		xas_next(&xas);
 	}
-	ClearPageSwapCache(page);
 	address_space->nrpages -= nr;
 	__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, -nr);
 	ADD_CACHE_INFO(del_total, nr);
* Re: page cache: Store only head pages in i_pages
From: Matthew Wilcox @ 2019-03-30 14:10 UTC (permalink / raw)
To: Qian Cai; +Cc: Huang Ying, linux-mm, Kirill A. Shutemov

On Fri, Mar 29, 2019 at 08:04:32PM -0700, Matthew Wilcox wrote:
> Excellent! I'm not comfortable with the rule that you have to be holding
> the i_pages lock in order to call find_get_page() on a swap address_space.
> How does this look to the various smart people who know far more about the
> MM than I do?
>
> The idea is to ensure that if this race does happen, the page will be
> handled the same way as a pagecache page. If __delete_from_swap_cache()
> can be called while the page is still part of a VMA, then this patch
> will break page_to_pgoff(). But I don't think that can happen ... ?

Oh, blah, that can totally happen. reuse_swap_page() calls
delete_from_swap_cache(). Need a new plan.
* Re: page cache: Store only head pages in i_pages
From: Matthew Wilcox @ 2019-03-31 3:23 UTC (permalink / raw)
To: Qian Cai; +Cc: Huang Ying, linux-mm, Kirill A. Shutemov

On Sat, Mar 30, 2019 at 07:10:52AM -0700, Matthew Wilcox wrote:
> On Fri, Mar 29, 2019 at 08:04:32PM -0700, Matthew Wilcox wrote:
> > Excellent! I'm not comfortable with the rule that you have to be holding
> > the i_pages lock in order to call find_get_page() on a swap address_space.
> > How does this look to the various smart people who know far more about the
> > MM than I do?
> >
> > The idea is to ensure that if this race does happen, the page will be
> > handled the same way as a pagecache page. If __delete_from_swap_cache()
> > can be called while the page is still part of a VMA, then this patch
> > will break page_to_pgoff(). But I don't think that can happen ... ?
>
> Oh, blah, that can totally happen. reuse_swap_page() calls
> delete_from_swap_cache(). Need a new plan.

I don't see a good solution here that doesn't involve withdrawing this
patch and starting over. Bad solutions:

 - Take the i_pages lock around each page lookup call in the swap code
   (not just the one you found; there are others like mc_handle_swap_pte()
   in memcontrol.c)
 - Call synchronize_rcu() in __delete_from_swap_cache()
 - Swap the roles of ->index and ->private for swap pages, and then don't
   clear ->index when deleting a page from the swap cache

The first two would be slow and non-scalable. The third is still prone
to a race where the page is looked up on one CPU, while another CPU
removes it from one swap file then moves it to a different location,
potentially in a different swap file. Hard to hit, but not a race we
want to introduce.

I believe that the swap code actually never wants to see subpages.
So if we start again, introducing APIs (eg find_get_head()) which return the head page, then convert the swap code over to use those APIs, we don't need to solve the problem of finding the subpage of a swap page while not holding the page lock.

I'm obviously reluctant to withdraw the patch, but I don't see a better option. Your testing has revealed a problem that needs a deeper solution than just adding a fix patch.
* Re: page cache: Store only head pages in i_pages 2019-03-31 3:23 ` Matthew Wilcox @ 2019-04-01 9:18 ` Kirill A. Shutemov 2019-04-01 9:27 ` Kirill A. Shutemov 0 siblings, 1 reply; 21+ messages in thread From: Kirill A. Shutemov @ 2019-04-01 9:18 UTC (permalink / raw) To: Matthew Wilcox; +Cc: Qian Cai, Huang Ying, linux-mm On Sat, Mar 30, 2019 at 08:23:26PM -0700, Matthew Wilcox wrote: > On Sat, Mar 30, 2019 at 07:10:52AM -0700, Matthew Wilcox wrote: > > On Fri, Mar 29, 2019 at 08:04:32PM -0700, Matthew Wilcox wrote: > > > Excellent! I'm not comfortable with the rule that you have to be holding > > > the i_pages lock in order to call find_get_page() on a swap address_space. > > > How does this look to the various smart people who know far more about the > > > MM than I do? > > > > > > The idea is to ensure that if this race does happen, the page will be > > > handled the same way as a pagecache page. If __delete_from_swap_cache() > > > can be called while the page is still part of a VMA, then this patch > > > will break page_to_pgoff(). But I don't think that can happen ... ? > > > > Oh, blah, that can totally happen. reuse_swap_page() calls > > delete_from_swap_cache(). Need a new plan. > > I don't see a good solution here that doesn't involve withdrawing this > patch and starting over. Bad solutions: > > - Take the i_pages lock around each page lookup call in the swap code > (not just the one you found; there are others like mc_handle_swap_pte() > in memcontrol.c) > - Call synchronize_rcu() in __delete_from_swap_cache() > - Swap the roles of ->index and ->private for swap pages, and then don't > clear ->index when deleting a page from the swap cache > > The first two would be slow and non-scalable. The third is still prone > to a race where the page is looked up on one CPU, while another CPU > removes it from one swap file then moves it to a different location, > potentially in a different swap file. Hard to hit, but not a race we > want to introduce. 
> I believe that the swap code actually never wants to see subpages. So if
> we start again, introducing APIs (eg find_get_head()) which return the
> head page, then convert the swap code over to use those APIs, we don't
> need to solve the problem of finding the subpage of a swap page while
> not holding the page lock.
>
> I'm obviously reluctant to withdraw the patch, but I don't see a better
> option. Your testing has revealed a problem that needs a deeper solution
> than just adding a fix patch.

Hm. Isn't the problem with VM_BUGs themself? I mean find_subpage()
produces right result (or am I wrong here?), but VM_BUGs flags it as
wrong.

Maybe we should relax the VM_BUGs?

-- 
 Kirill A. Shutemov
* Re: page cache: Store only head pages in i_pages 2019-04-01 9:18 ` Kirill A. Shutemov @ 2019-04-01 9:27 ` Kirill A. Shutemov 2019-04-04 13:10 ` Qian Cai 0 siblings, 1 reply; 21+ messages in thread From: Kirill A. Shutemov @ 2019-04-01 9:27 UTC (permalink / raw) To: Matthew Wilcox; +Cc: Qian Cai, Huang Ying, linux-mm On Mon, Apr 01, 2019 at 12:18:58PM +0300, Kirill A. Shutemov wrote: > On Sat, Mar 30, 2019 at 08:23:26PM -0700, Matthew Wilcox wrote: > > On Sat, Mar 30, 2019 at 07:10:52AM -0700, Matthew Wilcox wrote: > > > On Fri, Mar 29, 2019 at 08:04:32PM -0700, Matthew Wilcox wrote: > > > > Excellent! I'm not comfortable with the rule that you have to be holding > > > > the i_pages lock in order to call find_get_page() on a swap address_space. > > > > How does this look to the various smart people who know far more about the > > > > MM than I do? > > > > > > > > The idea is to ensure that if this race does happen, the page will be > > > > handled the same way as a pagecache page. If __delete_from_swap_cache() > > > > can be called while the page is still part of a VMA, then this patch > > > > will break page_to_pgoff(). But I don't think that can happen ... ? > > > > > > Oh, blah, that can totally happen. reuse_swap_page() calls > > > delete_from_swap_cache(). Need a new plan. > > > > I don't see a good solution here that doesn't involve withdrawing this > > patch and starting over. Bad solutions: > > > > - Take the i_pages lock around each page lookup call in the swap code > > (not just the one you found; there are others like mc_handle_swap_pte() > > in memcontrol.c) > > - Call synchronize_rcu() in __delete_from_swap_cache() > > - Swap the roles of ->index and ->private for swap pages, and then don't > > clear ->index when deleting a page from the swap cache > > > > The first two would be slow and non-scalable. 
> > The third is still prone
> > to a race where the page is looked up on one CPU, while another CPU
> > removes it from one swap file then moves it to a different location,
> > potentially in a different swap file. Hard to hit, but not a race we
> > want to introduce.
> >
> > I believe that the swap code actually never wants to see subpages. So if
> > we start again, introducing APIs (eg find_get_head()) which return the
> > head page, then convert the swap code over to use those APIs, we don't
> > need to solve the problem of finding the subpage of a swap page while
> > not holding the page lock.
> >
> > I'm obviously reluctant to withdraw the patch, but I don't see a better
> > option. Your testing has revealed a problem that needs a deeper solution
> > than just adding a fix patch.
>
> Hm. Isn't the problem with VM_BUGs themself? I mean find_subpage()
> produces right result (or am I wrong here?), but VM_BUGs flags it as wrong.

Yeah, I'm wrong. :P

What about patch like this? (completely untested)

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index f939e004c5d1..e3b9bf843dcb 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -335,12 +335,12 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
 
 static inline struct page *find_subpage(struct page *page, pgoff_t offset)
 {
-	unsigned long index = page_index(page);
+	unsigned long mask;
 
 	VM_BUG_ON_PAGE(PageTail(page), page);
-	VM_BUG_ON_PAGE(index > offset, page);
-	VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
-	return page - index + offset;
+
+	mask = (1UL << compound_order(page)) - 1;
+	return page + (offset & mask);
 }
 
 struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);

-- 
 Kirill A. Shutemov
* Re: page cache: Store only head pages in i_pages 2019-04-01 9:27 ` Kirill A. Shutemov @ 2019-04-04 13:10 ` Qian Cai 2019-04-04 13:45 ` Kirill A. Shutemov 0 siblings, 1 reply; 21+ messages in thread From: Qian Cai @ 2019-04-04 13:10 UTC (permalink / raw) To: Kirill A. Shutemov, Matthew Wilcox; +Cc: Huang Ying, linux-mm On Mon, 2019-04-01 at 12:27 +0300, Kirill A. Shutemov wrote: > What about patch like this? (completely untested) > > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h > index f939e004c5d1..e3b9bf843dcb 100644 > --- a/include/linux/pagemap.h > +++ b/include/linux/pagemap.h > @@ -335,12 +335,12 @@ static inline struct page *grab_cache_page_nowait(struct > address_space *mapping, > > static inline struct page *find_subpage(struct page *page, pgoff_t offset) > { > - unsigned long index = page_index(page); > + unsigned long mask; > > VM_BUG_ON_PAGE(PageTail(page), page); > - VM_BUG_ON_PAGE(index > offset, page); > - VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page); > - return page - index + offset; > + > + mask = (1UL << compound_order(page)) - 1; > + return page + (offset & mask); > } > > struct page *find_get_entry(struct address_space *mapping, pgoff_t offset); No, this then leads to a panic below by LTP hugemmap05. Still reverting the whole "mm: page cache: store only head pages in i_pages" commit fixed the problem. # /opt/ltp/testcases/bin/hugemmap05 tst_test.c:1096: INFO: Timeout per run is 0h 05m 00s hugemmap05.c:235: INFO: original nr_hugepages is 0 hugemmap05.c:248: INFO: original nr_overcommit_hugepages is 0 hugemmap05.c:116: INFO: check /proc/meminfo before allocation. hugemmap05.c:297: INFO: HugePages_Total is 192. hugemmap05.c:297: INFO: HugePages_Free is 192. hugemmap05.c:297: INFO: HugePages_Surp is 64. hugemmap05.c:297: INFO: HugePages_Rsvd is 192. hugemmap05.c:272: INFO: First hex is 7070707 hugemmap05.c:151: INFO: check /proc/meminfo. hugemmap05.c:297: INFO: HugePages_Total is 192. 
hugemmap05.c:297: INFO: HugePages_Free is 0. hugemmap05.c:297: INFO: HugePages_Surp is 64. hugemmap05.c:297: INFO: HugePages_Rsvd is 0. [10022.547977] ------------[ cut here ]------------ [10022.571941] kernel BUG at fs/hugetlbfs/inode.c:475! [10022.598304] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI [10022.626383] CPU: 39 PID: 13074 Comm: hugemmap05 Kdump: loaded Tainted: G W 5.1.0-rc3-next-20190403+ #16 [10022.674421] Hardware name: HP ProLiant XL420 Gen9/ProLiant XL420 Gen9, BIOS U19 12/27/2015 [10022.711990] RIP: 0010:remove_inode_hugepages+0x706/0xa60 [10022.735997] Code: fd ff ff e8 9c a0 99 ff e9 bc fc ff ff 48 c7 c6 40 ae 50 9f 4c 89 f7 e8 c8 3f ca ff 0f 0b 48 c7 c7 80 18 ba 9f e8 2f 63 15 00 <0f> 0b 48 c7 c7 40 18 ba 9f e8 21 63 15 00 48 8b bd 88 fd ff ff e8 [10022.820547] RSP: 0018:ffff88883ea5f920 EFLAGS: 00010202 [10022.844039] RAX: 015fffe000002000 RBX: 0000000000000001 RCX: ffffffff9e2adf5c [10022.876130] RDX: 0000000000000001 RSI: 00000000000001df RDI: ffffea001a0f8048 [10022.908202] RBP: ffff88883ea5fbf8 R08: fffff9400341f00b R09: fffff9400341f00a [10022.940369] R10: fffff9400341f00a R11: ffffea001a0f8057 R12: 0000000000000001 [10022.972615] R13: ffff88883ea5fbd0 R14: ffffea001a0f8040 R15: dffffc0000000000 [10023.004633] FS: 00007ff5964d7740(0000) GS:ffff888847b80000(0000) knlGS:0000000000000000 [10023.040462] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [10023.066242] CR2: 00007ff595800000 CR3: 00000004be5d0006 CR4: 00000000001606a0 [10023.103426] Call Trace: [10023.114997] ? hugetlbfs_size_to_hpages+0xe0/0xe0 [10023.136032] ? fsnotify_grab_connector+0x9f/0x130 [10023.157131] ? __lock_acquire.isra.14+0x7d7/0x2130 [10023.178540] ? kasan_check_read+0x11/0x20 [10023.196471] ? do_raw_spin_unlock+0x59/0x250 [10023.215893] hugetlbfs_evict_inode+0x20/0x90 [10023.235249] evict+0x2a4/0x5c0 [10023.249393] ? do_raw_spin_unlock+0x59/0x250 [10023.268885] iput+0x3d9/0x790 [10023.282210] do_unlinkat+0x461/0x650 [10023.298318] ? 
__x64_sys_rmdir+0x40/0x40 [10023.316058] ? __check_object_size+0x4b4/0x7f1 [10023.336241] ? __kasan_kmalloc.constprop.1+0xac/0xc0 [10023.358681] ? blkcg_exit_queue+0x1a0/0x1a0 [10023.377428] ? getname_flags+0x90/0x400 [10023.394859] __x64_sys_unlink+0x3e/0x50 [10023.411987] do_syscall_64+0xeb/0xb78 [10023.428386] ? syscall_return_slowpath+0x160/0x160 [10023.449987] ? __do_page_fault+0x583/0x8d0 [10023.468333] ? schedule+0x81/0x180 [10023.483515] ? exit_to_usermode_loop+0xab/0x100 [10023.503763] entry_SYSCALL_64_after_hwframe+0x44/0xa9 [10023.526369] RIP: 0033:0x7ff595bbcedb ^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: page cache: Store only head pages in i_pages 2019-04-04 13:10 ` Qian Cai @ 2019-04-04 13:45 ` Kirill A. Shutemov 2019-04-04 21:28 ` Qian Cai 0 siblings, 1 reply; 21+ messages in thread From: Kirill A. Shutemov @ 2019-04-04 13:45 UTC (permalink / raw) To: Qian Cai; +Cc: Matthew Wilcox, Huang Ying, linux-mm On Thu, Apr 04, 2019 at 09:10:10AM -0400, Qian Cai wrote: > On Mon, 2019-04-01 at 12:27 +0300, Kirill A. Shutemov wrote: > > What about patch like this? (completely untested) > > > > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h > > index f939e004c5d1..e3b9bf843dcb 100644 > > --- a/include/linux/pagemap.h > > +++ b/include/linux/pagemap.h > > @@ -335,12 +335,12 @@ static inline struct page *grab_cache_page_nowait(struct > > address_space *mapping, > > > > static inline struct page *find_subpage(struct page *page, pgoff_t offset) > > { > > - unsigned long index = page_index(page); > > + unsigned long mask; > > > > VM_BUG_ON_PAGE(PageTail(page), page); > > - VM_BUG_ON_PAGE(index > offset, page); > > - VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page); > > - return page - index + offset; > > + > > + mask = (1UL << compound_order(page)) - 1; > > + return page + (offset & mask); > > } > > > > struct page *find_get_entry(struct address_space *mapping, pgoff_t offset); > > No, this then leads to a panic below by LTP hugemmap05. Still reverting the > whole "mm: page cache: store only head pages in i_pages" commit fixed the > problem. Ughh... hugetlb stores pages in page cache differently. 
What about this:

diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index f939e004c5d1..2e8438a1216a 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -335,12 +335,15 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
 
 static inline struct page *find_subpage(struct page *page, pgoff_t offset)
 {
-	unsigned long index = page_index(page);
+	unsigned long mask;
+
+	if (PageHuge(page))
+		return page;
 
 	VM_BUG_ON_PAGE(PageTail(page), page);
-	VM_BUG_ON_PAGE(index > offset, page);
-	VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
-	return page - index + offset;
+
+	mask = (1UL << compound_order(page)) - 1;
+	return page + (offset & mask);
 }
 
 struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);

-- 
 Kirill A. Shutemov
* Re: page cache: Store only head pages in i_pages
From: Qian Cai @ 2019-04-04 21:28 UTC (permalink / raw)
To: Kirill A. Shutemov; +Cc: Matthew Wilcox, Huang Ying, linux-mm

On Thu, 2019-04-04 at 16:45 +0300, Kirill A. Shutemov wrote:
> What about this:
>
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index f939e004c5d1..2e8438a1216a 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -335,12 +335,15 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
>  
>  static inline struct page *find_subpage(struct page *page, pgoff_t offset)
>  {
> -	unsigned long index = page_index(page);
> +	unsigned long mask;
> +
> +	if (PageHuge(page))
> +		return page;
>  
>  	VM_BUG_ON_PAGE(PageTail(page), page);
> -	VM_BUG_ON_PAGE(index > offset, page);
> -	VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> -	return page - index + offset;
> +
> +	mask = (1UL << compound_order(page)) - 1;
> +	return page + (offset & mask);
>  }
>  
>  struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);

It works fine.
* Re: page cache: Store only head pages in i_pages
From: Kirill A. Shutemov @ 2019-04-05 13:37 UTC (permalink / raw)
To: Qian Cai; +Cc: Matthew Wilcox, Huang Ying, linux-mm

On Thu, Apr 04, 2019 at 05:28:02PM -0400, Qian Cai wrote:
> On Thu, 2019-04-04 at 16:45 +0300, Kirill A. Shutemov wrote:
> > What about this:
> >
> > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> > index f939e004c5d1..2e8438a1216a 100644
> > --- a/include/linux/pagemap.h
> > +++ b/include/linux/pagemap.h
> > @@ -335,12 +335,15 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
> >  
> >  static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> >  {
> > -	unsigned long index = page_index(page);
> > +	unsigned long mask;
> > +
> > +	if (PageHuge(page))
> > +		return page;
> >  
> >  	VM_BUG_ON_PAGE(PageTail(page), page);
> > -	VM_BUG_ON_PAGE(index > offset, page);
> > -	VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> > -	return page - index + offset;
> > +
> > +	mask = (1UL << compound_order(page)) - 1;
> > +	return page + (offset & mask);
> >  }
> >  
> >  struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
>
> It works fine.

Nice.

Matthew, does it look fine to you?

-- 
 Kirill A. Shutemov
* Re: page cache: Store only head pages in i_pages
From: Matthew Wilcox @ 2019-04-05 13:51 UTC (permalink / raw)
To: Kirill A. Shutemov, Andrew Morton; +Cc: Qian Cai, Huang Ying, linux-mm

On Fri, Apr 05, 2019 at 04:37:42PM +0300, Kirill A. Shutemov wrote:
> On Thu, Apr 04, 2019 at 05:28:02PM -0400, Qian Cai wrote:
> > On Thu, 2019-04-04 at 16:45 +0300, Kirill A. Shutemov wrote:
> > > What about this:
> > >
> > > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> > > index f939e004c5d1..2e8438a1216a 100644
> > > --- a/include/linux/pagemap.h
> > > +++ b/include/linux/pagemap.h
> > > @@ -335,12 +335,15 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
> > >  
> > >  static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> > >  {
> > > -	unsigned long index = page_index(page);
> > > +	unsigned long mask;
> > > +
> > > +	if (PageHuge(page))
> > > +		return page;
> > >  
> > >  	VM_BUG_ON_PAGE(PageTail(page), page);
> > > -	VM_BUG_ON_PAGE(index > offset, page);
> > > -	VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> > > -	return page - index + offset;
> > > +
> > > +	mask = (1UL << compound_order(page)) - 1;
> > > +	return page + (offset & mask);
> > >  }
> > >  
> > >  struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
> >
> > It works fine.
>
> Nice.
>
> Matthew, does it look fine to you?

Yes, this looks good to me. I think we'll have opportunities to improve it later (eg when unifying THP and hugetlbfs uses of the page cache). But for now, Andrew can you add this version as -fix-fix?
Thread overview: 21+ messages (newest: 2019-04-05 13:51 UTC)
  [not found] <1553285568.26196.24.camel@lca.pw>
  2019-03-23  3:38 ` page cache: Store only head pages in i_pages -- Matthew Wilcox
  2019-03-23 23:50 ` Qian Cai
  2019-03-24  2:06 ` Matthew Wilcox
  2019-03-24  2:52 ` Qian Cai
  2019-03-24  3:04 ` Matthew Wilcox
  2019-03-24 15:42 ` Qian Cai
  2019-03-27 10:48 ` William Kucharski
  2019-03-27 11:50 ` Matthew Wilcox
  2019-03-29  1:43 ` Qian Cai
  2019-03-29 19:59 ` Matthew Wilcox
  2019-03-29 21:25 ` Qian Cai
  2019-03-30  3:04 ` Matthew Wilcox
  2019-03-30 14:10 ` Matthew Wilcox
  2019-03-31  3:23 ` Matthew Wilcox
  2019-04-01  9:18 ` Kirill A. Shutemov
  2019-04-01  9:27 ` Kirill A. Shutemov
  2019-04-04 13:10 ` Qian Cai
  2019-04-04 13:45 ` Kirill A. Shutemov
  2019-04-04 21:28 ` Qian Cai
  2019-04-05 13:37 ` Kirill A. Shutemov
  2019-04-05 13:51 ` Matthew Wilcox