* Re: page cache: Store only head pages in i_pages
[not found] <1553285568.26196.24.camel@lca.pw>
@ 2019-03-23 3:38 ` Matthew Wilcox
2019-03-23 23:50 ` Qian Cai
0 siblings, 1 reply; 21+ messages in thread
From: Matthew Wilcox @ 2019-03-23 3:38 UTC (permalink / raw)
To: Qian Cai; +Cc: Huang Ying, linux-mm
On Fri, Mar 22, 2019 at 04:12:48PM -0400, Qian Cai wrote:
> FYI, everything involving swapping seems to trigger a panic now since this patch.
Thanks for the report! Does this fix it?
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 41858a3744b4..975aea9a49a5 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -335,6 +335,8 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
static inline struct page *find_subpage(struct page *page, pgoff_t offset)
{
VM_BUG_ON_PAGE(PageTail(page), page);
+ if (unlikely(PageSwapCache(page)))
+ return page;
VM_BUG_ON_PAGE(page->index > offset, page);
VM_BUG_ON_PAGE(page->index + compound_nr(page) <= offset, page);
return page - page->index + offset;
Huang, I'm pretty sure this isn't right for CONFIG_THP_SWAP, but I'm
not sure what the right answer is. The patch this is on top of includes:
@@ -132,7 +132,7 @@ int add_to_swap_cache(struct page *page, swp_entry_t entry, gfp_t gfp)
for (i = 0; i < nr; i++) {
VM_BUG_ON_PAGE(xas.xa_index != idx + i, page);
set_page_private(page + i, entry.val + i);
- xas_store(&xas, page + i);
+ xas_store(&xas, page);
xas_next(&xas);
}
address_space->nrpages += nr;
so if we've added a THP page, we're going to find the head page. I'm not
sure I understand how to get from the head page to the right subpage. Is
it as simple as:
+ if (unlikely(PageSwapCache(page)))
+ return page + (offset & (compound_nr(page) - 1));
or are they not stored at an aligned location?
> [11653.484481] page:ffffea0006ef7080 count:2 mapcount:0 mapping:0000000000000000
> index:0x0
> [11653.525397] swap_aops
> [11653.525404] flags:
> 0x5fffe000080454(uptodate|lru|workingset|owner_priv_1|swapbacked)
> [11653.573631] raw: 005fffe000080454 ffffea0006ef7048 ffffea0007c9c7c8
> 0000000000000000
> [11653.608547] raw: 0000000000000000 0000000000001afd 00000002ffffffff
> 0000000000000000
> [11653.643436] page dumped because: VM_BUG_ON_PAGE(page->index + (1 <<
> compound_order(page)) <= offset)
> [11653.684322] page allocated via order 0, migratetype Movable, gfp_mask
> 0x100cca(GFP_HIGHUSER_MOVABLE)
> [11653.725462] prep_new_page+0x3a4/0x4d0
> [11653.742373] get_page_from_freelist+0xcde/0x3550
> [11653.763449] __alloc_pages_nodemask+0x859/0x2ab0
> [11653.784105] alloc_pages_vma+0xb2/0x430
> [11653.801248] __read_swap_cache_async+0x49c/0xc30
> [11653.821943] swap_cluster_readahead+0x4a1/0x8b0
> [11653.842224] swapin_readahead+0xb6/0xc3e
> [11653.859894] do_swap_page+0xc87/0x24b0
> [11653.876664] __handle_mm_fault+0x1601/0x3bc0
> [11653.895818] handle_mm_fault+0x326/0x6cf
> [11653.913443] __do_page_fault+0x333/0x8d0
> [11653.931068] do_page_fault+0x75/0x48e
> [11653.947560] page_fault+0x1b/0x20
> [11653.962558] ------------[ cut here ]------------
> [11653.983290] kernel BUG at include/linux/pagemap.h:341!
> [11654.006336] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
> [11654.036835] CPU: 12 PID: 14006 Comm: in:imjournal Kdump: loaded Tainted:
> G W 5.1.0-rc1-mm1+ #17
> [11654.084191] Hardware name: HP ProLiant DL80 Gen9/ProLiant DL80 Gen9, BIOS U15
> 09/12/2016
> [11654.120401] RIP: 0010:find_get_entry+0x618/0x7f0
> [11654.141171] Code: c6 60 ae e9 97 4c 89 ff e8 c5 b2 0c 00 0f 0b 48 c7 c7 20 3b
> 42 98 e8 3c 3c 57 00 48 c7 c6 e0 b0 e9 97 4c 89 ff e8 a8 b2 0c 00 <0f> 0b 48 c7
> c7 e0 3a 42 98 e8 1f 3c 57 00 48 c7 c7 88 62 7e 98 e8
> [11654.226539] RSP: 0000:ffff8882095b7628 EFLAGS: 00010286
> [11654.249867] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff9735700e
> [11654.281925] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff888213e4d332
> [11654.313411] RBP: ffff8882095b7738 R08: ffffed1042d86a89 R09: ffffed1042d86a88
> [11654.346314] R10: ffffed1042d86a88 R11: ffff888216c35447 R12: 0000000000000000
> [11654.379144] R13: ffffea0006ef70a0 R14: ffff8881e5c16fc8 R15: ffffea0006ef7080
> [11654.411410] FS: 00007fb8daf56700(0000) GS:ffff888216c00000(0000)
> knlGS:0000000000000000
> [11654.447618] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [11654.473481] CR2: 00007f74c0647000 CR3: 0000000204860003 CR4: 00000000001606a0
> [11654.506282] Call Trace:
> [11654.517340] ? __filemap_set_wb_err+0x1f0/0x1f0
> [11654.538684] ? generic_make_request+0x283/0xc50
> [11654.562239] ? mem_cgroup_uncharge+0x150/0x150
> [11654.582551] pagecache_get_page+0x4a/0xb70
> [11654.600807] ? release_pages+0xada/0x1750
> [11654.618847] __read_swap_cache_async+0x1a8/0xc30
> [11654.640167] ? lookup_swap_cache+0x570/0x570
> [11654.659274] read_swap_cache_async+0x69/0xd0
> [11654.678354] ? __read_swap_cache_async+0xc30/0xc30
> [11654.699775] ? lru_add_drain_cpu+0x239/0x4e0
> [11654.718855] swap_cluster_readahead+0x386/0x8b0
> [11654.739171] ? read_swap_cache_async+0xd0/0xd0
> [11654.760517] ? xas_load+0x8b/0xf0
> [11654.775357] ? find_get_entry+0x39e/0x7f0
> [11654.793267] swapin_readahead+0xb6/0xc3e
> [11654.810391] ? exit_swap_address_space+0x1b0/0x1b0
> [11654.831816] ? lookup_swap_cache+0x114/0x570
> [11654.850972] ? xas_find+0x141/0x530
> [11654.866616] ? free_pages_and_swap_cache+0x2f0/0x2f0
> [11654.889495] ? swapcache_prepare+0x20/0x20
> [11654.907786] ? filemap_map_pages+0x3af/0xec0
> [11654.926919] do_swap_page+0xc87/0x24b0
> [11654.943758] ? unmap_mapping_range+0x30/0x30
> [11654.962918] ? kasan_check_read+0x11/0x20
> [11654.980900] ? do_raw_spin_unlock+0x59/0x250
> [11655.000027] __handle_mm_fault+0x1601/0x3bc0
> [11655.020130] ? __lock_acquire.isra.14+0x7d7/0x2130
> [11655.041845] ? vmf_insert_mixed_mkwrite+0x20/0x20
> [11655.066529] ? lock_acquire+0x169/0x360
> [11655.085380] handle_mm_fault+0x326/0x6cf
> [11655.102928] __do_page_fault+0x333/0x8d0
> [11655.120429] ? task_work_run+0xdd/0x190
> [11655.138322] do_page_fault+0x75/0x48e
> [11655.154693] ? page_fault+0x5/0x20
> [11655.170168] page_fault+0x1b/0x20
> [11655.185007] RIP: 0033:0x7fb8dde3ae14
> [11655.201007] Code: 00 0f 1f 44 00 00 f3 0f 1e fa f2 ff 25 65 1a 29 00 0f 1f 44
> 00 00 f3 0f 1e fa f2 ff 25 5d 1a 29 00 0f 1f 44 00 00 f3 0f 1e fa <f2> ff 25 55
> 1a 29 00 0f 1f 44 00 00 f3 0f 1e fa f2 ff 25 4d 1a 29
> [11655.286038] RSP: 002b:00007fb8daf55b28 EFLAGS: 00010246
> [11655.309443] RAX: 0000000000000000 RBX: 0000562a76a844f0 RCX: 0000000000000000
> [11655.341503] RDX: 00007fb8daf55ac0 RSI: 0000000000000001 RDI: 0000562a76a844f0
> [11655.373906] RBP: 00000000000dbba0 R08: 0000000000000008 R09: 0000000000000000
> [11655.407498] R10: 0000000000000000 R11: 0000000000000000 R12: 000000000000000e
> [11655.439429] R13: 00007fb8da036760 R14: 00007fb8daf55be0 R15: 00007fb8daf55bd0
> [11655.471484] Modules linked in: brd nls_iso8859_1 nls_cp437 vfat fat ext4
> crc16 mbcache jbd2 overlay loop kvm_intel kvm irqbypass ip_tables x_tables xfs
> sd_mod igb ahci i2c_algo_bit libahci libata i2c_core dm_mirror dm_region_hash
> dm_log dm_mod
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: page cache: Store only head pages in i_pages
2019-03-23 3:38 ` page cache: Store only head pages in i_pages Matthew Wilcox
@ 2019-03-23 23:50 ` Qian Cai
2019-03-24 2:06 ` Matthew Wilcox
0 siblings, 1 reply; 21+ messages in thread
From: Qian Cai @ 2019-03-23 23:50 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Huang Ying, linux-mm
On 3/22/19 11:38 PM, Matthew Wilcox wrote:
> On Fri, Mar 22, 2019 at 04:12:48PM -0400, Qian Cai wrote:
>> FYI, everything involving swapping seems to trigger a panic now since this patch.
>
> Thanks for the report! Does this fix it?
>
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 41858a3744b4..975aea9a49a5 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -335,6 +335,8 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
> static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> {
> VM_BUG_ON_PAGE(PageTail(page), page);
> + if (unlikely(PageSwapCache(page)))
> + return page;
> VM_BUG_ON_PAGE(page->index > offset, page);
> VM_BUG_ON_PAGE(page->index + compound_nr(page) <= offset, page);
> return page - page->index + offset;
Yes, it works.
* Re: page cache: Store only head pages in i_pages
2019-03-23 23:50 ` Qian Cai
@ 2019-03-24 2:06 ` Matthew Wilcox
2019-03-24 2:52 ` Qian Cai
0 siblings, 1 reply; 21+ messages in thread
From: Matthew Wilcox @ 2019-03-24 2:06 UTC (permalink / raw)
To: Qian Cai; +Cc: Huang Ying, linux-mm
On Sat, Mar 23, 2019 at 07:50:15PM -0400, Qian Cai wrote:
> On 3/22/19 11:38 PM, Matthew Wilcox wrote:
> > On Fri, Mar 22, 2019 at 04:12:48PM -0400, Qian Cai wrote:
> >> FYI, everything involving swapping seems to trigger a panic now since this patch.
>
> Yes, it works.
Thanks for testing. Kirill suggests this would be a better fix:
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index 41858a3744b4..9718393ae45b 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -334,10 +334,12 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
static inline struct page *find_subpage(struct page *page, pgoff_t offset)
{
+ unsigned long index = page_index(page);
+
VM_BUG_ON_PAGE(PageTail(page), page);
- VM_BUG_ON_PAGE(page->index > offset, page);
- VM_BUG_ON_PAGE(page->index + compound_nr(page) <= offset, page);
- return page - page->index + offset;
+ VM_BUG_ON_PAGE(index > offset, page);
+ VM_BUG_ON_PAGE(index + compound_nr(page) <= offset, page);
+ return page - index + offset;
}
struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
* Re: page cache: Store only head pages in i_pages
2019-03-24 2:06 ` Matthew Wilcox
@ 2019-03-24 2:52 ` Qian Cai
2019-03-24 3:04 ` Matthew Wilcox
0 siblings, 1 reply; 21+ messages in thread
From: Qian Cai @ 2019-03-24 2:52 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Huang Ying, linux-mm
On 3/23/19 10:06 PM, Matthew Wilcox wrote:
> Thanks for testing. Kirill suggests this would be a better fix:
>
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index 41858a3744b4..9718393ae45b 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -334,10 +334,12 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
>
> static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> {
> + unsigned long index = page_index(page);
> +
> VM_BUG_ON_PAGE(PageTail(page), page);
> - VM_BUG_ON_PAGE(page->index > offset, page);
> - VM_BUG_ON_PAGE(page->index + compound_nr(page) <= offset, page);
> - return page - page->index + offset;
> + VM_BUG_ON_PAGE(index > offset, page);
> + VM_BUG_ON_PAGE(index + compound_nr(page) <= offset, page);
> + return page - index + offset;
> }
>
> struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
This does not even compile.
With "s/compound_nr/compound_order/" applied, it fails to boot here:
[ 56.843236] Unpacking initramfs...
[ 56.881979] page:ffff7fe022eb19c0 count:3 mapcount:0 mapping:38ff80080099c008
index:0x0
[ 56.890007] ramfs_aops
[ 56.890011] name:"lvm.conf"
[ 56.892465] flags: 0x17fffffa400000c(uptodate|dirty)
[ 56.900318] raw: 017fffffa400000c dead000000000100 dead000000000200
38ff80080099c008
[ 56.908066] raw: 0000000000000000 0000000000000000 00000003ffffffff
7bff8008203bcc80
[ 56.915812] page dumped because: VM_BUG_ON_PAGE(index + compound_order(page)
<= offset)
[ 56.923818] page->mem_cgroup:7bff8008203bcc80
[ 56.928180] page allocated via order 0, migratetype Unmovable, gfp_mask
0x100cc2(GFP_HIGHUSER)
[ 56.936800] prep_new_page+0x4e0/0x5e0
[ 56.940556] get_page_from_freelist+0x4cf4/0x50e0
[ 56.945265] __alloc_pages_nodemask+0x738/0x38b8
[ 56.949888] alloc_page_interleave+0x34/0x2f0
[ 56.954249] alloc_pages_current+0xc0/0x150
[ 56.958439] __page_cache_alloc+0x70/0x2f4
[ 56.962541] pagecache_get_page+0x5e4/0xaf0
[ 56.966729] grab_cache_page_write_begin+0x6c/0x98
[ 56.971526] simple_write_begin+0x40/0x308
[ 56.975627] generic_perform_write+0x1d4/0x4e0
[ 56.980076] __generic_file_write_iter+0x294/0x504
[ 56.984872] generic_file_write_iter+0x354/0x594
[ 56.989496] __vfs_write+0x72c/0x8a0
[ 56.993076] vfs_write+0x1ec/0x424
[ 56.996481] ksys_write+0xbc/0x190
[ 56.999890] xwrite+0x38/0x84
[ 57.002869] ------------[ cut here ]------------
[ 57.007478] kernel BUG at ./include/linux/pagemap.h:342!
[ 57.012801] Internal error: Oops - BUG: 0 [#1] SMP
[ 57.017584] Modules linked in:
[ 57.020636] CPU: 1 PID: 1 Comm: swapper/0 Not tainted 5.1.0-rc1-mm1+ #7
[ 57.027239] Hardware name: HPE Apollo 70 /C01_APACHE_MB ,
BIOS L50_5.13_1.0.6 07/10/2018
[ 57.037057] pstate: 60400009 (nZCv daif +PAN -UAO)
[ 57.041839] pc : find_get_entry+0x948/0x950
[ 57.046013] lr : find_get_entry+0x940/0x950
[ 57.050185] sp : 10ff80082600f420
[ 57.053489] x29: 10ff80082600f4d0 x28: efff100000000000
[ 57.058792] x27: ffff7fe022eb19c8 x26: 0000000000000010
[ 57.064095] x25: 0000000000000035 x24: 0000000000000003
[ 57.069397] x23: 00000000000000ff x22: 10ff80082600f460
[ 57.074700] x21: ffff7fe022eb19c0 x20: 35ff800825f9a050
[ 57.080002] x19: 0000000000000000 x18: 0000000000000000
[ 57.085304] x17: 0000000000000000 x16: 000000000000000a
[ 57.090606] x15: 35ff800825f9a0b8 x14: 0000000000000000
[ 57.095908] x13: ffff800825f9a050 x12: 00000000ffffffff
[ 57.101210] x11: 0000000000000003 x10: 00000000000000ff
[ 57.106512] x9 : e071b95619aca700 x8 : e071b95619aca700
[ 57.111814] x7 : 0000000000000000 x6 : ffff1000102d01f4
[ 57.117116] x5 : 0000000000000000 x4 : 0000000000000080
[ 57.122418] x3 : ffff1000102b84c8 x2 : 0000000000000000
[ 57.127720] x1 : 0000000000000004 x0 : ffff100013316b10
[ 57.133024] Process swapper/0 (pid: 1, stack limit = 0x(____ptrval____))
[ 57.139715] Call trace:
[ 57.142153] find_get_entry+0x948/0x950
[ 57.145979] pagecache_get_page+0x68/0xaf0
[ 57.150066] grab_cache_page_write_begin+0x6c/0x98
[ 57.154847] simple_write_begin+0x40/0x308
[ 57.158934] generic_perform_write+0x1d4/0x4e0
[ 57.163368] __generic_file_write_iter+0x294/0x504
[ 57.168150] generic_file_write_iter+0x354/0x594
[ 57.172757] __vfs_write+0x72c/0x8a0
[ 57.176323] vfs_write+0x1ec/0x424
[ 57.179715] ksys_write+0xbc/0x190
[ 57.183107] xwrite+0x38/0x84
[ 57.186066] do_copy+0x110/0x898
[ 57.189284] write_buffer+0x148/0x1cc
[ 57.192937] flush_buffer+0x94/0x240
[ 57.196505] __gunzip+0x738/0x8f0
[ 57.199810] gunzip+0x18/0x20
[ 57.202768] unpack_to_rootfs+0x358/0x968
[ 57.206769] populate_rootfs+0x120/0x198
[ 57.210684] do_one_initcall+0x544/0xd00
[ 57.214597] do_initcall_level+0x660/0x814
[ 57.218684] do_basic_setup+0x38/0x50
[ 57.222337] kernel_init_freeable+0x25c/0x444
[ 57.226686] kernel_init+0x1c/0x548
[ 57.230165] ret_from_fork+0x10/0x18
[ 57.233733] Code: aa1503e0 94034d79 b0016fe0 912c4000 (d4210000)
[ 57.240055] ---[ end trace d7c5c3c62a7fa743 ]---
[ 57.244664] Kernel panic - not syncing: Fatal exception
[ 57.249997] SMP: stopping secondary CPUs
[ 57.254417] ---[ end Kernel panic - not syncing: Fatal exception ]---
* Re: page cache: Store only head pages in i_pages
2019-03-24 2:52 ` Qian Cai
@ 2019-03-24 3:04 ` Matthew Wilcox
2019-03-24 15:42 ` Qian Cai
` (2 more replies)
0 siblings, 3 replies; 21+ messages in thread
From: Matthew Wilcox @ 2019-03-24 3:04 UTC (permalink / raw)
To: Qian Cai; +Cc: Huang Ying, linux-mm
On Sat, Mar 23, 2019 at 10:52:49PM -0400, Qian Cai wrote:
> On 3/23/19 10:06 PM, Matthew Wilcox wrote:
> > Thanks for testing. Kirill suggests this would be a better fix:
> >
> > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> > index 41858a3744b4..9718393ae45b 100644
> > --- a/include/linux/pagemap.h
> > +++ b/include/linux/pagemap.h
> > @@ -334,10 +334,12 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
> >
> > static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> > {
> > + unsigned long index = page_index(page);
> > +
> > VM_BUG_ON_PAGE(PageTail(page), page);
> > - VM_BUG_ON_PAGE(page->index > offset, page);
> > - VM_BUG_ON_PAGE(page->index + compound_nr(page) <= offset, page);
> > - return page - page->index + offset;
> > + VM_BUG_ON_PAGE(index > offset, page);
> > + VM_BUG_ON_PAGE(index + compound_nr(page) <= offset, page);
> > + return page - index + offset;
> > }
> >
> > struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
>
> This does not even compile.
>
> With "s/compound_nr/compound_order/" applied, it fails to boot here:
Oh, sorry. I have another patch in that tree.
- VM_BUG_ON_PAGE(page->index + (1 << compound_order(page)) <= offset,
- page);
+ VM_BUG_ON_PAGE(page->index + compound_nr(page) <= offset, page);
The patch for you should have looked like this:
@@ -335,11 +335,12 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
static inline struct page *find_subpage(struct page *page, pgoff_t offset)
{
+ unsigned long index = page_index(page);
+
VM_BUG_ON_PAGE(PageTail(page), page);
- VM_BUG_ON_PAGE(page->index > offset, page);
- VM_BUG_ON_PAGE(page->index + (1 << compound_order(page)) <= offset,
- page);
- return page - page->index + offset;
+ VM_BUG_ON_PAGE(index > offset, page);
+ VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
+ return page - index + offset;
}
> [ 56.915812] page dumped because: VM_BUG_ON_PAGE(index + compound_order(page)
> <= offset)
Yeah, you were missing the '1 <<' part.
* Re: page cache: Store only head pages in i_pages
2019-03-24 3:04 ` Matthew Wilcox
@ 2019-03-24 15:42 ` Qian Cai
2019-03-27 10:48 ` William Kucharski
2019-03-29 1:43 ` Qian Cai
2 siblings, 0 replies; 21+ messages in thread
From: Qian Cai @ 2019-03-24 15:42 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Huang Ying, linux-mm
On 3/23/19 11:04 PM, Matthew Wilcox wrote:
> The patch for you should have looked like this:
>
> @@ -335,11 +335,12 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
>
> static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> {
> + unsigned long index = page_index(page);
> +
> VM_BUG_ON_PAGE(PageTail(page), page);
> - VM_BUG_ON_PAGE(page->index > offset, page);
> - VM_BUG_ON_PAGE(page->index + (1 << compound_order(page)) <= offset,
> - page);
> - return page - page->index + offset;
> + VM_BUG_ON_PAGE(index > offset, page);
> + VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> + return page - index + offset;
> }
It works great.
* Re: page cache: Store only head pages in i_pages
2019-03-24 3:04 ` Matthew Wilcox
2019-03-24 15:42 ` Qian Cai
@ 2019-03-27 10:48 ` William Kucharski
2019-03-27 11:50 ` Matthew Wilcox
2019-03-29 1:43 ` Qian Cai
2 siblings, 1 reply; 21+ messages in thread
From: William Kucharski @ 2019-03-27 10:48 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Qian Cai, Huang Ying, linux-mm
> On Mar 23, 2019, at 9:04 PM, Matthew Wilcox <willy@infradead.org> wrote:
>
>
> static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> {
> + unsigned long index = page_index(page);
> +
> VM_BUG_ON_PAGE(PageTail(page), page);
> - VM_BUG_ON_PAGE(page->index > offset, page);
> - VM_BUG_ON_PAGE(page->index + (1 << compound_order(page)) <= offset,
> - page);
> - return page - page->index + offset;
> + VM_BUG_ON_PAGE(index > offset, page);
> + VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> + return page - index + offset;
> }
>
>
>> [ 56.915812] page dumped because: VM_BUG_ON_PAGE(index + compound_order(page)
>> <= offset)
>
> Yeah, you were missing the '1 <<' part.
>
Is a V5 patch coming incorporating these?
* Re: page cache: Store only head pages in i_pages
2019-03-27 10:48 ` William Kucharski
@ 2019-03-27 11:50 ` Matthew Wilcox
0 siblings, 0 replies; 21+ messages in thread
From: Matthew Wilcox @ 2019-03-27 11:50 UTC (permalink / raw)
To: William Kucharski; +Cc: Qian Cai, Huang Ying, linux-mm
On Wed, Mar 27, 2019 at 04:48:42AM -0600, William Kucharski wrote:
> > On Mar 23, 2019, at 9:04 PM, Matthew Wilcox <willy@infradead.org> wrote:
> > static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> > {
> > + unsigned long index = page_index(page);
> > +
> > VM_BUG_ON_PAGE(PageTail(page), page);
> > - VM_BUG_ON_PAGE(page->index > offset, page);
> > - VM_BUG_ON_PAGE(page->index + (1 << compound_order(page)) <= offset,
> > - page);
> > - return page - page->index + offset;
> > + VM_BUG_ON_PAGE(index > offset, page);
> > + VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> > + return page - index + offset;
> > }
> >
> >
> >> [ 56.915812] page dumped because: VM_BUG_ON_PAGE(index + compound_order(page)
> >> <= offset)
>
> Is a V5 patch coming incorporating these?
No; Andrew prefers to accumulate fixes as separate patches, so that's
what I sent on March 24th; Andrew added it to -mm yesterday.
* Re: page cache: Store only head pages in i_pages
2019-03-24 3:04 ` Matthew Wilcox
2019-03-24 15:42 ` Qian Cai
2019-03-27 10:48 ` William Kucharski
@ 2019-03-29 1:43 ` Qian Cai
2019-03-29 19:59 ` Matthew Wilcox
2 siblings, 1 reply; 21+ messages in thread
From: Qian Cai @ 2019-03-29 1:43 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Huang Ying, linux-mm
On 3/23/19 11:04 PM, Matthew Wilcox wrote:
> @@ -335,11 +335,12 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
>
> static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> {
> + unsigned long index = page_index(page);
> +
> VM_BUG_ON_PAGE(PageTail(page), page);
> - VM_BUG_ON_PAGE(page->index > offset, page);
> - VM_BUG_ON_PAGE(page->index + (1 << compound_order(page)) <= offset,
> - page);
> - return page - page->index + offset;
> + VM_BUG_ON_PAGE(index > offset, page);
> + VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> + return page - index + offset;
> }
Even with this patch, running the LTP mm tests can still trigger the panic
below. It is always triggered by oom02 (or oom04) at the end.
# /opt/ltp/runltp -f mm
The problem is that in scan_swap_map_slots(),
/* reuse swap entry of cache-only swap if not busy. */
if (vm_swap_full() && si->swap_map[offset] == SWAP_HAS_CACHE) {
int swap_was_freed;
unlock_cluster(ci);
spin_unlock(&si->lock);
swap_was_freed = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY);
but that swap entry has already been freed, and the page has PageSwapCache
cleared and page->private is 0.
swp_entry_t entry = swp_entry(si->type, offset)
and then in find_subpage(),
its page->index has a different meaning again, so the calculation is now all wrong:
return page - page->index + offset;
[ 7439.033573] oom_reaper: reaped process 47172 (oom02), now anon-rss:0kB,
file-rss:0kB, shmem-rss:0kB
[ 7456.445737] LTP: starting oom03
[ 7456.535940] LTP: starting oom04
[ 7493.077222] page:ffffea00877a13c0 count:1 mapcount:0 mapping:ffff88a79061d009
index:0x7fa81584f
[ 7493.086963] anon
[ 7493.086968] flags: 0x15fffe00008005c(uptodate|dirty|lru|workingset|swapbacked)
[ 7493.097201] raw: 015fffe00008005c ffffea00b4bf9508 ffffea007f45efc8
ffff88a79061d009
[ 7493.105853] raw: 00000007fa81584f 0000000000000000 00000001ffffffff
ffff888f18278008
[ 7493.114504] page dumped because: VM_BUG_ON_PAGE(index + (1 <<
compound_order(page)) <= offset)
[ 7493.124126] page->mem_cgroup:ffff888f18278008
[ 7493.129036] page_owner info is not active (free page?)
[ 7493.134782] ------------[ cut here ]------------
[ 7493.139937] kernel BUG at include/linux/pagemap.h:342!
[ 7493.145682] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[ 7493.152679] CPU: 5 PID: 47308 Comm: oom04 Kdump: loaded Tainted: G W
5.1.0-rc2-mm1+ #13
[ 7493.163068] Hardware name: Lenovo ThinkSystem SR530
-[7X07RCZ000]-/-[7X07RCZ000]-, BIOS -[TEE113T-1.00]- 07/07/2017
[ 7493.174721] RIP: 0010:find_get_entry+0x751/0x9b0
[ 7493.179876] Code: c6 e0 aa a9 8d 4c 89 ff e8 3c 18 0d 00 0f 0b 48 c7 c7 20 40
02 8e e8 a3 17 58 00 48 c7 c6 40 ad a9 8d 4c 89 ff e8 1f 18 0d 00 <0f> 0b 48 c7
c7 e0 3f 02 8e e8 86 17 58 00 48 c7 c7 68 11 3f 8e e8
[ 7493.200834] RSP: 0000:ffff888d50536ba8 EFLAGS: 00010282
[ 7493.206666] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff8cd6401e
[ 7493.214632] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88979e8b5480
[ 7493.222599] RBP: ffff888d50536cb8 R08: ffffed12f3d16a91 R09: ffffed12f3d16a90
[ 7493.230566] R10: ffffed12f3d16a90 R11: ffff88979e8b5487 R12: ffffea00877a13c0
[ 7493.238531] R13: ffffea00877a13c8 R14: ffffea00877a13c8 R15: ffffea00877a13c0
[ 7493.246496] FS: 00007f248398c700(0000) GS:ffff88979e880000(0000)
knlGS:0000000000000000
[ 7493.255527] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[ 7493.261942] CR2: 00007f3fde110000 CR3: 00000011b2fcc003 CR4: 00000000001606a0
[ 7493.269900] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[ 7493.277864] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[ 7493.285830] Call Trace:
[ 7493.288555] ? queued_spin_lock_slowpath+0x571/0x9e0
[ 7493.294097] ? __filemap_set_wb_err+0x1f0/0x1f0
[ 7493.299154] pagecache_get_page+0x4a/0xb70
[ 7493.303729] __try_to_reclaim_swap+0xa3/0x400
[ 7493.308593] scan_swap_map_slots+0xc05/0x1850
[ 7493.313447] ? __try_to_reclaim_swap+0x400/0x400
[ 7493.318600] ? do_raw_spin_lock+0x128/0x280
[ 7493.323269] ? rwlock_bug.part.0+0x90/0x90
[ 7493.327840] ? get_swap_pages+0x195/0x730
[ 7493.332316] get_swap_pages+0x386/0x730
[ 7493.336590] get_swap_page+0x2b2/0x643
[ 7493.340774] ? rmap_walk+0x140/0x140
[ 7493.344765] ? free_swap_slot+0x3c0/0x3c0
[ 7493.349232] ? anon_vma_ctor+0xe0/0xe0
[ 7493.353407] ? page_get_anon_vma+0x280/0x280
[ 7493.358173] add_to_swap+0x10b/0x230
[ 7493.362164] shrink_page_list+0x29d8/0x4960
[ 7493.366822] ? page_evictable+0x11b/0x1d0
[ 7493.371296] ? page_evictable+0x1d0/0x1d0
[ 7493.375769] ? __isolate_lru_page+0x880/0x880
[ 7493.380631] ? __lock_acquire.isra.14+0x7d7/0x2130
[ 7493.385977] ? shrink_inactive_list+0x484/0x13b0
[ 7493.391130] ? lock_downgrade+0x760/0x760
[ 7493.395608] ? kasan_check_read+0x11/0x20
[ 7493.400082] ? do_raw_spin_unlock+0x59/0x250
[ 7493.404848] shrink_inactive_list+0x4bf/0x13b0
[ 7493.409823] ? move_pages_to_lru+0x1c90/0x1c90
[ 7493.414795] ? kasan_check_read+0x11/0x20
[ 7493.419261] ? lruvec_lru_size+0xef/0x4c0
[ 7493.423738] ? call_function_interrupt+0xa/0x20
[ 7493.428800] ? rcu_all_qs+0x11/0xc0
[ 7493.432692] shrink_node_memcg+0x66a/0x1ee0
[ 7493.437361] ? shrink_active_list+0x1150/0x1150
[ 7493.442417] ? lock_downgrade+0x760/0x760
[ 7493.446891] ? lock_acquire+0x169/0x360
[ 7493.451177] ? mem_cgroup_iter+0x210/0xca0
[ 7493.455747] ? kasan_check_read+0x11/0x20
[ 7493.460221] ? mem_cgroup_protected+0x94/0x450
[ 7493.465179] shrink_node+0x266/0x13c0
[ 7493.469267] ? shrink_node_memcg+0x1ee0/0x1ee0
[ 7493.474230] ? ktime_get+0xab/0x140
[ 7493.478122] ? zone_reclaimable_pages+0x553/0x8d0
[ 7493.483371] do_try_to_free_pages+0x349/0x11e0
[ 7493.488333] ? allow_direct_reclaim.part.6+0xc3/0x240
[ 7493.493971] ? shrink_node+0x13c0/0x13c0
[ 7493.498352] ? queue_delayed_work_on+0x30/0x30
[ 7493.503313] try_to_free_pages+0x277/0x740
[ 7493.507884] ? __lock_acquire.isra.14+0x7d7/0x2130
[ 7493.513232] ? do_try_to_free_pages+0x11e0/0x11e0
[ 7493.518482] __alloc_pages_nodemask+0xc37/0x2ab0
[ 7493.523635] ? gfp_pfmemalloc_allowed+0x150/0x150
[ 7493.528886] ? __lock_acquire.isra.14+0x7d7/0x2130
[ 7493.534226] ? __lock_acquire.isra.14+0x7d7/0x2130
[ 7493.539566] ? do_anonymous_page+0x450/0x1e00
[ 7493.544419] ? lock_downgrade+0x760/0x760
[ 7493.548896] ? __lru_cache_add+0xc2/0x240
[ 7493.553372] alloc_pages_vma+0xb2/0x430
[ 7493.557652] do_anonymous_page+0x50a/0x1e00
[ 7493.562324] ? put_prev_task_fair+0x27c/0x720
[ 7493.567189] ? finish_fault+0x290/0x290
[ 7493.571471] __handle_mm_fault+0x1688/0x3bc0
[ 7493.576227] ? __lock_acquire.isra.14+0x7d7/0x2130
[ 7493.581574] ? vmf_insert_mixed_mkwrite+0x20/0x20
[ 7493.586824] handle_mm_fault+0x326/0x6cf
[ 7493.591203] __do_page_fault+0x333/0x8d0
[ 7493.595571] do_page_fault+0x75/0x48e
[ 7493.599660] ? page_fault+0x5/0x20
[ 7493.603458] page_fault+0x1b/0x20
[ 7493.607156] RIP: 0033:0x410930
[ 7493.610564] Code: 89 de e8 53 26 ff ff 48 83 f8 ff 0f 84 86 00 00 00 48 89 c5
41 83 fc 02 74 28 41 83 fc 03 74 62 e8 05 2c ff ff 31 d2 48 98 90 <c6> 44 15 00
07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
[ 7493.631521] RSP: 002b:00007f248398bec0 EFLAGS: 00010206
[ 7493.637352] RAX: 0000000000001000 RBX: 00000000c0000000 RCX: 00007f42982ad497
[ 7493.645316] RDX: 000000002a9a3000 RSI: 00000000c0000000 RDI: 0000000000000000
[ 7493.653281] RBP: 00007f230298b000 R08: 00000000ffffffff R09: 0000000000000000
[ 7493.661245] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000001
[ 7493.669209] R13: 00007ffc5b8a54ef R14: 0000000000000000 R15: 00007f248398bfc0
[ 7493.677176] Modules linked in: brd ext4 crc16 mbcache jbd2 overlay loop
nls_iso8859_1 nls_cp437 vfat fat kvm_intel kvm irqbypass efivars ip_tables
x_tables xfs sd_mod i40e ahci libahci megaraid_sas libata dm_mirror
dm_region_hash dm_log dm_mod efivarfs
* Re: page cache: Store only head pages in i_pages
2019-03-29 1:43 ` Qian Cai
@ 2019-03-29 19:59 ` Matthew Wilcox
2019-03-29 21:25 ` Qian Cai
0 siblings, 1 reply; 21+ messages in thread
From: Matthew Wilcox @ 2019-03-29 19:59 UTC (permalink / raw)
To: Qian Cai; +Cc: Huang Ying, linux-mm, Kirill A. Shutemov
On Thu, Mar 28, 2019 at 09:43:29PM -0400, Qian Cai wrote:
> On 3/23/19 11:04 PM, Matthew Wilcox wrote:
> > @@ -335,11 +335,12 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
> >
> > static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> > {
> > + unsigned long index = page_index(page);
> > +
> > VM_BUG_ON_PAGE(PageTail(page), page);
> > - VM_BUG_ON_PAGE(page->index > offset, page);
> > - VM_BUG_ON_PAGE(page->index + (1 << compound_order(page)) <= offset,
> > - page);
> > - return page - page->index + offset;
> > + VM_BUG_ON_PAGE(index > offset, page);
> > + VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> > + return page - index + offset;
> > }
>
> Even with this patch, running the LTP mm tests can still trigger the panic
> below. It is always triggered by oom02 (or oom04) at the end.
>
> # /opt/ltp/runltp -f mm
>
> The problem is that in scan_swap_map_slots(),
>
> /* reuse swap entry of cache-only swap if not busy. */
> if (vm_swap_full() && si->swap_map[offset] == SWAP_HAS_CACHE) {
> int swap_was_freed;
> unlock_cluster(ci);
> spin_unlock(&si->lock);
> swap_was_freed = __try_to_reclaim_swap(si, offset, TTRS_ANYWAY);
>
> but that swap entry has already been freed, and the page has PageSwapCache
> cleared and page->private is 0.
I don't understand how we get to this situation. We SetPageSwapCache()
in add_to_swap_cache() right before we store the page in i_pages.
We ClearPageSwapCache() in __delete_from_swap_cache() right after
removing the page from the array. So how do we find a page in a swap
address space that has PageSwapCache cleared?
Indeed, we have a check which should trigger ...
VM_BUG_ON_PAGE(!PageSwapCache(page), page);
in __delete_from_swap_cache().
Oh ... is it a race?
* Its ok to check for PageSwapCache without the page lock
* here because we are going to recheck again inside
* try_to_free_swap() _with_ the lock.
so CPU A does:
page = find_get_page(swap_address_space(entry), offset)
page = find_subpage(page, offset);
trylock_page(page);
while CPU B does:
xa_lock_irq(&address_space->i_pages);
__delete_from_swap_cache(page, entry);
xas_store(&xas, NULL);
ClearPageSwapCache(page);
xa_unlock_irq(&address_space->i_pages);
and if the ClearPageSwapCache happens between the xas_load() and the
find_subpage(), we're stuffed. CPU A has a reference to the page, but
not a lock, and find_get_page is running under RCU.
I suppose we could fix this by taking the i_pages xa_lock around the
call to find_get_page(). If, indeed, that's what this problem is.
Want to try this patch?
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 2b8d9c3fbb47..ed8e42be88b5 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -127,10 +127,14 @@ static int __try_to_reclaim_swap(struct swap_info_struct *si,
unsigned long offset, unsigned long flags)
{
swp_entry_t entry = swp_entry(si->type, offset);
+ struct address_space *mapping = swap_address_space(entry);
+ unsigned long irq_flags;
struct page *page;
int ret = 0;
- page = find_get_page(swap_address_space(entry), offset);
+ xa_lock_irqsave(&mapping->i_pages, irq_flags);
+ page = find_get_page(mapping, offset);
+ xa_unlock_irqrestore(&mapping->i_pages, irq_flags);
if (!page)
return 0;
/*
> swp_entry_t entry = swp_entry(si->type, offset)
>
> and then in find_subpage(),
>
> its page->index has a different meaning again and the calculation is now all wrong.
>
> return page - page->index + offset;
>
> [ 7439.033573] oom_reaper: reaped process 47172 (oom02), now anon-rss:0kB,
> file-rss:0kB, shmem-rss:0kB
> [ 7456.445737] LTP: starting oom03
> [ 7456.535940] LTP: starting oom04
> [ 7493.077222] page:ffffea00877a13c0 count:1 mapcount:0 mapping:ffff88a79061d009
> index:0x7fa81584f
> [ 7493.086963] anon
> [ 7493.086968] flags: 0x15fffe00008005c(uptodate|dirty|lru|workingset|swapbacked)
> [ 7493.097201] raw: 015fffe00008005c ffffea00b4bf9508 ffffea007f45efc8
> ffff88a79061d009
> [ 7493.105853] raw: 00000007fa81584f 0000000000000000 00000001ffffffff
> ffff888f18278008
> [ 7493.114504] page dumped because: VM_BUG_ON_PAGE(index + (1 <<
> compound_order(page)) <= offset)
> [ 7493.124126] page->mem_cgroup:ffff888f18278008
> [ 7493.129036] page_owner info is not active (free page?)
> [ 7493.134782] ------------[ cut here ]------------
> [ 7493.139937] kernel BUG at include/linux/pagemap.h:342!
> [ 7493.145682] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
> [ 7493.152679] CPU: 5 PID: 47308 Comm: oom04 Kdump: loaded Tainted: G W
> 5.1.0-rc2-mm1+ #13
> [ 7493.163068] Hardware name: Lenovo ThinkSystem SR530
> -[7X07RCZ000]-/-[7X07RCZ000]-, BIOS -[TEE113T-1.00]- 07/07/2017
> [ 7493.174721] RIP: 0010:find_get_entry+0x751/0x9b0
> [ 7493.179876] Code: c6 e0 aa a9 8d 4c 89 ff e8 3c 18 0d 00 0f 0b 48 c7 c7 20 40
> 02 8e e8 a3 17 58 00 48 c7 c6 40 ad a9 8d 4c 89 ff e8 1f 18 0d 00 <0f> 0b 48 c7
> c7 e0 3f 02 8e e8 86 17 58 00 48 c7 c7 68 11 3f 8e e8
> [ 7493.200834] RSP: 0000:ffff888d50536ba8 EFLAGS: 00010282
> [ 7493.206666] RAX: 0000000000000000 RBX: 0000000000000001 RCX: ffffffff8cd6401e
> [ 7493.214632] RDX: 0000000000000000 RSI: 0000000000000008 RDI: ffff88979e8b5480
> [ 7493.222599] RBP: ffff888d50536cb8 R08: ffffed12f3d16a91 R09: ffffed12f3d16a90
> [ 7493.230566] R10: ffffed12f3d16a90 R11: ffff88979e8b5487 R12: ffffea00877a13c0
> [ 7493.238531] R13: ffffea00877a13c8 R14: ffffea00877a13c8 R15: ffffea00877a13c0
> [ 7493.246496] FS: 00007f248398c700(0000) GS:ffff88979e880000(0000)
> knlGS:0000000000000000
> [ 7493.255527] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [ 7493.261942] CR2: 00007f3fde110000 CR3: 00000011b2fcc003 CR4: 00000000001606a0
> [ 7493.269900] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> [ 7493.277864] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
> [ 7493.285830] Call Trace:
> [ 7493.288555] ? queued_spin_lock_slowpath+0x571/0x9e0
> [ 7493.294097] ? __filemap_set_wb_err+0x1f0/0x1f0
> [ 7493.299154] pagecache_get_page+0x4a/0xb70
> [ 7493.303729] __try_to_reclaim_swap+0xa3/0x400
> [ 7493.308593] scan_swap_map_slots+0xc05/0x1850
> [ 7493.313447] ? __try_to_reclaim_swap+0x400/0x400
> [ 7493.318600] ? do_raw_spin_lock+0x128/0x280
> [ 7493.323269] ? rwlock_bug.part.0+0x90/0x90
> [ 7493.327840] ? get_swap_pages+0x195/0x730
> [ 7493.332316] get_swap_pages+0x386/0x730
> [ 7493.336590] get_swap_page+0x2b2/0x643
> [ 7493.340774] ? rmap_walk+0x140/0x140
> [ 7493.344765] ? free_swap_slot+0x3c0/0x3c0
> [ 7493.349232] ? anon_vma_ctor+0xe0/0xe0
> [ 7493.353407] ? page_get_anon_vma+0x280/0x280
> [ 7493.358173] add_to_swap+0x10b/0x230
> [ 7493.362164] shrink_page_list+0x29d8/0x4960
> [ 7493.366822] ? page_evictable+0x11b/0x1d0
> [ 7493.371296] ? page_evictable+0x1d0/0x1d0
> [ 7493.375769] ? __isolate_lru_page+0x880/0x880
> [ 7493.380631] ? __lock_acquire.isra.14+0x7d7/0x2130
> [ 7493.385977] ? shrink_inactive_list+0x484/0x13b0
> [ 7493.391130] ? lock_downgrade+0x760/0x760
> [ 7493.395608] ? kasan_check_read+0x11/0x20
> [ 7493.400082] ? do_raw_spin_unlock+0x59/0x250
> [ 7493.404848] shrink_inactive_list+0x4bf/0x13b0
> [ 7493.409823] ? move_pages_to_lru+0x1c90/0x1c90
> [ 7493.414795] ? kasan_check_read+0x11/0x20
> [ 7493.419261] ? lruvec_lru_size+0xef/0x4c0
> [ 7493.423738] ? call_function_interrupt+0xa/0x20
> [ 7493.428800] ? rcu_all_qs+0x11/0xc0
> [ 7493.432692] shrink_node_memcg+0x66a/0x1ee0
> [ 7493.437361] ? shrink_active_list+0x1150/0x1150
> [ 7493.442417] ? lock_downgrade+0x760/0x760
> [ 7493.446891] ? lock_acquire+0x169/0x360
> [ 7493.451177] ? mem_cgroup_iter+0x210/0xca0
> [ 7493.455747] ? kasan_check_read+0x11/0x20
> [ 7493.460221] ? mem_cgroup_protected+0x94/0x450
> [ 7493.465179] shrink_node+0x266/0x13c0
> [ 7493.469267] ? shrink_node_memcg+0x1ee0/0x1ee0
> [ 7493.474230] ? ktime_get+0xab/0x140
> [ 7493.478122] ? zone_reclaimable_pages+0x553/0x8d0
> [ 7493.483371] do_try_to_free_pages+0x349/0x11e0
> [ 7493.488333] ? allow_direct_reclaim.part.6+0xc3/0x240
> [ 7493.493971] ? shrink_node+0x13c0/0x13c0
> [ 7493.498352] ? queue_delayed_work_on+0x30/0x30
> [ 7493.503313] try_to_free_pages+0x277/0x740
> [ 7493.507884] ? __lock_acquire.isra.14+0x7d7/0x2130
> [ 7493.513232] ? do_try_to_free_pages+0x11e0/0x11e0
> [ 7493.518482] __alloc_pages_nodemask+0xc37/0x2ab0
> [ 7493.523635] ? gfp_pfmemalloc_allowed+0x150/0x150
> [ 7493.528886] ? __lock_acquire.isra.14+0x7d7/0x2130
> [ 7493.534226] ? __lock_acquire.isra.14+0x7d7/0x2130
> [ 7493.539566] ? do_anonymous_page+0x450/0x1e00
> [ 7493.544419] ? lock_downgrade+0x760/0x760
> [ 7493.548896] ? __lru_cache_add+0xc2/0x240
> [ 7493.553372] alloc_pages_vma+0xb2/0x430
> [ 7493.557652] do_anonymous_page+0x50a/0x1e00
> [ 7493.562324] ? put_prev_task_fair+0x27c/0x720
> [ 7493.567189] ? finish_fault+0x290/0x290
> [ 7493.571471] __handle_mm_fault+0x1688/0x3bc0
> [ 7493.576227] ? __lock_acquire.isra.14+0x7d7/0x2130
> [ 7493.581574] ? vmf_insert_mixed_mkwrite+0x20/0x20
> [ 7493.586824] handle_mm_fault+0x326/0x6cf
> [ 7493.591203] __do_page_fault+0x333/0x8d0
> [ 7493.595571] do_page_fault+0x75/0x48e
> [ 7493.599660] ? page_fault+0x5/0x20
> [ 7493.603458] page_fault+0x1b/0x20
> [ 7493.607156] RIP: 0033:0x410930
> [ 7493.610564] Code: 89 de e8 53 26 ff ff 48 83 f8 ff 0f 84 86 00 00 00 48 89 c5
> 41 83 fc 02 74 28 41 83 fc 03 74 62 e8 05 2c ff ff 31 d2 48 98 90 <c6> 44 15 00
> 07 48 01 c2 48 39 d3 7f f3 31 c0 5b 5d 41 5c c3 0f 1f
> [ 7493.631521] RSP: 002b:00007f248398bec0 EFLAGS: 00010206
> [ 7493.637352] RAX: 0000000000001000 RBX: 00000000c0000000 RCX: 00007f42982ad497
> [ 7493.645316] RDX: 000000002a9a3000 RSI: 00000000c0000000 RDI: 0000000000000000
> [ 7493.653281] RBP: 00007f230298b000 R08: 00000000ffffffff R09: 0000000000000000
> [ 7493.661245] R10: 0000000000000022 R11: 0000000000000246 R12: 0000000000000001
> [ 7493.669209] R13: 00007ffc5b8a54ef R14: 0000000000000000 R15: 00007f248398bfc0
> [ 7493.677176] Modules linked in: brd ext4 crc16 mbcache jbd2 overlay loop
> nls_iso8859_1 nls_cp437 vfat fat kvm_intel kvm irqbypass efivars ip_tables
> x_tables xfs sd_mod i40e ahci libahci megaraid_sas libata dm_mirror
> dm_region_hash dm_log dm_mod efivarfs
^ permalink raw reply related [flat|nested] 21+ messages in thread
* Re: page cache: Store only head pages in i_pages
2019-03-29 19:59 ` Matthew Wilcox
@ 2019-03-29 21:25 ` Qian Cai
2019-03-30 3:04 ` Matthew Wilcox
0 siblings, 1 reply; 21+ messages in thread
From: Qian Cai @ 2019-03-29 21:25 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Huang Ying, linux-mm, Kirill A. Shutemov
On Fri, 2019-03-29 at 12:59 -0700, Matthew Wilcox wrote:
> I don't understand how we get to this situation. We SetPageSwapCache()
> in add_to_swap_cache() right before we store the page in i_pages.
> We ClearPageSwapCache() in __delete_from_swap_cache() right after
> removing the page from the array. So how do we find a page in a swap
> address space that has PageSwapCache cleared?
>
> Indeed, we have a check which should trigger ...
>
> VM_BUG_ON_PAGE(!PageSwapCache(page), page);
>
> in __delete_from_swap_cache().
>
> Oh ... is it a race?
>
> * Its ok to check for PageSwapCache without the page lock
> * here because we are going to recheck again inside
> * try_to_free_swap() _with_ the lock.
>
> so CPU A does:
>
> page = find_get_page(swap_address_space(entry), offset)
> page = find_subpage(page, offset);
> trylock_page(page);
>
> while CPU B does:
>
> xa_lock_irq(&address_space->i_pages);
> __delete_from_swap_cache(page, entry);
> xas_store(&xas, NULL);
> ClearPageSwapCache(page);
> xa_unlock_irq(&address_space->i_pages);
>
> and if the ClearPageSwapCache happens between the xas_load() and the
> find_subpage(), we're stuffed. CPU A has a reference to the page, but
> not a lock, and find_get_page is running under RCU.
>
> I suppose we could fix this by taking the i_pages xa_lock around the
> call to find_get_page(). If, indeed, that's what this problem is.
> Want to try this patch?
Confirmed. Well spotted!
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 2b8d9c3fbb47..ed8e42be88b5 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -127,10 +127,14 @@ static int __try_to_reclaim_swap(struct swap_info_struct
> *si,
> unsigned long offset, unsigned long flags)
> {
> swp_entry_t entry = swp_entry(si->type, offset);
> + struct address_space *mapping = swap_address_space(entry);
> + unsigned long irq_flags;
> struct page *page;
> int ret = 0;
>
> - page = find_get_page(swap_address_space(entry), offset);
> + xa_lock_irqsave(&mapping->i_pages, irq_flags);
> + page = find_get_page(mapping, offset);
> + xa_unlock_irqrestore(&mapping->i_pages, irq_flags);
> if (!page)
> return 0;
> /*
* Re: page cache: Store only head pages in i_pages
2019-03-29 21:25 ` Qian Cai
@ 2019-03-30 3:04 ` Matthew Wilcox
2019-03-30 14:10 ` Matthew Wilcox
0 siblings, 1 reply; 21+ messages in thread
From: Matthew Wilcox @ 2019-03-30 3:04 UTC (permalink / raw)
To: Qian Cai; +Cc: Huang Ying, linux-mm, Kirill A. Shutemov
On Fri, Mar 29, 2019 at 05:25:34PM -0400, Qian Cai wrote:
> On Fri, 2019-03-29 at 12:59 -0700, Matthew Wilcox wrote:
> > Oh ... is it a race?
> >
> > so CPU A does:
> >
> > page = find_get_page(swap_address_space(entry), offset)
> > page = find_subpage(page, offset);
> > trylock_page(page);
> >
> > while CPU B does:
> >
> > xa_lock_irq(&address_space->i_pages);
> > __delete_from_swap_cache(page, entry);
> > xas_store(&xas, NULL);
> > ClearPageSwapCache(page);
> > xa_unlock_irq(&address_space->i_pages);
> >
> > and if the ClearPageSwapCache happens between the xas_load() and the
> > find_subpage(), we're stuffed. CPU A has a reference to the page, but
> > not a lock, and find_get_page is running under RCU.
> >
> > I suppose we could fix this by taking the i_pages xa_lock around the
> > call to find_get_page(). If, indeed, that's what this problem is.
> > Want to try this patch?
>
> Confirmed. Well spotted!
Excellent! I'm not comfortable with the rule that you have to be holding
the i_pages lock in order to call find_get_page() on a swap address_space.
How does this look to the various smart people who know far more about the
MM than I do?
The idea is to ensure that if this race does happen, the page will be
handled the same way as a pagecache page. If __delete_from_swap_cache()
can be called while the page is still part of a VMA, then this patch
will break page_to_pgoff(). But I don't think that can happen ... ?
(also, is this the right memory barrier to use to ensure that the old
value of page->index cannot be observed if the PageSwapCache bit is
observed as having been cleared?)
diff --git a/mm/swap_state.c b/mm/swap_state.c
index 2e15cc335966..a715efcf0991 100644
--- a/mm/swap_state.c
+++ b/mm/swap_state.c
@@ -165,13 +165,16 @@ void __delete_from_swap_cache(struct page *page, swp_entry_t entry)
VM_BUG_ON_PAGE(!PageSwapCache(page), page);
VM_BUG_ON_PAGE(PageWriteback(page), page);
+ page->index = idx;
+ smp_mb__before_atomic();
+ ClearPageSwapCache(page);
+
for (i = 0; i < nr; i++) {
void *entry = xas_store(&xas, NULL);
VM_BUG_ON_PAGE(entry != page, entry);
set_page_private(page + i, 0);
xas_next(&xas);
}
- ClearPageSwapCache(page);
address_space->nrpages -= nr;
__mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, -nr);
ADD_CACHE_INFO(del_total, nr);
* Re: page cache: Store only head pages in i_pages
2019-03-30 3:04 ` Matthew Wilcox
@ 2019-03-30 14:10 ` Matthew Wilcox
2019-03-31 3:23 ` Matthew Wilcox
0 siblings, 1 reply; 21+ messages in thread
From: Matthew Wilcox @ 2019-03-30 14:10 UTC (permalink / raw)
To: Qian Cai; +Cc: Huang Ying, linux-mm, Kirill A. Shutemov
On Fri, Mar 29, 2019 at 08:04:32PM -0700, Matthew Wilcox wrote:
> Excellent! I'm not comfortable with the rule that you have to be holding
> the i_pages lock in order to call find_get_page() on a swap address_space.
> How does this look to the various smart people who know far more about the
> MM than I do?
>
> The idea is to ensure that if this race does happen, the page will be
> handled the same way as a pagecache page. If __delete_from_swap_cache()
> can be called while the page is still part of a VMA, then this patch
> will break page_to_pgoff(). But I don't think that can happen ... ?
Oh, blah, that can totally happen. reuse_swap_page() calls
delete_from_swap_cache(). Need a new plan.
* Re: page cache: Store only head pages in i_pages
2019-03-30 14:10 ` Matthew Wilcox
@ 2019-03-31 3:23 ` Matthew Wilcox
2019-04-01 9:18 ` Kirill A. Shutemov
0 siblings, 1 reply; 21+ messages in thread
From: Matthew Wilcox @ 2019-03-31 3:23 UTC (permalink / raw)
To: Qian Cai; +Cc: Huang Ying, linux-mm, Kirill A. Shutemov
On Sat, Mar 30, 2019 at 07:10:52AM -0700, Matthew Wilcox wrote:
> On Fri, Mar 29, 2019 at 08:04:32PM -0700, Matthew Wilcox wrote:
> > Excellent! I'm not comfortable with the rule that you have to be holding
> > the i_pages lock in order to call find_get_page() on a swap address_space.
> > How does this look to the various smart people who know far more about the
> > MM than I do?
> >
> > The idea is to ensure that if this race does happen, the page will be
> > handled the same way as a pagecache page. If __delete_from_swap_cache()
> > can be called while the page is still part of a VMA, then this patch
> > will break page_to_pgoff(). But I don't think that can happen ... ?
>
> Oh, blah, that can totally happen. reuse_swap_page() calls
> delete_from_swap_cache(). Need a new plan.
I don't see a good solution here that doesn't involve withdrawing this
patch and starting over. Bad solutions:
- Take the i_pages lock around each page lookup call in the swap code
(not just the one you found; there are others like mc_handle_swap_pte()
in memcontrol.c)
- Call synchronize_rcu() in __delete_from_swap_cache()
- Swap the roles of ->index and ->private for swap pages, and then don't
clear ->index when deleting a page from the swap cache
The first two would be slow and non-scalable. The third is still prone
to a race where the page is looked up on one CPU, while another CPU
removes it from one swap file then moves it to a different location,
potentially in a different swap file. Hard to hit, but not a race we
want to introduce.
I believe that the swap code actually never wants to see subpages. So if
we start again, introducing APIs (eg find_get_head()) which return the
head page, then convert the swap code over to use those APIs, we don't
need to solve the problem of finding the subpage of a swap page while
not holding the page lock.
I'm obviously reluctant to withdraw the patch, but I don't see a better
option. Your testing has revealed a problem that needs a deeper solution
than just adding a fix patch.
* Re: page cache: Store only head pages in i_pages
2019-03-31 3:23 ` Matthew Wilcox
@ 2019-04-01 9:18 ` Kirill A. Shutemov
2019-04-01 9:27 ` Kirill A. Shutemov
0 siblings, 1 reply; 21+ messages in thread
From: Kirill A. Shutemov @ 2019-04-01 9:18 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Qian Cai, Huang Ying, linux-mm
On Sat, Mar 30, 2019 at 08:23:26PM -0700, Matthew Wilcox wrote:
> On Sat, Mar 30, 2019 at 07:10:52AM -0700, Matthew Wilcox wrote:
> > On Fri, Mar 29, 2019 at 08:04:32PM -0700, Matthew Wilcox wrote:
> > > Excellent! I'm not comfortable with the rule that you have to be holding
> > > the i_pages lock in order to call find_get_page() on a swap address_space.
> > > How does this look to the various smart people who know far more about the
> > > MM than I do?
> > >
> > > The idea is to ensure that if this race does happen, the page will be
> > > handled the same way as a pagecache page. If __delete_from_swap_cache()
> > > can be called while the page is still part of a VMA, then this patch
> > > will break page_to_pgoff(). But I don't think that can happen ... ?
> >
> > Oh, blah, that can totally happen. reuse_swap_page() calls
> > delete_from_swap_cache(). Need a new plan.
>
> I don't see a good solution here that doesn't involve withdrawing this
> patch and starting over. Bad solutions:
>
> - Take the i_pages lock around each page lookup call in the swap code
> (not just the one you found; there are others like mc_handle_swap_pte()
> in memcontrol.c)
> - Call synchronize_rcu() in __delete_from_swap_cache()
> - Swap the roles of ->index and ->private for swap pages, and then don't
> clear ->index when deleting a page from the swap cache
>
> The first two would be slow and non-scalable. The third is still prone
> to a race where the page is looked up on one CPU, while another CPU
> removes it from one swap file then moves it to a different location,
> potentially in a different swap file. Hard to hit, but not a race we
> want to introduce.
>
> I believe that the swap code actually never wants to see subpages. So if
> we start again, introducing APIs (eg find_get_head()) which return the
> head page, then convert the swap code over to use those APIs, we don't
> need to solve the problem of finding the subpage of a swap page while
> not holding the page lock.
>
> I'm obviously reluctant to withdraw the patch, but I don't see a better
> option. Your testing has revealed a problem that needs a deeper solution
> than just adding a fix patch.
Hm. Isn't the problem with the VM_BUGs themselves? I mean, find_subpage()
produces the right result (or am I wrong here?), but the VM_BUGs flag it as wrong.
Maybe we should relax the VM_BUGs?
--
Kirill A. Shutemov
* Re: page cache: Store only head pages in i_pages
2019-04-01 9:18 ` Kirill A. Shutemov
@ 2019-04-01 9:27 ` Kirill A. Shutemov
2019-04-04 13:10 ` Qian Cai
0 siblings, 1 reply; 21+ messages in thread
From: Kirill A. Shutemov @ 2019-04-01 9:27 UTC (permalink / raw)
To: Matthew Wilcox; +Cc: Qian Cai, Huang Ying, linux-mm
On Mon, Apr 01, 2019 at 12:18:58PM +0300, Kirill A. Shutemov wrote:
> On Sat, Mar 30, 2019 at 08:23:26PM -0700, Matthew Wilcox wrote:
> > On Sat, Mar 30, 2019 at 07:10:52AM -0700, Matthew Wilcox wrote:
> > > On Fri, Mar 29, 2019 at 08:04:32PM -0700, Matthew Wilcox wrote:
> > > > Excellent! I'm not comfortable with the rule that you have to be holding
> > > > the i_pages lock in order to call find_get_page() on a swap address_space.
> > > > How does this look to the various smart people who know far more about the
> > > > MM than I do?
> > > >
> > > > The idea is to ensure that if this race does happen, the page will be
> > > > handled the same way as a pagecache page. If __delete_from_swap_cache()
> > > > can be called while the page is still part of a VMA, then this patch
> > > > will break page_to_pgoff(). But I don't think that can happen ... ?
> > >
> > > Oh, blah, that can totally happen. reuse_swap_page() calls
> > > delete_from_swap_cache(). Need a new plan.
> >
> > I don't see a good solution here that doesn't involve withdrawing this
> > patch and starting over. Bad solutions:
> >
> > - Take the i_pages lock around each page lookup call in the swap code
> > (not just the one you found; there are others like mc_handle_swap_pte()
> > in memcontrol.c)
> > - Call synchronize_rcu() in __delete_from_swap_cache()
> > - Swap the roles of ->index and ->private for swap pages, and then don't
> > clear ->index when deleting a page from the swap cache
> >
> > The first two would be slow and non-scalable. The third is still prone
> > to a race where the page is looked up on one CPU, while another CPU
> > removes it from one swap file then moves it to a different location,
> > potentially in a different swap file. Hard to hit, but not a race we
> > want to introduce.
> >
> > I believe that the swap code actually never wants to see subpages. So if
> > we start again, introducing APIs (eg find_get_head()) which return the
> > head page, then convert the swap code over to use those APIs, we don't
> > need to solve the problem of finding the subpage of a swap page while
> > not holding the page lock.
> >
> > I'm obviously reluctant to withdraw the patch, but I don't see a better
> > option. Your testing has revealed a problem that needs a deeper solution
> > than just adding a fix patch.
>
> Hm. Isn't the problem with the VM_BUGs themselves? I mean, find_subpage()
> produces the right result (or am I wrong here?), but the VM_BUGs flag it as wrong.
Yeah, I'm wrong. :P
What about patch like this? (completely untested)
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index f939e004c5d1..e3b9bf843dcb 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -335,12 +335,12 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
static inline struct page *find_subpage(struct page *page, pgoff_t offset)
{
- unsigned long index = page_index(page);
+ unsigned long mask;
VM_BUG_ON_PAGE(PageTail(page), page);
- VM_BUG_ON_PAGE(index > offset, page);
- VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
- return page - index + offset;
+
+ mask = (1UL << compound_order(page)) - 1;
+ return page + (offset & mask);
}
struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
--
Kirill A. Shutemov
* Re: page cache: Store only head pages in i_pages
2019-04-01 9:27 ` Kirill A. Shutemov
@ 2019-04-04 13:10 ` Qian Cai
2019-04-04 13:45 ` Kirill A. Shutemov
0 siblings, 1 reply; 21+ messages in thread
From: Qian Cai @ 2019-04-04 13:10 UTC (permalink / raw)
To: Kirill A. Shutemov, Matthew Wilcox; +Cc: Huang Ying, linux-mm
On Mon, 2019-04-01 at 12:27 +0300, Kirill A. Shutemov wrote:
> What about patch like this? (completely untested)
>
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index f939e004c5d1..e3b9bf843dcb 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -335,12 +335,12 @@ static inline struct page *grab_cache_page_nowait(struct
> address_space *mapping,
>
> static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> {
> - unsigned long index = page_index(page);
> + unsigned long mask;
>
> VM_BUG_ON_PAGE(PageTail(page), page);
> - VM_BUG_ON_PAGE(index > offset, page);
> - VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> - return page - index + offset;
> +
> + mask = (1UL << compound_order(page)) - 1;
> + return page + (offset & mask);
> }
>
> struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
No, this then leads to the panic below from LTP hugemmap05. Reverting the
whole "mm: page cache: store only head pages in i_pages" commit still fixes
the problem.
# /opt/ltp/testcases/bin/hugemmap05
tst_test.c:1096: INFO: Timeout per run is 0h 05m 00s
hugemmap05.c:235: INFO: original nr_hugepages is 0
hugemmap05.c:248: INFO: original nr_overcommit_hugepages is 0
hugemmap05.c:116: INFO: check /proc/meminfo before allocation.
hugemmap05.c:297: INFO: HugePages_Total is 192.
hugemmap05.c:297: INFO: HugePages_Free is 192.
hugemmap05.c:297: INFO: HugePages_Surp is 64.
hugemmap05.c:297: INFO: HugePages_Rsvd is 192.
hugemmap05.c:272: INFO: First hex is 7070707
hugemmap05.c:151: INFO: check /proc/meminfo.
hugemmap05.c:297: INFO: HugePages_Total is 192.
hugemmap05.c:297: INFO: HugePages_Free is 0.
hugemmap05.c:297: INFO: HugePages_Surp is 64.
hugemmap05.c:297: INFO: HugePages_Rsvd is 0.
[10022.547977] ------------[ cut here ]------------
[10022.571941] kernel BUG at fs/hugetlbfs/inode.c:475!
[10022.598304] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC KASAN PTI
[10022.626383] CPU: 39 PID: 13074 Comm: hugemmap05 Kdump: loaded Tainted:
G W 5.1.0-rc3-next-20190403+ #16
[10022.674421] Hardware name: HP ProLiant XL420 Gen9/ProLiant XL420 Gen9, BIOS
U19 12/27/2015
[10022.711990] RIP: 0010:remove_inode_hugepages+0x706/0xa60
[10022.735997] Code: fd ff ff e8 9c a0 99 ff e9 bc fc ff ff 48 c7 c6 40 ae 50 9f
4c 89 f7 e8 c8 3f ca ff 0f 0b 48 c7 c7 80 18 ba 9f e8 2f 63 15 00 <0f> 0b 48 c7
c7 40 18 ba 9f e8 21 63 15 00 48 8b bd 88 fd ff ff e8
[10022.820547] RSP: 0018:ffff88883ea5f920 EFLAGS: 00010202
[10022.844039] RAX: 015fffe000002000 RBX: 0000000000000001 RCX:
ffffffff9e2adf5c
[10022.876130] RDX: 0000000000000001 RSI: 00000000000001df RDI:
ffffea001a0f8048
[10022.908202] RBP: ffff88883ea5fbf8 R08: fffff9400341f00b R09:
fffff9400341f00a
[10022.940369] R10: fffff9400341f00a R11: ffffea001a0f8057 R12:
0000000000000001
[10022.972615] R13: ffff88883ea5fbd0 R14: ffffea001a0f8040 R15:
dffffc0000000000
[10023.004633] FS: 00007ff5964d7740(0000) GS:ffff888847b80000(0000)
knlGS:0000000000000000
[10023.040462] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[10023.066242] CR2: 00007ff595800000 CR3: 00000004be5d0006 CR4:
00000000001606a0
[10023.103426] Call Trace:
[10023.114997] ? hugetlbfs_size_to_hpages+0xe0/0xe0
[10023.136032] ? fsnotify_grab_connector+0x9f/0x130
[10023.157131] ? __lock_acquire.isra.14+0x7d7/0x2130
[10023.178540] ? kasan_check_read+0x11/0x20
[10023.196471] ? do_raw_spin_unlock+0x59/0x250
[10023.215893] hugetlbfs_evict_inode+0x20/0x90
[10023.235249] evict+0x2a4/0x5c0
[10023.249393] ? do_raw_spin_unlock+0x59/0x250
[10023.268885] iput+0x3d9/0x790
[10023.282210] do_unlinkat+0x461/0x650
[10023.298318] ? __x64_sys_rmdir+0x40/0x40
[10023.316058] ? __check_object_size+0x4b4/0x7f1
[10023.336241] ? __kasan_kmalloc.constprop.1+0xac/0xc0
[10023.358681] ? blkcg_exit_queue+0x1a0/0x1a0
[10023.377428] ? getname_flags+0x90/0x400
[10023.394859] __x64_sys_unlink+0x3e/0x50
[10023.411987] do_syscall_64+0xeb/0xb78
[10023.428386] ? syscall_return_slowpath+0x160/0x160
[10023.449987] ? __do_page_fault+0x583/0x8d0
[10023.468333] ? schedule+0x81/0x180
[10023.483515] ? exit_to_usermode_loop+0xab/0x100
[10023.503763] entry_SYSCALL_64_after_hwframe+0x44/0xa9
[10023.526369] RIP: 0033:0x7ff595bbcedb
* Re: page cache: Store only head pages in i_pages
2019-04-04 13:10 ` Qian Cai
@ 2019-04-04 13:45 ` Kirill A. Shutemov
2019-04-04 21:28 ` Qian Cai
0 siblings, 1 reply; 21+ messages in thread
From: Kirill A. Shutemov @ 2019-04-04 13:45 UTC (permalink / raw)
To: Qian Cai; +Cc: Matthew Wilcox, Huang Ying, linux-mm
On Thu, Apr 04, 2019 at 09:10:10AM -0400, Qian Cai wrote:
> On Mon, 2019-04-01 at 12:27 +0300, Kirill A. Shutemov wrote:
> > What about patch like this? (completely untested)
> >
> > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> > index f939e004c5d1..e3b9bf843dcb 100644
> > --- a/include/linux/pagemap.h
> > +++ b/include/linux/pagemap.h
> > @@ -335,12 +335,12 @@ static inline struct page *grab_cache_page_nowait(struct
> > address_space *mapping,
> >
> > static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> > {
> > - unsigned long index = page_index(page);
> > + unsigned long mask;
> >
> > VM_BUG_ON_PAGE(PageTail(page), page);
> > - VM_BUG_ON_PAGE(index > offset, page);
> > - VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> > - return page - index + offset;
> > +
> > + mask = (1UL << compound_order(page)) - 1;
> > + return page + (offset & mask);
> > }
> >
> > struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
>
> No, this then leads to the panic below from LTP hugemmap05. Reverting the
> whole "mm: page cache: store only head pages in i_pages" commit still fixes
> the problem.
Ughh... hugetlb stores pages in the page cache differently.
What about this:
diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
index f939e004c5d1..2e8438a1216a 100644
--- a/include/linux/pagemap.h
+++ b/include/linux/pagemap.h
@@ -335,12 +335,15 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
static inline struct page *find_subpage(struct page *page, pgoff_t offset)
{
- unsigned long index = page_index(page);
+ unsigned long mask;
+
+ if (PageHuge(page))
+ return page;
VM_BUG_ON_PAGE(PageTail(page), page);
- VM_BUG_ON_PAGE(index > offset, page);
- VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
- return page - index + offset;
+
+ mask = (1UL << compound_order(page)) - 1;
+ return page + (offset & mask);
}
struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
--
Kirill A. Shutemov
* Re: page cache: Store only head pages in i_pages
2019-04-04 13:45 ` Kirill A. Shutemov
@ 2019-04-04 21:28 ` Qian Cai
2019-04-05 13:37 ` Kirill A. Shutemov
0 siblings, 1 reply; 21+ messages in thread
From: Qian Cai @ 2019-04-04 21:28 UTC (permalink / raw)
To: Kirill A. Shutemov; +Cc: Matthew Wilcox, Huang Ying, linux-mm
On Thu, 2019-04-04 at 16:45 +0300, Kirill A. Shutemov wrote:
> What about this:
>
> diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> index f939e004c5d1..2e8438a1216a 100644
> --- a/include/linux/pagemap.h
> +++ b/include/linux/pagemap.h
> @@ -335,12 +335,15 @@ static inline struct page *grab_cache_page_nowait(struct
> address_space *mapping,
>
> static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> {
> - unsigned long index = page_index(page);
> + unsigned long mask;
> +
> + if (PageHuge(page))
> + return page;
>
> VM_BUG_ON_PAGE(PageTail(page), page);
> - VM_BUG_ON_PAGE(index > offset, page);
> - VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> - return page - index + offset;
> +
> + mask = (1UL << compound_order(page)) - 1;
> + return page + (offset & mask);
> }
>
> struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
It works fine.
* Re: page cache: Store only head pages in i_pages
2019-04-04 21:28 ` Qian Cai
@ 2019-04-05 13:37 ` Kirill A. Shutemov
2019-04-05 13:51 ` Matthew Wilcox
0 siblings, 1 reply; 21+ messages in thread
From: Kirill A. Shutemov @ 2019-04-05 13:37 UTC (permalink / raw)
To: Qian Cai; +Cc: Matthew Wilcox, Huang Ying, linux-mm
On Thu, Apr 04, 2019 at 05:28:02PM -0400, Qian Cai wrote:
> On Thu, 2019-04-04 at 16:45 +0300, Kirill A. Shutemov wrote:
> > What about this:
> >
> > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> > index f939e004c5d1..2e8438a1216a 100644
> > --- a/include/linux/pagemap.h
> > +++ b/include/linux/pagemap.h
> > @@ -335,12 +335,15 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
> >
> > static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> > {
> > - unsigned long index = page_index(page);
> > + unsigned long mask;
> > +
> > + if (PageHuge(page))
> > + return page;
> >
> > VM_BUG_ON_PAGE(PageTail(page), page);
> > - VM_BUG_ON_PAGE(index > offset, page);
> > - VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> > - return page - index + offset;
> > +
> > + mask = (1UL << compound_order(page)) - 1;
> > + return page + (offset & mask);
> > }
> >
> > struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
>
> It works fine.
Nice.
Matthew, does it look fine to you?
--
Kirill A. Shutemov
* Re: page cache: Store only head pages in i_pages
2019-04-05 13:37 ` Kirill A. Shutemov
@ 2019-04-05 13:51 ` Matthew Wilcox
0 siblings, 0 replies; 21+ messages in thread
From: Matthew Wilcox @ 2019-04-05 13:51 UTC (permalink / raw)
To: Kirill A. Shutemov, Andrew Morton; +Cc: Qian Cai, Huang Ying, linux-mm
On Fri, Apr 05, 2019 at 04:37:42PM +0300, Kirill A. Shutemov wrote:
> On Thu, Apr 04, 2019 at 05:28:02PM -0400, Qian Cai wrote:
> > On Thu, 2019-04-04 at 16:45 +0300, Kirill A. Shutemov wrote:
> > > What about this:
> > >
> > > diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h
> > > index f939e004c5d1..2e8438a1216a 100644
> > > --- a/include/linux/pagemap.h
> > > +++ b/include/linux/pagemap.h
> > > @@ -335,12 +335,15 @@ static inline struct page *grab_cache_page_nowait(struct address_space *mapping,
> > >
> > > static inline struct page *find_subpage(struct page *page, pgoff_t offset)
> > > {
> > > - unsigned long index = page_index(page);
> > > + unsigned long mask;
> > > +
> > > + if (PageHuge(page))
> > > + return page;
> > >
> > > VM_BUG_ON_PAGE(PageTail(page), page);
> > > - VM_BUG_ON_PAGE(index > offset, page);
> > > - VM_BUG_ON_PAGE(index + (1 << compound_order(page)) <= offset, page);
> > > - return page - index + offset;
> > > +
> > > + mask = (1UL << compound_order(page)) - 1;
> > > + return page + (offset & mask);
> > > }
> > >
> > > struct page *find_get_entry(struct address_space *mapping, pgoff_t offset);
> >
> > It works fine.
>
> Nice.
>
> Matthew, does it look fine to you?
Yes, this looks good to me. I think we'll have opportunities to improve
it later (eg when unifying THP and hugetlbfs uses of the page cache).
But for now, Andrew can you add this version as -fix-fix?
end of thread, other threads:[~2019-04-05 13:51 UTC | newest]
Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <1553285568.26196.24.camel@lca.pw>
2019-03-23 3:38 ` page cache: Store only head pages in i_pages Matthew Wilcox
2019-03-23 23:50 ` Qian Cai
2019-03-24 2:06 ` Matthew Wilcox
2019-03-24 2:52 ` Qian Cai
2019-03-24 3:04 ` Matthew Wilcox
2019-03-24 15:42 ` Qian Cai
2019-03-27 10:48 ` William Kucharski
2019-03-27 11:50 ` Matthew Wilcox
2019-03-29 1:43 ` Qian Cai
2019-03-29 19:59 ` Matthew Wilcox
2019-03-29 21:25 ` Qian Cai
2019-03-30 3:04 ` Matthew Wilcox
2019-03-30 14:10 ` Matthew Wilcox
2019-03-31 3:23 ` Matthew Wilcox
2019-04-01 9:18 ` Kirill A. Shutemov
2019-04-01 9:27 ` Kirill A. Shutemov
2019-04-04 13:10 ` Qian Cai
2019-04-04 13:45 ` Kirill A. Shutemov
2019-04-04 21:28 ` Qian Cai
2019-04-05 13:37 ` Kirill A. Shutemov
2019-04-05 13:51 ` Matthew Wilcox