From: Tim Chen <tim.c.chen@linux.intel.com>
To: Jerome Glisse <jglisse@redhat.com>
Cc: linux-mm@kvack.org, linux-fsdevel@vger.kernel.org,
 linux-block@vger.kernel.org, linux-kernel@vger.kernel.org,
 Andrea Arcangeli <aarcange@redhat.com>, Michal Hocko <mhocko@kernel.org>,
 Alexander Viro <viro@zeniv.linux.org.uk>, Theodore Ts'o <tytso@mit.edu>,
 Tejun Heo <tj@kernel.org>, Jan Kara <jack@suse.cz>,
 Josef Bacik <jbacik@fb.com>, Mel Gorman <mgorman@techsingularity.net>,
 Jeff Layton <jlayton@redhat.com>
Subject: Re: [RFC PATCH 00/79] Generic page write protection and a solution to page waitqueue
Date: Fri, 20 Apr 2018 16:48:22 -0700
Message-ID: <1809b27e-e79d-f2c3-19f5-0f505c340519@linux.intel.com>
In-Reply-To: <20180420221905.GA4124@redhat.com>

On 04/20/2018 03:19 PM, Jerome Glisse wrote:
> On Fri, Apr 20, 2018 at 12:57:41PM -0700, Tim Chen wrote:
>> On 04/04/2018 12:17 PM, jglisse@redhat.com wrote:
>>
>> Your approach seems useful if there are lots of locked pages sharing
>> the same wait queue.
>>
>> That said, in the original workload from our customer with the long
>> wait queue problem, there was a single super hot page getting
>> migrated, and it was being accessed by all threads, which caused the
>> big logjam while they waited for the migration to complete.
>> With your approach, we will still likely end up with a long queue
>> in that workload even if we have a per page wait queue.
>>
>> Thanks.
>
> Ok so i re-read the thread, i was writting this cover letter from memory
> and i had bad recollection of your issue, so sorry.
>
> First, do you have a way to reproduce the issue ? Something easy would
> be nice :)

Unfortunately, it is a customer workload that the customer guards
closely, and they wouldn't let us look at the source code. We have to
profile and backtrace its behavior. Mel made a quick attempt to
reproduce the behavior with a hot page migration, but he wasn't quite
able to duplicate the pathological behavior.

> So what i am proposing for per page wait queue would only marginaly help
> you (it might not even be mesurable in your workload). It would certainly
> make the code smaller and easier to understand i believe.

In cases where lots of pages share a page wait queue, your solution
would help: we would no longer waste time checking waiters that are not
actually waiting on the page being unlocked. Though I don't have a
specific workload showing that behavior.

> Now that i have look back at your issue i think there is 2 things we
> should do. First keep migration page map read only, this would at least
> avoid CPU read fault. In trace you captured i wasn't able to ascertain
> if this were read or write fault.
>
> Second idea i have is about NUMA, everytime we NUMA migrate a page we
> could attach a temporary struct to the page (using page->mapping). So
> if we scan that page again we can inspect information about previous
> migration and see if we are not over migrating that page (ie bouncing
> it all over). If so we can mark the page (maybe with a page flag if we
> can find one) to protect it from further migration. That temporary
> struct would be remove after a while, ie autonuma would preallocate a
> bunch of those and keep an LRU of them and recycle the oldest when it
> needs a new one to migrate another page.

The goal of migrating a hot page with care, or avoiding bouncing it
around frequently, makes sense. If it is a hot page shared by many
threads running on different NUMA nodes, and moving it will only mildly
improve NUMA locality, we should avoid the migration.
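To make the bookkeeping concrete, here is a rough sketch of what such a
tracking struct and bounce-avoidance check could look like. This is
purely illustrative: every name and threshold below (numa_migrate_track,
NUMA_BOUNCE_THRESHOLD, NUMA_BOUNCE_WINDOW, numa_should_migrate, ...) is
made up for discussion, locking is omitted, and none of it exists in the
kernel today.

#include <linux/jiffies.h>
#include <linux/list.h>
#include <linux/mm_types.h>
#include <linux/numa.h>

/* Hypothetical knobs, values picked out of thin air. */
#define NUMA_BOUNCE_THRESHOLD	4		/* migrations before we call it bouncing */
#define NUMA_BOUNCE_WINDOW	(10 * HZ)	/* window over which we count them */

/*
 * One of these would be attached to a page (via page->mapping) while
 * autonuma is watching it. A fixed pool is preallocated and kept on an
 * LRU so the oldest tracker can be recycled for a new page.
 */
struct numa_migrate_track {
	struct list_head lru;		/* position in the pool's LRU */
	struct page *page;		/* page currently being tracked */
	unsigned long window_start;	/* jiffies when the count window began */
	unsigned int nr_migrations;	/* migrations seen in the current window */
	int last_nid;			/* node we last migrated to; a smarter
					 * heuristic could use this to spot
					 * ping-pong patterns */
};

/*
 * Bounce check: if the page has been migrated too many times within the
 * window, treat it as a shared-hot page and leave it where it is.
 */
static bool numa_should_migrate(struct numa_migrate_track *t)
{
	if (!t)
		return true;	/* no history yet, allow the migration */

	if (time_after(jiffies, t->window_start + NUMA_BOUNCE_WINDOW)) {
		/* Window expired: start counting afresh. */
		t->window_start = jiffies;
		t->nr_migrations = 0;
		return true;
	}

	return t->nr_migrations < NUMA_BOUNCE_THRESHOLD;
}

/*
 * Grab a tracker for a page about to be migrated, recycling the least
 * recently used one. The pool is fully preallocated, so it is assumed
 * non-empty here.
 */
static struct numa_migrate_track *numa_track_alloc(struct list_head *pool)
{
	struct numa_migrate_track *t;

	t = list_first_entry(pool, struct numa_migrate_track, lru);
	list_move_tail(&t->lru, pool);	/* newest trackers live at the tail */
	t->page = NULL;
	t->window_start = jiffies;
	t->nr_migrations = 0;
	t->last_nid = NUMA_NO_NODE;
	return t;
}

Of course the hard part, which the sketch ignores, is attaching this to
page->mapping safely and detaching it when the tracker gets recycled out
from under the page.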
Tim

> LSF/MM slots:
>
> Michal can i get 2 slots to talk about this ? MM only discussion, one
> to talk about doing migration with page map read only but write
> protected while migration is happening. The other one to talk about
> attaching auto NUMA tracking struct to page.
>
> Cheers,
> Jérôme