From: Jan Kara <jack@suse.cz> To: Mel Gorman <mgorman@techsingularity.net> Cc: Linux-MM <linux-mm@kvack.org>, Linux-FSDevel <linux-fsdevel@vger.kernel.org>, LKML <linux-kernel@vger.kernel.org>, Jan Kara <jack@suse.cz>, Andi Kleen <ak@linux.intel.com>, Dave Hansen <dave.hansen@intel.com>, Dave Chinner <david@fromorbit.com> Subject: Re: [PATCH 3/8] mm, truncate: Remove all exceptional entries from pagevec under one lock Date: Thu, 12 Oct 2017 15:33:23 +0200 [thread overview] Message-ID: <20171012133323.GB29293@quack2.suse.cz> (raw) In-Reply-To: <20171012093103.13412-4-mgorman@techsingularity.net> On Thu 12-10-17 10:30:58, Mel Gorman wrote: > During truncate each entry in a pagevec is checked to see if it is an > exceptional entry and if so, the shadow entry is cleaned up. This is > potentially expensive as multiple entries for a mapping locks/unlocks the > tree lock. This batches the operation such that any exceptional entries > removed from a pagevec only acquire the mapping tree lock once. The corner > case where this is more expensive is where there is only one exceptional > entry but this is unlikely due to temporal locality and how it affects > LRU ordering. Note that for truncations of small files created recently, > this patch should show no gain because it only batches the handling of > exceptional entries. > > sparsetruncate (large) > 4.14.0-rc4 4.14.0-rc4 > pickhelper-v1r1 batchshadow-v1r1 > Min Time 38.00 ( 0.00%) 27.00 ( 28.95%) > 1st-qrtle Time 40.00 ( 0.00%) 28.00 ( 30.00%) > 2nd-qrtle Time 44.00 ( 0.00%) 41.00 ( 6.82%) > 3rd-qrtle Time 146.00 ( 0.00%) 147.00 ( -0.68%) > Max-90% Time 153.00 ( 0.00%) 153.00 ( 0.00%) > Max-95% Time 155.00 ( 0.00%) 156.00 ( -0.65%) > Max-99% Time 181.00 ( 0.00%) 171.00 ( 5.52%) > Amean Time 93.04 ( 0.00%) 88.43 ( 4.96%) > Best99%Amean Time 92.08 ( 0.00%) 86.13 ( 6.46%) > Best95%Amean Time 89.19 ( 0.00%) 83.13 ( 6.80%) > Best90%Amean Time 85.60 ( 0.00%) 79.15 ( 7.53%) > Best75%Amean Time 72.95 ( 0.00%) 65.09 ( 10.78%) > Best50%Amean Time 39.86 ( 0.00%) 28.20 ( 29.25%) > Best25%Amean Time 39.44 ( 0.00%) 27.70 ( 29.77%) > > bonnie > 4.14.0-rc4 4.14.0-rc4 > pickhelper-v1r1 batchshadow-v1r1 > Hmean SeqCreate ops 71.92 ( 0.00%) 76.78 ( 6.76%) > Hmean SeqCreate read 42.42 ( 0.00%) 45.01 ( 6.10%) > Hmean SeqCreate del 26519.88 ( 0.00%) 27191.87 ( 2.53%) > Hmean RandCreate ops 71.92 ( 0.00%) 76.95 ( 7.00%) > Hmean RandCreate read 44.44 ( 0.00%) 49.23 ( 10.78%) > Hmean RandCreate del 24948.62 ( 0.00%) 24764.97 ( -0.74%) > > Truncation of a large number of files shows a substantial gain with 99% of files > being trruncated 6.46% faster. bonnie shows a modest gain of 2.53% > > Signed-off-by: Mel Gorman <mgorman@techsingularity.net> > --- > mm/truncate.c | 86 ++++++++++++++++++++++++++++++++++++++++++----------------- > 1 file changed, 61 insertions(+), 25 deletions(-) > > diff --git a/mm/truncate.c b/mm/truncate.c > index 3dfa2d5e642e..af1eaa5b9450 100644 > --- a/mm/truncate.c > +++ b/mm/truncate.c > @@ -25,44 +25,77 @@ > #include <linux/rmap.h> > #include "internal.h" > > -static void clear_shadow_entry(struct address_space *mapping, pgoff_t index, > - void *entry) > +/* > + * Regular page slots are stabilized by the page lock even without the tree > + * itself locked. These unlocked entries need verification under the tree > + * lock. > + */ > +static inline void __clear_shadow_entry(struct address_space *mapping, > + pgoff_t index, void *entry) > { > struct radix_tree_node *node; > void **slot; > > - spin_lock_irq(&mapping->tree_lock); > - /* > - * Regular page slots are stabilized by the page lock even > - * without the tree itself locked. These unlocked entries > - * need verification under the tree lock. > - */ > if (!__radix_tree_lookup(&mapping->page_tree, index, &node, &slot)) > - goto unlock; > + return; > if (*slot != entry) > - goto unlock; > + return; > __radix_tree_replace(&mapping->page_tree, node, slot, NULL, > workingset_update_node, mapping); > mapping->nrexceptional--; > -unlock: > +} > + > +static void clear_shadow_entry(struct address_space *mapping, pgoff_t index, > + void *entry) > +{ > + spin_lock_irq(&mapping->tree_lock); > + __clear_shadow_entry(mapping, index, entry); > spin_unlock_irq(&mapping->tree_lock); > } > > /* > - * Unconditionally remove exceptional entry. Usually called from truncate path. > + * Unconditionally remove exceptional entries. Usually called from truncate > + * path. Note that the pagevec may be altered by this function by removing > + * exceptional entries similar to what pagevec_remove_exceptionals does. > */ > -static void truncate_exceptional_entry(struct address_space *mapping, > - pgoff_t index, void *entry) > +static void truncate_exceptional_pvec_entries(struct address_space *mapping, > + struct pagevec *pvec, pgoff_t *indices, int ei) > { > + int i, j; > + bool dax; > + > + /* Return immediately if caller indicates there are no entries */ > + if (ei == PAGEVEC_SIZE) > + return; > + > /* Handled by shmem itself */ > if (shmem_mapping(mapping)) > return; > > - if (dax_mapping(mapping)) { > - dax_delete_mapping_entry(mapping, index); > - return; > + dax = dax_mapping(mapping); > + if (!dax) > + spin_lock_irq(&mapping->tree_lock); > + > + for (i = ei, j = ei; i < pagevec_count(pvec); i++) { > + struct page *page = pvec->pages[i]; > + pgoff_t index = indices[i]; > + > + if (!radix_tree_exceptional_entry(page)) { > + pvec->pages[j++] = page; > + continue; > + } > + > + if (unlikely(dax)) { > + dax_delete_mapping_entry(mapping, index); > + continue; > + } > + > + __clear_shadow_entry(mapping, index, page); > } > - clear_shadow_entry(mapping, index, entry); > + > + if (!dax) > + spin_unlock_irq(&mapping->tree_lock); > + pvec->nr = j; > } When I look at this I think could make things cleaner. I have the following observations: 1) All truncate_inode_pages(), invalidate_mapping_pages(), invalidate_inode_pages2_range() essentially do very similar thing and would benefit from a similar kind of batching. 2) As you observed and measured, batching of radix tree operations makes sense both when removing pages and shadow entries, I'm very confident it would make sense for DAX exceptional entries as well. 3) In all cases (i.e., those three functions and for all entry types) the workflow seems to be: * lockless lookup of entries * prepare entry for reclaim (or determine it is not elligible) * lock mapping->tree_lock * verify entry is still elligible for reclaim (otherwise bail) * clear radix tree entry * unlock mapping->tree_lock * final cleanup of the entry So I'm wondering whether we cannot somehow refactor stuff so that batching of radix tree operations could be shared and we wouldn't have to duplicate it in all those cases. But it would be rather large overhaul of the code so it may be a bit out of scope for these improvements... > @@ -409,8 +445,8 @@ void truncate_inode_pages_range(struct address_space *mapping, > } > > if (radix_tree_exceptional_entry(page)) { > - truncate_exceptional_entry(mapping, index, > - page); > + if (ei != PAGEVEC_SIZE) > + ei = i; This should be ei == PAGEVEC_SIZE I think. Otherwise the patch looks good to me so feel free to add: Reviewed-by: Jan Kara <jack@suse.cz> Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR
WARNING: multiple messages have this Message-ID (diff)
From: Jan Kara <jack@suse.cz> To: Mel Gorman <mgorman@techsingularity.net> Cc: Linux-MM <linux-mm@kvack.org>, Linux-FSDevel <linux-fsdevel@vger.kernel.org>, LKML <linux-kernel@vger.kernel.org>, Jan Kara <jack@suse.cz>, Andi Kleen <ak@linux.intel.com>, Dave Hansen <dave.hansen@intel.com>, Dave Chinner <david@fromorbit.com> Subject: Re: [PATCH 3/8] mm, truncate: Remove all exceptional entries from pagevec under one lock Date: Thu, 12 Oct 2017 15:33:23 +0200 [thread overview] Message-ID: <20171012133323.GB29293@quack2.suse.cz> (raw) In-Reply-To: <20171012093103.13412-4-mgorman@techsingularity.net> On Thu 12-10-17 10:30:58, Mel Gorman wrote: > During truncate each entry in a pagevec is checked to see if it is an > exceptional entry and if so, the shadow entry is cleaned up. This is > potentially expensive as multiple entries for a mapping locks/unlocks the > tree lock. This batches the operation such that any exceptional entries > removed from a pagevec only acquire the mapping tree lock once. The corner > case where this is more expensive is where there is only one exceptional > entry but this is unlikely due to temporal locality and how it affects > LRU ordering. Note that for truncations of small files created recently, > this patch should show no gain because it only batches the handling of > exceptional entries. > > sparsetruncate (large) > 4.14.0-rc4 4.14.0-rc4 > pickhelper-v1r1 batchshadow-v1r1 > Min Time 38.00 ( 0.00%) 27.00 ( 28.95%) > 1st-qrtle Time 40.00 ( 0.00%) 28.00 ( 30.00%) > 2nd-qrtle Time 44.00 ( 0.00%) 41.00 ( 6.82%) > 3rd-qrtle Time 146.00 ( 0.00%) 147.00 ( -0.68%) > Max-90% Time 153.00 ( 0.00%) 153.00 ( 0.00%) > Max-95% Time 155.00 ( 0.00%) 156.00 ( -0.65%) > Max-99% Time 181.00 ( 0.00%) 171.00 ( 5.52%) > Amean Time 93.04 ( 0.00%) 88.43 ( 4.96%) > Best99%Amean Time 92.08 ( 0.00%) 86.13 ( 6.46%) > Best95%Amean Time 89.19 ( 0.00%) 83.13 ( 6.80%) > Best90%Amean Time 85.60 ( 0.00%) 79.15 ( 7.53%) > Best75%Amean Time 72.95 ( 0.00%) 65.09 ( 10.78%) > Best50%Amean Time 39.86 ( 0.00%) 28.20 ( 29.25%) > Best25%Amean Time 39.44 ( 0.00%) 27.70 ( 29.77%) > > bonnie > 4.14.0-rc4 4.14.0-rc4 > pickhelper-v1r1 batchshadow-v1r1 > Hmean SeqCreate ops 71.92 ( 0.00%) 76.78 ( 6.76%) > Hmean SeqCreate read 42.42 ( 0.00%) 45.01 ( 6.10%) > Hmean SeqCreate del 26519.88 ( 0.00%) 27191.87 ( 2.53%) > Hmean RandCreate ops 71.92 ( 0.00%) 76.95 ( 7.00%) > Hmean RandCreate read 44.44 ( 0.00%) 49.23 ( 10.78%) > Hmean RandCreate del 24948.62 ( 0.00%) 24764.97 ( -0.74%) > > Truncation of a large number of files shows a substantial gain with 99% of files > being trruncated 6.46% faster. bonnie shows a modest gain of 2.53% > > Signed-off-by: Mel Gorman <mgorman@techsingularity.net> > --- > mm/truncate.c | 86 ++++++++++++++++++++++++++++++++++++++++++----------------- > 1 file changed, 61 insertions(+), 25 deletions(-) > > diff --git a/mm/truncate.c b/mm/truncate.c > index 3dfa2d5e642e..af1eaa5b9450 100644 > --- a/mm/truncate.c > +++ b/mm/truncate.c > @@ -25,44 +25,77 @@ > #include <linux/rmap.h> > #include "internal.h" > > -static void clear_shadow_entry(struct address_space *mapping, pgoff_t index, > - void *entry) > +/* > + * Regular page slots are stabilized by the page lock even without the tree > + * itself locked. These unlocked entries need verification under the tree > + * lock. > + */ > +static inline void __clear_shadow_entry(struct address_space *mapping, > + pgoff_t index, void *entry) > { > struct radix_tree_node *node; > void **slot; > > - spin_lock_irq(&mapping->tree_lock); > - /* > - * Regular page slots are stabilized by the page lock even > - * without the tree itself locked. These unlocked entries > - * need verification under the tree lock. > - */ > if (!__radix_tree_lookup(&mapping->page_tree, index, &node, &slot)) > - goto unlock; > + return; > if (*slot != entry) > - goto unlock; > + return; > __radix_tree_replace(&mapping->page_tree, node, slot, NULL, > workingset_update_node, mapping); > mapping->nrexceptional--; > -unlock: > +} > + > +static void clear_shadow_entry(struct address_space *mapping, pgoff_t index, > + void *entry) > +{ > + spin_lock_irq(&mapping->tree_lock); > + __clear_shadow_entry(mapping, index, entry); > spin_unlock_irq(&mapping->tree_lock); > } > > /* > - * Unconditionally remove exceptional entry. Usually called from truncate path. > + * Unconditionally remove exceptional entries. Usually called from truncate > + * path. Note that the pagevec may be altered by this function by removing > + * exceptional entries similar to what pagevec_remove_exceptionals does. > */ > -static void truncate_exceptional_entry(struct address_space *mapping, > - pgoff_t index, void *entry) > +static void truncate_exceptional_pvec_entries(struct address_space *mapping, > + struct pagevec *pvec, pgoff_t *indices, int ei) > { > + int i, j; > + bool dax; > + > + /* Return immediately if caller indicates there are no entries */ > + if (ei == PAGEVEC_SIZE) > + return; > + > /* Handled by shmem itself */ > if (shmem_mapping(mapping)) > return; > > - if (dax_mapping(mapping)) { > - dax_delete_mapping_entry(mapping, index); > - return; > + dax = dax_mapping(mapping); > + if (!dax) > + spin_lock_irq(&mapping->tree_lock); > + > + for (i = ei, j = ei; i < pagevec_count(pvec); i++) { > + struct page *page = pvec->pages[i]; > + pgoff_t index = indices[i]; > + > + if (!radix_tree_exceptional_entry(page)) { > + pvec->pages[j++] = page; > + continue; > + } > + > + if (unlikely(dax)) { > + dax_delete_mapping_entry(mapping, index); > + continue; > + } > + > + __clear_shadow_entry(mapping, index, page); > } > - clear_shadow_entry(mapping, index, entry); > + > + if (!dax) > + spin_unlock_irq(&mapping->tree_lock); > + pvec->nr = j; > } When I look at this I think could make things cleaner. I have the following observations: 1) All truncate_inode_pages(), invalidate_mapping_pages(), invalidate_inode_pages2_range() essentially do very similar thing and would benefit from a similar kind of batching. 2) As you observed and measured, batching of radix tree operations makes sense both when removing pages and shadow entries, I'm very confident it would make sense for DAX exceptional entries as well. 3) In all cases (i.e., those three functions and for all entry types) the workflow seems to be: * lockless lookup of entries * prepare entry for reclaim (or determine it is not elligible) * lock mapping->tree_lock * verify entry is still elligible for reclaim (otherwise bail) * clear radix tree entry * unlock mapping->tree_lock * final cleanup of the entry So I'm wondering whether we cannot somehow refactor stuff so that batching of radix tree operations could be shared and we wouldn't have to duplicate it in all those cases. But it would be rather large overhaul of the code so it may be a bit out of scope for these improvements... > @@ -409,8 +445,8 @@ void truncate_inode_pages_range(struct address_space *mapping, > } > > if (radix_tree_exceptional_entry(page)) { > - truncate_exceptional_entry(mapping, index, > - page); > + if (ei != PAGEVEC_SIZE) > + ei = i; This should be ei == PAGEVEC_SIZE I think. Otherwise the patch looks good to me so feel free to add: Reviewed-by: Jan Kara <jack@suse.cz> Honza -- Jan Kara <jack@suse.com> SUSE Labs, CR -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2017-10-12 13:33 UTC|newest] Thread overview: 31+ messages / expand[flat|nested] mbox.gz Atom feed top 2017-10-12 9:30 [PATCH 0/8] Follow-up for speed up page cache truncation Mel Gorman 2017-10-12 9:30 ` Mel Gorman 2017-10-12 9:30 ` [PATCH 1/8] mm, page_alloc: Enable/disable IRQs once when freeing a list of pages Mel Gorman 2017-10-12 9:30 ` Mel Gorman 2017-10-12 9:30 ` [PATCH 2/8] mm, truncate: Do not check mapping for every page being truncated Mel Gorman 2017-10-12 9:30 ` Mel Gorman 2017-10-12 12:15 ` Jan Kara 2017-10-12 12:15 ` Jan Kara 2017-10-12 12:41 ` Mel Gorman 2017-10-12 12:41 ` Mel Gorman 2017-10-12 19:11 ` Johannes Weiner 2017-10-12 19:11 ` Johannes Weiner 2017-10-12 9:30 ` [PATCH 3/8] mm, truncate: Remove all exceptional entries from pagevec under one lock Mel Gorman 2017-10-12 9:30 ` Mel Gorman 2017-10-12 13:33 ` Jan Kara [this message] 2017-10-12 13:33 ` Jan Kara 2017-10-12 14:53 ` Mel Gorman 2017-10-12 14:53 ` Mel Gorman 2017-10-12 19:45 ` Johannes Weiner 2017-10-12 19:45 ` Johannes Weiner 2017-10-12 9:30 ` [PATCH 4/8] mm: Only drain per-cpu pagevecs once per pagevec usage Mel Gorman 2017-10-12 9:30 ` Mel Gorman 2017-10-12 9:31 ` [PATCH 5/8] mm, pagevec: Remove cold parameter for pagevecs Mel Gorman 2017-10-12 9:31 ` Mel Gorman 2017-10-12 9:31 ` [PATCH 6/8] mm: Remove cold parameter for release_pages Mel Gorman 2017-10-12 9:31 ` [PATCH 7/8] mm, Remove cold parameter from free_hot_cold_page* Mel Gorman 2017-10-12 9:31 ` Mel Gorman 2017-10-12 9:31 ` [PATCH 8/8] mm: Remove __GFP_COLD Mel Gorman 2017-10-12 9:31 ` Mel Gorman 2017-10-18 7:59 [PATCH 0/8] Follow-up for speed up page cache truncation v2 Mel Gorman 2017-10-18 7:59 ` [PATCH 3/8] mm, truncate: Remove all exceptional entries from pagevec under one lock Mel Gorman 2017-10-18 7:59 ` Mel Gorman
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20171012133323.GB29293@quack2.suse.cz \ --to=jack@suse.cz \ --cc=ak@linux.intel.com \ --cc=dave.hansen@intel.com \ --cc=david@fromorbit.com \ --cc=linux-fsdevel@vger.kernel.org \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mm@kvack.org \ --cc=mgorman@techsingularity.net \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.