From: Michal Hocko <mhocko@kernel.org> To: Jia He <hejianet@gmail.com> Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton <akpm@linux-foundation.org>, Vlastimil Babka <vbabka@suse.cz>, Mel Gorman <mgorman@techsingularity.net>, Johannes Weiner <hannes@cmpxchg.org>, Joonsoo Kim <iamjoonsoo.kim@lge.com>, Taku Izumi <izumi.taku@jp.fujitsu.com> Subject: Re: [PATCH RFC 1/1] mm, page_alloc: fix incorrect zone_statistics data Date: Tue, 20 Dec 2016 10:18:14 +0100 [thread overview] Message-ID: <20161220091814.GC3769@dhcp22.suse.cz> (raw) In-Reply-To: <1481522347-20393-2-git-send-email-hejianet@gmail.com> On Mon 12-12-16 13:59:07, Jia He wrote: > In commit b9f00e147f27 ("mm, page_alloc: reduce branches in > zone_statistics"), it reconstructed codes to reduce the branch miss rate. > Compared with the original logic, it assumed if !(flag & __GFP_OTHER_NODE) > z->node would not be equal to preferred_zone->node. That seems to be > incorrect. I am sorry but I have hard time following the changelog. It is clear that you are trying to fix a missed NUMA_{HIT,OTHER} accounting but it is not really clear when such thing happens. You are adding preferred_zone->node check. preferred_zone is the first zone in the requested zonelist. So for the most allocations it is a node from the local node. But if something request an explicit numa node (without __GFP_OTHER_NODE which would be the majority I suspect) then we could indeed end up accounting that as a NUMA_MISS, NUMA_FOREIGN so the referenced patch indeed caused an unintended change of accounting AFAIU. If this is correct then it should be a part of the changelog. I also cannot say I would like the fix. First of all I am not sure __GFP_OTHER_NODE is a good idea at all. How is an explicit usage of the flag any different from an explicit __alloc_pages_node(non_local_nid)? In both cases we ask for an allocation on a remote node and successful allocation is a NUMA_HIT and NUMA_OTHER. That being said, why cannot we simply do the following? As a bonus, we can get rid of a barely used __GFP_OTHER_NODE. Also the number of branches will stay same. --- diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 429855be6ec9..f035d5c8b864 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2583,25 +2583,17 @@ int __isolate_free_page(struct page *page, unsigned int order) * Update NUMA hit/miss statistics * * Must be called with interrupts disabled. - * - * When __GFP_OTHER_NODE is set assume the node of the preferred - * zone is the local node. This is useful for daemons who allocate - * memory on behalf of other processes. */ static inline void zone_statistics(struct zone *preferred_zone, struct zone *z, gfp_t flags) { #ifdef CONFIG_NUMA - int local_nid = numa_node_id(); - enum zone_stat_item local_stat = NUMA_LOCAL; - - if (unlikely(flags & __GFP_OTHER_NODE)) { - local_stat = NUMA_OTHER; - local_nid = preferred_zone->node; - } + if (z->node == preferred_zone->node) { + enum zone_stat_item local_stat = NUMA_LOCAL; - if (z->node == local_nid) { __inc_zone_state(z, NUMA_HIT); + if (z->node != numa_node_id()) + local_stat = NUMA_OTHER; __inc_zone_state(z, local_stat); } else { __inc_zone_state(z, NUMA_MISS); -- Michal Hocko SUSE Labs
WARNING: multiple messages have this Message-ID (diff)
From: Michal Hocko <mhocko@kernel.org> To: Jia He <hejianet@gmail.com> Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org, Andrew Morton <akpm@linux-foundation.org>, Vlastimil Babka <vbabka@suse.cz>, Mel Gorman <mgorman@techsingularity.net>, Johannes Weiner <hannes@cmpxchg.org>, Joonsoo Kim <iamjoonsoo.kim@lge.com>, Taku Izumi <izumi.taku@jp.fujitsu.com> Subject: Re: [PATCH RFC 1/1] mm, page_alloc: fix incorrect zone_statistics data Date: Tue, 20 Dec 2016 10:18:14 +0100 [thread overview] Message-ID: <20161220091814.GC3769@dhcp22.suse.cz> (raw) In-Reply-To: <1481522347-20393-2-git-send-email-hejianet@gmail.com> On Mon 12-12-16 13:59:07, Jia He wrote: > In commit b9f00e147f27 ("mm, page_alloc: reduce branches in > zone_statistics"), it reconstructed codes to reduce the branch miss rate. > Compared with the original logic, it assumed if !(flag & __GFP_OTHER_NODE) > z->node would not be equal to preferred_zone->node. That seems to be > incorrect. I am sorry but I have hard time following the changelog. It is clear that you are trying to fix a missed NUMA_{HIT,OTHER} accounting but it is not really clear when such thing happens. You are adding preferred_zone->node check. preferred_zone is the first zone in the requested zonelist. So for the most allocations it is a node from the local node. But if something request an explicit numa node (without __GFP_OTHER_NODE which would be the majority I suspect) then we could indeed end up accounting that as a NUMA_MISS, NUMA_FOREIGN so the referenced patch indeed caused an unintended change of accounting AFAIU. If this is correct then it should be a part of the changelog. I also cannot say I would like the fix. First of all I am not sure __GFP_OTHER_NODE is a good idea at all. How is an explicit usage of the flag any different from an explicit __alloc_pages_node(non_local_nid)? In both cases we ask for an allocation on a remote node and successful allocation is a NUMA_HIT and NUMA_OTHER. That being said, why cannot we simply do the following? As a bonus, we can get rid of a barely used __GFP_OTHER_NODE. Also the number of branches will stay same. --- diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 429855be6ec9..f035d5c8b864 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2583,25 +2583,17 @@ int __isolate_free_page(struct page *page, unsigned int order) * Update NUMA hit/miss statistics * * Must be called with interrupts disabled. - * - * When __GFP_OTHER_NODE is set assume the node of the preferred - * zone is the local node. This is useful for daemons who allocate - * memory on behalf of other processes. */ static inline void zone_statistics(struct zone *preferred_zone, struct zone *z, gfp_t flags) { #ifdef CONFIG_NUMA - int local_nid = numa_node_id(); - enum zone_stat_item local_stat = NUMA_LOCAL; - - if (unlikely(flags & __GFP_OTHER_NODE)) { - local_stat = NUMA_OTHER; - local_nid = preferred_zone->node; - } + if (z->node == preferred_zone->node) { + enum zone_stat_item local_stat = NUMA_LOCAL; - if (z->node == local_nid) { __inc_zone_state(z, NUMA_HIT); + if (z->node != numa_node_id()) + local_stat = NUMA_OTHER; __inc_zone_state(z, local_stat); } else { __inc_zone_state(z, NUMA_MISS); -- Michal Hocko SUSE Labs -- To unsubscribe, send a message with 'unsubscribe linux-mm' in the body to majordomo@kvack.org. For more info on Linux MM, see: http://www.linux-mm.org/ . Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
next prev parent reply other threads:[~2016-12-20 9:18 UTC|newest] Thread overview: 38+ messages / expand[flat|nested] mbox.gz Atom feed top 2016-12-12 5:59 [PATCH RFC 0/1] mm, page_alloc: fix incorrect zone_statistics data Jia He 2016-12-12 5:59 ` Jia He 2016-12-12 5:59 ` [PATCH RFC 1/1] " Jia He 2016-12-12 5:59 ` Jia He 2016-12-20 9:18 ` Michal Hocko [this message] 2016-12-20 9:18 ` Michal Hocko 2016-12-20 13:10 ` Mel Gorman 2016-12-20 13:10 ` Mel Gorman 2016-12-20 13:26 ` Michal Hocko 2016-12-20 13:26 ` Michal Hocko 2016-12-20 14:28 ` Mel Gorman 2016-12-20 14:28 ` Mel Gorman 2016-12-20 14:35 ` Michal Hocko 2016-12-20 14:35 ` Michal Hocko 2016-12-20 14:49 ` Vlastimil Babka 2016-12-20 14:49 ` Vlastimil Babka 2016-12-20 14:54 ` Mel Gorman 2016-12-20 14:54 ` Mel Gorman 2016-12-21 7:57 ` Michal Hocko 2016-12-21 7:57 ` Michal Hocko 2016-12-21 8:06 ` [PATCH 1/2] mm: fix remote numa hits statistics Michal Hocko 2016-12-21 8:06 ` [PATCH 2/2] mm: get rid of __GFP_OTHER_NODE Michal Hocko 2017-01-02 14:18 ` Vlastimil Babka 2016-12-29 11:46 ` [PATCH 1/2] mm: fix remote numa hits statistics Mel Gorman 2016-12-29 12:28 ` Michal Hocko 2017-01-02 14:16 ` Vlastimil Babka 2017-01-02 14:46 ` Michal Hocko 2017-01-02 15:07 ` Vlastimil Babka 2016-12-20 14:42 ` [PATCH RFC 1/1] mm, page_alloc: fix incorrect zone_statistics data Mel Gorman 2016-12-20 14:42 ` Mel Gorman 2016-12-20 15:13 ` Vlastimil Babka 2016-12-20 15:13 ` Vlastimil Babka 2016-12-21 3:01 ` hejianet 2016-12-21 3:01 ` hejianet 2016-12-20 12:31 ` Mel Gorman 2016-12-20 12:31 ` Mel Gorman 2016-12-21 3:07 ` hejianet 2016-12-21 3:07 ` hejianet
Reply instructions: You may reply publicly to this message via plain-text email using any one of the following methods: * Save the following mbox file, import it into your mail client, and reply-to-all from there: mbox Avoid top-posting and favor interleaved quoting: https://en.wikipedia.org/wiki/Posting_style#Interleaved_style * Reply using the --to, --cc, and --in-reply-to switches of git-send-email(1): git send-email \ --in-reply-to=20161220091814.GC3769@dhcp22.suse.cz \ --to=mhocko@kernel.org \ --cc=akpm@linux-foundation.org \ --cc=hannes@cmpxchg.org \ --cc=hejianet@gmail.com \ --cc=iamjoonsoo.kim@lge.com \ --cc=izumi.taku@jp.fujitsu.com \ --cc=linux-kernel@vger.kernel.org \ --cc=linux-mm@kvack.org \ --cc=mgorman@techsingularity.net \ --cc=vbabka@suse.cz \ /path/to/YOUR_REPLY https://kernel.org/pub/software/scm/git/docs/git-send-email.html * If your mail client supports setting the In-Reply-To header via mailto: links, try the mailto: linkBe sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.