All of lore.kernel.org
 help / color / mirror / Atom feed
From: hejianet <hejianet@gmail.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@techsingularity.net>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Taku Izumi <izumi.taku@jp.fujitsu.com>
Subject: Re: [PATCH RFC 1/1] mm, page_alloc: fix incorrect zone_statistics data
Date: Wed, 21 Dec 2016 11:01:08 +0800	[thread overview]
Message-ID: <84c018b5-bf63-6057-e39f-c8e0935bca09@gmail.com> (raw)
In-Reply-To: <20161220091814.GC3769@dhcp22.suse.cz>



On 20/12/2016 5:18 PM, Michal Hocko wrote:
> On Mon 12-12-16 13:59:07, Jia He wrote:
>> In commit b9f00e147f27 ("mm, page_alloc: reduce branches in
>> zone_statistics"), it reconstructed codes to reduce the branch miss rate.
>> Compared with the original logic, it assumed if !(flag & __GFP_OTHER_NODE)
>>   z->node would not be equal to preferred_zone->node. That seems to be
>> incorrect.
> I am sorry but I have hard time following the changelog. It is clear
> that you are trying to fix a missed NUMA_{HIT,OTHER} accounting
> but it is not really clear when such thing happens. You are adding
> preferred_zone->node check. preferred_zone is the first zone in the
> requested zonelist. So for the most allocations it is a node from the
> local node. But if something request an explicit numa node (without
> __GFP_OTHER_NODE which would be the majority I suspect) then we could
> indeed end up accounting that as a NUMA_MISS, NUMA_FOREIGN so the
> referenced patch indeed caused an unintended change of accounting AFAIU.
>
> If this is correct then it should be a part of the changelog. I also
> cannot say I would like the fix. First of all I am not sure
> __GFP_OTHER_NODE is a good idea at all. How is an explicit usage of the
> flag any different from an explicit __alloc_pages_node(non_local_nid)?
> In both cases we ask for an allocation on a remote node and successful
> allocation is a NUMA_HIT and NUMA_OTHER.
>
> That being said, why cannot we simply do the following? As a bonus, we
> can get rid of a barely used __GFP_OTHER_NODE. Also the number of
> branches will stay same.
Yes, I agree maybe we can get rid of __GFP_OTHER_NODE if no objections
Seems currently it is only used for hugepage and statistics
> ---
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 429855be6ec9..f035d5c8b864 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2583,25 +2583,17 @@ int __isolate_free_page(struct page *page, unsigned int order)
>    * Update NUMA hit/miss statistics
>    *
>    * Must be called with interrupts disabled.
> - *
> - * When __GFP_OTHER_NODE is set assume the node of the preferred
> - * zone is the local node. This is useful for daemons who allocate
> - * memory on behalf of other processes.
>    */
>   static inline void zone_statistics(struct zone *preferred_zone, struct zone *z,
>   								gfp_t flags)
>   {
>   #ifdef CONFIG_NUMA
> -	int local_nid = numa_node_id();
> -	enum zone_stat_item local_stat = NUMA_LOCAL;
> -
> -	if (unlikely(flags & __GFP_OTHER_NODE)) {
> -		local_stat = NUMA_OTHER;
> -		local_nid = preferred_zone->node;
> -	}
> +	if (z->node == preferred_zone->node) {
> +		enum zone_stat_item local_stat = NUMA_LOCAL;
>   
> -	if (z->node == local_nid) {
>   		__inc_zone_state(z, NUMA_HIT);
> +		if (z->node != numa_node_id())
> +			local_stat = NUMA_OTHER;
>   		__inc_zone_state(z, local_stat);
>   	} else {
>   		__inc_zone_state(z, NUMA_MISS);
I thought the logic here is different
Here is the zone_statistics() before introducing __GFP_OTHER_NODE:

if (z->zone_pgdat == preferred_zone->zone_pgdat) {
         __inc_zone_state(z, NUMA_HIT);
     } else {
         __inc_zone_state(z, NUMA_MISS);
         __inc_zone_state(preferred_zone, NUMA_FOREIGN);
     }
     if (z->node == numa_node_id())
         __inc_zone_state(z, NUMA_LOCAL);
     else
         __inc_zone_state(z, NUMA_OTHER);

B.R.
Jia

WARNING: multiple messages have this Message-ID (diff)
From: hejianet <hejianet@gmail.com>
To: Michal Hocko <mhocko@kernel.org>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Vlastimil Babka <vbabka@suse.cz>,
	Mel Gorman <mgorman@techsingularity.net>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Taku Izumi <izumi.taku@jp.fujitsu.com>
Subject: Re: [PATCH RFC 1/1] mm, page_alloc: fix incorrect zone_statistics data
Date: Wed, 21 Dec 2016 11:01:08 +0800	[thread overview]
Message-ID: <84c018b5-bf63-6057-e39f-c8e0935bca09@gmail.com> (raw)
In-Reply-To: <20161220091814.GC3769@dhcp22.suse.cz>



On 20/12/2016 5:18 PM, Michal Hocko wrote:
> On Mon 12-12-16 13:59:07, Jia He wrote:
>> In commit b9f00e147f27 ("mm, page_alloc: reduce branches in
>> zone_statistics"), it reconstructed codes to reduce the branch miss rate.
>> Compared with the original logic, it assumed if !(flag & __GFP_OTHER_NODE)
>>   z->node would not be equal to preferred_zone->node. That seems to be
>> incorrect.
> I am sorry but I have hard time following the changelog. It is clear
> that you are trying to fix a missed NUMA_{HIT,OTHER} accounting
> but it is not really clear when such thing happens. You are adding
> preferred_zone->node check. preferred_zone is the first zone in the
> requested zonelist. So for the most allocations it is a node from the
> local node. But if something request an explicit numa node (without
> __GFP_OTHER_NODE which would be the majority I suspect) then we could
> indeed end up accounting that as a NUMA_MISS, NUMA_FOREIGN so the
> referenced patch indeed caused an unintended change of accounting AFAIU.
>
> If this is correct then it should be a part of the changelog. I also
> cannot say I would like the fix. First of all I am not sure
> __GFP_OTHER_NODE is a good idea at all. How is an explicit usage of the
> flag any different from an explicit __alloc_pages_node(non_local_nid)?
> In both cases we ask for an allocation on a remote node and successful
> allocation is a NUMA_HIT and NUMA_OTHER.
>
> That being said, why cannot we simply do the following? As a bonus, we
> can get rid of a barely used __GFP_OTHER_NODE. Also the number of
> branches will stay same.
Yes, I agree maybe we can get rid of __GFP_OTHER_NODE if no objections
Seems currently it is only used for hugepage and statistics
> ---
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 429855be6ec9..f035d5c8b864 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -2583,25 +2583,17 @@ int __isolate_free_page(struct page *page, unsigned int order)
>    * Update NUMA hit/miss statistics
>    *
>    * Must be called with interrupts disabled.
> - *
> - * When __GFP_OTHER_NODE is set assume the node of the preferred
> - * zone is the local node. This is useful for daemons who allocate
> - * memory on behalf of other processes.
>    */
>   static inline void zone_statistics(struct zone *preferred_zone, struct zone *z,
>   								gfp_t flags)
>   {
>   #ifdef CONFIG_NUMA
> -	int local_nid = numa_node_id();
> -	enum zone_stat_item local_stat = NUMA_LOCAL;
> -
> -	if (unlikely(flags & __GFP_OTHER_NODE)) {
> -		local_stat = NUMA_OTHER;
> -		local_nid = preferred_zone->node;
> -	}
> +	if (z->node == preferred_zone->node) {
> +		enum zone_stat_item local_stat = NUMA_LOCAL;
>   
> -	if (z->node == local_nid) {
>   		__inc_zone_state(z, NUMA_HIT);
> +		if (z->node != numa_node_id())
> +			local_stat = NUMA_OTHER;
>   		__inc_zone_state(z, local_stat);
>   	} else {
>   		__inc_zone_state(z, NUMA_MISS);
I thought the logic here is different
Here is the zone_statistics() before introducing __GFP_OTHER_NODE:

if (z->zone_pgdat == preferred_zone->zone_pgdat) {
         __inc_zone_state(z, NUMA_HIT);
     } else {
         __inc_zone_state(z, NUMA_MISS);
         __inc_zone_state(preferred_zone, NUMA_FOREIGN);
     }
     if (z->node == numa_node_id())
         __inc_zone_state(z, NUMA_LOCAL);
     else
         __inc_zone_state(z, NUMA_OTHER);

B.R.
Jia

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

  parent reply	other threads:[~2016-12-21  3:01 UTC|newest]

Thread overview: 38+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-12  5:59 [PATCH RFC 0/1] mm, page_alloc: fix incorrect zone_statistics data Jia He
2016-12-12  5:59 ` Jia He
2016-12-12  5:59 ` [PATCH RFC 1/1] " Jia He
2016-12-12  5:59   ` Jia He
2016-12-20  9:18   ` Michal Hocko
2016-12-20  9:18     ` Michal Hocko
2016-12-20 13:10     ` Mel Gorman
2016-12-20 13:10       ` Mel Gorman
2016-12-20 13:26       ` Michal Hocko
2016-12-20 13:26         ` Michal Hocko
2016-12-20 14:28         ` Mel Gorman
2016-12-20 14:28           ` Mel Gorman
2016-12-20 14:35           ` Michal Hocko
2016-12-20 14:35             ` Michal Hocko
2016-12-20 14:49             ` Vlastimil Babka
2016-12-20 14:49               ` Vlastimil Babka
2016-12-20 14:54             ` Mel Gorman
2016-12-20 14:54               ` Mel Gorman
2016-12-21  7:57               ` Michal Hocko
2016-12-21  7:57                 ` Michal Hocko
2016-12-21  8:06                 ` [PATCH 1/2] mm: fix remote numa hits statistics Michal Hocko
2016-12-21  8:06                   ` [PATCH 2/2] mm: get rid of __GFP_OTHER_NODE Michal Hocko
2017-01-02 14:18                     ` Vlastimil Babka
2016-12-29 11:46                   ` [PATCH 1/2] mm: fix remote numa hits statistics Mel Gorman
2016-12-29 12:28                     ` Michal Hocko
2017-01-02 14:16                   ` Vlastimil Babka
2017-01-02 14:46                     ` Michal Hocko
2017-01-02 15:07                       ` Vlastimil Babka
2016-12-20 14:42           ` [PATCH RFC 1/1] mm, page_alloc: fix incorrect zone_statistics data Mel Gorman
2016-12-20 14:42             ` Mel Gorman
2016-12-20 15:13     ` Vlastimil Babka
2016-12-20 15:13       ` Vlastimil Babka
2016-12-21  3:01     ` hejianet [this message]
2016-12-21  3:01       ` hejianet
2016-12-20 12:31   ` Mel Gorman
2016-12-20 12:31     ` Mel Gorman
2016-12-21  3:07     ` hejianet
2016-12-21  3:07       ` hejianet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=84c018b5-bf63-6057-e39f-c8e0935bca09@gmail.com \
    --to=hejianet@gmail.com \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=izumi.taku@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@kernel.org \
    --cc=vbabka@suse.cz \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.