Linux-mm Archive on lore.kernel.org
 help / color / Atom feed
From: Vlastimil Babka <vbabka@suse.cz>
To: Michal Hocko <mhocko@kernel.org>, Jia He <hejianet@gmail.com>
Cc: linux-mm@kvack.org, linux-kernel@vger.kernel.org,
	Andrew Morton <akpm@linux-foundation.org>,
	Mel Gorman <mgorman@techsingularity.net>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Joonsoo Kim <iamjoonsoo.kim@lge.com>,
	Taku Izumi <izumi.taku@jp.fujitsu.com>Mel Gorman
	<mgorman@techsingularity.net>
Subject: Re: [PATCH RFC 1/1] mm, page_alloc: fix incorrect zone_statistics data
Date: Tue, 20 Dec 2016 16:13:02 +0100
Message-ID: <233ed490-afb9-4644-6d84-c9f888882da2@suse.cz> (raw)
In-Reply-To: <20161220091814.GC3769@dhcp22.suse.cz>


[-- Attachment #1: Type: text/plain, Size: 2451 bytes --]

On 12/20/2016 10:18 AM, Michal Hocko wrote:
> On Mon 12-12-16 13:59:07, Jia He wrote:
>> In commit b9f00e147f27 ("mm, page_alloc: reduce branches in
>> zone_statistics"), it reconstructed codes to reduce the branch miss rate.
>> Compared with the original logic, it assumed if !(flag & __GFP_OTHER_NODE)
>>  z->node would not be equal to preferred_zone->node. That seems to be
>> incorrect.
> 
> I am sorry but I have hard time following the changelog. It is clear
> that you are trying to fix a missed NUMA_{HIT,OTHER} accounting
> but it is not really clear when such thing happens. You are adding
> preferred_zone->node check. preferred_zone is the first zone in the
> requested zonelist. So for the most allocations it is a node from the
> local node. But if something request an explicit numa node (without
> __GFP_OTHER_NODE which would be the majority I suspect) then we could
> indeed end up accounting that as a NUMA_MISS, NUMA_FOREIGN so the
> referenced patch indeed caused an unintended change of accounting AFAIU.

Since I could not quite wrap my head around all possible combinations,
I've got lazy and employed cbmc, inspired by [1] (test file attached if
anyone wants to play with it) and tried to test whether both commit
b9f00e147f27, and Jia He's patch are equivalent to pre-b9f00e147f27. The
tool found counter examples for both cases (there might be more
scenarios, but it only reports the first one it hits). Note that
decoding the output is not exactly trivial...

[1] http://paulmck.livejournal.com/38997.html

==============
Counterexample for commit b9f00e147f27 vs pre-b9f00e147f27:

numa_node_id() == 1
preferred_zone on node 0
allocated from zone on node 1
without __GFP_OTHER_NODE

pre-b9f00e147f27:
allocated zone (node 1) increased NUMA_MISS and NUMA_LOCAL
preferred zone (node 0) increased NUMA_FOREIGN

(that looks correct to me)

b9f00e147f27:
allocated zone (node 1) got NUMA_HIT and NUMA_LOCAL
there's no NUMA_FOREIGN

(that's wrong wrt HIT and FOREIGN)

==============
Counterexample for Jia He's patch vs pre-b9f00e147f27:

numa_node_id() == 0
preferred_zone on node 0
allocated from zone on node 1
without __GFP_OTHER_NODE

pre-b9f00e147f27:
allocated zone (node 1) increased NUMA_MISS and NUMA_OTHER
preferred zone (node 0) increased NUMA_FOREIGN

(that looks correct to me)

Jia He's patch:
allocated zone (node 1) increased NUMA_MISS
preferred zone (node 0) increased NUMA_FOREIGN

(it's missing NUMA_OTHER)

[-- Attachment #2: test-numa.c --]
[-- Type: text/x-csrc, Size: 3798 bytes --]

#include <stdio.h>

#define NODES	3
#define ZONES	2

enum zone_stat_item {
	NUMA_HIT,
	NUMA_MISS,
	NUMA_LOCAL,
	NUMA_OTHER,
	NUMA_FOREIGN,
	ZONE_STAT_ITEMS
};

struct node;
struct zone {
	struct node *zone_pgdat;
	int node;
	int vm_stat[ZONE_STAT_ITEMS];
};

struct node {
	int nid;
	struct zone node_zones[ZONES];
};

typedef int gfp_t;
#define __GFP_OTHER_NODE	1
#define unlikely(x) (x)

int local_nid;
int numa_node_id()
{
	return local_nid;
}

static void __inc_zone_state(struct zone *z, enum zone_stat_item stat)
{
	z->vm_stat[stat]++;
}

// before b9f00e147f27
void zone_statistics1(struct zone *preferred_zone, struct zone *z, gfp_t flags)
{
	if (z->zone_pgdat == preferred_zone->zone_pgdat) {
		__inc_zone_state(z, NUMA_HIT);
	} else {
		__inc_zone_state(z, NUMA_MISS);
		__inc_zone_state(preferred_zone, NUMA_FOREIGN);
	}

	if (z->node == ((flags & __GFP_OTHER_NODE) ?
		preferred_zone->node : numa_node_id()))
                __inc_zone_state(z, NUMA_LOCAL);
        else
                __inc_zone_state(z, NUMA_OTHER);
}

// after b9f00e147f27
void zone_statistics2(struct zone *preferred_zone, struct zone *z, gfp_t flags)
{
        int local_nid = numa_node_id();
        enum zone_stat_item local_stat = NUMA_LOCAL;

        if (unlikely(flags & __GFP_OTHER_NODE)) {
                local_stat = NUMA_OTHER;
                local_nid = preferred_zone->node;
        }

        if (z->node == local_nid) {
                __inc_zone_state(z, NUMA_HIT);
                __inc_zone_state(z, local_stat);
        } else {
                __inc_zone_state(z, NUMA_MISS);
                __inc_zone_state(preferred_zone, NUMA_FOREIGN);
        }
}

// after Jia He's patch
static inline void zone_statistics3(struct zone *preferred_zone, struct zone *z,
                                                                gfp_t flags)
{
        int local_nid = numa_node_id();
        enum zone_stat_item local_stat = NUMA_LOCAL;

        if (unlikely(flags & __GFP_OTHER_NODE)) {
                local_stat = NUMA_OTHER;
                local_nid = preferred_zone->node;
        }

        if (z->node == local_nid) {
                __inc_zone_state(z, NUMA_HIT);
                __inc_zone_state(z, local_stat);
        } else if (z->node == preferred_zone->node) {
                __inc_zone_state(z, NUMA_HIT);
                __inc_zone_state(z, NUMA_OTHER);
        } else {
                __inc_zone_state(z, NUMA_MISS);
                __inc_zone_state(preferred_zone, NUMA_FOREIGN);
        }
}


void init_node(struct node * n, int nid)
{
	n->nid = nid;
	for (int i = 0; i < ZONES; i++) {
		struct zone * z = &n->node_zones[i];

		z->zone_pgdat = n;
		z->node = nid;
		for (int j = 0; j < ZONE_STAT_ITEMS; j++)
			z->vm_stat[j] = 0;
	}
}

struct node nodes1[NODES];
struct node nodes2[NODES];

void check_stats()
{
	for (int i = 0; i < NODES; i++)
		for (int j = 0; j < ZONES; j++)
			for (int k = 0; k < ZONE_STAT_ITEMS; k++)
				assert (nodes1[i].node_zones[j].vm_stat[k]
				     == nodes2[i].node_zones[j].vm_stat[k]);

}

int main(int argc, char *argv[])
{

	for (int i = 0; i < NODES; i++) {
		init_node(&nodes1[i], i);
		init_node(&nodes2[i], i);
	}

	local_nid = ((unsigned int)argv[1]) % NODES;
	int zone_nid = ((unsigned int)argv[2]) % NODES;
	int pzone_nid = ((unsigned int)argv[3]) % NODES;
	int zid = ((unsigned int)argv[4]) % ZONES;
	int pzid = ((unsigned int)argv[5]) % ZONES;

	/* we should not allocate from higher than preferred zone */
	if (zid > pzid)
		zid = pzid;

	gfp_t flags = ((unsigned int)argv[6]) & __GFP_OTHER_NODE;

	zone_statistics1(&nodes1[pzone_nid].node_zones[pzid],
			 &nodes1[zone_nid].node_zones[zid],
			 flags);

	zone_statistics3(&nodes2[pzone_nid].node_zones[pzid],
			 &nodes2[zone_nid].node_zones[zid],
			 flags);

	check_stats();
}


  parent reply index

Thread overview: 23+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2016-12-12  5:59 [PATCH RFC 0/1] " Jia He
2016-12-12  5:59 ` [PATCH RFC 1/1] " Jia He
2016-12-20  9:18   ` Michal Hocko
2016-12-20 13:10     ` Mel Gorman
2016-12-20 13:26       ` Michal Hocko
2016-12-20 14:28         ` Mel Gorman
2016-12-20 14:35           ` Michal Hocko
2016-12-20 14:49             ` Vlastimil Babka
2016-12-20 14:54             ` Mel Gorman
2016-12-21  7:57               ` Michal Hocko
2016-12-21  8:06                 ` [PATCH 1/2] mm: fix remote numa hits statistics Michal Hocko
2016-12-21  8:06                   ` [PATCH 2/2] mm: get rid of __GFP_OTHER_NODE Michal Hocko
2017-01-02 14:18                     ` Vlastimil Babka
2016-12-29 11:46                   ` [PATCH 1/2] mm: fix remote numa hits statistics Mel Gorman
2016-12-29 12:28                     ` Michal Hocko
2017-01-02 14:16                   ` Vlastimil Babka
2017-01-02 14:46                     ` Michal Hocko
2017-01-02 15:07                       ` Vlastimil Babka
2016-12-20 14:42           ` [PATCH RFC 1/1] mm, page_alloc: fix incorrect zone_statistics data Mel Gorman
2016-12-20 15:13     ` Vlastimil Babka [this message]
2016-12-21  3:01     ` hejianet
2016-12-20 12:31   ` Mel Gorman
2016-12-21  3:07     ` hejianet

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=233ed490-afb9-4644-6d84-c9f888882da2@suse.cz \
    --to=vbabka@suse.cz \
    --cc=akpm@linux-foundation.org \
    --cc=hannes@cmpxchg.org \
    --cc=hejianet@gmail.com \
    --cc=iamjoonsoo.kim@lge.com \
    --cc=izumi.taku@jp.fujitsu.com \
    --cc=linux-kernel@vger.kernel.org \
    --cc=linux-mm@kvack.org \
    --cc=mgorman@techsingularity.net \
    --cc=mhocko@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

Linux-mm Archive on lore.kernel.org

Archives are clonable:
	git clone --mirror https://lore.kernel.org/linux-mm/0 linux-mm/git/0.git

	# If you have public-inbox 1.1+ installed, you may
	# initialize and index your mirror using the following commands:
	public-inbox-init -V2 linux-mm linux-mm/ https://lore.kernel.org/linux-mm \
		linux-mm@kvack.org
	public-inbox-index linux-mm

Example config snippet for mirrors

Newsgroup available over NNTP:
	nntp://nntp.lore.kernel.org/org.kvack.linux-mm


AGPL code for this site: git clone https://public-inbox.org/public-inbox.git