From: Michal Hocko <mhocko@kernel.org>
To: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: linux-mm@kvack.org, Zi Yan <zi.yan@cs.rutgers.edu>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Vlastimil Babka <vbabka@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Reale <ar@linux.vnet.ibm.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 1/3] mm, numa: rework do_pages_move
Date: Wed, 3 Jan 2018 10:52:11 +0100
Message-ID: <20180103095211.GC11319@dhcp22.suse.cz>
In-Reply-To: <32bec0c9-60e2-0362-9446-feb4de1b119c@linux.vnet.ibm.com>

On Wed 03-01-18 15:06:49, Anshuman Khandual wrote:
> On 01/03/2018 02:28 PM, Michal Hocko wrote:
> > On Wed 03-01-18 14:12:17, Anshuman Khandual wrote:
> >> On 12/08/2017 09:45 PM, Michal Hocko wrote:
[...]
> >>> @@ -1593,79 +1556,80 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
> >>>  			 const int __user *nodes,
> >>>  			 int __user *status, int flags)
> >>>  {
> >>> -	struct page_to_node *pm;
> >>> -	unsigned long chunk_nr_pages;
> >>> -	unsigned long chunk_start;
> >>> -	int err;
> >>> -
> >>> -	err = -ENOMEM;
> >>> -	pm = (struct page_to_node *)__get_free_page(GFP_KERNEL);
> >>> -	if (!pm)
> >>> -		goto out;
> >>> +	int chunk_node = NUMA_NO_NODE;
> >>> +	LIST_HEAD(pagelist);
> >>> +	int chunk_start, i;
> >>> +	int err = 0, err1;
> >>
> >> err init might not be required, it's getting assigned to -EFAULT right away.
> > 
> > No, nr_pages might be 0 AFAICS.
> 
> Right but there is another err = 0 after the for loop.

No, we have:
out_flush:
	/* Make sure we do not overwrite the existing error */
	err1 = do_move_pages_to_node(mm, &pagelist, current_node);
	if (!err1)
		err1 = store_status(status, start, current_node, i - start);
	if (!err)
		err = err1;

This is obviously not an act of beauty and is probably subject to a
cleanup, but I just wanted this thing to be working first. Further
cleanups can go on top.
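
The intent of the out_flush snippet above can be modelled in user space
roughly as follows. This is only a sketch: flush_pending() and its
parameters are hypothetical names, with move_result and store_result
standing in for the return values of do_move_pages_to_node() and
store_status().

```c
/*
 * User-space model of the out_flush error handling (hypothetical helper,
 * not kernel code).
 *
 * err:          error already recorded before the final flush (0 if none)
 * move_result:  stand-in for the return value of do_move_pages_to_node()
 * store_result: stand-in for the return value of store_status()
 */
static int flush_pending(int err, int move_result, int store_result)
{
	int err1 = move_result;

	/* Only report status if the migration step itself succeeded. */
	if (!err1)
		err1 = store_result;

	/* Make sure we do not overwrite the existing error. */
	if (!err)
		err = err1;

	return err;
}
```

With this shape an earlier failure always wins: the flush still runs so
that the pending pages are not left behind, but its result is only
reported when nothing has failed before.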

> > [...]
> >>> +		if (chunk_node == NUMA_NO_NODE) {
> >>> +			chunk_node = node;
> >>> +			chunk_start = i;
> >>> +		} else if (node != chunk_node) {
> >>> +			err = do_move_pages_to_node(mm, &pagelist, chunk_node);
> >>> +			if (err)
> >>> +				goto out;
> >>> +			err = store_status(status, chunk_start, chunk_node, i - chunk_start);
> >>> +			if (err)
> >>> +				goto out;
> >>> +			chunk_start = i;
> >>> +			chunk_node = node;
> >>>  		}
> 
> [...]
> 
> >>> +		err = do_move_pages_to_node(mm, &pagelist, chunk_node);
> >>> +		if (err)
> >>> +			goto out;
> >>> +		if (i > chunk_start) {
> >>> +			err = store_status(status, chunk_start, chunk_node, i - chunk_start);
> >>> +			if (err)
> >>> +				goto out;
> >>> +		}
> >>> +		chunk_node = NUMA_NO_NODE;
> >>
> >> This block of code is a bit confusing.
> > 
> > I believe this is easier to grasp when looking at the resulting code.
> >>
> >> 1) Why attempt to migrate when just one page could not be isolated ?
> >> 2) 'i' is always greater than chunk_start except the starting page
> >> 3) Why reset chunk_node as NUMA_NO_NODE ?
> > 
> > This is all about flushing the pending state on an error and
> > distinguishing a fresh batch.
> 
> Okay. Will test it out on a multi node system once I get hold of one.

Thanks. I have been testing this specific code path with the following
simple test program and numactl -m0. The code is rather crude, so I've
always modified it manually to test different scenarios (this one keeps
every 1024th page on node 0 and moves the rest to node 1, to test
batching).
---
#include <sys/mman.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>
#include <numaif.h>

int main(void)
{
        unsigned long nr_pages = 10000;
        size_t length = nr_pages << 12, i;      /* assumes 4kB pages */
        unsigned char *addr = mmap(NULL, length, PROT_READ | PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
        void *addrs[nr_pages];
        int nodes[nr_pages];
        int status[nr_pages];
        char cmd[128];
        char ch;

        if (addr == MAP_FAILED)
                return 1;

        /* no THP, so that every entry is a separate base page */
        madvise(addr, length, MADV_NOHUGEPAGE);

        /* fault all pages in */
        for (i = 0; i < length; i += 4096)
                addr[i] = 1;
        for (i = 0; i < nr_pages; i++)
        {
                addrs[i] = &addr[i * 4096];
                /* every 1024th page targets node 0, the rest node 1 */
                if (i%1024)
                        nodes[i] = 1;
                else
                        nodes[i] = 0;
                status[i] = 0;
        }
        /* show the placement before the move */
        snprintf(cmd, sizeof(cmd), "grep %lx /proc/%d/numa_maps", (unsigned long)addr, getpid());
        system(cmd);
        snprintf(cmd, sizeof(cmd), "grep %lx -A20 /proc/%d/smaps", (unsigned long)addr, getpid());
        system(cmd);
        /* wait for input on stdin before moving anything */
        read(0, &ch, 1);
        /* pid 0 means the calling process */
        if (move_pages(0, nr_pages, addrs, nodes, status, MPOL_MF_MOVE)) {
                printf("move_pages: err:%d\n", errno);
        }
        /* show the placement after the move */
        snprintf(cmd, sizeof(cmd), "grep %lx /proc/%d/numa_maps", (unsigned long)addr, getpid());
        system(cmd);
        snprintf(cmd, sizeof(cmd), "grep %lx -A20 /proc/%d/smaps", (unsigned long)addr, getpid());
        system(cmd);
        return 0;
}

---
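
To see what the nodes[] pattern in the test program above means for
batching, here is a rough user-space model of the chunk_node logic. It
is a sketch under simplifying assumptions: NO_NODE, count_chunks() and
fill_targets() are hypothetical names, a chunk is just a run of
consecutive entries with the same target node, and the extra flush on
an isolation failure is not modelled.

```c
#include <stddef.h>

#define NO_NODE (-1)	/* stand-in for the kernel's NUMA_NO_NODE */

/*
 * Count how many batches a walk over targets[] would flush when a batch
 * ends every time the target node changes.
 */
static size_t count_chunks(const int *targets, size_t n)
{
	int chunk_node = NO_NODE;
	size_t chunks = 0;

	for (size_t i = 0; i < n; i++) {
		if (chunk_node == NO_NODE) {
			/* fresh batch: nothing pending yet */
			chunk_node = targets[i];
		} else if (targets[i] != chunk_node) {
			/* target changed: flush the pending batch, start a new one */
			chunks++;
			chunk_node = targets[i];
		}
	}

	/* final flush of whatever is still pending */
	if (chunk_node != NO_NODE)
		chunks++;

	return chunks;
}

/* Same target assignment as the test program: every 1024th page -> node 0. */
static void fill_targets(int *targets, size_t n)
{
	for (size_t i = 0; i < n; i++)
		targets[i] = (i % 1024) ? 1 : 0;
}
```

For the 10000 pages used by the test program this model gives 20 chunks:
ten single-page chunks for node 0 and ten long chunks for node 1 (nine
of 1023 pages and a last one of 783 pages).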

-- 
Michal Hocko
SUSE Labs
