From: Michal Hocko <mhocko@kernel.org>
To: Anshuman Khandual <khandual@linux.vnet.ibm.com>
Cc: linux-mm@kvack.org, Zi Yan <zi.yan@cs.rutgers.edu>,
	Naoya Horiguchi <n-horiguchi@ah.jp.nec.com>,
	"Kirill A. Shutemov" <kirill@shutemov.name>,
	Vlastimil Babka <vbabka@suse.cz>,
	Andrew Morton <akpm@linux-foundation.org>,
	Andrea Reale <ar@linux.vnet.ibm.com>,
	LKML <linux-kernel@vger.kernel.org>
Subject: Re: [RFC PATCH 1/3] mm, numa: rework do_pages_move
Date: Wed, 3 Jan 2018 10:52:11 +0100
Message-ID: <20180103095211.GC11319@dhcp22.suse.cz>
In-Reply-To: <32bec0c9-60e2-0362-9446-feb4de1b119c@linux.vnet.ibm.com>

On Wed 03-01-18 15:06:49, Anshuman Khandual wrote:
> On 01/03/2018 02:28 PM, Michal Hocko wrote:
> > On Wed 03-01-18 14:12:17, Anshuman Khandual wrote:
> >> On 12/08/2017 09:45 PM, Michal Hocko wrote:
[...]
> >>> @@ -1593,79 +1556,80 @@ static int do_pages_move(struct mm_struct *mm, nodemask_t task_nodes,
> >>>  			 const int __user *nodes,
> >>>  			 int __user *status, int flags)
> >>>  {
> >>> -	struct page_to_node *pm;
> >>> -	unsigned long chunk_nr_pages;
> >>> -	unsigned long chunk_start;
> >>> -	int err;
> >>> -
> >>> -	err = -ENOMEM;
> >>> -	pm = (struct page_to_node *)__get_free_page(GFP_KERNEL);
> >>> -	if (!pm)
> >>> -		goto out;
> >>> +	int chunk_node = NUMA_NO_NODE;
> >>> +	LIST_HEAD(pagelist);
> >>> +	int chunk_start, i;
> >>> +	int err = 0, err1;
> >>
> >> err init might not be required, its getting assigned to -EFAULT right away.
> >
> > No, nr_pages might be 0 AFAICS.
>
> Right but there is another err = 0 after the for loop.

No, we have

out_flush:
	/* Make sure we do not overwrite the existing error */
	err1 = do_move_pages_to_node(mm, &pagelist, current_node);
	if (!err1)
		err1 = store_status(status, start, current_node, i - start);
	if (!err)
		err = err1;

This is obviously not a thing of beauty and probably deserves a cleanup,
but I wanted this to work first. Further cleanups can go on top.

> [...]
> >>> +		if (chunk_node == NUMA_NO_NODE) {
> >>> +			chunk_node = node;
> >>> +			chunk_start = i;
> >>> +		} else if (node != chunk_node) {
> >>> +			err = do_move_pages_to_node(mm, &pagelist, chunk_node);
> >>> +			if (err)
> >>> +				goto out;
> >>> +			err = store_status(status, chunk_start, chunk_node, i - chunk_start);
> >>> +			if (err)
> >>> +				goto out;
> >>> +			chunk_start = i;
> >>> +			chunk_node = node;
> >>>  		}
>
> [...]
>
> >>> +		err = do_move_pages_to_node(mm, &pagelist, chunk_node);
> >>> +		if (err)
> >>> +			goto out;
> >>> +		if (i > chunk_start) {
> >>> +			err = store_status(status, chunk_start, chunk_node, i - chunk_start);
> >>> +			if (err)
> >>> +				goto out;
> >>> +		}
> >>> +		chunk_node = NUMA_NO_NODE;
> >>
> >> This block of code is bit confusing.
> >
> > I believe this is easier to grasp when looking at the resulting code.
> >>
> >> 1) Why attempt to migrate when just one page could not be isolated ?
> >> 2) 'i' is always greater than chunk_start except the starting page
> >> 3) Why reset chunk_node as NUMA_NO_NODE ?
> >
> > This is all about flushing the pending state on an error and
> > distinguising a fresh batch.
>
> Okay. Will test it out on a multi node system once I get hold of one.

Thanks. I have been testing this specific code path with the following
simple test program, run under numactl -m0. The code is rather crude, so
I have been modifying it manually to test different scenarios (this one
keeps every 1024th page on node 0 and moves the rest to node 1 to test
batching).
---
#include <sys/mman.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <stdio.h>
#include <errno.h>
#include <numaif.h>	/* move_pages(), MPOL_MF_MOVE; link with -lnuma */

int main()
{
	unsigned long nr_pages = 10000;
	/* use the real page size rather than assuming 4kB pages */
	size_t page_size = sysconf(_SC_PAGESIZE);
	size_t length = nr_pages * page_size, i;
	unsigned char *addr = mmap(NULL, length, PROT_READ | PROT_WRITE,
				   MAP_PRIVATE|MAP_ANONYMOUS, -1, 0);
	void *addrs[nr_pages];
	int nodes[nr_pages];
	int status[nr_pages];
	char cmd[128];
	char ch;

	if (addr == MAP_FAILED)
		return 1;

	/* keep base pages so that every addrs[] entry is a separate page */
	madvise(addr, length, MADV_NOHUGEPAGE);

	/* fault in every page (on node 0 when run under numactl -m0) */
	for (i = 0; i < length; i += page_size)
		addr[i] = 1;

	/* every 1024th page targets node 0, all the others node 1 */
	for (i = 0; i < nr_pages; i++) {
		addrs[i] = &addr[i * page_size];
		if (i%1024)
			nodes[i] = 1;
		else
			nodes[i] = 0;
		status[i] = 0;
	}

	snprintf(cmd, sizeof(cmd)-1, "grep %lx /proc/%d/numa_maps", (unsigned long)addr, getpid());
	system(cmd);
	snprintf(cmd, sizeof(cmd)-1, "grep %lx -A20 /proc/%d/smaps", (unsigned long)addr, getpid());
	system(cmd);

	/* wait for a key press before migrating */
	read(0, &ch, 1);
	if (move_pages(0, nr_pages, addrs, nodes, status, MPOL_MF_MOVE)) {
		printf("move_pages: err:%d\n", errno);
	}

	snprintf(cmd, sizeof(cmd)-1, "grep %lx /proc/%d/numa_maps", (unsigned long)addr, getpid());
	system(cmd);
	snprintf(cmd, sizeof(cmd)-1, "grep %lx -A20 /proc/%d/smaps", (unsigned long)addr, getpid());
	system(cmd);

	return 0;
}
---
-- 
Michal Hocko
SUSE Labs