From: Anshuman Khandual <khandual@linux.vnet.ibm.com>
To: linux-kernel@vger.kernel.org, linux-mm@kvack.org
Cc: mhocko@suse.com, js1304@gmail.com, vbabka@suse.cz, mgorman@suse.de,
    minchan@kernel.org, akpm@linux-foundation.org,
    aneesh.kumar@linux.vnet.ibm.com, bsingharora@gmail.com
Subject: [RFC 1/8] mm: Define coherent device memory node
Date: Mon, 24 Oct 2016 10:01:50 +0530
Message-ID: <1477283517-2504-2-git-send-email-khandual@linux.vnet.ibm.com>
In-Reply-To: <1477283517-2504-1-git-send-email-khandual@linux.vnet.ibm.com>

Certain devices such as specialized accelerators, GPU cards, network
cards and FPGA cards might contain onboard memory which is coherent
with the existing system RAM when accessed either from the CPU or from
the device. This memory shares some properties with normal system RAM
but can also differ from it in other respects.

User applications might be interested in using this kind of coherent
device memory, explicitly or implicitly, alongside the system RAM,
utilizing all possible core memory functions like anon mapping (LRU),
file mapping (LRU), page cache (LRU), driver managed (non LRU), HW
poisoning, NUMA migrations etc. To achieve this kind of tight
integration with the core memory subsystem, the device's onboard
coherent memory must be represented as a memory-only NUMA node. At the
same time, the pglist_data structure (which is the node's memory
representation) of this NUMA node must also be marked to indicate that
it is coherent device memory, not regular system RAM.

After achieving integration with the core memory subsystem through a
marked pglist_data structure, coherent device memory might still need
some special consideration inside the kernel. There can be a variety of
coherent memory nodes with different expectations from the core kernel
memory subsystem. Right now only one kind of special treatment is
considered: nodes which require a certain isolation.

Now consider the case of a coherent device memory node type which
requires isolation. This kind of coherent memory is onboard an external
device attached to the system through a link, where there is always a
chance of a link failure taking down the entire memory node with it.
Moreover, the memory might also have a higher chance of ECC failure
compared to the system RAM. Hence allocation into this kind of coherent
memory node should be regulated. Kernel allocations must not come here.
Normal user space allocations should not come here implicitly either
(without the user application knowing about it). This summarizes the
isolation requirement of one kind of coherent device memory node as an
example.

There can be different kinds of isolation requirements as well. Some
coherent memory devices might not require isolation at all. Other
coherent memory devices might require some other special treatment
after becoming part of the core memory representation. For now, this
series looks only at isolation-seeking coherent device memory nodes,
not the other ones.

This adds a new 'bool coherent_device' element to the pglist_data
structure which can identify any coherent device node. Alternatively,
this could be a u64 holding an array of property bits for various types
of coherent devices in the future.

Signed-off-by: Anshuman Khandual <khandual@linux.vnet.ibm.com>
---
 include/linux/mmzone.h | 29 +++++++++++++++++++++++++++++
 mm/Kconfig             | 13 +++++++++++++
 2 files changed, 42 insertions(+)

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 7f2ae99..821dffb 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -722,8 +722,37 @@ typedef struct pglist_data {
 	/* Per-node vmstats */
 	struct per_cpu_nodestat __percpu *per_cpu_nodestats;
 	atomic_long_t		vm_stat[NR_VM_NODE_STAT_ITEMS];
+
+#ifdef CONFIG_COHERENT_DEVICE
+	/*
+	 * Coherent device memory node
+	 *
+	 * A device containing coherent memory is represented as a
+	 * special coherent memory NUMA node, which should be identified
+	 * differently from normal memory nodes. Though it shares a
+	 * lot of common properties with system memory, it also has
+	 * some differentiating factors.
+	 *
+	 * XXX: Though this is a bool which identifies the isolation
+	 * requiring coherent device memory node right now, it can be
+	 * extended as a bit mask to represent different properties
+	 * for future coherent device memory nodes.
+	 */
+	bool coherent_device;
+#endif
 } pg_data_t;

+#ifdef CONFIG_COHERENT_DEVICE
+#define node_cdm(nid)		(NODE_DATA(nid)->coherent_device)
+#define set_cdm_isolation(nid)	(node_cdm(nid) = 1)
+#define clr_cdm_isolation(nid)	(node_cdm(nid) = 0)
+#define isolated_cdm_node(nid)	(node_cdm(nid) == 1)
+#else
+#define set_cdm_isolation(nid)	do { } while (0)
+#define clr_cdm_isolation(nid)	do { } while (0)
+#define isolated_cdm_node(nid)	(0)
+#endif
+
 #define node_present_pages(nid)	(NODE_DATA(nid)->node_present_pages)
 #define node_spanned_pages(nid)	(NODE_DATA(nid)->node_spanned_pages)
 #ifdef CONFIG_FLAT_NODE_MEM_MAP
diff --git a/mm/Kconfig b/mm/Kconfig
index be0ee11..cb50468 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -704,6 +704,19 @@ config ZONE_DEVICE
 	  If FS_DAX is enabled, then say Y.

+config COHERENT_DEVICE
+	bool "Coherent device memory support"
+	depends on MEMORY_HOTPLUG
+	depends on MEMORY_HOTREMOVE
+	depends on PPC64
+	default y
+	help
+	  Coherent device memory node support enables the system to hotplug
+	  a device with coherent memory as a normal system memory node. FPGA,
+	  network, GPU cards etc might contain coherent memory.
+
+	  If not sure, then say N.
+
 config FRAME_VECTOR
 	bool
-- 
2.1.0
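
For readers who want to see how the accessors in this patch might be
used, here is a minimal user-space sketch. It is not kernel code and is
not taken from the series: struct mock_pgdat, mock_nodes[], the mock
NODE_DATA() and first_system_ram_node() are hypothetical stand-ins made
up for illustration. Only the four CDM macros mirror the shape of the
ones added to mmzone.h above.

```c
#include <assert.h>
#include <stdbool.h>

#define MOCK_MAX_NODES 4

/* Hypothetical user-space stand-in for the kernel's pg_data_t. */
struct mock_pgdat {
	unsigned long node_present_pages;
	bool coherent_device;	/* the field this patch adds */
};

static struct mock_pgdat mock_nodes[MOCK_MAX_NODES];

/* Mock of the kernel's NODE_DATA(nid) lookup (assumption: flat array). */
#define NODE_DATA(nid)		(&mock_nodes[(nid)])

/* Same shape as the macros introduced by the patch. */
#define node_cdm(nid)		(NODE_DATA(nid)->coherent_device)
#define set_cdm_isolation(nid)	(node_cdm(nid) = 1)
#define clr_cdm_isolation(nid)	(node_cdm(nid) = 0)
#define isolated_cdm_node(nid)	(node_cdm(nid) == 1)

/*
 * Hypothetical helper: return the first node that has memory and is
 * not an isolated CDM node, or -1 if there is none. This models the
 * rule "kernel allocations must not come here".
 */
static int first_system_ram_node(int nr_nodes)
{
	assert(nr_nodes >= 0 && nr_nodes <= MOCK_MAX_NODES);

	for (int nid = 0; nid < nr_nodes; nid++) {
		if (mock_nodes[nid].node_present_pages == 0)
			continue;	/* memoryless node */
		if (isolated_cdm_node(nid))
			continue;	/* isolated coherent device memory */
		return nid;
	}
	return -1;
}
```

In this sketch a caller would mark a node with set_cdm_isolation(nid)
when the device memory is onlined, and any allocation path that must
avoid device memory would skip nodes for which isolated_cdm_node(nid)
is true. How the real series wires this into zonelists and mempolicy is
covered by the later patches, not by this example.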