From: Laurent Dufour <ldufour@linux.vnet.ibm.com>
To: "Jérôme Glisse" <jglisse@redhat.com>,
	akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org
Cc: John Hubbard <jhubbard@nvidia.com>,
	Dan Williams <dan.j.williams@intel.com>,
	David Nellans <dnellans@nvidia.com>,
	Balbir Singh <bsingharora@gmail.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Michal Hocko <mhocko@kernel.org>,
	Vladimir Davydov <vdavydov.dev@gmail.com>,
	cgroups@vger.kernel.org
Subject: Re: [HMM-v25 10/19] mm/memcontrol: support MEMORY_DEVICE_PRIVATE v4
Date: Tue, 5 Sep 2017 19:13:15 +0200
Message-ID: <f239d1c2-7006-5ce4-7848-7d82e67533a9@linux.vnet.ibm.com>
In-Reply-To: <20170817000548.32038-11-jglisse@redhat.com>

On 17/08/2017 02:05, Jérôme Glisse wrote:
> HMM pages (private or public device pages) are ZONE_DEVICE pages and
> thus need special handling when it comes to lru handling or refcounts.
> This patch makes sure that memcontrol handles them properly when it
> encounters them. These pages are used like regular pages in a process
> address space, either as anonymous pages or as file-backed pages, so
> from the memcg point of view we want to handle them like regular
> pages, at least for now.
> 
> Changed since v3:
>   - remove public support and move those chunks to a separate patch
> Changed since v2:
>   - s/host/public
> Changed since v1:
>   - s/public/host
>   - add comments explaining how device memory behaves and why
> 
> Signed-off-by: Jérôme Glisse <jglisse@redhat.com>
> Acked-by: Balbir Singh <bsingharora@gmail.com>
> Cc: Johannes Weiner <hannes@cmpxchg.org>
> Cc: Michal Hocko <mhocko@kernel.org>
> Cc: Vladimir Davydov <vdavydov.dev@gmail.com>
> Cc: cgroups@vger.kernel.org
> ---
>  kernel/memremap.c |  1 +
>  mm/memcontrol.c   | 52 ++++++++++++++++++++++++++++++++++++++++++++++++----
>  2 files changed, 49 insertions(+), 4 deletions(-)
> 
> diff --git a/kernel/memremap.c b/kernel/memremap.c
> index 398630c1fba3..f42d7483e886 100644
> --- a/kernel/memremap.c
> +++ b/kernel/memremap.c
> @@ -492,6 +492,7 @@ void put_zone_device_private_page(struct page *page)
>  		__ClearPageWaiters(page);
> 
>  		page->mapping = NULL;
> +		mem_cgroup_uncharge(page);
> 
>  		page->pgmap->page_free(page, page->pgmap->data);
>  	} else if (!count)
> diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> index 604fb3ca8028..977d1cf3493a 100644
> --- a/mm/memcontrol.c
> +++ b/mm/memcontrol.c
> @@ -4407,12 +4407,13 @@ enum mc_target_type {
>  	MC_TARGET_NONE = 0,
>  	MC_TARGET_PAGE,
>  	MC_TARGET_SWAP,
> +	MC_TARGET_DEVICE,
>  };
> 
>  static struct page *mc_handle_present_pte(struct vm_area_struct *vma,
>  						unsigned long addr, pte_t ptent)
>  {
> -	struct page *page = vm_normal_page(vma, addr, ptent);
> +	struct page *page = _vm_normal_page(vma, addr, ptent, true);

Hi Jérôme,

Since _vm_normal_page() is only defined later, in patch 18, this patch
breaks bisectability.
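
One way to keep every step of the series building, sketched here only
as an (untested) suggestion: keep the existing vm_normal_page() call in
this patch, and switch it over in patch 18 where _vm_normal_page() is
actually introduced:

	/* this patch, unchanged: */
	struct page *page = vm_normal_page(vma, addr, ptent);

	/* patch 18, once _vm_normal_page() exists: */
	struct page *page = _vm_normal_page(vma, addr, ptent, true);

Alternatively, patch 18 could be moved ahead of this one in the series.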

Cheers,
Laurent.

> 
>  	if (!page || !page_mapped(page))
>  		return NULL;
> @@ -4429,7 +4430,7 @@ static struct page *mc_handle_present_pte(struct vm_area_struct *vma,
>  	return page;
>  }
> 
> -#ifdef CONFIG_SWAP
> +#if defined(CONFIG_SWAP) || defined(CONFIG_DEVICE_PRIVATE)
>  static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
>  			pte_t ptent, swp_entry_t *entry)
>  {
> @@ -4438,6 +4439,23 @@ static struct page *mc_handle_swap_pte(struct vm_area_struct *vma,
> 
>  	if (!(mc.flags & MOVE_ANON) || non_swap_entry(ent))
>  		return NULL;
> +
> +	/*
> +	 * Handle MEMORY_DEVICE_PRIVATE pages: ZONE_DEVICE pages belonging to
> +	 * a device; because they are not accessible by the CPU they are
> +	 * stored as special swap entries in the CPU page table.
> +	 */
> +	if (is_device_private_entry(ent)) {
> +		page = device_private_entry_to_page(ent);
> +		/*
> +		 * MEMORY_DEVICE_PRIVATE means this is a ZONE_DEVICE page,
> +		 * which has a refcount of 1 when free (unlike a normal page).
> +		 */
> +		if (!page_ref_add_unless(page, 1, 1))
> +			return NULL;
> +		return page;
> +	}
> +
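
An aside for readers following the refcounting here:
page_ref_add_unless(page, 1, 1) atomically does "add 1 to the refcount
unless it is currently 1" and returns whether the increment happened.
A rough open-coded equivalent, purely illustrative and ignoring the
atomicity the real helper provides:

	if (page_ref_count(page) == 1)
		return NULL;		/* refcount 1: free device page */
	page_ref_add(page, 1);		/* page in use, take a reference */
	return page;

The check and increment must really be a single atomic operation, since
the page could otherwise be freed between the two steps.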
>  	/*
>  	 * Because lookup_swap_cache() updates some statistics counter,
>  	 * we call find_get_page() with swapper_space directly.
> @@ -4598,6 +4616,12 @@ static int mem_cgroup_move_account(struct page *page,
>   *   2(MC_TARGET_SWAP): if the swap entry corresponding to this pte is a
>   *     target for charge migration. if @target is not NULL, the entry is stored
>   *     in target->ent.
> + *   3(MC_TARGET_DEVICE): like MC_TARGET_PAGE but the page is
> + *     MEMORY_DEVICE_PRIVATE (a ZONE_DEVICE page, thus not on the lru). For
> + *     now such a page is charged like a regular page, as for all intents
> + *     and purposes it is just special memory taking the place of one.
> + *
> + *     See Documentation/vm/hmm.txt and include/linux/hmm.h
>   *
>   * Called with pte lock held.
>   */
> @@ -4626,6 +4650,8 @@ static enum mc_target_type get_mctgt_type(struct vm_area_struct *vma,
>  		 */
>  		if (page->mem_cgroup == mc.from) {
>  			ret = MC_TARGET_PAGE;
> +			if (is_device_private_page(page))
> +				ret = MC_TARGET_DEVICE;
>  			if (target)
>  				target->page = page;
>  		}
> @@ -4693,6 +4719,11 @@ static int mem_cgroup_count_precharge_pte_range(pmd_t *pmd,
> 
>  	ptl = pmd_trans_huge_lock(pmd, vma);
>  	if (ptl) {
> +		/*
> +		 * Note there can not be MC_TARGET_DEVICE for now, as we do not
> +		 * support transparent huge pages with MEMORY_DEVICE_PUBLIC or
> +		 * MEMORY_DEVICE_PRIVATE, but this might change.
> +		 */
>  		if (get_mctgt_type_thp(vma, addr, *pmd, NULL) == MC_TARGET_PAGE)
>  			mc.precharge += HPAGE_PMD_NR;
>  		spin_unlock(ptl);
> @@ -4908,6 +4939,14 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
>  				putback_lru_page(page);
>  			}
>  			put_page(page);
> +		} else if (target_type == MC_TARGET_DEVICE) {
> +			page = target.page;
> +			if (!mem_cgroup_move_account(page, true,
> +						     mc.from, mc.to)) {
> +				mc.precharge -= HPAGE_PMD_NR;
> +				mc.moved_charge += HPAGE_PMD_NR;
> +			}
> +			put_page(page);
>  		}
>  		spin_unlock(ptl);
>  		return 0;
> @@ -4919,12 +4958,16 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
>  	pte = pte_offset_map_lock(vma->vm_mm, pmd, addr, &ptl);
>  	for (; addr != end; addr += PAGE_SIZE) {
>  		pte_t ptent = *(pte++);
> +		bool device = false;
>  		swp_entry_t ent;
> 
>  		if (!mc.precharge)
>  			break;
> 
>  		switch (get_mctgt_type(vma, addr, ptent, &target)) {
> +		case MC_TARGET_DEVICE:
> +			device = true;
> +			/* fall through */
>  		case MC_TARGET_PAGE:
>  			page = target.page;
>  			/*
> @@ -4935,7 +4978,7 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
>  			 */
>  			if (PageTransCompound(page))
>  				goto put;
> -			if (isolate_lru_page(page))
> +			if (!device && isolate_lru_page(page))
>  				goto put;
>  			if (!mem_cgroup_move_account(page, false,
>  						mc.from, mc.to)) {
> @@ -4943,7 +4986,8 @@ static int mem_cgroup_move_charge_pte_range(pmd_t *pmd,
>  				/* we uncharge from mc.from later. */
>  				mc.moved_charge++;
>  			}
> -			putback_lru_page(page);
> +			if (!device)
> +				putback_lru_page(page);
>  put:			/* get_mctgt_type() gets the page */
>  			put_page(page);
>  			break;
> 
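
One more note while reading the pte loop above: the new !device tests
are needed because ZONE_DEVICE pages are never on an lru list, so lru
isolation must be skipped for them. Schematically (illustrative only,
not the literal patch code):

	bool device = is_device_private_page(page);

	if (!device && isolate_lru_page(page))
		goto put;		/* only lru pages are isolated */
	/* ... mem_cgroup_move_account() works for both kinds ... */
	if (!device)
		putback_lru_page(page);	/* only lru pages go back on the lru */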
