From: Jerome Glisse <j.glisse@gmail.com>
To: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: "Jérôme Glisse" <jglisse@redhat.com>,
	akpm@linux-foundation.org, linux-kernel@vger.kernel.org,
	linux-mm@kvack.org,
	"Linus Torvalds" <torvalds@linux-foundation.org>,
	joro@8bytes.org, "Mel Gorman" <mgorman@suse.de>,
	"H. Peter Anvin" <hpa@zytor.com>,
	"Peter Zijlstra" <peterz@infradead.org>,
	"Andrea Arcangeli" <aarcange@redhat.com>,
	"Johannes Weiner" <jweiner@redhat.com>,
	"Larry Woodman" <lwoodman@redhat.com>,
	"Rik van Riel" <riel@redhat.com>,
	"Dave Airlie" <airlied@redhat.com>,
	"Brendan Conoboy" <blc@redhat.com>,
	"Joe Donohue" <jdonohue@redhat.com>,
	"Christophe Harle" <charle@nvidia.com>,
	"Duncan Poole" <dpoole@nvidia.com>,
	"Sherry Cheung" <SCheung@nvidia.com>,
	"Subhash Gutti" <sgutti@nvidia.com>,
	"John Hubbard" <jhubbard@nvidia.com>,
	"Mark Hairgrove" <mhairgrove@nvidia.com>,
	"Lucien Dunning" <ldunning@nvidia.com>,
	"Cameron Buschardt" <cabuschardt@nvidia.com>,
	"Arvind Gopalakrishnan" <arvindg@nvidia.com>,
	"Haggai Eran" <haggaie@mellanox.com>,
	"Liran Liss" <liranl@mellanox.com>,
	"Roland Dreier" <roland@purestorage.com>,
	"Ben Sander" <ben.sander@amd.com>,
	"Greg Stoner" <Greg.Stoner@amd.com>,
	"John Bridgman" <John.Bridgman@amd.com>,
	"Michael Mantor" <Michael.Mantor@amd.com>,
	"Paul Blinzer" <Paul.Blinzer@amd.com>,
	"Leonid Shamis" <Leonid.Shamis@amd.com>,
	"Laurent Morichetti" <Laurent.Morichetti@amd.com>,
	"Alexander Deucher" <Alexander.Deucher@amd.com>,
	"Jatin Kumar" <jakumar@nvidia.com>
Subject: Re: [PATCH v12 08/29] HMM: add device page fault support v6.
Date: Wed, 23 Mar 2016 12:25:32 +0100	[thread overview]
Message-ID: <20160323112532.GB2888@gmail.com> (raw)
In-Reply-To: <87egb1trlf.fsf@linux.vnet.ibm.com>

On Wed, Mar 23, 2016 at 03:59:32PM +0530, Aneesh Kumar K.V wrote:
> Jerome Glisse <j.glisse@gmail.com> writes:

[...]

> >> > +static int hmm_mirror_fault_hpmd(struct hmm_mirror *mirror,
> >> > +				 struct hmm_event *event,
> >> > +				 struct vm_area_struct *vma,
> >> > +				 struct hmm_pt_iter *iter,
> >> > +				 pmd_t *pmdp,
> >> > +				 struct hmm_mirror_fault *mirror_fault,
> >> > +				 unsigned long start,
> >> > +				 unsigned long end)
> >> > +{
> >> > +	struct page *page;
> >> > +	unsigned long addr, pfn;
> >> > +	unsigned flags = FOLL_TOUCH;
> >> > +	spinlock_t *ptl;
> >> > +	int ret;
> >> > +
> >> > +	ptl = pmd_lock(mirror->hmm->mm, pmdp);
> >> > +	if (unlikely(!pmd_trans_huge(*pmdp))) {
> >> > +		spin_unlock(ptl);
> >> > +		return -EAGAIN;
> >> > +	}
> >> > +	flags |= event->etype == HMM_DEVICE_WFAULT ? FOLL_WRITE : 0;
> >> > +	page = follow_trans_huge_pmd(vma, start, pmdp, flags);
> >> > +	pfn = page_to_pfn(page);
> >> > +	spin_unlock(ptl);
> >> > +
> >> > +	/* Just fault in the whole PMD. */
> >> > +	start &= PMD_MASK;
> >> > +	end = start + PMD_SIZE - 1;
> >> > +
> >> > +	if (!pmd_write(*pmdp) && event->etype == HMM_DEVICE_WFAULT)
> >> > +			return -ENOENT;
> >> > +
> >> > +	for (ret = 0, addr = start; !ret && addr < end;) {
> >> > +		unsigned long i, next = end;
> >> > +		dma_addr_t *hmm_pte;
> >> > +
> >> > +		hmm_pte = hmm_pt_iter_populate(iter, addr, &next);
> >> > +		if (!hmm_pte)
> >> > +			return -ENOMEM;
> >> > +
> >> > +		i = hmm_pt_index(&mirror->pt, addr, mirror->pt.llevel);
> >> > +
> >> > +		/*
> >> > +		 * The directory lock protect against concurrent clearing of
> >> > +		 * page table bit flags. Exceptions being the dirty bit and
> >> > +		 * the device driver private flags.
> >> > +		 */
> >> > +		hmm_pt_iter_directory_lock(iter);
> >> > +		do {
> >> > +			if (!hmm_pte_test_valid_pfn(&hmm_pte[i])) {
> >> > +				hmm_pte[i] = hmm_pte_from_pfn(pfn);
> >> > +				hmm_pt_iter_directory_ref(iter);
> >> 
> >> I looked at that and it is actually 
> >> static inline void hmm_pt_iter_directory_ref(struct hmm_pt_iter *iter)
> >> {
> >> 	BUG_ON(!iter->ptd[iter->pt->llevel - 1]);
> >> 	hmm_pt_directory_ref(iter->pt, iter->ptd[iter->pt->llevel - 1]);
> >> }
> >> 
> >> static inline void hmm_pt_directory_ref(struct hmm_pt *pt,
> >> 					struct page *ptd)
> >> {
> >> 	if (!atomic_inc_not_zero(&ptd->_mapcount))
> >> 		/* Illegal this should not happen. */
> >> 		BUG();
> >> }
> >> 
> >> what is the mapcount update about ?
> >
> > Unlike the regular CPU page table, we do not rely on unmap to prune the HMM
> > mirror page table. Rather, we free/prune it aggressively once the device no
> > longer has anything mirrored in a given range.
> 
> Which patch does this ?

Well, it is done in hmm_pt_iter_directory_unref_safe(), so there is no particular
patch per se. One optimization I want to do, as part of a later patch, is to
delay directory pruning so that we avoid freeing and then reallocating right
away when the device or some memory event wrongly led us to believe it was done
with a range. But I do not want to complicate the code before hard numbers show
it makes sense.


> > As such, mapcount is used to keep track of how many valid entries there are
> > per directory.
> >
> > Moreover, mapcount is also used to protect against concurrent pruning: as
> > you walk through the page table you increment the refcount by one along the
> > way, and when you are done walking you decrement it.
> >
> > Because of that last aspect, the mapcount can never reach zero merely
> > because we unmap pages; it can only reach zero once we clean up after the
> > page table walk.
> >
> >> 
> >> > +			}
> >> > +			BUG_ON(hmm_pte_pfn(hmm_pte[i]) != pfn);
> >> > +			if (pmd_write(*pmdp))
> >> > +				hmm_pte_set_write(&hmm_pte[i]);
> >> > +		} while (addr += PAGE_SIZE, pfn++, i++, addr != next);
> >> > +		hmm_pt_iter_directory_unlock(iter);
> >> > +		mirror_fault->addr = addr;
> >> > +	}
> >> > +
> >> 
> >> So we don't have huge page mapping in hmm page table ? 
> >
> > No, we don't right now. The first reason is that I wanted to keep things
> > simple for device drivers. The second motivation is to keep the first
> > patchset simpler, especially the page migration code.
> >
> > Memory overhead is 2MB per GB of virtual memory mirrored. There is no TLB
> > here. I believe adding huge pages can be done as part of a later patchset
> > if it makes sense.
> >
> 
> One of the things I am wondering is whether we can do the patch series in
> such a way that we move the page table mirror into the device driver. That
> is, an hmm fault would look at the cpu page table and call into a device
> driver callback with the pte entry details. It is up to the device driver to
> maintain a mirror table if needed. Similarly, for a cpu fault we would call
> into an hmm callback to find the per-pte dma_addr and do a migrate using a
> copy_from_device callback. I haven't fully looked at how easy this would
> be, but I guess a lot of the code in this series has to do with the mirror
> table, and I am wondering whether there is a simpler version we can get
> upstream that hides it within a driver.

This is one possibility, but it means that many device drivers would duplicate
page table code. It also means that some optimizations I want to do down the
road are not doable. Most notably, I want to share IOMMU directories among
several devices (when those devices mirror the same virtual address range),
but this requires work in the DMA/IOMMU code.

Another aspect is page reclamation: with pages in use by a device we could see
stalls, because device page table invalidation is far more complex and takes
more time than CPU page table invalidation.

Having the mirror in common code makes it easier to add a new LRU list for
pages referenced by a device, which allows hiding the device page table
invalidation latency. This is probably also doable if we hide the mirror page
table inside the device driver, but then it is harder for common code to know
which device to ask for unmapping. It would also require either a new page
flag or a new pte flag, both of which are in short supply, and I am not sure
people would be thrilled to reserve one just for this feature.

Also, I think we want to limit device usage of things like the mmu_notifier
API. At least I would.

Another possibility I did explore is having common code manage mirror ranges
(instead of a page table) and having the device driver deal on its own with
the per-page mirroring. I even have a patch doing this somewhere. This might
be a middle-ground solution. Note that by range I mean something like:

struct mirror_range {
	struct hmm_device *hdev;
	unsigned long start; /* virtual start address for the range */
	unsigned long end; /* virtual end address for the range */
	/* other fields, e.g. an rb_tree node and flags. */
};

But it gets quite ugly with range merging/splitting, and there is the obvious
worst case of having one of these structs per page (e.g. mirroring every
other page of a range).


> Also, does it simplify things to have interfaces that operate on one pte
> rather than on an array of ptes?

I strongly believe we do not want to do that. A GPU is something like 2048
cores with 16384 threads in flight; if each of those threads page faults over
a linear range, you end up having to do 16384 calls and the overhead is going
to kill performance. GPUs are about batching things up, so doing things in
bulk is what we want for performance.

Also, I should add that on a GPU, saving thread context out to memory and
swapping in another is far more expensive. First, you can only do so at a
large granularity, i.e. 256 threads at a time or more depending on the GPU.
Second, each thread carries a lot of state, think a few kilobytes, so you
easily end up moving around megabytes of thread context data. This is not
lightweight. It is a different paradigm from the CPU.

Cheers,
Jérôme


