* [LSF/MM TOPIC] Memory hotplug, ZONE_DEVICE, and the future of struct page
@ 2017-01-12 22:43 ` Dan Williams
  0 siblings, 0 replies; 13+ messages in thread
From: Dan Williams @ 2017-01-12 22:43 UTC (permalink / raw)
  To: Linux MM, lsf-pc, linux-fsdevel, linux-nvdimm, linux-block
  Cc: Stephen Bates, Logan Gunthorpe, Jason Gunthorpe

Back when we were first attempting to support DMA for DAX mappings of
persistent memory the plan was to forgo 'struct page' completely and
develop a pfn-to-scatterlist capability for the dma-mapping-api. That
effort died in this thread:

    https://lkml.org/lkml/2015/8/14/3

...where we learned that the dependencies on struct page for dma
mapping are deeper than a PFN_PHYS() conversion for some
architectures. That was the moment we pivoted to ZONE_DEVICE and
arranged for a 'struct page' to be available for any persistent memory
range that needs to be the target of DMA. ZONE_DEVICE enables any
device-driver that can target "System RAM" to also be able to target
persistent memory through a DAX mapping.

Since that time the "page-less" DAX path has continued to mature [1]
without growing new dependencies on struct page, but at the same time
continuing to rely on ZONE_DEVICE to satisfy get_user_pages().

Peer-to-peer DMA appears to be evolving from a niche embedded use case
to something general purpose platforms will need to comprehend. The
"map_peer_resource" [2] approach looks to be headed to the same
destination as the pfn-to-scatterlist effort. It's difficult to avoid
'struct page' for describing DMA operations without custom driver
code.

With that background, a statement and a question to discuss at LSF/MM:

General purpose DMA, i.e. any DMA setup through the dma-mapping-api,
requires pfn_to_page() support across the entire physical address
range mapped.

Is ZONE_DEVICE the proper vehicle for this? We've already seen that it
collides with platform alignment assumptions [3], and if there's a
wider effort to rework memory hotplug [4] it seems DMA support should
be part of the discussion.

---

This topic focuses on the mechanism to enable pfn_to_page() for an
arbitrary physical address range, and the proposed peer-to-peer DMA
topic [5] touches on the userspace presentation of this mechanism. It
might be good to combine these topics if there's interest. In any
event, I'm interested in both, as well as in Michal's concern about
memory hotplug in general.

[1]: https://lists.01.org/pipermail/linux-nvdimm/2016-November/007672.html
[2]: http://www.spinics.net/lists/linux-pci/msg44560.html
[3]: https://lkml.org/lkml/2016/12/1/740
[4]: http://www.spinics.net/lists/linux-mm/msg119369.html
[5]: http://marc.info/?l=linux-mm&m=148156541804940&w=2

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: dont@kvack.org

* Re: [LSF/MM TOPIC] Memory hotplug, ZONE_DEVICE, and the future of struct page
  2017-01-12 22:43 ` Dan Williams
@ 2017-01-12 23:14   ` Jerome Glisse
  -1 siblings, 0 replies; 13+ messages in thread
From: Jerome Glisse @ 2017-01-12 23:14 UTC (permalink / raw)
  To: Dan Williams
  Cc: Jason Gunthorpe, linux-nvdimm@lists.01.org, linux-block,
	Linux MM, linux-fsdevel, lsf-pc

On Thu, Jan 12, 2017 at 02:43:03PM -0800, Dan Williams wrote:
> Back when we were first attempting to support DMA for DAX mappings of
> persistent memory the plan was to forgo 'struct page' completely and
> develop a pfn-to-scatterlist capability for the dma-mapping-api. That
> effort died in this thread:
> 
>     https://lkml.org/lkml/2015/8/14/3
> 
> ...where we learned that the dependencies on struct page for dma
> mapping are deeper than a PFN_PHYS() conversion for some
> architectures. That was the moment we pivoted to ZONE_DEVICE and
> arranged for a 'struct page' to be available for any persistent memory
> range that needs to be the target of DMA. ZONE_DEVICE enables any
> device-driver that can target "System RAM" to also be able to target
> persistent memory through a DAX mapping.
> 
> Since that time the "page-less" DAX path has continued to mature [1]
> without growing new dependencies on struct page, but at the same time
> continuing to rely on ZONE_DEVICE to satisfy get_user_pages().
> 
> Peer-to-peer DMA appears to be evolving from a niche embedded use case
> to something general purpose platforms will need to comprehend. The
> "map_peer_resource" [2] approach looks to be headed to the same
> destination as the pfn-to-scatterlist effort. It's difficult to avoid
> 'struct page' for describing DMA operations without custom driver
> code.
> 
> With that background, a statement and a question to discuss at LSF/MM:
> 
> General purpose DMA, i.e. any DMA setup through the dma-mapping-api,
> requires pfn_to_page() support across the entire physical address
> range mapped.

Note that in my case it is even worse. The pfn of the page does not
correspond to anything, so it needs to go through a special function
to find out whether a page can be mapped for another device and to
provide a valid pfn at which the page can be accessed by that other
device.

Basically the PCIe BAR is like a window into the device memory that is
dynamically remapped to specific pages of the device memory. Not all
device memory can be exposed through the PCIe BAR because of PCIe
limitations.

> 
> Is ZONE_DEVICE the proper vehicle for this? We've already seen that it
> collides with platform alignment assumptions [3], and if there's a
> wider effort to rework memory hotplug [4] it seems DMA support should
> be part of the discussion.

Obviously I would like to join this discussion :)

Cheers,
Jérôme
_______________________________________________
Linux-nvdimm mailing list
Linux-nvdimm@lists.01.org
https://lists.01.org/mailman/listinfo/linux-nvdimm

* Re: [LSF/MM TOPIC] Memory hotplug, ZONE_DEVICE, and the future of struct page
  2017-01-12 23:14   ` Jerome Glisse
@ 2017-01-12 23:59     ` Dan Williams
  -1 siblings, 0 replies; 13+ messages in thread
From: Dan Williams @ 2017-01-12 23:59 UTC (permalink / raw)
  To: Jerome Glisse
  Cc: Linux MM, lsf-pc, linux-fsdevel, linux-nvdimm@lists.01.org,
	linux-block, Stephen Bates, Logan Gunthorpe, Jason Gunthorpe

On Thu, Jan 12, 2017 at 3:14 PM, Jerome Glisse <jglisse@redhat.com> wrote:
> On Thu, Jan 12, 2017 at 02:43:03PM -0800, Dan Williams wrote:
>> Back when we were first attempting to support DMA for DAX mappings of
>> persistent memory the plan was to forgo 'struct page' completely and
>> develop a pfn-to-scatterlist capability for the dma-mapping-api. That
>> effort died in this thread:
>>
>>     https://lkml.org/lkml/2015/8/14/3
>>
>> ...where we learned that the dependencies on struct page for dma
>> mapping are deeper than a PFN_PHYS() conversion for some
>> architectures. That was the moment we pivoted to ZONE_DEVICE and
>> arranged for a 'struct page' to be available for any persistent memory
>> range that needs to be the target of DMA. ZONE_DEVICE enables any
>> device-driver that can target "System RAM" to also be able to target
>> persistent memory through a DAX mapping.
>>
>> Since that time the "page-less" DAX path has continued to mature [1]
>> without growing new dependencies on struct page, but at the same time
>> continuing to rely on ZONE_DEVICE to satisfy get_user_pages().
>>
>> Peer-to-peer DMA appears to be evolving from a niche embedded use case
>> to something general purpose platforms will need to comprehend. The
>> "map_peer_resource" [2] approach looks to be headed to the same
>> destination as the pfn-to-scatterlist effort. It's difficult to avoid
>> 'struct page' for describing DMA operations without custom driver
>> code.
>>
>> With that background, a statement and a question to discuss at LSF/MM:
>>
>> General purpose DMA, i.e. any DMA setup through the dma-mapping-api,
>> requires pfn_to_page() support across the entire physical address
>> range mapped.
>
> Note that in my case it is even worse. The pfn of the page does not
> correspond to anything so it need to go through a special function
> to find if a page can be mapped for another device and to provide a
> valid pfn at which the page can be access by other device.

I still haven't quite wrapped my head around how these pfn ranges are
created. Would this be a use case for a new pfn_t flag? It doesn't
sound like something we'd want to risk describing with raw 'unsigned
long' pfns.

* Re: [LSF/MM TOPIC] Memory hotplug, ZONE_DEVICE, and the future of struct page
  2017-01-12 22:43 ` Dan Williams
@ 2017-01-16 12:58   ` Anshuman Khandual
  -1 siblings, 0 replies; 13+ messages in thread
From: Anshuman Khandual @ 2017-01-16 12:58 UTC (permalink / raw)
  To: Dan Williams, Linux MM, lsf-pc, linux-fsdevel, linux-nvdimm, linux-block
  Cc: Stephen Bates, Logan Gunthorpe, Jason Gunthorpe

On 01/13/2017 04:13 AM, Dan Williams wrote:
> Back when we were first attempting to support DMA for DAX mappings of
> persistent memory the plan was to forgo 'struct page' completely and
> develop a pfn-to-scatterlist capability for the dma-mapping-api. That
> effort died in this thread:
> 
>     https://lkml.org/lkml/2015/8/14/3
> 
> ...where we learned that the dependencies on struct page for dma
> mapping are deeper than a PFN_PHYS() conversion for some
> architectures. That was the moment we pivoted to ZONE_DEVICE and
> arranged for a 'struct page' to be available for any persistent memory
> range that needs to be the target of DMA. ZONE_DEVICE enables any
> device-driver that can target "System RAM" to also be able to target
> persistent memory through a DAX mapping.
> 
> Since that time the "page-less" DAX path has continued to mature [1]
> without growing new dependencies on struct page, but at the same time
> continuing to rely on ZONE_DEVICE to satisfy get_user_pages().
> 
> Peer-to-peer DMA appears to be evolving from a niche embedded use case
> to something general purpose platforms will need to comprehend. The
> "map_peer_resource" [2] approach looks to be headed to the same
> destination as the pfn-to-scatterlist effort. It's difficult to avoid
> 'struct page' for describing DMA operations without custom driver
> code.
> 
> With that background, a statement and a question to discuss at LSF/MM:
> 
> General purpose DMA, i.e. any DMA setup through the dma-mapping-api,
> requires pfn_to_page() support across the entire physical address
> range mapped.
> 
> Is ZONE_DEVICE the proper vehicle for this? We've already seen that it
> collides with platform alignment assumptions [3], and if there's a
> wider effort to rework memory hotplug [4] it seems DMA support should
> be part of the discussion.

I had experimented with the ZONE_DEVICE representation from a migration
point of view, trying migration of both anonymous pages and file cache
pages into and away from ZONE_DEVICE memory. I learned that the lack of
a 'page->lru' element in the struct page of ZONE_DEVICE memory makes it
difficult to represent a file-backed mapping in its present form. But
given that ZONE_DEVICE was created to enable direct mapping (DAX)
bypassing the page cache, that came as no surprise. My objective has
been to work out how ZONE_DEVICE can accommodate movable coherent
device memory. In our HMM discussions I had brought to attention how
ZONE_DEVICE going forward should evolve to represent all three of these
types of device memory:

* Unmovable addressable device memory   (persistent memory)
* Movable addressable device memory     (similar memory represented as CDM)
* Movable un-addressable device memory  (similar memory represented as HMM)

I would like to attend to discuss the roadmap for ZONE_DEVICE, struct
pages, and device memory in general.

* Re: [LSF/MM TOPIC] Memory hotplug, ZONE_DEVICE, and the future of struct page
  2017-01-16 12:58   ` Anshuman Khandual
@ 2017-01-16 22:59     ` John Hubbard
  -1 siblings, 0 replies; 13+ messages in thread
From: John Hubbard @ 2017-01-16 22:59 UTC (permalink / raw)
  To: Anshuman Khandual, Dan Williams, Linux MM, lsf-pc, linux-fsdevel,
	linux-nvdimm, linux-block
  Cc: Stephen Bates, Logan Gunthorpe, Jason Gunthorpe



On 01/16/2017 04:58 AM, Anshuman Khandual wrote:
> On 01/13/2017 04:13 AM, Dan Williams wrote:
>> Back when we were first attempting to support DMA for DAX mappings of
>> persistent memory the plan was to forgo 'struct page' completely and
>> develop a pfn-to-scatterlist capability for the dma-mapping-api. That
>> effort died in this thread:
>>
>>     https://lkml.org/lkml/2015/8/14/3
>>
>> ...where we learned that the dependencies on struct page for dma
>> mapping are deeper than a PFN_PHYS() conversion for some
>> architectures. That was the moment we pivoted to ZONE_DEVICE and
>> arranged for a 'struct page' to be available for any persistent memory
>> range that needs to be the target of DMA. ZONE_DEVICE enables any
>> device-driver that can target "System RAM" to also be able to target
>> persistent memory through a DAX mapping.
>>
>> Since that time the "page-less" DAX path has continued to mature [1]
>> without growing new dependencies on struct page, but at the same time
>> continuing to rely on ZONE_DEVICE to satisfy get_user_pages().
>>
>> Peer-to-peer DMA appears to be evolving from a niche embedded use case
>> to something general purpose platforms will need to comprehend. The
>> "map_peer_resource" [2] approach looks to be headed to the same
>> destination as the pfn-to-scatterlist effort. It's difficult to avoid
>> 'struct page' for describing DMA operations without custom driver
>> code.
>>
>> With that background, a statement and a question to discuss at LSF/MM:
>>
>> General purpose DMA, i.e. any DMA setup through the dma-mapping-api,
>> requires pfn_to_page() support across the entire physical address
>> range mapped.
>>
>> Is ZONE_DEVICE the proper vehicle for this? We've already seen that it
>> collides with platform alignment assumptions [3], and if there's a
>> wider effort to rework memory hotplug [4] it seems DMA support should
>> be part of the discussion.
>
> I had experimented with the ZONE_DEVICE representation from a migration point
> of view, trying migration of both anonymous pages and file cache pages
> into and away from ZONE_DEVICE memory. I learned that the lack of a 'page->lru'
> element in the struct page of ZONE_DEVICE memory makes it difficult
> to represent file-backed mappings in its present form. But given

That reminds me: while testing out HMM in our device driver, we had some early difficulties with the 
LRU system (including the pagevec) in general. For example, sometimes HMM was forced to say "I cannot 
migrate your page range, because a page is still on the most recently used list". If the number 
of pages was very small, then *all* the pages might be on that list. :)  HMM avoids the problem by 
forcing a drain of those pending pages, but it reminds me that the LRU and pagevec were never really 
intended to intersect with device memory.

Another point that may seem unrelated at first: using struct pages and pfns to back device memory is 
still under discussion:

    a) Need to avoid using pfns that can ever be needed for other hotpluggable memory

    b) *Very* hard to justify adding any fields to struct page, or flags for it, of course.

...but given this new-ish requirement to support these types of devices, maybe (b) actually makes 
sense. Something to discuss.

thanks,
John Hubbard
NVIDIA


> that ZONE_DEVICE was created to enable direct mapping (DAX) that bypasses the
> page cache, this came as no surprise. My objective has been to work out how
> ZONE_DEVICE can accommodate movable coherent device memory. In our HMM
> discussions I brought up how ZONE_DEVICE should evolve going forward to
> represent all three of these types of device memory:
>
> * Unmovable addressable device memory   (persistent memory)
> * Movable addressable device memory     (represented as CDM)
> * Movable un-addressable device memory  (represented as HMM)
>
> I would like to attend to discuss the road map for ZONE_DEVICE, struct
> pages, and device memory in general.
>

-----------------------------------------------------------------------------------
This email message is for the sole use of the intended recipient(s) and may contain
confidential information.  Any unauthorized review, use, disclosure or distribution
is prohibited.  If you are not the intended recipient, please contact the sender by
reply email and destroy all copies of the original message.
-----------------------------------------------------------------------------------

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: email@kvack.org

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2017-01-16 23:00 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-01-12 22:43 [LSF/MM TOPIC] Memory hotplug, ZONE_DEVICE, and the future of struct page Dan Williams
2017-01-12 22:43 ` Dan Williams
2017-01-12 23:14 ` Jerome Glisse
2017-01-12 23:14   ` Jerome Glisse
2017-01-12 23:14   ` Jerome Glisse
2017-01-12 23:14   ` Jerome Glisse
2017-01-12 23:59   ` Dan Williams
2017-01-12 23:59     ` Dan Williams
2017-01-16 12:58 ` Anshuman Khandual
2017-01-16 12:58   ` Anshuman Khandual
2017-01-16 22:59   ` John Hubbard
2017-01-16 22:59     ` John Hubbard
2017-01-16 22:59     ` John Hubbard
