From: Dan Williams
Date: Tue, 22 Nov 2016 13:21:03 -0800
Subject: Re: Enabling peer to peer device transactions for PCIe devices
To: Daniel Vetter
Cc: Serguei Sagalovitch, Dave Hansen, "linux-nvdimm@lists.01.org",
    "linux-rdma@vger.kernel.org", "linux-pci@vger.kernel.org",
    "Kuehling, Felix", "linux-kernel@vger.kernel.org",
    "dri-devel@lists.freedesktop.org", "Koenig, Christian",
    "Sander, Ben", "Suthikulpanit, Suravee", "Deucher, Alexander",
    "Blinzer, Paul", "Linux-media@vger.kernel.org"

On Tue, Nov 22, 2016 at 1:03 PM, Daniel Vetter wrote:
> On Tue, Nov 22, 2016 at 9:35 PM, Serguei Sagalovitch wrote:
>>
>> On 2016-11-22 03:10 PM, Daniel Vetter wrote:
>>>
>>> On Tue, Nov 22, 2016 at 9:01 PM, Dan Williams wrote:
>>>>
>>>> On Tue, Nov 22, 2016 at 10:59 AM, Serguei Sagalovitch wrote:
>>>>>
>>>>> I personally like the "device-DAX" idea, but my concerns are:
>>>>>
>>>>> - How well will it co-exist with the DRM infrastructure /
>>>>>   implementations dealing in part with CPU pointers?
>>>>
>>>> Inside the kernel a device-DAX range is "just memory" in the sense
>>>> that you can perform pfn_to_page() on it and issue I/O, but the vma
>>>> is not migratable. To be honest I do not know how well that
>>>> co-exists with drm infrastructure.
>>>>
>>>>> - How well will we be able to handle the case when we need to
>>>>>   "move"/"evict" memory/data to a new location, so that the CPU
>>>>>   pointer points to the new physical location/address (which may
>>>>>   not be in PCI device memory at all)?
>>>>
>>>> So, device-DAX deliberately avoids support for in-kernel migration
>>>> or overcommit. Those cases are left to the core mm or drm. The
>>>> device-dax interface is for cases where all that is needed is a
>>>> direct mapping to a statically-allocated physical-address range,
>>>> be it persistent memory or some other special reserved memory
>>>> range.
>>>
>>> For some of the fancy use-cases (e.g. to be comparable to what HMM
>>> can pull off) I think we want all the magic in core mm, i.e.
>>> migration and overcommit. At least that seems to be the very strong
>>> drive in all general-purpose gpu abstractions and implementations,
>>> where memory is allocated with malloc, and then mapped/moved into
>>> vram/gpu address space through some magic.
>>
>> It is possible that it is the other way around: memory is requested
>> to be allocated and should be kept in vram for performance reasons,
>> but due to a possible overcommit case we need, at least temporarily,
>> to "move" such an allocation to system memory.
>
> With migration I meant migrating both ways of course. And with stuff
> like numactl we can also influence where exactly the malloc'ed memory
> is allocated originally, at least if we'd expose the vram range as a
> very special numa node that happens to be far away and not hold any
> cpu cores.
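To make the device-dax model described above a bit more concrete, the
consumer side is roughly the following (just a sketch; the device path,
size, and 2MB alignment are assumptions on my part):

    /* sketch: map a statically-reserved range via a device-dax node */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/mman.h>
    #include <unistd.h>

    int main(void)
    {
        size_t len = 2UL << 20;                  /* assumed 2MB alignment */
        int fd = open("/dev/dax0.0", O_RDWR);    /* assumed device path */

        if (fd < 0) {
            perror("open");
            return 1;
        }

        /*
         * The mapping is a direct window onto the reserved physical
         * range; the vma is not migratable and there is no page cache.
         */
        void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED,
                       fd, 0);
        if (p == MAP_FAILED) {
            perror("mmap");
            return 1;
        }

        ((volatile char *)p)[0] = 1;    /* stores hit the range directly */

        munmap(p, len);
        close(fd);
        return 0;
    }

In other words the mmap() is the allocation; there is no policy
underneath it, which is why migration and overcommit have to live
somewhere else (core mm or drm).
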
I don't think we should be using numa distance to reverse-engineer a
certain allocation behavior. The latency data should be truthful, but
you're right that we'll need a mechanism to keep general-purpose
allocations out of that range by default. Btw, strict isolation is
another design point of device-dax, but I think in this case we're
describing something between the two extremes of full isolation and
full compatibility with existing numactl apis.
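
To sketch what "out of that range by default, opt-in to use it" could
look like from userspace (assuming, hypothetically, that the vram range
showed up as node 1 and that the default policy never falls back to it):

    /* sketch: node 1 standing in for the hypothetical vram node */
    #include <numa.h>           /* link with -lnuma */
    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "no NUMA support\n");
            return 1;
        }

        /* default policy: lands in ordinary system memory */
        void *sys = numa_alloc_local(1 << 20);

        /* explicit opt-in placement on the far-away "vram" node */
        void *vram = numa_alloc_onnode(1 << 20, 1);

        if (!sys || !vram) {
            fprintf(stderr, "allocation failed\n");
            return 1;
        }

        memset(sys, 0, 1 << 20);
        memset(vram, 0, 1 << 20);

        numa_free(sys, 1 << 20);
        numa_free(vram, 1 << 20);
        return 0;
    }

The same could be expressed with numactl --membind or mbind() directly;
the point is just that nothing should end up on that node unless it was
asked for explicitly.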