Subject: Re: Regarding HMM
To: Valmiki
From: Ralph Campbell
Message-ID: <3482c2c7-6827-77f7-a581-69af8adc73c3@nvidia.com>
Date: Tue, 18 Aug 2020 10:06:39 -0700
On 8/18/20 12:15 AM, Valmiki wrote:
> Hi All,
>
> Im trying to understand heterogeneous memory management, i have following doubts.
>
> If HMM is being used we dont have to use DMA controller on device for memory transfers ?
> Without DMA if software is managing page faults and migrations, will there be any performance impacts ?
>
> Is HMM targeted for any specific use cases where DMA controller is not there on device ?
>
> Regards,
> Valmiki

There are two APIs that are part of "HMM", and they are independent of each other.

hmm_range_fault() is for getting the physical address of a system-resident memory page that a device can map, without pinning the page the way I/O usually does by raising the page reference count. The device driver has to handle invalidation callbacks to remove the device mapping. This lets the device access the page without moving it.

migrate_vma_setup(), migrate_vma_pages(), and migrate_vma_finalize() are used by the device driver to migrate data to device private memory. After migration, the system memory is freed and the CPU page table holds an invalid PTE that points to the device private struct page (similar to a swap PTE). If the CPU process faults on that address, there is a callback to the driver to migrate the data back to system memory. This is where device DMA engines can be used to copy data between system memory and device private memory.
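For reference, the hmm_range_fault() pattern looks roughly like the example in Documentation/vm/hmm.rst (as of the v5.8 kernel; the driver lock and the interval notifier variable here are placeholders, and field names may differ in other kernel versions):

```c
/* Condensed sketch of the hmm_range_fault() snapshot loop.
 * "interval_sub" is the driver's struct mmu_interval_notifier and
 * "driver_lock" is whatever lock the driver uses to serialize
 * device page table updates against invalidation callbacks.
 */
struct hmm_range range = {
	.notifier = &interval_sub,
	.start = addr,
	.end = addr + size,
	.hmm_pfns = pfns,		/* array of unsigned long, one per page */
	.default_flags = HMM_PFN_REQ_FAULT,
};
int ret;

again:
	range.notifier_seq = mmu_interval_read_begin(&interval_sub);
	mmap_read_lock(mm);
	ret = hmm_range_fault(&range);
	mmap_read_unlock(mm);
	if (ret) {
		if (ret == -EBUSY)
			goto again;	/* a concurrent invalidation raced us */
		return ret;
	}

	mutex_lock(&driver_lock);
	if (mmu_interval_read_retry(&interval_sub, range.notifier_seq)) {
		/* the snapshot went stale; drop the lock and retry */
		mutex_unlock(&driver_lock);
		goto again;
	}
	/* range.hmm_pfns[] is now a stable snapshot: program the
	 * device page tables from it while holding driver_lock. */
	mutex_unlock(&driver_lock);
```

The sequence-count style retry (mmu_interval_read_begin()/read_retry()) is what makes this safe without pinning: if the CPU page tables change between the fault and the device mapping update, the driver simply loops.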
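The migrate_vma_*() sequence can be sketched as follows. This is a hypothetical driver helper, not code from any real driver; struct migrate_vma field names follow the v5.8 kernel and have changed across versions, and the allocation/copy step is elided because it is entirely driver-specific:

```c
/* Hypothetical helper migrating [start, end) of a VMA to device
 * private memory.  NPAGES and the copy step are placeholders. */
static int my_migrate_to_device(struct vm_area_struct *vma,
				unsigned long start, unsigned long end)
{
	unsigned long src_pfns[NPAGES];
	unsigned long dst_pfns[NPAGES];
	struct migrate_vma args = {
		.vma   = vma,
		.start = start,
		.end   = end,
		.src   = src_pfns,
		.dst   = dst_pfns,
	};
	int ret;

	/* Unmaps the CPU PTEs and fills args.src with migratable pages. */
	ret = migrate_vma_setup(&args);
	if (ret)
		return ret;

	/* For each MIGRATE_PFN_MIGRATE entry in args.src: allocate a
	 * device private struct page, copy the data over (this is where
	 * the device DMA engine is typically used), and store the result
	 * in args.dst.  Driver-specific, elided here. */

	migrate_vma_pages(&args);	/* install device private entries */
	migrate_vma_finalize(&args);	/* free sources / restore failures */
	return 0;
}
```

On a later CPU fault, the driver's dev_pagemap_ops->migrate_to_ram() callback runs the same machinery in the opposite direction.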
The use case for the above is to be able to run code such as OpenCL on GPUs and CPUs using the same virtual addresses without having to call special memory allocators: in other words, just use mmap() and malloc(), not clSVMAlloc().

There is a performance consideration here. If the GPU accesses the data over PCIe from system memory, there is much less bandwidth than accessing local GPU memory. If the data will be accessed many times, it can be more efficient to migrate it to local GPU memory. If the data is only accessed a few times, then it is probably more efficient to map system memory.