From: "Song Bao Hua (Barry Song)"
To: Matthew Wilcox
CC: "Wangzhou (B)", "linux-kernel@vger.kernel.org",
 "iommu@lists.linux-foundation.org", "linux-mm@kvack.org",
 "linux-arm-kernel@lists.infradead.org", "linux-api@vger.kernel.org",
 Andrew Morton, Alexander Viro, "gregkh@linuxfoundation.org",
 "jgg@ziepe.ca", "kevin.tian@intel.com", "jean-philippe@linaro.org",
 "eric.auger@redhat.com", "Liguozhu (Kenneth)", "zhangfei.gao@linaro.org",
 "chensihang (A)"
Subject: RE: [RFC PATCH v3 1/2] mempinfd: Add new syscall to provide memory pin
Date: Mon, 8 Feb 2021 02:27:06 +0000
References: <1612685884-19514-1-git-send-email-wangzhou1@hisilicon.com>
 <1612685884-19514-2-git-send-email-wangzhou1@hisilicon.com>
 <20210207213409.GL308988@casper.infradead.org>
 <20210208013056.GM308988@casper.infradead.org>
In-Reply-To:
 <20210208013056.GM308988@casper.infradead.org>

> -----Original Message-----
> From: owner-linux-mm@kvack.org [mailto:owner-linux-mm@kvack.org] On Behalf Of
> Matthew Wilcox
> Sent: Monday, February 8, 2021 2:31 PM
> To: Song Bao Hua (Barry Song)
> Cc: Wangzhou (B); linux-kernel@vger.kernel.org;
> iommu@lists.linux-foundation.org; linux-mm@kvack.org;
> linux-arm-kernel@lists.infradead.org; linux-api@vger.kernel.org; Andrew
> Morton; Alexander Viro;
> gregkh@linuxfoundation.org; jgg@ziepe.ca; kevin.tian@intel.com;
> jean-philippe@linaro.org; eric.auger@redhat.com; Liguozhu (Kenneth);
> zhangfei.gao@linaro.org; chensihang (A)
> Subject: Re: [RFC PATCH v3 1/2] mempinfd: Add new syscall to provide memory
> pin
>
> On Sun, Feb 07, 2021 at 10:24:28PM +0000, Song Bao Hua (Barry Song) wrote:
> > > > In high-performance I/O cases, accelerators might want to perform
> > > > I/O on a memory without IO page faults which can result in dramatically
> > > > increased latency. Current memory related APIs could not achieve this
> > > > requirement, e.g. mlock can only avoid memory to swap to backup device,
> > > > page migration can still trigger IO page fault.
> > >
> > > Well ... we have two requirements.  The application wants to not take
> > > page faults.  The system wants to move the application to a different
> > > NUMA node in order to optimise overall performance.  Why should the
> > > application's desires take precedence over the kernel's desires?
> > > And why should it be done this way rather than by the sysadmin using
> > > numactl to lock the application to a particular node?
> >
> > NUMA balancer is just one of many reasons for page migration. Even one
> > simple alloc_pages() can cause memory migration in just single NUMA
> > node or UMA system.
> >
> > The other reasons for page migration include but are not limited to:
> > * memory move due to CMA
> > * memory move due to huge pages creation
> >
> > Hardly we can ask users to disable the COMPACTION, CMA and Huge Page
> > in the whole system.
>
> You're dodging the question.  Should the CMA allocation fail because
> another application is using SVA?
>
> I would say no.

I would say no as well.

While the IOMMU is enabled, CMA has almost only one user: the IOMMU
driver. Other drivers rely on the IOMMU to use non-contiguous memory,
even though they still call dma_alloc_coherent().

In the IOMMU driver, dma_alloc_coherent() is called during
initialization and there are no new allocations afterwards, so it
wouldn't have a runtime impact on SVA performance. Even if there are
new allocations, CMA will fall back to general alloc_pages(), and IOMMU
drivers mostly allocate small amounts of memory for command queues.

So I would say general compound pages, huge pages, and especially
transparent huge pages would be bigger concerns than CMA for internal
page migration within one NUMA node.

Unlike CMA, general alloc_pages() can get memory by moving pages other
than those pinned.

And there is no guarantee we can always bind the memory of SVA
applications to a single NUMA node, so NUMA balancing is still a
concern.

But I agree we need a way to make CMA succeed while the userspace pages
are pinned. Since pinning has gone viral in many drivers, I assume
there is a way to handle this.
Otherwise, APIs like V4L2_MEMORY_USERPTR[1] will possibly make CMA
fail, as there is no guarantee that userspace will allocate unmovable
memory and there is no guarantee that the fallback path, alloc_pages(),
can succeed when allocating large amounts of memory.

Will investigate more.

> The application using SVA should take the one-time
> performance hit from having its memory moved around.

Sometimes I also feel SVA is doomed to suffer from performance impact
due to page migration. But we are still trying to extend its use cases
to high-performance I/O.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/media/v4l2-core/videobuf-dma-sg.c

Thanks
Barry