To: "Song Bao Hua (Barry Song)", Jason Gunthorpe
Cc: "Wangzhou (B)", "linux-kernel@vger.kernel.org",
 "iommu@lists.linux-foundation.org", "linux-mm@kvack.org",
 "linux-arm-kernel@lists.infradead.org", "linux-api@vger.kernel.org",
 Andrew Morton, Alexander Viro, "gregkh@linuxfoundation.org",
 "kevin.tian@intel.com", "jean-philippe@linaro.org",
 "eric.auger@redhat.com", "Liguozhu (Kenneth)",
 "zhangfei.gao@linaro.org", "chensihang (A)"
References:
 <1612685884-19514-1-git-send-email-wangzhou1@hisilicon.com>
 <1612685884-19514-2-git-send-email-wangzhou1@hisilicon.com>
 <20210208183348.GV4718@ziepe.ca>
 <0dca000a6cd34d8183062466ba7d6eaf@hisilicon.com>
 <20210208213023.GZ4718@ziepe.ca>
 <0868d209d7424942a46d1238674cf75d@hisilicon.com>
 <20210209135331.GF4718@ziepe.ca>
 <2527b4ac8df14fa1b427bef65dace719@hisilicon.com>
 <20210210180405.GP4718@ziepe.ca>
 <8a676b45ebaa49e8886f4bf9b762bb75@hisilicon.com>
From: David Hildenbrand
Organization: Red Hat GmbH
Subject: Re: [RFC PATCH v3 1/2] mempinfd: Add new syscall to provide memory pin
Date: Thu, 11 Feb 2021 11:28:52 +0100
In-Reply-To: <8a676b45ebaa49e8886f4bf9b762bb75@hisilicon.com>

>>
>> Again in proper SVA it should be quite unlikely to take a fault caused
>> by something like migration, on the same likelyhood as the CPU. If
>> things are faulting so much this is a problem then I think it is a
>> system level problem with doing too much page motion.
>
> My point is that single one SVA application shouldn't require system
> to make global changes, such as disabling numa balancing, disabling
> THP, to decrease page fault frequency by affecting other applications.
>
> Anyway, guys are in lunar new year. Hopefully, we are getting more
> real benchmark data afterwards to make the discussion more targeted.

Right, but I think functionality as proposed in this patch is highly
unlikely to make it into the kernel. I'd be interested in approaches to
mitigate this per process.
E.g., temporarily stop the kernel from messing with THP of this specific process.

But even then, why should some random library make such decisions for a
whole process? Just as, why should some random library pin pages never
allocated by it and stop THP from being created or NUMA layout from
getting optimized? This looks like a clear layer violation to me.

I fully agree with Jason: Why do the events happen that often such that
your use cases are affected that heavily, such that we even need such
ugly handling?

What mempinfd does is exposing dangerous functionality that we don't
want 99.99996% of all user space to ever use via a syscall to generic
users to fix broken* hw.

*broken might be over-stressing the situation, but the HW (SVA)
obviously seems to perform way worse than ordinary CPUs.

-- 
Thanks,

David / dhildenb