From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-0.8 required=3.0 tests=DKIM_SIGNED,DKIM_VALID, HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 21D4BC2D0DB for ; Wed, 22 Jan 2020 15:57:04 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id C58A921835 for ; Wed, 22 Jan 2020 15:57:03 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=intel-com.20150623.gappssmtp.com header.i=@intel-com.20150623.gappssmtp.com header.b="DQiCBrPo" Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1725989AbgAVP5D (ORCPT ); Wed, 22 Jan 2020 10:57:03 -0500 Received: from mail-ot1-f67.google.com ([209.85.210.67]:32912 "EHLO mail-ot1-f67.google.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725933AbgAVP5C (ORCPT ); Wed, 22 Jan 2020 10:57:02 -0500 Received: by mail-ot1-f67.google.com with SMTP id b18so6727702otp.0 for ; Wed, 22 Jan 2020 07:57:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=intel-com.20150623.gappssmtp.com; s=20150623; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc:content-transfer-encoding; bh=9U8vpXr1lSnDkiGOePVp2UvsnCYTPpW1/ZiqUrn/0bE=; b=DQiCBrPoGOz3XTfcpCIUcGNhA/KNZtuz5uRkW/2xZVqYChlveuBVRhSuYQPTMfol+W At71MaSza7laAFgO6UzkAjsrS3tr/rRQmn1CILzBhI/kaceAy5MikhtAT0S6RiGjZYhR 2tfQEjIssE+M31SShWXhpeDiKZTSgnf4J3p4yGakffIIQzBYZs80qPj2dPjtttQ3PD7x Z9YSFWrFUM+ZOFJGmBpHEjAm5Q7cKUoK2gHjnnI+Sem40jlgE5hDIOtz25ovcvksZA52 kAR5S4kNTQcK6xKsaCVPvsSBRvcXRGrA8GOC16yk7Nw4f/zt2BA6wn8f0F0/+8pXAqtK ZHug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc:content-transfer-encoding; bh=9U8vpXr1lSnDkiGOePVp2UvsnCYTPpW1/ZiqUrn/0bE=; b=FEZg/dsFLdABFDMIT/wFYGoW2IhnDwvee+AMnQup+CjLaKRiswd282YCVZPx8udRbB E4XLm3Ya+rLvhUxuaNmkJk+YwPjMbfel3Aw2A5gfK7Z0UHP3Gii84FFIYzHfYb2Qg+4Y qhDNWRLWhJg4GNtSuTMs1lDG665qmZUnqDuds7timW5i4j4HiI1LF0p6CVP+vTlysQFy /T6B4Hm8AeoskT1y9KUxaNOj2K/+CV+lJMSy+N47UyzZZY32ku8xn8N34k921JsZMDe5 10lhGFy5O/1hO5ur35VlellVM8cg6UfCQ4HJyTzu5as/7e1pxRLC1Q2KUEFmqLTI83Af T+8Q== X-Gm-Message-State: APjAAAV/Gv8C+jEKmAfu/n5UCw6OGjCNxdNICVzkcAWi4kbAKAsI09Dp wIwvvGYhIpvfOYSBRK5/c5c0l6pv/Le2AfHZWQdgS3Tw X-Google-Smtp-Source: APXvYqxKRi31EhyzzaInRoUEurvrhrWOw8rTDumOjwbWE6I/QgQ4zK+fCfxN1LcK3NhE8w+XJWifpA3CF+mWkUC2Ejo= X-Received: by 2002:a9d:68cc:: with SMTP id i12mr7751497oto.207.1579708622095; Wed, 22 Jan 2020 07:57:02 -0800 (PST) MIME-Version: 1.0 References: <20200122023100.75226-1-jglisse@redhat.com> <20200122050012.GD76712@redhat.com> In-Reply-To: <20200122050012.GD76712@redhat.com> From: Dan Williams Date: Wed, 22 Jan 2020 07:56:50 -0800 Message-ID: Subject: Re: [Lsf-pc] [LSF/MM/BPF TOPIC] Do not pin pages for various direct-io scheme To: Jerome Glisse Cc: linux-fsdevel , linux-mm , lsf-pc@lists.linux-foundation.org, Jens Axboe , Benjamin LaHaise Content-Type: text/plain; charset="UTF-8" Content-Transfer-Encoding: quoted-printable Sender: linux-fsdevel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-fsdevel@vger.kernel.org On Tue, Jan 21, 2020 at 9:04 PM Jerome Glisse wrote: > > On Tue, Jan 21, 2020 at 08:19:54PM -0800, Dan Williams wrote: > > On Tue, Jan 21, 2020 at 6:34 PM wrote: > > > > > > From: J=C3=A9r=C3=B4me Glisse > > > > > > Direct I/O does pin memory through GUP (get user page) this does > > > block several mm activities like: > > > - compaction > > > - numa > > > - migration > > > ... > > > > > > It is also troublesome if the pinned pages are actualy file back > > > pages that migth go under writeback. In which case the page can > > > not be write protected from direct-io point of view (see various > > > discussion about recent work on GUP [1]). This does happens for > > > instance if the virtual memory address use as buffer for read > > > operation is the outcome of an mmap of a regular file. > > > > > > > > > With direct-io or aio (asynchronous io) pages are pinned until > > > syscall completion (which depends on many factors: io size, > > > block device speed, ...). For io-uring pages can be pinned an > > > indifinite amount of time. > > > > > > > > > So i would like to convert direct io code (direct-io, aio and > > > io-uring) to obey mmu notifier and thus allow memory management > > > and writeback to work and behave like any other process memory. > > > > > > For direct-io and aio this mostly gives a way to wait on syscall > > > completion. For io-uring this means that buffer might need to be > > > re-validated (ie looking up pages again to get the new set of > > > pages for the buffer). Impact for io-uring is the delay needed > > > to lookup new pages or wait on writeback (if necessary). This > > > would only happens _if_ an invalidation event happens, which it- > > > self should only happen under memory preissure or for NUMA > > > activities. > > > > This seems to assume that memory pressure and NUMA migration are rare > > events. Some of the proposed hierarchical memory management schemes > > [1] might impact that assumption. > > > > [1]: http://lore.kernel.org/r/20191101075727.26683-1-ying.huang@intel.c= om/ > > > > Yes, it is true that it will likely becomes more and more an issues. > We are facing a tough choice here as pining block NUMA or any kind of > migration and thus might impede performance while invalidating an io- > uring buffer will also cause a small latency burst. I do not think we > can make everyone happy but at very least we should avoid pining and > provide knobs to let user decide what they care more about (ie io with- > out burst or better NUMA locality). It's a question of tradeoffs and this proposal seems to have already decided that the question should be answered in favor a GPU/SVM centric view of the world without presenting the alternative. Direct-I/O colliding with GPU operations might also be solved by always triggering a migration, and applications that care would avoid colliding operations that slow down their GPU workload. A slow compat fallback that applications can programmatically avoid is more flexible than an upfront knob.