From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-2.5 required=3.0 tests=HEADER_FROM_DIFFERENT_DOMAINS, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED,USER_AGENT_SANE_1 autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 9841DC433E0 for ; Thu, 25 Jun 2020 11:24:59 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 694B320706 for ; Thu, 25 Jun 2020 11:24:59 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 694B320706 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=suse.cz Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id E46B16B0005; Thu, 25 Jun 2020 07:24:58 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id DF7DA6B0007; Thu, 25 Jun 2020 07:24:58 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id D35EB6B0008; Thu, 25 Jun 2020 07:24:58 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0025.hostedemail.com [216.40.44.25]) by kanga.kvack.org (Postfix) with ESMTP id BCD0A6B0005 for ; Thu, 25 Jun 2020 07:24:58 -0400 (EDT) Received: from smtpin05.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 71B4C2DFC for ; Thu, 25 Jun 2020 11:24:58 +0000 (UTC) X-FDA: 76967502276.05.heat74_5d0b9b826e4c Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin05.hostedemail.com (Postfix) with ESMTP id 4C37618022BF4 for ; Thu, 25 Jun 2020 11:24:58 +0000 (UTC) X-HE-Tag: heat74_5d0b9b826e4c X-Filterd-Recvd-Size: 5078 Received: from mx2.suse.de (mx2.suse.de [195.135.220.15]) by imf06.hostedemail.com (Postfix) with ESMTP for ; Thu, 25 Jun 2020 11:24:57 +0000 (UTC) X-Virus-Scanned: by amavisd-new at test-mx.suse.de Received: from relay2.suse.de (unknown [195.135.221.27]) by mx2.suse.de (Postfix) with ESMTP id E7CB1AD60; Thu, 25 Jun 2020 11:24:55 +0000 (UTC) Received: by quack2.suse.cz (Postfix, from userid 1000) id AB4511E1274; Thu, 25 Jun 2020 13:24:55 +0200 (CEST) Date: Thu, 25 Jun 2020 13:24:55 +0200 From: Jan Kara To: John Hubbard Cc: Jason Gunthorpe , Chris Wilson , linux-mm@kvack.org, linux-kernel@vger.kernel.org, intel-gfx@lists.freedesktop.org, Andrew Morton , Jan Kara , =?iso-8859-1?B?Suly9G1l?= Glisse , Claudio Imbrenda , "Kirill A . Shutemov" Subject: Re: [PATCH] mm: Skip opportunistic reclaim for dma pinned pages Message-ID: <20200625112455.GC17788@quack2.suse.cz> References: <20200624191417.16735-1-chris@chris-wilson.co.uk> <20200624192116.GO6578@ziepe.ca> <44708b2e-479f-7d58-fe01-29cfd6c70bdb@nvidia.com> <20200624232047.GP6578@ziepe.ca> <887ac706-65f0-3089-b51b-47aabf7d3847@nvidia.com> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <887ac706-65f0-3089-b51b-47aabf7d3847@nvidia.com> User-Agent: Mutt/1.10.1 (2018-07-13) X-Rspamd-Queue-Id: 4C37618022BF4 X-Spamd-Result: default: False [0.00 / 100.00] X-Rspamd-Server: rspam01 X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Wed 24-06-20 17:11:30, John Hubbard wrote: > On 2020-06-24 16:20, Jason Gunthorpe wrote: > > > I do like this code change, though. And I *think* it's actually safe to > > > do this, as it stays away from writeback or other filesystem activity. > > > But let me double check that, in case I'm forgetting something. > > ...OK, I've checked, and I like it a little bit less now. Mainly for > structural reasons, though. I think it would work correctly. But > here's a concern: try_to_unmap() should only fail to unmap if there is a > reason to not unmap. Having a page be pinned for dma is a reason to not > *free* a page, and it's also a reason to be careful about writeback and > page buffers for writeback and such. But I'm not sure that it's a reason > to fail to remove mappings. > > True, most (all?) of the reasons that we remove mappings, generally are > for things that are not allowed while a page is dma-pinned...at least, > today. But still, there's nothing fundamental about a mapping that > should prevent it from coming or going while a page is undergoing > dma. > > So, it's merely a convenient, now-misnamed location in the call stack > to fail out. That's not great. It might be better, as Jason hints at > below, to fail out a little earlier, instead. That would lead to a more > places to call page_maybe_dma_pinned(), but that's not a real problem, > because it's still a small number of places. Agreed, I think that shrink_page_list() as Michal writes is a better place for the page_may_be_dma_pinned() check. But other than that I agree it is good to avoid all the unnecessary work of preparing a page for reclaim when we can now check there's no hope of reclaiming it (at this time). > > I don't know, but could it be that try_to_unmap() has to be done > > before checking the refcount as each mapping is included in the > > refcount? ie we couldn't know a DMA pin was active in advance? > > > > Now that we have your pin stuff we can detect a DMA pin without doing > > all the unmaps? > > > > Once something calls pin_user_page*(), then the pages will be marked > as dma-pinned, yes. So no, there is no need to wait until try_to_unmap() > to find out. Yeah, I'm pretty sure the page ref check happens so late because earlier it was impossible to tell whether there are "external" page references reclaim cannot get rid of - filesystem may (but need not!) hold a page reference for its private data, page tables will hold references as well, etc. Now with page_may_be_dma_pinned() we can detect at least one class of external references (i.e. pins) early so it's stupid not to do that. > A final note: depending on where page_maybe_dma_pinned() ends up > getting called, this might prevent a fair number of the problems that > Jan originally reported [1], and that I also reported separately! > > Well, not all of the problems, and only after the filesystems get > converted to call pin_user_pages() (working on that next), but...I think > it would actually avoid the crash our customer reported back in early > 2018. Even though we don't have the full file lease + pin_user_pages() > solution in place. > > That's because reclaim is what triggers the problems that we saw. And > with this patch, we bail out of reclaim early. I agree that with this change, some races will become much less likely for some usecases. But as you say, it's not a full solution. Honza -- Jan Kara SUSE Labs, CR