References: <20210701204246.2037142-1-agruenba@redhat.com>
From: Andreas Gruenbacher
Date: Fri, 2 Jul 2021 02:20:20 +0200
Subject: Re: [PATCH] gfs2: Fix mmap + page fault deadlocks
To: Linus Torvalds
Cc: Alexander Viro, cluster-devel, Linux Kernel Mailing List, Jan Kara, Matthew Wilcox

On Thu, Jul 1, 2021 at 11:41 PM Linus Torvalds wrote:
> On Thu, Jul 1, 2021 at 1:43 PM Andreas Gruenbacher wrote:
> > here's another attempt at fixing the mmap + page fault deadlocks we're
> > seeing on gfs2. Still not ideal because get_user_pages_fast ignores the
> > current->pagefault_disabled flag
>
> Of course get_user_pages_fast() ignores the pagefault_disabled flag,
> because it doesn't do any page faults.
>
> If you don't want to fall back to the "maybe do IO" case, you should
> use the FOLL_FAST_ONLY flag - or get_user_pages_fast_only(), which
> does that itself.
>
> > For getting get_user_pages_fast changed to fix this properly, I'd need
> > help from the memory management folks.
>
> I really don't think you need anything at all from the mm people,
> because we already support that whole "fast only" case.

Yes, fair enough.

> Also, I have to say that I think the direct-IO code is fundamentally
> mis-designed. Why it is doing the page lookup _during_ the IO is a
> complete mystery to me. Why wasn't that done ahead of time before the
> filesystem took the locks it needed?

That would be inconvenient for reads, when the number of bytes read is
much smaller than the buffer size and we won't need to page in the
entire buffer.

> So what the direct-IO code _should_ do is to turn an ITER_IOVEC into a
> ITER_KVEC by doing the page lookup ahead of time, and none of these
> issues should even exist, and then the whole pagefault_disabled and/or
> FOLL_FAST_ONLY would be a complete non-issue.
>
> Is there any reason why that isn't what it does (other than historical baggage)?

It turns out that there's an even deeper issue with keeping references
to user-space pages. Those references will essentially pin the glock
of the associated inode to the node. Moving a glock off a node
requires truncating the inode's page cache, but the page references
would prevent that. So we'd only end up with different kinds of
potential deadlocks.

If we could get iomap_dio_rw to use "fast only" mode when requested,
we could fault in the pages without keeping references, try the IO,
and repeat when necessary.

Thanks a lot,
Andreas
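
For illustration only, the "fault in without keeping references, try the
IO, repeat when necessary" idea can be modelled in plain userspace C.
This is a rough sketch under stated assumptions, not gfs2 or iomap code:
every name below (sim_buf, sim_fault_in, sim_io_fast_only,
sim_direct_io) is hypothetical, page residency is a simple flag per
page, and the "lock" is a boolean standing in for the glock. It only
shows the control flow: the IO attempt never faults, and faulting in
happens one page at a time with the lock dropped.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SIM_PAGE_SIZE 4096u
#define SIM_MAX_PAGES 16u

/* Hypothetical model of a user buffer: one residency flag per page. */
struct sim_buf {
    size_t len;                   /* buffer length in bytes */
    bool resident[SIM_MAX_PAGES]; /* true once the page has been faulted in */
    bool lock_held;               /* stand-in for the filesystem lock */
};

/*
 * Stand-in for faulting in a range. It only marks pages resident and
 * keeps no reference to them. Must be called with the lock dropped.
 */
static void sim_fault_in(struct sim_buf *b, size_t off, size_t len)
{
    assert(!b->lock_held);
    assert(len > 0 && off + len <= b->len);
    for (size_t p = off / SIM_PAGE_SIZE; p <= (off + len - 1) / SIM_PAGE_SIZE; p++)
        b->resident[p] = true;
}

/*
 * "Fast only" IO attempt: never faults. It makes progress until it
 * reaches a non-resident page and returns the number of bytes done.
 */
static size_t sim_io_fast_only(const struct sim_buf *b, size_t off, size_t len)
{
    size_t done = 0;

    while (done < len) {
        size_t pos = off + done;
        size_t chunk = SIM_PAGE_SIZE - pos % SIM_PAGE_SIZE;

        if (!b->resident[pos / SIM_PAGE_SIZE])
            break; /* would need a page fault: give up, caller retries */
        if (chunk > len - done)
            chunk = len - done;
        done += chunk;
    }
    return done;
}

/*
 * Retry loop: try the IO under the lock, and on a short result drop
 * the lock, fault in the next page, and try again. Residency is
 * re-checked on every attempt, so nothing relies on a page staying
 * resident between the fault-in and the retry.
 */
static size_t sim_direct_io(struct sim_buf *b, size_t len, unsigned *retries)
{
    size_t done = 0;

    assert(len <= b->len && b->len <= (size_t)SIM_PAGE_SIZE * SIM_MAX_PAGES);
    *retries = 0;
    while (done < len) {
        b->lock_held = true;
        done += sim_io_fast_only(b, done, len - done);
        b->lock_held = false;

        if (done < len) {
            sim_fault_in(b, done, 1); /* only the page that blocked us */
            (*retries)++;
        }
    }
    return done;
}
```

Calling sim_direct_io() on a five-page buffer with a request of three
pages plus 100 bytes faults in only the four pages the request touches,
which is the property that matters for short reads into a large buffer.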