From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-8.5 required=3.0 tests=BAYES_00,DKIM_INVALID, DKIM_SIGNED,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_CR_TRAILER, MAILING_LIST_MULTI,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 75B97C433DB for ; Mon, 21 Dec 2020 20:52:53 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id E3EBD229C5 for ; Mon, 21 Dec 2020 20:52:52 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org E3EBD229C5 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=redhat.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 329EB6B0036; Mon, 21 Dec 2020 15:52:52 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 2B3756B005C; Mon, 21 Dec 2020 15:52:52 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 106F66B0068; Mon, 21 Dec 2020 15:52:52 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0128.hostedemail.com [216.40.44.128]) by kanga.kvack.org (Postfix) with ESMTP id E79C96B0036 for ; Mon, 21 Dec 2020 15:52:51 -0500 (EST) Received: from smtpin13.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id B0F26180AD807 for ; Mon, 21 Dec 2020 20:52:51 +0000 (UTC) X-FDA: 77618488542.13.birth34_1f03c762745a Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin13.hostedemail.com (Postfix) with ESMTP id 91BB618140B8A for ; Mon, 21 Dec 2020 20:52:51 +0000 (UTC) X-HE-Tag: birth34_1f03c762745a X-Filterd-Recvd-Size: 7973 Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [63.128.21.124]) by imf25.hostedemail.com (Postfix) with ESMTP for ; Mon, 21 Dec 2020 20:52:50 +0000 (UTC) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=redhat.com; s=mimecast20190719; t=1608583970; h=from:from:reply-to:subject:subject:date:date:message-id:message-id: to:to:cc:cc:mime-version:mime-version:content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references; bh=YISt5j4b8XAb8GNT7dnsvmnkE22RNgIdTlImdupifmU=; b=DGBQZcf0pJ3iJ7VJ1xWubLRp0evF4DvpT7ZVpYcIXwMUgdKZowv5dbhsoz9Ce4zYucFA+h VexD8Bjo3+rNyXJqkKD6j9cQOmGP5/MI2WPC0M2MCWsoILnEvvIpl4NFp8augtxQmuS82M ARKBOYULMiJnvGTVNQU36AtbdN5Yf+w= Received: from mail-qv1-f70.google.com (mail-qv1-f70.google.com [209.85.219.70]) (Using TLS) by relay.mimecast.com with ESMTP id us-mta-504-BUsEQic7Pli1FC07CRc5Vw-1; Mon, 21 Dec 2020 15:52:48 -0500 X-MC-Unique: BUsEQic7Pli1FC07CRc5Vw-1 Received: by mail-qv1-f70.google.com with SMTP id c17so9007299qvv.9 for ; Mon, 21 Dec 2020 12:52:48 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:date:from:to:cc:subject:message-id:references :mime-version:content-disposition:content-transfer-encoding :in-reply-to; bh=YISt5j4b8XAb8GNT7dnsvmnkE22RNgIdTlImdupifmU=; b=NLoja9qFo3UZOJimfNjsDwC0d8aSPFXhBtvWZzMfvgu+eSD7cUFGNQj1MQfm5FBOZV NJG4aOVRMTyI/NSZRCsVnh1EQd71tA0ltVOxLWi8zBYtb3wLr6eEYlF6UsNhrZqcFoaJ DQjdAVZ/iXOaoRPCpmK7NcSfZfwqT/Nlk5WclWuNUEzXZ+Lxv1aom68btrDlEiXqEv4q eAS/8R9nnROM+OLrdvqsDhkeCBrFs9YRhf5QwVFMIlpqlWCbDA43Yt4Q1xUvWY8PH/UQ fPR93qhDy6uyCYAMBSB1IQQOn+R+oi7qKBcDNCvLnilaOSReGzyCEQLyb8pUaW/xKzv4 Ep/Q== X-Gm-Message-State: AOAM530mdY0kFfftK5o2L+0wAYsM1yGW0BP6HUV7yyENp6aOkVxDycpD hJ4PeKQH5Ags4XredkxluqkT34I1NG0JEy5N3DW/saeYrxtWzhEtgG8hZjvcnqBa7vr8jdIL6w7 q7x0VHoZcDSQ= X-Received: by 2002:ac8:1c6a:: with SMTP id j39mr18262159qtk.341.1608583968347; Mon, 21 Dec 2020 12:52:48 -0800 (PST) X-Google-Smtp-Source: ABdhPJxlTQNtGfgcc75KoX43sIvGUMqc1M5QQjfDVaI6Wi++irNpTyfAAD7kpHTPTCFPsWuJ85Dxgg== X-Received: by 2002:ac8:1c6a:: with SMTP id j39mr18262143qtk.341.1608583968088; Mon, 21 Dec 2020 12:52:48 -0800 (PST) Received: from xz-x1 ([142.126.83.202]) by smtp.gmail.com with ESMTPSA id y67sm1521029qka.68.2020.12.21.12.52.46 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 21 Dec 2020 12:52:47 -0800 (PST) Date: Mon, 21 Dec 2020 15:52:45 -0500 From: Peter Xu To: Nadav Amit Cc: "linux-fsdevel@vger.kernel.org" , Jens Axboe , Andrea Arcangeli , Alexander Viro , "io-uring@vger.kernel.org" , "linux-kernel@vger.kernel.org" , "linux-mm@kvack.org" Subject: Re: [RFC PATCH 03/13] selftests/vm/userfaultfd: wake after copy failure Message-ID: <20201221205245.GJ6640@xz-x1> References: <20201129004548.1619714-1-namit@vmware.com> <20201129004548.1619714-4-namit@vmware.com> <20201221192846.GH6640@xz-x1> <2B08ECCA-A7D2-4743-8956-571CB8788FDA@vmware.com> MIME-Version: 1.0 In-Reply-To: <2B08ECCA-A7D2-4743-8956-571CB8788FDA@vmware.com> Authentication-Results: relay.mimecast.com; auth=pass smtp.auth=CUSA124A263 smtp.mailfrom=peterx@redhat.com X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain; charset=utf-8 Content-Disposition: inline Content-Transfer-Encoding: quoted-printable X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Mon, Dec 21, 2020 at 07:51:52PM +0000, Nadav Amit wrote: > > On Dec 21, 2020, at 11:28 AM, Peter Xu wrote: > >=20 > > On Sat, Nov 28, 2020 at 04:45:38PM -0800, Nadav Amit wrote: > >> From: Nadav Amit > >>=20 > >> When userfaultfd copy-ioctl fails since the PTE already exists, an > >> -EEXIST error is returned and the faulting thread is not woken. The > >> current userfaultfd test does not wake the faulting thread in such c= ase. > >> The assumption is presumably that another thread set the PTE through > >> copy/wp ioctl and would wake the faulting thread or that alternative= ly > >> the fault handler would realize there is no need to "must_wait" and > >> continue. This is not necessarily true. > >>=20 > >> There is an assumption that the "must_wait" tests in handle_userfaul= t() > >> are sufficient to provide definitive answer whether the offending PT= E is > >> populated or not. However, userfaultfd_must_wait() test is lockless. > >> Consequently, concurrent calls to ptep_modify_prot_start(), for > >> instance, can clear the PTE and can cause userfaultfd_must_wait() > >> to wrongly assume it is not populated and a wait is needed. > >=20 > > Yes userfaultfd_must_wait() is lockless, however my understanding is = that we'll > > enqueue before reading the page table, which seems to me that we'll a= lways get > > notified even the race happens. Should apply to either UFFDIO_WRITEP= ROTECT or > > UFFDIO_COPY, iiuc, as long as we follow the order of (1) modify pgtab= le (2) > > wake sleeping threads. Then it also means that when must_wait() retu= rned true, > > it should always get waked up when fault resolved. > >=20 > > Taking UFFDIO_COPY as example, even if UFFDIO_COPY happen right befor= e > > must_wait() calls: > >=20 > > worker thread uffd thread > > ------------- ----------- > >=20 > > handle_userfault > > spin_lock(fault_pending_wqh) > > enqueue() > > set_current_state(INTERRUPTIBLE) > > spin_unlock(fault_pending_wqh) > > must_wait() > > lockless walk page table > > UFFDIO_COPY > > fill in the hole > > wake up threads > > (this will wake up work= er thread too?) > > schedule() > > (which may return immediately?) > >=20 > > While here fault_pending_wqh is lock protected. I just feel like ther= e's some > > other reason to cause the thread to stall. Or did I miss something? >=20 > But what happens if the copy completed before the enqueuing? Assume > the page is write-protected during UFFDIO_COPY: >=20 >=20 > cpu0 cpu1 =09 > ---- ---- =09 > handle_userfault > UFFDIO_COPY > [ write-protected ] > fill in the hole > wake up threads > [nothing to wake] > =09 > UFFD_WP (unprotect) > logically marks as unprotected > [nothing to wake] >=20 > spin_lock(fault_pending_wqh) > enqueue() > set_current_state(INTERRUPTIBLE) > spin_unlock(fault_pending_wqh) > must_wait() >=20 > [ #PF on the same PTE > due to write-protection ] >=20 > ... > wp_page_copy() > ptep_clear_flush_notify() > [ PTE is clear ] > =09 > lockless walk page table > pte_none(*pte) -> must wait >=20 > Note that additional scenarios are possible. For instance, instead of > wp_page_copy(), we can have other change_pte_range() (due to worker=E2=80= =99s > mprotect() or NUMA balancing), calling ptep_modify_prot_start() and cle= aring > the PTE. >=20 > Am I missing something? Ah I see your point, thanks. I think you're right: Reviewed-by: Peter Xu Would you mind adding something like above into the commit message if you= 're going to repost? IMHO it would even be nicer to mention why UFFDIO_WRITEPROTECT does not need this extra wakeup (I think it's because= it'll do the wakeup unconditionally anyway). --=20 Peter Xu