From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 0CA15C433E0 for ; Fri, 8 Jan 2021 00:34:12 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 74D8D236F9 for ; Fri, 8 Jan 2021 00:34:11 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 74D8D236F9 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=linux-foundation.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id BDCC46B0131; Thu, 7 Jan 2021 19:34:10 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id B8DF66B0132; Thu, 7 Jan 2021 19:34:10 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id AA26E6B0134; Thu, 7 Jan 2021 19:34:10 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0032.hostedemail.com [216.40.44.32]) by kanga.kvack.org (Postfix) with ESMTP id 9326C6B0131 for ; Thu, 7 Jan 2021 19:34:10 -0500 (EST) Received: from smtpin21.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay01.hostedemail.com (Postfix) with ESMTP id 5BAB9180AD806 for ; Fri, 8 Jan 2021 00:34:10 +0000 (UTC) X-FDA: 77680735860.21.rice96_0e0d0c5274ee Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin21.hostedemail.com (Postfix) with ESMTP id 2E5D0180442C7 for ; Fri, 8 Jan 2021 00:34:10 +0000 (UTC) X-HE-Tag: rice96_0e0d0c5274ee X-Filterd-Recvd-Size: 10341 Received: from mail-ed1-f49.google.com (mail-ed1-f49.google.com [209.85.208.49]) by imf49.hostedemail.com (Postfix) with ESMTP for ; Fri, 8 Jan 2021 00:34:09 +0000 (UTC) Received: by mail-ed1-f49.google.com with SMTP id cm17so9534560edb.4 for ; Thu, 07 Jan 2021 16:34:09 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linux-foundation.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=8s3uIzM7CTA6cmvZ+OTc9ndGPxNgOdh/GHBwFHUd4d0=; b=NtGQKCBiRPOwGZjhYMZBNKKfxScocvD0Cgjdpwrr3V9rBBbI9mHruMIWYOSIzFsDE2 tvj8vUmTNk3Abio3nYepEGipIHPZlcouWvKuBYTDC5cMYhT3hAPO0AYYIAlmw/ipxsLs EyItQzyrvBm3z6/Bi85h8PHp/znaj79v1JLow= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=8s3uIzM7CTA6cmvZ+OTc9ndGPxNgOdh/GHBwFHUd4d0=; b=k7qstE5tLHZITOF7G7zMiYUy6F7A0xDvJZSP3rhFBWWu6CDPF6BVlen/r15O6A93Ho ff+JP1QV+G8tosMHGe3+NvaNW6/kNNIGVB8NGyF/G8B5ZUj/KCc03mfInNzgCi1cBv44 rM4QpmFEOcDwDTc/Qs0PUd1auFrBZFG4E4EhlISto4TxGkWLQ4jWLv7Pbbe72tlf7ZMv il5ii3r9Pjk85Mm1uXhXD7lA/EoRGeh/LcLEbEX8AsnK8mKWU7jyF52k5oSwcumTOZCf wSYiQB9JVLeS2mQLWOVhiX5IbDiiovjXMIMQFdJoN2QkVy6Mgupvs5DzxyVNIC2hbP/+ UMPA== X-Gm-Message-State: AOAM532M7mmM0DvVQwsoOuRMHApcWvUDKj4MVZN2/GMZQvHpOvh3cQdC UMvvlkjyPf/NL6zU+M3PqJGy5Mu79Bzr2Q== X-Google-Smtp-Source: ABdhPJyb6/2yg5pgXe1NMa19goTah9wN/suI3jGBFZX5G8crUPwVaF6hmuRjKHjKyyiVukmRJ6mKIw== X-Received: by 2002:aa7:c3cf:: with SMTP id l15mr3590720edr.282.1610066047988; Thu, 07 Jan 2021 16:34:07 -0800 (PST) Received: from mail-ej1-f53.google.com (mail-ej1-f53.google.com. [209.85.218.53]) by smtp.gmail.com with ESMTPSA id d22sm2979891eja.72.2021.01.07.16.34.07 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 07 Jan 2021 16:34:07 -0800 (PST) Received: by mail-ej1-f53.google.com with SMTP id b9so12350170ejy.0 for ; Thu, 07 Jan 2021 16:34:07 -0800 (PST) X-Received: by 2002:a19:f014:: with SMTP id p20mr66695lfc.421.1610065570184; Thu, 07 Jan 2021 16:26:10 -0800 (PST) MIME-Version: 1.0 References: <20210107200402.31095-1-aarcange@redhat.com> <20210107200402.31095-3-aarcange@redhat.com> In-Reply-To: From: Linus Torvalds Date: Thu, 7 Jan 2021 16:25:54 -0800 X-Gmail-Original-Message-ID: Message-ID: Subject: Re: [PATCH 2/2] mm: soft_dirty: userfaultfd: introduce wrprotect_tlb_flush_pending To: Andrea Arcangeli Cc: Linux-MM , Linux Kernel Mailing List , Yu Zhao , Andy Lutomirski , Peter Xu , Pavel Emelyanov , Mike Kravetz , Mike Rapoport , Minchan Kim , Will Deacon , Peter Zijlstra , Hugh Dickins , "Kirill A. Shutemov" , Matthew Wilcox , Oleg Nesterov , Jann Horn , Kees Cook , John Hubbard , Leon Romanovsky , Jason Gunthorpe , Jan Kara , Kirill Tkhai Content-Type: multipart/mixed; boundary="000000000000606e9605b8589a93" X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: --000000000000606e9605b8589a93 Content-Type: text/plain; charset="UTF-8" On Thu, Jan 7, 2021 at 3:48 PM Andrea Arcangeli wrote: > > > The alternate fix remains "make sure we always flush the TLB before > > releasing the page table lock, and make COW do the copy under the page > > table lock". But I really would prefer to just have this code follow > The copy under PT lock isn't enough. > > Flush TLB before releasing is enough of course. Right. That's why I said "and". You need both, afaik. But if we just do the mmap write lock, you need neither - then you just need to flush before you release the write lock. > Note also the patch in 2/2 patch that I sent: Yes, yes, and that's what I'm objecting to. All these stupid games with "flush_pending(" counts are complete garbage. They all come from the fact that this code doesn't hold the right lock. I don't understand you: in one breath you seem to say "yes, taking the write lock is the right thing to do", and then in the next one you point to this patch that adds all this garbage *because* it's not holding the write lock. All of those "tlb_flush_pending" things are wrong. They should not exist. The code in clear_refs_write() should hold the mmap_sem for writing, and do the TLB flush before it releases the mmap sem, and then it *cannot* race with the page faults. See what I'm saying? I refuse to apply your patch 2/2, because it all seems entirely wrong. What's doubly ludicrous about that is that the coide already _took_ the mmap_sem for writing, and spent extra cycles to downgrade it - INCORRECTLY - to a read-lock. And as far as I can tell, it doesn't even do anything expensive inside that (now downgraded) region, so the downgrading was (a) buggy (b) slower than just keeping the lock the way it had and (b) is because it had already done the expensive part (which was to get the lock in the first place). Just as an example, the whole "Rollback wrprotect_tlb_flush_pending" is all because it got the lock - again wrongly - as a read-lock initially, then it says "oh, I need to get a write lock", releases it, re-takes it as a write lock, does a tiny amount of work, and then - again incorrectly - downgrades it to a read-lock. To make it even worse (if that is possible) it actually had YET ANOTHER case - that CLEAR_REFS_MM_HIWATER_RSS - where it took the mmap sem for writing, did its thing, and then released it. So there's like *four* different locking mistakes in that single function. And it's not even an important function to begin with. It shgould just have done a single mmap_write_lock_killable(mm); ... mmap_write_unlock(mm); around the whole thing, instead of _any_ of that crazy stuff. That code is WRONG. And your PATCH 2/2 makes that insane code EVEN WORSE. Why the heck is that magic fs/proc/ interface allowed to get VM internals so wrong, and make things so much worse? Can you not see why I'm arguing with you? Please. Why is the correct patch not the attached one (apart from the obvious fact that I haven't tested it and maybe just screwed up completely - but you get the idea)? Linus --000000000000606e9605b8589a93 Content-Type: application/octet-stream; name=patch Content-Disposition: attachment; filename=patch Content-Transfer-Encoding: base64 Content-ID: X-Attachment-Id: f_kjnjflab0 IGZzL3Byb2MvdGFza19tbXUuYyB8IDMyICsrKysrKysrKy0tLS0tLS0tLS0tLS0tLS0tLS0tLS0t CiAxIGZpbGUgY2hhbmdlZCwgOSBpbnNlcnRpb25zKCspLCAyMyBkZWxldGlvbnMoLSkKCmRpZmYg LS1naXQgYS9mcy9wcm9jL3Rhc2tfbW11LmMgYi9mcy9wcm9jL3Rhc2tfbW11LmMKaW5kZXggZWU1 YTIzNWIzMDU2Li5hYjdkNzAwYjJjYWEgMTAwNjQ0Ci0tLSBhL2ZzL3Byb2MvdGFza19tbXUuYwor KysgYi9mcy9wcm9jL3Rhc2tfbW11LmMKQEAgLTEyMTUsNDEgKzEyMTUsMjYgQEAgc3RhdGljIHNz aXplX3QgY2xlYXJfcmVmc193cml0ZShzdHJ1Y3QgZmlsZSAqZmlsZSwgY29uc3QgY2hhciBfX3Vz ZXIgKmJ1ZiwKIAkJCS50eXBlID0gdHlwZSwKIAkJfTsKIAorCQlpZiAobW1hcF93cml0ZV9sb2Nr X2tpbGxhYmxlKG1tKSkgeworCQkJY291bnQgPSAtRUlOVFI7CisJCQlnb3RvIG91dF9tbTsKKwkJ fQogCQlpZiAodHlwZSA9PSBDTEVBUl9SRUZTX01NX0hJV0FURVJfUlNTKSB7Ci0JCQlpZiAobW1h cF93cml0ZV9sb2NrX2tpbGxhYmxlKG1tKSkgewotCQkJCWNvdW50ID0gLUVJTlRSOwotCQkJCWdv dG8gb3V0X21tOwotCQkJfQotCiAJCQkvKgogCQkJICogV3JpdGluZyA1IHRvIC9wcm9jL3BpZC9j bGVhcl9yZWZzIHJlc2V0cyB0aGUgcGVhawogCQkJICogcmVzaWRlbnQgc2V0IHNpemUgdG8gdGhp cyBtbSdzIGN1cnJlbnQgcnNzIHZhbHVlLgogCQkJICovCiAJCQlyZXNldF9tbV9oaXdhdGVyX3Jz cyhtbSk7Ci0JCQltbWFwX3dyaXRlX3VubG9jayhtbSk7Ci0JCQlnb3RvIG91dF9tbTsKKwkJCWdv dG8gb3V0X3VubG9jazsKIAkJfQogCi0JCWlmIChtbWFwX3JlYWRfbG9ja19raWxsYWJsZShtbSkp IHsKLQkJCWNvdW50ID0gLUVJTlRSOwotCQkJZ290byBvdXRfbW07Ci0JCX0KIAkJdGxiX2dhdGhl cl9tbXUoJnRsYiwgbW0sIDAsIC0xKTsKIAkJaWYgKHR5cGUgPT0gQ0xFQVJfUkVGU19TT0ZUX0RJ UlRZKSB7CiAJCQlmb3IgKHZtYSA9IG1tLT5tbWFwOyB2bWE7IHZtYSA9IHZtYS0+dm1fbmV4dCkg ewogCQkJCWlmICghKHZtYS0+dm1fZmxhZ3MgJiBWTV9TT0ZURElSVFkpKQogCQkJCQljb250aW51 ZTsKLQkJCQltbWFwX3JlYWRfdW5sb2NrKG1tKTsKLQkJCQlpZiAobW1hcF93cml0ZV9sb2NrX2tp bGxhYmxlKG1tKSkgewotCQkJCQljb3VudCA9IC1FSU5UUjsKLQkJCQkJZ290byBvdXRfbW07Ci0J CQkJfQotCQkJCWZvciAodm1hID0gbW0tPm1tYXA7IHZtYTsgdm1hID0gdm1hLT52bV9uZXh0KSB7 Ci0JCQkJCXZtYS0+dm1fZmxhZ3MgJj0gflZNX1NPRlRESVJUWTsKLQkJCQkJdm1hX3NldF9wYWdl X3Byb3Qodm1hKTsKLQkJCQl9Ci0JCQkJbW1hcF93cml0ZV9kb3duZ3JhZGUobW0pOwotCQkJCWJy ZWFrOworCQkJCXZtYS0+dm1fZmxhZ3MgJj0gflZNX1NPRlRESVJUWTsKKwkJCQl2bWFfc2V0X3Bh Z2VfcHJvdCh2bWEpOwogCQkJfQogCiAJCQltbXVfbm90aWZpZXJfcmFuZ2VfaW5pdCgmcmFuZ2Us IE1NVV9OT1RJRllfU09GVF9ESVJUWSwKQEAgLTEyNjEsNyArMTI0Niw4IEBAIHN0YXRpYyBzc2l6 ZV90IGNsZWFyX3JlZnNfd3JpdGUoc3RydWN0IGZpbGUgKmZpbGUsIGNvbnN0IGNoYXIgX191c2Vy ICpidWYsCiAJCWlmICh0eXBlID09IENMRUFSX1JFRlNfU09GVF9ESVJUWSkKIAkJCW1tdV9ub3Rp Zmllcl9pbnZhbGlkYXRlX3JhbmdlX2VuZCgmcmFuZ2UpOwogCQl0bGJfZmluaXNoX21tdSgmdGxi LCAwLCAtMSk7Ci0JCW1tYXBfcmVhZF91bmxvY2sobW0pOworb3V0X3VubG9jazoKKwkJbW1hcF93 cml0ZV91bmxvY2sobW0pOwogb3V0X21tOgogCQltbXB1dChtbSk7CiAJfQo= --000000000000606e9605b8589a93--