From: Vitaly Kuznetsov
To: Vlastimil Babka, Ajay Kaher
Cc: Peter Zijlstra, gregkh@linuxfoundation.org, stable@vger.kernel.org,
    torvalds@linux-foundation.org, punit.agrawal@arm.com,
    akpm@linux-foundation.org, kirill.shutemov@linux.intel.com,
    willy@infradead.org, will.deacon@arm.com, mszeredi@redhat.com,
    linux-mm@kvack.org, linux-kernel@vger.kernel.org, srivatsab@vmware.com,
    srivatsa@csail.mit.edu, amakhalov@vmware.com, srinidhir@vmware.com,
    bvikas@vmware.com, anishs@vmware.com, vsirnapalli@vmware.com,
    srostedt@vmware.com, Oscar Salvador, Thomas Gleixner, Ingo Molnar,
    Juergen Gross, Borislav Petkov, Dave Hansen, Andy Lutomirski
Subject: Re: [PATCH v3 8/8] x86, mm, gup: prevent get_page() race with munmap in paravirt guest
In-Reply-To: <1aacc7ac-87b0-e22e-a265-ea175506844d@suse.cz>
References: <1576529149-14269-1-git-send-email-akaher@vmware.com>
 <1576529149-14269-9-git-send-email-akaher@vmware.com>
 <20191216130443.GN2844@hirez.programming.kicks-ass.net>
 <87lfrc9z3v.fsf@vitty.brq.redhat.com>
 <20191216134725.GE2827@hirez.programming.kicks-ass.net>
 <1aacc7ac-87b0-e22e-a265-ea175506844d@suse.cz>
Date: Mon, 16 Dec 2019 17:08:32 +0100
Message-ID: <87immg9rsv.fsf@vitty.brq.redhat.com>

Vlastimil Babka writes:

> On 12/16/19 2:47
PM, Peter Zijlstra wrote:
>> On Mon, Dec 16, 2019 at 02:30:44PM +0100, Vitaly Kuznetsov wrote:
>>> Peter Zijlstra writes:
>>>
>>>> On Tue, Dec 17, 2019 at 02:15:48AM +0530, Ajay Kaher wrote:
>>>>> From: Vlastimil Babka
>>>>>
>>>>> The x86 version of get_user_pages_fast() relies on disabled interrupts to
>>>>> synchronize gup_pte_range() between gup_get_pte(ptep); and get_page() against
>>>>> a parallel munmap. The munmap side nulls the pte, then flushes TLBs, then
>>>>> releases the page. As the TLB flush is done synchronously via IPI, disabling
>>>>> interrupts blocks the page release, and get_page(), which assumes an existing
>>>>> reference on the page, is thus safe.
>>>>> However, when the TLB flush is done by a hypercall, e.g. in a Xen PV guest,
>>>>> there is no blocking thanks to disabled interrupts, and get_page() can succeed
>>>>> on a page that was already freed or even reused.
>>>>>
>>>>> We have recently seen this happen with our 4.4 and 4.12 based kernels, with
>>>>> userspace (java) that exits a thread, where mm_release() performs a futex_wake()
>>>>> on tsk->clear_child_tid, and another thread in parallel unmaps the page where
>>>>> tsk->clear_child_tid points to. The spurious get_page() succeeds, but futex code
>>>>> immediately releases the page again, while it's already on a freelist. Symptoms
>>>>> include a bad page state warning, general protection faults accessing a poisoned
>>>>> list prev/next pointer in the freelist, or free page pcplists of two cpus joined
>>>>> together in a single list. Oscar has also reproduced this scenario, with a
>>>>> patch inserting delays before the get_page() to make the race window larger.
>>>>>
>>>>> Fix this by removing the dependency on TLB flush interrupts the same way as the
>>>>
>>>> This is supposed to be fixed by:
>>>>
>>>>   arch/x86/Kconfig:	select HAVE_RCU_TABLE_FREE	if PARAVIRT
>>>>
>>>
>>> Yes,
>
> Well, that commit fixes the "page table can be freed under us" part.
> But this patch is about the "get_page() will succeed on a page that's
> being freed" part. Upstream fixed that unknowingly in 4.13 by a gup.c
> refactoring that would be too risky to backport fully.
>
> (I also dislike receiving only this patch of the series; next time please
> send the whole thing, it's only 8 patches, our mail folders will survive
> that.)

When I was adding Hyper-V PV TLB flush to RHEL7 - which is 3.10 based - in
addition to adding page_cache_get_speculative() to
gup_get_pte()/gup_huge_pmd()/gup_huge_pud() I also had to synchronize huge
PMD split against gup_fast with the following hack:

+static void do_nothing(void *unused)
+{
+
+}
+
+static void serialize_against_pte_lookup(struct mm_struct *mm)
+{
+	smp_mb();
+	smp_call_function_many(mm_cpumask(mm), do_nothing, NULL, 1);
+}
+
 void pmdp_splitting_flush(struct vm_area_struct *vma, unsigned long address,
			   pmd_t *pmdp)
 {
@@ -434,9 +473,10 @@ void pmdp_splitting_flush(struct vm_area_struct *vma,
	set = !test_and_set_bit(_PAGE_BIT_SPLITTING,
				(unsigned long *)pmdp);
	if (set) {
		/* need tlb flush only to serialize against gup-fast */
		flush_tlb_range(vma, address, address + HPAGE_PMD_SIZE);
+		if (pv_mmu_ops.flush_tlb_others != native_flush_tlb_others)
+			serialize_against_pte_lookup(vma->vm_mm);
	}
 }

I'm not sure which stable kernel you're targeting (and if you addressed
this with other patches in the series, if this is needed, ...) so JFYI.

-- 
Vitaly