From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-10.8 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_PASS,UNPARSEABLE_RELAY,USER_AGENT_GIT autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 406C8C07E85 for ; Mon, 3 Dec 2018 20:09:24 +0000 (UTC) Received: from vger.kernel.org (vger.kernel.org [209.132.180.67]) by mail.kernel.org (Postfix) with ESMTP id 00FBC2087F for ; Mon, 3 Dec 2018 20:09:24 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (2048-bit key) header.d=oracle.com header.i=@oracle.com header.b="BRqp1y5Z" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 00FBC2087F Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=oracle.com Authentication-Results: mail.kernel.org; spf=none smtp.mailfrom=linux-kernel-owner@vger.kernel.org Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand id S1726062AbeLCUJZ (ORCPT ); Mon, 3 Dec 2018 15:09:25 -0500 Received: from aserp2120.oracle.com ([141.146.126.78]:56290 "EHLO aserp2120.oracle.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP id S1725936AbeLCUJX (ORCPT ); Mon, 3 Dec 2018 15:09:23 -0500 Received: from pps.filterd (aserp2120.oracle.com [127.0.0.1]) by aserp2120.oracle.com (8.16.0.22/8.16.0.22) with SMTP id wB3JwaRr086200; Mon, 3 Dec 2018 20:09:08 GMT DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=oracle.com; h=from : to : cc : subject : date : message-id : in-reply-to : references; s=corp-2018-07-02; bh=0t52fLgngXHEH1UGn44+Uw2HE3xbOfQ4NJc5Y8fZs1w=; b=BRqp1y5ZfntM59ay9BfDE/kWnTDhTBqYvoosUr7//8dE9WKWqHmYgIiajfJOFo54VCSt iVOiU4WLSSZ5Uyl8wJcql+ebxSO5atKnVKgOUKtwnIGfHI4h7GNQN74uClNF3196BUol GVOTueNl6HVPPZ7aVOIc3E+pdjyvg+pSoXP8Q1rcalcCMfzPg8EBlMr6Dj7Rzt3tv8v+ j6GPjivj0pS6qiFlPYy+2nYc3Wm3s8LoaRMmgN7/YL64+hQVDhvNMG8laHvh5h7b/xDE cxcLRJmTqxyewMoutlpgnQGb/FyMjmbARu7S73GoFTMV81+UdzQ50WEbINkBtK+ksW9m rg== Received: from userv0022.oracle.com (userv0022.oracle.com [156.151.31.74]) by aserp2120.oracle.com with ESMTP id 2p3j8q8kqn-1 (version=TLSv1.2 cipher=ECDHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 03 Dec 2018 20:09:07 +0000 Received: from userv0122.oracle.com (userv0122.oracle.com [156.151.31.75]) by userv0022.oracle.com (8.14.4/8.14.4) with ESMTP id wB3K91Kp024324 (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-GCM-SHA384 bits=256 verify=OK); Mon, 3 Dec 2018 20:09:02 GMT Received: from abhmp0012.oracle.com (abhmp0012.oracle.com [141.146.116.18]) by userv0122.oracle.com (8.14.4/8.14.4) with ESMTP id wB3K91ph006920; Mon, 3 Dec 2018 20:09:01 GMT Received: from monkey.oracle.com (/50.38.38.67) by default (Oracle Beehive Gateway v4.0) with ESMTP ; Mon, 03 Dec 2018 12:09:01 -0800 From: Mike Kravetz To: linux-mm@kvack.org, linux-kernel@vger.kernel.org Cc: Michal Hocko , Hugh Dickins , Naoya Horiguchi , "Aneesh Kumar K . V" , Andrea Arcangeli , "Kirill A . Shutemov" , Davidlohr Bueso , Prakash Sangappa , Andrew Morton , Mike Kravetz , stable@vger.kernel.org Subject: [PATCH 2/3] hugetlbfs: Use i_mmap_rwsem to fix page fault/truncate race Date: Mon, 3 Dec 2018 12:08:49 -0800 Message-Id: <20181203200850.6460-3-mike.kravetz@oracle.com> X-Mailer: git-send-email 2.17.2 In-Reply-To: <20181203200850.6460-1-mike.kravetz@oracle.com> References: <20181203200850.6460-1-mike.kravetz@oracle.com> X-Proofpoint-Virus-Version: vendor=nai engine=5900 definitions=9096 signatures=668686 X-Proofpoint-Spam-Details: rule=notspam policy=default score=0 suspectscore=0 malwarescore=0 phishscore=0 bulkscore=0 spamscore=0 mlxscore=0 mlxlogscore=999 adultscore=0 classifier=spam adjust=0 reason=mlx scancount=1 engine=8.0.1-1810050000 definitions=main-1812030183 Sender: linux-kernel-owner@vger.kernel.org Precedence: bulk List-ID: X-Mailing-List: linux-kernel@vger.kernel.org hugetlbfs page faults can race with truncate and hole punch operations. Current code in the page fault path attempts to handle this by 'backing out' operations if we encounter the race. One obvious omission in the current code is removing a page newly added to the page cache. This is pretty straight forward to address, but there is a more subtle and difficult issue of backing out hugetlb reservations. To handle this correctly, the 'reservation state' before page allocation needs to be noted so that it can be properly backed out. There are four distinct possibilities for reservation state: shared/reserved, shared/no-resv, private/reserved and private/no-resv. Backing out a reservation may require memory allocation which could fail so that needs to be taken into account as well. Instead of writing the required complicated code for this rare occurrence, just eliminate the race. i_mmap_rwsem is now held in read mode for the duration of page fault processing. Hold i_mmap_rwsem longer in truncation and hold punch code to cover the call to remove_inode_hugepages. Cc: Fixes: ebed4bfc8da8 ("hugetlb: fix absurd HugePages_Rsvd") Signed-off-by: Mike Kravetz --- fs/hugetlbfs/inode.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/fs/hugetlbfs/inode.c b/fs/hugetlbfs/inode.c index 32920a10100e..3244147fc42b 100644 --- a/fs/hugetlbfs/inode.c +++ b/fs/hugetlbfs/inode.c @@ -505,8 +505,8 @@ static int hugetlb_vmtruncate(struct inode *inode, loff_t offset) i_mmap_lock_write(mapping); if (!RB_EMPTY_ROOT(&mapping->i_mmap.rb_root)) hugetlb_vmdelete_list(&mapping->i_mmap, pgoff, 0); - i_mmap_unlock_write(mapping); remove_inode_hugepages(inode, offset, LLONG_MAX); + i_mmap_unlock_write(mapping); return 0; } @@ -540,8 +540,8 @@ static long hugetlbfs_punch_hole(struct inode *inode, loff_t offset, loff_t len) hugetlb_vmdelete_list(&mapping->i_mmap, hole_start >> PAGE_SHIFT, hole_end >> PAGE_SHIFT); - i_mmap_unlock_write(mapping); remove_inode_hugepages(inode, hole_start, hole_end); + i_mmap_unlock_write(mapping); inode_unlock(inode); } -- 2.17.2