Date: Wed, 2 Jun 2021 08:57:25 -0700 (PDT)
From: Hugh Dickins
To: Yu Xu
cc: Hugh Dickins, linux-mm@kvack.org, linux-kernel@vger.kernel.org,
    akpm@linux-foundation.org, gavin.dg@linux.alibaba.com, Greg Thelen,
    Wei Xu, Matthew Wilcox, Nicholas Piggin, Vlastimil Babka
Subject: Re: [PATCH] mm, thp: relax migration wait when failed to get tail page

On Wed, 2 Jun 2021, Yu Xu wrote:
> On 6/2/21 12:55 AM, Hugh Dickins wrote:
> > On Wed, 2 Jun 2021, Xu Yu wrote:
> >
> > > We notice that hung task happens in a conner but practical scenario when
> > > CONFIG_PREEMPT_NONE is enabled, as follows.
> > >
> > > Process 0                       Process 1               Process 2..Inf
> > > split_huge_page_to_list
> > >     unmap_page
> > >     split_huge_pmd_address
> > >                                 __migration_entry_wait(head)
> > >                                                         __migration_entry_wait(tail)
> > >     remap_page (roll back)
> > >         remove_migration_ptes
> > >             rmap_walk_anon
> > >                 cond_resched
> > >
> > > Where __migration_entry_wait(tail) is occurred in kernel space, e.g.,
> > > copy_to_user, which will immediately fault again without rescheduling,
> > > and thus occupy the cpu fully.
> > >
> > > When there are too many processes performing __migration_entry_wait on
> > > tail page, remap_page will never be done after cond_resched.
> > >
> > > This relaxes __migration_entry_wait on tail page, thus gives remap_page
> > > a chance to complete.
> > >
> > > Signed-off-by: Gang Deng
> > > Signed-off-by: Xu Yu
> >
> > Well caught: you're absolutely right that there's a bug there.
> > But isn't cond_resched() just papering over the real bug, and
> > what it should do is a "page = compound_head(page);" before the
> > get_page_unless_zero()?  How does that work out in your testing?
>
> compound_head works. The patched kernel is alive for hours under
> our reproducer, which usually makes the vanilla kernel hung after
> tens of minutes at most.

Oh, that's good news, thanks.

(It's still likely that a well-placed cond_resched() somewhere in
mm/gup.c would also be a good idea, but none of us have yet got
around to identifying where.)

>
> If we use compound_head, the behavior of __migration_entry_wait(tail)
> changes from "retry fault" to "prevent THP from being split". Is that
> right?  Then which is preferred? If it were me, I would prefer "retry
> fault".

As Matthew remarked, you are asking very good questions, and split
migration entries are difficult to think about.  But I believe you'll
find it works out okay.

The point of *put_and_*wait_on_page_locked() is that it does drop
the page reference you acquired with get_page_unless_zero(), as soon
as the page is on the wait queue, before actually waiting.  So splitting
the THP is only prevented for a brief interval.

Now, it's true that if there are very many tasks faulting on portions
of the huge page, in that interval between inserting the migration
entries and freezing the huge page's refcount to 0, they can reduce
the chance of splitting considerably.  But that's not an excuse for
doing get_page_unless_zero() on the wrong thing, as it was doing.

Hugh
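
To make the refcount argument above concrete, here is a minimal userspace sketch in C. It is not kernel code: `struct toy_page` and every `toy_*` function are hypothetical stand-ins invented for illustration, and the model assumes that a tail page's own refcount is 0 while the compound head carries the references. Under that assumption, get_page_unless_zero() on the tail can never succeed (the "retry fault" behavior), whereas going through compound_head() first lets the waiter take a reference that is dropped again before it would sleep.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for struct page; NOT the kernel's layout. */
struct toy_page {
    int refcount;           /* models page->_refcount */
    struct toy_page *head;  /* compound head; a head page points to itself */
};

/* Models compound_head(): a tail page resolves to its head page. */
static struct toy_page *toy_compound_head(struct toy_page *page)
{
    return page->head;
}

/* Models get_page_unless_zero(): only take a reference if one is held. */
static bool toy_get_page_unless_zero(struct toy_page *page)
{
    if (page->refcount == 0)
        return false;
    page->refcount++;
    return true;
}

/*
 * Models only the reference handling of put_and_wait_on_page_locked():
 * the reference is dropped once the waiter is queued, before sleeping.
 * The sleep itself is not modeled.
 */
static void toy_put_and_wait_on_page_locked(struct toy_page *page)
{
    page->refcount--;
}

/*
 * Returns true if the caller would wait on the page lock, false if the
 * fault would simply be retried. use_head selects whether the lookup
 * goes through the compound head first.
 */
static bool toy_migration_entry_wait(struct toy_page *page, bool use_head)
{
    if (use_head)
        page = toy_compound_head(page);
    if (!toy_get_page_unless_zero(page))
        return false;   /* fault again immediately */
    toy_put_and_wait_on_page_locked(page);
    return true;
}

/* Build a two-page "THP": the head holds head_refs, the tail holds 0. */
static void toy_init_thp(struct toy_page thp[2], int head_refs)
{
    thp[0] = (struct toy_page){ .refcount = head_refs, .head = &thp[0] };
    thp[1] = (struct toy_page){ .refcount = 0, .head = &thp[0] };
}
```

In this model, calling `toy_migration_entry_wait(&thp[1], false)` on a THP initialised with one head reference reports a retry, while the same call with `use_head` set to `true` reports a wait and leaves the head's refcount unchanged afterwards. If the head's refcount has already been frozen to 0, the head-based variant falls back to a retry as well.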