From: Minchan Kim
To: Andrew Morton
Cc: linux-mm, LKML, Jan Kara, Matthew Wilcox, Josef Bacik, Johannes Weiner, Minchan Kim
Subject: [PATCH v2 2/2] mm: fix long time stall from mm_populate
Date: Fri, 14 Feb 2020 11:29:51 -0800
Message-Id: <20200214192951.29430-2-minchan@kernel.org>
X-Mailer: git-send-email 2.25.0.265.gbab2e86ba0-goog
In-Reply-To: <20200214192951.29430-1-minchan@kernel.org>
References: <20200214192951.29430-1-minchan@kernel.org>

Basically, the fault handler releases mmap_sem before requesting readahead
and is then supposed to retry the page lookup from the page cache with
FAULT_FLAG_TRIED so that it avoids the livelock of infinite retry.

However, what happens if the fault handler finds a page in the page cache
and the page has the readahead marker but is waiting under writeback?
Plus one more condition: it happens under mm_populate, which repeats
faulting unless it encounters an error. So let's assemble the conditions
below.

       CPU 1                                    CPU 2

- first loop
mm_populate
 for ()
   ..
   ret = populate_vma_page_range
     __get_user_pages
       faultin_page
         handle_mm_fault
           filemap_fault
             do_async_mmap_readahead
               if (PageReadahead(pageA))
                 maybe_unlock_mmap_for_io
                   up_read(mmap_sem)
                                                shrink_page_list
                                                  pageout
                                                    SetPageReclaim(=SetPageReadahead)(pageA)
                                                    writepage
                                                      SetPageWriteback(pageA)
                 page_cache_async_readahead()
                   ClearPageReadahead(pageA)
             do_async_mmap_readahead
             lock_page_maybe_drop_mmap
               goto out_retry

pageA is reclaimed, and a new pageB is populated at the same file offset
and eventually gets PG_readahead set.

- second loop
     __get_user_pages
       faultin_page
         handle_mm_fault
           filemap_fault
             do_async_mmap_readahead
               if (PageReadahead(pageB))
                 maybe_unlock_mmap_for_io
                   up_read(mmap_sem)
                                                shrink_page_list
                                                  pageout
                                                    SetPageReclaim(=SetPageReadahead)(pageB)
                                                    writepage
                                                      SetPageWriteback(pageB)
                 page_cache_async_readahead()
                   ClearPageReadahead(pageB)
             do_async_mmap_readahead
             lock_page_maybe_drop_mmap
               goto out_retry

This can repeat forever, so it's a livelock. Even without reclaim
involved, it could happen if ra_pages becomes zero via fadvise from other
threads sharing the same fd, one doing random access while the other is
sequential, because page_cache_async_readahead has the following condition
check, like the PageWriteback one, and ra_pages is never synchronized with
fadvise and shrink_readahead_size_eio from other threads.

void page_cache_async_readahead(struct address_space *mapping,
				unsigned long req_size)
{
	/* no read-ahead */
	if (!ra->ra_pages)
		return;

Thus, we need to limit fault retries from mm_populate, like the page
fault handler does.

Fixes: 6b4c9f446981 ("filemap: drop the mmap_sem for all blocking operations")
Reviewed-by: Jan Kara
Signed-off-by: Minchan Kim
---
 mm/gup.c | 11 ++++++++---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/mm/gup.c b/mm/gup.c
index 1b521e0ac1de..6f6548c63ad5 100644
--- a/mm/gup.c
+++ b/mm/gup.c
@@ -1133,7 +1133,7 @@ static __always_inline long __get_user_pages_locked(struct task_struct *tsk,
  *
  * This takes care of mlocking the pages too if VM_LOCKED is set.
  *
- * return 0 on success, negative error code on error.
+ * return number of pages pinned on success, negative error code on error.
  *
  * vma->vm_mm->mmap_sem must be held.
  *
@@ -1196,6 +1196,7 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
 	struct vm_area_struct *vma = NULL;
 	int locked = 0;
 	long ret = 0;
+	bool tried = false;
 
 	end = start + len;
 
@@ -1226,14 +1227,18 @@ int __mm_populate(unsigned long start, unsigned long len, int ignore_errors)
 		 * double checks the vma flags, so that it won't mlock pages
 		 * if the vma was already munlocked.
 		 */
-		ret = populate_vma_page_range(vma, nstart, nend, &locked);
+		ret = populate_vma_page_range(vma, nstart, nend,
+						tried ? NULL : &locked);
 		if (ret < 0) {
 			if (ignore_errors) {
 				ret = 0;
 				continue;	/* continue at next VMA */
 			}
 			break;
-		}
+		} else if (ret == 0)
+			tried = true;
+		else
+			tried = false;
 		nend = nstart + ret * PAGE_SIZE;
 		ret = 0;
 	}
-- 
2.25.0.265.gbab2e86ba0-goog
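
For illustration only, the retry limit in the hunk above can be modeled in
userspace. This is a rough sketch, not kernel code: fake_populate_range(),
populate_sketch() and SKETCH_PAGE_SIZE are hypothetical names invented
here, and the stand-in assumes the worst case described in the changelog,
where every attempt that is allowed to drop the lock makes no progress.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL	/* hypothetical; stands in for PAGE_SIZE */

/*
 * Hypothetical stand-in for populate_vma_page_range(). When the caller
 * passes a non-NULL 'locked', model the readahead path that drops the
 * lock and returns 0 pages (retry requested). When 'locked' is NULL the
 * lock may not be dropped, so the range is populated in one go.
 */
static long fake_populate_range(unsigned long start, unsigned long end,
				int *locked)
{
	if (locked) {
		*locked = 0;	/* models up_read(mmap_sem) */
		return 0;	/* no progress, caller has to retry */
	}
	return (long)((end - start) / SKETCH_PAGE_SIZE);
}

/*
 * Same shape as the patched loop: after an attempt that returned 0,
 * the next attempt passes NULL so it cannot ask for another retry.
 * Returns the number of pages populated and counts calls in *calls.
 */
static unsigned long populate_sketch(unsigned long start, unsigned long len,
				     unsigned long *calls)
{
	unsigned long nstart = start;
	unsigned long end = start + len;
	int locked = 0;
	bool tried = false;

	assert(calls != NULL);
	*calls = 0;

	while (nstart < end) {
		long ret;

		if (!locked)
			locked = 1;	/* models down_read(mmap_sem) */

		ret = fake_populate_range(nstart, end,
					  tried ? NULL : &locked);
		(*calls)++;

		/* Only one lock-dropping attempt per position. */
		tried = (ret == 0);
		nstart += (unsigned long)ret * SKETCH_PAGE_SIZE;
	}
	return (nstart - start) / SKETCH_PAGE_SIZE;
}
```

With this model, populate_sketch(0, 4 * SKETCH_PAGE_SIZE, &calls) returns
4 with calls == 2: one attempt that drops the lock, then one that may not.
If 'tried' were removed and &locked passed unconditionally, the loop in
this model would never advance, which is the livelock the patch avoids.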