From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-5.8 required=3.0 tests=BAYES_00,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,MAILING_LIST_MULTI, SPF_HELO_NONE,SPF_PASS autolearn=no autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id 4F1BFC433FE for ; Thu, 3 Dec 2020 19:16:17 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 932E6206F6 for ; Thu, 3 Dec 2020 19:16:16 +0000 (UTC) DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 932E6206F6 Authentication-Results: mail.kernel.org; dmarc=none (p=none dis=none) header.from=soleen.com Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 999696B0068; Thu, 3 Dec 2020 14:16:15 -0500 (EST) Received: by kanga.kvack.org (Postfix, from userid 40) id 96EF96B006C; Thu, 3 Dec 2020 14:16:15 -0500 (EST) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 8375C8D0001; Thu, 3 Dec 2020 14:16:15 -0500 (EST) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0034.hostedemail.com [216.40.44.34]) by kanga.kvack.org (Postfix) with ESMTP id 68E6D6B0068 for ; Thu, 3 Dec 2020 14:16:15 -0500 (EST) Received: from smtpin06.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay02.hostedemail.com (Postfix) with ESMTP id 13E8B3625 for ; Thu, 3 Dec 2020 19:16:15 +0000 (UTC) X-FDA: 77552926710.06.hose68_1c0848a273be Received: from filter.hostedemail.com (10.5.16.251.rfc1918.com [10.5.16.251]) by smtpin06.hostedemail.com (Postfix) with ESMTP id D32C010049A55 for ; Thu, 3 Dec 2020 19:16:14 +0000 (UTC) X-HE-Tag: hose68_1c0848a273be X-Filterd-Recvd-Size: 5896 Received: from mail-ej1-f67.google.com (mail-ej1-f67.google.com [209.85.218.67]) by imf13.hostedemail.com (Postfix) with ESMTP for ; Thu, 3 Dec 2020 19:16:14 +0000 (UTC) Received: by mail-ej1-f67.google.com with SMTP id n26so5134721eju.6 for ; Thu, 03 Dec 2020 11:16:13 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=soleen.com; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=B2unFigJQegdx4byhopE68eSGwQDdjPNiBkqppQKORw=; b=RQwzlugWq9y6xGCWYGZqV0cDbOSU4sMIIOG0haHdSUCDe8tW9oou5xKHPs4jrQ5Zk6 dsM353hzEl0YtP4/cRli48OFFj6cJohs+2c+a97YMQ1MBUPOe1M48RaakX2SvJrwwJdw g2us13bxps7lUhtOZLk3D294XsC4LGLdLQg51tkEkh6tplx3WN0gH1R9HZKzPEh/DEIb xWtdRHy7iORiEg/TuEzucbrR6ONX2/caaotAbHnNi0XVYH9MilgpB1y3L7Ql1PjUlfRF pmeAK7kmIGBV7ktkwYpFvIlxo60/CJUBLTIdNX2dgrALQAl7J2K8sZ93R4NIAU5efFIS nf4Q== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=B2unFigJQegdx4byhopE68eSGwQDdjPNiBkqppQKORw=; b=frMPai9ptLA6ZcD9jiEYnAWqZGoDQ2aMSIuoxBfF4c1WkNXmMS0Axo4GJSg5AybRJL obSdMnuzSRO8xtjyybzVKD3LrAeMRPPFILt+FDMHNRCUfQ4i3a+yIA6l0Ih5DHk1ag+U n7QMIncHUObWHYIRv2AQQT6dPy2m+lGF4Q8BErqQm87iUqWH2NjOvcAfy9KWebdXSU++ O79keSrYJAlPnAk9YV/s3P7pMJ1iwAzVuS0VG/8cgCQo5otJPb3YVdO5c7fsjZ7+KQ99 edckRh/IM/7fLHNR6LZqqvToJfuBfCX2cqKEJ/uiPWQrgxbKwJ/7fkSIoIwvB9iKmucH AtLg== X-Gm-Message-State: AOAM531neen/XuSXpFUDxoLqx9IeFdZearKLg50bWSiN6O/7Lk8FSD/3 tUtASporpyhmN4sohD3erOX7Rm4aHQEs7/2tl8KtRw== X-Google-Smtp-Source: ABdhPJwbdJuS1OA7LMC/r9M6sMcR5xENHDlR5Qqdy0Y+bRqCvsxnoEU9EYNBNQ/7hdjMyH2Og+UA/U7bahqYicmLgVc= X-Received: by 2002:a17:906:c04d:: with SMTP id bm13mr3822732ejb.519.1607022972744; Thu, 03 Dec 2020 11:16:12 -0800 (PST) MIME-Version: 1.0 References: <20201202052330.474592-1-pasha.tatashin@soleen.com> <20201202052330.474592-7-pasha.tatashin@soleen.com> <20201202163507.GL5487@ziepe.ca> <20201203010809.GQ5487@ziepe.ca> <20201203141729.GS5487@ziepe.ca> <20201203165937.GU5487@ziepe.ca> In-Reply-To: From: Pavel Tatashin Date: Thu, 3 Dec 2020 14:15:36 -0500 Message-ID: Subject: Re: [PATCH 6/6] mm/gup: migrate pinned pages out of movable zone To: Jason Gunthorpe Cc: LKML , linux-mm , Andrew Morton , Vlastimil Babka , Michal Hocko , David Hildenbrand , Oscar Salvador , Dan Williams , Sasha Levin , Tyler Hicks , Joonsoo Kim , mike.kravetz@oracle.com, Steven Rostedt , Ingo Molnar , Peter Zijlstra , Mel Gorman , Matthew Wilcox , David Rientjes , John Hubbard Content-Type: text/plain; charset="UTF-8" X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: > > > > Looking at this code some more.. How is it even correct? > > > > > > > > 1633 if (!isolate_lru_page(head)) { > > > > 1634 list_add_tail(&head->lru, &cma_page_list); > > > > > > > > Here we are only running under the read side of the mmap sem so multiple > > > > GUPs can be calling that sequence in parallel. I don't see any > > > > obvious exclusion that will prevent corruption of head->lru. The first > > > > GUP thread to do isolate_lru_page() will ClearPageLRU() and the second > > > > GUP thread will be a NOP for isolate_lru_page(). > > > > > > > > They will both race list_add_tail and other list ops. That is not OK. > > > > > > Good question. I studied it, and I do not see how this is OK. Worse, > > > this race is also exposable as a syscall instead of via driver: two > > > move_pages() run simultaneously. Perhaps in other places? > > > > > > move_pages() > > > kernel_move_pages() > > > mmget() > > > do_pages_move() > > > add_page_for_migratio() > > > mmap_read_lock(mm); > > > list_add_tail(&head->lru, pagelist); <- Not protected > > > > When this was CMA only it might have been rarer to trigger, but this > > move stuff sounds like it makes it much more broadly, eg on typical > > servers with RDMA exposed/etc > > > > Seems like it needs fixing as part of this too :\ > > Just to clarify the stack that I showed above is outside of gup, it is > the same issue that you pointed out that happens elsewhere. I suspect > there might be more. All of them should be addressed together. Hi Jason, I studied some more, and I think this is not a race: list_add_tail(&head->lru, &cma_page_list) is called only when isolate_lru_page(head) succeeds. isolate_lru_page(head) succeeds only when PageLRU(head) is true. However, in this function we also clear LRU flag before returning success. This means, that if we race with another thread, the other thread won't get to unprotected list_add_tail(&head->lru, &cma_page_list) until head is is back on LRU list. Please let me know if I am missing anything. Thank you, Pasha