References: <20230428135414.v3.1.Ia86ccac02a303154a0b8bc60567e7a95d34c96d3@changeid> <20230429101345.2769-1-hdanton@sina.com>
In-Reply-To: <20230429101345.2769-1-hdanton@sina.com>
From: Doug Anderson
Date: Tue, 2 May 2023 14:08:10 -0700
Subject: Re: [PATCH v3] migrate_pages: Avoid blocking for IO in MIGRATE_SYNC_LIGHT
To: Hillf Danton
Cc: Andrew Morton, Mel Gorman, Alexander Viro, Christian Brauner, linux-mm@kvack.org, linux-kernel@vger.kernel.org, Matthew Wilcox, Yu Zhao, Johannes Weiner
X-Mailing-List: linux-kernel@vger.kernel.org

Hi,

On Sat, Apr 29, 2023 at 3:14 AM Hillf Danton wrote:
>
> On 28 Apr 2023 13:54:38 -0700 Douglas Anderson
> > The MIGRATE_SYNC_LIGHT mode is intended to block for things that will
> > finish quickly but not for things that will take a long time. Exactly
> > how long is too long is not well defined, but waits of tens of
> > milliseconds are likely non-ideal.
> >
> > When putting a Chromebook under memory pressure (opening over 90 tabs
> > on a 4GB machine) it was fairly easy to see delays waiting for some
> > locks in the kcompactd code path of > 100 ms. While the laptop wasn't
> > amazingly usable in this state, it was still limping along and this
> > state isn't something artificial. Sometimes we simply end up with a
> > lot of memory pressure.
>
> Was kcompactd woken up for PAGE_ALLOC_COSTLY_ORDER?

I put some more traces in and reproduced it again. I saw something
that looked like this:

1. balance_pgdat() called wakeup_kcompactd() with order=10 and that
   caused us to get all the way to the end and wake up kcompactd (there
   were previous calls to wakeup_kcompactd() that returned early).

2. kcompactd started and completed kcompactd_do_work() without blocking.

3. kcompactd called proactive_compact_node() and there blocked for
   ~92ms in one case, ~120ms in another case, ~131ms in another case.

> > Putting the same Chromebook under memory pressure while it was running
> > Android apps (though not stressing them) showed a much worse result
> > (NOTE: this was on an older kernel but the codepaths here are similar).
> > Android apps on ChromeOS currently run from a 128K-block,
> > zlib-compressed, loopback-mounted squashfs disk. If we get a page
> > fault from something backed by the squashfs filesystem we could end up
> > holding a folio lock while reading enough from disk to decompress 128K
> > (and then decompressing it using the somewhat slow zlib algorithms).
> > That reading goes through the ext4 subsystem (because it's a loopback
> > mount) before eventually ending up in the block subsystem. This extra
> > jaunt adds extra overhead.
> > Without much work I could see cases where we ended up blocked on a
> > folio lock for over a second. With more extreme memory pressure I
> > could see up to 25 seconds.
>
> In the same kcompactd code path above?

It was definitely in kcompactd. I can go back and trace through this
too, if it's useful, but I suspect it's the same.

> > We considered adding a timeout in the case of MIGRATE_SYNC_LIGHT for
> > the two locks that were seen to be slow [1] and that generated much
> > discussion. After discussion, it was decided that we should avoid
> > waiting for the two locks during MIGRATE_SYNC_LIGHT if they were being
> > held for IO. We'll continue with the unbounded wait for the more full
> > SYNC modes.
> >
> > With this change, I couldn't see any slow waits on these locks with my
> > previous testcases.
>
> Well this is the upside after this change, but given the win, what is
> the loss/cost paid? For example the changes in compact fail and success [1].
>
> [1] https://lore.kernel.org/lkml/20230418191313.268131-1-hannes@cmpxchg.org/

That looks like an interesting series. Obviously it would need to be
tested, but my hunch is that ${SUBJECT} patch would work well with
that series. Specifically, with Johannes's series it seems more
important for the kcompactd thread to be working fruitfully. Having it
blocked for a long time when there is other useful work it could be
doing still seems wrong. With ${SUBJECT} patch it's not that we'll
never come back and try again; we'll just wait until a future
iteration when (hopefully) the locks are easier to acquire. In the
meantime, we're looking for other pages to migrate.

-Doug