From mboxrd@z Thu Jan 1 00:00:00 1970
From: Vitaly Wool
Date: Mon, 21 Dec 2020 23:11:39 +0100
Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock
To: "Song Bao Hua (Barry Song)"
Cc: Shakeel Butt, Minchan Kim, Mike Galbraith, LKML, linux-mm,
    Sebastian Andrzej Siewior, NitinGupta, Sergey Senozhatsky,
    Andrew Morton
References: <18669bd607ae9efbf4e00e36532c7aa167d0fa12.camel@gmx.de>
    <20201220002228.38697-1-vitaly.wool@konsulko.com>
    <8cc0e01fd03245a4994f2e0f54b264fa@hisilicon.com>
In-Reply-To: <8cc0e01fd03245a4994f2e0f54b264fa@hisilicon.com>
X-Mailing-List: linux-kernel@vger.kernel.org

On Mon, Dec 21, 2020 at 10:30 PM Song Bao Hua (Barry Song) wrote:
>
> > -----Original Message-----
> > From: Shakeel Butt [mailto:shakeelb@google.com]
> > Sent: Tuesday, December 22, 2020 10:03 AM
> > To: Song Bao Hua (Barry Song)
> > Cc: Vitaly Wool; Minchan Kim; Mike Galbraith; LKML; linux-mm;
> > Sebastian Andrzej Siewior; NitinGupta; Sergey Senozhatsky;
> > Andrew Morton
> > Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock
> >
> > On Mon, Dec 21, 2020 at 12:06 PM Song Bao Hua (Barry Song) wrote:
> > >
> > > > -----Original Message-----
> > > > From: Shakeel Butt [mailto:shakeelb@google.com]
> > > > Sent: Tuesday, December 22, 2020 8:50 AM
> > > > To: Vitaly Wool
> > > > Cc: Minchan Kim; Mike Galbraith; LKML; linux-mm;
> > > > Song Bao Hua (Barry Song); Sebastian Andrzej Siewior;
> > > > NitinGupta; Sergey Senozhatsky; Andrew Morton
> > > > Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock
> > > >
> > > > On Mon, Dec 21, 2020 at 11:20 AM Vitaly Wool wrote:
> > > > >
> > > > > On Mon, Dec 21, 2020 at 6:24 PM Minchan Kim wrote:
> > > > > >
> > > > > > On Sun, Dec 20, 2020 at 02:22:28AM +0200, Vitaly Wool wrote:
> > > > > > > zsmalloc takes bit spinlock in its _map() callback and releases it
> > > > > > > only in unmap() which is unsafe and leads to zswap complaining
> > > > > > > about scheduling in atomic context.
> > > > > > >
> > > > > > > To fix that and to improve RT properties of zsmalloc, remove that
> > > > > > > bit spinlock completely and use a bit flag instead.
> > > > > >
> > > > > > I don't want to use such open code for the lock.
> > > > > >
> > > > > > I see from Mike's patch, recent zswap change introduced the lockdep
> > > > > > splat bug and you want to improve zsmalloc to fix the zswap bug and
> > > > > > introduce this patch with allowing preemption enabling.
> > > > >
> > > > > This understanding is upside down. The code in zswap you are referring
> > > > > to is not buggy. You may claim that it is suboptimal but there is
> > > > > nothing wrong in taking a mutex.
> > > > >
> > > >
> > > > Is this suboptimal for all or just the hardware accelerators? Sorry, I
> > > > am not very familiar with the crypto API. If I select lzo or lz4 as a
> > > > zswap compressor will the [de]compression be async or sync?
> > >
> > > Right now, in crypto subsystem, new drivers are required to write based on
> > > async APIs. The old sync API can't work in new accelerator drivers as they
> > > are not supported at all.
> > >
> > > Old drivers are used to sync, but they've got async wrappers to support async
> > > APIs. Eg.
> > > crypto: acomp - add support for lz4 via scomp
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/crypto/lz4.c?id=8cd9330e0a615c931037d4def98b5ce0d540f08d
> > >
> > > crypto: acomp - add support for lzo via scomp
> > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/crypto/lzo.c?id=ac9d2c4b39e022d2c61486bfc33b730cfd02898e
> > >
> > > so they are supporting async APIs but they are still working in sync mode as
> > > those old drivers don't sleep.
> > >
> >
> > Good to know that those are sync because I want them to be sync.
> > Please note that zswap is a cache in front of a real swap and the load
> > operation is latency sensitive as it comes in the page fault path and
> > directly impacts the applications. I doubt decompressing synchronously
> > a 4k page on a cpu will be costlier than asynchronously decompressing
> > the same page from hardware accelerators.
>
> If you read the old paper:
> https://www.ibm.com/support/pages/new-linux-zswap-compression-functionality
> Because the hardware accelerator speeds up compression, looking at the zswap
> metrics we observed that there were more store and load requests in a given
> amount of time, which filled up the zswap pool faster than a software
> compression run. Because of this behavior, we set the max_pool_percent
> parameter to 30 for the hardware compression runs - this means that zswap
> can use up to 30% of the 10GB of total memory.
>
> So using hardware accelerators, we get a chance to speed up compression
> while decreasing cpu utilization.
>
> BTW, If it is not easy to change zsmalloc, one quick workaround we might do
> in zswap is adding the below after applying Mike's original patch:
>
> if(in_atomic()) /* for zsmalloc */
>         while(!try_wait_for_completion(&req->done));
> else /* for zbud, z3fold */
>         crypto_wait_req(....);

I don't think I'm going to ack this, sorry.

Best regards,
   Vitaly

> crypto_wait_req() is actually doing wait_for_completion():
> static inline int crypto_wait_req(int err, struct crypto_wait *wait)
> {
>         switch (err) {
>         case -EINPROGRESS:
>         case -EBUSY:
>                 wait_for_completion(&wait->completion);
>                 reinit_completion(&wait->completion);
>                 err = wait->err;
>                 break;
>         }
>
>         return err;
> }
>
> Thanks
> Barry
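
Editorial note on the crypto_wait_req() snippet quoted above: the point
under discussion is that the wait only happens when the request comes
back as -EINPROGRESS or -EBUSY, and that this wait sleeps, which is why
it cannot run while a spinlock is held. The following is a minimal
userspace sketch of that control flow, not kernel code. It assumes a
pthread mutex and condition variable as a stand-in for the kernel's
struct completion; fake_accelerator() and demo_async_request() are
hypothetical helpers invented for illustration, and the kernel's real
implementations differ.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Userspace stand-in (assumption) for the kernel's struct completion. */
struct completion {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	int done;
};

/* Userspace stand-in (assumption) for the kernel's struct crypto_wait. */
struct crypto_wait {
	struct completion completion;
	int err;
};

static void init_completion(struct completion *c)
{
	pthread_mutex_init(&c->lock, NULL);
	pthread_cond_init(&c->cond, NULL);
	c->done = 0;
}

/* Mark the completion as done and wake the waiter. */
static void complete(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = 1;
	pthread_cond_signal(&c->cond);
	pthread_mutex_unlock(&c->lock);
}

/* Sleeps until complete() has been called: the step that is not
 * allowed in atomic context in the kernel. */
static void wait_for_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	while (!c->done)
		pthread_cond_wait(&c->cond, &c->lock);
	pthread_mutex_unlock(&c->lock);
}

/* Re-arm the completion so it can be reused for the next request. */
static void reinit_completion(struct completion *c)
{
	pthread_mutex_lock(&c->lock);
	c->done = 0;
	pthread_mutex_unlock(&c->lock);
}

/* Same control flow as the snippet quoted in the thread. */
static int crypto_wait_req(int err, struct crypto_wait *wait)
{
	switch (err) {
	case -EINPROGRESS:
	case -EBUSY:
		wait_for_completion(&wait->completion);
		reinit_completion(&wait->completion);
		err = wait->err;
		break;
	}

	return err;
}

/* Hypothetical "accelerator": finishes the request on another thread. */
static void *fake_accelerator(void *arg)
{
	struct crypto_wait *wait = arg;

	wait->err = 0;			/* final status of the request */
	complete(&wait->completion);
	return NULL;
}

/* Hypothetical demo: submit an "async" request, then wait for it. */
static int demo_async_request(void)
{
	struct crypto_wait wait;
	pthread_t worker;
	int err;

	wait.err = -EINPROGRESS;
	init_completion(&wait.completion);
	if (pthread_create(&worker, NULL, fake_accelerator, &wait) != 0)
		return -EAGAIN;

	/* The submit call is modelled as having returned -EINPROGRESS. */
	err = crypto_wait_req(-EINPROGRESS, &wait);
	pthread_join(worker, NULL);
	assert(wait.completion.done == 0);	/* re-armed by reinit_completion() */
	return err;
}
```

In this sketch a request that completes synchronously (err == 0, or an
immediate error such as -EINVAL) returns straight away without touching
the completion, which matches the observation in the thread that the
scomp-backed lzo and lz4 wrappers still behave synchronously. Only the
-EINPROGRESS/-EBUSY path blocks.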