From: Vitaly Wool <vitaly.wool@konsulko.com>
Date: Tue, 22 Dec 2020 02:57:08 +0100
Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock
To: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>
Cc: Shakeel Butt <shakeelb@google.com>, Minchan Kim <minchan@kernel.org>,
    Mike Galbraith <efault@gmx.de>, LKML <linux-kernel@vger.kernel.org>,
    linux-mm <linux-mm@kvack.org>, Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
    NitinGupta <ngupta@vflare.org>, Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
    Andrew Morton <akpm@linux-foundation.org>, "tiantao (H)"

On Tue, 22 Dec 2020, 02:42 Song Bao Hua (Barry Song), <song.bao.hua@hisilicon.com> wrote:
>
> > -----Original Message-----
> > From: Song Bao Hua (Barry Song)
> > Sent: Tuesday, December 22, 2020 2:06 PM
> >
> > > -----Original Message-----
> > > From: Vitaly Wool [mailto:vitaly.wool@konsulko.com]
> > > Sent: Tuesday, December 22, 2020 2:00 PM
> > >
> > > On Tue, Dec 22, 2020 at 12:37 AM Song Bao Hua (Barry Song)
> > > <song.bao.hua@hisilicon.com> wrote:
> > > >
> > > > > -----Original Message-----
> > > > > From: Song Bao Hua (Barry Song)
> > > > > Sent: Tuesday, December 22, 2020 11:38 AM
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Vitaly Wool [mailto:vitaly.wool@konsulko.com]
> > > > > > Sent: Tuesday, December 22, 2020 11:12 AM
> > > > > >
> > > > > > On Mon, Dec 21, 2020 at 10:30 PM Song Bao Hua (Barry Song)
> > > > > > <song.bao.hua@hisilicon.com> wrote:
> > > > > > >
> > > > > > > > -----Original Message-----
> > > > > > > > From: Shakeel Butt [mailto:shakeelb@google.com]
> > > > > > > > Sent: Tuesday, December 22, 2020 10:03 AM
> > > > > > > >
> > > > > > > > On Mon, Dec 21, 2020 at 12:06 PM Song Bao Hua (Barry Song)
> > > > > > > > <song.bao.hua@hisilicon.com> wrote:
> > > > > > > > >
> > > > > > > > > > -----Original Message-----
> > > > > > > > > > From: Shakeel Butt [mailto:shakeelb@google.com]
> > > > > > > > > > Sent: Tuesday, December 22, 2020 8:50 AM
> > > > > > > > > >
> > > > > > > > > > On Mon, Dec 21, 2020 at 11:20 AM Vitaly Wool
> > > > > > > > > > <vitaly.wool@konsulko.com> wrote:
> > > > > > > > > > >
> > > > > > > > > > > On Mon, Dec 21, 2020 at 6:24 PM Minchan Kim <minchan@kernel.org> wrote:
> > > > > > > > > > > >
> > > > > > > > > > > > On Sun, Dec 20, 2020 at 02:22:28AM +0200, Vitaly Wool wrote:
> > > > > > > > > > > > > zsmalloc takes bit spinlock in its _map() callback and releases it
> > > > > > > > > > > > > only in unmap() which is unsafe and leads to zswap complaining
> > > > > > > > > > > > > about scheduling in atomic context.
> > > > > > > > > > > > >
> > > > > > > > > > > > > To fix that and to improve RT properties of zsmalloc, remove that
> > > > > > > > > > > > > bit spinlock completely and use a bit flag instead.
> > > > > > > > > > > >
> > > > > > > > > > > > I don't want to use such open code for the lock.
> > > > > > > > > > > >
> > > > > > > > > > > > I see from Mike's patch, recent zswap change introduced the lockdep
> > > > > > > > > > > > splat bug and you want to improve zsmalloc to fix the zswap bug and
> > > > > > > > > > > > introduce this patch with allowing preemption enabling.
> > > > > > > > > > >
> > > > > > > > > > > This understanding is upside down. The code in zswap you are referring
> > > > > > > > > > > to is not buggy. You may claim that it is suboptimal but there is
> > > > > > > > > > > nothing wrong in taking a mutex.
> > > > > > > > > >
> > > > > > > > > > Is this suboptimal for all or just the hardware accelerators? Sorry, I
> > > > > > > > > > am not very familiar with the crypto API. If I select lzo or lz4 as a
> > > > > > > > > > zswap compressor will the [de]compression be async or sync?
> > > > > > > > >
> > > > > > > > > Right now, in crypto subsystem, new drivers are required to write based
> > > > > > > > > on async APIs. The old sync API can't work in new accelerator drivers as
> > > > > > > > > they are not supported at all.
> > > > > > > > >
> > > > > > > > > Old drivers are used to sync, but they've got async wrappers to support
> > > > > > > > > async APIs. Eg.
> > > > > > > > > crypto: acomp - add support for lz4 via scomp
> > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/crypto/lz4.c?id=8cd9330e0a615c931037d4def98b5ce0d540f08d
> > > > > > > > >
> > > > > > > > > crypto: acomp - add support for lzo via scomp
> > > > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/crypto/lzo.c?id=ac9d2c4b39e022d2c61486bfc33b730cfd02898e
> > > > > > > > >
> > > > > > > > > so they are supporting async APIs but they are still working in sync
> > > > > > > > > mode as those old drivers don't sleep.
> > > > > > > >
> > > > > > > > Good to know that those are sync because I want them to be sync.
> > > > > > > > Please note that zswap is a cache in front of a real swap and the load
> > > > > > > > operation is latency sensitive as it comes in the page fault path and
> > > > > > > > directly impacts the applications. I doubt decompressing synchronously
> > > > > > > > a 4k page on a cpu will be costlier than asynchronously decompressing
> > > > > > > > the same page from hardware accelerators.
> > > > > > >
> > > > > > > If you read the old paper:
> > > > > > > https://www.ibm.com/support/pages/new-linux-zswap-compression-functionality
> > > > > > > Because the hardware accelerator speeds up compression, looking at the
> > > > > > > zswap metrics we observed that there were more store and load requests in
> > > > > > > a given amount of time, which filled up the zswap pool faster than a
> > > > > > > software compression run. Because of this behavior, we set the
> > > > > > > max_pool_percent parameter to 30 for the hardware compression runs - this
> > > > > > > means that zswap can use up to 30% of the 10GB of total memory.
> > > > > > >
> > > > > > > So using hardware accelerators, we get a chance to speed up compression
> > > > > > > while decreasing cpu utilization.
> > > > > > >
> > > > > > > BTW, if it is not easy to change zsmalloc, one quick workaround we might
> > > > > > > do in zswap is adding the below after applying Mike's original patch:
> > > > > > >
> > > > > > > if (in_atomic()) /* for zsmalloc */
> > > > > > >         while (!try_wait_for_completion(&req->done));
> > > > > > > else /* for zbud, z3fold */
> > > > > > >         crypto_wait_req(....);
> > > > > >
> > > > > > I don't think I'm going to ack this, sorry.
> > > > >
> > > > > Fair enough. And I am also thinking if we can move zpool_unmap_handle()
> > > > > right after zpool_map_handle() as below:
> > > > >
> > > > >        dlen = PAGE_SIZE;
> > > > >        src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO);
> > > > >        if (zpool_evictable(entry->pool->zpool))
> > > > >                src += sizeof(struct zswap_header);
> > > > > +      zpool_unmap_handle(entry->pool->zpool, entry->handle);
> > > > >
> > > > >        acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> > > > >        mutex_lock(acomp_ctx->mutex);
> > > > >        sg_init_one(&input, src, entry->length);
> > > > >        sg_init_table(&output, 1);
> > > > >        sg_set_page(&output, page, PAGE_SIZE, 0);
> > > > >        acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
> > > > >        ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
> > > > >        mutex_unlock(acomp_ctx->mutex);
> > > > >
> > > > > -      zpool_unmap_handle(entry->pool->zpool, entry->handle);
> > > > >
> > > > > Since src is always low memory, we only need its virtual address
> > > > > to get the page of src in sg_init_one(). We don't actually read it
> > > > > by CPU anywhere.
> > > >
> > > > The below code might be better:
> > > >
> > > >         dlen = PAGE_SIZE;
> > > >         src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO);
> > > >         if (zpool_evictable(entry->pool->zpool))
> > > >                 src += sizeof(struct zswap_header);
> > > >
> > > >         acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> > > >
> > > > +       zpool_unmap_handle(entry->pool->zpool, entry->handle);
> > > >
> > > >         mutex_lock(acomp_ctx->mutex);
> > > >         sg_init_one(&input, src, entry->length);
> > > >         sg_init_table(&output, 1);
> > > >         sg_set_page(&output, page, PAGE_SIZE, 0);
> > > >         acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
> > > >         ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
> > > >         mutex_unlock(acomp_ctx->mutex);
> > > >
> > > > -       zpool_unmap_handle(entry->pool->zpool, entry->handle);
> > >
> > > I don't see how this is going to work since we can't guarantee src
> > > will be a valid pointer after the zpool_unmap_handle() call, can we?
> > > Could you please elaborate?
> >
> > A valid pointer is for cpu to read and write. Here, cpu doesn't read
> > and write it, we only need to get the page struct from the address.
> >
> > void sg_init_one(struct scatterlist *sg, const void *buf, unsigned int buflen)
> > {
> >        sg_init_table(sg, 1);
> >        sg_set_buf(sg, buf, buflen);
> > }
> >
> > static inline void sg_set_buf(struct scatterlist *sg, const void *buf,
> >                               unsigned int buflen)
> > {
> > #ifdef CONFIG_DEBUG_SG
> >        BUG_ON(!virt_addr_valid(buf));
> > #endif
> >        sg_set_page(sg, virt_to_page(buf), buflen, offset_in_page(buf));
> > }
> >
> > sg_init_one() is always using an address which has a linear mapping
> > with physical address. So once we get the value of src, we can get
> > the page struct.
> >
> > src has a linear mapping with physical address. It doesn't require
> > the page table walk which vmalloc_to_page() wants.
> >
> > The req only requires the page to initialize the sg table; I think if
> > we are going to use a cpu-based (de)compression, the crypto
> > driver will kmap it again.
>
> Probably I made another bug here: for zsmalloc, it is possible to
> get highmem for the zpool since its malloc_support_movable = true.
>
>         if (zpool_malloc_support_movable(entry->pool->zpool))
>                 gfp |= __GFP_HIGHMEM | __GFP_MOVABLE;
>         ret = zpool_malloc(entry->pool->zpool, hlen + dlen, gfp, &handle);
>
> For a 64-bit system, there is never highmem. For a 32-bit system, we may
> trigger this bug.
>
> So actually zswap should have used kmap_to_page(), which can support
> both linear mapping and non-linear mapping; sg_init_one() only supports
> linear mapping.
>
> But it doesn't change the fact: once req is initialized with the page
> struct, we can unmap src. If we are going to use a HW accelerator,
> it would be a DMA; if we are going to use CPU decompression, the crypto
> driver will kmap() again.

I'm still not convinced. Will kmap what, src? At this point src might
become just a bogus pointer. Why couldn't the object have been moved
somewhere else (due to the compaction mechanism, for instance) by the
time DMA kicks in?

> > ~Vitaly
>
> Thanks
> Barry
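A minimal sketch of the kmap_to_page()-based scatterlist setup discussed
above, for illustration only: it is hypothetical and untested, the helper
name is invented, and it merely shows how the page behind src could be
obtained for both lowmem (linear-mapped) and kmap'd highmem addresses.
kmap_to_page(), sg_init_table(), sg_set_page() and offset_in_page() are
existing kernel interfaces, but this is not the actual zswap code, and it
does not address the object-migration (compaction) concern raised above.

#include <linux/highmem.h>
#include <linux/mm.h>
#include <linux/scatterlist.h>

/*
 * Hypothetical helper: build a single-entry scatterlist for src.
 * Unlike sg_init_one(), which assumes virt_to_page() is valid for src,
 * kmap_to_page() also resolves kmap'd highmem addresses to the right
 * struct page, falling back to the linear mapping for lowmem addresses.
 */
static void sg_set_src_page(struct scatterlist *input,
			    const void *src, unsigned int len)
{
	sg_init_table(input, 1);
	sg_set_page(input, kmap_to_page((void *)src), len,
		    offset_in_page(src));
}

Whether the underlying zsmalloc object may then be unmapped (and possibly
migrated) before an asynchronous decompression completes is exactly the
question left open in the exchange above.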