From: "Song Bao Hua (Barry Song)" <song.bao.hua@hisilicon.com>
To: Vitaly Wool <vitaly.wool@konsulko.com>
Cc: Shakeel Butt <shakeelb@google.com>,
	Minchan Kim <minchan@kernel.org>,
	"Mike Galbraith" <efault@gmx.de>,
	LKML <linux-kernel@vger.kernel.org>,
	linux-mm <linux-mm@kvack.org>,
	Sebastian Andrzej Siewior <bigeasy@linutronix.de>,
	NitinGupta <ngupta@vflare.org>,
	Sergey Senozhatsky <sergey.senozhatsky.work@gmail.com>,
	Andrew Morton <akpm@linux-foundation.org>
Subject: RE: [PATCH] zsmalloc: do not use bit_spin_lock
Date: Mon, 21 Dec 2020 23:35:27 +0000
Message-ID: <f0ca46a830e54f4482fb4f46df9675f5@hisilicon.com>
In-Reply-To: CAM4kBB+xUa8zXSRSuB0z5FCdPNmUpDfcC4Vqu7wzAkf0b+RXqw@mail.gmail.com



> -----Original Message-----
> From: Song Bao Hua (Barry Song)
> Sent: Tuesday, December 22, 2020 11:38 AM
> To: 'Vitaly Wool' <vitaly.wool@konsulko.com>
> Cc: Shakeel Butt <shakeelb@google.com>; Minchan Kim <minchan@kernel.org>; Mike
> Galbraith <efault@gmx.de>; LKML <linux-kernel@vger.kernel.org>; linux-mm
> <linux-mm@kvack.org>; Sebastian Andrzej Siewior <bigeasy@linutronix.de>;
> NitinGupta <ngupta@vflare.org>; Sergey Senozhatsky
> <sergey.senozhatsky.work@gmail.com>; Andrew Morton
> <akpm@linux-foundation.org>
> Subject: RE: [PATCH] zsmalloc: do not use bit_spin_lock
> 
> 
> 
> > -----Original Message-----
> > From: Vitaly Wool [mailto:vitaly.wool@konsulko.com]
> > Sent: Tuesday, December 22, 2020 11:12 AM
> > To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
> > Cc: Shakeel Butt <shakeelb@google.com>; Minchan Kim <minchan@kernel.org>;
> > Mike Galbraith <efault@gmx.de>; LKML <linux-kernel@vger.kernel.org>; linux-mm
> > <linux-mm@kvack.org>; Sebastian Andrzej Siewior <bigeasy@linutronix.de>;
> > NitinGupta <ngupta@vflare.org>; Sergey Senozhatsky
> > <sergey.senozhatsky.work@gmail.com>; Andrew Morton
> > <akpm@linux-foundation.org>
> > Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock
> >
> > On Mon, Dec 21, 2020 at 10:30 PM Song Bao Hua (Barry Song)
> > <song.bao.hua@hisilicon.com> wrote:
> > >
> > >
> > >
> > > > -----Original Message-----
> > > > From: Shakeel Butt [mailto:shakeelb@google.com]
> > > > Sent: Tuesday, December 22, 2020 10:03 AM
> > > > To: Song Bao Hua (Barry Song) <song.bao.hua@hisilicon.com>
> > > > Cc: Vitaly Wool <vitaly.wool@konsulko.com>; Minchan Kim <minchan@kernel.org>;
> > > > Mike Galbraith <efault@gmx.de>; LKML <linux-kernel@vger.kernel.org>; linux-mm
> > > > <linux-mm@kvack.org>; Sebastian Andrzej Siewior <bigeasy@linutronix.de>;
> > > > NitinGupta <ngupta@vflare.org>; Sergey Senozhatsky
> > > > <sergey.senozhatsky.work@gmail.com>; Andrew Morton
> > > > <akpm@linux-foundation.org>
> > > > Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock
> > > >
> > > > On Mon, Dec 21, 2020 at 12:06 PM Song Bao Hua (Barry Song)
> > > > <song.bao.hua@hisilicon.com> wrote:
> > > > >
> > > > >
> > > > >
> > > > > > -----Original Message-----
> > > > > > From: Shakeel Butt [mailto:shakeelb@google.com]
> > > > > > Sent: Tuesday, December 22, 2020 8:50 AM
> > > > > > To: Vitaly Wool <vitaly.wool@konsulko.com>
> > > > > > Cc: Minchan Kim <minchan@kernel.org>; Mike Galbraith <efault@gmx.de>; LKML
> > > > > > <linux-kernel@vger.kernel.org>; linux-mm <linux-mm@kvack.org>; Song Bao Hua
> > > > > > (Barry Song) <song.bao.hua@hisilicon.com>; Sebastian Andrzej Siewior
> > > > > > <bigeasy@linutronix.de>; NitinGupta <ngupta@vflare.org>; Sergey Senozhatsky
> > > > > > <sergey.senozhatsky.work@gmail.com>; Andrew Morton
> > > > > > <akpm@linux-foundation.org>
> > > > > > Subject: Re: [PATCH] zsmalloc: do not use bit_spin_lock
> > > > > >
> > > > > > On Mon, Dec 21, 2020 at 11:20 AM Vitaly Wool <vitaly.wool@konsulko.com> wrote:
> > > > > > >
> > > > > > > On Mon, Dec 21, 2020 at 6:24 PM Minchan Kim <minchan@kernel.org> wrote:
> > > > > > > >
> > > > > > > > On Sun, Dec 20, 2020 at 02:22:28AM +0200, Vitaly Wool wrote:
> > > > > > > > > zsmalloc takes a bit spinlock in its _map() callback and releases it
> > > > > > > > > only in unmap(), which is unsafe and leads to zswap complaining
> > > > > > > > > about scheduling in atomic context.
> > > > > > > > >
> > > > > > > > > To fix that and to improve RT properties of zsmalloc, remove that
> > > > > > > > > bit spinlock completely and use a bit flag instead.
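A minimal userspace model of the failure mode described in the patch, not the zsmalloc code itself: bit 0 of a word serves as the lock bit, and a hypothetical counter `fake_preempt_count` stands in for the kernel's preemption count, which bit_spin_lock() raises before spinning. `might_sleep_ok()` is a hypothetical stand-in for the might_sleep() debug check. Because the lock is taken in map and dropped only in unmap, any sleeping call made between the two runs in atomic context.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model: bit 0 of a handle word acts as the lock bit. */
#define LOCK_MASK 1UL

static atomic_ulong handle_word;   /* word that carries the lock bit */
static int fake_preempt_count;     /* hypothetical stand-in for preempt_count() */

/* Like bit_spin_lock(): disable "preemption", then spin until we own the bit. */
static void model_bit_spin_lock(void)
{
	fake_preempt_count++;
	while (atomic_fetch_or(&handle_word, LOCK_MASK) & LOCK_MASK)
		; /* another holder owns the bit: busy-wait */
}

/* Like bit_spin_unlock(): release the bit, then re-enable "preemption". */
static void model_bit_spin_unlock(void)
{
	unsigned long old = atomic_fetch_and(&handle_word, ~LOCK_MASK);

	assert(old & LOCK_MASK); /* unlocking a bit we do not hold is a bug */
	fake_preempt_count--;
}

/* False wherever the kernel's might_sleep() debug check would complain. */
static bool might_sleep_ok(void)
{
	return fake_preempt_count == 0;
}

/* The pattern under discussion: the lock is taken in map and dropped in unmap. */
static void model_map(void)   { model_bit_spin_lock(); }
static void model_unmap(void) { model_bit_spin_unlock(); }
```

Calling model_map(), then something that may sleep (zswap's mutex_lock() in this thread), then model_unmap() is exactly the sequence in which might_sleep_ok() returns false, i.e. where the kernel's check would complain.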
> > > > > > > >
> > > > > > > > I don't want to use such open-coded locking.
> > > > > > > >
> > > > > > > > I see from Mike's patch that a recent zswap change introduced the
> > > > > > > > lockdep splat bug, and you want to improve zsmalloc to fix the zswap
> > > > > > > > bug, introducing this patch to allow preemption to be enabled.
> > > > > > >
> > > > > > > This understanding is upside down. The code in zswap you are referring
> > > > > > > to is not buggy.  You may claim that it is suboptimal but there is
> > > > > > > nothing wrong in taking a mutex.
> > > > > > >
> > > > > >
> > > > > > Is this suboptimal for all compressors or just for hardware accelerators?
> > > > > > Sorry, I am not very familiar with the crypto API. If I select lzo or lz4
> > > > > > as a zswap compressor, will the [de]compression be async or sync?
> > > > >
> > > > > Right now, in the crypto subsystem, new drivers are required to be written
> > > > > against the async APIs. The old sync API can't be used with new accelerator
> > > > > drivers, as they don't support it at all.
> > > > >
> > > > > Old drivers used to be sync, but they've got async wrappers to support the
> > > > > async APIs, e.g.:
> > > > > crypto: acomp - add support for lz4 via scomp
> > > > >
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/crypto/lz4.c?id=8cd9330e0a615c931037d4def98b5ce0d540f08d
> > > > >
> > > > > crypto: acomp - add support for lzo via scomp
> > > > >
> > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/crypto/lzo.c?id=ac9d2c4b39e022d2c61486bfc33b730cfd02898e
> > > > >
> > > > > So they support the async APIs, but they still work in sync mode, as
> > > > > those old drivers don't sleep.
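To picture an async-style interface sitting on top of a synchronous, non-sleeping compressor, here is a hypothetical userspace model. `struct model_req`, `sync_copy_decompress()` and `model_async_decompress()` are illustrative names that do not mirror the real acomp/scomp structures, and the "decompressor" merely copies bytes. The only point is that the entry point finishes the work before it returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical request: plain buffers, not the real acomp_req layout. */
struct model_req {
	const unsigned char *src;
	size_t slen;
	unsigned char *dst;
	size_t dlen;   /* in: capacity of dst; out: bytes produced */
};

/* Stand-in for a synchronous, non-sleeping decompressor: it only copies. */
static int sync_copy_decompress(const unsigned char *src, size_t slen,
				unsigned char *dst, size_t *dlen)
{
	assert(src != NULL && dst != NULL && dlen != NULL);
	if (slen > *dlen)
		return -ENOSPC;
	memcpy(dst, src, slen);
	*dlen = slen;
	return 0;
}

/*
 * Async-style entry point backed by the synchronous routine. The work is
 * finished before it returns, so it reports the final status (0 or a
 * negative errno) instead of -EINPROGRESS; a caller that only waits on
 * -EINPROGRESS/-EBUSY therefore never blocks.
 */
static int model_async_decompress(struct model_req *req)
{
	return sync_copy_decompress(req->src, req->slen, req->dst, &req->dlen);
}
```

A crypto_wait_req()-style caller that receives 0 or a negative errno from such an entry point returns immediately, which is the sense in which these wrapped drivers still run in sync mode.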
> > > > >
> > > >
> > > > Good to know that those are sync because I want them to be sync.
> > > > Please note that zswap is a cache in front of a real swap and the load
> > > > operation is latency sensitive as it comes in the page fault path and
> > > > directly impacts the applications. I doubt decompressing synchronously
> > > > a 4k page on a cpu will be costlier than asynchronously decompressing
> > > > the same page from hardware accelerators.
> > >
> > > From the old paper:
> > >
> > > https://www.ibm.com/support/pages/new-linux-zswap-compression-functionality
> > > Because the hardware accelerator speeds up compression, looking at the zswap
> > > metrics we observed that there were more store and load requests in a given
> > > amount of time, which filled up the zswap pool faster than a software
> > > compression run. Because of this behavior, we set the max_pool_percent
> > > parameter to 30 for the hardware compression runs - this means that zswap
> > > can use up to 30% of the 10GB of total memory.
> > >
> > > So using hardware accelerators, we get a chance to speed up compression
> > > while decreasing cpu utilization.
> > >
> > > BTW, if it is not easy to change zsmalloc, one quick workaround we might do
> > > in zswap is to add the code below after applying Mike's original patch:
> > >
> > > if (in_atomic()) /* for zsmalloc */
> > >         while (!try_wait_for_completion(&req->done))
> > >                 ; /* busy-wait: sleeping is not allowed here */
> > > else /* for zbud, z3fold */
> > >         crypto_wait_req(....);
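The workaround hinges on the difference between polling a completion and sleeping on it. The userspace sketch below uses C11 atomics; `struct model_completion`, `model_try_wait()` and `model_poll_wait()` are hypothetical names, modelled loosely on complete() and try_wait_for_completion(), which consumes one completion if one is available and otherwise returns at once. The polling loop never sleeps, so it does not trip the atomic-context check, but it keeps the CPU busy until another context signals completion.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical completion: a count of signals not yet consumed. */
struct model_completion {
	atomic_uint done;
};

static void model_init_completion(struct model_completion *x)
{
	atomic_init(&x->done, 0);
}

/* Like complete(): make one more signal available. */
static void model_complete(struct model_completion *x)
{
	atomic_fetch_add(&x->done, 1);
}

/* Like try_wait_for_completion(): consume one signal if present, never block. */
static bool model_try_wait(struct model_completion *x)
{
	unsigned int cur = atomic_load(&x->done);

	while (cur != 0) {
		/* On failure, cur is refreshed with the current count and we retry. */
		if (atomic_compare_exchange_weak(&x->done, &cur, cur - 1))
			return true;
	}
	return false;
}

/* The busy-wait from the workaround: it spins but never sleeps. */
static void model_poll_wait(struct model_completion *x)
{
	while (!model_try_wait(x))
		; /* keeps the CPU busy until another context calls model_complete() */
}
```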
> >
> > I don't think I'm going to ack this, sorry.
> >
> 
> Fair enough. I am also wondering whether we can move zpool_unmap_handle()
> to right after zpool_map_handle(), as below:
> 
> 	dlen = PAGE_SIZE;
> 	src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO);
> 	if (zpool_evictable(entry->pool->zpool))
> 		src += sizeof(struct zswap_header);
> +	zpool_unmap_handle(entry->pool->zpool, entry->handle);
> 
> 	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);
> 	mutex_lock(acomp_ctx->mutex);
> 	sg_init_one(&input, src, entry->length);
> 	sg_init_table(&output, 1);
> 	sg_set_page(&output, page, PAGE_SIZE, 0);
> 	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
> 	ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
> 	mutex_unlock(acomp_ctx->mutex);
> 
> -	zpool_unmap_handle(entry->pool->zpool, entry->handle);
> 
> This should be fine since src is always in low memory and we only need its
> virtual address to get the page of src in sg_init_one(); the CPU doesn't
> actually read it anywhere.
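The argument relies on sg_init_one() deriving the page and offset from the buffer's virtual address (through virt_to_page() and offset_in_page()), which for lowmem is plain arithmetic on the direct map. The sketch below models that arithmetic in userspace; `MODEL_PAGE_OFFSET` and the 4 KiB page size are assumed example values rather than any specific architecture's layout, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed example values, not taken from any particular architecture. */
#define MODEL_PAGE_SHIFT  12
#define MODEL_PAGE_SIZE   ((uint64_t)1 << MODEL_PAGE_SHIFT)
#define MODEL_PAGE_OFFSET 0xffff888000000000ULL /* start of the direct map */

/* Like __pa() on a direct-mapped (lowmem) address: pure arithmetic. */
static uint64_t model_virt_to_phys(uint64_t vaddr)
{
	assert(vaddr >= MODEL_PAGE_OFFSET); /* only valid inside the direct map */
	return vaddr - MODEL_PAGE_OFFSET;
}

/* Like virt_to_page(), but returning a page frame number, not a struct page. */
static uint64_t model_virt_to_pfn(uint64_t vaddr)
{
	return model_virt_to_phys(vaddr) >> MODEL_PAGE_SHIFT;
}

/* Like offset_in_page(). */
static uint64_t model_offset_in_page(uint64_t vaddr)
{
	return vaddr & (MODEL_PAGE_SIZE - 1);
}
```

Neither helper reads the memory at vaddr; they only transform the address.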

The code below might be better:

	dlen = PAGE_SIZE;
	src = zpool_map_handle(entry->pool->zpool, entry->handle, ZPOOL_MM_RO);
	if (zpool_evictable(entry->pool->zpool))
		src += sizeof(struct zswap_header);

	acomp_ctx = raw_cpu_ptr(entry->pool->acomp_ctx);

+	zpool_unmap_handle(entry->pool->zpool, entry->handle);

	mutex_lock(acomp_ctx->mutex);
	sg_init_one(&input, src, entry->length);
	sg_init_table(&output, 1);
	sg_set_page(&output, page, PAGE_SIZE, 0);
	acomp_request_set_params(acomp_ctx->req, &input, &output, entry->length, dlen);
	ret = crypto_wait_req(crypto_acomp_decompress(acomp_ctx->req), &acomp_ctx->wait);
	mutex_unlock(acomp_ctx->mutex);

-	zpool_unmap_handle(entry->pool->zpool, entry->handle);

> 
> > Best regards,
> >    Vitaly
> >
> > > crypto_wait_req() is actually doing wait_for_completion():
> > > static inline int crypto_wait_req(int err, struct crypto_wait *wait)
> > > {
> > >         switch (err) {
> > >         case -EINPROGRESS:
> > >         case -EBUSY:
> > >                 wait_for_completion(&wait->completion);
> > >                 reinit_completion(&wait->completion);
> > >                 err = wait->err;
> > >                 break;
> > >         }
> > >
> > >         return err;
> > > }
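For readers who want to experiment with the semantics of crypto_wait_req() outside the kernel, the sketch below rebuilds the same switch with POSIX threads. `struct model_crypto_wait`, `model_req_done()` and `model_wait_req()` are hypothetical names; a mutex/condition-variable pair plays the role of struct completion. As in the quoted code, only -EINPROGRESS and -EBUSY make the caller block; any other value is returned unchanged.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stdbool.h>

/* Hypothetical userspace stand-in for struct crypto_wait. */
struct model_crypto_wait {
	pthread_mutex_t lock;
	pthread_cond_t cond;
	bool completed;   /* plays the role of the completion's done state */
	int err;          /* final status filled in by the completer */
};

static void model_crypto_wait_init(struct model_crypto_wait *w)
{
	pthread_mutex_init(&w->lock, NULL);
	pthread_cond_init(&w->cond, NULL);
	w->completed = false;
	w->err = 0;
}

/* Completer side: record the result and wake the waiter (like complete()). */
static void model_req_done(struct model_crypto_wait *w, int err)
{
	pthread_mutex_lock(&w->lock);
	w->err = err;
	w->completed = true;
	pthread_cond_signal(&w->cond);
	pthread_mutex_unlock(&w->lock);
}

/* Mirrors crypto_wait_req(): block only if the request is still in flight. */
static int model_wait_req(int err, struct model_crypto_wait *w)
{
	switch (err) {
	case -EINPROGRESS:
	case -EBUSY:
		pthread_mutex_lock(&w->lock);
		while (!w->completed)          /* wait_for_completion() */
			pthread_cond_wait(&w->cond, &w->lock);
		w->completed = false;          /* reinit_completion() */
		err = w->err;
		pthread_mutex_unlock(&w->lock);
		break;
	}
	return err;
}
```

A synchronous backend that returns 0 therefore never reaches the wait, while a genuinely asynchronous one makes the caller sleep in pthread_cond_wait(), the analogue of wait_for_completion().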

Thanks
Barry
