From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: X-Spam-Checker-Version: SpamAssassin 3.4.0 (2014-02-07) on aws-us-west-2-korg-lkml-1.web.codeaurora.org X-Spam-Level: X-Spam-Status: No, score=-6.8 required=3.0 tests=DKIMWL_WL_HIGH,DKIM_SIGNED, DKIM_VALID,DKIM_VALID_AU,HEADER_FROM_DIFFERENT_DOMAINS,INCLUDES_PATCH, MAILING_LIST_MULTI,SIGNED_OFF_BY,SPF_HELO_NONE,SPF_PASS,URIBL_BLOCKED autolearn=ham autolearn_force=no version=3.4.0 Received: from mail.kernel.org (mail.kernel.org [198.145.29.99]) by smtp.lore.kernel.org (Postfix) with ESMTP id F002DC4CEC4 for ; Mon, 23 Sep 2019 20:15:07 +0000 (UTC) Received: from kanga.kvack.org (kanga.kvack.org [205.233.56.17]) by mail.kernel.org (Postfix) with ESMTP id 953DD20820 for ; Mon, 23 Sep 2019 20:15:07 +0000 (UTC) Authentication-Results: mail.kernel.org; dkim=pass (1024-bit key) header.d=ieee.org header.i=@ieee.org header.b="bkgRAK8T" DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 953DD20820 Authentication-Results: mail.kernel.org; dmarc=fail (p=none dis=none) header.from=ieee.org Authentication-Results: mail.kernel.org; spf=pass smtp.mailfrom=owner-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix) id 28AF76B0272; Mon, 23 Sep 2019 16:15:07 -0400 (EDT) Received: by kanga.kvack.org (Postfix, from userid 40) id 23B7B6B0274; Mon, 23 Sep 2019 16:15:07 -0400 (EDT) X-Delivered-To: int-list-linux-mm@kvack.org Received: by kanga.kvack.org (Postfix, from userid 63042) id 129FD6B0275; Mon, 23 Sep 2019 16:15:07 -0400 (EDT) X-Delivered-To: linux-mm@kvack.org Received: from forelay.hostedemail.com (smtprelay0115.hostedemail.com [216.40.44.115]) by kanga.kvack.org (Postfix) with ESMTP id E268F6B0272 for ; Mon, 23 Sep 2019 16:15:06 -0400 (EDT) Received: from smtpin09.hostedemail.com (10.5.19.251.rfc1918.com [10.5.19.251]) by forelay05.hostedemail.com (Postfix) with SMTP id 683F8181AC9BA for ; Mon, 23 Sep 2019 20:15:06 +0000 (UTC) X-FDA: 75967289412.09.guide26_66ae8c44d075e X-HE-Tag: guide26_66ae8c44d075e X-Filterd-Recvd-Size: 12869 Received: from mail-io1-f65.google.com (mail-io1-f65.google.com [209.85.166.65]) by imf45.hostedemail.com (Postfix) with ESMTP for ; Mon, 23 Sep 2019 20:15:05 +0000 (UTC) Received: by mail-io1-f65.google.com with SMTP id c6so23499514ioo.13 for ; Mon, 23 Sep 2019 13:15:05 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=ieee.org; s=google; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=Lp6TMonzbuKlj33/BD70j5mcW8fMXi5umrwspsU0yM0=; b=bkgRAK8T3TSB8yADJditMdMUoiZ+K2NcwqoDPIgl7k65NTZgnQe+6AqAyePYWLy6Ju YSZfIBzKKQq5CzFsQrRlizYDZWrDJMWKvJ44aVNFe+3Qv+wpuwLxI2wYkWGvmPHmW3W3 f93dMu1/W9JMNKc0jYWvqy2yYCUHWu5Q1Rvwg= X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=Lp6TMonzbuKlj33/BD70j5mcW8fMXi5umrwspsU0yM0=; b=O2KaFGEQKgnMrfJWMMmXlWKocrgKImP6t6bVgud653gKOYfZvDZHsvnOI4uRyCg2Ue ebEI0hd9IgSDtOwyWulfNg9y3PytCTKQUfUhBSmnjcf7ErtzWpz9saMkZTSGm8aYINMr koRsCBuGgcUsv6fvQX5jmGKtiomhM9Wf7Rd2Rnv49KTvlQVjP3Oxys16mX2nYCE5QUIf jPvVNSaUQ6LEAUxTy2Mo1DF+eT12jgXKauwD3hlB/ZzgjxKUja1Lu7jZ0X0LqeUw9kXC aO2eLqCZQkLhxEvrJa9+JNtDS9wcP2MH2GdT2uGhXNr5Qdw2BZc8SuOFyMgaDYDQR7db 73Ew== X-Gm-Message-State: APjAAAXDjWqBAq9PBhwzyMpvEqeMaTT7Ony9edWh5b5MVBPOpUA/JPtB aQ/xkkOkYWZdtTzsoZAz4dpRil9AN3begGewp6w= X-Google-Smtp-Source: APXvYqwO5Em5+XJuiPV4ZrZ53a6N6x5NEVARmGtsCO+2CLPZiMl0XC+iZXFMJtOR+QaIV4DwlepFcqhJToxVRZ72Y6k= X-Received: by 2002:a02:9443:: with SMTP id a61mr1412409jai.35.1569269704624; Mon, 23 Sep 2019 13:15:04 -0700 (PDT) MIME-Version: 1.0 References: <1569209505-15801-1-git-send-email-teawaterz@linux.alibaba.com> In-Reply-To: <1569209505-15801-1-git-send-email-teawaterz@linux.alibaba.com> From: Dan Streetman Date: Mon, 23 Sep 2019 16:14:27 -0400 Message-ID: Subject: Re: [RFC v3] zswap: Add CONFIG_ZSWAP_IO_SWITCH to handle swap IO issue To: Hui Zhu Cc: Seth Jennings , Andrew Morton , Michal Hocko , Matthew Wilcox , chris@chris-wilson.co.uk, Johannes Weiner , ziqian.lzq@antfin.com, osandov@fb.com, Huang Ying , aryabinin@virtuozzo.com, vovoy@chromium.org, richard.weiyang@gmail.com, jgg@ziepe.ca, dan.j.williams@intel.com, rppt@linux.ibm.com, jglisse@redhat.com, b.zolnierkie@samsung.com, axboe@kernel.dk, dennis@kernel.org, Josef Bacik , tj@kernel.org, oleg@redhat.com, Randy Dunlap , linux-kernel , Linux-MM Content-Type: text/plain; charset="UTF-8" X-Bogosity: Ham, tests=bogofilter, spamicity=0.000000, version=1.2.4 Sender: owner-linux-mm@kvack.org Precedence: bulk X-Loop: owner-majordomo@kvack.org List-ID: On Sun, Sep 22, 2019 at 11:32 PM Hui Zhu wrote: > > This is the third version of this patch. The first and second version > is in [1] and [2]. > This verion is updated according to the comments from Randy Dunlap > in [3]. > > Currently, I use a VM that has 2 CPUs, 4G memory and 4G swap file. > I found that swap will affect the IO performance when it is running. > So I open zswap to handle it because it just use CPU cycles but not > disk IO. > > It work OK but I found that zswap is slower than normal swap in this > VM. zswap is about 300M/s and normal swap is about 500M/s. (The reason > is disk inside VM has fscache in host machine.) I must be missing something here - if zswap in the guest is *slower* than real swap, why are you using zswap? Also, I don't see why zswap is slower than normal swap, unless you mean that your zswap is full, since once zswap fills up any additional swap will absolutely be slower than not having zswap at all. > So open zswap is make memory shrinker slower but good for IO performance > in this VM. > So I just want zswap work when the disk of the swap file is under high > IO load. > > This commit is designed for this idea. > It add two parameters read_in_flight_limit and write_in_flight_limit to > zswap. > In zswap_frontswap_store, pages will be stored to zswap only when > the IO in flight number of swap device is bigger than > zswap_read_in_flight_limit or zswap_write_in_flight_limit > when zswap is enabled. > Then the zswap just work when the IO in flight number of swap device > is low. Ok, so maybe I understand what you mean, your disk I/O is normally very fast, but once your host-side cache is full it starts actually writing to your host physical disk, and your guest swap I/O drops way down (since caching pages in host memory is much faster than writing to a host physical disk). Is that what's going on? That was not clear at all to me from the commit description... In general I think the description of this commit, as well as the docs and even user interface of how to use it, is very confusing. I can see how it would be beneficial in this specific situation, but I'm not a fan of the implementation, and I'm very concerned that nobody will be able to understand how to use it properly - when should they enable it? What limit values should they use? Why are there separate read and write limits? None of that is clear to me, and I'm fairly certainly it would not be clear to other normal users. Is there a better way this can be done? > > [1] https://lkml.org/lkml/2019/9/11/935 > [2] https://lkml.org/lkml/2019/9/20/90 > [3] https://lkml.org/lkml/2019/9/20/1076 > > Signed-off-by: Hui Zhu > --- > include/linux/swap.h | 3 +++ > mm/Kconfig | 18 ++++++++++++++++ > mm/page_io.c | 16 +++++++++++++++ > mm/zswap.c | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++ > 4 files changed, 95 insertions(+) > > diff --git a/include/linux/swap.h b/include/linux/swap.h > index de2c67a..82b621f 100644 > --- a/include/linux/swap.h > +++ b/include/linux/swap.h > @@ -389,6 +389,9 @@ extern void end_swap_bio_write(struct bio *bio); > extern int __swap_writepage(struct page *page, struct writeback_control *wbc, > bio_end_io_t end_write_func); > extern int swap_set_page_dirty(struct page *page); > +#ifdef CONFIG_ZSWAP_IO_SWITCH > +extern void swap_io_in_flight(struct page *page, unsigned int inflight[2]); > +#endif > > int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page, > unsigned long nr_pages, sector_t start_block); > diff --git a/mm/Kconfig b/mm/Kconfig > index 56cec63..387c3b5 100644 > --- a/mm/Kconfig > +++ b/mm/Kconfig > @@ -546,6 +546,24 @@ config ZSWAP > they have not be fully explored on the large set of potential > configurations and workloads that exist. > > +config ZSWAP_IO_SWITCH > + bool "Compressed cache for swap pages according to the IO status" > + depends on ZSWAP > + help > + This function helps the system that normal swap speed is higher > + than zswap speed to handle the swap IO issue. > + For example, a VM where the disk device is not set cache config or > + set cache=writeback. > + > + This function makes zswap just work when the disk of the swap file > + is under high IO load. > + It add two parameters (read_in_flight_limit and > + write_in_flight_limit) to zswap. When zswap is enabled, pages will > + be stored to zswap only when the IO in flight number of swap device > + is bigger than zswap_read_in_flight_limit or > + zswap_write_in_flight_limit. > + If unsure, say "n". > + > config ZPOOL > tristate "Common API for compressed memory storage" > help > diff --git a/mm/page_io.c b/mm/page_io.c > index 24ee600..e66b050 100644 > --- a/mm/page_io.c > +++ b/mm/page_io.c > @@ -434,3 +434,19 @@ int swap_set_page_dirty(struct page *page) > return __set_page_dirty_no_writeback(page); > } > } > + > +#ifdef CONFIG_ZSWAP_IO_SWITCH > +void swap_io_in_flight(struct page *page, unsigned int inflight[2]) > +{ > + struct swap_info_struct *sis = page_swap_info(page); > + > + if (!sis->bdev) { > + inflight[0] = 0; > + inflight[1] = 0; > + return; > + } > + > + part_in_flight_rw(bdev_get_queue(sis->bdev), sis->bdev->bd_part, > + inflight); this potentially will read inflight stats info from all possible cpus, that's not something I'm a big fan of adding to every single page swap call...it's not awful, but there might be scaling issues for systems with lots of cpus. > +} > +#endif > diff --git a/mm/zswap.c b/mm/zswap.c > index 0e22744..0190b2d 100644 > --- a/mm/zswap.c > +++ b/mm/zswap.c > @@ -62,6 +62,14 @@ static u64 zswap_reject_compress_poor; > static u64 zswap_reject_alloc_fail; > /* Store failed because the entry metadata could not be allocated (rare) */ > static u64 zswap_reject_kmemcache_fail; > +#ifdef CONFIG_ZSWAP_IO_SWITCH > +/* > + * Store failed because zswap_read_in_flight_limit or > + * zswap_write_in_flight_limit is bigger than IO in flight number of > + * swap device > + */ > +static u64 zswap_reject_io; > +#endif > /* Duplicate store was encountered (rare) */ > static u64 zswap_duplicate_entry; > > @@ -114,6 +122,24 @@ static bool zswap_same_filled_pages_enabled = true; > module_param_named(same_filled_pages_enabled, zswap_same_filled_pages_enabled, > bool, 0644); > > +#ifdef CONFIG_ZSWAP_IO_SWITCH > +/* > + * zswap will not try to store the page if zswap_read_in_flight_limit is > + * bigger than IO read in flight number of swap device > + */ > +static unsigned int zswap_read_in_flight_limit; > +module_param_named(read_in_flight_limit, zswap_read_in_flight_limit, > + uint, 0644); > + > +/* > + * zswap will not try to store the page if zswap_write_in_flight_limit is > + * bigger than IO write in flight number of swap device > + */ > +static unsigned int zswap_write_in_flight_limit; > +module_param_named(write_in_flight_limit, zswap_write_in_flight_limit, > + uint, 0644); > +#endif > + > /********************************* > * data structures > **********************************/ > @@ -1009,6 +1035,34 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset, > goto reject; > } > > +#ifdef CONFIG_ZSWAP_IO_SWITCH > + if (zswap_read_in_flight_limit || zswap_write_in_flight_limit) { > + unsigned int inflight[2]; > + bool should_swap = false; > + > + swap_io_in_flight(page, inflight); > + > + if (zswap_write_in_flight_limit && > + inflight[1] < zswap_write_in_flight_limit) > + should_swap = true; > + > + if (zswap_read_in_flight_limit && > + (should_swap || > + (!should_swap && !zswap_write_in_flight_limit))) { > + if (inflight[0] < zswap_read_in_flight_limit) > + should_swap = true; > + else > + should_swap = false; > + } > + > + if (should_swap) { > + zswap_reject_io++; > + ret = -EIO; > + goto reject; > + } > + } > +#endif > + > /* reclaim space if needed */ > if (zswap_is_full()) { > zswap_pool_limit_hit++; > @@ -1264,6 +1318,10 @@ static int __init zswap_debugfs_init(void) > zswap_debugfs_root, &zswap_reject_kmemcache_fail); > debugfs_create_u64("reject_compress_poor", 0444, > zswap_debugfs_root, &zswap_reject_compress_poor); > +#ifdef CONFIG_ZSWAP_IO_SWITCH > + debugfs_create_u64("reject_io", 0444, "reject_io" is not very clear about why it was rejected; I think most people will assume this means pages were rejected because of I/O errors, not because the I/O inflight page count was lower than the set limit. > + zswap_debugfs_root, &zswap_reject_io); > +#endif > debugfs_create_u64("written_back_pages", 0444, > zswap_debugfs_root, &zswap_written_back_pages); > debugfs_create_u64("duplicate_entry", 0444, > -- > 2.7.4 >