linux-mm.kvack.org archive mirror
* [RFC v3] zswap: Add CONFIG_ZSWAP_IO_SWITCH to handle swap IO issue
From: Hui Zhu @ 2019-09-23  3:31 UTC
  To: sjenning, ddstreet, akpm, mhocko, willy, chris, hannes,
	ziqian.lzq, osandov, ying.huang, aryabinin, vovoy,
	richard.weiyang, jgg, dan.j.williams, rppt, jglisse,
	b.zolnierkie, axboe, dennis, josef, tj, oleg, rdunlap,
	linux-kernel, linux-mm
  Cc: Hui Zhu

This is the third version of this patch.  The first and second versions
are in [1] and [2].
This version is updated according to the comments from Randy Dunlap
in [3].

Currently, I use a VM that has 2 CPUs, 4G of memory, and a 4G swap file.
I found that swapping hurts IO performance while it is running, so I
enabled zswap to handle it, because zswap only uses CPU cycles and no
disk IO.

It works, but I found that zswap is slower than normal swap in this
VM: zswap is about 300M/s and normal swap is about 500M/s.  (The reason
is that the VM's disk is cached by the host.)
So enabling zswap makes the memory shrinker slower but is good for IO
performance in this VM.
So I just want zswap to work when the disk backing the swap file is
under high IO load.

This commit implements that idea.
It adds two parameters, read_in_flight_limit and write_in_flight_limit,
to zswap.
In zswap_frontswap_store, when zswap is enabled, pages are stored to
zswap only when the number of in-flight IOs on the swap device reaches
zswap_read_in_flight_limit or zswap_write_in_flight_limit.
So the normal swap path is used while the in-flight IO count of the
swap device is low, and zswap takes over when it is high.
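
To make the intended behaviour easier to follow, the check added to
zswap_frontswap_store (see the hunk below) boils down to roughly this:

	if (zswap_read_in_flight_limit || zswap_write_in_flight_limit) {
		unsigned int inflight[2];	/* [0] = reads, [1] = writes in flight */
		bool use_normal_swap = true;

		swap_io_in_flight(page, inflight);

		/* Any configured limit that has been reached sends the page to zswap. */
		if (zswap_write_in_flight_limit &&
		    inflight[1] >= zswap_write_in_flight_limit)
			use_normal_swap = false;
		if (zswap_read_in_flight_limit &&
		    inflight[0] >= zswap_read_in_flight_limit)
			use_normal_swap = false;

		if (use_normal_swap) {
			/* Reject the store; the page is then written to the swap device. */
			zswap_reject_io++;
			ret = -EIO;
			goto reject;
		}
	}

In other words, the store is rejected (and the page written out normally)
only while every configured limit is still above the current in-flight
count.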

[1] https://lkml.org/lkml/2019/9/11/935
[2] https://lkml.org/lkml/2019/9/20/90
[3] https://lkml.org/lkml/2019/9/20/1076

Signed-off-by: Hui Zhu <teawaterz@linux.alibaba.com>
---
 include/linux/swap.h |  3 +++
 mm/Kconfig           | 18 ++++++++++++++++
 mm/page_io.c         | 16 +++++++++++++++
 mm/zswap.c           | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++
 4 files changed, 95 insertions(+)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index de2c67a..82b621f 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -389,6 +389,9 @@ extern void end_swap_bio_write(struct bio *bio);
 extern int __swap_writepage(struct page *page, struct writeback_control *wbc,
 	bio_end_io_t end_write_func);
 extern int swap_set_page_dirty(struct page *page);
+#ifdef CONFIG_ZSWAP_IO_SWITCH
+extern void swap_io_in_flight(struct page *page, unsigned int inflight[2]);
+#endif
 
 int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page,
 		unsigned long nr_pages, sector_t start_block);
diff --git a/mm/Kconfig b/mm/Kconfig
index 56cec63..387c3b5 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -546,6 +546,24 @@ config ZSWAP
 	  they have not be fully explored on the large set of potential
 	  configurations and workloads that exist.
 
+config ZSWAP_IO_SWITCH
+	bool "Compressed cache for swap pages according to the IO status"
+	depends on ZSWAP
+	help
+	  This option helps systems where normal swap is faster than
+	  zswap to deal with swap IO load.
+	  For example, a VM whose disk device has no cache setting or
+	  uses cache=writeback.
+
+	  It makes zswap kick in only while the disk backing the swap
+	  file is under high IO load.
+	  It adds two parameters (read_in_flight_limit and
+	  write_in_flight_limit) to zswap.  When zswap is enabled, pages
+	  are stored to zswap only when the number of in-flight IOs on
+	  the swap device reaches zswap_read_in_flight_limit or
+	  zswap_write_in_flight_limit.
+	  If unsure, say "n".
+
 config ZPOOL
 	tristate "Common API for compressed memory storage"
 	help
diff --git a/mm/page_io.c b/mm/page_io.c
index 24ee600..e66b050 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -434,3 +434,19 @@ int swap_set_page_dirty(struct page *page)
 		return __set_page_dirty_no_writeback(page);
 	}
 }
+
+#ifdef CONFIG_ZSWAP_IO_SWITCH
+void swap_io_in_flight(struct page *page, unsigned int inflight[2])
+{
+	struct swap_info_struct *sis = page_swap_info(page);
+
+	if (!sis->bdev) {
+		inflight[0] = 0;
+		inflight[1] = 0;
+		return;
+	}
+
+	part_in_flight_rw(bdev_get_queue(sis->bdev), sis->bdev->bd_part,
+					  inflight);
+}
+#endif
diff --git a/mm/zswap.c b/mm/zswap.c
index 0e22744..0190b2d 100644
--- a/mm/zswap.c
+++ b/mm/zswap.c
@@ -62,6 +62,14 @@ static u64 zswap_reject_compress_poor;
 static u64 zswap_reject_alloc_fail;
 /* Store failed because the entry metadata could not be allocated (rare) */
 static u64 zswap_reject_kmemcache_fail;
+#ifdef CONFIG_ZSWAP_IO_SWITCH
+/*
+ * Store failed because zswap_read_in_flight_limit or
+ * zswap_write_in_flight_limit is bigger than IO in flight number of
+ * swap device
+ */
+static u64 zswap_reject_io;
+#endif
 /* Duplicate store was encountered (rare) */
 static u64 zswap_duplicate_entry;
 
@@ -114,6 +122,24 @@ static bool zswap_same_filled_pages_enabled = true;
 module_param_named(same_filled_pages_enabled, zswap_same_filled_pages_enabled,
 		   bool, 0644);
 
+#ifdef CONFIG_ZSWAP_IO_SWITCH
+/*
+ * zswap will not try to store the page if zswap_read_in_flight_limit is
+ * bigger than IO read in flight number of swap device
+ */
+static unsigned int zswap_read_in_flight_limit;
+module_param_named(read_in_flight_limit, zswap_read_in_flight_limit,
+		   uint, 0644);
+
+/*
+ * zswap will not try to store the page if zswap_write_in_flight_limit is
+ * bigger than IO write in flight number of swap device
+ */
+static unsigned int zswap_write_in_flight_limit;
+module_param_named(write_in_flight_limit, zswap_write_in_flight_limit,
+		   uint, 0644);
+#endif
+
 /*********************************
 * data structures
 **********************************/
@@ -1009,6 +1035,34 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
 		goto reject;
 	}
 
+#ifdef CONFIG_ZSWAP_IO_SWITCH
+	if (zswap_read_in_flight_limit || zswap_write_in_flight_limit) {
+		unsigned int inflight[2];
+		bool should_swap = false;
+
+		swap_io_in_flight(page, inflight);
+
+		if (zswap_write_in_flight_limit &&
+			inflight[1] < zswap_write_in_flight_limit)
+			should_swap = true;
+
+		if (zswap_read_in_flight_limit &&
+			(should_swap ||
+			 (!should_swap && !zswap_write_in_flight_limit))) {
+			if (inflight[0] < zswap_read_in_flight_limit)
+				should_swap = true;
+			else
+				should_swap = false;
+		}
+
+		if (should_swap) {
+			zswap_reject_io++;
+			ret = -EIO;
+			goto reject;
+		}
+	}
+#endif
+
 	/* reclaim space if needed */
 	if (zswap_is_full()) {
 		zswap_pool_limit_hit++;
@@ -1264,6 +1318,10 @@ static int __init zswap_debugfs_init(void)
 			   zswap_debugfs_root, &zswap_reject_kmemcache_fail);
 	debugfs_create_u64("reject_compress_poor", 0444,
 			   zswap_debugfs_root, &zswap_reject_compress_poor);
+#ifdef CONFIG_ZSWAP_IO_SWITCH
+	debugfs_create_u64("reject_io", 0444,
+			   zswap_debugfs_root, &zswap_reject_io);
+#endif
 	debugfs_create_u64("written_back_pages", 0444,
 			   zswap_debugfs_root, &zswap_written_back_pages);
 	debugfs_create_u64("duplicate_entry", 0444,
-- 
2.7.4




* Re: [RFC v3] zswap: Add CONFIG_ZSWAP_IO_SWITCH to handle swap IO issue
From: Dan Streetman @ 2019-09-23 20:14 UTC
  To: Hui Zhu
  Cc: Seth Jennings, Andrew Morton, Michal Hocko, Matthew Wilcox,
	chris, Johannes Weiner, ziqian.lzq, osandov, Huang Ying,
	aryabinin, vovoy, richard.weiyang, jgg, dan.j.williams, rppt,
	jglisse, b.zolnierkie, axboe, dennis, Josef Bacik, tj, oleg,
	Randy Dunlap, linux-kernel, Linux-MM

On Sun, Sep 22, 2019 at 11:32 PM Hui Zhu <teawaterz@linux.alibaba.com> wrote:
>
> This is the third version of this patch.  The first and second versions
> are in [1] and [2].
> This version is updated according to the comments from Randy Dunlap
> in [3].
>
> Currently, I use a VM that has 2 CPUs, 4G of memory, and a 4G swap file.
> I found that swapping hurts IO performance while it is running, so I
> enabled zswap to handle it, because zswap only uses CPU cycles and no
> disk IO.
>
> It works, but I found that zswap is slower than normal swap in this
> VM: zswap is about 300M/s and normal swap is about 500M/s.  (The reason
> is that the VM's disk is cached by the host.)

I must be missing something here - if zswap in the guest is *slower*
than real swap, why are you using zswap?

Also, I don't see why zswap is slower than normal swap, unless you
mean that your zswap is full, since once zswap fills up any additional
swap will absolutely be slower than not having zswap at all.

> So enabling zswap makes the memory shrinker slower but is good for IO
> performance in this VM.
> So I just want zswap to work when the disk backing the swap file is
> under high IO load.
>
> This commit implements that idea.
> It adds two parameters, read_in_flight_limit and write_in_flight_limit,
> to zswap.
> In zswap_frontswap_store, when zswap is enabled, pages are stored to
> zswap only when the number of in-flight IOs on the swap device reaches
> zswap_read_in_flight_limit or zswap_write_in_flight_limit.
> So the normal swap path is used while the in-flight IO count of the
> swap device is low, and zswap takes over when it is high.

Ok, so maybe I understand what you mean, your disk I/O is normally
very fast, but once your host-side cache is full it starts actually
writing to your host physical disk, and your guest swap I/O drops way
down (since caching pages in host memory is much faster than writing
to a host physical disk).  Is that what's going on?  That was not
clear at all to me from the commit description...

In general I think the description of this commit, as well as the docs
and even user interface of how to use it, is very confusing.  I can
see how it would be beneficial in this specific situation, but I'm not
a fan of the implementation, and I'm very concerned that nobody will
be able to understand how to use it properly - when should they enable
it?  What limit values should they use?  Why are there separate read
and write limits?  None of that is clear to me, and I'm fairly
certain it would not be clear to other normal users.

Is there a better way this can be done?

>
> [1] https://lkml.org/lkml/2019/9/11/935
> [2] https://lkml.org/lkml/2019/9/20/90
> [3] https://lkml.org/lkml/2019/9/20/1076
>
> Signed-off-by: Hui Zhu <teawaterz@linux.alibaba.com>
> ---
>  include/linux/swap.h |  3 +++
>  mm/Kconfig           | 18 ++++++++++++++++
>  mm/page_io.c         | 16 +++++++++++++++
>  mm/zswap.c           | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++
>  4 files changed, 95 insertions(+)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index de2c67a..82b621f 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -389,6 +389,9 @@ extern void end_swap_bio_write(struct bio *bio);
>  extern int __swap_writepage(struct page *page, struct writeback_control *wbc,
>         bio_end_io_t end_write_func);
>  extern int swap_set_page_dirty(struct page *page);
> +#ifdef CONFIG_ZSWAP_IO_SWITCH
> +extern void swap_io_in_flight(struct page *page, unsigned int inflight[2]);
> +#endif
>
>  int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page,
>                 unsigned long nr_pages, sector_t start_block);
> diff --git a/mm/Kconfig b/mm/Kconfig
> index 56cec63..387c3b5 100644
> --- a/mm/Kconfig
> +++ b/mm/Kconfig
> @@ -546,6 +546,24 @@ config ZSWAP
>           they have not be fully explored on the large set of potential
>           configurations and workloads that exist.
>
> +config ZSWAP_IO_SWITCH
> +       bool "Compressed cache for swap pages according to the IO status"
> +       depends on ZSWAP
> +       help
> +         This option helps systems where normal swap is faster than
> +         zswap to deal with swap IO load.
> +         For example, a VM whose disk device has no cache setting or
> +         uses cache=writeback.
> +
> +         It makes zswap kick in only while the disk backing the swap
> +         file is under high IO load.
> +         It adds two parameters (read_in_flight_limit and
> +         write_in_flight_limit) to zswap.  When zswap is enabled, pages
> +         are stored to zswap only when the number of in-flight IOs on
> +         the swap device reaches zswap_read_in_flight_limit or
> +         zswap_write_in_flight_limit.
> +         If unsure, say "n".
> +
>  config ZPOOL
>         tristate "Common API for compressed memory storage"
>         help
> diff --git a/mm/page_io.c b/mm/page_io.c
> index 24ee600..e66b050 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -434,3 +434,19 @@ int swap_set_page_dirty(struct page *page)
>                 return __set_page_dirty_no_writeback(page);
>         }
>  }
> +
> +#ifdef CONFIG_ZSWAP_IO_SWITCH
> +void swap_io_in_flight(struct page *page, unsigned int inflight[2])
> +{
> +       struct swap_info_struct *sis = page_swap_info(page);
> +
> +       if (!sis->bdev) {
> +               inflight[0] = 0;
> +               inflight[1] = 0;
> +               return;
> +       }
> +
> +       part_in_flight_rw(bdev_get_queue(sis->bdev), sis->bdev->bd_part,
> +                                         inflight);

this potentially will read inflight stats info from all possible cpus,
that's not something I'm a big fan of adding to every single page swap
call...it's not awful, but there might be scaling issues for systems
with lots of cpus.
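
(For reference, the non-mq path of part_in_flight_rw() in that era's block
layer sums per-cpu counters over every possible cpu, roughly as sketched
below; this is paraphrased from memory rather than quoted from the tree, so
check the actual source:)

	void part_in_flight_rw(struct request_queue *q, struct hd_struct *part,
			       unsigned int inflight[2])
	{
		int cpu;

		if (queue_is_mq(q)) {
			blk_mq_in_flight_rw(q, part, inflight);
			return;
		}

		inflight[0] = 0;
		inflight[1] = 0;
		/* Walk every possible cpu and sum its local in-flight counters. */
		for_each_possible_cpu(cpu) {
			inflight[0] += part_stat_local_read_cpu(part, in_flight[0], cpu);
			inflight[1] += part_stat_local_read_cpu(part, in_flight[1], cpu);
		}
	}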

> +}
> +#endif
> diff --git a/mm/zswap.c b/mm/zswap.c
> index 0e22744..0190b2d 100644
> --- a/mm/zswap.c
> +++ b/mm/zswap.c
> @@ -62,6 +62,14 @@ static u64 zswap_reject_compress_poor;
>  static u64 zswap_reject_alloc_fail;
>  /* Store failed because the entry metadata could not be allocated (rare) */
>  static u64 zswap_reject_kmemcache_fail;
> +#ifdef CONFIG_ZSWAP_IO_SWITCH
> +/*
> + * Store failed because zswap_read_in_flight_limit or
> + * zswap_write_in_flight_limit is bigger than IO in flight number of
> + * swap device
> + */
> +static u64 zswap_reject_io;
> +#endif
>  /* Duplicate store was encountered (rare) */
>  static u64 zswap_duplicate_entry;
>
> @@ -114,6 +122,24 @@ static bool zswap_same_filled_pages_enabled = true;
>  module_param_named(same_filled_pages_enabled, zswap_same_filled_pages_enabled,
>                    bool, 0644);
>
> +#ifdef CONFIG_ZSWAP_IO_SWITCH
> +/*
> + * zswap will not try to store the page if zswap_read_in_flight_limit is
> + * bigger than IO read in flight number of swap device
> + */
> +static unsigned int zswap_read_in_flight_limit;
> +module_param_named(read_in_flight_limit, zswap_read_in_flight_limit,
> +                  uint, 0644);
> +
> +/*
> + * zswap will not try to store the page if zswap_write_in_flight_limit is
> + * bigger than IO write in flight number of swap device
> + */
> +static unsigned int zswap_write_in_flight_limit;
> +module_param_named(write_in_flight_limit, zswap_write_in_flight_limit,
> +                  uint, 0644);
> +#endif
> +
>  /*********************************
>  * data structures
>  **********************************/
> @@ -1009,6 +1035,34 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
>                 goto reject;
>         }
>
> +#ifdef CONFIG_ZSWAP_IO_SWITCH
> +       if (zswap_read_in_flight_limit || zswap_write_in_flight_limit) {
> +               unsigned int inflight[2];
> +               bool should_swap = false;
> +
> +               swap_io_in_flight(page, inflight);
> +
> +               if (zswap_write_in_flight_limit &&
> +                       inflight[1] < zswap_write_in_flight_limit)
> +                       should_swap = true;
> +
> +               if (zswap_read_in_flight_limit &&
> +                       (should_swap ||
> +                        (!should_swap && !zswap_write_in_flight_limit))) {
> +                       if (inflight[0] < zswap_read_in_flight_limit)
> +                               should_swap = true;
> +                       else
> +                               should_swap = false;
> +               }
> +
> +               if (should_swap) {
> +                       zswap_reject_io++;
> +                       ret = -EIO;
> +                       goto reject;
> +               }
> +       }
> +#endif
> +
>         /* reclaim space if needed */
>         if (zswap_is_full()) {
>                 zswap_pool_limit_hit++;
> @@ -1264,6 +1318,10 @@ static int __init zswap_debugfs_init(void)
>                            zswap_debugfs_root, &zswap_reject_kmemcache_fail);
>         debugfs_create_u64("reject_compress_poor", 0444,
>                            zswap_debugfs_root, &zswap_reject_compress_poor);
> +#ifdef CONFIG_ZSWAP_IO_SWITCH
> +       debugfs_create_u64("reject_io", 0444,

"reject_io" is not very clear about why it was rejected; I think most
people will assume this means pages were rejected because of I/O
errors, not because the I/O inflight page count was lower than the set
limit.

> +                          zswap_debugfs_root, &zswap_reject_io);
> +#endif
>         debugfs_create_u64("written_back_pages", 0444,
>                            zswap_debugfs_root, &zswap_written_back_pages);
>         debugfs_create_u64("duplicate_entry", 0444,
> --
> 2.7.4
>



* Re: [RFC v3] zswap: Add CONFIG_ZSWAP_IO_SWITCH to handle swap IO issue
From: Dan Streetman @ 2019-09-26 10:51 UTC
  To: Hui Zhu, Andrew Morton
  Cc: Seth Jennings, Michal Hocko, Matthew Wilcox, chris,
	Johannes Weiner, ziqian.lzq, osandov, Huang Ying, aryabinin,
	vovoy, richard.weiyang, jgg, dan.j.williams, rppt, jglisse,
	b.zolnierkie, axboe, dennis, Josef Bacik, tj, oleg, Randy Dunlap,
	linux-kernel, Linux-MM

On Mon, Sep 23, 2019 at 4:14 PM Dan Streetman <ddstreet@ieee.org> wrote:
>
> On Sun, Sep 22, 2019 at 11:32 PM Hui Zhu <teawaterz@linux.alibaba.com> wrote:
> >
> > This is the third version of this patch.  The first and second versions
> > are in [1] and [2].
> > This version is updated according to the comments from Randy Dunlap
> > in [3].
> >
> > Currently, I use a VM that has 2 CPUs, 4G of memory, and a 4G swap file.
> > I found that swapping hurts IO performance while it is running, so I
> > enabled zswap to handle it, because zswap only uses CPU cycles and no
> > disk IO.
> >
> > It works, but I found that zswap is slower than normal swap in this
> > VM: zswap is about 300M/s and normal swap is about 500M/s.  (The reason
> > is that the VM's disk is cached by the host.)
>
> I must be missing something here - if zswap in the guest is *slower*
> than real swap, why are you using zswap?
>
> Also, I don't see why zswap is slower than normal swap, unless you
> mean that your zswap is full, since once zswap fills up any additional
> swap will absolutely be slower than not having zswap at all.
>
> > So enabling zswap makes the memory shrinker slower but is good for IO
> > performance in this VM.
> > So I just want zswap to work when the disk backing the swap file is
> > under high IO load.
> >
> > This commit implements that idea.
> > It adds two parameters, read_in_flight_limit and write_in_flight_limit,
> > to zswap.
> > In zswap_frontswap_store, when zswap is enabled, pages are stored to
> > zswap only when the number of in-flight IOs on the swap device reaches
> > zswap_read_in_flight_limit or zswap_write_in_flight_limit.
> > So the normal swap path is used while the in-flight IO count of the
> > swap device is low, and zswap takes over when it is high.
>
> Ok, so maybe I understand what you mean, your disk I/O is normally
> very fast, but once your host-side cache is full it starts actually
> writing to your host physical disk, and your guest swap I/O drops way
> down (since caching pages in host memory is much faster than writing
> to a host physical disk).  Is that what's going on?  That was not
> clear at all to me from the commit description...
>
> In general I think the description of this commit, as well as the docs
> and even user interface of how to use it, is very confusing.  I can
> see how it would be beneficial in this specific situation, but I'm not
> a fan of the implementation, and I'm very concerned that nobody will
> be able to understand how to use it properly - when should they enable
> it?  What limit values should they use?  Why are there separate read
> and write limits?  None of that is clear to me, and I'm fairly
> certain it would not be clear to other normal users.
>
> Is there a better way this can be done?
>
> >
> > [1] https://lkml.org/lkml/2019/9/11/935
> > [2] https://lkml.org/lkml/2019/9/20/90
> > [3] https://lkml.org/lkml/2019/9/20/1076
> >
> > Signed-off-by: Hui Zhu <teawaterz@linux.alibaba.com>

Nacked-by: Dan Streetman <ddstreet@ieee.org>

due to the concerns I raised in my earlier mail


> > ---
> >  include/linux/swap.h |  3 +++
> >  mm/Kconfig           | 18 ++++++++++++++++
> >  mm/page_io.c         | 16 +++++++++++++++
> >  mm/zswap.c           | 58 ++++++++++++++++++++++++++++++++++++++++++++++++++++
> >  4 files changed, 95 insertions(+)
> >
> > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > index de2c67a..82b621f 100644
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -389,6 +389,9 @@ extern void end_swap_bio_write(struct bio *bio);
> >  extern int __swap_writepage(struct page *page, struct writeback_control *wbc,
> >         bio_end_io_t end_write_func);
> >  extern int swap_set_page_dirty(struct page *page);
> > +#ifdef CONFIG_ZSWAP_IO_SWITCH
> > +extern void swap_io_in_flight(struct page *page, unsigned int inflight[2]);
> > +#endif
> >
> >  int add_swap_extent(struct swap_info_struct *sis, unsigned long start_page,
> >                 unsigned long nr_pages, sector_t start_block);
> > diff --git a/mm/Kconfig b/mm/Kconfig
> > index 56cec63..387c3b5 100644
> > --- a/mm/Kconfig
> > +++ b/mm/Kconfig
> > @@ -546,6 +546,24 @@ config ZSWAP
> >           they have not be fully explored on the large set of potential
> >           configurations and workloads that exist.
> >
> > +config ZSWAP_IO_SWITCH
> > +       bool "Compressed cache for swap pages according to the IO status"
> > +       depends on ZSWAP
> > +       help
> > +         This option helps systems where normal swap is faster than
> > +         zswap to deal with swap IO load.
> > +         For example, a VM whose disk device has no cache setting or
> > +         uses cache=writeback.
> > +
> > +         It makes zswap kick in only while the disk backing the swap
> > +         file is under high IO load.
> > +         It adds two parameters (read_in_flight_limit and
> > +         write_in_flight_limit) to zswap.  When zswap is enabled, pages
> > +         are stored to zswap only when the number of in-flight IOs on
> > +         the swap device reaches zswap_read_in_flight_limit or
> > +         zswap_write_in_flight_limit.
> > +         If unsure, say "n".
> > +
> >  config ZPOOL
> >         tristate "Common API for compressed memory storage"
> >         help
> > diff --git a/mm/page_io.c b/mm/page_io.c
> > index 24ee600..e66b050 100644
> > --- a/mm/page_io.c
> > +++ b/mm/page_io.c
> > @@ -434,3 +434,19 @@ int swap_set_page_dirty(struct page *page)
> >                 return __set_page_dirty_no_writeback(page);
> >         }
> >  }
> > +
> > +#ifdef CONFIG_ZSWAP_IO_SWITCH
> > +void swap_io_in_flight(struct page *page, unsigned int inflight[2])
> > +{
> > +       struct swap_info_struct *sis = page_swap_info(page);
> > +
> > +       if (!sis->bdev) {
> > +               inflight[0] = 0;
> > +               inflight[1] = 0;
> > +               return;
> > +       }
> > +
> > +       part_in_flight_rw(bdev_get_queue(sis->bdev), sis->bdev->bd_part,
> > +                                         inflight);
>
> this potentially will read inflight stats info from all possible cpus,
> that's not something I'm a big fan of adding to every single page swap
> call...it's not awful, but there might be scaling issues for systems
> with lots of cpus.
>
> > +}
> > +#endif
> > diff --git a/mm/zswap.c b/mm/zswap.c
> > index 0e22744..0190b2d 100644
> > --- a/mm/zswap.c
> > +++ b/mm/zswap.c
> > @@ -62,6 +62,14 @@ static u64 zswap_reject_compress_poor;
> >  static u64 zswap_reject_alloc_fail;
> >  /* Store failed because the entry metadata could not be allocated (rare) */
> >  static u64 zswap_reject_kmemcache_fail;
> > +#ifdef CONFIG_ZSWAP_IO_SWITCH
> > +/*
> > + * Store failed because zswap_read_in_flight_limit or
> > + * zswap_write_in_flight_limit is bigger than IO in flight number of
> > + * swap device
> > + */
> > +static u64 zswap_reject_io;
> > +#endif
> >  /* Duplicate store was encountered (rare) */
> >  static u64 zswap_duplicate_entry;
> >
> > @@ -114,6 +122,24 @@ static bool zswap_same_filled_pages_enabled = true;
> >  module_param_named(same_filled_pages_enabled, zswap_same_filled_pages_enabled,
> >                    bool, 0644);
> >
> > +#ifdef CONFIG_ZSWAP_IO_SWITCH
> > +/*
> > + * zswap will not try to store the page if zswap_read_in_flight_limit is
> > + * bigger than IO read in flight number of swap device
> > + */
> > +static unsigned int zswap_read_in_flight_limit;
> > +module_param_named(read_in_flight_limit, zswap_read_in_flight_limit,
> > +                  uint, 0644);
> > +
> > +/*
> > + * zswap will not try to store the page if zswap_write_in_flight_limit is
> > + * bigger than IO write in flight number of swap device
> > + */
> > +static unsigned int zswap_write_in_flight_limit;
> > +module_param_named(write_in_flight_limit, zswap_write_in_flight_limit,
> > +                  uint, 0644);
> > +#endif
> > +
> >  /*********************************
> >  * data structures
> >  **********************************/
> > @@ -1009,6 +1035,34 @@ static int zswap_frontswap_store(unsigned type, pgoff_t offset,
> >                 goto reject;
> >         }
> >
> > +#ifdef CONFIG_ZSWAP_IO_SWITCH
> > +       if (zswap_read_in_flight_limit || zswap_write_in_flight_limit) {
> > +               unsigned int inflight[2];
> > +               bool should_swap = false;
> > +
> > +               swap_io_in_flight(page, inflight);
> > +
> > +               if (zswap_write_in_flight_limit &&
> > +                       inflight[1] < zswap_write_in_flight_limit)
> > +                       should_swap = true;
> > +
> > +               if (zswap_read_in_flight_limit &&
> > +                       (should_swap ||
> > +                        (!should_swap && !zswap_write_in_flight_limit))) {
> > +                       if (inflight[0] < zswap_read_in_flight_limit)
> > +                               should_swap = true;
> > +                       else
> > +                               should_swap = false;
> > +               }
> > +
> > +               if (should_swap) {
> > +                       zswap_reject_io++;
> > +                       ret = -EIO;
> > +                       goto reject;
> > +               }
> > +       }
> > +#endif
> > +
> >         /* reclaim space if needed */
> >         if (zswap_is_full()) {
> >                 zswap_pool_limit_hit++;
> > @@ -1264,6 +1318,10 @@ static int __init zswap_debugfs_init(void)
> >                            zswap_debugfs_root, &zswap_reject_kmemcache_fail);
> >         debugfs_create_u64("reject_compress_poor", 0444,
> >                            zswap_debugfs_root, &zswap_reject_compress_poor);
> > +#ifdef CONFIG_ZSWAP_IO_SWITCH
> > +       debugfs_create_u64("reject_io", 0444,
>
> "reject_io" is not very clear about why it was rejected; I think most
> people will assume this means pages were rejected because of I/O
> errors, not because the I/O inflight page count was lower than the set
> limit.
>
> > +                          zswap_debugfs_root, &zswap_reject_io);
> > +#endif
> >         debugfs_create_u64("written_back_pages", 0444,
> >                            zswap_debugfs_root, &zswap_written_back_pages);
> >         debugfs_create_u64("duplicate_entry", 0444,
> > --
> > 2.7.4
> >


