* [PATCH v2 0/4] skip swapcache for super fast device
From: Minchan Kim @ 2017-09-20  5:43 UTC
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, kernel-team, Christoph Hellwig, Minchan Kim

With fast swap storage, platforms want to use swap more aggressively,
and swap-in is crucial to application latency.

The rw_page based synchronous devices like zram, pmem and btt are such
fast storage. When I profile swap-in performance with a zram lz4
decompression test, software overhead is more than 70%. It would likely
be even bigger on nvdimm.

This series aims to reduce swap-in latency by skipping the swapcache
when the swap device is a synchronous device, such as an rw_page based
one.

It improves my swap-in test by 45% (5G sequential swap-in, no
readahead: from 2.41sec to 1.64sec).
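
In outline, the fast path added in [4] looks like this (a condensed
sketch of the do_swap_page() hunks in that patch; error handling and the
VMA-based readahead variant are omitted):

	struct swap_info_struct *si = swp_swap_info(entry);

	if (si->flags & SWP_SYNCHRONOUS_IO) {
		/* skip the swapcache: read straight into a new page */
		page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vmf->address);
		if (page) {
			__SetPageLocked(page);
			__SetPageSwapBacked(page);
			set_page_private(page, entry.val);
			lru_cache_add_anon(page);
			swap_readpage(page, true);	/* synchronous read */
		}
	} else {
		page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
					vma, vmf->address);
	}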

Andrew, [1] is a zram specific patch that could be applied separately,
but the rest of the series is based on it, so I include it in this
series.

* Changes from v1
  * style fixes
  * a bug fix
  * dropped the page-cluster based readahead
    * The resulting regression can be addressed by a separate patch from Huang:
      http://lkml.kernel.org/r/87tw04in60.fsf@yhuang-dev.intel.com
  
Minchan Kim (4):
  [1] zram: set BDI_CAP_STABLE_WRITES once
  [2] bdi: introduce BDI_CAP_SYNCHRONOUS_IO
  [3] mm:swap: introduce SWP_SYNCHRONOUS_IO
  [4] mm:swap: skip swapcache for swapin of synchronous device

 drivers/block/brd.c           |  2 ++
 drivers/block/zram/zram_drv.c | 16 +++++--------
 drivers/nvdimm/btt.c          |  3 +++
 drivers/nvdimm/pmem.c         |  2 ++
 include/linux/backing-dev.h   |  8 +++++++
 include/linux/swap.h          | 14 +++++++++++-
 mm/memory.c                   | 52 ++++++++++++++++++++++++++++++-------------
 mm/page_io.c                  |  6 ++---
 mm/swapfile.c                 | 14 ++++++++----
 9 files changed, 83 insertions(+), 34 deletions(-)

-- 
2.7.4

* [PATCH v2 1/4] zram: set BDI_CAP_STABLE_WRITES once
From: Minchan Kim @ 2017-09-20  5:43 UTC
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, kernel-team, Christoph Hellwig,
	Minchan Kim, Ilya Dryomov, Sergey Senozhatsky

[1] fixed an oddity (the BDI_CAP_STABLE_WRITES flag was reset
unconditionally whenever revalidate_disk was called), so zram no longer
needs to re-set the flag every time the bdev is revalidated. Instead,
set the flag just once when the zram device is created.

It shouldn't change any behavior.
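
In short, the before/after for callers (condensed from the diff below):

	/* before: every caller went through a wrapper that re-set the flag */
	zram_revalidate_disk(zram);

	/* after: the flag is set once in zram_add() ... */
	zram->disk->queue->backing_dev_info->capabilities |=
					BDI_CAP_STABLE_WRITES;
	/* ... and callers use a plain revalidate */
	revalidate_disk(zram->disk);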

[1] 19b7ccf8651d, block: get rid of blk_integrity_revalidate()
Cc: Ilya Dryomov <idryomov@gmail.com>
Reviewed-by: Sergey Senozhatsky <sergey.senozhatsky@gmail.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/zram/zram_drv.c | 16 ++++++----------
 1 file changed, 6 insertions(+), 10 deletions(-)

diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index cc78f61e22d1..98ef1a8389b0 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -122,14 +122,6 @@ static inline bool is_partial_io(struct bio_vec *bvec)
 }
 #endif
 
-static void zram_revalidate_disk(struct zram *zram)
-{
-	revalidate_disk(zram->disk);
-	/* revalidate_disk reset the BDI_CAP_STABLE_WRITES so set again */
-	zram->disk->queue->backing_dev_info->capabilities |=
-		BDI_CAP_STABLE_WRITES;
-}
-
 /*
  * Check if request is within bounds and aligned on zram logical blocks.
  */
@@ -1371,7 +1363,8 @@ static ssize_t disksize_store(struct device *dev,
 	zram->comp = comp;
 	zram->disksize = disksize;
 	set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
-	zram_revalidate_disk(zram);
+
+	revalidate_disk(zram->disk);
 	up_write(&zram->init_lock);
 
 	return len;
@@ -1418,7 +1411,7 @@ static ssize_t reset_store(struct device *dev,
 	/* Make sure all the pending I/O are finished */
 	fsync_bdev(bdev);
 	zram_reset_device(zram);
-	zram_revalidate_disk(zram);
+	revalidate_disk(zram->disk);
 	bdput(bdev);
 
 	mutex_lock(&bdev->bd_mutex);
@@ -1537,6 +1530,7 @@ static int zram_add(void)
 	/* zram devices sort of resembles non-rotational disks */
 	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, zram->disk->queue);
 	queue_flag_clear_unlocked(QUEUE_FLAG_ADD_RANDOM, zram->disk->queue);
+
 	/*
 	 * To ensure that we always get PAGE_SIZE aligned
 	 * and n*PAGE_SIZED sized I/O requests.
@@ -1561,6 +1555,8 @@ static int zram_add(void)
 	if (ZRAM_LOGICAL_BLOCK_SIZE == PAGE_SIZE)
 		blk_queue_max_write_zeroes_sectors(zram->disk->queue, UINT_MAX);
 
+	zram->disk->queue->backing_dev_info->capabilities |=
+					BDI_CAP_STABLE_WRITES;
 	add_disk(zram->disk);
 
 	ret = sysfs_create_group(&disk_to_dev(zram->disk)->kobj,
-- 
2.7.4

* [PATCH v2 2/4] bdi: introduce BDI_CAP_SYNCHRONOUS_IO
From: Minchan Kim @ 2017-09-20  5:43 UTC
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, kernel-team, Christoph Hellwig,
	Minchan Kim, Dan Williams, Ross Zwisler

Per discussion [1], the rw_page function will be removed someday. If so,
we need a way to detect super-fast storage on which synchronous IO, as
rw_page does today, always wins.

This patch introduces BDI_CAP_SYNCHRONOUS_IO to mark such devices. With
it, we can apply various optimization techniques.

[1] lkml.kernel.org/r/<20170728165604.10455-1-ross.zwisler@linux.intel.com>
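
A driver opts in with a single line at disk setup, and callers test the
capability through the new helper (sketch drawn from the hunks below;
the swapon() use of inode_to_bdi() appears in the next patch):

	/* driver side, e.g. brd/zram/btt/pmem */
	disk->queue->backing_dev_info->capabilities |= BDI_CAP_SYNCHRONOUS_IO;

	/* consumer side */
	if (bdi_cap_synchronous_io(inode_to_bdi(inode)))
		/* synchronous IO wins on this device */;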

Cc: Christoph Hellwig <hch@lst.de>
Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 drivers/block/brd.c           | 2 ++
 drivers/block/zram/zram_drv.c | 2 +-
 drivers/nvdimm/btt.c          | 3 +++
 drivers/nvdimm/pmem.c         | 2 ++
 include/linux/backing-dev.h   | 8 ++++++++
 5 files changed, 16 insertions(+), 1 deletion(-)

diff --git a/drivers/block/brd.c b/drivers/block/brd.c
index bbd0d186cfc0..1fdb736aa882 100644
--- a/drivers/block/brd.c
+++ b/drivers/block/brd.c
@@ -20,6 +20,7 @@
 #include <linux/radix-tree.h>
 #include <linux/fs.h>
 #include <linux/slab.h>
+#include <linux/backing-dev.h>
 #ifdef CONFIG_BLK_DEV_RAM_DAX
 #include <linux/pfn_t.h>
 #include <linux/dax.h>
@@ -449,6 +450,7 @@ static struct brd_device *brd_alloc(int i)
 	disk->flags		= GENHD_FL_EXT_DEVT;
 	sprintf(disk->disk_name, "ram%d", i);
 	set_capacity(disk, rd_size * 2);
+	disk->queue->backing_dev_info->capabilities |= BDI_CAP_SYNCHRONOUS_IO;
 
 #ifdef CONFIG_BLK_DEV_RAM_DAX
 	queue_flag_set_unlocked(QUEUE_FLAG_DAX, brd->brd_queue);
diff --git a/drivers/block/zram/zram_drv.c b/drivers/block/zram/zram_drv.c
index 98ef1a8389b0..23172641fc01 100644
--- a/drivers/block/zram/zram_drv.c
+++ b/drivers/block/zram/zram_drv.c
@@ -1556,7 +1556,7 @@ static int zram_add(void)
 		blk_queue_max_write_zeroes_sectors(zram->disk->queue, UINT_MAX);
 
 	zram->disk->queue->backing_dev_info->capabilities |=
-					BDI_CAP_STABLE_WRITES;
+			(BDI_CAP_STABLE_WRITES | BDI_CAP_SYNCHRONOUS_IO);
 	add_disk(zram->disk);
 
 	ret = sysfs_create_group(&disk_to_dev(zram->disk)->kobj,
diff --git a/drivers/nvdimm/btt.c b/drivers/nvdimm/btt.c
index d5612bd1cc81..e949e3302af4 100644
--- a/drivers/nvdimm/btt.c
+++ b/drivers/nvdimm/btt.c
@@ -23,6 +23,7 @@
 #include <linux/ndctl.h>
 #include <linux/fs.h>
 #include <linux/nd.h>
+#include <linux/backing-dev.h>
 #include "btt.h"
 #include "nd.h"
 
@@ -1402,6 +1403,8 @@ static int btt_blk_init(struct btt *btt)
 	btt->btt_disk->private_data = btt;
 	btt->btt_disk->queue = btt->btt_queue;
 	btt->btt_disk->flags = GENHD_FL_EXT_DEVT;
+	btt->btt_disk->queue->backing_dev_info->capabilities |=
+			BDI_CAP_SYNCHRONOUS_IO;
 
 	blk_queue_make_request(btt->btt_queue, btt_make_request);
 	blk_queue_logical_block_size(btt->btt_queue, btt->sector_size);
diff --git a/drivers/nvdimm/pmem.c b/drivers/nvdimm/pmem.c
index 39dfd7affa31..7fbc5c5dc8e1 100644
--- a/drivers/nvdimm/pmem.c
+++ b/drivers/nvdimm/pmem.c
@@ -31,6 +31,7 @@
 #include <linux/uio.h>
 #include <linux/dax.h>
 #include <linux/nd.h>
+#include <linux/backing-dev.h>
 #include "pmem.h"
 #include "pfn.h"
 #include "nd.h"
@@ -394,6 +395,7 @@ static int pmem_attach_disk(struct device *dev,
 	disk->fops		= &pmem_fops;
 	disk->queue		= q;
 	disk->flags		= GENHD_FL_EXT_DEVT;
+	disk->queue->backing_dev_info->capabilities |= BDI_CAP_SYNCHRONOUS_IO;
 	nvdimm_namespace_disk_name(ndns, disk->disk_name);
 	set_capacity(disk, (pmem->size - pmem->pfn_pad - pmem->data_offset)
 			/ 512);
diff --git a/include/linux/backing-dev.h b/include/linux/backing-dev.h
index 854e1bdd0b2a..cd41617c6594 100644
--- a/include/linux/backing-dev.h
+++ b/include/linux/backing-dev.h
@@ -123,6 +123,8 @@ int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned int max_ratio);
  * BDI_CAP_STRICTLIMIT:    Keep number of dirty pages below bdi threshold.
  *
  * BDI_CAP_CGROUP_WRITEBACK: Supports cgroup-aware writeback.
+ * BDI_CAP_SYNCHRONOUS_IO: Device is so fast that asynchronous IO would be
+ *			   inefficient.
  */
 #define BDI_CAP_NO_ACCT_DIRTY	0x00000001
 #define BDI_CAP_NO_WRITEBACK	0x00000002
@@ -130,6 +132,7 @@ int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned int max_ratio);
 #define BDI_CAP_STABLE_WRITES	0x00000008
 #define BDI_CAP_STRICTLIMIT	0x00000010
 #define BDI_CAP_CGROUP_WRITEBACK 0x00000020
+#define BDI_CAP_SYNCHRONOUS_IO	0x00000040
 
 #define BDI_CAP_NO_ACCT_AND_WRITEBACK \
 	(BDI_CAP_NO_WRITEBACK | BDI_CAP_NO_ACCT_DIRTY | BDI_CAP_NO_ACCT_WB)
@@ -177,6 +180,11 @@ long wait_iff_congested(struct pglist_data *pgdat, int sync, long timeout);
 int pdflush_proc_obsolete(struct ctl_table *table, int write,
 		void __user *buffer, size_t *lenp, loff_t *ppos);
 
+static inline bool bdi_cap_synchronous_io(struct backing_dev_info *bdi)
+{
+	return bdi->capabilities & BDI_CAP_SYNCHRONOUS_IO;
+}
+
 static inline bool bdi_cap_stable_pages_required(struct backing_dev_info *bdi)
 {
 	return bdi->capabilities & BDI_CAP_STABLE_WRITES;
-- 
2.7.4

* [PATCH v2 3/4] mm:swap: introduce SWP_SYNCHRONOUS_IO
From: Minchan Kim @ 2017-09-20  5:43 UTC
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, kernel-team, Christoph Hellwig,
	Minchan Kim, Hugh Dickins

If rw_page based fast storage is used as a swap device, we need to
detect it so that swap IO operations can be optimized. This patch is
preparation for the swap-in optimization in the next patch.
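
Concretely, swapon() translates the BDI capability into a per-device
swap flag that the fault path can test cheaply (sketch of the hunk
below, plus its intended use in the next patch):

	/* swapon: inherit the capability from the backing device */
	if (bdi_cap_synchronous_io(inode_to_bdi(inode)))
		p->flags |= SWP_SYNCHRONOUS_IO;

	/* later, at fault time (next patch) */
	if (si->flags & SWP_SYNCHRONOUS_IO)
		/* take the synchronous, swapcache-skipping path */;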

Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 include/linux/swap.h | 3 ++-
 mm/swapfile.c        | 3 +++
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index 8a807292037f..fbb33919d1c6 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -170,8 +170,9 @@ enum {
 	SWP_AREA_DISCARD = (1 << 8),	/* single-time swap area discards */
 	SWP_PAGE_DISCARD = (1 << 9),	/* freed swap page-cluster discards */
 	SWP_STABLE_WRITES = (1 << 10),	/* no overwrite PG_writeback pages */
+	SWP_SYNCHRONOUS_IO = (1 << 11),	/* synchronous IO is efficient */
 					/* add others here before... */
-	SWP_SCANNING	= (1 << 11),	/* refcount in scan_swap_map */
+	SWP_SCANNING	= (1 << 12),	/* refcount in scan_swap_map */
 };
 
 #define SWAP_CLUSTER_MAX 32UL
diff --git a/mm/swapfile.c b/mm/swapfile.c
index bf91dc9e7a79..1305591cde4d 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3168,6 +3168,9 @@ SYSCALL_DEFINE2(swapon, const char __user *, specialfile, int, swap_flags)
 	if (bdi_cap_stable_pages_required(inode_to_bdi(inode)))
 		p->flags |= SWP_STABLE_WRITES;
 
+	if (bdi_cap_synchronous_io(inode_to_bdi(inode)))
+		p->flags |= SWP_SYNCHRONOUS_IO;
+
 	if (p->bdev && blk_queue_nonrot(bdev_get_queue(p->bdev))) {
 		int cpu;
 		unsigned long ci, nr_cluster;
-- 
2.7.4

* [PATCH v2 4/4] mm:swap: skip swapcache for swapin of synchronous device
From: Minchan Kim @ 2017-09-20  5:43 UTC
  To: Andrew Morton
  Cc: linux-mm, linux-kernel, kernel-team, Christoph Hellwig,
	Minchan Kim, Dan Williams, Ross Zwisler, Hugh Dickins

With fast swap storage, platforms want to use swap more aggressively,
and swap-in is crucial to application latency.

The rw_page based synchronous devices like zram, pmem and btt are such
fast storage. When I profile swap-in performance with a zram lz4
decompression test, software overhead is more than 70%. It would likely
be even bigger on nvdimm.

This patch aims to reduce swap-in latency by skipping the swapcache when
the swap device is a synchronous device, such as an rw_page based one.
It improves my swap-in test by 45% (5G sequential swap-in, no readahead:
from 2.41sec to 1.64sec).
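
The bookkeeping to note: swapcache now starts out NULL and is only set
when the page really came through the swapcache, so each existing
"page != swapcache" cleanup gains a "&& swapcache" guard (condensed from
the hunks below):

	struct page *page = NULL, *swapcache = NULL;
	...
	if (page != swapcache && swapcache) {
		/* a swapcache page we did not map still needs releasing */
		unlock_page(swapcache);
		put_page(swapcache);
	}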

Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 include/linux/swap.h | 11 +++++++++++
 mm/memory.c          | 52 ++++++++++++++++++++++++++++++++++++----------------
 mm/page_io.c         |  6 +++---
 mm/swapfile.c        | 11 +++++++----
 4 files changed, 57 insertions(+), 23 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index fbb33919d1c6..cd2f66fdfc2d 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -461,6 +461,7 @@ extern int page_swapcount(struct page *);
 extern int __swp_swapcount(swp_entry_t entry);
 extern int swp_swapcount(swp_entry_t entry);
 extern struct swap_info_struct *page_swap_info(struct page *);
+extern struct swap_info_struct *swp_swap_info(swp_entry_t entry);
 extern bool reuse_swap_page(struct page *, int *);
 extern int try_to_free_swap(struct page *);
 struct backing_dev_info;
@@ -469,6 +470,16 @@ extern void exit_swap_address_space(unsigned int type);
 
 #else /* CONFIG_SWAP */
 
+static inline int swap_readpage(struct page *page, bool do_poll)
+{
+	return 0;
+}
+
+static inline struct swap_info_struct *swp_swap_info(swp_entry_t entry)
+{
+	return NULL;
+}
+
 #define swap_address_space(entry)		(NULL)
 #define get_nr_swap_pages()			0L
 #define total_swap_pages			0L
diff --git a/mm/memory.c b/mm/memory.c
index ec4e15494901..163ab2062385 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2842,7 +2842,7 @@ EXPORT_SYMBOL(unmap_mapping_range);
 int do_swap_page(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
-	struct page *page = NULL, *swapcache;
+	struct page *page = NULL, *swapcache = NULL;
 	struct mem_cgroup *memcg;
 	struct vma_swap_readahead swap_ra;
 	swp_entry_t entry;
@@ -2881,17 +2881,35 @@ int do_swap_page(struct vm_fault *vmf)
 		}
 		goto out;
 	}
+
+
 	delayacct_set_flag(DELAYACCT_PF_SWAPIN);
 	if (!page)
 		page = lookup_swap_cache(entry, vma_readahead ? vma : NULL,
 					 vmf->address);
 	if (!page) {
-		if (vma_readahead)
-			page = do_swap_page_readahead(entry,
-				GFP_HIGHUSER_MOVABLE, vmf, &swap_ra);
-		else
-			page = swapin_readahead(entry,
-				GFP_HIGHUSER_MOVABLE, vma, vmf->address);
+		struct swap_info_struct *si = swp_swap_info(entry);
+
+		if (!(si->flags & SWP_SYNCHRONOUS_IO)) {
+			if (vma_readahead)
+				page = do_swap_page_readahead(entry,
+					GFP_HIGHUSER_MOVABLE, vmf, &swap_ra);
+			else
+				page = swapin_readahead(entry,
+					GFP_HIGHUSER_MOVABLE, vma, vmf->address);
+			swapcache = page;
+		} else {
+			/* skip swapcache */
+			page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vmf->address);
+			if (page) {
+				__SetPageLocked(page);
+				__SetPageSwapBacked(page);
+				set_page_private(page, entry.val);
+				lru_cache_add_anon(page);
+				swap_readpage(page, true);
+			}
+		}
+
 		if (!page) {
 			/*
 			 * Back out if somebody else faulted in this pte
@@ -2920,7 +2938,6 @@ int do_swap_page(struct vm_fault *vmf)
 		goto out_release;
 	}
 
-	swapcache = page;
 	locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags);
 
 	delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
@@ -2935,7 +2952,8 @@ int do_swap_page(struct vm_fault *vmf)
 	 * test below, are not enough to exclude that.  Even if it is still
 	 * swapcache, we need to check that the page's swap has not changed.
 	 */
-	if (unlikely(!PageSwapCache(page) || page_private(page) != entry.val))
+	if (unlikely((!PageSwapCache(page) ||
+			page_private(page) != entry.val)) && swapcache)
 		goto out_page;
 
 	page = ksm_might_need_to_copy(page, vma, vmf->address);
@@ -2988,14 +3006,16 @@ int do_swap_page(struct vm_fault *vmf)
 		pte = pte_mksoft_dirty(pte);
 	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
 	vmf->orig_pte = pte;
-	if (page == swapcache) {
-		do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
-		mem_cgroup_commit_charge(page, memcg, true, false);
-		activate_page(page);
-	} else { /* ksm created a completely new copy */
+
+	/* ksm created a completely new copy */
+	if (unlikely(page != swapcache && swapcache)) {
 		page_add_new_anon_rmap(page, vma, vmf->address, false);
 		mem_cgroup_commit_charge(page, memcg, false, false);
 		lru_cache_add_active_or_unevictable(page, vma);
+	} else {
+		do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
+		mem_cgroup_commit_charge(page, memcg, true, false);
+		activate_page(page);
 	}
 
 	swap_free(entry);
@@ -3003,7 +3023,7 @@ int do_swap_page(struct vm_fault *vmf)
 	    (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
 		try_to_free_swap(page);
 	unlock_page(page);
-	if (page != swapcache) {
+	if (page != swapcache && swapcache) {
 		/*
 		 * Hold the lock to avoid the swap entry to be reused
 		 * until we take the PT lock for the pte_same() check
@@ -3036,7 +3056,7 @@ int do_swap_page(struct vm_fault *vmf)
 	unlock_page(page);
 out_release:
 	put_page(page);
-	if (page != swapcache) {
+	if (page != swapcache && swapcache) {
 		unlock_page(swapcache);
 		put_page(swapcache);
 	}
diff --git a/mm/page_io.c b/mm/page_io.c
index 21502d341a67..d4a98e1f6608 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -346,7 +346,7 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
 	return ret;
 }
 
-int swap_readpage(struct page *page, bool do_poll)
+int swap_readpage(struct page *page, bool synchronous)
 {
 	struct bio *bio;
 	int ret = 0;
@@ -354,7 +354,7 @@ int swap_readpage(struct page *page, bool do_poll)
 	blk_qc_t qc;
 	struct gendisk *disk;
 
-	VM_BUG_ON_PAGE(!PageSwapCache(page), page);
+	VM_BUG_ON_PAGE(!PageSwapCache(page) && !synchronous, page);
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_PAGE(PageUptodate(page), page);
 	if (frontswap_load(page) == 0) {
@@ -402,7 +402,7 @@ int swap_readpage(struct page *page, bool do_poll)
 	count_vm_event(PSWPIN);
 	bio_get(bio);
 	qc = submit_bio(bio);
-	while (do_poll) {
+	while (synchronous) {
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		if (!READ_ONCE(bio->bi_private))
 			break;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 1305591cde4d..64a3d85226ba 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3454,10 +3454,15 @@ int swapcache_prepare(swp_entry_t entry)
 	return __swap_duplicate(entry, SWAP_HAS_CACHE);
 }
 
+struct swap_info_struct *swp_swap_info(swp_entry_t entry)
+{
+	return swap_info[swp_type(entry)];
+}
+
 struct swap_info_struct *page_swap_info(struct page *page)
 {
-	swp_entry_t swap = { .val = page_private(page) };
-	return swap_info[swp_type(swap)];
+	swp_entry_t entry = { .val = page_private(page) };
+	return swp_swap_info(entry);
 }
 
 /*
@@ -3465,7 +3470,6 @@ struct swap_info_struct *page_swap_info(struct page *page)
  */
 struct address_space *__page_file_mapping(struct page *page)
 {
-	VM_BUG_ON_PAGE(!PageSwapCache(page), page);
 	return page_swap_info(page)->swap_file->f_mapping;
 }
 EXPORT_SYMBOL_GPL(__page_file_mapping);
@@ -3473,7 +3477,6 @@ EXPORT_SYMBOL_GPL(__page_file_mapping);
 pgoff_t __page_file_index(struct page *page)
 {
 	swp_entry_t swap = { .val = page_private(page) };
-	VM_BUG_ON_PAGE(!PageSwapCache(page), page);
 	return swp_offset(swap);
 }
 EXPORT_SYMBOL_GPL(__page_file_index);
-- 
2.7.4

* Re: [PATCH v2 4/4] mm:swap: skip swapcache for swapin of synchronous device
From: huang ying @ 2017-09-29  8:51 UTC
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, LKML, kernel-team, Christoph Hellwig,
	Dan Williams, Ross Zwisler, Hugh Dickins, Huang Ying

On Wed, Sep 20, 2017 at 1:43 PM, Minchan Kim <minchan@kernel.org> wrote:
> With fast swap storage, platforms want to use swap more aggressively,
> and swap-in is crucial to application latency.
>
> The rw_page based synchronous devices like zram, pmem and btt are such
> fast storage. When I profile swap-in performance with a zram lz4
> decompression test, software overhead is more than 70%. It would likely
> be even bigger on nvdimm.
>
> This patch aims to reduce swap-in latency by skipping the swapcache when
> the swap device is a synchronous device, such as an rw_page based one.
> It improves my swap-in test by 45% (5G sequential swap-in, no readahead:
> from 2.41sec to 1.64sec).
>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> Cc: Hugh Dickins <hughd@google.com>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  include/linux/swap.h | 11 +++++++++++
>  mm/memory.c          | 52 ++++++++++++++++++++++++++++++++++++----------------
>  mm/page_io.c         |  6 +++---
>  mm/swapfile.c        | 11 +++++++----
>  4 files changed, 57 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index fbb33919d1c6..cd2f66fdfc2d 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -461,6 +461,7 @@ extern int page_swapcount(struct page *);
>  extern int __swp_swapcount(swp_entry_t entry);
>  extern int swp_swapcount(swp_entry_t entry);
>  extern struct swap_info_struct *page_swap_info(struct page *);
> +extern struct swap_info_struct *swp_swap_info(swp_entry_t entry);
>  extern bool reuse_swap_page(struct page *, int *);
>  extern int try_to_free_swap(struct page *);
>  struct backing_dev_info;
> @@ -469,6 +470,16 @@ extern void exit_swap_address_space(unsigned int type);
>
>  #else /* CONFIG_SWAP */
>
> +static inline int swap_readpage(struct page *page, bool do_poll)
> +{
> +       return 0;
> +}
> +
> +static inline struct swap_info_struct *swp_swap_info(swp_entry_t entry)
> +{
> +       return NULL;
> +}
> +
>  #define swap_address_space(entry)              (NULL)
>  #define get_nr_swap_pages()                    0L
>  #define total_swap_pages                       0L
> diff --git a/mm/memory.c b/mm/memory.c
> index ec4e15494901..163ab2062385 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2842,7 +2842,7 @@ EXPORT_SYMBOL(unmap_mapping_range);
>  int do_swap_page(struct vm_fault *vmf)
>  {
>         struct vm_area_struct *vma = vmf->vma;
> -       struct page *page = NULL, *swapcache;
> +       struct page *page = NULL, *swapcache = NULL;
>         struct mem_cgroup *memcg;
>         struct vma_swap_readahead swap_ra;
>         swp_entry_t entry;
> @@ -2881,17 +2881,35 @@ int do_swap_page(struct vm_fault *vmf)
>                 }
>                 goto out;
>         }
> +
> +
>         delayacct_set_flag(DELAYACCT_PF_SWAPIN);
>         if (!page)
>                 page = lookup_swap_cache(entry, vma_readahead ? vma : NULL,
>                                          vmf->address);
>         if (!page) {
> -               if (vma_readahead)
> -                       page = do_swap_page_readahead(entry,
> -                               GFP_HIGHUSER_MOVABLE, vmf, &swap_ra);
> -               else
> -                       page = swapin_readahead(entry,
> -                               GFP_HIGHUSER_MOVABLE, vma, vmf->address);
> +               struct swap_info_struct *si = swp_swap_info(entry);
> +
> +               if (!(si->flags & SWP_SYNCHRONOUS_IO)) {
> +                       if (vma_readahead)
> +                               page = do_swap_page_readahead(entry,
> +                                       GFP_HIGHUSER_MOVABLE, vmf, &swap_ra);
> +                       else
> +                               page = swapin_readahead(entry,
> +                                       GFP_HIGHUSER_MOVABLE, vma, vmf->address);
> +                       swapcache = page;
> +               } else {
> +                       /* skip swapcache */
> +                       page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vmf->address);
> +                       if (page) {
> +                               __SetPageLocked(page);
> +                               __SetPageSwapBacked(page);
> +                               set_page_private(page, entry.val);
> +                               lru_cache_add_anon(page);
> +                               swap_readpage(page, true);
> +                       }
> +               }

I have a question about this.  If a page is mapped in multiple processes
(for example, because of fork), then with the swap cache, after swapping
out and swapping in, the page will still be shared by those processes.
But with your changes, it appears that there will be multiple pages with
the same contents mapped in multiple processes, even if the page is never
written in those processes.  So this may waste some memory in some
situations?  And is copying from the device really faster than looking up
the swap cache on your system?
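
To spell the scenario out:

	parent: anon page P mapped; fork() -> child shares P
	reclaim: P swapped out to slot S; both PTEs now reference S
	parent faults: synchronous path allocates P1 and reads S
	child faults:  synchronous path allocates P2 and reads S again
	result: two copies where the swapcache would have kept one shared page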

Best Regards,
Huang, Ying

> +
>                 if (!page) {
>                         /*
>                          * Back out if somebody else faulted in this pte
> @@ -2920,7 +2938,6 @@ int do_swap_page(struct vm_fault *vmf)
>                 goto out_release;
>         }
>
> -       swapcache = page;
>         locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags);
>
>         delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
> @@ -2935,7 +2952,8 @@ int do_swap_page(struct vm_fault *vmf)
>          * test below, are not enough to exclude that.  Even if it is still
>          * swapcache, we need to check that the page's swap has not changed.
>          */
> -       if (unlikely(!PageSwapCache(page) || page_private(page) != entry.val))
> +       if (unlikely((!PageSwapCache(page) ||
> +                       page_private(page) != entry.val)) && swapcache)
>                 goto out_page;
>
>         page = ksm_might_need_to_copy(page, vma, vmf->address);
> @@ -2988,14 +3006,16 @@ int do_swap_page(struct vm_fault *vmf)
>                 pte = pte_mksoft_dirty(pte);
>         set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
>         vmf->orig_pte = pte;
> -       if (page == swapcache) {
> -               do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
> -               mem_cgroup_commit_charge(page, memcg, true, false);
> -               activate_page(page);
> -       } else { /* ksm created a completely new copy */
> +
> +       /* ksm created a completely new copy */
> +       if (unlikely(page != swapcache && swapcache)) {
>                 page_add_new_anon_rmap(page, vma, vmf->address, false);
>                 mem_cgroup_commit_charge(page, memcg, false, false);
>                 lru_cache_add_active_or_unevictable(page, vma);
> +       } else {
> +               do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
> +               mem_cgroup_commit_charge(page, memcg, true, false);
> +               activate_page(page);
>         }
>
>         swap_free(entry);
> @@ -3003,7 +3023,7 @@ int do_swap_page(struct vm_fault *vmf)
>             (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
>                 try_to_free_swap(page);
>         unlock_page(page);
> -       if (page != swapcache) {
> +       if (page != swapcache && swapcache) {
>                 /*
>                  * Hold the lock to avoid the swap entry to be reused
>                  * until we take the PT lock for the pte_same() check
> @@ -3036,7 +3056,7 @@ int do_swap_page(struct vm_fault *vmf)
>         unlock_page(page);
>  out_release:
>         put_page(page);
> -       if (page != swapcache) {
> +       if (page != swapcache && swapcache) {
>                 unlock_page(swapcache);
>                 put_page(swapcache);
>         }
> diff --git a/mm/page_io.c b/mm/page_io.c
> index 21502d341a67..d4a98e1f6608 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -346,7 +346,7 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
>         return ret;
>  }
>
> -int swap_readpage(struct page *page, bool do_poll)
> +int swap_readpage(struct page *page, bool synchronous)
>  {
>         struct bio *bio;
>         int ret = 0;
> @@ -354,7 +354,7 @@ int swap_readpage(struct page *page, bool do_poll)
>         blk_qc_t qc;
>         struct gendisk *disk;
>
> -       VM_BUG_ON_PAGE(!PageSwapCache(page), page);
> +       VM_BUG_ON_PAGE(!PageSwapCache(page) && !synchronous, page);
>         VM_BUG_ON_PAGE(!PageLocked(page), page);
>         VM_BUG_ON_PAGE(PageUptodate(page), page);
>         if (frontswap_load(page) == 0) {
> @@ -402,7 +402,7 @@ int swap_readpage(struct page *page, bool do_poll)
>         count_vm_event(PSWPIN);
>         bio_get(bio);
>         qc = submit_bio(bio);
> -       while (do_poll) {
> +       while (synchronous) {
>                 set_current_state(TASK_UNINTERRUPTIBLE);
>                 if (!READ_ONCE(bio->bi_private))
>                         break;
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 1305591cde4d..64a3d85226ba 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -3454,10 +3454,15 @@ int swapcache_prepare(swp_entry_t entry)
>         return __swap_duplicate(entry, SWAP_HAS_CACHE);
>  }
>
> +struct swap_info_struct *swp_swap_info(swp_entry_t entry)
> +{
> +       return swap_info[swp_type(entry)];
> +}
> +
>  struct swap_info_struct *page_swap_info(struct page *page)
>  {
> -       swp_entry_t swap = { .val = page_private(page) };
> -       return swap_info[swp_type(swap)];
> +       swp_entry_t entry = { .val = page_private(page) };
> +       return swp_swap_info(entry);
>  }
>
>  /*
> @@ -3465,7 +3470,6 @@ struct swap_info_struct *page_swap_info(struct page *page)
>   */
>  struct address_space *__page_file_mapping(struct page *page)
>  {
> -       VM_BUG_ON_PAGE(!PageSwapCache(page), page);
>         return page_swap_info(page)->swap_file->f_mapping;
>  }
>  EXPORT_SYMBOL_GPL(__page_file_mapping);
> @@ -3473,7 +3477,6 @@ EXPORT_SYMBOL_GPL(__page_file_mapping);
>  pgoff_t __page_file_index(struct page *page)
>  {
>         swp_entry_t swap = { .val = page_private(page) };
> -       VM_BUG_ON_PAGE(!PageSwapCache(page), page);
>         return swp_offset(swap);
>  }
>  EXPORT_SYMBOL_GPL(__page_file_index);
> --
> 2.7.4
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 4/4] mm:swap: skip swapcache for swapin of synchronous device
@ 2017-09-29  8:51     ` huang ying
  0 siblings, 0 replies; 20+ messages in thread
From: huang ying @ 2017-09-29  8:51 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, LKML, kernel-team, Christoph Hellwig,
	Dan Williams, Ross Zwisler, Hugh Dickins, Huang Ying

On Wed, Sep 20, 2017 at 1:43 PM, Minchan Kim <minchan@kernel.org> wrote:
> With fast swap storage, platform want to use swap more aggressively
> and swap-in is crucial to application latency.
>
> The rw_page based synchronous devices like zram, pmem and btt are such
> fast storage. When I profile swapin performance with zram lz4 decompress
> test, S/W overhead is more than 70%. Maybe, it would be bigger in nvdimm.
>
> This patch aims for reducing swap-in latency via skipping swapcache
> if swap device is synchronous device like rw_page based device.
> It enhances 45% my swapin test(5G sequential swapin, no readahead,
> from 2.41sec to 1.64sec).
>
> Cc: Dan Williams <dan.j.williams@intel.com>
> Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> Cc: Hugh Dickins <hughd@google.com>
> Signed-off-by: Minchan Kim <minchan@kernel.org>
> ---
>  include/linux/swap.h | 11 +++++++++++
>  mm/memory.c          | 52 ++++++++++++++++++++++++++++++++++++----------------
>  mm/page_io.c         |  6 +++---
>  mm/swapfile.c        | 11 +++++++----
>  4 files changed, 57 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index fbb33919d1c6..cd2f66fdfc2d 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -461,6 +461,7 @@ extern int page_swapcount(struct page *);
>  extern int __swp_swapcount(swp_entry_t entry);
>  extern int swp_swapcount(swp_entry_t entry);
>  extern struct swap_info_struct *page_swap_info(struct page *);
> +extern struct swap_info_struct *swp_swap_info(swp_entry_t entry);
>  extern bool reuse_swap_page(struct page *, int *);
>  extern int try_to_free_swap(struct page *);
>  struct backing_dev_info;
> @@ -469,6 +470,16 @@ extern void exit_swap_address_space(unsigned int type);
>
>  #else /* CONFIG_SWAP */
>
> +static inline int swap_readpage(struct page *page, bool do_poll)
> +{
> +       return 0;
> +}
> +
> +static inline struct swap_info_struct *swp_swap_info(swp_entry_t entry)
> +{
> +       return NULL;
> +}
> +
>  #define swap_address_space(entry)              (NULL)
>  #define get_nr_swap_pages()                    0L
>  #define total_swap_pages                       0L
> diff --git a/mm/memory.c b/mm/memory.c
> index ec4e15494901..163ab2062385 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2842,7 +2842,7 @@ EXPORT_SYMBOL(unmap_mapping_range);
>  int do_swap_page(struct vm_fault *vmf)
>  {
>         struct vm_area_struct *vma = vmf->vma;
> -       struct page *page = NULL, *swapcache;
> +       struct page *page = NULL, *swapcache = NULL;
>         struct mem_cgroup *memcg;
>         struct vma_swap_readahead swap_ra;
>         swp_entry_t entry;
> @@ -2881,17 +2881,35 @@ int do_swap_page(struct vm_fault *vmf)
>                 }
>                 goto out;
>         }
> +
> +
>         delayacct_set_flag(DELAYACCT_PF_SWAPIN);
>         if (!page)
>                 page = lookup_swap_cache(entry, vma_readahead ? vma : NULL,
>                                          vmf->address);
>         if (!page) {
> -               if (vma_readahead)
> -                       page = do_swap_page_readahead(entry,
> -                               GFP_HIGHUSER_MOVABLE, vmf, &swap_ra);
> -               else
> -                       page = swapin_readahead(entry,
> -                               GFP_HIGHUSER_MOVABLE, vma, vmf->address);
> +               struct swap_info_struct *si = swp_swap_info(entry);
> +
> +               if (!(si->flags & SWP_SYNCHRONOUS_IO)) {
> +                       if (vma_readahead)
> +                               page = do_swap_page_readahead(entry,
> +                                       GFP_HIGHUSER_MOVABLE, vmf, &swap_ra);
> +                       else
> +                               page = swapin_readahead(entry,
> +                                       GFP_HIGHUSER_MOVABLE, vma, vmf->address);
> +                       swapcache = page;
> +               } else {
> +                       /* skip swapcache */
> +                       page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vmf->address);
> +                       if (page) {
> +                               __SetPageLocked(page);
> +                               __SetPageSwapBacked(page);
> +                               set_page_private(page, entry.val);
> +                               lru_cache_add_anon(page);
> +                               swap_readpage(page, true);
> +                       }
> +               }

I have a question about this.  If a page is mapped in multiple
processes (for example, because of fork), then with the swap cache,
after swapping out and swapping in, the page will still be shared by
these processes.  But with your changes, it appears that there will be
multiple pages with the same contents mapped in multiple processes,
even if the page isn't written to in those processes.  So this may
waste memory in some situations.  And is copying from the device even
faster than looking up the swap cache on your system?

Best Regards,
Huang, Ying
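
(A minimal userspace sketch of the scenario in question -- an
editorial illustration only, not part of the patch or of this thread.
After fork(), parent and child share one anonymous page; if that page
is swapped out and each process later read-faults it back in without
going through a swap cache, each fault allocates its own copy of the
same contents.)

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/wait.h>
#include <unistd.h>

int main(void)
{
	/* Roughly one page of anonymous memory, populated before
	 * fork() so parent and child initially share one physical
	 * copy. */
	char *buf = malloc(4096);

	if (!buf)
		return 1;
	memset(buf, 'x', 4096);

	if (fork() == 0) {
		/* Read-only access: with a swap cache, a swapped-out
		 * page faulted here and in the parent resolves to the
		 * same physical page; bypassing the swap cache gives
		 * each process a private copy of identical contents. */
		printf("child sees '%c'\n", buf[0]);
		_exit(0);
	}
	printf("parent sees '%c'\n", buf[0]);
	wait(NULL);
	return 0;
}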

> +
>                 if (!page) {
>                         /*
>                          * Back out if somebody else faulted in this pte
> @@ -2920,7 +2938,6 @@ int do_swap_page(struct vm_fault *vmf)
>                 goto out_release;
>         }
>
> -       swapcache = page;
>         locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags);
>
>         delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
> @@ -2935,7 +2952,8 @@ int do_swap_page(struct vm_fault *vmf)
>          * test below, are not enough to exclude that.  Even if it is still
>          * swapcache, we need to check that the page's swap has not changed.
>          */
> -       if (unlikely(!PageSwapCache(page) || page_private(page) != entry.val))
> +       if (unlikely((!PageSwapCache(page) ||
> +                       page_private(page) != entry.val)) && swapcache)
>                 goto out_page;
>
>         page = ksm_might_need_to_copy(page, vma, vmf->address);
> @@ -2988,14 +3006,16 @@ int do_swap_page(struct vm_fault *vmf)
>                 pte = pte_mksoft_dirty(pte);
>         set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
>         vmf->orig_pte = pte;
> -       if (page == swapcache) {
> -               do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
> -               mem_cgroup_commit_charge(page, memcg, true, false);
> -               activate_page(page);
> -       } else { /* ksm created a completely new copy */
> +
> +       /* ksm created a completely new copy */
> +       if (unlikely(page != swapcache && swapcache)) {
>                 page_add_new_anon_rmap(page, vma, vmf->address, false);
>                 mem_cgroup_commit_charge(page, memcg, false, false);
>                 lru_cache_add_active_or_unevictable(page, vma);
> +       } else {
> +               do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
> +               mem_cgroup_commit_charge(page, memcg, true, false);
> +               activate_page(page);
>         }
>
>         swap_free(entry);
> @@ -3003,7 +3023,7 @@ int do_swap_page(struct vm_fault *vmf)
>             (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
>                 try_to_free_swap(page);
>         unlock_page(page);
> -       if (page != swapcache) {
> +       if (page != swapcache && swapcache) {
>                 /*
>                  * Hold the lock to avoid the swap entry to be reused
>                  * until we take the PT lock for the pte_same() check
> @@ -3036,7 +3056,7 @@ int do_swap_page(struct vm_fault *vmf)
>         unlock_page(page);
>  out_release:
>         put_page(page);
> -       if (page != swapcache) {
> +       if (page != swapcache && swapcache) {
>                 unlock_page(swapcache);
>                 put_page(swapcache);
>         }
> diff --git a/mm/page_io.c b/mm/page_io.c
> index 21502d341a67..d4a98e1f6608 100644
> --- a/mm/page_io.c
> +++ b/mm/page_io.c
> @@ -346,7 +346,7 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
>         return ret;
>  }
>
> -int swap_readpage(struct page *page, bool do_poll)
> +int swap_readpage(struct page *page, bool synchronous)
>  {
>         struct bio *bio;
>         int ret = 0;
> @@ -354,7 +354,7 @@ int swap_readpage(struct page *page, bool do_poll)
>         blk_qc_t qc;
>         struct gendisk *disk;
>
> -       VM_BUG_ON_PAGE(!PageSwapCache(page), page);
> +       VM_BUG_ON_PAGE(!PageSwapCache(page) && !synchronous, page);
>         VM_BUG_ON_PAGE(!PageLocked(page), page);
>         VM_BUG_ON_PAGE(PageUptodate(page), page);
>         if (frontswap_load(page) == 0) {
> @@ -402,7 +402,7 @@ int swap_readpage(struct page *page, bool do_poll)
>         count_vm_event(PSWPIN);
>         bio_get(bio);
>         qc = submit_bio(bio);
> -       while (do_poll) {
> +       while (synchronous) {
>                 set_current_state(TASK_UNINTERRUPTIBLE);
>                 if (!READ_ONCE(bio->bi_private))
>                         break;
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 1305591cde4d..64a3d85226ba 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -3454,10 +3454,15 @@ int swapcache_prepare(swp_entry_t entry)
>         return __swap_duplicate(entry, SWAP_HAS_CACHE);
>  }
>
> +struct swap_info_struct *swp_swap_info(swp_entry_t entry)
> +{
> +       return swap_info[swp_type(entry)];
> +}
> +
>  struct swap_info_struct *page_swap_info(struct page *page)
>  {
> -       swp_entry_t swap = { .val = page_private(page) };
> -       return swap_info[swp_type(swap)];
> +       swp_entry_t entry = { .val = page_private(page) };
> +       return swp_swap_info(entry);
>  }
>
>  /*
> @@ -3465,7 +3470,6 @@ struct swap_info_struct *page_swap_info(struct page *page)
>   */
>  struct address_space *__page_file_mapping(struct page *page)
>  {
> -       VM_BUG_ON_PAGE(!PageSwapCache(page), page);
>         return page_swap_info(page)->swap_file->f_mapping;
>  }
>  EXPORT_SYMBOL_GPL(__page_file_mapping);
> @@ -3473,7 +3477,6 @@ EXPORT_SYMBOL_GPL(__page_file_mapping);
>  pgoff_t __page_file_index(struct page *page)
>  {
>         swp_entry_t swap = { .val = page_private(page) };
> -       VM_BUG_ON_PAGE(!PageSwapCache(page), page);
>         return swp_offset(swap);
>  }
>  EXPORT_SYMBOL_GPL(__page_file_index);
> --
> 2.7.4
>


^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 4/4] mm:swap: skip swapcache for swapin of synchronous device
  2017-09-29  8:51     ` huang ying
@ 2017-10-09  1:26       ` huang ying
  -1 siblings, 0 replies; 20+ messages in thread
From: huang ying @ 2017-10-09  1:26 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-mm, LKML, kernel-team, Christoph Hellwig,
	Dan Williams, Ross Zwisler, Hugh Dickins, Huang Ying

On Fri, Sep 29, 2017 at 4:51 PM, huang ying
<huang.ying.caritas@gmail.com> wrote:
> On Wed, Sep 20, 2017 at 1:43 PM, Minchan Kim <minchan@kernel.org> wrote:

[snip]

>> diff --git a/mm/memory.c b/mm/memory.c
>> index ec4e15494901..163ab2062385 100644
>> --- a/mm/memory.c
>> +++ b/mm/memory.c
>> @@ -2842,7 +2842,7 @@ EXPORT_SYMBOL(unmap_mapping_range);
>>  int do_swap_page(struct vm_fault *vmf)
>>  {
>>         struct vm_area_struct *vma = vmf->vma;
>> -       struct page *page = NULL, *swapcache;
>> +       struct page *page = NULL, *swapcache = NULL;
>>         struct mem_cgroup *memcg;
>>         struct vma_swap_readahead swap_ra;
>>         swp_entry_t entry;
>> @@ -2881,17 +2881,35 @@ int do_swap_page(struct vm_fault *vmf)
>>                 }
>>                 goto out;
>>         }
>> +
>> +
>>         delayacct_set_flag(DELAYACCT_PF_SWAPIN);
>>         if (!page)
>>                 page = lookup_swap_cache(entry, vma_readahead ? vma : NULL,
>>                                          vmf->address);
>>         if (!page) {
>> -               if (vma_readahead)
>> -                       page = do_swap_page_readahead(entry,
>> -                               GFP_HIGHUSER_MOVABLE, vmf, &swap_ra);
>> -               else
>> -                       page = swapin_readahead(entry,
>> -                               GFP_HIGHUSER_MOVABLE, vma, vmf->address);
>> +               struct swap_info_struct *si = swp_swap_info(entry);
>> +
>> +               if (!(si->flags & SWP_SYNCHRONOUS_IO)) {
>> +                       if (vma_readahead)
>> +                               page = do_swap_page_readahead(entry,
>> +                                       GFP_HIGHUSER_MOVABLE, vmf, &swap_ra);
>> +                       else
>> +                               page = swapin_readahead(entry,
>> +                                       GFP_HIGHUSER_MOVABLE, vma, vmf->address);
>> +                       swapcache = page;
>> +               } else {
>> +                       /* skip swapcache */
>> +                       page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vmf->address);
>> +                       if (page) {
>> +                               __SetPageLocked(page);
>> +                               __SetPageSwapBacked(page);
>> +                               set_page_private(page, entry.val);
>> +                               lru_cache_add_anon(page);
>> +                               swap_readpage(page, true);
>> +                       }
>> +               }
>
> I have a question about this.  If a page is mapped in multiple
> processes (for example, because of fork), then with the swap cache,
> after swapping out and swapping in, the page will still be shared by
> these processes.  But with your changes, it appears that there will be
> multiple pages with the same contents mapped in multiple processes,
> even if the page isn't written to in those processes.  So this may
> waste memory in some situations.  And is copying from the device even
> faster than looking up the swap cache on your system?

Hi, Minchan,

Could you help me on this?

Best Regards,
Huang, Ying

[snip]

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 4/4] mm:swap: skip swapcache for swapin of synchronous device
  2017-09-29  8:51     ` huang ying
@ 2017-10-10  0:34       ` Minchan Kim
  -1 siblings, 0 replies; 20+ messages in thread
From: Minchan Kim @ 2017-10-10  0:34 UTC (permalink / raw)
  To: huang ying
  Cc: Andrew Morton, linux-mm, LKML, kernel-team, Christoph Hellwig,
	Dan Williams, Ross Zwisler, Hugh Dickins, Huang Ying

Hi Huang,

Sorry for the late response. It was a long national holiday.

On Fri, Sep 29, 2017 at 04:51:17PM +0800, huang ying wrote:
> On Wed, Sep 20, 2017 at 1:43 PM, Minchan Kim <minchan@kernel.org> wrote:
> > With fast swap storage, platforms want to use swap more aggressively,
> > and swap-in is crucial to application latency.
> >
> > The rw_page based synchronous devices like zram, pmem and btt are such
> > fast storage. When I profile swap-in performance with a zram lz4
> > decompression test, the software overhead is more than 70%. It would
> > probably be even bigger with nvdimm.
> >
> > This patch aims to reduce swap-in latency by skipping the swapcache
> > if the swap device is a synchronous device like an rw_page based
> > device. It improves my swap-in test by 45% (5G sequential swap-in,
> > no readahead: from 2.41 sec to 1.64 sec).
> >
> > Cc: Dan Williams <dan.j.williams@intel.com>
> > Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
> > Cc: Hugh Dickins <hughd@google.com>
> > Signed-off-by: Minchan Kim <minchan@kernel.org>
> > ---
> >  include/linux/swap.h | 11 +++++++++++
> >  mm/memory.c          | 52 ++++++++++++++++++++++++++++++++++++----------------
> >  mm/page_io.c         |  6 +++---
> >  mm/swapfile.c        | 11 +++++++----
> >  4 files changed, 57 insertions(+), 23 deletions(-)
> >
> > diff --git a/include/linux/swap.h b/include/linux/swap.h
> > index fbb33919d1c6..cd2f66fdfc2d 100644
> > --- a/include/linux/swap.h
> > +++ b/include/linux/swap.h
> > @@ -461,6 +461,7 @@ extern int page_swapcount(struct page *);
> >  extern int __swp_swapcount(swp_entry_t entry);
> >  extern int swp_swapcount(swp_entry_t entry);
> >  extern struct swap_info_struct *page_swap_info(struct page *);
> > +extern struct swap_info_struct *swp_swap_info(swp_entry_t entry);
> >  extern bool reuse_swap_page(struct page *, int *);
> >  extern int try_to_free_swap(struct page *);
> >  struct backing_dev_info;
> > @@ -469,6 +470,16 @@ extern void exit_swap_address_space(unsigned int type);
> >
> >  #else /* CONFIG_SWAP */
> >
> > +static inline int swap_readpage(struct page *page, bool do_poll)
> > +{
> > +       return 0;
> > +}
> > +
> > +static inline struct swap_info_struct *swp_swap_info(swp_entry_t entry)
> > +{
> > +       return NULL;
> > +}
> > +
> >  #define swap_address_space(entry)              (NULL)
> >  #define get_nr_swap_pages()                    0L
> >  #define total_swap_pages                       0L
> > diff --git a/mm/memory.c b/mm/memory.c
> > index ec4e15494901..163ab2062385 100644
> > --- a/mm/memory.c
> > +++ b/mm/memory.c
> > @@ -2842,7 +2842,7 @@ EXPORT_SYMBOL(unmap_mapping_range);
> >  int do_swap_page(struct vm_fault *vmf)
> >  {
> >         struct vm_area_struct *vma = vmf->vma;
> > -       struct page *page = NULL, *swapcache;
> > +       struct page *page = NULL, *swapcache = NULL;
> >         struct mem_cgroup *memcg;
> >         struct vma_swap_readahead swap_ra;
> >         swp_entry_t entry;
> > @@ -2881,17 +2881,35 @@ int do_swap_page(struct vm_fault *vmf)
> >                 }
> >                 goto out;
> >         }
> > +
> > +
> >         delayacct_set_flag(DELAYACCT_PF_SWAPIN);
> >         if (!page)
> >                 page = lookup_swap_cache(entry, vma_readahead ? vma : NULL,
> >                                          vmf->address);
> >         if (!page) {
> > -               if (vma_readahead)
> > -                       page = do_swap_page_readahead(entry,
> > -                               GFP_HIGHUSER_MOVABLE, vmf, &swap_ra);
> > -               else
> > -                       page = swapin_readahead(entry,
> > -                               GFP_HIGHUSER_MOVABLE, vma, vmf->address);
> > +               struct swap_info_struct *si = swp_swap_info(entry);
> > +
> > +               if (!(si->flags & SWP_SYNCHRONOUS_IO)) {
> > +                       if (vma_readahead)
> > +                               page = do_swap_page_readahead(entry,
> > +                                       GFP_HIGHUSER_MOVABLE, vmf, &swap_ra);
> > +                       else
> > +                               page = swapin_readahead(entry,
> > +                                       GFP_HIGHUSER_MOVABLE, vma, vmf->address);
> > +                       swapcache = page;
> > +               } else {
> > +                       /* skip swapcache */
> > +                       page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vmf->address);
> > +                       if (page) {
> > +                               __SetPageLocked(page);
> > +                               __SetPageSwapBacked(page);
> > +                               set_page_private(page, entry.val);
> > +                               lru_cache_add_anon(page);
> > +                               swap_readpage(page, true);
> > +                       }
> > +               }
> 
> I have a question about this.  If a page is mapped in multiple
> processes (for example, because of fork), then with the swap cache,
> after swapping out and swapping in, the page will still be shared by
> these processes.  But with your changes, it appears that there will be
> multiple pages with the same contents mapped in multiple processes,
> even if the page isn't written to in those processes.  So this may
> waste memory in some situations.  And is copying from the device even
> faster than looking up the swap cache on your system?

I expected that a page shared by several processes has a lower chance of
being swapped out than a singly mapped page. Nonetheless, once it is
swapped out, it also has a low chance of being swapped in, so I
intentionally didn't cover that case until we get a regression report.

However, the fix would be simple, so I don't mind adding it.
Any thoughts?

diff --git a/include/linux/swap.h b/include/linux/swap.h
index cd2f66fdfc2d..23f19ffa5cc3 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -458,6 +458,7 @@ extern unsigned int count_swap_pages(int, int);
 extern sector_t map_swap_page(struct page *, struct block_device **);
 extern sector_t swapdev_block(int, pgoff_t);
 extern int page_swapcount(struct page *);
+extern int __swap_count(struct swap_info_struct *si, swp_entry_t entry);
 extern int __swp_swapcount(swp_entry_t entry);
 extern int swp_swapcount(swp_entry_t entry);
 extern struct swap_info_struct *page_swap_info(struct page *);
@@ -584,6 +585,11 @@ static inline int page_swapcount(struct page *page)
 	return 0;
 }
 
+static inline int __swap_count(struct swap_info_struct *si, swp_entry_t entry)
+{
+	return 0;
+}
+
 static inline int __swp_swapcount(swp_entry_t entry)
 {
 	return 0;
diff --git a/include/linux/swapfile.h b/include/linux/swapfile.h
index 388293a91e8c..49f8e19dd506 100644
--- a/include/linux/swapfile.h
+++ b/include/linux/swapfile.h
@@ -9,5 +9,4 @@ extern spinlock_t swap_lock;
 extern struct plist_head swap_active_head;
 extern struct swap_info_struct *swap_info[];
 extern int try_to_unuse(unsigned int, bool, unsigned long);
-
 #endif /* _LINUX_SWAPFILE_H */
diff --git a/mm/memory.c b/mm/memory.c
index 163ab2062385..c6f0abe8b39b 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2890,15 +2890,8 @@ int do_swap_page(struct vm_fault *vmf)
 	if (!page) {
 		struct swap_info_struct *si = swp_swap_info(entry);
 
-		if (!(si->flags & SWP_SYNCHRONOUS_IO)) {
-			if (vma_readahead)
-				page = do_swap_page_readahead(entry,
-					GFP_HIGHUSER_MOVABLE, vmf, &swap_ra);
-			else
-				page = swapin_readahead(entry,
-					GFP_HIGHUSER_MOVABLE, vma, vmf->address);
-			swapcache = page;
-		} else {
+		if ((si->flags & SWP_SYNCHRONOUS_IO) && (vmf->flags & FAULT_FLAG_WRITE ||
+							__swap_count(si, entry) == 1)) {
 			/* skip swapcache */
 			page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vmf->address);
 			if (page) {
@@ -2908,6 +2901,14 @@ int do_swap_page(struct vm_fault *vmf)
 				lru_cache_add_anon(page);
 				swap_readpage(page, true);
 			}
+		} else {
+			if (vma_readahead)
+				page = do_swap_page_readahead(entry,
+					GFP_HIGHUSER_MOVABLE, vmf, &swap_ra);
+			else
+				page = swapin_readahead(entry,
+					GFP_HIGHUSER_MOVABLE, vma, vmf->address);
+			swapcache = page;
 		}
 
 		if (!page) {
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 64a3d85226ba..37d7ba71a2ca 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -1328,7 +1328,13 @@ int page_swapcount(struct page *page)
 	return count;
 }
 
-static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry)
+int __swap_count(struct swap_info_struct *si, swp_entry_t entry)
+{
+	pgoff_t offset = swp_offset(entry);
+	return swap_count(si->swap_map[offset]);
+}
+
+int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry)
 {
 	int count = 0;
 	pgoff_t offset = swp_offset(entry);
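
(Read as a single condition, the gating logic the diff above introduces
can be factored into a small predicate.  The helper below is an
editorial sketch -- the name is hypothetical, the logic mirrors the
mm/memory.c hunk: skip the swap cache only when the device does
synchronous I/O and the page cannot end up shared, because we are
about to write it anyway or the swap entry has exactly one reference.)

static bool swapin_can_skip_swapcache(struct swap_info_struct *si,
				      swp_entry_t entry,
				      struct vm_fault *vmf)
{
	/* Sketch only: same condition as the diff above. */
	return (si->flags & SWP_SYNCHRONOUS_IO) &&
	       ((vmf->flags & FAULT_FLAG_WRITE) ||
		__swap_count(si, entry) == 1);
}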

^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [PATCH v2 4/4] mm:swap: skip swapcache for swapin of synchronous device
  2017-10-10  0:34       ` Minchan Kim
@ 2017-10-10  1:10         ` Huang, Ying
  -1 siblings, 0 replies; 20+ messages in thread
From: Huang, Ying @ 2017-10-10  1:10 UTC (permalink / raw)
  To: Minchan Kim
  Cc: huang ying, Andrew Morton, linux-mm, LKML, kernel-team,
	Christoph Hellwig, Dan Williams, Ross Zwisler, Hugh Dickins,
	Huang Ying

Minchan Kim <minchan@kernel.org> writes:

> Hi Huang,
>
> Sorry for the late response. It was a long national holiday.
>
> On Fri, Sep 29, 2017 at 04:51:17PM +0800, huang ying wrote:
>> On Wed, Sep 20, 2017 at 1:43 PM, Minchan Kim <minchan@kernel.org> wrote:
>> > With fast swap storage, platforms want to use swap more aggressively,
>> > and swap-in is crucial to application latency.
>> >
>> > The rw_page based synchronous devices like zram, pmem and btt are such
>> > fast storage. When I profile swap-in performance with a zram lz4
>> > decompression test, the software overhead is more than 70%. It would
>> > probably be even bigger with nvdimm.
>> >
>> > This patch aims to reduce swap-in latency by skipping the swapcache
>> > if the swap device is a synchronous device like an rw_page based
>> > device. It improves my swap-in test by 45% (5G sequential swap-in,
>> > no readahead: from 2.41 sec to 1.64 sec).
>> >
>> > Cc: Dan Williams <dan.j.williams@intel.com>
>> > Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
>> > Cc: Hugh Dickins <hughd@google.com>
>> > Signed-off-by: Minchan Kim <minchan@kernel.org>
>> > ---
>> >  include/linux/swap.h | 11 +++++++++++
>> >  mm/memory.c          | 52 ++++++++++++++++++++++++++++++++++++----------------
>> >  mm/page_io.c         |  6 +++---
>> >  mm/swapfile.c        | 11 +++++++----
>> >  4 files changed, 57 insertions(+), 23 deletions(-)
>> >
>> > diff --git a/include/linux/swap.h b/include/linux/swap.h
>> > index fbb33919d1c6..cd2f66fdfc2d 100644
>> > --- a/include/linux/swap.h
>> > +++ b/include/linux/swap.h
>> > @@ -461,6 +461,7 @@ extern int page_swapcount(struct page *);
>> >  extern int __swp_swapcount(swp_entry_t entry);
>> >  extern int swp_swapcount(swp_entry_t entry);
>> >  extern struct swap_info_struct *page_swap_info(struct page *);
>> > +extern struct swap_info_struct *swp_swap_info(swp_entry_t entry);
>> >  extern bool reuse_swap_page(struct page *, int *);
>> >  extern int try_to_free_swap(struct page *);
>> >  struct backing_dev_info;
>> > @@ -469,6 +470,16 @@ extern void exit_swap_address_space(unsigned int type);
>> >
>> >  #else /* CONFIG_SWAP */
>> >
>> > +static inline int swap_readpage(struct page *page, bool do_poll)
>> > +{
>> > +       return 0;
>> > +}
>> > +
>> > +static inline struct swap_info_struct *swp_swap_info(swp_entry_t entry)
>> > +{
>> > +       return NULL;
>> > +}
>> > +
>> >  #define swap_address_space(entry)              (NULL)
>> >  #define get_nr_swap_pages()                    0L
>> >  #define total_swap_pages                       0L
>> > diff --git a/mm/memory.c b/mm/memory.c
>> > index ec4e15494901..163ab2062385 100644
>> > --- a/mm/memory.c
>> > +++ b/mm/memory.c
>> > @@ -2842,7 +2842,7 @@ EXPORT_SYMBOL(unmap_mapping_range);
>> >  int do_swap_page(struct vm_fault *vmf)
>> >  {
>> >         struct vm_area_struct *vma = vmf->vma;
>> > -       struct page *page = NULL, *swapcache;
>> > +       struct page *page = NULL, *swapcache = NULL;
>> >         struct mem_cgroup *memcg;
>> >         struct vma_swap_readahead swap_ra;
>> >         swp_entry_t entry;
>> > @@ -2881,17 +2881,35 @@ int do_swap_page(struct vm_fault *vmf)
>> >                 }
>> >                 goto out;
>> >         }
>> > +
>> > +
>> >         delayacct_set_flag(DELAYACCT_PF_SWAPIN);
>> >         if (!page)
>> >                 page = lookup_swap_cache(entry, vma_readahead ? vma : NULL,
>> >                                          vmf->address);
>> >         if (!page) {
>> > -               if (vma_readahead)
>> > -                       page = do_swap_page_readahead(entry,
>> > -                               GFP_HIGHUSER_MOVABLE, vmf, &swap_ra);
>> > -               else
>> > -                       page = swapin_readahead(entry,
>> > -                               GFP_HIGHUSER_MOVABLE, vma, vmf->address);
>> > +               struct swap_info_struct *si = swp_swap_info(entry);
>> > +
>> > +               if (!(si->flags & SWP_SYNCHRONOUS_IO)) {
>> > +                       if (vma_readahead)
>> > +                               page = do_swap_page_readahead(entry,
>> > +                                       GFP_HIGHUSER_MOVABLE, vmf, &swap_ra);
>> > +                       else
>> > +                               page = swapin_readahead(entry,
>> > +                                       GFP_HIGHUSER_MOVABLE, vma, vmf->address);
>> > +                       swapcache = page;
>> > +               } else {
>> > +                       /* skip swapcache */
>> > +                       page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vmf->address);
>> > +                       if (page) {
>> > +                               __SetPageLocked(page);
>> > +                               __SetPageSwapBacked(page);
>> > +                               set_page_private(page, entry.val);
>> > +                               lru_cache_add_anon(page);
>> > +                               swap_readpage(page, true);
>> > +                       }
>> > +               }
>> 
>> I have a question about this.  If a page is mapped in multiple
>> processes (for example, because of fork), then with the swap cache,
>> after swapping out and swapping in, the page will still be shared by
>> these processes.  But with your changes, it appears that there will be
>> multiple pages with the same contents mapped in multiple processes,
>> even if the page isn't written to in those processes.  So this may
>> waste memory in some situations.  And is copying from the device even
>> faster than looking up the swap cache on your system?
>
> I expected that a page shared by several processes has a lower chance of
> being swapped out than a singly mapped page. Nonetheless, once it is
> swapped out, it also has a low chance of being swapped in, so I
> intentionally didn't cover that case until we get a regression report.

Thanks for the explanation.

> However, the fix would be simple, so I don't mind adding it.
> Any thoughts?

I think the fix can work well with shared anonymous pages in most cases
(although not all).  It would be good to add it.

Best Regards,
Huang, Ying

> diff --git a/include/linux/swap.h b/include/linux/swap.h
> index cd2f66fdfc2d..23f19ffa5cc3 100644
> --- a/include/linux/swap.h
> +++ b/include/linux/swap.h
> @@ -458,6 +458,7 @@ extern unsigned int count_swap_pages(int, int);
>  extern sector_t map_swap_page(struct page *, struct block_device **);
>  extern sector_t swapdev_block(int, pgoff_t);
>  extern int page_swapcount(struct page *);
> +extern int __swap_count(struct swap_info_struct *si, swp_entry_t entry);
>  extern int __swp_swapcount(swp_entry_t entry);
>  extern int swp_swapcount(swp_entry_t entry);
>  extern struct swap_info_struct *page_swap_info(struct page *);
> @@ -584,6 +585,11 @@ static inline int page_swapcount(struct page *page)
>  	return 0;
>  }
>  
> +static inline int __swap_count(struct swap_info_struct *si, swp_entry_t entry)
> +{
> +	return 0;
> +}
> +
>  static inline int __swp_swapcount(swp_entry_t entry)
>  {
>  	return 0;
> diff --git a/include/linux/swapfile.h b/include/linux/swapfile.h
> index 388293a91e8c..49f8e19dd506 100644
> --- a/include/linux/swapfile.h
> +++ b/include/linux/swapfile.h
> @@ -9,5 +9,4 @@ extern spinlock_t swap_lock;
>  extern struct plist_head swap_active_head;
>  extern struct swap_info_struct *swap_info[];
>  extern int try_to_unuse(unsigned int, bool, unsigned long);
> -
>  #endif /* _LINUX_SWAPFILE_H */
> diff --git a/mm/memory.c b/mm/memory.c
> index 163ab2062385..c6f0abe8b39b 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -2890,15 +2890,8 @@ int do_swap_page(struct vm_fault *vmf)
>  	if (!page) {
>  		struct swap_info_struct *si = swp_swap_info(entry);
>  
> -		if (!(si->flags & SWP_SYNCHRONOUS_IO)) {
> -			if (vma_readahead)
> -				page = do_swap_page_readahead(entry,
> -					GFP_HIGHUSER_MOVABLE, vmf, &swap_ra);
> -			else
> -				page = swapin_readahead(entry,
> -					GFP_HIGHUSER_MOVABLE, vma, vmf->address);
> -			swapcache = page;
> -		} else {
> +		if ((si->flags & SWP_SYNCHRONOUS_IO) && (vmf->flags & FAULT_FLAG_WRITE ||
> +							__swap_count(si, entry) == 1)) {
>  			/* skip swapcache */
>  			page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vmf->address);
>  			if (page) {
> @@ -2908,6 +2901,14 @@ int do_swap_page(struct vm_fault *vmf)
>  				lru_cache_add_anon(page);
>  				swap_readpage(page, true);
>  			}
> +		} else {
> +			if (vma_readahead)
> +				page = do_swap_page_readahead(entry,
> +					GFP_HIGHUSER_MOVABLE, vmf, &swap_ra);
> +			else
> +				page = swapin_readahead(entry,
> +					GFP_HIGHUSER_MOVABLE, vma, vmf->address);
> +			swapcache = page;
>  		}
>  
>  		if (!page) {
> diff --git a/mm/swapfile.c b/mm/swapfile.c
> index 64a3d85226ba..37d7ba71a2ca 100644
> --- a/mm/swapfile.c
> +++ b/mm/swapfile.c
> @@ -1328,7 +1328,13 @@ int page_swapcount(struct page *page)
>  	return count;
>  }
>  
> -static int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry)
> +int __swap_count(struct swap_info_struct *si, swp_entry_t entry)
> +{
> +	pgoff_t offset = swp_offset(entry);
> +	return swap_count(si->swap_map[offset]);
> +}
> +
> +int swap_swapcount(struct swap_info_struct *si, swp_entry_t entry)
>  {
>  	int count = 0;
>  	pgoff_t offset = swp_offset(entry);

^ permalink raw reply	[flat|nested] 20+ messages in thread

* [PATCH v2 4/4] mm:swap: skip swapcache for swapin of synchronous device
  2017-09-19  7:09 [PATCH v2 0/4] skip swapcache for super fast device Minchan Kim
@ 2017-09-19  7:10   ` Minchan Kim
  0 siblings, 0 replies; 20+ messages in thread
From: Minchan Kim @ 2017-09-19  7:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: kernel-team, linux-kernel, linux-mm, Minchan Kim, Dan Williams,
	Ross Zwisler, Hugh Dickins

With fast swap storage, platforms want to use swap more aggressively,
and swap-in is crucial to application latency.

The rw_page based synchronous devices like zram, pmem and btt are such
fast storage. When I profile swap-in performance with a zram lz4
decompression test, the software overhead is more than 70%. It would
probably be even bigger with nvdimm.

This patch aims to reduce swap-in latency by skipping the swapcache
if the swap device is a synchronous device like an rw_page based
device. It improves my swap-in test by 45% (5G sequential swap-in,
no readahead: from 2.41 sec to 1.64 sec).

Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 include/linux/swap.h | 11 +++++++++++
 mm/memory.c          | 52 ++++++++++++++++++++++++++++++++++++----------------
 mm/page_io.c         |  6 +++---
 mm/swapfile.c        | 11 +++++++----
 4 files changed, 57 insertions(+), 23 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index fbb33919d1c6..cd2f66fdfc2d 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -461,6 +461,7 @@ extern int page_swapcount(struct page *);
 extern int __swp_swapcount(swp_entry_t entry);
 extern int swp_swapcount(swp_entry_t entry);
 extern struct swap_info_struct *page_swap_info(struct page *);
+extern struct swap_info_struct *swp_swap_info(swp_entry_t entry);
 extern bool reuse_swap_page(struct page *, int *);
 extern int try_to_free_swap(struct page *);
 struct backing_dev_info;
@@ -469,6 +470,16 @@ extern void exit_swap_address_space(unsigned int type);
 
 #else /* CONFIG_SWAP */
 
+static inline int swap_readpage(struct page *page, bool do_poll)
+{
+	return 0;
+}
+
+static inline struct swap_info_struct *swp_swap_info(swp_entry_t entry)
+{
+	return NULL;
+}
+
 #define swap_address_space(entry)		(NULL)
 #define get_nr_swap_pages()			0L
 #define total_swap_pages			0L
diff --git a/mm/memory.c b/mm/memory.c
index ec4e15494901..163ab2062385 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2842,7 +2842,7 @@ EXPORT_SYMBOL(unmap_mapping_range);
 int do_swap_page(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
-	struct page *page = NULL, *swapcache;
+	struct page *page = NULL, *swapcache = NULL;
 	struct mem_cgroup *memcg;
 	struct vma_swap_readahead swap_ra;
 	swp_entry_t entry;
@@ -2881,17 +2881,35 @@ int do_swap_page(struct vm_fault *vmf)
 		}
 		goto out;
 	}
+
+
 	delayacct_set_flag(DELAYACCT_PF_SWAPIN);
 	if (!page)
 		page = lookup_swap_cache(entry, vma_readahead ? vma : NULL,
 					 vmf->address);
 	if (!page) {
-		if (vma_readahead)
-			page = do_swap_page_readahead(entry,
-				GFP_HIGHUSER_MOVABLE, vmf, &swap_ra);
-		else
-			page = swapin_readahead(entry,
-				GFP_HIGHUSER_MOVABLE, vma, vmf->address);
+		struct swap_info_struct *si = swp_swap_info(entry);
+
+		if (!(si->flags & SWP_SYNCHRONOUS_IO)) {
+			if (vma_readahead)
+				page = do_swap_page_readahead(entry,
+					GFP_HIGHUSER_MOVABLE, vmf, &swap_ra);
+			else
+				page = swapin_readahead(entry,
+					GFP_HIGHUSER_MOVABLE, vma, vmf->address);
+			swapcache = page;
+		} else {
+			/* skip swapcache */
+			page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vmf->address);
+			if (page) {
+				__SetPageLocked(page);
+				__SetPageSwapBacked(page);
+				set_page_private(page, entry.val);
+				lru_cache_add_anon(page);
+				swap_readpage(page, true);
+			}
+		}
+
 		if (!page) {
 			/*
 			 * Back out if somebody else faulted in this pte
@@ -2920,7 +2938,6 @@ int do_swap_page(struct vm_fault *vmf)
 		goto out_release;
 	}
 
-	swapcache = page;
 	locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags);
 
 	delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
@@ -2935,7 +2952,8 @@ int do_swap_page(struct vm_fault *vmf)
 	 * test below, are not enough to exclude that.  Even if it is still
 	 * swapcache, we need to check that the page's swap has not changed.
 	 */
-	if (unlikely(!PageSwapCache(page) || page_private(page) != entry.val))
+	if (unlikely((!PageSwapCache(page) ||
+			page_private(page) != entry.val)) && swapcache)
 		goto out_page;
 
 	page = ksm_might_need_to_copy(page, vma, vmf->address);
@@ -2988,14 +3006,16 @@ int do_swap_page(struct vm_fault *vmf)
 		pte = pte_mksoft_dirty(pte);
 	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
 	vmf->orig_pte = pte;
-	if (page == swapcache) {
-		do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
-		mem_cgroup_commit_charge(page, memcg, true, false);
-		activate_page(page);
-	} else { /* ksm created a completely new copy */
+
+	/* ksm created a completely new copy */
+	if (unlikely(page != swapcache && swapcache)) {
 		page_add_new_anon_rmap(page, vma, vmf->address, false);
 		mem_cgroup_commit_charge(page, memcg, false, false);
 		lru_cache_add_active_or_unevictable(page, vma);
+	} else {
+		do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
+		mem_cgroup_commit_charge(page, memcg, true, false);
+		activate_page(page);
 	}
 
 	swap_free(entry);
@@ -3003,7 +3023,7 @@ int do_swap_page(struct vm_fault *vmf)
 	    (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
 		try_to_free_swap(page);
 	unlock_page(page);
-	if (page != swapcache) {
+	if (page != swapcache && swapcache) {
 		/*
 		 * Hold the lock to avoid the swap entry to be reused
 		 * until we take the PT lock for the pte_same() check
@@ -3036,7 +3056,7 @@ int do_swap_page(struct vm_fault *vmf)
 	unlock_page(page);
 out_release:
 	put_page(page);
-	if (page != swapcache) {
+	if (page != swapcache && swapcache) {
 		unlock_page(swapcache);
 		put_page(swapcache);
 	}
diff --git a/mm/page_io.c b/mm/page_io.c
index 21502d341a67..d4a98e1f6608 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -346,7 +346,7 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
 	return ret;
 }
 
-int swap_readpage(struct page *page, bool do_poll)
+int swap_readpage(struct page *page, bool synchronous)
 {
 	struct bio *bio;
 	int ret = 0;
@@ -354,7 +354,7 @@ int swap_readpage(struct page *page, bool do_poll)
 	blk_qc_t qc;
 	struct gendisk *disk;
 
-	VM_BUG_ON_PAGE(!PageSwapCache(page), page);
+	VM_BUG_ON_PAGE(!PageSwapCache(page) && !synchronous, page);
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_PAGE(PageUptodate(page), page);
 	if (frontswap_load(page) == 0) {
@@ -402,7 +402,7 @@ int swap_readpage(struct page *page, bool do_poll)
 	count_vm_event(PSWPIN);
 	bio_get(bio);
 	qc = submit_bio(bio);
-	while (do_poll) {
+	while (synchronous) {
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		if (!READ_ONCE(bio->bi_private))
 			break;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 1305591cde4d..64a3d85226ba 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3454,10 +3454,15 @@ int swapcache_prepare(swp_entry_t entry)
 	return __swap_duplicate(entry, SWAP_HAS_CACHE);
 }
 
+struct swap_info_struct *swp_swap_info(swp_entry_t entry)
+{
+	return swap_info[swp_type(entry)];
+}
+
 struct swap_info_struct *page_swap_info(struct page *page)
 {
-	swp_entry_t swap = { .val = page_private(page) };
-	return swap_info[swp_type(swap)];
+	swp_entry_t entry = { .val = page_private(page) };
+	return swp_swap_info(entry);
 }
 
 /*
@@ -3465,7 +3470,6 @@ struct swap_info_struct *page_swap_info(struct page *page)
  */
 struct address_space *__page_file_mapping(struct page *page)
 {
-	VM_BUG_ON_PAGE(!PageSwapCache(page), page);
 	return page_swap_info(page)->swap_file->f_mapping;
 }
 EXPORT_SYMBOL_GPL(__page_file_mapping);
@@ -3473,7 +3477,6 @@ EXPORT_SYMBOL_GPL(__page_file_mapping);
 pgoff_t __page_file_index(struct page *page)
 {
 	swp_entry_t swap = { .val = page_private(page) };
-	VM_BUG_ON_PAGE(!PageSwapCache(page), page);
 	return swp_offset(swap);
 }
 EXPORT_SYMBOL_GPL(__page_file_index);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread
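
The heart of the series is the new branch in do_swap_page() above: when
the backing swap device advertises synchronous I/O, the fault handler
allocates a private page, tags it with the swap entry, and reads it in
synchronously, bypassing the swap cache entirely. Condensed from the
hunk above (a simplified sketch, not the verbatim patch; the vma
readahead variant and error handling are omitted):

	struct swap_info_struct *si = swp_swap_info(entry);

	if (si->flags & SWP_SYNCHRONOUS_IO) {
		/* Skip the swap cache: private page, synchronous read.
		 * swapcache stays NULL, which disables the
		 * swapcache-only paths later in the fault handler. */
		page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vmf->address);
		if (page) {
			__SetPageLocked(page);
			__SetPageSwapBacked(page);
			set_page_private(page, entry.val);
			lru_cache_add_anon(page);
			swap_readpage(page, true);	/* true: poll for completion */
		}
	} else {
		page = swapin_readahead(entry, GFP_HIGHUSER_MOVABLE,
					vma, vmf->address);
		swapcache = page;
	}

Because swapcache is NULL on the fast path, every "page != swapcache"
test gains an "&& swapcache" guard so the separate swapcache page is
only unlocked and released when one actually exists; likewise,
swap_readpage()'s do_poll parameter becomes synchronous and its
VM_BUG_ON is relaxed to tolerate pages that were never in the swap
cache.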

* [PATCH v2 4/4] mm:swap: skip swapcache for swapin of synchronous device
@ 2017-09-19  7:10   ` Minchan Kim
  0 siblings, 0 replies; 20+ messages in thread
From: Minchan Kim @ 2017-09-19  7:10 UTC (permalink / raw)
  To: Andrew Morton
  Cc: kernel-team, linux-kernel, linux-mm, Minchan Kim, Dan Williams,
	Ross Zwisler, Hugh Dickins

With fast swap storage, platforms want to use swap more aggressively,
and swap-in is crucial to application latency.

The rw_page based synchronous devices like zram, pmem and btt are such
fast storage. When I profile swapin performance with a zram lz4
decompression test, software overhead is more than 70%. It would likely
be even bigger on nvdimm.

This patch aims to reduce swap-in latency by skipping the swapcache
when the swap device is a synchronous device like an rw_page based
device. It improves my swapin test by 45% (5G sequential swapin, no
readahead: from 2.41sec to 1.64sec).

Cc: Dan Williams <dan.j.williams@intel.com>
Cc: Ross Zwisler <ross.zwisler@linux.intel.com>
Cc: Hugh Dickins <hughd@google.com>
Signed-off-by: Minchan Kim <minchan@kernel.org>
---
 include/linux/swap.h | 11 +++++++++++
 mm/memory.c          | 52 ++++++++++++++++++++++++++++++++++++----------------
 mm/page_io.c         |  6 +++---
 mm/swapfile.c        | 11 +++++++----
 4 files changed, 57 insertions(+), 23 deletions(-)

diff --git a/include/linux/swap.h b/include/linux/swap.h
index fbb33919d1c6..cd2f66fdfc2d 100644
--- a/include/linux/swap.h
+++ b/include/linux/swap.h
@@ -461,6 +461,7 @@ extern int page_swapcount(struct page *);
 extern int __swp_swapcount(swp_entry_t entry);
 extern int swp_swapcount(swp_entry_t entry);
 extern struct swap_info_struct *page_swap_info(struct page *);
+extern struct swap_info_struct *swp_swap_info(swp_entry_t entry);
 extern bool reuse_swap_page(struct page *, int *);
 extern int try_to_free_swap(struct page *);
 struct backing_dev_info;
@@ -469,6 +470,16 @@ extern void exit_swap_address_space(unsigned int type);
 
 #else /* CONFIG_SWAP */
 
+static inline int swap_readpage(struct page *page, bool do_poll)
+{
+	return 0;
+}
+
+static inline struct swap_info_struct *swp_swap_info(swp_entry_t entry)
+{
+	return NULL;
+}
+
 #define swap_address_space(entry)		(NULL)
 #define get_nr_swap_pages()			0L
 #define total_swap_pages			0L
diff --git a/mm/memory.c b/mm/memory.c
index ec4e15494901..163ab2062385 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -2842,7 +2842,7 @@ EXPORT_SYMBOL(unmap_mapping_range);
 int do_swap_page(struct vm_fault *vmf)
 {
 	struct vm_area_struct *vma = vmf->vma;
-	struct page *page = NULL, *swapcache;
+	struct page *page = NULL, *swapcache = NULL;
 	struct mem_cgroup *memcg;
 	struct vma_swap_readahead swap_ra;
 	swp_entry_t entry;
@@ -2881,17 +2881,35 @@ int do_swap_page(struct vm_fault *vmf)
 		}
 		goto out;
 	}
+
+
 	delayacct_set_flag(DELAYACCT_PF_SWAPIN);
 	if (!page)
 		page = lookup_swap_cache(entry, vma_readahead ? vma : NULL,
 					 vmf->address);
 	if (!page) {
-		if (vma_readahead)
-			page = do_swap_page_readahead(entry,
-				GFP_HIGHUSER_MOVABLE, vmf, &swap_ra);
-		else
-			page = swapin_readahead(entry,
-				GFP_HIGHUSER_MOVABLE, vma, vmf->address);
+		struct swap_info_struct *si = swp_swap_info(entry);
+
+		if (!(si->flags & SWP_SYNCHRONOUS_IO)) {
+			if (vma_readahead)
+				page = do_swap_page_readahead(entry,
+					GFP_HIGHUSER_MOVABLE, vmf, &swap_ra);
+			else
+				page = swapin_readahead(entry,
+					GFP_HIGHUSER_MOVABLE, vma, vmf->address);
+			swapcache = page;
+		} else {
+			/* skip swapcache */
+			page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma, vmf->address);
+			if (page) {
+				__SetPageLocked(page);
+				__SetPageSwapBacked(page);
+				set_page_private(page, entry.val);
+				lru_cache_add_anon(page);
+				swap_readpage(page, true);
+			}
+		}
+
 		if (!page) {
 			/*
 			 * Back out if somebody else faulted in this pte
@@ -2920,7 +2938,6 @@ int do_swap_page(struct vm_fault *vmf)
 		goto out_release;
 	}
 
-	swapcache = page;
 	locked = lock_page_or_retry(page, vma->vm_mm, vmf->flags);
 
 	delayacct_clear_flag(DELAYACCT_PF_SWAPIN);
@@ -2935,7 +2952,8 @@ int do_swap_page(struct vm_fault *vmf)
 	 * test below, are not enough to exclude that.  Even if it is still
 	 * swapcache, we need to check that the page's swap has not changed.
 	 */
-	if (unlikely(!PageSwapCache(page) || page_private(page) != entry.val))
+	if (unlikely((!PageSwapCache(page) ||
+			page_private(page) != entry.val)) && swapcache)
 		goto out_page;
 
 	page = ksm_might_need_to_copy(page, vma, vmf->address);
@@ -2988,14 +3006,16 @@ int do_swap_page(struct vm_fault *vmf)
 		pte = pte_mksoft_dirty(pte);
 	set_pte_at(vma->vm_mm, vmf->address, vmf->pte, pte);
 	vmf->orig_pte = pte;
-	if (page == swapcache) {
-		do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
-		mem_cgroup_commit_charge(page, memcg, true, false);
-		activate_page(page);
-	} else { /* ksm created a completely new copy */
+
+	/* ksm created a completely new copy */
+	if (unlikely(page != swapcache && swapcache)) {
 		page_add_new_anon_rmap(page, vma, vmf->address, false);
 		mem_cgroup_commit_charge(page, memcg, false, false);
 		lru_cache_add_active_or_unevictable(page, vma);
+	} else {
+		do_page_add_anon_rmap(page, vma, vmf->address, exclusive);
+		mem_cgroup_commit_charge(page, memcg, true, false);
+		activate_page(page);
 	}
 
 	swap_free(entry);
@@ -3003,7 +3023,7 @@ int do_swap_page(struct vm_fault *vmf)
 	    (vma->vm_flags & VM_LOCKED) || PageMlocked(page))
 		try_to_free_swap(page);
 	unlock_page(page);
-	if (page != swapcache) {
+	if (page != swapcache && swapcache) {
 		/*
 		 * Hold the lock to avoid the swap entry to be reused
 		 * until we take the PT lock for the pte_same() check
@@ -3036,7 +3056,7 @@ int do_swap_page(struct vm_fault *vmf)
 	unlock_page(page);
 out_release:
 	put_page(page);
-	if (page != swapcache) {
+	if (page != swapcache && swapcache) {
 		unlock_page(swapcache);
 		put_page(swapcache);
 	}
diff --git a/mm/page_io.c b/mm/page_io.c
index 21502d341a67..d4a98e1f6608 100644
--- a/mm/page_io.c
+++ b/mm/page_io.c
@@ -346,7 +346,7 @@ int __swap_writepage(struct page *page, struct writeback_control *wbc,
 	return ret;
 }
 
-int swap_readpage(struct page *page, bool do_poll)
+int swap_readpage(struct page *page, bool synchronous)
 {
 	struct bio *bio;
 	int ret = 0;
@@ -354,7 +354,7 @@ int swap_readpage(struct page *page, bool do_poll)
 	blk_qc_t qc;
 	struct gendisk *disk;
 
-	VM_BUG_ON_PAGE(!PageSwapCache(page), page);
+	VM_BUG_ON_PAGE(!PageSwapCache(page) && !synchronous, page);
 	VM_BUG_ON_PAGE(!PageLocked(page), page);
 	VM_BUG_ON_PAGE(PageUptodate(page), page);
 	if (frontswap_load(page) == 0) {
@@ -402,7 +402,7 @@ int swap_readpage(struct page *page, bool do_poll)
 	count_vm_event(PSWPIN);
 	bio_get(bio);
 	qc = submit_bio(bio);
-	while (do_poll) {
+	while (synchronous) {
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		if (!READ_ONCE(bio->bi_private))
 			break;
diff --git a/mm/swapfile.c b/mm/swapfile.c
index 1305591cde4d..64a3d85226ba 100644
--- a/mm/swapfile.c
+++ b/mm/swapfile.c
@@ -3454,10 +3454,15 @@ int swapcache_prepare(swp_entry_t entry)
 	return __swap_duplicate(entry, SWAP_HAS_CACHE);
 }
 
+struct swap_info_struct *swp_swap_info(swp_entry_t entry)
+{
+	return swap_info[swp_type(entry)];
+}
+
 struct swap_info_struct *page_swap_info(struct page *page)
 {
-	swp_entry_t swap = { .val = page_private(page) };
-	return swap_info[swp_type(swap)];
+	swp_entry_t entry = { .val = page_private(page) };
+	return swp_swap_info(entry);
 }
 
 /*
@@ -3465,7 +3470,6 @@ struct swap_info_struct *page_swap_info(struct page *page)
  */
 struct address_space *__page_file_mapping(struct page *page)
 {
-	VM_BUG_ON_PAGE(!PageSwapCache(page), page);
 	return page_swap_info(page)->swap_file->f_mapping;
 }
 EXPORT_SYMBOL_GPL(__page_file_mapping);
@@ -3473,7 +3477,6 @@ EXPORT_SYMBOL_GPL(__page_file_mapping);
 pgoff_t __page_file_index(struct page *page)
 {
 	swp_entry_t swap = { .val = page_private(page) };
-	VM_BUG_ON_PAGE(!PageSwapCache(page), page);
 	return swp_offset(swap);
 }
 EXPORT_SYMBOL_GPL(__page_file_index);
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 20+ messages in thread
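
For context, patches 2/4 and 3/4 of this series (not quoted in this
thread view) provide the plumbing that the SWP_SYNCHRONOUS_IO test
above relies on: a synchronous rw_page driver sets
BDI_CAP_SYNCHRONOUS_IO on its backing_dev_info, and swapon() translates
that capability into a per-device swap flag. Roughly (a sketch based on
the series description, not the verbatim hunks):

	/* Driver side (brd/zram/pmem/btt), at disk setup time: */
	disk->queue->backing_dev_info->capabilities |= BDI_CAP_SYNCHRONOUS_IO;

	/* swapon path: cache the capability in swap_info_struct->flags
	 * so do_swap_page() can test it cheaply on every fault: */
	if (bdi_cap_synchronous_io(inode_to_bdi(swap_file->f_mapping->host)))
		p->flags |= SWP_SYNCHRONOUS_IO;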

end of thread, other threads:[~2017-10-10  1:10 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-09-20  5:43 [PATCH v2 0/4] skip swapcache for super fast device Minchan Kim
2017-09-20  5:43 ` Minchan Kim
2017-09-20  5:43 ` [PATCH v2 1/4] zram: set BDI_CAP_STABLE_WRITES once Minchan Kim
2017-09-20  5:43   ` Minchan Kim
2017-09-20  5:43 ` [PATCH v2 2/4] bdi: introduce BDI_CAP_SYNCHRONOUS_IO Minchan Kim
2017-09-20  5:43   ` Minchan Kim
2017-09-20  5:43 ` [PATCH v2 3/4] mm:swap: introduce SWP_SYNCHRONOUS_IO Minchan Kim
2017-09-20  5:43   ` Minchan Kim
2017-09-20  5:43 ` [PATCH v2 4/4] mm:swap: skip swapcache for swapin of synchronous device Minchan Kim
2017-09-20  5:43   ` Minchan Kim
2017-09-29  8:51   ` huang ying
2017-09-29  8:51     ` huang ying
2017-10-09  1:26     ` huang ying
2017-10-09  1:26       ` huang ying
2017-10-10  0:34     ` Minchan Kim
2017-10-10  0:34       ` Minchan Kim
2017-10-10  1:10       ` Huang, Ying
2017-10-10  1:10         ` Huang, Ying
  -- strict thread matches above, loose matches on Subject: below --
2017-09-19  7:09 [PATCH v2 0/4] skip swapcache for super fast device Minchan Kim
2017-09-19  7:10 ` [PATCH v2 4/4] mm:swap: skip swapcache for swapin of synchronous device Minchan Kim
2017-09-19  7:10   ` Minchan Kim
