* [RFC v2 0/5] introduce gcma
@ 2015-02-23 19:54 ` SeongJae Park
  0 siblings, 0 replies; 20+ messages in thread
From: SeongJae Park @ 2015-02-23 19:54 UTC (permalink / raw)
  To: akpm
  Cc: lauraa, minchan, sergey.senozhatsky, linux-mm, linux-kernel,
	SeongJae Park

This RFC patchset is based on linux v3.18 and available on git:
git://github.com/sjp38/linux.gcma -b gcma/rfc/v2

Abstract
========

The current cma (contiguous memory allocator) cannot guarantee that contiguous
allocations succeed, nor that they complete with low latency.
This cover letter explains the problem in detail and proposes a new contiguous
memory allocator, gcma (guaranteed contiguous memory allocator).



CMA: Contiguous Memory Allocator
================================


Basic idea of cma
-----------------

The basic idea of cma is as follows. It focuses on memory efficiency while
keeping contiguous allocation possible without serious penalty.

 - Reserve a large contiguous memory area during boot and let the area be
   used for contiguous allocation.
 - Because system memory would be used inefficiently if the reserved memory
   is not fully utilized by contiguous allocation, let the area also be
   allocated to 2nd-class clients.
 - If pages allocated to 2nd-class clients are needed for contiguous
   allocation (doubtless the 1st-class client), migrate or discard those
   pages and use them for the contiguous allocation.

In cma, the _2nd-class client_ is movable pages. The reserved area can be
allocated for movable pages, and the movable pages are migrated or discarded
when a contiguous allocation needs them.
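
For reference, the sketch below shows how a driver typically uses the
v3.18-era CMA interface (cma_declare_contiguous(), cma_alloc(),
cma_release()). It is only a minimal illustration and not part of this
patchset; the area name and sizes are hypothetical.

	#include <linux/cma.h>
	#include <linux/sizes.h>

	static struct cma *my_cma;

	static int __init my_reserve(void)
	{
		/* reserve 256 MiB during early boot, anywhere suitable */
		return cma_declare_contiguous(0, SZ_256M, 0, 0, 0, false,
					      &my_cma);
	}

	static struct page *my_alloc_buffer(void)
	{
		/* may migrate or discard movable pages; can fail or be slow */
		return cma_alloc(my_cma, 256, 0);	/* 256 pages = 1 MiB */
	}

	static void my_free_buffer(struct page *pages)
	{
		cma_release(my_cma, pages, 256);
	}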


Problem of cma
--------------

The cma mechanism has the following weaknesses.

1. Allocation failure
CMA can fail to allocate contiguous memory for the following reasons.
1-1. Direct pinning
Any kernel thread can pin any movable page for a long time. If a movable page
which needs to be migrated for a contiguous memory allocation is already
pinned by someone, the migration cannot complete. In consequence, the
contiguous allocation can fail if the page is not unpinned for a long time.
1-2. Indirect pinning
If a movable page has a dependency on an object, the object increases the
reference count of the movable page to assert that it is safe to use the
page. If a movable page which needs to be migrated for a contiguous memory
allocation is in this situation, the page cannot be freed for the contiguous
allocation. In consequence, the contiguous allocation can fail.

2. High cost
Contiguous memory allocation with CMA can be expensive for the following
reasons.
2-1. Migration overhead
First of all, migration itself is not simple. It has to manipulate rmap and
copy the contents of the pages into other pages, which can take a relatively
long time.
After migration, the migrated pages are inserted at the head of the LRU list
again even though they were not actually used, only migrated. As a result,
the pages on the LRU list are no longer ordered by recency of use. In
consequence, system performance can degrade because working-set pages may be
swapped out due to the distorted LRU list.
2-2. Writeback cost
If a page which needs to be discarded for a contiguous memory allocation is
dirty, it must be written back to its mapped file. Write-back latency is
usually high and unpredictable because it depends not only on memory
management but also on the block layer, the file system, and the
characteristics of the block device hardware.

In short, cma guarantees neither success nor low latency of contiguous memory
allocation. The main reason is that the 2nd-class client cma chose (movable
pages) is not nice enough: such pages can be hard to migrate or discard.

The problem was discussed in detail in [1] and [2].



GCMA: Guaranteed contiguous memory allocator
============================================

The goal of gcma is to solve the two weaknesses of cma discussed above. In
other words, gcma aims to guarantee both success and low latency of
contiguous memory allocation.


Basic idea
----------

The basic idea of gcma is the same as cma's. It reserves a large contiguous
memory area during boot and uses it for contiguous memory allocation, while
letting it be allocated to 2nd-class clients to keep memory efficiency. If
the pages allocated to 2nd-class clients are needed for contiguous allocation
(doubtless the 1st-class client), they are discarded or migrated. A sketch of
the allocation interface this patchset introduces follows below.
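
The following is a rough sketch of that low-level interface as introduced by
patch 1 (include/linux/gcma.h). The reservation of the pfn range itself and
the pfn/size values are hypothetical; a later patch in this series integrates
gcma under the existing cma interface instead of exposing this directly.

	#include <linux/gcma.h>

	static struct gcma *my_gcma;

	static int __init my_gcma_setup(unsigned long base_pfn,
					unsigned long nr_pages)
	{
		/* hand an already-reserved pfn range over to gcma */
		return gcma_init(base_pfn, nr_pages, &my_gcma);
	}

	static int my_grab_contig(unsigned long start_pfn,
				  unsigned long nr_pages)
	{
		/* mark [start_pfn, start_pfn + nr_pages) as allocated */
		return gcma_alloc_contig(my_gcma, start_pfn, nr_pages);
	}

	static void my_release_contig(unsigned long start_pfn,
				      unsigned long nr_pages)
	{
		gcma_free_contig(my_gcma, start_pfn, nr_pages);
	}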

The difference from cma is the choice and operation of the 2nd-class client.
In gcma, a 2nd-class client may allocate pages from the reserved area only if
the allocated pages meet the following conditions.

1. Out of kernel scope
If a page is out of kernel scope, the page is handled by the 2nd-class client
only, and nothing else can see, touch, or hold it. Such pages can be
discarded at any time. In consequence, contiguous allocation cannot fail as
long as the 2nd-class client cooperates well.
2. Quickly discardable or migratable
The pages used by a 2nd-class client should be quickly discardable or
migratable. If so, contiguous allocation can guarantee low latency.

With the above conditions, we picked two candidates for gcma's 2nd-class
clients: cleancache and frontswap.


Cleancache
----------

1. Out of kernel scope
Pages in cleancache are clean pages evicted from the page cache, which means
they are out of kernel scope.

2. Quickly discardable or migratable
Because pages in cleancache are clean, they can be freed immediately without
any additional operation.


Frontswap backend
-----------------

1. Out of kernel scope
Pages in the frontswap backend are swapped-out pages, which are out of kernel
scope.

2. Quickly discardable or migratable
Pages in the frontswap backend can be discarded using the following policies.
2.1. Write-back
After a page's data has been written back to the backing real swap device,
the page can be freed without any interference.
With this policy, the latency of the write-back operation is bounded by the
write speed of the swap device.
2.2. Write-through
Frontswap can run in write-through mode. In this case, any page in the
frontswap backend can be freed immediately because its data is already on the
swap device.
This policy can be very fast, but it can slow down the whole system due to
frequent write-through traffic. On flash-storage-based systems, it could even
cause storage failure unless wear-leveling is done on the swap device.
2.3. Put-back
When pages in the frontswap backend need to be discarded, gcma could allocate
pages from system memory (not the reserved memory) and copy the contents of
the discarding pages into the newly allocated pages. It would then put those
newly allocated, data-copied pages into the swap cache so that they can enter
the frontswap backend again. After that, the discarding pages are free.
Because this involves only memory operations, it should not be too slow. We
call this operation _put-back_; a rough sketch follows below.
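
The sketch below is not part of this RFC (which uses write-through only);
swp_entry_of() and add_back_to_swap_cache() are hypothetical helpers standing
in for the swap-cache plumbing a real implementation would need, and
gcma_free_page() is the internal helper from this patchset.

	static int putback_dmem_page(struct gcma *gcma,
				     struct page *reserved_page)
	{
		struct page *new_page;
		void *src, *dst;

		/* 1) take a page from normal system memory */
		new_page = alloc_page(GFP_KERNEL);
		if (!new_page)
			return -ENOMEM;

		/* 2) copy the swapped-out data out of the reserved page */
		src = kmap_atomic(reserved_page);
		dst = kmap_atomic(new_page);
		memcpy(dst, src, PAGE_SIZE);
		kunmap_atomic(dst);
		kunmap_atomic(src);

		/*
		 * 3) hang the copy on the swap cache so frontswap can adopt
		 *    it again (hypothetical helpers)
		 */
		add_back_to_swap_cache(new_page, swp_entry_of(reserved_page));

		/* 4) the reserved page is now free for contiguous allocation */
		gcma_free_page(gcma, reserved_page);
		return 0;
	}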



Current RFC implementation
==========================

Though we suggested various policies for frontswap pages discarding,
current RFC implementation uses only write-through policy frontswap naively 
because this is a prototype for various comments from reviewers.

At the moment, current naive implementation is as follows:
1) Reserves large amount of memory during boot.
2) Allow the memory to cleancache, write-through mode frontswap and
   contiguous memory allocation.
3) Drain pages being used for the cleancache, frontswap if contiguous memroy
   allocation needs.

As discussed above, this implementation could introduces clear trade-off:
1) System performance could be degraded due to write-through mode
2) Flash storage using system should worry about wear-leveling

Configuring swap device using zram could be helpful to alleviate those problems
though the trade-off still exists.

Basic concept, implementation, and performance evaluation result were presented
in detail at [2].



Disclaimer
==========

Because cma and gcma each have clear merits and demerits, gcma aims to
coexist with cma rather than replace it. Users can operate cma and gcma
concurrently on a system and use whichever fits their needs.



Performance Evaluation
======================


Machine Setting
---------------

CuBox i4 Pro
 - ARM v7, 4 * 1 GHz cores
 - 800 MiB DDR3 RAM (Originally 2 GiB equipped.)
 - Class 10 SanDisk 16 GiB microSD card


Evaluation Variants
-------------------

 - Baseline:	Linux v3.17, 128 MiB swap
 - cma:		Baseline + 256 MiB CMA area
 - gcma:	Baseline + 256 MiB GCMA area
 - gcma.zram:	GCMA + 128 MiB zram swap device


Workloads
---------

 - Background workload: `make defconfig && time make -j 16` with Linux v3.12.6
 - Foreground workload: Request contiguous allocations of 1-32000 pages,
   32 times (see the measurement sketch below)
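
A minimal sketch of how such a foreground measurement could be taken from a
test module is shown below. It is not the exact driver used for the numbers
in the next section; the CMA/GCMA area passed in is assumed to be whichever
one the evaluated variant provides.

	#include <linux/cma.h>
	#include <linux/ktime.h>
	#include <linux/printk.h>

	static void measure_contig_alloc(struct cma *cma, int nr_pages)
	{
		ktime_t start;
		struct page *pages;
		s64 us;

		start = ktime_get();
		pages = cma_alloc(cma, nr_pages, 0);
		us = ktime_us_delta(ktime_get(), start);

		pr_info("alloc of %d pages took %lld us (%s)\n",
			nr_pages, us, pages ? "ok" : "failed");
		if (pages)
			cma_release(cma, pages, nr_pages);
	}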


Evaluation Result
-----------------

[ Latency (microseconds) ]
The results below show that gcma's latency is significantly lower than cma's.
Note that cma's maximum latency easily exceeds 4 seconds.

		cma			gcma			gcma.zram
nr_pages	min	max	avg	min	max	avg	min	max	avg
1		381	2853	684	13	279	26	14	274	101
512		818	2382	1162	512	10140	2002	510	648	600
1024		3113	184703	12303	1016	17426	3495	1014	1284	1192
2048		2545	790727	33084	2029	27027	5983	2142	3781	2829
4096		2899	2768349	298640	4087	86887	13091	4101	6046	4780
8192		4476	3496211	407076	8254	75976	17519	9888	11386	10580
16384		7266	4132546	657603	16398	98087	30474	21079	23641	22491
32000		8612	3910423	641340	32328	92502	44675	44859	654966	249453


[ System performance ]
The background workload (kernel build) results were measured to evaluate the
system performance degradation caused by cma / gcma.
'original' means the background workload result on the CMA configuration
without the foreground (contiguous allocation) workload.

We ran the workload 5 times and measured the average user / system / elapsed
time and CPU utilization percentage. The results are as below:

		user		system		elapsed		cpu
original	1675.388	167.702		507.738		362.4
cma		1707.902	172.184		523.738		358.4
gcma		1677.492	170.016		515.042		358.2
gcma.zram	1678.104	166.992		513.622		358.6


Both cma and gcma degraded system performance, due to page migration and
write-through respectively, and affected the kernel build workload, while
gcma with a zram swap device shows less performance degradation.


[ Evaluation result summary ]
From the performance evaluation results above, we can say:
1. The latency of gcma is significantly lower than cma's.
2. gcma degrades system performance, though a zram swap device configuration
   can alleviate the effect somewhat.



Acknowledgement
===============

Many thanks to Minchan, who suggested the main idea and has helped a lot
during development with code fixes and reviews.


[1] https://lkml.org/lkml/2013/10/30/16
[2] http://sched.co/1qZcBAO


Changes in v2:
 - Discardable memory abstraction
 - Cleancache implementation


SeongJae Park (5):
  gcma: introduce contiguous memory allocator
  gcma: utilize reserved memory as discardable memory
  gcma: adopt cleancache and frontswap as second-class clients
  gcma: export statistical data on debugfs
  gcma: integrate gcma under cma interface

 include/linux/cma.h  |    4 +
 include/linux/gcma.h |   64 +++
 mm/Kconfig           |   24 +
 mm/Makefile          |    1 +
 mm/cma.c             |  113 ++++-
 mm/gcma.c            | 1321 ++++++++++++++++++++++++++++++++++++++++++++++++++
 6 files changed, 1508 insertions(+), 19 deletions(-)
 create mode 100644 include/linux/gcma.h
 create mode 100644 mm/gcma.c

-- 
1.9.1



* [RFC v2 1/5] gcma: introduce contiguous memory allocator
  2015-02-23 19:54 ` SeongJae Park
@ 2015-02-23 19:54   ` SeongJae Park
  -1 siblings, 0 replies; 20+ messages in thread
From: SeongJae Park @ 2015-02-23 19:54 UTC (permalink / raw)
  To: akpm
  Cc: lauraa, minchan, sergey.senozhatsky, linux-mm, linux-kernel,
	SeongJae Park

Currently, cma reserves a large contiguous memory area during early boot and
lets the area be used by others, but for movable pages only. Then, if those
movable pages are needed for a contiguous memory allocation, cma migrates
and / or discards them.

This mechanism has two weaknesses.
1) Because anyone in the kernel can pin any movable page, migration of
movable pages can fail. That can lead to contiguous memory allocation
failure.
2) Because of migration / reclaim overhead, the latency can be extremely
high.
In short, cma guarantees neither success nor low latency of contiguous
memory allocation. The problem was discussed in detail in [1] and [2].

This patch introduces a simple contiguous memory allocator, namely
GCMA (Guaranteed Contiguous Memory Allocator). It aims to guarantee
success and low latency by reserving memory during early boot. However,
this simple mechanism could seriously degrade system memory space
efficiency; the following commits address that problem.

[1] https://lkml.org/lkml/2013/10/30/16
[2] http://sched.co/1qZcBAO

Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
 include/linux/gcma.h |  26 ++++++++
 mm/gcma.c            | 173 +++++++++++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 199 insertions(+)
 create mode 100644 include/linux/gcma.h
 create mode 100644 mm/gcma.c

diff --git a/include/linux/gcma.h b/include/linux/gcma.h
new file mode 100644
index 0000000..cda481f
--- /dev/null
+++ b/include/linux/gcma.h
@@ -0,0 +1,26 @@
+/*
+ * gcma.h - Guaranteed Contiguous Memory Allocator
+ *
+ * GCMA aims for contiguous memory allocation with success and fast
+ * latency guarantee.
+ * It reserves large amount of memory and let it be allocated to
+ * contiguous memory requests.
+ *
+ * Copyright (C) 2014  LG Electronics Inc.,
+ * Copyright (C) 2014  Minchan Kim <minchan@kernel.org>
+ * Copyright (C) 2014-2015  SeongJae Park <sj38.park@gmail.com>
+ */
+
+#ifndef _LINUX_GCMA_H
+#define _LINUX_GCMA_H
+
+struct gcma;
+
+int gcma_init(unsigned long start_pfn, unsigned long size,
+	      struct gcma **res_gcma);
+int gcma_alloc_contig(struct gcma *gcma,
+		      unsigned long start_pfn, unsigned long size);
+void gcma_free_contig(struct gcma *gcma,
+		      unsigned long start_pfn, unsigned long size);
+
+#endif /* _LINUX_GCMA_H */
diff --git a/mm/gcma.c b/mm/gcma.c
new file mode 100644
index 0000000..3f6a337
--- /dev/null
+++ b/mm/gcma.c
@@ -0,0 +1,173 @@
+/*
+ * gcma.c - Guaranteed Contiguous Memory Allocator
+ *
+ * GCMA aims for contiguous memory allocation with success and fast
+ * latency guarantee.
+ * It reserves large amount of memory and let it be allocated to
+ * contiguous memory requests.
+ *
+ * Copyright (C) 2014  LG Electronics Inc.,
+ * Copyright (C) 2014  Minchan Kim <minchan@kernel.org>
+ * Copyright (C) 2014-2015  SeongJae Park <sj38.park@gmail.com>
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include <linux/gcma.h>
+#include <linux/highmem.h>
+#include <linux/module.h>
+#include <linux/slab.h>
+
+struct gcma {
+	spinlock_t lock;
+	unsigned long *bitmap;
+	unsigned long base_pfn, size;
+	struct list_head list;
+};
+
+struct gcma_info {
+	spinlock_t lock;	/* protect list */
+	struct list_head head;
+};
+
+static struct gcma_info ginfo = {
+	.head = LIST_HEAD_INIT(ginfo.head),
+	.lock = __SPIN_LOCK_UNLOCKED(ginfo.lock),
+};
+
+/*
+ * gcma_init - initializes a contiguous memory area
+ *
+ * @start_pfn	start pfn of contiguous memory area
+ * @size	number of pages in the contiguous memory area
+ * @res_gcma	pointer to store the created gcma region
+ *
+ * Returns 0 on success, error code on failure.
+ */
+int gcma_init(unsigned long start_pfn, unsigned long size,
+		struct gcma **res_gcma)
+{
+	int bitmap_size = BITS_TO_LONGS(size) * sizeof(long);
+	struct gcma *gcma;
+
+	gcma = kmalloc(sizeof(*gcma), GFP_KERNEL);
+	if (!gcma)
+		goto out;
+
+	gcma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+	if (!gcma->bitmap)
+		goto free_cma;
+
+	gcma->size = size;
+	gcma->base_pfn = start_pfn;
+	spin_lock_init(&gcma->lock);
+
+	spin_lock(&ginfo.lock);
+	list_add(&gcma->list, &ginfo.head);
+	spin_unlock(&ginfo.lock);
+
+	*res_gcma = gcma;
+	pr_info("initialized gcma area [%lu, %lu]\n",
+			start_pfn, start_pfn + size);
+	return 0;
+
+free_cma:
+	kfree(gcma);
+out:
+	return -ENOMEM;
+}
+
+static struct page *gcma_alloc_page(struct gcma *gcma)
+{
+	unsigned long bit;
+	unsigned long *bitmap = gcma->bitmap;
+	struct page *page = NULL;
+
+	spin_lock(&gcma->lock);
+	bit = bitmap_find_next_zero_area(bitmap, gcma->size, 0, 1, 0);
+	if (bit >= gcma->size) {
+		spin_unlock(&gcma->lock);
+		goto out;
+	}
+
+	bitmap_set(bitmap, bit, 1);
+	page = pfn_to_page(gcma->base_pfn + bit);
+	spin_unlock(&gcma->lock);
+
+out:
+	return page;
+}
+
+static void gcma_free_page(struct gcma *gcma, struct page *page)
+{
+	unsigned long pfn, offset;
+
+	pfn = page_to_pfn(page);
+
+	spin_lock(&gcma->lock);
+	offset = pfn - gcma->base_pfn;
+
+	bitmap_clear(gcma->bitmap, offset, 1);
+	spin_unlock(&gcma->lock);
+}
+
+/*
+ * gcma_alloc_contig - allocates contiguous pages
+ *
+ * @start_pfn	start pfn of requiring contiguous memory area
+ * @size	number of pages in requiring contiguous memory area
+ *
+ * Returns 0 on success, error code on failure.
+ */
+int gcma_alloc_contig(struct gcma *gcma, unsigned long start_pfn,
+			unsigned long size)
+{
+	unsigned long offset;
+
+	spin_lock(&gcma->lock);
+	offset = start_pfn - gcma->base_pfn;
+
+	if (bitmap_find_next_zero_area(gcma->bitmap, gcma->size, offset,
+				size, 0) != 0) {
+		spin_unlock(&gcma->lock);
+		pr_warn("already allocated region required: %lu, %lu",
+				start_pfn, size);
+		return -EINVAL;
+	}
+
+	bitmap_set(gcma->bitmap, offset, size);
+	spin_unlock(&gcma->lock);
+
+	return 0;
+}
+
+/*
+ * gcma_free_contig - free allocated contiguous pages
+ *
+ * @start_pfn	start pfn of freeing contiguous memory area
+ * @size	number of pages in freeing contiguous memory area
+ */
+void gcma_free_contig(struct gcma *gcma,
+			unsigned long start_pfn, unsigned long size)
+{
+	unsigned long offset;
+
+	spin_lock(&gcma->lock);
+	offset = start_pfn - gcma->base_pfn;
+	bitmap_clear(gcma->bitmap, offset, size);
+	spin_unlock(&gcma->lock);
+}
+
+static int __init init_gcma(void)
+{
+	pr_info("loading gcma\n");
+
+	return 0;
+}
+
+module_init(init_gcma);
+
+MODULE_LICENSE("GPL");
+MODULE_AUTHOR("Minchan Kim <minchan@kernel.org>");
+MODULE_AUTHOR("SeongJae Park <sj38.park@gmail.com>");
+MODULE_DESCRIPTION("Guaranteed Contiguous Memory Allocator");
-- 
1.9.1



* [RFC v2 2/5] gcma: utilize reserved memory as discardable memory
  2015-02-23 19:54 ` SeongJae Park
@ 2015-02-23 19:54   ` SeongJae Park
  -1 siblings, 0 replies; 20+ messages in thread
From: SeongJae Park @ 2015-02-23 19:54 UTC (permalink / raw)
  To: akpm
  Cc: lauraa, minchan, sergey.senozhatsky, linux-mm, linux-kernel,
	SeongJae Park

Because gcma reserves a large amount of memory during early boot and lets
it be used for contiguous memory requests only, system memory space
efficiency could be degraded if the reserved area sits idle. The problem
can be settled by lending the reserved area to other clients. In this
context, we call contiguous memory requests first-class clients and the
other clients second-class clients. CMA shares this idea, using movable
pages as its second-class clients.

The key point of this idea is the niceness of the second-class clients. If
second-class clients do not pay borrowed pages back soon while first-class
clients are waiting for them, the first-class clients can suffer from high
latency or failure.

For that reason, gcma restricts second-class clients to using the reserved
area only as easily discardable memory. With that restriction, gcma
guarantees success and low latency by discarding pages of second-class
clients whenever a first-class client needs them.

This commit implements the interface and backend of discardable memory
inside gcma. Any subsystem that can work with discardable memory can become
a second-class client of gcma by using the interface.
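
The sketch below illustrates how a second-class client would use this
interface. The function signatures are taken from this patch, while the key
structure and the setup of the struct dmem itself (key size, hash and
compare callbacks) are assumed to be handled by client-registration code not
shown here.

	struct my_key {
		unsigned type;
		pgoff_t offset;
	};

	static int my_client_store(struct dmem *dmem, struct page *page,
				   pgoff_t off)
	{
		struct my_key key = { .type = 0, .offset = off };

		/* copied into the reserved area; may be discarded any time */
		return dmem_store_page(dmem, 0, &key, page);
	}

	static int my_client_load(struct dmem *dmem, struct page *page,
				  pgoff_t off)
	{
		struct my_key key = { .type = 0, .offset = off };

		/* returns non-zero if the entry was already discarded */
		return dmem_load_page(dmem, 0, &key, page);
	}

	static void my_client_forget(struct dmem *dmem, pgoff_t off)
	{
		struct my_key key = { .type = 0, .offset = off };

		dmem_invalidate_entry(dmem, 0, &key);
	}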

Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
 include/linux/gcma.h |  16 +-
 mm/gcma.c            | 751 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 754 insertions(+), 13 deletions(-)

diff --git a/include/linux/gcma.h b/include/linux/gcma.h
index cda481f..005bf77 100644
--- a/include/linux/gcma.h
+++ b/include/linux/gcma.h
@@ -4,7 +4,21 @@
  * GCMA aims for contiguous memory allocation with success and fast
  * latency guarantee.
  * It reserves large amount of memory and let it be allocated to
- * contiguous memory requests.
+ * contiguous memory requests. Because system memory space efficiency could be
+ * degraded if reserved area being idle, GCMA let the reserved area could be
+ * used by other clients with lower priority.
+ * We call those lower priority clients as second-class clients. In this
+ * context, contiguous memory requests are first-class clients, of course.
+ *
+ * With this idea, gcma withdraw pages being used for second-class clients and
+ * gives them to first-class clients if they required. Because latency
+ * and success of first-class clients depend on speed and availability of
+ * withdrawing, GCMA restricts only easily discardable memory could be used for
+ * second-class clients.
+ *
+ * To support various second-class clients, GCMA provides interface and
+ * backend of discardable memory. Any candidates satisfying with discardable
+ * memory could be second-class client of GCMA using the interface.
  *
  * Copyright (C) 2014  LG Electronics Inc.,
  * Copyright (C) 2014  Minchan Kim <minchan@kernel.org>
diff --git a/mm/gcma.c b/mm/gcma.c
index 3f6a337..dc70fa8 100644
--- a/mm/gcma.c
+++ b/mm/gcma.c
@@ -4,7 +4,21 @@
  * GCMA aims for contiguous memory allocation with success and fast
  * latency guarantee.
  * It reserves large amount of memory and let it be allocated to
- * contiguous memory requests.
+ * contiguous memory requests. Because system memory space efficiency could be
+ * degraded if reserved area being idle, GCMA let the reserved area could be
+ * used by other clients with lower priority.
+ * We call those lower priority clients as second-class clients. In this
+ * context, contiguous memory requests are first-class clients, of course.
+ *
+ * With this idea, gcma withdraw pages being used for second-class clients and
+ * gives them to first-class clients if they required. Because latency
+ * and success of first-class clients depend on speed and availability of
+ * withdrawing, GCMA restricts only easily discardable memory could be used for
+ * second-class clients.
+ *
+ * To support various second-class clients, GCMA provides interface and
+ * backend of discardable memory. Any candidates satisfying with discardable
+ * memory could be second-class client of GCMA using the interface.
  *
  * Copyright (C) 2014  LG Electronics Inc.,
  * Copyright (C) 2014  Minchan Kim <minchan@kernel.org>
@@ -18,6 +32,9 @@
 #include <linux/module.h>
 #include <linux/slab.h>
 
+/* XXX: What's the ideal? */
+#define NR_EVICT_BATCH	32
+
 struct gcma {
 	spinlock_t lock;
 	unsigned long *bitmap;
@@ -36,6 +53,114 @@ static struct gcma_info ginfo = {
 };
 
 /*
+ * Discardable memory(dmem) store and load easily discardable pages inside
+ * gcma area. Because it's discardable memory, loading stored page could fail
+ * anytime.
+ */
+
+/* entry for a discardable page */
+struct dmem_entry {
+	struct rb_node rbnode;
+	struct gcma *gcma;
+	void *key;
+	struct page *page;
+	atomic_t refcount;
+};
+
+/* dmem hash bucket */
+struct dmem_hashbucket {
+	struct dmem *dmem;
+	struct rb_root rbroot;
+	spinlock_t lock;
+};
+
+/* dmem pool */
+struct dmem_pool {
+	struct dmem_hashbucket *hashbuckets;
+};
+
+struct dmem {
+	struct dmem_pool **pools;
+	unsigned nr_pools;
+	unsigned nr_hash;
+	struct kmem_cache *key_cache;
+	size_t bytes_key;
+	struct list_head lru_list;
+	spinlock_t lru_lock;
+
+	unsigned (*hash_key)(void *key);
+	int (*compare)(void *lkey, void *rkey);
+};
+
+static struct kmem_cache *dmem_entry_cache;
+
+static unsigned long dmem_evict_lru(struct dmem *dmem, unsigned long nr_pages);
+
+static struct dmem_hashbucket *dmem_hashbuck(struct page *page)
+{
+	return (struct dmem_hashbucket *)page->mapping;
+}
+
+static void set_dmem_hashbuck(struct page *page, struct dmem_hashbucket *buck)
+{
+	page->mapping = (struct address_space *)buck;
+}
+
+static struct dmem_entry *dmem_entry(struct page *page)
+{
+	return (struct dmem_entry *)page->index;
+}
+
+static void set_dmem_entry(struct page *page, struct dmem_entry *entry)
+{
+	page->index = (pgoff_t)entry;
+}
+
+/*
+ * Flags for status of a page in gcma
+ *
+ * GF_LRU
+ * The page is being used for a dmem and hang on LRU list of the dmem.
+ * It could be discarded for contiguous memory allocation easily.
+ * Protected by dmem->lru_lock.
+ *
+ * GF_RECLAIMING
+ * The page is being discarded for contiguous memory allocation.
+ * It should not be used for dmem anymore.
+ * Protected by dmem->lru_lock.
+ *
+ * GF_ISOLATED
+ * The page is isolated from dmem.
+ * GCMA clients can use the page safely while dmem should not.
+ * Protected by gcma->lock.
+ */
+enum gpage_flags {
+	GF_LRU = 0x1,
+	GF_RECLAIMING = 0x2,
+	GF_ISOLATED = 0x4,
+};
+
+static int gpage_flag(struct page *page, int flag)
+{
+	return page->private & flag;
+}
+
+static void set_gpage_flag(struct page *page, int flag)
+{
+	page->private |= flag;
+}
+
+static void clear_gpage_flag(struct page *page, int flag)
+{
+	page->private &= ~flag;
+}
+
+static void clear_gpage_flagall(struct page *page)
+{
+	page->private = 0;
+}
+
+/*
  * gcma_init - initializes a contiguous memory area
  *
  * @start_pfn	start pfn of contiguous memory area
@@ -93,11 +218,13 @@ static struct page *gcma_alloc_page(struct gcma *gcma)
 	bitmap_set(bitmap, bit, 1);
 	page = pfn_to_page(gcma->base_pfn + bit);
 	spin_unlock(&gcma->lock);
+	clear_gpage_flagall(page);
 
 out:
 	return page;
 }
 
+/* Caller should hold lru_lock */
 static void gcma_free_page(struct gcma *gcma, struct page *page)
 {
 	unsigned long pfn, offset;
@@ -107,36 +234,632 @@ static void gcma_free_page(struct gcma *gcma, struct page *page)
 	spin_lock(&gcma->lock);
 	offset = pfn - gcma->base_pfn;
 
-	bitmap_clear(gcma->bitmap, offset, 1);
+	if (likely(!gpage_flag(page, GF_RECLAIMING))) {
+		bitmap_clear(gcma->bitmap, offset, 1);
+	} else {
+		/*
+		 * The page should be safe to be used for a thread which
+		 * reclaimed the page.
+		 * To prevent further allocation from other thread,
+		 * set bitmap and mark the page as isolated.
+		 */
+		bitmap_set(gcma->bitmap, offset, 1);
+		set_gpage_flag(page, GF_ISOLATED);
+	}
 	spin_unlock(&gcma->lock);
 }
 
 /*
+ * In the case that a entry with the same offset is found, a pointer to
+ * the existing entry is stored in dupentry and the function returns -EEXIST.
+ */
+static int dmem_insert_entry(struct dmem_hashbucket *bucket,
+		struct dmem_entry *entry,
+		struct dmem_entry **dupentry)
+{
+	struct rb_node **link = &bucket->rbroot.rb_node, *parent = NULL;
+	struct dmem_entry *iter;
+	int cmp;
+
+	while (*link) {
+		parent = *link;
+		iter = rb_entry(parent, struct dmem_entry, rbnode);
+		cmp = bucket->dmem->compare(entry->key, iter->key);
+		if (cmp < 0)
+			link = &(*link)->rb_left;
+		else if (cmp > 0)
+			link = &(*link)->rb_right;
+		else {
+			*dupentry = iter;
+			return -EEXIST;
+		}
+	}
+	rb_link_node(&entry->rbnode, parent, link);
+	rb_insert_color(&entry->rbnode, &bucket->rbroot);
+	return 0;
+}
+
+static void dmem_erase_entry(struct dmem_hashbucket *bucket,
+		struct dmem_entry *entry)
+{
+	if (!RB_EMPTY_NODE(&entry->rbnode)) {
+		rb_erase(&entry->rbnode, &bucket->rbroot);
+		RB_CLEAR_NODE(&entry->rbnode);
+	}
+}
+
+static struct dmem_entry *dmem_search_entry(struct dmem_hashbucket *bucket,
+		void *key)
+{
+	struct rb_node *node = bucket->rbroot.rb_node;
+	struct dmem_entry *iter;
+	int cmp;
+
+	while (node) {
+		iter = rb_entry(node, struct dmem_entry, rbnode);
+		cmp = bucket->dmem->compare(key, iter->key);
+		if (cmp < 0)
+			node = node->rb_left;
+		else if (cmp > 0)
+			node = node->rb_right;
+		else
+			return iter;
+	}
+	return NULL;
+}
+
+/* Allocates a page from gcma areas using round-robin way */
+static struct page *dmem_alloc_page(struct dmem *dmem, struct gcma **res_gcma)
+{
+	struct page *page;
+	struct gcma *gcma;
+
+retry:
+	spin_lock(&ginfo.lock);
+	gcma = list_first_entry(&ginfo.head, struct gcma, list);
+	list_move_tail(&gcma->list, &ginfo.head);
+
+	list_for_each_entry(gcma, &ginfo.head, list) {
+		page = gcma_alloc_page(gcma);
+		if (page) {
+			spin_unlock(&ginfo.lock);
+			goto got;
+		}
+	}
+	spin_unlock(&ginfo.lock);
+
+	/*
+	 * Failed to alloc a page from entire gcma. Evict adequate LRU
+	 * discardable pages and try allocation again.
+	 */
+	if (dmem_evict_lru(dmem, NR_EVICT_BATCH))
+		goto retry;
+
+got:
+	*res_gcma = gcma;
+	return page;
+}
+
+/* Should be called from dmem_put only */
+static void dmem_free_entry(struct dmem *dmem, struct dmem_entry *entry)
+{
+	gcma_free_page(entry->gcma, entry->page);
+	kmem_cache_free(dmem->key_cache, entry->key);
+	kmem_cache_free(dmem_entry_cache, entry);
+}
+
+/* Caller should hold hashbucket spinlock */
+static void dmem_get(struct dmem_entry *entry)
+{
+	atomic_inc(&entry->refcount);
+}
+
+/*
+ * Caller should hold hashbucket spinlock and dmem lru_lock.
+ * Remove from the bucket and free it, if nobody reference the entry.
+ */
+static void dmem_put(struct dmem_hashbucket *buck,
+				struct dmem_entry *entry)
+{
+	int refcount = atomic_dec_return(&entry->refcount);
+
+	BUG_ON(refcount < 0);
+
+	if (refcount == 0) {
+		struct page *page = entry->page;
+
+		dmem_erase_entry(buck, entry);
+		list_del(&page->lru);
+		dmem_free_entry(buck->dmem, entry);
+	}
+}
+
+/*
+ * dmem_evict_lru - evict @nr_pages LRU dmem pages
+ *
+ * @dmem	dmem to evict LRU pages from
+ * @nr_pages	number of LRU pages to be evicted
+ *
+ * Returns number of successfully evicted pages
+ */
+static unsigned long dmem_evict_lru(struct dmem *dmem, unsigned long nr_pages)
+{
+	struct dmem_hashbucket *buck;
+	struct dmem_entry *entry;
+	struct page *page, *n;
+	unsigned long evicted = 0;
+	u8 key[dmem->bytes_key];
+	LIST_HEAD(free_pages);
+
+	spin_lock(&dmem->lru_lock);
+	list_for_each_entry_safe_reverse(page, n, &dmem->lru_list, lru) {
+		entry = dmem_entry(page);
+
+		/*
+		 * the entry could be free by other thread in the while.
+		 * check whether the situation occurred and avoid others to
+		 * free it by compare reference count and increase it
+		 * atomically.
+		 */
+		if (!atomic_inc_not_zero(&entry->refcount))
+			continue;
+
+		clear_gpage_flag(page, GF_LRU);
+		list_move(&page->lru, &free_pages);
+		if (++evicted >= nr_pages)
+			break;
+	}
+	spin_unlock(&dmem->lru_lock);
+
+	list_for_each_entry_safe(page, n, &free_pages, lru) {
+		buck = dmem_hashbuck(page);
+		entry = dmem_entry(page);
+
+		spin_lock(&buck->lock);
+		spin_lock(&dmem->lru_lock);
+		/* drop refcount increased by above loop */
+		memcpy(&key, entry->key, dmem->bytes_key);
+		dmem_put(buck, entry);
+		/* free entry if the entry is still in tree */
+		if (dmem_search_entry(buck, &key))
+			dmem_put(buck, entry);
+		spin_unlock(&dmem->lru_lock);
+		spin_unlock(&buck->lock);
+	}
+
+	return evicted;
+}
+
+/* Caller should hold bucket spinlock */
+static struct dmem_entry *dmem_find_get_entry(struct dmem_hashbucket *buck,
+						void *key)
+{
+	struct dmem_entry *entry;
+
+	assert_spin_locked(&buck->lock);
+	entry = dmem_search_entry(buck, key);
+	if (entry)
+		dmem_get(entry);
+
+	return entry;
+}
+
+static struct dmem_hashbucket *dmem_find_hashbucket(struct dmem *dmem,
+							struct dmem_pool *pool,
+							void *key)
+{
+	return &pool->hashbuckets[dmem->hash_key(key)];
+}
+
+/*
+ * dmem_init_pool - initialize a pool in dmem
+ *
+ * @dmem	dmem of a pool to be initialized
+ * @pool_id	id of a pool to be initialized
+ *
+ * Returns 0 if success,
+ * Returns non-zero if failed.
+ */
+int dmem_init_pool(struct dmem *dmem, unsigned pool_id)
+{
+	struct dmem_pool *pool;
+	struct dmem_hashbucket *buck;
+	int i;
+
+	pool = kzalloc(sizeof(struct dmem_pool), GFP_KERNEL);
+	if (!pool) {
+		pr_warn("%s: failed to alloc dmem pool %d\n",
+				__func__, pool_id);
+		return -ENOMEM;
+	}
+
+	pool->hashbuckets = kzalloc(
+				sizeof(struct dmem_hashbucket) * dmem->nr_hash,
+				GFP_KERNEL);
+	if (!pool->hashbuckets) {
+		pr_warn("%s: failed to alloc hashbuckets\n", __func__);
+		kfree(pool);
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < dmem->nr_hash; i++) {
+		buck = &pool->hashbuckets[i];
+		buck->dmem = dmem;
+		buck->rbroot = RB_ROOT;
+		spin_lock_init(&buck->lock);
+	}
+
+	dmem->pools[pool_id] = pool;
+	return 0;
+}
+
+/*
+ * dmem_store_page - store a page in dmem
+ *
+ * Saves content of @page in gcma area and manages it using dmem. The content
+ * could be loaded again from dmem using @key if it has not been discarded for
+ * first-class clients.
+ *
+ * @dmem	dmem to store the page in
+ * @pool_id	id of a dmem pool to store the page in
+ * @key		key of the page to be stored in
+ * @page	page to be stored in
+ *
+ * Returns 0 if success,
+ * Returns non-zero if failed.
+ */
+int dmem_store_page(struct dmem *dmem, unsigned pool_id, void *key,
+			struct page *page)
+{
+	struct dmem_pool *pool;
+	struct dmem_hashbucket *buck;
+	struct dmem_entry *entry, *dupentry;
+	struct gcma *gcma;
+	struct page *gcma_page = NULL;
+
+	u8 *src, *dst;
+	int ret;
+
+	pool = dmem->pools[pool_id];
+	if (!pool) {
+		pr_warn("%s: dmem pool for id %d is not exist\n",
+				__func__, pool_id);
+		return -ENODEV;
+	}
+
+	gcma_page = dmem_alloc_page(dmem, &gcma);
+	if (!gcma_page)
+		return -ENOMEM;
+
+	entry = kmem_cache_alloc(dmem_entry_cache, GFP_ATOMIC);
+	if (!entry) {
+		spin_lock(&dmem->lru_lock);
+		gcma_free_page(gcma, gcma_page);
+		spin_unlock(&dmem->lru_lock);
+		return -ENOMEM;
+	}
+
+	entry->gcma = gcma;
+	entry->page = gcma_page;
+	entry->key = kmem_cache_alloc(dmem->key_cache, GFP_ATOMIC);
+	if (!entry->key) {
+		spin_lock(&dmem->lru_lock);
+		gcma_free_page(gcma, gcma_page);
+		spin_unlock(&dmem->lru_lock);
+		kmem_cache_free(dmem_entry_cache, entry);
+		return -ENOMEM;
+	}
+	memcpy(entry->key, key, dmem->bytes_key);
+	atomic_set(&entry->refcount, 1);
+	RB_CLEAR_NODE(&entry->rbnode);
+
+	buck = dmem_find_hashbucket(dmem, pool, entry->key);
+	set_dmem_hashbuck(gcma_page, buck);
+	set_dmem_entry(gcma_page, entry);
+
+	/* copy from orig data to gcma_page */
+	src = kmap_atomic(page);
+	dst = kmap_atomic(gcma_page);
+	memcpy(dst, src, PAGE_SIZE);
+	kunmap_atomic(src);
+	kunmap_atomic(dst);
+
+	spin_lock(&buck->lock);
+	do {
+		/*
+		 * Though this duplication scenario may happen rarely by
+		 * race of dmem client layer, we handle this case here rather
+		 * than fix the client layer because handling the possibility
+		 * of duplicates is part of the tmem ABI.
+		 */
+		ret = dmem_insert_entry(buck, entry, &dupentry);
+		if (ret == -EEXIST) {
+			dmem_erase_entry(buck, dupentry);
+			spin_lock(&dmem->lru_lock);
+			dmem_put(buck, dupentry);
+			spin_unlock(&dmem->lru_lock);
+		}
+	} while (ret == -EEXIST);
+
+	spin_lock(&dmem->lru_lock);
+	set_gpage_flag(gcma_page, GF_LRU);
+	list_add(&gcma_page->lru, &dmem->lru_list);
+	spin_unlock(&dmem->lru_lock);
+	spin_unlock(&buck->lock);
+
+	return ret;
+}
+
+/*
+ * dmem_load_page - load a page stored in dmem using @key
+ *
+ * @dmem	dmem which the page stored in
+ * @pool_id	id of a dmem pool the page stored in
+ * @key		key of the page looking for
+ * @page	page to store the loaded content
+ *
+ * Returns 0 if success,
+ * Returns non-zero if failed.
+ */
+int dmem_load_page(struct dmem *dmem, unsigned pool_id, void *key,
+			struct page *page)
+{
+	struct dmem_pool *pool;
+	struct dmem_hashbucket *buck;
+	struct dmem_entry *entry;
+	struct page *gcma_page;
+	u8 *src, *dst;
+
+	pool = dmem->pools[pool_id];
+	if (!pool) {
+		pr_warn("dmem pool for id %d not exist\n", pool_id);
+		return -1;
+	}
+
+	buck = dmem_find_hashbucket(dmem, pool, key);
+
+	spin_lock(&buck->lock);
+	entry = dmem_find_get_entry(buck, key);
+	spin_unlock(&buck->lock);
+	if (!entry)
+		return -1;
+
+	gcma_page = entry->page;
+	src = kmap_atomic(gcma_page);
+	dst = kmap_atomic(page);
+	memcpy(dst, src, PAGE_SIZE);
+	kunmap_atomic(src);
+	kunmap_atomic(dst);
+
+	spin_lock(&buck->lock);
+	spin_lock(&dmem->lru_lock);
+	if (likely(gpage_flag(gcma_page, GF_LRU)))
+		list_move(&gcma_page->lru, &dmem->lru_list);
+	dmem_put(buck, entry);
+	spin_unlock(&dmem->lru_lock);
+	spin_unlock(&buck->lock);
+
+	return 0;
+}
+
+/*
+ * dmem_invalidate_entry - invalidates an entry from dmem
+ *
+ * @dmem	dmem of entry to be invalidated
+ * @pool_id	dmem pool id of entry to be invalidated
+ * @key		key of entry to be invalidated
+ *
+ * Returns 0 if success,
+ * Returns non-zero if failed.
+ */
+int dmem_invalidate_entry(struct dmem *dmem, unsigned pool_id, void *key)
+{
+	struct dmem_pool *pool;
+	struct dmem_hashbucket *buck;
+	struct dmem_entry *entry;
+
+	pool = dmem->pools[pool_id];
+	buck = dmem_find_hashbucket(dmem, pool, key);
+
+	spin_lock(&buck->lock);
+	entry = dmem_search_entry(buck, key);
+	if (!entry) {
+		spin_unlock(&buck->lock);
+		return -ENOENT;
+	}
+
+	spin_lock(&dmem->lru_lock);
+	dmem_put(buck, entry);
+	spin_unlock(&dmem->lru_lock);
+	spin_unlock(&buck->lock);
+
+	return 0;
+}
+
+/*
+ * dmem_invalidate_pool - invalidates whole entries in a dmem pool
+ *
+ * @dmem	dmem of pool to be invalidated
+ * @pool_id	id of pool to be invalidated
+ *
+ * Returns 0 if success,
+ * Returns non-zero if failed.
+ */
+int dmem_invalidate_pool(struct dmem *dmem, unsigned pool_id)
+{
+	struct dmem_pool *pool;
+	struct dmem_hashbucket *buck;
+	struct dmem_entry *entry, *n;
+	int i;
+
+	pool = dmem->pools[pool_id];
+	if (!pool)
+		return -1;
+
+	for (i = 0; i < dmem->nr_hash; i++) {
+		buck = &pool->hashbuckets[i];
+		spin_lock(&buck->lock);
+		rbtree_postorder_for_each_entry_safe(entry, n, &buck->rbroot,
+							rbnode) {
+			spin_lock(&dmem->lru_lock);
+			dmem_put(buck, entry);
+			spin_unlock(&dmem->lru_lock);
+		}
+		buck->rbroot = RB_ROOT;
+		spin_unlock(&buck->lock);
+	}
+
+	kfree(pool->hashbuckets);
+	kfree(pool);
+	dmem->pools[pool_id] = NULL;
+
+	return 0;
+}
+
+/*
+ * Return 0 if [start_pfn, end_pfn] is isolated.
+ * Otherwise, return first unisolated pfn from the start_pfn.
+ */
+static unsigned long isolate_interrupted(struct gcma *gcma,
+		unsigned long start_pfn, unsigned long end_pfn)
+{
+	unsigned long offset;
+	unsigned long *bitmap;
+	unsigned long pfn, ret = 0;
+	struct page *page;
+
+	spin_lock(&gcma->lock);
+
+	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
+		int set;
+
+		offset = pfn - gcma->base_pfn;
+		bitmap = gcma->bitmap + offset / BITS_PER_LONG;
+
+		set = test_bit(pfn % BITS_PER_LONG, bitmap);
+		if (!set) {
+			ret = pfn;
+			break;
+		}
+
+		page = pfn_to_page(pfn);
+		if (!gpage_flag(page, GF_ISOLATED)) {
+			ret = pfn;
+			break;
+		}
+
+	}
+	spin_unlock(&gcma->lock);
+	return ret;
+}
+
+/*
  * gcma_alloc_contig - allocates contiguous pages
  *
  * @start_pfn	start pfn of requiring contiguous memory area
- * @size	number of pages in requiring contiguous memory area
+ * @size	size of the required contiguous memory area, in pages
  *
  * Returns 0 on success, error code on failure.
  */
 int gcma_alloc_contig(struct gcma *gcma, unsigned long start_pfn,
 			unsigned long size)
 {
+	LIST_HEAD(free_pages);
+	struct dmem_hashbucket *buck;
+	struct dmem_entry *entry;
+	struct cleancache_dmem_key key;	/* cleancache key is larger than frontswap's */
+	struct page *page, *n;
 	unsigned long offset;
+	unsigned long *bitmap;
+	unsigned long pfn;
+	unsigned long orig_start = start_pfn;
+	spinlock_t *lru_lock;
 
-	spin_lock(&gcma->lock);
-	offset = start_pfn - gcma->base_pfn;
+retry:
+	for (pfn = start_pfn; pfn < start_pfn + size; pfn++) {
+		spin_lock(&gcma->lock);
+
+		offset = pfn - gcma->base_pfn;
+		bitmap = gcma->bitmap + offset / BITS_PER_LONG;
+		page = pfn_to_page(pfn);
+
+		if (!test_bit(offset % BITS_PER_LONG, bitmap)) {
+			/* set a bit to prevent allocation for dmem */
+			bitmap_set(gcma->bitmap, offset, 1);
+			set_gpage_flag(page, GF_ISOLATED);
+			spin_unlock(&gcma->lock);
+			continue;
+		}
+		if (gpage_flag(page, GF_ISOLATED)) {
+			spin_unlock(&gcma->lock);
+			continue;
+		}
+
+		/* Someone is using the page so it's complicated :( */
+		spin_unlock(&gcma->lock);
+
+		/* During dmem_store, hashbuck might not be set in the page yet */
+		if (dmem_hashbuck(page) == NULL)
+			continue;
+
+		lru_lock = &dmem_hashbuck(page)->dmem->lru_lock;
+		spin_lock(lru_lock);
+		spin_lock(&gcma->lock);
 
-	if (bitmap_find_next_zero_area(gcma->bitmap, gcma->size, offset,
-				size, 0) != 0) {
+		/* Avoid allocation from other threads */
+		set_gpage_flag(page, GF_RECLAIMING);
+
+		/*
+		 * The page is in LRU and being used by someone. Discard it
+		 * after removing from lru_list.
+		 */
+		if (gpage_flag(page, GF_LRU)) {
+			entry = dmem_entry(page);
+			if (atomic_inc_not_zero(&entry->refcount)) {
+				clear_gpage_flag(page, GF_LRU);
+				list_move(&page->lru, &free_pages);
+				goto next_page;
+			}
+		}
+
+		/*
+		 * The page is
+		 * 1) allocated by others but not yet in LRU in case of
+		 *    dmem_store or
+		 * 2) deleted from LRU but not yet from gcma's bitmap in case
+		 *    of dmem_invalidate or dmem_evict_lru.
+		 * In either case, the race window is small, so a retry after a
+		 * while will succeed. isolate_interrupted() below handles it.
+		 */
+next_page:
 		spin_unlock(&gcma->lock);
-		pr_warn("already allocated region required: %lu, %lu",
-				start_pfn, size);
-		return -EINVAL;
+		spin_unlock(lru_lock);
 	}
 
-	bitmap_set(gcma->bitmap, offset, size);
-	spin_unlock(&gcma->lock);
+	/*
+	 * Since we increased the refcount of the page above, we can safely
+	 * access its dmem_entry.
+	 */
+	list_for_each_entry_safe(page, n, &free_pages, lru) {
+		buck = dmem_hashbuck(page);
+		entry = dmem_entry(page);
+		lru_lock = &dmem_hashbuck(page)->dmem->lru_lock;
+
+		spin_lock(&buck->lock);
+		spin_lock(lru_lock);
+		/* drop refcount increased by above loop */
+		memcpy(&key, entry->key, dmem_hashbuck(page)->dmem->bytes_key);
+		dmem_put(buck, entry);
+		/* free entry if the entry is still in tree */
+		if (dmem_search_entry(buck, &key))
+			dmem_put(buck, entry);
+		spin_unlock(lru_lock);
+		spin_unlock(&buck->lock);
+	}
+
+	start_pfn = isolate_interrupted(gcma, orig_start, orig_start + size);
+	if (start_pfn)
+		goto retry;
 
 	return 0;
 }
@@ -162,6 +885,10 @@ static int __init init_gcma(void)
 {
 	pr_info("loading gcma\n");
 
+	dmem_entry_cache = KMEM_CACHE(dmem_entry, 0);
+	if (dmem_entry_cache == NULL)
+		return -ENOMEM;
+
 	return 0;
 }
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread
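
A minimal sketch of how a hypothetical second-class client might drive the
dmem interface added above (illustration only, not part of the patch): the
my_* names are made up, and it assumes struct dmem and the dmem_* functions
from mm/gcma.c are made visible to the client; the setup mirrors what the
frontswap/cleancache wiring in the next patch does for its own dmems.

#include <linux/list.h>
#include <linux/slab.h>
#include <linux/spinlock.h>

/* Hypothetical key type; any fixed-size blob works as a dmem key. */
struct my_dmem_key {
	unsigned long id;
};

static struct dmem my_dmem;

static unsigned my_hash_key(void *key)
{
	return *(unsigned long *)key % my_dmem.nr_hash;
}

static int my_compare(void *lkey, void *rkey)
{
	return *(unsigned long *)lkey - *(unsigned long *)rkey;
}

static int my_client_init(void)
{
	/* a single pool is enough for this example */
	my_dmem.nr_pools = 1;
	my_dmem.pools = kzalloc(sizeof(struct dmem_pool *), GFP_KERNEL);
	if (!my_dmem.pools)
		return -ENOMEM;

	my_dmem.nr_hash = 64;
	my_dmem.key_cache = KMEM_CACHE(my_dmem_key, 0);
	if (!my_dmem.key_cache)
		return -ENOMEM;
	my_dmem.bytes_key = sizeof(struct my_dmem_key);

	INIT_LIST_HEAD(&my_dmem.lru_list);
	spin_lock_init(&my_dmem.lru_lock);
	my_dmem.hash_key = my_hash_key;
	my_dmem.compare = my_compare;

	return dmem_init_pool(&my_dmem, 0);
}

/* Stash a copy of a page; gcma may discard it for a contiguous allocation. */
static int my_client_store(unsigned long id, struct page *page)
{
	return dmem_store_page(&my_dmem, 0, &id, page);
}

/* Load it back; returns non-zero if the copy was already discarded. */
static int my_client_load(unsigned long id, struct page *page)
{
	return dmem_load_page(&my_dmem, 0, &id, page);
}

A load can fail at any time because gcma_alloc_contig() may already have
discarded the stored copy, so a client must always be able to fall back to
the authoritative data kept elsewhere.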

* [RFC v2 3/5] gcma: adopt cleancache and frontswap as second-class clients
  2015-02-23 19:54 ` SeongJae Park
@ 2015-02-23 19:54   ` SeongJae Park
  -1 siblings, 0 replies; 20+ messages in thread
From: SeongJae Park @ 2015-02-23 19:54 UTC (permalink / raw)
  To: akpm
  Cc: lauraa, minchan, sergey.senozhatsky, linux-mm, linux-kernel,
	SeongJae Park

Because pages in cleancache are clean and outside of kernel scope, they
can be freed immediately whenever required, with no additional work.
Similarly, because frontswap pages are outside of kernel scope, they can
be freed easily once they have been written back to the backing swap
device. Moreover, the write-back can be avoided entirely if frontswap
runs in write-through mode. In other words, cleancache and write-through
mode frontswap pages are the best candidates for second-class clients of
gcma.

Consequently, this commit implements cleancache and write-through mode
frontswap backends inside the gcma area using the discardable memory
interface.

Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
 include/linux/gcma.h |   3 +
 mm/gcma.c            | 312 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 314 insertions(+), 1 deletion(-)

diff --git a/include/linux/gcma.h b/include/linux/gcma.h
index 005bf77..12e4431 100644
--- a/include/linux/gcma.h
+++ b/include/linux/gcma.h
@@ -20,6 +20,9 @@
  * backend of discardable memory. Any candiates satisfying with discardable
  * memory could be second-class client of GCMA using the interface.
  *
+ * Currently, GCMA uses cleancache and write-through mode frontswap as
+ * second-class clients.
+ *
  * Copyright (C) 2014  LG Electronics Inc.,
  * Copyright (C) 2014  Minchan Kim <minchan@kernel.org>
  * Copyright (C) 2014-2015  SeongJae Park <sj38.park@gmail.com>
diff --git a/mm/gcma.c b/mm/gcma.c
index dc70fa8..924e3f6 100644
--- a/mm/gcma.c
+++ b/mm/gcma.c
@@ -20,6 +20,9 @@
  * backend of discardable memory. Any candiates satisfying with discardable
  * memory could be second-class client of GCMA using the interface.
  *
+ * Currently, GCMA uses cleancache and write-through mode frontswap as
+ * second-class clients.
+ *
  * Copyright (C) 2014  LG Electronics Inc.,
  * Copyright (C) 2014  Minchan Kim <minchan@kernel.org>
  * Copyright (C) 2014-2015  SeongJae Park <sj38.park@gmail.com>
@@ -27,11 +30,24 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/cleancache.h>
+#include <linux/frontswap.h>
 #include <linux/gcma.h>
+#include <linux/hash.h>
 #include <linux/highmem.h>
 #include <linux/module.h>
 #include <linux/slab.h>
 
+#define BITS_FS_DMEM_HASH	8
+#define NR_FS_DMEM_HASH_BUCKS	(1 << BITS_FS_DMEM_HASH)
+#define BYTES_FS_DMEM_KEY	(sizeof(struct frontswap_dmem_key))
+
+#define BITS_CC_DMEM_HASH	8
+#define NR_CC_DMEM_HASH_BUCKS	(1 << BITS_CC_DMEM_HASH)
+#define BYTES_CC_DMEM_KEY	(sizeof(struct cleancache_dmem_key))
+#define MAX_CLEANCACHE_FS	16
+
+
 /* XXX: What's the ideal? */
 #define NR_EVICT_BATCH	32
 
@@ -92,8 +108,28 @@ struct dmem {
 	int (*compare)(void *lkey, void *rkey);
 };
 
+struct frontswap_dmem_key {
+	pgoff_t key;
+};
+
+struct cleancache_dmem_key {
+	u8 key[sizeof(pgoff_t) + sizeof(struct cleancache_filekey)];
+};
+
 static struct kmem_cache *dmem_entry_cache;
 
+static struct dmem fs_dmem;	/* dmem for frontswap backend */
+
+static struct dmem cc_dmem;	/* dmem for cleancache backend */
+static atomic_t nr_cleancache_fses = ATOMIC_INIT(0);
+
+/* configs from kernel parameter */
+static bool fs_disabled __read_mostly;
+module_param_named(fs_disabled, fs_disabled, bool, 0444);
+
+static bool cc_disabled __read_mostly;
+module_param_named(cc_disabled, cc_disabled, bool, 0444);
+
 static unsigned long dmem_evict_lru(struct dmem *dmem, unsigned long nr_pages);
 
 static struct dmem_hashbucket *dmem_hashbuck(struct page *page)
@@ -174,6 +210,7 @@ int gcma_init(unsigned long start_pfn, unsigned long size,
 {
 	int bitmap_size = BITS_TO_LONGS(size) * sizeof(long);
 	struct gcma *gcma;
+	unsigned long flags;
 
 	gcma = kmalloc(sizeof(*gcma), GFP_KERNEL);
 	if (!gcma)
@@ -187,9 +224,11 @@ int gcma_init(unsigned long start_pfn, unsigned long size,
 	gcma->base_pfn = start_pfn;
 	spin_lock_init(&gcma->lock);
 
+	local_irq_save(flags);
 	spin_lock(&ginfo.lock);
 	list_add(&gcma->list, &ginfo.head);
 	spin_unlock(&ginfo.lock);
+	local_irq_restore(flags);
 
 	*res_gcma = gcma;
 	pr_info("initialized gcma area [%lu, %lu]\n",
@@ -207,7 +246,9 @@ static struct page *gcma_alloc_page(struct gcma *gcma)
 	unsigned long bit;
 	unsigned long *bitmap = gcma->bitmap;
 	struct page *page = NULL;
+	unsigned long flags;
 
+	local_irq_save(flags);
 	spin_lock(&gcma->lock);
 	bit = bitmap_find_next_zero_area(bitmap, gcma->size, 0, 1, 0);
 	if (bit >= gcma->size) {
@@ -221,6 +262,7 @@ static struct page *gcma_alloc_page(struct gcma *gcma)
 	clear_gpage_flagall(page);
 
 out:
+	local_irq_restore(flags);
 	return page;
 }
 
@@ -228,9 +270,11 @@ out:
 static void gcma_free_page(struct gcma *gcma, struct page *page)
 {
 	unsigned long pfn, offset;
+	unsigned long flags;
 
 	pfn = page_to_pfn(page);
 
+	local_irq_save(flags);
 	spin_lock(&gcma->lock);
 	offset = pfn - gcma->base_pfn;
 
@@ -247,6 +291,7 @@ static void gcma_free_page(struct gcma *gcma, struct page *page)
 		set_gpage_flag(page, GF_ISOLATED);
 	}
 	spin_unlock(&gcma->lock);
+	local_irq_restore(flags);
 }
 
 /*
@@ -313,7 +358,9 @@ static struct page *dmem_alloc_page(struct dmem *dmem, struct gcma **res_gcma)
 {
 	struct page *page;
 	struct gcma *gcma;
+	unsigned long flags;
 
+	local_irq_save(flags);
 retry:
 	spin_lock(&ginfo.lock);
 	gcma = list_first_entry(&ginfo.head, struct gcma, list);
@@ -336,6 +383,7 @@ retry:
 		goto retry;
 
 got:
+	local_irq_restore(flags);
 	*res_gcma = gcma;
 	return page;
 }
@@ -486,7 +534,20 @@ int dmem_init_pool(struct dmem *dmem, unsigned pool_id)
 		buck = &pool->hashbuckets[i];
 		buck->dmem = dmem;
 		buck->rbroot = RB_ROOT;
-		spin_lock_init(&buck->lock);
+
+		/*
+		 * Because lockdep determines a lock's class from its
+		 * initialization site, the bucket locks of the cleancache and
+		 * frontswap dmems would otherwise fall into the same class.
+		 * Since cleancache depends on a softirq-safe lock while
+		 * frontswap does not, lockdep would then report a false irq
+		 * lock inversion.
+		 * Avoid that by initializing the locks at two distinct sites,
+		 * an ugly but simple hack.
+		 */
+		if (dmem == &fs_dmem)
+			spin_lock_init(&buck->lock);
+		else
+			spin_lock_init(&buck->lock);
 	}
 
 	dmem->pools[pool_id] = pool;
@@ -716,6 +777,180 @@ int dmem_invalidate_pool(struct dmem *dmem, unsigned pool_id)
 	return 0;
 }
 
+
+static int frontswap_compare(void *lkey, void *rkey)
+{
+	return *(pgoff_t *)lkey - *(pgoff_t *)rkey;
+}
+
+static unsigned frontswap_hash_key(void *key)
+{
+	return *(pgoff_t *)key % fs_dmem.nr_hash;
+}
+
+void gcma_frontswap_init(unsigned type)
+{
+	dmem_init_pool(&fs_dmem, type);
+}
+
+int gcma_frontswap_store(unsigned type, pgoff_t offset,
+				struct page *page)
+{
+	return dmem_store_page(&fs_dmem, type, (void *)&offset, page);
+}
+
+/*
+ * Returns 0 if success,
+ * Returns non-zero if failed.
+ */
+int gcma_frontswap_load(unsigned type, pgoff_t offset,
+			       struct page *page)
+{
+	return dmem_load_page(&fs_dmem, type, (void *)&offset, page);
+}
+
+void gcma_frontswap_invalidate_page(unsigned type, pgoff_t offset)
+{
+	dmem_invalidate_entry(&fs_dmem, type, (void *)&offset);
+}
+
+void gcma_frontswap_invalidate_area(unsigned type)
+{
+	dmem_invalidate_pool(&fs_dmem, type);
+}
+
+static struct frontswap_ops gcma_frontswap_ops = {
+	.init = gcma_frontswap_init,
+	.store = gcma_frontswap_store,
+	.load = gcma_frontswap_load,
+	.invalidate_page = gcma_frontswap_invalidate_page,
+	.invalidate_area = gcma_frontswap_invalidate_area
+};
+
+
+static int cleancache_compare(void *lkey, void *rkey)
+{
+	/* Cleancache keys are opaque byte strings; compare them bytewise */
+	return memcmp(lkey, rkey, BYTES_CC_DMEM_KEY);
+}
+
+static unsigned int cleancache_hash_key(void *key)
+{
+	unsigned long *k = (unsigned long *)key;
+
+	return hash_long(k[0] ^ k[1] ^ k[2], BITS_CC_DMEM_HASH);
+}
+
+static void cleancache_set_key(struct cleancache_filekey *fkey, pgoff_t *offset,
+				void *key)
+{
+	memcpy(key, offset, sizeof(pgoff_t));
+	memcpy(key + sizeof(pgoff_t), fkey, sizeof(struct cleancache_filekey));
+}
+
+
+/* Returns positive pool id or negative error code */
+int gcma_cleancache_init_fs(size_t pagesize)
+{
+	int pool_id;
+	int err;
+
+	pool_id = atomic_inc_return(&nr_cleancache_fses) - 1;
+	if (pool_id >= MAX_CLEANCACHE_FS) {
+		pr_warn("%s: too many cleancache fs %d / %d\n",
+				__func__, pool_id, MAX_CLEANCACHE_FS);
+		return -1;
+	}
+
+	err = dmem_init_pool(&cc_dmem, pool_id);
+	if (err != 0)
+		return err;
+	return pool_id;
+}
+
+int gcma_cleancache_init_shared_fs(char *uuid, size_t pagesize)
+{
+	return -1;
+}
+
+int gcma_cleancache_get_page(int pool_id, struct cleancache_filekey fkey,
+				pgoff_t offset, struct page *page)
+{
+	struct cleancache_dmem_key key;
+	int ret;
+	unsigned long flags;
+
+	cleancache_set_key(&fkey, &offset, &key);
+
+	local_irq_save(flags);
+	ret = dmem_load_page(&cc_dmem, pool_id, &key, page);
+	local_irq_restore(flags);
+	return ret;
+}
+
+void gcma_cleancache_put_page(int pool_id, struct cleancache_filekey fkey,
+				pgoff_t offset, struct page *page)
+{
+	struct cleancache_dmem_key key;
+	unsigned long flags;
+
+	cleancache_set_key(&fkey, &offset, &key);
+
+	local_irq_save(flags);
+	dmem_store_page(&cc_dmem, pool_id, &key, page);
+	local_irq_restore(flags);
+}
+
+void gcma_cleancache_invalidate_page(int pool_id,
+					struct cleancache_filekey fkey,
+					pgoff_t offset)
+{
+	struct cleancache_dmem_key key;
+	unsigned long flags;
+
+	cleancache_set_key(&fkey, &offset, &key);
+
+	local_irq_save(flags);
+	dmem_invalidate_entry(&cc_dmem, pool_id, &key);
+	local_irq_restore(flags);
+}
+
+/*
+ * Invalidating every entry of a filekey from a dmem pool would require
+ * iterating over and comparing the key of every entry in the pool, which
+ * could be too expensive. To avoid that overhead, do nothing here. The
+ * entries will be evicted in LRU order anyway.
+ */
+void gcma_cleancache_invalidate_inode(int pool_id,
+					struct cleancache_filekey key)
+{
+}
+
+void gcma_cleancache_invalidate_fs(int pool_id)
+{
+	unsigned long flags;
+
+	if (pool_id < 0 || pool_id >= atomic_read(&nr_cleancache_fses)) {
+		pr_warn("%s received wrong pool id %d\n",
+				__func__, pool_id);
+		return;
+	}
+	local_irq_save(flags);
+	dmem_invalidate_pool(&cc_dmem, pool_id);
+	local_irq_restore(flags);
+}
+
+struct cleancache_ops gcma_cleancache_ops = {
+	.init_fs = gcma_cleancache_init_fs,
+	.init_shared_fs = gcma_cleancache_init_shared_fs,
+	.get_page = gcma_cleancache_get_page,
+	.put_page = gcma_cleancache_put_page,
+	.invalidate_page = gcma_cleancache_invalidate_page,
+	.invalidate_inode = gcma_cleancache_invalidate_inode,
+	.invalidate_fs = gcma_cleancache_invalidate_fs,
+};
+
+
 /*
  * Return 0 if every page in [start_pfn, end_pfn) is isolated.
  * Otherwise, return the first non-isolated pfn at or after start_pfn.
@@ -727,7 +962,9 @@ static unsigned long isolate_interrupted(struct gcma *gcma,
 	unsigned long *bitmap;
 	unsigned long pfn, ret = 0;
 	struct page *page;
+	unsigned long flags;
 
+	local_irq_save(flags);
 	spin_lock(&gcma->lock);
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
@@ -750,6 +987,7 @@ static unsigned long isolate_interrupted(struct gcma *gcma,
 
 	}
 	spin_unlock(&gcma->lock);
+	local_irq_restore(flags);
 	return ret;
 }
 
@@ -774,9 +1012,11 @@ int gcma_alloc_contig(struct gcma *gcma, unsigned long start_pfn,
 	unsigned long pfn;
 	unsigned long orig_start = start_pfn;
 	spinlock_t *lru_lock;
+	unsigned long flags = 0;
 
 retry:
 	for (pfn = start_pfn; pfn < start_pfn + size; pfn++) {
+		local_irq_save(flags);
 		spin_lock(&gcma->lock);
 
 		offset = pfn - gcma->base_pfn;
@@ -788,21 +1028,25 @@ retry:
 			bitmap_set(gcma->bitmap, offset, 1);
 			set_gpage_flag(page, GF_ISOLATED);
 			spin_unlock(&gcma->lock);
+			local_irq_restore(flags);
 			continue;
 		}
 		if (gpage_flag(page, GF_ISOLATED)) {
 			spin_unlock(&gcma->lock);
+			local_irq_restore(flags);
 			continue;
 		}
 
 		/* Someone is using the page so it's complicated :( */
 		spin_unlock(&gcma->lock);
+		local_irq_restore(flags);
 
 		/* During dmem_store, hashbuck might not be set in the page yet */
 		if (dmem_hashbuck(page) == NULL)
 			continue;
 
 		lru_lock = &dmem_hashbuck(page)->dmem->lru_lock;
+		local_irq_save(flags);
 		spin_lock(lru_lock);
 		spin_lock(&gcma->lock);
 
@@ -834,6 +1078,7 @@ retry:
 next_page:
 		spin_unlock(&gcma->lock);
 		spin_unlock(lru_lock);
+		local_irq_restore(flags);
 	}
 
 	/*
@@ -845,6 +1090,8 @@ next_page:
 		entry = dmem_entry(page);
 		lru_lock = &dmem_hashbuck(page)->dmem->lru_lock;
 
+		if (lru_lock == &cc_dmem.lru_lock)
+			local_irq_save(flags);
 		spin_lock(&buck->lock);
 		spin_lock(lru_lock);
 		/* drop refcount increased by above loop */
@@ -855,6 +1102,8 @@ next_page:
 			dmem_put(buck, entry);
 		spin_unlock(lru_lock);
 		spin_unlock(&buck->lock);
+		if (lru_lock == &cc_dmem.lru_lock)
+			local_irq_restore(flags);
 	}
 
 	start_pfn = isolate_interrupted(gcma, orig_start, orig_start + size);
@@ -874,11 +1123,14 @@ void gcma_free_contig(struct gcma *gcma,
 			unsigned long start_pfn, unsigned long size)
 {
 	unsigned long offset;
+	unsigned long flags;
 
+	local_irq_save(flags);
 	spin_lock(&gcma->lock);
 	offset = start_pfn - gcma->base_pfn;
 	bitmap_clear(gcma->bitmap, offset, size);
 	spin_unlock(&gcma->lock);
+	local_irq_restore(flags);
 }
 
 static int __init init_gcma(void)
@@ -889,6 +1141,64 @@ static int __init init_gcma(void)
 	if (dmem_entry_cache == NULL)
 		return -ENOMEM;
 
+	if (fs_disabled) {
+		pr_info("gcma frontswap is disabled. skip it\n");
+		goto init_cleancache;
+	}
+	fs_dmem.nr_pools = MAX_SWAPFILES;
+	fs_dmem.pools = kzalloc(sizeof(struct dmem_pool *) * fs_dmem.nr_pools,
+				GFP_KERNEL);
+	if (!fs_dmem.pools) {
+		pr_warn("failed to allocate frontswap dmem pools\n");
+		return -ENOMEM;
+	}
+
+	fs_dmem.nr_hash = NR_FS_DMEM_HASH_BUCKS;
+	fs_dmem.key_cache = KMEM_CACHE(frontswap_dmem_key, 0);
+	if (!fs_dmem.key_cache)
+		return -ENOMEM;
+	fs_dmem.bytes_key = BYTES_FS_DMEM_KEY;
+
+	INIT_LIST_HEAD(&fs_dmem.lru_list);
+	spin_lock_init(&fs_dmem.lru_lock);
+
+	fs_dmem.hash_key = frontswap_hash_key;
+	fs_dmem.compare = frontswap_compare;
+
+	/*
+	 * In write-through mode, GCMA can discard all of its pages instantly
+	 * instead of slowly writing them out to the swap device.
+	 */
+	frontswap_writethrough(true);
+	frontswap_register_ops(&gcma_frontswap_ops);
+
+init_cleancache:
+	if (cc_disabled) {
+		pr_info("gcma cleancache is disabled. skip it\n");
+		goto init_debugfs;
+	}
+	cc_dmem.nr_pools = MAX_CLEANCACHE_FS;
+	cc_dmem.pools = kzalloc(sizeof(struct dmem_pool *) * cc_dmem.nr_pools,
+				GFP_KERNEL);
+	if (!cc_dmem.pools) {
+		pr_warn("failed to allocate cleancache dmem pools\n");
+		return -ENOMEM;
+	}
+	cc_dmem.nr_hash = NR_CC_DMEM_HASH_BUCKS;
+	cc_dmem.key_cache = KMEM_CACHE(cleancache_dmem_key, 0);
+	if (!cc_dmem.key_cache)
+		return -ENOMEM;
+	cc_dmem.bytes_key = BYTES_CC_DMEM_KEY;
+
+	INIT_LIST_HEAD(&cc_dmem.lru_list);
+	spin_lock_init(&cc_dmem.lru_lock);
+
+	cc_dmem.hash_key = cleancache_hash_key;
+	cc_dmem.compare = cleancache_compare;
+	cleancache_register_ops(&gcma_cleancache_ops);
+
+init_debugfs:
+	gcma_debugfs_init();
 	return 0;
 }
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread
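
For the other direction, a rough sketch of a first-class client, i.e. a
contiguous memory request, as the interface stands after this patch: the
reserved range and my_* names are made up; gcma_init(), gcma_alloc_contig()
and gcma_free_contig() are used with the signatures visible in the diffs and
are assumed to be declared in include/linux/gcma.h. How the physical range is
reserved at boot is outside this sketch.

#include <linux/gcma.h>
#include <linux/mm.h>

#define MY_AREA_PAGES	1024UL		/* arbitrary example: 4MB with 4KB pages */

static struct gcma *my_gcma;
static unsigned long my_base_pfn;	/* pfn of an already-reserved range */

/* Hand an already-reserved physical range over to gcma. */
static int my_area_init(unsigned long reserved_base_pfn)
{
	my_base_pfn = reserved_base_pfn;
	return gcma_init(my_base_pfn, MY_AREA_PAGES, &my_gcma);
}

/* Grab 16 contiguous pages at a fixed offset inside the area. */
static struct page *my_grab_contig(void)
{
	unsigned long start = my_base_pfn + 256;

	/*
	 * Any dmem (cleancache/frontswap) copies in this range are simply
	 * discarded by gcma_alloc_contig(), so no migration or writeback
	 * is needed and the call is expected to succeed quickly.
	 */
	if (gcma_alloc_contig(my_gcma, start, 16))
		return NULL;

	return pfn_to_page(start);
}

static void my_release_contig(void)
{
	gcma_free_contig(my_gcma, my_base_pfn + 256, 16);
}

Note that start_pfn is an absolute pfn inside the gcma area;
gcma_alloc_contig() translates it to a bitmap offset internally.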

* [RFC v2 3/5] gcma: adopt cleancache and frontswap as second-class clients
@ 2015-02-23 19:54   ` SeongJae Park
  0 siblings, 0 replies; 20+ messages in thread
From: SeongJae Park @ 2015-02-23 19:54 UTC (permalink / raw)
  To: akpm
  Cc: lauraa, minchan, sergey.senozhatsky, linux-mm, linux-kernel,
	SeongJae Park

Because pages in cleancache is clean and out of kernel scope, they could
be free immediately whenever it required to be with no additional task.
Similarly, because frontswap pages are out of kernel scope, they could
be free easily after written back to backing swap device. Moreover, the
writing back task could be avoided if frontswap run as write-through
mode.  It means cleancache and write-through mode frontswap pages are
best candidates for second-class clients of gcma.

By the consequence, this commit implements cleancache and write-through
mode frontswap backend inside gcma area using discardable memory
interface.

Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
 include/linux/gcma.h |   3 +
 mm/gcma.c            | 312 ++++++++++++++++++++++++++++++++++++++++++++++++++-
 2 files changed, 314 insertions(+), 1 deletion(-)

diff --git a/include/linux/gcma.h b/include/linux/gcma.h
index 005bf77..12e4431 100644
--- a/include/linux/gcma.h
+++ b/include/linux/gcma.h
@@ -20,6 +20,9 @@
  * backend of discardable memory. Any candiates satisfying with discardable
  * memory could be second-class client of GCMA using the interface.
  *
+ * Currently, GCMA uses cleancache and write-through mode frontswap as
+ * second-class clients.
+ *
  * Copyright (C) 2014  LG Electronics Inc.,
  * Copyright (C) 2014  Minchan Kim <minchan@kernel.org>
  * Copyright (C) 2014-2015  SeongJae Park <sj38.park@gmail.com>
diff --git a/mm/gcma.c b/mm/gcma.c
index dc70fa8..924e3f6 100644
--- a/mm/gcma.c
+++ b/mm/gcma.c
@@ -20,6 +20,9 @@
  * backend of discardable memory. Any candiates satisfying with discardable
  * memory could be second-class client of GCMA using the interface.
  *
+ * Currently, GCMA uses cleancache and write-through mode frontswap as
+ * second-class clients.
+ *
  * Copyright (C) 2014  LG Electronics Inc.,
  * Copyright (C) 2014  Minchan Kim <minchan@kernel.org>
  * Copyright (C) 2014-2015  SeongJae Park <sj38.park@gmail.com>
@@ -27,11 +30,24 @@
 
 #define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
 
+#include <linux/cleancache.h>
+#include <linux/frontswap.h>
 #include <linux/gcma.h>
+#include <linux/hash.h>
 #include <linux/highmem.h>
 #include <linux/module.h>
 #include <linux/slab.h>
 
+#define BITS_FS_DMEM_HASH	8
+#define NR_FS_DMEM_HASH_BUCKS	(1 << BITS_FS_DMEM_HASH)
+#define BYTES_FS_DMEM_KEY	(sizeof(struct frontswap_dmem_key))
+
+#define BITS_CC_DMEM_HASH	8
+#define NR_CC_DMEM_HASH_BUCKS	(1 << BITS_CC_DMEM_HASH)
+#define BYTES_CC_DMEM_KEY	(sizeof(struct cleancache_dmem_key))
+#define MAX_CLEANCACHE_FS	16
+
+
 /* XXX: What's the ideal? */
 #define NR_EVICT_BATCH	32
 
@@ -92,8 +108,28 @@ struct dmem {
 	int (*compare)(void *lkey, void *rkey);
 };
 
+struct frontswap_dmem_key {
+	pgoff_t key;
+};
+
+struct cleancache_dmem_key {
+	u8 key[sizeof(pgoff_t) + sizeof(struct cleancache_filekey)];
+};
+
 static struct kmem_cache *dmem_entry_cache;
 
+static struct dmem fs_dmem;	/* dmem for frontswap backend */
+
+static struct dmem cc_dmem;	/* dmem for cleancache backend */
+static atomic_t nr_cleancache_fses = ATOMIC_INIT(0);
+
+/* configs from kernel parameter */
+static bool fs_disabled __read_mostly;
+module_param_named(fs_disabled, fs_disabled, bool, 0444);
+
+static bool cc_disabled __read_mostly;
+module_param_named(cc_disabled, cc_disabled, bool, 0444);
+
 static unsigned long dmem_evict_lru(struct dmem *dmem, unsigned long nr_pages);
 
 static struct dmem_hashbucket *dmem_hashbuck(struct page *page)
@@ -174,6 +210,7 @@ int gcma_init(unsigned long start_pfn, unsigned long size,
 {
 	int bitmap_size = BITS_TO_LONGS(size) * sizeof(long);
 	struct gcma *gcma;
+	unsigned long flags;
 
 	gcma = kmalloc(sizeof(*gcma), GFP_KERNEL);
 	if (!gcma)
@@ -187,9 +224,11 @@ int gcma_init(unsigned long start_pfn, unsigned long size,
 	gcma->base_pfn = start_pfn;
 	spin_lock_init(&gcma->lock);
 
+	local_irq_save(flags);
 	spin_lock(&ginfo.lock);
 	list_add(&gcma->list, &ginfo.head);
 	spin_unlock(&ginfo.lock);
+	local_irq_restore(flags);
 
 	*res_gcma = gcma;
 	pr_info("initialized gcma area [%lu, %lu]\n",
@@ -207,7 +246,9 @@ static struct page *gcma_alloc_page(struct gcma *gcma)
 	unsigned long bit;
 	unsigned long *bitmap = gcma->bitmap;
 	struct page *page = NULL;
+	unsigned long flags;
 
+	local_irq_save(flags);
 	spin_lock(&gcma->lock);
 	bit = bitmap_find_next_zero_area(bitmap, gcma->size, 0, 1, 0);
 	if (bit >= gcma->size) {
@@ -221,6 +262,7 @@ static struct page *gcma_alloc_page(struct gcma *gcma)
 	clear_gpage_flagall(page);
 
 out:
+	local_irq_restore(flags);
 	return page;
 }
 
@@ -228,9 +270,11 @@ out:
 static void gcma_free_page(struct gcma *gcma, struct page *page)
 {
 	unsigned long pfn, offset;
+	unsigned long flags;
 
 	pfn = page_to_pfn(page);
 
+	local_irq_save(flags);
 	spin_lock(&gcma->lock);
 	offset = pfn - gcma->base_pfn;
 
@@ -247,6 +291,7 @@ static void gcma_free_page(struct gcma *gcma, struct page *page)
 		set_gpage_flag(page, GF_ISOLATED);
 	}
 	spin_unlock(&gcma->lock);
+	local_irq_restore(flags);
 }
 
 /*
@@ -313,7 +358,9 @@ static struct page *dmem_alloc_page(struct dmem *dmem, struct gcma **res_gcma)
 {
 	struct page *page;
 	struct gcma *gcma;
+	unsigned long flags;
 
+	local_irq_save(flags);
 retry:
 	spin_lock(&ginfo.lock);
 	gcma = list_first_entry(&ginfo.head, struct gcma, list);
@@ -336,6 +383,7 @@ retry:
 		goto retry;
 
 got:
+	local_irq_restore(flags);
 	*res_gcma = gcma;
 	return page;
 }
@@ -486,7 +534,20 @@ int dmem_init_pool(struct dmem *dmem, unsigned pool_id)
 		buck = &pool->hashbuckets[i];
 		buck->dmem = dmem;
 		buck->rbroot = RB_ROOT;
-		spin_lock_init(&buck->lock);
+
+		/*
+		 * Because lockdep recognizes lock class using lock
+		 * initialization point, bucket lock of dmem for cleancache and
+		 * frontswap be treated as same class.
+		 * Because cleancache have dependency with softirq safe lock
+		 * while frontswap doesn't, lockdep causes false irq lock
+		 * inversion dependency report.
+		 * Avoid the situation using this ugly, simple hack.
+		 */
+		if (dmem == &fs_dmem)
+			spin_lock_init(&buck->lock);
+		else
+			spin_lock_init(&buck->lock);
 	}
 
 	dmem->pools[pool_id] = pool;
@@ -716,6 +777,180 @@ int dmem_invalidate_pool(struct dmem *dmem, unsigned pool_id)
 	return 0;
 }
 
+
+static int frontswap_compare(void *lkey, void *rkey)
+{
+	return *(pgoff_t *)lkey - *(pgoff_t *)rkey;
+}
+
+static unsigned frontswap_hash_key(void *key)
+{
+	return *(pgoff_t *)key % fs_dmem.nr_hash;
+}
+
+void gcma_frontswap_init(unsigned type)
+{
+	dmem_init_pool(&fs_dmem, type);
+}
+
+int gcma_frontswap_store(unsigned type, pgoff_t offset,
+				struct page *page)
+{
+	return dmem_store_page(&fs_dmem, type, (void *)&offset, page);
+}
+
+/*
+ * Returns 0 if success,
+ * Returns non-zero if failed.
+ */
+int gcma_frontswap_load(unsigned type, pgoff_t offset,
+			       struct page *page)
+{
+	return dmem_load_page(&fs_dmem, type, (void *)&offset, page);
+}
+
+void gcma_frontswap_invalidate_page(unsigned type, pgoff_t offset)
+{
+	dmem_invalidate_entry(&fs_dmem, type, (void *)&offset);
+}
+
+void gcma_frontswap_invalidate_area(unsigned type)
+{
+	dmem_invalidate_pool(&fs_dmem, type);
+}
+
+static struct frontswap_ops gcma_frontswap_ops = {
+	.init = gcma_frontswap_init,
+	.store = gcma_frontswap_store,
+	.load = gcma_frontswap_load,
+	.invalidate_page = gcma_frontswap_invalidate_page,
+	.invalidate_area = gcma_frontswap_invalidate_area
+};
+
+
+static int cleancache_compare(void *lkey, void *rkey)
+{
+	/* Frontswap uses pgoff_t value as key */
+	return memcmp(lkey, rkey, BYTES_CC_DMEM_KEY);
+}
+
+static unsigned int cleancache_hash_key(void *key)
+{
+	unsigned long *k = (unsigned long *)key;
+
+	return hash_long(k[0] ^ k[1] ^ k[2], BITS_CC_DMEM_HASH);
+}
+
+static void cleancache_set_key(struct cleancache_filekey *fkey, pgoff_t *offset,
+				void *key)
+{
+	memcpy(key, offset, sizeof(pgoff_t));
+	memcpy(key + sizeof(pgoff_t), fkey, sizeof(struct cleancache_filekey));
+}
+
+
+/* Returns positive pool id or negative error code */
+int gcma_cleancache_init_fs(size_t pagesize)
+{
+	int pool_id;
+	int err;
+
+	pool_id = atomic_inc_return(&nr_cleancache_fses) - 1;
+	if (pool_id >= MAX_CLEANCACHE_FS) {
+		pr_warn("%s: too many cleancache fs %d / %d\n",
+				__func__, pool_id, MAX_CLEANCACHE_FS);
+		return -1;
+	}
+
+	err = dmem_init_pool(&cc_dmem, pool_id);
+	if (err != 0)
+		return err;
+	return pool_id;
+}
+
+int gcma_cleancache_init_shared_fs(char *uuid, size_t pagesize)
+{
+	return -1;
+}
+
+int gcma_cleancache_get_page(int pool_id, struct cleancache_filekey fkey,
+				pgoff_t offset, struct page *page)
+{
+	struct cleancache_dmem_key key;
+	int ret;
+	unsigned long flags;
+
+	cleancache_set_key(&fkey, &offset, &key);
+
+	local_irq_save(flags);
+	ret = dmem_load_page(&cc_dmem, pool_id, &key, page);
+	local_irq_restore(flags);
+	return ret;
+}
+
+void gcma_cleancache_put_page(int pool_id, struct cleancache_filekey fkey,
+				pgoff_t offset, struct page *page)
+{
+	struct cleancache_dmem_key key;
+	unsigned long flags;
+
+	cleancache_set_key(&fkey, &offset, &key);
+
+	local_irq_save(flags);
+	dmem_store_page(&cc_dmem, pool_id, &key, page);
+	local_irq_restore(flags);
+}
+
+void gcma_cleancache_invalidate_page(int pool_id,
+					struct cleancache_filekey fkey,
+					pgoff_t offset)
+{
+	struct cleancache_dmem_key key;
+	unsigned long flags;
+
+	cleancache_set_key(&fkey, &offset, &key);
+
+	local_irq_save(flags);
+	dmem_invalidate_entry(&cc_dmem, pool_id, &key);
+	local_irq_restore(flags);
+}
+
+/*
+ * Invalidating every entry of an filekey from a dmem pool requires iterating
+ * and comparing key of every entry in the pool; it could be too expensive. To
+ * alleviates the overhead, do nothing here. The entry will be evicted in LRU
+ * order anyway.
+ */
+void gcma_cleancache_invalidate_inode(int pool_id,
+					struct cleancache_filekey key)
+{
+}
+
+void gcma_cleancache_invalidate_fs(int pool_id)
+{
+	unsigned long flags;
+
+	if (pool_id < 0 || pool_id >= atomic_read(&nr_cleancache_fses)) {
+		pr_warn("%s received wrong pool id %d\n",
+				__func__, pool_id);
+		return;
+	}
+	local_irq_save(flags);
+	dmem_invalidate_pool(&cc_dmem, pool_id);
+	local_irq_restore(flags);
+}
+
+struct cleancache_ops gcma_cleancache_ops = {
+	.init_fs = gcma_cleancache_init_fs,
+	.init_shared_fs = gcma_cleancache_init_shared_fs,
+	.get_page = gcma_cleancache_get_page,
+	.put_page = gcma_cleancache_put_page,
+	.invalidate_page = gcma_cleancache_invalidate_page,
+	.invalidate_inode = gcma_cleancache_invalidate_inode,
+	.invalidate_fs = gcma_cleancache_invalidate_fs,
+};
+
+
 /*
  * Return 0 if [start_pfn, end_pfn] is isolated.
  * Otherwise, return first unisolated pfn from the start_pfn.
@@ -727,7 +962,9 @@ static unsigned long isolate_interrupted(struct gcma *gcma,
 	unsigned long *bitmap;
 	unsigned long pfn, ret = 0;
 	struct page *page;
+	unsigned long flags;
 
+	local_irq_save(flags);
 	spin_lock(&gcma->lock);
 
 	for (pfn = start_pfn; pfn < end_pfn; pfn++) {
@@ -750,6 +987,7 @@ static unsigned long isolate_interrupted(struct gcma *gcma,
 
 	}
 	spin_unlock(&gcma->lock);
+	local_irq_restore(flags);
 	return ret;
 }
 
@@ -774,9 +1012,11 @@ int gcma_alloc_contig(struct gcma *gcma, unsigned long start_pfn,
 	unsigned long pfn;
 	unsigned long orig_start = start_pfn;
 	spinlock_t *lru_lock;
+	unsigned long flags = 0;
 
 retry:
 	for (pfn = start_pfn; pfn < start_pfn + size; pfn++) {
+		local_irq_save(flags);
 		spin_lock(&gcma->lock);
 
 		offset = pfn - gcma->base_pfn;
@@ -788,21 +1028,25 @@ retry:
 			bitmap_set(gcma->bitmap, offset, 1);
 			set_gpage_flag(page, GF_ISOLATED);
 			spin_unlock(&gcma->lock);
+			local_irq_restore(flags);
 			continue;
 		}
 		if (gpage_flag(page, GF_ISOLATED)) {
 			spin_unlock(&gcma->lock);
+			local_irq_restore(flags);
 			continue;
 		}
 
 		/* Someone is using the page so it's complicated :( */
 		spin_unlock(&gcma->lock);
+		local_irq_restore(flags);
 
 		/* During dmem_store, hashbuck could not be set in page, yet */
 		if (dmem_hashbuck(page) == NULL)
 			continue;
 
 		lru_lock = &dmem_hashbuck(page)->dmem->lru_lock;
+		local_irq_save(flags);
 		spin_lock(lru_lock);
 		spin_lock(&gcma->lock);
 
@@ -834,6 +1078,7 @@ retry:
 next_page:
 		spin_unlock(&gcma->lock);
 		spin_unlock(lru_lock);
+		local_irq_restore(flags);
 	}
 
 	/*
@@ -845,6 +1090,8 @@ next_page:
 		entry = dmem_entry(page);
 		lru_lock = &dmem_hashbuck(page)->dmem->lru_lock;
 
+		if (lru_lock == &cc_dmem.lru_lock)
+			local_irq_save(flags);
 		spin_lock(&buck->lock);
 		spin_lock(lru_lock);
 		/* drop refcount increased by above loop */
@@ -855,6 +1102,8 @@ next_page:
 			dmem_put(buck, entry);
 		spin_unlock(lru_lock);
 		spin_unlock(&buck->lock);
+		if (lru_lock == &cc_dmem.lru_lock)
+			local_irq_restore(flags);
 	}
 
 	start_pfn = isolate_interrupted(gcma, orig_start, orig_start + size);
@@ -874,11 +1123,14 @@ void gcma_free_contig(struct gcma *gcma,
 			unsigned long start_pfn, unsigned long size)
 {
 	unsigned long offset;
+	unsigned long flags;
 
+	local_irq_save(flags);
 	spin_lock(&gcma->lock);
 	offset = start_pfn - gcma->base_pfn;
 	bitmap_clear(gcma->bitmap, offset, size);
 	spin_unlock(&gcma->lock);
+	local_irq_restore(flags);
 }
 
 static int __init init_gcma(void)
@@ -889,6 +1141,64 @@ static int __init init_gcma(void)
 	if (dmem_entry_cache == NULL)
 		return -ENOMEM;
 
+	if (fs_disabled) {
+		pr_info("gcma frontswap is disabled. skip it\n");
+		goto init_cleancache;
+	}
+	fs_dmem.nr_pools = MAX_SWAPFILES;
+	fs_dmem.pools = kzalloc(sizeof(struct dmem_pool *) * fs_dmem.nr_pools,
+				GFP_KERNEL);
+	if (!fs_dmem.pools) {
+		pr_warn("failed to allocate frontswap dmem pools\n");
+		return -ENOMEM;
+	}
+
+	fs_dmem.nr_hash = NR_FS_DMEM_HASH_BUCKS;
+	fs_dmem.key_cache = KMEM_CACHE(frontswap_dmem_key, 0);
+	if (!fs_dmem.key_cache)
+		return -ENOMEM;
+	fs_dmem.bytes_key = BYTES_FS_DMEM_KEY;
+
+	INIT_LIST_HEAD(&fs_dmem.lru_list);
+	spin_lock_init(&fs_dmem.lru_lock);
+
+	fs_dmem.hash_key = frontswap_hash_key;
+	fs_dmem.compare = frontswap_compare;
+
+	/*
+	 * In writethrough mode, GCMA can discard all of its pages instantly
+	 * instead of slowly writing them out to the swap device.
+	 */
+	frontswap_writethrough(true);
+	frontswap_register_ops(&gcma_frontswap_ops);
+
+init_cleancache:
+	if (cc_disabled) {
+		pr_info("gcma cleancache is disabled. skip it\n");
+		goto init_debugfs;
+	}
+	cc_dmem.nr_pools = MAX_CLEANCACHE_FS;
+	cc_dmem.pools = kzalloc(sizeof(struct dmem_pool *) * cc_dmem.nr_pools,
+				GFP_KERNEL);
+	if (!cc_dmem.pools) {
+		pr_warn("failed to allocate cleancache dmem pools\n");
+		return -ENOMEM;
+	}
+	cc_dmem.nr_hash = NR_CC_DMEM_HASH_BUCKS;
+	cc_dmem.key_cache = KMEM_CACHE(cleancache_dmem_key, 0);
+	if (!cc_dmem.key_cache)
+		return -ENOMEM;
+	cc_dmem.bytes_key = BYTES_CC_DMEM_KEY;
+
+	INIT_LIST_HEAD(&cc_dmem.lru_list);
+	spin_lock_init(&cc_dmem.lru_lock);
+
+	cc_dmem.hash_key = cleancache_hash_key;
+	cc_dmem.compare = cleancache_compare;
+	cleancache_register_ops(&gcma_cleancache_ops);
+
+init_debugfs:
+	gcma_debugfs_init();
 	return 0;
 }
 
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC v2 4/5] gcma: export statistical data on debugfs
  2015-02-23 19:54 ` SeongJae Park
@ 2015-02-23 19:54   ` SeongJae Park
  -1 siblings, 0 replies; 20+ messages in thread
From: SeongJae Park @ 2015-02-23 19:54 UTC (permalink / raw)
  To: akpm
  Cc: lauraa, minchan, sergey.senozhatsky, linux-mm, linux-kernel,
	SeongJae Park

Export statistical data about gcma's second-class clients via debugfs to
let users see how gcma is working internally.
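
As a usage sketch only (assuming debugfs is mounted at /sys/kernel/debug;
the file names are the ones created by gcma_debugfs_init() below), the
counters could be read from userspace with something like:

```
#include <stdio.h>

int main(void)
{
	static const char * const files[] = {
		"fs_stored_pages", "fs_loaded_pages", "fs_evicted_pages",
		"cc_stored_pages", "cc_loaded_pages", "cc_evicted_pages",
	};
	unsigned int i;

	for (i = 0; i < sizeof(files) / sizeof(files[0]); i++) {
		char path[64];
		long val;
		FILE *f;

		/* Build the per-counter debugfs path. */
		snprintf(path, sizeof(path), "/sys/kernel/debug/gcma/%s",
			 files[i]);
		f = fopen(path, "r");
		if (!f)
			continue;	/* e.g. client disabled or no debugfs */
		if (fscanf(f, "%ld", &val) == 1)
			printf("%s: %ld\n", files[i], val);
		fclose(f);
	}
	return 0;
}
```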

Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
 mm/gcma.c | 127 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 119 insertions(+), 8 deletions(-)

diff --git a/mm/gcma.c b/mm/gcma.c
index 924e3f6..57203b4 100644
--- a/mm/gcma.c
+++ b/mm/gcma.c
@@ -130,6 +130,26 @@ module_param_named(fs_disabled, fs_disabled, bool, 0444);
 static bool cc_disabled __read_mostly;
 module_param_named(cc_disabled, cc_disabled, bool, 0444);
 
+/* For statistics */
+static atomic_t gcma_fs_inits = ATOMIC_INIT(0);
+static atomic_t gcma_fs_stored_pages = ATOMIC_INIT(0);
+static atomic_t gcma_fs_loaded_pages = ATOMIC_INIT(0);
+static atomic_t gcma_fs_evicted_pages = ATOMIC_INIT(0);
+static atomic_t gcma_fs_reclaimed_pages = ATOMIC_INIT(0);
+static atomic_t gcma_fs_invalidated_pages = ATOMIC_INIT(0);
+static atomic_t gcma_fs_invalidated_areas = ATOMIC_INIT(0);
+
+static atomic_t gcma_cc_inits = ATOMIC_INIT(0);
+static atomic_t gcma_cc_stored_pages = ATOMIC_INIT(0);
+static atomic_t gcma_cc_loaded_pages = ATOMIC_INIT(0);
+static atomic_t gcma_cc_load_failed_pages = ATOMIC_INIT(0);
+static atomic_t gcma_cc_evicted_pages = ATOMIC_INIT(0);
+static atomic_t gcma_cc_reclaimed_pages = ATOMIC_INIT(0);
+static atomic_t gcma_cc_invalidated_pages = ATOMIC_INIT(0);
+static atomic_t gcma_cc_invalidated_inodes = ATOMIC_INIT(0);
+static atomic_t gcma_cc_invalidated_fses = ATOMIC_INIT(0);
+static atomic_t gcma_cc_invalidate_failed_fses = ATOMIC_INIT(0);
+
 static unsigned long dmem_evict_lru(struct dmem *dmem, unsigned long nr_pages);
 
 static struct dmem_hashbucket *dmem_hashbuck(struct page *page)
@@ -475,6 +495,10 @@ static unsigned long dmem_evict_lru(struct dmem *dmem, unsigned long nr_pages)
 		spin_unlock(&buck->lock);
 	}
 
+	if (dmem == &fs_dmem)
+		atomic_add(evicted, &gcma_fs_evicted_pages);
+	else
+		atomic_add(evicted, &gcma_cc_evicted_pages);
 	return evicted;
 }
 
@@ -791,12 +815,18 @@ static unsigned frontswap_hash_key(void *key)
 void gcma_frontswap_init(unsigned type)
 {
 	dmem_init_pool(&fs_dmem, type);
+	atomic_inc(&gcma_fs_inits);
 }
 
 int gcma_frontswap_store(unsigned type, pgoff_t offset,
 				struct page *page)
 {
-	return dmem_store_page(&fs_dmem, type, (void *)&offset, page);
+	int ret;
+
+	ret = dmem_store_page(&fs_dmem, type, (void *)&offset, page);
+	if (ret == 0)
+		atomic_inc(&gcma_fs_stored_pages);
+	return ret;
 }
 
 /*
@@ -806,17 +836,24 @@ int gcma_frontswap_store(unsigned type, pgoff_t offset,
 int gcma_frontswap_load(unsigned type, pgoff_t offset,
 			       struct page *page)
 {
-	return dmem_load_page(&fs_dmem, type, (void *)&offset, page);
+	int ret;
+
+	ret = dmem_load_page(&fs_dmem, type, (void *)&offset, page);
+	if (ret == 0)
+		atomic_inc(&gcma_fs_loaded_pages);
+	return ret;
 }
 
 void gcma_frontswap_invalidate_page(unsigned type, pgoff_t offset)
 {
-	dmem_invalidate_entry(&fs_dmem, type, (void *)&offset);
+	if (dmem_invalidate_entry(&fs_dmem, type, (void *)&offset) == 0)
+		atomic_inc(&gcma_fs_invalidated_pages);
 }
 
 void gcma_frontswap_invalidate_area(unsigned type)
 {
-	dmem_invalidate_pool(&fs_dmem, type);
+	if (dmem_invalidate_pool(&fs_dmem, type) == 0)
+		atomic_inc(&gcma_fs_invalidated_areas);
 }
 
 static struct frontswap_ops gcma_frontswap_ops = {
@@ -865,6 +902,8 @@ int gcma_cleancache_init_fs(size_t pagesize)
 	err = dmem_init_pool(&cc_dmem, pool_id);
 	if (err != 0)
 		return err;
+
+	atomic_inc(&gcma_cc_inits);
 	return pool_id;
 }
 
@@ -885,6 +924,10 @@ int gcma_cleancache_get_page(int pool_id, struct cleancache_filekey fkey,
 	local_irq_save(flags);
 	ret = dmem_load_page(&cc_dmem, pool_id, &key, page);
 	local_irq_restore(flags);
+	if (ret == 0)
+		atomic_inc(&gcma_cc_loaded_pages);
+	else
+		atomic_inc(&gcma_cc_load_failed_pages);
 	return ret;
 }
 
@@ -897,7 +940,8 @@ void gcma_cleancache_put_page(int pool_id, struct cleancache_filekey fkey,
 	cleancache_set_key(&fkey, &offset, &key);
 
 	local_irq_save(flags);
-	dmem_store_page(&cc_dmem, pool_id, &key, page);
+	if (dmem_store_page(&cc_dmem, pool_id, &key, page) == 0)
+		atomic_inc(&gcma_cc_stored_pages);
 	local_irq_restore(flags);
 }
 
@@ -911,7 +955,8 @@ void gcma_cleancache_invalidate_page(int pool_id,
 	cleancache_set_key(&fkey, &offset, &key);
 
 	local_irq_save(flags);
-	dmem_invalidate_entry(&cc_dmem, pool_id, &key);
+	if (dmem_invalidate_entry(&cc_dmem, pool_id, &key) == 0)
+		atomic_inc(&gcma_cc_invalidated_pages);
 	local_irq_restore(flags);
 }
 
@@ -933,10 +978,12 @@ void gcma_cleancache_invalidate_fs(int pool_id)
 	if (pool_id < 0 || pool_id >= atomic_read(&nr_cleancache_fses)) {
 		pr_warn("%s received wrong pool id %d\n",
 				__func__, pool_id);
+		atomic_inc(&gcma_cc_invalidate_failed_fses);
 		return;
 	}
 	local_irq_save(flags);
-	dmem_invalidate_pool(&cc_dmem, pool_id);
+	if (dmem_invalidate_pool(&cc_dmem, pool_id) == 0)
+		atomic_inc(&gcma_cc_invalidated_fses);
 	local_irq_restore(flags);
 }
 
@@ -1102,8 +1149,12 @@ next_page:
 			dmem_put(buck, entry);
 		spin_unlock(lru_lock);
 		spin_unlock(&buck->lock);
-		if (lru_lock == &cc_dmem.lru_lock)
+		if (lru_lock == &cc_dmem.lru_lock) {
 			local_irq_restore(flags);
+			atomic_inc(&gcma_cc_reclaimed_pages);
+		} else {
+			atomic_inc(&gcma_fs_reclaimed_pages);
+		}
 	}
 
 	start_pfn = isolate_interrupted(gcma, orig_start, orig_start + size);
@@ -1133,6 +1184,66 @@ void gcma_free_contig(struct gcma *gcma,
 	local_irq_restore(flags);
 }
 
+#ifdef CONFIG_DEBUG_FS
+#include <linux/debugfs.h>
+
+static struct dentry *gcma_debugfs_root;
+
+static int __init gcma_debugfs_init(void)
+{
+	if (!debugfs_initialized())
+		return -ENODEV;
+
+	gcma_debugfs_root = debugfs_create_dir("gcma", NULL);
+	if (!gcma_debugfs_root)
+		return -ENOMEM;
+
+	debugfs_create_atomic_t("fs_inits", S_IRUGO,
+			gcma_debugfs_root, &gcma_fs_inits);
+	debugfs_create_atomic_t("fs_stored_pages", S_IRUGO,
+			gcma_debugfs_root, &gcma_fs_stored_pages);
+	debugfs_create_atomic_t("fs_loaded_pages", S_IRUGO,
+			gcma_debugfs_root, &gcma_fs_loaded_pages);
+	debugfs_create_atomic_t("fs_evicted_pages", S_IRUGO,
+			gcma_debugfs_root, &gcma_fs_evicted_pages);
+	debugfs_create_atomic_t("fs_reclaimed_pages", S_IRUGO,
+			gcma_debugfs_root, &gcma_fs_reclaimed_pages);
+	debugfs_create_atomic_t("fs_invalidated_pages", S_IRUGO,
+			gcma_debugfs_root, &gcma_fs_invalidated_pages);
+	debugfs_create_atomic_t("fs_invalidated_areas", S_IRUGO,
+			gcma_debugfs_root, &gcma_fs_invalidated_areas);
+
+	debugfs_create_atomic_t("cc_inits", S_IRUGO,
+			gcma_debugfs_root, &gcma_cc_inits);
+	debugfs_create_atomic_t("cc_stored_pages", S_IRUGO,
+			gcma_debugfs_root, &gcma_cc_stored_pages);
+	debugfs_create_atomic_t("cc_loaded_pages", S_IRUGO,
+			gcma_debugfs_root, &gcma_cc_loaded_pages);
+	debugfs_create_atomic_t("cc_load_failed_pages", S_IRUGO,
+			gcma_debugfs_root, &gcma_cc_load_failed_pages);
+	debugfs_create_atomic_t("cc_evicted_pages", S_IRUGO,
+			gcma_debugfs_root, &gcma_cc_evicted_pages);
+	debugfs_create_atomic_t("cc_reclaimed_pages", S_IRUGO,
+			gcma_debugfs_root, &gcma_cc_reclaimed_pages);
+	debugfs_create_atomic_t("cc_invalidated_pages", S_IRUGO,
+			gcma_debugfs_root, &gcma_cc_invalidated_pages);
+	debugfs_create_atomic_t("cc_invalidated_inodes", S_IRUGO,
+			gcma_debugfs_root, &gcma_cc_invalidated_inodes);
+	debugfs_create_atomic_t("cc_invalidated_fses", S_IRUGO,
+			gcma_debugfs_root, &gcma_cc_invalidated_fses);
+	debugfs_create_atomic_t("cc_invalidate_failed_fses", S_IRUGO,
+			gcma_debugfs_root, &gcma_cc_invalidate_failed_fses);
+
+	pr_info("gcma debufs init\n");
+	return 0;
+}
+#else
+static int __init gcma_debugfs_init(void)
+{
+	return 0;
+}
+#endif
+
 static int __init init_gcma(void)
 {
 	pr_info("loading gcma\n");
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* [RFC v2 5/5] gcma: integrate gcma under cma interface
  2015-02-23 19:54 ` SeongJae Park
@ 2015-02-23 19:54   ` SeongJae Park
  -1 siblings, 0 replies; 20+ messages in thread
From: SeongJae Park @ 2015-02-23 19:54 UTC (permalink / raw)
  To: akpm
  Cc: lauraa, minchan, sergey.senozhatsky, linux-mm, linux-kernel,
	SeongJae Park

This commit lets cma clients use gcma easily through the familiar cma
interface by integrating gcma under that interface.

With this commit, clients can declare a contiguous memory area to be
managed internally in the gcma way instead of the cma way by calling
gcma_declare_contiguous(). After the declaration, clients can use the
area through the familiar cma interface while it works in the gcma way.

For example, you can use the following code snippet to make two
contiguous regions: one region will work as cma and the other as gcma.

```
struct cma *cma, *gcma;

cma_declare_contiguous(base, size, limit, 0, 0, fixed, &cma);
gcma_declare_contiguous(gcma_base, size, gcma_limit, 0, 0, fixed,
&gcma);

cma_alloc(cma, 1024, 0);	/* alloc in cma way */
cma_alloc(gcma, 1024, 0);	/* alloc in gcma way */
```
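
Releasing goes through the same interface as well. For instance, assuming
the cma_alloc() return values above were kept in cma_page and gcma_page
(illustrative names, not part of the snippet above):

```
cma_release(cma, cma_page, 1024);	/* freed via free_contig_range() */
cma_release(gcma, gcma_page, 1024);	/* freed via gcma_free_contig() */
```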

Signed-off-by: SeongJae Park <sj38.park@gmail.com>
---
 include/linux/cma.h  |   4 ++
 include/linux/gcma.h |  21 ++++++++++
 mm/Kconfig           |  24 +++++++++++
 mm/Makefile          |   1 +
 mm/cma.c             | 113 ++++++++++++++++++++++++++++++++++++++++++---------
 5 files changed, 144 insertions(+), 19 deletions(-)

diff --git a/include/linux/cma.h b/include/linux/cma.h
index a93438b..6173cff 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -22,6 +22,10 @@ extern int __init cma_declare_contiguous(phys_addr_t base,
 			phys_addr_t size, phys_addr_t limit,
 			phys_addr_t alignment, unsigned int order_per_bit,
 			bool fixed, struct cma **res_cma);
+extern int __init gcma_declare_contiguous(phys_addr_t base,
+			phys_addr_t size, phys_addr_t limit,
+			phys_addr_t alignment, unsigned int order_per_bit,
+			bool fixed, struct cma **res_cma);
 extern int cma_init_reserved_mem(phys_addr_t base,
 					phys_addr_t size, int order_per_bit,
 					struct cma **res_cma);
diff --git a/include/linux/gcma.h b/include/linux/gcma.h
index 12e4431..c8f8c32 100644
--- a/include/linux/gcma.h
+++ b/include/linux/gcma.h
@@ -33,6 +33,25 @@
 
 struct gcma;
 
+#ifndef CONFIG_GCMA
+
+static inline int gcma_init(unsigned long start_pfn, unsigned long size,
+			    struct gcma **res_gcma)
+{
+	return 0;
+}
+
+static inline int gcma_alloc_contig(struct gcma *gcma,
+				    unsigned long start, unsigned long end)
+{
+	return 0;
+}
+
+static inline void gcma_free_contig(struct gcma *gcma,
+				    unsigned long pfn, unsigned long nr_pages) { }
+
+#else
+
 int gcma_init(unsigned long start_pfn, unsigned long size,
 	      struct gcma **res_gcma);
 int gcma_alloc_contig(struct gcma *gcma,
@@ -40,4 +59,6 @@ int gcma_alloc_contig(struct gcma *gcma,
 void gcma_free_contig(struct gcma *gcma,
 		      unsigned long start_pfn, unsigned long size);
 
+#endif
+
 #endif /* _LINUX_GCMA_H */
diff --git a/mm/Kconfig b/mm/Kconfig
index 1d1ae6b..cceef9a 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -527,6 +527,30 @@ config CMA_AREAS
 
 	  If unsure, leave the default value "7".
 
+config GCMA
+	bool "Guaranteed Contiguous Memory Allocator (EXPERIMENTAL)"
+	default n
+	select FRONTSWAP
+	select CLEANCACHE
+	select CMA
+	help
+	  A contiguous memory allocator which guarantees success and
+	  predictable latency for allocation requests.
+	  It carves out a large amount of memory and lets it be allocated
+	  for contiguous memory requests while the area can also be used as
+	  a backend for frontswap.
+
+	  This is marked experimental because it is a new feature that
+	  interacts heavily with memory reclaim.
+
+config GCMA_DEFAULT
+	bool "Set GCMA as default CMA facility (EXPERIMENTA)"
+	default n
+	depends on GCMA
+	help
+	  Set older CMA interfaces to work as GCMA rather than CMA without any
+	  client code change.
+
 config MEM_SOFT_DIRTY
 	bool "Track memory changes"
 	depends on CHECKPOINT_RESTORE && HAVE_ARCH_SOFT_DIRTY && PROC_FS
diff --git a/mm/Makefile b/mm/Makefile
index 8405eb0..e79cb70 100644
--- a/mm/Makefile
+++ b/mm/Makefile
@@ -68,4 +68,5 @@ obj-$(CONFIG_ZBUD)	+= zbud.o
 obj-$(CONFIG_ZSMALLOC)	+= zsmalloc.o
 obj-$(CONFIG_GENERIC_EARLY_IOREMAP) += early_ioremap.o
 obj-$(CONFIG_CMA)	+= cma.o
+obj-$(CONFIG_GCMA)	+= gcma.o
 obj-$(CONFIG_MEMORY_BALLOON) += balloon_compaction.o
diff --git a/mm/cma.c b/mm/cma.c
index fde706e..0e1a32c 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -32,14 +32,18 @@
 #include <linux/slab.h>
 #include <linux/log2.h>
 #include <linux/cma.h>
+#include <linux/gcma.h>
 #include <linux/highmem.h>
 
+#define IS_GCMA ((struct gcma *)(void *)0xFF)
+
 struct cma {
 	unsigned long	base_pfn;
 	unsigned long	count;
 	unsigned long	*bitmap;
 	unsigned int order_per_bit; /* Order of pages represented by one bit */
 	struct mutex	lock;
+	struct gcma	*gcma;
 };
 
 static struct cma cma_areas[MAX_CMA_AREAS];
@@ -86,26 +90,25 @@ static void cma_clear_bitmap(struct cma *cma, unsigned long pfn, int count)
 	mutex_unlock(&cma->lock);
 }
 
-static int __init cma_activate_area(struct cma *cma)
+/*
+ * Return the pages reserved for CMA to the buddy allocator so that they
+ * can be used as movable pages.
+ * Returns 0 on success, non-zero otherwise.
+ */
+static int free_reserved_pages(unsigned long pfn, unsigned long count)
 {
-	int bitmap_size = BITS_TO_LONGS(cma_bitmap_maxno(cma)) * sizeof(long);
-	unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
-	unsigned i = cma->count >> pageblock_order;
+	int ret = 0;
+	unsigned long base_pfn;
 	struct zone *zone;
 
-	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
-
-	if (!cma->bitmap)
-		return -ENOMEM;
-
-	WARN_ON_ONCE(!pfn_valid(pfn));
+	count = count >> pageblock_order;
 	zone = page_zone(pfn_to_page(pfn));
 
 	do {
-		unsigned j;
+		unsigned i;
 
 		base_pfn = pfn;
-		for (j = pageblock_nr_pages; j; --j, pfn++) {
+		for (i = pageblock_nr_pages; i; --i, pfn++) {
 			WARN_ON_ONCE(!pfn_valid(pfn));
 			/*
 			 * alloc_contig_range requires the pfn range
@@ -113,12 +116,38 @@ static int __init cma_activate_area(struct cma *cma)
 			 * simple by forcing the entire CMA resv range
 			 * to be in the same zone.
 			 */
-			if (page_zone(pfn_to_page(pfn)) != zone)
-				goto err;
+			if (page_zone(pfn_to_page(pfn)) != zone) {
+				ret = -EINVAL;
+				break;
+			}
 		}
 		init_cma_reserved_pageblock(pfn_to_page(base_pfn));
-	} while (--i);
+	} while (--count);
+
+	return ret;
+}
+
+static int __init cma_activate_area(struct cma *cma)
+{
+	int bitmap_size = BITS_TO_LONGS(cma_bitmap_maxno(cma)) * sizeof(long);
+	unsigned long base_pfn = cma->base_pfn, pfn = base_pfn;
+	int fail;
+
+	cma->bitmap = kzalloc(bitmap_size, GFP_KERNEL);
+
+	if (!cma->bitmap)
+		return -ENOMEM;
+
+	WARN_ON_ONCE(!pfn_valid(pfn));
 
+	if (cma->gcma == IS_GCMA)
+		fail = gcma_init(cma->base_pfn, cma->count, &cma->gcma);
+	else
+		fail = free_reserved_pages(cma->base_pfn, cma->count);
+	if (fail != 0) {
+		kfree(cma->bitmap);
+		return -EINVAL;
+	}
 	mutex_init(&cma->lock);
 	return 0;
 
@@ -192,7 +221,7 @@ int __init cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
 }
 
 /**
- * cma_declare_contiguous() - reserve custom contiguous area
+ * __declare_contiguous() - reserve custom contiguous area
  * @base: Base address of the reserved area optional, use 0 for any
  * @size: Size of the reserved area (in bytes),
  * @limit: End address of the reserved memory (optional, 0 for any).
@@ -209,7 +238,7 @@ int __init cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
  * If @fixed is true, reserve contiguous area at exactly @base.  If false,
  * reserve in range from @base to @limit.
  */
-int __init cma_declare_contiguous(phys_addr_t base,
+int __init __declare_contiguous(phys_addr_t base,
 			phys_addr_t size, phys_addr_t limit,
 			phys_addr_t alignment, unsigned int order_per_bit,
 			bool fixed, struct cma **res_cma)
@@ -318,6 +347,43 @@ err:
 }
 
 /**
+ * gcma_declare_contiguous() - same as cma_declare_contiguous() except that
+ * it marks the resulting cma area as backed by gcma.
+ */
+int __init gcma_declare_contiguous(phys_addr_t base,
+			phys_addr_t size, phys_addr_t limit,
+			phys_addr_t alignment, unsigned int order_per_bit,
+			bool fixed, struct cma **res_cma)
+{
+	int ret = 0;
+
+	ret = __declare_contiguous(base, size, limit, alignment,
+			order_per_bit, fixed, res_cma);
+	if (ret >= 0)
+		(*res_cma)->gcma = IS_GCMA;
+
+	return ret;
+}
+
+int __init cma_declare_contiguous(phys_addr_t base,
+			phys_addr_t size, phys_addr_t limit,
+			phys_addr_t alignment, unsigned int order_per_bit,
+			bool fixed, struct cma **res_cma)
+{
+#ifdef CONFIG_GCMA_DEFAULT
+	return gcma_declare_contiguous(base, size, limit, alignment,
+			order_per_bit, fixed, res_cma);
+#else
+	int ret = 0;
+
+	ret = __declare_contiguous(base, size, limit, alignment,
+			order_per_bit, fixed, res_cma);
+
+	return ret;
+#endif
+}
+
+/**
  * cma_alloc() - allocate pages from contiguous area
  * @cma:   Contiguous memory region for which the allocation is performed.
  * @count: Requested number of pages.
@@ -364,7 +430,12 @@ struct page *cma_alloc(struct cma *cma, int count, unsigned int align)
 
 		pfn = cma->base_pfn + (bitmap_no << cma->order_per_bit);
 		mutex_lock(&cma_mutex);
-		ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA);
+
+		if (cma->gcma)
+			ret = gcma_alloc_contig(cma->gcma, pfn, count);
+		else
+			ret = alloc_contig_range(pfn, pfn + count, MIGRATE_CMA);
+
 		mutex_unlock(&cma_mutex);
 		if (ret == 0) {
 			page = pfn_to_page(pfn);
@@ -411,7 +482,11 @@ bool cma_release(struct cma *cma, struct page *pages, int count)
 
 	VM_BUG_ON(pfn + count > cma->base_pfn + cma->count);
 
-	free_contig_range(pfn, count);
+	if (cma->gcma)
+		gcma_free_contig(cma->gcma, pfn, count);
+	else
+		free_contig_range(pfn, count);
+
 	cma_clear_bitmap(cma, pfn, count);
 
 	return true;
-- 
1.9.1


^ permalink raw reply related	[flat|nested] 20+ messages in thread

* Re: [RFC v2 0/5] introduce gcma
  2015-02-23 19:54 ` SeongJae Park
@ 2015-02-24 14:48   ` Michal Hocko
  -1 siblings, 0 replies; 20+ messages in thread
From: Michal Hocko @ 2015-02-24 14:48 UTC (permalink / raw)
  To: SeongJae Park
  Cc: akpm, lauraa, minchan, sergey.senozhatsky, linux-mm, linux-kernel

On Tue 24-02-15 04:54:18, SeongJae Park wrote:
[...]
>  include/linux/cma.h  |    4 +
>  include/linux/gcma.h |   64 +++
>  mm/Kconfig           |   24 +
>  mm/Makefile          |    1 +
>  mm/cma.c             |  113 ++++-
>  mm/gcma.c            | 1321 ++++++++++++++++++++++++++++++++++++++++++++++++++
>  6 files changed, 1508 insertions(+), 19 deletions(-)
>  create mode 100644 include/linux/gcma.h
>  create mode 100644 mm/gcma.c

Wow this is huge! And I do not see reason for it to be so big. Why
cannot you simply define (per-cma area) 2-class users policy? Either via
kernel command line or export areas to userspace and allow to set policy
there.

For starter something like the following policies should suffice AFAIU
your description.
	- NONE - exclusive pool for CMA allocations only
	- DROPABLE - only allocations which might be dropped without any
	  additional actions - e.g. cleancache and frontswap with
	  write-through policy
	- RECLAIMABLE - only movable allocations which can be migrated
	  or dropped after writeback.

Has such an approach been considered?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC v2 0/5] introduce gcma
  2015-02-24 14:48   ` Michal Hocko
@ 2015-02-25  5:31     ` SeongJae Park
  -1 siblings, 0 replies; 20+ messages in thread
From: SeongJae Park @ 2015-02-25  5:31 UTC (permalink / raw)
  To: Michal Hocko
  Cc: SeongJae Park, akpm, lauraa, minchan, sergey.senozhatsky,
	linux-mm, linux-kernel

Hello Michal,

Thanks for your comment :)

On Tue, 24 Feb 2015, Michal Hocko wrote:

> On Tue 24-02-15 04:54:18, SeongJae Park wrote:
> [...]
>>  include/linux/cma.h  |    4 +
>>  include/linux/gcma.h |   64 +++
>>  mm/Kconfig           |   24 +
>>  mm/Makefile          |    1 +
>>  mm/cma.c             |  113 ++++-
>>  mm/gcma.c            | 1321 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>  6 files changed, 1508 insertions(+), 19 deletions(-)
>>  create mode 100644 include/linux/gcma.h
>>  create mode 100644 mm/gcma.c
>
> Wow this is huge! And I do not see reason for it to be so big. Why
> cannot you simply define (per-cma area) 2-class users policy? Either via
> kernel command line or export areas to userspace and allow to set policy
> there.

To implement the idea, we should develop not only the policy selection
but also a backend for discardable memory. Most of this patch was made
for the backend.

The current implementation lets users select the policy per cma area.
Only about 120 lines of code were changed for that, though it is the
ugliest part of this patch. The part remains ugly in this RFC because
this is just a prototype; it will be changed in the next version of the
patchset.

>
> For starter something like the following policies should suffice AFAIU
> your description.
> 	- NONE - exclusive pool for CMA allocations only
> 	- DROPABLE - only allocations which might be dropped without any
> 	  additional actions - e.g. cleancache and frontswap with
> 	  write-through policy
> 	- RECLAIMABLE - only movable allocations which can be migrated
> 	  or dropped after writeback.
>
> Has such an approach been considered?

Similarly, but not in the same way. In summary, GCMA gives a DROPABLE or
RECLAIMABLE policy selection per cma area, and the NONE policy for an
entire cma area declared through the GCMA interface.

In detail, a user can set the policy of a cma area to the gcma way
(DROPABLE) or the cma way (RECLAIMABLE). Also, a user can set gcma to
utilize the area for Cleancache and/or Frontswap, or not (NONE policy).

Your suggestion looks simpler and easier to understand. The next version
of gcma will let users select one of those policies per cma area,
perhaps through something like the rough sketch below.
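
Just as a rough sketch of the idea (the names and the helper below are
illustrative only, not actual gcma or cma code, and only meant to show
the shape of such a per-area knob):

```
/* Illustrative sketch; not the actual gcma/cma interface. */
enum cma_area_policy {
	CMA_POLICY_NONE,	/* exclusive pool for contiguous allocations */
	CMA_POLICY_DROPABLE,	/* droppable 2nd-class users: cleancache and
				 * write-through frontswap (the gcma way) */
	CMA_POLICY_RECLAIMABLE,	/* movable pages, migrated or written back
				 * before reuse (the current cma way) */
};

/* Hypothetical declaration helper taking the policy per area. */
int __init cma_declare_contiguous_policy(phys_addr_t base, phys_addr_t size,
					 phys_addr_t limit,
					 enum cma_area_policy policy,
					 struct cma **res_cma);
```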


Thanks,
SeongJae Park

> -- 
> Michal Hocko
> SUSE Labs
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC v2 0/5] introduce gcma
  2015-02-25  5:31     ` SeongJae Park
@ 2015-02-25 16:11       ` Michal Hocko
  -1 siblings, 0 replies; 20+ messages in thread
From: Michal Hocko @ 2015-02-25 16:11 UTC (permalink / raw)
  To: SeongJae Park
  Cc: akpm, lauraa, minchan, sergey.senozhatsky, linux-mm, linux-kernel

On Wed 25-02-15 14:31:08, SeongJae Park wrote:
> Hello Michal,
> 
> Thanks for your comment :)
> 
> On Tue, 24 Feb 2015, Michal Hocko wrote:
> 
> >On Tue 24-02-15 04:54:18, SeongJae Park wrote:
> >[...]
> >> include/linux/cma.h  |    4 +
> >> include/linux/gcma.h |   64 +++
> >> mm/Kconfig           |   24 +
> >> mm/Makefile          |    1 +
> >> mm/cma.c             |  113 ++++-
> >> mm/gcma.c            | 1321 ++++++++++++++++++++++++++++++++++++++++++++++++++
> >> 6 files changed, 1508 insertions(+), 19 deletions(-)
> >> create mode 100644 include/linux/gcma.h
> >> create mode 100644 mm/gcma.c
> >
> >Wow this is huge! And I do not see reason for it to be so big. Why
> >cannot you simply define (per-cma area) 2-class users policy? Either via
> >kernel command line or export areas to userspace and allow to set policy
> >there.
> 
> To implement the idea, we should develop not only the policy selection
> but also a backend for discardable memory. Most of this patch was made
> for the backend.

What is the backend and why is it needed? I thought the discardable
memory would go back to the CMA pool. I mean, the cover email explained
why the current CMA allocation policy might lead to a lower success rate
or stalls. So I would expect a new policy to be a relatively small
change in the CMA allocation path to serve 2-class users as per the
policy. It is not clear to me why we need to pull a whole gcma layer in.
I might be missing something obvious because I haven't looked at the
patches yet, but this should be better explained in the cover letter.

Thanks!
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 20+ messages in thread

* Re: [RFC v2 0/5] introduce gcma
  2015-02-25 16:11       ` Michal Hocko
@ 2015-02-25 16:47         ` SeongJae Park
  -1 siblings, 0 replies; 20+ messages in thread
From: SeongJae Park @ 2015-02-25 16:47 UTC (permalink / raw)
  To: Michal Hocko
  Cc: SeongJae Park, akpm, lauraa, minchan, sergey.senozhatsky,
	linux-mm, linux-kernel



On Wed, 25 Feb 2015, Michal Hocko wrote:

> On Wed 25-02-15 14:31:08, SeongJae Park wrote:
>> Hello Michal,
>>
>> Thanks for your comment :)
>>
>> On Tue, 24 Feb 2015, Michal Hocko wrote:
>>
>>> On Tue 24-02-15 04:54:18, SeongJae Park wrote:
>>> [...]
>>>> include/linux/cma.h  |    4 +
>>>> include/linux/gcma.h |   64 +++
>>>> mm/Kconfig           |   24 +
>>>> mm/Makefile          |    1 +
>>>> mm/cma.c             |  113 ++++-
>>>> mm/gcma.c            | 1321 ++++++++++++++++++++++++++++++++++++++++++++++++++
>>>> 6 files changed, 1508 insertions(+), 19 deletions(-)
>>>> create mode 100644 include/linux/gcma.h
>>>> create mode 100644 mm/gcma.c
>>>
>>> Wow this is huge! And I do not see reason for it to be so big. Why
>>> cannot you simply define (per-cma area) 2-class users policy? Either via
>>> kernel command line or export areas to userspace and allow to set policy
>>> there.
>>
>> To implement the idea, we should develop not only the policy selection
>> but also a backend for discardable memory. Most of this patch was made
>> for the backend.
>
> What is the backend and why is it needed? I thought the discardable
> memory would go back to the CMA pool. I mean, the cover email explained
> why the current CMA allocation policy might lead to a lower success rate
> or stalls. So I would expect a new policy to be a relatively small
> change in the CMA allocation path to serve 2-class users as per the
> policy. It is not clear to me why we need to pull a whole gcma layer in.
> I might be missing something obvious because I haven't looked at the
> patches yet, but this should be better explained in the cover letter.

I meant a backend for 2nd-class clients like cleancache and frontswap.
Because implementing a backend for cleancache or frontswap is the
responsibility of each subsystem that uses them, gcma needed to
implement those backends itself. I believe the second ("gcma: utilize
reserved memory as discardable memory") and third ("gcma: adopt
cleancache and frontswap as second-class clients") patches could be
helpful to understand that.

And yes, I agree the explanation was not enough. My fault, sorry. My
explanation was too concentrated on the policy itself. I should have
explained how the policy could be implemented and how gcma does it. I
will explain that in the cover letter of the next version.

Thanks for your helpful and nice comment.


Thanks,
SeongJae Park

>
> Thanks!
> -- 
> Michal Hocko
> SUSE Labs
>

^ permalink raw reply	[flat|nested] 20+ messages in thread

end of thread, other threads:[~2015-02-25 16:44 UTC | newest]

Thread overview: 20+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-23 19:54 [RFC v2 0/5] introduce gcma SeongJae Park
2015-02-23 19:54 ` SeongJae Park
2015-02-23 19:54 ` [RFC v2 1/5] gcma: introduce contiguous memory allocator SeongJae Park
2015-02-23 19:54   ` SeongJae Park
2015-02-23 19:54 ` [RFC v2 2/5] gcma: utilize reserved memory as discardable memory SeongJae Park
2015-02-23 19:54   ` SeongJae Park
2015-02-23 19:54 ` [RFC v2 3/5] gcma: adopt cleancache and frontswap as second-class clients SeongJae Park
2015-02-23 19:54   ` SeongJae Park
2015-02-23 19:54 ` [RFC v2 4/5] gcma: export statistical data on debugfs SeongJae Park
2015-02-23 19:54   ` SeongJae Park
2015-02-23 19:54 ` [RFC v2 5/5] gcma: integrate gcma under cma interface SeongJae Park
2015-02-23 19:54   ` SeongJae Park
2015-02-24 14:48 ` [RFC v2 0/5] introduce gcma Michal Hocko
2015-02-24 14:48   ` Michal Hocko
2015-02-25  5:31   ` SeongJae Park
2015-02-25  5:31     ` SeongJae Park
2015-02-25 16:11     ` Michal Hocko
2015-02-25 16:11       ` Michal Hocko
2015-02-25 16:47       ` SeongJae Park
2015-02-25 16:47         ` SeongJae Park

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.