All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/6] ksm changes (v2)
@ 2009-05-04 22:25 ` Izik Eidus
  0 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-04 22:25 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, aarcange, chrisw, alan, device, linux-mm, hugh,
	nickpiggin, Izik Eidus

Following patchs touch 4 diffrent areas inside ksm:

1) Patchs 1 - 3: Change the api to be more robust and make more sense.
                 This include:
                     * Limiting the number of memory regions user can
                       register inside ksm per file descriptor that
                       he open.

                     * Reject overlap memory addresses registrations.

                     * change KSM_REMOVE_MEMORY_REGION ioctl to make
                       more sense, untill this patchs user was able
                       to register servel memory regions per file
                       descriptor, but when he called to
                       KSM_REMOVE_MEMORY_REGION, he had no way to tell
                       what memory region he want to remove, as a
                       result each call to KSM_REMOVE_MEMORY_REGION
                       nuked all the regions inside the fd.
                       Now KSM_REMOVE_MEMORY_REGION is working on
                       specific addresses.

2) Patch 4: Use generic helper functions to deal with the vma prot.
            
3) Patch 5: Return ksm to be build on all archs (Now after patch 4,
            ksm shouldnt break any arch).

4) Patch 6: change the miscdevice minor number - lets wait to Alan
            saying he is happy with this change before we apply.

>From V1 to V2:

Patch 3 (ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.) was changed:

- Fixed bug found by Hugh.

- Make ksm_sma_ioctl_remove_memory_region return -EFAULT in case there
  is no such memory. (up untill now it was just return 0)


Thanks.


Izik Eidus (6):
  ksm: limiting the num of mem regions user can register per fd.
  ksm: dont allow overlap memory addresses registrations.
  ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  ksm: change the prot handling to use the generic helper functions
  ksm: build system make it compile for all archs
  ksm: use another miscdevice minor number.

 Documentation/devices.txt  |    1 +
 include/linux/miscdevice.h |    2 +-
 mm/Kconfig                 |    1 -
 mm/ksm.c                   |  126 ++++++++++++++++++++++++++++++++++---------
 4 files changed, 101 insertions(+), 29 deletions(-)


^ permalink raw reply	[flat|nested] 108+ messages in thread

* [PATCH 0/6] ksm changes (v2)
@ 2009-05-04 22:25 ` Izik Eidus
  0 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-04 22:25 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, aarcange, chrisw, alan, device, linux-mm, hugh,
	nickpiggin, Izik Eidus

Following patchs touch 4 diffrent areas inside ksm:

1) Patchs 1 - 3: Change the api to be more robust and make more sense.
                 This include:
                     * Limiting the number of memory regions user can
                       register inside ksm per file descriptor that
                       he open.

                     * Reject overlap memory addresses registrations.

                     * change KSM_REMOVE_MEMORY_REGION ioctl to make
                       more sense, untill this patchs user was able
                       to register servel memory regions per file
                       descriptor, but when he called to
                       KSM_REMOVE_MEMORY_REGION, he had no way to tell
                       what memory region he want to remove, as a
                       result each call to KSM_REMOVE_MEMORY_REGION
                       nuked all the regions inside the fd.
                       Now KSM_REMOVE_MEMORY_REGION is working on
                       specific addresses.

2) Patch 4: Use generic helper functions to deal with the vma prot.
            
3) Patch 5: Return ksm to be build on all archs (Now after patch 4,
            ksm shouldnt break any arch).

4) Patch 6: change the miscdevice minor number - lets wait to Alan
            saying he is happy with this change before we apply.

>From V1 to V2:

Patch 3 (ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.) was changed:

- Fixed bug found by Hugh.

- Make ksm_sma_ioctl_remove_memory_region return -EFAULT in case there
  is no such memory. (up untill now it was just return 0)


Thanks.


Izik Eidus (6):
  ksm: limiting the num of mem regions user can register per fd.
  ksm: dont allow overlap memory addresses registrations.
  ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  ksm: change the prot handling to use the generic helper functions
  ksm: build system make it compile for all archs
  ksm: use another miscdevice minor number.

 Documentation/devices.txt  |    1 +
 include/linux/miscdevice.h |    2 +-
 mm/Kconfig                 |    1 -
 mm/ksm.c                   |  126 ++++++++++++++++++++++++++++++++++---------
 4 files changed, 101 insertions(+), 29 deletions(-)

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* [PATCH 1/6] ksm: limiting the num of mem regions user can register per fd.
  2009-05-04 22:25 ` Izik Eidus
@ 2009-05-04 22:25   ` Izik Eidus
  -1 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-04 22:25 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, aarcange, chrisw, alan, device, linux-mm, hugh,
	nickpiggin, Izik Eidus

Right now user can open /dev/ksm fd and register unlimited number of
regions, such behavior may allocate unlimited amount of kernel memory
and get the whole host into out of memory situation.

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 mm/ksm.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 6165276..d58db6b 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -48,6 +48,9 @@ static int rmap_hash_size;
 module_param(rmap_hash_size, int, 0);
 MODULE_PARM_DESC(rmap_hash_size, "Hash table size for the reverse mapping");
 
+static int regions_per_fd;
+module_param(regions_per_fd, int, 0);
+
 /*
  * ksm_mem_slot - hold information for an userspace scanning range
  * (the scanning for this region will be from addr untill addr +
@@ -67,6 +70,7 @@ struct ksm_mem_slot {
  */
 struct ksm_sma {
 	struct list_head sma_slots;
+	int nregions;
 };
 
 /**
@@ -453,6 +457,11 @@ static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
 	struct ksm_mem_slot *slot;
 	int ret = -EPERM;
 
+	if ((ksm_sma->nregions + 1) > regions_per_fd) {
+		ret = -EBUSY;
+		goto out;
+	}
+
 	slot = kzalloc(sizeof(struct ksm_mem_slot), GFP_KERNEL);
 	if (!slot) {
 		ret = -ENOMEM;
@@ -473,6 +482,7 @@ static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
 
 	list_add_tail(&slot->link, &slots);
 	list_add_tail(&slot->sma_link, &ksm_sma->sma_slots);
+	ksm_sma->nregions++;
 
 	up_write(&slots_lock);
 	return 0;
@@ -511,6 +521,7 @@ static int ksm_sma_ioctl_remove_memory_region(struct ksm_sma *ksm_sma)
 		mmput(slot->mm);
 		list_del(&slot->sma_link);
 		kfree(slot);
+		ksm_sma->nregions--;
 	}
 	up_write(&slots_lock);
 	return 0;
@@ -1389,6 +1400,7 @@ static int ksm_dev_ioctl_create_shared_memory_area(void)
 	}
 
 	INIT_LIST_HEAD(&ksm_sma->sma_slots);
+	ksm_sma->nregions = 0;
 
 	fd = anon_inode_getfd("ksm-sma", &ksm_sma_fops, ksm_sma, 0);
 	if (fd < 0)
@@ -1631,6 +1643,9 @@ static int __init ksm_init(void)
 	if (r)
 		goto out_free1;
 
+	if (!regions_per_fd)
+		regions_per_fd = 1024;
+
 	ksm_thread = kthread_run(ksm_scan_thread, NULL, "kksmd");
 	if (IS_ERR(ksm_thread)) {
 		printk(KERN_ERR "ksm: creating kthread failed\n");
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 1/6] ksm: limiting the num of mem regions user can register per fd.
@ 2009-05-04 22:25   ` Izik Eidus
  0 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-04 22:25 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, aarcange, chrisw, alan, device, linux-mm, hugh,
	nickpiggin, Izik Eidus

Right now user can open /dev/ksm fd and register unlimited number of
regions, such behavior may allocate unlimited amount of kernel memory
and get the whole host into out of memory situation.

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 mm/ksm.c |   15 +++++++++++++++
 1 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 6165276..d58db6b 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -48,6 +48,9 @@ static int rmap_hash_size;
 module_param(rmap_hash_size, int, 0);
 MODULE_PARM_DESC(rmap_hash_size, "Hash table size for the reverse mapping");
 
+static int regions_per_fd;
+module_param(regions_per_fd, int, 0);
+
 /*
  * ksm_mem_slot - hold information for an userspace scanning range
  * (the scanning for this region will be from addr untill addr +
@@ -67,6 +70,7 @@ struct ksm_mem_slot {
  */
 struct ksm_sma {
 	struct list_head sma_slots;
+	int nregions;
 };
 
 /**
@@ -453,6 +457,11 @@ static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
 	struct ksm_mem_slot *slot;
 	int ret = -EPERM;
 
+	if ((ksm_sma->nregions + 1) > regions_per_fd) {
+		ret = -EBUSY;
+		goto out;
+	}
+
 	slot = kzalloc(sizeof(struct ksm_mem_slot), GFP_KERNEL);
 	if (!slot) {
 		ret = -ENOMEM;
@@ -473,6 +482,7 @@ static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
 
 	list_add_tail(&slot->link, &slots);
 	list_add_tail(&slot->sma_link, &ksm_sma->sma_slots);
+	ksm_sma->nregions++;
 
 	up_write(&slots_lock);
 	return 0;
@@ -511,6 +521,7 @@ static int ksm_sma_ioctl_remove_memory_region(struct ksm_sma *ksm_sma)
 		mmput(slot->mm);
 		list_del(&slot->sma_link);
 		kfree(slot);
+		ksm_sma->nregions--;
 	}
 	up_write(&slots_lock);
 	return 0;
@@ -1389,6 +1400,7 @@ static int ksm_dev_ioctl_create_shared_memory_area(void)
 	}
 
 	INIT_LIST_HEAD(&ksm_sma->sma_slots);
+	ksm_sma->nregions = 0;
 
 	fd = anon_inode_getfd("ksm-sma", &ksm_sma_fops, ksm_sma, 0);
 	if (fd < 0)
@@ -1631,6 +1643,9 @@ static int __init ksm_init(void)
 	if (r)
 		goto out_free1;
 
+	if (!regions_per_fd)
+		regions_per_fd = 1024;
+
 	ksm_thread = kthread_run(ksm_scan_thread, NULL, "kksmd");
 	if (IS_ERR(ksm_thread)) {
 		printk(KERN_ERR "ksm: creating kthread failed\n");
-- 
1.5.6.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-04 22:25   ` Izik Eidus
@ 2009-05-04 22:25     ` Izik Eidus
  -1 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-04 22:25 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, aarcange, chrisw, alan, device, linux-mm, hugh,
	nickpiggin, Izik Eidus

subjects say it all.

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 mm/ksm.c |   58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index d58db6b..982dfff 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -451,21 +451,71 @@ static void remove_page_from_tree(struct mm_struct *mm,
 	remove_rmap_item_from_tree(rmap_item);
 }
 
+static inline int is_intersecting_address(unsigned long addr,
+					  unsigned long begin,
+					  unsigned long end)
+{
+	if (addr >= begin && addr < end)
+		return 1;
+	return 0;
+}
+
+/*
+ * is_overlap_mem - check if there is overlapping with memory that was already
+ * registred.
+ *
+ * note - this function must to be called under slots_lock
+ */
+static int is_overlap_mem(struct ksm_memory_region *mem)
+{
+	struct ksm_mem_slot *slot;
+
+	list_for_each_entry(slot, &slots, link) {
+		unsigned long mem_end;
+		unsigned long slot_end;
+
+		cond_resched();
+
+		if (current->mm != slot->mm)
+			continue;
+
+		mem_end = mem->addr + (unsigned long)mem->npages * PAGE_SIZE;
+		slot_end = slot->addr + (unsigned long)slot->npages * PAGE_SIZE;
+
+		if (is_intersecting_address(mem->addr, slot->addr, slot_end) ||
+		    is_intersecting_address(mem_end - 1, slot->addr, slot_end))
+			return 1;
+		if (is_intersecting_address(slot->addr, mem->addr, mem_end) ||
+		    is_intersecting_address(slot_end - 1, mem->addr, mem_end))
+			return 1;
+	}
+
+	return 0;
+}
+
 static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
 						struct ksm_memory_region *mem)
 {
 	struct ksm_mem_slot *slot;
 	int ret = -EPERM;
 
+	if (!mem->npages)
+		goto out;
+
+	down_write(&slots_lock);
+
 	if ((ksm_sma->nregions + 1) > regions_per_fd) {
 		ret = -EBUSY;
-		goto out;
+		goto out_unlock;
 	}
 
+	if (is_overlap_mem(mem))
+		goto out_unlock;
+
 	slot = kzalloc(sizeof(struct ksm_mem_slot), GFP_KERNEL);
 	if (!slot) {
 		ret = -ENOMEM;
-		goto out;
+		goto out_unlock;
 	}
 
 	/*
@@ -478,8 +528,6 @@ static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
 	slot->addr = mem->addr;
 	slot->npages = mem->npages;
 
-	down_write(&slots_lock);
-
 	list_add_tail(&slot->link, &slots);
 	list_add_tail(&slot->sma_link, &ksm_sma->sma_slots);
 	ksm_sma->nregions++;
@@ -489,6 +537,8 @@ static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
 
 out_free:
 	kfree(slot);
+out_unlock:
+	up_write(&slots_lock);
 out:
 	return ret;
 }
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
@ 2009-05-04 22:25     ` Izik Eidus
  0 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-04 22:25 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, aarcange, chrisw, alan, device, linux-mm, hugh,
	nickpiggin, Izik Eidus

subjects say it all.

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 mm/ksm.c |   58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index d58db6b..982dfff 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -451,21 +451,71 @@ static void remove_page_from_tree(struct mm_struct *mm,
 	remove_rmap_item_from_tree(rmap_item);
 }
 
+static inline int is_intersecting_address(unsigned long addr,
+					  unsigned long begin,
+					  unsigned long end)
+{
+	if (addr >= begin && addr < end)
+		return 1;
+	return 0;
+}
+
+/*
+ * is_overlap_mem - check if there is overlapping with memory that was already
+ * registred.
+ *
+ * note - this function must to be called under slots_lock
+ */
+static int is_overlap_mem(struct ksm_memory_region *mem)
+{
+	struct ksm_mem_slot *slot;
+
+	list_for_each_entry(slot, &slots, link) {
+		unsigned long mem_end;
+		unsigned long slot_end;
+
+		cond_resched();
+
+		if (current->mm != slot->mm)
+			continue;
+
+		mem_end = mem->addr + (unsigned long)mem->npages * PAGE_SIZE;
+		slot_end = slot->addr + (unsigned long)slot->npages * PAGE_SIZE;
+
+		if (is_intersecting_address(mem->addr, slot->addr, slot_end) ||
+		    is_intersecting_address(mem_end - 1, slot->addr, slot_end))
+			return 1;
+		if (is_intersecting_address(slot->addr, mem->addr, mem_end) ||
+		    is_intersecting_address(slot_end - 1, mem->addr, mem_end))
+			return 1;
+	}
+
+	return 0;
+}
+
 static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
 						struct ksm_memory_region *mem)
 {
 	struct ksm_mem_slot *slot;
 	int ret = -EPERM;
 
+	if (!mem->npages)
+		goto out;
+
+	down_write(&slots_lock);
+
 	if ((ksm_sma->nregions + 1) > regions_per_fd) {
 		ret = -EBUSY;
-		goto out;
+		goto out_unlock;
 	}
 
+	if (is_overlap_mem(mem))
+		goto out_unlock;
+
 	slot = kzalloc(sizeof(struct ksm_mem_slot), GFP_KERNEL);
 	if (!slot) {
 		ret = -ENOMEM;
-		goto out;
+		goto out_unlock;
 	}
 
 	/*
@@ -478,8 +528,6 @@ static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
 	slot->addr = mem->addr;
 	slot->npages = mem->npages;
 
-	down_write(&slots_lock);
-
 	list_add_tail(&slot->link, &slots);
 	list_add_tail(&slot->sma_link, &ksm_sma->sma_slots);
 	ksm_sma->nregions++;
@@ -489,6 +537,8 @@ static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
 
 out_free:
 	kfree(slot);
+out_unlock:
+	up_write(&slots_lock);
 out:
 	return ret;
 }
-- 
1.5.6.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-04 22:25     ` Izik Eidus
@ 2009-05-04 22:25       ` Izik Eidus
  -1 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-04 22:25 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, aarcange, chrisw, alan, device, linux-mm, hugh,
	nickpiggin, Izik Eidus

This patch change the KSM_REMOVE_MEMORY_REGION ioctl to be specific per
memory region (instead of flushing all the registred memory regions inside
the file descriptor like it happen now)

The previoes api was:
user register memory regions using KSM_REGISTER_MEMORY_REGION inside the fd,
and then when he wanted to remove just one memory region, he had to remove them
all using KSM_REMOVE_MEMORY_REGION.

This patch change this beahivor by chaning the KSM_REMOVE_MEMORY_REGION
ioctl to recive another paramter that it is the begining of the virtual
address that is wanted to be removed.

(user can still remove all the memory regions all at once, by just closing
the file descriptor)

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 mm/ksm.c |   45 ++++++++++++++++++++++++++-------------------
 1 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 982dfff..6e8b24b 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -543,48 +543,55 @@ out:
 	return ret;
 }
 
-static void remove_mm_from_hash_and_tree(struct mm_struct *mm)
+static void remove_slot_from_hash_and_tree(struct ksm_mem_slot *slot)
 {
-	struct ksm_mem_slot *slot;
 	int pages_count;
 
-	list_for_each_entry(slot, &slots, link)
-		if (slot->mm == mm)
-			break;
-	BUG_ON(!slot);
-
 	root_unstable_tree = RB_ROOT;
 	for (pages_count = 0; pages_count < slot->npages; ++pages_count)
-		remove_page_from_tree(mm, slot->addr +
+		remove_page_from_tree(slot->mm, slot->addr +
 				      pages_count * PAGE_SIZE);
 	/* Called under slots_lock */
 	list_del(&slot->link);
 }
 
-static int ksm_sma_ioctl_remove_memory_region(struct ksm_sma *ksm_sma)
+static int ksm_sma_ioctl_remove_memory_region(struct ksm_sma *ksm_sma,
+					      unsigned long addr)
 {
+	int ret = -EFAULT;
 	struct ksm_mem_slot *slot, *node;
 
 	down_write(&slots_lock);
 	list_for_each_entry_safe(slot, node, &ksm_sma->sma_slots, sma_link) {
-		remove_mm_from_hash_and_tree(slot->mm);
-		mmput(slot->mm);
-		list_del(&slot->sma_link);
-		kfree(slot);
-		ksm_sma->nregions--;
+		if (addr == slot->addr) {
+			remove_slot_from_hash_and_tree(slot);
+			mmput(slot->mm);
+			list_del(&slot->sma_link);
+			kfree(slot);
+			ksm_sma->nregions--;
+			ret = 0;
+		}
 	}
 	up_write(&slots_lock);
-	return 0;
+	return ret;
 }
 
 static int ksm_sma_release(struct inode *inode, struct file *filp)
 {
+	struct ksm_mem_slot *slot, *node;
 	struct ksm_sma *ksm_sma = filp->private_data;
-	int r;
 
-	r = ksm_sma_ioctl_remove_memory_region(ksm_sma);
+	down_write(&slots_lock);
+	list_for_each_entry_safe(slot, node, &ksm_sma->sma_slots, sma_link) {
+		remove_slot_from_hash_and_tree(slot);
+		mmput(slot->mm);
+		list_del(&slot->sma_link);
+		kfree(slot);
+	}
+	up_write(&slots_lock);
+
 	kfree(ksm_sma);
-	return r;
+	return 0;
 }
 
 static long ksm_sma_ioctl(struct file *filp,
@@ -607,7 +614,7 @@ static long ksm_sma_ioctl(struct file *filp,
 		break;
 	}
 	case KSM_REMOVE_MEMORY_REGION:
-		r = ksm_sma_ioctl_remove_memory_region(sma);
+		r = ksm_sma_ioctl_remove_memory_region(sma, arg);
 		break;
 	}
 
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
@ 2009-05-04 22:25       ` Izik Eidus
  0 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-04 22:25 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, aarcange, chrisw, alan, device, linux-mm, hugh,
	nickpiggin, Izik Eidus

This patch change the KSM_REMOVE_MEMORY_REGION ioctl to be specific per
memory region (instead of flushing all the registred memory regions inside
the file descriptor like it happen now)

The previoes api was:
user register memory regions using KSM_REGISTER_MEMORY_REGION inside the fd,
and then when he wanted to remove just one memory region, he had to remove them
all using KSM_REMOVE_MEMORY_REGION.

This patch change this beahivor by chaning the KSM_REMOVE_MEMORY_REGION
ioctl to recive another paramter that it is the begining of the virtual
address that is wanted to be removed.

(user can still remove all the memory regions all at once, by just closing
the file descriptor)

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 mm/ksm.c |   45 ++++++++++++++++++++++++++-------------------
 1 files changed, 26 insertions(+), 19 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 982dfff..6e8b24b 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -543,48 +543,55 @@ out:
 	return ret;
 }
 
-static void remove_mm_from_hash_and_tree(struct mm_struct *mm)
+static void remove_slot_from_hash_and_tree(struct ksm_mem_slot *slot)
 {
-	struct ksm_mem_slot *slot;
 	int pages_count;
 
-	list_for_each_entry(slot, &slots, link)
-		if (slot->mm == mm)
-			break;
-	BUG_ON(!slot);
-
 	root_unstable_tree = RB_ROOT;
 	for (pages_count = 0; pages_count < slot->npages; ++pages_count)
-		remove_page_from_tree(mm, slot->addr +
+		remove_page_from_tree(slot->mm, slot->addr +
 				      pages_count * PAGE_SIZE);
 	/* Called under slots_lock */
 	list_del(&slot->link);
 }
 
-static int ksm_sma_ioctl_remove_memory_region(struct ksm_sma *ksm_sma)
+static int ksm_sma_ioctl_remove_memory_region(struct ksm_sma *ksm_sma,
+					      unsigned long addr)
 {
+	int ret = -EFAULT;
 	struct ksm_mem_slot *slot, *node;
 
 	down_write(&slots_lock);
 	list_for_each_entry_safe(slot, node, &ksm_sma->sma_slots, sma_link) {
-		remove_mm_from_hash_and_tree(slot->mm);
-		mmput(slot->mm);
-		list_del(&slot->sma_link);
-		kfree(slot);
-		ksm_sma->nregions--;
+		if (addr == slot->addr) {
+			remove_slot_from_hash_and_tree(slot);
+			mmput(slot->mm);
+			list_del(&slot->sma_link);
+			kfree(slot);
+			ksm_sma->nregions--;
+			ret = 0;
+		}
 	}
 	up_write(&slots_lock);
-	return 0;
+	return ret;
 }
 
 static int ksm_sma_release(struct inode *inode, struct file *filp)
 {
+	struct ksm_mem_slot *slot, *node;
 	struct ksm_sma *ksm_sma = filp->private_data;
-	int r;
 
-	r = ksm_sma_ioctl_remove_memory_region(ksm_sma);
+	down_write(&slots_lock);
+	list_for_each_entry_safe(slot, node, &ksm_sma->sma_slots, sma_link) {
+		remove_slot_from_hash_and_tree(slot);
+		mmput(slot->mm);
+		list_del(&slot->sma_link);
+		kfree(slot);
+	}
+	up_write(&slots_lock);
+
 	kfree(ksm_sma);
-	return r;
+	return 0;
 }
 
 static long ksm_sma_ioctl(struct file *filp,
@@ -607,7 +614,7 @@ static long ksm_sma_ioctl(struct file *filp,
 		break;
 	}
 	case KSM_REMOVE_MEMORY_REGION:
-		r = ksm_sma_ioctl_remove_memory_region(sma);
+		r = ksm_sma_ioctl_remove_memory_region(sma, arg);
 		break;
 	}
 
-- 
1.5.6.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 4/6] ksm: change the prot handling to use the generic helper functions
  2009-05-04 22:25       ` Izik Eidus
@ 2009-05-04 22:25         ` Izik Eidus
  -1 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-04 22:25 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, aarcange, chrisw, alan, device, linux-mm, hugh,
	nickpiggin, Izik Eidus

This is needed to avoid breaking some architectures.

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 mm/ksm.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 6e8b24b..8a0489b 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -762,8 +762,8 @@ static int try_to_merge_two_pages_alloc(struct mm_struct *mm1,
 		up_read(&mm1->mmap_sem);
 		return ret;
 	}
-	prot = vma->vm_page_prot;
-	pgprot_val(prot) &= ~_PAGE_RW;
+
+	prot = vm_get_page_prot(vma->vm_flags & ~VM_WRITE);
 
 	copy_user_highpage(kpage, page1, addr1, vma);
 	ret = try_to_merge_one_page(mm1, vma, page1, kpage, prot);
@@ -780,8 +780,7 @@ static int try_to_merge_two_pages_alloc(struct mm_struct *mm1,
 			return ret;
 		}
 
-		prot = vma->vm_page_prot;
-		pgprot_val(prot) &= ~_PAGE_RW;
+		prot = vm_get_page_prot(vma->vm_flags & ~VM_WRITE);
 
 		ret = try_to_merge_one_page(mm2, vma, page2, kpage,
 					    prot);
@@ -827,8 +826,9 @@ static int try_to_merge_two_pages_noalloc(struct mm_struct *mm1,
 		up_read(&mm1->mmap_sem);
 		return ret;
 	}
-	prot = vma->vm_page_prot;
-	pgprot_val(prot) &= ~_PAGE_RW;
+
+	prot = vm_get_page_prot(vma->vm_flags & ~VM_WRITE);
+
 	ret = try_to_merge_one_page(mm1, vma, page1, page2, prot);
 	up_read(&mm1->mmap_sem);
 	if (!ret)
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 4/6] ksm: change the prot handling to use the generic helper functions
@ 2009-05-04 22:25         ` Izik Eidus
  0 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-04 22:25 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, aarcange, chrisw, alan, device, linux-mm, hugh,
	nickpiggin, Izik Eidus

This is needed to avoid breaking some architectures.

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 mm/ksm.c |   12 ++++++------
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index 6e8b24b..8a0489b 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -762,8 +762,8 @@ static int try_to_merge_two_pages_alloc(struct mm_struct *mm1,
 		up_read(&mm1->mmap_sem);
 		return ret;
 	}
-	prot = vma->vm_page_prot;
-	pgprot_val(prot) &= ~_PAGE_RW;
+
+	prot = vm_get_page_prot(vma->vm_flags & ~VM_WRITE);
 
 	copy_user_highpage(kpage, page1, addr1, vma);
 	ret = try_to_merge_one_page(mm1, vma, page1, kpage, prot);
@@ -780,8 +780,7 @@ static int try_to_merge_two_pages_alloc(struct mm_struct *mm1,
 			return ret;
 		}
 
-		prot = vma->vm_page_prot;
-		pgprot_val(prot) &= ~_PAGE_RW;
+		prot = vm_get_page_prot(vma->vm_flags & ~VM_WRITE);
 
 		ret = try_to_merge_one_page(mm2, vma, page2, kpage,
 					    prot);
@@ -827,8 +826,9 @@ static int try_to_merge_two_pages_noalloc(struct mm_struct *mm1,
 		up_read(&mm1->mmap_sem);
 		return ret;
 	}
-	prot = vma->vm_page_prot;
-	pgprot_val(prot) &= ~_PAGE_RW;
+
+	prot = vm_get_page_prot(vma->vm_flags & ~VM_WRITE);
+
 	ret = try_to_merge_one_page(mm1, vma, page1, page2, prot);
 	up_read(&mm1->mmap_sem);
 	if (!ret)
-- 
1.5.6.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 5/6] ksm: build system make it compile for all archs
  2009-05-04 22:25         ` Izik Eidus
@ 2009-05-04 22:25           ` Izik Eidus
  -1 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-04 22:25 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, aarcange, chrisw, alan, device, linux-mm, hugh,
	nickpiggin, Izik Eidus

The known issues with cross platform support were fixed,
so we return it back to compile on all archs.

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 mm/Kconfig |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index f59b1e4..fb8ac63 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -228,7 +228,6 @@ config MMU_NOTIFIER
 
 config KSM
 	tristate "Enable KSM for page sharing"
-	depends on X86
 	help
 	  Enable the KSM kernel module to allow page sharing of equal pages
 	  among different tasks.
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 5/6] ksm: build system make it compile for all archs
@ 2009-05-04 22:25           ` Izik Eidus
  0 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-04 22:25 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, aarcange, chrisw, alan, device, linux-mm, hugh,
	nickpiggin, Izik Eidus

The known issues with cross platform support were fixed,
so we return it back to compile on all archs.

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 mm/Kconfig |    1 -
 1 files changed, 0 insertions(+), 1 deletions(-)

diff --git a/mm/Kconfig b/mm/Kconfig
index f59b1e4..fb8ac63 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -228,7 +228,6 @@ config MMU_NOTIFIER
 
 config KSM
 	tristate "Enable KSM for page sharing"
-	depends on X86
 	help
 	  Enable the KSM kernel module to allow page sharing of equal pages
 	  among different tasks.
-- 
1.5.6.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 6/6] ksm: use another miscdevice minor number.
  2009-05-04 22:25           ` Izik Eidus
@ 2009-05-04 22:25             ` Izik Eidus
  -1 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-04 22:25 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, aarcange, chrisw, alan, device, linux-mm, hugh,
	nickpiggin, Izik Eidus

The old number was registered already by another project.
The new number is #234.

Thanks.

Signed-off-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Alan Cox <device@lanana.org>
---
 Documentation/devices.txt  |    1 +
 include/linux/miscdevice.h |    2 +-
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/Documentation/devices.txt b/Documentation/devices.txt
index 53d64d3..a0c3259 100644
--- a/Documentation/devices.txt
+++ b/Documentation/devices.txt
@@ -443,6 +443,7 @@ Your cooperation is appreciated.
 		231 = /dev/snapshot	System memory snapshot device
 		232 = /dev/kvm		Kernel-based virtual machine (hardware virtualization extensions)
 		233 = /dev/kmview	View-OS A process with a view
+		234 = /dev/ksm		Dynamic page sharing
 		240-254			Reserved for local use
 		255			Reserved for MISC_DYNAMIC_MINOR
 
diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index 297c0bb..c7b8e9b 100644
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -30,7 +30,7 @@
 #define HPET_MINOR		228
 #define FUSE_MINOR		229
 #define KVM_MINOR		232
-#define KSM_MINOR		233
+#define KSM_MINOR		234
 #define MISC_DYNAMIC_MINOR	255
 
 struct device;
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 6/6] ksm: use another miscdevice minor number.
@ 2009-05-04 22:25             ` Izik Eidus
  0 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-04 22:25 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, aarcange, chrisw, alan, device, linux-mm, hugh,
	nickpiggin, Izik Eidus

The old number was registered already by another project.
The new number is #234.

Thanks.

Signed-off-by: Izik Eidus <ieidus@redhat.com>
Signed-off-by: Alan Cox <device@lanana.org>
---
 Documentation/devices.txt  |    1 +
 include/linux/miscdevice.h |    2 +-
 2 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/Documentation/devices.txt b/Documentation/devices.txt
index 53d64d3..a0c3259 100644
--- a/Documentation/devices.txt
+++ b/Documentation/devices.txt
@@ -443,6 +443,7 @@ Your cooperation is appreciated.
 		231 = /dev/snapshot	System memory snapshot device
 		232 = /dev/kvm		Kernel-based virtual machine (hardware virtualization extensions)
 		233 = /dev/kmview	View-OS A process with a view
+		234 = /dev/ksm		Dynamic page sharing
 		240-254			Reserved for local use
 		255			Reserved for MISC_DYNAMIC_MINOR
 
diff --git a/include/linux/miscdevice.h b/include/linux/miscdevice.h
index 297c0bb..c7b8e9b 100644
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -30,7 +30,7 @@
 #define HPET_MINOR		228
 #define FUSE_MINOR		229
 #define KVM_MINOR		232
-#define KSM_MINOR		233
+#define KSM_MINOR		234
 #define MISC_DYNAMIC_MINOR	255
 
 struct device;
-- 
1.5.6.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 108+ messages in thread

* Re: [PATCH 1/6] ksm: limiting the num of mem regions user can register per fd.
  2009-05-04 22:25   ` Izik Eidus
@ 2009-05-06  0:40     ` Rik van Riel
  -1 siblings, 0 replies; 108+ messages in thread
From: Rik van Riel @ 2009-05-06  0:40 UTC (permalink / raw)
  To: Izik Eidus
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

Izik Eidus wrote:
> Right now user can open /dev/ksm fd and register unlimited number of
> regions, such behavior may allocate unlimited amount of kernel memory
> and get the whole host into out of memory situation.
> 
> Signed-off-by: Izik Eidus <ieidus@redhat.com>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 1/6] ksm: limiting the num of mem regions user can register per fd.
@ 2009-05-06  0:40     ` Rik van Riel
  0 siblings, 0 replies; 108+ messages in thread
From: Rik van Riel @ 2009-05-06  0:40 UTC (permalink / raw)
  To: Izik Eidus
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

Izik Eidus wrote:
> Right now user can open /dev/ksm fd and register unlimited number of
> regions, such behavior may allocate unlimited amount of kernel memory
> and get the whole host into out of memory situation.
> 
> Signed-off-by: Izik Eidus <ieidus@redhat.com>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-04 22:25     ` Izik Eidus
@ 2009-05-06  0:43       ` Rik van Riel
  -1 siblings, 0 replies; 108+ messages in thread
From: Rik van Riel @ 2009-05-06  0:43 UTC (permalink / raw)
  To: Izik Eidus
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

Izik Eidus wrote:
> subjects say it all.

Not a very useful commit message.

This makes me wonder, though.

What happens if a user mmaps a 30MB memory region, registers it
with KSM and then unmaps the middle 10MB?

> Signed-off-by: Izik Eidus <ieidus@redhat.com>

except for the commit message, Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
@ 2009-05-06  0:43       ` Rik van Riel
  0 siblings, 0 replies; 108+ messages in thread
From: Rik van Riel @ 2009-05-06  0:43 UTC (permalink / raw)
  To: Izik Eidus
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

Izik Eidus wrote:
> subjects say it all.

Not a very useful commit message.

This makes me wonder, though.

What happens if a user mmaps a 30MB memory region, registers it
with KSM and then unmaps the middle 10MB?

> Signed-off-by: Izik Eidus <ieidus@redhat.com>

except for the commit message, Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-04 22:25       ` Izik Eidus
@ 2009-05-06  0:53         ` Rik van Riel
  -1 siblings, 0 replies; 108+ messages in thread
From: Rik van Riel @ 2009-05-06  0:53 UTC (permalink / raw)
  To: Izik Eidus
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

Izik Eidus wrote:
> This patch change the KSM_REMOVE_MEMORY_REGION ioctl to be specific per
> memory region (instead of flushing all the registred memory regions inside
> the file descriptor like it happen now)
> 
> The previoes api was:
> user register memory regions using KSM_REGISTER_MEMORY_REGION inside the fd,
> and then when he wanted to remove just one memory region, he had to remove them
> all using KSM_REMOVE_MEMORY_REGION.
> 
> This patch change this beahivor by chaning the KSM_REMOVE_MEMORY_REGION
> ioctl to recive another paramter that it is the begining of the virtual
> address that is wanted to be removed.

This is different from munmap and madvise, which take both
start address and length.

Why?

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
@ 2009-05-06  0:53         ` Rik van Riel
  0 siblings, 0 replies; 108+ messages in thread
From: Rik van Riel @ 2009-05-06  0:53 UTC (permalink / raw)
  To: Izik Eidus
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

Izik Eidus wrote:
> This patch change the KSM_REMOVE_MEMORY_REGION ioctl to be specific per
> memory region (instead of flushing all the registred memory regions inside
> the file descriptor like it happen now)
> 
> The previoes api was:
> user register memory regions using KSM_REGISTER_MEMORY_REGION inside the fd,
> and then when he wanted to remove just one memory region, he had to remove them
> all using KSM_REMOVE_MEMORY_REGION.
> 
> This patch change this beahivor by chaning the KSM_REMOVE_MEMORY_REGION
> ioctl to recive another paramter that it is the begining of the virtual
> address that is wanted to be removed.

This is different from munmap and madvise, which take both
start address and length.

Why?

-- 
All rights reversed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 4/6] ksm: change the prot handling to use the generic helper functions
  2009-05-04 22:25         ` Izik Eidus
@ 2009-05-06  0:54           ` Rik van Riel
  -1 siblings, 0 replies; 108+ messages in thread
From: Rik van Riel @ 2009-05-06  0:54 UTC (permalink / raw)
  To: Izik Eidus
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

Izik Eidus wrote:
> This is needed to avoid breaking some architectures.
> 
> Signed-off-by: Izik Eidus <ieidus@redhat.com>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 4/6] ksm: change the prot handling to use the generic helper functions
@ 2009-05-06  0:54           ` Rik van Riel
  0 siblings, 0 replies; 108+ messages in thread
From: Rik van Riel @ 2009-05-06  0:54 UTC (permalink / raw)
  To: Izik Eidus
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

Izik Eidus wrote:
> This is needed to avoid breaking some architectures.
> 
> Signed-off-by: Izik Eidus <ieidus@redhat.com>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 5/6] ksm: build system make it compile for all archs
  2009-05-04 22:25           ` Izik Eidus
@ 2009-05-06  0:54             ` Rik van Riel
  -1 siblings, 0 replies; 108+ messages in thread
From: Rik van Riel @ 2009-05-06  0:54 UTC (permalink / raw)
  To: Izik Eidus
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

Izik Eidus wrote:
> The known issues with cross platform support were fixed,
> so we return it back to compile on all archs.
> 
> Signed-off-by: Izik Eidus <ieidus@redhat.com>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 5/6] ksm: build system make it compile for all archs
@ 2009-05-06  0:54             ` Rik van Riel
  0 siblings, 0 replies; 108+ messages in thread
From: Rik van Riel @ 2009-05-06  0:54 UTC (permalink / raw)
  To: Izik Eidus
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

Izik Eidus wrote:
> The known issues with cross platform support were fixed,
> so we return it back to compile on all archs.
> 
> Signed-off-by: Izik Eidus <ieidus@redhat.com>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 6/6] ksm: use another miscdevice minor number.
  2009-05-04 22:25             ` Izik Eidus
@ 2009-05-06  0:55               ` Rik van Riel
  -1 siblings, 0 replies; 108+ messages in thread
From: Rik van Riel @ 2009-05-06  0:55 UTC (permalink / raw)
  To: Izik Eidus
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

Izik Eidus wrote:
> The old number was registered already by another project.
> The new number is #234.
> 
> Thanks.
> 
> Signed-off-by: Izik Eidus <ieidus@redhat.com>
> Signed-off-by: Alan Cox <device@lanana.org>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 6/6] ksm: use another miscdevice minor number.
@ 2009-05-06  0:55               ` Rik van Riel
  0 siblings, 0 replies; 108+ messages in thread
From: Rik van Riel @ 2009-05-06  0:55 UTC (permalink / raw)
  To: Izik Eidus
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

Izik Eidus wrote:
> The old number was registered already by another project.
> The new number is #234.
> 
> Thanks.
> 
> Signed-off-by: Izik Eidus <ieidus@redhat.com>
> Signed-off-by: Alan Cox <device@lanana.org>

Acked-by: Rik van Riel <riel@redhat.com>

-- 
All rights reversed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-06  0:53         ` Rik van Riel
@ 2009-05-06  8:38           ` Izik Eidus
  -1 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-06  8:38 UTC (permalink / raw)
  To: Rik van Riel
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

Rik van Riel wrote:
> Izik Eidus wrote:
>> This patch change the KSM_REMOVE_MEMORY_REGION ioctl to be specific per
>> memory region (instead of flushing all the registred memory regions 
>> inside
>> the file descriptor like it happen now)
>>
>> The previoes api was:
>> user register memory regions using KSM_REGISTER_MEMORY_REGION inside 
>> the fd,
>> and then when he wanted to remove just one memory region, he had to 
>> remove them
>> all using KSM_REMOVE_MEMORY_REGION.
>>
>> This patch change this beahivor by chaning the KSM_REMOVE_MEMORY_REGION
>> ioctl to recive another paramter that it is the begining of the virtual
>> address that is wanted to be removed.
>
> This is different from munmap and madvise, which take both
> start address and length.
>
> Why?
>
It work like free, considering the fact that we dont allow memory 
overlay in no way,
If we have the start of the address it is enough for us to know what 
memory we want to remove.

Isnt interface for userspace that work like malloc / free is enough?

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
@ 2009-05-06  8:38           ` Izik Eidus
  0 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-06  8:38 UTC (permalink / raw)
  To: Rik van Riel
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

Rik van Riel wrote:
> Izik Eidus wrote:
>> This patch change the KSM_REMOVE_MEMORY_REGION ioctl to be specific per
>> memory region (instead of flushing all the registred memory regions 
>> inside
>> the file descriptor like it happen now)
>>
>> The previoes api was:
>> user register memory regions using KSM_REGISTER_MEMORY_REGION inside 
>> the fd,
>> and then when he wanted to remove just one memory region, he had to 
>> remove them
>> all using KSM_REMOVE_MEMORY_REGION.
>>
>> This patch change this beahivor by chaning the KSM_REMOVE_MEMORY_REGION
>> ioctl to recive another paramter that it is the begining of the virtual
>> address that is wanted to be removed.
>
> This is different from munmap and madvise, which take both
> start address and length.
>
> Why?
>
It work like free, considering the fact that we dont allow memory 
overlay in no way,
If we have the start of the address it is enough for us to know what 
memory we want to remove.

Isnt interface for userspace that work like malloc / free is enough?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06  0:43       ` Rik van Riel
@ 2009-05-06  9:46         ` Izik Eidus
  -1 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-06  9:46 UTC (permalink / raw)
  To: Rik van Riel
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

Rik van Riel wrote:
> Izik Eidus wrote:
>> subjects say it all.
>
> Not a very useful commit message.
>
> This makes me wonder, though.
>
> What happens if a user mmaps a 30MB memory region, registers it
> with KSM and then unmaps the middle 10MB?

User cant break 30MB into smaller one.
That mean that when you regisiter memory region that is X mb size, you 
can only remove it (as a whole), or add new region.
This should answer the next question you had about why i have just the 
start address for removing the regions.

>
>> Signed-off-by: Izik Eidus <ieidus@redhat.com>
>
> except for the commit message, Acked-by: Rik van Riel <riel@redhat.com>
>


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
@ 2009-05-06  9:46         ` Izik Eidus
  0 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-06  9:46 UTC (permalink / raw)
  To: Rik van Riel
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

Rik van Riel wrote:
> Izik Eidus wrote:
>> subjects say it all.
>
> Not a very useful commit message.
>
> This makes me wonder, though.
>
> What happens if a user mmaps a 30MB memory region, registers it
> with KSM and then unmaps the middle 10MB?

User cant break 30MB into smaller one.
That mean that when you regisiter memory region that is X mb size, you 
can only remove it (as a whole), or add new region.
This should answer the next question you had about why i have just the 
start address for removing the regions.

>
>> Signed-off-by: Izik Eidus <ieidus@redhat.com>
>
> except for the commit message, Acked-by: Rik van Riel <riel@redhat.com>
>

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-06  8:38           ` Izik Eidus
@ 2009-05-06 11:16             ` Hugh Dickins
  -1 siblings, 0 replies; 108+ messages in thread
From: Hugh Dickins @ 2009-05-06 11:16 UTC (permalink / raw)
  To: Izik Eidus
  Cc: Rik van Riel, akpm, linux-kernel, aarcange, chrisw, alan, device,
	linux-mm, nickpiggin

On Wed, 6 May 2009, Izik Eidus wrote:
> Rik van Riel wrote:
> >
> > This is different from munmap and madvise, which take both
> > start address and length.
> >
> > Why?
> >
> It work like free, considering the fact that we dont allow memory overlay in
> no way,
> If we have the start of the address it is enough for us to know what memory we
> want to remove.
> 
> Isnt interface for userspace that work like malloc / free is enough?

I'm afraid not.

There's an (addr, length) consistency throughout the mm system calls -
mmap munmap mlock munlock mprotect msync madvise, even mincore and mremap
- that we ought not to depart from lightly.  And those (addr, length)s
are allowed to break into and span the original mmaps (excepting mremap).

Getting an fd from /dev/ksm, using that in an ioctl to get another fd
(eh?), using that in a further ioctl to specify an addr and number of
pages, may well have been a good interface for getting this working out
of tree, as an adjunct of KVM.  But you've done too well at selling KSM
as more generally useful than that: it is good work, I'm liking it, but
if it's going to mainline, then it needs an appropriate user interface.

I'm very much with those who suggested an madvise(), for which Chris
prepared a patch.  I know Andrea felt uneasy with an madvise() going
to a possibly-configured-out-or-never-loaded module, but it is just
advice, so I don't have a problem with that myself, so long as it
is documented in the manpage.

Whereas I do worry just what capability should be required for this:
can't a greedy app simply fork itself, touch all its pages, and thus
lock itself into memory in this way?  And I do worry about the cpu
cost of all the scanning, if it were to get used more generally -
it would be a pity if we just advised complainers to tune it out.

I'm still working my way through ksm.c, and not gone back to look at
Chris's madvise patch, but doubt it will be sufficient.  There's an
interesting difference between what you're doing in ksm.c, and how
madvise usually behaves, regarding unmapped areas: madvice doesn't
usually apply to an unmapped area, and goes away with an area when
it is unmapped; whereas in KSM's case, the advice applies to whatever
happens to get mapped in the area specified, persisting across unmaps.

If KSM is to behave in the usual madvise way, it'll need to be informed
of unmaps.  And I suspect it may need to be informed of them, even if we
let it continue to apply to empty address space.  Because even with your
more limited unsigned int nrpages interface, the caller can specify an
enormous range on 64-bit, and ksm.c be fully occupied just incrementing
from one absent page to the next.  mmap's vma ranges confine the space
to be searched, and instantiated pagetables confine it further: I think
you're either going to need to rely upon those to confine your search
area, or else enhance your own data structures to confine it.

But I do appreciate the separation you've kept so far,
and wouldn't want to tie it all together too closely.

Hugh

p.s.  I wish you'd chosen different name than KSM - the kernel
has supported shared memory for many years - and notice ksm.c itself
says "Memory merging driver".  "Merge" would indeed have been a less
ambiguous term than "Share", but I think too late to change that now
- except possibly in the MADV_ flag names?

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
@ 2009-05-06 11:16             ` Hugh Dickins
  0 siblings, 0 replies; 108+ messages in thread
From: Hugh Dickins @ 2009-05-06 11:16 UTC (permalink / raw)
  To: Izik Eidus
  Cc: Rik van Riel, akpm, linux-kernel, aarcange, chrisw, alan, device,
	linux-mm, nickpiggin

On Wed, 6 May 2009, Izik Eidus wrote:
> Rik van Riel wrote:
> >
> > This is different from munmap and madvise, which take both
> > start address and length.
> >
> > Why?
> >
> It work like free, considering the fact that we dont allow memory overlay in
> no way,
> If we have the start of the address it is enough for us to know what memory we
> want to remove.
> 
> Isnt interface for userspace that work like malloc / free is enough?

I'm afraid not.

There's an (addr, length) consistency throughout the mm system calls -
mmap munmap mlock munlock mprotect msync madvise, even mincore and mremap
- that we ought not to depart from lightly.  And those (addr, length)s
are allowed to break into and span the original mmaps (excepting mremap).

Getting an fd from /dev/ksm, using that in an ioctl to get another fd
(eh?), using that in a further ioctl to specify an addr and number of
pages, may well have been a good interface for getting this working out
of tree, as an adjunct of KVM.  But you've done too well at selling KSM
as more generally useful than that: it is good work, I'm liking it, but
if it's going to mainline, then it needs an appropriate user interface.

I'm very much with those who suggested an madvise(), for which Chris
prepared a patch.  I know Andrea felt uneasy with an madvise() going
to a possibly-configured-out-or-never-loaded module, but it is just
advice, so I don't have a problem with that myself, so long as it
is documented in the manpage.

Whereas I do worry just what capability should be required for this:
can't a greedy app simply fork itself, touch all its pages, and thus
lock itself into memory in this way?  And I do worry about the cpu
cost of all the scanning, if it were to get used more generally -
it would be a pity if we just advised complainers to tune it out.

I'm still working my way through ksm.c, and not gone back to look at
Chris's madvise patch, but doubt it will be sufficient.  There's an
interesting difference between what you're doing in ksm.c, and how
madvise usually behaves, regarding unmapped areas: madvice doesn't
usually apply to an unmapped area, and goes away with an area when
it is unmapped; whereas in KSM's case, the advice applies to whatever
happens to get mapped in the area specified, persisting across unmaps.

If KSM is to behave in the usual madvise way, it'll need to be informed
of unmaps.  And I suspect it may need to be informed of them, even if we
let it continue to apply to empty address space.  Because even with your
more limited unsigned int nrpages interface, the caller can specify an
enormous range on 64-bit, and ksm.c be fully occupied just incrementing
from one absent page to the next.  mmap's vma ranges confine the space
to be searched, and instantiated pagetables confine it further: I think
you're either going to need to rely upon those to confine your search
area, or else enhance your own data structures to confine it.

But I do appreciate the separation you've kept so far,
and wouldn't want to tie it all together too closely.

Hugh

p.s.  I wish you'd chosen different name than KSM - the kernel
has supported shared memory for many years - and notice ksm.c itself
says "Memory merging driver".  "Merge" would indeed have been a less
ambiguous term than "Share", but I think too late to change that now
- except possibly in the MADV_ flag names?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06  9:46         ` Izik Eidus
@ 2009-05-06 12:26           ` Rik van Riel
  -1 siblings, 0 replies; 108+ messages in thread
From: Rik van Riel @ 2009-05-06 12:26 UTC (permalink / raw)
  To: Izik Eidus
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

Izik Eidus wrote:
> Rik van Riel wrote:
>> Izik Eidus wrote:
>>> subjects say it all.
>>
>> Not a very useful commit message.
>>
>> This makes me wonder, though.
>>
>> What happens if a user mmaps a 30MB memory region, registers it
>> with KSM and then unmaps the middle 10MB?
> 
> User cant break 30MB into smaller one.

The user can break up the underlying VMAs though.

I am just wondering out loud if we really want two
VMA-like objects in the kernel, the VMA itself and
a separate KSM object, with different semantics.

Maybe this is fine, but I do think it's a question
that needs to be thought about.

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
@ 2009-05-06 12:26           ` Rik van Riel
  0 siblings, 0 replies; 108+ messages in thread
From: Rik van Riel @ 2009-05-06 12:26 UTC (permalink / raw)
  To: Izik Eidus
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

Izik Eidus wrote:
> Rik van Riel wrote:
>> Izik Eidus wrote:
>>> subjects say it all.
>>
>> Not a very useful commit message.
>>
>> This makes me wonder, though.
>>
>> What happens if a user mmaps a 30MB memory region, registers it
>> with KSM and then unmaps the middle 10MB?
> 
> User cant break 30MB into smaller one.

The user can break up the underlying VMAs though.

I am just wondering out loud if we really want two
VMA-like objects in the kernel, the VMA itself and
a separate KSM object, with different semantics.

Maybe this is fine, but I do think it's a question
that needs to be thought about.

-- 
All rights reversed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06 12:26           ` Rik van Riel
@ 2009-05-06 12:39             ` Izik Eidus
  -1 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-06 12:39 UTC (permalink / raw)
  To: Rik van Riel
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

Rik van Riel wrote:
> Izik Eidus wrote:
>> Rik van Riel wrote:
>>> Izik Eidus wrote:
>>>> subjects say it all.
>>>
>>> Not a very useful commit message.
>>>
>>> This makes me wonder, though.
>>>
>>> What happens if a user mmaps a 30MB memory region, registers it
>>> with KSM and then unmaps the middle 10MB?
>>
>> User cant break 30MB into smaller one.
>
> The user can break up the underlying VMAs though.

So? KSM work on contigiouns virtual address, if user will break its 
virtual address and will leave it to be registered inside ksm
get_user_pages() will just fail, and ksm will skip scanning this 
addresses...

Normal usage of ksm is:

1) Allocating big chunck of memory.

2) registering it inside ksm

3) free the memory and remove it from ksm...

>
> I am just wondering out loud if we really want two
> VMA-like objects in the kernel, the VMA itself and
> a separate KSM object, with different semantics. 
>
> Maybe this is fine, but I do think it's a question
> that needs to be thought about.


Yea, we had some talk about that issue, considering the fact that user 
register its memory using ioctl and not systemcall, and considering the 
fact that ksm is loadable module that the kernel doesnt depend on,

How would you prefer to see the interface?



^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
@ 2009-05-06 12:39             ` Izik Eidus
  0 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-06 12:39 UTC (permalink / raw)
  To: Rik van Riel
  Cc: akpm, linux-kernel, aarcange, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

Rik van Riel wrote:
> Izik Eidus wrote:
>> Rik van Riel wrote:
>>> Izik Eidus wrote:
>>>> subjects say it all.
>>>
>>> Not a very useful commit message.
>>>
>>> This makes me wonder, though.
>>>
>>> What happens if a user mmaps a 30MB memory region, registers it
>>> with KSM and then unmaps the middle 10MB?
>>
>> User cant break 30MB into smaller one.
>
> The user can break up the underlying VMAs though.

So? KSM work on contigiouns virtual address, if user will break its 
virtual address and will leave it to be registered inside ksm
get_user_pages() will just fail, and ksm will skip scanning this 
addresses...

Normal usage of ksm is:

1) Allocating big chunck of memory.

2) registering it inside ksm

3) free the memory and remove it from ksm...

>
> I am just wondering out loud if we really want two
> VMA-like objects in the kernel, the VMA itself and
> a separate KSM object, with different semantics. 
>
> Maybe this is fine, but I do think it's a question
> that needs to be thought about.


Yea, we had some talk about that issue, considering the fact that user 
register its memory using ioctl and not systemcall, and considering the 
fact that ksm is loadable module that the kernel doesnt depend on,

How would you prefer to see the interface?


--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06 12:26           ` Rik van Riel
@ 2009-05-06 13:17             ` Andrea Arcangeli
  -1 siblings, 0 replies; 108+ messages in thread
From: Andrea Arcangeli @ 2009-05-06 13:17 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Izik Eidus, akpm, linux-kernel, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

On Wed, May 06, 2009 at 08:26:18AM -0400, Rik van Riel wrote:
> The user can break up the underlying VMAs though.
>
> I am just wondering out loud if we really want two
> VMA-like objects in the kernel, the VMA itself and
> a separate KSM object, with different semantics.
>
> Maybe this is fine, but I do think it's a question
> that needs to be thought about.

If we want to keep KVM self contained we need a separate object. If we
want to merge part of KVM into the kernel VM core, then it can use the
vma and use madvise or better its own syscall (usually madvise doesn't
depend on admin starting kernel threads) or similar and the semantics
will change slightly. From a practical point of view I don't think
there's much difference and it can be done later if we change our
mind, given the low amount of apps that uses KVM (but for those few
apps like KVM, KSM can save tons of memory).

For example for the swapping of KSM pages we've been thinking of using
external rmap hooks to avoid the VM to know anything specific to KSM
pages but to still allow their unmapping and swap. Otherwise if there
are other modules like KVM that wants to extend the VM they'll also
have to add their own PG_ bitflags just for allow the swapping of
their own pages in the VM LRUs etc..

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
@ 2009-05-06 13:17             ` Andrea Arcangeli
  0 siblings, 0 replies; 108+ messages in thread
From: Andrea Arcangeli @ 2009-05-06 13:17 UTC (permalink / raw)
  To: Rik van Riel
  Cc: Izik Eidus, akpm, linux-kernel, chrisw, alan, device, linux-mm,
	hugh, nickpiggin

On Wed, May 06, 2009 at 08:26:18AM -0400, Rik van Riel wrote:
> The user can break up the underlying VMAs though.
>
> I am just wondering out loud if we really want two
> VMA-like objects in the kernel, the VMA itself and
> a separate KSM object, with different semantics.
>
> Maybe this is fine, but I do think it's a question
> that needs to be thought about.

If we want to keep KVM self contained we need a separate object. If we
want to merge part of KVM into the kernel VM core, then it can use the
vma and use madvise or better its own syscall (usually madvise doesn't
depend on admin starting kernel threads) or similar and the semantics
will change slightly. From a practical point of view I don't think
there's much difference and it can be done later if we change our
mind, given the low amount of apps that uses KVM (but for those few
apps like KVM, KSM can save tons of memory).

For example for the swapping of KSM pages we've been thinking of using
external rmap hooks to avoid the VM to know anything specific to KSM
pages but to still allow their unmapping and swap. Otherwise if there
are other modules like KVM that wants to extend the VM they'll also
have to add their own PG_ bitflags just for allow the swapping of
their own pages in the VM LRUs etc..

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06 13:17             ` Andrea Arcangeli
@ 2009-05-06 13:28               ` Hugh Dickins
  -1 siblings, 0 replies; 108+ messages in thread
From: Hugh Dickins @ 2009-05-06 13:28 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Rik van Riel, Izik Eidus, akpm, linux-kernel, chrisw, alan,
	device, linux-mm, nickpiggin

On Wed, 6 May 2009, Andrea Arcangeli wrote:
> 
> For example for the swapping of KSM pages we've been thinking of using
> external rmap hooks to avoid the VM to know anything specific to KSM
> pages but to still allow their unmapping and swap.

There may prove to be various reasons why it wouldn't work out in practice;
but when thinking of swapping them, it is worth considering if those KSM
pages can just be assigned to a tmpfs file, then leave the swapping to that.

Hugh

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
@ 2009-05-06 13:28               ` Hugh Dickins
  0 siblings, 0 replies; 108+ messages in thread
From: Hugh Dickins @ 2009-05-06 13:28 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Rik van Riel, Izik Eidus, akpm, linux-kernel, chrisw, alan,
	device, linux-mm, nickpiggin

On Wed, 6 May 2009, Andrea Arcangeli wrote:
> 
> For example for the swapping of KSM pages we've been thinking of using
> external rmap hooks to avoid the VM to know anything specific to KSM
> pages but to still allow their unmapping and swap.

There may prove to be various reasons why it wouldn't work out in practice;
but when thinking of swapping them, it is worth considering if those KSM
pages can just be assigned to a tmpfs file, then leave the swapping to that.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-06 11:16             ` Hugh Dickins
@ 2009-05-06 13:34               ` Andrea Arcangeli
  -1 siblings, 0 replies; 108+ messages in thread
From: Andrea Arcangeli @ 2009-05-06 13:34 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Izik Eidus, Rik van Riel, akpm, linux-kernel, chrisw, alan,
	device, linux-mm, nickpiggin

On Wed, May 06, 2009 at 12:16:52PM +0100, Hugh Dickins wrote:
> I'm very much with those who suggested an madvise(), for which Chris
> prepared a patch.  I know Andrea felt uneasy with an madvise() going
> to a possibly-configured-out-or-never-loaded module, but it is just
> advice, so I don't have a problem with that myself, so long as it
> is documented in the manpage.

I don't have so much of a problem with that, but there are a couple of
differences: normally madvise doesn't depend on the admin to start
some kernel thread to be meaningful, and normally madvise isn't a
privileged operation, see below.

> Whereas I do worry just what capability should be required for this:
> can't a greedy app simply fork itself, touch all its pages, and thus
> lock itself into memory in this way?  And I do worry about the cpu

KSM pages are supposed to be swappable in the long run so let's think
longer term.

> cost of all the scanning, if it were to get used more generally -
> it would be a pity if we just advised complainers to tune it out.

Clearly if tons of apps maliciously register themself in ksm, they'll
waste tons of CPU for no good, they'll just populate the unstable tree
with pages that are all equal except for the last 4 bytes slowing down
KSM for nothing. This is also why it's good to have a /dev/ksm ioctl
that the admin can allow only certain users to use for registering
virtual ranges (for example only the kvm/qemu user or all users in
scientific environments). Otherwise we'd need some kind of permissions
settings in sysfs with some API that certainly is less intuitive than
chown/chmod on /dev/ksm. We just can't allow madvise to succeed on any
luser registering itself in KSM, so if it was madvise, it shall return
-EPERM somehow sometime.

> I'm still working my way through ksm.c, and not gone back to look at
> Chris's madvise patch, but doubt it will be sufficient.  There's an
> interesting difference between what you're doing in ksm.c, and how
> madvise usually behaves, regarding unmapped areas: madvice doesn't
> usually apply to an unmapped area, and goes away with an area when
> it is unmapped; whereas in KSM's case, the advice applies to whatever
> happens to get mapped in the area specified, persisting across unmaps.

Given the apps using KSM tends to be quite special, the fact it's
sticky, it doesn't go away with munmap isn't big deal, quite to the
contrary those apps will likely have an easier time thanks to the
registration not going away over munmap/mmap, without requiring
reloading of malloc/new calls.

To skip over holes during virtual scans we just vma->vm_next.

> But I do appreciate the separation you've kept so far,
> and wouldn't want to tie it all together too closely.

The above plus the fact it remains self contained without making the
VM any more complicated, gives some value. Even swapping I'd like to
add it without VM specific knowledge about KSM. tmpfs has an easier
time because it has its own vma type, here we've KSM pages mixed
inside regular anonymous !vma->vm_file regions and !vm_ops.

> p.s.  I wish you'd chosen different name than KSM - the kernel
> has supported shared memory for many years - and notice ksm.c itself
> says "Memory merging driver".  "Merge" would indeed have been a less
> ambiguous term than "Share", but I think too late to change that now
> - except possibly in the MADV_ flag names?

I don't actually care about names, so I leave this to other to discuss.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
@ 2009-05-06 13:34               ` Andrea Arcangeli
  0 siblings, 0 replies; 108+ messages in thread
From: Andrea Arcangeli @ 2009-05-06 13:34 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Izik Eidus, Rik van Riel, akpm, linux-kernel, chrisw, alan,
	device, linux-mm, nickpiggin

On Wed, May 06, 2009 at 12:16:52PM +0100, Hugh Dickins wrote:
> I'm very much with those who suggested an madvise(), for which Chris
> prepared a patch.  I know Andrea felt uneasy with an madvise() going
> to a possibly-configured-out-or-never-loaded module, but it is just
> advice, so I don't have a problem with that myself, so long as it
> is documented in the manpage.

I don't have so much of a problem with that, but there are a couple of
differences: normally madvise doesn't depend on the admin to start
some kernel thread to be meaningful, and normally madvise isn't a
privileged operation, see below.

> Whereas I do worry just what capability should be required for this:
> can't a greedy app simply fork itself, touch all its pages, and thus
> lock itself into memory in this way?  And I do worry about the cpu

KSM pages are supposed to be swappable in the long run so let's think
longer term.

> cost of all the scanning, if it were to get used more generally -
> it would be a pity if we just advised complainers to tune it out.

Clearly if tons of apps maliciously register themself in ksm, they'll
waste tons of CPU for no good, they'll just populate the unstable tree
with pages that are all equal except for the last 4 bytes slowing down
KSM for nothing. This is also why it's good to have a /dev/ksm ioctl
that the admin can allow only certain users to use for registering
virtual ranges (for example only the kvm/qemu user or all users in
scientific environments). Otherwise we'd need some kind of permissions
settings in sysfs with some API that certainly is less intuitive than
chown/chmod on /dev/ksm. We just can't allow madvise to succeed on any
luser registering itself in KSM, so if it was madvise, it shall return
-EPERM somehow sometime.

> I'm still working my way through ksm.c, and not gone back to look at
> Chris's madvise patch, but doubt it will be sufficient.  There's an
> interesting difference between what you're doing in ksm.c, and how
> madvise usually behaves, regarding unmapped areas: madvice doesn't
> usually apply to an unmapped area, and goes away with an area when
> it is unmapped; whereas in KSM's case, the advice applies to whatever
> happens to get mapped in the area specified, persisting across unmaps.

Given the apps using KSM tends to be quite special, the fact it's
sticky, it doesn't go away with munmap isn't big deal, quite to the
contrary those apps will likely have an easier time thanks to the
registration not going away over munmap/mmap, without requiring
reloading of malloc/new calls.

To skip over holes during virtual scans we just vma->vm_next.

> But I do appreciate the separation you've kept so far,
> and wouldn't want to tie it all together too closely.

The above plus the fact it remains self contained without making the
VM any more complicated, gives some value. Even swapping I'd like to
add it without VM specific knowledge about KSM. tmpfs has an easier
time because it has its own vma type, here we've KSM pages mixed
inside regular anonymous !vma->vm_file regions and !vm_ops.

> p.s.  I wish you'd chosen different name than KSM - the kernel
> has supported shared memory for many years - and notice ksm.c itself
> says "Memory merging driver".  "Merge" would indeed have been a less
> ambiguous term than "Share", but I think too late to change that now
> - except possibly in the MADV_ flag names?

I don't actually care about names, so I leave this to other to discuss.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-06 13:34               ` Andrea Arcangeli
@ 2009-05-06 13:56                 ` Izik Eidus
  -1 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-06 13:56 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Hugh Dickins, Rik van Riel, akpm, linux-kernel, chrisw, alan,
	device, linux-mm, nickpiggin

Andrea Arcangeli wrote:
> On Wed, May 06, 2009 at 12:16:52PM +0100, Hugh Dickins wrote:
>   
>
>> p.s.  I wish you'd chosen different name than KSM - the kernel
>> has supported shared memory for many years - and notice ksm.c itself
>> says "Memory merging driver".  "Merge" would indeed have been a less
>> ambiguous term than "Share", but I think too late to change that now
>> - except possibly in the MADV_ flag names?
>>     
>
> I don't actually care about names, so I leave this to other to discuss.
>   
Well, There is no real problem changing the name, any suggestions?

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
@ 2009-05-06 13:56                 ` Izik Eidus
  0 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-06 13:56 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Hugh Dickins, Rik van Riel, akpm, linux-kernel, chrisw, alan,
	device, linux-mm, nickpiggin

Andrea Arcangeli wrote:
> On Wed, May 06, 2009 at 12:16:52PM +0100, Hugh Dickins wrote:
>   
>
>> p.s.  I wish you'd chosen different name than KSM - the kernel
>> has supported shared memory for many years - and notice ksm.c itself
>> says "Memory merging driver".  "Merge" would indeed have been a less
>> ambiguous term than "Share", but I think too late to change that now
>> - except possibly in the MADV_ flag names?
>>     
>
> I don't actually care about names, so I leave this to other to discuss.
>   
Well, There is no real problem changing the name, any suggestions?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06 13:28               ` Hugh Dickins
@ 2009-05-06 14:02                 ` Izik Eidus
  -1 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-06 14:02 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrea Arcangeli, Rik van Riel, akpm, linux-kernel, chrisw, alan,
	device, linux-mm, nickpiggin

Hugh Dickins wrote:
> On Wed, 6 May 2009, Andrea Arcangeli wrote:
>   
>> For example for the swapping of KSM pages we've been thinking of using
>> external rmap hooks to avoid the VM to know anything specific to KSM
>> pages but to still allow their unmapping and swap.
>>     
>
> There may prove to be various reasons why it wouldn't work out in practice;
> but when thinking of swapping them, it is worth considering if those KSM
> pages can just be assigned to a tmpfs file, then leave the swapping to that.
>
> Hugh
>   

The problem here (as i see it) is reverse mapping for this vmas that 
point into the shared page.
Right now linux use the ->index to find this pages and then unpresent 
them...
But even if we move into allocating them inside tmpfs, who will know how 
to unpresent the virtual addresses when we want to swap the page?

I had old patch that did that extrnal rmap thing, it added few callbacks 
inside rmap.c (you had AnonPage FilePage and ExtRmapPage) and if any 
driver wanted to mangae the reverse mapping by itself it would just 
marked the pages as ExtRmapPages and will then tell the 
try_to_unmap_one() the virtual address it need to unmap...

As far as i remember it required some small changes into memory.c


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
@ 2009-05-06 14:02                 ` Izik Eidus
  0 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-06 14:02 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Andrea Arcangeli, Rik van Riel, akpm, linux-kernel, chrisw, alan,
	device, linux-mm, nickpiggin

Hugh Dickins wrote:
> On Wed, 6 May 2009, Andrea Arcangeli wrote:
>   
>> For example for the swapping of KSM pages we've been thinking of using
>> external rmap hooks to avoid the VM to know anything specific to KSM
>> pages but to still allow their unmapping and swap.
>>     
>
> There may prove to be various reasons why it wouldn't work out in practice;
> but when thinking of swapping them, it is worth considering if those KSM
> pages can just be assigned to a tmpfs file, then leave the swapping to that.
>
> Hugh
>   

The problem here (as i see it) is reverse mapping for this vmas that 
point into the shared page.
Right now linux use the ->index to find this pages and then unpresent 
them...
But even if we move into allocating them inside tmpfs, who will know how 
to unpresent the virtual addresses when we want to swap the page?

I had old patch that did that extrnal rmap thing, it added few callbacks 
inside rmap.c (you had AnonPage FilePage and ExtRmapPage) and if any 
driver wanted to mangae the reverse mapping by itself it would just 
marked the pages as ExtRmapPages and will then tell the 
try_to_unmap_one() the virtual address it need to unmap...

As far as i remember it required some small changes into memory.c

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06 13:28               ` Hugh Dickins
@ 2009-05-06 14:09                 ` Andrea Arcangeli
  -1 siblings, 0 replies; 108+ messages in thread
From: Andrea Arcangeli @ 2009-05-06 14:09 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Rik van Riel, Izik Eidus, akpm, linux-kernel, chrisw, alan,
	device, linux-mm, nickpiggin

On Wed, May 06, 2009 at 02:28:33PM +0100, Hugh Dickins wrote:
> There may prove to be various reasons why it wouldn't work out in practice;
> but when thinking of swapping them, it is worth considering if those KSM
> pages can just be assigned to a tmpfs file, then leave the swapping to that.

Not sure if I understand but the vma handled by KSM is anonymous, how
can you assign those pages to a tmpfs file, the anon vma won't permit
that, all regular anon methods will be called for swapin etc... What I
mean is that some change in core VM looks required and I plan those to
be external-rmap kind, KSM agnostic. But perhaps we can reuse some
shmem code yes, I didn't think about that yet. Anyway I'd rather
discuss this later, this isn't the time yet. I'm quite optimistic that
to make KSM swap it won't be a big change. For now there's a limit on
the max number of ksm pages that can be allocated at any given time so
to avoid OOM conditions, like the swap-compress logic that limits the
swapdevice size to less than ram.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
@ 2009-05-06 14:09                 ` Andrea Arcangeli
  0 siblings, 0 replies; 108+ messages in thread
From: Andrea Arcangeli @ 2009-05-06 14:09 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Rik van Riel, Izik Eidus, akpm, linux-kernel, chrisw, alan,
	device, linux-mm, nickpiggin

On Wed, May 06, 2009 at 02:28:33PM +0100, Hugh Dickins wrote:
> There may prove to be various reasons why it wouldn't work out in practice;
> but when thinking of swapping them, it is worth considering if those KSM
> pages can just be assigned to a tmpfs file, then leave the swapping to that.

Not sure if I understand but the vma handled by KSM is anonymous, how
can you assign those pages to a tmpfs file, the anon vma won't permit
that, all regular anon methods will be called for swapin etc... What I
mean is that some change in core VM looks required and I plan those to
be external-rmap kind, KSM agnostic. But perhaps we can reuse some
shmem code yes, I didn't think about that yet. Anyway I'd rather
discuss this later, this isn't the time yet. I'm quite optimistic that
to make KSM swap it won't be a big change. For now there's a limit on
the max number of ksm pages that can be allocated at any given time so
to avoid OOM conditions, like the swap-compress logic that limits the
swapdevice size to less than ram.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06 14:09                 ` Andrea Arcangeli
@ 2009-05-06 14:21                   ` Alan Cox
  -1 siblings, 0 replies; 108+ messages in thread
From: Alan Cox @ 2009-05-06 14:21 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Hugh Dickins, Rik van Riel, Izik Eidus, akpm, linux-kernel,
	chrisw, device, linux-mm, nickpiggin

> the max number of ksm pages that can be allocated at any given time so
> to avoid OOM conditions, like the swap-compress logic that limits the
> swapdevice size to less than ram.

Are those pages accounted for in the vm_overcommit logic, as if you
allocate a big chunk of memory as KSM will do you need the worst case
vm_overcommit behaviour preserved and that means keeping the stats
correct.


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
@ 2009-05-06 14:21                   ` Alan Cox
  0 siblings, 0 replies; 108+ messages in thread
From: Alan Cox @ 2009-05-06 14:21 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Hugh Dickins, Rik van Riel, Izik Eidus, akpm, linux-kernel,
	chrisw, device, linux-mm, nickpiggin

> the max number of ksm pages that can be allocated at any given time so
> to avoid OOM conditions, like the swap-compress logic that limits the
> swapdevice size to less than ram.

Are those pages accounted for in the vm_overcommit logic, as if you
allocate a big chunk of memory as KSM will do you need the worst case
vm_overcommit behaviour preserved and that means keeping the stats
correct.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-06 13:34               ` Andrea Arcangeli
@ 2009-05-06 14:25                 ` Hugh Dickins
  -1 siblings, 0 replies; 108+ messages in thread
From: Hugh Dickins @ 2009-05-06 14:25 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Izik Eidus, Rik van Riel, akpm, linux-kernel, chrisw, alan,
	device, linux-mm, nickpiggin

On Wed, 6 May 2009, Andrea Arcangeli wrote:
> On Wed, May 06, 2009 at 12:16:52PM +0100, Hugh Dickins wrote:
> > I'm very much with those who suggested an madvise(), for which Chris
> > prepared a patch.  I know Andrea felt uneasy with an madvise() going
> > to a possibly-configured-out-or-never-loaded module, but it is just
> > advice, so I don't have a problem with that myself, so long as it
> > is documented in the manpage.
> 
> I don't have so much of a problem with that, but there are a couple of
> differences: normally madvise doesn't depend on the admin to start
> some kernel thread to be meaningful, and normally madvise isn't a
> privileged operation, see below.
> 
> > Whereas I do worry just what capability should be required for this:
> > can't a greedy app simply fork itself, touch all its pages, and thus
> > lock itself into memory in this way?  And I do worry about the cpu
> 
> KSM pages are supposed to be swappable in the long run so let's think
> longer term.

And in the interim, insist on capable(CAP_IPC_LOCK)?
If that's okay for KVM's usage, that's fine by me for now.

Whether not having privilege means it should fail or silently ignore
the advice it's been given, I'm not sure: fail appears more helpful, but
silently ignore may fit better with whether module has been loaded yet
(we can keep a list of what's registered, for when module is loaded).

> 
> > cost of all the scanning, if it were to get used more generally -
> > it would be a pity if we just advised complainers to tune it out.
> 
> Clearly if tons of apps maliciously register themself in ksm, they'll
> waste tons of CPU for no good, they'll just populate the unstable tree
> with pages that are all equal except for the last 4 bytes slowing down
> KSM for nothing.

You're right to be concerned about the malicious, but I was thinking
rather of apps just wanting to say they may contain a goodly number
of duplicate pages, and wanting to register themselves for merging,
no malice intended.

If only for my hacked-up testing, I'm interested in having a workable
system on which every process has opted the whole of its address space
into this merging: never be optimal, but I'd like workable.

> This is also why it's good to have a /dev/ksm ioctl
> that the admin can allow only certain users to use for registering
> virtual ranges (for example only the kvm/qemu user or all users in
> scientific environments). Otherwise we'd need some kind of permissions
> settings in sysfs with some API that certainly is less intuitive than
> chown/chmod on /dev/ksm. We just can't allow madvise to succeed on any
> luser registering itself in KSM, so if it was madvise, it shall return
> -EPERM somehow sometime.

I don't see a role for /dev/ksm any more.  I'm glad the administrative
tunables have now been switched over to sysfs, and think that should be
used for these restrictions too.

And please don't think of non-KVM users of KSM as malicious lusers!

> 
> > I'm still working my way through ksm.c, and not gone back to look at
> > Chris's madvise patch, but doubt it will be sufficient.  There's an
> > interesting difference between what you're doing in ksm.c, and how
> > madvise usually behaves, regarding unmapped areas: madvice doesn't
> > usually apply to an unmapped area, and goes away with an area when
> > it is unmapped; whereas in KSM's case, the advice applies to whatever
> > happens to get mapped in the area specified, persisting across unmaps.
> 
> Given the apps using KSM tends to be quite special, the fact it's
> sticky, it doesn't go away with munmap isn't big deal, quite to the
> contrary those apps will likely have an easier time thanks to the
> registration not going away over munmap/mmap, without requiring
> reloading of malloc/new calls.

I don't have a fixed view on this: I'm open to it behaving differently
in this way (SuS on madvise certainly doesn't prohibit it), but suspect
we might end up better off behaving consistently.

> 
> To skip over holes during virtual scans we just vma->vm_next.

Is that in updates yet to come?  I see things like
	for (pages_count = 0; pages_count < slot->npages; ++pages_count)
and
		ksm_scan->page_index++;
which will, of course, eventually get across any hole and move into
vma->vm_next, but take vastly longer to do so than necessary.

> 
> > But I do appreciate the separation you've kept so far,
> > and wouldn't want to tie it all together too closely.
> 
> The above plus the fact it remains self contained without making the
> VM any more complicated, gives some value.

Yes, that's admirable, and shouldn't be discarded lightly.
But we do also need to be careful about letting subsystems go their
own way.  I was _very_ pleased to find it all sited in mm/, rather
than hidden away elsewhere, as I'd originally feared.

> Even swapping I'd like to
> add it without VM specific knowledge about KSM. tmpfs has an easier
> time because it has its own vma type, here we've KSM pages mixed
> inside regular anonymous !vma->vm_file regions and !vm_ops.

Well, whatever works out simplest, probably.  I think there's a good
chance that doing it with tmpfs pages will work out simplest, but
perfectly possible also that that's a no-go for some reason.

> 
> > p.s.  I wish you'd chosen different name than KSM - the kernel
> > has supported shared memory for many years - and notice ksm.c itself
> > says "Memory merging driver".  "Merge" would indeed have been a less
> > ambiguous term than "Share", but I think too late to change that now
> > - except possibly in the MADV_ flag names?
> 
> I don't actually care about names, so I leave this to other to discuss.

Hugh

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
@ 2009-05-06 14:25                 ` Hugh Dickins
  0 siblings, 0 replies; 108+ messages in thread
From: Hugh Dickins @ 2009-05-06 14:25 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Izik Eidus, Rik van Riel, akpm, linux-kernel, chrisw, alan,
	device, linux-mm, nickpiggin

On Wed, 6 May 2009, Andrea Arcangeli wrote:
> On Wed, May 06, 2009 at 12:16:52PM +0100, Hugh Dickins wrote:
> > I'm very much with those who suggested an madvise(), for which Chris
> > prepared a patch.  I know Andrea felt uneasy with an madvise() going
> > to a possibly-configured-out-or-never-loaded module, but it is just
> > advice, so I don't have a problem with that myself, so long as it
> > is documented in the manpage.
> 
> I don't have so much of a problem with that, but there are a couple of
> differences: normally madvise doesn't depend on the admin to start
> some kernel thread to be meaningful, and normally madvise isn't a
> privileged operation, see below.
> 
> > Whereas I do worry just what capability should be required for this:
> > can't a greedy app simply fork itself, touch all its pages, and thus
> > lock itself into memory in this way?  And I do worry about the cpu
> 
> KSM pages are supposed to be swappable in the long run so let's think
> longer term.

And in the interim, insist on capable(CAP_IPC_LOCK)?
If that's okay for KVM's usage, that's fine by me for now.

Whether not having privilege means it should fail or silently ignore
the advice it's been given, I'm not sure: fail appears more helpful, but
silently ignore may fit better with whether module has been loaded yet
(we can keep a list of what's registered, for when module is loaded).

> 
> > cost of all the scanning, if it were to get used more generally -
> > it would be a pity if we just advised complainers to tune it out.
> 
> Clearly if tons of apps maliciously register themself in ksm, they'll
> waste tons of CPU for no good, they'll just populate the unstable tree
> with pages that are all equal except for the last 4 bytes slowing down
> KSM for nothing.

You're right to be concerned about the malicious, but I was thinking
rather of apps just wanting to say they may contain a goodly number
of duplicate pages, and wanting to register themselves for merging,
no malice intended.

If only for my hacked-up testing, I'm interested in having a workable
system on which every process has opted the whole of its address space
into this merging: never be optimal, but I'd like workable.

> This is also why it's good to have a /dev/ksm ioctl
> that the admin can allow only certain users to use for registering
> virtual ranges (for example only the kvm/qemu user or all users in
> scientific environments). Otherwise we'd need some kind of permissions
> settings in sysfs with some API that certainly is less intuitive than
> chown/chmod on /dev/ksm. We just can't allow madvise to succeed on any
> luser registering itself in KSM, so if it was madvise, it shall return
> -EPERM somehow sometime.

I don't see a role for /dev/ksm any more.  I'm glad the administrative
tunables have now been switched over to sysfs, and think that should be
used for these restrictions too.

And please don't think of non-KVM users of KSM as malicious lusers!

> 
> > I'm still working my way through ksm.c, and not gone back to look at
> > Chris's madvise patch, but doubt it will be sufficient.  There's an
> > interesting difference between what you're doing in ksm.c, and how
> > madvise usually behaves, regarding unmapped areas: madvice doesn't
> > usually apply to an unmapped area, and goes away with an area when
> > it is unmapped; whereas in KSM's case, the advice applies to whatever
> > happens to get mapped in the area specified, persisting across unmaps.
> 
> Given the apps using KSM tends to be quite special, the fact it's
> sticky, it doesn't go away with munmap isn't big deal, quite to the
> contrary those apps will likely have an easier time thanks to the
> registration not going away over munmap/mmap, without requiring
> reloading of malloc/new calls.

I don't have a fixed view on this: I'm open to it behaving differently
in this way (SuS on madvise certainly doesn't prohibit it), but suspect
we might end up better off behaving consistently.

> 
> To skip over holes during virtual scans we just vma->vm_next.

Is that in updates yet to come?  I see things like
	for (pages_count = 0; pages_count < slot->npages; ++pages_count)
and
		ksm_scan->page_index++;
which will, of course, eventually get across any hole and move into
vma->vm_next, but take vastly longer to do so than necessary.

> 
> > But I do appreciate the separation you've kept so far,
> > and wouldn't want to tie it all together too closely.
> 
> The above plus the fact it remains self contained without making the
> VM any more complicated, gives some value.

Yes, that's admirable, and shouldn't be discarded lightly.
But we do also need to be careful about letting subsystems go their
own way.  I was _very_ pleased to find it all sited in mm/, rather
than hidden away elsewhere, as I'd originally feared.

> Even swapping I'd like to
> add it without VM specific knowledge about KSM. tmpfs has an easier
> time because it has its own vma type, here we've KSM pages mixed
> inside regular anonymous !vma->vm_file regions and !vm_ops.

Well, whatever works out simplest, probably.  I think there's a good
chance that doing it with tmpfs pages will work out simplest, but
perfectly possible also that that's a no-go for some reason.

> 
> > p.s.  I wish you'd chosen different name than KSM - the kernel
> > has supported shared memory for many years - and notice ksm.c itself
> > says "Memory merging driver".  "Merge" would indeed have been a less
> > ambiguous term than "Share", but I think too late to change that now
> > - except possibly in the MADV_ flag names?
> 
> I don't actually care about names, so I leave this to other to discuss.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-06 14:25                 ` Hugh Dickins
@ 2009-05-06 14:45                   ` Andrea Arcangeli
  -1 siblings, 0 replies; 108+ messages in thread
From: Andrea Arcangeli @ 2009-05-06 14:45 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Izik Eidus, Rik van Riel, akpm, linux-kernel, chrisw, alan,
	device, linux-mm, nickpiggin

On Wed, May 06, 2009 at 03:25:25PM +0100, Hugh Dickins wrote:
> And in the interim, insist on capable(CAP_IPC_LOCK)?
> If that's okay for KVM's usage, that's fine by me for now.

KVM has been in the kernel for years without being able to swap
reliably (if any spte mapped any anon page) and yet it didn't require
capable(CAP_IPC_LOCK).

Sure if we were to use the madvise syscall we'd be forced to make it
fail with -EPERM without capable(CAP_IPC_LOCK) but with the /dev/ksm
permissions I don't see big deal, it's definitely not worth requiring
userland changes given we'll make the ksm pages swappable later.

> Whether not having privilege means it should fail or silently ignore
> the advice it's been given, I'm not sure: fail appears more helpful, but

Fail surely is more helpful. The app is free to ignore the failure of
course! But there's no reason to forbid the app to know about it. Not
checking the 'rax' value when 'call' returns is good and fast enough.

> silently ignore may fit better with whether module has been loaded yet
> (we can keep a list of what's registered, for when module is loaded).

NOTE: it will not fail if the module isn't loaded yet. It must
succeed! Otherwise it would also need to fail after it succeeded if we
unload the module later...

> You're right to be concerned about the malicious, but I was thinking
> rather of apps just wanting to say they may contain a goodly number
> of duplicate pages, and wanting to register themselves for merging,
> no malice intended.

NOTE: it's not big deal if all users can register, admin still can
kill them and kksmd reschedule fine and it won't ever be noticeable. I
just think it's nicer if you don't give /dev/ksm to the whole world in
a system where you only use KSM for the KVM virtual machines and you
have lusers in the same system doing other stuff in the host. But
perhaps you're right and it's not worth ever returning -EPERM.

> If only for my hacked-up testing, I'm interested in having a workable
> system on which every process has opted the whole of its address space
> into this merging: never be optimal, but I'd like workable.

That's doable sure.

> And please don't think of non-KVM users of KSM as malicious lusers!

Sure not as we've provided feedback on non-KVM users too. Like I
already mentioned in previous email in scientific environments where
there's no malicious luser, ksm should be chowned 777 and given to
everyone.

> Is that in updates yet to come?  I see things like
> 	for (pages_count = 0; pages_count < slot->npages; ++pages_count)
> and
> 		ksm_scan->page_index++;
> which will, of course, eventually get across any hole and move into
> vma->vm_next, but take vastly longer to do so than necessary.

Update is yet to come, but this isn't relevant for the API.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
@ 2009-05-06 14:45                   ` Andrea Arcangeli
  0 siblings, 0 replies; 108+ messages in thread
From: Andrea Arcangeli @ 2009-05-06 14:45 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Izik Eidus, Rik van Riel, akpm, linux-kernel, chrisw, alan,
	device, linux-mm, nickpiggin

On Wed, May 06, 2009 at 03:25:25PM +0100, Hugh Dickins wrote:
> And in the interim, insist on capable(CAP_IPC_LOCK)?
> If that's okay for KVM's usage, that's fine by me for now.

KVM has been in the kernel for years without being able to swap
reliably (if any spte mapped any anon page) and yet it didn't require
capable(CAP_IPC_LOCK).

Sure if we were to use the madvise syscall we'd be forced to make it
fail with -EPERM without capable(CAP_IPC_LOCK) but with the /dev/ksm
permissions I don't see big deal, it's definitely not worth requiring
userland changes given we'll make the ksm pages swappable later.

> Whether not having privilege means it should fail or silently ignore
> the advice it's been given, I'm not sure: fail appears more helpful, but

Fail surely is more helpful. The app is free to ignore the failure of
course! But there's no reason to forbid the app to know about it. Not
checking the 'rax' value when 'call' returns is good and fast enough.

> silently ignore may fit better with whether module has been loaded yet
> (we can keep a list of what's registered, for when module is loaded).

NOTE: it will not fail if the module isn't loaded yet. It must
succeed! Otherwise it would also need to fail after it succeeded if we
unload the module later...

> You're right to be concerned about the malicious, but I was thinking
> rather of apps just wanting to say they may contain a goodly number
> of duplicate pages, and wanting to register themselves for merging,
> no malice intended.

NOTE: it's not big deal if all users can register, admin still can
kill them and kksmd reschedule fine and it won't ever be noticeable. I
just think it's nicer if you don't give /dev/ksm to the whole world in
a system where you only use KSM for the KVM virtual machines and you
have lusers in the same system doing other stuff in the host. But
perhaps you're right and it's not worth ever returning -EPERM.

> If only for my hacked-up testing, I'm interested in having a workable
> system on which every process has opted the whole of its address space
> into this merging: never be optimal, but I'd like workable.

That's doable sure.

> And please don't think of non-KVM users of KSM as malicious lusers!

Sure not as we've provided feedback on non-KVM users too. Like I
already mentioned in previous email in scientific environments where
there's no malicious luser, ksm should be chowned 777 and given to
everyone.

> Is that in updates yet to come?  I see things like
> 	for (pages_count = 0; pages_count < slot->npages; ++pages_count)
> and
> 		ksm_scan->page_index++;
> which will, of course, eventually get across any hole and move into
> vma->vm_next, but take vastly longer to do so than necessary.

Update is yet to come, but this isn't relevant for the API.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06 14:21                   ` Alan Cox
@ 2009-05-06 14:46                     ` Hugh Dickins
  -1 siblings, 0 replies; 108+ messages in thread
From: Hugh Dickins @ 2009-05-06 14:46 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andrea Arcangeli, Rik van Riel, Izik Eidus, akpm, linux-kernel,
	chrisw, device, linux-mm, nickpiggin

On Wed, 6 May 2009, Alan Cox wrote:
> > the max number of ksm pages that can be allocated at any given time so
> > to avoid OOM conditions, like the swap-compress logic that limits the
> > swapdevice size to less than ram.

(I don't know anything about that swap-compress logic and limitation.)

> 
> Are those pages accounted for in the vm_overcommit logic, as if you
> allocate a big chunk of memory as KSM will do you need the worst case
> vm_overcommit behaviour preserved and that means keeping the stats
> correct.

As I understand it, KSM won't affect the vm_overcommit behaviour at all.
Those pages Izik refers to are not allocated up front, they're just a
limit on the number of process pages which may get held in core at any
one time, through being shared via the KSM mechanism.

KSM is not evading vm_committed_space at all, not opening a backdoor
away from the ordinary mmaps: just collapsing duplicated pages in
what's been mapped in the usual way, down to single copies.

So the vm_commited_space accounting is exactly as before: it would
be a bit odd to be running KSM along with OVERCOMMIT_NEVER, but it
doesn't change its calculations at all - it will and will have to
be as pessimistic as it ever was.

The only difference would be in how much memory (mostly lowmem)
KSM's own data structures will take up - as usual, the kernel
data structures aren't being accounted, but do take up memory.

Hugh

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
@ 2009-05-06 14:46                     ` Hugh Dickins
  0 siblings, 0 replies; 108+ messages in thread
From: Hugh Dickins @ 2009-05-06 14:46 UTC (permalink / raw)
  To: Alan Cox
  Cc: Andrea Arcangeli, Rik van Riel, Izik Eidus, akpm, linux-kernel,
	chrisw, device, linux-mm, nickpiggin

On Wed, 6 May 2009, Alan Cox wrote:
> > the max number of ksm pages that can be allocated at any given time so
> > to avoid OOM conditions, like the swap-compress logic that limits the
> > swapdevice size to less than ram.

(I don't know anything about that swap-compress logic and limitation.)

> 
> Are those pages accounted for in the vm_overcommit logic, as if you
> allocate a big chunk of memory as KSM will do you need the worst case
> vm_overcommit behaviour preserved and that means keeping the stats
> correct.

As I understand it, KSM won't affect the vm_overcommit behaviour at all.
Those pages Izik refers to are not allocated up front, they're just a
limit on the number of process pages which may get held in core at any
one time, through being shared via the KSM mechanism.

KSM is not evading vm_committed_space at all, not opening a backdoor
away from the ordinary mmaps: just collapsing duplicated pages in
what's been mapped in the usual way, down to single copies.

So the vm_commited_space accounting is exactly as before: it would
be a bit odd to be running KSM along with OVERCOMMIT_NEVER, but it
doesn't change its calculations at all - it will and will have to
be as pessimistic as it ever was.

The only difference would be in how much memory (mostly lowmem)
KSM's own data structures will take up - as usual, the kernel
data structures aren't being accounted, but do take up memory.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06 14:46                     ` Hugh Dickins
@ 2009-05-06 14:56                       ` Andrea Arcangeli
  -1 siblings, 0 replies; 108+ messages in thread
From: Andrea Arcangeli @ 2009-05-06 14:56 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Alan Cox, Rik van Riel, Izik Eidus, akpm, linux-kernel, chrisw,
	device, linux-mm, nickpiggin

On Wed, May 06, 2009 at 03:46:31PM +0100, Hugh Dickins wrote:
> As I understand it, KSM won't affect the vm_overcommit behaviour at all.

In short vm_overcommit is a virtual thing, KSM only makes virtual
takes less physical than before. One issue in KSM that was mentioned
was the cgroup accounting if you merge two pages in different groups
but that is kind of a corner case and it'll be handled "somehow" :)

> The only difference would be in how much memory (mostly lowmem)
> KSM's own data structures will take up - as usual, the kernel
> data structures aren't being accounted, but do take up memory.

Oh yeah, on 32bit systems that would be a problem... That lowmem is
taken for eacy virtual address scanned. One more reason to still allow
ksm to all users only selectively through chown/chmod with ioctl or
sysfs permissions with syscall/madvise. Luckily most systems where ksm
is used are 64bit. We don't plan to kmap_atomic around the
rmap_item/tree_item. No ram is allocated in the holes though, so if
there's not a real anonymous page allocated the rmap_item will not be
allocated either (without requiring pending update ;).

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
@ 2009-05-06 14:56                       ` Andrea Arcangeli
  0 siblings, 0 replies; 108+ messages in thread
From: Andrea Arcangeli @ 2009-05-06 14:56 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Alan Cox, Rik van Riel, Izik Eidus, akpm, linux-kernel, chrisw,
	device, linux-mm, nickpiggin

On Wed, May 06, 2009 at 03:46:31PM +0100, Hugh Dickins wrote:
> As I understand it, KSM won't affect the vm_overcommit behaviour at all.

In short vm_overcommit is a virtual thing, KSM only makes virtual
takes less physical than before. One issue in KSM that was mentioned
was the cgroup accounting if you merge two pages in different groups
but that is kind of a corner case and it'll be handled "somehow" :)

> The only difference would be in how much memory (mostly lowmem)
> KSM's own data structures will take up - as usual, the kernel
> data structures aren't being accounted, but do take up memory.

Oh yeah, on 32bit systems that would be a problem... That lowmem is
taken for eacy virtual address scanned. One more reason to still allow
ksm to all users only selectively through chown/chmod with ioctl or
sysfs permissions with syscall/madvise. Luckily most systems where ksm
is used are 64bit. We don't plan to kmap_atomic around the
rmap_item/tree_item. No ram is allocated in the holes though, so if
there's not a real anonymous page allocated the rmap_item will not be
allocated either (without requiring pending update ;).

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06 14:46                     ` Hugh Dickins
@ 2009-05-06 14:57                       ` Izik Eidus
  -1 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-06 14:57 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Alan Cox, Andrea Arcangeli, Rik van Riel, akpm, linux-kernel,
	chrisw, device, linux-mm, nickpiggin

Hugh Dickins wrote:
> On Wed, 6 May 2009, Alan Cox wrote:
>   
>>> the max number of ksm pages that can be allocated at any given time so
>>> to avoid OOM conditions, like the swap-compress logic that limits the
>>> swapdevice size to less than ram.
>>>       
>
> (I don't know anything about that swap-compress logic and limitation.)
>
>   
>> Are those pages accounted for in the vm_overcommit logic, as if you
>> allocate a big chunk of memory as KSM will do you need the worst case
>> vm_overcommit behaviour preserved and that means keeping the stats
>> correct.
>>     
>
> As I understand it, KSM won't affect the vm_overcommit behaviour at all.
> Those pages Izik refers to are not allocated up front, they're just a
> limit on the number of process pages which may get held in core at any
> one time, through being shared via the KSM mechanism.
>   

Exactly, this pages are not swappable (now), so we allow the sysadmin to 
control the maximum value of them.
> KSM is not evading vm_committed_space at all, not opening a backdoor
> away from the ordinary mmaps: just collapsing duplicated pages in
> what's been mapped in the usual way, down to single copies.
>
> So the vm_commited_space accounting is exactly as before: it would
> be a bit odd to be running KSM along with OVERCOMMIT_NEVER, but it
> doesn't change its calculations at all - it will and will have to
> be as pessimistic as it ever was.
>
> The only difference would be in how much memory (mostly lowmem)
> KSM's own data structures will take up - as usual, the kernel
> data structures aren't being accounted, but do take up memory.
>
> Hugh
>   


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
@ 2009-05-06 14:57                       ` Izik Eidus
  0 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-06 14:57 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Alan Cox, Andrea Arcangeli, Rik van Riel, akpm, linux-kernel,
	chrisw, device, linux-mm, nickpiggin

Hugh Dickins wrote:
> On Wed, 6 May 2009, Alan Cox wrote:
>   
>>> the max number of ksm pages that can be allocated at any given time so
>>> to avoid OOM conditions, like the swap-compress logic that limits the
>>> swapdevice size to less than ram.
>>>       
>
> (I don't know anything about that swap-compress logic and limitation.)
>
>   
>> Are those pages accounted for in the vm_overcommit logic, as if you
>> allocate a big chunk of memory as KSM will do you need the worst case
>> vm_overcommit behaviour preserved and that means keeping the stats
>> correct.
>>     
>
> As I understand it, KSM won't affect the vm_overcommit behaviour at all.
> Those pages Izik refers to are not allocated up front, they're just a
> limit on the number of process pages which may get held in core at any
> one time, through being shared via the KSM mechanism.
>   

Exactly, this pages are not swappable (now), so we allow the sysadmin to 
control the maximum value of them.
> KSM is not evading vm_committed_space at all, not opening a backdoor
> away from the ordinary mmaps: just collapsing duplicated pages in
> what's been mapped in the usual way, down to single copies.
>
> So the vm_commited_space accounting is exactly as before: it would
> be a bit odd to be running KSM along with OVERCOMMIT_NEVER, but it
> doesn't change its calculations at all - it will and will have to
> be as pessimistic as it ever was.
>
> The only difference would be in how much memory (mostly lowmem)
> KSM's own data structures will take up - as usual, the kernel
> data structures aren't being accounted, but do take up memory.
>
> Hugh
>   

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-06 11:16             ` Hugh Dickins
@ 2009-05-06 15:27               ` Izik Eidus
  -1 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-06 15:27 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Rik van Riel, akpm, linux-kernel, aarcange, chrisw, alan, device,
	linux-mm, nickpiggin

Hugh Dickins wrote:
>
> If KSM is to behave in the usual madvise way, it'll need to be informed
> of unmaps.  And I suspect it may need to be informed of them, even if we
> let it continue to apply to empty address space.  Because even with your
> more limited unsigned int nrpages interface, the caller can specify an
> enormous range on 64-bit, and ksm.c be fully occupied just incrementing
> from one absent page to the next. 

That is a good point that i didnt think about it.
It is possible to make ksm "unmaped memory" aware by using find_vma(), 
and skipped non mapped area.
But that start to look bad... (I can make that just by every place that 
if_present_pte() fail, and then dont even hurt the scaning performence, 
beacuse i will just check it when the first virtual address is not present)

But why not go go step by step?
We can first start with this ioctl interface, later when we add swapping 
to the pages, we can have madvice, and still (probably easily) support 
the ioctls by just calling from inside ksm the madvice functions for 
that specific address)

I want to see ksm use madvice, but i believe it require some more 
changes to mm/*.c, so it probably better to start with merging it when 
it doesnt touch alot of stuff outisde ksm.c, and then to add swapping 
and after that add madvice support (when the pages are swappable, 
everyone can use it)

What you think about that?

>  mmap's vma ranges confine the space
> to be searched, and instantiated pagetables confine it further: I think
> you're either going to need to rely upon those to confine your search
> area, or else enhance your own data structures to confine it.
>
> But I do appreciate the separation you've kept so far,
> and wouldn't want to tie it all together too closely.
>
> Hugh
>
> p.s.  I wish you'd chosen different name than KSM - the kernel
> has supported shared memory for many years - and notice ksm.c itself
> says "Memory merging driver".  "Merge" would indeed have been a less
> ambiguous term than "Share", but I think too late to change that now
> - except possibly in the MADV_ flag names?
>   


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
@ 2009-05-06 15:27               ` Izik Eidus
  0 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-06 15:27 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Rik van Riel, akpm, linux-kernel, aarcange, chrisw, alan, device,
	linux-mm, nickpiggin

Hugh Dickins wrote:
>
> If KSM is to behave in the usual madvise way, it'll need to be informed
> of unmaps.  And I suspect it may need to be informed of them, even if we
> let it continue to apply to empty address space.  Because even with your
> more limited unsigned int nrpages interface, the caller can specify an
> enormous range on 64-bit, and ksm.c be fully occupied just incrementing
> from one absent page to the next. 

That is a good point that i didnt think about it.
It is possible to make ksm "unmaped memory" aware by using find_vma(), 
and skipped non mapped area.
But that start to look bad... (I can make that just by every place that 
if_present_pte() fail, and then dont even hurt the scaning performence, 
beacuse i will just check it when the first virtual address is not present)

But why not go go step by step?
We can first start with this ioctl interface, later when we add swapping 
to the pages, we can have madvice, and still (probably easily) support 
the ioctls by just calling from inside ksm the madvice functions for 
that specific address)

I want to see ksm use madvice, but i believe it require some more 
changes to mm/*.c, so it probably better to start with merging it when 
it doesnt touch alot of stuff outisde ksm.c, and then to add swapping 
and after that add madvice support (when the pages are swappable, 
everyone can use it)

What you think about that?

>  mmap's vma ranges confine the space
> to be searched, and instantiated pagetables confine it further: I think
> you're either going to need to rely upon those to confine your search
> area, or else enhance your own data structures to confine it.
>
> But I do appreciate the separation you've kept so far,
> and wouldn't want to tie it all together too closely.
>
> Hugh
>
> p.s.  I wish you'd chosen different name than KSM - the kernel
> has supported shared memory for many years - and notice ksm.c itself
> says "Memory merging driver".  "Merge" would indeed have been a less
> ambiguous term than "Share", but I think too late to change that now
> - except possibly in the MADV_ flag names?
>   

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-06 14:45                   ` Andrea Arcangeli
@ 2009-05-06 15:36                     ` Chris Wright
  -1 siblings, 0 replies; 108+ messages in thread
From: Chris Wright @ 2009-05-06 15:36 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Hugh Dickins, Izik Eidus, Rik van Riel, akpm, linux-kernel,
	chrisw, alan, device, linux-mm, nickpiggin

* Andrea Arcangeli (aarcange@redhat.com) wrote:
> On Wed, May 06, 2009 at 03:25:25PM +0100, Hugh Dickins wrote:
> > silently ignore may fit better with whether module has been loaded yet
> > (we can keep a list of what's registered, for when module is loaded).
> 
> NOTE: it will not fail if the module isn't loaded yet. It must
> succeed! Otherwise it would also need to fail after it succeeded if we
> unload the module later...

I actually see little value in it even being modular.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
@ 2009-05-06 15:36                     ` Chris Wright
  0 siblings, 0 replies; 108+ messages in thread
From: Chris Wright @ 2009-05-06 15:36 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Hugh Dickins, Izik Eidus, Rik van Riel, akpm, linux-kernel,
	chrisw, alan, device, linux-mm, nickpiggin

* Andrea Arcangeli (aarcange@redhat.com) wrote:
> On Wed, May 06, 2009 at 03:25:25PM +0100, Hugh Dickins wrote:
> > silently ignore may fit better with whether module has been loaded yet
> > (we can keep a list of what's registered, for when module is loaded).
> 
> NOTE: it will not fail if the module isn't loaded yet. It must
> succeed! Otherwise it would also need to fail after it succeeded if we
> unload the module later...

I actually see little value in it even being modular.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-06 15:27               ` Izik Eidus
@ 2009-05-06 16:14                 ` Chris Wright
  -1 siblings, 0 replies; 108+ messages in thread
From: Chris Wright @ 2009-05-06 16:14 UTC (permalink / raw)
  To: Izik Eidus
  Cc: Hugh Dickins, Rik van Riel, akpm, linux-kernel, aarcange, chrisw,
	alan, device, linux-mm, nickpiggin

* Izik Eidus (ieidus@redhat.com) wrote:
> But why not go go step by step?
> We can first start with this ioctl interface, later when we add swapping  
> to the pages, we can have madvice, and still (probably easily) support  
> the ioctls by just calling from inside ksm the madvice functions for  
> that specific address)

Then we have 2 interfaces to maintain.  Makes more sense to try and get
it right the first time.

> I want to see ksm use madvice, but i believe it require some more  
> changes to mm/*.c, so it probably better to start with merging it when  
> it doesnt touch alot of stuff outisde ksm.c, and then to add swapping  
> and after that add madvice support (when the pages are swappable,  
> everyone can use it)

There's already locking issues w/ using madvise and ksm, so yes,
changes would need to be made.  Some question of how (whether) to handle
registration of unmapped ranges, closest to say ->mm->def_flags=VM_MERGE.
My hunch is there's 2 cases users might care about, a specific range
(qemu-kvm, CERN app, etc) or the entire vma space of a process.  Another
question of what to do w/ VM_LOCKED, should that exclude VM_MERGE or
let user get what asked for?

thanks,
-chris

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
@ 2009-05-06 16:14                 ` Chris Wright
  0 siblings, 0 replies; 108+ messages in thread
From: Chris Wright @ 2009-05-06 16:14 UTC (permalink / raw)
  To: Izik Eidus
  Cc: Hugh Dickins, Rik van Riel, akpm, linux-kernel, aarcange, chrisw,
	alan, device, linux-mm, nickpiggin

* Izik Eidus (ieidus@redhat.com) wrote:
> But why not go go step by step?
> We can first start with this ioctl interface, later when we add swapping  
> to the pages, we can have madvice, and still (probably easily) support  
> the ioctls by just calling from inside ksm the madvice functions for  
> that specific address)

Then we have 2 interfaces to maintain.  Makes more sense to try and get
it right the first time.

> I want to see ksm use madvice, but i believe it require some more  
> changes to mm/*.c, so it probably better to start with merging it when  
> it doesnt touch alot of stuff outisde ksm.c, and then to add swapping  
> and after that add madvice support (when the pages are swappable,  
> everyone can use it)

There's already locking issues w/ using madvise and ksm, so yes,
changes would need to be made.  Some question of how (whether) to handle
registration of unmapped ranges, closest to say ->mm->def_flags=VM_MERGE.
My hunch is there's 2 cases users might care about, a specific range
(qemu-kvm, CERN app, etc) or the entire vma space of a process.  Another
question of what to do w/ VM_LOCKED, should that exclude VM_MERGE or
let user get what asked for?

thanks,
-chris

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-06 15:27               ` Izik Eidus
@ 2009-05-06 16:26                 ` Hugh Dickins
  -1 siblings, 0 replies; 108+ messages in thread
From: Hugh Dickins @ 2009-05-06 16:26 UTC (permalink / raw)
  To: Izik Eidus
  Cc: Rik van Riel, akpm, linux-kernel, aarcange, chrisw, alan, device,
	linux-mm, nickpiggin

On Wed, 6 May 2009, Izik Eidus wrote:
> 
> We can first start with this ioctl interface, later when we add swapping to
> the pages, we can have madvice, and still (probably easily) support the ioctls
> by just calling from inside ksm the madvice functions for that specific
> address)
> 
> I want to see ksm use madvice, but i believe it require some more changes to
> mm/*.c, so it probably better to start with merging it when it doesnt touch
> alot of stuff outisde ksm.c, and then to add swapping and after that add
> madvice support (when the pages are swappable, everyone can use it)
> 
> What you think about that?

I think it's the wrong order to follow.

The /dev/ksm interface is fine for your use while it's out of tree,
but we want to get the user interface right when bringing it into
mainline.  I recall Chris being very clear on that too.

Changing from /dev/ksm to madvise() is not a lot of work, it's mainly
a matter of deleting code, and tidying up interfaces which would need
more work anyway (I haven't commented on your curious -EPERMs yet!).

It doesn't involve whether you've enabled swapping or not - let's
forget the CAP_IPC_LOCK idea, and delegate that issue to limitation
via sysfs, along with the ability to limit wild overuse of the feature
- permissions on a sysfs node or something else?

It does nudge towards making some decisions which need to be made
anyway - that tends to be what a correct interface forces upon you.
Like the issue of whether to go on covering unmapped areas or not -
though possibly we could put off that decision, if it's doc'ed
for now.

And if it only covers mapped areas, then there will need to be a
VM_flag for it, mainly so mm can call into ksm.c when it's unmapped;
but I don't see it sinking hooks deeply into mm/*.c.

Hugh

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
@ 2009-05-06 16:26                 ` Hugh Dickins
  0 siblings, 0 replies; 108+ messages in thread
From: Hugh Dickins @ 2009-05-06 16:26 UTC (permalink / raw)
  To: Izik Eidus
  Cc: Rik van Riel, akpm, linux-kernel, aarcange, chrisw, alan, device,
	linux-mm, nickpiggin

On Wed, 6 May 2009, Izik Eidus wrote:
> 
> We can first start with this ioctl interface, later when we add swapping to
> the pages, we can have madvice, and still (probably easily) support the ioctls
> by just calling from inside ksm the madvice functions for that specific
> address)
> 
> I want to see ksm use madvice, but i believe it require some more changes to
> mm/*.c, so it probably better to start with merging it when it doesnt touch
> alot of stuff outisde ksm.c, and then to add swapping and after that add
> madvice support (when the pages are swappable, everyone can use it)
> 
> What you think about that?

I think it's the wrong order to follow.

The /dev/ksm interface is fine for your use while it's out of tree,
but we want to get the user interface right when bringing it into
mainline.  I recall Chris being very clear on that too.

Changing from /dev/ksm to madvise() is not a lot of work, it's mainly
a matter of deleting code, and tidying up interfaces which would need
more work anyway (I haven't commented on your curious -EPERMs yet!).

It doesn't involve whether you've enabled swapping or not - let's
forget the CAP_IPC_LOCK idea, and delegate that issue to limitation
via sysfs, along with the ability to limit wild overuse of the feature
- permissions on a sysfs node or something else?

It does nudge towards making some decisions which need to be made
anyway - that tends to be what a correct interface forces upon you.
Like the issue of whether to go on covering unmapped areas or not -
though possibly we could put off that decision, if it's doc'ed
for now.

And if it only covers mapped areas, then there will need to be a
VM_flag for it, mainly so mm can call into ksm.c when it's unmapped;
but I don't see it sinking hooks deeply into mm/*.c.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-06 16:14                 ` Chris Wright
@ 2009-05-06 16:36                   ` Hugh Dickins
  -1 siblings, 0 replies; 108+ messages in thread
From: Hugh Dickins @ 2009-05-06 16:36 UTC (permalink / raw)
  To: Chris Wright
  Cc: Izik Eidus, Rik van Riel, akpm, linux-kernel, aarcange, alan,
	device, linux-mm, nickpiggin

On Wed, 6 May 2009, Chris Wright wrote:
> 
> There's already locking issues w/ using madvise and ksm, so yes,
> changes would need to be made.  Some question of how (whether) to handle
> registration of unmapped ranges, closest to say ->mm->def_flags=VM_MERGE.
> My hunch is there's 2 cases users might care about, a specific range
> (qemu-kvm, CERN app, etc) or the entire vma space of a process.

Good food for thought there, but not on my mind at this moment.

> Another
> question of what to do w/ VM_LOCKED, should that exclude VM_MERGE or
> let user get what asked for?

What's the issue with VM_LOCKED?  We wouldn't want to merge a page
while it was under get_user_pages (unless KSM's own, but ignore that),
but what's the deal with VM_LOCKED?

Is the phrase "covert channel" going to come up somehow?

Hugh

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
@ 2009-05-06 16:36                   ` Hugh Dickins
  0 siblings, 0 replies; 108+ messages in thread
From: Hugh Dickins @ 2009-05-06 16:36 UTC (permalink / raw)
  To: Chris Wright
  Cc: Izik Eidus, Rik van Riel, akpm, linux-kernel, aarcange, alan,
	device, linux-mm, nickpiggin

On Wed, 6 May 2009, Chris Wright wrote:
> 
> There's already locking issues w/ using madvise and ksm, so yes,
> changes would need to be made.  Some question of how (whether) to handle
> registration of unmapped ranges, closest to say ->mm->def_flags=VM_MERGE.
> My hunch is there's 2 cases users might care about, a specific range
> (qemu-kvm, CERN app, etc) or the entire vma space of a process.

Good food for thought there, but not on my mind at this moment.

> Another
> question of what to do w/ VM_LOCKED, should that exclude VM_MERGE or
> let user get what asked for?

What's the issue with VM_LOCKED?  We wouldn't want to merge a page
while it was under get_user_pages (unless KSM's own, but ignore that),
but what's the deal with VM_LOCKED?

Is the phrase "covert channel" going to come up somehow?

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-06 13:56                 ` Izik Eidus
@ 2009-05-06 16:41                   ` Hugh Dickins
  -1 siblings, 0 replies; 108+ messages in thread
From: Hugh Dickins @ 2009-05-06 16:41 UTC (permalink / raw)
  To: Izik Eidus
  Cc: Andrea Arcangeli, Rik van Riel, akpm, linux-kernel, chrisw, alan,
	device, linux-mm, nickpiggin

On Wed, 6 May 2009, Izik Eidus wrote:
> Andrea Arcangeli wrote:
> > On Wed, May 06, 2009 at 12:16:52PM +0100, Hugh Dickins wrote:
> >   
> >
> > > p.s.  I wish you'd chosen different name than KSM - the kernel
> > > has supported shared memory for many years - and notice ksm.c itself
> > > says "Memory merging driver".  "Merge" would indeed have been a less
> > > ambiguous term than "Share", but I think too late to change that now
> > > - except possibly in the MADV_ flag names?
> > >     
> >
> > I don't actually care about names, so I leave this to other to discuss.
> >   
> Well, There is no real problem changing the name, any suggestions?

mm/merge.c or mm/mmerge.c: the module formerly known as KSM ?

Hugh

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
@ 2009-05-06 16:41                   ` Hugh Dickins
  0 siblings, 0 replies; 108+ messages in thread
From: Hugh Dickins @ 2009-05-06 16:41 UTC (permalink / raw)
  To: Izik Eidus
  Cc: Andrea Arcangeli, Rik van Riel, akpm, linux-kernel, chrisw, alan,
	device, linux-mm, nickpiggin

On Wed, 6 May 2009, Izik Eidus wrote:
> Andrea Arcangeli wrote:
> > On Wed, May 06, 2009 at 12:16:52PM +0100, Hugh Dickins wrote:
> >   
> >
> > > p.s.  I wish you'd chosen different name than KSM - the kernel
> > > has supported shared memory for many years - and notice ksm.c itself
> > > says "Memory merging driver".  "Merge" would indeed have been a less
> > > ambiguous term than "Share", but I think too late to change that now
> > > - except possibly in the MADV_ flag names?
> > >     
> >
> > I don't actually care about names, so I leave this to other to discuss.
> >   
> Well, There is no real problem changing the name, any suggestions?

mm/merge.c or mm/mmerge.c: the module formerly known as KSM ?

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-06 16:41                   ` Hugh Dickins
@ 2009-05-06 16:49                     ` Chris Wright
  -1 siblings, 0 replies; 108+ messages in thread
From: Chris Wright @ 2009-05-06 16:49 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Izik Eidus, Andrea Arcangeli, Rik van Riel, akpm, linux-kernel,
	chrisw, alan, device, linux-mm, nickpiggin

* Hugh Dickins (hugh@veritas.com) wrote:
> On Wed, 6 May 2009, Izik Eidus wrote:
> > Andrea Arcangeli wrote:
> > > On Wed, May 06, 2009 at 12:16:52PM +0100, Hugh Dickins wrote:
> > >   
> > >
> > > > p.s.  I wish you'd chosen different name than KSM - the kernel
> > > > has supported shared memory for many years - and notice ksm.c itself
> > > > says "Memory merging driver".  "Merge" would indeed have been a less
> > > > ambiguous term than "Share", but I think too late to change that now
> > > > - except possibly in the MADV_ flag names?
> > > >     
> > >
> > > I don't actually care about names, so I leave this to other to discuss.
> > >   
> > Well, There is no real problem changing the name, any suggestions?
> 
> mm/merge.c or mm/mmerge.c: the module formerly known as KSM ?

I like merge.  For madvise() approach I had used:

+#define MADV_SHAREABLE 12              /* can share identical pages */
+#define MADV_UNSHAREABLE 13            /* can not share identical pages

But those are maybe better put as MADV_(UN)MERGEABLE (gets a bit confusing when
you talk of merging vmas ;-)
*/

thanks,
-chris

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
@ 2009-05-06 16:49                     ` Chris Wright
  0 siblings, 0 replies; 108+ messages in thread
From: Chris Wright @ 2009-05-06 16:49 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Izik Eidus, Andrea Arcangeli, Rik van Riel, akpm, linux-kernel,
	chrisw, alan, device, linux-mm, nickpiggin

* Hugh Dickins (hugh@veritas.com) wrote:
> On Wed, 6 May 2009, Izik Eidus wrote:
> > Andrea Arcangeli wrote:
> > > On Wed, May 06, 2009 at 12:16:52PM +0100, Hugh Dickins wrote:
> > >   
> > >
> > > > p.s.  I wish you'd chosen different name than KSM - the kernel
> > > > has supported shared memory for many years - and notice ksm.c itself
> > > > says "Memory merging driver".  "Merge" would indeed have been a less
> > > > ambiguous term than "Share", but I think too late to change that now
> > > > - except possibly in the MADV_ flag names?
> > > >     
> > >
> > > I don't actually care about names, so I leave this to other to discuss.
> > >   
> > Well, There is no real problem changing the name, any suggestions?
> 
> mm/merge.c or mm/mmerge.c: the module formerly known as KSM ?

I like merge.  For madvise() approach I had used:

+#define MADV_SHAREABLE 12              /* can share identical pages */
+#define MADV_UNSHAREABLE 13            /* can not share identical pages

But those are maybe better put as MADV_(UN)MERGEABLE (gets a bit confusing when
you talk of merging vmas ;-)
*/

thanks,
-chris

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-06 16:49                     ` Chris Wright
@ 2009-05-06 16:57                       ` Hugh Dickins
  -1 siblings, 0 replies; 108+ messages in thread
From: Hugh Dickins @ 2009-05-06 16:57 UTC (permalink / raw)
  To: Chris Wright
  Cc: Izik Eidus, Andrea Arcangeli, Rik van Riel, akpm, linux-kernel,
	alan, device, linux-mm, nickpiggin

On Wed, 6 May 2009, Chris Wright wrote:
> 
> I like merge.  For madvise() approach I had used:
> 
> +#define MADV_SHAREABLE 12              /* can share identical pages */
> +#define MADV_UNSHAREABLE 13            /* can not share identical pages */
> 
> But those are maybe better put as MADV_(UN)MERGEABLE
> (gets a bit confusing when you talk of merging vmas ;-)

That's true, I hadn't remembered that.  Not _very_ confusing,
but the last thing I'd want is to bully everyone into changing
their familiar name for something, and the result just as poor.
No need to decide today anyway: let's see how others feel.

Hugh

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
@ 2009-05-06 16:57                       ` Hugh Dickins
  0 siblings, 0 replies; 108+ messages in thread
From: Hugh Dickins @ 2009-05-06 16:57 UTC (permalink / raw)
  To: Chris Wright
  Cc: Izik Eidus, Andrea Arcangeli, Rik van Riel, akpm, linux-kernel,
	alan, device, linux-mm, nickpiggin

On Wed, 6 May 2009, Chris Wright wrote:
> 
> I like merge.  For madvise() approach I had used:
> 
> +#define MADV_SHAREABLE 12              /* can share identical pages */
> +#define MADV_UNSHAREABLE 13            /* can not share identical pages */
> 
> But those are maybe better put as MADV_(UN)MERGEABLE
> (gets a bit confusing when you talk of merging vmas ;-)

That's true, I hadn't remembered that.  Not _very_ confusing,
but the last thing I'd want is to bully everyone into changing
their familiar name for something, and the result just as poor.
No need to decide today anyway: let's see how others feel.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-06 16:26                 ` Hugh Dickins
@ 2009-05-06 16:58                   ` Izik Eidus
  -1 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-06 16:58 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Rik van Riel, akpm, linux-kernel, aarcange, chrisw, alan, device,
	linux-mm, nickpiggin

Hugh Dickins wrote:
> On Wed, 6 May 2009, Izik Eidus wrote:
>   
>> We can first start with this ioctl interface, later when we add swapping to
>> the pages, we can have madvice, and still (probably easily) support the ioctls
>> by just calling from inside ksm the madvice functions for that specific
>> address)
>>
>> I want to see ksm use madvice, but i believe it require some more changes to
>> mm/*.c, so it probably better to start with merging it when it doesnt touch
>> alot of stuff outisde ksm.c, and then to add swapping and after that add
>> madvice support (when the pages are swappable, everyone can use it)
>>
>> What you think about that?
>>     
>
> I think it's the wrong order to follow.
>
> The /dev/ksm interface is fine for your use while it's out of tree,
> but we want to get the user interface right when bringing it into
> mainline.  I recall Chris being very clear on that too.
>
> Changing from /dev/ksm to madvise() is not a lot of work, it's mainly
> a matter of deleting code, and tidying up interfaces which would need
> more work anyway (I haven't commented on your curious -EPERMs yet!).
>
> It doesn't involve whether you've enabled swapping or not - let's
> forget the CAP_IPC_LOCK idea, and delegate that issue to limitation
> via sysfs, along with the ability to limit wild overuse of the feature
> - permissions on a sysfs node or something else?
>
> It does nudge towards making some decisions which need to be made
> anyway - that tends to be what a correct interface forces upon you.
> Like the issue of whether to go on covering unmapped areas or not -
> though possibly we could put off that decision, if it's doc'ed
> for now.
>
> And if it only covers mapped areas, then there will need to be a
> VM_flag for it, mainly so mm can call into ksm.c when it's unmapped;
> but I don't see it sinking hooks deeply into mm/*.c.
>
> Hugh
>   
Ok, i give up, lets move to madvice(), i will write a patch that move 
the whole thing into madvice after i finish here something, but that 
ofcurse only if Andrea agree for the move?

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
@ 2009-05-06 16:58                   ` Izik Eidus
  0 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-06 16:58 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Rik van Riel, akpm, linux-kernel, aarcange, chrisw, alan, device,
	linux-mm, nickpiggin

Hugh Dickins wrote:
> On Wed, 6 May 2009, Izik Eidus wrote:
>   
>> We can first start with this ioctl interface, later when we add swapping to
>> the pages, we can have madvice, and still (probably easily) support the ioctls
>> by just calling from inside ksm the madvice functions for that specific
>> address)
>>
>> I want to see ksm use madvice, but i believe it require some more changes to
>> mm/*.c, so it probably better to start with merging it when it doesnt touch
>> alot of stuff outisde ksm.c, and then to add swapping and after that add
>> madvice support (when the pages are swappable, everyone can use it)
>>
>> What you think about that?
>>     
>
> I think it's the wrong order to follow.
>
> The /dev/ksm interface is fine for your use while it's out of tree,
> but we want to get the user interface right when bringing it into
> mainline.  I recall Chris being very clear on that too.
>
> Changing from /dev/ksm to madvise() is not a lot of work, it's mainly
> a matter of deleting code, and tidying up interfaces which would need
> more work anyway (I haven't commented on your curious -EPERMs yet!).
>
> It doesn't involve whether you've enabled swapping or not - let's
> forget the CAP_IPC_LOCK idea, and delegate that issue to limitation
> via sysfs, along with the ability to limit wild overuse of the feature
> - permissions on a sysfs node or something else?
>
> It does nudge towards making some decisions which need to be made
> anyway - that tends to be what a correct interface forces upon you.
> Like the issue of whether to go on covering unmapped areas or not -
> though possibly we could put off that decision, if it's doc'ed
> for now.
>
> And if it only covers mapped areas, then there will need to be a
> VM_flag for it, mainly so mm can call into ksm.c when it's unmapped;
> but I don't see it sinking hooks deeply into mm/*.c.
>
> Hugh
>   
Ok, i give up, lets move to madvice(), i will write a patch that move 
the whole thing into madvice after i finish here something, but that 
ofcurse only if Andrea agree for the move?

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-06 16:49                     ` Chris Wright
@ 2009-05-06 16:59                       ` Izik Eidus
  -1 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-06 16:59 UTC (permalink / raw)
  To: Chris Wright
  Cc: Hugh Dickins, Andrea Arcangeli, Rik van Riel, akpm, linux-kernel,
	alan, device, linux-mm, nickpiggin

Chris Wright wrote:
> * Hugh Dickins (hugh@veritas.com) wrote:
>   
>> On Wed, 6 May 2009, Izik Eidus wrote:
>>     
>>> Andrea Arcangeli wrote:
>>>       
>>>> On Wed, May 06, 2009 at 12:16:52PM +0100, Hugh Dickins wrote:
>>>>   
>>>>
>>>>         
>>>>> p.s.  I wish you'd chosen different name than KSM - the kernel
>>>>> has supported shared memory for many years - and notice ksm.c itself
>>>>> says "Memory merging driver".  "Merge" would indeed have been a less
>>>>> ambiguous term than "Share", but I think too late to change that now
>>>>> - except possibly in the MADV_ flag names?
>>>>>     
>>>>>           
>>>> I don't actually care about names, so I leave this to other to discuss.
>>>>   
>>>>         
>>> Well, There is no real problem changing the name, any suggestions?
>>>       
>> mm/merge.c or mm/mmerge.c: the module formerly known as KSM ?
>>     
>
> I like merge.
Sold !.
>
> thanks,
> -chris
>   


^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
@ 2009-05-06 16:59                       ` Izik Eidus
  0 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-06 16:59 UTC (permalink / raw)
  To: Chris Wright
  Cc: Hugh Dickins, Andrea Arcangeli, Rik van Riel, akpm, linux-kernel,
	alan, device, linux-mm, nickpiggin

Chris Wright wrote:
> * Hugh Dickins (hugh@veritas.com) wrote:
>   
>> On Wed, 6 May 2009, Izik Eidus wrote:
>>     
>>> Andrea Arcangeli wrote:
>>>       
>>>> On Wed, May 06, 2009 at 12:16:52PM +0100, Hugh Dickins wrote:
>>>>   
>>>>
>>>>         
>>>>> p.s.  I wish you'd chosen different name than KSM - the kernel
>>>>> has supported shared memory for many years - and notice ksm.c itself
>>>>> says "Memory merging driver".  "Merge" would indeed have been a less
>>>>> ambiguous term than "Share", but I think too late to change that now
>>>>> - except possibly in the MADV_ flag names?
>>>>>     
>>>>>           
>>>> I don't actually care about names, so I leave this to other to discuss.
>>>>   
>>>>         
>>> Well, There is no real problem changing the name, any suggestions?
>>>       
>> mm/merge.c or mm/mmerge.c: the module formerly known as KSM ?
>>     
>
> I like merge.
Sold !.
>
> thanks,
> -chris
>   

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-06 16:36                   ` Hugh Dickins
@ 2009-05-06 17:09                     ` Chris Wright
  -1 siblings, 0 replies; 108+ messages in thread
From: Chris Wright @ 2009-05-06 17:09 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Chris Wright, Izik Eidus, Rik van Riel, akpm, linux-kernel,
	aarcange, alan, device, linux-mm, nickpiggin

* Hugh Dickins (hugh@veritas.com) wrote:
> On Wed, 6 May 2009, Chris Wright wrote:
> > Another
> > question of what to do w/ VM_LOCKED, should that exclude VM_MERGE or
> > let user get what asked for?
> 
> What's the issue with VM_LOCKED?  We wouldn't want to merge a page
> while it was under get_user_pages (unless KSM's own, but ignore that),
> but what's the deal with VM_LOCKED?
> 
> Is the phrase "covert channel" going to come up somehow?

There's two (still hand wavy) conerns I see there.  First is the security
implication: timing writes to see cow and guess the shared data for
another apps VM_LOCKED region, second is just plain old complaints of
those rt latency sensitive apps that somehow have VM_LOCKED|VM_MERGE
and complain of COW fault time, probably just "don't do that".

thanks,
-chris

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
@ 2009-05-06 17:09                     ` Chris Wright
  0 siblings, 0 replies; 108+ messages in thread
From: Chris Wright @ 2009-05-06 17:09 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Chris Wright, Izik Eidus, Rik van Riel, akpm, linux-kernel,
	aarcange, alan, device, linux-mm, nickpiggin

* Hugh Dickins (hugh@veritas.com) wrote:
> On Wed, 6 May 2009, Chris Wright wrote:
> > Another
> > question of what to do w/ VM_LOCKED, should that exclude VM_MERGE or
> > let user get what asked for?
> 
> What's the issue with VM_LOCKED?  We wouldn't want to merge a page
> while it was under get_user_pages (unless KSM's own, but ignore that),
> but what's the deal with VM_LOCKED?
> 
> Is the phrase "covert channel" going to come up somehow?

There's two (still hand wavy) conerns I see there.  First is the security
implication: timing writes to see cow and guess the shared data for
another apps VM_LOCKED region, second is just plain old complaints of
those rt latency sensitive apps that somehow have VM_LOCKED|VM_MERGE
and complain of COW fault time, probably just "don't do that".

thanks,
-chris

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06 14:02                 ` Izik Eidus
@ 2009-05-06 17:11                   ` Hugh Dickins
  -1 siblings, 0 replies; 108+ messages in thread
From: Hugh Dickins @ 2009-05-06 17:11 UTC (permalink / raw)
  To: Izik Eidus
  Cc: Andrea Arcangeli, Rik van Riel, akpm, linux-kernel, chrisw, alan,
	device, linux-mm, nickpiggin

On Wed, 6 May 2009, Izik Eidus wrote:
> Hugh Dickins wrote:
> >
> > There may prove to be various reasons why it wouldn't work out in practice;
> > but when thinking of swapping them, it is worth considering if those KSM
> > pages can just be assigned to a tmpfs file, then leave the swapping to that.
> 
> The problem here (as i see it) is reverse mapping for this vmas that point
> into the shared page.
> Right now linux use the ->index to find this pages and then unpresent them...
> But even if we move into allocating them inside tmpfs, who will know how to
> unpresent the virtual addresses when we want to swap the page?

Yes, you're right, tmpfs wouldn't be helping you at all with that problem,
so doubtful whether it has any help to offer here.

Hugh

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
@ 2009-05-06 17:11                   ` Hugh Dickins
  0 siblings, 0 replies; 108+ messages in thread
From: Hugh Dickins @ 2009-05-06 17:11 UTC (permalink / raw)
  To: Izik Eidus
  Cc: Andrea Arcangeli, Rik van Riel, akpm, linux-kernel, chrisw, alan,
	device, linux-mm, nickpiggin

On Wed, 6 May 2009, Izik Eidus wrote:
> Hugh Dickins wrote:
> >
> > There may prove to be various reasons why it wouldn't work out in practice;
> > but when thinking of swapping them, it is worth considering if those KSM
> > pages can just be assigned to a tmpfs file, then leave the swapping to that.
> 
> The problem here (as i see it) is reverse mapping for this vmas that point
> into the shared page.
> Right now linux use the ->index to find this pages and then unpresent them...
> But even if we move into allocating them inside tmpfs, who will know how to
> unpresent the virtual addresses when we want to swap the page?

Yes, you're right, tmpfs wouldn't be helping you at all with that problem,
so doubtful whether it has any help to offer here.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-06 16:57                       ` Hugh Dickins
@ 2009-05-06 17:47                         ` Andrea Arcangeli
  -1 siblings, 0 replies; 108+ messages in thread
From: Andrea Arcangeli @ 2009-05-06 17:47 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Chris Wright, Izik Eidus, Rik van Riel, akpm, linux-kernel, alan,
	device, linux-mm, nickpiggin

On Wed, May 06, 2009 at 05:57:14PM +0100, Hugh Dickins wrote:
> No need to decide today anyway: let's see how others feel.

It guess is decided today, the length of this thread is enough
regardless of its content. To me madvise looks overdesign but then I'm
also sure KSM is here to stay and it'll soon swap too, so I think it's
no problem at all, if part of it is always present in the VM
core. KSM=N must stay for embedded. Not sure if it worth to keep
KSM=M.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
@ 2009-05-06 17:47                         ` Andrea Arcangeli
  0 siblings, 0 replies; 108+ messages in thread
From: Andrea Arcangeli @ 2009-05-06 17:47 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Chris Wright, Izik Eidus, Rik van Riel, akpm, linux-kernel, alan,
	device, linux-mm, nickpiggin

On Wed, May 06, 2009 at 05:57:14PM +0100, Hugh Dickins wrote:
> No need to decide today anyway: let's see how others feel.

It guess is decided today, the length of this thread is enough
regardless of its content. To me madvise looks overdesign but then I'm
also sure KSM is here to stay and it'll soon swap too, so I think it's
no problem at all, if part of it is always present in the VM
core. KSM=N must stay for embedded. Not sure if it worth to keep
KSM=M.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-06 17:09                     ` Chris Wright
@ 2009-05-06 17:54                       ` Hugh Dickins
  -1 siblings, 0 replies; 108+ messages in thread
From: Hugh Dickins @ 2009-05-06 17:54 UTC (permalink / raw)
  To: Chris Wright
  Cc: Izik Eidus, Rik van Riel, akpm, linux-kernel, aarcange, alan,
	device, linux-mm, nickpiggin

On Wed, 6 May 2009, Chris Wright wrote:
> * Hugh Dickins (hugh@veritas.com) wrote:
> > 
> > Is the phrase "covert channel" going to come up somehow?
> 
> There's two (still hand wavy) conerns I see there.  First is the security
> implication: timing writes to see cow and guess the shared data for
> another apps VM_LOCKED region,

Mmm, yes, there's fun to be had there; though I don't see it as having
anything to do with VM_LOCKED, beyond that the paranoid have reason to
place their most anxious data in VM_LOCKED areas.

I'm thinking of an app which prepares pages full of scurrilous rumour,
then waits around looking at its /proc/self/smaps to see if anyone else
is writing stories like that!

> second is just plain old complaints of
> those rt latency sensitive apps that somehow have VM_LOCKED|VM_MERGE
> and complain of COW fault time, probably just "don't do that".

Right.  There are sensitive sites which ought not to configure such
merging on; but I don't think we should disallow merging locked.

Hugh

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
@ 2009-05-06 17:54                       ` Hugh Dickins
  0 siblings, 0 replies; 108+ messages in thread
From: Hugh Dickins @ 2009-05-06 17:54 UTC (permalink / raw)
  To: Chris Wright
  Cc: Izik Eidus, Rik van Riel, akpm, linux-kernel, aarcange, alan,
	device, linux-mm, nickpiggin

On Wed, 6 May 2009, Chris Wright wrote:
> * Hugh Dickins (hugh@veritas.com) wrote:
> > 
> > Is the phrase "covert channel" going to come up somehow?
> 
> There's two (still hand wavy) conerns I see there.  First is the security
> implication: timing writes to see cow and guess the shared data for
> another apps VM_LOCKED region,

Mmm, yes, there's fun to be had there; though I don't see it as having
anything to do with VM_LOCKED, beyond that the paranoid have reason to
place their most anxious data in VM_LOCKED areas.

I'm thinking of an app which prepares pages full of scurrilous rumour,
then waits around looking at its /proc/self/smaps to see if anyone else
is writing stories like that!

> second is just plain old complaints of
> those rt latency sensitive apps that somehow have VM_LOCKED|VM_MERGE
> and complain of COW fault time, probably just "don't do that".

Right.  There are sensitive sites which ought not to configure such
merging on; but I don't think we should disallow merging locked.

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06 14:56                       ` Andrea Arcangeli
@ 2009-05-06 23:55                         ` Minchan Kim
  -1 siblings, 0 replies; 108+ messages in thread
From: Minchan Kim @ 2009-05-06 23:55 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Hugh Dickins, Alan Cox, Rik van Riel, Izik Eidus, akpm,
	linux-kernel, chrisw, device, linux-mm, nickpiggin

Hi, Andrea.

On Wed, 6 May 2009 16:56:42 +0200
Andrea Arcangeli <aarcange@redhat.com> wrote:

> On Wed, May 06, 2009 at 03:46:31PM +0100, Hugh Dickins wrote:
> > As I understand it, KSM won't affect the vm_overcommit behaviour at all.
> 
> In short vm_overcommit is a virtual thing, KSM only makes virtual
> takes less physical than before. One issue in KSM that was mentioned
> was the cgroup accounting if you merge two pages in different groups
> but that is kind of a corner case and it'll be handled "somehow" :)
> 
> > The only difference would be in how much memory (mostly lowmem)
> > KSM's own data structures will take up - as usual, the kernel
> > data structures aren't being accounted, but do take up memory.
> 
> Oh yeah, on 32bit systems that would be a problem... That lowmem is
> taken for eacy virtual address scanned. One more reason to still allow
> ksm to all users only selectively through chown/chmod with ioctl or
> sysfs permissions with syscall/madvise. Luckily most systems where ksm
> is used are 64bit. We don't plan to kmap_atomic around the
> rmap_item/tree_item. No ram is allocated in the holes though, so if

Hmm. Don't you consider 32-bit system ?

In http://www.mail-archive.com/kvm@vger.kernel.org/msg13043.html, 
Jared siad, it's also good in embedded system. (but I don't know well his testing environement).
Many embedded system is so I/O bouneded that we can use much CPU time in there. 
I hope this feature will help saving memory in embedded system. 

One more thing about interface. 

Ksm map regions are dynamic characteritic ?
I mean sometime A application calls ioctl(0x800000, 0x10000) and sometime it calls ioctl(0xb7000000, 0x20000);
Of course, It depends on application's behavior. 

For using this feature now, we have to add ioctl and recompile applications.
It means we have to know application internal well and to need source code. 
It would prevent various experiements and easy use.

I want to use this feature without appliation internal knowledge easily. 
Maybe it can be useless without appliation behavior knowledge.
But it will help various application experiments without much knowledge of application and recompile. 

ex) echo 'pid 0x8050000 0x100000' > sysfs or procfs or cgroup. 

Personally, I support cgroup interface but don't have a good idea now. 
It can help fork-like application and we can group same address of KSM range among tasks. 

> there's not a real anonymous page allocated the rmap_item will not be
> allocated either (without requiring pending update ;).
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


-- 
Kinds Regards
Minchan Kim

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
@ 2009-05-06 23:55                         ` Minchan Kim
  0 siblings, 0 replies; 108+ messages in thread
From: Minchan Kim @ 2009-05-06 23:55 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Hugh Dickins, Alan Cox, Rik van Riel, Izik Eidus, akpm,
	linux-kernel, chrisw, device, linux-mm, nickpiggin

Hi, Andrea.

On Wed, 6 May 2009 16:56:42 +0200
Andrea Arcangeli <aarcange@redhat.com> wrote:

> On Wed, May 06, 2009 at 03:46:31PM +0100, Hugh Dickins wrote:
> > As I understand it, KSM won't affect the vm_overcommit behaviour at all.
> 
> In short vm_overcommit is a virtual thing, KSM only makes virtual
> takes less physical than before. One issue in KSM that was mentioned
> was the cgroup accounting if you merge two pages in different groups
> but that is kind of a corner case and it'll be handled "somehow" :)
> 
> > The only difference would be in how much memory (mostly lowmem)
> > KSM's own data structures will take up - as usual, the kernel
> > data structures aren't being accounted, but do take up memory.
> 
> Oh yeah, on 32bit systems that would be a problem... That lowmem is
> taken for eacy virtual address scanned. One more reason to still allow
> ksm to all users only selectively through chown/chmod with ioctl or
> sysfs permissions with syscall/madvise. Luckily most systems where ksm
> is used are 64bit. We don't plan to kmap_atomic around the
> rmap_item/tree_item. No ram is allocated in the holes though, so if

Hmm. Don't you consider 32-bit system ?

In http://www.mail-archive.com/kvm@vger.kernel.org/msg13043.html, 
Jared siad, it's also good in embedded system. (but I don't know well his testing environement).
Many embedded system is so I/O bouneded that we can use much CPU time in there. 
I hope this feature will help saving memory in embedded system. 

One more thing about interface. 

Ksm map regions are dynamic characteritic ?
I mean sometime A application calls ioctl(0x800000, 0x10000) and sometime it calls ioctl(0xb7000000, 0x20000);
Of course, It depends on application's behavior. 

For using this feature now, we have to add ioctl and recompile applications.
It means we have to know application internal well and to need source code. 
It would prevent various experiements and easy use.

I want to use this feature without appliation internal knowledge easily. 
Maybe it can be useless without appliation behavior knowledge.
But it will help various application experiments without much knowledge of application and recompile. 

ex) echo 'pid 0x8050000 0x100000' > sysfs or procfs or cgroup. 

Personally, I support cgroup interface but don't have a good idea now. 
It can help fork-like application and we can group same address of KSM range among tasks. 

> there's not a real anonymous page allocated the rmap_item will not be
> allocated either (without requiring pending update ;).
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


-- 
Kinds Regards
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-06 16:58                   ` Izik Eidus
@ 2009-05-06 23:59                     ` Chris Wright
  -1 siblings, 0 replies; 108+ messages in thread
From: Chris Wright @ 2009-05-06 23:59 UTC (permalink / raw)
  To: Izik Eidus
  Cc: Hugh Dickins, Rik van Riel, akpm, linux-kernel, aarcange, chrisw,
	alan, device, linux-mm, nickpiggin

* Izik Eidus (ieidus@redhat.com) wrote:
> Ok, i give up, lets move to madvice(), i will write a patch that move  
> the whole thing into madvice after i finish here something, but that  
> ofcurse only if Andrea agree for the move?

Here's where I left off last time (refreshed against a current mmotm).

It needs to get converted to vma rather than still scanning via slots.
It's got locking issues (I think this can be remedied w/ vma
conversion).  I think the scan list would be ->mm and each ->mm we'd
scan the vma's that are marked VM_MERGEABLE or whatever.

---
 include/asm-generic/mman.h |    2 
 include/linux/ksm.h        |   41 ---------
 include/linux/miscdevice.h |    1 
 mm/ksm.c                   |  205 ++-------------------------------------------
 mm/madvise.c               |   29 ++++++
 5 files changed, 46 insertions(+), 232 deletions(-)

--- a/include/asm-generic/mman.h
+++ b/include/asm-generic/mman.h
@@ -34,6 +34,8 @@
 #define MADV_REMOVE	9		/* remove these pages & resources */
 #define MADV_DONTFORK	10		/* don't inherit across fork */
 #define MADV_DOFORK	11		/* do inherit across fork */
+#define MADV_SHAREABLE	12		/* can share identical pages */
+#define MADV_UNSHAREABLE 13		/* can not share identical pages */
 
 /* compatibility flags */
 #define MAP_FILE	0
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -1,46 +1,9 @@
 #ifndef __LINUX_KSM_H
 #define __LINUX_KSM_H
 
-/*
- * Userspace interface for /dev/ksm - kvm shared memory
- */
-
-#include <linux/types.h>
-#include <linux/ioctl.h>
-
-#define KSM_API_VERSION 1
-
 #define ksm_control_flags_run 1
 
-/* for KSM_REGISTER_MEMORY_REGION */
-struct ksm_memory_region {
-	__u32 npages; /* number of pages to share */
-	__u32 pad;
-	__u64 addr; /* the begining of the virtual address */
-	__u64 reserved_bits;
-};
-
-#define KSMIO 0xAB
-
-/* ioctls for /dev/ksm */
-
-#define KSM_GET_API_VERSION              _IO(KSMIO,   0x00)
-/*
- * KSM_CREATE_SHARED_MEMORY_AREA - create the shared memory reagion fd
- */
-#define KSM_CREATE_SHARED_MEMORY_AREA    _IO(KSMIO,   0x01) /* return SMA fd */
-
-/* ioctls for SMA fds */
-
-/*
- * KSM_REGISTER_MEMORY_REGION - register virtual address memory area to be
- * scanned by kvm.
- */
-#define KSM_REGISTER_MEMORY_REGION       _IOW(KSMIO,  0x20,\
-					      struct ksm_memory_region)
-/*
- * KSM_REMOVE_MEMORY_REGION - remove virtual address memory area from ksm.
- */
-#define KSM_REMOVE_MEMORY_REGION         _IO(KSMIO,   0x21)
+long ksm_register_memory(struct vm_area_struct *, unsigned long, unsigned long);
+long ksm_unregister_memory(struct vm_area_struct *, unsigned long, unsigned long);
 
 #endif
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -30,7 +30,6 @@
 #define HPET_MINOR		228
 #define FUSE_MINOR		229
 #define KVM_MINOR		232
-#define KSM_MINOR		234
 #define MISC_DYNAMIC_MINOR	255
 
 struct device;
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -17,7 +17,6 @@
 #include <linux/errno.h>
 #include <linux/mm.h>
 #include <linux/fs.h>
-#include <linux/miscdevice.h>
 #include <linux/vmalloc.h>
 #include <linux/file.h>
 #include <linux/mman.h>
@@ -58,21 +57,11 @@ module_param(regions_per_fd, int, 0);
  */
 struct ksm_mem_slot {
 	struct list_head link;
-	struct list_head sma_link;
 	struct mm_struct *mm;
 	unsigned long addr;	/* the begining of the virtual address */
 	unsigned npages;	/* number of pages to share */
 };
 
-/*
- * ksm_sma - shared memory area, each process have its own sma that contain the
- * information about the slots that it own
- */
-struct ksm_sma {
-	struct list_head sma_slots;
-	int nregions;
-};
-
 /**
  * struct ksm_scan - cursor for scanning
  * @slot_index: the current slot we are scanning
@@ -459,85 +448,35 @@ static inline int is_intersecting_addres
 	return 0;
 }
 
-/*
- * is_overlap_mem - check if there is overlapping with memory that was already
- * registred.
- *
- * note - this function must to be called under slots_lock
- */
-static int is_overlap_mem(struct ksm_memory_region *mem)
-{
-	struct ksm_mem_slot *slot;
-
-	list_for_each_entry(slot, &slots, link) {
-		unsigned long mem_end;
-		unsigned long slot_end;
-
-		cond_resched();
-
-		if (current->mm != slot->mm)
-			continue;
-
-		mem_end = mem->addr + (unsigned long)mem->npages * PAGE_SIZE;
-		slot_end = slot->addr + (unsigned long)slot->npages * PAGE_SIZE;
-
-		if (is_intersecting_address(mem->addr, slot->addr, slot_end) ||
-		    is_intersecting_address(mem_end - 1, slot->addr, slot_end))
-			return 1;
-		if (is_intersecting_address(slot->addr, mem->addr, mem_end) ||
-		    is_intersecting_address(slot_end - 1, mem->addr, mem_end))
-			return 1;
-	}
-
-	return 0;
-}
+long ksm_register_memory(struct vm_area_struct *vma, unsigned long start,
+			 unsigned long end)
 
-static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
-						struct ksm_memory_region *mem)
 {
 	struct ksm_mem_slot *slot;
+	int npages = (end - start) >> PAGE_SHIFT;
 	int ret = -EPERM;
 
-	if (!mem->npages)
-		goto out;
-
-	down_write(&slots_lock);
-
-	if ((ksm_sma->nregions + 1) > regions_per_fd) {
-		ret = -EBUSY;
-		goto out_unlock;
-	}
-
-	if (is_overlap_mem(mem))
-		goto out_unlock;
-
 	slot = kzalloc(sizeof(struct ksm_mem_slot), GFP_KERNEL);
 	if (!slot) {
 		ret = -ENOMEM;
-		goto out_unlock;
+		goto out;
 	}
 
-	/*
-	 * We will hold refernce to the task_mm untill the file descriptor
-	 * will be closed, or KSM_REMOVE_MEMORY_REGION will be called.
-	 */
 	slot->mm = get_task_mm(current);
 	if (!slot->mm)
 		goto out_free;
-	slot->addr = mem->addr;
-	slot->npages = mem->npages;
+	slot->addr = start;
+	slot->npages = npages;
+
+	down_write(&slots_lock);
 
 	list_add_tail(&slot->link, &slots);
-	list_add_tail(&slot->sma_link, &ksm_sma->sma_slots);
-	ksm_sma->nregions++;
 
 	up_write(&slots_lock);
 	return 0;
 
 out_free:
 	kfree(slot);
-out_unlock:
-	up_write(&slots_lock);
 out:
 	return ret;
 }
@@ -554,20 +493,18 @@ static void remove_slot_from_hash_and_tr
 	list_del(&slot->link);
 }
 
-static int ksm_sma_ioctl_remove_memory_region(struct ksm_sma *ksm_sma,
-					      unsigned long addr)
+long ksm_unregister_memory(struct vm_area_struct *vma, unsigned long start,
+			   unsigned long end)
 {
 	int ret = -EFAULT;
 	struct ksm_mem_slot *slot, *node;
 
 	down_write(&slots_lock);
-	list_for_each_entry_safe(slot, node, &ksm_sma->sma_slots, sma_link) {
-		if (addr == slot->addr) {
+	list_for_each_entry_safe(slot, node, &slots, link) {
+		if (vma->vm_mm == slot->mm && start == slot->addr) {
 			remove_slot_from_hash_and_tree(slot);
 			mmput(slot->mm);
-			list_del(&slot->sma_link);
 			kfree(slot);
-			ksm_sma->nregions--;
 			ret = 0;
 		}
 	}
@@ -575,52 +512,6 @@ static int ksm_sma_ioctl_remove_memory_r
 	return ret;
 }
 
-static int ksm_sma_release(struct inode *inode, struct file *filp)
-{
-	struct ksm_mem_slot *slot, *node;
-	struct ksm_sma *ksm_sma = filp->private_data;
-
-	down_write(&slots_lock);
-	list_for_each_entry_safe(slot, node, &ksm_sma->sma_slots, sma_link) {
-		remove_slot_from_hash_and_tree(slot);
-		mmput(slot->mm);
-		list_del(&slot->sma_link);
-		kfree(slot);
-	}
-	up_write(&slots_lock);
-
-	kfree(ksm_sma);
-	return 0;
-}
-
-static long ksm_sma_ioctl(struct file *filp,
-			  unsigned int ioctl, unsigned long arg)
-{
-	struct ksm_sma *sma = filp->private_data;
-	void __user *argp = (void __user *)arg;
-	int r = EINVAL;
-
-	switch (ioctl) {
-	case KSM_REGISTER_MEMORY_REGION: {
-		struct ksm_memory_region ksm_memory_region;
-
-		r = -EFAULT;
-		if (copy_from_user(&ksm_memory_region, argp,
-				   sizeof(ksm_memory_region)))
-			goto out;
-		r = ksm_sma_ioctl_register_memory_region(sma,
-							 &ksm_memory_region);
-		break;
-	}
-	case KSM_REMOVE_MEMORY_REGION:
-		r = ksm_sma_ioctl_remove_memory_region(sma, arg);
-		break;
-	}
-
-out:
-	return r;
-}
-
 static int memcmp_pages(struct page *page1, struct page *page2)
 {
 	char *addr1, *addr2;
@@ -1437,67 +1328,6 @@ out:
 	return ret;
 }
 
-static const struct file_operations ksm_sma_fops = {
-	.release        = ksm_sma_release,
-	.unlocked_ioctl = ksm_sma_ioctl,
-	.compat_ioctl   = ksm_sma_ioctl,
-};
-
-static int ksm_dev_ioctl_create_shared_memory_area(void)
-{
-	int fd = -1;
-	struct ksm_sma *ksm_sma;
-
-	ksm_sma = kmalloc(sizeof(struct ksm_sma), GFP_KERNEL);
-	if (!ksm_sma) {
-		fd = -ENOMEM;
-		goto out;
-	}
-
-	INIT_LIST_HEAD(&ksm_sma->sma_slots);
-	ksm_sma->nregions = 0;
-
-	fd = anon_inode_getfd("ksm-sma", &ksm_sma_fops, ksm_sma, 0);
-	if (fd < 0)
-		goto out_free;
-
-	return fd;
-out_free:
-	kfree(ksm_sma);
-out:
-	return fd;
-}
-
-static long ksm_dev_ioctl(struct file *filp,
-			  unsigned int ioctl, unsigned long arg)
-{
-	long r = -EINVAL;
-
-	switch (ioctl) {
-	case KSM_GET_API_VERSION:
-		r = KSM_API_VERSION;
-		break;
-	case KSM_CREATE_SHARED_MEMORY_AREA:
-		r = ksm_dev_ioctl_create_shared_memory_area();
-		break;
-	default:
-		break;
-	}
-	return r;
-}
-
-static const struct file_operations ksm_chardev_ops = {
-	.unlocked_ioctl = ksm_dev_ioctl,
-	.compat_ioctl   = ksm_dev_ioctl,
-	.owner          = THIS_MODULE,
-};
-
-static struct miscdevice ksm_dev = {
-	KSM_MINOR,
-	"ksm",
-	&ksm_chardev_ops,
-};
-
 int ksm_scan_thread(void *nothing)
 {
 	while (!kthread_should_stop()) {
@@ -1708,23 +1538,15 @@ static int __init ksm_init(void)
 		goto out_free2;
 	}
 
-	r = misc_register(&ksm_dev);
-	if (r) {
-		printk(KERN_ERR "ksm: misc device register failed\n");
-		goto out_free3;
-	}
-
 	r = sysfs_create_group(mm_kobj, &ksm_attr_group);
 	if (r) {
 		printk(KERN_ERR "ksm: register sysfs failed\n");
-		goto out_free4;
+		goto out_free3;
 	}
 
 	printk(KERN_WARNING "ksm loaded\n");
 	return 0;
 
-out_free4:
-	misc_deregister(&ksm_dev);
 out_free3:
 	kthread_stop(ksm_thread);
 out_free2:
@@ -1738,7 +1560,6 @@ out:
 static void __exit ksm_exit(void)
 {
 	sysfs_remove_group(mm_kobj, &ksm_attr_group);
-	misc_deregister(&ksm_dev);
 	ksmd_flags = ksm_control_flags_run;
 	kthread_stop(ksm_thread);
 	rmap_hash_free();
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -11,6 +11,7 @@
 #include <linux/mempolicy.h>
 #include <linux/hugetlb.h>
 #include <linux/sched.h>
+#include <linux/ksm.h>
 
 /*
  * Any behaviour which results in changes to the vma->vm_flags needs to
@@ -23,6 +24,8 @@ static int madvise_need_mmap_write(int b
 	case MADV_REMOVE:
 	case MADV_WILLNEED:
 	case MADV_DONTNEED:
+	case MADV_SHAREABLE:
+	case MADV_UNSHAREABLE:
 		return 0;
 	default:
 		/* be safe, default to 1. list exceptions explicitly */
@@ -207,6 +210,26 @@ static long madvise_remove(struct vm_are
 	return error;
 }
 
+/*
+ * Application allows pages to be shared with other pages of identical
+ * content.
+ *
+ */
+static long madvise_shareable(struct vm_area_struct *vma,
+				struct vm_area_struct **prev,
+				unsigned long start, unsigned long end,
+				int behavior)
+{
+	switch (behavior) {
+	case MADV_SHAREABLE:
+		return ksm_register_memory(vma, start, end);
+	case MADV_UNSHAREABLE:
+		return ksm_unregister_memory(vma, start, end);
+	default:
+		return -EINVAL;
+	}
+}
+
 static long
 madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
 		unsigned long start, unsigned long end, int behavior)
@@ -237,6 +260,10 @@ madvise_vma(struct vm_area_struct *vma, 
 		error = madvise_dontneed(vma, prev, start, end);
 		break;
 
+	case MADV_SHAREABLE:
+	case MADV_UNSHAREABLE:
+		error = madvise_shareable(vma, prev, start, end, behavior);
+		break;
 	default:
 		error = -EINVAL;
 		break;
@@ -268,6 +295,8 @@ madvise_vma(struct vm_area_struct *vma, 
  *		so the kernel can free resources associated with it.
  *  MADV_REMOVE - the application wants to free up the given range of
  *		pages and associated backing store.
+ *  MADV_SHAREABLE - the application agrees that pages in the given
+ *		range can be shared w/ other pages of identical content.
  *
  * return values:
  *  zero    - success

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
@ 2009-05-06 23:59                     ` Chris Wright
  0 siblings, 0 replies; 108+ messages in thread
From: Chris Wright @ 2009-05-06 23:59 UTC (permalink / raw)
  To: Izik Eidus
  Cc: Hugh Dickins, Rik van Riel, akpm, linux-kernel, aarcange, chrisw,
	alan, device, linux-mm, nickpiggin

* Izik Eidus (ieidus@redhat.com) wrote:
> Ok, i give up, lets move to madvice(), i will write a patch that move  
> the whole thing into madvice after i finish here something, but that  
> ofcurse only if Andrea agree for the move?

Here's where I left off last time (refreshed against a current mmotm).

It needs to get converted to vma rather than still scanning via slots.
It's got locking issues (I think this can be remedied w/ vma
conversion).  I think the scan list would be ->mm and each ->mm we'd
scan the vma's that are marked VM_MERGEABLE or whatever.

---
 include/asm-generic/mman.h |    2 
 include/linux/ksm.h        |   41 ---------
 include/linux/miscdevice.h |    1 
 mm/ksm.c                   |  205 ++-------------------------------------------
 mm/madvise.c               |   29 ++++++
 5 files changed, 46 insertions(+), 232 deletions(-)

--- a/include/asm-generic/mman.h
+++ b/include/asm-generic/mman.h
@@ -34,6 +34,8 @@
 #define MADV_REMOVE	9		/* remove these pages & resources */
 #define MADV_DONTFORK	10		/* don't inherit across fork */
 #define MADV_DOFORK	11		/* do inherit across fork */
+#define MADV_SHAREABLE	12		/* can share identical pages */
+#define MADV_UNSHAREABLE 13		/* can not share identical pages */
 
 /* compatibility flags */
 #define MAP_FILE	0
--- a/include/linux/ksm.h
+++ b/include/linux/ksm.h
@@ -1,46 +1,9 @@
 #ifndef __LINUX_KSM_H
 #define __LINUX_KSM_H
 
-/*
- * Userspace interface for /dev/ksm - kvm shared memory
- */
-
-#include <linux/types.h>
-#include <linux/ioctl.h>
-
-#define KSM_API_VERSION 1
-
 #define ksm_control_flags_run 1
 
-/* for KSM_REGISTER_MEMORY_REGION */
-struct ksm_memory_region {
-	__u32 npages; /* number of pages to share */
-	__u32 pad;
-	__u64 addr; /* the begining of the virtual address */
-	__u64 reserved_bits;
-};
-
-#define KSMIO 0xAB
-
-/* ioctls for /dev/ksm */
-
-#define KSM_GET_API_VERSION              _IO(KSMIO,   0x00)
-/*
- * KSM_CREATE_SHARED_MEMORY_AREA - create the shared memory reagion fd
- */
-#define KSM_CREATE_SHARED_MEMORY_AREA    _IO(KSMIO,   0x01) /* return SMA fd */
-
-/* ioctls for SMA fds */
-
-/*
- * KSM_REGISTER_MEMORY_REGION - register virtual address memory area to be
- * scanned by kvm.
- */
-#define KSM_REGISTER_MEMORY_REGION       _IOW(KSMIO,  0x20,\
-					      struct ksm_memory_region)
-/*
- * KSM_REMOVE_MEMORY_REGION - remove virtual address memory area from ksm.
- */
-#define KSM_REMOVE_MEMORY_REGION         _IO(KSMIO,   0x21)
+long ksm_register_memory(struct vm_area_struct *, unsigned long, unsigned long);
+long ksm_unregister_memory(struct vm_area_struct *, unsigned long, unsigned long);
 
 #endif
--- a/include/linux/miscdevice.h
+++ b/include/linux/miscdevice.h
@@ -30,7 +30,6 @@
 #define HPET_MINOR		228
 #define FUSE_MINOR		229
 #define KVM_MINOR		232
-#define KSM_MINOR		234
 #define MISC_DYNAMIC_MINOR	255
 
 struct device;
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -17,7 +17,6 @@
 #include <linux/errno.h>
 #include <linux/mm.h>
 #include <linux/fs.h>
-#include <linux/miscdevice.h>
 #include <linux/vmalloc.h>
 #include <linux/file.h>
 #include <linux/mman.h>
@@ -58,21 +57,11 @@ module_param(regions_per_fd, int, 0);
  */
 struct ksm_mem_slot {
 	struct list_head link;
-	struct list_head sma_link;
 	struct mm_struct *mm;
 	unsigned long addr;	/* the begining of the virtual address */
 	unsigned npages;	/* number of pages to share */
 };
 
-/*
- * ksm_sma - shared memory area, each process have its own sma that contain the
- * information about the slots that it own
- */
-struct ksm_sma {
-	struct list_head sma_slots;
-	int nregions;
-};
-
 /**
  * struct ksm_scan - cursor for scanning
  * @slot_index: the current slot we are scanning
@@ -459,85 +448,35 @@ static inline int is_intersecting_addres
 	return 0;
 }
 
-/*
- * is_overlap_mem - check if there is overlapping with memory that was already
- * registred.
- *
- * note - this function must to be called under slots_lock
- */
-static int is_overlap_mem(struct ksm_memory_region *mem)
-{
-	struct ksm_mem_slot *slot;
-
-	list_for_each_entry(slot, &slots, link) {
-		unsigned long mem_end;
-		unsigned long slot_end;
-
-		cond_resched();
-
-		if (current->mm != slot->mm)
-			continue;
-
-		mem_end = mem->addr + (unsigned long)mem->npages * PAGE_SIZE;
-		slot_end = slot->addr + (unsigned long)slot->npages * PAGE_SIZE;
-
-		if (is_intersecting_address(mem->addr, slot->addr, slot_end) ||
-		    is_intersecting_address(mem_end - 1, slot->addr, slot_end))
-			return 1;
-		if (is_intersecting_address(slot->addr, mem->addr, mem_end) ||
-		    is_intersecting_address(slot_end - 1, mem->addr, mem_end))
-			return 1;
-	}
-
-	return 0;
-}
+long ksm_register_memory(struct vm_area_struct *vma, unsigned long start,
+			 unsigned long end)
 
-static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
-						struct ksm_memory_region *mem)
 {
 	struct ksm_mem_slot *slot;
+	int npages = (end - start) >> PAGE_SHIFT;
 	int ret = -EPERM;
 
-	if (!mem->npages)
-		goto out;
-
-	down_write(&slots_lock);
-
-	if ((ksm_sma->nregions + 1) > regions_per_fd) {
-		ret = -EBUSY;
-		goto out_unlock;
-	}
-
-	if (is_overlap_mem(mem))
-		goto out_unlock;
-
 	slot = kzalloc(sizeof(struct ksm_mem_slot), GFP_KERNEL);
 	if (!slot) {
 		ret = -ENOMEM;
-		goto out_unlock;
+		goto out;
 	}
 
-	/*
-	 * We will hold refernce to the task_mm untill the file descriptor
-	 * will be closed, or KSM_REMOVE_MEMORY_REGION will be called.
-	 */
 	slot->mm = get_task_mm(current);
 	if (!slot->mm)
 		goto out_free;
-	slot->addr = mem->addr;
-	slot->npages = mem->npages;
+	slot->addr = start;
+	slot->npages = npages;
+
+	down_write(&slots_lock);
 
 	list_add_tail(&slot->link, &slots);
-	list_add_tail(&slot->sma_link, &ksm_sma->sma_slots);
-	ksm_sma->nregions++;
 
 	up_write(&slots_lock);
 	return 0;
 
 out_free:
 	kfree(slot);
-out_unlock:
-	up_write(&slots_lock);
 out:
 	return ret;
 }
@@ -554,20 +493,18 @@ static void remove_slot_from_hash_and_tr
 	list_del(&slot->link);
 }
 
-static int ksm_sma_ioctl_remove_memory_region(struct ksm_sma *ksm_sma,
-					      unsigned long addr)
+long ksm_unregister_memory(struct vm_area_struct *vma, unsigned long start,
+			   unsigned long end)
 {
 	int ret = -EFAULT;
 	struct ksm_mem_slot *slot, *node;
 
 	down_write(&slots_lock);
-	list_for_each_entry_safe(slot, node, &ksm_sma->sma_slots, sma_link) {
-		if (addr == slot->addr) {
+	list_for_each_entry_safe(slot, node, &slots, link) {
+		if (vma->vm_mm == slot->mm && start == slot->addr) {
 			remove_slot_from_hash_and_tree(slot);
 			mmput(slot->mm);
-			list_del(&slot->sma_link);
 			kfree(slot);
-			ksm_sma->nregions--;
 			ret = 0;
 		}
 	}
@@ -575,52 +512,6 @@ static int ksm_sma_ioctl_remove_memory_r
 	return ret;
 }
 
-static int ksm_sma_release(struct inode *inode, struct file *filp)
-{
-	struct ksm_mem_slot *slot, *node;
-	struct ksm_sma *ksm_sma = filp->private_data;
-
-	down_write(&slots_lock);
-	list_for_each_entry_safe(slot, node, &ksm_sma->sma_slots, sma_link) {
-		remove_slot_from_hash_and_tree(slot);
-		mmput(slot->mm);
-		list_del(&slot->sma_link);
-		kfree(slot);
-	}
-	up_write(&slots_lock);
-
-	kfree(ksm_sma);
-	return 0;
-}
-
-static long ksm_sma_ioctl(struct file *filp,
-			  unsigned int ioctl, unsigned long arg)
-{
-	struct ksm_sma *sma = filp->private_data;
-	void __user *argp = (void __user *)arg;
-	int r = EINVAL;
-
-	switch (ioctl) {
-	case KSM_REGISTER_MEMORY_REGION: {
-		struct ksm_memory_region ksm_memory_region;
-
-		r = -EFAULT;
-		if (copy_from_user(&ksm_memory_region, argp,
-				   sizeof(ksm_memory_region)))
-			goto out;
-		r = ksm_sma_ioctl_register_memory_region(sma,
-							 &ksm_memory_region);
-		break;
-	}
-	case KSM_REMOVE_MEMORY_REGION:
-		r = ksm_sma_ioctl_remove_memory_region(sma, arg);
-		break;
-	}
-
-out:
-	return r;
-}
-
 static int memcmp_pages(struct page *page1, struct page *page2)
 {
 	char *addr1, *addr2;
@@ -1437,67 +1328,6 @@ out:
 	return ret;
 }
 
-static const struct file_operations ksm_sma_fops = {
-	.release        = ksm_sma_release,
-	.unlocked_ioctl = ksm_sma_ioctl,
-	.compat_ioctl   = ksm_sma_ioctl,
-};
-
-static int ksm_dev_ioctl_create_shared_memory_area(void)
-{
-	int fd = -1;
-	struct ksm_sma *ksm_sma;
-
-	ksm_sma = kmalloc(sizeof(struct ksm_sma), GFP_KERNEL);
-	if (!ksm_sma) {
-		fd = -ENOMEM;
-		goto out;
-	}
-
-	INIT_LIST_HEAD(&ksm_sma->sma_slots);
-	ksm_sma->nregions = 0;
-
-	fd = anon_inode_getfd("ksm-sma", &ksm_sma_fops, ksm_sma, 0);
-	if (fd < 0)
-		goto out_free;
-
-	return fd;
-out_free:
-	kfree(ksm_sma);
-out:
-	return fd;
-}
-
-static long ksm_dev_ioctl(struct file *filp,
-			  unsigned int ioctl, unsigned long arg)
-{
-	long r = -EINVAL;
-
-	switch (ioctl) {
-	case KSM_GET_API_VERSION:
-		r = KSM_API_VERSION;
-		break;
-	case KSM_CREATE_SHARED_MEMORY_AREA:
-		r = ksm_dev_ioctl_create_shared_memory_area();
-		break;
-	default:
-		break;
-	}
-	return r;
-}
-
-static const struct file_operations ksm_chardev_ops = {
-	.unlocked_ioctl = ksm_dev_ioctl,
-	.compat_ioctl   = ksm_dev_ioctl,
-	.owner          = THIS_MODULE,
-};
-
-static struct miscdevice ksm_dev = {
-	KSM_MINOR,
-	"ksm",
-	&ksm_chardev_ops,
-};
-
 int ksm_scan_thread(void *nothing)
 {
 	while (!kthread_should_stop()) {
@@ -1708,23 +1538,15 @@ static int __init ksm_init(void)
 		goto out_free2;
 	}
 
-	r = misc_register(&ksm_dev);
-	if (r) {
-		printk(KERN_ERR "ksm: misc device register failed\n");
-		goto out_free3;
-	}
-
 	r = sysfs_create_group(mm_kobj, &ksm_attr_group);
 	if (r) {
 		printk(KERN_ERR "ksm: register sysfs failed\n");
-		goto out_free4;
+		goto out_free3;
 	}
 
 	printk(KERN_WARNING "ksm loaded\n");
 	return 0;
 
-out_free4:
-	misc_deregister(&ksm_dev);
 out_free3:
 	kthread_stop(ksm_thread);
 out_free2:
@@ -1738,7 +1560,6 @@ out:
 static void __exit ksm_exit(void)
 {
 	sysfs_remove_group(mm_kobj, &ksm_attr_group);
-	misc_deregister(&ksm_dev);
 	ksmd_flags = ksm_control_flags_run;
 	kthread_stop(ksm_thread);
 	rmap_hash_free();
--- a/mm/madvise.c
+++ b/mm/madvise.c
@@ -11,6 +11,7 @@
 #include <linux/mempolicy.h>
 #include <linux/hugetlb.h>
 #include <linux/sched.h>
+#include <linux/ksm.h>
 
 /*
  * Any behaviour which results in changes to the vma->vm_flags needs to
@@ -23,6 +24,8 @@ static int madvise_need_mmap_write(int b
 	case MADV_REMOVE:
 	case MADV_WILLNEED:
 	case MADV_DONTNEED:
+	case MADV_SHAREABLE:
+	case MADV_UNSHAREABLE:
 		return 0;
 	default:
 		/* be safe, default to 1. list exceptions explicitly */
@@ -207,6 +210,26 @@ static long madvise_remove(struct vm_are
 	return error;
 }
 
+/*
+ * Application allows pages to be shared with other pages of identical
+ * content.
+ *
+ */
+static long madvise_shareable(struct vm_area_struct *vma,
+				struct vm_area_struct **prev,
+				unsigned long start, unsigned long end,
+				int behavior)
+{
+	switch (behavior) {
+	case MADV_SHAREABLE:
+		return ksm_register_memory(vma, start, end);
+	case MADV_UNSHAREABLE:
+		return ksm_unregister_memory(vma, start, end);
+	default:
+		return -EINVAL;
+	}
+}
+
 static long
 madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
 		unsigned long start, unsigned long end, int behavior)
@@ -237,6 +260,10 @@ madvise_vma(struct vm_area_struct *vma, 
 		error = madvise_dontneed(vma, prev, start, end);
 		break;
 
+	case MADV_SHAREABLE:
+	case MADV_UNSHAREABLE:
+		error = madvise_shareable(vma, prev, start, end, behavior);
+		break;
 	default:
 		error = -EINVAL;
 		break;
@@ -268,6 +295,8 @@ madvise_vma(struct vm_area_struct *vma, 
  *		so the kernel can free resources associated with it.
  *  MADV_REMOVE - the application wants to free up the given range of
  *		pages and associated backing store.
+ *  MADV_SHAREABLE - the application agrees that pages in the given
+ *		range can be shared w/ other pages of identical content.
  *
  * return values:
  *  zero    - success

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06 23:55                         ` Minchan Kim
@ 2009-05-07  0:19                           ` Chris Wright
  -1 siblings, 0 replies; 108+ messages in thread
From: Chris Wright @ 2009-05-07  0:19 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrea Arcangeli, Hugh Dickins, Alan Cox, Rik van Riel,
	Izik Eidus, akpm, linux-kernel, chrisw, device, linux-mm,
	nickpiggin

* Minchan Kim (minchan.kim@gmail.com) wrote:
> I want to use this feature without appliation internal knowledge easily. 
> Maybe it can be useless without appliation behavior knowledge.
> But it will help various application experiments without much knowledge of application and recompile. 
> 
> ex) echo 'pid 0x8050000 0x100000' > sysfs or procfs or cgroup. 

Yes, this is common request.  Izik made some changes to enable a pid
based registration, just not been cleaned up and made available yet.

thanks,
-chris

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
@ 2009-05-07  0:19                           ` Chris Wright
  0 siblings, 0 replies; 108+ messages in thread
From: Chris Wright @ 2009-05-07  0:19 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrea Arcangeli, Hugh Dickins, Alan Cox, Rik van Riel,
	Izik Eidus, akpm, linux-kernel, chrisw, device, linux-mm,
	nickpiggin

* Minchan Kim (minchan.kim@gmail.com) wrote:
> I want to use this feature without appliation internal knowledge easily. 
> Maybe it can be useless without appliation behavior knowledge.
> But it will help various application experiments without much knowledge of application and recompile. 
> 
> ex) echo 'pid 0x8050000 0x100000' > sysfs or procfs or cgroup. 

Yes, this is common request.  Izik made some changes to enable a pid
based registration, just not been cleaned up and made available yet.

thanks,
-chris

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-06 23:59                     ` Chris Wright
@ 2009-05-07  2:41                       ` Rik van Riel
  -1 siblings, 0 replies; 108+ messages in thread
From: Rik van Riel @ 2009-05-07  2:41 UTC (permalink / raw)
  To: Chris Wright
  Cc: Izik Eidus, Hugh Dickins, akpm, linux-kernel, aarcange, chrisw,
	alan, device, linux-mm, nickpiggin

On Wed, 6 May 2009 16:59:49 -0700
Chris Wright <chrisw@redhat.com> wrote:

> * Izik Eidus (ieidus@redhat.com) wrote:
> > Ok, i give up, lets move to madvice(), i will write a patch that
> > move the whole thing into madvice after i finish here something,
> > but that ofcurse only if Andrea agree for the move?
> 
> Here's where I left off last time (refreshed against a current mmotm).
> 
> It needs to get converted to vma rather than still scanning via slots.
> It's got locking issues (I think this can be remedied w/ vma
> conversion).  I think the scan list would be ->mm and each ->mm we'd
> scan the vma's that are marked VM_MERGEABLE or whatever.

Doing that kind of scan would be useful for other reasons,
too.

For example, it is not uncommon for large database systems
to end up having half of system memory in page tables
occasionally, which can drive the system to swapping.

Reclaiming some of those (file pte only) page tables would
be a relatively simple thing to do and could really save
such systems from the occasional swap disaster.

The subsequent minor faults would be expensive, but not
nearly as badly as swap disk IO...

-- 
All rights reversed.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
@ 2009-05-07  2:41                       ` Rik van Riel
  0 siblings, 0 replies; 108+ messages in thread
From: Rik van Riel @ 2009-05-07  2:41 UTC (permalink / raw)
  To: Chris Wright
  Cc: Izik Eidus, Hugh Dickins, akpm, linux-kernel, aarcange, alan,
	device, linux-mm, nickpiggin

On Wed, 6 May 2009 16:59:49 -0700
Chris Wright <chrisw@redhat.com> wrote:

> * Izik Eidus (ieidus@redhat.com) wrote:
> > Ok, i give up, lets move to madvice(), i will write a patch that
> > move the whole thing into madvice after i finish here something,
> > but that ofcurse only if Andrea agree for the move?
> 
> Here's where I left off last time (refreshed against a current mmotm).
> 
> It needs to get converted to vma rather than still scanning via slots.
> It's got locking issues (I think this can be remedied w/ vma
> conversion).  I think the scan list would be ->mm and each ->mm we'd
> scan the vma's that are marked VM_MERGEABLE or whatever.

Doing that kind of scan would be useful for other reasons,
too.

For example, it is not uncommon for large database systems
to end up having half of system memory in page tables
occasionally, which can drive the system to swapping.

Reclaiming some of those (file pte only) page tables would
be a relatively simple thing to do and could really save
such systems from the occasional swap disaster.

The subsequent minor faults would be expensive, but not
nearly as badly as swap disk IO...

-- 
All rights reversed.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-06 23:55                         ` Minchan Kim
@ 2009-05-07 10:46                           ` Andrea Arcangeli
  -1 siblings, 0 replies; 108+ messages in thread
From: Andrea Arcangeli @ 2009-05-07 10:46 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Hugh Dickins, Alan Cox, Rik van Riel, Izik Eidus, akpm,
	linux-kernel, chrisw, device, linux-mm, nickpiggin

On Thu, May 07, 2009 at 08:55:47AM +0900, Minchan Kim wrote:
> Hmm. Don't you consider 32-bit system ?

Sorry I was too short, don't worry, I meant hugemem 32bit systems,
like 32G. If there's not much highmem, no problem can ever
happen. Just like pagetables had to be moved to highmem on 32G 32bit
systems to make them workable, KSM on those systems may generate lots
of lowmem and triggering early OOM conditions when allocating inodes
or other slab objects etc... and we don't plan to move those
rmap_items that represents physical pages by the chain of the virtual
addresses that maps them in highmem.

> Many embedded system is so I/O bouneded that we can use much CPU time in there. 

Embedded systems with >4G of ram should run 64bit these days, so I
don't see a problem.

> I hope this feature will help saving memory in embedded system. 

It will (assuming that there are apps that are duplicating anonymous
memory of course ;).

> One more thing about interface. 
> 
> Ksm map regions are dynamic characteritic ?
> I mean sometime A application calls ioctl(0x800000, 0x10000) and sometime it calls ioctl(0xb7000000, 0x20000);
> Of course, It depends on application's behavior. 

Looks like the ioctl API is going away in favour of madvise so it'll
function like madvise, if you munmap the region the KSM registration
will go away.

> ex) echo 'pid 0x8050000 0x100000' > sysfs or procfs or cgroup. 

This was answered by Chris, and surely this is feasible, as it is
feasible for kksmd to scan the whole system regardless of any
madvise. Some sysfs mangling should allow it.

However regardless of the highmem issue (this applies to 64bit systems
too) you've to keep in mind that for kksmd to keep track all pages
under scan it has to build rbtree and allocate rmap_items and
tree_items for each page tracked, those objects take some memory, so
if there's not much ram sharing you may waste more memory in the kksmd
allocations than in the amount of memory actually freed by KSM. This
is why it's better to selectively only register ranges that we know in
advance there's an high probability to free memory.

Thanks!
Andrea

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
@ 2009-05-07 10:46                           ` Andrea Arcangeli
  0 siblings, 0 replies; 108+ messages in thread
From: Andrea Arcangeli @ 2009-05-07 10:46 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Hugh Dickins, Alan Cox, Rik van Riel, Izik Eidus, akpm,
	linux-kernel, chrisw, device, linux-mm, nickpiggin

On Thu, May 07, 2009 at 08:55:47AM +0900, Minchan Kim wrote:
> Hmm. Don't you consider 32-bit system ?

Sorry I was too short, don't worry, I meant hugemem 32bit systems,
like 32G. If there's not much highmem, no problem can ever
happen. Just like pagetables had to be moved to highmem on 32G 32bit
systems to make them workable, KSM on those systems may generate lots
of lowmem and triggering early OOM conditions when allocating inodes
or other slab objects etc... and we don't plan to move those
rmap_items that represents physical pages by the chain of the virtual
addresses that maps them in highmem.

> Many embedded system is so I/O bouneded that we can use much CPU time in there. 

Embedded systems with >4G of ram should run 64bit these days, so I
don't see a problem.

> I hope this feature will help saving memory in embedded system. 

It will (assuming that there are apps that are duplicating anonymous
memory of course ;).

> One more thing about interface. 
> 
> Ksm map regions are dynamic characteritic ?
> I mean sometime A application calls ioctl(0x800000, 0x10000) and sometime it calls ioctl(0xb7000000, 0x20000);
> Of course, It depends on application's behavior. 

Looks like the ioctl API is going away in favour of madvise so it'll
function like madvise, if you munmap the region the KSM registration
will go away.

> ex) echo 'pid 0x8050000 0x100000' > sysfs or procfs or cgroup. 

This was answered by Chris, and surely this is feasible, as it is
feasible for kksmd to scan the whole system regardless of any
madvise. Some sysfs mangling should allow it.

However regardless of the highmem issue (this applies to 64bit systems
too) you've to keep in mind that for kksmd to keep track all pages
under scan it has to build rbtree and allocate rmap_items and
tree_items for each page tracked, those objects take some memory, so
if there's not much ram sharing you may waste more memory in the kksmd
allocations than in the amount of memory actually freed by KSM. This
is why it's better to selectively only register ranges that we know in
advance there's an high probability to free memory.

Thanks!
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-06 16:49                     ` Chris Wright
@ 2009-05-07 11:31                       ` Andrea Arcangeli
  -1 siblings, 0 replies; 108+ messages in thread
From: Andrea Arcangeli @ 2009-05-07 11:31 UTC (permalink / raw)
  To: Chris Wright
  Cc: Hugh Dickins, Izik Eidus, Rik van Riel, akpm, linux-kernel, alan,
	device, linux-mm, nickpiggin

On Wed, May 06, 2009 at 09:49:45AM -0700, Chris Wright wrote:
> I like merge.  For madvise() approach I had used:
> 
> +#define MADV_SHAREABLE 12              /* can share identical pages */
> +#define MADV_UNSHAREABLE 13            /* can not share identical pages
> 
> But those are maybe better put as MADV_(UN)MERGEABLE (gets a bit confusing when
> you talk of merging vmas ;-)
> */

What this thing does is to create shared pages by merging equal pages...

While I don't care about the naming much myself, one problem I have is
that I've been writing a KSM paper for linuxsymposium and I'd like to
use a nomenclature that is in sync with how this stuff should be
called on lkml, to avoid confusion.

So should I change the simpler word KSM with "Memory Merging feature"
all over the paper?

In addition I consistently use the term "shared KSM pages" often,
should I rename all those instances to "merged pages"? I used the word
'merging' only when describing the operation KSM does when it creates
shared pages, but never to name the generated pages themself.

Thanks,
Andrea

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
@ 2009-05-07 11:31                       ` Andrea Arcangeli
  0 siblings, 0 replies; 108+ messages in thread
From: Andrea Arcangeli @ 2009-05-07 11:31 UTC (permalink / raw)
  To: Chris Wright
  Cc: Hugh Dickins, Izik Eidus, Rik van Riel, akpm, linux-kernel, alan,
	device, linux-mm, nickpiggin

On Wed, May 06, 2009 at 09:49:45AM -0700, Chris Wright wrote:
> I like merge.  For madvise() approach I had used:
> 
> +#define MADV_SHAREABLE 12              /* can share identical pages */
> +#define MADV_UNSHAREABLE 13            /* can not share identical pages
> 
> But those are maybe better put as MADV_(UN)MERGEABLE (gets a bit confusing when
> you talk of merging vmas ;-)
> */

What this thing does is to create shared pages by merging equal pages...

While I don't care about the naming much myself, one problem I have is
that I've been writing a KSM paper for linuxsymposium and I'd like to
use a nomenclature that is in sync with how this stuff should be
called on lkml, to avoid confusion.

So should I change the simpler word KSM with "Memory Merging feature"
all over the paper?

In addition I consistently use the term "shared KSM pages" often,
should I rename all those instances to "merged pages"? I used the word
'merging' only when describing the operation KSM does when it creates
shared pages, but never to name the generated pages themself.

Thanks,
Andrea

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses  registrations.
  2009-05-07 10:46                           ` Andrea Arcangeli
@ 2009-05-07 12:01                             ` Minchan Kim
  -1 siblings, 0 replies; 108+ messages in thread
From: Minchan Kim @ 2009-05-07 12:01 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Minchan Kim, Hugh Dickins, Alan Cox, Rik van Riel, Izik Eidus,
	akpm, linux-kernel, chrisw, device, linux-mm, nickpiggin

>> Many embedded system is so I/O bouneded that we can use much CPU time in there.
>
> Embedded systems with >4G of ram should run 64bit these days, so I
> don't see a problem.

What I mean is that many embedded applications don't use so much cpu
time that we can use extra cpu time to scan identical pages for KSM.
:)

>> One more thing about interface.
>>
>> Ksm map regions are dynamic characteritic ?
>> I mean sometime A application calls ioctl(0x800000, 0x10000) and sometime it calls ioctl(0xb7000000, 0x20000);
>> Of course, It depends on application's behavior.
>
> Looks like the ioctl API is going away in favour of madvise so it'll
> function like madvise, if you munmap the region the KSM registration
> will go away.
>
>> ex) echo 'pid 0x8050000 0x100000' > sysfs or procfs or cgroup.
>
> This was answered by Chris, and surely this is feasible, as it is
> feasible for kksmd to scan the whole system regardless of any
> madvise. Some sysfs mangling should allow it.
>
> However regardless of the highmem issue (this applies to 64bit systems
> too) you've to keep in mind that for kksmd to keep track all pages
> under scan it has to build rbtree and allocate rmap_items and
> tree_items for each page tracked, those objects take some memory, so
> if there's not much ram sharing you may waste more memory in the kksmd
> allocations than in the amount of memory actually freed by KSM. This
> is why it's better to selectively only register ranges that we know in
> advance there's an high probability to free memory.

Indeed.

This interface can use for just simple test and profiling.
If it don't add memory pressure and latency, we can use it without
modifying source code.
Unfortunately, it's not in usual case. ;-)

So if KSM can provide profiling information, we can tune easily than now.

ex)
pfn : 0x12, shared [pid 103, vaddr 0x80010000] [pid 201, vaddr 0x800ac000] .....
pfn : 0x301, shared [pid 103, vaddr 0x80020000] [pid 203, vaddr
0x801ac000] .....
...
...

If KSM can provide this profiling information, firstly we try to use
ksm without madive and next we can add madvise call on most like
sharable vma range using profiling data.

> Thanks!
> Andrea
>



-- 
Thanks,
Minchan Kim

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
@ 2009-05-07 12:01                             ` Minchan Kim
  0 siblings, 0 replies; 108+ messages in thread
From: Minchan Kim @ 2009-05-07 12:01 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Minchan Kim, Hugh Dickins, Alan Cox, Rik van Riel, Izik Eidus,
	akpm, linux-kernel, chrisw, device, linux-mm, nickpiggin

>> Many embedded system is so I/O bouneded that we can use much CPU time in there.
>
> Embedded systems with >4G of ram should run 64bit these days, so I
> don't see a problem.

What I mean is that many embedded applications don't use so much cpu
time that we can use extra cpu time to scan identical pages for KSM.
:)

>> One more thing about interface.
>>
>> Ksm map regions are dynamic characteritic ?
>> I mean sometime A application calls ioctl(0x800000, 0x10000) and sometime it calls ioctl(0xb7000000, 0x20000);
>> Of course, It depends on application's behavior.
>
> Looks like the ioctl API is going away in favour of madvise so it'll
> function like madvise, if you munmap the region the KSM registration
> will go away.
>
>> ex) echo 'pid 0x8050000 0x100000' > sysfs or procfs or cgroup.
>
> This was answered by Chris, and surely this is feasible, as it is
> feasible for kksmd to scan the whole system regardless of any
> madvise. Some sysfs mangling should allow it.
>
> However regardless of the highmem issue (this applies to 64bit systems
> too) you've to keep in mind that for kksmd to keep track all pages
> under scan it has to build rbtree and allocate rmap_items and
> tree_items for each page tracked, those objects take some memory, so
> if there's not much ram sharing you may waste more memory in the kksmd
> allocations than in the amount of memory actually freed by KSM. This
> is why it's better to selectively only register ranges that we know in
> advance there's an high probability to free memory.

Indeed.

This interface can use for just simple test and profiling.
If it don't add memory pressure and latency, we can use it without
modifying source code.
Unfortunately, it's not in usual case. ;-)

So if KSM can provide profiling information, we can tune easily than now.

ex)
pfn : 0x12, shared [pid 103, vaddr 0x80010000] [pid 201, vaddr 0x800ac000] .....
pfn : 0x301, shared [pid 103, vaddr 0x80020000] [pid 203, vaddr
0x801ac000] .....
...
...

If KSM can provide this profiling information, firstly we try to use
ksm without madive and next we can add madvise call on most like
sharable vma range using profiling data.

> Thanks!
> Andrea
>



-- 
Thanks,
Minchan Kim

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-07 11:31                       ` Andrea Arcangeli
@ 2009-05-07 13:13                         ` Hugh Dickins
  -1 siblings, 0 replies; 108+ messages in thread
From: Hugh Dickins @ 2009-05-07 13:13 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Chris Wright, Izik Eidus, Rik van Riel, akpm, linux-kernel, alan,
	device, linux-mm, nickpiggin

On Thu, 7 May 2009, Andrea Arcangeli wrote:
> 
> What this thing does is to create shared pages by merging equal pages...
> 
> While I don't care about the naming much myself, one problem I have is
> that I've been writing a KSM paper for linuxsymposium and I'd like to
> use a nomenclature that is in sync with how this stuff should be
> called on lkml, to avoid confusion.

Sorry for sparking this namechange at such a late date,
inflicting such nuisance upon you.

> 
> So should I change the simpler word KSM with "Memory Merging feature"
> all over the paper?

No: "KSM" stands for "Kernel Samepage Merging", doesn't it?
Or maybe someone can devise a better term for the "S" of it.

I think we're all too familiar with "KSM" to want to outlaw that,
just don't dwell too much on the "Kernel Shared Memory" expansion.

> 
> In addition I consistently use the term "shared KSM pages" often,
> should I rename all those instances to "merged pages"? I used the word
> 'merging' only when describing the operation KSM does when it creates
> shared pages, but never to name the generated pages themself.

No, you and your audience and your readers will find it clearest
if you remark on the change in naming upfront, then get on with
describing it all in the way that comes most naturally to you.

But do sprinkle in a few "merged pages", I suggest: perhaps by
the time you deliver it, they'll be coming more naturally to you.

(Actually, "shared KSM pages" makes more sense if that "S" is not
for Shared.  I keep wondering what the two "k"s in kksmd stand for.)

Hugh

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
@ 2009-05-07 13:13                         ` Hugh Dickins
  0 siblings, 0 replies; 108+ messages in thread
From: Hugh Dickins @ 2009-05-07 13:13 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Chris Wright, Izik Eidus, Rik van Riel, akpm, linux-kernel, alan,
	device, linux-mm, nickpiggin

On Thu, 7 May 2009, Andrea Arcangeli wrote:
> 
> What this thing does is to create shared pages by merging equal pages...
> 
> While I don't care about the naming much myself, one problem I have is
> that I've been writing a KSM paper for linuxsymposium and I'd like to
> use a nomenclature that is in sync with how this stuff should be
> called on lkml, to avoid confusion.

Sorry for sparking this namechange at such a late date,
inflicting such nuisance upon you.

> 
> So should I change the simpler word KSM with "Memory Merging feature"
> all over the paper?

No: "KSM" stands for "Kernel Samepage Merging", doesn't it?
Or maybe someone can devise a better term for the "S" of it.

I think we're all too familiar with "KSM" to want to outlaw that,
just don't dwell too much on the "Kernel Shared Memory" expansion.

> 
> In addition I consistently use the term "shared KSM pages" often,
> should I rename all those instances to "merged pages"? I used the word
> 'merging' only when describing the operation KSM does when it creates
> shared pages, but never to name the generated pages themself.

No, you and your audience and your readers will find it clearest
if you remark on the change in naming upfront, then get on with
describing it all in the way that comes most naturally to you.

But do sprinkle in a few "merged pages", I suggest: perhaps by
the time you deliver it, they'll be coming more naturally to you.

(Actually, "shared KSM pages" makes more sense if that "S" is not
for Shared.  I keep wondering what the two "k"s in kksmd stand for.)

Hugh

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
  2009-05-07 13:13                         ` Hugh Dickins
@ 2009-05-07 13:23                           ` Andrea Arcangeli
  -1 siblings, 0 replies; 108+ messages in thread
From: Andrea Arcangeli @ 2009-05-07 13:23 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Chris Wright, Izik Eidus, Rik van Riel, akpm, linux-kernel, alan,
	device, linux-mm, nickpiggin

On Thu, May 07, 2009 at 02:13:31PM +0100, Hugh Dickins wrote:
> No: "KSM" stands for "Kernel Samepage Merging", doesn't it?
> Or maybe someone can devise a better term for the "S" of it.
> I think we're all too familiar with "KSM" to want to outlaw that,
> just don't dwell too much on the "Kernel Shared Memory" expansion.

So you suggest to keep KSM acronym but meaning Kernel Samepage
Merging instead of Kernel Shared Memory ;).

> > In addition I consistently use the term "shared KSM pages" often,
> > should I rename all those instances to "merged pages"? I used the word
> > 'merging' only when describing the operation KSM does when it creates
> > shared pages, but never to name the generated pages themself.
> 
> No, you and your audience and your readers will find it clearest
> if you remark on the change in naming upfront, then get on with
> describing it all in the way that comes most naturally to you.

Actually that's what I already did yesterday, but then I thought I
shall try to get it in sync to avoid confusion. But if KSM name is
here to stay, I'll stick with the KSM name and perhaps a few to
"merged pages" as you suggested.

> But do sprinkle in a few "merged pages", I suggest: perhaps by
> the time you deliver it, they'll be coming more naturally to you.
> 
> (Actually, "shared KSM pages" makes more sense if that "S" is not

ehehe ;)

> for Shared.  I keep wondering what the two "k"s in kksmd stand for.)

kernel KSM daemon.

^ permalink raw reply	[flat|nested] 108+ messages in thread

* Re: [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl.
@ 2009-05-07 13:23                           ` Andrea Arcangeli
  0 siblings, 0 replies; 108+ messages in thread
From: Andrea Arcangeli @ 2009-05-07 13:23 UTC (permalink / raw)
  To: Hugh Dickins
  Cc: Chris Wright, Izik Eidus, Rik van Riel, akpm, linux-kernel, alan,
	device, linux-mm, nickpiggin

On Thu, May 07, 2009 at 02:13:31PM +0100, Hugh Dickins wrote:
> No: "KSM" stands for "Kernel Samepage Merging", doesn't it?
> Or maybe someone can devise a better term for the "S" of it.
> I think we're all too familiar with "KSM" to want to outlaw that,
> just don't dwell too much on the "Kernel Shared Memory" expansion.

So you suggest to keep KSM acronym but meaning Kernel Samepage
Merging instead of Kernel Shared Memory ;).

> > In addition I consistently use the term "shared KSM pages" often,
> > should I rename all those instances to "merged pages"? I used the word
> > 'merging' only when describing the operation KSM does when it creates
> > shared pages, but never to name the generated pages themself.
> 
> No, you and your audience and your readers will find it clearest
> if you remark on the change in naming upfront, then get on with
> describing it all in the way that comes most naturally to you.

Actually that's what I already did yesterday, but then I thought I
shall try to get it in sync to avoid confusion. But if KSM name is
here to stay, I'll stick with the KSM name and perhaps a few to
"merged pages" as you suggested.

> But do sprinkle in a few "merged pages", I suggest: perhaps by
> the time you deliver it, they'll be coming more naturally to you.
> 
> (Actually, "shared KSM pages" makes more sense if that "S" is not

ehehe ;)

> for Shared.  I keep wondering what the two "k"s in kksmd stand for.)

kernel KSM daemon.

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply	[flat|nested] 108+ messages in thread

* [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
  2009-05-02 22:16 ` [PATCH 1/6] ksm: limiting the num of mem regions user can register per fd Izik Eidus
@ 2009-05-02 22:16     ` Izik Eidus
  0 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-02 22:16 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, aarcange, chrisw, alan, device, linux-mm, hugh,
	nickpiggin, Izik Eidus

subjects say it all.

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 mm/ksm.c |   58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index d58db6b..982dfff 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -451,21 +451,71 @@ static void remove_page_from_tree(struct mm_struct *mm,
 	remove_rmap_item_from_tree(rmap_item);
 }
 
+static inline int is_intersecting_address(unsigned long addr,
+					  unsigned long begin,
+					  unsigned long end)
+{
+	if (addr >= begin && addr < end)
+		return 1;
+	return 0;
+}
+
+/*
+ * is_overlap_mem - check if there is overlapping with memory that was already
+ * registred.
+ *
+ * note - this function must to be called under slots_lock
+ */
+static int is_overlap_mem(struct ksm_memory_region *mem)
+{
+	struct ksm_mem_slot *slot;
+
+	list_for_each_entry(slot, &slots, link) {
+		unsigned long mem_end;
+		unsigned long slot_end;
+
+		cond_resched();
+
+		if (current->mm != slot->mm)
+			continue;
+
+		mem_end = mem->addr + (unsigned long)mem->npages * PAGE_SIZE;
+		slot_end = slot->addr + (unsigned long)slot->npages * PAGE_SIZE;
+
+		if (is_intersecting_address(mem->addr, slot->addr, slot_end) ||
+		    is_intersecting_address(mem_end - 1, slot->addr, slot_end))
+			return 1;
+		if (is_intersecting_address(slot->addr, mem->addr, mem_end) ||
+		    is_intersecting_address(slot_end - 1, mem->addr, mem_end))
+			return 1;
+	}
+
+	return 0;
+}
+
 static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
 						struct ksm_memory_region *mem)
 {
 	struct ksm_mem_slot *slot;
 	int ret = -EPERM;
 
+	if (!mem->npages)
+		goto out;
+
+	down_write(&slots_lock);
+
 	if ((ksm_sma->nregions + 1) > regions_per_fd) {
 		ret = -EBUSY;
-		goto out;
+		goto out_unlock;
 	}
 
+	if (is_overlap_mem(mem))
+		goto out_unlock;
+
 	slot = kzalloc(sizeof(struct ksm_mem_slot), GFP_KERNEL);
 	if (!slot) {
 		ret = -ENOMEM;
-		goto out;
+		goto out_unlock;
 	}
 
 	/*
@@ -478,8 +528,6 @@ static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
 	slot->addr = mem->addr;
 	slot->npages = mem->npages;
 
-	down_write(&slots_lock);
-
 	list_add_tail(&slot->link, &slots);
 	list_add_tail(&slot->sma_link, &ksm_sma->sma_slots);
 	ksm_sma->nregions++;
@@ -489,6 +537,8 @@ static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
 
 out_free:
 	kfree(slot);
+out_unlock:
+	up_write(&slots_lock);
 out:
 	return ret;
 }
-- 
1.5.6.5


^ permalink raw reply related	[flat|nested] 108+ messages in thread

* [PATCH 2/6] ksm: dont allow overlap memory addresses registrations.
@ 2009-05-02 22:16     ` Izik Eidus
  0 siblings, 0 replies; 108+ messages in thread
From: Izik Eidus @ 2009-05-02 22:16 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, aarcange, chrisw, alan, device, linux-mm, hugh,
	nickpiggin, Izik Eidus

subjects say it all.

Signed-off-by: Izik Eidus <ieidus@redhat.com>
---
 mm/ksm.c |   58 ++++++++++++++++++++++++++++++++++++++++++++++++++++++----
 1 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/mm/ksm.c b/mm/ksm.c
index d58db6b..982dfff 100644
--- a/mm/ksm.c
+++ b/mm/ksm.c
@@ -451,21 +451,71 @@ static void remove_page_from_tree(struct mm_struct *mm,
 	remove_rmap_item_from_tree(rmap_item);
 }
 
+static inline int is_intersecting_address(unsigned long addr,
+					  unsigned long begin,
+					  unsigned long end)
+{
+	if (addr >= begin && addr < end)
+		return 1;
+	return 0;
+}
+
+/*
+ * is_overlap_mem - check if there is overlapping with memory that was already
+ * registred.
+ *
+ * note - this function must to be called under slots_lock
+ */
+static int is_overlap_mem(struct ksm_memory_region *mem)
+{
+	struct ksm_mem_slot *slot;
+
+	list_for_each_entry(slot, &slots, link) {
+		unsigned long mem_end;
+		unsigned long slot_end;
+
+		cond_resched();
+
+		if (current->mm != slot->mm)
+			continue;
+
+		mem_end = mem->addr + (unsigned long)mem->npages * PAGE_SIZE;
+		slot_end = slot->addr + (unsigned long)slot->npages * PAGE_SIZE;
+
+		if (is_intersecting_address(mem->addr, slot->addr, slot_end) ||
+		    is_intersecting_address(mem_end - 1, slot->addr, slot_end))
+			return 1;
+		if (is_intersecting_address(slot->addr, mem->addr, mem_end) ||
+		    is_intersecting_address(slot_end - 1, mem->addr, mem_end))
+			return 1;
+	}
+
+	return 0;
+}
+
 static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
 						struct ksm_memory_region *mem)
 {
 	struct ksm_mem_slot *slot;
 	int ret = -EPERM;
 
+	if (!mem->npages)
+		goto out;
+
+	down_write(&slots_lock);
+
 	if ((ksm_sma->nregions + 1) > regions_per_fd) {
 		ret = -EBUSY;
-		goto out;
+		goto out_unlock;
 	}
 
+	if (is_overlap_mem(mem))
+		goto out_unlock;
+
 	slot = kzalloc(sizeof(struct ksm_mem_slot), GFP_KERNEL);
 	if (!slot) {
 		ret = -ENOMEM;
-		goto out;
+		goto out_unlock;
 	}
 
 	/*
@@ -478,8 +528,6 @@ static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
 	slot->addr = mem->addr;
 	slot->npages = mem->npages;
 
-	down_write(&slots_lock);
-
 	list_add_tail(&slot->link, &slots);
 	list_add_tail(&slot->sma_link, &ksm_sma->sma_slots);
 	ksm_sma->nregions++;
@@ -489,6 +537,8 @@ static int ksm_sma_ioctl_register_memory_region(struct ksm_sma *ksm_sma,
 
 out_free:
 	kfree(slot);
+out_unlock:
+	up_write(&slots_lock);
 out:
 	return ret;
 }
-- 
1.5.6.5

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@kvack.org.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>

^ permalink raw reply related	[flat|nested] 108+ messages in thread

end of thread, other threads:[~2009-05-07 13:28 UTC | newest]

Thread overview: 108+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2009-05-04 22:25 [PATCH 0/6] ksm changes (v2) Izik Eidus
2009-05-04 22:25 ` Izik Eidus
2009-05-04 22:25 ` [PATCH 1/6] ksm: limiting the num of mem regions user can register per fd Izik Eidus
2009-05-04 22:25   ` Izik Eidus
2009-05-04 22:25   ` [PATCH 2/6] ksm: dont allow overlap memory addresses registrations Izik Eidus
2009-05-04 22:25     ` Izik Eidus
2009-05-04 22:25     ` [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl Izik Eidus
2009-05-04 22:25       ` Izik Eidus
2009-05-04 22:25       ` [PATCH 4/6] ksm: change the prot handling to use the generic helper functions Izik Eidus
2009-05-04 22:25         ` Izik Eidus
2009-05-04 22:25         ` [PATCH 5/6] ksm: build system make it compile for all archs Izik Eidus
2009-05-04 22:25           ` Izik Eidus
2009-05-04 22:25           ` [PATCH 6/6] ksm: use another miscdevice minor number Izik Eidus
2009-05-04 22:25             ` Izik Eidus
2009-05-06  0:55             ` Rik van Riel
2009-05-06  0:55               ` Rik van Riel
2009-05-06  0:54           ` [PATCH 5/6] ksm: build system make it compile for all archs Rik van Riel
2009-05-06  0:54             ` Rik van Riel
2009-05-06  0:54         ` [PATCH 4/6] ksm: change the prot handling to use the generic helper functions Rik van Riel
2009-05-06  0:54           ` Rik van Riel
2009-05-06  0:53       ` [PATCH 3/6] ksm: change the KSM_REMOVE_MEMORY_REGION ioctl Rik van Riel
2009-05-06  0:53         ` Rik van Riel
2009-05-06  8:38         ` Izik Eidus
2009-05-06  8:38           ` Izik Eidus
2009-05-06 11:16           ` Hugh Dickins
2009-05-06 11:16             ` Hugh Dickins
2009-05-06 13:34             ` Andrea Arcangeli
2009-05-06 13:34               ` Andrea Arcangeli
2009-05-06 13:56               ` Izik Eidus
2009-05-06 13:56                 ` Izik Eidus
2009-05-06 16:41                 ` Hugh Dickins
2009-05-06 16:41                   ` Hugh Dickins
2009-05-06 16:49                   ` Chris Wright
2009-05-06 16:49                     ` Chris Wright
2009-05-06 16:57                     ` Hugh Dickins
2009-05-06 16:57                       ` Hugh Dickins
2009-05-06 17:47                       ` Andrea Arcangeli
2009-05-06 17:47                         ` Andrea Arcangeli
2009-05-06 16:59                     ` Izik Eidus
2009-05-06 16:59                       ` Izik Eidus
2009-05-07 11:31                     ` Andrea Arcangeli
2009-05-07 11:31                       ` Andrea Arcangeli
2009-05-07 13:13                       ` Hugh Dickins
2009-05-07 13:13                         ` Hugh Dickins
2009-05-07 13:23                         ` Andrea Arcangeli
2009-05-07 13:23                           ` Andrea Arcangeli
2009-05-06 14:25               ` Hugh Dickins
2009-05-06 14:25                 ` Hugh Dickins
2009-05-06 14:45                 ` Andrea Arcangeli
2009-05-06 14:45                   ` Andrea Arcangeli
2009-05-06 15:36                   ` Chris Wright
2009-05-06 15:36                     ` Chris Wright
2009-05-06 15:27             ` Izik Eidus
2009-05-06 15:27               ` Izik Eidus
2009-05-06 16:14               ` Chris Wright
2009-05-06 16:14                 ` Chris Wright
2009-05-06 16:36                 ` Hugh Dickins
2009-05-06 16:36                   ` Hugh Dickins
2009-05-06 17:09                   ` Chris Wright
2009-05-06 17:09                     ` Chris Wright
2009-05-06 17:54                     ` Hugh Dickins
2009-05-06 17:54                       ` Hugh Dickins
2009-05-06 16:26               ` Hugh Dickins
2009-05-06 16:26                 ` Hugh Dickins
2009-05-06 16:58                 ` Izik Eidus
2009-05-06 16:58                   ` Izik Eidus
2009-05-06 23:59                   ` Chris Wright
2009-05-06 23:59                     ` Chris Wright
2009-05-07  2:41                     ` Rik van Riel
2009-05-07  2:41                       ` Rik van Riel
2009-05-06  0:43     ` [PATCH 2/6] ksm: dont allow overlap memory addresses registrations Rik van Riel
2009-05-06  0:43       ` Rik van Riel
2009-05-06  9:46       ` Izik Eidus
2009-05-06  9:46         ` Izik Eidus
2009-05-06 12:26         ` Rik van Riel
2009-05-06 12:26           ` Rik van Riel
2009-05-06 12:39           ` Izik Eidus
2009-05-06 12:39             ` Izik Eidus
2009-05-06 13:17           ` Andrea Arcangeli
2009-05-06 13:17             ` Andrea Arcangeli
2009-05-06 13:28             ` Hugh Dickins
2009-05-06 13:28               ` Hugh Dickins
2009-05-06 14:02               ` Izik Eidus
2009-05-06 14:02                 ` Izik Eidus
2009-05-06 17:11                 ` Hugh Dickins
2009-05-06 17:11                   ` Hugh Dickins
2009-05-06 14:09               ` Andrea Arcangeli
2009-05-06 14:09                 ` Andrea Arcangeli
2009-05-06 14:21                 ` Alan Cox
2009-05-06 14:21                   ` Alan Cox
2009-05-06 14:46                   ` Hugh Dickins
2009-05-06 14:46                     ` Hugh Dickins
2009-05-06 14:56                     ` Andrea Arcangeli
2009-05-06 14:56                       ` Andrea Arcangeli
2009-05-06 23:55                       ` Minchan Kim
2009-05-06 23:55                         ` Minchan Kim
2009-05-07  0:19                         ` Chris Wright
2009-05-07  0:19                           ` Chris Wright
2009-05-07 10:46                         ` Andrea Arcangeli
2009-05-07 10:46                           ` Andrea Arcangeli
2009-05-07 12:01                           ` Minchan Kim
2009-05-07 12:01                             ` Minchan Kim
2009-05-06 14:57                     ` Izik Eidus
2009-05-06 14:57                       ` Izik Eidus
2009-05-06  0:40   ` [PATCH 1/6] ksm: limiting the num of mem regions user can register per fd Rik van Riel
2009-05-06  0:40     ` Rik van Riel
  -- strict thread matches above, loose matches on Subject: below --
2009-05-02 22:16 [PATCH 0/6] ksm changes Izik Eidus
2009-05-02 22:16 ` [PATCH 1/6] ksm: limiting the num of mem regions user can register per fd Izik Eidus
2009-05-02 22:16   ` [PATCH 2/6] ksm: dont allow overlap memory addresses registrations Izik Eidus
2009-05-02 22:16     ` Izik Eidus

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.