linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [-mm][PATCH 0/4] Add memrlimit controller (v5)
@ 2008-05-21 15:29 Balbir Singh
  2008-05-21 15:29 ` [-mm][PATCH 1/4] Add memrlimit controller documentation (v5) Balbir Singh
                   ` (3 more replies)
  0 siblings, 4 replies; 32+ messages in thread
From: Balbir Singh @ 2008-05-21 15:29 UTC (permalink / raw)
  To: linux-mm
  Cc: Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf, linux-kernel,
	Pavel Emelianov, Balbir Singh, Andrew Morton, KAMEZAWA Hiroyuki

This is the fifth version of the address space control patches. These
patches are against 2.6.26-rc2-mm1  and have been tested using KVM in SMP mode,
both on a powerpc box

The goal of this patch is to implement a virtual address space controller
using cgroups. The documentation describes the controller, it's goal and
usage in further details.

Reviews, Comments?

Changelog

Patches [1/4] and [2/4] are unchanged
Patch [3/4] and [4/4] now formally use mmap_sem to protect mm->owner races.

Previous version of this patchset can be found at
http://lwn.net/Articles/282237/

Series

memrlimit-controller-add-documentation.patch
memrlimit-controller-setup.patch
cgroup-add-task-to-mm-owner-callbacks.patch
memrlimit-controller-address-space-accounting-and-control.patch

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [-mm][PATCH 1/4] Add memrlimit controller documentation (v5)
  2008-05-21 15:29 [-mm][PATCH 0/4] Add memrlimit controller (v5) Balbir Singh
@ 2008-05-21 15:29 ` Balbir Singh
  2008-05-22  4:16   ` Andrew Morton
  2008-05-21 15:29 ` [-mm][PATCH 2/4] Setup the memrlimit controller (v5) Balbir Singh
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 32+ messages in thread
From: Balbir Singh @ 2008-05-21 15:29 UTC (permalink / raw)
  To: linux-mm
  Cc: Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf, linux-kernel,
	Pavel Emelianov, Balbir Singh, Andrew Morton, KAMEZAWA Hiroyuki



Documentation patch - describes the goals and usage of the memrlimit
controller.


Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 Documentation/controllers/memrlimit.txt |   29 +++++++++++++++++++++++++++++
 1 file changed, 29 insertions(+)

diff -puN /dev/null Documentation/controllers/memrlimit.txt
--- /dev/null	2008-05-16 21:23:36.290004010 +0530
+++ linux-2.6.26-rc2-balbir/Documentation/controllers/memrlimit.txt	2008-05-21 20:53:33.000000000 +0530
@@ -0,0 +1,29 @@
+This controller is enabled by the CONFIG_CGROUP_MEMRLIMIT_CTLR option. Prior
+to reading this documentation please read Documentation/cgroups.txt and
+Documentation/controllers/memory.txt. Several of the principles of this
+controller are similar to the memory resource controller.
+
+This controller framework is designed to be extensible to control any
+memory resource limit with little effort.
+
+This new controller, controls the address space expansion of the tasks
+belonging to a cgroup. Address space control is provided along the same lines as
+RLIMIT_AS control, which is available via getrlimit(2)/setrlimit(2).
+The interface for controlling address space is provided through
+"rlimit.limit_in_bytes". The file is similar to "limit_in_bytes" w.r.t. the user
+interface. Please see section 3 of the memory resource controller documentation
+for more details on how to use the user interface to get and set values.
+
+The "memrlimit.usage_in_bytes" file provides information about the total address
+space usage of the tasks in the cgroup, in bytes.
+
+Advantages of providing this feature
+
+1. Control over virtual address space allows for a cgroup to fail gracefully
+   i.e., via a malloc or mmap failure as compared to OOM kill when no
+   pages can be reclaimed.
+2. It provides better control over how many pages can be swapped out when
+   the cgroup goes over its limit. A badly setup cgroup can cause excessive
+   swapping. Providing control over the address space allocations ensures
+   that the system administrator has control over the total swapping that
+   can take place.
_

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [-mm][PATCH 2/4] Setup the memrlimit controller (v5)
  2008-05-21 15:29 [-mm][PATCH 0/4] Add memrlimit controller (v5) Balbir Singh
  2008-05-21 15:29 ` [-mm][PATCH 1/4] Add memrlimit controller documentation (v5) Balbir Singh
@ 2008-05-21 15:29 ` Balbir Singh
  2008-05-22  4:18   ` Andrew Morton
  2008-06-11 17:10   ` Andrea Righi
  2008-05-21 15:29 ` [-mm][PATCH 3/4] cgroup mm owner callback changes to add task info (v5) Balbir Singh
  2008-05-21 15:30 ` [-mm][PATCH 4/4] Add memrlimit controller accounting and control (v5) Balbir Singh
  3 siblings, 2 replies; 32+ messages in thread
From: Balbir Singh @ 2008-05-21 15:29 UTC (permalink / raw)
  To: linux-mm
  Cc: Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf, linux-kernel,
	Pavel Emelianov, Balbir Singh, Andrew Morton, KAMEZAWA Hiroyuki

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 7739 bytes --]


This patch sets up the rlimit cgroup controller. It adds the basic create,
destroy and populate functionality. The user interface provided is very
similar to the memory resource controller. The rlimit controller can be
enhanced easily in the future to control mlocked pages.

Changelog v3->v4

1. Use PAGE_ALIGN()
2. Rename rlimit to memrlimit

Acked-by: Pavel Emelyanov <xemul@openvz.org>
Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 include/linux/cgroup_subsys.h   |    4 +
 include/linux/memrlimitcgroup.h |   19 +++++
 init/Kconfig                    |   10 ++
 mm/Makefile                     |    1 
 mm/memrlimitcgroup.c            |  144 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 178 insertions(+)

diff -puN include/linux/cgroup_subsys.h~memrlimit-controller-setup include/linux/cgroup_subsys.h
--- linux-2.6.26-rc2/include/linux/cgroup_subsys.h~memrlimit-controller-setup	2008-05-21 20:56:50.000000000 +0530
+++ linux-2.6.26-rc2-balbir/include/linux/cgroup_subsys.h	2008-05-21 20:56:50.000000000 +0530
@@ -47,4 +47,8 @@ SUBSYS(mem_cgroup)
 SUBSYS(devices)
 #endif
 
+#ifdef CONFIG_CGROUP_MEMRLIMIT_CTLR
+SUBSYS(memrlimit_cgroup)
+#endif
+
 /* */
diff -puN /dev/null include/linux/memrlimitcgroup.h
--- /dev/null	2008-05-16 21:23:36.290004010 +0530
+++ linux-2.6.26-rc2-balbir/include/linux/memrlimitcgroup.h	2008-05-21 20:56:50.000000000 +0530
@@ -0,0 +1,19 @@
+/*
+ * Copyright © International Business Machines  Corp., 2008
+ *
+ * Author: Balbir Singh <balbir@linux.vnet.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#ifndef LINUX_MEMRLIMITCGROUP_H
+#define LINUX_MEMRLIMITCGROUP_H
+
+#endif /* LINUX_MEMRLIMITCGROUP_H */
diff -puN init/Kconfig~memrlimit-controller-setup init/Kconfig
--- linux-2.6.26-rc2/init/Kconfig~memrlimit-controller-setup	2008-05-21 20:56:50.000000000 +0530
+++ linux-2.6.26-rc2-balbir/init/Kconfig	2008-05-21 20:56:50.000000000 +0530
@@ -407,6 +407,16 @@ config CGROUP_MEM_RES_CTLR
 	  This config option also selects MM_OWNER config option, which
 	  could in turn add some fork/exit overhead.
 
+config CGROUP_MEMRLIMIT_CTLR
+	bool "Memory resource limit controls for cgroups"
+	depends on CGROUPS && RESOURCE_COUNTERS && MMU
+	select MM_OWNER
+	help
+	  Provides resource limits for all the tasks belonging to a
+	  control group. CGROUP_MEM_RES_CTLR provides support for physical
+	  memory RSS and Page Cache control. Virtual address space control
+	  is provided by this controller.
+
 config SYSFS_DEPRECATED
 	bool
 
diff -puN mm/Makefile~memrlimit-controller-setup mm/Makefile
--- linux-2.6.26-rc2/mm/Makefile~memrlimit-controller-setup	2008-05-21 20:56:50.000000000 +0530
+++ linux-2.6.26-rc2-balbir/mm/Makefile	2008-05-21 20:56:50.000000000 +0530
@@ -34,4 +34,5 @@ obj-$(CONFIG_MIGRATION) += migrate.o
 obj-$(CONFIG_SMP) += allocpercpu.o
 obj-$(CONFIG_QUICKLIST) += quicklist.o
 obj-$(CONFIG_CGROUP_MEM_RES_CTLR) += memcontrol.o
+obj-$(CONFIG_CGROUP_MEMRLIMIT_CTLR) += memrlimitcgroup.o
 
diff -puN /dev/null mm/memrlimitcgroup.c
--- /dev/null	2008-05-16 21:23:36.290004010 +0530
+++ linux-2.6.26-rc2-balbir/mm/memrlimitcgroup.c	2008-05-21 20:56:50.000000000 +0530
@@ -0,0 +1,144 @@
+/*
+ * Copyright © International Business Machines  Corp., 2008
+ *
+ * Author: Balbir Singh <balbir@linux.vnet.ibm.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License, or
+ * (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * Provide memory resource limits for tasks in a control group. A lot of code is
+ * duplicated from the memory controller (this code is common to almost
+ * all controllers). TODO: Consider writing a tool that can generate this
+ * code.
+ */
+#include <linux/cgroup.h>
+#include <linux/mm.h>
+#include <linux/smp.h>
+#include <linux/rcupdate.h>
+#include <linux/slab.h>
+#include <linux/swap.h>
+#include <linux/spinlock.h>
+#include <linux/fs.h>
+#include <linux/res_counter.h>
+#include <linux/memrlimitcgroup.h>
+
+struct cgroup_subsys memrlimit_cgroup_subsys;
+
+struct memrlimit_cgroup {
+	struct cgroup_subsys_state css;
+	struct res_counter as_res;	/* address space counter */
+};
+
+static struct memrlimit_cgroup init_memrlimit_cgroup;
+
+static struct memrlimit_cgroup *memrlimit_cgroup_from_cgrp(struct cgroup *cgrp)
+{
+	return container_of(cgroup_subsys_state(cgrp,
+				memrlimit_cgroup_subsys_id),
+				struct memrlimit_cgroup, css);
+}
+
+static struct cgroup_subsys_state *
+memrlimit_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cgrp)
+{
+	struct memrlimit_cgroup *memrcg;
+
+	if (unlikely(cgrp->parent == NULL))
+		memrcg = &init_memrlimit_cgroup;
+	else {
+		memrcg = kzalloc(sizeof(*memrcg), GFP_KERNEL);
+		if (!memrcg)
+			return ERR_PTR(-ENOMEM);
+	}
+	res_counter_init(&memrcg->as_res);
+	return &memrcg->css;
+}
+
+static void memrlimit_cgroup_destroy(struct cgroup_subsys *ss,
+					struct cgroup *cgrp)
+{
+	kfree(memrlimit_cgroup_from_cgrp(cgrp));
+}
+
+static int memrlimit_cgroup_reset(struct cgroup *cgrp, unsigned int event)
+{
+	struct memrlimit_cgroup *memrcg;
+
+	memrcg = memrlimit_cgroup_from_cgrp(cgrp);
+	switch (event) {
+	case RES_FAILCNT:
+		res_counter_reset_failcnt(&memrcg->as_res);
+		break;
+	}
+	return 0;
+}
+
+static u64 memrlimit_cgroup_read(struct cgroup *cgrp, struct cftype *cft)
+{
+	return res_counter_read_u64(&memrlimit_cgroup_from_cgrp(cgrp)->as_res,
+					cft->private);
+}
+
+static int memrlimit_cgroup_write_strategy(char *buf, unsigned long long *tmp)
+{
+	*tmp = memparse(buf, &buf);
+	if (*buf != '\0')
+		return -EINVAL;
+
+	*tmp = PAGE_ALIGN(*tmp);
+	return 0;
+}
+
+static ssize_t memrlimit_cgroup_write(struct cgroup *cgrp, struct cftype *cft,
+					struct file *file,
+					const char __user *userbuf,
+					size_t nbytes,
+					loff_t *ppos)
+{
+	return res_counter_write(&memrlimit_cgroup_from_cgrp(cgrp)->as_res,
+					cft->private, userbuf, nbytes, ppos,
+					memrlimit_cgroup_write_strategy);
+}
+
+static struct cftype memrlimit_cgroup_files[] = {
+	{
+		.name = "usage_in_bytes",
+		.private = RES_USAGE,
+		.read_u64 = memrlimit_cgroup_read,
+	},
+	{
+		.name = "limit_in_bytes",
+		.private = RES_LIMIT,
+		.write = memrlimit_cgroup_write,
+		.read_u64 = memrlimit_cgroup_read,
+	},
+	{
+		.name = "failcnt",
+		.private = RES_FAILCNT,
+		.trigger = memrlimit_cgroup_reset,
+		.read_u64 = memrlimit_cgroup_read,
+	},
+};
+
+static int memrlimit_cgroup_populate(struct cgroup_subsys *ss,
+					struct cgroup *cgrp)
+{
+	return cgroup_add_files(cgrp, ss, memrlimit_cgroup_files,
+				ARRAY_SIZE(memrlimit_cgroup_files));
+}
+
+struct cgroup_subsys memrlimit_cgroup_subsys = {
+	.name = "memrlimit",
+	.subsys_id = memrlimit_cgroup_subsys_id,
+	.create = memrlimit_cgroup_create,
+	.destroy = memrlimit_cgroup_destroy,
+	.populate = memrlimit_cgroup_populate,
+	.early_init = 0,
+};
_

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [-mm][PATCH 3/4] cgroup mm owner callback changes to add task info (v5)
  2008-05-21 15:29 [-mm][PATCH 0/4] Add memrlimit controller (v5) Balbir Singh
  2008-05-21 15:29 ` [-mm][PATCH 1/4] Add memrlimit controller documentation (v5) Balbir Singh
  2008-05-21 15:29 ` [-mm][PATCH 2/4] Setup the memrlimit controller (v5) Balbir Singh
@ 2008-05-21 15:29 ` Balbir Singh
  2008-05-22  4:19   ` Andrew Morton
  2008-05-21 15:30 ` [-mm][PATCH 4/4] Add memrlimit controller accounting and control (v5) Balbir Singh
  3 siblings, 1 reply; 32+ messages in thread
From: Balbir Singh @ 2008-05-21 15:29 UTC (permalink / raw)
  To: linux-mm
  Cc: Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf, linux-kernel,
	Pavel Emelianov, Balbir Singh, Andrew Morton, KAMEZAWA Hiroyuki


This patch adds an additional field to the mm_owner callbacks. This field
is required to get to the mm that changed. Hold mmap_sem in write mode
before calling the mm_owner_changed callback

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 include/linux/cgroup.h |    3 ++-
 kernel/cgroup.c        |    4 +++-
 kernel/exit.c          |    3 +++
 3 files changed, 8 insertions(+), 2 deletions(-)

diff -puN include/linux/cgroup.h~cgroup-add-task-to-mm-owner-callbacks include/linux/cgroup.h
--- linux-2.6.26-rc2/include/linux/cgroup.h~cgroup-add-task-to-mm-owner-callbacks	2008-05-21 20:56:54.000000000 +0530
+++ linux-2.6.26-rc2-balbir/include/linux/cgroup.h	2008-05-21 20:56:54.000000000 +0530
@@ -310,7 +310,8 @@ struct cgroup_subsys {
 	 */
 	void (*mm_owner_changed)(struct cgroup_subsys *ss,
 					struct cgroup *old,
-					struct cgroup *new);
+					struct cgroup *new,
+					struct task_struct *p);
 	int subsys_id;
 	int active;
 	int disabled;
diff -puN kernel/cgroup.c~cgroup-add-task-to-mm-owner-callbacks kernel/cgroup.c
--- linux-2.6.26-rc2/kernel/cgroup.c~cgroup-add-task-to-mm-owner-callbacks	2008-05-21 20:56:54.000000000 +0530
+++ linux-2.6.26-rc2-balbir/kernel/cgroup.c	2008-05-21 20:56:54.000000000 +0530
@@ -2758,6 +2758,8 @@ void cgroup_fork_callbacks(struct task_s
  * Called on every change to mm->owner. mm_init_owner() does not
  * invoke this routine, since it assigns the mm->owner the first time
  * and does not change it.
+ *
+ * The callbacks are invoked with mmap_sem held in read mode.
  */
 void cgroup_mm_owner_callbacks(struct task_struct *old, struct task_struct *new)
 {
@@ -2772,7 +2774,7 @@ void cgroup_mm_owner_callbacks(struct ta
 			if (oldcgrp == newcgrp)
 				continue;
 			if (ss->mm_owner_changed)
-				ss->mm_owner_changed(ss, oldcgrp, newcgrp);
+				ss->mm_owner_changed(ss, oldcgrp, newcgrp, new);
 		}
 	}
 }
diff -puN kernel/exit.c~cgroup-add-task-to-mm-owner-callbacks kernel/exit.c
--- linux-2.6.26-rc2/kernel/exit.c~cgroup-add-task-to-mm-owner-callbacks	2008-05-21 20:56:54.000000000 +0530
+++ linux-2.6.26-rc2-balbir/kernel/exit.c	2008-05-21 20:56:54.000000000 +0530
@@ -621,6 +621,7 @@ retry:
 assign_new_owner:
 	BUG_ON(c == p);
 	get_task_struct(c);
+	down_write(&mm->mmap_sem);
 	/*
 	 * The task_lock protects c->mm from changing.
 	 * We always want mm->owner->mm == mm
@@ -634,12 +635,14 @@ assign_new_owner:
 	if (c->mm != mm) {
 		task_unlock(c);
 		put_task_struct(c);
+		up_write(&mm->mmap_sem);
 		goto retry;
 	}
 	cgroup_mm_owner_callbacks(mm->owner, c);
 	mm->owner = c;
 	task_unlock(c);
 	put_task_struct(c);
+	up_write(&mm->mmap_sem);
 }
 #endif /* CONFIG_MM_OWNER */
 
_

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [-mm][PATCH 4/4] Add memrlimit controller accounting and control (v5)
  2008-05-21 15:29 [-mm][PATCH 0/4] Add memrlimit controller (v5) Balbir Singh
                   ` (2 preceding siblings ...)
  2008-05-21 15:29 ` [-mm][PATCH 3/4] cgroup mm owner callback changes to add task info (v5) Balbir Singh
@ 2008-05-21 15:30 ` Balbir Singh
  2008-05-21 17:20   ` Vivek Goyal
  2008-05-22  4:24   ` Andrew Morton
  3 siblings, 2 replies; 32+ messages in thread
From: Balbir Singh @ 2008-05-21 15:30 UTC (permalink / raw)
  To: linux-mm
  Cc: Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf, linux-kernel,
	Pavel Emelianov, Balbir Singh, Andrew Morton, KAMEZAWA Hiroyuki


This patch adds support for accounting and control of virtual address space
limits. The accounting is done via the rlimit_cgroup_(un)charge_as functions.
The core of the accounting takes place during fork time in copy_process(),
may_expand_vm(), remove_vma_list() and exit_mmap(). 

Changelog v5->v4

Move specific hooks in code to insert_vm_struct
Use mmap_sem to protect mm->owner from changing and mm->owner from
changing cgroups.

Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
---

 arch/x86/kernel/ptrace.c        |   18 +++++--
 include/linux/memrlimitcgroup.h |   21 +++++++++
 kernel/fork.c                   |    8 +++
 mm/memrlimitcgroup.c            |   92 ++++++++++++++++++++++++++++++++++++++++
 mm/mmap.c                       |   17 ++++++-
 5 files changed, 149 insertions(+), 7 deletions(-)

diff -puN arch/ia64/kernel/perfmon.c~memrlimit-controller-address-space-accounting-and-control arch/ia64/kernel/perfmon.c
diff -puN arch/x86/kernel/ds.c~memrlimit-controller-address-space-accounting-and-control arch/x86/kernel/ds.c
diff -puN fs/exec.c~memrlimit-controller-address-space-accounting-and-control fs/exec.c
diff -puN include/linux/memrlimitcgroup.h~memrlimit-controller-address-space-accounting-and-control include/linux/memrlimitcgroup.h
--- linux-2.6.26-rc2/include/linux/memrlimitcgroup.h~memrlimit-controller-address-space-accounting-and-control	2008-05-21 20:56:56.000000000 +0530
+++ linux-2.6.26-rc2-balbir/include/linux/memrlimitcgroup.h	2008-05-21 20:56:56.000000000 +0530
@@ -16,4 +16,25 @@
 #ifndef LINUX_MEMRLIMITCGROUP_H
 #define LINUX_MEMRLIMITCGROUP_H
 
+#ifdef CONFIG_CGROUP_MEMRLIMIT_CTLR
+
+int memrlimit_cgroup_charge_as(struct mm_struct *mm, unsigned long nr_pages);
+void memrlimit_cgroup_uncharge_as(struct mm_struct *mm, unsigned long nr_pages);
+
+#else /* !CONFIG_CGROUP_RLIMIT_CTLR */
+
+static inline int
+memrlimit_cgroup_charge_as(struct mm_struct *mm, unsigned long nr_pages)
+{
+	return 0;
+}
+
+static inline void
+memrlimit_cgroup_uncharge_as(struct mm_struct *mm, unsigned long nr_pages)
+{
+}
+
+#endif /* CONFIG_CGROUP_RLIMIT_CTLR */
+
+
 #endif /* LINUX_MEMRLIMITCGROUP_H */
diff -puN kernel/fork.c~memrlimit-controller-address-space-accounting-and-control kernel/fork.c
--- linux-2.6.26-rc2/kernel/fork.c~memrlimit-controller-address-space-accounting-and-control	2008-05-21 20:56:56.000000000 +0530
+++ linux-2.6.26-rc2-balbir/kernel/fork.c	2008-05-21 20:56:56.000000000 +0530
@@ -54,6 +54,7 @@
 #include <linux/tty.h>
 #include <linux/proc_fs.h>
 #include <linux/blkdev.h>
+#include <linux/memrlimitcgroup.h>
 
 #include <asm/pgtable.h>
 #include <asm/pgalloc.h>
@@ -267,6 +268,7 @@ static int dup_mmap(struct mm_struct *mm
 			mm->total_vm -= pages;
 			vm_stat_account(mm, mpnt->vm_flags, mpnt->vm_file,
 								-pages);
+			memrlimit_cgroup_uncharge_as(mm, pages);
 			continue;
 		}
 		charge = 0;
@@ -596,6 +598,12 @@ static int copy_mm(unsigned long clone_f
 		atomic_inc(&oldmm->mm_users);
 		mm = oldmm;
 		goto good_mm;
+	} else {
+		down_write(&oldmm->mmap_sem);
+		retval = memrlimit_cgroup_charge_as(oldmm, oldmm->total_vm);
+		up_write(&oldmm->mmap_sem);
+		if (retval)
+			goto fail_nomem;
 	}
 
 	retval = -ENOMEM;
diff -puN mm/memrlimitcgroup.c~memrlimit-controller-address-space-accounting-and-control mm/memrlimitcgroup.c
--- linux-2.6.26-rc2/mm/memrlimitcgroup.c~memrlimit-controller-address-space-accounting-and-control	2008-05-21 20:56:56.000000000 +0530
+++ linux-2.6.26-rc2-balbir/mm/memrlimitcgroup.c	2008-05-21 20:56:56.000000000 +0530
@@ -45,6 +45,38 @@ static struct memrlimit_cgroup *memrlimi
 				struct memrlimit_cgroup, css);
 }
 
+static struct memrlimit_cgroup *
+memrlimit_cgroup_from_task(struct task_struct *p)
+{
+	return container_of(task_subsys_state(p, memrlimit_cgroup_subsys_id),
+				struct memrlimit_cgroup, css);
+}
+
+/*
+ * Charge the cgroup for address space usage - mmap(), malloc() (through
+ * brk(), sbrk()), stack expansion, mremap(), etc - called with
+ * mmap_sem held.
+ */
+int memrlimit_cgroup_charge_as(struct mm_struct *mm, unsigned long nr_pages)
+{
+	struct memrlimit_cgroup *memrcg;
+
+	memrcg = memrlimit_cgroup_from_task(mm->owner);
+	return res_counter_charge(&memrcg->as_res, (nr_pages << PAGE_SHIFT));
+}
+
+/*
+ * Uncharge the cgroup, as the address space of one of the tasks is
+ * decreasing - called with mmap_sem held.
+ */
+void memrlimit_cgroup_uncharge_as(struct mm_struct *mm, unsigned long nr_pages)
+{
+	struct memrlimit_cgroup *memrcg;
+
+	memrcg = memrlimit_cgroup_from_task(mm->owner);
+	res_counter_uncharge(&memrcg->as_res, (nr_pages << PAGE_SHIFT));
+}
+
 static struct cgroup_subsys_state *
 memrlimit_cgroup_create(struct cgroup_subsys *ss, struct cgroup *cgrp)
 {
@@ -134,11 +166,71 @@ static int memrlimit_cgroup_populate(str
 				ARRAY_SIZE(memrlimit_cgroup_files));
 }
 
+static void memrlimit_cgroup_move_task(struct cgroup_subsys *ss,
+					struct cgroup *cgrp,
+					struct cgroup *old_cgrp,
+					struct task_struct *p)
+{
+	struct mm_struct *mm;
+	struct memrlimit_cgroup *memrcg, *old_memrcg;
+
+	mm = get_task_mm(p);
+	if (mm == NULL)
+		return;
+
+	/*
+	 * Hold mmap_sem, so that total_vm does not change underneath us
+	 */
+	down_read(&mm->mmap_sem);
+
+	rcu_read_lock();
+	if (p != rcu_dereference(mm->owner))
+		goto out;
+
+	memrcg = memrlimit_cgroup_from_cgrp(cgrp);
+	old_memrcg = memrlimit_cgroup_from_cgrp(old_cgrp);
+
+	if (memrcg == old_memrcg)
+		goto out;
+
+	if (res_counter_charge(&memrcg->as_res, (mm->total_vm << PAGE_SHIFT)))
+		goto out;
+	res_counter_uncharge(&old_memrcg->as_res, (mm->total_vm << PAGE_SHIFT));
+out:
+	rcu_read_unlock();
+	up_read(&mm->mmap_sem);
+	mmput(mm);
+}
+
+/*
+ * This callback is called with mmap_sem held
+ */
+static void memrlimit_cgroup_mm_owner_changed(struct cgroup_subsys *ss,
+						struct cgroup *cgrp,
+						struct cgroup *old_cgrp,
+						struct task_struct *p)
+{
+	struct memrlimit_cgroup *memrcg, *old_memrcg;
+	struct mm_struct *mm = get_task_mm(p);
+
+	BUG_ON(!mm);
+	memrcg = memrlimit_cgroup_from_cgrp(cgrp);
+	old_memrcg = memrlimit_cgroup_from_cgrp(old_cgrp);
+
+	if (res_counter_charge(&memrcg->as_res, (mm->total_vm << PAGE_SHIFT)))
+		goto out;
+	res_counter_uncharge(&old_memrcg->as_res, (mm->total_vm << PAGE_SHIFT));
+out:
+	mmput(mm);
+}
+
 struct cgroup_subsys memrlimit_cgroup_subsys = {
 	.name = "memrlimit",
 	.subsys_id = memrlimit_cgroup_subsys_id,
 	.create = memrlimit_cgroup_create,
 	.destroy = memrlimit_cgroup_destroy,
 	.populate = memrlimit_cgroup_populate,
+	.attach = memrlimit_cgroup_move_task,
+	.mm_owner_changed = memrlimit_cgroup_mm_owner_changed,
 	.early_init = 0,
 };
diff -puN mm/mmap.c~memrlimit-controller-address-space-accounting-and-control mm/mmap.c
--- linux-2.6.26-rc2/mm/mmap.c~memrlimit-controller-address-space-accounting-and-control	2008-05-21 20:56:56.000000000 +0530
+++ linux-2.6.26-rc2-balbir/mm/mmap.c	2008-05-21 20:56:56.000000000 +0530
@@ -26,6 +26,7 @@
 #include <linux/mount.h>
 #include <linux/mempolicy.h>
 #include <linux/rmap.h>
+#include <linux/memrlimitcgroup.h>
 
 #include <asm/uaccess.h>
 #include <asm/cacheflush.h>
@@ -1730,6 +1731,7 @@ static void remove_vma_list(struct mm_st
 		long nrpages = vma_pages(vma);
 
 		mm->total_vm -= nrpages;
+		memrlimit_cgroup_uncharge_as(mm, nrpages);
 		if (vma->vm_flags & VM_LOCKED)
 			mm->locked_vm -= nrpages;
 		vm_stat_account(mm, vma->vm_flags, vma->vm_file, -nrpages);
@@ -2056,6 +2058,7 @@ void exit_mmap(struct mm_struct *mm)
 	/* Use -1 here to ensure all VMAs in the mm are unmapped */
 	end = unmap_vmas(&tlb, vma, 0, -1, &nr_accounted, NULL);
 	vm_unacct_memory(nr_accounted);
+	memrlimit_cgroup_uncharge_as(mm, mm->total_vm);
 	free_pgtables(&tlb, vma, FIRST_USER_ADDRESS, 0);
 	tlb_finish_mmu(tlb, 0, end);
 
@@ -2078,6 +2081,9 @@ int insert_vm_struct(struct mm_struct * 
 	struct vm_area_struct * __vma, * prev;
 	struct rb_node ** rb_link, * rb_parent;
 
+	if (memrlimit_cgroup_charge_as(mm, vma_pages(vma)))
+		return -ENOMEM;
+
 	/*
 	 * The vm_pgoff of a purely anonymous vma should be irrelevant
 	 * until its first write fault, when page's anon_vma and index
@@ -2096,12 +2102,15 @@ int insert_vm_struct(struct mm_struct * 
 	}
 	__vma = find_vma_prepare(mm,vma->vm_start,&prev,&rb_link,&rb_parent);
 	if (__vma && __vma->vm_start < vma->vm_end)
-		return -ENOMEM;
+		goto err;
 	if ((vma->vm_flags & VM_ACCOUNT) &&
 	     security_vm_enough_memory_mm(mm, vma_pages(vma)))
-		return -ENOMEM;
+		goto err;
 	vma_link(mm, vma, prev, rb_link, rb_parent);
 	return 0;
+err:
+	memrlimit_cgroup_uncharge_as(mm, vma_pages(vma));
+	return -ENOMEM;
 }
 
 /*
@@ -2174,6 +2183,10 @@ int may_expand_vm(struct mm_struct *mm, 
 
 	if (cur + npages > lim)
 		return 0;
+
+	if (memrlimit_cgroup_charge_as(mm, npages))
+		return 0;
+
 	return 1;
 }
 
diff -puN arch/x86/kernel/ptrace.c~memrlimit-controller-address-space-accounting-and-control arch/x86/kernel/ptrace.c
--- linux-2.6.26-rc2/arch/x86/kernel/ptrace.c~memrlimit-controller-address-space-accounting-and-control	2008-05-21 20:56:56.000000000 +0530
+++ linux-2.6.26-rc2-balbir/arch/x86/kernel/ptrace.c	2008-05-21 20:56:56.000000000 +0530
@@ -20,6 +20,7 @@
 #include <linux/audit.h>
 #include <linux/seccomp.h>
 #include <linux/signal.h>
+#include <linux/memrlimitcgroup.h>
 
 #include <asm/uaccess.h>
 #include <asm/pgtable.h>
@@ -782,21 +783,25 @@ static int ptrace_bts_realloc(struct tas
 
 	current->mm->total_vm  -= old_size;
 	current->mm->locked_vm -= old_size;
+	memrlimit_cgroup_uncharge_as(mm, old_size);
 
 	if (size == 0)
 		goto out;
 
+	if (memrlimit_cgroup_charge_as(current->mm, size))
+		goto out;
+
 	rlim = current->signal->rlim[RLIMIT_AS].rlim_cur >> PAGE_SHIFT;
 	vm = current->mm->total_vm  + size;
 	if (rlim < vm) {
 		ret = -ENOMEM;
 
 		if (!reduce_size)
-			goto out;
+			goto out_uncharge;
 
 		size = rlim - current->mm->total_vm;
 		if (size <= 0)
-			goto out;
+			goto out_uncharge;
 	}
 
 	rlim = current->signal->rlim[RLIMIT_MEMLOCK].rlim_cur >> PAGE_SHIFT;
@@ -805,21 +810,24 @@ static int ptrace_bts_realloc(struct tas
 		ret = -ENOMEM;
 
 		if (!reduce_size)
-			goto out;
+			goto out_uncharge;
 
 		size = rlim - current->mm->locked_vm;
 		if (size <= 0)
-			goto out;
+			goto out_uncharge;
 	}
 
 	ret = ds_allocate((void **)&child->thread.ds_area_msr,
 			  size << PAGE_SHIFT);
 	if (ret < 0)
-		goto out;
+		goto out_uncharge;
 
 	current->mm->total_vm  += size;
 	current->mm->locked_vm += size;
 
+out_uncharge:
+	if (ret < 0)
+		memrlimit_cgroup_uncharge_as(mm, size);
 out:
 	if (child->thread.ds_area_msr)
 		set_tsk_thread_flag(child, TIF_DS_AREA_MSR);
_

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [-mm][PATCH 4/4] Add memrlimit controller accounting and control (v5)
  2008-05-21 15:30 ` [-mm][PATCH 4/4] Add memrlimit controller accounting and control (v5) Balbir Singh
@ 2008-05-21 17:20   ` Vivek Goyal
  2008-05-22  4:41     ` Balbir Singh
  2008-05-22  4:24   ` Andrew Morton
  1 sibling, 1 reply; 32+ messages in thread
From: Vivek Goyal @ 2008-05-21 17:20 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linux-mm, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf,
	linux-kernel, Pavel Emelianov, Balbir Singh, Andrew Morton,
	KAMEZAWA Hiroyuki

On Wed, May 21, 2008 at 09:00:12PM +0530, Balbir Singh wrote:

[..]
> +static void memrlimit_cgroup_move_task(struct cgroup_subsys *ss,
> +					struct cgroup *cgrp,
> +					struct cgroup *old_cgrp,
> +					struct task_struct *p)
> +{
> +	struct mm_struct *mm;
> +	struct memrlimit_cgroup *memrcg, *old_memrcg;
> +
> +	mm = get_task_mm(p);
> +	if (mm == NULL)
> +		return;
> +
> +	/*
> +	 * Hold mmap_sem, so that total_vm does not change underneath us
> +	 */
> +	down_read(&mm->mmap_sem);
> +
> +	rcu_read_lock();
> +	if (p != rcu_dereference(mm->owner))
> +		goto out;
> +

Hi Balbir,

How does rcu help here? We are not dereferencing mm->owner. So even if
task_struct it was pointing to goes away, should not be a problem.

OTOH, while updating the mm->owner in mmm_update_next_owner(), we
are not using rcu_assing_pointer() and synchronize_rcu()/call_rcu(). Is
this the right usage if mm->owner is rcu protected?

Thanks
Vivek

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [-mm][PATCH 1/4] Add memrlimit controller documentation (v5)
  2008-05-21 15:29 ` [-mm][PATCH 1/4] Add memrlimit controller documentation (v5) Balbir Singh
@ 2008-05-22  4:16   ` Andrew Morton
  2008-05-22 10:13     ` Balbir Singh
  0 siblings, 1 reply; 32+ messages in thread
From: Andrew Morton @ 2008-05-22  4:16 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linux-mm, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf,
	linux-kernel, Pavel Emelianov, KAMEZAWA Hiroyuki

On Wed, 21 May 2008 20:59:37 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> 
> 
> Documentation patch - describes the goals and usage of the memrlimit
> controller.
> 
> 
> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
> ---
> 
>  Documentation/controllers/memrlimit.txt |   29 +++++++++++++++++++++++++++++
>  1 file changed, 29 insertions(+)
> 
> diff -puN /dev/null Documentation/controllers/memrlimit.txt
> --- /dev/null	2008-05-16 21:23:36.290004010 +0530
> +++ linux-2.6.26-rc2-balbir/Documentation/controllers/memrlimit.txt	2008-05-21 20:53:33.000000000 +0530
> @@ -0,0 +1,29 @@
> +This controller is enabled by the CONFIG_CGROUP_MEMRLIMIT_CTLR option. Prior
> +to reading this documentation please read Documentation/cgroups.txt and
> +Documentation/controllers/memory.txt. Several of the principles of this
> +controller are similar to the memory resource controller.
> +
> +This controller framework is designed to be extensible to control any
> +memory resource limit with little effort.
> +
> +This new controller, controls the address space expansion of the tasks
> +belonging to a cgroup. Address space control is provided along the same lines as
> +RLIMIT_AS control, which is available via getrlimit(2)/setrlimit(2).

Still would like to see more details here.  RLIMIT_AS is simple because
it applies to a single mm.  But a control group may contain multiple
mm's, and those mm's can share stuff.

The sharing probably isn't important in this situation, where we're
only concerned with virtual address space size, rather than real
memory.  But still, readers will be wondering about this.

They also will be wondering about the handling of threads versus
heavyweight processes.  _we_ know that, but users and administrators
may not.


> +The interface for controlling address space is provided through
> +"rlimit.limit_in_bytes". The file is similar to "limit_in_bytes" w.r.t. the user
> +interface. Please see section 3 of the memory resource controller documentation
> +for more details on how to use the user interface to get and set values.
> +
> +The "memrlimit.usage_in_bytes" file provides information about the total address
> +space usage of the tasks in the cgroup, in bytes.
> +
> +Advantages of providing this feature
> +
> +1. Control over virtual address space allows for a cgroup to fail gracefully
> +   i.e., via a malloc or mmap failure as compared to OOM kill when no
> +   pages can be reclaimed.

Well it's more than "i.e.".  It would be better to precisely spell out
the behaviour of this feature.  It's mmap() and brk() only, is it not? 
malloc() isn't a system call ;)

Ideally, either the changelog, this document or code comments should be
sufficient for a manpage to be written.  And this level of description
will lead to better code review and quite likely a better feature.

> +2. It provides better control over how many pages can be swapped out when
> +   the cgroup goes over its limit. A badly setup cgroup can cause excessive
> +   swapping. Providing control over the address space allocations ensures
> +   that the system administrator has control over the total swapping that
> +   can take place.

IOW: what, exactly and completely, does this feature do?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [-mm][PATCH 2/4] Setup the memrlimit controller (v5)
  2008-05-21 15:29 ` [-mm][PATCH 2/4] Setup the memrlimit controller (v5) Balbir Singh
@ 2008-05-22  4:18   ` Andrew Morton
  2008-05-22 10:14     ` Balbir Singh
  2008-06-11 17:10   ` Andrea Righi
  1 sibling, 1 reply; 32+ messages in thread
From: Andrew Morton @ 2008-05-22  4:18 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linux-mm, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf,
	linux-kernel, Pavel Emelianov, KAMEZAWA Hiroyuki

On Wed, 21 May 2008 20:59:48 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> +static int memrlimit_cgroup_write_strategy(char *buf, unsigned long long *tmp)

grumble.  I think I requested a checkpatch warning whenever it comes
across "tmp" or "temp".  Even better would be a gcc coredump.

I'm sure there's something more meaningful we could use here?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [-mm][PATCH 3/4] cgroup mm owner callback changes to add task info (v5)
  2008-05-21 15:29 ` [-mm][PATCH 3/4] cgroup mm owner callback changes to add task info (v5) Balbir Singh
@ 2008-05-22  4:19   ` Andrew Morton
  2008-05-22 10:14     ` Balbir Singh
  0 siblings, 1 reply; 32+ messages in thread
From: Andrew Morton @ 2008-05-22  4:19 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linux-mm, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf,
	linux-kernel, Pavel Emelianov, KAMEZAWA Hiroyuki

On Wed, 21 May 2008 20:59:59 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> 
> This patch adds an additional field to the mm_owner callbacks. This field
> is required to get to the mm that changed. Hold mmap_sem in write mode
> before calling the mm_owner_changed callback
>
> ...
>
> + * The callbacks are invoked with mmap_sem held in read mode.

Is that true?

> +	down_write(&mm->mmap_sem);
> ...
>  	cgroup_mm_owner_callbacks(mm->owner, c);

Looks like write-mode to me?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [-mm][PATCH 4/4] Add memrlimit controller accounting and control (v5)
  2008-05-21 15:30 ` [-mm][PATCH 4/4] Add memrlimit controller accounting and control (v5) Balbir Singh
  2008-05-21 17:20   ` Vivek Goyal
@ 2008-05-22  4:24   ` Andrew Morton
  2008-05-22 10:15     ` Balbir Singh
  1 sibling, 1 reply; 32+ messages in thread
From: Andrew Morton @ 2008-05-22  4:24 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linux-mm, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf,
	linux-kernel, Pavel Emelianov, KAMEZAWA Hiroyuki

On Wed, 21 May 2008 21:00:12 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote:

> 
> This patch adds support for accounting and control of virtual address space
> limits. The accounting is done via the rlimit_cgroup_(un)charge_as functions.
> The core of the accounting takes place during fork time in copy_process(),
> may_expand_vm(), remove_vma_list() and exit_mmap(). 
> 
> Changelog v5->v4
> 
> Move specific hooks in code to insert_vm_struct
> Use mmap_sem to protect mm->owner from changing and mm->owner from
> changing cgroups.
> 
> ...
>
> + * brk(), sbrk()), stack expansion, mremap(), etc - called with
> + * mmap_sem held.
> + * decreasing - called with mmap_sem held.
> + * This callback is called with mmap_sem held

It's good to document the locking prerequisites but for rwsems, one
should specify whether it must be held for reading or for writing.

Of course, down_write() is a superset of down_read(), so if it's "held
for reading" then either mode-of-holding is OK.  But it's best to spell
all that out.


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [-mm][PATCH 4/4] Add memrlimit controller accounting and control (v5)
  2008-05-21 17:20   ` Vivek Goyal
@ 2008-05-22  4:41     ` Balbir Singh
  0 siblings, 0 replies; 32+ messages in thread
From: Balbir Singh @ 2008-05-22  4:41 UTC (permalink / raw)
  To: Vivek Goyal
  Cc: linux-mm, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf,
	linux-kernel, Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki

Vivek Goyal wrote:
> On Wed, May 21, 2008 at 09:00:12PM +0530, Balbir Singh wrote:
> 
> [..]
>> +static void memrlimit_cgroup_move_task(struct cgroup_subsys *ss,
>> +					struct cgroup *cgrp,
>> +					struct cgroup *old_cgrp,
>> +					struct task_struct *p)
>> +{
>> +	struct mm_struct *mm;
>> +	struct memrlimit_cgroup *memrcg, *old_memrcg;
>> +
>> +	mm = get_task_mm(p);
>> +	if (mm == NULL)
>> +		return;
>> +
>> +	/*
>> +	 * Hold mmap_sem, so that total_vm does not change underneath us
>> +	 */
>> +	down_read(&mm->mmap_sem);
>> +
>> +	rcu_read_lock();
>> +	if (p != rcu_dereference(mm->owner))
>> +		goto out;
>> +
> 
> Hi Balbir,
> 
> How does rcu help here? We are not dereferencing mm->owner. So even if
> task_struct it was pointing to goes away, should not be a problem.
> 

Yes, you are right, since we already have information about the cgroup and new
cgroup, mm->owner's exit should not really cause a problem

> OTOH, while updating the mm->owner in mmm_update_next_owner(), we
> are not using rcu_assing_pointer() and synchronize_rcu()/call_rcu(). Is
> this the right usage if mm->owner is rcu protected?
> 

Yes, you are correct - I'll send out updates on top of this one.

> Thanks
> Vivek
> 
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [-mm][PATCH 1/4] Add memrlimit controller documentation (v5)
  2008-05-22  4:16   ` Andrew Morton
@ 2008-05-22 10:13     ` Balbir Singh
  0 siblings, 0 replies; 32+ messages in thread
From: Balbir Singh @ 2008-05-22 10:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf,
	linux-kernel, Pavel Emelianov, KAMEZAWA Hiroyuki

Andrew Morton wrote:
> On Wed, 21 May 2008 20:59:37 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
>>
>> Documentation patch - describes the goals and usage of the memrlimit
>> controller.
>>
>>
>> Signed-off-by: Balbir Singh <balbir@linux.vnet.ibm.com>
>> ---
>>
>>  Documentation/controllers/memrlimit.txt |   29 +++++++++++++++++++++++++++++
>>  1 file changed, 29 insertions(+)
>>
>> diff -puN /dev/null Documentation/controllers/memrlimit.txt
>> --- /dev/null	2008-05-16 21:23:36.290004010 +0530
>> +++ linux-2.6.26-rc2-balbir/Documentation/controllers/memrlimit.txt	2008-05-21 20:53:33.000000000 +0530
>> @@ -0,0 +1,29 @@
>> +This controller is enabled by the CONFIG_CGROUP_MEMRLIMIT_CTLR option. Prior
>> +to reading this documentation please read Documentation/cgroups.txt and
>> +Documentation/controllers/memory.txt. Several of the principles of this
>> +controller are similar to the memory resource controller.
>> +
>> +This controller framework is designed to be extensible to control any
>> +memory resource limit with little effort.
>> +
>> +This new controller, controls the address space expansion of the tasks
>> +belonging to a cgroup. Address space control is provided along the same lines as
>> +RLIMIT_AS control, which is available via getrlimit(2)/setrlimit(2).
> 
> Still would like to see more details here.  RLIMIT_AS is simple because
> it applies to a single mm.  But a control group may contain multiple
> mm's, and those mm's can share stuff.
> 
> The sharing probably isn't important in this situation, where we're
> only concerned with virtual address space size, rather than real
> memory.  But still, readers will be wondering about this.
> 
> They also will be wondering about the handling of threads versus
> heavyweight processes.  _we_ know that, but users and administrators
> may not.
> 
> 
>> +The interface for controlling address space is provided through
>> +"rlimit.limit_in_bytes". The file is similar to "limit_in_bytes" w.r.t. the user
>> +interface. Please see section 3 of the memory resource controller documentation
>> +for more details on how to use the user interface to get and set values.
>> +
>> +The "memrlimit.usage_in_bytes" file provides information about the total address
>> +space usage of the tasks in the cgroup, in bytes.
>> +
>> +Advantages of providing this feature
>> +
>> +1. Control over virtual address space allows for a cgroup to fail gracefully
>> +   i.e., via a malloc or mmap failure as compared to OOM kill when no
>> +   pages can be reclaimed.
> 
> Well it's more than "i.e.".  It would be better to precisely spell out
> the behaviour of this feature.  It's mmap() and brk() only, is it not? 
> malloc() isn't a system call ;)
> 

Yes, true. I thought about wording it that way, but end users understand mmap()
and malloc() better that mmap with MAP_ANONYMOUS or brk(), sbrk().

> Ideally, either the changelog, this document or code comments should be
> sufficient for a manpage to be written.  And this level of description
> will lead to better code review and quite likely a better feature.
> 

That makes a lot of sense. I'll work on enhancing this documentation.

>> +2. It provides better control over how many pages can be swapped out when
>> +   the cgroup goes over its limit. A badly setup cgroup can cause excessive
>> +   swapping. Providing control over the address space allocations ensures
>> +   that the system administrator has control over the total swapping that
>> +   can take place.
> 
> IOW: what, exactly and completely, does this feature do?

I'll send an update with more details with an example if possible.


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [-mm][PATCH 2/4] Setup the memrlimit controller (v5)
  2008-05-22  4:18   ` Andrew Morton
@ 2008-05-22 10:14     ` Balbir Singh
  0 siblings, 0 replies; 32+ messages in thread
From: Balbir Singh @ 2008-05-22 10:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf,
	linux-kernel, Pavel Emelianov, KAMEZAWA Hiroyuki

Andrew Morton wrote:
> On Wed, 21 May 2008 20:59:48 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
>> +static int memrlimit_cgroup_write_strategy(char *buf, unsigned long long *tmp)
> 
> grumble.  I think I requested a checkpatch warning whenever it comes
> across "tmp" or "temp".  Even better would be a gcc coredump.
> 

:-)

> I'm sure there's something more meaningful we could use here?

I'll send a patch to fix this.


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [-mm][PATCH 3/4] cgroup mm owner callback changes to add task info (v5)
  2008-05-22  4:19   ` Andrew Morton
@ 2008-05-22 10:14     ` Balbir Singh
  0 siblings, 0 replies; 32+ messages in thread
From: Balbir Singh @ 2008-05-22 10:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf,
	linux-kernel, Pavel Emelianov, KAMEZAWA Hiroyuki

Andrew Morton wrote:
> On Wed, 21 May 2008 20:59:59 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
>> This patch adds an additional field to the mm_owner callbacks. This field
>> is required to get to the mm that changed. Hold mmap_sem in write mode
>> before calling the mm_owner_changed callback
>>
>> ...
>>
>> + * The callbacks are invoked with mmap_sem held in read mode.
> 
> Is that true?
> 
>> +	down_write(&mm->mmap_sem);
>> ...
>>  	cgroup_mm_owner_callbacks(mm->owner, c);
> 
> Looks like write-mode to me?

Yes, obsolete comment. Will fix.

-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [-mm][PATCH 4/4] Add memrlimit controller accounting and control (v5)
  2008-05-22  4:24   ` Andrew Morton
@ 2008-05-22 10:15     ` Balbir Singh
  0 siblings, 0 replies; 32+ messages in thread
From: Balbir Singh @ 2008-05-22 10:15 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-mm, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf,
	linux-kernel, Pavel Emelianov, KAMEZAWA Hiroyuki

Andrew Morton wrote:
> On Wed, 21 May 2008 21:00:12 +0530 Balbir Singh <balbir@linux.vnet.ibm.com> wrote:
> 
>> This patch adds support for accounting and control of virtual address space
>> limits. The accounting is done via the rlimit_cgroup_(un)charge_as functions.
>> The core of the accounting takes place during fork time in copy_process(),
>> may_expand_vm(), remove_vma_list() and exit_mmap(). 
>>
>> Changelog v5->v4
>>
>> Move specific hooks in code to insert_vm_struct
>> Use mmap_sem to protect mm->owner from changing and mm->owner from
>> changing cgroups.
>>
>> ...
>>
>> + * brk(), sbrk()), stack expansion, mremap(), etc - called with
>> + * mmap_sem held.
>> + * decreasing - called with mmap_sem held.
>> + * This callback is called with mmap_sem held
> 
> It's good to document the locking prerequisites but for rwsems, one
> should specify whether it must be held for reading or for writing.
> 
> Of course, down_write() is a superset of down_read(), so if it's "held
> for reading" then either mode-of-holding is OK.  But it's best to spell
> all that out.
> 

Sure, will do


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [-mm][PATCH 2/4] Setup the memrlimit controller (v5)
  2008-05-21 15:29 ` [-mm][PATCH 2/4] Setup the memrlimit controller (v5) Balbir Singh
  2008-05-22  4:18   ` Andrew Morton
@ 2008-06-11 17:10   ` Andrea Righi
  2008-06-11 17:33     ` Balbir Singh
  2008-06-11 19:15     ` Andrew Morton
  1 sibling, 2 replies; 32+ messages in thread
From: Andrea Righi @ 2008-06-11 17:10 UTC (permalink / raw)
  To: Balbir Singh
  Cc: linux-mm, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf,
	linux-kernel, Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki

Balbir Singh wrote:
> +static int memrlimit_cgroup_write_strategy(char *buf, unsigned long long *tmp)
> +{
> +	*tmp = memparse(buf, &buf);
> +	if (*buf != '\0')
> +		return -EINVAL;
> +
> +	*tmp = PAGE_ALIGN(*tmp);
> +	return 0;
> +}

We shouldn't use PAGE_ALIGN() here, otherwise we limit the address space
to 4GB on 32-bit architectures (that could be reasonable, because this
is a per-cgroup limit and not per-process).

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
---
 mm/memrlimitcgroup.c |    4 +++-
 1 files changed, 3 insertions(+), 1 deletions(-)

diff --git a/mm/memrlimitcgroup.c b/mm/memrlimitcgroup.c
index 9a03d7d..2d42ff3 100644
--- a/mm/memrlimitcgroup.c
+++ b/mm/memrlimitcgroup.c
@@ -29,6 +29,8 @@
 #include <linux/res_counter.h>
 #include <linux/memrlimitcgroup.h>
 
+#define PAGE_ALIGN64(addr) (((((addr)+PAGE_SIZE-1))>>PAGE_SHIFT)<<PAGE_SHIFT)
+
 struct cgroup_subsys memrlimit_cgroup_subsys;
 
 struct memrlimit_cgroup {
@@ -124,7 +126,7 @@ static int memrlimit_cgroup_write_strategy(char *buf, unsigned long long *tmp)
 	if (*buf != '\0')
 		return -EINVAL;
 
-	*tmp = PAGE_ALIGN(*tmp);
+	*tmp = PAGE_ALIGN64(*tmp);
 	return 0;
 }
 

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [-mm][PATCH 2/4] Setup the memrlimit controller (v5)
  2008-06-11 17:10   ` Andrea Righi
@ 2008-06-11 17:33     ` Balbir Singh
  2008-06-11 18:48       ` Andrea Righi
  2008-06-11 19:15     ` Andrew Morton
  1 sibling, 1 reply; 32+ messages in thread
From: Balbir Singh @ 2008-06-11 17:33 UTC (permalink / raw)
  To: righi.andrea
  Cc: linux-mm, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf,
	linux-kernel, Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki

Andrea Righi wrote:
> Balbir Singh wrote:
>> +static int memrlimit_cgroup_write_strategy(char *buf, unsigned long
>> long *tmp)
>> +{
>> +    *tmp = memparse(buf, &buf);
>> +    if (*buf != '\0')
>> +        return -EINVAL;
>> +
>> +    *tmp = PAGE_ALIGN(*tmp);
>> +    return 0;
>> +}
> 
> We shouldn't use PAGE_ALIGN() here, otherwise we limit the address space
> to 4GB on 32-bit architectures (that could be reasonable, because this
> is a per-cgroup limit and not per-process).
> 

You mean un-reasonable?

> Signed-off-by: Andrea Righi <righi.andrea@gmail.com>

Seems fair enough.

Acked-by: Balbir Singh <balbir@linux.vnet.ibm.com>



-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [-mm][PATCH 2/4] Setup the memrlimit controller (v5)
  2008-06-11 17:33     ` Balbir Singh
@ 2008-06-11 18:48       ` Andrea Righi
  0 siblings, 0 replies; 32+ messages in thread
From: Andrea Righi @ 2008-06-11 18:48 UTC (permalink / raw)
  To: balbir
  Cc: linux-mm, Sudhir Kumar, YAMAMOTO Takashi, Paul Menage, lizf,
	linux-kernel, Pavel Emelianov, Andrew Morton, KAMEZAWA Hiroyuki

Balbir Singh wrote:
> Andrea Righi wrote:
>> Balbir Singh wrote:
>>> +static int memrlimit_cgroup_write_strategy(char *buf, unsigned long
>>> long *tmp)
>>> +{
>>> +    *tmp = memparse(buf, &buf);
>>> +    if (*buf != '\0')
>>> +        return -EINVAL;
>>> +
>>> +    *tmp = PAGE_ALIGN(*tmp);
>>> +    return 0;
>>> +}
>> We shouldn't use PAGE_ALIGN() here, otherwise we limit the address space
>> to 4GB on 32-bit architectures (that could be reasonable, because this
>> is a per-cgroup limit and not per-process).
>>
> 
> You mean un-reasonable?

well... I mean, there would be no reason to apply this fix if it was a
limit per-task on 32-bit. ;-)

-Andrea

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [-mm][PATCH 2/4] Setup the memrlimit controller (v5)
  2008-06-11 17:10   ` Andrea Righi
  2008-06-11 17:33     ` Balbir Singh
@ 2008-06-11 19:15     ` Andrew Morton
  2008-06-11 20:17       ` Andrea Righi
  1 sibling, 1 reply; 32+ messages in thread
From: Andrew Morton @ 2008-06-11 19:15 UTC (permalink / raw)
  To: righi.andrea
  Cc: balbir, linux-mm, skumar, yamamoto, menage, lizf, linux-kernel,
	xemul, kamezawa.hiroyu

On Wed, 11 Jun 2008 19:10:40 +0200 (MEST)
Andrea Righi <righi.andrea@gmail.com> wrote:

> Balbir Singh wrote:
> > +static int memrlimit_cgroup_write_strategy(char *buf, unsigned long long *tmp)
> > +{
> > +	*tmp = memparse(buf, &buf);
> > +	if (*buf != '\0')
> > +		return -EINVAL;
> > +
> > +	*tmp = PAGE_ALIGN(*tmp);
> > +	return 0;
> > +}
> 
> We shouldn't use PAGE_ALIGN() here, otherwise we limit the address space
> to 4GB on 32-bit architectures (that could be reasonable, because this
> is a per-cgroup limit and not per-process).
> 
> Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
> ---
>  mm/memrlimitcgroup.c |    4 +++-
>  1 files changed, 3 insertions(+), 1 deletions(-)
> 
> diff --git a/mm/memrlimitcgroup.c b/mm/memrlimitcgroup.c
> index 9a03d7d..2d42ff3 100644
> --- a/mm/memrlimitcgroup.c
> +++ b/mm/memrlimitcgroup.c
> @@ -29,6 +29,8 @@
>  #include <linux/res_counter.h>
>  #include <linux/memrlimitcgroup.h>
>  
> +#define PAGE_ALIGN64(addr) (((((addr)+PAGE_SIZE-1))>>PAGE_SHIFT)<<PAGE_SHIFT)
> +
>  struct cgroup_subsys memrlimit_cgroup_subsys;
>  
>  struct memrlimit_cgroup {
> @@ -124,7 +126,7 @@ static int memrlimit_cgroup_write_strategy(char *buf, unsigned long long *tmp)
>  	if (*buf != '\0')
>  		return -EINVAL;
>  
> -	*tmp = PAGE_ALIGN(*tmp);
> +	*tmp = PAGE_ALIGN64(*tmp);
>  	return 0;
>  }
>  

I don't beleive the change is needed.

#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)

that implementation will behaved as desired when passed a 64-bit addr?

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [-mm][PATCH 2/4] Setup the memrlimit controller (v5)
  2008-06-11 19:15     ` Andrew Morton
@ 2008-06-11 20:17       ` Andrea Righi
  2008-06-11 20:43         ` Andrew Morton
  0 siblings, 1 reply; 32+ messages in thread
From: Andrea Righi @ 2008-06-11 20:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: balbir, linux-mm, skumar, yamamoto, menage, lizf, linux-kernel,
	xemul, kamezawa.hiroyu

Andrew Morton wrote:
> On Wed, 11 Jun 2008 19:10:40 +0200 (MEST)
> Andrea Righi <righi.andrea@gmail.com> wrote:
> 
>> Balbir Singh wrote:
>>> +static int memrlimit_cgroup_write_strategy(char *buf, unsigned long long *tmp)
>>> +{
>>> +	*tmp = memparse(buf, &buf);
>>> +	if (*buf != '\0')
>>> +		return -EINVAL;
>>> +
>>> +	*tmp = PAGE_ALIGN(*tmp);
>>> +	return 0;
>>> +}
>> We shouldn't use PAGE_ALIGN() here, otherwise we limit the address space
>> to 4GB on 32-bit architectures (that could be reasonable, because this
>> is a per-cgroup limit and not per-process).
>>
>> Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
>> ---
>>  mm/memrlimitcgroup.c |    4 +++-
>>  1 files changed, 3 insertions(+), 1 deletions(-)
>>
>> diff --git a/mm/memrlimitcgroup.c b/mm/memrlimitcgroup.c
>> index 9a03d7d..2d42ff3 100644
>> --- a/mm/memrlimitcgroup.c
>> +++ b/mm/memrlimitcgroup.c
>> @@ -29,6 +29,8 @@
>>  #include <linux/res_counter.h>
>>  #include <linux/memrlimitcgroup.h>
>>  
>> +#define PAGE_ALIGN64(addr) (((((addr)+PAGE_SIZE-1))>>PAGE_SHIFT)<<PAGE_SHIFT)
>> +
>>  struct cgroup_subsys memrlimit_cgroup_subsys;
>>  
>>  struct memrlimit_cgroup {
>> @@ -124,7 +126,7 @@ static int memrlimit_cgroup_write_strategy(char *buf, unsigned long long *tmp)
>>  	if (*buf != '\0')
>>  		return -EINVAL;
>>  
>> -	*tmp = PAGE_ALIGN(*tmp);
>> +	*tmp = PAGE_ALIGN64(*tmp);
>>  	return 0;
>>  }
>>  
> 
> I don't beleive the change is needed.
> 
> #define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
> 
> that implementation will behaved as desired when passed a 64-bit addr?

If I'm not doing something wrong, here is what happens on my i386 box:

$ uname -m
i686
$ cat 64-bit-page-align.c
#include <stdio.h>
#include <asm/page.h>

#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
#define PAGE_ALIGN64(addr) (((((addr)+PAGE_SIZE-1))>>PAGE_SHIFT)<<PAGE_SHIFT)

#define SIZE ((1ULL << 32) - 1)

int main(int argc, char **argv)
{
	unsigned long long good, bad;

	good = (unsigned long long)PAGE_ALIGN64(SIZE);
	bad = (unsigned long long)PAGE_ALIGN(SIZE);

	fprintf(stdout, "good = %llu, bad = %llu\n", good, bad);

	return 0;
}
$ gcc -O2 -o 64-bit-page-align 64-bit-page-align.c
$ ./64-bit-page-align
good = 4294967296, bad = 0
                   ^^^^^^^
On a x86_64, instead, both PAGE_ALIGN()s work as expected:

$ uname -m
x86_64
$ gcc -O2 -o 64-bit-page-align 64-bit-page-align.c
$ ./64-bit-page-align
good = 4294967296, bad = 4294967296

At least we could add something like:

#ifdef CONFIG_32BIT
#define PAGE_ALIGN64(addr) (((((addr)+PAGE_SIZE-1))>>PAGE_SHIFT)<<PAGE_SHIFT)
#else
#define PAGE_ALIGN64(addr) PAGE_ALIGN(addr)
#endif

But IMHO the single PAGE_ALIGN64() implementation is more clear.

-Andrea

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [-mm][PATCH 2/4] Setup the memrlimit controller (v5)
  2008-06-11 20:17       ` Andrea Righi
@ 2008-06-11 20:43         ` Andrew Morton
  2008-06-11 22:47           ` Andrea Righi
  0 siblings, 1 reply; 32+ messages in thread
From: Andrew Morton @ 2008-06-11 20:43 UTC (permalink / raw)
  To: righi.andrea
  Cc: balbir, linux-mm, skumar, yamamoto, menage, lizf, linux-kernel,
	xemul, kamezawa.hiroyu

On Wed, 11 Jun 2008 22:17:12 +0200
Andrea Righi <righi.andrea@gmail.com> wrote:

> Andrew Morton wrote:
> > On Wed, 11 Jun 2008 19:10:40 +0200 (MEST)
> > Andrea Righi <righi.andrea@gmail.com> wrote:
> > 
> >> Balbir Singh wrote:
> >>> +static int memrlimit_cgroup_write_strategy(char *buf, unsigned long long *tmp)
> >>> +{
> >>> +	*tmp = memparse(buf, &buf);
> >>> +	if (*buf != '\0')
> >>> +		return -EINVAL;
> >>> +
> >>> +	*tmp = PAGE_ALIGN(*tmp);
> >>> +	return 0;
> >>> +}
> >> We shouldn't use PAGE_ALIGN() here, otherwise we limit the address space
> >> to 4GB on 32-bit architectures (that could be reasonable, because this
> >> is a per-cgroup limit and not per-process).
> >>
> >> Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
> >> ---
> >>  mm/memrlimitcgroup.c |    4 +++-
> >>  1 files changed, 3 insertions(+), 1 deletions(-)
> >>
> >> diff --git a/mm/memrlimitcgroup.c b/mm/memrlimitcgroup.c
> >> index 9a03d7d..2d42ff3 100644
> >> --- a/mm/memrlimitcgroup.c
> >> +++ b/mm/memrlimitcgroup.c
> >> @@ -29,6 +29,8 @@
> >>  #include <linux/res_counter.h>
> >>  #include <linux/memrlimitcgroup.h>
> >>  
> >> +#define PAGE_ALIGN64(addr) (((((addr)+PAGE_SIZE-1))>>PAGE_SHIFT)<<PAGE_SHIFT)
> >> +
> >>  struct cgroup_subsys memrlimit_cgroup_subsys;
> >>  
> >>  struct memrlimit_cgroup {
> >> @@ -124,7 +126,7 @@ static int memrlimit_cgroup_write_strategy(char *buf, unsigned long long *tmp)
> >>  	if (*buf != '\0')
> >>  		return -EINVAL;
> >>  
> >> -	*tmp = PAGE_ALIGN(*tmp);
> >> +	*tmp = PAGE_ALIGN64(*tmp);
> >>  	return 0;
> >>  }
> >>  
> > 
> > I don't beleive the change is needed.
> > 
> > #define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
> > 
> > that implementation will behaved as desired when passed a 64-bit addr?
> 
> If I'm not doing something wrong, here is what happens on my i386 box:
> 
> $ uname -m
> i686
> $ cat 64-bit-page-align.c
> #include <stdio.h>
> #include <asm/page.h>
> 
> #define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
> #define PAGE_ALIGN64(addr) (((((addr)+PAGE_SIZE-1))>>PAGE_SHIFT)<<PAGE_SHIFT)
> 
> #define SIZE ((1ULL << 32) - 1)
> 
> int main(int argc, char **argv)
> {
> 	unsigned long long good, bad;
> 
> 	good = (unsigned long long)PAGE_ALIGN64(SIZE);
> 	bad = (unsigned long long)PAGE_ALIGN(SIZE);
> 
> 	fprintf(stdout, "good = %llu, bad = %llu\n", good, bad);
> 
> 	return 0;
> }
> $ gcc -O2 -o 64-bit-page-align 64-bit-page-align.c
> $ ./64-bit-page-align
> good = 4294967296, bad = 0
>                    ^^^^^^^
> On a x86_64, instead, both PAGE_ALIGN()s work as expected:

That's weird.  We have an expression which contains a combination of UL
constants and ULL constants, and the compiler _isn't_ converting
everything into ULL.


> $ uname -m
> x86_64
> $ gcc -O2 -o 64-bit-page-align 64-bit-page-align.c
> $ ./64-bit-page-align
> good = 4294967296, bad = 4294967296
> 
> At least we could add something like:
> 
> #ifdef CONFIG_32BIT
> #define PAGE_ALIGN64(addr) (((((addr)+PAGE_SIZE-1))>>PAGE_SHIFT)<<PAGE_SHIFT)
> #else
> #define PAGE_ALIGN64(addr) PAGE_ALIGN(addr)
> #endif
> 
> But IMHO the single PAGE_ALIGN64() implementation is more clear.

No, we should just fix PAGE_ALIGN.  It should work correctly when
passed a long-long.  Otherwse it's just a timebomb.

This:

#define PAGE_ALIGN(addr) ({				\
	typeof(addr) __size = PAGE_SIZE;		\
	typeof(addr) __mask = PAGE_MASK;		\
	(addr + __size - 1) & __mask;			\
})

(with a suitable comment) does what we want.  I didn't check to see
whether this causes the compiler to generate larger code, but it
shouldn't.

If we go this way then first we should hoist the PAGE_ALIGN definitions
out of include/asm-*/page.h and into, umm, include/linux/mm.h. 
That might cause build errors but from a bit of grepping around, only
include/asm-arm/cacheflush.h is at risk, and it already includes
linux/mm.h.

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [-mm][PATCH 2/4] Setup the memrlimit controller (v5)
  2008-06-11 20:43         ` Andrew Morton
@ 2008-06-11 22:47           ` Andrea Righi
  2008-06-11 22:55             ` Andrew Morton
  0 siblings, 1 reply; 32+ messages in thread
From: Andrea Righi @ 2008-06-11 22:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: balbir, linux-mm, skumar, yamamoto, menage, lizf, linux-kernel,
	xemul, kamezawa.hiroyu

Andrew Morton wrote:
>> At least we could add something like:
>>
>> #ifdef CONFIG_32BIT
>> #define PAGE_ALIGN64(addr) (((((addr)+PAGE_SIZE-1))>>PAGE_SHIFT)<<PAGE_SHIFT)
>> #else
>> #define PAGE_ALIGN64(addr) PAGE_ALIGN(addr)
>> #endif
>>
>> But IMHO the single PAGE_ALIGN64() implementation is more clear.
> 
> No, we should just fix PAGE_ALIGN.  It should work correctly when
> passed a long-long.  Otherwse it's just a timebomb.
> 
> This:
> 
> #define PAGE_ALIGN(addr) ({				\
> 	typeof(addr) __size = PAGE_SIZE;		\
> 	typeof(addr) __mask = PAGE_MASK;		\
> 	(addr + __size - 1) & __mask;			\
> })
> 
> (with a suitable comment) does what we want.  I didn't check to see
> whether this causes the compiler to generate larger code, but it
> shouldn't.
> 

No, it doesn't work. The problem seems to be in the PAGE_MASK definition
(from include/asm-x86/page.h for example):

/* PAGE_SHIFT determines the page size */
#define PAGE_SHIFT      12
#define PAGE_SIZE       (_AC(1,UL) << PAGE_SHIFT)
#define PAGE_MASK       (~(PAGE_SIZE-1))

The "~" is performed on a 32-bit value, so everything in "and" with
PAGE_MASK greater than 4GB will be truncated to the 32-bit boundary.

What do you think about the following?

#define PAGE_SIZE64 (1ULL << PAGE_SHIFT)
#define PAGE_MASK64 (~(PAGE_SIZE64 - 1))

#define PAGE_ALIGN(addr) ({					\
	typeof(addr) __size = PAGE_SIZE;			\
	typeof(addr) __ret = (addr) + __size - 1;		\
	__ret > -1UL ? __ret & PAGE_MASK64 : __ret & PAGE_MASK;	\
})

-Andrea

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [-mm][PATCH 2/4] Setup the memrlimit controller (v5)
  2008-06-11 22:47           ` Andrea Righi
@ 2008-06-11 22:55             ` Andrew Morton
  2008-06-11 23:04               ` Andrea Righi
  2008-06-12  6:13               ` Balbir Singh
  0 siblings, 2 replies; 32+ messages in thread
From: Andrew Morton @ 2008-06-11 22:55 UTC (permalink / raw)
  To: righi.andrea
  Cc: balbir, linux-mm, skumar, yamamoto, menage, lizf, linux-kernel,
	xemul, kamezawa.hiroyu

On Thu, 12 Jun 2008 00:47:27 +0200
Andrea Righi <righi.andrea@gmail.com> wrote:

> Andrew Morton wrote:
> >> At least we could add something like:
> >>
> >> #ifdef CONFIG_32BIT
> >> #define PAGE_ALIGN64(addr) (((((addr)+PAGE_SIZE-1))>>PAGE_SHIFT)<<PAGE_SHIFT)
> >> #else
> >> #define PAGE_ALIGN64(addr) PAGE_ALIGN(addr)
> >> #endif
> >>
> >> But IMHO the single PAGE_ALIGN64() implementation is more clear.
> > 
> > No, we should just fix PAGE_ALIGN.  It should work correctly when
> > passed a long-long.  Otherwse it's just a timebomb.
> > 
> > This:
> > 
> > #define PAGE_ALIGN(addr) ({				\
> > 	typeof(addr) __size = PAGE_SIZE;		\
> > 	typeof(addr) __mask = PAGE_MASK;		\
> > 	(addr + __size - 1) & __mask;			\
> > })
> > 
> > (with a suitable comment) does what we want.  I didn't check to see
> > whether this causes the compiler to generate larger code, but it
> > shouldn't.
> > 
> 
> No, it doesn't work. The problem seems to be in the PAGE_MASK definition
> (from include/asm-x86/page.h for example):
> 
> /* PAGE_SHIFT determines the page size */
> #define PAGE_SHIFT      12
> #define PAGE_SIZE       (_AC(1,UL) << PAGE_SHIFT)
> #define PAGE_MASK       (~(PAGE_SIZE-1))
> 
> The "~" is performed on a 32-bit value, so everything in "and" with
> PAGE_MASK greater than 4GB will be truncated to the 32-bit boundary.

OK, I oversimplified my testcase.

> What do you think about the following?
> 
> #define PAGE_SIZE64 (1ULL << PAGE_SHIFT)
> #define PAGE_MASK64 (~(PAGE_SIZE64 - 1))
> 
> #define PAGE_ALIGN(addr) ({					\
> 	typeof(addr) __size = PAGE_SIZE;			\
> 	typeof(addr) __ret = (addr) + __size - 1;		\
> 	__ret > -1UL ? __ret & PAGE_MASK64 : __ret & PAGE_MASK;	\
> })

Complex.  And I'd worry about added code overhead.

What about

#define PAGE_ALIGN(addr) ALIGN(addr, PAGE_SIZE)

?

afaict ALIGN() tries to do the right thing, and if it doesn't, we
should fix ALIGN().


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [-mm][PATCH 2/4] Setup the memrlimit controller (v5)
  2008-06-11 22:55             ` Andrew Morton
@ 2008-06-11 23:04               ` Andrea Righi
  2008-06-12  6:13               ` Balbir Singh
  1 sibling, 0 replies; 32+ messages in thread
From: Andrea Righi @ 2008-06-11 23:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: balbir, linux-mm, skumar, yamamoto, menage, lizf, linux-kernel,
	xemul, kamezawa.hiroyu

Andrew Morton wrote:
> On Thu, 12 Jun 2008 00:47:27 +0200
> Andrea Righi <righi.andrea@gmail.com> wrote:
> 
>> Andrew Morton wrote:
>>>> At least we could add something like:
>>>>
>>>> #ifdef CONFIG_32BIT
>>>> #define PAGE_ALIGN64(addr) (((((addr)+PAGE_SIZE-1))>>PAGE_SHIFT)<<PAGE_SHIFT)
>>>> #else
>>>> #define PAGE_ALIGN64(addr) PAGE_ALIGN(addr)
>>>> #endif
>>>>
>>>> But IMHO the single PAGE_ALIGN64() implementation is more clear.
>>> No, we should just fix PAGE_ALIGN.  It should work correctly when
>>> passed a long-long.  Otherwse it's just a timebomb.
>>>
>>> This:
>>>
>>> #define PAGE_ALIGN(addr) ({				\
>>> 	typeof(addr) __size = PAGE_SIZE;		\
>>> 	typeof(addr) __mask = PAGE_MASK;		\
>>> 	(addr + __size - 1) & __mask;			\
>>> })
>>>
>>> (with a suitable comment) does what we want.  I didn't check to see
>>> whether this causes the compiler to generate larger code, but it
>>> shouldn't.
>>>
>> No, it doesn't work. The problem seems to be in the PAGE_MASK definition
>> (from include/asm-x86/page.h for example):
>>
>> /* PAGE_SHIFT determines the page size */
>> #define PAGE_SHIFT      12
>> #define PAGE_SIZE       (_AC(1,UL) << PAGE_SHIFT)
>> #define PAGE_MASK       (~(PAGE_SIZE-1))
>>
>> The "~" is performed on a 32-bit value, so everything in "and" with
>> PAGE_MASK greater than 4GB will be truncated to the 32-bit boundary.
> 
> OK, I oversimplified my testcase.
> 
>> What do you think about the following?
>>
>> #define PAGE_SIZE64 (1ULL << PAGE_SHIFT)
>> #define PAGE_MASK64 (~(PAGE_SIZE64 - 1))
>>
>> #define PAGE_ALIGN(addr) ({					\
>> 	typeof(addr) __size = PAGE_SIZE;			\
>> 	typeof(addr) __ret = (addr) + __size - 1;		\
>> 	__ret > -1UL ? __ret & PAGE_MASK64 : __ret & PAGE_MASK;	\
>> })
> 
> Complex.  And I'd worry about added code overhead.
> 
> What about
> 
> #define PAGE_ALIGN(addr) ALIGN(addr, PAGE_SIZE)
> 
> ?
> 
> afaict ALIGN() tries to do the right thing, and if it doesn't, we
> should fix ALIGN().

Good! Much simpler.

-Andrea

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [-mm][PATCH 2/4] Setup the memrlimit controller (v5)
  2008-06-11 22:55             ` Andrew Morton
  2008-06-11 23:04               ` Andrea Righi
@ 2008-06-12  6:13               ` Balbir Singh
  2008-06-12  8:52                 ` Andrea Righi
  1 sibling, 1 reply; 32+ messages in thread
From: Balbir Singh @ 2008-06-12  6:13 UTC (permalink / raw)
  To: Andrew Morton
  Cc: righi.andrea, linux-mm, skumar, yamamoto, menage, lizf,
	linux-kernel, xemul, kamezawa.hiroyu

Andrew Morton wrote:
> 
> #define PAGE_ALIGN(addr) ALIGN(addr, PAGE_SIZE)
> 
> ?
> 
> afaict ALIGN() tries to do the right thing, and if it doesn't, we
> should fix ALIGN().
> 

That should do the right thing, provided there are no duplicate ALIGN defintions
elsewhere

kernel.h has


#define ALIGN(x,a)              __ALIGN_MASK(x,(typeof(x))(a)-1)
#define __ALIGN_MASK(x,mask)    (((x)+(mask))&~(mask))

Which seems like the correct thing, since we use typeof(x) for (a) - 1.



-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [-mm][PATCH 2/4] Setup the memrlimit controller (v5)
  2008-06-12  6:13               ` Balbir Singh
@ 2008-06-12  8:52                 ` Andrea Righi
  2008-06-12  9:02                   ` Andrew Morton
  0 siblings, 1 reply; 32+ messages in thread
From: Andrea Righi @ 2008-06-12  8:52 UTC (permalink / raw)
  To: balbir, Andrew Morton
  Cc: linux-mm, skumar, yamamoto, menage, lizf, linux-kernel, xemul,
	kamezawa.hiroyu

Balbir Singh wrote:
> Andrew Morton wrote:
>> #define PAGE_ALIGN(addr) ALIGN(addr, PAGE_SIZE)
>>
>> ?
>>
>> afaict ALIGN() tries to do the right thing, and if it doesn't, we
>> should fix ALIGN().
>>
> 
> That should do the right thing, provided there are no duplicate ALIGN defintions
> elsewhere
> 
> kernel.h has
> 
> 
> #define ALIGN(x,a)              __ALIGN_MASK(x,(typeof(x))(a)-1)
> #define __ALIGN_MASK(x,mask)    (((x)+(mask))&~(mask))
> 
> Which seems like the correct thing, since we use typeof(x) for (a) - 1.

OK, I'm going to do some builds and tests on my i386 and x86_64 boxes
with the following patch. I'm sure this will break something and I don't
know if moving PAGE_ALIGN() out of asm-*/page.h is the right way.

I'll keep you informed and post an updated version in case I'll find
obvious errors.

-Andrea

---
Properly handle 64-bit values with PAGE_ALIGN() on 32-bit architectures.

For example:

	u64 val = PAGE_ALIGN(size);

is broken on 32-bit architectures if size is greater than 4GB.

The problem resides in PAGE_MASK definition (from include/asm-x86/page.h for
example):

#define PAGE_SHIFT      12
#define PAGE_SIZE       (_AC(1,UL) << PAGE_SHIFT)
#define PAGE_MASK       (~(PAGE_SIZE-1))
...
#define PAGE_ALIGN(addr)       (((addr)+PAGE_SIZE-1)&PAGE_MASK)

The "~" is performed on a 32-bit value, so everything in "and" with PAGE_MASK
greater than 4GB will be truncated to the 32-bit boundary.

Also move the PAGE_ALIGN() definitions out of include/asm-*/page.h in
include/linux/mm.h.

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
 arch/powerpc/boot/page.h     |    3 ---
 drivers/pci/intel-iommu.h    |    2 +-
 include/asm-alpha/page.h     |    3 ---
 include/asm-arm/page-nommu.h |    4 +---
 include/asm-arm/page.h       |    3 ---
 include/asm-avr32/page.h     |    3 ---
 include/asm-blackfin/page.h  |    3 ---
 include/asm-cris/page.h      |    3 ---
 include/asm-frv/page.h       |    3 ---
 include/asm-h8300/page.h     |    3 ---
 include/asm-ia64/page.h      |    1 -
 include/asm-m32r/page.h      |    3 ---
 include/asm-m68k/page.h      |    3 ---
 include/asm-m68knommu/page.h |    3 ---
 include/asm-mips/page.h      |    3 ---
 include/asm-mn10300/page.h   |    3 ---
 include/asm-parisc/page.h    |    4 ----
 include/asm-powerpc/page.h   |    3 ---
 include/asm-ppc/page.h       |    4 ----
 include/asm-s390/page.h      |    3 ---
 include/asm-sh/page.h        |    3 ---
 include/asm-sparc/page.h     |    3 ---
 include/asm-sparc64/page.h   |    3 ---
 include/asm-um/page.h        |    3 ---
 include/asm-v850/page.h      |    6 ------
 include/asm-x86/page.h       |    3 ---
 include/asm-xtensa/page.h    |    2 --
 include/linux/mm.h           |    4 ++++
 28 files changed, 6 insertions(+), 81 deletions(-)

diff --git a/arch/powerpc/boot/page.h b/arch/powerpc/boot/page.h
index 14eca30..aa42298 100644
--- a/arch/powerpc/boot/page.h
+++ b/arch/powerpc/boot/page.h
@@ -28,7 +28,4 @@
 /* align addr on a size boundary - adjust address up if needed */
 #define _ALIGN(addr,size)     _ALIGN_UP(addr,size)
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	_ALIGN(addr, PAGE_SIZE)
-
 #endif				/* _PPC_BOOT_PAGE_H */
diff --git a/drivers/pci/intel-iommu.h b/drivers/pci/intel-iommu.h
index afc0ad9..339da69 100644
--- a/drivers/pci/intel-iommu.h
+++ b/drivers/pci/intel-iommu.h
@@ -35,7 +35,7 @@
 #define PAGE_SHIFT_4K		(12)
 #define PAGE_SIZE_4K		(1UL << PAGE_SHIFT_4K)
 #define PAGE_MASK_4K		(((u64)-1) << PAGE_SHIFT_4K)
-#define PAGE_ALIGN_4K(addr)	(((addr) + PAGE_SIZE_4K - 1) & PAGE_MASK_4K)
+#define PAGE_ALIGN_4K(addr)	ALIGN(addr, PAGE_SIZE_4K)
 
 #define IOVA_PFN(addr)		((addr) >> PAGE_SHIFT_4K)
 #define DMA_32BIT_PFN		IOVA_PFN(DMA_32BIT_MASK)
diff --git a/include/asm-alpha/page.h b/include/asm-alpha/page.h
index 22ff976..0995f9d 100644
--- a/include/asm-alpha/page.h
+++ b/include/asm-alpha/page.h
@@ -80,9 +80,6 @@ typedef struct page *pgtable_t;
 
 #endif /* !__ASSEMBLY__ */
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 #define __pa(x)			((unsigned long) (x) - PAGE_OFFSET)
 #define __va(x)			((void *)((unsigned long) (x) + PAGE_OFFSET))
 #ifndef CONFIG_DISCONTIGMEM
diff --git a/include/asm-arm/page-nommu.h b/include/asm-arm/page-nommu.h
index a1bcad0..ea1cde8 100644
--- a/include/asm-arm/page-nommu.h
+++ b/include/asm-arm/page-nommu.h
@@ -7,6 +7,7 @@
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
  */
+
 #ifndef _ASMARM_PAGE_NOMMU_H
 #define _ASMARM_PAGE_NOMMU_H
 
@@ -42,9 +43,6 @@ typedef unsigned long pgprot_t;
 #define __pmd(x)        (x)
 #define __pgprot(x)     (x)
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 extern unsigned long memory_start;
 extern unsigned long memory_end;
 
diff --git a/include/asm-arm/page.h b/include/asm-arm/page.h
index 8e05bdb..7c5fc55 100644
--- a/include/asm-arm/page.h
+++ b/include/asm-arm/page.h
@@ -15,9 +15,6 @@
 #define PAGE_SIZE		(1UL << PAGE_SHIFT)
 #define PAGE_MASK		(~(PAGE_SIZE-1))
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 #ifndef __ASSEMBLY__
 
 #ifndef CONFIG_MMU
diff --git a/include/asm-avr32/page.h b/include/asm-avr32/page.h
index cbbc5ca..f805d1c 100644
--- a/include/asm-avr32/page.h
+++ b/include/asm-avr32/page.h
@@ -57,9 +57,6 @@ static inline int get_order(unsigned long size)
 
 #endif /* !__ASSEMBLY__ */
 
-/* Align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr) + PAGE_SIZE - 1) & PAGE_MASK)
-
 /*
  * The hardware maps the virtual addresses 0x80000000 -> 0x9fffffff
  * permanently to the physical addresses 0x00000000 -> 0x1fffffff when
diff --git a/include/asm-blackfin/page.h b/include/asm-blackfin/page.h
index c7db022..344f6a8 100644
--- a/include/asm-blackfin/page.h
+++ b/include/asm-blackfin/page.h
@@ -51,9 +51,6 @@ typedef struct page *pgtable_t;
 #define __pgd(x)	((pgd_t) { (x) } )
 #define __pgprot(x)	((pgprot_t) { (x) } )
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 extern unsigned long memory_start;
 extern unsigned long memory_end;
 
diff --git a/include/asm-cris/page.h b/include/asm-cris/page.h
index c45bb1e..d19272b 100644
--- a/include/asm-cris/page.h
+++ b/include/asm-cris/page.h
@@ -60,9 +60,6 @@ typedef struct page *pgtable_t;
 
 #define page_to_phys(page)     __pa((((page) - mem_map) << PAGE_SHIFT) + PAGE_OFFSET)
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 #ifndef __ASSEMBLY__
 
 #endif /* __ASSEMBLY__ */
diff --git a/include/asm-frv/page.h b/include/asm-frv/page.h
index c2c1e89..bd9c220 100644
--- a/include/asm-frv/page.h
+++ b/include/asm-frv/page.h
@@ -40,9 +40,6 @@ typedef struct page *pgtable_t;
 #define __pgprot(x)	((pgprot_t) { (x) } )
 #define PTE_MASK	PAGE_MASK
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr) + PAGE_SIZE - 1) & PAGE_MASK)
-
 #define devmem_is_allowed(pfn)	1
 
 #define __pa(vaddr)		virt_to_phys((void *) (unsigned long) (vaddr))
diff --git a/include/asm-h8300/page.h b/include/asm-h8300/page.h
index d6a3eaf..0b6acf0 100644
--- a/include/asm-h8300/page.h
+++ b/include/asm-h8300/page.h
@@ -43,9 +43,6 @@ typedef struct page *pgtable_t;
 #define __pgd(x)	((pgd_t) { (x) } )
 #define __pgprot(x)	((pgprot_t) { (x) } )
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 extern unsigned long memory_start;
 extern unsigned long memory_end;
 
diff --git a/include/asm-ia64/page.h b/include/asm-ia64/page.h
index 36f3932..5f271bc 100644
--- a/include/asm-ia64/page.h
+++ b/include/asm-ia64/page.h
@@ -40,7 +40,6 @@
 
 #define PAGE_SIZE		(__IA64_UL_CONST(1) << PAGE_SHIFT)
 #define PAGE_MASK		(~(PAGE_SIZE - 1))
-#define PAGE_ALIGN(addr)	(((addr) + PAGE_SIZE - 1) & PAGE_MASK)
 
 #define PERCPU_PAGE_SHIFT	16	/* log2() of max. size of per-CPU area */
 #define PERCPU_PAGE_SIZE	(__IA64_UL_CONST(1) << PERCPU_PAGE_SHIFT)
diff --git a/include/asm-m32r/page.h b/include/asm-m32r/page.h
index 8a677f3..c933308 100644
--- a/include/asm-m32r/page.h
+++ b/include/asm-m32r/page.h
@@ -41,9 +41,6 @@ typedef struct page *pgtable_t;
 
 #endif /* !__ASSEMBLY__ */
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr) + PAGE_SIZE - 1) & PAGE_MASK)
-
 /*
  * This handles the memory map.. We could make this a config
  * option, but too many people screw it up, and too few need
diff --git a/include/asm-m68k/page.h b/include/asm-m68k/page.h
index 880c2cb..a34b8ba 100644
--- a/include/asm-m68k/page.h
+++ b/include/asm-m68k/page.h
@@ -103,9 +103,6 @@ typedef struct page *pgtable_t;
 #define __pgd(x)	((pgd_t) { (x) } )
 #define __pgprot(x)	((pgprot_t) { (x) } )
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 #endif /* !__ASSEMBLY__ */
 
 #include <asm/page_offset.h>
diff --git a/include/asm-m68knommu/page.h b/include/asm-m68knommu/page.h
index 1e82ebb..3a1ede4 100644
--- a/include/asm-m68knommu/page.h
+++ b/include/asm-m68knommu/page.h
@@ -43,9 +43,6 @@ typedef struct page *pgtable_t;
 #define __pgd(x)	((pgd_t) { (x) } )
 #define __pgprot(x)	((pgprot_t) { (x) } )
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 extern unsigned long memory_start;
 extern unsigned long memory_end;
 
diff --git a/include/asm-mips/page.h b/include/asm-mips/page.h
index 8735aa0..dd1cc0a 100644
--- a/include/asm-mips/page.h
+++ b/include/asm-mips/page.h
@@ -134,9 +134,6 @@ typedef struct { unsigned long pgprot; } pgprot_t;
 
 #endif /* !__ASSEMBLY__ */
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr) + PAGE_SIZE - 1) & PAGE_MASK)
-
 /*
  * __pa()/__va() should be used only during mem init.
  */
diff --git a/include/asm-mn10300/page.h b/include/asm-mn10300/page.h
index 124971b..8288e12 100644
--- a/include/asm-mn10300/page.h
+++ b/include/asm-mn10300/page.h
@@ -61,9 +61,6 @@ typedef struct page *pgtable_t;
 
 #endif /* !__ASSEMBLY__ */
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr) + PAGE_SIZE - 1) & PAGE_MASK)
-
 /*
  * This handles the memory map.. We could make this a config
  * option, but too many people screw it up, and too few need
diff --git a/include/asm-parisc/page.h b/include/asm-parisc/page.h
index 27d50b8..c3941f0 100644
--- a/include/asm-parisc/page.h
+++ b/include/asm-parisc/page.h
@@ -119,10 +119,6 @@ extern int npmem_ranges;
 #define PMD_ENTRY_SIZE	(1UL << BITS_PER_PMD_ENTRY)
 #define PTE_ENTRY_SIZE	(1UL << BITS_PER_PTE_ENTRY)
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
-
 #define LINUX_GATEWAY_SPACE     0
 
 /* This governs the relationship between virtual and physical addresses.
diff --git a/include/asm-powerpc/page.h b/include/asm-powerpc/page.h
index cffdf0e..e088545 100644
--- a/include/asm-powerpc/page.h
+++ b/include/asm-powerpc/page.h
@@ -119,9 +119,6 @@ extern phys_addr_t kernstart_addr;
 /* align addr on a size boundary - adjust address up if needed */
 #define _ALIGN(addr,size)     _ALIGN_UP(addr,size)
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	_ALIGN(addr, PAGE_SIZE)
-
 /*
  * Don't compare things with KERNELBASE or PAGE_OFFSET to test for
  * "kernelness", use is_kernel_addr() - it should do what you want.
diff --git a/include/asm-ppc/page.h b/include/asm-ppc/page.h
index 37e4756..6a53965 100644
--- a/include/asm-ppc/page.h
+++ b/include/asm-ppc/page.h
@@ -43,10 +43,6 @@ typedef unsigned long pte_basic_t;
 /* align addr on a size boundary - adjust address up if needed */
 #define _ALIGN(addr,size)     _ALIGN_UP(addr,size)
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	_ALIGN(addr, PAGE_SIZE)
-
-
 #undef STRICT_MM_TYPECHECKS
 
 #ifdef STRICT_MM_TYPECHECKS
diff --git a/include/asm-s390/page.h b/include/asm-s390/page.h
index 12fd9c4..991ba93 100644
--- a/include/asm-s390/page.h
+++ b/include/asm-s390/page.h
@@ -138,9 +138,6 @@ void arch_alloc_page(struct page *page, int order);
 
 #endif /* !__ASSEMBLY__ */
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)        (((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 #define __PAGE_OFFSET           0x0UL
 #define PAGE_OFFSET             0x0UL
 #define __pa(x)                 (unsigned long)(x)
diff --git a/include/asm-sh/page.h b/include/asm-sh/page.h
index 304c30b..5dc01d2 100644
--- a/include/asm-sh/page.h
+++ b/include/asm-sh/page.h
@@ -22,9 +22,6 @@
 #define PAGE_MASK	(~(PAGE_SIZE-1))
 #define PTE_MASK	PAGE_MASK
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 #if defined(CONFIG_HUGETLB_PAGE_SIZE_64K)
 #define HPAGE_SHIFT	16
 #elif defined(CONFIG_HUGETLB_PAGE_SIZE_256K)
diff --git a/include/asm-sparc/page.h b/include/asm-sparc/page.h
index 6aa9e4c..a5029f2 100644
--- a/include/asm-sparc/page.h
+++ b/include/asm-sparc/page.h
@@ -136,9 +136,6 @@ BTFIXUPDEF_SETHI(sparc_unmapped_base)
 
 #endif /* !(__ASSEMBLY__) */
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)  (((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 #define PAGE_OFFSET	0xf0000000
 #ifndef __ASSEMBLY__
 extern unsigned long phys_base;
diff --git a/include/asm-sparc64/page.h b/include/asm-sparc64/page.h
index 93f0881..c36a4ba 100644
--- a/include/asm-sparc64/page.h
+++ b/include/asm-sparc64/page.h
@@ -110,9 +110,6 @@ typedef struct page *pgtable_t;
 
 #endif /* !(__ASSEMBLY__) */
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 /* We used to stick this into a hard-coded global register (%g4)
  * but that does not make sense anymore.
  */
diff --git a/include/asm-um/page.h b/include/asm-um/page.h
index 916e1a6..335c573 100644
--- a/include/asm-um/page.h
+++ b/include/asm-um/page.h
@@ -92,9 +92,6 @@ typedef struct page *pgtable_t;
 #define __pgd(x) ((pgd_t) { (x) } )
 #define __pgprot(x)	((pgprot_t) { (x) } )
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 extern unsigned long uml_physmem;
 
 #define PAGE_OFFSET (uml_physmem)
diff --git a/include/asm-v850/page.h b/include/asm-v850/page.h
index 74a539a..3f4fd57 100644
--- a/include/asm-v850/page.h
+++ b/include/asm-v850/page.h
@@ -16,7 +16,6 @@
 
 #include <asm/machdep.h>
 
-
 #define PAGE_SHIFT	12
 #define PAGE_SIZE       (1UL << PAGE_SHIFT)
 #define PAGE_MASK       (~(PAGE_SIZE-1))
@@ -93,11 +92,6 @@ typedef unsigned long pgprot_t;
 
 #endif /* !__ASSEMBLY__ */
 
-
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr) + PAGE_SIZE - 1) & PAGE_MASK)
-
-
 /* No current v850 processor has virtual memory.  */
 #define __virt_to_phys(addr)	(addr)
 #define __phys_to_virt(addr)	(addr)
diff --git a/include/asm-x86/page.h b/include/asm-x86/page.h
index dc936dd..05da537 100644
--- a/include/asm-x86/page.h
+++ b/include/asm-x86/page.h
@@ -29,9 +29,6 @@
 #define HPAGE_MASK		(~(HPAGE_SIZE - 1))
 #define HUGETLB_PAGE_ORDER	(HPAGE_SHIFT - PAGE_SHIFT)
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 #ifndef __ASSEMBLY__
 #include <linux/types.h>
 #endif
diff --git a/include/asm-xtensa/page.h b/include/asm-xtensa/page.h
index 80a6ae0..11f7dc2 100644
--- a/include/asm-xtensa/page.h
+++ b/include/asm-xtensa/page.h
@@ -26,13 +26,11 @@
 
 /*
  * PAGE_SHIFT determines the page size
- * PAGE_ALIGN(x) aligns the pointer to the (next) page boundary
  */
 
 #define PAGE_SHIFT		12
 #define PAGE_SIZE		(__XTENSA_UL_CONST(1) << PAGE_SHIFT)
 #define PAGE_MASK		(~(PAGE_SIZE-1))
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE - 1) & PAGE_MASK)
 
 #define PAGE_OFFSET		XCHAL_KSEG_CACHED_VADDR
 #define MAX_MEM_PFN		XCHAL_KSEG_SIZE
diff --git a/include/linux/mm.h b/include/linux/mm.h
index c31a9cd..f767a8f 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -9,6 +9,7 @@
 #include <linux/list.h>
 #include <linux/mmzone.h>
 #include <linux/rbtree.h>
+#include <linux/kernel.h>
 #include <linux/prio_tree.h>
 #include <linux/debug_locks.h>
 #include <linux/mm_types.h>
@@ -41,6 +42,9 @@ extern unsigned long mmap_min_addr;
 
 #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
 
+/* to align the pointer to the (next) page boundary */
+#define PAGE_ALIGN(addr) ALIGN(addr, PAGE_SIZE)
+
 /*
  * Linux kernel virtual memory manager primitives.
  * The idea being to have a "virtual" mm in the same way

^ permalink raw reply related	[flat|nested] 32+ messages in thread

* Re: [-mm][PATCH 2/4] Setup the memrlimit controller (v5)
  2008-06-12  8:52                 ` Andrea Righi
@ 2008-06-12  9:02                   ` Andrew Morton
  2008-06-12  9:12                     ` Andrea Righi
  2008-06-12 17:02                     ` [PATCH -mm] PAGE_ALIGN(): correctly handle 64-bit values on 32-bit architectures Andrea Righi
  0 siblings, 2 replies; 32+ messages in thread
From: Andrew Morton @ 2008-06-12  9:02 UTC (permalink / raw)
  To: righi.andrea
  Cc: balbir, linux-mm, skumar, yamamoto, menage, lizf, linux-kernel,
	xemul, kamezawa.hiroyu

On Thu, 12 Jun 2008 10:52:13 +0200 (MEST) Andrea Righi <righi.andrea@gmail.com> wrote:

> Balbir Singh wrote:
> > Andrew Morton wrote:
> >> #define PAGE_ALIGN(addr) ALIGN(addr, PAGE_SIZE)
> >>
> >> ?
> >>
> >> afaict ALIGN() tries to do the right thing, and if it doesn't, we
> >> should fix ALIGN().
> >>
> > 
> > That should do the right thing, provided there are no duplicate ALIGN defintions
> > elsewhere
> > 
> > kernel.h has
> > 
> > 
> > #define ALIGN(x,a)              __ALIGN_MASK(x,(typeof(x))(a)-1)
> > #define __ALIGN_MASK(x,mask)    (((x)+(mask))&~(mask))
> > 
> > Which seems like the correct thing, since we use typeof(x) for (a) - 1.
> 
> OK, I'm going to do some builds and tests on my i386 and x86_64 boxes
> with the following patch. I'm sure this will break something and I don't
> know if moving PAGE_ALIGN() out of asm-*/page.h is the right way.

Thanks.  Looks good from here.

>
> ...
>
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -9,6 +9,7 @@
>  #include <linux/list.h>
>  #include <linux/mmzone.h>
>  #include <linux/rbtree.h>
> +#include <linux/kernel.h>
>  #include <linux/prio_tree.h>
>  #include <linux/debug_locks.h>
>  #include <linux/mm_types.h>
> @@ -41,6 +42,9 @@ extern unsigned long mmap_min_addr;
>  
>  #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
>  
> +/* to align the pointer to the (next) page boundary */
> +#define PAGE_ALIGN(addr) ALIGN(addr, PAGE_SIZE)
> +
>  /*
>   * Linux kernel virtual memory manager primitives.
>   * The idea being to have a "virtual" mm in the same way

You don't really need the #include <linux/kernel.h> there.  As long as
all callsites which _use_ PAGE_ALIGN are including kernel.h via some
means (and they surely will be) then things will work OK.

But it won't hurt.  We're already picking up kernel.h there via
mmzone.h->spinlock.h and probably 100 other routes.  One more won't
make a lot of difference ;)


^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [-mm][PATCH 2/4] Setup the memrlimit controller (v5)
  2008-06-12  9:02                   ` Andrew Morton
@ 2008-06-12  9:12                     ` Andrea Righi
  2008-06-12 17:02                     ` [PATCH -mm] PAGE_ALIGN(): correctly handle 64-bit values on 32-bit architectures Andrea Righi
  1 sibling, 0 replies; 32+ messages in thread
From: Andrea Righi @ 2008-06-12  9:12 UTC (permalink / raw)
  To: Andrew Morton
  Cc: balbir, linux-mm, skumar, yamamoto, menage, lizf, linux-kernel,
	xemul, kamezawa.hiroyu

Andrew Morton wrote:
>> --- a/include/linux/mm.h
>> +++ b/include/linux/mm.h
>> @@ -9,6 +9,7 @@
>>  #include <linux/list.h>
>>  #include <linux/mmzone.h>
>>  #include <linux/rbtree.h>
>> +#include <linux/kernel.h>
>>  #include <linux/prio_tree.h>
>>  #include <linux/debug_locks.h>
>>  #include <linux/mm_types.h>
>> @@ -41,6 +42,9 @@ extern unsigned long mmap_min_addr;
>>  
>>  #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
>>  
>> +/* to align the pointer to the (next) page boundary */
>> +#define PAGE_ALIGN(addr) ALIGN(addr, PAGE_SIZE)
>> +
>>  /*
>>   * Linux kernel virtual memory manager primitives.
>>   * The idea being to have a "virtual" mm in the same way
> 
> You don't really need the #include <linux/kernel.h> there.  As long as
> all callsites which _use_ PAGE_ALIGN are including kernel.h via some
> means (and they surely will be) then things will work OK.

OK, testing without linux/kernel.h inclusion.

-Andrea

> 
> But it won't hurt.  We're already picking up kernel.h there via
> mmzone.h->spinlock.h and probably 100 other routes.  One more won't
> make a lot of difference ;)

^ permalink raw reply	[flat|nested] 32+ messages in thread

* [PATCH -mm] PAGE_ALIGN(): correctly handle 64-bit values on 32-bit architectures
  2008-06-12  9:02                   ` Andrew Morton
  2008-06-12  9:12                     ` Andrea Righi
@ 2008-06-12 17:02                     ` Andrea Righi
  2008-06-12 22:14                       ` Andrea Righi
  2008-06-13  9:37                       ` Balbir Singh
  1 sibling, 2 replies; 32+ messages in thread
From: Andrea Righi @ 2008-06-12 17:02 UTC (permalink / raw)
  To: Andrew Morton
  Cc: balbir, linux-mm, skumar, yamamoto, menage, lizf, linux-kernel,
	xemul, kamezawa.hiroyu

I've tested the following patch on a i386 box with my usual .config and
everything seems fine. I also tested allmodconfig and some randconfig builds and
I've not seen any evident error.

I'll repeat the tests tonight on a x86_64. Other architectures should be tested
as well...

Patch is against 2.6.25-rc5-mm3.

-Andrea
---
On 32-bit architectures PAGE_ALIGN() truncates 64-bit values to the 32-bit
boundary. For example:

	u64 val = PAGE_ALIGN(size);

always returns a value < 4GB even if size is greater than 4GB.

The problem resides in PAGE_MASK definition (from include/asm-x86/page.h for
example):

#define PAGE_SHIFT      12
#define PAGE_SIZE       (_AC(1,UL) << PAGE_SHIFT)
#define PAGE_MASK       (~(PAGE_SIZE-1))
...
#define PAGE_ALIGN(addr)       (((addr)+PAGE_SIZE-1)&PAGE_MASK)

The "~" is performed on a 32-bit value, so everything in "and" with PAGE_MASK
greater than 4GB will be truncated to the 32-bit boundary. Using the ALIGN()
macro seems to be the right way, because it uses typeof(addr) for the mask.

Also move the PAGE_ALIGN() definitions out of include/asm-*/page.h in
include/linux/mm.h.

See also lkml discussion: http://lkml.org/lkml/2008/6/11/237

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---
diff -urpN linux-2.6.25-rc5-mm3/arch/powerpc/boot/page.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/arch/powerpc/boot/page.h
--- linux-2.6.25-rc5-mm3/arch/powerpc/boot/page.h	2008-04-17 04:49:44.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/arch/powerpc/boot/page.h	2008-06-12 15:26:46.000000000 +0200
@@ -28,7 +28,4 @@
 /* align addr on a size boundary - adjust address up if needed */
 #define _ALIGN(addr,size)     _ALIGN_UP(addr,size)
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	_ALIGN(addr, PAGE_SIZE)
-
 #endif				/* _PPC_BOOT_PAGE_H */
diff -urpN linux-2.6.25-rc5-mm3/arch/sparc64/kernel/iommu_common.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/arch/sparc64/kernel/iommu_common.h
--- linux-2.6.25-rc5-mm3/arch/sparc64/kernel/iommu_common.h	2008-04-17 04:49:44.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/arch/sparc64/kernel/iommu_common.h	2008-06-12 15:26:46.000000000 +0200
@@ -23,7 +23,7 @@
 #define IO_PAGE_SHIFT			13
 #define IO_PAGE_SIZE			(1UL << IO_PAGE_SHIFT)
 #define IO_PAGE_MASK			(~(IO_PAGE_SIZE-1))
-#define IO_PAGE_ALIGN(addr)		(((addr)+IO_PAGE_SIZE-1)&IO_PAGE_MASK)
+#define IO_PAGE_ALIGN(addr)		ALIGN(addr, IO_PAGE_SIZE)
 
 #define IO_TSB_ENTRIES			(128*1024)
 #define IO_TSB_SIZE			(IO_TSB_ENTRIES * 8)
diff -urpN linux-2.6.25-rc5-mm3/drivers/char/random.c linux-2.6.25-rc5-mm3-fix-64-bit-page-align/drivers/char/random.c
--- linux-2.6.25-rc5-mm3/drivers/char/random.c	2008-06-12 12:35:03.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/drivers/char/random.c	2008-06-12 15:26:46.000000000 +0200
@@ -236,6 +236,7 @@
 #include <linux/fs.h>
 #include <linux/genhd.h>
 #include <linux/interrupt.h>
+#include <linux/mm.h>
 #include <linux/spinlock.h>
 #include <linux/percpu.h>
 #include <linux/cryptohash.h>
diff -urpN linux-2.6.25-rc5-mm3/drivers/media/video/pvrusb2/pvrusb2-ioread.c linux-2.6.25-rc5-mm3-fix-64-bit-page-align/drivers/media/video/pvrusb2/pvrusb2-ioread.c
--- linux-2.6.25-rc5-mm3/drivers/media/video/pvrusb2/pvrusb2-ioread.c	2008-06-12 12:37:39.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/drivers/media/video/pvrusb2/pvrusb2-ioread.c	2008-06-12 17:34:10.000000000 +0200
@@ -22,6 +22,7 @@
 #include "pvrusb2-debug.h"
 #include <linux/errno.h>
 #include <linux/string.h>
+#include <linux/mm.h>
 #include <linux/slab.h>
 #include <linux/mutex.h>
 #include <asm/uaccess.h>
diff -urpN linux-2.6.25-rc5-mm3/drivers/media/video/videobuf-core.c linux-2.6.25-rc5-mm3-fix-64-bit-page-align/drivers/media/video/videobuf-core.c
--- linux-2.6.25-rc5-mm3/drivers/media/video/videobuf-core.c	2008-06-12 12:35:15.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/drivers/media/video/videobuf-core.c	2008-06-12 17:27:11.000000000 +0200
@@ -16,6 +16,7 @@
 #include <linux/init.h>
 #include <linux/module.h>
 #include <linux/moduleparam.h>
+#include <linux/mm.h>
 #include <linux/slab.h>
 #include <linux/interrupt.h>
 
diff -urpN linux-2.6.25-rc5-mm3/drivers/pci/intel-iommu.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/drivers/pci/intel-iommu.h
--- linux-2.6.25-rc5-mm3/drivers/pci/intel-iommu.h	2008-04-17 04:49:44.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/drivers/pci/intel-iommu.h	2008-06-12 15:26:46.000000000 +0200
@@ -35,7 +35,7 @@
 #define PAGE_SHIFT_4K		(12)
 #define PAGE_SIZE_4K		(1UL << PAGE_SHIFT_4K)
 #define PAGE_MASK_4K		(((u64)-1) << PAGE_SHIFT_4K)
-#define PAGE_ALIGN_4K(addr)	(((addr) + PAGE_SIZE_4K - 1) & PAGE_MASK_4K)
+#define PAGE_ALIGN_4K(addr)	ALIGN(addr, PAGE_SIZE_4K)
 
 #define IOVA_PFN(addr)		((addr) >> PAGE_SHIFT_4K)
 #define DMA_32BIT_PFN		IOVA_PFN(DMA_32BIT_MASK)
diff -urpN linux-2.6.25-rc5-mm3/include/asm-alpha/page.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-alpha/page.h
--- linux-2.6.25-rc5-mm3/include/asm-alpha/page.h	2008-04-17 04:49:44.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-alpha/page.h	2008-06-12 15:26:46.000000000 +0200
@@ -80,9 +80,6 @@ typedef struct page *pgtable_t;
 
 #endif /* !__ASSEMBLY__ */
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 #define __pa(x)			((unsigned long) (x) - PAGE_OFFSET)
 #define __va(x)			((void *)((unsigned long) (x) + PAGE_OFFSET))
 #ifndef CONFIG_DISCONTIGMEM
diff -urpN linux-2.6.25-rc5-mm3/include/asm-arm/page.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-arm/page.h
--- linux-2.6.25-rc5-mm3/include/asm-arm/page.h	2008-06-12 12:35:36.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-arm/page.h	2008-06-12 15:26:46.000000000 +0200
@@ -15,9 +15,6 @@
 #define PAGE_SIZE		(1UL << PAGE_SHIFT)
 #define PAGE_MASK		(~(PAGE_SIZE-1))
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 #ifndef __ASSEMBLY__
 
 #ifndef CONFIG_MMU
diff -urpN linux-2.6.25-rc5-mm3/include/asm-arm/page-nommu.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-arm/page-nommu.h
--- linux-2.6.25-rc5-mm3/include/asm-arm/page-nommu.h	2008-04-17 04:49:44.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-arm/page-nommu.h	2008-06-12 15:26:46.000000000 +0200
@@ -7,6 +7,7 @@
  * it under the terms of the GNU General Public License version 2 as
  * published by the Free Software Foundation.
  */
+
 #ifndef _ASMARM_PAGE_NOMMU_H
 #define _ASMARM_PAGE_NOMMU_H
 
@@ -42,9 +43,6 @@ typedef unsigned long pgprot_t;
 #define __pmd(x)        (x)
 #define __pgprot(x)     (x)
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 extern unsigned long memory_start;
 extern unsigned long memory_end;
 
diff -urpN linux-2.6.25-rc5-mm3/include/asm-avr32/page.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-avr32/page.h
--- linux-2.6.25-rc5-mm3/include/asm-avr32/page.h	2008-06-12 12:35:36.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-avr32/page.h	2008-06-12 15:26:46.000000000 +0200
@@ -57,9 +57,6 @@ static inline int get_order(unsigned lon
 
 #endif /* !__ASSEMBLY__ */
 
-/* Align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr) + PAGE_SIZE - 1) & PAGE_MASK)
-
 /*
  * The hardware maps the virtual addresses 0x80000000 -> 0x9fffffff
  * permanently to the physical addresses 0x00000000 -> 0x1fffffff when
diff -urpN linux-2.6.25-rc5-mm3/include/asm-blackfin/page.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-blackfin/page.h
--- linux-2.6.25-rc5-mm3/include/asm-blackfin/page.h	2008-04-17 04:49:44.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-blackfin/page.h	2008-06-12 15:26:46.000000000 +0200
@@ -51,9 +51,6 @@ typedef struct page *pgtable_t;
 #define __pgd(x)	((pgd_t) { (x) } )
 #define __pgprot(x)	((pgprot_t) { (x) } )
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 extern unsigned long memory_start;
 extern unsigned long memory_end;
 
diff -urpN linux-2.6.25-rc5-mm3/include/asm-cris/page.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-cris/page.h
--- linux-2.6.25-rc5-mm3/include/asm-cris/page.h	2008-04-17 04:49:44.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-cris/page.h	2008-06-12 15:26:46.000000000 +0200
@@ -60,9 +60,6 @@ typedef struct page *pgtable_t;
 
 #define page_to_phys(page)     __pa((((page) - mem_map) << PAGE_SHIFT) + PAGE_OFFSET)
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 #ifndef __ASSEMBLY__
 
 #endif /* __ASSEMBLY__ */
diff -urpN linux-2.6.25-rc5-mm3/include/asm-frv/page.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-frv/page.h
--- linux-2.6.25-rc5-mm3/include/asm-frv/page.h	2008-04-17 04:49:44.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-frv/page.h	2008-06-12 15:26:46.000000000 +0200
@@ -40,9 +40,6 @@ typedef struct page *pgtable_t;
 #define __pgprot(x)	((pgprot_t) { (x) } )
 #define PTE_MASK	PAGE_MASK
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr) + PAGE_SIZE - 1) & PAGE_MASK)
-
 #define devmem_is_allowed(pfn)	1
 
 #define __pa(vaddr)		virt_to_phys((void *) (unsigned long) (vaddr))
diff -urpN linux-2.6.25-rc5-mm3/include/asm-h8300/page.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-h8300/page.h
--- linux-2.6.25-rc5-mm3/include/asm-h8300/page.h	2008-04-17 04:49:44.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-h8300/page.h	2008-06-12 15:26:46.000000000 +0200
@@ -43,9 +43,6 @@ typedef struct page *pgtable_t;
 #define __pgd(x)	((pgd_t) { (x) } )
 #define __pgprot(x)	((pgprot_t) { (x) } )
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 extern unsigned long memory_start;
 extern unsigned long memory_end;
 
diff -urpN linux-2.6.25-rc5-mm3/include/asm-ia64/page.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-ia64/page.h
--- linux-2.6.25-rc5-mm3/include/asm-ia64/page.h	2008-06-12 12:35:36.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-ia64/page.h	2008-06-12 15:26:46.000000000 +0200
@@ -40,7 +40,6 @@
 
 #define PAGE_SIZE		(__IA64_UL_CONST(1) << PAGE_SHIFT)
 #define PAGE_MASK		(~(PAGE_SIZE - 1))
-#define PAGE_ALIGN(addr)	(((addr) + PAGE_SIZE - 1) & PAGE_MASK)
 
 #define PERCPU_PAGE_SHIFT	16	/* log2() of max. size of per-CPU area */
 #define PERCPU_PAGE_SIZE	(__IA64_UL_CONST(1) << PERCPU_PAGE_SHIFT)
diff -urpN linux-2.6.25-rc5-mm3/include/asm-m32r/page.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-m32r/page.h
--- linux-2.6.25-rc5-mm3/include/asm-m32r/page.h	2008-04-17 04:49:44.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-m32r/page.h	2008-06-12 15:26:46.000000000 +0200
@@ -41,9 +41,6 @@ typedef struct page *pgtable_t;
 
 #endif /* !__ASSEMBLY__ */
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr) + PAGE_SIZE - 1) & PAGE_MASK)
-
 /*
  * This handles the memory map.. We could make this a config
  * option, but too many people screw it up, and too few need
diff -urpN linux-2.6.25-rc5-mm3/include/asm-m68k/dvma.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-m68k/dvma.h
--- linux-2.6.25-rc5-mm3/include/asm-m68k/dvma.h	2008-06-12 12:38:05.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-m68k/dvma.h	2008-06-12 15:26:46.000000000 +0200
@@ -13,7 +13,7 @@
 #define DVMA_PAGE_SHIFT	13
 #define DVMA_PAGE_SIZE	(1UL << DVMA_PAGE_SHIFT)
 #define DVMA_PAGE_MASK	(~(DVMA_PAGE_SIZE-1))
-#define DVMA_PAGE_ALIGN(addr)	(((addr)+DVMA_PAGE_SIZE-1)&DVMA_PAGE_MASK)
+#define DVMA_PAGE_ALIGN(addr)	ALIGN(addr, DVMA_PAGE_SIZE)
 
 extern void dvma_init(void);
 extern int dvma_map_iommu(unsigned long kaddr, unsigned long baddr,
diff -urpN linux-2.6.25-rc5-mm3/include/asm-m68k/page.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-m68k/page.h
--- linux-2.6.25-rc5-mm3/include/asm-m68k/page.h	2008-04-17 04:49:44.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-m68k/page.h	2008-06-12 15:26:46.000000000 +0200
@@ -103,9 +103,6 @@ typedef struct page *pgtable_t;
 #define __pgd(x)	((pgd_t) { (x) } )
 #define __pgprot(x)	((pgprot_t) { (x) } )
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 #endif /* !__ASSEMBLY__ */
 
 #include <asm/page_offset.h>
diff -urpN linux-2.6.25-rc5-mm3/include/asm-m68knommu/page.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-m68knommu/page.h
--- linux-2.6.25-rc5-mm3/include/asm-m68knommu/page.h	2008-04-17 04:49:44.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-m68knommu/page.h	2008-06-12 15:26:46.000000000 +0200
@@ -43,9 +43,6 @@ typedef struct page *pgtable_t;
 #define __pgd(x)	((pgd_t) { (x) } )
 #define __pgprot(x)	((pgprot_t) { (x) } )
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 extern unsigned long memory_start;
 extern unsigned long memory_end;
 
diff -urpN linux-2.6.25-rc5-mm3/include/asm-mips/page.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-mips/page.h
--- linux-2.6.25-rc5-mm3/include/asm-mips/page.h	2008-04-17 04:49:44.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-mips/page.h	2008-06-12 15:26:46.000000000 +0200
@@ -134,9 +134,6 @@ typedef struct { unsigned long pgprot; }
 
 #endif /* !__ASSEMBLY__ */
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr) + PAGE_SIZE - 1) & PAGE_MASK)
-
 /*
  * __pa()/__va() should be used only during mem init.
  */
diff -urpN linux-2.6.25-rc5-mm3/include/asm-mn10300/page.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-mn10300/page.h
--- linux-2.6.25-rc5-mm3/include/asm-mn10300/page.h	2008-04-17 04:49:44.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-mn10300/page.h	2008-06-12 15:26:46.000000000 +0200
@@ -61,9 +61,6 @@ typedef struct page *pgtable_t;
 
 #endif /* !__ASSEMBLY__ */
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr) + PAGE_SIZE - 1) & PAGE_MASK)
-
 /*
  * This handles the memory map.. We could make this a config
  * option, but too many people screw it up, and too few need
diff -urpN linux-2.6.25-rc5-mm3/include/asm-parisc/page.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-parisc/page.h
--- linux-2.6.25-rc5-mm3/include/asm-parisc/page.h	2008-04-17 04:49:44.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-parisc/page.h	2008-06-12 15:26:46.000000000 +0200
@@ -119,10 +119,6 @@ extern int npmem_ranges;
 #define PMD_ENTRY_SIZE	(1UL << BITS_PER_PMD_ENTRY)
 #define PTE_ENTRY_SIZE	(1UL << BITS_PER_PTE_ENTRY)
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
-
 #define LINUX_GATEWAY_SPACE     0
 
 /* This governs the relationship between virtual and physical addresses.
diff -urpN linux-2.6.25-rc5-mm3/include/asm-powerpc/page.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-powerpc/page.h
--- linux-2.6.25-rc5-mm3/include/asm-powerpc/page.h	2008-06-12 12:35:37.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-powerpc/page.h	2008-06-12 15:26:46.000000000 +0200
@@ -119,9 +119,6 @@ extern phys_addr_t kernstart_addr;
 /* align addr on a size boundary - adjust address up if needed */
 #define _ALIGN(addr,size)     _ALIGN_UP(addr,size)
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	_ALIGN(addr, PAGE_SIZE)
-
 /*
  * Don't compare things with KERNELBASE or PAGE_OFFSET to test for
  * "kernelness", use is_kernel_addr() - it should do what you want.
diff -urpN linux-2.6.25-rc5-mm3/include/asm-ppc/page.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-ppc/page.h
--- linux-2.6.25-rc5-mm3/include/asm-ppc/page.h	2008-04-17 04:49:44.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-ppc/page.h	2008-06-12 15:26:46.000000000 +0200
@@ -43,10 +43,6 @@ typedef unsigned long pte_basic_t;
 /* align addr on a size boundary - adjust address up if needed */
 #define _ALIGN(addr,size)     _ALIGN_UP(addr,size)
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	_ALIGN(addr, PAGE_SIZE)
-
-
 #undef STRICT_MM_TYPECHECKS
 
 #ifdef STRICT_MM_TYPECHECKS
diff -urpN linux-2.6.25-rc5-mm3/include/asm-s390/page.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-s390/page.h
--- linux-2.6.25-rc5-mm3/include/asm-s390/page.h	2008-06-12 12:35:37.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-s390/page.h	2008-06-12 15:26:46.000000000 +0200
@@ -138,9 +138,6 @@ void arch_alloc_page(struct page *page, 
 
 #endif /* !__ASSEMBLY__ */
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)        (((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 #define __PAGE_OFFSET           0x0UL
 #define PAGE_OFFSET             0x0UL
 #define __pa(x)                 (unsigned long)(x)
diff -urpN linux-2.6.25-rc5-mm3/include/asm-sh/page.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-sh/page.h
--- linux-2.6.25-rc5-mm3/include/asm-sh/page.h	2008-06-12 12:38:06.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-sh/page.h	2008-06-12 15:26:46.000000000 +0200
@@ -24,9 +24,6 @@
 #define PAGE_MASK	(~(PAGE_SIZE-1))
 #define PTE_MASK	PAGE_MASK
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 #if defined(CONFIG_HUGETLB_PAGE_SIZE_64K)
 #define HPAGE_SHIFT	16
 #elif defined(CONFIG_HUGETLB_PAGE_SIZE_256K)
diff -urpN linux-2.6.25-rc5-mm3/include/asm-sparc/page.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-sparc/page.h
--- linux-2.6.25-rc5-mm3/include/asm-sparc/page.h	2008-06-12 12:35:37.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-sparc/page.h	2008-06-12 15:26:46.000000000 +0200
@@ -136,9 +136,6 @@ BTFIXUPDEF_SETHI(sparc_unmapped_base)
 
 #endif /* !(__ASSEMBLY__) */
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)  (((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 #define PAGE_OFFSET	0xf0000000
 #ifndef __ASSEMBLY__
 extern unsigned long phys_base;
diff -urpN linux-2.6.25-rc5-mm3/include/asm-sparc64/page.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-sparc64/page.h
--- linux-2.6.25-rc5-mm3/include/asm-sparc64/page.h	2008-06-12 12:35:40.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-sparc64/page.h	2008-06-12 15:26:46.000000000 +0200
@@ -110,9 +110,6 @@ typedef struct page *pgtable_t;
 
 #endif /* !(__ASSEMBLY__) */
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 /* We used to stick this into a hard-coded global register (%g4)
  * but that does not make sense anymore.
  */
diff -urpN linux-2.6.25-rc5-mm3/include/asm-um/page.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-um/page.h
--- linux-2.6.25-rc5-mm3/include/asm-um/page.h	2008-06-12 12:38:06.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-um/page.h	2008-06-12 15:26:46.000000000 +0200
@@ -92,9 +92,6 @@ typedef struct page *pgtable_t;
 #define __pgd(x) ((pgd_t) { (x) } )
 #define __pgprot(x)	((pgprot_t) { (x) } )
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 extern unsigned long uml_physmem;
 
 #define PAGE_OFFSET (uml_physmem)
diff -urpN linux-2.6.25-rc5-mm3/include/asm-x86/page.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-x86/page.h
--- linux-2.6.25-rc5-mm3/include/asm-x86/page.h	2008-06-12 12:38:07.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-x86/page.h	2008-06-12 15:26:46.000000000 +0200
@@ -31,9 +31,6 @@
 
 #define HUGE_MAX_HSTATE 2
 
-/* to align the pointer to the (next) page boundary */
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE-1)&PAGE_MASK)
-
 #ifndef __ASSEMBLY__
 #include <linux/types.h>
 #endif
diff -urpN linux-2.6.25-rc5-mm3/include/asm-xtensa/page.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-xtensa/page.h
--- linux-2.6.25-rc5-mm3/include/asm-xtensa/page.h	2008-04-17 04:49:44.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/asm-xtensa/page.h	2008-06-12 15:26:46.000000000 +0200
@@ -26,13 +26,11 @@
 
 /*
  * PAGE_SHIFT determines the page size
- * PAGE_ALIGN(x) aligns the pointer to the (next) page boundary
  */
 
 #define PAGE_SHIFT		12
 #define PAGE_SIZE		(__XTENSA_UL_CONST(1) << PAGE_SHIFT)
 #define PAGE_MASK		(~(PAGE_SIZE-1))
-#define PAGE_ALIGN(addr)	(((addr)+PAGE_SIZE - 1) & PAGE_MASK)
 
 #define PAGE_OFFSET		XCHAL_KSEG_CACHED_VADDR
 #define MAX_MEM_PFN		XCHAL_KSEG_SIZE
diff -urpN linux-2.6.25-rc5-mm3/include/linux/mm.h linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/linux/mm.h
--- linux-2.6.25-rc5-mm3/include/linux/mm.h	2008-06-12 12:38:08.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/include/linux/mm.h	2008-06-12 15:26:46.000000000 +0200
@@ -41,6 +41,9 @@ extern unsigned long mmap_min_addr;
 
 #define nth_page(page,n) pfn_to_page(page_to_pfn((page)) + (n))
 
+/* to align the pointer to the (next) page boundary */
+#define PAGE_ALIGN(addr) ALIGN(addr, PAGE_SIZE)
+
 /*
  * Linux kernel virtual memory manager primitives.
  * The idea being to have a "virtual" mm in the same way
diff -urpN linux-2.6.25-rc5-mm3/sound/core/info.c linux-2.6.25-rc5-mm3-fix-64-bit-page-align/sound/core/info.c
--- linux-2.6.25-rc5-mm3/sound/core/info.c	2008-06-12 12:35:49.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/sound/core/info.c	2008-06-12 18:26:31.000000000 +0200
@@ -21,6 +21,7 @@
 
 #include <linux/init.h>
 #include <linux/time.h>
+#include <linux/mm.h>
 #include <linux/smp_lock.h>
 #include <linux/string.h>
 #include <sound/core.h>

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH -mm] PAGE_ALIGN(): correctly handle 64-bit values on 32-bit architectures
  2008-06-12 17:02                     ` [PATCH -mm] PAGE_ALIGN(): correctly handle 64-bit values on 32-bit architectures Andrea Righi
@ 2008-06-12 22:14                       ` Andrea Righi
  2008-06-13  9:37                       ` Balbir Singh
  1 sibling, 0 replies; 32+ messages in thread
From: Andrea Righi @ 2008-06-12 22:14 UTC (permalink / raw)
  To: Andrew Morton
  Cc: balbir, linux-mm, skumar, yamamoto, menage, lizf, linux-kernel,
	xemul, kamezawa.hiroyu

Andrea Righi wrote:
> I've tested the following patch on a i386 box with my usual .config and
> everything seems fine. I also tested allmodconfig and some randconfig builds and
> I've not seen any evident error.
> 
> I'll repeat the tests tonight on a x86_64. Other architectures should be tested
> as well...

x86_64 allmodconfig build failed due to a missing #include <linux/mm.h>
in arch/x86/kernel/module_64.c.

Following patch resolves (on top of the previous one).

Except this, no errors for x86_64.

Signed-off-by: Andrea Righi <righi.andrea@gmail.com>
---
diff -urpN linux-2.6.25-rc5-mm3/arch/x86/kernel/module_64.c linux-2.6.25-rc5-mm3-fix-64-bit-page-align/arch/x86/kernel/module_64.c
--- linux-2.6.25-rc5-mm3/arch/x86/kernel/module_64.c	2008-06-13 00:05:51.000000000 +0200
+++ linux-2.6.25-rc5-mm3-fix-64-bit-page-align/arch/x86/kernel/module_64.c	2008-06-13 00:06:21.000000000 +0200
@@ -22,6 +22,7 @@
 #include <linux/fs.h>
 #include <linux/string.h>
 #include <linux/kernel.h>
+#include <linux/mm.h>
 #include <linux/slab.h>
 #include <linux/bug.h>
 

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH -mm] PAGE_ALIGN(): correctly handle 64-bit values on 32-bit architectures
  2008-06-12 17:02                     ` [PATCH -mm] PAGE_ALIGN(): correctly handle 64-bit values on 32-bit architectures Andrea Righi
  2008-06-12 22:14                       ` Andrea Righi
@ 2008-06-13  9:37                       ` Balbir Singh
  2008-06-13  9:45                         ` Li Zefan
  1 sibling, 1 reply; 32+ messages in thread
From: Balbir Singh @ 2008-06-13  9:37 UTC (permalink / raw)
  To: righi.andrea
  Cc: Andrew Morton, linux-mm, Sudhir Kumar, yamamoto, menage, lizf,
	linux-kernel, xemul, kamezawa.hiroyu, linux-arch

Andrea Righi wrote:
> I've tested the following patch on a i386 box with my usual .config and
> everything seems fine. I also tested allmodconfig and some randconfig
> builds and
> I've not seen any evident error.
> 
> I'll repeat the tests tonight on a x86_64. Other architectures should be
> tested
> as well...
> 
> Patch is against 2.6.25-rc5-mm3.

Hi, Andrea,

CC'ing linux-arch. I have a power box, but it's busy. I'll try and test your
patch on it as soon as I can get hold of it.


-- 
	Warm Regards,
	Balbir Singh
	Linux Technology Center
	IBM, ISTL

^ permalink raw reply	[flat|nested] 32+ messages in thread

* Re: [PATCH -mm] PAGE_ALIGN(): correctly handle 64-bit values on 32-bit architectures
  2008-06-13  9:37                       ` Balbir Singh
@ 2008-06-13  9:45                         ` Li Zefan
  0 siblings, 0 replies; 32+ messages in thread
From: Li Zefan @ 2008-06-13  9:45 UTC (permalink / raw)
  To: balbir
  Cc: righi.andrea, Andrew Morton, linux-mm, Sudhir Kumar, yamamoto,
	menage, linux-kernel, xemul, kamezawa.hiroyu, linux-arch

Balbir Singh wrote:
> Andrea Righi wrote:
>> I've tested the following patch on a i386 box with my usual .config and
>> everything seems fine. I also tested allmodconfig and some randconfig
>> builds and
>> I've not seen any evident error.
>>
>> I'll repeat the tests tonight on a x86_64. Other architectures should be
>> tested
>> as well...
>>
>> Patch is against 2.6.25-rc5-mm3.
> 
> Hi, Andrea,
> 
> CC'ing linux-arch. I have a power box, but it's busy. I'll try and test your
> patch on it as soon as I can get hold of it.
> 
> 

It passes allmodconfig and a randconfig build on ia64, but I have no more access
to the machine to do more test.

^ permalink raw reply	[flat|nested] 32+ messages in thread

end of thread, other threads:[~2008-06-13  9:48 UTC | newest]

Thread overview: 32+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2008-05-21 15:29 [-mm][PATCH 0/4] Add memrlimit controller (v5) Balbir Singh
2008-05-21 15:29 ` [-mm][PATCH 1/4] Add memrlimit controller documentation (v5) Balbir Singh
2008-05-22  4:16   ` Andrew Morton
2008-05-22 10:13     ` Balbir Singh
2008-05-21 15:29 ` [-mm][PATCH 2/4] Setup the memrlimit controller (v5) Balbir Singh
2008-05-22  4:18   ` Andrew Morton
2008-05-22 10:14     ` Balbir Singh
2008-06-11 17:10   ` Andrea Righi
2008-06-11 17:33     ` Balbir Singh
2008-06-11 18:48       ` Andrea Righi
2008-06-11 19:15     ` Andrew Morton
2008-06-11 20:17       ` Andrea Righi
2008-06-11 20:43         ` Andrew Morton
2008-06-11 22:47           ` Andrea Righi
2008-06-11 22:55             ` Andrew Morton
2008-06-11 23:04               ` Andrea Righi
2008-06-12  6:13               ` Balbir Singh
2008-06-12  8:52                 ` Andrea Righi
2008-06-12  9:02                   ` Andrew Morton
2008-06-12  9:12                     ` Andrea Righi
2008-06-12 17:02                     ` [PATCH -mm] PAGE_ALIGN(): correctly handle 64-bit values on 32-bit architectures Andrea Righi
2008-06-12 22:14                       ` Andrea Righi
2008-06-13  9:37                       ` Balbir Singh
2008-06-13  9:45                         ` Li Zefan
2008-05-21 15:29 ` [-mm][PATCH 3/4] cgroup mm owner callback changes to add task info (v5) Balbir Singh
2008-05-22  4:19   ` Andrew Morton
2008-05-22 10:14     ` Balbir Singh
2008-05-21 15:30 ` [-mm][PATCH 4/4] Add memrlimit controller accounting and control (v5) Balbir Singh
2008-05-21 17:20   ` Vivek Goyal
2008-05-22  4:41     ` Balbir Singh
2008-05-22  4:24   ` Andrew Morton
2008-05-22 10:15     ` Balbir Singh

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).