* [PATCH 00/09] cifs: local caching support using FS-Cache

From: Suresh Jayaraman @ 2010-07-05 12:41 UTC
  To: Steve French; +Cc: linux-fsdevel, linux-cifs, linux-cachefs

This patchset is a second try at adding a persistent, local caching facility
for CIFS using the FS-Cache interface.

The cache index hierarchy, which is used mainly to locate a file object or to
discard a subset of the cached files, currently has three levels:
	- Server
	- Share
	- File

The server index object is keyed by the server's IP address, socket family
and port. The superblock index object is keyed by the sharename, and the
inode object is keyed by the UniqueId. Cache coherency is ensured by checking
the LastWriteTime, LastChangeTime and end of file (eof) reported by the
server.
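
For orientation, here is a condensed sketch (not a buildable translation unit)
of how the cookie chain ends up wired together; it simply inlines the
fscache_acquire_cookie() calls that patches 02-05 of this series spread across
connect.c and fscache.c. The helper name cifs_fscache_build_chain is made up
purely for illustration:

	/*
	 * Sketch only: one cookie per level, each acquired inside its
	 * parent index, using the definitions introduced in this series.
	 */
	void cifs_fscache_build_chain(struct TCP_Server_Info *server,
				      struct cifsTconInfo *tcon,
				      struct cifsInodeInfo *cifsi)
	{
		/* server index: keyed by the {IPaddress,family,port} tuple */
		server->fscache =
			fscache_acquire_cookie(cifs_fscache_netfs.primary_index,
					&cifs_fscache_server_index_def, server);

		/* superblock index: keyed by sharename, validated via resource_id */
		tcon->fscache =
			fscache_acquire_cookie(server->fscache,
					&cifs_fscache_super_index_def, tcon);

		/*
		 * inode data object: keyed by UniqueId; coherency checked via
		 * the LastWriteTime/LastChangeTime/eof auxiliary data
		 */
		cifsi->fscache =
			fscache_acquire_cookie(tcon->fscache,
					&cifs_fscache_inode_object_def, cifsi);
	}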

Changes since last post:
-------------------------
   - fix a bug during registration with FS-Cache
   - fix a bug while storing pages to the cache. Due to this bug, the earlier
     set needed an rsize of 4096 for caching to work properly.
   - server index key uses the {IPaddress,family,port} tuple instead of the
     servername
   - root dir of the share is validated by UniqueId/IndexNumber; almost all
     servers seem to provide one of them, and it is unique
   - we now check the LastWriteTime, LastChangeTime and eof reported by the
     server for data coherency. CreateTime could be considered once the
     related development effort advances
   - dropped the patch to guard cifsglob.h against multiple inclusion as it
     has been included in Jeff Layton's tree
   - some cleanups

To try these patches:

   - apply this patchset in order
   - mount the share, e.g. mount -t cifs //server/share <mntpoint> -o user=guest
   - try copying a large file (say, a few hundred MB) from the mount point to
     a local filesystem (the first copy initializes the cache)
   - copy it a second time; it should now be read from the local cache (you
     could unmount and remount to reduce the page cache impact). A sample
     session is sketched below.
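
A minimal test session might look like the following (the mount point and
file names are illustrative, and this assumes an FS-Cache backend such as
cachefiles is configured, with cachefilesd running):

	# first read populates the local cache
	mount -t cifs //server/share /mnt/cifs -o user=guest
	time cp /mnt/cifs/bigfile /tmp/copy1

	# remount to reduce the page cache impact, then read again;
	# the second read should be served from the local cache
	umount /mnt/cifs
	mount -t cifs //server/share /mnt/cifs -o user=guest
	time cp /mnt/cifs/bigfile /tmp/copy2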


Known issues
--------------
   - when the 'noserverino' mount option is used, the client generates the
     UniqueId itself rather than using the one from the server. As we use the
     UniqueId to ensure that the root directory has not changed under the
     hood, we won't be able to benefit from the cache in that case.
   - the cache coherency check may not always be reliable, as some CIFS
     servers are known not to update mtime until the filehandle is closed.

Suresh Jayaraman (09):
  cifs: add kernel config option for CIFS Client caching support
  cifs: register CIFS for caching
  cifs: define server-level cache index objects and register them with FS-Cache
  cifs: define superblock-level cache index objects and register them
  cifs: define inode-level cache object and register them
  cifs: FS-Cache page management
  cifs: store pages into local cache
  cifs: read pages from FS-Cache
  cifs: add mount option to enable local caching


 Kconfig      |    9 +
 Makefile     |    2 
 cache.c      |  331 +++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 cifs_fs_sb.h |    1 
 cifsfs.c     |   15 ++
 cifsglob.h   |   10 +
 connect.c    |   16 ++
 file.c       |   50 ++++++++
 fscache.c    |  236 ++++++++++++++++++++++++++++++++++++++++++
 fscache.h    |  136 ++++++++++++++++++++++++
 inode.c      |    7 +
 11 files changed, 813 insertions(+)


* [PATCH 01/09] cifs: add kernel config option for CIFS Client caching support

From: Suresh Jayaraman @ 2010-07-05 12:41 UTC
  To: Steve French; +Cc: linux-cifs, linux-fsdevel, linux-cachefs, David Howells

Add a kernel config option to enable local caching for CIFS.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
---
 fs/cifs/Kconfig |    9 +++++++++
 1 file changed, 9 insertions(+)

Index: cifs-2.6/fs/cifs/Kconfig
===================================================================
--- cifs-2.6.orig/fs/cifs/Kconfig
+++ cifs-2.6/fs/cifs/Kconfig
@@ -131,6 +131,15 @@ config CIFS_DFS_UPCALL
 	    IP addresses) which is needed for implicit mounts of DFS junction
 	    points. If unsure, say N.
 
+config CIFS_FSCACHE
+	  bool "Provide CIFS client caching support (EXPERIMENTAL)"
+	  depends on EXPERIMENTAL
+	  depends on CIFS=m && FSCACHE || CIFS=y && FSCACHE=y
+	  help
+	    Makes CIFS FS-Cache capable. Say Y here if you want your CIFS data
+	    to be cached locally on disk through the general filesystem cache
+	    manager. If unsure, say N.
+
 config CIFS_EXPERIMENTAL
 	  bool "CIFS Experimental Features (EXPERIMENTAL)"
 	  depends on CIFS && EXPERIMENTAL


* [PATCH 02/09] cifs: register CIFS for caching

From: Suresh Jayaraman @ 2010-07-05 12:41 UTC
  To: Steve French
  Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-cachefs-H+wXaHxf7aLQT0dZR+AlfA, David Howells

Define CIFS for FS-Cache and register it for caching. Upon registration, the
top-level index object cookie will be attached to the netfs definition by
FS-Cache.

Signed-off-by: Suresh Jayaraman <sjayaraman-l3A5Bk7waGM@public.gmane.org>
---
 fs/cifs/Makefile  |    2 ++
 fs/cifs/cache.c   |   46 ++++++++++++++++++++++++++++++++++++++++++++++
 fs/cifs/cifsfs.c  |    8 ++++++++
 fs/cifs/fscache.h |   39 +++++++++++++++++++++++++++++++++++++++
 4 files changed, 95 insertions(+)
 create mode 100644 fs/cifs/cache.c
 create mode 100644 fs/cifs/fscache.h

Index: cifs-2.6/fs/cifs/Makefile
===================================================================
--- cifs-2.6.orig/fs/cifs/Makefile
+++ cifs-2.6/fs/cifs/Makefile
@@ -11,3 +11,5 @@ cifs-y := cifsfs.o cifssmb.o cifs_debug.
 cifs-$(CONFIG_CIFS_UPCALL) += cifs_spnego.o
 
 cifs-$(CONFIG_CIFS_DFS_UPCALL) += dns_resolve.o cifs_dfs_ref.o
+
+cifs-$(CONFIG_CIFS_FSCACHE) += cache.o
Index: cifs-2.6/fs/cifs/cache.c
===================================================================
--- /dev/null
+++ cifs-2.6/fs/cifs/cache.c
@@ -0,0 +1,46 @@
+/*
+ *   fs/cifs/cache.c - CIFS filesystem cache index structure definitions
+ *
+ *   Copyright (c) 2010 Novell, Inc.
+ *   Author(s): Suresh Jayaraman <sjayaraman-l3A5Bk7waGM@public.gmane.org>
+ *
+ *   This library is free software; you can redistribute it and/or modify
+ *   it under the terms of the GNU Lesser General Public License as published
+ *   by the Free Software Foundation; either version 2.1 of the License, or
+ *   (at your option) any later version.
+ *
+ *   This library is distributed in the hope that it will be useful,
+ *   but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See
+ *   the GNU Lesser General Public License for more details.
+ *
+ *   You should have received a copy of the GNU Lesser General Public License
+ *   along with this library; if not, write to the Free Software
+ *   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+#include "fscache.h"
+
+/*
+ * CIFS filesystem definition for FS-Cache
+ */
+struct fscache_netfs cifs_fscache_netfs = {
+	.name = "cifs",
+	.version = 0,
+};
+
+/*
+ * Register CIFS for caching with FS-Cache
+ */
+int cifs_fscache_register(void)
+{
+	return fscache_register_netfs(&cifs_fscache_netfs);
+}
+
+/*
+ * Unregister CIFS for caching
+ */
+void cifs_fscache_unregister(void)
+{
+	fscache_unregister_netfs(&cifs_fscache_netfs);
+}
+
Index: cifs-2.6/fs/cifs/cifsfs.c
===================================================================
--- cifs-2.6.orig/fs/cifs/cifsfs.c
+++ cifs-2.6/fs/cifs/cifsfs.c
@@ -47,6 +47,7 @@
 #include <linux/key-type.h>
 #include "dns_resolve.h"
 #include "cifs_spnego.h"
+#include "fscache.h"
 #define CIFS_MAGIC_NUMBER 0xFF534D42	/* the first four bytes of SMB PDUs */
 
 int cifsFYI = 0;
@@ -902,6 +903,10 @@ init_cifs(void)
 		cFYI(1, "cifs_max_pending set to max of 256");
 	}
 
+	rc = cifs_fscache_register();
+	if (rc)
+		goto out;
+
 	rc = cifs_init_inodecache();
 	if (rc)
 		goto out_clean_proc;
@@ -951,6 +956,8 @@ init_cifs(void)
 	cifs_destroy_inodecache();
  out_clean_proc:
 	cifs_proc_clean();
+	cifs_fscache_unregister();
+ out:
 	return rc;
 }
 
@@ -959,6 +966,7 @@ exit_cifs(void)
 {
 	cFYI(DBG2, "exit_cifs");
 	cifs_proc_clean();
+	cifs_fscache_unregister();
 #ifdef CONFIG_CIFS_DFS_UPCALL
 	cifs_dfs_release_automount_timer();
 	unregister_key_type(&key_type_dns_resolver);
Index: cifs-2.6/fs/cifs/fscache.h
===================================================================
--- /dev/null
+++ cifs-2.6/fs/cifs/fscache.h
@@ -0,0 +1,39 @@
+/*
+ *   fs/cifs/fscache.h - CIFS filesystem cache interface definitions
+ *
+ *   Copyright (c) 2010 Novell, Inc.
+ *   Author(s): Suresh Jayaraman <sjayaraman-l3A5Bk7waGM@public.gmane.org>
+ *
+ *   This library is free software; you can redistribute it and/or modify
+ *   it under the terms of the GNU Lesser General Public License as published
+ *   by the Free Software Foundation; either version 2.1 of the License, or
+ *   (at your option) any later version.
+ *
+ *   This library is distributed in the hope that it will be useful,
+ *   but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See
+ *   the GNU Lesser General Public License for more details.
+ *
+ *   You should have received a copy of the GNU Lesser General Public License
+ *   along with this library; if not, write to the Free Software
+ *   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+#ifndef _CIFS_FSCACHE_H
+#define _CIFS_FSCACHE_H
+
+#include <linux/fscache.h>
+
+#ifdef CONFIG_CIFS_FSCACHE
+
+extern struct fscache_netfs cifs_fscache_netfs;
+
+extern int cifs_fscache_register(void);
+extern void cifs_fscache_unregister(void);
+
+#else /* CONFIG_CIFS_FSCACHE */
+static inline int cifs_fscache_register(void) { return 0; }
+static inline void cifs_fscache_unregister(void) {}
+
+#endif /* CONFIG_CIFS_FSCACHE */
+
+#endif /* _CIFS_FSCACHE_H */


* [PATCH 03/09] cifs: define server-level cache index objects and register them

From: Suresh Jayaraman @ 2010-07-05 12:42 UTC
  To: Steve French; +Cc: linux-cifs, linux-fsdevel, linux-cachefs, David Howells

Define server-level cache index objects (as managed by TCP_Server_Info
structs) and register them with FS-Cache. Each server object is created in
the CIFS
top-level index object and is itself an index into which superblock-level
objects are inserted.

The server objects are now keyed by {IPaddress,family,port} tuple.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
---
 fs/cifs/Makefile   |    2 -
 fs/cifs/cache.c    |   62 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/cifs/cifsglob.h |    3 ++
 fs/cifs/connect.c  |    5 ++++
 fs/cifs/fscache.c  |   41 +++++++++++++++++++++++++++++++++++
 fs/cifs/fscache.h  |   14 +++++++++++
 6 files changed, 126 insertions(+), 1 deletion(-)
 create mode 100644 fs/cifs/fscache.c

Index: cifs-2.6/fs/cifs/Makefile
===================================================================
--- cifs-2.6.orig/fs/cifs/Makefile
+++ cifs-2.6/fs/cifs/Makefile
@@ -12,4 +12,4 @@ cifs-$(CONFIG_CIFS_UPCALL) += cifs_spneg
 
 cifs-$(CONFIG_CIFS_DFS_UPCALL) += dns_resolve.o cifs_dfs_ref.o
 
-cifs-$(CONFIG_CIFS_FSCACHE) += cache.o
+cifs-$(CONFIG_CIFS_FSCACHE) += fscache.o cache.o
Index: cifs-2.6/fs/cifs/cache.c
===================================================================
--- cifs-2.6.orig/fs/cifs/cache.c
+++ cifs-2.6/fs/cifs/cache.c
@@ -19,6 +19,7 @@
  *   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
  */
 #include "fscache.h"
+#include "cifs_debug.h"
 
 /*
  * CIFS filesystem definition for FS-Cache
@@ -44,3 +45,64 @@ void cifs_fscache_unregister(void)
 	fscache_unregister_netfs(&cifs_fscache_netfs);
 }
 
+/*
+ * Key layout of CIFS server cache index object
+ */
+struct cifs_server_key {
+	uint16_t	family;		/* address family */
+	uint16_t	port;		/* IP port */
+	union {
+		struct in_addr	ipv4_addr;
+		struct in6_addr	ipv6_addr;
+	} addr[0];
+};
+
+/*
+ * Server object keyed by {IPaddress,family,port} tuple
+ */
+static uint16_t cifs_server_get_key(const void *cookie_netfs_data,
+				   void *buffer, uint16_t maxbuf)
+{
+	const struct TCP_Server_Info *server = cookie_netfs_data;
+	const struct sockaddr *sa = (struct sockaddr *) &server->addr.sockAddr;
+	struct cifs_server_key *key = buffer;
+	uint16_t key_len = sizeof(struct cifs_server_key);
+
+	memset(key, 0, key_len);
+
+	/*
+	 * Should not be a problem as sin_family/sin6_family overlays
+	 * sa_family field
+	 */
+	switch (sa->sa_family) {
+	case AF_INET:
+		key->family = server->addr.sockAddr.sin_family;
+		key->port = server->addr.sockAddr.sin_port;
+		key->addr[0].ipv4_addr = server->addr.sockAddr.sin_addr;
+		key_len += sizeof(key->addr[0].ipv4_addr);
+		break;
+
+	case AF_INET6:
+		key->family = server->addr.sockAddr6.sin6_family;
+		key->port = server->addr.sockAddr6.sin6_port;
+		key->addr[0].ipv6_addr = server->addr.sockAddr6.sin6_addr;
+		key_len += sizeof(key->addr[0].ipv6_addr);
+		break;
+
+	default:
+		cERROR(1, "CIFS: Unknown network family '%d'", sa->sa_family);
+		key_len = 0;
+		break;
+	}
+
+	return key_len;
+}
+
+/*
+ * Server object for FS-Cache
+ */
+const struct fscache_cookie_def cifs_fscache_server_index_def = {
+	.name = "CIFS.server",
+	.type = FSCACHE_COOKIE_TYPE_INDEX,
+	.get_key = cifs_server_get_key,
+};
Index: cifs-2.6/fs/cifs/cifsglob.h
===================================================================
--- cifs-2.6.orig/fs/cifs/cifsglob.h
+++ cifs-2.6/fs/cifs/cifsglob.h
@@ -190,6 +190,9 @@ struct TCP_Server_Info {
 	bool	sec_mskerberos;		/* supports legacy MS Kerberos */
 	bool	sec_kerberosu2u;	/* supports U2U Kerberos */
 	bool	sec_ntlmssp;		/* supports NTLMSSP */
+#ifdef CONFIG_CIFS_FSCACHE
+	struct fscache_cookie   *fscache; /* client index cache cookie */
+#endif
 };
 
 /*
Index: cifs-2.6/fs/cifs/connect.c
===================================================================
--- cifs-2.6.orig/fs/cifs/connect.c
+++ cifs-2.6/fs/cifs/connect.c
@@ -48,6 +48,7 @@
 #include "nterr.h"
 #include "rfc1002pdu.h"
 #include "cn_cifs.h"
+#include "fscache.h"
 
 #define CIFS_PORT 445
 #define RFC1001_PORT 139
@@ -1460,6 +1461,8 @@ cifs_put_tcp_session(struct TCP_Server_I
 	server->tcpStatus = CifsExiting;
 	spin_unlock(&GlobalMid_Lock);
 
+	cifs_fscache_release_client_cookie(server);
+
 	task = xchg(&server->tsk, NULL);
 	if (task)
 		force_sig(SIGKILL, task);
@@ -1577,6 +1580,8 @@ cifs_get_tcp_session(struct smb_vol *vol
 	list_add(&tcp_ses->tcp_ses_list, &cifs_tcp_ses_list);
 	write_unlock(&cifs_tcp_ses_lock);
 
+	cifs_fscache_get_client_cookie(tcp_ses);
+
 	return tcp_ses;
 
 out_err:
Index: cifs-2.6/fs/cifs/fscache.c
===================================================================
--- /dev/null
+++ cifs-2.6/fs/cifs/fscache.c
@@ -0,0 +1,41 @@
+/*
+ *   fs/cifs/fscache.c - CIFS filesystem cache interface
+ *
+ *   Copyright (c) 2010 Novell, Inc.
+ *   Author(s): Suresh Jayaraman <sjayaraman@suse.de>
+ *
+ *   This library is free software; you can redistribute it and/or modify
+ *   it under the terms of the GNU Lesser General Public License as published
+ *   by the Free Software Foundation; either version 2.1 of the License, or
+ *   (at your option) any later version.
+ *
+ *   This library is distributed in the hope that it will be useful,
+ *   but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See
+ *   the GNU Lesser General Public License for more details.
+ *
+ *   You should have received a copy of the GNU Lesser General Public License
+ *   along with this library; if not, write to the Free Software
+ *   Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
+ */
+#include "fscache.h"
+#include "cifsglob.h"
+#include "cifs_debug.h"
+
+void cifs_fscache_get_client_cookie(struct TCP_Server_Info *server)
+{
+	server->fscache =
+		fscache_acquire_cookie(cifs_fscache_netfs.primary_index,
+				&cifs_fscache_server_index_def, server);
+	cFYI(1, "CIFS: get client cookie (0x%p/0x%p)", server,
+				server->fscache);
+}
+
+void cifs_fscache_release_client_cookie(struct TCP_Server_Info *server)
+{
+	cFYI(1, "CIFS: release client cookie (0x%p/0x%p)", server,
+				server->fscache);
+	fscache_relinquish_cookie(server->fscache, 0);
+	server->fscache = NULL;
+}
+
Index: cifs-2.6/fs/cifs/fscache.h
===================================================================
--- cifs-2.6.orig/fs/cifs/fscache.h
+++ cifs-2.6/fs/cifs/fscache.h
@@ -23,17 +23,31 @@
 
 #include <linux/fscache.h>
 
+#include "cifsglob.h"
+
 #ifdef CONFIG_CIFS_FSCACHE
 
 extern struct fscache_netfs cifs_fscache_netfs;
+extern const struct fscache_cookie_def cifs_fscache_server_index_def;
 
 extern int cifs_fscache_register(void);
 extern void cifs_fscache_unregister(void);
 
+/*
+ * fscache.c
+ */
+extern void cifs_fscache_get_client_cookie(struct TCP_Server_Info *);
+extern void cifs_fscache_release_client_cookie(struct TCP_Server_Info *);
+
 #else /* CONFIG_CIFS_FSCACHE */
 static inline int cifs_fscache_register(void) { return 0; }
 static inline void cifs_fscache_unregister(void) {}
 
+static inline void
+cifs_fscache_get_client_cookie(struct TCP_Server_Info *server) {}
+static inline void
+cifs_fscache_release_client_cookie(struct TCP_Server_Info *server) {}
+
 #endif /* CONFIG_CIFS_FSCACHE */
 
 #endif /* _CIFS_FSCACHE_H */


* [PATCH 04/09] cifs: define superblock-level cache index objects and register them

From: Suresh Jayaraman @ 2010-07-05 12:42 UTC
  To: Steve French
  Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-cachefs-H+wXaHxf7aLQT0dZR+AlfA, David Howells

Define superblock-level cache index objects (managed by cifsTconInfo structs).
Each superblock object is created in a server-level index object and is itself
an index into which inode-level objects are inserted.

The superblock object is keyed by the sharename. The UniqueId/IndexNumber is
used to validate that the exported share has not changed since we last
accessed it.

Signed-off-by: Suresh Jayaraman <sjayaraman-l3A5Bk7waGM@public.gmane.org>
---
 fs/cifs/cache.c    |  109 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/cifs/cifsglob.h |    4 +
 fs/cifs/connect.c  |    3 +
 fs/cifs/fscache.c  |   17 ++++++++
 fs/cifs/fscache.h  |    6 ++
 fs/cifs/inode.c    |    3 +
 6 files changed, 142 insertions(+)

Index: cifs-2.6/fs/cifs/cache.c
===================================================================
--- cifs-2.6.orig/fs/cifs/cache.c
+++ cifs-2.6/fs/cifs/cache.c
@@ -106,3 +106,112 @@ const struct fscache_cookie_def cifs_fsc
 	.type = FSCACHE_COOKIE_TYPE_INDEX,
 	.get_key = cifs_server_get_key,
 };
+
+/*
+ * Auxiliary data attached to CIFS superblock within the cache
+ */
+struct cifs_fscache_super_auxdata {
+	u64	resource_id;		/* unique server resource id */
+};
+
+static char *extract_sharename(const char *treename)
+{
+	const char *src;
+	char *delim, *dst;
+	int len;
+
+	/* skip the double backslashes at the beginning */
+	src = treename + 2;
+
+	/* share name is always preceded by '\\' now */
+	delim = strchr(src, '\\');
+	if (!delim)
+		return ERR_PTR(-EINVAL);
+	delim++;
+	len = strlen(delim);
+
+	/* caller has to free the memory */
+	dst = kstrndup(delim, len, GFP_KERNEL);
+	if (!dst)
+		return ERR_PTR(-ENOMEM);
+
+	return dst;
+}
+
+/*
+ * Superblock object currently keyed by share name
+ */
+static uint16_t cifs_super_get_key(const void *cookie_netfs_data, void *buffer,
+				   uint16_t maxbuf)
+{
+	const struct cifsTconInfo *tcon = cookie_netfs_data;
+	char *sharename;
+	uint16_t len;
+
+	sharename = extract_sharename(tcon->treeName);
+	if (IS_ERR(sharename)) {
+		cFYI(1, "CIFS: couldn't extract sharename");
+		return 0;
+	}
+
+	len = strlen(sharename);
+	if (len > maxbuf) {
+		/* don't leak the sharename when the key doesn't fit */
+		kfree(sharename);
+		return 0;
+	}
+
+	memcpy(buffer, sharename, len);
+
+	kfree(sharename);
+
+	return len;
+}
+
+static uint16_t
+cifs_fscache_super_get_aux(const void *cookie_netfs_data, void *buffer,
+			   uint16_t maxbuf)
+{
+	struct cifs_fscache_super_auxdata auxdata;
+	const struct cifsTconInfo *tcon = cookie_netfs_data;
+
+	memset(&auxdata, 0, sizeof(auxdata));
+	auxdata.resource_id = tcon->resource_id;
+
+	if (maxbuf > sizeof(auxdata))
+		maxbuf = sizeof(auxdata);
+
+	memcpy(buffer, &auxdata, maxbuf);
+
+	return maxbuf;
+}
+
+static enum
+fscache_checkaux cifs_fscache_super_check_aux(void *cookie_netfs_data,
+					      const void *data,
+					      uint16_t datalen)
+{
+	struct cifs_fscache_super_auxdata auxdata;
+	const struct cifsTconInfo *tcon = cookie_netfs_data;
+
+	if (datalen != sizeof(auxdata))
+		return FSCACHE_CHECKAUX_OBSOLETE;
+
+	memset(&auxdata, 0, sizeof(auxdata));
+	auxdata.resource_id = tcon->resource_id;
+
+	if (memcmp(data, &auxdata, datalen) != 0)
+		return FSCACHE_CHECKAUX_OBSOLETE;
+
+	return FSCACHE_CHECKAUX_OKAY;
+}
+
+/*
+ * Superblock object for FS-Cache
+ */
+const struct fscache_cookie_def cifs_fscache_super_index_def = {
+	.name = "CIFS.super",
+	.type = FSCACHE_COOKIE_TYPE_INDEX,
+	.get_key = cifs_super_get_key,
+	.get_aux = cifs_fscache_super_get_aux,
+	.check_aux = cifs_fscache_super_check_aux,
+};
+
Index: cifs-2.6/fs/cifs/fscache.c
===================================================================
--- cifs-2.6.orig/fs/cifs/fscache.c
+++ cifs-2.6/fs/cifs/fscache.c
@@ -39,3 +39,20 @@ void cifs_fscache_release_client_cookie(
 	server->fscache = NULL;
 }
 
+void cifs_fscache_get_super_cookie(struct cifsTconInfo *tcon)
+{
+	struct TCP_Server_Info *server = tcon->ses->server;
+
+	tcon->fscache =
+		fscache_acquire_cookie(server->fscache,
+				&cifs_fscache_super_index_def, tcon);
+	cFYI(1, "CIFS: get superblock cookie (0x%p/0x%p)",
+				server->fscache, tcon->fscache);
+}
+
+void cifs_fscache_release_super_cookie(struct cifsTconInfo *tcon)
+{
+	cFYI(1, "CIFS: releasing superblock cookie (0x%p)", tcon->fscache);
+	fscache_relinquish_cookie(tcon->fscache, 0);
+	tcon->fscache = NULL;
+}
Index: cifs-2.6/fs/cifs/fscache.h
===================================================================
--- cifs-2.6.orig/fs/cifs/fscache.h
+++ cifs-2.6/fs/cifs/fscache.h
@@ -29,6 +29,7 @@
 
 extern struct fscache_netfs cifs_fscache_netfs;
 extern const struct fscache_cookie_def cifs_fscache_server_index_def;
+extern const struct fscache_cookie_def cifs_fscache_super_index_def;
 
 extern int cifs_fscache_register(void);
 extern void cifs_fscache_unregister(void);
@@ -38,6 +39,8 @@ extern void cifs_fscache_unregister(void
  */
 extern void cifs_fscache_get_client_cookie(struct TCP_Server_Info *);
 extern void cifs_fscache_release_client_cookie(struct TCP_Server_Info *);
+extern void cifs_fscache_get_super_cookie(struct cifsTconInfo *);
+extern void cifs_fscache_release_super_cookie(struct cifsTconInfo *);
 
 #else /* CONFIG_CIFS_FSCACHE */
 static inline int cifs_fscache_register(void) { return 0; }
@@ -47,6 +50,9 @@ static inline void
 cifs_fscache_get_client_cookie(struct TCP_Server_Info *server) {}
 static inline void
 cifs_fscache_release_client_cookie(struct TCP_Server_Info *server) {}
+static inline void cifs_fscache_get_super_cookie(struct cifsTconInfo *tcon) {}
+static inline void
+cifs_fscache_release_super_cookie(struct cifsTconInfo *tcon) {}
 
 #endif /* CONFIG_CIFS_FSCACHE */
 
Index: cifs-2.6/fs/cifs/cifsglob.h
===================================================================
--- cifs-2.6.orig/fs/cifs/cifsglob.h
+++ cifs-2.6/fs/cifs/cifsglob.h
@@ -314,6 +314,10 @@ struct cifsTconInfo {
 	bool local_lease:1; /* check leases (only) on local system not remote */
 	bool broken_posix_open; /* e.g. Samba server versions < 3.3.2, 3.2.9 */
 	bool need_reconnect:1; /* connection reset, tid now invalid */
+#ifdef CONFIG_CIFS_FSCACHE
+	u64 resource_id;		/* server resource id */
+	struct fscache_cookie *fscache;	/* cookie for share */
+#endif
 	/* BB add field for back pointer to sb struct(s)? */
 };
 
Index: cifs-2.6/fs/cifs/connect.c
===================================================================
--- cifs-2.6.orig/fs/cifs/connect.c
+++ cifs-2.6/fs/cifs/connect.c
@@ -1779,6 +1779,7 @@ cifs_put_tcon(struct cifsTconInfo *tcon)
 	_FreeXid(xid);
 
+	cifs_fscache_release_super_cookie(tcon);
 	tconInfoFree(tcon);
 	cifs_put_smb_ses(ses);
 }
 
@@ -1848,6 +1849,8 @@ cifs_get_tcon(struct cifsSesInfo *ses, s
 	list_add(&tcon->tcon_list, &ses->tcon_list);
 	write_unlock(&cifs_tcp_ses_lock);
 
+	cifs_fscache_get_super_cookie(tcon);
+
 	return tcon;
 
 out_fail:
Index: cifs-2.6/fs/cifs/inode.c
===================================================================
--- cifs-2.6.orig/fs/cifs/inode.c
+++ cifs-2.6/fs/cifs/inode.c
@@ -807,6 +807,9 @@ struct inode *cifs_root_iget(struct supe
 	if (!inode)
 		return ERR_PTR(-ENOMEM);
 
+	/* populate tcon->resource_id */
+	cifs_sb->tcon->resource_id = CIFS_I(inode)->uniqueid;
+
 	if (rc && cifs_sb->tcon->ipc) {
 		cFYI(1, "ipc connection - fake read inode");
 		inode->i_mode |= S_IFDIR;


* [PATCH 05/09] cifs: define inode-level cache object and register them

From: Suresh Jayaraman @ 2010-07-05 12:42 UTC
  To: Steve French; +Cc: linux-cifs, linux-fsdevel, linux-cachefs, David Howells

Define inode-level data storage objects (managed by cifsInodeInfo structs).
Each inode-level object is created in a superblock-level object and is itself
a data storage object into which pages from the inode are stored.

The inode object is keyed by the UniqueId. The coherency data used is the
LastWriteTime, LastChangeTime and end of file reported by the server.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
---
 fs/cifs/cache.c    |   83 +++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/cifs/cifsfs.c   |    7 ++++
 fs/cifs/cifsglob.h |    3 +
 fs/cifs/file.c     |    6 +++
 fs/cifs/fscache.c  |   68 +++++++++++++++++++++++++++++++++++++++++++
 fs/cifs/fscache.h  |   12 +++++++
 fs/cifs/inode.c    |    4 ++
 7 files changed, 183 insertions(+)

Index: cifs-2.6/fs/cifs/cache.c
===================================================================
--- cifs-2.6.orig/fs/cifs/cache.c
+++ cifs-2.6/fs/cifs/cache.c
@@ -215,3 +215,86 @@ const struct fscache_cookie_def cifs_fsc
 	.check_aux = cifs_fscache_super_check_aux,
 };
 
+/*
+ * Auxiliary data attached to CIFS inode within the cache
+ */
+struct cifs_fscache_inode_auxdata {
+	struct timespec	last_write_time;
+	struct timespec	last_change_time;
+	u64		eof;
+};
+
+static uint16_t cifs_fscache_inode_get_key(const void *cookie_netfs_data,
+					   void *buffer, uint16_t maxbuf)
+{
+	const struct cifsInodeInfo *cifsi = cookie_netfs_data;
+	uint16_t keylen;
+
+	/* use the UniqueId as the key */
+	keylen = sizeof(cifsi->uniqueid);
+	if (keylen > maxbuf)
+		keylen = 0;
+	else
+		memcpy(buffer, &cifsi->uniqueid, keylen);
+
+	return keylen;
+}
+
+static void
+cifs_fscache_inode_get_attr(const void *cookie_netfs_data, uint64_t *size)
+{
+	const struct cifsInodeInfo *cifsi = cookie_netfs_data;
+
+	*size = cifsi->vfs_inode.i_size;
+}
+
+static uint16_t
+cifs_fscache_inode_get_aux(const void *cookie_netfs_data, void *buffer,
+			   uint16_t maxbuf)
+{
+	struct cifs_fscache_inode_auxdata auxdata;
+	const struct cifsInodeInfo *cifsi = cookie_netfs_data;
+
+	memset(&auxdata, 0, sizeof(auxdata));
+	auxdata.eof = cifsi->server_eof;
+	auxdata.last_write_time = cifsi->vfs_inode.i_mtime;
+	auxdata.last_change_time = cifsi->vfs_inode.i_ctime;
+
+	if (maxbuf > sizeof(auxdata))
+		maxbuf = sizeof(auxdata);
+
+	memcpy(buffer, &auxdata, maxbuf);
+
+	return maxbuf;
+}
+
+static enum
+fscache_checkaux cifs_fscache_inode_check_aux(void *cookie_netfs_data,
+					      const void *data,
+					      uint16_t datalen)
+{
+	struct cifs_fscache_inode_auxdata auxdata;
+	struct cifsInodeInfo *cifsi = cookie_netfs_data;
+
+	if (datalen != sizeof(auxdata))
+		return FSCACHE_CHECKAUX_OBSOLETE;
+
+	memset(&auxdata, 0, sizeof(auxdata));
+	auxdata.eof = cifsi->server_eof;
+	auxdata.last_write_time = cifsi->vfs_inode.i_mtime;
+	auxdata.last_change_time = cifsi->vfs_inode.i_ctime;
+
+	if (memcmp(data, &auxdata, datalen) != 0)
+		return FSCACHE_CHECKAUX_OBSOLETE;
+
+	return FSCACHE_CHECKAUX_OKAY;
+}
+
+const struct fscache_cookie_def cifs_fscache_inode_object_def = {
+	.name		= "CIFS.uniqueid",
+	.type		= FSCACHE_COOKIE_TYPE_DATAFILE,
+	.get_key	= cifs_fscache_inode_get_key,
+	.get_attr	= cifs_fscache_inode_get_attr,
+	.get_aux	= cifs_fscache_inode_get_aux,
+	.check_aux	= cifs_fscache_inode_check_aux,
+};
Index: cifs-2.6/fs/cifs/cifsfs.c
===================================================================
--- cifs-2.6.orig/fs/cifs/cifsfs.c
+++ cifs-2.6/fs/cifs/cifsfs.c
@@ -330,6 +330,12 @@ cifs_destroy_inode(struct inode *inode)
 }
 
 static void
+cifs_clear_inode(struct inode *inode)
+{
+	cifs_fscache_release_inode_cookie(inode);
+}
+
+static void
 cifs_show_address(struct seq_file *s, struct TCP_Server_Info *server)
 {
 	seq_printf(s, ",addr=");
@@ -490,6 +496,7 @@ static const struct super_operations cif
 	.alloc_inode = cifs_alloc_inode,
 	.destroy_inode = cifs_destroy_inode,
 	.drop_inode	= cifs_drop_inode,
+	.clear_inode	= cifs_clear_inode,
 /*	.delete_inode	= cifs_delete_inode,  */  /* Do not need above
 	function unless later we add lazy close of inodes or unless the
 	kernel forgets to call us with the same number of releases (closes)
Index: cifs-2.6/fs/cifs/cifsglob.h
===================================================================
--- cifs-2.6.orig/fs/cifs/cifsglob.h
+++ cifs-2.6/fs/cifs/cifsglob.h
@@ -405,6 +405,9 @@ struct cifsInodeInfo {
 	bool invalid_mapping:1;		/* pagecache is invalid */
 	u64  server_eof;		/* current file size on server */
 	u64  uniqueid;			/* server inode number */
+#ifdef CONFIG_CIFS_FSCACHE
+	struct fscache_cookie *fscache;
+#endif
 	struct inode vfs_inode;
 };
 
Index: cifs-2.6/fs/cifs/file.c
===================================================================
--- cifs-2.6.orig/fs/cifs/file.c
+++ cifs-2.6/fs/cifs/file.c
@@ -40,6 +40,7 @@
 #include "cifs_unicode.h"
 #include "cifs_debug.h"
 #include "cifs_fs_sb.h"
+#include "fscache.h"
 
 static inline int cifs_convert_flags(unsigned int flags)
 {
@@ -282,6 +283,9 @@ int cifs_open(struct inode *inode, struc
 				CIFSSMBClose(xid, tcon, netfid);
 				rc = -ENOMEM;
 			}
+
+			cifs_fscache_set_inode_cookie(inode, file);
+
 			goto out;
 		} else if ((rc == -EINVAL) || (rc == -EOPNOTSUPP)) {
 			if (tcon->ses->serverNOS)
@@ -373,6 +377,8 @@ int cifs_open(struct inode *inode, struc
 		goto out;
 	}
 
+	cifs_fscache_set_inode_cookie(inode, file);
+
 	if (oplock & CIFS_CREATE_ACTION) {
 		/* time to set mode which we can not set earlier due to
 		   problems creating new read-only files */
Index: cifs-2.6/fs/cifs/fscache.h
===================================================================
--- cifs-2.6.orig/fs/cifs/fscache.h
+++ cifs-2.6/fs/cifs/fscache.h
@@ -30,6 +30,8 @@
 extern struct fscache_netfs cifs_fscache_netfs;
 extern const struct fscache_cookie_def cifs_fscache_server_index_def;
 extern const struct fscache_cookie_def cifs_fscache_super_index_def;
+extern const struct fscache_cookie_def cifs_fscache_inode_object_def;
+
 
 extern int cifs_fscache_register(void);
 extern void cifs_fscache_unregister(void);
@@ -42,6 +44,10 @@ extern void cifs_fscache_release_client_
 extern void cifs_fscache_get_super_cookie(struct cifsTconInfo *);
 extern void cifs_fscache_release_super_cookie(struct cifsTconInfo *);
 
+extern void cifs_fscache_release_inode_cookie(struct inode *);
+extern void cifs_fscache_set_inode_cookie(struct inode *, struct file *);
+extern void cifs_fscache_reset_inode_cookie(struct inode *);
+
 #else /* CONFIG_CIFS_FSCACHE */
 static inline int cifs_fscache_register(void) { return 0; }
 static inline void cifs_fscache_unregister(void) {}
@@ -54,6 +60,12 @@ static inline void cifs_fscache_get_supe
 static inline void
 cifs_fscache_release_super_cookie(struct cifsTconInfo *tcon) {}
 
+static inline void cifs_fscache_release_inode_cookie(struct inode *inode) {}
+static inline void cifs_fscache_set_inode_cookie(struct inode *inode,
+						 struct file *filp) {}
+static inline void cifs_fscache_reset_inode_cookie(struct inode *inode) {}
+
+
 #endif /* CONFIG_CIFS_FSCACHE */
 
 #endif /* _CIFS_FSCACHE_H */
Index: cifs-2.6/fs/cifs/inode.c
===================================================================
--- cifs-2.6.orig/fs/cifs/inode.c
+++ cifs-2.6/fs/cifs/inode.c
@@ -29,6 +29,7 @@
 #include "cifsproto.h"
 #include "cifs_debug.h"
 #include "cifs_fs_sb.h"
+#include "fscache.h"
 
 
 static void cifs_set_ops(struct inode *inode, const bool is_dfs_referral)
@@ -776,6 +777,8 @@ retry_iget5_locked:
 			inode->i_flags |= S_NOATIME | S_NOCMTIME;
 		if (inode->i_state & I_NEW) {
 			inode->i_ino = hash;
+			/* initialize per-inode cache cookie pointer */
+			CIFS_I(inode)->fscache = NULL;
 			unlock_new_inode(inode);
 		}
 	}
@@ -1571,6 +1574,7 @@ cifs_invalidate_mapping(struct inode *in
 			cifs_i->write_behind_rc = rc;
 	}
 	invalidate_remote_inode(inode);
+	cifs_fscache_reset_inode_cookie(inode);
 }
 
 int cifs_revalidate_file(struct file *filp)
Index: cifs-2.6/fs/cifs/fscache.c
===================================================================
--- cifs-2.6.orig/fs/cifs/fscache.c
+++ cifs-2.6/fs/cifs/fscache.c
@@ -21,6 +21,7 @@
 #include "fscache.h"
 #include "cifsglob.h"
 #include "cifs_debug.h"
+#include "cifs_fs_sb.h"
 
 void cifs_fscache_get_client_cookie(struct TCP_Server_Info *server)
 {
@@ -56,3 +57,70 @@ void cifs_fscache_release_super_cookie(s
 	fscache_relinquish_cookie(tcon->fscache, 0);
 	tcon->fscache = NULL;
 }
+
+static void cifs_fscache_enable_inode_cookie(struct inode *inode)
+{
+	struct cifsInodeInfo *cifsi = CIFS_I(inode);
+	struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
+
+	if (cifsi->fscache)
+		return;
+
+	cifsi->fscache = fscache_acquire_cookie(cifs_sb->tcon->fscache,
+				&cifs_fscache_inode_object_def,
+				cifsi);
+	cFYI(1, "CIFS: got FH cookie (0x%p/0x%p)",
+			cifs_sb->tcon->fscache, cifsi->fscache);
+}
+
+void cifs_fscache_release_inode_cookie(struct inode *inode)
+{
+	struct cifsInodeInfo *cifsi = CIFS_I(inode);
+
+	if (cifsi->fscache) {
+		cFYI(1, "CIFS releasing inode cookie (0x%p)",
+				cifsi->fscache);
+		fscache_relinquish_cookie(cifsi->fscache, 0);
+		cifsi->fscache = NULL;
+	}
+}
+
+static void cifs_fscache_disable_inode_cookie(struct inode *inode)
+{
+	struct cifsInodeInfo *cifsi = CIFS_I(inode);
+
+	if (cifsi->fscache) {
+		cFYI(1, "CIFS disabling inode cookie (0x%p)",
+				cifsi->fscache);
+		fscache_relinquish_cookie(cifsi->fscache, 1);
+		cifsi->fscache = NULL;
+	}
+}
+
+void cifs_fscache_set_inode_cookie(struct inode *inode, struct file *filp)
+{
+	if ((filp->f_flags & O_ACCMODE) != O_RDONLY)
+		cifs_fscache_disable_inode_cookie(inode);
+	else {
+		cifs_fscache_enable_inode_cookie(inode);
+		cFYI(1, "CIFS: fscache inode cookie set");
+	}
+}
+
+void cifs_fscache_reset_inode_cookie(struct inode *inode)
+{
+	struct cifsInodeInfo *cifsi = CIFS_I(inode);
+	struct cifs_sb_info *cifs_sb = CIFS_SB(inode->i_sb);
+	struct fscache_cookie *old = cifsi->fscache;
+
+	if (cifsi->fscache) {
+		/* retire the current fscache cache and get a new one */
+		fscache_relinquish_cookie(cifsi->fscache, 1);
+
+		cifsi->fscache = fscache_acquire_cookie(cifs_sb->tcon->fscache,
+					&cifs_fscache_inode_object_def,
+					cifsi);
+		cFYI(1, "CIFS: new cookie 0x%p oldcookie 0x%p",
+				cifsi->fscache, old);
+	}
+}


* [PATCH 06/09] cifs: FS-Cache page management

From: Suresh Jayaraman @ 2010-07-05 12:43 UTC
  To: Steve French
  Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-cachefs-H+wXaHxf7aLQT0dZR+AlfA, David Howells

Takes care of invalidation and release of FS-Cache-marked pages, and also
clears the FsCache page flag when the inode is removed.

Signed-off-by: Suresh Jayaraman <sjayaraman-l3A5Bk7waGM@public.gmane.org>
Acked-by: David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 fs/cifs/cache.c   |   31 +++++++++++++++++++++++++++++++
 fs/cifs/file.c    |   20 ++++++++++++++++++++
 fs/cifs/fscache.c |   26 ++++++++++++++++++++++++++
 fs/cifs/fscache.h |   16 ++++++++++++++++
 4 files changed, 93 insertions(+)

Index: cifs-2.6/fs/cifs/cache.c
===================================================================
--- cifs-2.6.orig/fs/cifs/cache.c
+++ cifs-2.6/fs/cifs/cache.c
@@ -290,6 +290,36 @@ fscache_checkaux cifs_fscache_inode_chec
 	return FSCACHE_CHECKAUX_OKAY;
 }
 
+static void cifs_fscache_inode_now_uncached(void *cookie_netfs_data)
+{
+	struct cifsInodeInfo *cifsi = cookie_netfs_data;
+	struct pagevec pvec;
+	pgoff_t first;
+	int loop, nr_pages;
+
+	pagevec_init(&pvec, 0);
+	first = 0;
+
+	cFYI(1, "cifs inode 0x%p now uncached", cifsi);
+
+	for (;;) {
+		nr_pages = pagevec_lookup(&pvec,
+					  cifsi->vfs_inode.i_mapping, first,
+					  PAGEVEC_SIZE - pagevec_count(&pvec));
+		if (!nr_pages)
+			break;
+
+		for (loop = 0; loop < nr_pages; loop++)
+			ClearPageFsCache(pvec.pages[loop]);
+
+		first = pvec.pages[nr_pages - 1]->index + 1;
+
+		pvec.nr = nr_pages;
+		pagevec_release(&pvec);
+		cond_resched();
+	}
+}
+
 const struct fscache_cookie_def cifs_fscache_inode_object_def = {
 	.name		= "CIFS.uniqueid",
 	.type		= FSCACHE_COOKIE_TYPE_DATAFILE,
@@ -297,4 +327,5 @@ const struct fscache_cookie_def cifs_fsc
 	.get_attr	= cifs_fscache_inode_get_attr,
 	.get_aux	= cifs_fscache_inode_get_aux,
 	.check_aux	= cifs_fscache_inode_check_aux,
+	.now_uncached	= cifs_fscache_inode_now_uncached,
 };
Index: cifs-2.6/fs/cifs/file.c
===================================================================
--- cifs-2.6.orig/fs/cifs/file.c
+++ cifs-2.6/fs/cifs/file.c
@@ -2271,6 +2271,22 @@ out:
 	return rc;
 }
 
+static int cifs_release_page(struct page *page, gfp_t gfp)
+{
+	if (PagePrivate(page))
+		return 0;
+
+	return cifs_fscache_release_page(page, gfp);
+}
+
+static void cifs_invalidate_page(struct page *page, unsigned long offset)
+{
+	struct cifsInodeInfo *cifsi = CIFS_I(page->mapping->host);
+
+	if (offset == 0)
+		cifs_fscache_invalidate_page(page, &cifsi->vfs_inode);
+}
+
 static void
 cifs_oplock_break(struct slow_work *work)
 {
@@ -2344,6 +2360,8 @@ const struct address_space_operations ci
 	.write_begin = cifs_write_begin,
 	.write_end = cifs_write_end,
 	.set_page_dirty = __set_page_dirty_nobuffers,
+	.releasepage = cifs_release_page,
+	.invalidatepage = cifs_invalidate_page,
 	/* .sync_page = cifs_sync_page, */
 	/* .direct_IO = */
 };
@@ -2360,6 +2378,8 @@ const struct address_space_operations ci
 	.write_begin = cifs_write_begin,
 	.write_end = cifs_write_end,
 	.set_page_dirty = __set_page_dirty_nobuffers,
+	.releasepage = cifs_release_page,
+	.invalidatepage = cifs_invalidate_page,
 	/* .sync_page = cifs_sync_page, */
 	/* .direct_IO = */
 };
Index: cifs-2.6/fs/cifs/fscache.c
===================================================================
--- cifs-2.6.orig/fs/cifs/fscache.c
+++ cifs-2.6/fs/cifs/fscache.c
@@ -124,3 +124,29 @@ void cifs_fscache_reset_inode_cookie(str
 				cifsi->fscache, old);
 	}
 }
+
+int cifs_fscache_release_page(struct page *page, gfp_t gfp)
+{
+	if (PageFsCache(page)) {
+		struct inode *inode = page->mapping->host;
+		struct cifsInodeInfo *cifsi = CIFS_I(inode);
+
+		cFYI(1, "CIFS: fscache release page (0x%p/0x%p)",
+				page, cifsi->fscache);
+		if (!fscache_maybe_release_page(cifsi->fscache, page, gfp))
+			return 0;
+	}
+
+	return 1;
+}
+
+void __cifs_fscache_invalidate_page(struct page *page, struct inode *inode)
+{
+	struct cifsInodeInfo *cifsi = CIFS_I(inode);
+	struct fscache_cookie *cookie = cifsi->fscache;
+
+	cFYI(1, "CIFS: fscache invalidatepage (0x%p/0x%p)", page, cookie);
+	fscache_wait_on_page_write(cookie, page);
+	fscache_uncache_page(cookie, page);
+}
+
Index: cifs-2.6/fs/cifs/fscache.h
===================================================================
--- cifs-2.6.orig/fs/cifs/fscache.h
+++ cifs-2.6/fs/cifs/fscache.h
@@ -48,6 +48,16 @@ extern void cifs_fscache_release_inode_c
 extern void cifs_fscache_set_inode_cookie(struct inode *, struct file *);
 extern void cifs_fscache_reset_inode_cookie(struct inode *);
 
+extern void __cifs_fscache_invalidate_page(struct page *, struct inode *);
+extern int cifs_fscache_release_page(struct page *page, gfp_t gfp);
+
+static inline void cifs_fscache_invalidate_page(struct page *page,
+					       struct inode *inode)
+{
+	if (PageFsCache(page))
+		__cifs_fscache_invalidate_page(page, inode);
+}
+
 #else /* CONFIG_CIFS_FSCACHE */
 static inline int cifs_fscache_register(void) { return 0; }
 static inline void cifs_fscache_unregister(void) {}
@@ -64,7 +74,13 @@ static inline void cifs_fscache_release_
 static inline void cifs_fscache_set_inode_cookie(struct inode *inode,
 						 struct file *filp) {}
 static inline void cifs_fscache_reset_inode_cookie(struct inode *inode) {}
+static inline int cifs_fscache_release_page(struct page *page, gfp_t gfp)
+{
+	return 1; /* May release page */
+}
 
+static inline void cifs_fscache_invalidate_page(struct page *page,
+			struct inode *inode) {}
 
 #endif /* CONFIG_CIFS_FSCACHE */
 


* [PATCH 07/09] cifs: store pages into local cache

From: Suresh Jayaraman @ 2010-07-05 12:43 UTC
  To: Steve French
  Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-cachefs-H+wXaHxf7aLQT0dZR+AlfA, David Howells

Store pages from a CIFS inode into the data storage object associated with
that inode.

Signed-off-by: Suresh Jayaraman <sjayaraman-l3A5Bk7waGM@public.gmane.org>
---
 fs/cifs/file.c    |    7 +++++++
 fs/cifs/fscache.c |   11 +++++++++++
 fs/cifs/fscache.h |   11 +++++++++++
 3 files changed, 29 insertions(+)

Index: cifs-2.6/fs/cifs/file.c
===================================================================
--- cifs-2.6.orig/fs/cifs/file.c
+++ cifs-2.6/fs/cifs/file.c
@@ -1948,6 +1948,9 @@ static void cifs_copy_cache_pages(struct
 		SetPageUptodate(page);
 		unlock_page(page);
 		data += PAGE_CACHE_SIZE;
+
+		/* add page to FS-Cache */
+		cifs_readpage_to_fscache(mapping->host, page);
 	}
 	return;
 }
@@ -2117,6 +2120,10 @@ static int cifs_readpage_worker(struct f
 
 	flush_dcache_page(page);
 	SetPageUptodate(page);
+
+	/* send this page to the cache */
+	cifs_readpage_to_fscache(file->f_path.dentry->d_inode, page);
+
 	rc = 0;
 
 io_error:
Index: cifs-2.6/fs/cifs/fscache.c
===================================================================
--- cifs-2.6.orig/fs/cifs/fscache.c
+++ cifs-2.6/fs/cifs/fscache.c
@@ -140,6 +140,17 @@ int cifs_fscache_release_page(struct pag
 	return 1;
 }
 
+void __cifs_readpage_to_fscache(struct inode *inode, struct page *page)
+{
+	int ret;
+
+	cFYI(1, "CIFS: readpage_to_fscache(fsc: %p, p: %p, i: %p)",
+			CIFS_I(inode)->fscache, page, inode);
+	ret = fscache_write_page(CIFS_I(inode)->fscache, page, GFP_KERNEL);
+	if (ret != 0)
+		fscache_uncache_page(CIFS_I(inode)->fscache, page);
+}
+
 void __cifs_fscache_invalidate_page(struct page *page, struct inode *inode)
 {
 	struct cifsInodeInfo *cifsi = CIFS_I(inode);
Index: cifs-2.6/fs/cifs/fscache.h
===================================================================
--- cifs-2.6.orig/fs/cifs/fscache.h
+++ cifs-2.6/fs/cifs/fscache.h
@@ -51,6 +51,8 @@ extern void cifs_fscache_reset_inode_coo
 extern void __cifs_fscache_invalidate_page(struct page *, struct inode *);
 extern int cifs_fscache_release_page(struct page *page, gfp_t gfp);
 
+extern void __cifs_readpage_to_fscache(struct inode *, struct page *);
+
 static inline void cifs_fscache_invalidate_page(struct page *page,
 					       struct inode *inode)
 {
@@ -58,6 +60,13 @@ static inline void cifs_fscache_invalida
 		__cifs_fscache_invalidate_page(page, inode);
 }
 
+static inline void cifs_readpage_to_fscache(struct inode *inode,
+					    struct page *page)
+{
+	if (PageFsCache(page))
+		__cifs_readpage_to_fscache(inode, page);
+}
+
 #else /* CONFIG_CIFS_FSCACHE */
 static inline int cifs_fscache_register(void) { return 0; }
 static inline void cifs_fscache_unregister(void) {}
@@ -81,6 +90,8 @@ static inline void cifs_fscache_release_
 
 static inline void cifs_fscache_invalidate_page(struct page *page,
 			struct inode *inode) {}
+static inline void cifs_readpage_to_fscache(struct inode *inode,
+			struct page *page) {}
 
 #endif /* CONFIG_CIFS_FSCACHE */
 


* [PATCH 08/09] cifs: read pages from FS-Cache

From: Suresh Jayaraman @ 2010-07-05 12:43 UTC
  To: Steve French; +Cc: linux-cifs, linux-fsdevel, linux-cachefs, David Howells

Read pages from a FS-Cache data storage object into a CIFS inode.

Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>
Acked-by: David Howells <dhowells@redhat.com>
---
 fs/cifs/file.c    |   17 ++++++++++++
 fs/cifs/fscache.c |   73 ++++++++++++++++++++++++++++++++++++++++++++++++++++++
 fs/cifs/fscache.h |   40 ++++++++++++++++++++++++++++-
 3 files changed, 129 insertions(+), 1 deletion(-)

Index: cifs-2.6/fs/cifs/file.c
===================================================================
--- cifs-2.6.orig/fs/cifs/file.c
+++ cifs-2.6/fs/cifs/file.c
@@ -1981,6 +1981,15 @@ static int cifs_readpages(struct file *f
 	cifs_sb = CIFS_SB(file->f_path.dentry->d_sb);
 	pTcon = cifs_sb->tcon;
 
+	/*
+	 * Reads as many pages as possible from fscache. Returns -ENOBUFS
+	 * immediately if the cookie is negative
+	 */
+	rc = cifs_readpages_from_fscache(mapping->host, mapping, page_list,
+					 &num_pages);
+	if (rc == 0)
+		goto read_complete;
+
 	cFYI(DBG2, "rpages: num pages %d", num_pages);
 	for (i = 0; i < num_pages; ) {
 		unsigned contig_pages;
@@ -2091,6 +2100,7 @@ static int cifs_readpages(struct file *f
 		smb_read_data = NULL;
 	}
 
+read_complete:
 	FreeXid(xid);
 	return rc;
 }
@@ -2101,6 +2111,11 @@ static int cifs_readpage_worker(struct f
 	char *read_data;
 	int rc;
 
+	/* Is the page cached? */
+	rc = cifs_readpage_from_fscache(file->f_path.dentry->d_inode, page);
+	if (rc == 0)
+		goto read_complete;
+
 	page_cache_get(page);
 	read_data = kmap(page);
 	/* for reads over a certain size could initiate async read ahead */
@@ -2129,6 +2144,8 @@ static int cifs_readpage_worker(struct f
 io_error:
 	kunmap(page);
 	page_cache_release(page);
+
+read_complete:
 	return rc;
 }
 
Index: cifs-2.6/fs/cifs/fscache.c
===================================================================
--- cifs-2.6.orig/fs/cifs/fscache.c
+++ cifs-2.6/fs/cifs/fscache.c
@@ -140,6 +140,79 @@ int cifs_fscache_release_page(struct pag
 	return 1;
 }
 
+static void cifs_readpage_from_fscache_complete(struct page *page, void *ctx,
+						int error)
+{
+	cFYI(1, "CIFS: readpage_from_fscache_complete (0x%p/%d)",
+			page, error);
+	if (!error)
+		SetPageUptodate(page);
+	unlock_page(page);
+}
+
+/*
+ * Retrieve a page from FS-Cache
+ */
+int __cifs_readpage_from_fscache(struct inode *inode, struct page *page)
+{
+	int ret;
+
+	cFYI(1, "CIFS: readpage_from_fscache(fsc:%p, p:%p, i:0x%p)",
+			CIFS_I(inode)->fscache, page, inode);
+	ret = fscache_read_or_alloc_page(CIFS_I(inode)->fscache, page,
+					 cifs_readpage_from_fscache_complete,
+					 NULL,
+					 GFP_KERNEL);
+	switch (ret) {
+
+	case 0: /* page found in fscache, read submitted */
+		cFYI(1, "CIFS: readpage_from_fscache: submitted");
+		return ret;
+	case -ENOBUFS:	/* page won't be cached */
+	case -ENODATA:	/* page not in cache */
+		cFYI(1, "CIFS: readpage_from_fscache %d", ret);
+		return 1;
+
+	default:
+		cERROR(1, "unknown error ret = %d", ret);
+	}
+	return ret;
+}
+
+/*
+ * Retrieve a set of pages from FS-Cache
+ */
+int __cifs_readpages_from_fscache(struct inode *inode,
+				struct address_space *mapping,
+				struct list_head *pages,
+				unsigned *nr_pages)
+{
+	int ret;
+
+	cFYI(1, "CIFS: __cifs_readpages_from_fscache (0x%p/%u/0x%p)",
+			CIFS_I(inode)->fscache, *nr_pages, inode);
+	ret = fscache_read_or_alloc_pages(CIFS_I(inode)->fscache, mapping,
+					  pages, nr_pages,
+					  cifs_readpage_from_fscache_complete,
+					  NULL,
+					  mapping_gfp_mask(mapping));
+	switch (ret) {
+	case 0:	/* read submitted to the cache for all pages */
+		cFYI(1, "CIFS: readpages_from_fscache: submitted");
+		return ret;
+
+	case -ENOBUFS:	/* some pages are not cached and can't be */
+	case -ENODATA:	/* some pages are not cached */
+		cFYI(1, "CIFS: readpages_from_fscache: no page");
+		return 1;
+
+	default:
+		cFYI(1, "unknown error ret = %d", ret);
+	}
+
+	return ret;
+}
+
 void __cifs_readpage_to_fscache(struct inode *inode, struct page *page)
 {
 	int ret;
Index: cifs-2.6/fs/cifs/fscache.h
===================================================================
--- cifs-2.6.orig/fs/cifs/fscache.h
+++ cifs-2.6/fs/cifs/fscache.h
@@ -32,7 +32,6 @@ extern const struct fscache_cookie_def c
 extern const struct fscache_cookie_def cifs_fscache_super_index_def;
 extern const struct fscache_cookie_def cifs_fscache_inode_object_def;
 
-
 extern int cifs_fscache_register(void);
 extern void cifs_fscache_unregister(void);
 
@@ -50,6 +49,11 @@ extern void cifs_fscache_reset_inode_coo
 
 extern void __cifs_fscache_invalidate_page(struct page *, struct inode *);
 extern int cifs_fscache_release_page(struct page *page, gfp_t gfp);
+extern int __cifs_readpage_from_fscache(struct inode *, struct page *);
+extern int __cifs_readpages_from_fscache(struct inode *,
+					 struct address_space *,
+					 struct list_head *,
+					 unsigned *);
 
 extern void __cifs_readpage_to_fscache(struct inode *, struct page *);
 
@@ -60,6 +64,26 @@ static inline void cifs_fscache_invalida
 		__cifs_fscache_invalidate_page(page, inode);
 }
 
+static inline int cifs_readpage_from_fscache(struct inode *inode,
+					     struct page *page)
+{
+	if (CIFS_I(inode)->fscache)
+		return __cifs_readpage_from_fscache(inode, page);
+
+	return -ENOBUFS;
+}
+
+static inline int cifs_readpages_from_fscache(struct inode *inode,
+					      struct address_space *mapping,
+					      struct list_head *pages,
+					      unsigned *nr_pages)
+{
+	if (CIFS_I(inode)->fscache)
+		return __cifs_readpages_from_fscache(inode, mapping, pages,
+						     nr_pages);
+	return -ENOBUFS;
+}
+
 static inline void cifs_readpage_to_fscache(struct inode *inode,
 					    struct page *page)
 {
@@ -90,6 +114,20 @@ static inline void cifs_fscache_release_
 
 static inline void cifs_fscache_invalidate_page(struct page *page,
 			struct inode *inode) {}
+static inline int
+cifs_readpage_from_fscache(struct inode *inode, struct page *page)
+{
+	return -ENOBUFS;
+}
+
+static inline int cifs_readpages_from_fscache(struct inode *inode,
+					      struct address_space *mapping,
+					      struct list_head *pages,
+					      unsigned *nr_pages)
+{
+	return -ENOBUFS;
+}
+
 static inline void cifs_readpage_to_fscache(struct inode *inode,
 			struct page *page) {}
 


* [PATCH 09/09] cifs: add mount option to enable local caching

From: Suresh Jayaraman @ 2010-07-05 12:43 UTC
  To: Steve French
  Cc: linux-cifs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-cachefs-H+wXaHxf7aLQT0dZR+AlfA, David Howells

Add a mount option 'fsc' to enable local caching on CIFS.

I considered adding a separate debug bit for caching, but it appears that
debugging is easier with the normal CIFS_INFO level.

As the cifs-utils (userspace) changes are not done yet, this patch enables
'fsc' by default to enable testing.
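
For reference, once the userspace pieces are done the option would be passed
explicitly, along these lines (server, share and mount point are
illustrative):

	mount -t cifs //server/share /mnt/cifs -o user=guest,fsc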

Signed-off-by: Suresh Jayaraman <sjayaraman-l3A5Bk7waGM@public.gmane.org>
Acked-by: David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
---
 fs/cifs/cifs_fs_sb.h |    1 +
 fs/cifs/connect.c    |    8 ++++++++
 2 files changed, 9 insertions(+)

Index: cifs-2.6/fs/cifs/cifs_fs_sb.h
===================================================================
--- cifs-2.6.orig/fs/cifs/cifs_fs_sb.h
+++ cifs-2.6/fs/cifs/cifs_fs_sb.h
@@ -35,6 +35,7 @@
 #define CIFS_MOUNT_DYNPERM      0x1000 /* allow in-memory only mode setting   */
 #define CIFS_MOUNT_NOPOSIXBRL   0x2000 /* mandatory not posix byte range lock */
 #define CIFS_MOUNT_NOSSYNC      0x4000 /* don't do slow SMBflush on every sync*/
+#define CIFS_MOUNT_FSCACHE	0x8000 /* local caching enabled */
 
 struct cifs_sb_info {
 	struct cifsTconInfo *tcon;	/* primary mount */
Index: cifs-2.6/fs/cifs/connect.c
===================================================================
--- cifs-2.6.orig/fs/cifs/connect.c
+++ cifs-2.6/fs/cifs/connect.c
@@ -98,6 +98,7 @@ struct smb_vol {
 	bool noblocksnd:1;
 	bool noautotune:1;
 	bool nostrictsync:1; /* do not force expensive SMBflush on every sync */
+	bool fsc:1;	/* enable fscache */
 	unsigned int rsize;
 	unsigned int wsize;
 	bool sockopt_tcp_nodelay:1;
@@ -843,6 +844,9 @@ cifs_parse_mount_options(char *options,
 	/* default to using server inode numbers where available */
 	vol->server_ino = 1;
 
+	/* XXX: default to fsc for testing until mount.cifs pieces are done */
+	vol->fsc = 1;
+
 	if (!options)
 		return 1;
 
@@ -1332,6 +1336,8 @@ cifs_parse_mount_options(char *options,
 			printk(KERN_WARNING "CIFS: Mount option noac not "
 				"supported. Instead set "
 				"/proc/fs/cifs/LookupCacheEnabled to 0\n");
+		} else if (strnicmp(data, "fsc", 3) == 0) {
+			vol->fsc = true;
 		} else
 			printk(KERN_WARNING "CIFS: Unknown mount option %s\n",
 						data);
@@ -2405,6 +2411,8 @@ static void setup_cifs_sb(struct smb_vol
 		cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_OVERR_GID;
 	if (pvolume_info->dynperm)
 		cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_DYNPERM;
+	if (pvolume_info->fsc)
+		cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_FSCACHE;
 	if (pvolume_info->direct_io) {
 		cFYI(1, "mounting share using direct i/o");
 		cifs_sb->mnt_cifs_flags |= CIFS_MOUNT_DIRECT_IO;

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 00/09] cifs: local caching support using FS-Cache
       [not found]   ` <1278333663-30464-1-git-send-email-sjayaraman-l3A5Bk7waGM@public.gmane.org>
@ 2010-07-14 17:41     ` Scott Lovenberg
       [not found]       ` <4C3DF6BF.3070001-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
  0 siblings, 1 reply; 265+ messages in thread
From: Scott Lovenberg @ 2010-07-14 17:41 UTC (permalink / raw)
  To: Suresh Jayaraman
  Cc: Steve French, linux-cifs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-cachefs-H+wXaHxf7aLQT0dZR+AlfA, David Howells

On 7/5/2010 8:41 AM, Suresh Jayaraman wrote:
> This patchset is a second try at adding persistent, local caching facility for
> CIFS using the FS-Cache interface.
>
>    
Just wondering, have you benchmarked this at all?  I'd be interested to 
see how this compares (performance and scaling) to an oplock-centric 
design.

I'd hazard a guess that with pipelining support in SMB2 the performance 
will be even better since you can have a hot cache and more requests in 
flight.

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 00/09] cifs: local caching support using FS-Cache
       [not found]       ` <4C3DF6BF.3070001-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
@ 2010-07-14 18:09         ` Steve French
       [not found]           ` <AANLkTin2tKtkWTflrrzBMYBEd6SFr35uYUl1SmfYlj9W-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 265+ messages in thread
From: Steve French @ 2010-07-14 18:09 UTC (permalink / raw)
  To: Scott Lovenberg
  Cc: Suresh Jayaraman, linux-cifs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-cachefs-H+wXaHxf7aLQT0dZR+AlfA, David Howells

On Wed, Jul 14, 2010 at 12:41 PM, Scott Lovenberg
<scott.lovenberg-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
> On 7/5/2010 8:41 AM, Suresh Jayaraman wrote:
>>
>> This patchset is a second try at adding persistent, local caching facility
>> for
>> CIFS using the FS-Cache interface.
>>
>>
>
> Just wondering, have you benchmarked this at all?  I'd be interested to see
> how this compares (performance and scaling) to an oplock-centric design.
>
> I'd hazard a guess that with pipelining support in SMB2 the performance will
> be even better since you can have a hot cache and more requests in flight.

Yes - very plausibly



-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 00/09] cifs: local caching support using FS-Cache
       [not found]           ` <AANLkTin2tKtkWTflrrzBMYBEd6SFr35uYUl1SmfYlj9W-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-07-15 16:23             ` Suresh Jayaraman
       [not found]               ` <4C3F35F7.8060408-l3A5Bk7waGM@public.gmane.org>
       [not found]               ` <4C480F51.8070204-l3A5Bk7waGM@public.gmane.org>
  0 siblings, 2 replies; 265+ messages in thread
From: Suresh Jayaraman @ 2010-07-15 16:23 UTC (permalink / raw)
  To: Steve French
  Cc: Scott Lovenberg, linux-cifs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-cachefs-H+wXaHxf7aLQT0dZR+AlfA, David Howells

On 07/14/2010 08:09 PM, Steve French wrote:
> On Wed, Jul 14, 2010 at 12:41 PM, Scott Lovenberg
> <scott.lovenberg-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>> On 7/5/2010 8:41 AM, Suresh Jayaraman wrote:
>>>
>>> This patchset is a second try at adding persistent, local caching facility
>>> for
>>> CIFS using the FS-Cache interface.
>>>
>>>
>>
>> Just wondering, have you benchmarked this at all?  I'd be interested to see
>> how this compares (performance and scaling) to an oplock-centric design.
>>

Yes, I have done a few performance benchmarks with the cifs client (and
not SMB2) and I'll post them early next week when I'm back (as I'm
travelling now).

However, I have never done scalability tests (I am not sure whether there
is a way to simulate a large number of cifs clients).

>> I'd hazard a guess that with pipelining support in SMB2 the performance will
>> be even better since you can have a hot cache and more requests in flight.
> 
> Yes - very plausibly
> 

I have not tried the new SMB2 client, but it seems that pipelining
support and oplocks (only the Level II kind) could help improve
performance.


Thanks,

-- 
Suresh Jayaraman

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 04/09] cifs: define superblock-level cache index objects and register them
  2010-07-05 12:42 ` [PATCH 04/09] cifs: define superblock-level " Suresh Jayaraman
@ 2010-07-20 12:53   ` Jeff Layton
       [not found]     ` <20100720085327.4d1bf9d7-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
  0 siblings, 1 reply; 265+ messages in thread
From: Jeff Layton @ 2010-07-20 12:53 UTC (permalink / raw)
  To: Suresh Jayaraman
  Cc: Steve French, linux-cifs, linux-fsdevel, linux-cachefs, David Howells

On Mon,  5 Jul 2010 18:12:27 +0530
Suresh Jayaraman <sjayaraman@suse.de> wrote:

> Define superblock-level cache index objects (managed by cifsTconInfo structs).
> Each superblock object is created in a server-level index object and is itself
> an index into which inode-level objects are inserted.
> 
> The superblock object is keyed by sharename. The UniqueId/IndexNumber is used to
> validate that the exported share is the same as when we last accessed it.
> 
> Signed-off-by: Suresh Jayaraman <sjayaraman@suse.de>

Hmm...Steve started merging these already but I've just now had the
chance to review them.

This approach may be a problem. It seems to make the assumption that
there is only a single tcon per superblock. How exactly will this work
when there are multiple tcons per superblock as will be the case with
multisession mounts?

By having a cache cookie per tcon, will that mean that you'll
potentially have multiple versions of cached inodes (one for each tcon)?

-- 
Jeff Layton <jlayton@samba.org>

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 04/09] cifs: define superblock-level cache index objects and register them
       [not found]     ` <20100720085327.4d1bf9d7-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
@ 2010-07-20 13:37       ` Jeff Layton
       [not found]         ` <20100720093722.4f734f03-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
  0 siblings, 1 reply; 265+ messages in thread
From: Jeff Layton @ 2010-07-20 13:37 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Suresh Jayaraman, Steve French,
	linux-cifs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-cachefs-H+wXaHxf7aLQT0dZR+AlfA, David Howells

On Tue, 20 Jul 2010 08:53:27 -0400
Jeff Layton <jlayton-eUNUBHrolfbYtjvyW6yDsg@public.gmane.org> wrote:

> On Mon,  5 Jul 2010 18:12:27 +0530
> Suresh Jayaraman <sjayaraman-l3A5Bk7waGM@public.gmane.org> wrote:
> 
> > Define superblock-level cache index objects (managed by cifsTconInfo structs).
> > Each superblock object is created in a server-level index object and is itself
> > an index into which inode-level objects are inserted.
> > 
> > The superblock object is keyed by sharename. The UniqueId/IndexNumber is used to
> > validate that the exported share is the same as when we last accessed it.
> > 
> > Signed-off-by: Suresh Jayaraman <sjayaraman-l3A5Bk7waGM@public.gmane.org>
> 
> Hmm...Steve started merging these already but I've just now had the
> chance to review them.
> 
> This approach may be a problem. It seems to make the assumption that
> there is only a single tcon per superblock. How exactly will this work
> when there are multiple tcons per superblock as will be the case with
> multisession mounts?
> 
> By having a cache cookie per tcon, will that mean that you'll
> potentially have multiple versions of cached inodes (one for each tcon)?
> 

Ahh nm. This shouldn't be a problem since this key is based on the
sharename only and that will be the same between the multiple tcons.

Please disregard!
-- 
Jeff Layton <jlayton-eUNUBHrolfbYtjvyW6yDsg@public.gmane.org>

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 04/09] cifs: define superblock-level cache index objects and register them
       [not found]         ` <20100720093722.4f734f03-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
@ 2010-07-21 14:06           ` Suresh Jayaraman
  0 siblings, 0 replies; 265+ messages in thread
From: Suresh Jayaraman @ 2010-07-21 14:06 UTC (permalink / raw)
  To: Jeff Layton
  Cc: Steve French, linux-cifs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-cachefs-H+wXaHxf7aLQT0dZR+AlfA, David Howells

On 07/20/2010 07:07 PM, Jeff Layton wrote:
> On Tue, 20 Jul 2010 08:53:27 -0400
> Jeff Layton <jlayton-eUNUBHrolfbYtjvyW6yDsg@public.gmane.org> wrote:
> 
>> On Mon,  5 Jul 2010 18:12:27 +0530
>> Suresh Jayaraman <sjayaraman-l3A5Bk7waGM@public.gmane.org> wrote:
>>
>>> Define superblock-level cache index objects (managed by cifsTconInfo structs).
>>> Each superblock object is created in a server-level index object and is itself
>>> an index into which inode-level objects are inserted.
>>>
>>> The superblock object is keyed by sharename. The UniqueId/IndexNumber is used to
>>> validate that the exported share is the same as when we last accessed it.
>>>
>>> Signed-off-by: Suresh Jayaraman <sjayaraman-l3A5Bk7waGM@public.gmane.org>
>>
>> Hmm...Steve started merging these already but I've just now had the
>> chance to review them.
>>
>> This approach may be a problem. It seems to make the assumption that
>> there is only a single tcon per superblock. How exactly will this work
>> when there are multiple tcons per superblock as will be the case with
>> multisession mounts?
>>
>> By having a cache cookie per tcon, will that mean that you'll
>> potentially have multiple versions of cached inodes (one for each tcon)?
>>

In the case of multisession mounts, there is a cache cookie per tcon,
but they will all point to the same cached inodes.

> Ahh nm. This shouldn't be a problem since this key is based on the
> sharename only and that will be the same between the multiple tcons.

Yeah, I think the description could be a bit clearer..


Thanks,

-- 
Suresh Jayaraman

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 00/09] cifs: local caching support using FS-Cache
       [not found]               ` <4C3F35F7.8060408-l3A5Bk7waGM@public.gmane.org>
@ 2010-07-22  9:28                 ` Suresh Jayaraman
  0 siblings, 0 replies; 265+ messages in thread
From: Suresh Jayaraman @ 2010-07-22  9:28 UTC (permalink / raw)
  To: Steve French
  Cc: Scott Lovenberg, linux-cifs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-cachefs-H+wXaHxf7aLQT0dZR+AlfA, David Howells

On 07/15/2010 09:53 PM, Suresh Jayaraman wrote:
> On 07/14/2010 08:09 PM, Steve French wrote:
>> On Wed, Jul 14, 2010 at 12:41 PM, Scott Lovenberg
>> <scott.lovenberg-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org> wrote:
>>> On 7/5/2010 8:41 AM, Suresh Jayaraman wrote:
>>>>
>>>> This patchset is a second try at adding persistent, local caching facility
>>>> for
>>>> CIFS using the FS-Cache interface.
>>>>
>>>>
>>>
>>> Just wondering, have you benchmarked this at all?  I'd be interested to see
>>> how this compares (performance and scaling) to an oplock-centric design.
>>>
> 
> Yes, I have done a few performance benchmarks with the cifs client (and
> not SMB2) and I'll post them early next week when I'm back (as I'm
> travelling now).
> 

Here are some results from my benchmarking:

Environment
------------

I'm using my T60p laptop as the CIFS server (running Samba) and one of
my test machines as the CIFS client, connected over an Ethernet link
with a reported speed of 1000 Mb/s. The TCP bandwidth as seen by a pair
of netcats between the client and the server is about 786.24 Mb/s.
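
(The netcat measurement was along these lines; the port is illustrative
and the exact flags differ between netcat variants. dd prints the
achieved throughput when it finishes:

	server$ nc -l -p 5001 > /dev/null
	client$ dd if=/dev/zero bs=1M count=1000 | nc server 5001
)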

Client has a 2.8 GHz Pentium D CPU with 2GB RAM
Server has a 2.33GHz Core2 CPU (T7600) with 2GB RAM


Test
-----
The benchmark involves pulling a 200 MB file over CIFS to the client
using cat to /dev/zero under `time'. The wall clock time reported was
recorded.
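
Concretely, the measurement was of this form (path illustrative; on
Linux, writes to /dev/zero are discarded just like writes to /dev/null):

	time cat /mnt/cifs/file-200mb > /dev/zero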

Note
----
   - The client was rebooted after each test, but the server was not.
   - The entire file was loaded into RAM on the server before each test
     to eliminate disk I/O latencies on that end.
   - A separate 4GB partition was dedicated to the cache.
   - No other CIFS clients were accessing the server when the tests
     were performed.


First, the test was run on the server twice and the second result was
recorded (noted as Server below).

Second, the client was rebooted and the test was run with cachefilesd
not running; the result was recorded (noted as None below).

Next, the client was rebooted, the cache contents (if any) were erased
with mkfs.ext3, and the test was run again with cachefilesd running
(noted as COLD).

Next, the client was rebooted and the tests were run with cachefilesd
running, this time with a populated cache (noted as HOT).

Finally, the test was run again without unmounting, stopping cachefilesd
or rebooting, so that the page cache was still valid (noted as PGCACHE).
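
(The COLD reset between runs amounts to something like the following;
the device, mount point and service name are illustrative, and the cache
directory is whatever /etc/cachefilesd.conf points at:

	umount /var/cache/fscache
	mkfs.ext3 /dev/sdb1
	mount /dev/sdb1 /var/cache/fscache
	service cachefilesd start
)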

The benchmark was repeated twice:

Cache (state)	Run #1		Run #2
=============  =======		=======
Server		0.107 s		0.090 s
None		6.497 s		6.440 s
COLD		6.707 s		6.635 s
HOT		5.286 s		5.078 s
PGCACHE		0.090 s		0.091 s

As can be seen, the read performance when the data is cache-hot (on
disk) is not great, since the network link is Gigabit Ethernet (with
the server having the working set in memory), which is mostly expected.
(I could not get access to a slower network (say 100 Mb/s) where the
real performance boost would be evident).


Thanks,


-- 
Suresh Jayaraman

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 00/09] cifs: local caching support using FS-Cache
       [not found]               ` <4C480F51.8070204-l3A5Bk7waGM@public.gmane.org>
@ 2010-07-22 17:40                 ` David Howells
       [not found]                   ` <1892.1279820400-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2010-07-30 23:08                 ` Scott Lovenberg
  1 sibling, 1 reply; 265+ messages in thread
From: David Howells @ 2010-07-22 17:40 UTC (permalink / raw)
  To: Suresh Jayaraman
  Cc: dhowells-H+wXaHxf7aLQT0dZR+AlfA, Steve French, Scott Lovenberg,
	linux-cifs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-cachefs-H+wXaHxf7aLQT0dZR+AlfA

Suresh Jayaraman <sjayaraman-l3A5Bk7waGM@public.gmane.org> wrote:

> As can be seen, the read performance when the data is cache-hot (on
> disk) is not great, since the network link is Gigabit Ethernet (with
> the server having the working set in memory), which is mostly expected.

That's what I see with NFS and AFS too.

> (I could not get access to a slower network (say 100 Mb/s) where the
> real performance boost would be evident).

ethtool?

David

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 00/09] cifs: local caching support using FS-Cache
       [not found]                   ` <1892.1279820400-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
@ 2010-07-22 19:12                     ` Jeff Layton
  2010-07-22 22:49                     ` Andreas Dilger
  2010-07-23 11:57                     ` Suresh Jayaraman
  2 siblings, 0 replies; 265+ messages in thread
From: Jeff Layton @ 2010-07-22 19:12 UTC (permalink / raw)
  To: David Howells
  Cc: Suresh Jayaraman, Steve French, Scott Lovenberg,
	linux-cifs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-cachefs-H+wXaHxf7aLQT0dZR+AlfA

On Thu, 22 Jul 2010 18:40:00 +0100
David Howells <dhowells-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org> wrote:

> Suresh Jayaraman <sjayaraman-l3A5Bk7waGM@public.gmane.org> wrote:
> 
> > As can be seen, the read performance when the data is cache-hot (on
> > disk) is not great, since the network link is Gigabit Ethernet (with
> > the server having the working set in memory), which is mostly expected.
> 
> That's what I see with NFS and AFS too.
> 
> > (I could not get access to a slower network (say 100 Mb/s) where the
> > real performance boost would be evident).
> 
> ethtool?
> 

There's also the "netem" thingy that allows you to add arbitrary
latencies to packets (for simulating long-haul network links).
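
For instance, something like this (interface name illustrative) adds
50ms of delay to everything leaving eth0:

	tc qdisc add dev eth0 root netem delay 50ms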

-- 
Jeff Layton <jlayton-eUNUBHrolfbYtjvyW6yDsg@public.gmane.org>

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 00/09] cifs: local caching support using FS-Cache
       [not found]                   ` <1892.1279820400-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2010-07-22 19:12                     ` Jeff Layton
@ 2010-07-22 22:49                     ` Andreas Dilger
  2010-07-23  8:35                       ` Stef Bon
  2010-07-23 11:57                     ` Suresh Jayaraman
  2 siblings, 1 reply; 265+ messages in thread
From: Andreas Dilger @ 2010-07-22 22:49 UTC (permalink / raw)
  To: David Howells
  Cc: Suresh Jayaraman, Steve French, Scott Lovenberg,
	linux-cifs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-cachefs-H+wXaHxf7aLQT0dZR+AlfA

On 2010-07-22, at 11:40, David Howells wrote:
> Suresh Jayaraman <sjayaraman-l3A5Bk7waGM@public.gmane.org> wrote:
>> As can be seen, the read performance when the data is cache-hot (on
>> disk) is not great, since the network link is Gigabit Ethernet (with
>> the server having the working set in memory), which is mostly expected.
> 
> That's what I see with NFS and AFS too.
> 
>> (I could not get access to a slower network (say 100 Mb/s) where the
>> real performance boost would be evident).

More interesting than a slow network is testing with more clients.  10 clients should be able to get 10x the read performance from the client-local cache, probably more than the server's peak disk/network bandwidth.

Cheers, Andreas

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 00/09] cifs: local caching support using FS-Cache
  2010-07-22 22:49                     ` Andreas Dilger
@ 2010-07-23  8:35                       ` Stef Bon
       [not found]                         ` <AANLkTikF5Oz5pobaPUJebUg+yPuoVy_B5PBz+nuUTSii-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
  0 siblings, 1 reply; 265+ messages in thread
From: Stef Bon @ 2010-07-23  8:35 UTC (permalink / raw)
  To: Andreas Dilger
  Cc: David Howells, Suresh Jayaraman, Steve French, Scott Lovenberg,
	linux-cifs, linux-fsdevel, linux-cachefs

In my opinion there should be an article published about this, describing
fs-cache generally and these kinds of benchmarks!

Using fs-cache for network filesystems is an important issue, and it
should get "exposure".

Stef Bon

2010/7/23 Andreas Dilger <adilger@dilger.ca>:
> On 2010-07-22, at 11:40, David Howells wrote:
>> Suresh Jayaraman <sjayaraman@suse.de> wrote:
>>> As can be seen, the read performance when the data is cache-hot (on
>>> disk) is not great, since the network link is Gigabit Ethernet (with
>>> the server having the working set in memory), which is mostly expected.
>>
>> That's what I see with NFS and AFS too.
>>
>>> (I could not get access to a slower network (say 100 Mb/s) where the
>>> real performance boost would be evident).
>
> More interesting than a slow network is testing with more clients.  10 clients should be able to get 10x the read performance from the client-local cache, probably more than the server's peak disk/network bandwidth.
>
> Cheers, Andreas

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 00/09] cifs: local caching support using FS-Cache
       [not found]                   ` <1892.1279820400-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
  2010-07-22 19:12                     ` Jeff Layton
  2010-07-22 22:49                     ` Andreas Dilger
@ 2010-07-23 11:57                     ` Suresh Jayaraman
       [not found]                       ` <4C4983B0.5080804-l3A5Bk7waGM@public.gmane.org>
  2 siblings, 1 reply; 265+ messages in thread
From: Suresh Jayaraman @ 2010-07-23 11:57 UTC (permalink / raw)
  To: David Howells
  Cc: Steve French, Scott Lovenberg, linux-cifs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-cachefs-H+wXaHxf7aLQT0dZR+AlfA

On 07/22/2010 11:10 PM, David Howells wrote:
> Suresh Jayaraman <sjayaraman-l3A5Bk7waGM@public.gmane.org> wrote:
> 
>> As can be seen, the read performance when the data is cache-hot (on
>> disk) is not great, since the network link is Gigabit Ethernet (with
>> the server having the working set in memory), which is mostly expected.
> 
> That's what I see with NFS and AFS too.
> 
>> (I could not get access to a slower network (say 100 Mb/s) where the
>> real performance boost would be evident).
> 
> ethtool?
> 

Thanks for the pointer. Here are the results on a 100Mb/s network:


Environment
------------

I'm using my T60p laptop as the CIFS server (running Samba) and one of
my test machines as the CIFS client, connected over an Ethernet link
with a reported speed of 1000 Mb/s. ethtool was used to throttle the
speed to 100 Mb/s. The TCP bandwidth as seen by a pair of netcats
between the client and the server is about 89.555 Mb/s.
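
(The throttling itself was along these lines; the interface name is
illustrative:

	ethtool -s eth0 speed 100 duplex full autoneg off
)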

Client has a 2.8 GHz Pentium D CPU with 2GB RAM
Server has a 2.33GHz Core2 CPU (T7600) with 2GB RAM


Test
-----
The benchmark involves pulling a 200 MB file over CIFS to the client
using cat to /dev/zero under `time'. The wall clock time reported was
recorded.

Note
----
   - The client was rebooted after each test, but the server was not.
   - The entire file was loaded into RAM on the server before each test
     to eliminate disk I/O latencies on that end.
   - A separate 4GB partition was dedicated to the cache.
   - No other CIFS clients were accessing the server when the tests
     were performed.


First, the test was run on the server twice and the second result was
recorded (noted as Server below).

Second, the client was rebooted and the test was run with cachefilesd
not running; the result was recorded (noted as None below).

Next, the client was rebooted, the cache contents (if any) were erased
with mkfs.ext3, and the test was run again with cachefilesd running
(noted as COLD).

Next, the client was rebooted and the tests were run with cachefilesd
running, this time with a populated disk cache (noted as HOT).

Finally, the test was run again without unmounting, stopping cachefilesd
or rebooting, so that the page cache was still valid (noted as PGCACHE).

The benchmark was repeated twice:

Cache (state)	Run #1		Run #2
=============  =======		=======
Server		 0.104 s	 0.107 s
None		26.042 s	26.576 s
COLD		26.703 s	26.787 s
HOT		 5.115 s	 5.147 s
PGCACHE		 0.091 s	 0.092 s

I think the results are in line with expectations, given the speed
reported by netcat.

As noted by Andreas, the read performance with a larger number of
clients would be more interesting, as the cache can positively impact
scalability. However, I don't have many clients, nor do I know a way to
simulate a large number of cifs clients. The cache can also improve
performance on a heavily loaded network and/or server by reducing the
number of network calls to the server.

Also, it should be noted that local caching is not for all workloads,
and a few workloads could suffer (e.g. read-once type workloads).


Thanks,

-- 
Suresh Jayaraman

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 00/09] cifs: local caching support using FS-Cache
       [not found]                       ` <4C4983B0.5080804-l3A5Bk7waGM@public.gmane.org>
@ 2010-07-23 15:03                         ` Steve French
  0 siblings, 0 replies; 265+ messages in thread
From: Steve French @ 2010-07-23 15:03 UTC (permalink / raw)
  To: Suresh Jayaraman
  Cc: David Howells, Scott Lovenberg,
	linux-cifs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-cachefs-H+wXaHxf7aLQT0dZR+AlfA

On Fri, Jul 23, 2010 at 6:57 AM, Suresh Jayaraman <sjayaraman-l3A5Bk7waGM@public.gmane.org> wrote:
> On 07/22/2010 11:10 PM, David Howells wrote:
>> Suresh Jayaraman <sjayaraman-l3A5Bk7waGM@public.gmane.org> wrote:
>>
>>> As can be seen, the read performance when the data is cache-hot (on
>>> disk) is not great, since the network link is Gigabit Ethernet (with
>>> the server having the working set in memory), which is mostly expected.
>>
>> That's what I see with NFS and AFS too.
>>
>>> (I could not get access to a slower network (say 100 Mb/s) where the
>>> real performance boost would be evident).
>>
>> ethtool?
>>
>
> Thanks for the pointer. Here are the results on a 100Mb/s network:
 <snip>
Excellent data - thx

> As noted by Andreas, the read performance with a larger number of
> clients would be more interesting, as the cache can positively impact
> scalability. However, I don't have many clients, nor do I know a way to
> simulate a large number of cifs clients.

You could simulate increased load by running multiple smbtorture
instances from each real client, and perhaps some dbench-like activity
run locally on the server.
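
For example (share, credentials and client count illustrative; benchmark
names vary between smbtorture versions, so check its help output):

	# on each real client, several concurrent load generators
	for i in 1 2 3 4; do
		smbtorture //server/share -U guest%guest BENCH-NBENCH &
	done

	# on the server, some local filesystem load
	dbench -D /srv/share 10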

> The cache also can positively
> impact the performance on heavily loaded network and/or server due to
> reduction of network calls to the server.

Reminds me a little of the discussions during the last few SMB2 plugfests:

http://channel9.msdn.com/posts/Darryl/Peer-Content-Branch-Caching-and-Retrieval-Presentation/

and an earlier one (I couldn't find the newer version of this talk)
http://channel9.msdn.com/pdc2008/ES23/

-- 
Thanks,

Steve

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 00/09] cifs: local caching support using FS-Cache
       [not found]                         ` <AANLkTikF5Oz5pobaPUJebUg+yPuoVy_B5PBz+nuUTSii-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
@ 2010-07-23 16:16                           ` Suresh Jayaraman
  2010-07-24  5:40                             ` Stef Bon
  0 siblings, 1 reply; 265+ messages in thread
From: Suresh Jayaraman @ 2010-07-23 16:16 UTC (permalink / raw)
  To: Stef Bon
  Cc: Andreas Dilger, David Howells, Steve French, Scott Lovenberg,
	linux-cifs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-cachefs-H+wXaHxf7aLQT0dZR+AlfA

On 07/23/2010 02:05 PM, Stef Bon wrote:
> In my opinion there should be an article published about this, describing
> fs-cache generally and these kinds of benchmarks!

FS-Cache is nicely documented in
Documentation/filesystems/caching/fscache.txt in the kernel source.
IIRC, LWN (http://lwn.net/) has published a couple of articles about
FS-Cache. I agree that there is little information available about the
use-cases and benchmarks.

What exactly do you mean by an article? More blog entries, articles in
magazines, or something like a whitepaper? Which one do you think has a
good chance of reaching the target audience?

Thanks,

> Using fs-cache for network filesystems is an important issue, and it
> should get "exposure".
> 
> Stef Bon
> 
> 2010/7/23 Andreas Dilger <adilger-m1MBpc4rdrD3fQ9qLvQP4Q@public.gmane.org>:
>> On 2010-07-22, at 11:40, David Howells wrote:
>>> Suresh Jayaraman <sjayaraman-l3A5Bk7waGM@public.gmane.org> wrote:
>>>> As it can been seen, the performance while reading when data is cache
>>>> hot (disk) is not great as the network link is a Gigabit ethernet (with
>>>> server having working set in memory) which is mostly expected.
>>>
>>> That's what I see with NFS and AFS too.
>>>
>>>> (I could not get access to a slower network (say 100 Mb/s) where the real
>>>> performance boost could be evident).
>>
>> More interesting than a slow network is testing with more clients.  10 clients should be able to get 10x the read performance from the client-local cache, probably more than the server's peak disk/network bandwidth.
>>

-- 
Suresh Jayaraman

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 00/09] cifs: local caching support using FS-Cache
  2010-07-23 16:16                           ` Suresh Jayaraman
@ 2010-07-24  5:40                             ` Stef Bon
  0 siblings, 0 replies; 265+ messages in thread
From: Stef Bon @ 2010-07-24  5:40 UTC (permalink / raw)
  To: Suresh Jayaraman
  Cc: Andreas Dilger, David Howells, Steve French, Scott Lovenberg,
	linux-cifs, linux-fsdevel, linux-cachefs

2010/7/23 Suresh Jayaraman <sjayaraman@suse.de>:
> On 07/23/2010 02:05 PM, Stef Bon wrote:
>> In my opinion there should be an article published about this, describing
>> fs-cache generally and these kinds of benchmarks!
>
> FS-Cache is nicely documented in
> Documentation/filesystems/caching/fscache.txt in the kernel source.
> IIRC, LWN (http://lwn.net/) has published a couple of articles about
> FS-Cache. I agree that there is little information available about the
> use-cases and benchmarks.
>
> What exactly do you mean by an article? More blog entries, articles in
> magazines, or something like a whitepaper? Which one do you think has a
> good chance of reaching the target audience?
>

Well,

I do not have an answer right now. I guess articles in general
magazines, aimed not just at the Linux community but at ICT in general.

What strikes me most is that there are a lot of people (you, me,
probably everyone reading this) working on a lot of projects, creating
excellent software and making Linux better and better.

But does it get exposure? Is the public aware of what happens with
Linux? Do "they" know Linux, or have they at least heard of it?
No! That's a pity.
Everybody knows the iPad, so why not Linux?

Linux needs better marketing!!


Stef Bon

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 00/09] cifs: local caching support using FS-Cache
       [not found]               ` <4C480F51.8070204-l3A5Bk7waGM@public.gmane.org>
  2010-07-22 17:40                 ` David Howells
@ 2010-07-30 23:08                 ` Scott Lovenberg
  1 sibling, 0 replies; 265+ messages in thread
From: Scott Lovenberg @ 2010-07-30 23:08 UTC (permalink / raw)
  To: Suresh Jayaraman
  Cc: Steve French, linux-cifs-u79uwXL29TY76Z2rM5mHXA,
	linux-fsdevel-u79uwXL29TY76Z2rM5mHXA,
	linux-cachefs-H+wXaHxf7aLQT0dZR+AlfA, David Howells

On 7/22/2010 5:28 AM, Suresh Jayaraman wrote:
> Here are some results from my benchmarking:
>
> Environment
> ------------
>
> I'm using my T60p laptop as the CIFS server (running Samba) and one of
> my test machines as the CIFS client, connected over an Ethernet link
> with a reported speed of 1000 Mb/s. The TCP bandwidth as seen by a pair
> of netcats between the client and the server is about 786.24 Mb/s.
>
> Client has a 2.8 GHz Pentium D CPU with 2GB RAM
> Server has a 2.33GHz Core2 CPU (T7600) with 2GB RAM
>
>
> Test
> -----
> The benchmark involves pulling a 200 MB file over CIFS to the client
> using cat to /dev/zero under `time'. The wall clock time reported was
> recorded.
>
> Note
> ----
>     - The client was rebooted after each test, but the server was not.
>     - The entire file was loaded into RAM on the server before each test
>       to eliminate disk I/O latencies on that end.
>     - A separate 4GB partition was dedicated to the cache.
>     - No other CIFS clients were accessing the server when the tests
>       were performed.
>
>
> First, the test was run on the server twice and the second result was
> recorded (noted as Server below).
>
> Second, the client was rebooted and the test was run with cachefilesd
> not running; the result was recorded (noted as None below).
>
> Next, the client was rebooted, the cache contents (if any) were erased
> with mkfs.ext3, and the test was run again with cachefilesd running
> (noted as COLD).
>
> Next, the client was rebooted and the tests were run with cachefilesd
> running, this time with a populated cache (noted as HOT).
>
> Finally, the test was run again without unmounting, stopping cachefilesd
> or rebooting, so that the page cache was still valid (noted as PGCACHE).
>
> The benchmark was repeated twice:
>
> Cache (state)	Run #1		Run #2
> =============  =======		=======
> Server		0.107 s		0.090 s
> None		6.497 s		6.440 s
> COLD		6.707 s		6.635 s
> HOT		5.286 s		5.078 s
> PGCACHE		0.090 s		0.091 s
>
> As can be seen, the read performance when the data is cache-hot (on
> disk) is not great, since the network link is Gigabit Ethernet (with
> the server having the working set in memory), which is mostly expected.
> (I could not get access to a slower network (say 100 Mb/s) where the
> real performance boost would be evident).
>
>
> Thanks,
>
>
>    

Suresh, thanks for taking the time to run these benchmarks. :)

^ permalink raw reply	[flat|nested] 265+ messages in thread

* [PATCH v2] TODO: CDMA SMS and CDMA CMAS
       [not found] <Yes>
                   ` (9 preceding siblings ...)
  2010-07-05 12:43 ` [PATCH 09/09] cifs: add mount option to enable local caching Suresh Jayaraman
@ 2010-12-20 22:18 ` Lei Yu
  2010-12-20 22:34   ` Denis Kenzior
  2012-07-16 16:14   ` Joonsoo Kim
                   ` (12 subsequent siblings)
  23 siblings, 1 reply; 265+ messages in thread
From: Lei Yu @ 2010-12-20 22:18 UTC (permalink / raw)
  To: ofono

[-- Attachment #1: Type: text/plain, Size: 5084 bytes --]

---
 TODO |  116 ++++++++++++++++++++++++++++++++++++++++++++++++++---------------
 1 files changed, 89 insertions(+), 27 deletions(-)

diff --git a/TODO b/TODO
index d40c0ff..ee32e34 100644
--- a/TODO
+++ b/TODO
@@ -49,33 +49,6 @@ SMS
   Complexity: C1
   Owner: Kristen Carlson Accardi <kristen@linux.intel.com>
 
-- Add CDMA support to the SMS stack. The idea is to support only the PDU
-  mode. To start with only Submit and Deliver message handling for WMT
-  teleservice will be added to bring the basic CDMA SMS send and receive
-  functionality.
-
-  Priority: Low
-  Complexity: C8
-  Owner: Rajesh Kadhiravan Nagaiah <Rajesh.Nagaiah@elektrobit.com>
-
-- Add CDMA Delivery(Status) Report handling to the SMS stack.
-
-  Priority: Low
-  Complexity: C4
-  Owner: Rajesh Kadhiravan Nagaiah <Rajesh.Nagaiah@elektrobit.com>
-
-- Add CDMA Voice Mail Notification handling to the SMS stack. In CDMA the
-  Message Waiting indication is notified through a specific teleservice ID
-  VMN. No update to corresponding elementary files required since they are
-  not present in the R-UIM. This will result in the message waiting
-  indication being initially processed within the SMS atom and then being
-  passed for delivery to the message waiting atom. Furthemore note that in
-  CDMA only voice mail type is supported.
-
-  Priority: Low
-  Complexity: C4
-  Owner: Rajesh Kadhiravan Nagaiah <Rajesh.Nagaiah@elektrobit.com>
-
 - Asynchronously acknowledge SMS DELIVER messages sent by the SMS driver
   to core using ofono_sms_deliver_notify().  This may require the struct
   ofono_sms_driver to be extended with one more function pointer like:
@@ -529,3 +502,92 @@ CDMA Voicecall
   Priority: High
   Complexity: C2
   Owner: Dara Spieker-Doyle <dara.spieker-doyle@nokia.com>
+
+CDMA SMS
+==============
+
+- Support CDMA SMS stack in PDU mode. This includes basic support of
+  SMS Point-to-Point Message, SMS Broadcast Message and SMS Acknowledge
+  Message as per 3GPP2 C.S0015-B version 2.0.
+
+  Priority: High
+  Complexity: C4
+
+- Support sending Wireless Messaging Teleservice (WMT) Submit Message and
+  receiving WMT Deliver Message as defined in 3GPP2 C.S0015-B version 2.0.
+
+  Priority: High
+  Complexity: C4
+
+- Support Delivery Acknowledgment. oFono allows requesting of CDMA SMS
+  Delivery Acknowledgment via the MessageManager's
+  UseDeliveryAcknowledgement property. If enabled, oFono's CDMA SMS stack
+  will encode the Reply Option subparameter in the Submit message and
+  process incoming SMS Delivery Acknowledgment Messages. oFono will
+  notify the UI either via D-Bus or the history plugin API.
+
+  Priority: Medium
+  Complexity: C2
+
+- Support receiving Voice Mail Notification (VMN) Teleservice Deliver
+  message. The CDMA network uses the VMN Teleservice to deliver the
+  number of messages stored at the Voice Mail System to the subscriber.
+
+  Priority: High
+  Complexity: C4
+
+- Support sending Wireless Enhanced Messaging Teleservice (WEMT) Submit
+  Message and receiving WEMT Deliver Message as defined in 3GPP2 C.S0015-B
+  version 2.0.
+
+  WMT does not support message fragmentation and thus cannot be used for
+  long messages. WEMT is devised to support long messages and Enhanced
+  Messaging Service (EMS). The WEMT SMS message's CHARi field of the
+  subparameter User Data encapsulates GSM-SMS TP-User Data as defined in
+  Section 9.2.3.24 of 3GPP TS 23.040.
+
+  Priority: Medium
+  Complexity: C4
+
+- Support sending Wireless Application Protocol (WAP) Teleservice Submit
+  Message and receiving WAP Deliver Message as defined in 3GPP2 C.S0015-B
+  version 2.0.
+
+  Priority: Medium
+  Complexity: C4
+
+- Support Call-Back Number. The Call-Back Number subparameter indicates
+  the number to be dialed in reply to a received SMS message.
+
+  In the transmit direction, oFono allows setting the Call-Back Number.
+  If the Call-Back Number property is set, the CDMA SMS stack will encode
+  the Call-Back Number subparameter in the Submit Message.
+
+  In the receiving direction, oFono will process the Call-Back Number
+  subparameter in the incoming Deliver Message and notify the UI of the
+  Call-Back Number together with the newly received text message.
+
+  Priority: Medium
+  Complexity: C2
+
+- Support immediately displayed messages. oFono's CDMA SMS stack will
+  process the optional Message Display Mode subparameter in the incoming
+  SMS message. If the Message Display Mode subparameter indicates that
+  the display mode is Immediate Display, oFono will send the
+  ImmediateMessage signal; otherwise oFono will send the IncomingMessage
+  signal.
+
+  Priority: Medium
+  Complexity: C2
+
+
+CDMA CMAS
+==============
+
+- Support Commercial Mobile Alert Service (CMAS) over CDMA systems. CMAS
+  over CDMA systems is defined in TIA-1149. The CMAS message is carried in
+  the CHARi field of the User Data subparameter of CDMA SMS Broadcast
+  message.
+
+  Priority: Medium
+  Complexity: C4
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* Re: [PATCH v2] TODO: CDMA SMS and CDMA CMAS
  2010-12-20 22:18 ` [PATCH v2] TODO: CDMA SMS and CDMA CMAS Lei Yu
@ 2010-12-20 22:34   ` Denis Kenzior
  0 siblings, 0 replies; 265+ messages in thread
From: Denis Kenzior @ 2010-12-20 22:34 UTC (permalink / raw)
  To: ofono

[-- Attachment #1: Type: text/plain, Size: 253 bytes --]

Hi Lei,

On 12/20/2010 04:18 PM, Lei Yu wrote:
> ---
>  TODO |  116 ++++++++++++++++++++++++++++++++++++++++++++++++++---------------
>  1 files changed, 89 insertions(+), 27 deletions(-)
> 

Patch has been applied, thanks.

Regards,
-Denis

^ permalink raw reply	[flat|nested] 265+ messages in thread

* [PATCH 1/3] mm: correct return value of migrate_pages()
       [not found] <Yes>
@ 2012-07-16 16:14   ` Joonsoo Kim
  2010-07-05 12:41 ` [PATCH 01/09] cifs: add kernel config option for CIFS Client caching support Suresh Jayaraman
                     ` (22 subsequent siblings)
  23 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-07-16 16:14 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-mm, Joonsoo Kim, Christoph Lameter

migrate_pages() should return the number of pages not migrated or an
error code. When unmap_and_move() returns -EAGAIN, the outer loop is
re-executed without initialising nr_failed. This makes nr_failed
over-counted.

So this patch corrects it by initialising nr_failed in the outer loop.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Christoph Lameter <cl@linux.com>

diff --git a/mm/migrate.c b/mm/migrate.c
index be26d5c..294d52a 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -982,6 +982,7 @@ int migrate_pages(struct list_head *from,
 
 	for(pass = 0; pass < 10 && retry; pass++) {
 		retry = 0;
+		nr_failed = 0;
 
 		list_for_each_entry_safe(page, page2, from, lru) {
 			cond_resched();
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH 2/3] mm: fix possible incorrect return value of migrate_pages() syscall
  2012-07-16 16:14   ` Joonsoo Kim
@ 2012-07-16 16:14     ` Joonsoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-07-16 16:14 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-mm, Joonsoo Kim, Sasha Levin, Christoph Lameter

do_migrate_pages() can return the number of pages not migrated.
Because the migrate_pages() syscall returns this value directly, the
syscall may return the number of pages not migrated instead of an error.
In the failure case, the migrate_pages() syscall should return an error
value, so change err to -EIO.

Additionally, correct the comment above do_migrate_pages().
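
To illustrate from userspace (a sketch only, with node masks and setup
elided; migrate_pages(2) is the libnuma wrapper from <numaif.h>):

	#include <stdio.h>
	#include <numaif.h>

	/* Before this fix, a caller checking only for ret < 0 would treat
	 * a positive return (pages left behind) as complete success. */
	long ret = migrate_pages(pid, maxnode, old_nodes, new_nodes);
	if (ret < 0)
		perror("migrate_pages");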

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Cc: Christoph Lameter <cl@linux.com>

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 1d771e4..f7df271 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -948,7 +948,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
  * Move pages between the two nodesets so as to preserve the physical
  * layout as much as possible.
  *
- * Returns the number of page that could not be moved.
+ * Returns error or the number of pages not migrated.
  */
 int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
 		     const nodemask_t *to, int flags)
@@ -1382,6 +1382,8 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
 
 	err = do_migrate_pages(mm, old, new,
 		capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE);
+	if (err > 0)
+		err = -EIO;
 
 	mmput(mm);
 out:
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH 3/3] mm: fix return value in __alloc_contig_migrate_range()
  2012-07-16 16:14   ` Joonsoo Kim
@ 2012-07-16 16:14     ` Joonsoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-07-16 16:14 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, linux-mm, Joonsoo Kim, Michal Nazarewicz,
	Marek Szyprowski, Minchan Kim, Christoph Lameter

migrate_pages() can return a positive value in some failure cases,
so 'ret > 0 ? 0 : ret' may be wrong.
This fixes it and removes one dead statement.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Christoph Lameter <cl@linux.com>

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4403009..02d4519 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5673,7 +5673,6 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
 			}
 			tries = 0;
 		} else if (++tries == 5) {
-			ret = ret < 0 ? ret : -EBUSY;
 			break;
 		}
 
@@ -5683,7 +5682,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
 	}
 
 	putback_lru_pages(&cc.migratepages);
-	return ret > 0 ? 0 : ret;
+	return ret <= 0 ? ret : -EBUSY;
 }
 
 /*
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH 4] mm: fix possible incorrect return value of move_pages() syscall
  2012-07-16 16:14   ` Joonsoo Kim
@ 2012-07-16 17:14     ` Joonsoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-07-16 17:14 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, linux-mm, Joonsoo Kim, Brice Goglin,
	Christoph Lameter, Minchan Kim

The move_pages() syscall may return success when
do_move_page_to_node_array() returns a positive value, which means
migration failed. This patch changes the return value of
do_move_page_to_node_array() so that it does not return a positive
value, fixing the problem.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Brice Goglin <brice@myri.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan@kernel.org>

diff --git a/mm/migrate.c b/mm/migrate.c
index 294d52a..adabaf4 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1171,7 +1171,7 @@ set_status:
 	}
 
 	up_read(&mm->mmap_sem);
-	return err;
+	return err > 0 ? -EIO : err;
 }
 
 /*
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* Re: [PATCH 1/3] mm: correct return value of migrate_pages()
  2012-07-16 16:14   ` Joonsoo Kim
@ 2012-07-16 17:23     ` Christoph Lameter
  -1 siblings, 0 replies; 265+ messages in thread
From: Christoph Lameter @ 2012-07-16 17:23 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: akpm, linux-kernel, linux-mm

On Tue, 17 Jul 2012, Joonsoo Kim wrote:

> migrate_pages() should return the number of pages not migrated or an
> error code. When unmap_and_move() returns -EAGAIN, the outer loop is
> re-executed without initialising nr_failed. This makes nr_failed
> over-counted.

The intention of nr_failed was only to give an indication as to how
many attempts were made. The failed pages were on a separate queue that
seems to have vanished.

> So this patch corrects it by initialising nr_failed in the outer loop.

Well, yeah, it makes sense since retry is initialized there as well.

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 2/3] mm: fix possible incorrect return value of migrate_pages() syscall
  2012-07-16 16:14     ` Joonsoo Kim
@ 2012-07-16 17:26       ` Christoph Lameter
  -1 siblings, 0 replies; 265+ messages in thread
From: Christoph Lameter @ 2012-07-16 17:26 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: akpm, linux-kernel, linux-mm, Sasha Levin

On Tue, 17 Jul 2012, Joonsoo Kim wrote:

> do_migrate_pages() can return the number of pages not migrated.
> Because the migrate_pages() syscall returns this value directly, the
> syscall may return the number of pages not migrated instead of an error.
> In the failure case, the migrate_pages() syscall should return an error
> value, so change err to -EIO.

Pages are not migrated because they are busy, not because there is an
error. So let's return EBUSY.

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 3/3] mm: fix return value in __alloc_contig_migrate_range()
  2012-07-16 16:14     ` Joonsoo Kim
@ 2012-07-16 17:29       ` Christoph Lameter
  -1 siblings, 0 replies; 265+ messages in thread
From: Christoph Lameter @ 2012-07-16 17:29 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: akpm, linux-kernel, linux-mm, Michal Nazarewicz,
	Marek Szyprowski, Minchan Kim

On Tue, 17 Jul 2012, Joonsoo Kim wrote:

> migrate_pages() can return a positive value in some failure cases,
> so 'ret > 0 ? 0 : ret' may be wrong.
> This fixes it and removes one dead statement.

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 4] mm: fix possible incorrect return value of move_pages() syscall
  2012-07-16 17:14     ` Joonsoo Kim
@ 2012-07-16 17:30       ` Christoph Lameter
  -1 siblings, 0 replies; 265+ messages in thread
From: Christoph Lameter @ 2012-07-16 17:30 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: akpm, linux-kernel, linux-mm, Brice Goglin, Minchan Kim

On Tue, 17 Jul 2012, Joonsoo Kim wrote:

> The move_pages() syscall may return success in case
> do_move_page_to_node_array() returns a positive value, which means migration failed.
> This patch changes the return value of do_move_page_to_node_array()
> so that it does not return a positive value, fixing the problem.
>
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> Cc: Brice Goglin <brice@myri.com>
> Cc: Christoph Lameter <cl@linux.com>
> Cc: Minchan Kim <minchan@kernel.org>
>
> diff --git a/mm/migrate.c b/mm/migrate.c
> index 294d52a..adabaf4 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -1171,7 +1171,7 @@ set_status:
>  	}
>
>  	up_read(&mm->mmap_sem);
> -	return err;
> +	return err > 0 ? -EIO : err;
>  }

Please use EBUSY.


^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 1/3] mm: correct return value of migrate_pages()
  2012-07-16 17:23     ` Christoph Lameter
@ 2012-07-16 17:32       ` JoonSoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-07-16 17:32 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-kernel, linux-mm

2012/7/17 Christoph Lameter <cl@linux.com>:
> On Tue, 17 Jul 2012, Joonsoo Kim wrote:
>
>> migrate_pages() should return the number of pages not migrated or an error code.
>> When unmap_and_move() returns -EAGAIN, the outer loop is re-executed without
>> initialising nr_failed. This makes nr_failed over-counted.
>
The intention of nr_failed was only to give an indication as to how
many attempts were made. The failed pages were on a separate queue that
seems to have vanished.
>
>> So this patch corrects it by initialising nr_failed in the outer loop.
>
> Well, yeah, it makes sense since retry is initialized there as well.
>
> Acked-by: Christoph Lameter <cl@linux.com>

Thanks for the comment.

Additionally, I find that migrate_huge_pages() needs the identical fix
as migrate_pages().

@@ -1029,6 +1030,7 @@ int migrate_huge_pages(struct list_head *from,

        for (pass = 0; pass < 10 && retry; pass++) {
                retry = 0;
+               nr_failed = 0;

                list_for_each_entry_safe(page, page2, from, lru) {
                        cond_resched();

When I resend with this, could I include "Acked-by: Christoph Lameter
<cl@linux.com>"?

^ permalink raw reply	[flat|nested] 265+ messages in thread
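
The over-count being fixed here can be reproduced in isolation. A
self-contained sketch, where attempt() is a hypothetical stand-in for
unmap_and_move() driving two imaginary pages: one is busy (-EAGAIN) for two
passes, the other fails permanently on every pass it is retried:

    #include <errno.h>

    static int attempt(int page, int pass)
    {
            if (page == 0)
                    return pass < 2 ? -EAGAIN : 0;  /* busy, then migrates */
            return -EPERM;                          /* fails on every pass */
    }

    static int count_failures(void)
    {
            int pass, page, retry = 1, nr_failed = 0;

            for (pass = 0; pass < 10 && retry; pass++) {
                    retry = 0;
                    nr_failed = 0;  /* the fix: count only the final pass */
                    for (page = 0; page < 2; page++) {
                            int rc = attempt(page, pass);

                            if (rc == -EAGAIN)
                                    retry = 1;
                            else if (rc < 0)
                                    nr_failed++;
                    }
            }
            /* with the reset this returns 1; without it, the permanently
             * failing page is counted on each of the three passes -> 3 */
            return nr_failed;
    }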

* Re: [PATCH 1/3] mm: correct return value of migrate_pages()
  2012-07-16 17:32       ` JoonSoo Kim
@ 2012-07-16 17:37         ` Christoph Lameter
  -1 siblings, 0 replies; 265+ messages in thread
From: Christoph Lameter @ 2012-07-16 17:37 UTC (permalink / raw)
  To: JoonSoo Kim; +Cc: akpm, linux-kernel, linux-mm

On Tue, 17 Jul 2012, JoonSoo Kim wrote:

>
>         for (pass = 0; pass < 10 && retry; pass++) {
>                 retry = 0;
> +               nr_failed = 0;
>
>                 list_for_each_entry_safe(page, page2, from, lru) {
>                         cond_resched();
>
> When I resend with this, could I include "Acked-by: Christoph Lameter
> <cl@linux.com>"?

Sure.


^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 1/3] mm: correct return value of migrate_pages()
  2012-07-16 16:14   ` Joonsoo Kim
                     ` (4 preceding siblings ...)
  (?)
@ 2012-07-16 17:40   ` Michal Nazarewicz
  2012-07-16 17:57       ` JoonSoo Kim
  -1 siblings, 1 reply; 265+ messages in thread
From: Michal Nazarewicz @ 2012-07-16 17:40 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: akpm, linux-kernel, linux-mm, Christoph Lameter

Joonsoo Kim <js1304@gmail.com> writes:
> migrate_pages() should return the number of pages not migrated or an error code.
> When unmap_and_move() returns -EAGAIN, the outer loop is re-executed without
> initialising nr_failed. This makes nr_failed over-counted.
>
> So this patch corrects it by initialising nr_failed in the outer loop.
>
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> Cc: Christoph Lameter <cl@linux.com>

Acked-by: Michal Nazarewicz <mina86@mina86.com>

Actually, it makes me wonder if there is any code that uses this
information.  If not, it would be best in my opinion to make it return
zero or a negative error code, but that would have to be checked.

> diff --git a/mm/migrate.c b/mm/migrate.c
> index be26d5c..294d52a 100644
> --- a/mm/migrate.c
> +++ b/mm/migrate.c
> @@ -982,6 +982,7 @@ int migrate_pages(struct list_head *from,
>  
>  	for(pass = 0; pass < 10 && retry; pass++) {
>  		retry = 0;
> +		nr_failed = 0;
>  
>  		list_for_each_entry_safe(page, page2, from, lru) {
>  			cond_resched();

-- 
Best regards,                                          _     _
 .o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
 ..o | Computer Science,  Michal "mina86" Nazarewicz    (o o)
 ooo +-<mina86-mina86.com>-<jid:mina86-jabber.org>--ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 2/3] mm: fix possible incorrect return value of migrate_pages() syscall
  2012-07-16 16:14     ` Joonsoo Kim
  (?)
  (?)
@ 2012-07-16 17:40     ` Michal Nazarewicz
  2012-07-16 17:59         ` JoonSoo Kim
  -1 siblings, 1 reply; 265+ messages in thread
From: Michal Nazarewicz @ 2012-07-16 17:40 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: akpm, linux-kernel, linux-mm, Sasha Levin, Christoph Lameter

Joonsoo Kim <js1304@gmail.com> writes:
> do_migrate_pages() can return the number of pages not migrated.
> Because the migrate_pages() syscall returns this value directly,
> the migrate_pages() syscall may return the number of pages not migrated.
> In the failure case, the migrate_pages() syscall should return an error value.
> So change err to -EIO.
>
> Additionally, correct the comment above do_migrate_pages().
>
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> Cc: Sasha Levin <levinsasha928@gmail.com>
> Cc: Christoph Lameter <cl@linux.com>

Acked-by: Michal Nazarewicz <mina86@mina86.com>

> diff --git a/mm/mempolicy.c b/mm/mempolicy.c
> index 1d771e4..f7df271 100644
> --- a/mm/mempolicy.c
> +++ b/mm/mempolicy.c
> @@ -948,7 +948,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
>   * Move pages between the two nodesets so as to preserve the physical
>   * layout as much as possible.
>   *
> - * Returns the number of page that could not be moved.
> + * Returns error or the number of pages not migrated.
>   */
>  int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
>  		     const nodemask_t *to, int flags)
> @@ -1382,6 +1382,8 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
>  
>  	err = do_migrate_pages(mm, old, new,
>  		capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE);
> +	if (err > 0)
> +		err = -EIO;
>  
>  	mmput(mm);
>  out:

-- 
Best regards,                                          _     _
 .o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
 ..o | Computer Science,  Michal "mina86" Nazarewicz    (o o)
 ooo +-<mina86-mina86.com>-<jid:mina86-jabber.org>--ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 3/3] mm: fix return value in __alloc_contig_migrate_range()
  2012-07-16 16:14     ` Joonsoo Kim
  (?)
  (?)
@ 2012-07-16 17:40     ` Michal Nazarewicz
  2012-07-16 18:40         ` JoonSoo Kim
  -1 siblings, 1 reply; 265+ messages in thread
From: Michal Nazarewicz @ 2012-07-16 17:40 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: akpm, linux-kernel, linux-mm, Marek Szyprowski, Minchan Kim,
	Christoph Lameter

Joonsoo Kim <js1304@gmail.com> writes:

> migrate_pages() would return a positive value in some failure cases,
> so 'ret > 0 ? 0 : ret' may be wrong.
> This fixes it and removes one dead statement.
>
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> Cc: Michal Nazarewicz <mina86@mina86.com>
> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Christoph Lameter <cl@linux.com>

Have you actually encountered this problem?  If migrate_pages() fails
with a positive value, the code that you are removing kicks in and
-EBUSY is assigned to ret (now that I look at it, I think that in the
current code the "return ret > 0 ? 0 : ret;" statement could be reduced
to "return ret;").  Your code seems to be cleaner, but the commit
message does not look accurate to me.

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 4403009..02d4519 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5673,7 +5673,6 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
>  			}
>  			tries = 0;
>  		} else if (++tries == 5) {
> -			ret = ret < 0 ? ret : -EBUSY;
>  			break;
>  		}
>  
> @@ -5683,7 +5682,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
>  	}
>  
>  	putback_lru_pages(&cc.migratepages);
> -	return ret > 0 ? 0 : ret;
> +	return ret <= 0 ? ret : -EBUSY;
>  }
>  
>  /*

-- 
Best regards,                                          _     _
 .o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
 ..o | Computer Science,  Michal "mina86" Nazarewicz    (o o)
 ooo +-<mina86-mina86.com>-<jid:mina86-jabber.org>--ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 1/3] mm: correct return value of migrate_pages()
  2012-07-16 17:40   ` Michal Nazarewicz
@ 2012-07-16 17:57       ` JoonSoo Kim
  0 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-07-16 17:57 UTC (permalink / raw)
  To: Michal Nazarewicz; +Cc: akpm, linux-kernel, linux-mm, Christoph Lameter

2012/7/17 Michal Nazarewicz <mina86@tlen.pl>:
> Acked-by: Michal Nazarewicz <mina86@mina86.com>

Thanks.

> Actually, it makes me wonder if there is any code that uses this
> information.  If not, it would be best in my opinion to make it return
> zero or a negative error code, but that would have to be checked.

I think that, too.
I looked at every callsite of migrate_pages() and there is no place
which really needs the fail count.
This function sometimes makes callers error-prone,
so I think changing the return value is preferable.

What do you think, Christoph?

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 2/3] mm: fix possible incorrect return value of migrate_pages() syscall
  2012-07-16 17:40     ` Michal Nazarewicz
@ 2012-07-16 17:59         ` JoonSoo Kim
  0 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-07-16 17:59 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: akpm, linux-kernel, linux-mm, Sasha Levin, Christoph Lameter

2012/7/17 Michal Nazarewicz <mina86@tlen.pl>:
> Joonsoo Kim <js1304@gmail.com> writes:
>> do_migrate_pages() can return the number of pages not migrated.
>> Because the migrate_pages() syscall returns this value directly,
>> the migrate_pages() syscall may return the number of pages not migrated.
>> In the failure case, the migrate_pages() syscall should return an error value.
>> So change err to -EIO.
>>
>> Additionally, correct the comment above do_migrate_pages().
>>
>> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>> Cc: Sasha Levin <levinsasha928@gmail.com>
>> Cc: Christoph Lameter <cl@linux.com>
>
> Acked-by: Michal Nazarewicz <mina86@mina86.com>

Thanks.

When I resend with -EIO changed to -EBUSY,
could I include "Acked-by: Michal Nazarewicz <mina86@mina86.com>"?

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 1/3] mm: correct return value of migrate_pages()
  2012-07-16 17:57       ` JoonSoo Kim
@ 2012-07-16 18:05         ` Christoph Lameter
  -1 siblings, 0 replies; 265+ messages in thread
From: Christoph Lameter @ 2012-07-16 18:05 UTC (permalink / raw)
  To: JoonSoo Kim; +Cc: Michal Nazarewicz, akpm, linux-kernel, linux-mm

On Tue, 17 Jul 2012, JoonSoo Kim wrote:

> > Actually, it makes me wonder if there is any code that uses this
> > information.  If not, it would be best in my opinion to make it return
> > zero or a negative error code, but that would have to be checked.
>
> I think that, too.
> I looked at every callsite of migrate_pages() and there is no place
> which really needs the fail count.
> This function sometimes makes callers error-prone,
> so I think changing the return value is preferable.
>
> What do you think, Christoph?

We could do that. I am not aware of anything using that information
either. However, the condition in which some pages were migrated and
others were not is not like a classic error. In many situations the moving
of the pages is done for performance reasons. This just means that the
best-performing memory locations could not be used for some pages. A
situation like that may be OK for an application.


^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 3/3] mm: fix return value in __alloc_contig_migrate_range()
  2012-07-16 17:40     ` Michal Nazarewicz
@ 2012-07-16 18:40         ` JoonSoo Kim
  0 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-07-16 18:40 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: akpm, linux-kernel, linux-mm, Marek Szyprowski, Minchan Kim,
	Christoph Lameter

2012/7/17 Michal Nazarewicz <mina86@mina86.com>:
> Joonsoo Kim <js1304@gmail.com> writes:
>
>> migrate_pages() would return a positive value in some failure cases,
>> so 'ret > 0 ? 0 : ret' may be wrong.
>> This fixes it and removes one dead statement.
>>
>> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>> Cc: Michal Nazarewicz <mina86@mina86.com>
>> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Cc: Christoph Lameter <cl@linux.com>
>
> Have you actually encountered this problem?  If migrate_pages() fails
> with a positive value, the code that you are removing kicks in and
> -EBUSY is assigned to ret (now that I look at it, I think that in the
> current code the "return ret > 0 ? 0 : ret;" statement could be reduced
> to "return ret;").  Your code seems to be cleaner, but the commit
> message does not look accurate to me.
>

I haven't encountered this problem yet.

If migrate_pages() with offlining false meets a KSM page, then migration fails.
In this case, the failed page is removed from the cc.migratepages list and
the failed count is returned.
So it is possible to exit the loop without testing ++tries == 5 while
ret is over zero.
Is there any point which I am missing?
Is there any possible scenario where "migrate_pages() returns > 0 and
cc.migratepages is empty"?

I'm not an expert on MM, so please comment on my humble opinion.

^ permalink raw reply	[flat|nested] 265+ messages in thread
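
The scenario described above can be condensed into a standalone sketch,
with hypothetical stubs standing in for migrate_pages() and the
cc.migratepages list (this is not the kernel code, only its control flow):

    #include <errno.h>

    /* Stand-in that mimics the KSM case: the page that cannot be migrated
     * is dropped from the list, yet reported as a failure count. */
    static int migrate_stub(int *list_len)
    {
            *list_len = 0;
            return 1;
    }

    static int alloc_contig_stub(void)
    {
            int list_len = 1, tries = 0, ret = 0;

            while (list_len) {              /* list empties immediately... */
                    ret = migrate_stub(&list_len);
                    if (ret == 0)
                            tries = 0;
                    else if (++tries == 5)  /* ...so this is never reached */
                            break;
            }
            /* the old "return ret > 0 ? 0 : ret;" turned this positive
             * failure count into success; the fix maps it to -EBUSY */
            return ret <= 0 ? ret : -EBUSY;
    }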

* [PATCH 1/4 v2] mm: correct return value of migrate_pages() and migrate_huge_pages()
       [not found] <Yes>
@ 2012-07-17 12:33   ` Joonsoo Kim
  23 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-07-17 12:33 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-mm, Joonsoo Kim, Christoph Lameter

migrate_pages() should return the number of pages not migrated or an error code.
When unmap_and_move() returns -EAGAIN, the outer loop is re-executed without
initialising nr_failed. This makes nr_failed over-counted.

So this patch corrects it by initialising nr_failed in the outer loop.

migrate_huge_pages() is an identical case to migrate_pages().

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>

diff --git a/mm/migrate.c b/mm/migrate.c
index be26d5c..f495c58 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -982,6 +982,7 @@ int migrate_pages(struct list_head *from,
 
 	for(pass = 0; pass < 10 && retry; pass++) {
 		retry = 0;
+		nr_failed = 0;
 
 		list_for_each_entry_safe(page, page2, from, lru) {
 			cond_resched();
@@ -1029,6 +1030,7 @@ int migrate_huge_pages(struct list_head *from,
 
 	for (pass = 0; pass < 10 && retry; pass++) {
 		retry = 0;
+		nr_failed = 0;
 
 		list_for_each_entry_safe(page, page2, from, lru) {
 			cond_resched();
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH 2/4 v2] mm: fix possible incorrect return value of migrate_pages() syscall
  2012-07-17 12:33   ` Joonsoo Kim
@ 2012-07-17 12:33     ` Joonsoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-07-17 12:33 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-mm, Joonsoo Kim, Sasha Levin, Christoph Lameter

do_migrate_pages() can return the number of pages not migrated.
Because the migrate_pages() syscall returns this value directly,
the migrate_pages() syscall may return the number of pages not migrated.
In the failure case, the migrate_pages() syscall should return an error value.
So change err to -EBUSY.

Additionally, correct the comment above do_migrate_pages().

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Cc: Christoph Lameter <cl@linux.com>

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 1d771e4..0732729 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -948,7 +948,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
  * Move pages between the two nodesets so as to preserve the physical
  * layout as much as possible.
  *
- * Returns the number of page that could not be moved.
+ * Returns error or the number of pages not migrated.
  */
 int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
 		     const nodemask_t *to, int flags)
@@ -1382,6 +1382,8 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
 
 	err = do_migrate_pages(mm, old, new,
 		capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE);
+	if (err > 0)
+		err = -EBUSY;
 
 	mmput(mm);
 out:
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread
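
With the conversion applied, userspace sees a conventional errno instead of
a bare count. A hedged usage sketch built on libnuma's migrate_pages()
wrapper (node numbers and the maxnode value are illustrative and assume a
two-node system; link with -lnuma):

    #include <errno.h>
    #include <stdio.h>
    #include <numaif.h>

    int main(void)
    {
            unsigned long old_nodes = 1UL << 0;     /* move from node 0... */
            unsigned long new_nodes = 1UL << 1;     /* ...to node 1 */

            /* pid 0 = calling process; maxnode covers nodes 0 and 1 */
            long rc = migrate_pages(0, 2, &old_nodes, &new_nodes);

            if (rc < 0)
                    perror("migrate_pages");        /* patched: errno == EBUSY */
            else if (rc > 0)
                    printf("%ld pages not moved\n", rc);    /* unpatched kernels */
            return 0;
    }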

* [PATCH 3/4 v2] mm: fix return value in __alloc_contig_migrate_range()
  2012-07-17 12:33   ` Joonsoo Kim
@ 2012-07-17 12:33     ` Joonsoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-07-17 12:33 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, linux-mm, Joonsoo Kim, Michal Nazarewicz,
	Marek Szyprowski, Minchan Kim, Christoph Lameter

migrate_pages() would return a positive value in some failure cases,
so 'ret > 0 ? 0 : ret' may be wrong.
This fixes it and removes one dead statement.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: Christoph Lameter <cl@linux.com>

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4403009..02d4519 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5673,7 +5673,6 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
 			}
 			tries = 0;
 		} else if (++tries == 5) {
-			ret = ret < 0 ? ret : -EBUSY;
 			break;
 		}
 
@@ -5683,7 +5682,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
 	}
 
 	putback_lru_pages(&cc.migratepages);
-	return ret > 0 ? 0 : ret;
+	return ret <= 0 ? ret : -EBUSY;
 }
 
 /*
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH 4/4 v2] mm: fix possible incorrect return value of move_pages() syscall
  2012-07-17 12:33   ` Joonsoo Kim
@ 2012-07-17 12:33     ` Joonsoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-07-17 12:33 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, linux-mm, Joonsoo Kim, Brice Goglin,
	Christoph Lameter, Minchan Kim

The move_pages() syscall may return success in case
do_move_page_to_node_array() returns a positive value, which means migration failed.
This patch changes the return value of do_move_page_to_node_array()
so that it does not return a positive value, fixing the problem.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Brice Goglin <brice@myri.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan@kernel.org>

diff --git a/mm/migrate.c b/mm/migrate.c
index f495c58..eeaf409 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1172,7 +1172,7 @@ set_status:
 	}
 
 	up_read(&mm->mmap_sem);
-	return err;
+	return err > 0 ? -EBUSY : err;
 }
 
 /*
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* Re: [PATCH 2/3] mm: fix possible incorrect return value of migrate_pages() syscall
  2012-07-16 17:59         ` JoonSoo Kim
@ 2012-07-17 13:02           ` Michal Nazarewicz
  -1 siblings, 0 replies; 265+ messages in thread
From: Michal Nazarewicz @ 2012-07-17 13:02 UTC (permalink / raw)
  To: Michal Nazarewicz, JoonSoo Kim
  Cc: akpm, linux-kernel, linux-mm, Sasha Levin, Christoph Lameter

On Mon, 16 Jul 2012 19:59:18 +0200, JoonSoo Kim <js1304@gmail.com> wrote:

> 2012/7/17 Michal Nazarewicz <mina86@tlen.pl>:
>> Joonsoo Kim <js1304@gmail.com> writes:
>>> do_migrate_pages() can return the number of pages not migrated.
>>> Because the migrate_pages() syscall returns this value directly,
>>> the migrate_pages() syscall may return the number of pages not migrated.
>>> In the failure case, the migrate_pages() syscall should return an error value.
>>> So change err to -EIO.
>>>
>>> Additionally, correct the comment above do_migrate_pages().
>>>
>>> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>>> Cc: Sasha Levin <levinsasha928@gmail.com>
>>> Cc: Christoph Lameter <cl@linux.com>
>>
>> Acked-by: Michal Nazarewicz <mina86@mina86.com>
>
> Thanks.
>
> When I resend with -EIO changed to -EBUSY,
> could I include "Acked-by: Michal Nazarewicz <mina86@mina86.com>"?

Sure thing.

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 3/3] mm: fix return value in __alloc_contig_migrate_range()
  2012-07-16 18:40         ` JoonSoo Kim
@ 2012-07-17 13:16           ` Michal Nazarewicz
  -1 siblings, 0 replies; 265+ messages in thread
From: Michal Nazarewicz @ 2012-07-17 13:16 UTC (permalink / raw)
  To: JoonSoo Kim
  Cc: akpm, linux-kernel, linux-mm, Marek Szyprowski, Minchan Kim,
	Christoph Lameter

On Mon, 16 Jul 2012 20:40:56 +0200, JoonSoo Kim <js1304@gmail.com> wrote:

> 2012/7/17 Michal Nazarewicz <mina86@mina86.com>:
>> Joonsoo Kim <js1304@gmail.com> writes:
>>
>>> migrate_pages() would return a positive value in some failure cases,
>>> so 'ret > 0 ? 0 : ret' may be wrong.
>>> This fixes it and removes one dead statement.
>>>
>>> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>>> Cc: Michal Nazarewicz <mina86@mina86.com>
>>> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
>>> Cc: Minchan Kim <minchan@kernel.org>
>>> Cc: Christoph Lameter <cl@linux.com>
>>
>> Have you actually encountered this problem?  If migrate_pages() fails
>> with a positive value, the code that you are removing kicks in and
>> -EBUSY is assigned to ret (now that I look at it, I think that in the
>> current code the "return ret > 0 ? 0 : ret;" statement could be reduced
>> to "return ret;").  Your code seems to be cleaner, but the commit
>> message does not look accurate to me.
>>
>
> I haven't encountered this problem yet.
>
> If migrate_pages() with offlining false meets a KSM page, then migration fails.
> In this case, the failed page is removed from the cc.migratepages list and
> the failed count is returned.

Good point.

-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 3/4 v2] mm: fix return value in __alloc_contig_migrate_range()
  2012-07-17 12:33     ` Joonsoo Kim
@ 2012-07-17 13:25       ` Michal Nazarewicz
  -1 siblings, 0 replies; 265+ messages in thread
From: Michal Nazarewicz @ 2012-07-17 13:25 UTC (permalink / raw)
  To: akpm, Joonsoo Kim
  Cc: linux-kernel, linux-mm, Marek Szyprowski, Minchan Kim, Christoph Lameter

On Tue, 17 Jul 2012 14:33:34 +0200, Joonsoo Kim <js1304@gmail.com> wrote:
> migrate_pages() would return a positive value in some failure cases,
> so 'ret > 0 ? 0 : ret' may be wrong.
> This fixes it and removes one dead statement.

How about the following message:

------------------- >8 ---------------------------------------------------
migrate_pages() can return a positive value while at the same time emptying
the list of pages it was called with.  Such a situation means that it went
through all the pages on the list, some of which failed to be migrated.

If that happens, __alloc_contig_migrate_range()'s loop may finish without
"++tries == 5" ever being checked.  This in turn means that at the end
of the function, ret may have a positive value, which should be treated
as an error.

This patch changes __alloc_contig_migrate_range() so that the return
statement converts a positive ret value into an -EBUSY error.
------------------- >8 ---------------------------------------------------

> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> Cc: Michal Nazarewicz <mina86@mina86.com>

Acked-by: Michal Nazarewicz <mina86@mina86.com>

> Cc: Marek Szyprowski <m.szyprowski@samsung.com>
> Cc: Minchan Kim <minchan@kernel.org>
> Cc: Christoph Lameter <cl@linux.com>
> Acked-by: Christoph Lameter <cl@linux.com>

In fact, now that I look at it, I think that __alloc_contig_migrate_range()
should be changed even further.  I'll take a closer look at it and send
a patch (possibly through Marek ;) ).

> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 4403009..02d4519 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -5673,7 +5673,6 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
>  			}
>  			tries = 0;
>  		} else if (++tries == 5) {
> -			ret = ret < 0 ? ret : -EBUSY;
>  			break;
>  		}
>@@ -5683,7 +5682,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
>  	}
> 	putback_lru_pages(&cc.migratepages);
> -	return ret > 0 ? 0 : ret;
> +	return ret <= 0 ? ret : -EBUSY;
>  }
> /*


-- 
Best regards,                                         _     _
.o. | Liege of Serenely Enlightened Majesty of      o' \,=./ `o
..o | Computer Science,  Michał “mina86” Nazarewicz    (o o)
ooo +----<email/xmpp: mpn@google.com>--------------ooO--(_)--Ooo--

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 2/4 v2] mm: fix possible incorrect return value of migrate_pages() syscall
  2012-07-17 12:33     ` Joonsoo Kim
@ 2012-07-17 14:28       ` Christoph Lameter
  -1 siblings, 0 replies; 265+ messages in thread
From: Christoph Lameter @ 2012-07-17 14:28 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: akpm, linux-kernel, linux-mm, Sasha Levin

On Tue, 17 Jul 2012, Joonsoo Kim wrote:

> @@ -1382,6 +1382,8 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
>
>  	err = do_migrate_pages(mm, old, new,
>  		capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE);
> +	if (err > 0)
> +		err = -EBUSY;
>
>  	mmput(mm);
>  out:

Why not have do_migrate_pages() return EBUSY if we do not need the number
of failed/retried pages?


^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 2/4 v2] mm: fix possible incorrect return value of migrate_pages() syscall
  2012-07-17 14:28       ` Christoph Lameter
@ 2012-07-17 15:41         ` JoonSoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-07-17 15:41 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-kernel, linux-mm, Sasha Levin

2012/7/17 Christoph Lameter <cl@linux.com>:
> On Tue, 17 Jul 2012, Joonsoo Kim wrote:
>
>> @@ -1382,6 +1382,8 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
>>
>>       err = do_migrate_pages(mm, old, new,
>>               capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE);
>> +     if (err > 0)
>> +             err = -EBUSY;
>>
>>       mmput(mm);
>>  out:
>
> Why not have do_migrate_pages() return EBUSY if we do not need the number
> of failed/retried pages?

There is no serious reason.
do_migrate_pages() has two callsites, although the other one doesn't
use the return value.
do_migrate_pages() is commented "Return the number of page ...".
And my focus is fixing the possible error in the migrate_pages() syscall.
So, I kept returning the number of failed/retried pages.

If we really think the number of failed/retried pages is useless, then,
instead of having do_migrate_pages() return EBUSY, we can make migrate_pages()
return EBUSY. I think it is better to fix all the related code in one go.

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 3/4 v2] mm: fix return value in __alloc_contig_migrate_range()
  2012-07-17 13:25       ` Michal Nazarewicz
@ 2012-07-17 15:45         ` JoonSoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-07-17 15:45 UTC (permalink / raw)
  To: Michal Nazarewicz
  Cc: akpm, linux-kernel, linux-mm, Marek Szyprowski, Minchan Kim,
	Christoph Lameter

2012/7/17 Michal Nazarewicz <mina86@mina86.com>:
> On Tue, 17 Jul 2012 14:33:34 +0200, Joonsoo Kim <js1304@gmail.com> wrote:
>>
>> migrate_pages() would return a positive value in some failure cases,
>> so 'ret > 0 ? 0 : ret' may be wrong.
>> This fixes it and removes one dead statement.
>
>
> How about the following message:
>
> ------------------- >8 ---------------------------------------------------
> migrate_pages() can return a positive value while at the same time emptying
> the list of pages it was called with.  Such a situation means that it went
> through all the pages on the list, some of which failed to be migrated.
>
> If that happens, __alloc_contig_migrate_range()'s loop may finish without
> "++tries == 5" ever being checked.  This in turn means that at the end
> of the function, ret may have a positive value, which should be treated
> as an error.
>
> This patch changes __alloc_contig_migrate_range() so that the return
> statement converts a positive ret value into an -EBUSY error.
> ------------------- >8 ---------------------------------------------------

It's good.
I will resend the patch with my commit message replaced by yours.
Thanks for the help.

^ permalink raw reply	[flat|nested] 265+ messages in thread

* [PATCH 3/4 v3] mm: fix return value in __alloc_contig_migrate_range()
  2012-07-17 15:45         ` JoonSoo Kim
@ 2012-07-17 15:49           ` Joonsoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-07-17 15:49 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, linux-mm, Joonsoo Kim, Michal Nazarewicz,
	Marek Szyprowski, Minchan Kim, Christoph Lameter

migrate_pages() can return positive value while at the same time emptying
the list of pages it was called with.  Such situation means that it went
through all the pages on the list some of which failed to be migrated.

If that happens, __alloc_contig_migrate_range()'s loop may finish without
"++tries == 5" never being checked.  This in turn means that at the end
of the function, ret may have a positive value, which should be treated
as an error.

This patch changes __alloc_contig_migrate_range() so that the return
statement converts positive ret value into -EBUSY error.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4403009..02d4519 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5673,7 +5673,6 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
 			}
 			tries = 0;
 		} else if (++tries == 5) {
-			ret = ret < 0 ? ret : -EBUSY;
 			break;
 		}
 
@@ -5683,7 +5682,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
 	}
 
 	putback_lru_pages(&cc.migratepages);
-	return ret > 0 ? 0 : ret;
+	return ret <= 0 ? ret : -EBUSY;
 }
 
 /*
-- 
1.7.9.5
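
To see the effect of the changed return statement in isolation, here is a
stand-alone comparison of the old and new conversions with a made-up failure
count (illustration only, not mm/page_alloc.c code):

    #include <errno.h>
    #include <stdio.h>

    static int old_convert(int ret) { return ret > 0 ? 0 : ret; }       /* old */
    static int new_convert(int ret) { return ret <= 0 ? ret : -EBUSY; } /* new */

    int main(void)
    {
            int ret = 2;    /* migrate_pages() left 2 pages behind */

            /* old code reported bogus success, new code reports -EBUSY */
            printf("old: %d, new: %d\n", old_convert(ret), new_convert(ret));
            return 0;       /* prints "old: 0, new: -16" */
    }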


^ permalink raw reply related	[flat|nested] 265+ messages in thread


* [RESEND PATCH 1/4 v3] mm: correct return value of migrate_pages() and migrate_huge_pages()
       [not found] <Yes>
@ 2012-07-27 17:55   ` Joonsoo Kim
  2010-07-05 12:41 ` [PATCH 01/09] cifs: add kernel config option for CIFS Client caching support Suresh Jayaraman
                     ` (22 subsequent siblings)
  23 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-07-27 17:55 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-mm, Joonsoo Kim, Christoph Lameter

migrate_pages() should return the number of pages not migrated, or an error
code. When unmap_and_move() returns -EAGAIN, the outer loop is re-executed
without reinitialising nr_failed, which leaves nr_failed over-counted.

So this patch corrects it by initialising nr_failed at the top of the outer
loop.

migrate_huge_pages() has the identical problem as migrate_pages().

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>
---
[Patch 2/4]: add "Acked-by: Michal Nazarewicz <mina86@mina86.com>"
[Patch 3/4]: commit log is changed according to Michal Nazarewicz's suggestion.
There is no other change from v2.
Just resend as ping for Andrew.

diff --git a/mm/migrate.c b/mm/migrate.c
index be26d5c..f495c58 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -982,6 +982,7 @@ int migrate_pages(struct list_head *from,
 
 	for(pass = 0; pass < 10 && retry; pass++) {
 		retry = 0;
+		nr_failed = 0;
 
 		list_for_each_entry_safe(page, page2, from, lru) {
 			cond_resched();
@@ -1029,6 +1030,7 @@ int migrate_huge_pages(struct list_head *from,
 
 	for (pass = 0; pass < 10 && retry; pass++) {
 		retry = 0;
+		nr_failed = 0;
 
 		list_for_each_entry_safe(page, page2, from, lru) {
 			cond_resched();
-- 
1.7.9.5
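
The over-count is easy to reproduce in a stand-alone analogue of the retry
loop (made-up numbers, not the mm/migrate.c source):

    #include <stdio.h>

    int main(void)
    {
            int nr_failed = 0;
            int pass;

            for (pass = 0; pass < 2; pass++) {
                    /* the patch adds "nr_failed = 0;" here so failures
                     * from an earlier -EAGAIN pass are not counted again */
                    nr_failed++;    /* the same page fails on every pass */
            }
            printf("nr_failed = %d (should be 1)\n", nr_failed);
            return 0;       /* prints 2 without the per-pass reset */
    }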


^ permalink raw reply related	[flat|nested] 265+ messages in thread


* [RESEND PATCH 2/4 v3] mm: fix possible incorrect return value of migrate_pages() syscall
  2012-07-27 17:55   ` Joonsoo Kim
@ 2012-07-27 17:55     ` Joonsoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-07-27 17:55 UTC (permalink / raw)
  To: akpm; +Cc: linux-kernel, linux-mm, Joonsoo Kim, Sasha Levin, Christoph Lameter

do_migrate_pages() can return the number of pages not migrated. Because the
migrate_pages() syscall returns this value directly, the syscall may return
the number of pages not migrated. In the failure case, the migrate_pages()
syscall should return an error value, so convert a positive err to -EBUSY.

Additionally, correct the comment above do_migrate_pages().

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Sasha Levin <levinsasha928@gmail.com>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>

diff --git a/mm/mempolicy.c b/mm/mempolicy.c
index 1d771e4..0732729 100644
--- a/mm/mempolicy.c
+++ b/mm/mempolicy.c
@@ -948,7 +948,7 @@ static int migrate_to_node(struct mm_struct *mm, int source, int dest,
  * Move pages between the two nodesets so as to preserve the physical
  * layout as much as possible.
  *
- * Returns the number of page that could not be moved.
+ * Returns error or the number of pages not migrated.
  */
 int do_migrate_pages(struct mm_struct *mm, const nodemask_t *from,
 		     const nodemask_t *to, int flags)
@@ -1382,6 +1382,8 @@ SYSCALL_DEFINE4(migrate_pages, pid_t, pid, unsigned long, maxnode,
 
 	err = do_migrate_pages(mm, old, new,
 		capable(CAP_SYS_NICE) ? MPOL_MF_MOVE_ALL : MPOL_MF_MOVE);
+	if (err > 0)
+		err = -EBUSY;
 
 	mmput(mm);
 out:
-- 
1.7.9.5
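
From userspace the surprise looks like this (a sketch assuming libnuma's
<numaif.h> wrapper on a two-node machine, linked with -lnuma; error handling
trimmed):

    #include <numaif.h>
    #include <stdio.h>

    int main(void)
    {
            unsigned long old_nodes = 1UL << 0;     /* node 0 */
            unsigned long new_nodes = 1UL << 1;     /* node 1 */
            long rc = migrate_pages(0, 2, &old_nodes, &new_nodes);

            if (rc < 0)
                    perror("migrate_pages");
            else if (rc > 0)        /* possible before this patch */
                    printf("%ld pages were not moved\n", rc);
            return 0;
    }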


^ permalink raw reply related	[flat|nested] 265+ messages in thread


* [RESEND PATCH 3/4 v3] mm: fix return value in __alloc_contig_migrate_range()
  2012-07-27 17:55   ` Joonsoo Kim
@ 2012-07-27 17:55     ` Joonsoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-07-27 17:55 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, linux-mm, Joonsoo Kim, Michal Nazarewicz,
	Marek Szyprowski, Minchan Kim, Christoph Lameter

migrate_pages() can return positive value while at the same time emptying
the list of pages it was called with.  Such situation means that it went
through all the pages on the list some of which failed to be migrated.

If that happens, __alloc_contig_migrate_range()'s loop may finish without
"++tries == 5" never being checked.  This in turn means that at the end
of the function, ret may have a positive value, which should be treated
as an error.

This patch changes __alloc_contig_migrate_range() so that the return
statement converts positive ret value into -EBUSY error.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Michal Nazarewicz <mina86@mina86.com>
Cc: Marek Szyprowski <m.szyprowski@samsung.com>
Cc: Minchan Kim <minchan@kernel.org>
Cc: Christoph Lameter <cl@linux.com>
Acked-by: Christoph Lameter <cl@linux.com>
Acked-by: Michal Nazarewicz <mina86@mina86.com>

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 4403009..02d4519 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -5673,7 +5673,6 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
 			}
 			tries = 0;
 		} else if (++tries == 5) {
-			ret = ret < 0 ? ret : -EBUSY;
 			break;
 		}
 
@@ -5683,7 +5682,7 @@ static int __alloc_contig_migrate_range(unsigned long start, unsigned long end)
 	}
 
 	putback_lru_pages(&cc.migratepages);
-	return ret > 0 ? 0 : ret;
+	return ret <= 0 ? ret : -EBUSY;
 }
 
 /*
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread


* [RESEND PATCH 4/4 v3] mm: fix possible incorrect return value of move_pages() syscall
  2012-07-27 17:55   ` Joonsoo Kim
@ 2012-07-27 17:55     ` Joonsoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-07-27 17:55 UTC (permalink / raw)
  To: akpm
  Cc: linux-kernel, linux-mm, Joonsoo Kim, Brice Goglin,
	Christoph Lameter, Minchan Kim

The move_pages() syscall may return success even when
do_move_page_to_node_array() returns a positive value, which means migration
failed for some pages. This patch changes do_move_page_to_node_array() so
that it does not return a positive value, which fixes the problem.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Brice Goglin <brice@myri.com>
Cc: Christoph Lameter <cl@linux.com>
Cc: Minchan Kim <minchan@kernel.org>

diff --git a/mm/migrate.c b/mm/migrate.c
index f495c58..eeaf409 100644
--- a/mm/migrate.c
+++ b/mm/migrate.c
@@ -1172,7 +1172,7 @@ set_status:
 	}
 
 	up_read(&mm->mmap_sem);
-	return err;
+	return err > 0 ? -EBUSY : err;
 }
 
 /*
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread


* Re: [RESEND PATCH 4/4 v3] mm: fix possible incorrect return value of move_pages() syscall
  2012-07-27 17:55     ` Joonsoo Kim
@ 2012-07-27 20:54       ` Christoph Lameter
  -1 siblings, 0 replies; 265+ messages in thread
From: Christoph Lameter @ 2012-07-27 20:54 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: akpm, linux-kernel, linux-mm, Brice Goglin, Minchan Kim

On Sat, 28 Jul 2012, Joonsoo Kim wrote:

> The move_pages() syscall may return success even when
> do_move_page_to_node_array() returns a positive value, which means migration failed for some pages.

Nope. It only means that the migration for some pages has failed. This may
still be considered successful by the app if it moves 10000 pages and one
fails.

This patch would break the move_pages() syscall, because an error code
returned from do_move_page_to_node_array() would cause the status byte for
each page move to no longer be updated. Applications would not be able to
tell which pages were successfully moved and which were not.
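
For reference, the per-page status being discussed is visible to applications
like this (a sketch assuming libnuma's <numaif.h> wrapper, linked with
-lnuma; error handling trimmed):

    #include <numaif.h>
    #include <stdio.h>
    #include <stdlib.h>
    #include <unistd.h>

    int main(void)
    {
            long psz = sysconf(_SC_PAGESIZE);
            void *page = NULL;

            posix_memalign(&page, psz, psz);
            *(volatile char *)page = 1;     /* fault the page in */

            void *pages[1]  = { page };
            int   nodes[1]  = { 0 };        /* ask for node 0 */
            int   status[1] = { 0 };
            long rc = move_pages(0, 1, pages, nodes, status, MPOL_MF_MOVE);

            /* status[0] is the node the page ended up on, or a negative
             * errno for a page that could not be moved */
            printf("rc=%ld status[0]=%d\n", rc, status[0]);
            return 0;
    }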

^ permalink raw reply	[flat|nested] 265+ messages in thread


* Re: [RESEND PATCH 2/4 v3] mm: fix possible incorrect return value of migrate_pages() syscall
  2012-07-27 17:55     ` Joonsoo Kim
@ 2012-07-27 20:57       ` Christoph Lameter
  -1 siblings, 0 replies; 265+ messages in thread
From: Christoph Lameter @ 2012-07-27 20:57 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: akpm, linux-kernel, linux-mm, Sasha Levin

On Sat, 28 Jul 2012, Joonsoo Kim wrote:

>> do_migrate_pages() can return the number of pages not migrated.
>> Because the migrate_pages() syscall returns this value directly,
>> the syscall may return the number of pages not migrated.
>> In the failure case, the migrate_pages() syscall should return an error
>> value, so convert a positive err to -EBUSY

Let's leave this alone. This would change the migrate_pages() semantics,
because a successful move of N out of M pages would be reported as a total
failure even though pages were in fact moved.


^ permalink raw reply	[flat|nested] 265+ messages in thread


* Re: [RESEND PATCH 4/4 v3] mm: fix possible incorrect return value of move_pages() syscall
  2012-07-27 20:54       ` Christoph Lameter
@ 2012-07-28  6:09         ` JoonSoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-07-28  6:09 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-kernel, linux-mm, Brice Goglin, Minchan Kim

2012/7/28 Christoph Lameter <cl@linux.com>:
> On Sat, 28 Jul 2012, Joonsoo Kim wrote:
>
>> The move_pages() syscall may return success even when
>> do_move_page_to_node_array() returns a positive value, which means migration failed for some pages.
>
> Nope. It only means that the migration for some pages has failed. This may
> still be considered successful for the app if it moves 10000 pages and one
> failed.
>
> This patch would break the move_pages() syscall because an error code
> return from do_move_pages_to_node_array() will cause the status byte for
> each page move to not be updated anymore. Application will not be able to
> tell anymore which pages were successfully moved and which are not.

According to the man page, the status array is not required to be valid when
a non-zero value is returned.
So this patch would not break the move_pages() syscall.
But I agree that a positive return value only means that the migration of
some pages has failed.
This was my mistake, so please drop this patch.
Thanks for the review.

^ permalink raw reply	[flat|nested] 265+ messages in thread


* Re: [RESEND PATCH 2/4 v3] mm: fix possible incorrect return value of migrate_pages() syscall
  2012-07-27 20:57       ` Christoph Lameter
@ 2012-07-28  6:16         ` JoonSoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-07-28  6:16 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: akpm, linux-kernel, linux-mm, Sasha Levin

2012/7/28 Christoph Lameter <cl@linux.com>:
> On Sat, 28 Jul 2012, Joonsoo Kim wrote:
>
>> do_migrate_pages() can return the number of pages not migrated.
>> Because the migrate_pages() syscall returns this value directly,
>> the syscall may return the number of pages not migrated.
>> In the failure case, the migrate_pages() syscall should return an error
>> value, so convert a positive err to -EBUSY
>
> Let's leave this alone. This would change the migrate_pages() semantics,
> because a successful move of N out of M pages would be reported as a total
> failure even though pages were in fact moved.
>

Okay.
Then do we need to fix the man page of the migrate_pages() syscall?
According to the man page, only returning 0 or -1 is valid, but without this
patch the syscall can return a positive value.

^ permalink raw reply	[flat|nested] 265+ messages in thread


* Re: [RESEND PATCH 4/4 v3] mm: fix possible incorrect return value of move_pages() syscall
  2012-07-28  6:09         ` JoonSoo Kim
@ 2012-07-30 19:29           ` Christoph Lameter
  -1 siblings, 0 replies; 265+ messages in thread
From: Christoph Lameter @ 2012-07-30 19:29 UTC (permalink / raw)
  To: JoonSoo Kim
  Cc: akpm, linux-kernel, linux-mm, Brice Goglin, Minchan Kim, Michael Kerrisk

On Sat, 28 Jul 2012, JoonSoo Kim wrote:

> 2012/7/28 Christoph Lameter <cl@linux.com>:
> > On Sat, 28 Jul 2012, Joonsoo Kim wrote:
> >
> >> The move_pages() syscall may return success even when
> >> do_move_page_to_node_array() returns a positive value, which means migration failed for some pages.
> >
> > Nope. It only means that the migration for some pages has failed. This may
> > still be considered successful for the app if it moves 10000 pages and one
> > failed.
> >
> > This patch would break the move_pages() syscall because an error code
> > return from do_move_pages_to_node_array() will cause the status byte for
> > each page move to not be updated anymore. Application will not be able to
> > tell anymore which pages were successfully moved and which are not.
>
> In case of returning non-zero, valid status is not required according
> to man page.

I cannot find a statement like that in the man page. The return code
description is incorrect: it should say that the call returns the number of
pages not moved, otherwise an error code (Michael, please fix the manpage).

> So, this patch would not break the move_pages() syscall.

It changes the way the system call behaves right now.


^ permalink raw reply	[flat|nested] 265+ messages in thread


* Re: [RESEND PATCH 2/4 v3] mm: fix possible incorrect return value of migrate_pages() syscall
  2012-07-28  6:16         ` JoonSoo Kim
@ 2012-07-30 19:30           ` Christoph Lameter
  -1 siblings, 0 replies; 265+ messages in thread
From: Christoph Lameter @ 2012-07-30 19:30 UTC (permalink / raw)
  To: JoonSoo Kim; +Cc: akpm, linux-kernel, linux-mm, Sasha Levin, Michael Kerrisk

On Sat, 28 Jul 2012, JoonSoo Kim wrote:

> 2012/7/28 Christoph Lameter <cl@linux.com>:
> > On Sat, 28 Jul 2012, Joonsoo Kim wrote:
> >
> >> do_migrate_pages() can return the number of pages not migrated.
> >> Because the migrate_pages() syscall returns this value directly,
> >> the syscall may return the number of pages not migrated.
> >> In the failure case, the migrate_pages() syscall should return an error
> >> value, so convert a positive err to -EBUSY
> >
> > Let's leave this alone. This would change the migrate_pages() semantics,
> > because a successful move of N out of M pages would be reported as a total
> > failure even though pages were in fact moved.
> >
>
> Okay.
> Then do we need to fix the man page of the migrate_pages() syscall?
> According to the man page, only returning 0 or -1 is valid, but without this
> patch the syscall can return a positive value.

Yes the manpage needs updating to say that it can return the number of
pages not migrated.


^ permalink raw reply	[flat|nested] 265+ messages in thread


* Re: [RESEND PATCH 4/4 v3] mm: fix possible incorrect return value of move_pages() syscall
  2012-07-30 19:29           ` Christoph Lameter
@ 2012-07-31  3:34             ` JoonSoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-07-31  3:34 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: akpm, linux-kernel, linux-mm, Brice Goglin, Minchan Kim, Michael Kerrisk

2012/7/31 Christoph Lameter <cl@linux.com>:
> On Sat, 28 Jul 2012, JoonSoo Kim wrote:
>
>> 2012/7/28 Christoph Lameter <cl@linux.com>:
>> > On Sat, 28 Jul 2012, Joonsoo Kim wrote:
>> >
>> >> The move_pages() syscall may return success even when
>> >> do_move_page_to_node_array() returns a positive value, which means migration failed for some pages.
>> >
>> > Nope. It only means that the migration for some pages has failed. This may
>> > still be considered successful for the app if it moves 10000 pages and one
>> > failed.
>> >
>> > This patch would break the move_pages() syscall because an error code
>> > return from do_move_pages_to_node_array() will cause the status byte for
>> > each page move to not be updated anymore. Application will not be able to
>> > tell anymore which pages were successfully moved and which are not.
>>
>> In case of returning non-zero, valid status is not required according
>> to man page.
>
> I cannot find a statement like that in the man page. The return code
> description is incorrect: it should say that the call returns the number of
> pages not moved, otherwise an error code (Michael, please fix the manpage).

In the man page, there is the following statement:
"status is an array of integers that return the status of each page.  The array
only contains valid values if move_pages() did not return an error."

And the current implementation of the move_pages() syscall doesn't return the
number of pages not moved; it just returns 0 when it encounters some failed
pages. So, if you want to fix the man page, you should fix do_pages_move() first.
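
A compressed, stand-alone analogue of the do_pages_move() flow described
above (hypothetical chunk results, not the mm/migrate.c source):

    #include <stdio.h>

    int main(void)
    {
            int chunk_result[3] = { 0, 2, 0 };  /* chunk 1: 2 pages failed */
            int err = 0;
            int i;

            for (i = 0; i < 3; i++) {
                    err = chunk_result[i];
                    if (err < 0)
                            break;      /* only negative errors escape */
            }
            if (err >= 0)
                    err = 0;            /* positive counts are discarded */
            printf("syscall returns %d\n", err);    /* prints 0 */
            return 0;
    }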

^ permalink raw reply	[flat|nested] 265+ messages in thread


* Re: [RESEND PATCH 4/4 v3] mm: fix possible incorrect return value of move_pages() syscall
  2012-07-31  3:34             ` JoonSoo Kim
@ 2012-07-31 14:04               ` Christoph Lameter
  -1 siblings, 0 replies; 265+ messages in thread
From: Christoph Lameter @ 2012-07-31 14:04 UTC (permalink / raw)
  To: JoonSoo Kim
  Cc: akpm, linux-kernel, linux-mm, Brice Goglin, Minchan Kim, Michael Kerrisk

On Tue, 31 Jul 2012, JoonSoo Kim wrote:

> In the man page, there is the following statement:
> "status is an array of integers that return the status of each page.  The array
> only contains valid values if move_pages() did not return an error."

> And the current implementation of the move_pages() syscall doesn't return the
> number of pages not moved; it just returns 0 when it encounters some failed
> pages. So, if you want to fix the man page, you should fix do_pages_move() first.

Hmm... yeah, actually that is sufficient, since the status is readily
obtainable from the status array. It would be better, though, if the
function returned the number of pages not moved, in the same way as
migrate_pages().




^ permalink raw reply	[flat|nested] 265+ messages in thread


* Re: [RESEND PATCH 4/4 v3] mm: fix possible incorrect return value of move_pages() syscall
  2012-07-30 19:29           ` Christoph Lameter
  (?)
  (?)
@ 2012-08-01  5:15           ` Michael Kerrisk
  2012-08-01 18:00               ` Christoph Lameter
  -1 siblings, 1 reply; 265+ messages in thread
From: Michael Kerrisk @ 2012-08-01  5:15 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: JoonSoo Kim, akpm, linux-kernel, linux-mm, Brice Goglin, Minchan Kim

[-- Attachment #1: Type: text/plain, Size: 2059 bytes --]

On Mon, Jul 30, 2012 at 9:29 PM, Christoph Lameter <cl@linux.com> wrote:
> On Sat, 28 Jul 2012, JoonSoo Kim wrote:
>
>> 2012/7/28 Christoph Lameter <cl@linux.com>:
>> > On Sat, 28 Jul 2012, Joonsoo Kim wrote:
>> >
>> >> The move_pages() syscall may return success even when
>> >> do_move_page_to_node_array() returns a positive value, which means migration failed for some pages.
>> >
>> > Nope. It only means that the migration for some pages has failed. This may
>> > still be considered successful for the app if it moves 10000 pages and one
>> > failed.
>> >
>> > This patch would break the move_pages() syscall because an error code
>> > return from do_move_pages_to_node_array() will cause the status byte for
>> > each page move to not be updated anymore. Application will not be able to
>> > tell anymore which pages were successfully moved and which are not.
>>
>> In case of returning non-zero, valid status is not required according
>> to man page.
>
> I cannot find a statement like that in the man page. The return code
> description is incorrect: it should say that the call returns the number of
> pages not moved, otherwise an error code (Michael, please fix the manpage).

Hi Christoph,

Is the patch below acceptable? (I've attached the complete page as well.)

See you in San Diego (?),

Michael

--- a/man2/migrate_pages.2
+++ b/man2/migrate_pages.2
@@ -29,7 +29,7 @@ migrate_pages \- move all pages in a process to
another set of nodes
 Link with \fI\-lnuma\fP.
 .SH DESCRIPTION
 .BR migrate_pages ()
-moves all pages of the process
+attempts to move all pages of the process
 .I pid
 that are in memory nodes
 .I old_nodes
@@ -87,7 +87,8 @@ privilege.
 .SH "RETURN VALUE"
 On success
 .BR migrate_pages ()
-returns zero.
+returns the number of pages that could not be moved
+(i.e., a return of zero means that all pages were successfully moved).
 On error, it returns \-1, and sets
 .I errno
 to indicate the error.

-- 
Michael Kerrisk Linux man-pages maintainer;
http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/

[-- Attachment #2: migrate_pages.2 --]
[-- Type: application/octet-stream, Size: 4084 bytes --]

.\" Hey Emacs! This file is -*- nroff -*- source.
.\"
.\" Copyright 2009 Intel Corporation
.\"                Author: Andi Kleen
.\" Based on the move_pages manpage which was
.\" This manpage is Copyright (C) 2006 Silicon Graphics, Inc.
.\"                               Christoph Lameter
.\"
.\" Permission is granted to make and distribute verbatim copies of this
.\" manual provided the copyright notice and this permission notice are
.\" preserved on all copies.
.\"
.\" Permission is granted to copy and distribute modified versions of this
.\" manual under the conditions for verbatim copying, provided that the
.\" entire resulting derived work is distributed under the terms of a
.\" permission notice identical to this one.
.TH MIGRATE_PAGES 2 2012-08-01 "Linux" "Linux Programmer's Manual"
.SH NAME
migrate_pages \- move all pages in a process to another set of nodes
.SH SYNOPSIS
.nf
.B #include <numaif.h>
.sp
.BI "long migrate_pages(int " pid ", unsigned long " maxnode,
.BI "                   const unsigned long *" old_nodes,
.BI "                   const unsigned long *" new_nodes);
.fi
.sp
Link with \fI\-lnuma\fP.
.SH DESCRIPTION
.BR migrate_pages ()
attempts to move all pages of the process
.I pid
that are in memory nodes
.I old_nodes
to the memory nodes in
.IR new_nodes .
Pages not located in any node in
.I old_nodes
will not be migrated.
As far as possible,
the kernel maintains the relative topology relationship inside
.I old_nodes
during the migration to
.IR new_nodes .

The
.I old_nodes
and
.I new_nodes
arguments are pointers to bit masks of node numbers, with up to
.I maxnode
bits in each mask.
These masks are maintained as arrays of unsigned
.I long
integers (in the last
.I long
integer, the bits beyond those specified by
.I maxnode
are ignored).
The
.I maxnode
argument is the maximum node number in the bit mask plus one (this is the same
as in
.BR mbind (2),
but different from
.BR select (2)).

The
.I pid
argument is the ID of the process whose pages are to be moved.
To move pages in another process,
the caller must be privileged
.RB ( CAP_SYS_NICE )
or the real or effective user ID of the calling process must match the
real or saved-set user ID of the target process.
If
.I pid
is 0, then
.BR migrate_pages ()
moves pages of the calling process.

Pages shared with another process will only be moved if the initiating
process has the
.B CAP_SYS_NICE
privilege.
.SH "RETURN VALUE"
On success
.BR migrate_pages ()
returns the number of pages that could not be moved
(i.e., a return of zero means that all pages were successfully moved).
On error, it returns \-1, and sets
.I errno
to indicate the error.
.SH ERRORS
.TP
.B EPERM
Insufficient privilege
.RB ( CAP_SYS_NICE )
to move pages of the process specified by
.IR pid ,
or insufficient privilege
.RB ( CAP_SYS_NICE )
to access the specified target nodes.
.TP
.B ESRCH
No process matching
.I pid
could be found.
.\" FIXME There are other errors
.SH VERSIONS
The
.BR migrate_pages ()
system call first appeared on Linux in version 2.6.16.
.SH CONFORMING TO
This system call is Linux-specific.
.SH "NOTES"
For information on library support, see
.BR numa (7).

Use
.BR get_mempolicy (2)
with the
.B MPOL_F_MEMS_ALLOWED
flag to obtain the set of nodes that are allowed by
the calling process's cpuset.
Note that this information is subject to change at any
time by manual or automatic reconfiguration of the cpuset.

Use of
.BR migrate_pages ()
may result in pages whose location
(node) violates the memory policy established for the
specified addresses (see
.BR mbind (2))
and/or the specified process (see
.BR set_mempolicy (2)).
That is, memory policy does not constrain the destination
nodes used by
.BR migrate_pages ().

The
.I <numaif.h>
header is not included with glibc, but requires installing
.I libnuma-devel
or a similar package.
.SH "SEE ALSO"
.BR get_mempolicy (2),
.BR mbind (2),
.BR set_mempolicy (2),
.BR numa (3),
.BR numa_maps (5),
.BR cpuset (7),
.BR numa (7),
.BR migratepages (8),
.BR numa_stat (8);
.br
the kernel source file
.IR Documentation/vm/page_migration .

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [RESEND PATCH 4/4 v3] mm: fix possible incorrect return value of move_pages() syscall
  2012-08-01  5:15           ` Michael Kerrisk
@ 2012-08-01 18:00               ` Christoph Lameter
  0 siblings, 0 replies; 265+ messages in thread
From: Christoph Lameter @ 2012-08-01 18:00 UTC (permalink / raw)
  To: Michael Kerrisk
  Cc: JoonSoo Kim, akpm, linux-kernel, linux-mm, Brice Goglin, Minchan Kim

On Wed, 1 Aug 2012, Michael Kerrisk wrote:

> Is the patch below acceptable? (I've attached the complete page as well.)

Yes looks good.

> See you in San Diego (?),

Yup. I will be there too.



^ permalink raw reply	[flat|nested] 265+ messages in thread


* Re: [RESEND PATCH 4/4 v3] mm: fix possible incorrect return value of move_pages() syscall
  2012-08-01 18:00               ` Christoph Lameter
@ 2012-08-02  5:52                 ` Michael Kerrisk
  -1 siblings, 0 replies; 265+ messages in thread
From: Michael Kerrisk @ 2012-08-02  5:52 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: JoonSoo Kim, akpm, linux-kernel, linux-mm, Brice Goglin, Minchan Kim

On Wed, Aug 1, 2012 at 8:00 PM, Christoph Lameter <cl@linux.com> wrote:
> On Wed, 1 Aug 2012, Michael Kerrisk wrote:
>
>> Is the patch below acceptable? (I've attached the complete page as well.)
>
> Yes looks good.

Thanks for checking it!

>> See you in San Diego (?),
>
> Yup. I will be there too.

See you then!

Cheers,

Michael

-- 
Michael Kerrisk Linux man-pages maintainer;
http://www.kernel.org/doc/man-pages/
Author of "The Linux Programming Interface", http://blog.man7.org/

^ permalink raw reply	[flat|nested] 265+ messages in thread


* [PATCH 1/5] workqueue: use enum value to set array size of pools in gcwq
       [not found] <Yes>
                   ` (13 preceding siblings ...)
  2012-07-27 17:55   ` Joonsoo Kim
@ 2012-08-13 16:17 ` Joonsoo Kim
  2012-08-13 16:17   ` [PATCH 2/5] workqueue: change value of lcpu in queue_delayed_work_on() Joonsoo Kim
                     ` (3 more replies)
  2012-08-14 18:10 ` [PATCH v2 0/6] system_highpri_wq Joonsoo Kim
                   ` (8 subsequent siblings)
  23 siblings, 4 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-08-13 16:17 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Joonsoo Kim

Commit 3270476a6c0ce322354df8679652f060d66526dc ('workqueue: reimplement
WQ_HIGHPRI using a separate worker_pool') introduced a separate worker_pool
for HIGHPRI. Although there is an NR_WORKER_POOLS enum value representing
the number of pools, the definition of the pools array in gcwq doesn't use
it. Using it makes the code more robust and prevents future mistakes, so
change the code to use the enum value.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 692d976..188eef8 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -183,7 +183,8 @@ struct global_cwq {
 	struct hlist_head	busy_hash[BUSY_WORKER_HASH_SIZE];
 						/* L: hash of busy workers */
 
-	struct worker_pool	pools[2];	/* normal and highpri pools */
+	struct worker_pool	pools[NR_WORKER_POOLS];
+						/* normal and highpri pools */
 
 	wait_queue_head_t	rebind_hold;	/* rebind hold wait */
 } ____cacheline_aligned_in_smp;
-- 
1.7.9.5
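
The idiom is trivial but worth seeing in isolation (a stand-alone sketch, not
the workqueue source, with hypothetical struct names): sizing the array with
the enum keeps the array and every loop over it in sync if a pool is ever
added.

    #include <stdio.h>

    enum { NORMAL_POOL, HIGHPRI_POOL, NR_WORKER_POOLS };

    struct worker_pool { int nr_workers; };

    struct gcwq_sketch {
            struct worker_pool pools[NR_WORKER_POOLS];  /* not pools[2] */
    };

    int main(void)
    {
            struct gcwq_sketch g = { { { 0 } } };
            int i;

            for (i = 0; i < NR_WORKER_POOLS; i++)   /* same bound everywhere */
                    g.pools[i].nr_workers = 0;
            printf("%d pools\n", NR_WORKER_POOLS);
            return 0;
    }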


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH 2/5] workqueue: change value of lcpu in queue_delayed_work_on()
  2012-08-13 16:17 ` [PATCH 1/5] workqueue: use enum value to set array size of pools in gcwq Joonsoo Kim
@ 2012-08-13 16:17   ` Joonsoo Kim
  2012-08-13 16:32     ` Tejun Heo
  2012-08-13 16:17   ` [PATCH 3/5] workqueue: introduce system_highpri_wq Joonsoo Kim
                     ` (2 subsequent siblings)
  3 siblings, 1 reply; 265+ messages in thread
From: Joonsoo Kim @ 2012-08-13 16:17 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Joonsoo Kim

We assign a cpu id to the work struct in queue_delayed_work_on().
In the current implementation, when the work comes in for the first time,
the id of the currently running cpu is assigned.
If we call queue_delayed_work_on() for CPU A while running on CPU B,
__queue_work(), invoked from delayed_work_timer_fn(), takes a sub-optimal
path in the WQ_NON_REENTRANT case.
Change the assignment to use the cpu argument to prevent taking that path.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 188eef8..bc5c5e1 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1158,6 +1158,8 @@ int queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
 
 			if (gcwq && gcwq->cpu != WORK_CPU_UNBOUND)
 				lcpu = gcwq->cpu;
+			else if (cpu >= 0)
+				lcpu = cpu;
 			else
 				lcpu = raw_smp_processor_id();
 		} else
-- 
1.7.9.5
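
For context, the call whose cpu bookkeeping is being fixed is typically used
like this (a hedged kernel-style sketch; the driver names are made up):

    #include <linux/workqueue.h>

    static struct delayed_work my_dwork;            /* hypothetical */

    static void my_handler(struct work_struct *work)
    {
            /* runs on CPU 3's worker one second after arming */
    }

    static void my_arm(void)
    {
            INIT_DELAYED_WORK(&my_dwork, my_handler);
            /* With this patch the work records cpu 3 even when my_arm()
             * runs on another CPU, so the WQ_NON_REENTRANT lookup in
             * __queue_work() stays on the expected path. */
            queue_delayed_work_on(3, system_wq, &my_dwork, HZ);
    }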


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH 3/5] workqueue: introduce system_highpri_wq
  2012-08-13 16:17 ` [PATCH 1/5] workqueue: use enum value to set array size of pools in gcwq Joonsoo Kim
  2012-08-13 16:17   ` [PATCH 2/5] workqueue: change value of lcpu in queue_delayed_work_on() Joonsoo Kim
@ 2012-08-13 16:17   ` Joonsoo Kim
  2012-08-13 16:17   ` [PATCH 4/5] workqueue: use system_highpri_wq for highpri workers in rebind_workers() Joonsoo Kim
  2012-08-13 16:17   ` [PATCH 5/5] workqueue: use system_highpri_wq for unbind_work Joonsoo Kim
  3 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-08-13 16:17 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Joonsoo Kim

Commit 3270476a6c0ce322354df8679652f060d66526dc ('workqueue: reimplement
WQ_HIGHPRI using a separate worker_pool') introduced a separate worker pool
for HIGHPRI. When we handle busy workers for a gcwq, each can be a normal
worker or a highpri worker. But we don't consider this difference in
rebind_workers(); we just use system_wq even for highpri workers. This
creates a mismatch between cwq->pool and worker->pool.

It doesn't cause an error in the current implementation, but it could in
the future. Introduce system_highpri_wq so that the proper cwq can be used
for highpri workers in rebind_workers(). A following patch fixes this
issue properly.
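
As a hedged usage sketch (my_work is a hypothetical work item), the
intended usage is simply:

	queue_work(system_highpri_wq, &my_work);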

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index bc5c5e1..f69f094 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -269,12 +269,14 @@ struct workqueue_struct {
 };
 
 struct workqueue_struct *system_wq __read_mostly;
+struct workqueue_struct *system_highpri_wq __read_mostly;
 struct workqueue_struct *system_long_wq __read_mostly;
 struct workqueue_struct *system_nrt_wq __read_mostly;
 struct workqueue_struct *system_unbound_wq __read_mostly;
 struct workqueue_struct *system_freezable_wq __read_mostly;
 struct workqueue_struct *system_nrt_freezable_wq __read_mostly;
 EXPORT_SYMBOL_GPL(system_wq);
+EXPORT_SYMBOL_GPL(system_highpri_wq);
 EXPORT_SYMBOL_GPL(system_long_wq);
 EXPORT_SYMBOL_GPL(system_nrt_wq);
 EXPORT_SYMBOL_GPL(system_unbound_wq);
@@ -3749,6 +3751,7 @@ static int __init init_workqueues(void)
 	}
 
 	system_wq = alloc_workqueue("events", 0, 0);
+	system_highpri_wq = alloc_workqueue("events_highpri", 0, 0);
 	system_long_wq = alloc_workqueue("events_long", 0, 0);
 	system_nrt_wq = alloc_workqueue("events_nrt", WQ_NON_REENTRANT, 0);
 	system_unbound_wq = alloc_workqueue("events_unbound", WQ_UNBOUND,
@@ -3757,8 +3760,8 @@ static int __init init_workqueues(void)
 					      WQ_FREEZABLE, 0);
 	system_nrt_freezable_wq = alloc_workqueue("events_nrt_freezable",
 			WQ_NON_REENTRANT | WQ_FREEZABLE, 0);
-	BUG_ON(!system_wq || !system_long_wq || !system_nrt_wq ||
-	       !system_unbound_wq || !system_freezable_wq ||
+	BUG_ON(!system_wq || !system_highpri_wq || !system_long_wq ||
+		!system_nrt_wq || !system_unbound_wq || !system_freezable_wq ||
 		!system_nrt_freezable_wq);
 	return 0;
 }
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH 4/5] workqueue: use system_highpri_wq for highpri workers in rebind_workers()
  2012-08-13 16:17 ` [PATCH 1/5] workqueue: use enum value to set array size of pools in gcwq Joonsoo Kim
  2012-08-13 16:17   ` [PATCH 2/5] workqueue: change value of lcpu in queue_delayed_work_on() Joonsoo Kim
  2012-08-13 16:17   ` [PATCH 3/5] workqueue: introduce system_highpri_wq Joonsoo Kim
@ 2012-08-13 16:17   ` Joonsoo Kim
  2012-08-13 16:34     ` Tejun Heo
  2012-08-13 16:17   ` [PATCH 5/5] workqueue: use system_highpri_wq for unbind_work Joonsoo Kim
  3 siblings, 1 reply; 265+ messages in thread
From: Joonsoo Kim @ 2012-08-13 16:17 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Joonsoo Kim

In rebind_workers(), we insert a rebind work item for each busy worker.
Currently, we use only system_wq for this, which opens up a possible
error situation since there is a mismatch between cwq->pool and
worker->pool.

To prevent this, we should use system_highpri_wq for highpri workers so
that the pools match. This patch implements that.
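
As a hedged illustration (not compilable outside workqueue.c), the
invariant this change restores could be written as:

	struct cpu_workqueue_struct *cwq = get_cwq(gcwq->cpu, wq);

	/* holds once wq is chosen to match the worker's priority */
	WARN_ON(cwq->pool != worker->pool);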

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f69f094..e0e1d41 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -489,6 +489,11 @@ static int worker_pool_pri(struct worker_pool *pool)
 	return pool - pool->gcwq->pools;
 }
 
+static struct workqueue_struct *get_pool_system_wq(struct worker_pool *pool)
+{
+	return worker_pool_pri(pool) ? system_highpri_wq : system_wq;
+}
+
 static struct global_cwq *get_gcwq(unsigned int cpu)
 {
 	if (cpu != WORK_CPU_UNBOUND)
@@ -1450,9 +1455,10 @@ retry:
 
 		/* wq doesn't matter, use the default one */
 		debug_work_activate(rebind_work);
-		insert_work(get_cwq(gcwq->cpu, system_wq), rebind_work,
-			    worker->scheduled.next,
-			    work_color_to_flags(WORK_NO_COLOR));
+		insert_work(
+			get_cwq(gcwq->cpu, get_pool_system_wq(worker->pool)),
+			rebind_work, worker->scheduled.next,
+			work_color_to_flags(WORK_NO_COLOR));
 	}
 }
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH 5/5] workqueue: use system_highpri_wq for unbind_work
  2012-08-13 16:17 ` [PATCH 1/5] workqueue: use enum value to set array size of pools in gcwq Joonsoo Kim
                     ` (2 preceding siblings ...)
  2012-08-13 16:17   ` [PATCH 4/5] workqueue: use system_highpri_wq for highpri workers in rebind_workers() Joonsoo Kim
@ 2012-08-13 16:17   ` Joonsoo Kim
  2012-08-13 16:36     ` Tejun Heo
  3 siblings, 1 reply; 265+ messages in thread
From: Joonsoo Kim @ 2012-08-13 16:17 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Joonsoo Kim

To speed up cpu-down processing, use system_highpri_wq.
Since the scheduling priority of its workers is higher than system_wq's
and it is not contended by other normal work items on this cpu, work
queued on it is processed faster than on system_wq.
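
For context, schedule_work_on() is a thin wrapper targeting system_wq,
so the change below is a straight retargeting (sketch, not part of the
patch):

	/* schedule_work_on(cpu, w) == queue_work_on(cpu, system_wq, w) */
	queue_work_on(cpu, system_highpri_wq, &unbind_work);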

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index e0e1d41..5d50955 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3501,7 +3501,7 @@ static int __devinit workqueue_cpu_down_callback(struct notifier_block *nfb,
 	case CPU_DOWN_PREPARE:
 		/* unbinding should happen on the local CPU */
 		INIT_WORK_ONSTACK(&unbind_work, gcwq_unbind_fn);
-		schedule_work_on(cpu, &unbind_work);
+		queue_work_on(cpu, system_highpri_wq, &unbind_work);
 		flush_work(&unbind_work);
 		break;
 	}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* Re: [PATCH 2/5] workqueue: change value of lcpu in queue_delayed_work_on()
  2012-08-13 16:17   ` [PATCH 2/5] workqueue: change value of lcpu in queue_delayed_work_on() Joonsoo Kim
@ 2012-08-13 16:32     ` Tejun Heo
  2012-08-13 16:54       ` JoonSoo Kim
  0 siblings, 1 reply; 265+ messages in thread
From: Tejun Heo @ 2012-08-13 16:32 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: linux-kernel

Hello,

On Tue, Aug 14, 2012 at 01:17:49AM +0900, Joonsoo Kim wrote:
> We assign a cpu id to the work struct in queue_delayed_work_on().
> In the current implementation, when a work item comes in for the first
> time, the currently running cpu id is assigned.
> If we call queue_delayed_work_on() for CPU A while running on CPU B,
> __queue_work() invoked from delayed_work_timer_fn() goes into a
> sub-optimal path in the WQ_NON_REENTRANT case.
> Record the cpu argument instead to avoid the sub-optimal path.

Which part is suboptimal?  Also, what if @cpu is WQ_UNBOUND?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 4/5] workqueue: use system_highpri_wq for highpri workers in rebind_workers()
  2012-08-13 16:17   ` [PATCH 4/5] workqueue: use system_highpri_wq for highpri workers in rebind_workers() Joonsoo Kim
@ 2012-08-13 16:34     ` Tejun Heo
  2012-08-13 16:57       ` JoonSoo Kim
  0 siblings, 1 reply; 265+ messages in thread
From: Tejun Heo @ 2012-08-13 16:34 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: linux-kernel

On Tue, Aug 14, 2012 at 01:17:51AM +0900, Joonsoo Kim wrote:
> In rebind_workers(), we insert a rebind work item for each busy worker.
> Currently, we use only system_wq for this, which opens up a possible
> error situation since there is a mismatch between cwq->pool and
> worker->pool.
> 
> To prevent this, we should use system_highpri_wq for highpri workers so
> that the pools match. This patch implements that.
> 
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index f69f094..e0e1d41 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -489,6 +489,11 @@ static int worker_pool_pri(struct worker_pool *pool)
>  	return pool - pool->gcwq->pools;
>  }
>  
> +static struct workqueue_struct *get_pool_system_wq(struct worker_pool *pool)
> +{
> +	return worker_pool_pri(pool) ? system_highpri_wq : system_wq;
> +}
> +
>  static struct global_cwq *get_gcwq(unsigned int cpu)
>  {
>  	if (cpu != WORK_CPU_UNBOUND)
> @@ -1450,9 +1455,10 @@ retry:
>  
>  		/* wq doesn't matter, use the default one */
>  		debug_work_activate(rebind_work);
> -		insert_work(get_cwq(gcwq->cpu, system_wq), rebind_work,
> -			    worker->scheduled.next,
> -			    work_color_to_flags(WORK_NO_COLOR));
> +		insert_work(
> +			get_cwq(gcwq->cpu, get_pool_system_wq(worker->pool)),
> +			rebind_work, worker->scheduled.next,
> +			work_color_to_flags(WORK_NO_COLOR));

I think it would be better to just opencode system_wq selection in
rebind_workers().

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 5/5] workqueue: use system_highpri_wq for unbind_work
  2012-08-13 16:17   ` [PATCH 5/5] workqueue: use system_highpri_wq for unbind_work Joonsoo Kim
@ 2012-08-13 16:36     ` Tejun Heo
  2012-08-13 17:02       ` JoonSoo Kim
  0 siblings, 1 reply; 265+ messages in thread
From: Tejun Heo @ 2012-08-13 16:36 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: linux-kernel

On Tue, Aug 14, 2012 at 01:17:52AM +0900, Joonsoo Kim wrote:
> To speed up cpu-down processing, use system_highpri_wq.
> Since the scheduling priority of its workers is higher than system_wq's
> and it is not contended by other normal work items on this cpu, work
> queued on it is processed faster than on system_wq.

Is this from an actual workload?  ie. do you have a test case where
this matters?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 2/5] workqueue: change value of lcpu in queue_delayed_work_on()
  2012-08-13 16:32     ` Tejun Heo
@ 2012-08-13 16:54       ` JoonSoo Kim
  2012-08-13 17:03         ` Tejun Heo
  0 siblings, 1 reply; 265+ messages in thread
From: JoonSoo Kim @ 2012-08-13 16:54 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel

2012/8/14 Tejun Heo <tj@kernel.org>:
> Hello,
>
> On Tue, Aug 14, 2012 at 01:17:49AM +0900, Joonsoo Kim wrote:
>> We assign a cpu id to the work struct in queue_delayed_work_on().
>> In the current implementation, when a work item comes in for the first
>> time, the currently running cpu id is assigned.
>> If we call queue_delayed_work_on() for CPU A while running on CPU B,
>> __queue_work() invoked from delayed_work_timer_fn() goes into a
>> sub-optimal path in the WQ_NON_REENTRANT case.
>> Record the cpu argument instead to avoid the sub-optimal path.
>
> Which part is suboptimal?  Also, what if @cpu is WQ_UNBOUND?

Hi.
I'm thinking of the following scenario.

wq = WQ_NON_REENTRANT.
queue_delayed_work_on(CPU B) is invoked on CPU A, so lcpu = CPU A, cpu = CPU B.

In this case, we call add_timer_on(CPU B), then delayed_work_timer_fn()
is invoked on CPU B.
delayed_work_timer_fn() calls __queue_work(), and then the following
comparison returns true!

                gcwq = get_gcwq(cpu);
                if (wq->flags & WQ_NON_REENTRANT &&
                    (last_gcwq = get_work_gcwq(work)) && last_gcwq != gcwq) {

I think that if we assign cpu to lcpu, the above comparison returns
false, saving some overhead.
Am I missing anything?

And do you mean @cpu is WORK_CPU_UNBOUND?

Thanks.

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 4/5] workqueue: use system_highpri_wq for highpri workers in rebind_workers()
  2012-08-13 16:34     ` Tejun Heo
@ 2012-08-13 16:57       ` JoonSoo Kim
  2012-08-13 17:05         ` Tejun Heo
  0 siblings, 1 reply; 265+ messages in thread
From: JoonSoo Kim @ 2012-08-13 16:57 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel

> I think it would be better to just opencode system_wq selection in
> rebind_workers().

Sorry for my poor English.
Could you elaborate on what "opencode system_wq selection" means?

Thanks!

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 5/5] workqueue: use system_highpri_wq for unbind_work
  2012-08-13 16:36     ` Tejun Heo
@ 2012-08-13 17:02       ` JoonSoo Kim
  2012-08-13 17:07         ` Tejun Heo
  0 siblings, 1 reply; 265+ messages in thread
From: JoonSoo Kim @ 2012-08-13 17:02 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel

2012/8/14 Tejun Heo <tj@kernel.org>:
> On Tue, Aug 14, 2012 at 01:17:52AM +0900, Joonsoo Kim wrote:
>> To speed up cpu-down processing, use system_highpri_wq.
>> Since the scheduling priority of its workers is higher than system_wq's
>> and it is not contended by other normal work items on this cpu, work
>> queued on it is processed faster than on system_wq.
>
> Is this from an actual workload?  ie. do you have a test case where
> this matters?

No, it is not from an actual workload.
The idea just came to me while implementing the other patches.
I will run a test for this tomorrow.

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 2/5] workqueue: change value of lcpu in queue_delayed_work_on()
  2012-08-13 16:54       ` JoonSoo Kim
@ 2012-08-13 17:03         ` Tejun Heo
  2012-08-13 17:43           ` JoonSoo Kim
  0 siblings, 1 reply; 265+ messages in thread
From: Tejun Heo @ 2012-08-13 17:03 UTC (permalink / raw)
  To: JoonSoo Kim; +Cc: linux-kernel

Hello,

On Tue, Aug 14, 2012 at 01:54:22AM +0900, JoonSoo Kim wrote:
> wq = WQ_NON_REENTRANT.
> queue_delayed_work_on(CPU B) is invoked on CPU A, so lcpu = CPU A, cpu = CPU B.
> 
> In this case, we call add_timer_on(CPU B), then delayed_work_timer_fn()
> is invoked on CPU B.
> delayed_work_timer_fn() calls __queue_work(), and then the following
> comparison returns true!
> 
>                 gcwq = get_gcwq(cpu);
>                 if (wq->flags & WQ_NON_REENTRANT &&
>                     (last_gcwq = get_work_gcwq(work)) && last_gcwq != gcwq) {
> 
> I think that if we assign cpu to lcpu, the above comparison returns
> false, saving some overhead.
> Am I missing anything?

Nope, that sounds correct to me.

> And do you mean @cpu is WORK_CPU_UNBOUND?

@cpu could be WORK_CPU_UNBOUND at that point.  The timer will be added
to the local CPU but @work->data would be pointing to WORK_CPU_UNBOUND,
again triggering the condition.  Given that @cpu being
WORK_CPU_UNBOUND is far more common than an actual CPU number, the
patch would actually increase spurious nrt lookups.  The right thing
to do is probably setting cpu to raw_smp_processor_id() beforehand.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 4/5] workqueue: use system_highpri_wq for highpri workers in rebind_workers()
  2012-08-13 16:57       ` JoonSoo Kim
@ 2012-08-13 17:05         ` Tejun Heo
  2012-08-13 17:45           ` JoonSoo Kim
  0 siblings, 1 reply; 265+ messages in thread
From: Tejun Heo @ 2012-08-13 17:05 UTC (permalink / raw)
  To: JoonSoo Kim; +Cc: linux-kernel

On Tue, Aug 14, 2012 at 01:57:10AM +0900, JoonSoo Kim wrote:
> > I think it would be better to just opencode system_wq selection in
> > rebind_workers().
> 
> Sorry for my poor English.
> Could you elaborate on what "opencode system_wq selection" means?

Dropping get_pool_system_wq() and putting the logic inside
rebind_workers() directly.  I tend to dislike short helpers which are
used only once.  They tend to obfuscate while not really contributing
to anything else.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 5/5] workqueue: use system_highpri_wq for unbind_work
  2012-08-13 17:02       ` JoonSoo Kim
@ 2012-08-13 17:07         ` Tejun Heo
  2012-08-13 17:52           ` JoonSoo Kim
  0 siblings, 1 reply; 265+ messages in thread
From: Tejun Heo @ 2012-08-13 17:07 UTC (permalink / raw)
  To: JoonSoo Kim; +Cc: linux-kernel

Hello,

On Tue, Aug 14, 2012 at 02:02:31AM +0900, JoonSoo Kim wrote:
> 2012/8/14 Tejun Heo <tj@kernel.org>:
> > On Tue, Aug 14, 2012 at 01:17:52AM +0900, Joonsoo Kim wrote:
> > Is this from an actual workload?  ie. do you have a test case where
> > this matters?
> 
> No, it is not from an actual workload.
> The idea just came to me while implementing the other patches.
> I will run a test for this tomorrow.

I don't think it's a bad idea and guess this wouldn't make much
difference under most workloads.  I was mostly wondering whether this
came from some crazy phone powersave optimization.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 2/5] workqueue: change value of lcpu in queue_delayed_work_on()
  2012-08-13 17:03         ` Tejun Heo
@ 2012-08-13 17:43           ` JoonSoo Kim
  2012-08-13 18:00             ` Tejun Heo
  0 siblings, 1 reply; 265+ messages in thread
From: JoonSoo Kim @ 2012-08-13 17:43 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel

>> And do you mean @cpu is WORK_CPU_UNBOUND?
>
> @cpu could be WORK_CPU_UNBOUND at that point.  The timer will be added
> to the local CPU but @work->data would be pointing to WORK_CPU_UNBOUND,
> again triggering the condition.  Given that @cpu being
> WORK_CPU_UNBOUND is far more common than an actual CPU number, the
> patch would actually increase spurious nrt lookups.  The right thing
> to do is probably setting cpu to raw_smp_processor_id() beforehand.

I got your point.
Thanks for the kind illustration.

The following is an alternative implementation.
I think this is too rare a case to matter in any real workload,
but what do you think?

@@ -1156,7 +1156,9 @@ int queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
                if (!(wq->flags & WQ_UNBOUND)) {
                        struct global_cwq *gcwq = get_work_gcwq(work);

-                       if (gcwq && gcwq->cpu != WORK_CPU_UNBOUND)
+                       if (!gcwq)
+                               lcpu = cpu;
+                       else if (gcwq->cpu != WORK_CPU_UNBOUND)
                                lcpu = gcwq->cpu;
                        else
                                lcpu = raw_smp_processor_id();

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 4/5] workqueue: use system_highpri_wq for highpri workers in rebind_workers()
  2012-08-13 17:05         ` Tejun Heo
@ 2012-08-13 17:45           ` JoonSoo Kim
  0 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-08-13 17:45 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel

2012/8/14 Tejun Heo <tj@kernel.org>:
> On Tue, Aug 14, 2012 at 01:57:10AM +0900, JoonSoo Kim wrote:
>> > I think it would be better to just opencode system_wq selection in
>> > rebind_workers().
>>
>> Sorry for my poor English.
>> Could you elaborate on what "opencode system_wq selection" means?
>
> Dropping get_pool_system_wq() and putting the logic inside
> rebind_workers() directly.  I tend to dislike short helpers which are
> used only once.  They tend to obfuscate while not really contributing
> to anything else.

Okay!
I will send a new version adopting your recommendation tomorrow.

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 5/5] workqueue: use system_highpri_wq for unbind_work
  2012-08-13 17:07         ` Tejun Heo
@ 2012-08-13 17:52           ` JoonSoo Kim
  0 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-08-13 17:52 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel

2012/8/14 Tejun Heo <tj@kernel.org>:
> Hello,
>
> On Tue, Aug 14, 2012 at 02:02:31AM +0900, JoonSoo Kim wrote:
>> 2012/8/14 Tejun Heo <tj@kernel.org>:
>> > On Tue, Aug 14, 2012 at 01:17:52AM +0900, Joonsoo Kim wrote:
>> > Is this from an actual workload?  ie. do you have a test case where
>> > this matters?
>>
>> No, it is not from an actual workload.
>> The idea just came to me while implementing the other patches.
>> I will run a test for this tomorrow.
>
> I don't think it's a bad idea and guess this wouldn't make much
> difference under most workloads.  I was mostly wondering whether this
> came from some crazy phone powersave optimization.

Okay.
It is not from an actual workload,
but I intend to optimize some crazy phone powersave :)

Thanks for the comments.

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 2/5] workqueue: change value of lcpu in queue_delayed_work_on()
  2012-08-13 17:43           ` JoonSoo Kim
@ 2012-08-13 18:00             ` Tejun Heo
  2012-08-14 18:04               ` JoonSoo Kim
  0 siblings, 1 reply; 265+ messages in thread
From: Tejun Heo @ 2012-08-13 18:00 UTC (permalink / raw)
  To: JoonSoo Kim; +Cc: linux-kernel

Hello,

On Tue, Aug 14, 2012 at 02:43:28AM +0900, JoonSoo Kim wrote:
> >> And do you mean @cpu is WORK_CPU_UNBOUND?
> >
> > @cpu could be WORK_CPU_UNBOUND at that point.  The timer will be added
> > to the local CPU but @work->data would be pointing to WORK_CPU_UNBOUND,
> > again triggering the condition.  Given that @cpu being
> > WORK_CPU_UNBOUND is far more common than an actual CPU number, the
> > patch would actually increase spurious nrt lookups.  The right thing
> > to do is probably setting cpu to raw_smp_processor_id() beforehand.
> 
> I got your point.
> Thanks for the kind illustration.
> 
> The following is an alternative implementation.
> I think this is too rare a case to matter in any real workload,
> but what do you think?
> 
> @@ -1156,7 +1156,9 @@ int queue_delayed_work_on(int cpu, struct workqueue_struct *wq,
>                 if (!(wq->flags & WQ_UNBOUND)) {
>                         struct global_cwq *gcwq = get_work_gcwq(work);
> 
> -                       if (gcwq && gcwq->cpu != WORK_CPU_UNBOUND)
> +                       if (!gcwq)
> +                               lcpu = cpu;
> +                       else if (gcwq->cpu != WORK_CPU_UNBOUND)
>                                 lcpu = gcwq->cpu;
>                         else
>                                 lcpu = raw_smp_processor_id();

Why not just do

	if (cpu == WORK_CPU_UNBOUND)
		cpu = raw_smp_processor_id();

	if (!(wq->flags...) {
		...
		if (gcwq && gcwq->cpu != WORK_CPU_UNBOUND)
			lcpu = gcwq->cpu;
		else
			lcpu = cpu;
	}
	...

	add_timer_on(timer, cpu);

Also, can you please base the patches on top of the following git
branch?

  git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-3.7

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 2/5] workqueue: change value of lcpu in queue_delayed_work_on()
  2012-08-13 18:00             ` Tejun Heo
@ 2012-08-14 18:04               ` JoonSoo Kim
  0 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-08-14 18:04 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel

> Why not just do
>
>         if (cpu == WORK_CPU_UNBOUND)
>                 cpu = raw_smp_processor_id();
>
>         if (!(wq->flags...) {
>                 ...
>                 if (gcwq && gcwq->cpu != WORK_CPU_UNBOUND)
>                         lcpu = gcwq->cpu;
>                 else
>                         lcpu = cpu;
>         }
>         ...
>
>         add_timer_on(timer, cpu);

I looked at the code again more deeply and found that dwork->cpu is
used for tracing information.
If we make the above change, the tracing information becomes biased.

> Also, can you please base the patches on top of the following git
> branch?
>
>   git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-3.7

Sure!

^ permalink raw reply	[flat|nested] 265+ messages in thread

* [PATCH v2 0/6] system_highpri_wq
       [not found] <Yes>
                   ` (14 preceding siblings ...)
  2012-08-13 16:17 ` [PATCH 1/5] workqueue: use enum value to set array size of pools in gcwq Joonsoo Kim
@ 2012-08-14 18:10 ` Joonsoo Kim
  2012-08-14 18:10   ` [PATCH v2 1/6] workqueue: use enum value to set array size of pools in gcwq Joonsoo Kim
                     ` (5 more replies)
  2012-08-15 14:25 ` [PATCH v3 0/6] system_highpri_wq Joonsoo Kim
                   ` (7 subsequent siblings)
  23 siblings, 6 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-08-14 18:10 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Joonsoo Kim

Changes from v1
[ALL] Add cover-letter
[ALL] Rebase on git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-3.7

[1/6] No change
[2/6] Newly added in this series.
[3/6] Add changelog to clarify what the patch does,
      Add logic to handle the case where @cpu is WORK_CPU_UNBOUND
[4/6] Fix an awesome mistake: set WQ_HIGHPRI for system_highpri_wq
[5/6] Adopt Tejun's comment about selecting the system wq
[6/6] No change

This patchset introduces system_highpri_wq
in order to use the proper cwq for highpri workers.

The first 3 patches are not related to that purpose;
they just fix assorted issues.
The last 3 patches serve that purpose.

Joonsoo Kim (6):
  workqueue: use enum value to set array size of pools in gcwq
  workqueue: correct req_cpu in trace_workqueue_queue_work()
  workqueue: change value of lcpu in __queue_delayed_work_on()
  workqueue: introduce system_highpri_wq
  workqueue: use system_highpri_wq for highpri workers in
    rebind_workers()
  workqueue: use system_highpri_wq for unbind_work

 kernel/workqueue.c |   34 +++++++++++++++++++++++-----------
 1 file changed, 23 insertions(+), 11 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 265+ messages in thread

* [PATCH v2 1/6] workqueue: use enum value to set array size of pools in gcwq
  2012-08-14 18:10 ` [PATCH v2 0/6] system_highpri_wq Joonsoo Kim
@ 2012-08-14 18:10   ` Joonsoo Kim
  2012-08-14 18:10   ` [PATCH v2 2/6] workqueue: correct req_cpu in trace_workqueue_queue_work() Joonsoo Kim
                     ` (4 subsequent siblings)
  5 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-08-14 18:10 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Joonsoo Kim

Commit 3270476a6c0ce322354df8679652f060d66526dc ('workqueue: reimplement
WQ_HIGHPRI using a separate worker_pool') introduced a separate worker_pool
for HIGHPRI. Although the NR_WORKER_POOLS enum value represents the number
of pools, the definition of the pools array in gcwq doesn't use it. Using
it makes the code more robust and prevents future mistakes, so change the
code to use this enum value.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 4fef952..49d8f4a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -183,7 +183,8 @@ struct global_cwq {
 	struct hlist_head	busy_hash[BUSY_WORKER_HASH_SIZE];
 						/* L: hash of busy workers */
 
-	struct worker_pool	pools[2];	/* normal and highpri pools */
+	struct worker_pool	pools[NR_WORKER_POOLS];
+						/* normal and highpri pools */
 
 	wait_queue_head_t	rebind_hold;	/* rebind hold wait */
 } ____cacheline_aligned_in_smp;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v2 2/6] workqueue: correct req_cpu in trace_workqueue_queue_work()
  2012-08-14 18:10 ` [PATCH v2 0/6] system_highpri_wq Joonsoo Kim
  2012-08-14 18:10   ` [PATCH v2 1/6] workqueue: use enum value to set array size of pools in gcwq Joonsoo Kim
@ 2012-08-14 18:10   ` Joonsoo Kim
  2012-08-14 18:34     ` Tejun Heo
  2012-08-14 18:10   ` [PATCH v2 3/6] workqueue: change value of lcpu in __queue_delayed_work_on() Joonsoo Kim
                     ` (3 subsequent siblings)
  5 siblings, 1 reply; 265+ messages in thread
From: Joonsoo Kim @ 2012-08-14 18:10 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Joonsoo Kim

When we trace workqueue_queue_work(), it records the requested cpu.
But if !(@wq->flags & WQ_UNBOUND) and @cpu is WORK_CPU_UNBOUND, the
requested cpu is replaced by the local cpu before the tracepoint fires.
In the @wq->flags & WQ_UNBOUND case this replacement does not occur,
so it is reasonable to correct it.

Use a temporary local variable to store the local cpu.
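
A hedged sketch of the two cases the tracepoint should distinguish,
assuming a bound (!WQ_UNBOUND) workqueue and a caller running on CPU 0
(wq and w are hypothetical):

	queue_work_on(3, wq, &w);	/* TP records req_cpu = 3, as expected */
	queue_work(wq, &w);		/* req_cpu is WORK_CPU_UNBOUND, but the
					 * old code rewrote it to the local cpu
					 * (here 0) before the TP fired */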

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 49d8f4a..6a17ab0 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1198,6 +1198,7 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 	struct cpu_workqueue_struct *cwq;
 	struct list_head *worklist;
 	unsigned int work_flags;
+	unsigned int lcpu;
 
 	/*
 	 * While a work item is PENDING && off queue, a task trying to
@@ -1219,7 +1220,9 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 		struct global_cwq *last_gcwq;
 
 		if (cpu == WORK_CPU_UNBOUND)
-			cpu = raw_smp_processor_id();
+			lcpu = raw_smp_processor_id();
+		else
+			lcpu = cpu;
 
 		/*
 		 * It's multi cpu.  If @wq is non-reentrant and @work
@@ -1227,7 +1230,7 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 		 * be running there, in which case the work needs to
 		 * be queued on that cpu to guarantee non-reentrance.
 		 */
-		gcwq = get_gcwq(cpu);
+		gcwq = get_gcwq(lcpu);
 		if (wq->flags & WQ_NON_REENTRANT &&
 		    (last_gcwq = get_work_gcwq(work)) && last_gcwq != gcwq) {
 			struct worker *worker;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v2 3/6] workqueue: change value of lcpu in __queue_delayed_work_on()
  2012-08-14 18:10 ` [PATCH v2 0/6] system_highpri_wq Joonsoo Kim
  2012-08-14 18:10   ` [PATCH v2 1/6] workqueue: use enum value to set array size of pools in gcwq Joonsoo Kim
  2012-08-14 18:10   ` [PATCH v2 2/6] workqueue: correct req_cpu in trace_workqueue_queue_work() Joonsoo Kim
@ 2012-08-14 18:10   ` Joonsoo Kim
  2012-08-14 19:00     ` Tejun Heo
  2012-08-14 18:10   ` [PATCH v2 4/6] workqueue: introduce system_highpri_wq Joonsoo Kim
                     ` (2 subsequent siblings)
  5 siblings, 1 reply; 265+ messages in thread
From: Joonsoo Kim @ 2012-08-14 18:10 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Joonsoo Kim

We assign a cpu id to the work struct's data field in __queue_delayed_work_on().
In the current implementation, when a work item comes in for the first
time, the currently running cpu id is assigned.
If we call __queue_delayed_work_on() for CPU A while running on CPU B,
__queue_work() invoked from delayed_work_timer_fn() goes into
the following sub-optimal path in the WQ_NON_REENTRANT case.

	gcwq = get_gcwq(cpu);
	if (wq->flags & WQ_NON_REENTRANT &&
		(last_gcwq = get_work_gcwq(work)) && last_gcwq != gcwq) {

Set lcpu to @cpu, then change lcpu to the local cpu if it is
WORK_CPU_UNBOUND. This is sufficient to avoid the sub-optimal path.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 6a17ab0..f55ac26 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1358,9 +1358,10 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
 	if (!(wq->flags & WQ_UNBOUND)) {
 		struct global_cwq *gcwq = get_work_gcwq(work);
 
-		if (gcwq && gcwq->cpu != WORK_CPU_UNBOUND)
+		lcpu = cpu;
+		if (gcwq)
 			lcpu = gcwq->cpu;
-		else
+		if (lcpu == WORK_CPU_UNBOUND)
 			lcpu = raw_smp_processor_id();
 	} else {
 		lcpu = WORK_CPU_UNBOUND;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v2 4/6] workqueue: introduce system_highpri_wq
  2012-08-14 18:10 ` [PATCH v2 0/6] system_highpri_wq Joonsoo Kim
                     ` (2 preceding siblings ...)
  2012-08-14 18:10   ` [PATCH v2 3/6] workqueue: change value of lcpu in __queue_delayed_work_on() Joonsoo Kim
@ 2012-08-14 18:10   ` Joonsoo Kim
  2012-08-14 18:10   ` [PATCH v2 5/6] workqueue: use system_highpri_wq for highpri workers in rebind_workers() Joonsoo Kim
  2012-08-14 18:10   ` [PATCH v2 6/6] workqueue: use system_highpri_wq for unbind_work Joonsoo Kim
  5 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-08-14 18:10 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Joonsoo Kim

Commit 3270476a6c0ce322354df8679652f060d66526dc ('workqueue: reimplement
WQ_HIGHPRI using a separate worker_pool') introduced a separate worker pool
for HIGHPRI. When we handle busy workers for a gcwq, each can be a normal
worker or a highpri worker. But we don't consider this difference in
rebind_workers(); we just use system_wq even for highpri workers. This
creates a mismatch between cwq->pool and worker->pool.

It doesn't cause an error in the current implementation, but it could in
the future. Introduce system_highpri_wq so that the proper cwq can be used
for highpri workers in rebind_workers(). A following patch fixes this
issue properly.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index f55ac26..470b0eb 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -269,12 +269,14 @@ struct workqueue_struct {
 };
 
 struct workqueue_struct *system_wq __read_mostly;
+struct workqueue_struct *system_highpri_wq __read_mostly;
 struct workqueue_struct *system_long_wq __read_mostly;
 struct workqueue_struct *system_nrt_wq __read_mostly;
 struct workqueue_struct *system_unbound_wq __read_mostly;
 struct workqueue_struct *system_freezable_wq __read_mostly;
 struct workqueue_struct *system_nrt_freezable_wq __read_mostly;
 EXPORT_SYMBOL_GPL(system_wq);
+EXPORT_SYMBOL_GPL(system_highpri_wq);
 EXPORT_SYMBOL_GPL(system_long_wq);
 EXPORT_SYMBOL_GPL(system_nrt_wq);
 EXPORT_SYMBOL_GPL(system_unbound_wq);
@@ -3925,6 +3927,7 @@ static int __init init_workqueues(void)
 	}
 
 	system_wq = alloc_workqueue("events", 0, 0);
+	system_highpri_wq = alloc_workqueue("events_highpri", WQ_HIGHPRI, 0);
 	system_long_wq = alloc_workqueue("events_long", 0, 0);
 	system_nrt_wq = alloc_workqueue("events_nrt", WQ_NON_REENTRANT, 0);
 	system_unbound_wq = alloc_workqueue("events_unbound", WQ_UNBOUND,
@@ -3933,8 +3936,8 @@ static int __init init_workqueues(void)
 					      WQ_FREEZABLE, 0);
 	system_nrt_freezable_wq = alloc_workqueue("events_nrt_freezable",
 			WQ_NON_REENTRANT | WQ_FREEZABLE, 0);
-	BUG_ON(!system_wq || !system_long_wq || !system_nrt_wq ||
-	       !system_unbound_wq || !system_freezable_wq ||
+	BUG_ON(!system_wq || !system_highpri_wq || !system_long_wq ||
+		!system_nrt_wq || !system_unbound_wq || !system_freezable_wq ||
 		!system_nrt_freezable_wq);
 	return 0;
 }
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v2 5/6] workqueue: use system_highpri_wq for highpri workers in rebind_workers()
  2012-08-14 18:10 ` [PATCH v2 0/6] system_highpri_wq Joonsoo Kim
                     ` (3 preceding siblings ...)
  2012-08-14 18:10   ` [PATCH v2 4/6] workqueue: introduce system_highpri_wq Joonsoo Kim
@ 2012-08-14 18:10   ` Joonsoo Kim
  2012-08-14 18:29     ` Tejun Heo
  2012-08-14 18:10   ` [PATCH v2 6/6] workqueue: use system_highpri_wq for unbind_work Joonsoo Kim
  5 siblings, 1 reply; 265+ messages in thread
From: Joonsoo Kim @ 2012-08-14 18:10 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Joonsoo Kim

In rebind_workers(), we insert a rebind work item for each busy worker.
Currently, we use only system_wq for this, which opens up a possible
error situation since there is a mismatch between cwq->pool and
worker->pool.

To prevent this, we should use system_highpri_wq for highpri workers so
that the pools match. This patch implements that.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 470b0eb..4c5733c1 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1738,6 +1738,7 @@ retry:
 	/* rebind busy workers */
 	for_each_busy_worker(worker, i, pos, gcwq) {
 		struct work_struct *rebind_work = &worker->rebind_work;
+		struct workqueue_struct *wq;
 
 		/* morph UNBOUND to REBIND */
 		worker->flags &= ~WORKER_UNBOUND;
@@ -1749,9 +1750,12 @@ retry:
 
 		/* wq doesn't matter, use the default one */
 		debug_work_activate(rebind_work);
-		insert_work(get_cwq(gcwq->cpu, system_wq), rebind_work,
-			    worker->scheduled.next,
-			    work_color_to_flags(WORK_NO_COLOR));
+		wq = worker_pool_pri(worker->pool) ? system_highpri_wq :
+								system_wq;
+		insert_work(
+			get_cwq(gcwq->cpu, wq),
+			rebind_work, worker->scheduled.next,
+			work_color_to_flags(WORK_NO_COLOR));
 	}
 }
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v2 6/6] workqueue: use system_highpri_wq for unbind_work
  2012-08-14 18:10 ` [PATCH v2 0/6] system_highpri_wq Joonsoo Kim
                     ` (4 preceding siblings ...)
  2012-08-14 18:10   ` [PATCH v2 5/6] workqueue: use system_highpri_wq for highpri workers in rebind_workers() Joonsoo Kim
@ 2012-08-14 18:10   ` Joonsoo Kim
  5 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-08-14 18:10 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Joonsoo Kim

To speed up cpu-down processing, use system_highpri_wq.
Since the scheduling priority of its workers is higher than system_wq's
and it is not contended by other normal work items on this cpu, work
queued on it is processed faster than on system_wq.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 4c5733c1..fa5ec4c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3671,7 +3671,7 @@ static int __devinit workqueue_cpu_down_callback(struct notifier_block *nfb,
 	case CPU_DOWN_PREPARE:
 		/* unbinding should happen on the local CPU */
 		INIT_WORK_ONSTACK(&unbind_work, gcwq_unbind_fn);
-		schedule_work_on(cpu, &unbind_work);
+		queue_work_on(cpu, system_highpri_wq, &unbind_work);
 		flush_work(&unbind_work);
 		break;
 	}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* Re: [PATCH v2 5/6] workqueue: use system_highpri_wq for highpri workers in rebind_workers()
  2012-08-14 18:10   ` [PATCH v2 5/6] workqueue: use system_highpri_wq for highpri workers in rebind_workers() Joonsoo Kim
@ 2012-08-14 18:29     ` Tejun Heo
  0 siblings, 0 replies; 265+ messages in thread
From: Tejun Heo @ 2012-08-14 18:29 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: linux-kernel

On Wed, Aug 15, 2012 at 03:10:15AM +0900, Joonsoo Kim wrote:
> In rebind_workers(), we insert a rebind work item for each busy worker.
> Currently, we use only system_wq for this, which opens up a possible
> error situation since there is a mismatch between cwq->pool and
> worker->pool.
> 
> To prevent this, we should use system_highpri_wq for highpri workers so
> that the pools match. This patch implements that.
> 
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 470b0eb..4c5733c1 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1738,6 +1738,7 @@ retry:
>  	/* rebind busy workers */
>  	for_each_busy_worker(worker, i, pos, gcwq) {
>  		struct work_struct *rebind_work = &worker->rebind_work;
> +		struct workqueue_struct *wq;
>  
>  		/* morph UNBOUND to REBIND */
>  		worker->flags &= ~WORKER_UNBOUND;
> @@ -1749,9 +1750,12 @@ retry:
>  
>  		/* wq doesn't matter, use the default one */
>  		debug_work_activate(rebind_work);
> -		insert_work(get_cwq(gcwq->cpu, system_wq), rebind_work,
> -			    worker->scheduled.next,
> -			    work_color_to_flags(WORK_NO_COLOR));
> +		wq = worker_pool_pri(worker->pool) ? system_highpri_wq :
> +								system_wq;
> +		insert_work(
> +			get_cwq(gcwq->cpu, wq),
> +			rebind_work, worker->scheduled.next,
> +			work_color_to_flags(WORK_NO_COLOR));
				
Umm... this indentation is ugly.  Please follow the indentation of the
surrounding code.  If ?: gets too long, using if/else might be better.
Please also comment why the above is necessary.  The comment above
says "wq doesn't matter" and then we're choosing workqueue, which is
confusing.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v2 2/6] workqueue: correct req_cpu in trace_workqueue_queue_work()
  2012-08-14 18:10   ` [PATCH v2 2/6] workqueue: correct req_cpu in trace_workqueue_queue_work() Joonsoo Kim
@ 2012-08-14 18:34     ` Tejun Heo
  2012-08-14 18:56       ` JoonSoo Kim
  0 siblings, 1 reply; 265+ messages in thread
From: Tejun Heo @ 2012-08-14 18:34 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: linux-kernel

Hello,

On Wed, Aug 15, 2012 at 03:10:12AM +0900, Joonsoo Kim wrote:
> When we trace workqueue_queue_work(), it records the requested cpu.
> But if !(@wq->flags & WQ_UNBOUND) and @cpu is WORK_CPU_UNBOUND, the
> requested cpu is replaced by the local cpu before the tracepoint fires.
> In the @wq->flags & WQ_UNBOUND case this replacement does not occur,
> so it is reasonable to correct it.
>
> Use a temporary local variable to store the local cpu.
> 
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 49d8f4a..6a17ab0 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1198,6 +1198,7 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
>  	struct cpu_workqueue_struct *cwq;
>  	struct list_head *worklist;
>  	unsigned int work_flags;
> +	unsigned int lcpu;

@lcpu in __queue_delayed_work() stands for "last cpu", which is kinda
weird here.  Maybe just add "unsigned int req_cpu = cpu" at the top of
the function and use it for the TP?

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v2 2/6] workqueue: correct req_cpu in trace_workqueue_queue_work()
  2012-08-14 18:34     ` Tejun Heo
@ 2012-08-14 18:56       ` JoonSoo Kim
  0 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-08-14 18:56 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel

2012/8/15 Tejun Heo <tj@kernel.org>:
> Hello,
>
> On Wed, Aug 15, 2012 at 03:10:12AM +0900, Joonsoo Kim wrote:
>> When we trace workqueue_queue_work(), it records the requested cpu.
>> But if !(@wq->flags & WQ_UNBOUND) and @cpu is WORK_CPU_UNBOUND, the
>> requested cpu is replaced by the local cpu before the tracepoint fires.
>> In the @wq->flags & WQ_UNBOUND case this replacement does not occur,
>> so it is reasonable to correct it.
>>
>> Use a temporary local variable to store the local cpu.
>>
>> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>>
>> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
>> index 49d8f4a..6a17ab0 100644
>> --- a/kernel/workqueue.c
>> +++ b/kernel/workqueue.c
>> @@ -1198,6 +1198,7 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
>>       struct cpu_workqueue_struct *cwq;
>>       struct list_head *worklist;
>>       unsigned int work_flags;
>> +     unsigned int lcpu;
>
> @lcpu in __queue_delayed_work() stands for "last cpu", which is kinda
> weird here.  Maybe just add "unsigned int req_cpu = cpu" at the top of
> the function and use it for the TP?

Okay! That looks better.
I will re-send v3 of the two patches you commented on.
Thanks!

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v2 3/6] workqueue: change value of lcpu in __queue_delayed_work_on()
  2012-08-14 18:10   ` [PATCH v2 3/6] workqueue: change value of lcpu in __queue_delayed_work_on() Joonsoo Kim
@ 2012-08-14 19:00     ` Tejun Heo
  0 siblings, 0 replies; 265+ messages in thread
From: Tejun Heo @ 2012-08-14 19:00 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: linux-kernel

On Wed, Aug 15, 2012 at 03:10:13AM +0900, Joonsoo Kim wrote:
> We assign a cpu id to the work struct's data field in __queue_delayed_work_on().
> In the current implementation, when a work item comes in for the first
> time, the currently running cpu id is assigned.
> If we call __queue_delayed_work_on() for CPU A while running on CPU B,
> __queue_work() invoked from delayed_work_timer_fn() goes into
> the following sub-optimal path in the WQ_NON_REENTRANT case.
> 
> 	gcwq = get_gcwq(cpu);
> 	if (wq->flags & WQ_NON_REENTRANT &&
> 		(last_gcwq = get_work_gcwq(work)) && last_gcwq != gcwq) {
> 
> Set lcpu to @cpu, then change lcpu to the local cpu if it is
> WORK_CPU_UNBOUND. This is sufficient to avoid the sub-optimal path.
> 
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> 
> diff --git a/kernel/workqueue.c b/kernel/workqueue.c
> index 6a17ab0..f55ac26 100644
> --- a/kernel/workqueue.c
> +++ b/kernel/workqueue.c
> @@ -1358,9 +1358,10 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
>  	if (!(wq->flags & WQ_UNBOUND)) {
>  		struct global_cwq *gcwq = get_work_gcwq(work);
>  
> -		if (gcwq && gcwq->cpu != WORK_CPU_UNBOUND)
> +		lcpu = cpu;
> +		if (gcwq)
>  			lcpu = gcwq->cpu;
> -		else
> +		if (lcpu == WORK_CPU_UNBOUND)
>  			lcpu = raw_smp_processor_id();

Can you please add a comment explaining what's going on?

-- 
tejun

^ permalink raw reply	[flat|nested] 265+ messages in thread

* [PATCH v3 0/6] system_highpri_wq
       [not found] <Yes>
                   ` (15 preceding siblings ...)
  2012-08-14 18:10 ` [PATCH v2 0/6] system_highpri_wq Joonsoo Kim
@ 2012-08-15 14:25 ` Joonsoo Kim
  2012-08-15 14:25   ` [PATCH v3 1/6] workqueue: use enum value to set array size of pools in gcwq Joonsoo Kim
                     ` (6 more replies)
  2012-08-24 16:05   ` Joonsoo Kim
                   ` (6 subsequent siblings)
  23 siblings, 7 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-08-15 14:25 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Joonsoo Kim

Changes from v2
[1/6] No change
[2/6] Change local variable name and use it directly for TP
[3/6] Add a comment.
[4/6] No change
[5/6] Add a comment. Fix ugly indentation.
[6/6] No change

Changes from v1
[ALL] Add cover-letter
[ALL] Rebase on git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-3.7

[1/6] No change
[2/6] Newly added in this series.
[3/6] Add changelog to clarify what the patch does,
      Add logic to handle the case where @cpu is WORK_CPU_UNBOUND
[4/6] Fix an awesome mistake: set WQ_HIGHPRI for system_highpri_wq
[5/6] Adopt Tejun's comment about selecting the system wq
[6/6] No change

This patchset introduces system_highpri_wq
in order to use the proper cwq for highpri workers.

The first 3 patches are not related to that purpose;
they just fix assorted issues.
The last 3 patches serve that purpose.

Joonsoo Kim (6):
  workqueue: use enum value to set array size of pools in gcwq
  workqueue: correct req_cpu in trace_workqueue_queue_work()
  workqueue: change value of lcpu in __queue_delayed_work_on()
  workqueue: introduce system_highpri_wq
  workqueue: use system_highpri_wq for highpri workers in
    rebind_workers()
  workqueue: use system_highpri_wq for unbind_work

 kernel/workqueue.c |   42 +++++++++++++++++++++++++++++++-----------
 1 file changed, 31 insertions(+), 11 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 265+ messages in thread

* [PATCH v3 1/6] workqueue: use enum value to set array size of pools in gcwq
  2012-08-15 14:25 ` [PATCH v3 0/6] system_highpri_wq Joonsoo Kim
@ 2012-08-15 14:25   ` Joonsoo Kim
  2012-08-15 14:25   ` [PATCH v3 2/6] workqueue: correct req_cpu in trace_workqueue_queue_work() Joonsoo Kim
                     ` (5 subsequent siblings)
  6 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-08-15 14:25 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Joonsoo Kim

Commit 3270476a6c0ce322354df8679652f060d66526dc ('workqueue: reimplement
WQ_HIGHPRI using a separate worker_pool') introduced a separate worker_pool
for HIGHPRI. Although the NR_WORKER_POOLS enum value represents the number
of pools, the definition of the pools array in gcwq doesn't use it. Using
it makes the code more robust and prevents future mistakes, so change the
code to use this enum value.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 4fef952..49d8f4a 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -183,7 +183,8 @@ struct global_cwq {
 	struct hlist_head	busy_hash[BUSY_WORKER_HASH_SIZE];
 						/* L: hash of busy workers */
 
-	struct worker_pool	pools[2];	/* normal and highpri pools */
+	struct worker_pool	pools[NR_WORKER_POOLS];
+						/* normal and highpri pools */
 
 	wait_queue_head_t	rebind_hold;	/* rebind hold wait */
 } ____cacheline_aligned_in_smp;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v3 2/6] workqueue: correct req_cpu in trace_workqueue_queue_work()
  2012-08-15 14:25 ` [PATCH v3 0/6] system_highpri_wq Joonsoo Kim
  2012-08-15 14:25   ` [PATCH v3 1/6] workqueue: use enum value to set array size of pools in gcwq Joonsoo Kim
@ 2012-08-15 14:25   ` Joonsoo Kim
  2012-08-15 14:25   ` [PATCH v3 3/6] workqueue: change value of lcpu in __queue_delayed_work_on() Joonsoo Kim
                     ` (4 subsequent siblings)
  6 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-08-15 14:25 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Joonsoo Kim

When we trace workqueue_queue_work(), it records the requested cpu.
But if !(@wq->flags & WQ_UNBOUND) and @cpu is WORK_CPU_UNBOUND, the
requested cpu is replaced by the local cpu before the tracepoint fires.
In the @wq->flags & WQ_UNBOUND case this replacement does not occur,
so it is reasonable to correct it.

Use a temporary local variable to store the requested cpu.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 49d8f4a..c29f2dc 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1198,6 +1198,7 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 	struct cpu_workqueue_struct *cwq;
 	struct list_head *worklist;
 	unsigned int work_flags;
+	unsigned int req_cpu = cpu;
 
 	/*
 	 * While a work item is PENDING && off queue, a task trying to
@@ -1253,7 +1254,7 @@ static void __queue_work(unsigned int cpu, struct workqueue_struct *wq,
 
 	/* gcwq determined, get cwq and queue */
 	cwq = get_cwq(gcwq->cpu, wq);
-	trace_workqueue_queue_work(cpu, cwq, work);
+	trace_workqueue_queue_work(req_cpu, cwq, work);
 
 	if (WARN_ON(!list_empty(&work->entry))) {
 		spin_unlock(&gcwq->lock);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v3 3/6] workqueue: change value of lcpu in __queue_delayed_work_on()
  2012-08-15 14:25 ` [PATCH v3 0/6] system_highpri_wq Joonsoo Kim
  2012-08-15 14:25   ` [PATCH v3 1/6] workqueue: use enum value to set array size of pools in gcwq Joonsoo Kim
  2012-08-15 14:25   ` [PATCH v3 2/6] workqueue: correct req_cpu in trace_workqueue_queue_work() Joonsoo Kim
@ 2012-08-15 14:25   ` Joonsoo Kim
  2012-08-15 14:25   ` [PATCH v3 4/6] workqueue: introduce system_highpri_wq Joonsoo Kim
                     ` (3 subsequent siblings)
  6 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-08-15 14:25 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Joonsoo Kim

We assign a cpu id to the work struct's data field in __queue_delayed_work_on().
In the current implementation, when a work item comes in for the first
time, the currently running cpu id is assigned.
If we call __queue_delayed_work_on() for CPU A while running on CPU B,
__queue_work() invoked from delayed_work_timer_fn() goes into
the following sub-optimal path in the WQ_NON_REENTRANT case.

	gcwq = get_gcwq(cpu);
	if (wq->flags & WQ_NON_REENTRANT &&
		(last_gcwq = get_work_gcwq(work)) && last_gcwq != gcwq) {

Set lcpu to @cpu, then change lcpu to the local cpu if it is
WORK_CPU_UNBOUND. This is sufficient to avoid the sub-optimal path.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c29f2dc..32c4f79 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1356,9 +1356,16 @@ static void __queue_delayed_work(int cpu, struct workqueue_struct *wq,
 	if (!(wq->flags & WQ_UNBOUND)) {
 		struct global_cwq *gcwq = get_work_gcwq(work);
 
-		if (gcwq && gcwq->cpu != WORK_CPU_UNBOUND)
+		/*
+		 * Record the last gcwq's cpu when it is known so that
+		 * __queue_work() doesn't needlessly trigger reentrance
+		 * detection; otherwise use the requested cpu, falling
+		 * back to the local cpu for WORK_CPU_UNBOUND.
+		 */
+		lcpu = cpu;
+		if (gcwq)
 			lcpu = gcwq->cpu;
-		else
+		if (lcpu == WORK_CPU_UNBOUND)
 			lcpu = raw_smp_processor_id();
 	} else {
 		lcpu = WORK_CPU_UNBOUND;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v3 4/6] workqueue: introduce system_highpri_wq
  2012-08-15 14:25 ` [PATCH v3 0/6] system_highpri_wq Joonsoo Kim
                     ` (2 preceding siblings ...)
  2012-08-15 14:25   ` [PATCH v3 3/6] workqueue: change value of lcpu in __queue_delayed_work_on() Joonsoo Kim
@ 2012-08-15 14:25   ` Joonsoo Kim
  2012-08-15 14:25   ` [PATCH v3 5/6] workqueue: use system_highpri_wq for highpri workers in rebind_workers() Joonsoo Kim
                     ` (2 subsequent siblings)
  6 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-08-15 14:25 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Joonsoo Kim

Commit 3270476a6c0ce322354df8679652f060d66526dc ('workqueue: reimplement
WQ_HIGHPRI using a separate worker_pool') introduced a separate worker pool
for HIGHPRI. When we handle busy workers for a gcwq, each can be a normal
worker or a highpri worker. But we don't consider this difference in
rebind_workers(); we just use system_wq even for highpri workers. This
creates a mismatch between cwq->pool and worker->pool.

It doesn't cause an error in the current implementation, but it could in
the future. Introduce system_highpri_wq so that the proper cwq can be used
for highpri workers in rebind_workers(). A following patch fixes this
issue properly.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 32c4f79..a768ffd 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -269,12 +269,14 @@ struct workqueue_struct {
 };
 
 struct workqueue_struct *system_wq __read_mostly;
+struct workqueue_struct *system_highpri_wq __read_mostly;
 struct workqueue_struct *system_long_wq __read_mostly;
 struct workqueue_struct *system_nrt_wq __read_mostly;
 struct workqueue_struct *system_unbound_wq __read_mostly;
 struct workqueue_struct *system_freezable_wq __read_mostly;
 struct workqueue_struct *system_nrt_freezable_wq __read_mostly;
 EXPORT_SYMBOL_GPL(system_wq);
+EXPORT_SYMBOL_GPL(system_highpri_wq);
 EXPORT_SYMBOL_GPL(system_long_wq);
 EXPORT_SYMBOL_GPL(system_nrt_wq);
 EXPORT_SYMBOL_GPL(system_unbound_wq);
@@ -3929,6 +3931,7 @@ static int __init init_workqueues(void)
 	}
 
 	system_wq = alloc_workqueue("events", 0, 0);
+	system_highpri_wq = alloc_workqueue("events_highpri", WQ_HIGHPRI, 0);
 	system_long_wq = alloc_workqueue("events_long", 0, 0);
 	system_nrt_wq = alloc_workqueue("events_nrt", WQ_NON_REENTRANT, 0);
 	system_unbound_wq = alloc_workqueue("events_unbound", WQ_UNBOUND,
@@ -3937,8 +3940,8 @@ static int __init init_workqueues(void)
 					      WQ_FREEZABLE, 0);
 	system_nrt_freezable_wq = alloc_workqueue("events_nrt_freezable",
 			WQ_NON_REENTRANT | WQ_FREEZABLE, 0);
-	BUG_ON(!system_wq || !system_long_wq || !system_nrt_wq ||
-	       !system_unbound_wq || !system_freezable_wq ||
+	BUG_ON(!system_wq || !system_highpri_wq || !system_long_wq ||
+		!system_nrt_wq || !system_unbound_wq || !system_freezable_wq ||
 		!system_nrt_freezable_wq);
 	return 0;
 }
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v3 5/6] workqueue: use system_highpri_wq for highpri workers in rebind_workers()
  2012-08-15 14:25 ` [PATCH v3 0/6] system_highpri_wq Joonsoo Kim
                     ` (3 preceding siblings ...)
  2012-08-15 14:25   ` [PATCH v3 4/6] workqueue: introduce system_highpri_wq Joonsoo Kim
@ 2012-08-15 14:25   ` Joonsoo Kim
  2012-08-15 14:25   ` [PATCH v3 6/6] workqueue: use system_highpri_wq for unbind_work Joonsoo Kim
  2012-08-16 21:22   ` [PATCH v3 0/6] system_highpri_wq Tejun Heo
  6 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-08-15 14:25 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Joonsoo Kim

In rebind_workers(), we insert a rebind work item for each busy worker.
Currently, we use only system_wq for this, which opens up a possible
error situation since there is a mismatch between cwq->pool and
worker->pool.

To prevent this, we should use system_highpri_wq for highpri workers so
that the pools match. This patch implements that.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index a768ffd..2945734 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1742,6 +1742,7 @@ retry:
 	/* rebind busy workers */
 	for_each_busy_worker(worker, i, pos, gcwq) {
 		struct work_struct *rebind_work = &worker->rebind_work;
+		struct workqueue_struct *wq;
 
 		/* morph UNBOUND to REBIND */
 		worker->flags &= ~WORKER_UNBOUND;
@@ -1751,11 +1752,18 @@ retry:
 				     work_data_bits(rebind_work)))
 			continue;
 
-		/* wq doesn't matter, use the default one */
 		debug_work_activate(rebind_work);
-		insert_work(get_cwq(gcwq->cpu, system_wq), rebind_work,
-			    worker->scheduled.next,
-			    work_color_to_flags(WORK_NO_COLOR));
+		/*
+		 * Workers now come in two priorities, so each must be
+		 * rebound through the wq matching its pool.
+		 */
+		if (worker_pool_pri(worker->pool))
+			wq = system_highpri_wq;
+		else
+			wq = system_wq;
+		insert_work(get_cwq(gcwq->cpu, wq), rebind_work,
+			worker->scheduled.next,
+			work_color_to_flags(WORK_NO_COLOR));
 	}
 }
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v3 6/6] workqueue: use system_highpri_wq for unbind_work
  2012-08-15 14:25 ` [PATCH v3 0/6] system_highpri_wq Joonsoo Kim
                     ` (4 preceding siblings ...)
  2012-08-15 14:25   ` [PATCH v3 5/6] workqueue: use system_highpri_wq for highpri workers in rebind_workers() Joonsoo Kim
@ 2012-08-15 14:25   ` Joonsoo Kim
  2012-08-16 21:22   ` [PATCH v3 0/6] system_highpri_wq Tejun Heo
  6 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-08-15 14:25 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Joonsoo Kim

To speed up cpu down processing, use system_highpri_wq.
As the scheduling priority of its workers is higher than that of
system_wq, and it is not contended by other normal work items on this
cpu, work queued on it is processed faster than on system_wq.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 2945734..e17326e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3679,7 +3679,7 @@ static int __devinit workqueue_cpu_down_callback(struct notifier_block *nfb,
 	case CPU_DOWN_PREPARE:
 		/* unbinding should happen on the local CPU */
 		INIT_WORK_ONSTACK(&unbind_work, gcwq_unbind_fn);
-		schedule_work_on(cpu, &unbind_work);
+		queue_work_on(cpu, system_highpri_wq, &unbind_work);
 		flush_work(&unbind_work);
 		break;
 	}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* Re: [PATCH v3 0/6] system_highpri_wq
  2012-08-15 14:25 ` [PATCH v3 0/6] system_highpri_wq Joonsoo Kim
                     ` (5 preceding siblings ...)
  2012-08-15 14:25   ` [PATCH v3 6/6] workqueue: use system_highpri_wq for unbind_work Joonsoo Kim
@ 2012-08-16 21:22   ` Tejun Heo
  2012-08-17 13:38     ` JoonSoo Kim
  6 siblings, 1 reply; 265+ messages in thread
From: Tejun Heo @ 2012-08-16 21:22 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: linux-kernel

On Wed, Aug 15, 2012 at 11:25:35PM +0900, Joonsoo Kim wrote:
> Change from v2
> [1/6] No change
> [2/6] Change local variable name and use it directly for TP
> [3/6] Add a comment.
> [4/6] No change
> [5/6] Add a comment. Fix ugly indentation.
> [6/6] No change
> 
> Change from v1
> [ALL] Add cover-letter
> [ALL] Rebase on git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-3.7
> 
> [1/6] No change
> [2/6] First added into this series.
> [3/6] Add change log to clarify what the patch does,
>       Add logic to handle the case "@cpu is WORK_CPU_UNBOUND"
> [4/6] Fix awesome mistake: Set WQ_HIGHPRI for system_highpri_wq
> [5/6] Adopt Tejun's comment about selection of system_wq
> [6/6] No change
> 
> This patchset introduces system_highpri_wq
> in order to use the proper cwq for highpri workers.
> 
> The first 3 patches are not related to that purpose;
> they just fix assorted issues.
> The last 3 patches are for our purpose.

Applied to wq/for-3.7 w/ minor updates.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v3 0/6] system_highpri_wq
  2012-08-16 21:22   ` [PATCH v3 0/6] system_highpri_wq Tejun Heo
@ 2012-08-17 13:38     ` JoonSoo Kim
  0 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-08-17 13:38 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel

2012/8/17 Tejun Heo <tj@kernel.org>:
> On Wed, Aug 15, 2012 at 11:25:35PM +0900, Joonsoo Kim wrote:
>> Change from v2
>> [1/6] No change
>> [2/6] Change local variable name and use it directly for TP
>> [3/6] Add a comment.
>> [4/6] No change
>> [5/6] Add a comment. Fix ugly indentation.
>> [6/6] No change
>>
>> Change from v1
>> [ALL] Add cover-letter
>> [ALL] Rebase on git://git.kernel.org/pub/scm/linux/kernel/git/tj/wq.git for-3.7
>>
>> [1/6] No change
>> [2/6] First added into this series.
>> [3/6] Add change log to clarify what the patch does,
>>       Add logic to handle the case "@cpu is WORK_CPU_UNBOUND"
>> [4/6] Fix awesome mistake: Set WQ_HIGHPRI for system_highpri_wq
>> [5/6] Adopt Tejun's comment about selection of system_wq
>> [6/6] No change
>>
>> This patchset introduces system_highpri_wq
>> in order to use the proper cwq for highpri workers.
>>
>> The first 3 patches are not related to that purpose;
>> they just fix assorted issues.
>> The last 3 patches are for our purpose.
>
> Applied to wq/for-3.7 w/ minor updates.
>
> Thanks.
>
> --
> tejun

Hello.
I checked your minor update and it looks better.
Sorry for my poor explanation of the patches.

Thank you very much.

^ permalink raw reply	[flat|nested] 265+ messages in thread

* [PATCH 1/2] slub: rename cpu_partial to max_cpu_object
       [not found] <Yes>
@ 2012-08-24 16:05   ` Joonsoo Kim
  2010-07-05 12:41 ` [PATCH 01/09] cifs: add kernel config option for CIFS Client caching support Suresh Jayaraman
                     ` (22 subsequent siblings)
  23 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-08-24 16:05 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: linux-kernel, linux-mm, Joonsoo Kim, Christoph Lameter

The cpu_partial field of struct kmem_cache is a bit awkward.

It means the maximum number of objects kept in the per cpu slab
and cpu partial lists of a processor. However, the current name
suggests that it covers only objects kept in the cpu partial lists.
So, this patch renames it.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index df448ad..9130e6b 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -84,7 +84,7 @@ struct kmem_cache {
 	int size;		/* The size of an object including meta data */
 	int object_size;	/* The size of an object without meta data */
 	int offset;		/* Free pointer offset. */
-	int cpu_partial;	/* Number of per cpu partial objects to keep around */
+	int max_cpu_object;	/* Number of per cpu objects to keep around */
 	struct kmem_cache_order_objects oo;
 
 	/* Allocation and freeing of slabs */
diff --git a/mm/slub.c b/mm/slub.c
index c67bd0a..d597530 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1565,7 +1565,7 @@ static void *get_partial_node(struct kmem_cache *s,
 			available = put_cpu_partial(s, page, 0);
 			stat(s, CPU_PARTIAL_NODE);
 		}
-		if (kmem_cache_debug(s) || available > s->cpu_partial / 2)
+		if (kmem_cache_debug(s) || available > s->max_cpu_object / 2)
 			break;
 
 	}
@@ -1953,7 +1953,7 @@ int put_cpu_partial(struct kmem_cache *s, struct page *page, int drain)
 		if (oldpage) {
 			pobjects = oldpage->pobjects;
 			pages = oldpage->pages;
-			if (drain && pobjects > s->cpu_partial) {
+			if (drain && pobjects > s->max_cpu_object) {
 				unsigned long flags;
 				/*
 				 * partial array is full. Move the existing
@@ -3073,8 +3073,8 @@ static int kmem_cache_open(struct kmem_cache *s,
 	set_min_partial(s, ilog2(s->size) / 2);
 
 	/*
-	 * cpu_partial determined the maximum number of objects kept in the
-	 * per cpu partial lists of a processor.
+	 * max_cpu_object determined the maximum number of objects kept in the
+	 * per cpu slab and cpu partial lists of a processor.
 	 *
 	 * Per cpu partial lists mainly contain slabs that just have one
 	 * object freed. If they are used for allocation then they can be
@@ -3085,20 +3085,20 @@ static int kmem_cache_open(struct kmem_cache *s,
 	 *
 	 * A) The number of objects from per cpu partial slabs dumped to the
 	 *    per node list when we reach the limit.
-	 * B) The number of objects in cpu partial slabs to extract from the
-	 *    per node list when we run out of per cpu objects. We only fetch 50%
-	 *    to keep some capacity around for frees.
+	 * B) The number of objects in cpu slab and cpu partial lists to
+	 *    extract from the per node list when we run out of per cpu objects.
+	 *    We only fetch 50% to keep some capacity around for frees.
 	 */
 	if (kmem_cache_debug(s))
-		s->cpu_partial = 0;
+		s->max_cpu_object = 0;
 	else if (s->size >= PAGE_SIZE)
-		s->cpu_partial = 2;
+		s->max_cpu_object = 2;
 	else if (s->size >= 1024)
-		s->cpu_partial = 6;
+		s->max_cpu_object = 6;
 	else if (s->size >= 256)
-		s->cpu_partial = 13;
+		s->max_cpu_object = 13;
 	else
-		s->cpu_partial = 30;
+		s->max_cpu_object = 30;
 
 	s->refcount = 1;
 #ifdef CONFIG_NUMA
@@ -4677,12 +4677,12 @@ static ssize_t min_partial_store(struct kmem_cache *s, const char *buf,
 }
 SLAB_ATTR(min_partial);
 
-static ssize_t cpu_partial_show(struct kmem_cache *s, char *buf)
+static ssize_t max_cpu_object_show(struct kmem_cache *s, char *buf)
 {
-	return sprintf(buf, "%u\n", s->cpu_partial);
+	return sprintf(buf, "%u\n", s->max_cpu_object);
 }
 
-static ssize_t cpu_partial_store(struct kmem_cache *s, const char *buf,
+static ssize_t max_cpu_object_store(struct kmem_cache *s, const char *buf,
 				 size_t length)
 {
 	unsigned long objects;
@@ -4694,11 +4694,11 @@ static ssize_t cpu_partial_store(struct kmem_cache *s, const char *buf,
 	if (objects && kmem_cache_debug(s))
 		return -EINVAL;
 
-	s->cpu_partial = objects;
+	s->max_cpu_object = objects;
 	flush_all(s);
 	return length;
 }
-SLAB_ATTR(cpu_partial);
+SLAB_ATTR(max_cpu_object);
 
 static ssize_t ctor_show(struct kmem_cache *s, char *buf)
 {
@@ -5103,7 +5103,7 @@ static struct attribute *slab_attrs[] = {
 	&objs_per_slab_attr.attr,
 	&order_attr.attr,
 	&min_partial_attr.attr,
-	&cpu_partial_attr.attr,
+	&max_cpu_object_attr.attr,
 	&objects_attr.attr,
 	&objects_partial_attr.attr,
 	&partial_attr.attr,
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH 2/2] slub: correct the calculation of the number of cpu objects in get_partial_node
  2012-08-24 16:05   ` Joonsoo Kim
@ 2012-08-24 16:05     ` Joonsoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-08-24 16:05 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: linux-kernel, linux-mm, Joonsoo Kim, Christoph Lameter

In get_partial_node(), we want to refill the cpu slab and cpu partial
slabs until the number of objects kept in the per cpu slab and cpu
partial lists of a processor reaches max_cpu_object.

However, the current implementation does not achieve this.
See the following code in get_partial_node().

if (!object) {
	c->page = page;
	stat(s, ALLOC_FROM_PARTIAL);
	object = t;
	available =  page->objects - page->inuse;
} else {
	available = put_cpu_partial(s, page, 0);
	stat(s, CPU_PARTIAL_NODE);
}
if (kmem_cache_debug(s) || available > s->cpu_partial / 2)
	break;

In the !object case (available = page->objects - page->inuse),
"available" means the number of objects in the cpu slab.
At this point, we don't have any cpu partial slabs, so "available"
implies the number of objects available to the cpu without locking.
This is what we want.

But look at the other "available" (available = put_cpu_partial(s, page, 0)).
This "available" doesn't include the number of objects in the cpu slab;
it only includes the number of objects in the cpu partial slabs.
So it doesn't imply the number of objects available to the cpu without
locking. This isn't what we want.

Therefore, fix it so that both cases carry the same meaning,
and rename "available" to "cpu_slab_objects" for readability.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Christoph Lameter <cl@linux-foundation.org>

diff --git a/mm/slub.c b/mm/slub.c
index d597530..c96e0e4 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1538,6 +1538,7 @@ static void *get_partial_node(struct kmem_cache *s,
 {
 	struct page *page, *page2;
 	void *object = NULL;
+	int cpu_slab_objects = 0, pobjects = 0;
 
 	/*
 	 * Racy check. If we mistakenly see no partial slabs then we
@@ -1551,7 +1552,6 @@ static void *get_partial_node(struct kmem_cache *s,
 	spin_lock(&n->list_lock);
 	list_for_each_entry_safe(page, page2, &n->partial, lru) {
 		void *t = acquire_slab(s, n, page, object == NULL);
-		int available;
 
 		if (!t)
 			break;
@@ -1560,12 +1560,13 @@ static void *get_partial_node(struct kmem_cache *s,
 			c->page = page;
 			stat(s, ALLOC_FROM_PARTIAL);
 			object = t;
-			available =  page->objects - page->inuse;
+			cpu_slab_objects = page->objects - page->inuse;
 		} else {
-			available = put_cpu_partial(s, page, 0);
+			pobjects = put_cpu_partial(s, page, 0);
 			stat(s, CPU_PARTIAL_NODE);
 		}
-		if (kmem_cache_debug(s) || available > s->max_cpu_object / 2)
+		if (kmem_cache_debug(s)
+			|| cpu_slab_objects + pobjects > s->max_cpu_object / 2)
 			break;
 
 	}
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* Re: [PATCH 1/2] slub: rename cpu_partial to max_cpu_object
  2012-08-24 16:05   ` Joonsoo Kim
@ 2012-08-24 16:12     ` Christoph Lameter
  -1 siblings, 0 replies; 265+ messages in thread
From: Christoph Lameter @ 2012-08-24 16:12 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Pekka Enberg, linux-kernel, linux-mm

On Sat, 25 Aug 2012, Joonsoo Kim wrote:

> The cpu_partial field of struct kmem_cache is a bit awkward.

Acked-by: Christoph Lameter <cl@linux.com>

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 2/2] slub: correct the calculation of the number of cpu objects in get_partial_node
  2012-08-24 16:05     ` Joonsoo Kim
@ 2012-08-24 16:15       ` Christoph Lameter
  -1 siblings, 0 replies; 265+ messages in thread
From: Christoph Lameter @ 2012-08-24 16:15 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Pekka Enberg, linux-kernel, linux-mm

On Sat, 25 Aug 2012, Joonsoo Kim wrote:

> index d597530..c96e0e4 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -1538,6 +1538,7 @@ static void *get_partial_node(struct kmem_cache *s,
>  {
>  	struct page *page, *page2;
>  	void *object = NULL;
> +	int cpu_slab_objects = 0, pobjects = 0;

We really need to be clear here.

One counter is for the number of objects in the per cpu slab and the other
for the objects in the per cpu partial lists.

So I think the first name is ok. The second should be similar:

cpu_partial_objects?



^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 2/2] slub: correct the calculation of the number of cpu objects in get_partial_node
  2012-08-24 16:15       ` Christoph Lameter
@ 2012-08-24 16:28         ` JoonSoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-08-24 16:28 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Pekka Enberg, linux-kernel, linux-mm

2012/8/25 Christoph Lameter <cl@linux.com>:
> On Sat, 25 Aug 2012, Joonsoo Kim wrote:
>
>> index d597530..c96e0e4 100644
>> --- a/mm/slub.c
>> +++ b/mm/slub.c
>> @@ -1538,6 +1538,7 @@ static void *get_partial_node(struct kmem_cache *s,
>>  {
>>       struct page *page, *page2;
>>       void *object = NULL;
>> +     int cpu_slab_objects = 0, pobjects = 0;
>
> We really need to be clear here.
>
> One counter is for the number of objects in the per cpu slab and the other
> for the objects in the per cpu partial lists.
>
> So I think the first name is ok. The second should be similar:
>
> cpu_partial_objects?
>

Okay! It looks good.
But when using "cpu_partial_objects", I have a coding style problem.

                if (kmem_cache_debug(s)
                        || cpu_slab_objects + cpu_partial_objects
                                                > s->max_cpu_object / 2)

Do you have any good ideas?

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 2/2] slub: correct the calculation of the number of cpu objects in get_partial_node
  2012-08-24 16:28         ` JoonSoo Kim
@ 2012-08-24 16:31           ` Christoph Lameter
  -1 siblings, 0 replies; 265+ messages in thread
From: Christoph Lameter @ 2012-08-24 16:31 UTC (permalink / raw)
  To: JoonSoo Kim; +Cc: Pekka Enberg, linux-kernel, linux-mm

On Sat, 25 Aug 2012, JoonSoo Kim wrote:

> But when using "cpu_partial_objects", I have a coding style problem.
>
>                 if (kmem_cache_debug(s)
>                         || cpu_slab_objects + cpu_partial_objects
>                                                 > s->max_cpu_object / 2)
>
> Do you have any good ideas?

Not sure what the problem is? The line wrap?

Reduce the tabs for the third line?


^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 2/2] slub: correct the calculation of the number of cpu objects in get_partial_node
  2012-08-24 16:31           ` Christoph Lameter
@ 2012-08-24 16:40             ` JoonSoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-08-24 16:40 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Pekka Enberg, linux-kernel, linux-mm

2012/8/25 Christoph Lameter <cl@linux.com>:
> On Sat, 25 Aug 2012, JoonSoo Kim wrote:
>
>> But when using "cpu_partial_objects", I have a coding style problem.
>>
>>                 if (kmem_cache_debug(s)
>>                         || cpu_slab_objects + cpu_partial_objects
>>                                                 > s->max_cpu_object / 2)
>>
>> Do you have any good ideas?
>
> Not sure what the problem is? The line wrap?

Yes! The line wrap.


                if (kmem_cache_debug(s)
                || cpu_slab_objects + cpu_partial_objects >
s->max_cpu_object / 2)
                        break;

The example above uses 82 columns, so it has the line-wrapping problem.

                if (kmem_cache_debug(s) ||
                cpu_slab_objects + cpu_partial_objects > s->max_cpu_object / 2)
                        break;

This one uses 79 columns, but is somewhat ugly
because the second line starts at the same column as the line above it.
Is it okay?


                if (kmem_cache_debug(s)
                        || cpu_slab_objects + cpu_partial_objects
                                                > s->max_cpu_object / 2)
                        break;

Is this the best?
It uses 72 columns.
Let me know what the best method is for this situation.
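
A fourth option, a sketch of my own ("nr_objects" is a made-up name,
not from the patch): pull the sum into a local variable so the
condition fits on one line:

                int nr_objects = cpu_slab_objects + cpu_partial_objects;

                if (kmem_cache_debug(s) || nr_objects > s->max_cpu_object / 2)
                        break;

It stays under 80 columns with ordinary indentation.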

Thanks!

^ permalink raw reply	[flat|nested] 265+ messages in thread

* [PATCH 1/2] slab:  do ClearSlabPfmemalloc() for all pages of slab
       [not found] <Yes>
@ 2012-08-25 14:11   ` Joonsoo Kim
  2010-07-05 12:41 ` [PATCH 01/09] cifs: add kernel config option for CIFS Client caching support Suresh Jayaraman
                     ` (22 subsequent siblings)
  23 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-08-25 14:11 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: linux-kernel, linux-mm, Joonsoo Kim, Mel Gorman, Christoph Lameter

Currently, we do ClearPageSlabPfmemalloc() only for the first page of a
slab when we clear the SlabPfmemalloc flag. This is a problem because,
in __ac_put_obj(), we sometimes test the flag of a page which is not
the first page of the slab.

So add code to clear SlabPfmemalloc for all pages of the slab.
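
To make the failure mode concrete, here is a sketch of my own (assuming
the flag was set on every page of the slab at allocation time):

	/* gfporder = 1: the slab spans page0 and page1, both flagged */
	ClearPageSlabPfmemalloc(virt_to_page(slabp->s_mem)); /* clears page0 only */
	...
	/* later, in __ac_put_obj() */
	page = virt_to_page(objp);	/* objp may sit in page1 */
	if (PageSlabPfmemalloc(page))	/* sees the stale flag on page1 */
		set_obj_pfmemalloc(&objp); /* object wrongly marked pfmemalloc */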

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux-foundation.org>
---
This patch is based on Pekka's slab/next tree

diff --git a/mm/slab.c b/mm/slab.c
index 3b4587b..45cf59a 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -992,8 +992,11 @@ static void *__ac_get_obj(struct kmem_cache *cachep, struct array_cache *ac,
 		 */
 		l3 = cachep->nodelists[numa_mem_id()];
 		if (!list_empty(&l3->slabs_free) && force_refill) {
-			struct slab *slabp = virt_to_slab(objp);
-			ClearPageSlabPfmemalloc(virt_to_page(slabp->s_mem));
+			int i, nr_pages = (1 << cachep->gfporder);
+			struct page *page = virt_to_head_page(objp);
+
+			for (i = 0; i < nr_pages; i++)
+				ClearPageSlabPfmemalloc(page + i);
 			clear_obj_pfmemalloc(&objp);
 			recheck_pfmemalloc_active(cachep, ac);
 			return objp;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH 2/2] slab: fix starting index for finding another object
  2012-08-25 14:11   ` Joonsoo Kim
@ 2012-08-25 14:11     ` Joonsoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-08-25 14:11 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: linux-kernel, linux-mm, Joonsoo Kim, Mel Gorman, Christoph Lameter

In the array cache, there is an object at index 0.
So fix the search to start from index 0.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Mel Gorman <mgorman@suse.de>
Cc: Christoph Lameter <cl@linux-foundation.org>

diff --git a/mm/slab.c b/mm/slab.c
index 45cf59a..eb74bf5 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -976,7 +976,7 @@ static void *__ac_get_obj(struct kmem_cache *cachep, struct array_cache *ac,
 		}
 
 		/* The caller cannot use PFMEMALLOC objects, find another one */
-		for (i = 1; i < ac->avail; i++) {
+		for (i = 0; i < ac->avail; i++) {
 			/* If a !PFMEMALLOC object is found, swap them */
 			if (!is_obj_pfmemalloc(ac->entry[i])) {
 				objp = ac->entry[i];
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* Re: [PATCH 1/2] slab:  do ClearSlabPfmemalloc() for all pages of slab
  2012-08-25 14:11   ` Joonsoo Kim
@ 2012-09-03 10:08     ` Mel Gorman
  -1 siblings, 0 replies; 265+ messages in thread
From: Mel Gorman @ 2012-09-03 10:08 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Pekka Enberg, linux-kernel, linux-mm, Christoph Lameter

It took me a while to get around to reviewing this due to attending
kernel summit. Sorry about that.

On Sat, Aug 25, 2012 at 11:11:10PM +0900, Joonsoo Kim wrote:
> Currently, we do ClearPageSlabPfmemalloc() only for the first page of a
> slab when we clear the SlabPfmemalloc flag. This is a problem because,
> in __ac_put_obj(), we sometimes test the flag of a page which is not
> the first page of the slab.
> 

Well spotted.

The impact is marginal as far as pfmemalloc protection is concerned. I do not
believe that any of the slabs that use high-order allocations are used in
the swap-over-network paths. It would be unfortunate if that ever changed.

> So add code to clear SlabPfmemalloc for all pages of the slab.
> 

I would prefer if the pfmemalloc information was kept on the head page.
Would the following patch also address your concerns?

diff --git a/mm/slab.c b/mm/slab.c
index 811af03..d34a903 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -1000,7 +1000,7 @@ static void *__ac_get_obj(struct kmem_cache *cachep, struct array_cache *ac,
 		l3 = cachep->nodelists[numa_mem_id()];
 		if (!list_empty(&l3->slabs_free) && force_refill) {
 			struct slab *slabp = virt_to_slab(objp);
-			ClearPageSlabPfmemalloc(virt_to_page(slabp->s_mem));
+			ClearPageSlabPfmemalloc(virt_to_head_page(slabp->s_mem));
 			clear_obj_pfmemalloc(&objp);
 			recheck_pfmemalloc_active(cachep, ac);
 			return objp;
@@ -1032,7 +1032,7 @@ static void *__ac_put_obj(struct kmem_cache *cachep, struct array_cache *ac,
 {
 	if (unlikely(pfmemalloc_active)) {
 		/* Some pfmemalloc slabs exist, check if this is one */
-		struct page *page = virt_to_page(objp);
+		struct page *page = virt_to_head_page(objp);
 		if (PageSlabPfmemalloc(page))
 			set_obj_pfmemalloc(&objp);
 	}

-- 
Mel Gorman
SUSE Labs

^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH 0/2] clean-up initialization of deferrable timer
       [not found] <Yes>
                   ` (18 preceding siblings ...)
  2012-08-25 14:11   ` Joonsoo Kim
@ 2012-10-18 23:18 ` Joonsoo Kim
  2012-10-18 23:18   ` [PATCH 1/2] timer: add setup_timer_deferrable() macro Joonsoo Kim
                     ` (2 more replies)
  2012-10-20 15:48   ` Joonsoo Kim
                   ` (3 subsequent siblings)
  23 siblings, 3 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-10-18 23:18 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, Joonsoo Kim

This patchset introduces the setup_timer_deferrable() macro.
Using it makes the code simpler and easier to understand.

This patchset doesn't make any functional difference;
it is just a clean-up.

It is based on v3.7-rc1

Joonsoo Kim (2):
  timer: add setup_timer_deferrable() macro
  timer: use new setup_timer_deferrable() macro

 drivers/acpi/apei/ghes.c                     |    5 ++---
 drivers/ata/libata-core.c                    |    5 ++---
 drivers/net/ethernet/nvidia/forcedeth.c      |   13 ++++---------
 drivers/net/ethernet/qlogic/qlge/qlge_main.c |    4 +---
 drivers/net/vxlan.c                          |    5 ++---
 include/linux/timer.h                        |    2 ++
 kernel/workqueue.c                           |    6 ++----
 net/mac80211/agg-rx.c                        |   12 ++++++------
 net/mac80211/agg-tx.c                        |   12 ++++++------
 net/sched/cls_flow.c                         |    5 ++---
 net/sched/sch_sfq.c                          |    5 ++---
 11 files changed, 31 insertions(+), 43 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 265+ messages in thread

* [PATCH 1/2] timer: add setup_timer_deferrable() macro
  2012-10-18 23:18 ` [PATCH 0/2] clean-up initialization of deferrable timer Joonsoo Kim
@ 2012-10-18 23:18   ` Joonsoo Kim
  2012-10-18 23:18   ` [PATCH 2/2] timer: use new " Joonsoo Kim
  2012-10-26 14:08   ` [PATCH 0/2] clean-up initialization of deferrable timer JoonSoo Kim
  2 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-10-18 23:18 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, Joonsoo Kim

There are several users of deferrable timers. For lack of a handy
initializer, they have to initialize deferrable timers clumsily.
We can do better with a new setup_timer_deferrable() macro, so add it.

The following patch makes some users of init_timer_deferrable() use
this handy macro.
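
For illustration, a conversion would look like this (a sketch with
made-up names, not taken from any real driver):

	/* before */
	init_timer_deferrable(&foo->poll_timer);
	foo->poll_timer.function = foo_poll_func;
	foo->poll_timer.data = (unsigned long)foo;

	/* after */
	setup_timer_deferrable(&foo->poll_timer, foo_poll_func,
						(unsigned long)foo);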

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/include/linux/timer.h b/include/linux/timer.h
index 8c5a197..5950276 100644
--- a/include/linux/timer.h
+++ b/include/linux/timer.h
@@ -151,6 +151,8 @@ static inline void init_timer_on_stack_key(struct timer_list *timer,
 
 #define setup_timer(timer, fn, data)					\
 	__setup_timer((timer), (fn), (data), 0)
+#define setup_timer_deferrable(timer, fn, data)				\
+	__setup_timer((timer), (fn), (data), TIMER_DEFERRABLE)
 #define setup_timer_on_stack(timer, fn, data)				\
 	__setup_timer_on_stack((timer), (fn), (data), 0)
 #define setup_deferrable_timer_on_stack(timer, fn, data)		\
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH 2/2] timer: use new setup_timer_deferrable() macro
  2012-10-18 23:18 ` [PATCH 0/2] clean-up initialization of deferrable timer Joonsoo Kim
  2012-10-18 23:18   ` [PATCH 1/2] timer: add setup_timer_deferrable() macro Joonsoo Kim
@ 2012-10-18 23:18   ` Joonsoo Kim
  2012-10-26 14:08   ` [PATCH 0/2] clean-up initialization of deferrable timer JoonSoo Kim
  2 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-10-18 23:18 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: linux-kernel, Joonsoo Kim, Len Brown, Jeff Garzik,
	Jitendra Kalsaria, Ron Mercer, Tejun Heo, Johannes Berg,
	John W. Linville, David S. Miller, Jamal Hadi Salim

Now, we have a handy macro for initializing deferrable timers.
Using it makes the code clean and easy to understand.

Additionally, in some driver code, use setup_timer() instead of
init_timer().

This patch doesn't make any functional difference.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Cc: Len Brown <lenb@kernel.org>
Cc: Jeff Garzik <jgarzik@pobox.com>
Cc: Jitendra Kalsaria <jitendra.kalsaria@qlogic.com>
Cc: Ron Mercer <ron.mercer@qlogic.com>
Cc: Tejun Heo <tj@kernel.org>
Cc: Johannes Berg <johannes@sipsolutions.net>
Cc: "John W. Linville" <linville@tuxdriver.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jamal Hadi Salim <jhs@mojatatu.com>

diff --git a/drivers/acpi/apei/ghes.c b/drivers/acpi/apei/ghes.c
index 1599566..1707f9a 100644
--- a/drivers/acpi/apei/ghes.c
+++ b/drivers/acpi/apei/ghes.c
@@ -944,9 +944,8 @@ static int __devinit ghes_probe(struct platform_device *ghes_dev)
 	}
 	switch (generic->notify.type) {
 	case ACPI_HEST_NOTIFY_POLLED:
-		ghes->timer.function = ghes_poll_func;
-		ghes->timer.data = (unsigned long)ghes;
-		init_timer_deferrable(&ghes->timer);
+		setup_timer_deferrable(&ghes->timer, ghes_poll_func,
+							(unsigned long)ghes);
 		ghes_add_timer(ghes);
 		break;
 	case ACPI_HEST_NOTIFY_EXTERNAL:
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 3cc7096..a811e6f 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -5616,9 +5616,8 @@ struct ata_port *ata_port_alloc(struct ata_host *host)
 	INIT_LIST_HEAD(&ap->eh_done_q);
 	init_waitqueue_head(&ap->eh_wait_q);
 	init_completion(&ap->park_req_pending);
-	init_timer_deferrable(&ap->fastdrain_timer);
-	ap->fastdrain_timer.function = ata_eh_fastdrain_timerfn;
-	ap->fastdrain_timer.data = (unsigned long)ap;
+	setup_timer_deferrable(&ap->fastdrain_timer, ata_eh_fastdrain_timerfn,
+							(unsigned long)ap);
 
 	ap->cbl = ATA_CBL_NONE;
 
diff --git a/drivers/net/ethernet/nvidia/forcedeth.c b/drivers/net/ethernet/nvidia/forcedeth.c
index 876bece..987e2cf 100644
--- a/drivers/net/ethernet/nvidia/forcedeth.c
+++ b/drivers/net/ethernet/nvidia/forcedeth.c
@@ -5548,15 +5548,10 @@ static int __devinit nv_probe(struct pci_dev *pci_dev, const struct pci_device_i
 	spin_lock_init(&np->hwstats_lock);
 	SET_NETDEV_DEV(dev, &pci_dev->dev);
 
-	init_timer(&np->oom_kick);
-	np->oom_kick.data = (unsigned long) dev;
-	np->oom_kick.function = nv_do_rx_refill;	/* timer handler */
-	init_timer(&np->nic_poll);
-	np->nic_poll.data = (unsigned long) dev;
-	np->nic_poll.function = nv_do_nic_poll;	/* timer handler */
-	init_timer_deferrable(&np->stats_poll);
-	np->stats_poll.data = (unsigned long) dev;
-	np->stats_poll.function = nv_do_stats_poll;	/* timer handler */
+	setup_timer(&np->oom_kick, nv_do_rx_refill, (unsigned long)dev);
+	setup_timer(&np->nic_poll, nv_do_nic_poll, (unsigned long)dev);
+	setup_timer_deferrable(&np->stats_poll, nv_do_stats_poll,
+						(unsigned long)dev);
 
 	err = pci_enable_device(pci_dev);
 	if (err)
diff --git a/drivers/net/ethernet/qlogic/qlge/qlge_main.c b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
index b262d61..1687883 100644
--- a/drivers/net/ethernet/qlogic/qlge/qlge_main.c
+++ b/drivers/net/ethernet/qlogic/qlge/qlge_main.c
@@ -4707,9 +4707,7 @@ static int __devinit qlge_probe(struct pci_dev *pdev,
 	/* Start up the timer to trigger EEH if
 	 * the bus goes dead
 	 */
-	init_timer_deferrable(&qdev->timer);
-	qdev->timer.data = (unsigned long)qdev;
-	qdev->timer.function = ql_timer;
+	setup_timer_deferrable(&qdev->timer, ql_timer, (unsigned long)qdev);
 	qdev->timer.expires = jiffies + (5*HZ);
 	add_timer(&qdev->timer);
 	ql_link_off(qdev);
diff --git a/drivers/net/vxlan.c b/drivers/net/vxlan.c
index 607976c..743c7d7 100644
--- a/drivers/net/vxlan.c
+++ b/drivers/net/vxlan.c
@@ -995,9 +995,8 @@ static void vxlan_setup(struct net_device *dev)
 
 	spin_lock_init(&vxlan->hash_lock);
 
-	init_timer_deferrable(&vxlan->age_timer);
-	vxlan->age_timer.function = vxlan_cleanup;
-	vxlan->age_timer.data = (unsigned long) vxlan;
+	setup_timer_deferrable(&vxlan->age_timer, vxlan_cleanup,
+						(unsigned long)vxlan);
 
 	inet_get_local_port_range(&low, &high);
 	vxlan->port_min = low;
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index d951daa..77d6f62 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3845,10 +3845,8 @@ static int __init init_workqueues(void)
 			INIT_LIST_HEAD(&pool->worklist);
 			INIT_LIST_HEAD(&pool->idle_list);
 
-			init_timer_deferrable(&pool->idle_timer);
-			pool->idle_timer.function = idle_worker_timeout;
-			pool->idle_timer.data = (unsigned long)pool;
-
+			setup_timer_deferrable(&pool->idle_timer,
+				    idle_worker_timeout, (unsigned long)pool);
 			setup_timer(&pool->mayday_timer, gcwq_mayday_timeout,
 				    (unsigned long)pool);
 
diff --git a/net/mac80211/agg-rx.c b/net/mac80211/agg-rx.c
index 186d991..c8985c1 100644
--- a/net/mac80211/agg-rx.c
+++ b/net/mac80211/agg-rx.c
@@ -294,14 +294,14 @@ void ieee80211_process_addba_request(struct ieee80211_local *local,
 	spin_lock_init(&tid_agg_rx->reorder_lock);
 
 	/* rx timer */
-	tid_agg_rx->session_timer.function = sta_rx_agg_session_timer_expired;
-	tid_agg_rx->session_timer.data = (unsigned long)&sta->timer_to_tid[tid];
-	init_timer_deferrable(&tid_agg_rx->session_timer);
+	setup_timer_deferrable(&tid_agg_rx->session_timer,
+				sta_rx_agg_session_timer_expired,
+				(unsigned long)&sta->timer_to_tid[tid]);
 
 	/* rx reorder timer */
-	tid_agg_rx->reorder_timer.function = sta_rx_agg_reorder_timer_expired;
-	tid_agg_rx->reorder_timer.data = (unsigned long)&sta->timer_to_tid[tid];
-	init_timer(&tid_agg_rx->reorder_timer);
+	setup_timer(&tid_agg_rx->reorder_timer,
+				sta_rx_agg_reorder_timer_expired,
+				(unsigned long)&sta->timer_to_tid[tid]);
 
 	/* prepare reordering buffer */
 	tid_agg_rx->reorder_buf =
diff --git a/net/mac80211/agg-tx.c b/net/mac80211/agg-tx.c
index 3195a63..cacd75d 100644
--- a/net/mac80211/agg-tx.c
+++ b/net/mac80211/agg-tx.c
@@ -535,14 +535,14 @@ int ieee80211_start_tx_ba_session(struct ieee80211_sta *pubsta, u16 tid,
 	tid_tx->timeout = timeout;
 
 	/* response timer */
-	tid_tx->addba_resp_timer.function = sta_addba_resp_timer_expired;
-	tid_tx->addba_resp_timer.data = (unsigned long)&sta->timer_to_tid[tid];
-	init_timer(&tid_tx->addba_resp_timer);
+	setup_timer(&tid_tx->addba_resp_timer,
+				sta_addba_resp_timer_expired,
+				(unsigned long)&sta->timer_to_tid[tid]);
 
 	/* tx timer */
-	tid_tx->session_timer.function = sta_tx_agg_session_timer_expired;
-	tid_tx->session_timer.data = (unsigned long)&sta->timer_to_tid[tid];
-	init_timer_deferrable(&tid_tx->session_timer);
+	setup_timer_deferrable(&tid_tx->session_timer,
+				sta_tx_agg_session_timer_expired,
+				(unsigned long)&sta->timer_to_tid[tid]);
 
 	/* assign a dialog token */
 	sta->ampdu_mlme.dialog_token_allocator++;
diff --git a/net/sched/cls_flow.c b/net/sched/cls_flow.c
index ce82d0c..ff1afc5 100644
--- a/net/sched/cls_flow.c
+++ b/net/sched/cls_flow.c
@@ -457,9 +457,8 @@ static int flow_change(struct sk_buff *in_skb,
 		f->mask	  = ~0U;
 
 		get_random_bytes(&f->hashrnd, 4);
-		f->perturb_timer.function = flow_perturbation;
-		f->perturb_timer.data = (unsigned long)f;
-		init_timer_deferrable(&f->perturb_timer);
+		setup_timer_deferrable(&f->perturb_timer, flow_perturbation,
+							(unsigned long)f);
 	}
 
 	tcf_exts_change(tp, &f->exts, &e);
diff --git a/net/sched/sch_sfq.c b/net/sched/sch_sfq.c
index d3a1bc2..93ff569 100644
--- a/net/sched/sch_sfq.c
+++ b/net/sched/sch_sfq.c
@@ -739,9 +739,8 @@ static int sfq_init(struct Qdisc *sch, struct nlattr *opt)
 	struct sfq_sched_data *q = qdisc_priv(sch);
 	int i;
 
-	q->perturb_timer.function = sfq_perturbation;
-	q->perturb_timer.data = (unsigned long)sch;
-	init_timer_deferrable(&q->perturb_timer);
+	setup_timer_deferrable(&q->perturb_timer, sfq_perturbation,
+						(unsigned long)sch);
 
 	for (i = 0; i < SFQ_MAX_DEPTH + 1; i++) {
 		q->dep[i].next = i + SFQ_MAX_FLOWS;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH for-v3.7 1/2] slub: optimize poorly inlined kmalloc* functions
       [not found] <Yes>
@ 2012-10-20 15:48   ` Joonsoo Kim
  2010-07-05 12:41 ` [PATCH 01/09] cifs: add kernel config option for CIFS Client caching support Suresh Jayaraman
                     ` (22 subsequent siblings)
  23 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-10-20 15:48 UTC (permalink / raw)
  To: Pekka Enberg, Andrew Morton
  Cc: linux-kernel, linux-mm, Joonsoo Kim, Christoph Lameter

kmalloc() and kmalloc_node() are always inlined into generic code.
However, there is a mistake in the SLUB implementation.

In SLUB's kmalloc() and kmalloc_node(),
we compare kmalloc_caches[index] with NULL.
As this cannot be known at compile time,
the comparison is emitted into the generic code invoking kmalloc*.
This may decrease system performance, so we should fix it.
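
For illustration, here is the difference in what the compiler can do
(a sketch of my own, not code from the patch):

	size_t size = 64;		/* __builtin_constant_p(size) is true */
	int index = kmalloc_index(size); /* folds to a constant */
	struct kmem_cache *s = kmalloc_caches[index]; /* runtime load */

	if (!index)	/* constant, so the compiler removes this branch */
		return ZERO_SIZE_PTR;
	if (!s)		/* runtime value, so this test lands in every caller */
		return ZERO_SIZE_PTR;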

Below is the result of "size vmlinux";
text size decreases by roughly 20KB.

Before:
   text	   data	    bss	    dec	    hex	filename
10044177        1443168 5722112 17209457        1069871 vmlinux
After:
   text	   data	    bss	    dec	    hex	filename
10022627        1443136 5722112 17187875        1064423 vmlinux

Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
---
With Christoph's patchset(common kmalloc caches:
'[15/15] Common Kmalloc cache determination') which is not merged into mainline yet,
this issue will be fixed.
As it takes some time, I send this patch for v3.7

This patch is based on v3.7-rc1

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index df448ad..4c75f2b 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -271,9 +271,10 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
 			return kmalloc_large(size, flags);
 
 		if (!(flags & SLUB_DMA)) {
-			struct kmem_cache *s = kmalloc_slab(size);
+			int index = kmalloc_index(size);
+			struct kmem_cache *s = kmalloc_caches[index];
 
-			if (!s)
+			if (!index)
 				return ZERO_SIZE_PTR;
 
 			return kmem_cache_alloc_trace(s, flags, size);
@@ -304,9 +305,10 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
 {
 	if (__builtin_constant_p(size) &&
 		size <= SLUB_MAX_SIZE && !(flags & SLUB_DMA)) {
-			struct kmem_cache *s = kmalloc_slab(size);
+		int index = kmalloc_index(size);
+		struct kmem_cache *s = kmalloc_caches[index];
 
-		if (!s)
+		if (!index)
 			return ZERO_SIZE_PTR;
 
 		return kmem_cache_alloc_node_trace(s, flags, node, size);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH for-v3.7 2/2] slub: optimize kmalloc* inlining for GFP_DMA
  2012-10-20 15:48   ` Joonsoo Kim
@ 2012-10-20 15:48     ` Joonsoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-10-20 15:48 UTC (permalink / raw)
  To: Pekka Enberg, Andrew Morton
  Cc: linux-kernel, linux-mm, Joonsoo Kim, Christoph Lameter

kmalloc() and kmalloc_node() of the SLUB are not inlined when @flags
includes __GFP_DMA. This patch optimizes that case, so that when @flags
includes __GFP_DMA the allocation is still inlined into generic code.

Cc: Christoph Lameter <cl@linux.com>
Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index 4c75f2b..4adf50b 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -147,6 +147,7 @@ struct kmem_cache {
  * 2^x bytes of allocations.
  */
 extern struct kmem_cache *kmalloc_caches[SLUB_PAGE_SHIFT];
+extern struct kmem_cache *kmalloc_dma_caches[SLUB_PAGE_SHIFT];
 
 /*
  * Sorry that the following has to be that ugly but some versions of GCC
@@ -266,19 +267,24 @@ static __always_inline void *kmalloc_large(size_t size, gfp_t flags)
 
 static __always_inline void *kmalloc(size_t size, gfp_t flags)
 {
+	struct kmem_cache *s;
+	int index;
+
 	if (__builtin_constant_p(size)) {
 		if (size > SLUB_MAX_SIZE)
 			return kmalloc_large(size, flags);
 
-		if (!(flags & SLUB_DMA)) {
-			int index = kmalloc_index(size);
-			struct kmem_cache *s = kmalloc_caches[index];
-
-			if (!index)
-				return ZERO_SIZE_PTR;
+		index = kmalloc_index(size);
+		if (!index)
+			return ZERO_SIZE_PTR;
+#ifdef CONFIG_ZONE_DMA
+		if (unlikely(flags & SLUB_DMA)) {
+			s = kmalloc_dma_caches[index];
+		} else
+#endif
+			s = kmalloc_caches[index];
 
-			return kmem_cache_alloc_trace(s, flags, size);
-		}
+		return kmem_cache_alloc_trace(s, flags, size);
 	}
 	return __kmalloc(size, flags);
 }
@@ -303,13 +309,19 @@ kmem_cache_alloc_node_trace(struct kmem_cache *s,
 
 static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
 {
-	if (__builtin_constant_p(size) &&
-		size <= SLUB_MAX_SIZE && !(flags & SLUB_DMA)) {
-		int index = kmalloc_index(size);
-		struct kmem_cache *s = kmalloc_caches[index];
+	struct kmem_cache *s;
+	int index;
 
+	if (__builtin_constant_p(size) && size <= SLUB_MAX_SIZE) {
+		index = kmalloc_index(size);
 		if (!index)
 			return ZERO_SIZE_PTR;
+#ifdef CONFIG_ZONE_DMA
+		if (unlikely(flags & SLUB_DMA)) {
+			s = kmalloc_dma_caches[index];
+		} else
+#endif
+			s = kmalloc_caches[index];
 
 		return kmem_cache_alloc_node_trace(s, flags, node, size);
 	}
diff --git a/mm/slub.c b/mm/slub.c
index a0d6984..a94533c 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -3222,7 +3222,8 @@ struct kmem_cache *kmalloc_caches[SLUB_PAGE_SHIFT];
 EXPORT_SYMBOL(kmalloc_caches);
 
 #ifdef CONFIG_ZONE_DMA
-static struct kmem_cache *kmalloc_dma_caches[SLUB_PAGE_SHIFT];
+struct kmem_cache *kmalloc_dma_caches[SLUB_PAGE_SHIFT];
+EXPORT_SYMBOL(kmalloc_dma_caches);
 #endif
 
 static int __init setup_slub_min_order(char *str)
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread
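
A usage illustration (a sketch, not part of the patch): with this change, a
constant-size __GFP_DMA allocation selects its cache at compile time from
kmalloc_dma_caches[] instead of bailing out to the out-of-line __kmalloc().

	/*
	 * Sketch: on a CONFIG_ZONE_DMA kernel the inline fast path now
	 * resolves this to
	 * kmem_cache_alloc_trace(kmalloc_dma_caches[kmalloc_index(128)],
	 * flags, 128).
	 */
	static void *dma_buf_alloc(void)
	{
		return kmalloc(128, GFP_KERNEL | GFP_DMA);
	}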

* [PATCH 0/3] workqueue: minor cleanup
       [not found] <Yes>
                   ` (20 preceding siblings ...)
  2012-10-20 15:48   ` Joonsoo Kim
@ 2012-10-20 16:30 ` Joonsoo Kim
  2012-10-20 16:30   ` [PATCH 1/3] workqueue: optimize mod_delayed_work_on() when @delay == 0 Joonsoo Kim
                     ` (2 more replies)
  2012-10-28 19:12   ` Joonsoo Kim
  2012-10-31 16:56   ` Joonsoo Kim
  23 siblings, 3 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-10-20 16:30 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Joonsoo Kim

This patchset does minor cleanups of the workqueue code.

The first patch makes a minor behavior change, but a trivial one.
The others make no functional difference.
These are based on v3.7-rc1.

Joonsoo Kim (3):
  workqueue: optimize mod_delayed_work_on() when @delay == 0
  workqueue: trivial fix for return statement in work_busy()
  workqueue: remove unused argument of wq_worker_waking_up()

 kernel/sched/core.c      |    2 +-
 kernel/workqueue.c       |   11 +++++++----
 kernel/workqueue_sched.h |    2 +-
 3 files changed, 9 insertions(+), 6 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 265+ messages in thread

* [PATCH 1/3] workqueue: optimize mod_delayed_work_on() when @delay == 0
  2012-10-20 16:30 ` [PATCH 0/3] workqueue: minor cleanup Joonsoo Kim
@ 2012-10-20 16:30   ` Joonsoo Kim
  2012-10-20 16:30   ` [PATCH 2/3] workqueue: trivial fix for return statement in work_busy() Joonsoo Kim
  2012-10-20 16:30   ` [PATCH 3/3] workqueue: remove unused argument of wq_worker_waking_up() Joonsoo Kim
  2 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-10-20 16:30 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Joonsoo Kim

After try_to_grab_pending(), __queue_delayed_work() is invoked in
mod_delayed_work_on(). When @delay == 0, we can call __queue_work()
directly and avoid setting up a useless timer.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index d951daa..c57358e 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -1477,7 +1477,11 @@ bool mod_delayed_work_on(int cpu, struct workqueue_struct *wq,
 	} while (unlikely(ret == -EAGAIN));
 
 	if (likely(ret >= 0)) {
-		__queue_delayed_work(cpu, wq, dwork, delay);
+		if (!delay)
+			__queue_work(cpu, wq, &dwork->work);
+		else
+			__queue_delayed_work(cpu, wq, dwork, delay);
+
 		local_irq_restore(flags);
 	}
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread
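
A caller-side sketch of the case this optimizes (hypothetical caller, not
from the patch): code that requeues work either immediately or after a
timeout can pass a zero delay, and with this change the zero case goes
straight to __queue_work() instead of arming a timer that fires at once.

	/*
	 * Sketch: requeue dwork now or in 100ms; with the patch applied,
	 * delay == 0 bypasses the timer entirely.
	 */
	static void kick(struct workqueue_struct *wq,
			 struct delayed_work *dwork, bool now)
	{
		mod_delayed_work_on(WORK_CPU_UNBOUND, wq, dwork,
				    now ? 0 : msecs_to_jiffies(100));
	}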

* [PATCH 2/3] workqueue: trivial fix for return statement in work_busy()
  2012-10-20 16:30 ` [PATCH 0/3] workqueue: minor cleanup Joonsoo Kim
  2012-10-20 16:30   ` [PATCH 1/3] workqueue: optimize mod_delayed_work_on() when @delay == 0 Joonsoo Kim
@ 2012-10-20 16:30   ` Joonsoo Kim
  2012-10-20 22:53     ` Tejun Heo
  2012-10-20 16:30   ` [PATCH 3/3] workqueue: remove unused argument of wq_worker_waking_up() Joonsoo Kim
  2 siblings, 1 reply; 265+ messages in thread
From: Joonsoo Kim @ 2012-10-20 16:30 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Joonsoo Kim

The return type of work_busy() is unsigned int, but one return
statement in work_busy() returns the boolean value 'false'. This is not
a problem, since 'false' evaluates to 0; still, fixing it makes the
code more robust.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index c57358e..27a6dee 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3479,7 +3479,7 @@ unsigned int work_busy(struct work_struct *work)
 	unsigned int ret = 0;
 
 	if (!gcwq)
-		return false;
+		return 0;
 
 	spin_lock_irqsave(&gcwq->lock, flags);
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread
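
For context, how callers consume the value (a sketch; the WORK_BUSY_* bits
are from workqueue.h): work_busy() returns a bitmask of unsigned int flags,
which is why plain 0 reads better than the boolean 'false'.

	/* Sketch: test the returned bitmask rather than treating it as a bool. */
	static void report_busy(struct work_struct *work)
	{
		unsigned int busy = work_busy(work);

		if (busy & WORK_BUSY_PENDING)
			pr_debug("work is pending\n");
		if (busy & WORK_BUSY_RUNNING)
			pr_debug("work is running\n");
	}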

* [PATCH 3/3] workqueue: remove unused argument of wq_worker_waking_up()
  2012-10-20 16:30 ` [PATCH 0/3] workqueue: minor cleanup Joonsoo Kim
  2012-10-20 16:30   ` [PATCH 1/3] workqueue: optimize mod_delayed_work_on() when @delay == 0 Joonsoo Kim
  2012-10-20 16:30   ` [PATCH 2/3] workqueue: trivial fix for return statement in work_busy() Joonsoo Kim
@ 2012-10-20 16:30   ` Joonsoo Kim
  2012-10-20 22:57     ` Tejun Heo
  2 siblings, 1 reply; 265+ messages in thread
From: Joonsoo Kim @ 2012-10-20 16:30 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel, Joonsoo Kim

Commit 63d95a91 ('workqueue: use @pool instead of @gcwq or @cpu where
applicable') changed the way nr_running is accessed, so
wq_worker_waking_up() no longer uses @cpu. Remove the argument and the
comment describing it.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/kernel/sched/core.c b/kernel/sched/core.c
index 2d8927f..30a23d0 100644
--- a/kernel/sched/core.c
+++ b/kernel/sched/core.c
@@ -1243,7 +1243,7 @@ static void ttwu_activate(struct rq *rq, struct task_struct *p, int en_flags)
 
 	/* if a worker is waking up, notify workqueue */
 	if (p->flags & PF_WQ_WORKER)
-		wq_worker_waking_up(p, cpu_of(rq));
+		wq_worker_waking_up(p);
 }
 
 /*
diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index 27a6dee..daf101c 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -727,7 +727,6 @@ static void wake_up_worker(struct worker_pool *pool)
 /**
  * wq_worker_waking_up - a worker is waking up
  * @task: task waking up
- * @cpu: CPU @task is waking up to
  *
  * This function is called during try_to_wake_up() when a worker is
  * being awoken.
@@ -735,7 +734,7 @@ static void wake_up_worker(struct worker_pool *pool)
  * CONTEXT:
  * spin_lock_irq(rq->lock)
  */
-void wq_worker_waking_up(struct task_struct *task, unsigned int cpu)
+void wq_worker_waking_up(struct task_struct *task)
 {
 	struct worker *worker = kthread_data(task);
 
diff --git a/kernel/workqueue_sched.h b/kernel/workqueue_sched.h
index 2d10fc9..c1b45a5 100644
--- a/kernel/workqueue_sched.h
+++ b/kernel/workqueue_sched.h
@@ -4,6 +4,6 @@
  * Scheduler hooks for concurrency managed workqueue.  Only to be
  * included from sched.c and workqueue.c.
  */
-void wq_worker_waking_up(struct task_struct *task, unsigned int cpu);
+void wq_worker_waking_up(struct task_struct *task);
 struct task_struct *wq_worker_sleeping(struct task_struct *task,
 				       unsigned int cpu);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* Re: [PATCH 2/3] workqueue: trivial fix for return statement in work_busy()
  2012-10-20 16:30   ` [PATCH 2/3] workqueue: trivial fix for return statement in work_busy() Joonsoo Kim
@ 2012-10-20 22:53     ` Tejun Heo
  0 siblings, 0 replies; 265+ messages in thread
From: Tejun Heo @ 2012-10-20 22:53 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: linux-kernel

On Sun, Oct 21, 2012 at 01:30:06AM +0900, Joonsoo Kim wrote:
> The return type of work_busy() is unsigned int, but one return
> statement in work_busy() returns the boolean value 'false'. This is not
> a problem, since 'false' evaluates to 0; still, fixing it makes the
> code more robust.
> 
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>

Applied 1-2 to wq/for-3.8.

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 3/3] workqueue: remove unused argument of wq_worker_waking_up()
  2012-10-20 16:30   ` [PATCH 3/3] workqueue: remove unused argument of wq_worker_waking_up() Joonsoo Kim
@ 2012-10-20 22:57     ` Tejun Heo
  2012-10-23  1:44       ` JoonSoo Kim
  0 siblings, 1 reply; 265+ messages in thread
From: Tejun Heo @ 2012-10-20 22:57 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: linux-kernel

On Sun, Oct 21, 2012 at 01:30:07AM +0900, Joonsoo Kim wrote:
> Commit 63d95a91 ('workqueue: use @pool instead of @gcwq or @cpu where
> applicable') changed the way nr_running is accessed, so
> wq_worker_waking_up() no longer uses @cpu. Remove the argument and the
> comment describing it.
> 
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>

I'm not sure whether I wanna remove or add WARN_ON_ONCE() on it.  That
part has gone through some changes and seen some bugs.  Can we please
do the following instead at least for now?

	if (!(worker->flags & WORKER_NOT_RUNNING)) {
		WARN_ON_ONCE(worker->pool->gcwq->cpu != cpu);
		atomic_inc(get_pool_nr_running(worker->pool));
	}

Thanks.

-- 
tejun

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH for-v3.7 2/2] slub: optimize kmalloc* inlining for GFP_DMA
  2012-10-20 15:48     ` Joonsoo Kim
@ 2012-10-22 14:31       ` Christoph Lameter
  -1 siblings, 0 replies; 265+ messages in thread
From: Christoph Lameter @ 2012-10-22 14:31 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Pekka Enberg, Andrew Morton, linux-kernel, linux-mm

On Sun, 21 Oct 2012, Joonsoo Kim wrote:

> kmalloc() and kmalloc_node() of the SLUB are not inlined when @flags
> includes __GFP_DMA. This patch optimizes that case, so that when @flags
> includes __GFP_DMA the allocation is still inlined into generic code.

__GFP_DMA is a rarely used flag for the kmalloc allocators, and so far
it was not considered worth supporting directly in the inlining code.



^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 3/3] workqueue: remove unused argument of wq_worker_waking_up()
  2012-10-20 22:57     ` Tejun Heo
@ 2012-10-23  1:44       ` JoonSoo Kim
  2012-10-24 19:14         ` Tejun Heo
  0 siblings, 1 reply; 265+ messages in thread
From: JoonSoo Kim @ 2012-10-23  1:44 UTC (permalink / raw)
  To: Tejun Heo; +Cc: linux-kernel

2012/10/21 Tejun Heo <tj@kernel.org>:
> On Sun, Oct 21, 2012 at 01:30:07AM +0900, Joonsoo Kim wrote:
>> Commit 63d95a91 ('workqueue: use @pool instead of @gcwq or @cpu where
>> applicable') changed the way nr_running is accessed, so
>> wq_worker_waking_up() no longer uses @cpu. Remove the argument and the
>> comment describing it.
>>
>> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>
> I'm not sure whether I wanna remove or add WARN_ON_ONCE() on it.  That
> part has gone through some changes and seen some bugs.  Can we please
> do the following instead at least for now?
>
>         if (!(worker->flags & WORKER_NOT_RUNNING)) {
>                 WARN_ON_ONCE(worker->pool->gcwq->cpu != cpu);
>                 atomic_inc(get_pool_nr_running(worker->pool));
>         }
>

I have no objection to doing this for now.
Thanks.

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH for-v3.7 2/2] slub: optimize kmalloc* inlining for GFP_DMA
  2012-10-22 14:31       ` Christoph Lameter
@ 2012-10-23  2:29         ` JoonSoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-10-23  2:29 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Pekka Enberg, Andrew Morton, linux-kernel, linux-mm

2012/10/22 Christoph Lameter <cl@linux.com>:
> On Sun, 21 Oct 2012, Joonsoo Kim wrote:
>
>> kmalloc() and kmalloc_node() of the SLUB are not inlined when @flags
>> includes __GFP_DMA. This patch optimizes that case, so that when @flags
>> includes __GFP_DMA the allocation is still inlined into generic code.
>
> __GFP_DMA is a rarely used flag for the kmalloc allocators, and so far
> it was not considered worth supporting directly in the inlining code.
>
>

Hmm... but SLAB already does this optimization for __GFP_DMA.
Almost every kmalloc() is invoked with a constant flags value,
so I think the overhead from this patch should be negligible.
With this patch, the code size of vmlinux is reduced slightly.

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH for-v3.7 2/2] slub: optimize kmalloc* inlining for GFP_DMA
  2012-10-23  2:29         ` JoonSoo Kim
@ 2012-10-23  6:16           ` Eric Dumazet
  -1 siblings, 0 replies; 265+ messages in thread
From: Eric Dumazet @ 2012-10-23  6:16 UTC (permalink / raw)
  To: JoonSoo Kim
  Cc: Christoph Lameter, Pekka Enberg, Andrew Morton, linux-kernel, linux-mm

On Tue, 2012-10-23 at 11:29 +0900, JoonSoo Kim wrote:
> 2012/10/22 Christoph Lameter <cl@linux.com>:
> > On Sun, 21 Oct 2012, Joonsoo Kim wrote:
> >
> >> kmalloc() and kmalloc_node() of the SLUB are not inlined when @flags
> >> includes __GFP_DMA. This patch optimizes that case, so that when @flags
> >> includes __GFP_DMA the allocation is still inlined into generic code.
> >
> > __GFP_DMA is a rarely used flag for the kmalloc allocators, and so far
> > it was not considered worth supporting directly in the inlining code.
> >
> >
> 
> Hmm... but SLAB already does this optimization for __GFP_DMA.
> Almost every kmalloc() is invoked with a constant flags value,
> so I think the overhead from this patch should be negligible.
> With this patch, the code size of vmlinux is reduced slightly.

Only because you used an allyesconfig.

GFP_DMA is used by less than 0.1% of kmalloc() calls, for legacy
hardware (from the last century).


In fact, if you want to shrink your vmlinux even more, you could test

if (__builtin_constant_p(flags) && (flags & SLUB_DMA))
    return kmem_cache_alloc_trace(s, flags, size);

to force the call out of line.





^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH for-v3.7 2/2] slub: optimize kmalloc* inlining for GFP_DMA
  2012-10-23  6:16           ` Eric Dumazet
@ 2012-10-23 16:12             ` JoonSoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-10-23 16:12 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Christoph Lameter, Pekka Enberg, Andrew Morton, linux-kernel, linux-mm

Hi, Eric.

2012/10/23 Eric Dumazet <eric.dumazet@gmail.com>:
> On Tue, 2012-10-23 at 11:29 +0900, JoonSoo Kim wrote:
>> 2012/10/22 Christoph Lameter <cl@linux.com>:
>> > On Sun, 21 Oct 2012, Joonsoo Kim wrote:
>> >
>> >> kmalloc() and kmalloc_node() of the SLUB are not inlined when @flags
>> >> includes __GFP_DMA. This patch optimizes that case, so that when @flags
>> >> includes __GFP_DMA the allocation is still inlined into generic code.
>> >
>> > __GFP_DMA is a rarely used flag for the kmalloc allocators, and so far
>> > it was not considered worth supporting directly in the inlining code.
>> >
>> >
>>
>> Hmm... but SLAB already does this optimization for __GFP_DMA.
>> Almost every kmalloc() is invoked with a constant flags value,
>> so I think the overhead from this patch should be negligible.
>> With this patch, the code size of vmlinux is reduced slightly.
>
> Only because you used an allyesconfig.
>
> GFP_DMA is used by less than 0.1% of kmalloc() calls, for legacy
> hardware (from the last century).

I'm not using allyesconfig, but localmodconfig on my Ubuntu desktop
system. On my system, the text of vmlinux shrinks by 700 bytes, which
suggests there are more than 100 call sites using GFP_DMA.

> In fact, if you want to shrink your vmlinux even more, you could test
>
> if (__builtin_constant_p(flags) && (flags & SLUB_DMA))
>     return kmem_cache_alloc_trace(s, flags, size);
>
> to force the call out of line.

The reason I mention code size is that I want to say it may be good
for performance, even if the impact is just small.
I'm not interested in reducing code size as such :)

Thanks for the comment.

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH for-v3.7 1/2] slub: optimize poorly inlined kmalloc* functions
  2012-10-20 15:48   ` Joonsoo Kim
@ 2012-10-24  8:05     ` Pekka Enberg
  -1 siblings, 0 replies; 265+ messages in thread
From: Pekka Enberg @ 2012-10-24  8:05 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Andrew Morton, linux-kernel, linux-mm, Christoph Lameter

On Sat, Oct 20, 2012 at 6:48 PM, Joonsoo Kim <js1304@gmail.com> wrote:
> kmalloc() and kmalloc_node() are always inlined into generic code.
> However, there is a mistake in the SLUB implementation.
>
> In SLUB's kmalloc() and kmalloc_node(), we compare the kmem_cache
> pointer returned by kmalloc_slab() with NULL. As this cannot be known
> at compile time, the comparison is emitted into the generic code
> invoking kmalloc*. This may decrease system performance, so we should
> fix it by testing the compile-time-constant index instead.
>
> Below is the result of "size vmlinux". The text size decreases by
> roughly 20KB:
>
> Before:
>    text    data     bss     dec     hex filename
> 10044177        1443168 5722112 17209457        1069871 vmlinux
> After:
>    text    data     bss     dec     hex filename
> 10022627        1443136 5722112 17187875        1064423 vmlinux
>
> Cc: Christoph Lameter <cl@linux.com>
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> ---
> Christoph's patchset (common kmalloc caches: '[15/15] Common Kmalloc
> cache determination'), which is not yet merged into mainline, will also
> fix this issue. As that will take some time, I am sending this patch
> for v3.7.
>
> This patch is based on v3.7-rc1

Looks reasonable to me. Christoph, any objections?

>
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index df448ad..4c75f2b 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -271,9 +271,10 @@ static __always_inline void *kmalloc(size_t size, gfp_t flags)
>                         return kmalloc_large(size, flags);
>
>                 if (!(flags & SLUB_DMA)) {
> -                       struct kmem_cache *s = kmalloc_slab(size);
> +                       int index = kmalloc_index(size);
> +                       struct kmem_cache *s = kmalloc_caches[index];
>
> -                       if (!s)
> +                       if (!index)
>                                 return ZERO_SIZE_PTR;
>
>                         return kmem_cache_alloc_trace(s, flags, size);
> @@ -304,9 +305,10 @@ static __always_inline void *kmalloc_node(size_t size, gfp_t flags, int node)
>  {
>         if (__builtin_constant_p(size) &&
>                 size <= SLUB_MAX_SIZE && !(flags & SLUB_DMA)) {
> -                       struct kmem_cache *s = kmalloc_slab(size);
> +               int index = kmalloc_index(size);
> +               struct kmem_cache *s = kmalloc_caches[index];
>
> -               if (!s)
> +               if (!index)
>                         return ZERO_SIZE_PTR;
>
>                 return kmem_cache_alloc_node_trace(s, flags, node, size);
> --
> 1.7.9.5

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH for-v3.7 1/2] slub: optimize poorly inlined kmalloc* functions
  2012-10-24  8:05     ` Pekka Enberg
@ 2012-10-24 13:36       ` Christoph Lameter
  -1 siblings, 0 replies; 265+ messages in thread
From: Christoph Lameter @ 2012-10-24 13:36 UTC (permalink / raw)
  To: Pekka Enberg; +Cc: Joonsoo Kim, Andrew Morton, linux-kernel, linux-mm

On Wed, 24 Oct 2012, Pekka Enberg wrote:

> Looks reasonable to me. Christoph, any objections?

I am fine with it. It's going to be short-lived because my latest
patchset does the same thing. Can we merge this for 3.7?

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 3/3] workqueue: remove unused argument of wq_worker_waking_up()
  2012-10-23  1:44       ` JoonSoo Kim
@ 2012-10-24 19:14         ` Tejun Heo
  0 siblings, 0 replies; 265+ messages in thread
From: Tejun Heo @ 2012-10-24 19:14 UTC (permalink / raw)
  To: JoonSoo Kim; +Cc: linux-kernel

On Tue, Oct 23, 2012 at 10:44:49AM +0900, JoonSoo Kim wrote:
> >         if (!(worker->flags & WORKER_NOT_RUNNING)) {
> >                 WARN_ON_ONCE(worker->pool->gcwq->cpu != cpu);
> >                 atomic_inc(get_pool_nr_running(worker->pool));
> >         }
> >
> 
> I have no objection to doing this for now.

Care to cook up a patch?

Thanks.

--
tejun


^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 0/2] clean-up initialization of deferrable timer
  2012-10-18 23:18 ` [PATCH 0/2] clean-up initialization of deferrable timer Joonsoo Kim
  2012-10-18 23:18   ` [PATCH 1/2] timer: add setup_timer_deferrable() macro Joonsoo Kim
  2012-10-18 23:18   ` [PATCH 2/2] timer: use new " Joonsoo Kim
@ 2012-10-26 14:08   ` JoonSoo Kim
  2 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-10-26 14:08 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, Joonsoo Kim

2012/10/19 Joonsoo Kim <js1304@gmail.com>:
> This patchset introduces setup_timer_deferrable() macro.
> Using it makes code simple and understandable.
>
> This patchset doesn't make any functional difference.
> It is just for clean-up.
>
> It is based on v3.7-rc1
>
> Joonsoo Kim (2):
>   timer: add setup_timer_deferrable() macro
>   timer: use new setup_timer_deferrable() macro
>
>  drivers/acpi/apei/ghes.c                     |    5 ++---
>  drivers/ata/libata-core.c                    |    5 ++---
>  drivers/net/ethernet/nvidia/forcedeth.c      |   13 ++++---------
>  drivers/net/ethernet/qlogic/qlge/qlge_main.c |    4 +---
>  drivers/net/vxlan.c                          |    5 ++---
>  include/linux/timer.h                        |    2 ++
>  kernel/workqueue.c                           |    6 ++----
>  net/mac80211/agg-rx.c                        |   12 ++++++------
>  net/mac80211/agg-tx.c                        |   12 ++++++------
>  net/sched/cls_flow.c                         |    5 ++---
>  net/sched/sch_sfq.c                          |    5 ++---
>  11 files changed, 31 insertions(+), 43 deletions(-)
>
> --
> 1.7.9.5

Hello, Thomas.
Will you pick this up for your tree?
Or is there anything wrong with it?

Thanks.

^ permalink raw reply	[flat|nested] 265+ messages in thread

* [PATCH 0/5] minor clean-up and optimize highmem related code
       [not found] <Yes>
@ 2012-10-28 19:12   ` Joonsoo Kim
  2010-07-05 12:41 ` [PATCH 01/09] cifs: add kernel config option for CIFS Client caching support Suresh Jayaraman
                     ` (22 subsequent siblings)
  23 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-10-28 19:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm, Joonsoo Kim

This patchset cleans up and optimizes highmem-related code.

[1] is just a clean-up and introduces no functional change.
[2-3] are clean-ups and optimizations;
they eliminate a useless lock operation and needless list management.
[4-5] are optimizations related to flush_all_zero_pkmaps().

Joonsoo Kim (5):
  mm, highmem: use PKMAP_NR() to calculate an index of pkmap
  mm, highmem: remove useless pool_lock
  mm, highmem: remove page_address_pool list
  mm, highmem: makes flush_all_zero_pkmaps() return index of last
    flushed entry
  mm, highmem: get virtual address of the page using PKMAP_ADDR()

 include/linux/highmem.h |    1 +
 mm/highmem.c            |  102 ++++++++++++++++++++---------------------------
 2 files changed, 45 insertions(+), 58 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 265+ messages in thread

* [PATCH 1/5] mm, highmem: use PKMAP_NR() to calculate an index of pkmap
  2012-10-28 19:12   ` Joonsoo Kim
@ 2012-10-28 19:12     ` Joonsoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-10-28 19:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm, Joonsoo Kim, Mel Gorman

Using PKMAP_NR() to calculate a pkmap index is more understandable
and maintainable, so change the code to use it.

Cc: Mel Gorman <mgorman@suse.de>
Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/mm/highmem.c b/mm/highmem.c
index d517cd1..b3b3d68 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -99,7 +99,7 @@ struct page *kmap_to_page(void *vaddr)
 	unsigned long addr = (unsigned long)vaddr;
 
 	if (addr >= PKMAP_ADDR(0) && addr <= PKMAP_ADDR(LAST_PKMAP)) {
-		int i = (addr - PKMAP_ADDR(0)) >> PAGE_SHIFT;
+		int i = PKMAP_NR(addr);
 		return pte_page(pkmap_page_table[i]);
 	}
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread
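
The two macros are inverses of one another; for reference, the typical
definitions from the x86 highmem header (other architectures may differ):

	/* Slot index <-> virtual address for the persistent kmap area. */
	#define PKMAP_NR(virt)	(((virt) - PKMAP_BASE) >> PAGE_SHIFT)
	#define PKMAP_ADDR(nr)	(PKMAP_BASE + ((nr) << PAGE_SHIFT))

so PKMAP_NR(addr) computes exactly what the open-coded
"(addr - PKMAP_ADDR(0)) >> PAGE_SHIFT" did.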

* [PATCH 2/5] mm, highmem: remove useless pool_lock
  2012-10-28 19:12   ` Joonsoo Kim
@ 2012-10-28 19:12     ` Joonsoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-10-28 19:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm, Joonsoo Kim

The pool_lock protects page_address_pool from concurrent access, but
access to page_address_pool is already serialized by kmap_lock. So
remove it.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/mm/highmem.c b/mm/highmem.c
index b3b3d68..017bad1 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -328,7 +328,6 @@ struct page_address_map {
  * page_address_map freelist, allocated from page_address_maps.
  */
 static struct list_head page_address_pool;	/* freelist */
-static spinlock_t pool_lock;			/* protects page_address_pool */
 
 /*
  * Hash table bucket
@@ -395,11 +394,9 @@ void set_page_address(struct page *page, void *virtual)
 	if (virtual) {		/* Add */
 		BUG_ON(list_empty(&page_address_pool));
 
-		spin_lock_irqsave(&pool_lock, flags);
 		pam = list_entry(page_address_pool.next,
 				struct page_address_map, list);
 		list_del(&pam->list);
-		spin_unlock_irqrestore(&pool_lock, flags);
 
 		pam->page = page;
 		pam->virtual = virtual;
@@ -413,9 +410,7 @@ void set_page_address(struct page *page, void *virtual)
 			if (pam->page == page) {
 				list_del(&pam->list);
 				spin_unlock_irqrestore(&pas->lock, flags);
-				spin_lock_irqsave(&pool_lock, flags);
 				list_add_tail(&pam->list, &page_address_pool);
-				spin_unlock_irqrestore(&pool_lock, flags);
 				goto done;
 			}
 		}
@@ -438,7 +433,6 @@ void __init page_address_init(void)
 		INIT_LIST_HEAD(&page_address_htable[i].lh);
 		spin_lock_init(&page_address_htable[i].lock);
 	}
-	spin_lock_init(&pool_lock);
 }
 
 #endif	/* defined(CONFIG_HIGHMEM) && !defined(WANT_PAGE_VIRTUAL) */
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread
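
A sketch of the locking this relies on (an assumption drawn from the
changelog, not re-verified against every caller): both paths that touch
page_address_pool already run with kmap_lock held.

	/*
	 * map_new_virtual()          <- runs under lock_kmap()
	 *   set_page_address(p, va)      // takes an entry from the pool
	 *
	 * flush_all_zero_pkmaps()    <- runs under lock_kmap()
	 *   set_page_address(p, NULL)    // returns an entry to the pool
	 *
	 * With kmap_lock held across both paths, the nested pool_lock
	 * never excludes anything.
	 */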

* [PATCH 3/5] mm, highmem: remove page_address_pool list
  2012-10-28 19:12   ` Joonsoo Kim
@ 2012-10-28 19:12     ` Joonsoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-10-28 19:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm, Joonsoo Kim

We can find a free page_address_map instance without the
page_address_pool: each pkmap slot maps one-to-one onto an entry of
page_address_maps[], so the entry for a given virtual address can be
indexed directly. Remove the pool.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/mm/highmem.c b/mm/highmem.c
index 017bad1..731cf9a 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -323,11 +323,7 @@ struct page_address_map {
 	void *virtual;
 	struct list_head list;
 };
-
-/*
- * page_address_map freelist, allocated from page_address_maps.
- */
-static struct list_head page_address_pool;	/* freelist */
+static struct page_address_map page_address_maps[LAST_PKMAP];
 
 /*
  * Hash table bucket
@@ -392,12 +388,7 @@ void set_page_address(struct page *page, void *virtual)
 
 	pas = page_slot(page);
 	if (virtual) {		/* Add */
-		BUG_ON(list_empty(&page_address_pool));
-
-		pam = list_entry(page_address_pool.next,
-				struct page_address_map, list);
-		list_del(&pam->list);
-
+		pam = &page_address_maps[PKMAP_NR((unsigned long)virtual)];
 		pam->page = page;
 		pam->virtual = virtual;
 
@@ -410,7 +401,6 @@ void set_page_address(struct page *page, void *virtual)
 			if (pam->page == page) {
 				list_del(&pam->list);
 				spin_unlock_irqrestore(&pas->lock, flags);
-				list_add_tail(&pam->list, &page_address_pool);
 				goto done;
 			}
 		}
@@ -420,15 +410,10 @@ done:
 	return;
 }
 
-static struct page_address_map page_address_maps[LAST_PKMAP];
-
 void __init page_address_init(void)
 {
 	int i;
 
-	INIT_LIST_HEAD(&page_address_pool);
-	for (i = 0; i < ARRAY_SIZE(page_address_maps); i++)
-		list_add(&page_address_maps[i].list, &page_address_pool);
 	for (i = 0; i < ARRAY_SIZE(page_address_htable); i++) {
 		INIT_LIST_HEAD(&page_address_htable[i].lh);
 		spin_lock_init(&page_address_htable[i].lock);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread
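
The invariant that makes the freelist unnecessary (a sketch based on the
diff): each pkmap slot owns exactly one page_address_map, so the entry for
a given mapping can be found by direct indexing.

	/* Sketch: pkmap slot i always uses page_address_maps[i]. */
	static struct page_address_map *pam_of(void *virtual)
	{
		return &page_address_maps[PKMAP_NR((unsigned long)virtual)];
	}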

* [PATCH 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of last flushed entry
  2012-10-28 19:12   ` Joonsoo Kim
@ 2012-10-28 19:12     ` Joonsoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-10-28 19:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm, Joonsoo Kim

In the current code, after flush_all_zero_pkmaps() is invoked, we
re-iterate over all pkmaps. This can be optimized if
flush_all_zero_pkmaps() returns the index of a flushed entry: with that
index, we can immediately map a highmem page at the virtual address the
index represents. So change the return type of flush_all_zero_pkmaps()
and return the index of the last flushed entry.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index ef788b5..0683869 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
 
 #ifdef CONFIG_HIGHMEM
 #include <asm/highmem.h>
+#define PKMAP_INDEX_INVAL (-1)
 
 /* declarations for linux/mm/highmem.c */
 unsigned int nr_free_highpages(void);
diff --git a/mm/highmem.c b/mm/highmem.c
index 731cf9a..65beb9a 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
 	return virt_to_page(addr);
 }
 
-static void flush_all_zero_pkmaps(void)
+static int flush_all_zero_pkmaps(void)
 {
 	int i;
-	int need_flush = 0;
+	int index = PKMAP_INDEX_INVAL;
 
 	flush_cache_kmaps();
 
@@ -141,10 +141,12 @@ static void flush_all_zero_pkmaps(void)
 			  &pkmap_page_table[i]);
 
 		set_page_address(page, NULL);
-		need_flush = 1;
+		index = i;
 	}
-	if (need_flush)
+	if (index != PKMAP_INDEX_INVAL)
 		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
+
+	return index;
 }
 
 /**
@@ -160,6 +162,7 @@ void kmap_flush_unused(void)
 static inline unsigned long map_new_virtual(struct page *page)
 {
 	unsigned long vaddr;
+	int index = PKMAP_INDEX_INVAL;
 	int count;
 
 start:
@@ -168,40 +171,45 @@ start:
 	for (;;) {
 		last_pkmap_nr = (last_pkmap_nr + 1) & LAST_PKMAP_MASK;
 		if (!last_pkmap_nr) {
-			flush_all_zero_pkmaps();
-			count = LAST_PKMAP;
+			index = flush_all_zero_pkmaps();
+			if (index != PKMAP_INDEX_INVAL)
+				break; /* Found a usable entry */
 		}
-		if (!pkmap_count[last_pkmap_nr])
+		if (!pkmap_count[last_pkmap_nr]) {
+			index = last_pkmap_nr;
 			break;	/* Found a usable entry */
-		if (--count)
-			continue;
-
-		/*
-		 * Sleep for somebody else to unmap their entries
-		 */
-		{
-			DECLARE_WAITQUEUE(wait, current);
-
-			__set_current_state(TASK_UNINTERRUPTIBLE);
-			add_wait_queue(&pkmap_map_wait, &wait);
-			unlock_kmap();
-			schedule();
-			remove_wait_queue(&pkmap_map_wait, &wait);
-			lock_kmap();
-
-			/* Somebody else might have mapped it while we slept */
-			if (page_address(page))
-				return (unsigned long)page_address(page);
-
-			/* Re-start */
-			goto start;
 		}
+		if (--count == 0)
+			break;
 	}
-	vaddr = PKMAP_ADDR(last_pkmap_nr);
+
+	/*
+	 * Sleep for somebody else to unmap their entries
+	 */
+	if (index == PKMAP_INDEX_INVAL) {
+		DECLARE_WAITQUEUE(wait, current);
+
+		__set_current_state(TASK_UNINTERRUPTIBLE);
+		add_wait_queue(&pkmap_map_wait, &wait);
+		unlock_kmap();
+		schedule();
+		remove_wait_queue(&pkmap_map_wait, &wait);
+		lock_kmap();
+
+		/* Somebody else might have mapped it while we slept */
+		vaddr = (unsigned long)page_address(page);
+		if (vaddr)
+			return vaddr;
+
+		/* Re-start */
+		goto start;
+	}
+
+	vaddr = PKMAP_ADDR(index);
 	set_pte_at(&init_mm, vaddr,
-		   &(pkmap_page_table[last_pkmap_nr]), mk_pte(page, kmap_prot));
+		   &(pkmap_page_table[index]), mk_pte(page, kmap_prot));
 
-	pkmap_count[last_pkmap_nr] = 1;
+	pkmap_count[index] = 1;
 	set_page_address(page, (void *)vaddr);
 
 	return vaddr;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH 5/5] mm, highmem: get virtual address of the page using PKMAP_ADDR()
  2012-10-28 19:12   ` Joonsoo Kim
@ 2012-10-28 19:12     ` Joonsoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-10-28 19:12 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm, Joonsoo Kim

In flush_all_zero_pkmaps(), we have the index of the pkmap associated
with the page. Using this index, we can simply get the virtual address
of the page with PKMAP_ADDR(). So change it.

Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/mm/highmem.c b/mm/highmem.c
index 65beb9a..1417f4f 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -137,8 +137,7 @@ static int flush_all_zero_pkmaps(void)
 		 * So no dangers, even with speculative execution.
 		 */
 		page = pte_page(pkmap_page_table[i]);
-		pte_clear(&init_mm, (unsigned long)page_address(page),
-			  &pkmap_page_table[i]);
+		pte_clear(&init_mm, PKMAP_ADDR(i), &pkmap_page_table[i]);
 
 		set_page_address(page, NULL);
 		index = i;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* Re: [PATCH 1/5] mm, highmem: use PKMAP_NR() to calculate an index of pkmap
  2012-10-28 19:12     ` Joonsoo Kim
@ 2012-10-29  1:48       ` Minchan Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Minchan Kim @ 2012-10-29  1:48 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman

On Mon, Oct 29, 2012 at 04:12:52AM +0900, Joonsoo Kim wrote:
> To calculate an index of pkmap, using PKMAP_NR() is more understandable
> and maintainable, so change it.
> 
> Cc: Mel Gorman <mgorman@suse.de>
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 2/5] mm, highmem: remove useless pool_lock
  2012-10-28 19:12     ` Joonsoo Kim
@ 2012-10-29  1:52       ` Minchan Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Minchan Kim @ 2012-10-29  1:52 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Andrew Morton, linux-kernel, linux-mm

On Mon, Oct 29, 2012 at 04:12:53AM +0900, Joonsoo Kim wrote:
> The pool_lock protects the page_address_pool from concurrent access.
> But, access to the page_address_pool is already protected by kmap_lock.
> So remove it.
> 
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>

Looks good to me.
Just a nitpick: please write a comment about the locking rule, like the
one below.

> 
> diff --git a/mm/highmem.c b/mm/highmem.c
> index b3b3d68..017bad1 100644
> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -328,7 +328,6 @@ struct page_address_map {
>   * page_address_map freelist, allocated from page_address_maps.
>   */

/* page_address_pool is protected by kmap_lock */

>  static struct list_head page_address_pool;	/* freelist */
> -static spinlock_t pool_lock;			/* protects page_address_pool */
>  

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 3/5] mm, highmem: remove page_address_pool list
  2012-10-28 19:12     ` Joonsoo Kim
@ 2012-10-29  1:57       ` Minchan Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Minchan Kim @ 2012-10-29  1:57 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Andrew Morton, linux-kernel, linux-mm

On Mon, Oct 29, 2012 at 04:12:54AM +0900, Joonsoo Kim wrote:
> We can find a free page_address_map instance without the
> page_address_pool. So remove it.
> 
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>

See below for a nitpick. :)

> 
> diff --git a/mm/highmem.c b/mm/highmem.c
> index 017bad1..731cf9a 100644
> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -323,11 +323,7 @@ struct page_address_map {
>  	void *virtual;
>  	struct list_head list;
>  };
> -

Let's leave a blank line.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of last flushed entry
  2012-10-28 19:12     ` Joonsoo Kim
@ 2012-10-29  2:06       ` Minchan Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Minchan Kim @ 2012-10-29  2:06 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Andrew Morton, linux-kernel, linux-mm

On Mon, Oct 29, 2012 at 04:12:55AM +0900, Joonsoo Kim wrote:
> In the current code, after flush_all_zero_pkmaps() is invoked, we
> re-iterate over all pkmaps to find a usable entry. This can be avoided
> if flush_all_zero_pkmaps() returns the index of a flushed entry: with
> that index, we can immediately map the highmem page to the virtual
> address the index represents. So change the return type of
> flush_all_zero_pkmaps() and return the index of the last flushed entry.
> 
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> 
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index ef788b5..0683869 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
>  
>  #ifdef CONFIG_HIGHMEM
>  #include <asm/highmem.h>
> +#define PKMAP_INDEX_INVAL (-1)

How about this?

#define PKMAP_INVALID_INDEX (-1)

>  
>  /* declarations for linux/mm/highmem.c */
>  unsigned int nr_free_highpages(void);
> diff --git a/mm/highmem.c b/mm/highmem.c
> index 731cf9a..65beb9a 100644
> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
>  	return virt_to_page(addr);
>  }
>  
> -static void flush_all_zero_pkmaps(void)
> +static int flush_all_zero_pkmaps(void)
>  {
>  	int i;
> -	int need_flush = 0;
> +	int index = PKMAP_INDEX_INVAL;
>  
>  	flush_cache_kmaps();
>  
> @@ -141,10 +141,12 @@ static void flush_all_zero_pkmaps(void)
>  			  &pkmap_page_table[i]);
>  
>  		set_page_address(page, NULL);
> -		need_flush = 1;
> +		index = i;

How about returning the first freed index instead of the last one,
and updating last_pkmap_nr to it?
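
Something like this, perhaps — a rough, untested sketch of the idea,
using the PKMAP_INVALID_INDEX spelling suggested above (the loop body is
abbreviated from the existing flush_all_zero_pkmaps()):

static int flush_all_zero_pkmaps(void)
{
	int i;
	int index = PKMAP_INVALID_INDEX;

	flush_cache_kmaps();

	for (i = 0; i < LAST_PKMAP; i++) {
		struct page *page;

		/* Only a count of 1 means free but still mapped */
		if (pkmap_count[i] != 1)
			continue;
		pkmap_count[i] = 0;

		page = pte_page(pkmap_page_table[i]);
		pte_clear(&init_mm, PKMAP_ADDR(i), &pkmap_page_table[i]);
		set_page_address(page, NULL);

		/* Remember only the first freed slot */
		if (index == PKMAP_INVALID_INDEX)
			index = i;
	}
	if (index != PKMAP_INVALID_INDEX) {
		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
		/* Resume the next scan just after the first freed slot */
		last_pkmap_nr = index;
	}

	return index;
}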

>  	}
> -	if (need_flush)
> +	if (index != PKMAP_INDEX_INVAL)
>  		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
> +
> +	return index;
>  }
>  
>  /**
> @@ -160,6 +162,7 @@ void kmap_flush_unused(void)
>  static inline unsigned long map_new_virtual(struct page *page)
>  {
>  	unsigned long vaddr;
> +	int index = PKMAP_INDEX_INVAL;
>  	int count;
>  
>  start:
> @@ -168,40 +171,45 @@ start:
>  	for (;;) {
>  		last_pkmap_nr = (last_pkmap_nr + 1) & LAST_PKMAP_MASK;
>  		if (!last_pkmap_nr) {
> -			flush_all_zero_pkmaps();
> -			count = LAST_PKMAP;
> +			index = flush_all_zero_pkmaps();
> +			if (index != PKMAP_INDEX_INVAL)
> +				break; /* Found a usable entry */
>  		}
> -		if (!pkmap_count[last_pkmap_nr])
> +		if (!pkmap_count[last_pkmap_nr]) {
> +			index = last_pkmap_nr;
>  			break;	/* Found a usable entry */
> -		if (--count)
> -			continue;
> -
> -		/*
> -		 * Sleep for somebody else to unmap their entries
> -		 */
> -		{
> -			DECLARE_WAITQUEUE(wait, current);
> -
> -			__set_current_state(TASK_UNINTERRUPTIBLE);
> -			add_wait_queue(&pkmap_map_wait, &wait);
> -			unlock_kmap();
> -			schedule();
> -			remove_wait_queue(&pkmap_map_wait, &wait);
> -			lock_kmap();
> -
> -			/* Somebody else might have mapped it while we slept */
> -			if (page_address(page))
> -				return (unsigned long)page_address(page);
> -
> -			/* Re-start */
> -			goto start;
>  		}
> +		if (--count == 0)
> +			break;
>  	}
> -	vaddr = PKMAP_ADDR(last_pkmap_nr);
> +
> +	/*
> +	 * Sleep for somebody else to unmap their entries
> +	 */
> +	if (index == PKMAP_INDEX_INVAL) {
> +		DECLARE_WAITQUEUE(wait, current);
> +
> +		__set_current_state(TASK_UNINTERRUPTIBLE);
> +		add_wait_queue(&pkmap_map_wait, &wait);
> +		unlock_kmap();
> +		schedule();
> +		remove_wait_queue(&pkmap_map_wait, &wait);
> +		lock_kmap();
> +
> +		/* Somebody else might have mapped it while we slept */
> +		vaddr = (unsigned long)page_address(page);
> +		if (vaddr)
> +			return vaddr;
> +
> +		/* Re-start */
> +		goto start;
> +	}
> +
> +	vaddr = PKMAP_ADDR(index);
>  	set_pte_at(&init_mm, vaddr,
> -		   &(pkmap_page_table[last_pkmap_nr]), mk_pte(page, kmap_prot));
> +		   &(pkmap_page_table[index]), mk_pte(page, kmap_prot));
>  
> -	pkmap_count[last_pkmap_nr] = 1;
> +	pkmap_count[index] = 1;
>  	set_page_address(page, (void *)vaddr);
>  
>  	return vaddr;
> -- 
> 1.7.9.5
> 

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 5/5] mm, highmem: get virtual address of the page using PKMAP_ADDR()
  2012-10-28 19:12     ` Joonsoo Kim
@ 2012-10-29  2:09       ` Minchan Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Minchan Kim @ 2012-10-29  2:09 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Andrew Morton, linux-kernel, linux-mm

On Mon, Oct 29, 2012 at 04:12:56AM +0900, Joonsoo Kim wrote:
> In flush_all_zero_pkmaps(), we have an index of the pkmap associated the page.
> Using this index, we can simply get virtual address of the page.
> So change it.
> 
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 0/5] minor clean-up and optimize highmem related code
  2012-10-28 19:12   ` Joonsoo Kim
@ 2012-10-29  2:12     ` Minchan Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Minchan Kim @ 2012-10-29  2:12 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: Andrew Morton, linux-kernel, linux-mm, Peter Zijlstra

Hi Joonsoo,

On Mon, Oct 29, 2012 at 04:12:51AM +0900, Joonsoo Kim wrote:
> This patchset cleans up and optimizes highmem-related code.
> 
> [1] is just clean-up and doesn't introduce any functional change.
> [2-3] are for clean-up and optimization.
> These eliminate a useless lock operation and some list management.
> [4-5] are for optimization related to flush_all_zero_pkmaps().
> 
> Joonsoo Kim (5):
>   mm, highmem: use PKMAP_NR() to calculate an index of pkmap
>   mm, highmem: remove useless pool_lock
>   mm, highmem: remove page_address_pool list
>   mm, highmem: makes flush_all_zero_pkmaps() return index of last
>     flushed entry
>   mm, highmem: get virtual address of the page using PKMAP_ADDR()

This patchset looks awesome to me.
If you have a plan to respin, please CC Peter.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of last flushed entry
  2012-10-29  2:06       ` Minchan Kim
@ 2012-10-29 13:12         ` JoonSoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-10-29 13:12 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Andrew Morton, linux-kernel, linux-mm

2012/10/29 Minchan Kim <minchan@kernel.org>:
> On Mon, Oct 29, 2012 at 04:12:55AM +0900, Joonsoo Kim wrote:
>> In the current code, after flush_all_zero_pkmaps() is invoked, we
>> re-iterate over all pkmaps to find a usable entry. This can be avoided
>> if flush_all_zero_pkmaps() returns the index of a flushed entry: with
>> that index, we can immediately map the highmem page to the virtual
>> address the index represents. So change the return type of
>> flush_all_zero_pkmaps() and return the index of the last flushed entry.
>>
>> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>>
>> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
>> index ef788b5..0683869 100644
>> --- a/include/linux/highmem.h
>> +++ b/include/linux/highmem.h
>> @@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
>>
>>  #ifdef CONFIG_HIGHMEM
>>  #include <asm/highmem.h>
>> +#define PKMAP_INDEX_INVAL (-1)
>
> How about this?
>
> #define PKMAP_INVALID_INDEX (-1)

Okay.

>>
>>  /* declarations for linux/mm/highmem.c */
>>  unsigned int nr_free_highpages(void);
>> diff --git a/mm/highmem.c b/mm/highmem.c
>> index 731cf9a..65beb9a 100644
>> --- a/mm/highmem.c
>> +++ b/mm/highmem.c
>> @@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
>>       return virt_to_page(addr);
>>  }
>>
>> -static void flush_all_zero_pkmaps(void)
>> +static int flush_all_zero_pkmaps(void)
>>  {
>>       int i;
>> -     int need_flush = 0;
>> +     int index = PKMAP_INDEX_INVAL;
>>
>>       flush_cache_kmaps();
>>
>> @@ -141,10 +141,12 @@ static void flush_all_zero_pkmaps(void)
>>                         &pkmap_page_table[i]);
>>
>>               set_page_address(page, NULL);
>> -             need_flush = 1;
>> +             index = i;
>
> How about returning the first freed index instead of the last one,
> and updating last_pkmap_nr to it?

Okay. That will be even better.

Thanks.

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 0/5] minor clean-up and optimize highmem related code
  2012-10-29  2:12     ` Minchan Kim
@ 2012-10-29 13:15       ` JoonSoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-10-29 13:15 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Andrew Morton, linux-kernel, linux-mm, Peter Zijlstra

Hi, Minchan.

2012/10/29 Minchan Kim <minchan@kernel.org>:
> Hi Joonsoo,
>
> On Mon, Oct 29, 2012 at 04:12:51AM +0900, Joonsoo Kim wrote:
>> This patchset cleans up and optimizes highmem-related code.
>>
>> [1] is just clean-up and doesn't introduce any functional change.
>> [2-3] are for clean-up and optimization.
>> These eliminate a useless lock operation and some list management.
>> [4-5] are for optimization related to flush_all_zero_pkmaps().
>>
>> Joonsoo Kim (5):
>>   mm, highmem: use PKMAP_NR() to calculate an index of pkmap
>>   mm, highmem: remove useless pool_lock
>>   mm, highmem: remove page_address_pool list
>>   mm, highmem: makes flush_all_zero_pkmaps() return index of last
>>     flushed entry
>>   mm, highmem: get virtual address of the page using PKMAP_ADDR()
>
> This patchset looks awesome to me.
> If you have a plan to respin, please CC Peter.

Thanks for the review.
I will wait for more reviews and respin the day after tomorrow.
Version 2 will include a fix for your comments.

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 2/5] mm, highmem: remove useless pool_lock
  2012-10-28 19:12     ` Joonsoo Kim
@ 2012-10-30 21:31       ` Andrew Morton
  -1 siblings, 0 replies; 265+ messages in thread
From: Andrew Morton @ 2012-10-30 21:31 UTC (permalink / raw)
  To: Joonsoo Kim; +Cc: linux-kernel, linux-mm

On Mon, 29 Oct 2012 04:12:53 +0900
Joonsoo Kim <js1304@gmail.com> wrote:

> The pool_lock protects the page_address_pool from concurrent access.
> But, access to the page_address_pool is already protected by kmap_lock.
> So remove it.

Well, there's a set_page_address() call in mm/page_alloc.c which
doesn't have lock_kmap().  It doesn't *need* lock_kmap() because it's
init-time code and we're running single-threaded there.  I hope!

But this exception should be double-checked and mentioned in the
changelog, please.  And it's a reason why we can't add
assert_spin_locked(&kmap_lock) to set_page_address(), which is
unfortunate.


The irq-disabling in this code is odd.  If ARCH_NEEDS_KMAP_HIGH_GET=n,
we didn't need irq-safe locking in set_page_address().  I guess we'll
need to retain it in page_address() - I expect some callers have IRQs
disabled.
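
(For reference, kmap_lock itself already plays this game in mm/highmem.c;
a paraphrased sketch, not a verbatim quote:)

#ifdef ARCH_NEEDS_KMAP_HIGH_GET
#define lock_kmap()             spin_lock_irq(&kmap_lock)
#define unlock_kmap()           spin_unlock_irq(&kmap_lock)
#define lock_kmap_any(flags)    spin_lock_irqsave(&kmap_lock, flags)
#define unlock_kmap_any(flags)  spin_unlock_irqrestore(&kmap_lock, flags)
#else
#define lock_kmap()             spin_lock(&kmap_lock)
#define unlock_kmap()           spin_unlock(&kmap_lock)
#define lock_kmap_any(flags)    spin_lock(&kmap_lock)
#define unlock_kmap_any(flags)  spin_unlock(&kmap_lock)
#endif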


ARCH_NEEDS_KMAP_HIGH_GET is a nasty looking thing.  It's ARM:

/*
 * The reason for kmap_high_get() is to ensure that the currently kmap'd
 * page usage count does not decrease to zero while we're using its
 * existing virtual mapping in an atomic context.  With a VIVT cache this
 * is essential to do, but with a VIPT cache this is only an optimization
 * so not to pay the price of establishing a second mapping if an existing
 * one can be used.  However, on platforms without hardware TLB maintenance
 * broadcast, we simply cannot use ARCH_NEEDS_KMAP_HIGH_GET at all since
 * the locking involved must also disable IRQs which is incompatible with
 * the IPI mechanism used by global TLB operations.
 */
#define ARCH_NEEDS_KMAP_HIGH_GET
#if defined(CONFIG_SMP) && defined(CONFIG_CPU_TLB_V6)
#undef ARCH_NEEDS_KMAP_HIGH_GET
#if defined(CONFIG_HIGHMEM) && defined(CONFIG_CPU_CACHE_VIVT)
#error "The sum of features in your kernel config cannot be supported together"
#endif
#endif


^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 2/5] mm, highmem: remove useless pool_lock
  2012-10-30 21:31       ` Andrew Morton
@ 2012-10-31  5:14         ` Minchan Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Minchan Kim @ 2012-10-31  5:14 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Joonsoo Kim, linux-kernel, linux-mm

Hi Andrew,

On Tue, Oct 30, 2012 at 02:31:07PM -0700, Andrew Morton wrote:
> On Mon, 29 Oct 2012 04:12:53 +0900
> Joonsoo Kim <js1304@gmail.com> wrote:
> 
> > The pool_lock protects the page_address_pool from concurrent access.
> > But, access to the page_address_pool is already protected by kmap_lock.
> > So remove it.
> 
> Well, there's a set_page_address() call in mm/page_alloc.c which
> doesn't have lock_kmap().  it doesn't *need* lock_kmap() because it's
> init-time code and we're running single-threaded there.  I hope!
> 
> But this exception should be double-checked and mentioned in the
> changelog, please.  And it's a reason why we can't add
> assert_spin_locked(&kmap_lock) to set_page_address(), which is
> unfortunate.
> 

The exception is valid only on m68k and sparc, and they use
page->virtual rather than the set_page_address() in highmem.c. So I
think we can add such a lock check to set_page_address() in highmem.c.

But I'm not sure we really need it, because set_page_address() is used
in few places; isn't adding just a comment enough to avoid the
unnecessary overhead?

/* NOTE : Caller should hold kmap_lock by lock_kmap() */

> 
> The irq-disabling in this code is odd.  If ARCH_NEEDS_KMAP_HIGH_GET=n,
> we didn't need irq-safe locking in set_page_address().  I guess we'll

Which lock do you mean in set_page_address()?
We have two locks in there, pool_lock and pas->lock.
With this patchset, we don't need pool_lock any more;
what remains is pas->lock.

If we make that lock irq-unsafe, it could deadlock with page_address()
if page_address() is called in IRQ context. Currently, page_address()
is used in lots of places, and I'm not sure it's called only from
process context. Was there a rule that page_address() must be used
only in process context?
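
Roughly this interleaving (a hypothetical sketch, not an observed trace):

/*
 *   set_page_address()                 hard IRQ on the same CPU
 *   ------------------                 ------------------------
 *   spin_lock(&pas->lock);
 *      <interrupt arrives>  -------->  page_address(page)
 *                                         spin_lock(&pas->lock);
 *                                         ... spins forever, since the
 *                                         lock is already held here
 */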

> need to retain it in page_address() - I expect some callers have IRQs
> disabled.
> 
> 
> ARCH_NEEDS_KMAP_HIGH_GET is a nasty looking thing.  It's ARM:
> 
> /*
>  * The reason for kmap_high_get() is to ensure that the currently kmap'd
>  * page usage count does not decrease to zero while we're using its
>  * existing virtual mapping in an atomic context.  With a VIVT cache this
>  * is essential to do, but with a VIPT cache this is only an optimization
>  * so not to pay the price of establishing a second mapping if an existing
>  * one can be used.  However, on platforms without hardware TLB maintenance
>  * broadcast, we simply cannot use ARCH_NEEDS_KMAP_HIGH_GET at all since
>  * the locking involved must also disable IRQs which is incompatible with
>  * the IPI mechanism used by global TLB operations.
>  */
> #define ARCH_NEEDS_KMAP_HIGH_GET
> #if defined(CONFIG_SMP) && defined(CONFIG_CPU_TLB_V6)
> #undef ARCH_NEEDS_KMAP_HIGH_GET
> #if defined(CONFIG_HIGHMEM) && defined(CONFIG_CPU_CACHE_VIVT)
> #error "The sum of features in your kernel config cannot be supported together"
> #endif
> #endif
> 

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH 2/5] mm, highmem: remove useless pool_lock
  2012-10-30 21:31       ` Andrew Morton
@ 2012-10-31 15:01         ` JoonSoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-10-31 15:01 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm

Hello, Andrew.

2012/10/31 Andrew Morton <akpm@linux-foundation.org>:
> On Mon, 29 Oct 2012 04:12:53 +0900
> Joonsoo Kim <js1304@gmail.com> wrote:
>
>> The pool_lock protects the page_address_pool from concurrent access.
>> But, access to the page_address_pool is already protected by kmap_lock.
>> So remove it.
>
> Well, there's a set_page_address() call in mm/page_alloc.c which
> doesn't have lock_kmap().  it doesn't *need* lock_kmap() because it's
> init-time code and we're running single-threaded there.  I hope!
>
> But this exception should be double-checked and mentioned in the
> changelog, please.  And it's a reason why we can't add
> assert_spin_locked(&kmap_lock) to set_page_address(), which is
> unfortunate.

set_page_address() in mm/page_alloc.c is invoked only when
WANT_PAGE_VIRTUAL is defined, and in that case set_page_address() is
defined not in highmem.c but in include/linux/mm.h. So we don't need
to worry about the set_page_address() call in mm/page_alloc.c.
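
For reference, a paraphrased sketch (not a verbatim quote) of the two
pieces involved:

/* mm/page_alloc.c, memmap_init_zone(): the call is compiled only
 * under WANT_PAGE_VIRTUAL */
#ifdef WANT_PAGE_VIRTUAL
	if (!is_highmem_idx(zone))
		set_page_address(page, __va(pfn << PAGE_SHIFT));
#endif

/* include/linux/mm.h: in that configuration set_page_address() is
 * just a field assignment on page->virtual, with no kmap_lock or
 * pool involvement at all */
#if defined(WANT_PAGE_VIRTUAL)
#define page_address(page)		((page)->virtual)
#define set_page_address(page, address)			\
	do {						\
		(page)->virtual = (address);		\
	} while (0)
#endif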

> The irq-disabling in this code is odd.  If ARCH_NEEDS_KMAP_HIGH_GET=n,
> we didn't need irq-safe locking in set_page_address().  I guess we'll
> need to retain it in page_address() - I expect some callers have IRQs
> disabled.

As Minchan described, if we don't disable IRQs when we take pas->lock,
it could deadlock with page_address() called from IRQ context.

^ permalink raw reply	[flat|nested] 265+ messages in thread

* [PATCH v2 0/5] minor clean-up and optimize highmem related code
       [not found] <Yes>
@ 2012-10-31 16:56   ` Joonsoo Kim
  2010-07-05 12:41 ` [PATCH 01/09] cifs: add kernel config option for CIFS Client caching support Suresh Jayaraman
                     ` (22 subsequent siblings)
  23 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-10-31 16:56 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm, Joonsoo Kim

This patchset cleans up and optimizes highmem-related code.

Changes from v1:
   - Rebase on v3.7-rc3
   - [4] Instead of returning the index of the last flushed entry, return
     the first index, and update last_pkmap_nr to this index to optimize
     further.

Summary for v1:
[1] is just clean-up and doesn't introduce any functional change.
[2-3] are for clean-up and optimization.
These eliminate a useless lock operation and some list management.
[4-5] are for optimization related to flush_all_zero_pkmaps().

Joonsoo Kim (5):
  mm, highmem: use PKMAP_NR() to calculate an index of pkmap
  mm, highmem: remove useless pool_lock
  mm, highmem: remove page_address_pool list
  mm, highmem: makes flush_all_zero_pkmaps() return index of first
    flushed entry
  mm, highmem: get virtual address of the page using PKMAP_ADDR()

 include/linux/highmem.h |    1 +
 mm/highmem.c            |  108 ++++++++++++++++++++++-------------------------
 2 files changed, 51 insertions(+), 58 deletions(-)

-- 
1.7.9.5


^ permalink raw reply	[flat|nested] 265+ messages in thread

* [PATCH v2 1/5] mm, highmem: use PKMAP_NR() to calculate an index of pkmap
  2012-10-31 16:56   ` Joonsoo Kim
@ 2012-10-31 16:56     ` Joonsoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-10-31 16:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Joonsoo Kim, Mel Gorman, Peter Zijlstra

To calculate an index of pkmap, using PKMAP_NR() is more understandable
and maintainable, so change it.

Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>

diff --git a/mm/highmem.c b/mm/highmem.c
index d517cd1..b3b3d68 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -99,7 +99,7 @@ struct page *kmap_to_page(void *vaddr)
 	unsigned long addr = (unsigned long)vaddr;
 
 	if (addr >= PKMAP_ADDR(0) && addr <= PKMAP_ADDR(LAST_PKMAP)) {
-		int i = (addr - PKMAP_ADDR(0)) >> PAGE_SHIFT;
+		int i = PKMAP_NR(addr);
 		return pte_page(pkmap_page_table[i]);
 	}
 
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v2 2/5] mm, highmem: remove useless pool_lock
  2012-10-31 16:56   ` Joonsoo Kim
@ 2012-10-31 16:56     ` Joonsoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-10-31 16:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Joonsoo Kim, Mel Gorman, Peter Zijlstra

The pool_lock protects the page_address_pool from concurrent access.
But access to the page_address_pool is already protected by kmap_lock.
So remove it.

Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>

diff --git a/mm/highmem.c b/mm/highmem.c
index b3b3d68..017bad1 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -328,7 +328,6 @@ struct page_address_map {
  * page_address_map freelist, allocated from page_address_maps.
  */
 static struct list_head page_address_pool;	/* freelist */
-static spinlock_t pool_lock;			/* protects page_address_pool */
 
 /*
  * Hash table bucket
@@ -395,11 +394,9 @@ void set_page_address(struct page *page, void *virtual)
 	if (virtual) {		/* Add */
 		BUG_ON(list_empty(&page_address_pool));
 
-		spin_lock_irqsave(&pool_lock, flags);
 		pam = list_entry(page_address_pool.next,
 				struct page_address_map, list);
 		list_del(&pam->list);
-		spin_unlock_irqrestore(&pool_lock, flags);
 
 		pam->page = page;
 		pam->virtual = virtual;
@@ -413,9 +410,7 @@ void set_page_address(struct page *page, void *virtual)
 			if (pam->page == page) {
 				list_del(&pam->list);
 				spin_unlock_irqrestore(&pas->lock, flags);
-				spin_lock_irqsave(&pool_lock, flags);
 				list_add_tail(&pam->list, &page_address_pool);
-				spin_unlock_irqrestore(&pool_lock, flags);
 				goto done;
 			}
 		}
@@ -438,7 +433,6 @@ void __init page_address_init(void)
 		INIT_LIST_HEAD(&page_address_htable[i].lh);
 		spin_lock_init(&page_address_htable[i].lock);
 	}
-	spin_lock_init(&pool_lock);
 }
 
 #endif	/* defined(CONFIG_HIGHMEM) && !defined(WANT_PAGE_VIRTUAL) */
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v2 3/5] mm, highmem: remove page_address_pool list
  2012-10-31 16:56   ` Joonsoo Kim
@ 2012-10-31 16:56     ` Joonsoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-10-31 16:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Joonsoo Kim, Mel Gorman, Peter Zijlstra

We can find a free page_address_map instance without the
page_address_pool, since each pkmap entry corresponds 1:1 to a slot in
page_address_maps. So remove the pool.
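
A sketch of the resulting lookup (the slot for a mapping is fully
determined by its virtual address):

	static struct page_address_map page_address_maps[LAST_PKMAP];

	pam = &page_address_maps[PKMAP_NR((unsigned long)virtual)];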

Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>

diff --git a/mm/highmem.c b/mm/highmem.c
index 017bad1..d98b0a9 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -324,10 +324,7 @@ struct page_address_map {
 	struct list_head list;
 };
 
-/*
- * page_address_map freelist, allocated from page_address_maps.
- */
-static struct list_head page_address_pool;	/* freelist */
+static struct page_address_map page_address_maps[LAST_PKMAP];
 
 /*
  * Hash table bucket
@@ -392,12 +389,7 @@ void set_page_address(struct page *page, void *virtual)
 
 	pas = page_slot(page);
 	if (virtual) {		/* Add */
-		BUG_ON(list_empty(&page_address_pool));
-
-		pam = list_entry(page_address_pool.next,
-				struct page_address_map, list);
-		list_del(&pam->list);
-
+		pam = &page_address_maps[PKMAP_NR((unsigned long)virtual)];
 		pam->page = page;
 		pam->virtual = virtual;
 
@@ -410,7 +402,6 @@ void set_page_address(struct page *page, void *virtual)
 			if (pam->page == page) {
 				list_del(&pam->list);
 				spin_unlock_irqrestore(&pas->lock, flags);
-				list_add_tail(&pam->list, &page_address_pool);
 				goto done;
 			}
 		}
@@ -420,15 +411,10 @@ done:
 	return;
 }
 
-static struct page_address_map page_address_maps[LAST_PKMAP];
-
 void __init page_address_init(void)
 {
 	int i;
 
-	INIT_LIST_HEAD(&page_address_pool);
-	for (i = 0; i < ARRAY_SIZE(page_address_maps); i++)
-		list_add(&page_address_maps[i].list, &page_address_pool);
 	for (i = 0; i < ARRAY_SIZE(page_address_htable); i++) {
 		INIT_LIST_HEAD(&page_address_htable[i].lh);
 		spin_lock_init(&page_address_htable[i].lock);
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry
  2012-10-31 16:56   ` Joonsoo Kim
@ 2012-10-31 16:56     ` Joonsoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-10-31 16:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Joonsoo Kim, Mel Gorman, Peter Zijlstra,
	Minchan Kim

In the current code, after flush_all_zero_pkmaps() is invoked, all
pkmaps are re-iterated to find a usable entry. This can be optimized if
flush_all_zero_pkmaps() returns the index of the first flushed entry:
with that index, we can immediately map a highmem page at the virtual
address it represents. So change the return type of
flush_all_zero_pkmaps() and return the index of the first flushed entry.

Additionally, update last_pkmap_nr to this index. Every entry below this
index is certainly occupied by another mapping, so updating
last_pkmap_nr to this index is a reasonable optimization.

Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Cc: Minchan Kim <minchan@kernel.org>
Signed-off-by: Joonsoo Kim <js1304@gmail.com>

diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index ef788b5..97ad208 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
 
 #ifdef CONFIG_HIGHMEM
 #include <asm/highmem.h>
+#define PKMAP_INVALID_INDEX (LAST_PKMAP)
 
 /* declarations for linux/mm/highmem.c */
 unsigned int nr_free_highpages(void);
diff --git a/mm/highmem.c b/mm/highmem.c
index d98b0a9..b365f7b 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
 	return virt_to_page(addr);
 }
 
-static void flush_all_zero_pkmaps(void)
+static unsigned int flush_all_zero_pkmaps(void)
 {
 	int i;
-	int need_flush = 0;
+	unsigned int index = PKMAP_INVALID_INDEX;
 
 	flush_cache_kmaps();
 
@@ -141,10 +141,13 @@ static void flush_all_zero_pkmaps(void)
 			  &pkmap_page_table[i]);
 
 		set_page_address(page, NULL);
-		need_flush = 1;
+		if (index == PKMAP_INVALID_INDEX)
+			index = i;
 	}
-	if (need_flush)
+	if (index != PKMAP_INVALID_INDEX)
 		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
+
+	return index;
 }
 
 /**
@@ -152,14 +155,19 @@ static void flush_all_zero_pkmaps(void)
  */
 void kmap_flush_unused(void)
 {
+	unsigned int index;
+
 	lock_kmap();
-	flush_all_zero_pkmaps();
+	index = flush_all_zero_pkmaps();
+	if (index != PKMAP_INVALID_INDEX && (index < last_pkmap_nr))
+		last_pkmap_nr = index;
 	unlock_kmap();
 }
 
 static inline unsigned long map_new_virtual(struct page *page)
 {
 	unsigned long vaddr;
+	unsigned int index = PKMAP_INVALID_INDEX;
 	int count;
 
 start:
@@ -168,40 +176,45 @@ start:
 	for (;;) {
 		last_pkmap_nr = (last_pkmap_nr + 1) & LAST_PKMAP_MASK;
 		if (!last_pkmap_nr) {
-			flush_all_zero_pkmaps();
-			count = LAST_PKMAP;
+			index = flush_all_zero_pkmaps();
+			break;
 		}
-		if (!pkmap_count[last_pkmap_nr])
+		if (!pkmap_count[last_pkmap_nr]) {
+			index = last_pkmap_nr;
 			break;	/* Found a usable entry */
-		if (--count)
-			continue;
-
-		/*
-		 * Sleep for somebody else to unmap their entries
-		 */
-		{
-			DECLARE_WAITQUEUE(wait, current);
-
-			__set_current_state(TASK_UNINTERRUPTIBLE);
-			add_wait_queue(&pkmap_map_wait, &wait);
-			unlock_kmap();
-			schedule();
-			remove_wait_queue(&pkmap_map_wait, &wait);
-			lock_kmap();
-
-			/* Somebody else might have mapped it while we slept */
-			if (page_address(page))
-				return (unsigned long)page_address(page);
-
-			/* Re-start */
-			goto start;
 		}
+		if (--count == 0)
+			break;
 	}
-	vaddr = PKMAP_ADDR(last_pkmap_nr);
+
+	/*
+	 * Sleep for somebody else to unmap their entries
+	 */
+	if (index == PKMAP_INVALID_INDEX) {
+		DECLARE_WAITQUEUE(wait, current);
+
+		__set_current_state(TASK_UNINTERRUPTIBLE);
+		add_wait_queue(&pkmap_map_wait, &wait);
+		unlock_kmap();
+		schedule();
+		remove_wait_queue(&pkmap_map_wait, &wait);
+		lock_kmap();
+
+		/* Somebody else might have mapped it while we slept */
+		vaddr = (unsigned long)page_address(page);
+		if (vaddr)
+			return vaddr;
+
+		/* Re-start */
+		goto start;
+	}
+
+	vaddr = PKMAP_ADDR(index);
 	set_pte_at(&init_mm, vaddr,
-		   &(pkmap_page_table[last_pkmap_nr]), mk_pte(page, kmap_prot));
+		   &(pkmap_page_table[index]), mk_pte(page, kmap_prot));
 
-	pkmap_count[last_pkmap_nr] = 1;
+	pkmap_count[index] = 1;
+	last_pkmap_nr = index;
 	set_page_address(page, (void *)vaddr);
 
 	return vaddr;
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* [PATCH v2 5/5] mm, highmem: get virtual address of the page using PKMAP_ADDR()
  2012-10-31 16:56   ` Joonsoo Kim
@ 2012-10-31 16:56     ` Joonsoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Joonsoo Kim @ 2012-10-31 16:56 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, linux-mm, Joonsoo Kim, Mel Gorman, Peter Zijlstra

In flush_all_zero_pkmaps(), we have the index of the pkmap entry
associated with the page. Using this index, we can get the virtual
address of the page directly with PKMAP_ADDR(). So change it.
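
For reference, PKMAP_ADDR() is typically defined per-architecture
roughly as (a sketch):

	#define PKMAP_ADDR(nr)	(PKMAP_BASE + ((nr) << PAGE_SHIFT))

so PKMAP_ADDR(i) is exactly the address page_address(page) would return
for a page currently mapped at pkmap index i, without the
page_address_htable lookup.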

Cc: Mel Gorman <mel@csn.ul.ie>
Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Joonsoo Kim <js1304@gmail.com>
Reviewed-by: Minchan Kim <minchan@kernel.org>

diff --git a/mm/highmem.c b/mm/highmem.c
index b365f7b..675ec97 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -137,8 +137,7 @@ static unsigned int flush_all_zero_pkmaps(void)
 		 * So no dangers, even with speculative execution.
 		 */
 		page = pte_page(pkmap_page_table[i]);
-		pte_clear(&init_mm, (unsigned long)page_address(page),
-			  &pkmap_page_table[i]);
+		pte_clear(&init_mm, PKMAP_ADDR(i), &pkmap_page_table[i]);
 
 		set_page_address(page, NULL);
 		if (index == PKMAP_INVALID_INDEX)
-- 
1.7.9.5


^ permalink raw reply related	[flat|nested] 265+ messages in thread

* Re: [PATCH 0/5] minor clean-up and optimize highmem related code
  2012-10-29 13:15       ` JoonSoo Kim
@ 2012-10-31 17:11         ` JoonSoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-10-31 17:11 UTC (permalink / raw)
  To: Minchan Kim; +Cc: Andrew Morton, linux-kernel, linux-mm, Peter Zijlstra

Hello, Andrew.

2012/10/29 JoonSoo Kim <js1304@gmail.com>:
> Hi, Minchan.
>
> 2012/10/29 Minchan Kim <minchan@kernel.org>:
>> Hi Joonsoo,
>>
>> On Mon, Oct 29, 2012 at 04:12:51AM +0900, Joonsoo Kim wrote:
>>> This patchset cleans up and optimizes highmem related code.
>>>
>>> [1] is just clean-up and doesn't introduce any functional change.
>>> [2-3] are for clean-up and optimization.
>>> These eliminate a useless lock operation and list management.
>>> [4-5] are for optimization related to flush_all_zero_pkmaps().
>>>
>>> Joonsoo Kim (5):
>>>   mm, highmem: use PKMAP_NR() to calculate an index of pkmap
>>>   mm, highmem: remove useless pool_lock
>>>   mm, highmem: remove page_address_pool list
>>>   mm, highmem: makes flush_all_zero_pkmaps() return index of last
>>>     flushed entry
>>>   mm, highmem: get virtual address of the page using PKMAP_ADDR()
>>
>> This patchset looks awesome to me.
>> If you have a plan to respin, please CCed Peter.
>
> Thanks for review.
> I will wait more review and respin, the day after tomorrow.
> Version 2 will include fix about your comment.

Could you pick up the second version of this patchset?

[3] is changed to leave one blank line.
[4] is changed to optimize further, following Minchan's comment:
it now returns the index of the first flushed entry instead of the last.
The others are unchanged.

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry
  2012-10-31 16:56     ` Joonsoo Kim
@ 2012-11-01  5:03       ` Minchan Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Minchan Kim @ 2012-11-01  5:03 UTC (permalink / raw)
  To: Joonsoo Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Peter Zijlstra

On Thu, Nov 01, 2012 at 01:56:36AM +0900, Joonsoo Kim wrote:
> In current code, after flush_all_zero_pkmaps() is invoked,
> then re-iterate all pkmaps. It can be optimized if flush_all_zero_pkmaps()
> return index of first flushed entry. With this index,
> we can immediately map highmem page to virtual address represented by index.
> So change return type of flush_all_zero_pkmaps()
> and return index of first flushed entry.
> 
> Additionally, update last_pkmap_nr to this index.
> It is certain that entry which is below this index is occupied by other mapping,
> therefore updating last_pkmap_nr to this index is reasonable optimization.
> 
> Cc: Mel Gorman <mel@csn.ul.ie>
> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> Cc: Minchan Kim <minchan@kernel.org>
> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> 
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index ef788b5..97ad208 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
>  
>  #ifdef CONFIG_HIGHMEM
>  #include <asm/highmem.h>
> +#define PKMAP_INVALID_INDEX (LAST_PKMAP)
>  
>  /* declarations for linux/mm/highmem.c */
>  unsigned int nr_free_highpages(void);
> diff --git a/mm/highmem.c b/mm/highmem.c
> index d98b0a9..b365f7b 100644
> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
>  	return virt_to_page(addr);
>  }
>  
> -static void flush_all_zero_pkmaps(void)
> +static unsigned int flush_all_zero_pkmaps(void)
>  {
>  	int i;
> -	int need_flush = 0;
> +	unsigned int index = PKMAP_INVALID_INDEX;
>  
>  	flush_cache_kmaps();
>  
> @@ -141,10 +141,13 @@ static void flush_all_zero_pkmaps(void)
>  			  &pkmap_page_table[i]);
>  
>  		set_page_address(page, NULL);
> -		need_flush = 1;
> +		if (index == PKMAP_INVALID_INDEX)
> +			index = i;
>  	}
> -	if (need_flush)
> +	if (index != PKMAP_INVALID_INDEX)
>  		flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
> +
> +	return index;
>  }
>  
>  /**
> @@ -152,14 +155,19 @@ static void flush_all_zero_pkmaps(void)
>   */
>  void kmap_flush_unused(void)
>  {
> +	unsigned int index;
> +
>  	lock_kmap();
> -	flush_all_zero_pkmaps();
> +	index = flush_all_zero_pkmaps();
> +	if (index != PKMAP_INVALID_INDEX && (index < last_pkmap_nr))
> +		last_pkmap_nr = index;

I don't know whether kmap_flush_unused is really a fast path, so I'm not
sure how much my nitpick matters. Anyway, what problem would happen if
we did the following?

lock()
index = flush_all_zero_pkmaps();
if (index != PKMAP_INVALID_INDEX)
        last_pkmap_nr = index;
unlock();

Normally, last_pkmap_nr is increased while searching for an empty slot
in map_new_virtual, so I expect the return value of
flush_all_zero_pkmaps in kmap_flush_unused to normally be either less
than last_pkmap_nr or equal to last_pkmap_nr + 1.

 
>  	unlock_kmap();
>  }
>  
>  static inline unsigned long map_new_virtual(struct page *page)
>  {
>  	unsigned long vaddr;
> +	unsigned int index = PKMAP_INVALID_INDEX;
>  	int count;
>  
>  start:
> @@ -168,40 +176,45 @@ start:
>  	for (;;) {
>  		last_pkmap_nr = (last_pkmap_nr + 1) & LAST_PKMAP_MASK;
>  		if (!last_pkmap_nr) {
> -			flush_all_zero_pkmaps();
> -			count = LAST_PKMAP;
> +			index = flush_all_zero_pkmaps();
> +			break;
>  		}
> -		if (!pkmap_count[last_pkmap_nr])
> +		if (!pkmap_count[last_pkmap_nr]) {
> +			index = last_pkmap_nr;
>  			break;	/* Found a usable entry */
> -		if (--count)
> -			continue;
> -
> -		/*
> -		 * Sleep for somebody else to unmap their entries
> -		 */
> -		{
> -			DECLARE_WAITQUEUE(wait, current);
> -
> -			__set_current_state(TASK_UNINTERRUPTIBLE);
> -			add_wait_queue(&pkmap_map_wait, &wait);
> -			unlock_kmap();
> -			schedule();
> -			remove_wait_queue(&pkmap_map_wait, &wait);
> -			lock_kmap();
> -
> -			/* Somebody else might have mapped it while we slept */
> -			if (page_address(page))
> -				return (unsigned long)page_address(page);
> -
> -			/* Re-start */
> -			goto start;
>  		}
> +		if (--count == 0)
> +			break;
>  	}
> -	vaddr = PKMAP_ADDR(last_pkmap_nr);
> +
> +	/*
> +	 * Sleep for somebody else to unmap their entries
> +	 */
> +	if (index == PKMAP_INVALID_INDEX) {
> +		DECLARE_WAITQUEUE(wait, current);
> +
> +		__set_current_state(TASK_UNINTERRUPTIBLE);
> +		add_wait_queue(&pkmap_map_wait, &wait);
> +		unlock_kmap();
> +		schedule();
> +		remove_wait_queue(&pkmap_map_wait, &wait);
> +		lock_kmap();
> +
> +		/* Somebody else might have mapped it while we slept */
> +		vaddr = (unsigned long)page_address(page);
> +		if (vaddr)
> +			return vaddr;
> +
> +		/* Re-start */
> +		goto start;
> +	}
> +
> +	vaddr = PKMAP_ADDR(index);
>  	set_pte_at(&init_mm, vaddr,
> -		   &(pkmap_page_table[last_pkmap_nr]), mk_pte(page, kmap_prot));
> +		   &(pkmap_page_table[index]), mk_pte(page, kmap_prot));
>  
> -	pkmap_count[last_pkmap_nr] = 1;
> +	pkmap_count[index] = 1;
> +	last_pkmap_nr = index;
>  	set_page_address(page, (void *)vaddr);
>  
>  	return vaddr;
> -- 
> 1.7.9.5

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry
  2012-11-01  5:03       ` Minchan Kim
@ 2012-11-02 19:07         ` JoonSoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-11-02 19:07 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Peter Zijlstra

Hello, Minchan.

2012/11/1 Minchan Kim <minchan@kernel.org>:
> On Thu, Nov 01, 2012 at 01:56:36AM +0900, Joonsoo Kim wrote:
>> In current code, after flush_all_zero_pkmaps() is invoked,
>> then re-iterate all pkmaps. It can be optimized if flush_all_zero_pkmaps()
>> return index of first flushed entry. With this index,
>> we can immediately map highmem page to virtual address represented by index.
>> So change return type of flush_all_zero_pkmaps()
>> and return index of first flushed entry.
>>
>> Additionally, update last_pkmap_nr to this index.
>> It is certain that entry which is below this index is occupied by other mapping,
>> therefore updating last_pkmap_nr to this index is reasonable optimization.
>>
>> Cc: Mel Gorman <mel@csn.ul.ie>
>> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
>> Cc: Minchan Kim <minchan@kernel.org>
>> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>>
>> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
>> index ef788b5..97ad208 100644
>> --- a/include/linux/highmem.h
>> +++ b/include/linux/highmem.h
>> @@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
>>
>>  #ifdef CONFIG_HIGHMEM
>>  #include <asm/highmem.h>
>> +#define PKMAP_INVALID_INDEX (LAST_PKMAP)
>>
>>  /* declarations for linux/mm/highmem.c */
>>  unsigned int nr_free_highpages(void);
>> diff --git a/mm/highmem.c b/mm/highmem.c
>> index d98b0a9..b365f7b 100644
>> --- a/mm/highmem.c
>> +++ b/mm/highmem.c
>> @@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
>>       return virt_to_page(addr);
>>  }
>>
>> -static void flush_all_zero_pkmaps(void)
>> +static unsigned int flush_all_zero_pkmaps(void)
>>  {
>>       int i;
>> -     int need_flush = 0;
>> +     unsigned int index = PKMAP_INVALID_INDEX;
>>
>>       flush_cache_kmaps();
>>
>> @@ -141,10 +141,13 @@ static void flush_all_zero_pkmaps(void)
>>                         &pkmap_page_table[i]);
>>
>>               set_page_address(page, NULL);
>> -             need_flush = 1;
>> +             if (index == PKMAP_INVALID_INDEX)
>> +                     index = i;
>>       }
>> -     if (need_flush)
>> +     if (index != PKMAP_INVALID_INDEX)
>>               flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>> +
>> +     return index;
>>  }
>>
>>  /**
>> @@ -152,14 +155,19 @@ static void flush_all_zero_pkmaps(void)
>>   */
>>  void kmap_flush_unused(void)
>>  {
>> +     unsigned int index;
>> +
>>       lock_kmap();
>> -     flush_all_zero_pkmaps();
>> +     index = flush_all_zero_pkmaps();
>> +     if (index != PKMAP_INVALID_INDEX && (index < last_pkmap_nr))
>> +             last_pkmap_nr = index;
>
> I don't know how kmap_flush_unused is really fast path so how my nitpick
> is effective. Anyway,
> What problem happens if we do following as?
>
> lock()
> index = flush_all_zero_pkmaps();
> if (index != PKMAP_INVALID_INDEX)
>         last_pkmap_nr = index;
> unlock();
>
> Normally, last_pkmap_nr is increased with searching empty slot in
> map_new_virtual. So I expect return value of flush_all_zero_pkmaps
> in kmap_flush_unused normally become either less than last_pkmap_nr
> or last_pkmap_nr + 1.

There is a case where the return value of flush_all_zero_pkmaps() is
larger than last_pkmap_nr. Look at the following example.

Assume LAST_PKMAP = 20, indexes 1-9 and 11-19 are kmapped, and 10 is
kunmapped.

do kmap_flush_unused() => flush index 10 => last_pkmap_nr = 10
do kunmap() with index 17
do kmap_flush_unused() => flush index 17

So a slightly dirty implementation is needed; the conditional check
shown below is what handles this.
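
A sketch of that check (this is the form the patch itself uses):

	index = flush_all_zero_pkmaps();
	/* only move last_pkmap_nr backwards: moving it forward could
	 * skip entries that were kunmapped below the flushed index */
	if (index != PKMAP_INVALID_INDEX && index < last_pkmap_nr)
		last_pkmap_nr = index;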

Thanks.

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry
  2012-11-02 19:07         ` JoonSoo Kim
@ 2012-11-02 22:42           ` Minchan Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Minchan Kim @ 2012-11-02 22:42 UTC (permalink / raw)
  To: JoonSoo Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Peter Zijlstra

Hi Joonsoo,

On Sat, Nov 03, 2012 at 04:07:25AM +0900, JoonSoo Kim wrote:
> Hello, Minchan.
> 
> 2012/11/1 Minchan Kim <minchan@kernel.org>:
> > On Thu, Nov 01, 2012 at 01:56:36AM +0900, Joonsoo Kim wrote:
> >> In current code, after flush_all_zero_pkmaps() is invoked,
> >> then re-iterate all pkmaps. It can be optimized if flush_all_zero_pkmaps()
> >> return index of first flushed entry. With this index,
> >> we can immediately map highmem page to virtual address represented by index.
> >> So change return type of flush_all_zero_pkmaps()
> >> and return index of first flushed entry.
> >>
> >> Additionally, update last_pkmap_nr to this index.
> >> It is certain that entry which is below this index is occupied by other mapping,
> >> therefore updating last_pkmap_nr to this index is reasonable optimization.
> >>
> >> Cc: Mel Gorman <mel@csn.ul.ie>
> >> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> >> Cc: Minchan Kim <minchan@kernel.org>
> >> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> >>
> >> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> >> index ef788b5..97ad208 100644
> >> --- a/include/linux/highmem.h
> >> +++ b/include/linux/highmem.h
> >> @@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
> >>
> >>  #ifdef CONFIG_HIGHMEM
> >>  #include <asm/highmem.h>
> >> +#define PKMAP_INVALID_INDEX (LAST_PKMAP)
> >>
> >>  /* declarations for linux/mm/highmem.c */
> >>  unsigned int nr_free_highpages(void);
> >> diff --git a/mm/highmem.c b/mm/highmem.c
> >> index d98b0a9..b365f7b 100644
> >> --- a/mm/highmem.c
> >> +++ b/mm/highmem.c
> >> @@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
> >>       return virt_to_page(addr);
> >>  }
> >>
> >> -static void flush_all_zero_pkmaps(void)
> >> +static unsigned int flush_all_zero_pkmaps(void)
> >>  {
> >>       int i;
> >> -     int need_flush = 0;
> >> +     unsigned int index = PKMAP_INVALID_INDEX;
> >>
> >>       flush_cache_kmaps();
> >>
> >> @@ -141,10 +141,13 @@ static void flush_all_zero_pkmaps(void)
> >>                         &pkmap_page_table[i]);
> >>
> >>               set_page_address(page, NULL);
> >> -             need_flush = 1;
> >> +             if (index == PKMAP_INVALID_INDEX)
> >> +                     index = i;
> >>       }
> >> -     if (need_flush)
> >> +     if (index != PKMAP_INVALID_INDEX)
> >>               flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
> >> +
> >> +     return index;
> >>  }
> >>
> >>  /**
> >> @@ -152,14 +155,19 @@ static void flush_all_zero_pkmaps(void)
> >>   */
> >>  void kmap_flush_unused(void)
> >>  {
> >> +     unsigned int index;
> >> +
> >>       lock_kmap();
> >> -     flush_all_zero_pkmaps();
> >> +     index = flush_all_zero_pkmaps();
> >> +     if (index != PKMAP_INVALID_INDEX && (index < last_pkmap_nr))
> >> +             last_pkmap_nr = index;
> >
> > I don't know how kmap_flush_unused is really fast path so how my nitpick
> > is effective. Anyway,
> > What problem happens if we do following as?
> >
> > lock()
> > index = flush_all_zero_pkmaps();
> > if (index != PKMAP_INVALID_INDEX)
> >         last_pkmap_nr = index;
> > unlock();
> >
> > Normally, last_pkmap_nr is increased with searching empty slot in
> > map_new_virtual. So I expect return value of flush_all_zero_pkmaps
> > in kmap_flush_unused normally become either less than last_pkmap_nr
> > or last_pkmap_nr + 1.
> 
> There is a case that return value of kmap_flush_unused() is larger
> than last_pkmap_nr.

I see, but why is that a problem? flush_all_zero_pkmaps() returning a
value larger than last_pkmap_nr means that there is no free slot below
that value, so an unconditional last_pkmap_nr update is valid.

> Look at the following example.
> 
> Assume last_pkmap = 20 and index 1-9, 11-19 is kmapped. 10 is kunmapped.
> 
> do kmap_flush_unused() => flush index 10 => last_pkmap = 10;
> do kunmap() with index 17
> do kmap_flush_unused() => flush index 17
> 
> So, little dirty implementation is needed.
> 
> Thanks.

-- 
Kind Regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry
  2012-11-02 22:42           ` Minchan Kim
@ 2012-11-13  0:30             ` JoonSoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-11-13  0:30 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Peter Zijlstra

2012/11/3 Minchan Kim <minchan@kernel.org>:
> Hi Joonsoo,
>
> On Sat, Nov 03, 2012 at 04:07:25AM +0900, JoonSoo Kim wrote:
>> Hello, Minchan.
>>
>> 2012/11/1 Minchan Kim <minchan@kernel.org>:
>> > On Thu, Nov 01, 2012 at 01:56:36AM +0900, Joonsoo Kim wrote:
>> >> In current code, after flush_all_zero_pkmaps() is invoked,
>> >> then re-iterate all pkmaps. It can be optimized if flush_all_zero_pkmaps()
>> >> return index of first flushed entry. With this index,
>> >> we can immediately map highmem page to virtual address represented by index.
>> >> So change return type of flush_all_zero_pkmaps()
>> >> and return index of first flushed entry.
>> >>
>> >> Additionally, update last_pkmap_nr to this index.
>> >> It is certain that entry which is below this index is occupied by other mapping,
>> >> therefore updating last_pkmap_nr to this index is reasonable optimization.
>> >>
>> >> Cc: Mel Gorman <mel@csn.ul.ie>
>> >> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
>> >> Cc: Minchan Kim <minchan@kernel.org>
>> >> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>> >>
>> >> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
>> >> index ef788b5..97ad208 100644
>> >> --- a/include/linux/highmem.h
>> >> +++ b/include/linux/highmem.h
>> >> @@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
>> >>
>> >>  #ifdef CONFIG_HIGHMEM
>> >>  #include <asm/highmem.h>
>> >> +#define PKMAP_INVALID_INDEX (LAST_PKMAP)
>> >>
>> >>  /* declarations for linux/mm/highmem.c */
>> >>  unsigned int nr_free_highpages(void);
>> >> diff --git a/mm/highmem.c b/mm/highmem.c
>> >> index d98b0a9..b365f7b 100644
>> >> --- a/mm/highmem.c
>> >> +++ b/mm/highmem.c
>> >> @@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
>> >>       return virt_to_page(addr);
>> >>  }
>> >>
>> >> -static void flush_all_zero_pkmaps(void)
>> >> +static unsigned int flush_all_zero_pkmaps(void)
>> >>  {
>> >>       int i;
>> >> -     int need_flush = 0;
>> >> +     unsigned int index = PKMAP_INVALID_INDEX;
>> >>
>> >>       flush_cache_kmaps();
>> >>
>> >> @@ -141,10 +141,13 @@ static void flush_all_zero_pkmaps(void)
>> >>                         &pkmap_page_table[i]);
>> >>
>> >>               set_page_address(page, NULL);
>> >> -             need_flush = 1;
>> >> +             if (index == PKMAP_INVALID_INDEX)
>> >> +                     index = i;
>> >>       }
>> >> -     if (need_flush)
>> >> +     if (index != PKMAP_INVALID_INDEX)
>> >>               flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>> >> +
>> >> +     return index;
>> >>  }
>> >>
>> >>  /**
>> >> @@ -152,14 +155,19 @@ static void flush_all_zero_pkmaps(void)
>> >>   */
>> >>  void kmap_flush_unused(void)
>> >>  {
>> >> +     unsigned int index;
>> >> +
>> >>       lock_kmap();
>> >> -     flush_all_zero_pkmaps();
>> >> +     index = flush_all_zero_pkmaps();
>> >> +     if (index != PKMAP_INVALID_INDEX && (index < last_pkmap_nr))
>> >> +             last_pkmap_nr = index;
>> >
>> > I don't know how kmap_flush_unused is really fast path so how my nitpick
>> > is effective. Anyway,
>> > What problem happens if we do following as?
>> >
>> > lock()
>> > index = flush_all_zero_pkmaps();
>> > if (index != PKMAP_INVALID_INDEX)
>> >         last_pkmap_nr = index;
>> > unlock();
>> >
>> > Normally, last_pkmap_nr is increased with searching empty slot in
>> > map_new_virtual. So I expect return value of flush_all_zero_pkmaps
>> > in kmap_flush_unused normally become either less than last_pkmap_nr
>> > or last_pkmap_nr + 1.
>>
>> There is a case that return value of kmap_flush_unused() is larger
>> than last_pkmap_nr.
>
> I see but why it's problem? kmap_flush_unused returns larger value than
> last_pkmap_nr means that there is no free slot at below the value.
> So unconditional last_pkmap_nr update is vaild.

I think that this is not true.
Look at this slightly different example.

Assume LAST_PKMAP = 20, indexes 1-9 and 12-19 are kmapped, and 10 and
11 are kunmapped.

do kmap_flush_unused() => flushes indexes 10, 11 => last_pkmap_nr = 10
do kunmap() on index 17
do kmap_flush_unused() => flushes index 17 => last_pkmap_nr = 17?

In this case, the unconditional last_pkmap_nr update skips over the
still-free indexes 10 and 11. So, the conditional update is needed.
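
To make the scenario concrete, here is a minimal user-space sketch of
the bookkeeping, assuming a simplified pkmap_count encoding (0 = free,
1 = kunmapped but not yet flushed, >1 = mapped); all names are
illustrative, not the kernel's actual code:

#include <stdio.h>

#define LAST_PKMAP          20
#define PKMAP_INVALID_INDEX (LAST_PKMAP)

static int pkmap_count[LAST_PKMAP];
static unsigned int last_pkmap_nr = LAST_PKMAP - 1;

/* mimics flush_all_zero_pkmaps(): free every count==1 slot and
 * return the index of the first slot flushed in this pass */
static unsigned int flush_all_zero(void)
{
        unsigned int index = PKMAP_INVALID_INDEX;
        int i;

        for (i = 0; i < LAST_PKMAP; i++) {
                if (pkmap_count[i] != 1)
                        continue;
                pkmap_count[i] = 0;
                if (index == PKMAP_INVALID_INDEX)
                        index = i;
        }
        return index;
}

static void flush_and_update(void)
{
        unsigned int index = flush_all_zero();

        /* the conditional update under discussion */
        if (index != PKMAP_INVALID_INDEX && index < last_pkmap_nr)
                last_pkmap_nr = index;
        printf("first flushed = %u, last_pkmap_nr = %u\n",
               index, last_pkmap_nr);
}

int main(void)
{
        int i;

        for (i = 1; i < LAST_PKMAP; i++)        /* 1-19 mapped... */
                pkmap_count[i] = 2;
        pkmap_count[10] = pkmap_count[11] = 1;  /* ...except 10, 11 */

        flush_and_update();     /* flushes 10, 11 -> last_pkmap_nr = 10 */
        pkmap_count[17] = 1;    /* kunmap() index 17 */
        flush_and_update();     /* flushes 17; last_pkmap_nr stays 10.
                                 * An unconditional update would jump
                                 * to 17 and skip free slots 10, 11. */
        return 0;
}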

>> Look at the following example.
>>
>> Assume LAST_PKMAP = 20, indexes 1-9 and 11-19 are kmapped, and 10
>> is kunmapped.
>>
>> do kmap_flush_unused() => flushes index 10 => last_pkmap_nr = 10
>> do kunmap() on index 17
>> do kmap_flush_unused() => flushes index 17
>>
>> So, a slightly dirty implementation is needed.
>>
>> Thanks.
>
> --
> Kind Regards,
> Minchan Kim

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry
  2012-11-13  0:30             ` JoonSoo Kim
@ 2012-11-13 12:49               ` Minchan Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Minchan Kim @ 2012-11-13 12:49 UTC (permalink / raw)
  To: JoonSoo Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Peter Zijlstra

On Tue, Nov 13, 2012 at 09:30:57AM +0900, JoonSoo Kim wrote:
> 2012/11/3 Minchan Kim <minchan@kernel.org>:
> > Hi Joonsoo,
> >
> > On Sat, Nov 03, 2012 at 04:07:25AM +0900, JoonSoo Kim wrote:
> >> Hello, Minchan.
> >>
> >> 2012/11/1 Minchan Kim <minchan@kernel.org>:
> >> > On Thu, Nov 01, 2012 at 01:56:36AM +0900, Joonsoo Kim wrote:
> >> >> In the current code, after flush_all_zero_pkmaps() is invoked,
> >> >> all pkmaps are re-iterated. This can be optimized if
> >> >> flush_all_zero_pkmaps() returns the index of the first flushed
> >> >> entry: with this index, we can immediately map a highmem page
> >> >> to the virtual address that the index represents. So change the
> >> >> return type of flush_all_zero_pkmaps() and return the index of
> >> >> the first flushed entry.
> >> >>
> >> >> Additionally, update last_pkmap_nr to this index. Every entry
> >> >> below this index is certainly occupied by another mapping, so
> >> >> updating last_pkmap_nr to this index is a reasonable
> >> >> optimization.
> >> >>
> >> >> Cc: Mel Gorman <mel@csn.ul.ie>
> >> >> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> >> >> Cc: Minchan Kim <minchan@kernel.org>
> >> >> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> >> >>
> >> >> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> >> >> index ef788b5..97ad208 100644
> >> >> --- a/include/linux/highmem.h
> >> >> +++ b/include/linux/highmem.h
> >> >> @@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
> >> >>
> >> >>  #ifdef CONFIG_HIGHMEM
> >> >>  #include <asm/highmem.h>
> >> >> +#define PKMAP_INVALID_INDEX (LAST_PKMAP)
> >> >>
> >> >>  /* declarations for linux/mm/highmem.c */
> >> >>  unsigned int nr_free_highpages(void);
> >> >> diff --git a/mm/highmem.c b/mm/highmem.c
> >> >> index d98b0a9..b365f7b 100644
> >> >> --- a/mm/highmem.c
> >> >> +++ b/mm/highmem.c
> >> >> @@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
> >> >>       return virt_to_page(addr);
> >> >>  }
> >> >>
> >> >> -static void flush_all_zero_pkmaps(void)
> >> >> +static unsigned int flush_all_zero_pkmaps(void)
> >> >>  {
> >> >>       int i;
> >> >> -     int need_flush = 0;
> >> >> +     unsigned int index = PKMAP_INVALID_INDEX;
> >> >>
> >> >>       flush_cache_kmaps();
> >> >>
> >> >> @@ -141,10 +141,13 @@ static void flush_all_zero_pkmaps(void)
> >> >>                         &pkmap_page_table[i]);
> >> >>
> >> >>               set_page_address(page, NULL);
> >> >> -             need_flush = 1;
> >> >> +             if (index == PKMAP_INVALID_INDEX)
> >> >> +                     index = i;
> >> >>       }
> >> >> -     if (need_flush)
> >> >> +     if (index != PKMAP_INVALID_INDEX)
> >> >>               flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
> >> >> +
> >> >> +     return index;
> >> >>  }
> >> >>
> >> >>  /**
> >> >> @@ -152,14 +155,19 @@ static void flush_all_zero_pkmaps(void)
> >> >>   */
> >> >>  void kmap_flush_unused(void)
> >> >>  {
> >> >> +     unsigned int index;
> >> >> +
> >> >>       lock_kmap();
> >> >> -     flush_all_zero_pkmaps();
> >> >> +     index = flush_all_zero_pkmaps();
> >> >> +     if (index != PKMAP_INVALID_INDEX && (index < last_pkmap_nr))
> >> >> +             last_pkmap_nr = index;
> >> >
> >> > I don't know whether kmap_flush_unused is really a fast path,
> >> > so I'm not sure how effective my nitpick is. Anyway, what
> >> > problem would happen if we did the following?
> >> >
> >> > lock()
> >> > index = flush_all_zero_pkmaps();
> >> > if (index != PKMAP_INVALID_INDEX)
> >> >         last_pkmap_nr = index;
> >> > unlock();
> >> >
> >> > Normally, last_pkmap_nr is increased while searching for an
> >> > empty slot in map_new_virtual. So I expect the return value of
> >> > flush_all_zero_pkmaps in kmap_flush_unused to normally be
> >> > either less than last_pkmap_nr or equal to last_pkmap_nr + 1.
> >>
> >> There is a case where the return value of flush_all_zero_pkmaps()
> >> is larger than last_pkmap_nr.
> >
> > I see, but why is that a problem? flush_all_zero_pkmaps returning
> > a value larger than last_pkmap_nr means that there is no free slot
> > below that value. So an unconditional last_pkmap_nr update is
> > valid.
> 
> I think that this is not true.
> Look at this slightly different example.
> 
> Assume LAST_PKMAP = 20, indexes 1-9 and 12-19 are kmapped, and 10
> and 11 are kunmapped.
> 
> do kmap_flush_unused() => flushes indexes 10, 11 => last_pkmap_nr = 10
> do kunmap() on index 17
> do kmap_flush_unused() => flushes index 17 => last_pkmap_nr = 17?
> 
> In this case, the unconditional last_pkmap_nr update skips over the
> still-free indexes 10 and 11. So, the conditional update is needed.

Thanks for pointing that out, Joonsoo.
You're right. I misunderstood your flush_all_zero_pkmaps change.
With your change, flush_all_zero_pkmaps returns the index of the first
*flushed* free slot. What's the benefit of returning the first flushed
free slot index rather than the first free slot index?
I think flush_all_zero_pkmaps should return the first free slot,
because the caller of flush_all_zero_pkmaps doesn't care whether the
slot was just flushed or not. All it wants to know is whether the slot
is free. In that case, we can remove the above check, and it makes
flush_all_zero_pkmaps more intuitive.
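
A rough sketch of that alternative, reusing the declarations from the
simplified user-space model earlier in the thread (illustrative only,
not the kernel's code); the need_flush out-parameter stands in for the
TLB-flush decision the kernel version would still have to make:

/* return the first *free* slot, whether or not this pass flushed it;
 * the caller can then update last_pkmap_nr unconditionally */
static unsigned int flush_all_zero_first_free(int *need_flush)
{
        unsigned int index = PKMAP_INVALID_INDEX;
        int i;

        *need_flush = 0;
        for (i = 0; i < LAST_PKMAP; i++) {
                if (pkmap_count[i] == 1) {
                        pkmap_count[i] = 0;     /* flush this slot */
                        *need_flush = 1;
                }
                /* extra comparison: any free slot qualifies, not
                 * just one flushed in this pass */
                if (pkmap_count[i] == 0 && index == PKMAP_INVALID_INDEX)
                        index = i;
        }
        return index;
}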


-- 
Kind Regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry
  2012-11-13 12:49               ` Minchan Kim
@ 2012-11-13 14:12                 ` JoonSoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-11-13 14:12 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Peter Zijlstra

2012/11/13 Minchan Kim <minchan@kernel.org>:
> On Tue, Nov 13, 2012 at 09:30:57AM +0900, JoonSoo Kim wrote:
>> 2012/11/3 Minchan Kim <minchan@kernel.org>:
>> > Hi Joonsoo,
>> >
>> > On Sat, Nov 03, 2012 at 04:07:25AM +0900, JoonSoo Kim wrote:
>> >> Hello, Minchan.
>> >>
>> >> 2012/11/1 Minchan Kim <minchan@kernel.org>:
>> >> > On Thu, Nov 01, 2012 at 01:56:36AM +0900, Joonsoo Kim wrote:
>> >> >> In the current code, after flush_all_zero_pkmaps() is
>> >> >> invoked, all pkmaps are re-iterated. This can be optimized if
>> >> >> flush_all_zero_pkmaps() returns the index of the first flushed
>> >> >> entry: with this index, we can immediately map a highmem page
>> >> >> to the virtual address that the index represents. So change
>> >> >> the return type of flush_all_zero_pkmaps() and return the
>> >> >> index of the first flushed entry.
>> >> >>
>> >> >> Additionally, update last_pkmap_nr to this index. Every entry
>> >> >> below this index is certainly occupied by another mapping, so
>> >> >> updating last_pkmap_nr to this index is a reasonable
>> >> >> optimization.
>> >> >>
>> >> >> Cc: Mel Gorman <mel@csn.ul.ie>
>> >> >> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
>> >> >> Cc: Minchan Kim <minchan@kernel.org>
>> >> >> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>> >> >>
>> >> >> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
>> >> >> index ef788b5..97ad208 100644
>> >> >> --- a/include/linux/highmem.h
>> >> >> +++ b/include/linux/highmem.h
>> >> >> @@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
>> >> >>
>> >> >>  #ifdef CONFIG_HIGHMEM
>> >> >>  #include <asm/highmem.h>
>> >> >> +#define PKMAP_INVALID_INDEX (LAST_PKMAP)
>> >> >>
>> >> >>  /* declarations for linux/mm/highmem.c */
>> >> >>  unsigned int nr_free_highpages(void);
>> >> >> diff --git a/mm/highmem.c b/mm/highmem.c
>> >> >> index d98b0a9..b365f7b 100644
>> >> >> --- a/mm/highmem.c
>> >> >> +++ b/mm/highmem.c
>> >> >> @@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
>> >> >>       return virt_to_page(addr);
>> >> >>  }
>> >> >>
>> >> >> -static void flush_all_zero_pkmaps(void)
>> >> >> +static unsigned int flush_all_zero_pkmaps(void)
>> >> >>  {
>> >> >>       int i;
>> >> >> -     int need_flush = 0;
>> >> >> +     unsigned int index = PKMAP_INVALID_INDEX;
>> >> >>
>> >> >>       flush_cache_kmaps();
>> >> >>
>> >> >> @@ -141,10 +141,13 @@ static void flush_all_zero_pkmaps(void)
>> >> >>                         &pkmap_page_table[i]);
>> >> >>
>> >> >>               set_page_address(page, NULL);
>> >> >> -             need_flush = 1;
>> >> >> +             if (index == PKMAP_INVALID_INDEX)
>> >> >> +                     index = i;
>> >> >>       }
>> >> >> -     if (need_flush)
>> >> >> +     if (index != PKMAP_INVALID_INDEX)
>> >> >>               flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>> >> >> +
>> >> >> +     return index;
>> >> >>  }
>> >> >>
>> >> >>  /**
>> >> >> @@ -152,14 +155,19 @@ static void flush_all_zero_pkmaps(void)
>> >> >>   */
>> >> >>  void kmap_flush_unused(void)
>> >> >>  {
>> >> >> +     unsigned int index;
>> >> >> +
>> >> >>       lock_kmap();
>> >> >> -     flush_all_zero_pkmaps();
>> >> >> +     index = flush_all_zero_pkmaps();
>> >> >> +     if (index != PKMAP_INVALID_INDEX && (index < last_pkmap_nr))
>> >> >> +             last_pkmap_nr = index;
>> >> >
>> >> > I don't know whether kmap_flush_unused is really a fast path,
>> >> > so I'm not sure how effective my nitpick is. Anyway, what
>> >> > problem would happen if we did the following?
>> >> >
>> >> > lock()
>> >> > index = flush_all_zero_pkmaps();
>> >> > if (index != PKMAP_INVALID_INDEX)
>> >> >         last_pkmap_nr = index;
>> >> > unlock();
>> >> >
>> >> > Normally, last_pkmap_nr is increased while searching for an
>> >> > empty slot in map_new_virtual. So I expect the return value of
>> >> > flush_all_zero_pkmaps in kmap_flush_unused to normally be
>> >> > either less than last_pkmap_nr or equal to last_pkmap_nr + 1.
>> >>
>> >> There is a case where the return value of
>> >> flush_all_zero_pkmaps() is larger than last_pkmap_nr.
>> >
>> > I see, but why is that a problem? flush_all_zero_pkmaps returning
>> > a value larger than last_pkmap_nr means that there is no free
>> > slot below that value. So an unconditional last_pkmap_nr update
>> > is valid.
>>
>> I think that this is not true.
>> Look at this slightly different example.
>>
>> Assume LAST_PKMAP = 20, indexes 1-9 and 12-19 are kmapped, and 10
>> and 11 are kunmapped.
>>
>> do kmap_flush_unused() => flushes indexes 10, 11 => last_pkmap_nr = 10
>> do kunmap() on index 17
>> do kmap_flush_unused() => flushes index 17 => last_pkmap_nr = 17?
>>
>> In this case, the unconditional last_pkmap_nr update skips over the
>> still-free indexes 10 and 11. So, the conditional update is needed.
>
> Thanks for pointing that out, Joonsoo.
> You're right. I misunderstood your flush_all_zero_pkmaps change.
> With your change, flush_all_zero_pkmaps returns the index of the
> first *flushed* free slot. What's the benefit of returning the first
> flushed free slot index rather than the first free slot index?

If flush_all_zero_pkmaps() returned the first free slot index rather
than the first flushed free slot, we would need another comparison,
such as 'if (pkmap_count[i] == 0)', and another local variable to
determine whether a flush occurred at all. I want to minimize that
overhead and the churn to the code, although both are negligible.

> I think flush_all_zero_pkmaps should return the first free slot,
> because the caller of flush_all_zero_pkmaps doesn't care whether the
> slot was just flushed or not. All it wants to know is whether the
> slot is free. In that case, we can remove the above check, and it
> makes flush_all_zero_pkmaps more intuitive.

Yes, it is more intuitive, but as I mentioned above, it needs another
comparison, so the benefit of avoiding a re-iteration when there is no
free slot may disappear.
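
For reference, the benefit being weighed here is on the caller side.
In the toy user-space model from earlier in the thread (map_new_slot()
and everything else here is illustrative, not the kernel's actual
map_new_virtual()), the returned index lets the caller take a slot
directly instead of re-scanning pkmap_count[]:

/* grab a slot right after flushing, without re-iterating the array;
 * returns the slot index, or -1 if nothing could be flushed */
static int map_new_slot(void)
{
        unsigned int index = flush_all_zero();

        if (index == PKMAP_INVALID_INDEX)
                return -1;      /* nothing freed; caller must wait */
        last_pkmap_nr = index;  /* first flushed slot, consumed now */
        pkmap_count[index] = 2; /* "map" it in this toy model */
        return (int)index;
}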

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry
  2012-11-13 14:12                 ` JoonSoo Kim
@ 2012-11-13 15:01                   ` Minchan Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Minchan Kim @ 2012-11-13 15:01 UTC (permalink / raw)
  To: JoonSoo Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Peter Zijlstra

On Tue, Nov 13, 2012 at 11:12:28PM +0900, JoonSoo Kim wrote:
> 2012/11/13 Minchan Kim <minchan@kernel.org>:
> > On Tue, Nov 13, 2012 at 09:30:57AM +0900, JoonSoo Kim wrote:
> >> 2012/11/3 Minchan Kim <minchan@kernel.org>:
> >> > Hi Joonsoo,
> >> >
> >> > On Sat, Nov 03, 2012 at 04:07:25AM +0900, JoonSoo Kim wrote:
> >> >> Hello, Minchan.
> >> >>
> >> >> 2012/11/1 Minchan Kim <minchan@kernel.org>:
> >> >> > On Thu, Nov 01, 2012 at 01:56:36AM +0900, Joonsoo Kim wrote:
> >> >> >> In the current code, after flush_all_zero_pkmaps() is
> >> >> >> invoked, all pkmaps are re-iterated. This can be optimized
> >> >> >> if flush_all_zero_pkmaps() returns the index of the first
> >> >> >> flushed entry: with this index, we can immediately map a
> >> >> >> highmem page to the virtual address that the index
> >> >> >> represents. So change the return type of
> >> >> >> flush_all_zero_pkmaps() and return the index of the first
> >> >> >> flushed entry.
> >> >> >>
> >> >> >> Additionally, update last_pkmap_nr to this index. Every
> >> >> >> entry below this index is certainly occupied by another
> >> >> >> mapping, so updating last_pkmap_nr to this index is a
> >> >> >> reasonable optimization.
> >> >> >>
> >> >> >> Cc: Mel Gorman <mel@csn.ul.ie>
> >> >> >> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> >> >> >> Cc: Minchan Kim <minchan@kernel.org>
> >> >> >> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> >> >> >>
> >> >> >> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> >> >> >> index ef788b5..97ad208 100644
> >> >> >> --- a/include/linux/highmem.h
> >> >> >> +++ b/include/linux/highmem.h
> >> >> >> @@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
> >> >> >>
> >> >> >>  #ifdef CONFIG_HIGHMEM
> >> >> >>  #include <asm/highmem.h>
> >> >> >> +#define PKMAP_INVALID_INDEX (LAST_PKMAP)
> >> >> >>
> >> >> >>  /* declarations for linux/mm/highmem.c */
> >> >> >>  unsigned int nr_free_highpages(void);
> >> >> >> diff --git a/mm/highmem.c b/mm/highmem.c
> >> >> >> index d98b0a9..b365f7b 100644
> >> >> >> --- a/mm/highmem.c
> >> >> >> +++ b/mm/highmem.c
> >> >> >> @@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
> >> >> >>       return virt_to_page(addr);
> >> >> >>  }
> >> >> >>
> >> >> >> -static void flush_all_zero_pkmaps(void)
> >> >> >> +static unsigned int flush_all_zero_pkmaps(void)
> >> >> >>  {
> >> >> >>       int i;
> >> >> >> -     int need_flush = 0;
> >> >> >> +     unsigned int index = PKMAP_INVALID_INDEX;
> >> >> >>
> >> >> >>       flush_cache_kmaps();
> >> >> >>
> >> >> >> @@ -141,10 +141,13 @@ static void flush_all_zero_pkmaps(void)
> >> >> >>                         &pkmap_page_table[i]);
> >> >> >>
> >> >> >>               set_page_address(page, NULL);
> >> >> >> -             need_flush = 1;
> >> >> >> +             if (index == PKMAP_INVALID_INDEX)
> >> >> >> +                     index = i;
> >> >> >>       }
> >> >> >> -     if (need_flush)
> >> >> >> +     if (index != PKMAP_INVALID_INDEX)
> >> >> >>               flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
> >> >> >> +
> >> >> >> +     return index;
> >> >> >>  }
> >> >> >>
> >> >> >>  /**
> >> >> >> @@ -152,14 +155,19 @@ static void flush_all_zero_pkmaps(void)
> >> >> >>   */
> >> >> >>  void kmap_flush_unused(void)
> >> >> >>  {
> >> >> >> +     unsigned int index;
> >> >> >> +
> >> >> >>       lock_kmap();
> >> >> >> -     flush_all_zero_pkmaps();
> >> >> >> +     index = flush_all_zero_pkmaps();
> >> >> >> +     if (index != PKMAP_INVALID_INDEX && (index < last_pkmap_nr))
> >> >> >> +             last_pkmap_nr = index;
> >> >> >
> >> >> > I don't know whether kmap_flush_unused is really a fast
> >> >> > path, so I'm not sure how effective my nitpick is. Anyway,
> >> >> > what problem would happen if we did the following?
> >> >> >
> >> >> > lock()
> >> >> > index = flush_all_zero_pkmaps();
> >> >> > if (index != PKMAP_INVALID_INDEX)
> >> >> >         last_pkmap_nr = index;
> >> >> > unlock();
> >> >> >
> >> >> > Normally, last_pkmap_nr is increased while searching for an
> >> >> > empty slot in map_new_virtual. So I expect the return value
> >> >> > of flush_all_zero_pkmaps in kmap_flush_unused to normally be
> >> >> > either less than last_pkmap_nr or equal to last_pkmap_nr + 1.
> >> >>
> >> >> There is a case where the return value of
> >> >> flush_all_zero_pkmaps() is larger than last_pkmap_nr.
> >> >
> >> > I see, but why is that a problem? flush_all_zero_pkmaps
> >> > returning a value larger than last_pkmap_nr means that there is
> >> > no free slot below that value. So an unconditional
> >> > last_pkmap_nr update is valid.
> >>
> >> I think that this is not true.
> >> Look at this slightly different example.
> >>
> >> Assume LAST_PKMAP = 20, indexes 1-9 and 12-19 are kmapped, and 10
> >> and 11 are kunmapped.
> >>
> >> do kmap_flush_unused() => flushes indexes 10, 11 => last_pkmap_nr = 10
> >> do kunmap() on index 17
> >> do kmap_flush_unused() => flushes index 17 => last_pkmap_nr = 17?
> >>
> >> In this case, the unconditional last_pkmap_nr update skips over
> >> the still-free indexes 10 and 11. So, the conditional update is
> >> needed.
> >
> > Thanks for pointing that out, Joonsoo.
> > You're right. I misunderstood your flush_all_zero_pkmaps change.
> > With your change, flush_all_zero_pkmaps returns the index of the
> > first *flushed* free slot. What's the benefit of returning the
> > first flushed free slot index rather than the first free slot
> > index?
> 
> If flush_all_zero_pkmaps() returned the first free slot index rather
> than the first flushed free slot, we would need another comparison,
> such as 'if (pkmap_count[i] == 0)', and another local variable to
> determine whether a flush occurred at all. I want to minimize that
> overhead and the churn to the code, although both are negligible.
> 
> > I think flush_all_zero_pkmaps should return the first free slot,
> > because the caller of flush_all_zero_pkmaps doesn't care whether
> > the slot was just flushed or not. All it wants to know is whether
> > the slot is free. In that case, we can remove the above check, and
> > it makes flush_all_zero_pkmaps more intuitive.
> 
> Yes, it is more intuitive, but as I mentioned above, it needs
> another comparison, so the benefit of avoiding a re-iteration when
> there is no free slot may disappear.

If you're so keen on performance, why do you have code like this?
You could remove the branch below if you were really keen on
performance.

diff --git a/mm/highmem.c b/mm/highmem.c
index c8be376..44a88dd 100644
--- a/mm/highmem.c
+++ b/mm/highmem.c
@@ -114,7 +114,7 @@ static unsigned int flush_all_zero_pkmaps(void)
 
        flush_cache_kmaps();
 
-       for (i = 0; i < LAST_PKMAP; i++) {
+       for (i = LAST_PKMAP - 1; i >= 0; i--) {
                struct page *page;
 
                /*
@@ -141,8 +141,7 @@ static unsigned int flush_all_zero_pkmaps(void)
                pte_clear(&init_mm, PKMAP_ADDR(i), &pkmap_page_table[i]);
 
                set_page_address(page, NULL);
-               if (index == PKMAP_INVALID_INDEX)
-                       index = i;
+               index = i;
        }
        if (index != PKMAP_INVALID_INDEX)
                flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
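
(In the simplified user-space model from earlier in the thread, the
same trick reads as follows; illustrative only:)

/* iterate backwards so the last assignment wins: index ends up as
 * the lowest flushed slot, with no per-slot branch */
static unsigned int flush_all_zero_backwards(void)
{
        unsigned int index = PKMAP_INVALID_INDEX;
        int i;

        for (i = LAST_PKMAP - 1; i >= 0; i--) {
                if (pkmap_count[i] != 1)
                        continue;
                pkmap_count[i] = 0;
                index = i;
        }
        return index;
}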


Anyway, if you are concerned about performance, okay, let's give up on
making the code clearer, although I have never seen a report about
kmap performance. Instead, please consider the optimization above,
because you have already broken the principle you mentioned.
If we can't make the function clearer, another way is to add a
function comment. Please do.

-- 
Kind Regards,
Minchan Kim

^ permalink raw reply related	[flat|nested] 265+ messages in thread

* Re: [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry
  2012-11-13 15:01                   ` Minchan Kim
@ 2012-11-14 17:09                     ` JoonSoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-11-14 17:09 UTC (permalink / raw)
  To: Minchan Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Peter Zijlstra

Hi, Minchan.

2012/11/14 Minchan Kim <minchan@kernel.org>:
> On Tue, Nov 13, 2012 at 11:12:28PM +0900, JoonSoo Kim wrote:
>> 2012/11/13 Minchan Kim <minchan@kernel.org>:
>> > On Tue, Nov 13, 2012 at 09:30:57AM +0900, JoonSoo Kim wrote:
>> >> 2012/11/3 Minchan Kim <minchan@kernel.org>:
>> >> > Hi Joonsoo,
>> >> >
>> >> > On Sat, Nov 03, 2012 at 04:07:25AM +0900, JoonSoo Kim wrote:
>> >> >> Hello, Minchan.
>> >> >>
>> >> >> 2012/11/1 Minchan Kim <minchan@kernel.org>:
>> >> >> > On Thu, Nov 01, 2012 at 01:56:36AM +0900, Joonsoo Kim wrote:
>> >> >> >> In the current code, after flush_all_zero_pkmaps() is
>> >> >> >> invoked, all pkmaps are re-iterated. This can be optimized
>> >> >> >> if flush_all_zero_pkmaps() returns the index of the first
>> >> >> >> flushed entry: with this index, we can immediately map a
>> >> >> >> highmem page to the virtual address that the index
>> >> >> >> represents. So change the return type of
>> >> >> >> flush_all_zero_pkmaps() and return the index of the first
>> >> >> >> flushed entry.
>> >> >> >>
>> >> >> >> Additionally, update last_pkmap_nr to this index. Every
>> >> >> >> entry below this index is certainly occupied by another
>> >> >> >> mapping, so updating last_pkmap_nr to this index is a
>> >> >> >> reasonable optimization.
>> >> >> >>
>> >> >> >> Cc: Mel Gorman <mel@csn.ul.ie>
>> >> >> >> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
>> >> >> >> Cc: Minchan Kim <minchan@kernel.org>
>> >> >> >> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>> >> >> >>
>> >> >> >> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
>> >> >> >> index ef788b5..97ad208 100644
>> >> >> >> --- a/include/linux/highmem.h
>> >> >> >> +++ b/include/linux/highmem.h
>> >> >> >> @@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
>> >> >> >>
>> >> >> >>  #ifdef CONFIG_HIGHMEM
>> >> >> >>  #include <asm/highmem.h>
>> >> >> >> +#define PKMAP_INVALID_INDEX (LAST_PKMAP)
>> >> >> >>
>> >> >> >>  /* declarations for linux/mm/highmem.c */
>> >> >> >>  unsigned int nr_free_highpages(void);
>> >> >> >> diff --git a/mm/highmem.c b/mm/highmem.c
>> >> >> >> index d98b0a9..b365f7b 100644
>> >> >> >> --- a/mm/highmem.c
>> >> >> >> +++ b/mm/highmem.c
>> >> >> >> @@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
>> >> >> >>       return virt_to_page(addr);
>> >> >> >>  }
>> >> >> >>
>> >> >> >> -static void flush_all_zero_pkmaps(void)
>> >> >> >> +static unsigned int flush_all_zero_pkmaps(void)
>> >> >> >>  {
>> >> >> >>       int i;
>> >> >> >> -     int need_flush = 0;
>> >> >> >> +     unsigned int index = PKMAP_INVALID_INDEX;
>> >> >> >>
>> >> >> >>       flush_cache_kmaps();
>> >> >> >>
>> >> >> >> @@ -141,10 +141,13 @@ static void flush_all_zero_pkmaps(void)
>> >> >> >>                         &pkmap_page_table[i]);
>> >> >> >>
>> >> >> >>               set_page_address(page, NULL);
>> >> >> >> -             need_flush = 1;
>> >> >> >> +             if (index == PKMAP_INVALID_INDEX)
>> >> >> >> +                     index = i;
>> >> >> >>       }
>> >> >> >> -     if (need_flush)
>> >> >> >> +     if (index != PKMAP_INVALID_INDEX)
>> >> >> >>               flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>> >> >> >> +
>> >> >> >> +     return index;
>> >> >> >>  }
>> >> >> >>
>> >> >> >>  /**
>> >> >> >> @@ -152,14 +155,19 @@ static void flush_all_zero_pkmaps(void)
>> >> >> >>   */
>> >> >> >>  void kmap_flush_unused(void)
>> >> >> >>  {
>> >> >> >> +     unsigned int index;
>> >> >> >> +
>> >> >> >>       lock_kmap();
>> >> >> >> -     flush_all_zero_pkmaps();
>> >> >> >> +     index = flush_all_zero_pkmaps();
>> >> >> >> +     if (index != PKMAP_INVALID_INDEX && (index < last_pkmap_nr))
>> >> >> >> +             last_pkmap_nr = index;
>> >> >> >
>> >> >> > I don't know whether kmap_flush_unused is really a fast
>> >> >> > path, so I'm not sure how effective my nitpick is. Anyway,
>> >> >> > what problem would happen if we did the following?
>> >> >> >
>> >> >> > lock()
>> >> >> > index = flush_all_zero_pkmaps();
>> >> >> > if (index != PKMAP_INVALID_INDEX)
>> >> >> >         last_pkmap_nr = index;
>> >> >> > unlock();
>> >> >> >
>> >> >> > Normally, last_pkmap_nr is increased while searching for an
>> >> >> > empty slot in map_new_virtual. So I expect the return value
>> >> >> > of flush_all_zero_pkmaps in kmap_flush_unused to normally
>> >> >> > be either less than last_pkmap_nr or equal to
>> >> >> > last_pkmap_nr + 1.
>> >> >>
>> >> >> There is a case where the return value of
>> >> >> flush_all_zero_pkmaps() is larger than last_pkmap_nr.
>> >> >
>> >> > I see, but why is that a problem? flush_all_zero_pkmaps
>> >> > returning a value larger than last_pkmap_nr means that there
>> >> > is no free slot below that value. So an unconditional
>> >> > last_pkmap_nr update is valid.
>> >>
>> >> I think that this is not true.
>> >> Look at this slightly different example.
>> >>
>> >> Assume LAST_PKMAP = 20, indexes 1-9 and 12-19 are kmapped, and
>> >> 10 and 11 are kunmapped.
>> >>
>> >> do kmap_flush_unused() => flushes indexes 10, 11 => last_pkmap_nr = 10
>> >> do kunmap() on index 17
>> >> do kmap_flush_unused() => flushes index 17 => last_pkmap_nr = 17?
>> >>
>> >> In this case, the unconditional last_pkmap_nr update skips over
>> >> the still-free indexes 10 and 11. So, the conditional update is
>> >> needed.
>> >
>> > Thanks for pointing that out, Joonsoo.
>> > You're right. I misunderstood your flush_all_zero_pkmaps change.
>> > With your change, flush_all_zero_pkmaps returns the index of the
>> > first *flushed* free slot. What's the benefit of returning the
>> > first flushed free slot index rather than the first free slot
>> > index?
>>
>> If flush_all_zero_pkmaps() returned the first free slot index
>> rather than the first flushed free slot, we would need another
>> comparison, such as 'if (pkmap_count[i] == 0)', and another local
>> variable to determine whether a flush occurred at all. I want to
>> minimize that overhead and the churn to the code, although both are
>> negligible.
>>
>> > I think flush_all_zero_pkmaps should return the first free slot,
>> > because the caller of flush_all_zero_pkmaps doesn't care whether
>> > the slot was just flushed or not. All it wants to know is whether
>> > the slot is free. In that case, we can remove the above check,
>> > and it makes flush_all_zero_pkmaps more intuitive.
>>
>> Yes, it is more intuitive, but as I mentioned above, it needs
>> another comparison, so the benefit of avoiding a re-iteration when
>> there is no free slot may disappear.
>
> If you're so keen on performance, why do you have code like this?
> You could remove the branch below if you were really keen on
> performance.
>
> diff --git a/mm/highmem.c b/mm/highmem.c
> index c8be376..44a88dd 100644
> --- a/mm/highmem.c
> +++ b/mm/highmem.c
> @@ -114,7 +114,7 @@ static unsigned int flush_all_zero_pkmaps(void)
>
>         flush_cache_kmaps();
>
> -       for (i = 0; i < LAST_PKMAP; i++) {
> +       for (i = LAST_PKMAP - 1; i >= 0; i--) {
>                 struct page *page;
>
>                 /*
> @@ -141,8 +141,7 @@ static unsigned int flush_all_zero_pkmaps(void)
>                 pte_clear(&init_mm, PKMAP_ADDR(i), &pkmap_page_table[i]);
>
>                 set_page_address(page, NULL);
> -               if (index == PKMAP_INVALID_INDEX)
> -                       index = i;
> +               index = i;
>         }
>         if (index != PKMAP_INVALID_INDEX)
>                 flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>
>
> Anyway, if you are concerned about performance, okay, let's give up
> on making the code clearer, although I have never seen a report
> about kmap performance. Instead, please consider the optimization
> above, because you have already broken the principle you mentioned.
> If we can't make the function clearer, another way is to add a
> function comment. Please do.

Yes, I also have never seen any report about kmap performance.
Through your review, I have come to see that this patch will not give
any real benefit.
So how about dropping it?

Thanks for the review.

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry
  2012-11-14 17:09                     ` JoonSoo Kim
@ 2012-11-19 23:46                       ` Minchan Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: Minchan Kim @ 2012-11-19 23:46 UTC (permalink / raw)
  To: JoonSoo Kim
  Cc: Andrew Morton, linux-kernel, linux-mm, Mel Gorman, Peter Zijlstra

Hi Joonsoo,
Sorry for the delay.

On Thu, Nov 15, 2012 at 02:09:04AM +0900, JoonSoo Kim wrote:
> Hi, Minchan.
> 
> 2012/11/14 Minchan Kim <minchan@kernel.org>:
> > On Tue, Nov 13, 2012 at 11:12:28PM +0900, JoonSoo Kim wrote:
> >> 2012/11/13 Minchan Kim <minchan@kernel.org>:
> >> > On Tue, Nov 13, 2012 at 09:30:57AM +0900, JoonSoo Kim wrote:
> >> >> 2012/11/3 Minchan Kim <minchan@kernel.org>:
> >> >> > Hi Joonsoo,
> >> >> >
> >> >> > On Sat, Nov 03, 2012 at 04:07:25AM +0900, JoonSoo Kim wrote:
> >> >> >> Hello, Minchan.
> >> >> >>
> >> >> >> 2012/11/1 Minchan Kim <minchan@kernel.org>:
> >> >> >> > On Thu, Nov 01, 2012 at 01:56:36AM +0900, Joonsoo Kim wrote:
> >> >> >> >> In current code, after flush_all_zero_pkmaps() is invoked,
> >> >> >> >> then re-iterate all pkmaps. It can be optimized if flush_all_zero_pkmaps()
> >> >> >> >> return index of first flushed entry. With this index,
> >> >> >> >> we can immediately map highmem page to virtual address represented by index.
> >> >> >> >> So change return type of flush_all_zero_pkmaps()
> >> >> >> >> and return index of first flushed entry.
> >> >> >> >>
> >> >> >> >> Additionally, update last_pkmap_nr to this index.
> >> >> >> >> It is certain that entry which is below this index is occupied by other mapping,
> >> >> >> >> therefore updating last_pkmap_nr to this index is reasonable optimization.
> >> >> >> >>
> >> >> >> >> Cc: Mel Gorman <mel@csn.ul.ie>
> >> >> >> >> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
> >> >> >> >> Cc: Minchan Kim <minchan@kernel.org>
> >> >> >> >> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
> >> >> >> >>
> >> >> >> >> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> >> >> >> >> index ef788b5..97ad208 100644
> >> >> >> >> --- a/include/linux/highmem.h
> >> >> >> >> +++ b/include/linux/highmem.h
> >> >> >> >> @@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
> >> >> >> >>
> >> >> >> >>  #ifdef CONFIG_HIGHMEM
> >> >> >> >>  #include <asm/highmem.h>
> >> >> >> >> +#define PKMAP_INVALID_INDEX (LAST_PKMAP)
> >> >> >> >>
> >> >> >> >>  /* declarations for linux/mm/highmem.c */
> >> >> >> >>  unsigned int nr_free_highpages(void);
> >> >> >> >> diff --git a/mm/highmem.c b/mm/highmem.c
> >> >> >> >> index d98b0a9..b365f7b 100644
> >> >> >> >> --- a/mm/highmem.c
> >> >> >> >> +++ b/mm/highmem.c
> >> >> >> >> @@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
> >> >> >> >>       return virt_to_page(addr);
> >> >> >> >>  }
> >> >> >> >>
> >> >> >> >> -static void flush_all_zero_pkmaps(void)
> >> >> >> >> +static unsigned int flush_all_zero_pkmaps(void)
> >> >> >> >>  {
> >> >> >> >>       int i;
> >> >> >> >> -     int need_flush = 0;
> >> >> >> >> +     unsigned int index = PKMAP_INVALID_INDEX;
> >> >> >> >>
> >> >> >> >>       flush_cache_kmaps();
> >> >> >> >>
> >> >> >> >> @@ -141,10 +141,13 @@ static void flush_all_zero_pkmaps(void)
> >> >> >> >>                         &pkmap_page_table[i]);
> >> >> >> >>
> >> >> >> >>               set_page_address(page, NULL);
> >> >> >> >> -             need_flush = 1;
> >> >> >> >> +             if (index == PKMAP_INVALID_INDEX)
> >> >> >> >> +                     index = i;
> >> >> >> >>       }
> >> >> >> >> -     if (need_flush)
> >> >> >> >> +     if (index != PKMAP_INVALID_INDEX)
> >> >> >> >>               flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
> >> >> >> >> +
> >> >> >> >> +     return index;
> >> >> >> >>  }
> >> >> >> >>
> >> >> >> >>  /**
> >> >> >> >> @@ -152,14 +155,19 @@ static void flush_all_zero_pkmaps(void)
> >> >> >> >>   */
> >> >> >> >>  void kmap_flush_unused(void)
> >> >> >> >>  {
> >> >> >> >> +     unsigned int index;
> >> >> >> >> +
> >> >> >> >>       lock_kmap();
> >> >> >> >> -     flush_all_zero_pkmaps();
> >> >> >> >> +     index = flush_all_zero_pkmaps();
> >> >> >> >> +     if (index != PKMAP_INVALID_INDEX && (index < last_pkmap_nr))
> >> >> >> >> +             last_pkmap_nr = index;
> >> >> >> >
> >> >> >> > I don't know how kmap_flush_unused is really fast path so how my nitpick
> >> >> >> > is effective. Anyway,
> >> >> >> > What problem happens if we do following as?
> >> >> >> >
> >> >> >> > lock()
> >> >> >> > index = flush_all_zero_pkmaps();
> >> >> >> > if (index != PKMAP_INVALID_INDEX)
> >> >> >> >         last_pkmap_nr = index;
> >> >> >> > unlock();
> >> >> >> >
> >> >> >> > Normally, last_pkmap_nr is increased with searching empty slot in
> >> >> >> > map_new_virtual. So I expect return value of flush_all_zero_pkmaps
> >> >> >> > in kmap_flush_unused normally become either less than last_pkmap_nr
> >> >> >> > or last_pkmap_nr + 1.
> >> >> >>
> >> >> >> There is a case that return value of kmap_flush_unused() is larger
> >> >> >> than last_pkmap_nr.
> >> >> >
> >> >> > I see but why it's problem? kmap_flush_unused returns larger value than
> >> >> > last_pkmap_nr means that there is no free slot at below the value.
> >> >> > So unconditional last_pkmap_nr update is valid.
> >> >>
> >> >> I think that this is not true.
> >> >> Look at the slightly different example.
> >> >>
> >> >> Assume last_pkmap = 20 and index 1-9, 12-19 is kmapped. 10, 11 is kunmapped.
> >> >>
> >> >> do kmap_flush_unused() => flush index 10,11 => last_pkmap = 10;
> >> >> do kunmap() with index 17
> >> >> do kmap_flush_unused() => flush index 17 => last_pkmap = 17?
> >> >>
> >> >> In this case, unconditional last_pkmap_nr update skip one kunmapped index.
> >> >> So, conditional update is needed.
> >> >
> >> > Thanks for pointing out, Joonsoo.
> >> > You're right. I misunderstood your flush_all_zero_pkmaps change.
> >> > As your change, flush_all_zero_pkmaps returns first *flushed* free slot index.
> >> > What's the benefit returning flushed free slot index rather than free slot index?
> >>
> >> If flush_all_zero_pkmaps() return free slot index rather than first
> >> flushed free slot,
> >> we need another comparison like as 'if pkmap_count[i] == 0' and
> >> need another local variable for determining whether flush is occurred or not.
> >> I want to minimize these overhead and churning of the code, although
> >> they are negligible.
> >>
> >> > I think flush_all_zero_pkmaps should return first free slot because customer of
> >> > flush_all_zero_pkmaps doesn't care whether it's just flushed or not.
> >> > What he want is just free or not. In such case, we can remove above check and it makes
> >> > flush_all_zero_pkmaps more intuitive.
> >>
> >> Yes, it is more intuitive, but as I mentioned above, it need another comparison,
> >> so with that, a benefit which prevent to re-iterate when there is no
> >> free slot, may be disappeared.
> >
> > If you're very keen on the performance, why do you have such code?
> > You can remove below branch if you were keen on the performance.
> >
> > diff --git a/mm/highmem.c b/mm/highmem.c
> > index c8be376..44a88dd 100644
> > --- a/mm/highmem.c
> > +++ b/mm/highmem.c
> > @@ -114,7 +114,7 @@ static unsigned int flush_all_zero_pkmaps(void)
> >
> >         flush_cache_kmaps();
> >
> > -       for (i = 0; i < LAST_PKMAP; i++) {
> > +       for (i = LAST_PKMAP - 1; i >= 0; i--) {
> >                 struct page *page;
> >
> >                 /*
> > @@ -141,8 +141,7 @@ static unsigned int flush_all_zero_pkmaps(void)
> >                 pte_clear(&init_mm, PKMAP_ADDR(i), &pkmap_page_table[i]);
> >
> >                 set_page_address(page, NULL);
> > -               if (index == PKMAP_INVALID_INDEX)
> > -                       index = i;
> > +               index = i;
> >         }
> >         if (index != PKMAP_INVALID_INDEX)
> >                 flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
> >
> >
> > Anyway, if you have the concern of performance, Okay let's give up making code clear
> > although I didn't see any report about kmap performance. Instead, please consider above
> > optimization because you have already broken what you mentioned.
> > If we can't make function clear, another method for it is to add function comment. Please.
> 
> Yes, I haven't seen any report about kmap performance either.
> Through your review comments, I have come to the conclusion that this
> patch will not give any benefit.
> So how about dropping it?

Personally, I would prefer to proceed, but if you don't have confidence
about the gain, I have no problem with dropping it.
Thanks.
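
As a side note, the reverse-scan variant from the diff quoted above can
be modelled in user space the same way (a sketch under the same
simplified pkmap_count semantics -- 1 means kunmapped but not yet
flushed, 0 means free -- and the function name below is made up for the
sketch; this is not the kernel implementation):

#define LAST_PKMAP          20
#define PKMAP_INVALID_INDEX (LAST_PKMAP)

static int pkmap_count[LAST_PKMAP];

/* Hypothetical name for this sketch only. */
static unsigned int flush_all_zero_pkmaps_reverse(void)
{
	unsigned int index = PKMAP_INVALID_INDEX;
	int i;

	/*
	 * Scanning from the top down lets "index = i" be assigned
	 * without a branch: the last assignment made is the lowest
	 * flushed slot.
	 */
	for (i = LAST_PKMAP - 1; i >= 0; i--) {
		if (pkmap_count[i] != 1)
			continue;
		pkmap_count[i] = 0;
		index = i;
	}
	return index;
}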

> 
> Thanks for the review.

-- 
Kind regards,
Minchan Kim

^ permalink raw reply	[flat|nested] 265+ messages in thread

* Re: [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry
  2012-11-19 23:46                       ` Minchan Kim
@ 2012-11-27 15:01                         ` JoonSoo Kim
  -1 siblings, 0 replies; 265+ messages in thread
From: JoonSoo Kim @ 2012-11-27 15:01 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm, Mel Gorman, Minchan Kim

Hello, Andrew.

2012/11/20 Minchan Kim <minchan@kernel.org>:
> Hi Joonsoo,
> Sorry for the delay.
>
> On Thu, Nov 15, 2012 at 02:09:04AM +0900, JoonSoo Kim wrote:
>> Hi, Minchan.
>>
>> 2012/11/14 Minchan Kim <minchan@kernel.org>:
>> > On Tue, Nov 13, 2012 at 11:12:28PM +0900, JoonSoo Kim wrote:
>> >> 2012/11/13 Minchan Kim <minchan@kernel.org>:
>> >> > On Tue, Nov 13, 2012 at 09:30:57AM +0900, JoonSoo Kim wrote:
>> >> >> 2012/11/3 Minchan Kim <minchan@kernel.org>:
>> >> >> > Hi Joonsoo,
>> >> >> >
>> >> >> > On Sat, Nov 03, 2012 at 04:07:25AM +0900, JoonSoo Kim wrote:
>> >> >> >> Hello, Minchan.
>> >> >> >>
>> >> >> >> 2012/11/1 Minchan Kim <minchan@kernel.org>:
>> >> >> >> > On Thu, Nov 01, 2012 at 01:56:36AM +0900, Joonsoo Kim wrote:
>> >> >> >> >> In current code, after flush_all_zero_pkmaps() is invoked,
>> >> >> >> >> then re-iterate all pkmaps. It can be optimized if flush_all_zero_pkmaps()
>> >> >> >> >> return index of first flushed entry. With this index,
>> >> >> >> >> we can immediately map highmem page to virtual address represented by index.
>> >> >> >> >> So change return type of flush_all_zero_pkmaps()
>> >> >> >> >> and return index of first flushed entry.
>> >> >> >> >>
>> >> >> >> >> Additionally, update last_pkmap_nr to this index.
>> >> >> >> >> It is certain that entry which is below this index is occupied by other mapping,
>> >> >> >> >> therefore updating last_pkmap_nr to this index is reasonable optimization.
>> >> >> >> >>
>> >> >> >> >> Cc: Mel Gorman <mel@csn.ul.ie>
>> >> >> >> >> Cc: Peter Zijlstra <a.p.zijlstra@chello.nl>
>> >> >> >> >> Cc: Minchan Kim <minchan@kernel.org>
>> >> >> >> >> Signed-off-by: Joonsoo Kim <js1304@gmail.com>
>> >> >> >> >>
>> >> >> >> >> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
>> >> >> >> >> index ef788b5..97ad208 100644
>> >> >> >> >> --- a/include/linux/highmem.h
>> >> >> >> >> +++ b/include/linux/highmem.h
>> >> >> >> >> @@ -32,6 +32,7 @@ static inline void invalidate_kernel_vmap_range(void *vaddr, int size)
>> >> >> >> >>
>> >> >> >> >>  #ifdef CONFIG_HIGHMEM
>> >> >> >> >>  #include <asm/highmem.h>
>> >> >> >> >> +#define PKMAP_INVALID_INDEX (LAST_PKMAP)
>> >> >> >> >>
>> >> >> >> >>  /* declarations for linux/mm/highmem.c */
>> >> >> >> >>  unsigned int nr_free_highpages(void);
>> >> >> >> >> diff --git a/mm/highmem.c b/mm/highmem.c
>> >> >> >> >> index d98b0a9..b365f7b 100644
>> >> >> >> >> --- a/mm/highmem.c
>> >> >> >> >> +++ b/mm/highmem.c
>> >> >> >> >> @@ -106,10 +106,10 @@ struct page *kmap_to_page(void *vaddr)
>> >> >> >> >>       return virt_to_page(addr);
>> >> >> >> >>  }
>> >> >> >> >>
>> >> >> >> >> -static void flush_all_zero_pkmaps(void)
>> >> >> >> >> +static unsigned int flush_all_zero_pkmaps(void)
>> >> >> >> >>  {
>> >> >> >> >>       int i;
>> >> >> >> >> -     int need_flush = 0;
>> >> >> >> >> +     unsigned int index = PKMAP_INVALID_INDEX;
>> >> >> >> >>
>> >> >> >> >>       flush_cache_kmaps();
>> >> >> >> >>
>> >> >> >> >> @@ -141,10 +141,13 @@ static void flush_all_zero_pkmaps(void)
>> >> >> >> >>                         &pkmap_page_table[i]);
>> >> >> >> >>
>> >> >> >> >>               set_page_address(page, NULL);
>> >> >> >> >> -             need_flush = 1;
>> >> >> >> >> +             if (index == PKMAP_INVALID_INDEX)
>> >> >> >> >> +                     index = i;
>> >> >> >> >>       }
>> >> >> >> >> -     if (need_flush)
>> >> >> >> >> +     if (index != PKMAP_INVALID_INDEX)
>> >> >> >> >>               flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>> >> >> >> >> +
>> >> >> >> >> +     return index;
>> >> >> >> >>  }
>> >> >> >> >>
>> >> >> >> >>  /**
>> >> >> >> >> @@ -152,14 +155,19 @@ static void flush_all_zero_pkmaps(void)
>> >> >> >> >>   */
>> >> >> >> >>  void kmap_flush_unused(void)
>> >> >> >> >>  {
>> >> >> >> >> +     unsigned int index;
>> >> >> >> >> +
>> >> >> >> >>       lock_kmap();
>> >> >> >> >> -     flush_all_zero_pkmaps();
>> >> >> >> >> +     index = flush_all_zero_pkmaps();
>> >> >> >> >> +     if (index != PKMAP_INVALID_INDEX && (index < last_pkmap_nr))
>> >> >> >> >> +             last_pkmap_nr = index;
>> >> >> >> >
>> >> >> >> > I don't know how kmap_flush_unused is really fast path so how my nitpick
>> >> >> >> > is effective. Anyway,
>> >> >> >> > What problem happens if we do following as?
>> >> >> >> >
>> >> >> >> > lock()
>> >> >> >> > index = flush_all_zero_pkmaps();
>> >> >> >> > if (index != PKMAP_INVALID_INDEX)
>> >> >> >> >         last_pkmap_nr = index;
>> >> >> >> > unlock();
>> >> >> >> >
>> >> >> >> > Normally, last_pkmap_nr is increased with searching empty slot in
>> >> >> >> > map_new_virtual. So I expect return value of flush_all_zero_pkmaps
>> >> >> >> > in kmap_flush_unused normally become either less than last_pkmap_nr
>> >> >> >> > or last_pkmap_nr + 1.
>> >> >> >>
>> >> >> >> There is a case that return value of kmap_flush_unused() is larger
>> >> >> >> than last_pkmap_nr.
>> >> >> >
>> >> >> > I see but why it's problem? kmap_flush_unused returns larger value than
>> >> >> > last_pkmap_nr means that there is no free slot at below the value.
>> >> >> > So unconditional last_pkmap_nr update is valid.
>> >> >>
>> >> >> I think that this is not true.
>> >> >> Look at the slightly different example.
>> >> >>
>> >> >> Assume last_pkmap = 20 and index 1-9, 12-19 is kmapped. 10, 11 is kunmapped.
>> >> >>
>> >> >> do kmap_flush_unused() => flush index 10,11 => last_pkmap = 10;
>> >> >> do kunmap() with index 17
>> >> >> do kmap_flush_unused() => flush index 17 => last_pkmap = 17?
>> >> >>
>> >> >> In this case, unconditional last_pkmap_nr update skip one kunmapped index.
>> >> >> So, conditional update is needed.
>> >> >
>> >> > Thanks for pointing out, Joonsoo.
>> >> > You're right. I misunderstood your flush_all_zero_pkmaps change.
>> >> > As your change, flush_all_zero_pkmaps returns first *flushed* free slot index.
>> >> > What's the benefit returning flushed free slot index rather than free slot index?
>> >>
>> >> If flush_all_zero_pkmaps() return free slot index rather than first
>> >> flushed free slot,
>> >> we need another comparison like as 'if pkmap_count[i] == 0' and
>> >> need another local variable for determining whether flush is occurred or not.
>> >> I want to minimize these overhead and churning of the code, although
>> >> they are negligible.
>> >>
>> >> > I think flush_all_zero_pkmaps should return first free slot because customer of
>> >> > flush_all_zero_pkmaps doesn't care whether it's just flushed or not.
>> >> > What he want is just free or not. In such case, we can remove above check and it makes
>> >> > flush_all_zero_pkmaps more intuitive.
>> >>
>> >> Yes, it is more intuitive, but as I mentioned above, it need another comparison,
>> >> so with that, a benefit which prevent to re-iterate when there is no
>> >> free slot, may be disappeared.
>> >
>> > If you're very keen on the performance, why do you have such code?
>> > You can remove below branch if you were keen on the performance.
>> >
>> > diff --git a/mm/highmem.c b/mm/highmem.c
>> > index c8be376..44a88dd 100644
>> > --- a/mm/highmem.c
>> > +++ b/mm/highmem.c
>> > @@ -114,7 +114,7 @@ static unsigned int flush_all_zero_pkmaps(void)
>> >
>> >         flush_cache_kmaps();
>> >
>> > -       for (i = 0; i < LAST_PKMAP; i++) {
>> > +       for (i = LAST_PKMAP - 1; i >= 0; i--) {
>> >                 struct page *page;
>> >
>> >                 /*
>> > @@ -141,8 +141,7 @@ static unsigned int flush_all_zero_pkmaps(void)
>> >                 pte_clear(&init_mm, PKMAP_ADDR(i), &pkmap_page_table[i]);
>> >
>> >                 set_page_address(page, NULL);
>> > -               if (index == PKMAP_INVALID_INDEX)
>> > -                       index = i;
>> > +               index = i;
>> >         }
>> >         if (index != PKMAP_INVALID_INDEX)
>> >                 flush_tlb_kernel_range(PKMAP_ADDR(0), PKMAP_ADDR(LAST_PKMAP));
>> >
>> >
>> > Anyway, if you have the concern of performance, Okay let's give up making code clear
>> > although I didn't see any report about kmap performance. Instead, please consider above
>> > optimization because you have already broken what you mentioned.
>> > If we can't make function clear, another method for it is to add function comment. Please.
>>
>> Yes, I haven't seen any report about kmap performance either.
>> Through your review comments, I have come to the conclusion that this
>> patch will not give any benefit.
>> So how about dropping it?
>
> Personally, I would prefer to proceed, but if you don't have confidence
> about the gain, I have no problem with dropping it.
> Thanks.
>
>>
>> Thanks for the review.

During the review, I concluded that this patch has no gain,
and that it churns the code too much.
So I would like to drop this patch from your tree.

Sorry for the late notification.

^ permalink raw reply	[flat|nested] 265+ messages in thread

end of thread, other threads:[~2012-11-27 15:01 UTC | newest]

Thread overview: 265+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <Yes>
2010-07-05 12:41 ` [PATCH 00/09] cifs: local caching support using FS-Cache Suresh Jayaraman
     [not found]   ` <1278333663-30464-1-git-send-email-sjayaraman-l3A5Bk7waGM@public.gmane.org>
2010-07-14 17:41     ` Scott Lovenberg
     [not found]       ` <4C3DF6BF.3070001-Re5JQEeQqe8AvxtiuMwx3w@public.gmane.org>
2010-07-14 18:09         ` Steve French
     [not found]           ` <AANLkTin2tKtkWTflrrzBMYBEd6SFr35uYUl1SmfYlj9W-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-07-15 16:23             ` Suresh Jayaraman
     [not found]               ` <4C3F35F7.8060408-l3A5Bk7waGM@public.gmane.org>
2010-07-22  9:28                 ` Suresh Jayaraman
     [not found]               ` <4C480F51.8070204-l3A5Bk7waGM@public.gmane.org>
2010-07-22 17:40                 ` David Howells
     [not found]                   ` <1892.1279820400-H+wXaHxf7aLQT0dZR+AlfA@public.gmane.org>
2010-07-22 19:12                     ` Jeff Layton
2010-07-22 22:49                     ` Andreas Dilger
2010-07-23  8:35                       ` Stef Bon
     [not found]                         ` <AANLkTikF5Oz5pobaPUJebUg+yPuoVy_B5PBz+nuUTSii-JsoAwUIsXosN+BqQ9rBEUg@public.gmane.org>
2010-07-23 16:16                           ` Suresh Jayaraman
2010-07-24  5:40                             ` Stef Bon
2010-07-23 11:57                     ` Suresh Jayaraman
     [not found]                       ` <4C4983B0.5080804-l3A5Bk7waGM@public.gmane.org>
2010-07-23 15:03                         ` Steve French
2010-07-30 23:08                 ` Scott Lovenberg
2010-07-05 12:41 ` [PATCH 01/09] cifs: add kernel config option for CIFS Client caching support Suresh Jayaraman
2010-07-05 12:41 ` [PATCH 02/09] cifs: register CIFS for caching Suresh Jayaraman
2010-07-05 12:42 ` [PATCH 03/09] cifs: define server-level cache index objects and register them Suresh Jayaraman
2010-07-05 12:42 ` [PATCH 04/09] cifs: define superblock-level " Suresh Jayaraman
2010-07-20 12:53   ` Jeff Layton
     [not found]     ` <20100720085327.4d1bf9d7-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2010-07-20 13:37       ` Jeff Layton
     [not found]         ` <20100720093722.4f734f03-9yPaYZwiELC+kQycOl6kW4xkIHaj4LzF@public.gmane.org>
2010-07-21 14:06           ` Suresh Jayaraman
2010-07-05 12:42 ` [PATCH 05/09] cifs: define inode-level cache object " Suresh Jayaraman
2010-07-05 12:43 ` [PATCH 06/09] cifs: FS-Cache page management Suresh Jayaraman
2010-07-05 12:43 ` [PATCH 07/09] cifs: store pages into local cache Suresh Jayaraman
2010-07-05 12:43 ` [PATCH 08/09] cifs: read pages from FS-Cache Suresh Jayaraman
2010-07-05 12:43 ` [PATCH 09/09] cifs: add mount option to enable local caching Suresh Jayaraman
2010-12-20 22:18 ` [PATCH v2] TODO: CDMA SMS and CDMA CMAS Lei Yu
2010-12-20 22:34   ` Denis Kenzior
2012-07-16 16:14 ` [PATCH 1/3] mm: correct return value of migrate_pages() Joonsoo Kim
2012-07-16 16:14   ` Joonsoo Kim
2012-07-16 16:14   ` [PATCH 2/3] mm: fix possible incorrect return value of migrate_pages() syscall Joonsoo Kim
2012-07-16 16:14     ` Joonsoo Kim
2012-07-16 17:26     ` Christoph Lameter
2012-07-16 17:26       ` Christoph Lameter
2012-07-16 17:40     ` Michal Nazarewicz
2012-07-16 17:59       ` JoonSoo Kim
2012-07-16 17:59         ` JoonSoo Kim
2012-07-17 13:02         ` Michal Nazarewicz
2012-07-17 13:02           ` Michal Nazarewicz
2012-07-16 16:14   ` [PATCH 3/3] mm: fix return value in __alloc_contig_migrate_range() Joonsoo Kim
2012-07-16 16:14     ` Joonsoo Kim
2012-07-16 17:29     ` Christoph Lameter
2012-07-16 17:29       ` Christoph Lameter
2012-07-16 17:40     ` Michal Nazarewicz
2012-07-16 18:40       ` JoonSoo Kim
2012-07-16 18:40         ` JoonSoo Kim
2012-07-17 13:16         ` Michal Nazarewicz
2012-07-17 13:16           ` Michal Nazarewicz
2012-07-16 17:14   ` [PATCH 4] mm: fix possible incorrect return value of move_pages() syscall Joonsoo Kim
2012-07-16 17:14     ` Joonsoo Kim
2012-07-16 17:30     ` Christoph Lameter
2012-07-16 17:30       ` Christoph Lameter
2012-07-16 17:23   ` [PATCH 1/3] mm: correct return value of migrate_pages() Christoph Lameter
2012-07-16 17:23     ` Christoph Lameter
2012-07-16 17:32     ` JoonSoo Kim
2012-07-16 17:32       ` JoonSoo Kim
2012-07-16 17:37       ` Christoph Lameter
2012-07-16 17:37         ` Christoph Lameter
2012-07-16 17:40   ` Michal Nazarewicz
2012-07-16 17:57     ` JoonSoo Kim
2012-07-16 17:57       ` JoonSoo Kim
2012-07-16 18:05       ` Christoph Lameter
2012-07-16 18:05         ` Christoph Lameter
2012-07-17 12:33 ` [PATCH 1/4 v2] mm: correct return value of migrate_pages() and migrate_huge_pages() Joonsoo Kim
2012-07-17 12:33   ` Joonsoo Kim
2012-07-17 12:33   ` [PATCH 2/4 v2] mm: fix possible incorrect return value of migrate_pages() syscall Joonsoo Kim
2012-07-17 12:33     ` Joonsoo Kim
2012-07-17 14:28     ` Christoph Lameter
2012-07-17 14:28       ` Christoph Lameter
2012-07-17 15:41       ` JoonSoo Kim
2012-07-17 15:41         ` JoonSoo Kim
2012-07-17 12:33   ` [PATCH 3/4 v2] mm: fix return value in __alloc_contig_migrate_range() Joonsoo Kim
2012-07-17 12:33     ` Joonsoo Kim
2012-07-17 13:25     ` Michal Nazarewicz
2012-07-17 13:25       ` Michal Nazarewicz
2012-07-17 15:45       ` JoonSoo Kim
2012-07-17 15:45         ` JoonSoo Kim
2012-07-17 15:49         ` [PATCH 3/4 v3] " Joonsoo Kim
2012-07-17 15:49           ` Joonsoo Kim
2012-07-17 12:33   ` [PATCH 4/4 v2] mm: fix possible incorrect return value of move_pages() syscall Joonsoo Kim
2012-07-17 12:33     ` Joonsoo Kim
2012-07-27 17:55 ` [RESEND PATCH 1/4 v3] mm: correct return value of migrate_pages() and migrate_huge_pages() Joonsoo Kim
2012-07-27 17:55   ` Joonsoo Kim
2012-07-27 17:55   ` [RESEND PATCH 2/4 v3] mm: fix possible incorrect return value of migrate_pages() syscall Joonsoo Kim
2012-07-27 17:55     ` Joonsoo Kim
2012-07-27 20:57     ` Christoph Lameter
2012-07-27 20:57       ` Christoph Lameter
2012-07-28  6:16       ` JoonSoo Kim
2012-07-28  6:16         ` JoonSoo Kim
2012-07-30 19:30         ` Christoph Lameter
2012-07-30 19:30           ` Christoph Lameter
2012-07-27 17:55   ` [RESEND PATCH 3/4 v3] mm: fix return value in __alloc_contig_migrate_range() Joonsoo Kim
2012-07-27 17:55     ` Joonsoo Kim
2012-07-27 17:55   ` [RESEND PATCH 4/4 v3] mm: fix possible incorrect return value of move_pages() syscall Joonsoo Kim
2012-07-27 17:55     ` Joonsoo Kim
2012-07-27 20:54     ` Christoph Lameter
2012-07-27 20:54       ` Christoph Lameter
2012-07-28  6:09       ` JoonSoo Kim
2012-07-28  6:09         ` JoonSoo Kim
2012-07-30 19:29         ` Christoph Lameter
2012-07-30 19:29           ` Christoph Lameter
2012-07-31  3:34           ` JoonSoo Kim
2012-07-31  3:34             ` JoonSoo Kim
2012-07-31 14:04             ` Christoph Lameter
2012-07-31 14:04               ` Christoph Lameter
2012-08-01  5:15           ` Michael Kerrisk
2012-08-01 18:00             ` Christoph Lameter
2012-08-01 18:00               ` Christoph Lameter
2012-08-02  5:52               ` Michael Kerrisk
2012-08-02  5:52                 ` Michael Kerrisk
2012-08-13 16:17 ` [PATCH 1/5] workqueue: use enum value to set array size of pools in gcwq Joonsoo Kim
2012-08-13 16:17   ` [PATCH 2/5] workqueue: change value of lcpu in queue_delayed_work_on() Joonsoo Kim
2012-08-13 16:32     ` Tejun Heo
2012-08-13 16:54       ` JoonSoo Kim
2012-08-13 17:03         ` Tejun Heo
2012-08-13 17:43           ` JoonSoo Kim
2012-08-13 18:00             ` Tejun Heo
2012-08-14 18:04               ` JoonSoo Kim
2012-08-13 16:17   ` [PATCH 3/5] workqueue: introduce system_highpri_wq Joonsoo Kim
2012-08-13 16:17   ` [PATCH 4/5] workqueue: use system_highpri_wq for highpri workers in rebind_workers() Joonsoo Kim
2012-08-13 16:34     ` Tejun Heo
2012-08-13 16:57       ` JoonSoo Kim
2012-08-13 17:05         ` Tejun Heo
2012-08-13 17:45           ` JoonSoo Kim
2012-08-13 16:17   ` [PATCH 5/5] workqueue: use system_highpri_wq for unbind_work Joonsoo Kim
2012-08-13 16:36     ` Tejun Heo
2012-08-13 17:02       ` JoonSoo Kim
2012-08-13 17:07         ` Tejun Heo
2012-08-13 17:52           ` JoonSoo Kim
2012-08-14 18:10 ` [PATCH v2 0/6] system_highpri_wq Joonsoo Kim
2012-08-14 18:10   ` [PATCH v2 1/6] workqueue: use enum value to set array size of pools in gcwq Joonsoo Kim
2012-08-14 18:10   ` [PATCH v2 2/6] workqueue: correct req_cpu in trace_workqueue_queue_work() Joonsoo Kim
2012-08-14 18:34     ` Tejun Heo
2012-08-14 18:56       ` JoonSoo Kim
2012-08-14 18:10   ` [PATCH v2 3/6] workqueue: change value of lcpu in __queue_delayed_work_on() Joonsoo Kim
2012-08-14 19:00     ` Tejun Heo
2012-08-14 18:10   ` [PATCH v2 4/6] workqueue: introduce system_highpri_wq Joonsoo Kim
2012-08-14 18:10   ` [PATCH v2 5/6] workqueue: use system_highpri_wq for highpri workers in rebind_workers() Joonsoo Kim
2012-08-14 18:29     ` Tejun Heo
2012-08-14 18:10   ` [PATCH v2 6/6] workqueue: use system_highpri_wq for unbind_work Joonsoo Kim
2012-08-15 14:25 ` [PATCH v3 0/6] system_highpri_wq Joonsoo Kim
2012-08-15 14:25   ` [PATCH v3 1/6] workqueue: use enum value to set array size of pools in gcwq Joonsoo Kim
2012-08-15 14:25   ` [PATCH v3 2/6] workqueue: correct req_cpu in trace_workqueue_queue_work() Joonsoo Kim
2012-08-15 14:25   ` [PATCH v3 3/6] workqueue: change value of lcpu in __queue_delayed_work_on() Joonsoo Kim
2012-08-15 14:25   ` [PATCH v3 4/6] workqueue: introduce system_highpri_wq Joonsoo Kim
2012-08-15 14:25   ` [PATCH v3 5/6] workqueue: use system_highpri_wq for highpri workers in rebind_workers() Joonsoo Kim
2012-08-15 14:25   ` [PATCH v3 6/6] workqueue: use system_highpri_wq for unbind_work Joonsoo Kim
2012-08-16 21:22   ` [PATCH v3 0/6] system_highpri_wq Tejun Heo
2012-08-17 13:38     ` JoonSoo Kim
2012-08-24 16:05 ` [PATCH 1/2] slub: rename cpu_partial to max_cpu_object Joonsoo Kim
2012-08-24 16:05   ` Joonsoo Kim
2012-08-24 16:05   ` [PATCH 2/2] slub: correct the calculation of the number of cpu objects in get_partial_node Joonsoo Kim
2012-08-24 16:05     ` Joonsoo Kim
2012-08-24 16:15     ` Christoph Lameter
2012-08-24 16:15       ` Christoph Lameter
2012-08-24 16:28       ` JoonSoo Kim
2012-08-24 16:28         ` JoonSoo Kim
2012-08-24 16:31         ` Christoph Lameter
2012-08-24 16:31           ` Christoph Lameter
2012-08-24 16:40           ` JoonSoo Kim
2012-08-24 16:40             ` JoonSoo Kim
2012-08-24 16:12   ` [PATCH 1/2] slub: rename cpu_partial to max_cpu_object Christoph Lameter
2012-08-24 16:12     ` Christoph Lameter
2012-08-25 14:11 ` [PATCH 1/2] slab: do ClearSlabPfmemalloc() for all pages of slab Joonsoo Kim
2012-08-25 14:11   ` Joonsoo Kim
2012-08-25 14:11   ` [PATCH 2/2] slab: fix starting index for finding another object Joonsoo Kim
2012-08-25 14:11     ` Joonsoo Kim
2012-09-03 10:08   ` [PATCH 1/2] slab: do ClearSlabPfmemalloc() for all pages of slab Mel Gorman
2012-09-03 10:08     ` Mel Gorman
2012-10-18 23:18 ` [PATCH 0/2] clean-up initialization of deferrable timer Joonsoo Kim
2012-10-18 23:18   ` [PATCH 1/2] timer: add setup_timer_deferrable() macro Joonsoo Kim
2012-10-18 23:18   ` [PATCH 2/2] timer: use new " Joonsoo Kim
2012-10-26 14:08   ` [PATCH 0/2] clean-up initialization of deferrable timer JoonSoo Kim
2012-10-20 15:48 ` [PATCH for-v3.7 1/2] slub: optimize poorly inlined kmalloc* functions Joonsoo Kim
2012-10-20 15:48   ` Joonsoo Kim
2012-10-20 15:48   ` [PATCH for-v3.7 2/2] slub: optimize kmalloc* inlining for GFP_DMA Joonsoo Kim
2012-10-20 15:48     ` Joonsoo Kim
2012-10-22 14:31     ` Christoph Lameter
2012-10-22 14:31       ` Christoph Lameter
2012-10-23  2:29       ` JoonSoo Kim
2012-10-23  2:29         ` JoonSoo Kim
2012-10-23  6:16         ` Eric Dumazet
2012-10-23  6:16           ` Eric Dumazet
2012-10-23 16:12           ` JoonSoo Kim
2012-10-23 16:12             ` JoonSoo Kim
2012-10-24  8:05   ` [PATCH for-v3.7 1/2] slub: optimize poorly inlined kmalloc* functions Pekka Enberg
2012-10-24  8:05     ` Pekka Enberg
2012-10-24 13:36     ` Christoph Lameter
2012-10-24 13:36       ` Christoph Lameter
2012-10-20 16:30 ` [PATCH 0/3] workqueue: minor cleanup Joonsoo Kim
2012-10-20 16:30   ` [PATCH 1/3] workqueue: optimize mod_delayed_work_on() when @delay == 0 Joonsoo Kim
2012-10-20 16:30   ` [PATCH 2/3] workqueue: trivial fix for return statement in work_busy() Joonsoo Kim
2012-10-20 22:53     ` Tejun Heo
2012-10-20 16:30   ` [PATCH 3/3] workqueue: remove unused argument of wq_worker_waking_up() Joonsoo Kim
2012-10-20 22:57     ` Tejun Heo
2012-10-23  1:44       ` JoonSoo Kim
2012-10-24 19:14         ` Tejun Heo
2012-10-28 19:12 ` [PATCH 0/5] minor clean-up and optimize highmem related code Joonsoo Kim
2012-10-28 19:12   ` Joonsoo Kim
2012-10-28 19:12   ` [PATCH 1/5] mm, highmem: use PKMAP_NR() to calculate an index of pkmap Joonsoo Kim
2012-10-28 19:12     ` Joonsoo Kim
2012-10-29  1:48     ` Minchan Kim
2012-10-29  1:48       ` Minchan Kim
2012-10-28 19:12   ` [PATCH 2/5] mm, highmem: remove useless pool_lock Joonsoo Kim
2012-10-28 19:12     ` Joonsoo Kim
2012-10-29  1:52     ` Minchan Kim
2012-10-29  1:52       ` Minchan Kim
2012-10-30 21:31     ` Andrew Morton
2012-10-30 21:31       ` Andrew Morton
2012-10-31  5:14       ` Minchan Kim
2012-10-31  5:14         ` Minchan Kim
2012-10-31 15:01       ` JoonSoo Kim
2012-10-31 15:01         ` JoonSoo Kim
2012-10-28 19:12   ` [PATCH 3/5] mm, highmem: remove page_address_pool list Joonsoo Kim
2012-10-28 19:12     ` Joonsoo Kim
2012-10-29  1:57     ` Minchan Kim
2012-10-29  1:57       ` Minchan Kim
2012-10-28 19:12   ` [PATCH 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of last flushed entry Joonsoo Kim
2012-10-28 19:12     ` Joonsoo Kim
2012-10-29  2:06     ` Minchan Kim
2012-10-29  2:06       ` Minchan Kim
2012-10-29 13:12       ` JoonSoo Kim
2012-10-29 13:12         ` JoonSoo Kim
2012-10-28 19:12   ` [PATCH 5/5] mm, highmem: get virtual address of the page using PKMAP_ADDR() Joonsoo Kim
2012-10-28 19:12     ` Joonsoo Kim
2012-10-29  2:09     ` Minchan Kim
2012-10-29  2:09       ` Minchan Kim
2012-10-29  2:12   ` [PATCH 0/5] minor clean-up and optimize highmem related code Minchan Kim
2012-10-29  2:12     ` Minchan Kim
2012-10-29 13:15     ` JoonSoo Kim
2012-10-29 13:15       ` JoonSoo Kim
2012-10-31 17:11       ` JoonSoo Kim
2012-10-31 17:11         ` JoonSoo Kim
2012-10-31 16:56 ` [PATCH v2 " Joonsoo Kim
2012-10-31 16:56   ` Joonsoo Kim
2012-10-31 16:56   ` [PATCH v2 1/5] mm, highmem: use PKMAP_NR() to calculate an index of pkmap Joonsoo Kim
2012-10-31 16:56     ` Joonsoo Kim
2012-10-31 16:56   ` [PATCH v2 2/5] mm, highmem: remove useless pool_lock Joonsoo Kim
2012-10-31 16:56     ` Joonsoo Kim
2012-10-31 16:56   ` [PATCH v2 3/5] mm, highmem: remove page_address_pool list Joonsoo Kim
2012-10-31 16:56     ` Joonsoo Kim
2012-10-31 16:56   ` [PATCH v2 4/5] mm, highmem: makes flush_all_zero_pkmaps() return index of first flushed entry Joonsoo Kim
2012-10-31 16:56     ` Joonsoo Kim
2012-11-01  5:03     ` Minchan Kim
2012-11-01  5:03       ` Minchan Kim
2012-11-02 19:07       ` JoonSoo Kim
2012-11-02 19:07         ` JoonSoo Kim
2012-11-02 22:42         ` Minchan Kim
2012-11-02 22:42           ` Minchan Kim
2012-11-13  0:30           ` JoonSoo Kim
2012-11-13  0:30             ` JoonSoo Kim
2012-11-13 12:49             ` Minchan Kim
2012-11-13 12:49               ` Minchan Kim
2012-11-13 14:12               ` JoonSoo Kim
2012-11-13 14:12                 ` JoonSoo Kim
2012-11-13 15:01                 ` Minchan Kim
2012-11-13 15:01                   ` Minchan Kim
2012-11-14 17:09                   ` JoonSoo Kim
2012-11-14 17:09                     ` JoonSoo Kim
2012-11-19 23:46                     ` Minchan Kim
2012-11-19 23:46                       ` Minchan Kim
2012-11-27 15:01                       ` JoonSoo Kim
2012-11-27 15:01                         ` JoonSoo Kim
2012-10-31 16:56   ` [PATCH v2 5/5] mm, highmem: get virtual address of the page using PKMAP_ADDR() Joonsoo Kim
2012-10-31 16:56     ` Joonsoo Kim
