* [PATCH 00/10] zram: various improvements and cleanups
@ 2010-08-09 17:26 ` Nitin Gupta
  0 siblings, 0 replies; 137+ messages in thread
From: Nitin Gupta @ 2010-08-09 17:26 UTC (permalink / raw)
  To: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH
  Cc: Linux Driver Project, linux-mm, linux-kernel

The zram module creates RAM-based block devices named /dev/zram<id>
(<id> = 0, 1, ...). Pages written to these disks are compressed and stored
in memory itself.
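
The number of devices is controlled by the num_devices module
parameter. Just as an illustration (four devices is an arbitrary
choice), a typical setup looks like:

        modprobe zram num_devices=4
        ls /dev/zram*    # /dev/zram0 ... /dev/zram3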

One of the major changes is the replacement of ioctls with a sysfs
interface. An advantage of this approach is that we no longer depend
on the userspace tool (rzscontrol), which was used to set various
parameters and check statistics. Maintaining an updated version of
rzscontrol as the ioctls and exported statistics changed was a major
pain.

Another significant change is the introduction of percpu stats and compression
buffers. Earlier, we had per-device buffers protected by a mutex. This was a
major bottleneck on multi-core systems. With these changes, benchmarks with
fio[1] showed a write-performance speedup of about 20% on a dual-core
system (see patch 4/10 description for details).
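
The exact fio job is described in patch 4/10; purely as an
illustration (not the actual job file used), a random-write run
against a zram disk can look like:

        fio --name=zram-test --filename=/dev/zram0 --rw=randwrite \
                --bs=4k --direct=1 --numjobs=2 --size=40m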


For easier testing, a single patch against 2.6.35-git8 has been uploaded at:
http://compcache.googlecode.com/hg/sub-projects/mainline/zram_2.6.36-rc0.patch

[1] fio I/O benchmarking tool: http://freshmeat.net/projects/fio/

Nitin Gupta (10):
  Replace ioctls with sysfs interface
  Remove need for explicit device initialization
  Use percpu stats
  Use percpu buffers
  Reduce per table entry overhead by 4 bytes
  Block discard support
  Increase compressed page size threshold
  Some cleanups
  Update zram documentation
  Document sysfs entries

 Documentation/ABI/testing/sysfs-block-zram |   99 ++++
 drivers/staging/zram/Kconfig               |   12 -
 drivers/staging/zram/Makefile              |    2 +-
 drivers/staging/zram/xvmalloc.c            |   22 +-
 drivers/staging/zram/zram.txt              |   58 ++-
 drivers/staging/zram/zram_drv.c            |  670 ++++++++++++++--------------
 drivers/staging/zram/zram_drv.h            |  138 ++----
 drivers/staging/zram/zram_ioctl.h          |   41 --
 drivers/staging/zram/zram_sysfs.c          |  253 +++++++++++
 9 files changed, 760 insertions(+), 535 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-block-zram
 delete mode 100644 drivers/staging/zram/zram_ioctl.h
 create mode 100644 drivers/staging/zram/zram_sysfs.c

-- 
1.7.2.1


* [PATCH 01/10] Replace ioctls with sysfs interface
  2010-08-09 17:26 ` Nitin Gupta
@ 2010-08-09 17:26   ` Nitin Gupta
  -1 siblings, 0 replies; 137+ messages in thread
From: Nitin Gupta @ 2010-08-09 17:26 UTC (permalink / raw)
  To: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH
  Cc: Linux Driver Project, linux-mm, linux-kernel

Creates per-device sysfs nodes in /sys/block/zram<id>/.
Currently, the following stats are exported:
 - disksize
 - num_reads
 - num_writes
 - invalid_io
 - zero_pages
 - orig_data_size
 - compr_data_size
 - mem_used_total

By default, disksize is set to 0. So, to start using
a zram device, first write a disksize value and then
initialize the device by writing any positive value to
initstate. For example:

        # initialize /dev/zram0 with 50MB disksize
        echo 50*1024*1024 | bc > /sys/block/zram0/disksize
        echo 1 > /sys/block/zram0/initstate

When done using a disk, free its memory by writing any
positive value to the reset node:

        echo 1 > /sys/block/zram0/reset

This change also obviates the need for the 'rzscontrol' utility.
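
Each statistic can then be read like any other sysfs attribute, for
example (illustrative):

        cat /sys/block/zram0/orig_data_size
        cat /sys/block/zram0/compr_data_size
        cat /sys/block/zram0/mem_used_total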

Signed-off-by: Nitin Gupta <ngupta@vflare.org>
---
 drivers/staging/zram/Kconfig      |   12 --
 drivers/staging/zram/Makefile     |    2 +-
 drivers/staging/zram/zram_drv.c   |  186 ++++++++--------------------
 drivers/staging/zram/zram_drv.h   |   55 ++-------
 drivers/staging/zram/zram_ioctl.h |   41 ------
 drivers/staging/zram/zram_sysfs.c |  242 +++++++++++++++++++++++++++++++++++++
 6 files changed, 306 insertions(+), 232 deletions(-)
 delete mode 100644 drivers/staging/zram/zram_ioctl.h
 create mode 100644 drivers/staging/zram/zram_sysfs.c

diff --git a/drivers/staging/zram/Kconfig b/drivers/staging/zram/Kconfig
index 4654ae2..da079f8 100644
--- a/drivers/staging/zram/Kconfig
+++ b/drivers/staging/zram/Kconfig
@@ -15,15 +15,3 @@ config ZRAM
 
 	  See zram.txt for more information.
 	  Project home: http://compcache.googlecode.com/
-
-config ZRAM_STATS
-	bool "Enable statistics for compressed RAM disks"
-	depends on ZRAM
-	default y
-	help
-	  Enable statistics collection for compressed RAM devices. Statistics
-	  are exported through ioctl interface, so you have to use zramconfig
-	  program to get them. This adds only a minimal overhead.
-
-	  If unsure, say Y.
-
diff --git a/drivers/staging/zram/Makefile b/drivers/staging/zram/Makefile
index b2c087a..c01160a 100644
--- a/drivers/staging/zram/Makefile
+++ b/drivers/staging/zram/Makefile
@@ -1,3 +1,3 @@
-zram-objs	:=	zram_drv.o xvmalloc.o
+zram-objs	:=	zram_drv.o zram_sysfs.o xvmalloc.o
 
 obj-$(CONFIG_ZRAM)	+=	zram.o
diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 77d4d71..3f698a5 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -33,10 +33,39 @@
 
 /* Globals */
 static int zram_major;
-static struct zram *devices;
+struct zram *devices;
 
 /* Module params (documentation at end) */
-static unsigned int num_devices;
+unsigned int num_devices;
+
+static void zram_stat_inc(u32 *v)
+{
+	*v = *v + 1;
+}
+
+static void zram_stat_dec(u32 *v)
+{
+	*v = *v - 1;
+}
+
+static void zram_stat64_add(struct zram *zram, u64 *v, u64 inc)
+{
+	spin_lock(&zram->stat64_lock);
+	*v = *v + inc;
+	spin_unlock(&zram->stat64_lock);
+}
+
+static void zram_stat64_sub(struct zram *zram, u64 *v, u64 dec)
+{
+	spin_lock(&zram->stat64_lock);
+	*v = *v - dec;
+	spin_unlock(&zram->stat64_lock);
+}
+
+static void zram_stat64_inc(struct zram *zram, u64 *v)
+{
+	zram_stat64_add(zram, v, 1);
+}
 
 static int zram_test_flag(struct zram *zram, u32 index,
 			enum zram_pageflags flag)
@@ -91,7 +120,7 @@ static void zram_set_disksize(struct zram *zram, size_t totalram_bytes)
 		"the disk when not in use so a huge zram is "
 		"wasteful.\n"
 		"\tMemory Size: %zu kB\n"
-		"\tSize you selected: %zu kB\n"
+		"\tSize you selected: %llu kB\n"
 		"Continuing anyway ...\n",
 		totalram_bytes >> 10, zram->disksize
 		);
@@ -100,49 +129,6 @@ static void zram_set_disksize(struct zram *zram, size_t totalram_bytes)
 	zram->disksize &= PAGE_MASK;
 }
 
-static void zram_ioctl_get_stats(struct zram *zram,
-			struct zram_ioctl_stats *s)
-{
-	s->disksize = zram->disksize;
-
-#if defined(CONFIG_ZRAM_STATS)
-	{
-	struct zram_stats *rs = &zram->stats;
-	size_t succ_writes, mem_used;
-	unsigned int good_compress_perc = 0, no_compress_perc = 0;
-
-	mem_used = xv_get_total_size_bytes(zram->mem_pool)
-			+ (rs->pages_expand << PAGE_SHIFT);
-	succ_writes = zram_stat64_read(zram, &rs->num_writes) -
-			zram_stat64_read(zram, &rs->failed_writes);
-
-	if (succ_writes && rs->pages_stored) {
-		good_compress_perc = rs->good_compress * 100
-					/ rs->pages_stored;
-		no_compress_perc = rs->pages_expand * 100
-					/ rs->pages_stored;
-	}
-
-	s->num_reads = zram_stat64_read(zram, &rs->num_reads);
-	s->num_writes = zram_stat64_read(zram, &rs->num_writes);
-	s->failed_reads = zram_stat64_read(zram, &rs->failed_reads);
-	s->failed_writes = zram_stat64_read(zram, &rs->failed_writes);
-	s->invalid_io = zram_stat64_read(zram, &rs->invalid_io);
-	s->notify_free = zram_stat64_read(zram, &rs->notify_free);
-	s->pages_zero = rs->pages_zero;
-
-	s->good_compress_pct = good_compress_perc;
-	s->pages_expand_pct = no_compress_perc;
-
-	s->pages_stored = rs->pages_stored;
-	s->pages_used = mem_used >> PAGE_SHIFT;
-	s->orig_data_size = rs->pages_stored << PAGE_SHIFT;
-	s->compr_data_size = rs->compr_size;
-	s->mem_used_total = mem_used;
-	}
-#endif /* CONFIG_ZRAM_STATS */
-}
-
 static void zram_free_page(struct zram *zram, size_t index)
 {
 	u32 clen;
@@ -180,7 +166,7 @@ static void zram_free_page(struct zram *zram, size_t index)
 		zram_stat_dec(&zram->stats.good_compress);
 
 out:
-	zram->stats.compr_size -= clen;
+	zram_stat64_sub(zram, &zram->stats.compr_size, clen);
 	zram_stat_dec(&zram->stats.pages_stored);
 
 	zram->table[index].page = NULL;
@@ -396,7 +382,7 @@ memstore:
 			kunmap_atomic(src, KM_USER0);
 
 		/* Update stats */
-		zram->stats.compr_size += clen;
+		zram_stat64_add(zram, &zram->stats.compr_size, clen);
 		zram_stat_inc(&zram->stats.pages_stored);
 		if (clen <= PAGE_SIZE / 2)
 			zram_stat_inc(&zram->stats.good_compress);
@@ -463,7 +449,7 @@ static int zram_make_request(struct request_queue *queue, struct bio *bio)
 	return ret;
 }
 
-static void reset_device(struct zram *zram)
+void zram_reset_device(struct zram *zram)
 {
 	size_t index;
 
@@ -506,7 +492,7 @@ static void reset_device(struct zram *zram)
 	zram->disksize = 0;
 }
 
-static int zram_ioctl_init_device(struct zram *zram)
+int zram_init_device(struct zram *zram)
 {
 	int ret;
 	size_t num_pages;
@@ -561,91 +547,12 @@ static int zram_ioctl_init_device(struct zram *zram)
 	return 0;
 
 fail:
-	reset_device(zram);
+	zram_reset_device(zram);
 
 	pr_err("Initialization failed: err=%d\n", ret);
 	return ret;
 }
 
-static int zram_ioctl_reset_device(struct zram *zram)
-{
-	if (zram->init_done)
-		reset_device(zram);
-
-	return 0;
-}
-
-static int zram_ioctl(struct block_device *bdev, fmode_t mode,
-			unsigned int cmd, unsigned long arg)
-{
-	int ret = 0;
-	size_t disksize_kb;
-
-	struct zram *zram = bdev->bd_disk->private_data;
-
-	switch (cmd) {
-	case ZRAMIO_SET_DISKSIZE_KB:
-		if (zram->init_done) {
-			ret = -EBUSY;
-			goto out;
-		}
-		if (copy_from_user(&disksize_kb, (void *)arg,
-						_IOC_SIZE(cmd))) {
-			ret = -EFAULT;
-			goto out;
-		}
-		zram->disksize = disksize_kb << 10;
-		pr_info("Disk size set to %zu kB\n", disksize_kb);
-		break;
-
-	case ZRAMIO_GET_STATS:
-	{
-		struct zram_ioctl_stats *stats;
-		if (!zram->init_done) {
-			ret = -ENOTTY;
-			goto out;
-		}
-		stats = kzalloc(sizeof(*stats), GFP_KERNEL);
-		if (!stats) {
-			ret = -ENOMEM;
-			goto out;
-		}
-		zram_ioctl_get_stats(zram, stats);
-		if (copy_to_user((void *)arg, stats, sizeof(*stats))) {
-			kfree(stats);
-			ret = -EFAULT;
-			goto out;
-		}
-		kfree(stats);
-		break;
-	}
-	case ZRAMIO_INIT:
-		ret = zram_ioctl_init_device(zram);
-		break;
-
-	case ZRAMIO_RESET:
-		/* Do not reset an active device! */
-		if (bdev->bd_holders) {
-			ret = -EBUSY;
-			goto out;
-		}
-
-		/* Make sure all pending I/O is finished */
-		if (bdev)
-			fsync_bdev(bdev);
-
-		ret = zram_ioctl_reset_device(zram);
-		break;
-
-	default:
-		pr_info("Invalid ioctl %u\n", cmd);
-		ret = -ENOTTY;
-	}
-
-out:
-	return ret;
-}
-
 void zram_slot_free_notify(struct block_device *bdev, unsigned long index)
 {
 	struct zram *zram;
@@ -656,7 +563,6 @@ void zram_slot_free_notify(struct block_device *bdev, unsigned long index)
 }
 
 static const struct block_device_operations zram_devops = {
-	.ioctl = zram_ioctl,
 	.swap_slot_free_notify = zram_slot_free_notify,
 	.owner = THIS_MODULE
 };
@@ -696,7 +602,7 @@ static int create_device(struct zram *zram, int device_id)
 	zram->disk->private_data = zram;
 	snprintf(zram->disk->disk_name, 16, "zram%d", device_id);
 
-	/* Actual capacity set using ZRAMIO_SET_DISKSIZE_KB ioctl */
+	/* Actual capacity set using sysfs (/sys/block/zram<id>/disksize) */
 	set_capacity(zram->disk, 0);
 
 	/*
@@ -710,6 +616,15 @@ static int create_device(struct zram *zram, int device_id)
 
 	add_disk(zram->disk);
 
+#ifdef CONFIG_SYSFS
+	ret = sysfs_create_group(&disk_to_dev(zram->disk)->kobj,
+				&zram_disk_attr_group);
+	if (ret < 0) {
+		pr_warning("Error creating sysfs group");
+		goto out;
+	}
+#endif
+
 	zram->init_done = 0;
 
 out:
@@ -718,6 +633,11 @@ out:
 
 static void destroy_device(struct zram *zram)
 {
+#ifdef CONFIG_SYSFS
+	sysfs_remove_group(&disk_to_dev(zram->disk)->kobj,
+			&zram_disk_attr_group);
+#endif
+
 	if (zram->disk) {
 		del_gendisk(zram->disk);
 		put_disk(zram->disk);
@@ -785,7 +705,7 @@ static void __exit zram_exit(void)
 
 		destroy_device(zram);
 		if (zram->init_done)
-			reset_device(zram);
+			zram_reset_device(zram);
 	}
 
 	unregister_blkdev(zram_major, "zram");
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index 945f974..2ef93cc 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -18,7 +18,6 @@
 #include <linux/spinlock.h>
 #include <linux/mutex.h>
 
-#include "zram_ioctl.h"
 #include "xvmalloc.h"
 
 /*
@@ -85,11 +84,7 @@ struct table {
 } __attribute__((aligned(4)));
 
 struct zram_stats {
-	/* basic stats */
-	size_t compr_size;	/* compressed size of pages stored -
-				 * needed to enforce memlimit */
-	/* more stats */
-#if defined(CONFIG_ZRAM_STATS)
+	u64 compr_size;		/* compressed size of pages stored */
 	u64 num_reads;		/* failed + successful */
 	u64 num_writes;		/* --do-- */
 	u64 failed_reads;	/* should NEVER! happen */
@@ -100,7 +95,6 @@ struct zram_stats {
 	u32 pages_stored;	/* no. of pages currently stored */
 	u32 good_compress;	/* % of pages with compression ratio<=50% */
 	u32 pages_expand;	/* % of incompressible pages */
-#endif
 };
 
 struct zram {
@@ -118,47 +112,18 @@ struct zram {
 	 * This is the limit on amount of *uncompressed* worth of data
 	 * we can store in a disk.
 	 */
-	size_t disksize;	/* bytes */
+	u64 disksize;	/* bytes */
 
 	struct zram_stats stats;
 };
 
-/*-- */
-
-/* Debugging and Stats */
-#if defined(CONFIG_ZRAM_STATS)
-static void zram_stat_inc(u32 *v)
-{
-	*v = *v + 1;
-}
-
-static void zram_stat_dec(u32 *v)
-{
-	*v = *v - 1;
-}
-
-static void zram_stat64_inc(struct zram *zram, u64 *v)
-{
-	spin_lock(&zram->stat64_lock);
-	*v = *v + 1;
-	spin_unlock(&zram->stat64_lock);
-}
-
-static u64 zram_stat64_read(struct zram *zram, u64 *v)
-{
-	u64 val;
-
-	spin_lock(&zram->stat64_lock);
-	val = *v;
-	spin_unlock(&zram->stat64_lock);
-
-	return val;
-}
-#else
-#define zram_stat_inc(v)
-#define zram_stat_dec(v)
-#define zram_stat64_inc(r, v)
-#define zram_stat64_read(r, v)
-#endif /* CONFIG_ZRAM_STATS */
+extern struct zram *devices;
+extern unsigned int num_devices;
+#ifdef CONFIG_SYSFS
+extern struct attribute_group zram_disk_attr_group;
+#endif
+
+extern int zram_init_device(struct zram *zram);
+extern void zram_reset_device(struct zram *zram);
 
 #endif
diff --git a/drivers/staging/zram/zram_ioctl.h b/drivers/staging/zram/zram_ioctl.h
deleted file mode 100644
index 5c415fa..0000000
--- a/drivers/staging/zram/zram_ioctl.h
+++ /dev/null
@@ -1,41 +0,0 @@
-/*
- * Compressed RAM block device
- *
- * Copyright (C) 2008, 2009, 2010  Nitin Gupta
- *
- * This code is released using a dual license strategy: BSD/GPL
- * You can choose the licence that better fits your requirements.
- *
- * Released under the terms of 3-clause BSD License
- * Released under the terms of GNU General Public License Version 2.0
- *
- * Project home: http://compcache.googlecode.com
- */
-
-#ifndef _ZRAM_IOCTL_H_
-#define _ZRAM_IOCTL_H_
-
-struct zram_ioctl_stats {
-	u64 disksize;		/* disksize in bytes (user specifies in KB) */
-	u64 num_reads;		/* failed + successful */
-	u64 num_writes;		/* --do-- */
-	u64 failed_reads;	/* should NEVER! happen */
-	u64 failed_writes;	/* can happen when memory is too low */
-	u64 invalid_io;		/* non-page-aligned I/O requests */
-	u64 notify_free;	/* no. of swap slot free notifications */
-	u32 pages_zero;		/* no. of zero filled pages */
-	u32 good_compress_pct;	/* no. of pages with compression ratio<=50% */
-	u32 pages_expand_pct;	/* no. of incompressible pages */
-	u32 pages_stored;
-	u32 pages_used;
-	u64 orig_data_size;
-	u64 compr_data_size;
-	u64 mem_used_total;
-} __attribute__ ((packed, aligned(4)));
-
-#define ZRAMIO_SET_DISKSIZE_KB	_IOW('z', 0, size_t)
-#define ZRAMIO_GET_STATS	_IOR('z', 1, struct zram_ioctl_stats)
-#define ZRAMIO_INIT		_IO('z', 2)
-#define ZRAMIO_RESET		_IO('z', 3)
-
-#endif
diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
new file mode 100644
index 0000000..b8dbaee
--- /dev/null
+++ b/drivers/staging/zram/zram_sysfs.c
@@ -0,0 +1,242 @@
+/*
+ * Compressed RAM block device
+ *
+ * Copyright (C) 2008, 2009, 2010  Nitin Gupta
+ *
+ * This code is released using a dual license strategy: BSD/GPL
+ * You can choose the licence that better fits your requirements.
+ *
+ * Released under the terms of 3-clause BSD License
+ * Released under the terms of GNU General Public License Version 2.0
+ *
+ * Project home: http://compcache.googlecode.com/
+ */
+
+#include <linux/device.h>
+#include <linux/genhd.h>
+
+#include "zram_drv.h"
+
+#ifdef CONFIG_SYSFS
+
+static u64 zram_stat64_read(struct zram *zram, u64 *v)
+{
+	u64 val;
+
+	spin_lock(&zram->stat64_lock);
+	val = *v;
+	spin_unlock(&zram->stat64_lock);
+
+	return val;
+}
+
+static struct zram *dev_to_zram(struct device *dev)
+{
+	int i;
+	struct zram *zram = NULL;
+
+	for (i = 0; i < num_devices; i++) {
+		zram = &devices[i];
+		if (disk_to_dev(zram->disk) == dev)
+			break;
+	}
+
+	return zram;
+}
+
+static ssize_t disksize_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct zram *zram = dev_to_zram(dev);
+
+	return sprintf(buf, "%llu\n", zram->disksize);
+}
+
+static ssize_t disksize_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t len)
+{
+	int ret;
+	struct zram *zram = dev_to_zram(dev);
+
+	if (zram->init_done)
+		return -EBUSY;
+
+	ret = strict_strtoull(buf, 10, &zram->disksize);
+	if (ret)
+		return ret;
+
+	zram->disksize &= PAGE_MASK;
+	set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
+
+	return len;
+}
+
+static ssize_t initstate_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct zram *zram = dev_to_zram(dev);
+
+	return sprintf(buf, "%u\n", zram->init_done);
+}
+
+static ssize_t initstate_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t len)
+{
+	int ret;
+	unsigned long do_init;
+	struct zram *zram = dev_to_zram(dev);
+
+	ret = strict_strtoul(buf, 10, &do_init);
+	if (ret)
+		return ret;
+
+	if (!do_init)
+		return -EINVAL;
+
+	zram_init_device(zram);
+
+	return len;
+}
+
+static ssize_t reset_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t len)
+{
+	int ret;
+	unsigned long do_reset;
+	struct zram *zram;
+	struct block_device *bdev;
+
+	zram = dev_to_zram(dev);
+	bdev = bdget_disk(zram->disk, 0);
+
+	/* Do not reset an active device! */
+	if (bdev->bd_holders)
+		return -EBUSY;
+
+	ret = strict_strtoul(buf, 10, &do_reset);
+	if (ret)
+		return ret;
+
+	if (!do_reset)
+		return -EINVAL;
+
+	/* Make sure all pending I/O is finished */
+	if (bdev)
+		fsync_bdev(bdev);
+
+	if (zram->init_done)
+		zram_reset_device(zram);
+
+	return len;
+}
+
+static ssize_t num_reads_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct zram *zram = dev_to_zram(dev);
+
+	return sprintf(buf, "%llu\n",
+		zram_stat64_read(zram, &zram->stats.num_reads));
+}
+
+static ssize_t num_writes_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct zram *zram = dev_to_zram(dev);
+
+	return sprintf(buf, "%llu\n",
+		zram_stat64_read(zram, &zram->stats.num_writes));
+}
+
+static ssize_t invalid_io_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct zram *zram = dev_to_zram(dev);
+
+	return sprintf(buf, "%llu\n",
+		zram_stat64_read(zram, &zram->stats.invalid_io));
+}
+
+static ssize_t notify_free_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct zram *zram = dev_to_zram(dev);
+
+	return sprintf(buf, "%llu\n",
+		zram_stat64_read(zram, &zram->stats.notify_free));
+}
+
+static ssize_t zero_pages_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct zram *zram = dev_to_zram(dev);
+
+	return sprintf(buf, "%u\n", zram->stats.pages_zero);
+}
+
+static ssize_t orig_data_size_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct zram *zram = dev_to_zram(dev);
+
+	return sprintf(buf, "%llu\n",
+		(u64)(zram->stats.pages_stored) << PAGE_SHIFT);
+}
+
+static ssize_t compr_data_size_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct zram *zram = dev_to_zram(dev);
+
+	return sprintf(buf, "%llu\n",
+		zram_stat64_read(zram, &zram->stats.compr_size));
+}
+
+static ssize_t mem_used_total_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	u64 val = 0;
+	struct zram *zram = dev_to_zram(dev);
+
+	if (zram->init_done) {
+		val = xv_get_total_size_bytes(zram->mem_pool) +
+			((u64)(zram->stats.pages_expand) << PAGE_SHIFT);
+	}
+
+	return sprintf(buf, "%llu\n", val);
+}
+
+static DEVICE_ATTR(disksize, S_IRUGO | S_IWUGO,
+		disksize_show, disksize_store);
+static DEVICE_ATTR(initstate, S_IRUGO | S_IWUGO,
+		initstate_show, initstate_store);
+static DEVICE_ATTR(reset, S_IWUGO, NULL, reset_store);
+static DEVICE_ATTR(num_reads, S_IRUGO, num_reads_show, NULL);
+static DEVICE_ATTR(num_writes, S_IRUGO, num_writes_show, NULL);
+static DEVICE_ATTR(invalid_io, S_IRUGO, invalid_io_show, NULL);
+static DEVICE_ATTR(notify_free, S_IRUGO, notify_free_show, NULL);
+static DEVICE_ATTR(zero_pages, S_IRUGO, zero_pages_show, NULL);
+static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
+static DEVICE_ATTR(compr_data_size, S_IRUGO, compr_data_size_show, NULL);
+static DEVICE_ATTR(mem_used_total, S_IRUGO, mem_used_total_show, NULL);
+
+static struct attribute *zram_disk_attrs[] = {
+	&dev_attr_disksize.attr,
+	&dev_attr_initstate.attr,
+	&dev_attr_reset.attr,
+	&dev_attr_num_reads.attr,
+	&dev_attr_num_writes.attr,
+	&dev_attr_invalid_io.attr,
+	&dev_attr_notify_free.attr,
+	&dev_attr_zero_pages.attr,
+	&dev_attr_orig_data_size.attr,
+	&dev_attr_compr_data_size.attr,
+	&dev_attr_mem_used_total.attr,
+	NULL,
+};
+
+struct attribute_group zram_disk_attr_group = {
+	.attrs = zram_disk_attrs,
+};
+
+#endif	/* CONFIG_SYSFS */
-- 
1.7.2.1


* [PATCH 02/10] Remove need for explicit device initialization
  2010-08-09 17:26 ` Nitin Gupta
@ 2010-08-09 17:26   ` Nitin Gupta
  -1 siblings, 0 replies; 137+ messages in thread
From: Nitin Gupta @ 2010-08-09 17:26 UTC (permalink / raw)
  To: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH
  Cc: Linux Driver Project, linux-mm, linux-kernel

Currently, the user has to explicitly write a positive value to the
initstate sysfs node before the device can be used. This event
triggers allocation of per-device metadata like the memory pool,
table array and so on.

We do not pre-initialize all zram devices since the 'table' array,
mapping disk blocks to compressed chunks, takes a considerable amount
of memory (8 bytes per page). So, pre-initializing all devices would
be quite wasteful if only a few or none of the devices are actually
used.

This explicit device initialization by the user is an odd requirement
and can easily be avoided. We now initialize the device when the first
write is done to it.
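
With this change, a sketch of the simplified flow (illustrative, using
the device as swap; the first write triggers initialization):

        echo $((50*1024*1024)) > /sys/block/zram0/disksize
        mkswap /dev/zram0
        swapon /dev/zram0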

Signed-off-by: Nitin Gupta <ngupta@vflare.org>
---
 drivers/staging/zram/zram_drv.c   |   35 +++++++++++++++++++++++------------
 drivers/staging/zram/zram_drv.h   |    2 ++
 drivers/staging/zram/zram_sysfs.c |   26 ++++----------------------
 3 files changed, 29 insertions(+), 34 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 3f698a5..c5f84ee 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -207,9 +207,15 @@ static int zram_read(struct zram *zram, struct bio *bio)
 	u32 index;
 	struct bio_vec *bvec;
 
-	zram_stat64_inc(zram, &zram->stats.num_reads);
+	if (unlikely(!zram->init_done)) {
+		set_bit(BIO_UPTODATE, &bio->bi_flags);
+		bio_endio(bio, 0);
+		return 0;
+	}
 
+	zram_stat64_inc(zram, &zram->stats.num_reads);
 	index = bio->bi_sector >> SECTORS_PER_PAGE_SHIFT;
+
 	bio_for_each_segment(bvec, bio, i) {
 		int ret;
 		size_t clen;
@@ -275,16 +281,20 @@ out:
 
 static int zram_write(struct zram *zram, struct bio *bio)
 {
-	int i;
+	int i, ret;
 	u32 index;
 	struct bio_vec *bvec;
 
-	zram_stat64_inc(zram, &zram->stats.num_writes);
+	if (unlikely(!zram->init_done)) {
+		ret = zram_init_device(zram);
+		if (ret)
+			goto out;
+	}
 
+	zram_stat64_inc(zram, &zram->stats.num_writes);
 	index = bio->bi_sector >> SECTORS_PER_PAGE_SHIFT;
 
 	bio_for_each_segment(bvec, bio, i) {
-		int ret;
 		u32 offset;
 		size_t clen;
 		struct zobj_header *zheader;
@@ -425,11 +435,6 @@ static int zram_make_request(struct request_queue *queue, struct bio *bio)
 	int ret = 0;
 	struct zram *zram = queue->queuedata;
 
-	if (unlikely(!zram->init_done)) {
-		bio_io_error(bio);
-		return 0;
-	}
-
 	if (!valid_io_request(zram, bio)) {
 		zram_stat64_inc(zram, &zram->stats.invalid_io);
 		bio_io_error(bio);
@@ -453,7 +458,7 @@ void zram_reset_device(struct zram *zram)
 {
 	size_t index;
 
-	/* Do not accept any new I/O request */
+	mutex_lock(&zram->init_lock);
 	zram->init_done = 0;
 
 	/* Free various per-device buffers */
@@ -490,6 +495,7 @@ void zram_reset_device(struct zram *zram)
 	memset(&zram->stats, 0, sizeof(zram->stats));
 
 	zram->disksize = 0;
+	mutex_unlock(&zram->init_lock);
 }
 
 int zram_init_device(struct zram *zram)
@@ -497,9 +503,11 @@ int zram_init_device(struct zram *zram)
 	int ret;
 	size_t num_pages;
 
+	mutex_lock(&zram->init_lock);
+
 	if (zram->init_done) {
-		pr_info("Device already initialized!\n");
-		return -EBUSY;
+		mutex_unlock(&zram->init_lock);
+		return 0;
 	}
 
 	zram_set_disksize(zram, totalram_pages << PAGE_SHIFT);
@@ -542,11 +550,13 @@ int zram_init_device(struct zram *zram)
 	}
 
 	zram->init_done = 1;
+	mutex_unlock(&zram->init_lock);
 
 	pr_debug("Initialization done!\n");
 	return 0;
 
 fail:
+	mutex_unlock(&zram->init_lock);
 	zram_reset_device(zram);
 
 	pr_err("Initialization failed: err=%d\n", ret);
@@ -572,6 +582,7 @@ static int create_device(struct zram *zram, int device_id)
 	int ret = 0;
 
 	mutex_init(&zram->lock);
+	mutex_init(&zram->init_lock);
 	spin_lock_init(&zram->stat64_lock);
 
 	zram->queue = blk_alloc_queue(GFP_KERNEL);
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index 2ef93cc..a481551 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -108,6 +108,8 @@ struct zram {
 	struct request_queue *queue;
 	struct gendisk *disk;
 	int init_done;
+	/* Prevent concurrent execution of device init and reset */
+	struct mutex init_lock;
 	/*
 	 * This is the limit on amount of *uncompressed* worth of data
 	 * we can store in a disk.
diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
index b8dbaee..6c574a9 100644
--- a/drivers/staging/zram/zram_sysfs.c
+++ b/drivers/staging/zram/zram_sysfs.c
@@ -58,8 +58,10 @@ static ssize_t disksize_store(struct device *dev,
 	int ret;
 	struct zram *zram = dev_to_zram(dev);
 
-	if (zram->init_done)
+	if (zram->init_done) {
+		pr_info("Cannot change disksize for initialized device\n");
 		return -EBUSY;
+	}
 
 	ret = strict_strtoull(buf, 10, &zram->disksize);
 	if (ret)
@@ -79,25 +81,6 @@ static ssize_t initstate_show(struct device *dev,
 	return sprintf(buf, "%u\n", zram->init_done);
 }
 
-static ssize_t initstate_store(struct device *dev,
-		struct device_attribute *attr, const char *buf, size_t len)
-{
-	int ret;
-	unsigned long do_init;
-	struct zram *zram = dev_to_zram(dev);
-
-	ret = strict_strtoul(buf, 10, &do_init);
-	if (ret)
-		return ret;
-
-	if (!do_init)
-		return -EINVAL;
-
-	zram_init_device(zram);
-
-	return len;
-}
-
 static ssize_t reset_store(struct device *dev,
 		struct device_attribute *attr, const char *buf, size_t len)
 {
@@ -208,8 +191,7 @@ static ssize_t mem_used_total_show(struct device *dev,
 
 static DEVICE_ATTR(disksize, S_IRUGO | S_IWUGO,
 		disksize_show, disksize_store);
-static DEVICE_ATTR(initstate, S_IRUGO | S_IWUGO,
-		initstate_show, initstate_store);
+static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
 static DEVICE_ATTR(reset, S_IWUGO, NULL, reset_store);
 static DEVICE_ATTR(num_reads, S_IRUGO, num_reads_show, NULL);
 static DEVICE_ATTR(num_writes, S_IRUGO, num_writes_show, NULL);
-- 
1.7.2.1


^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 02/10] Remove need for explicit device initialization
@ 2010-08-09 17:26   ` Nitin Gupta
  0 siblings, 0 replies; 137+ messages in thread
From: Nitin Gupta @ 2010-08-09 17:26 UTC (permalink / raw)
  To: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH
  Cc: Linux Driver Project, linux-mm, linux-kernel

Currently, the user has to explicitly write a positive value to
initstate sysfs node before the device can be used. This event
triggers allocation of per-device metadata like memory pool,
table array and so on.

We do not pre-initialize all zram devices since the 'table' array,
mapping disk blocks to compressed chunks, takes considerable amount
of memory (8 bytes per page). So, pre-initializing all devices will
be quite wasteful if only few or none of the devices are actually
used.

This explicit device initialization from user is an odd requirement and
can be easily avoided. We now initialize the device when first write is
done to the device.

Signed-off-by: Nitin Gupta <ngupta@vflare.org>
---
 drivers/staging/zram/zram_drv.c   |   35 +++++++++++++++++++++++------------
 drivers/staging/zram/zram_drv.h   |    2 ++
 drivers/staging/zram/zram_sysfs.c |   26 ++++----------------------
 3 files changed, 29 insertions(+), 34 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 3f698a5..c5f84ee 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -207,9 +207,15 @@ static int zram_read(struct zram *zram, struct bio *bio)
 	u32 index;
 	struct bio_vec *bvec;
 
-	zram_stat64_inc(zram, &zram->stats.num_reads);
+	if (unlikely(!zram->init_done)) {
+		set_bit(BIO_UPTODATE, &bio->bi_flags);
+		bio_endio(bio, 0);
+		return 0;
+	}
 
+	zram_stat64_inc(zram, &zram->stats.num_reads);
 	index = bio->bi_sector >> SECTORS_PER_PAGE_SHIFT;
+
 	bio_for_each_segment(bvec, bio, i) {
 		int ret;
 		size_t clen;
@@ -275,16 +281,20 @@ out:
 
 static int zram_write(struct zram *zram, struct bio *bio)
 {
-	int i;
+	int i, ret;
 	u32 index;
 	struct bio_vec *bvec;
 
-	zram_stat64_inc(zram, &zram->stats.num_writes);
+	if (unlikely(!zram->init_done)) {
+		ret = zram_init_device(zram);
+		if (ret)
+			goto out;
+	}
 
+	zram_stat64_inc(zram, &zram->stats.num_writes);
 	index = bio->bi_sector >> SECTORS_PER_PAGE_SHIFT;
 
 	bio_for_each_segment(bvec, bio, i) {
-		int ret;
 		u32 offset;
 		size_t clen;
 		struct zobj_header *zheader;
@@ -425,11 +435,6 @@ static int zram_make_request(struct request_queue *queue, struct bio *bio)
 	int ret = 0;
 	struct zram *zram = queue->queuedata;
 
-	if (unlikely(!zram->init_done)) {
-		bio_io_error(bio);
-		return 0;
-	}
-
 	if (!valid_io_request(zram, bio)) {
 		zram_stat64_inc(zram, &zram->stats.invalid_io);
 		bio_io_error(bio);
@@ -453,7 +458,7 @@ void zram_reset_device(struct zram *zram)
 {
 	size_t index;
 
-	/* Do not accept any new I/O request */
+	mutex_lock(&zram->init_lock);
 	zram->init_done = 0;
 
 	/* Free various per-device buffers */
@@ -490,6 +495,7 @@ void zram_reset_device(struct zram *zram)
 	memset(&zram->stats, 0, sizeof(zram->stats));
 
 	zram->disksize = 0;
+	mutex_unlock(&zram->init_lock);
 }
 
 int zram_init_device(struct zram *zram)
@@ -497,9 +503,11 @@ int zram_init_device(struct zram *zram)
 	int ret;
 	size_t num_pages;
 
+	mutex_lock(&zram->init_lock);
+
 	if (zram->init_done) {
-		pr_info("Device already initialized!\n");
-		return -EBUSY;
+		mutex_unlock(&zram->init_lock);
+		return 0;
 	}
 
 	zram_set_disksize(zram, totalram_pages << PAGE_SHIFT);
@@ -542,11 +550,13 @@ int zram_init_device(struct zram *zram)
 	}
 
 	zram->init_done = 1;
+	mutex_unlock(&zram->init_lock);
 
 	pr_debug("Initialization done!\n");
 	return 0;
 
 fail:
+	mutex_unlock(&zram->init_lock);
 	zram_reset_device(zram);
 
 	pr_err("Initialization failed: err=%d\n", ret);
@@ -572,6 +582,7 @@ static int create_device(struct zram *zram, int device_id)
 	int ret = 0;
 
 	mutex_init(&zram->lock);
+	mutex_init(&zram->init_lock);
 	spin_lock_init(&zram->stat64_lock);
 
 	zram->queue = blk_alloc_queue(GFP_KERNEL);
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index 2ef93cc..a481551 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -108,6 +108,8 @@ struct zram {
 	struct request_queue *queue;
 	struct gendisk *disk;
 	int init_done;
+	/* Prevent concurrent execution of device init and reset */
+	struct mutex init_lock;
 	/*
 	 * This is the limit on amount of *uncompressed* worth of data
 	 * we can store in a disk.
diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
index b8dbaee..6c574a9 100644
--- a/drivers/staging/zram/zram_sysfs.c
+++ b/drivers/staging/zram/zram_sysfs.c
@@ -58,8 +58,10 @@ static ssize_t disksize_store(struct device *dev,
 	int ret;
 	struct zram *zram = dev_to_zram(dev);
 
-	if (zram->init_done)
+	if (zram->init_done) {
+		pr_info("Cannot change disksize for initialized device\n");
 		return -EBUSY;
+	}
 
 	ret = strict_strtoull(buf, 10, &zram->disksize);
 	if (ret)
@@ -79,25 +81,6 @@ static ssize_t initstate_show(struct device *dev,
 	return sprintf(buf, "%u\n", zram->init_done);
 }
 
-static ssize_t initstate_store(struct device *dev,
-		struct device_attribute *attr, const char *buf, size_t len)
-{
-	int ret;
-	unsigned long do_init;
-	struct zram *zram = dev_to_zram(dev);
-
-	ret = strict_strtoul(buf, 10, &do_init);
-	if (ret)
-		return ret;
-
-	if (!do_init)
-		return -EINVAL;
-
-	zram_init_device(zram);
-
-	return len;
-}
-
 static ssize_t reset_store(struct device *dev,
 		struct device_attribute *attr, const char *buf, size_t len)
 {
@@ -208,8 +191,7 @@ static ssize_t mem_used_total_show(struct device *dev,
 
 static DEVICE_ATTR(disksize, S_IRUGO | S_IWUGO,
 		disksize_show, disksize_store);
-static DEVICE_ATTR(initstate, S_IRUGO | S_IWUGO,
-		initstate_show, initstate_store);
+static DEVICE_ATTR(initstate, S_IRUGO, initstate_show, NULL);
 static DEVICE_ATTR(reset, S_IWUGO, NULL, reset_store);
 static DEVICE_ATTR(num_reads, S_IRUGO, num_reads_show, NULL);
 static DEVICE_ATTR(num_writes, S_IRUGO, num_writes_show, NULL);
-- 
1.7.2.1


^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 03/10] Use percpu stats
  2010-08-09 17:26 ` Nitin Gupta
@ 2010-08-09 17:26   ` Nitin Gupta
  -1 siblings, 0 replies; 137+ messages in thread
From: Nitin Gupta @ 2010-08-09 17:26 UTC (permalink / raw)
  To: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH
  Cc: Linux Driver Project, linux-mm, linux-kernel

Also remove references to removed stats (e.g., good_compress).
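
For reference, below is a minimal sketch of the u64_stats_sync pattern
this patch adopts; the 'ex_' names are hypothetical and not part of
the patch. Writers bump a CPU-local counter inside an update section;
readers retry until they see a consistent per-CPU snapshot and sum the
snapshots:

#include <linux/kernel.h>
#include <linux/percpu.h>
#include <linux/u64_stats_sync.h>

struct ex_stats {
	s64 count;
	struct u64_stats_sync syncp;
};

/* Writer side: preemption is disabled so the CPU cannot change. */
static void ex_add(struct ex_stats __percpu *stats, s64 val)
{
	struct ex_stats *s;

	preempt_disable();
	s = __this_cpu_ptr(stats);
	u64_stats_update_begin(&s->syncp);
	s->count += val;
	u64_stats_update_end(&s->syncp);
	preempt_enable();
}

/* Reader side: lockless; retries if it raced with a writer. */
static s64 ex_read(struct ex_stats __percpu *stats)
{
	int cpu;
	s64 sum = 0;

	for_each_possible_cpu(cpu) {
		struct ex_stats *s = per_cpu_ptr(stats, cpu);
		unsigned int start;
		s64 v;

		do {
			start = u64_stats_fetch_begin(&s->syncp);
			v = s->count;
		} while (u64_stats_fetch_retry(&s->syncp, start));
		sum += v;
	}
	return sum;
}

A per-device instance is obtained with alloc_percpu(struct ex_stats),
which is what the patch does for struct zram_stats_cpu.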

Signed-off-by: Nitin Gupta <ngupta@vflare.org>
---
 drivers/staging/zram/zram_drv.c   |   77 ++++++++++++++++--------------------
 drivers/staging/zram/zram_drv.h   |   33 +++++++++-------
 drivers/staging/zram/zram_sysfs.c |   46 +++++++++++++++-------
 3 files changed, 84 insertions(+), 72 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index c5f84ee..f111b86 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -38,33 +38,27 @@ struct zram *devices;
 /* Module params (documentation at end) */
 unsigned int num_devices;
 
-static void zram_stat_inc(u32 *v)
+static void zram_add_stat(struct zram *zram,
+			enum zram_stats_index idx, s64 val)
 {
-	*v = *v + 1;
+	struct zram_stats_cpu *stats;
+
+	preempt_disable();
+	stats = __this_cpu_ptr(zram->stats);
+	u64_stats_update_begin(&stats->syncp);
+	stats->count[idx] += val;
+	u64_stats_update_end(&stats->syncp);
+	preempt_enable();
 }
 
-static void zram_stat_dec(u32 *v)
+static void zram_inc_stat(struct zram *zram, enum zram_stats_index idx)
 {
-	*v = *v - 1;
+	zram_add_stat(zram, idx, 1);
 }
 
-static void zram_stat64_add(struct zram *zram, u64 *v, u64 inc)
+static void zram_dec_stat(struct zram *zram, enum zram_stats_index idx)
 {
-	spin_lock(&zram->stat64_lock);
-	*v = *v + inc;
-	spin_unlock(&zram->stat64_lock);
-}
-
-static void zram_stat64_sub(struct zram *zram, u64 *v, u64 dec)
-{
-	spin_lock(&zram->stat64_lock);
-	*v = *v - dec;
-	spin_unlock(&zram->stat64_lock);
-}
-
-static void zram_stat64_inc(struct zram *zram, u64 *v)
-{
-	zram_stat64_add(zram, v, 1);
+	zram_add_stat(zram, idx, -1);
 }
 
 static int zram_test_flag(struct zram *zram, u32 index,
@@ -131,7 +125,7 @@ static void zram_set_disksize(struct zram *zram, size_t totalram_bytes)
 
 static void zram_free_page(struct zram *zram, size_t index)
 {
-	u32 clen;
+	int clen;
 	void *obj;
 
 	struct page *page = zram->table[index].page;
@@ -144,7 +138,7 @@ static void zram_free_page(struct zram *zram, size_t index)
 		 */
 		if (zram_test_flag(zram, index, ZRAM_ZERO)) {
 			zram_clear_flag(zram, index, ZRAM_ZERO);
-			zram_stat_dec(&zram->stats.pages_zero);
+			zram_dec_stat(zram, ZRAM_STAT_PAGES_ZERO);
 		}
 		return;
 	}
@@ -153,7 +147,7 @@ static void zram_free_page(struct zram *zram, size_t index)
 		clen = PAGE_SIZE;
 		__free_page(page);
 		zram_clear_flag(zram, index, ZRAM_UNCOMPRESSED);
-		zram_stat_dec(&zram->stats.pages_expand);
+		zram_dec_stat(zram, ZRAM_STAT_PAGES_EXPAND);
 		goto out;
 	}
 
@@ -162,12 +156,10 @@ static void zram_free_page(struct zram *zram, size_t index)
 	kunmap_atomic(obj, KM_USER0);
 
 	xv_free(zram->mem_pool, page, offset);
-	if (clen <= PAGE_SIZE / 2)
-		zram_stat_dec(&zram->stats.good_compress);
 
 out:
-	zram_stat64_sub(zram, &zram->stats.compr_size, clen);
-	zram_stat_dec(&zram->stats.pages_stored);
+	zram_add_stat(zram, ZRAM_STAT_COMPR_SIZE, -clen);
+	zram_dec_stat(zram, ZRAM_STAT_PAGES_STORED);
 
 	zram->table[index].page = NULL;
 	zram->table[index].offset = 0;
@@ -213,7 +205,7 @@ static int zram_read(struct zram *zram, struct bio *bio)
 		return 0;
 	}
 
-	zram_stat64_inc(zram, &zram->stats.num_reads);
+	zram_inc_stat(zram, ZRAM_STAT_NUM_READS);
 	index = bio->bi_sector >> SECTORS_PER_PAGE_SHIFT;
 
 	bio_for_each_segment(bvec, bio, i) {
@@ -262,7 +254,6 @@ static int zram_read(struct zram *zram, struct bio *bio)
 		if (unlikely(ret != LZO_E_OK)) {
 			pr_err("Decompression failed! err=%d, page=%u\n",
 				ret, index);
-			zram_stat64_inc(zram, &zram->stats.failed_reads);
 			goto out;
 		}
 
@@ -291,7 +282,7 @@ static int zram_write(struct zram *zram, struct bio *bio)
 			goto out;
 	}
 
-	zram_stat64_inc(zram, &zram->stats.num_writes);
+	zram_inc_stat(zram, ZRAM_STAT_NUM_WRITES);
 	index = bio->bi_sector >> SECTORS_PER_PAGE_SHIFT;
 
 	bio_for_each_segment(bvec, bio, i) {
@@ -318,7 +309,7 @@ static int zram_write(struct zram *zram, struct bio *bio)
 		if (page_zero_filled(user_mem)) {
 			kunmap_atomic(user_mem, KM_USER0);
 			mutex_unlock(&zram->lock);
-			zram_stat_inc(&zram->stats.pages_zero);
+			zram_inc_stat(zram, ZRAM_STAT_PAGES_ZERO);
 			zram_set_flag(zram, index, ZRAM_ZERO);
 			continue;
 		}
@@ -331,7 +322,6 @@ static int zram_write(struct zram *zram, struct bio *bio)
 		if (unlikely(ret != LZO_E_OK)) {
 			mutex_unlock(&zram->lock);
 			pr_err("Compression failed! err=%d\n", ret);
-			zram_stat64_inc(zram, &zram->stats.failed_writes);
 			goto out;
 		}
 
@@ -347,14 +337,12 @@ static int zram_write(struct zram *zram, struct bio *bio)
 				mutex_unlock(&zram->lock);
 				pr_info("Error allocating memory for "
 					"incompressible page: %u\n", index);
-				zram_stat64_inc(zram,
-					&zram->stats.failed_writes);
 				goto out;
 			}
 
 			offset = 0;
 			zram_set_flag(zram, index, ZRAM_UNCOMPRESSED);
-			zram_stat_inc(&zram->stats.pages_expand);
+			zram_inc_stat(zram, ZRAM_STAT_PAGES_EXPAND);
 			zram->table[index].page = page_store;
 			src = kmap_atomic(page, KM_USER0);
 			goto memstore;
@@ -366,7 +354,6 @@ static int zram_write(struct zram *zram, struct bio *bio)
 			mutex_unlock(&zram->lock);
 			pr_info("Error allocating memory for compressed "
 				"page: %u, size=%zu\n", index, clen);
-			zram_stat64_inc(zram, &zram->stats.failed_writes);
 			goto out;
 		}
 
@@ -392,10 +379,8 @@ memstore:
 			kunmap_atomic(src, KM_USER0);
 
 		/* Update stats */
-		zram_stat64_add(zram, &zram->stats.compr_size, clen);
-		zram_stat_inc(&zram->stats.pages_stored);
-		if (clen <= PAGE_SIZE / 2)
-			zram_stat_inc(&zram->stats.good_compress);
+		zram_add_stat(zram, ZRAM_STAT_COMPR_SIZE, clen);
+		zram_inc_stat(zram, ZRAM_STAT_PAGES_STORED);
 
 		mutex_unlock(&zram->lock);
 		index++;
@@ -436,7 +421,7 @@ static int zram_make_request(struct request_queue *queue, struct bio *bio)
 	struct zram *zram = queue->queuedata;
 
 	if (!valid_io_request(zram, bio)) {
-		zram_stat64_inc(zram, &zram->stats.invalid_io);
+		zram_inc_stat(zram, ZRAM_STAT_INVALID_IO);
 		bio_io_error(bio);
 		return 0;
 	}
@@ -542,6 +527,13 @@ int zram_init_device(struct zram *zram)
 	/* zram devices sort of resembles non-rotational disks */
 	queue_flag_set_unlocked(QUEUE_FLAG_NONROT, zram->disk->queue);
 
+	zram->stats = alloc_percpu(struct zram_stats_cpu);
+	if (!zram->stats) {
+		pr_err("Error allocating percpu stats\n");
+		ret = -ENOMEM;
+		goto fail;
+	}
+
 	zram->mem_pool = xv_create_pool();
 	if (!zram->mem_pool) {
 		pr_err("Error creating memory pool\n");
@@ -569,7 +561,7 @@ void zram_slot_free_notify(struct block_device *bdev, unsigned long index)
 
 	zram = bdev->bd_disk->private_data;
 	zram_free_page(zram, index);
-	zram_stat64_inc(zram, &zram->stats.notify_free);
+	zram_inc_stat(zram, ZRAM_STAT_NOTIFY_FREE);
 }
 
 static const struct block_device_operations zram_devops = {
@@ -583,7 +575,6 @@ static int create_device(struct zram *zram, int device_id)
 
 	mutex_init(&zram->lock);
 	mutex_init(&zram->init_lock);
-	spin_lock_init(&zram->stat64_lock);
 
 	zram->queue = blk_alloc_queue(GFP_KERNEL);
 	if (!zram->queue) {
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index a481551..88fddb4 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -15,8 +15,8 @@
 #ifndef _ZRAM_DRV_H_
 #define _ZRAM_DRV_H_
 
-#include <linux/spinlock.h>
 #include <linux/mutex.h>
+#include <linux/u64_stats_sync.h>
 
 #include "xvmalloc.h"
 
@@ -83,18 +83,22 @@ struct table {
 	u8 flags;
 } __attribute__((aligned(4)));
 
-struct zram_stats {
-	u64 compr_size;		/* compressed size of pages stored */
-	u64 num_reads;		/* failed + successful */
-	u64 num_writes;		/* --do-- */
-	u64 failed_reads;	/* should NEVER! happen */
-	u64 failed_writes;	/* can happen when memory is too low */
-	u64 invalid_io;		/* non-page-aligned I/O requests */
-	u64 notify_free;	/* no. of swap slot free notifications */
-	u32 pages_zero;		/* no. of zero filled pages */
-	u32 pages_stored;	/* no. of pages currently stored */
-	u32 good_compress;	/* % of pages with compression ratio<=50% */
-	u32 pages_expand;	/* % of incompressible pages */
+enum zram_stats_index {
+	ZRAM_STAT_COMPR_SIZE,	/* compressed size of pages stored */
+	ZRAM_STAT_NUM_READS,	/* failed + successful */
+	ZRAM_STAT_NUM_WRITES,	/* --do-- */
+	ZRAM_STAT_INVALID_IO,	/* non-page-aligned I/O requests */
+	ZRAM_STAT_NOTIFY_FREE,	/* no. of swap slot free notifications */
+	ZRAM_STAT_DISCARD,	/* no. of block discard requests */
+	ZRAM_STAT_PAGES_ZERO,	/* no. of zero filled pages */
+	ZRAM_STAT_PAGES_STORED,	/* no. of pages currently stored */
+	ZRAM_STAT_PAGES_EXPAND,	/* no. of incompressible pages */
+	ZRAM_STAT_NSTATS,
+};
+
+struct zram_stats_cpu {
+	s64 count[ZRAM_STAT_NSTATS];
+	struct u64_stats_sync syncp;
 };
 
 struct zram {
@@ -102,7 +106,6 @@ struct zram {
 	void *compress_workmem;
 	void *compress_buffer;
 	struct table *table;
-	spinlock_t stat64_lock;	/* protect 64-bit stats */
 	struct mutex lock;	/* protect compression buffers against
 				 * concurrent writes */
 	struct request_queue *queue;
@@ -116,7 +119,7 @@ struct zram {
 	 */
 	u64 disksize;	/* bytes */
 
-	struct zram_stats stats;
+	struct zram_stats_cpu *stats;
 };
 
 extern struct zram *devices;
diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
index 6c574a9..43bcdd4 100644
--- a/drivers/staging/zram/zram_sysfs.c
+++ b/drivers/staging/zram/zram_sysfs.c
@@ -19,14 +19,30 @@
 
 #ifdef CONFIG_SYSFS
 
-static u64 zram_stat64_read(struct zram *zram, u64 *v)
+/*
+ * Individual percpu values can go negative but the sum across all CPUs
+ * must always be non-negative (we store various counts). So, return
+ * the sum as an unsigned value.
+ */
+static u64 zram_get_stat(struct zram *zram, enum zram_stats_index idx)
 {
-	u64 val;
-
-	spin_lock(&zram->stat64_lock);
-	val = *v;
-	spin_unlock(&zram->stat64_lock);
+	int cpu;
+	s64 val = 0;
+
+	for_each_possible_cpu(cpu) {
+		s64 temp;
+		unsigned int start;
+		struct zram_stats_cpu *stats;
+
+		stats = per_cpu_ptr(zram->stats, cpu);
+		do {
+			start = u64_stats_fetch_begin(&stats->syncp);
+			temp = stats->count[idx];
+		} while (u64_stats_fetch_retry(&stats->syncp, start));
+		val += temp;
+	}
 
+	WARN_ON(val < 0);
 	return val;
 }
 
@@ -119,7 +135,7 @@ static ssize_t num_reads_show(struct device *dev,
 	struct zram *zram = dev_to_zram(dev);
 
 	return sprintf(buf, "%llu\n",
-		zram_stat64_read(zram, &zram->stats.num_reads));
+		zram_get_stat(zram, ZRAM_STAT_NUM_READS));
 }
 
 static ssize_t num_writes_show(struct device *dev,
@@ -128,7 +144,7 @@ static ssize_t num_writes_show(struct device *dev,
 	struct zram *zram = dev_to_zram(dev);
 
 	return sprintf(buf, "%llu\n",
-		zram_stat64_read(zram, &zram->stats.num_writes));
+		zram_get_stat(zram, ZRAM_STAT_NUM_WRITES));
 }
 
 static ssize_t invalid_io_show(struct device *dev,
@@ -137,7 +153,7 @@ static ssize_t invalid_io_show(struct device *dev,
 	struct zram *zram = dev_to_zram(dev);
 
 	return sprintf(buf, "%llu\n",
-		zram_stat64_read(zram, &zram->stats.invalid_io));
+		zram_get_stat(zram, ZRAM_STAT_INVALID_IO));
 }
 
 static ssize_t notify_free_show(struct device *dev,
@@ -146,7 +162,7 @@ static ssize_t notify_free_show(struct device *dev,
 	struct zram *zram = dev_to_zram(dev);
 
 	return sprintf(buf, "%llu\n",
-		zram_stat64_read(zram, &zram->stats.notify_free));
+		zram_get_stat(zram, ZRAM_STAT_NOTIFY_FREE));
 }
 
 static ssize_t zero_pages_show(struct device *dev,
@@ -154,7 +170,8 @@ static ssize_t zero_pages_show(struct device *dev,
 {
 	struct zram *zram = dev_to_zram(dev);
 
-	return sprintf(buf, "%u\n", zram->stats.pages_zero);
+	return sprintf(buf, "%llu\n",
+		zram_get_stat(zram, ZRAM_STAT_PAGES_ZERO));
 }
 
 static ssize_t orig_data_size_show(struct device *dev,
@@ -163,7 +180,7 @@ static ssize_t orig_data_size_show(struct device *dev,
 	struct zram *zram = dev_to_zram(dev);
 
 	return sprintf(buf, "%llu\n",
-		(u64)(zram->stats.pages_stored) << PAGE_SHIFT);
+		zram_get_stat(zram, ZRAM_STAT_PAGES_STORED) << PAGE_SHIFT);
 }
 
 static ssize_t compr_data_size_show(struct device *dev,
@@ -172,7 +189,7 @@ static ssize_t compr_data_size_show(struct device *dev,
 	struct zram *zram = dev_to_zram(dev);
 
 	return sprintf(buf, "%llu\n",
-		zram_stat64_read(zram, &zram->stats.compr_size));
+		zram_get_stat(zram, ZRAM_STAT_COMPR_SIZE));
 }
 
 static ssize_t mem_used_total_show(struct device *dev,
@@ -183,7 +200,8 @@ static ssize_t mem_used_total_show(struct device *dev,
 
 	if (zram->init_done) {
 		val = xv_get_total_size_bytes(zram->mem_pool) +
-			((u64)(zram->stats.pages_expand) << PAGE_SHIFT);
+			(zram_get_stat(zram, ZRAM_STAT_PAGES_EXPAND)
+				<< PAGE_SHIFT);
 	}
 
 	return sprintf(buf, "%llu\n", val);
-- 
1.7.2.1


^ permalink raw reply related	[flat|nested] 137+ messages in thread


* [PATCH 04/10] Use percpu buffers
  2010-08-09 17:26 ` Nitin Gupta
@ 2010-08-09 17:26   ` Nitin Gupta
  -1 siblings, 0 replies; 137+ messages in thread
From: Nitin Gupta @ 2010-08-09 17:26 UTC (permalink / raw)
  To: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH
  Cc: Linux Driver Project, linux-mm, linux-kernel

This also removes the (mutex) lock which was used to
protect these buffers.

Tests with fio on a dual-core system showed an improvement of
about 20% in write speed.

fio job description:

[global]
bs=4k
ioengine=libaio
iodepth=8
size=1g
direct=1
runtime=60
directory=/mnt/zdisk
verify=meta
verify=0x1234567890abcdef
numjobs=2

[seq-write]
rw=write
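
(To reproduce: assuming /mnt/zdisk is a filesystem mounted on the zram
device, save the above as a job file and run 'fio <jobfile>'.)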

The speedup was negligible with an iodepth of 4 or less. Gains
may be more visible with a higher number of cores.

Results with above job file:
Before:
  WRITE: io=2,048MB, aggrb=70,725KB/s
  WRITE: io=2,048MB, aggrb=71,938KB/s
  WRITE: io=2,048MB, aggrb=70,461KB/s

After:
  WRITE: io=2,048MB, aggrb=86,691KB/s
  WRITE: io=2,048MB, aggrb=87,025KB/s
  WRITE: io=2,048MB, aggrb=88,700KB/s

Speedup: ~20%

Read performance remained almost the same since the read path
was already lockless (the LZO decompressor does not require any
additional buffer).
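
As an aside, here is a minimal sketch (hypothetical names, not part of
the patch) of the per-CPU buffer pattern that replaces the mutex:
preempt_disable() pins the task to one CPU so that CPU's buffer can be
used exclusively, but it also forbids sleeping, which is why the write
path switches to GFP_NOWAIT allocations:

#include <linux/kernel.h>
#include <linux/percpu.h>
#include <linux/errno.h>
#include <linux/string.h>

static DEFINE_PER_CPU(unsigned char *, ex_buffer);

static int ex_use_buffer(const void *src, size_t len)
{
	unsigned char *buf;

	preempt_disable();
	buf = __get_cpu_var(ex_buffer);
	if (unlikely(!buf)) {
		/* the hotplug callback failed to allocate for this CPU */
		preempt_enable();
		return -ENOMEM;
	}
	memcpy(buf, src, len);
	/* ... compress/consume buf here; sleeping is not allowed ... */
	preempt_enable();
	return 0;
}

The buffers themselves are allocated from a CPU hotplug notifier on
CPU_UP_PREPARE and freed on CPU_DEAD/CPU_UP_CANCELED, as done below.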

Signed-off-by: Nitin Gupta <ngupta@vflare.org>
---
 drivers/staging/zram/zram_drv.c |  136 +++++++++++++++++++++++++--------------
 drivers/staging/zram/zram_drv.h |    4 -
 2 files changed, 88 insertions(+), 52 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index f111b86..c16e09a 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -17,13 +17,14 @@
 
 #include <linux/module.h>
 #include <linux/kernel.h>
-#include <linux/bio.h>
 #include <linux/bitops.h>
 #include <linux/blkdev.h>
 #include <linux/buffer_head.h>
+#include <linux/cpu.h>
 #include <linux/device.h>
 #include <linux/genhd.h>
 #include <linux/highmem.h>
+#include <linux/notifier.h>
 #include <linux/slab.h>
 #include <linux/lzo.h>
 #include <linux/string.h>
@@ -31,6 +32,9 @@
 
 #include "zram_drv.h"
 
+static DEFINE_PER_CPU(unsigned char *, compress_buffer);
+static DEFINE_PER_CPU(unsigned char *, compress_workmem);
+
 /* Globals */
 static int zram_major;
 struct zram *devices;
@@ -290,10 +294,10 @@ static int zram_write(struct zram *zram, struct bio *bio)
 		size_t clen;
 		struct zobj_header *zheader;
 		struct page *page, *page_store;
+		unsigned char *zbuffer, *zworkmem;
 		unsigned char *user_mem, *cmem, *src;
 
 		page = bvec->bv_page;
-		src = zram->compress_buffer;
 
 		/*
 		 * System overwrites unused sectors. Free memory associated
@@ -303,38 +307,41 @@ static int zram_write(struct zram *zram, struct bio *bio)
 				zram_test_flag(zram, index, ZRAM_ZERO))
 			zram_free_page(zram, index);
 
-		mutex_lock(&zram->lock);
+		preempt_disable();
+		zbuffer = __get_cpu_var(compress_buffer);
+		zworkmem = __get_cpu_var(compress_workmem);
+		if (unlikely(!zbuffer || !zworkmem)) {
+			preempt_enable();
+			goto out;
+		}
 
+		src = zbuffer;
 		user_mem = kmap_atomic(page, KM_USER0);
 		if (page_zero_filled(user_mem)) {
 			kunmap_atomic(user_mem, KM_USER0);
-			mutex_unlock(&zram->lock);
+			preempt_enable();
 			zram_inc_stat(zram, ZRAM_STAT_PAGES_ZERO);
 			zram_set_flag(zram, index, ZRAM_ZERO);
 			continue;
 		}
 
 		ret = lzo1x_1_compress(user_mem, PAGE_SIZE, src, &clen,
-					zram->compress_workmem);
+					zworkmem);
 
 		kunmap_atomic(user_mem, KM_USER0);
 
 		if (unlikely(ret != LZO_E_OK)) {
-			mutex_unlock(&zram->lock);
+			preempt_enable();
 			pr_err("Compression failed! err=%d\n", ret);
 			goto out;
 		}
 
-		/*
-		 * Page is incompressible. Store it as-is (uncompressed)
-		 * since we do not want to return too many disk write
-		 * errors which has side effect of hanging the system.
-		 */
+		 /* Page is incompressible. Store it as-is (uncompressed) */
 		if (unlikely(clen > max_zpage_size)) {
 			clen = PAGE_SIZE;
-			page_store = alloc_page(GFP_NOIO | __GFP_HIGHMEM);
+			page_store = alloc_page(GFP_NOWAIT | __GFP_HIGHMEM);
 			if (unlikely(!page_store)) {
-				mutex_unlock(&zram->lock);
+				preempt_enable();
 				pr_info("Error allocating memory for "
 					"incompressible page: %u\n", index);
 				goto out;
@@ -350,8 +357,8 @@ static int zram_write(struct zram *zram, struct bio *bio)
 
 		if (xv_malloc(zram->mem_pool, clen + sizeof(*zheader),
 				&zram->table[index].page, &offset,
-				GFP_NOIO | __GFP_HIGHMEM)) {
-			mutex_unlock(&zram->lock);
+				GFP_NOWAIT | __GFP_HIGHMEM)) {
+			preempt_enable();
 			pr_info("Error allocating memory for compressed "
 				"page: %u, size=%zu\n", index, clen);
 			goto out;
@@ -363,18 +370,10 @@ memstore:
 		cmem = kmap_atomic(zram->table[index].page, KM_USER1) +
 				zram->table[index].offset;
 
-#if 0
-		/* Back-reference needed for memory defragmentation */
-		if (!zram_test_flag(zram, index, ZRAM_UNCOMPRESSED)) {
-			zheader = (struct zobj_header *)cmem;
-			zheader->table_idx = index;
-			cmem += sizeof(*zheader);
-		}
-#endif
-
 		memcpy(cmem, src, clen);
-
 		kunmap_atomic(cmem, KM_USER1);
+		preempt_enable();
+
 		if (unlikely(zram_test_flag(zram, index, ZRAM_UNCOMPRESSED)))
 			kunmap_atomic(src, KM_USER0);
 
@@ -382,7 +381,6 @@ memstore:
 		zram_add_stat(zram, ZRAM_STAT_COMPR_SIZE, clen);
 		zram_inc_stat(zram, ZRAM_STAT_PAGES_STORED);
 
-		mutex_unlock(&zram->lock);
 		index++;
 	}
 
@@ -446,13 +444,6 @@ void zram_reset_device(struct zram *zram)
 	mutex_lock(&zram->init_lock);
 	zram->init_done = 0;
 
-	/* Free various per-device buffers */
-	kfree(zram->compress_workmem);
-	free_pages((unsigned long)zram->compress_buffer, 1);
-
-	zram->compress_workmem = NULL;
-	zram->compress_buffer = NULL;
-
 	/* Free all pages that are still in this zram device */
 	for (index = 0; index < zram->disksize >> PAGE_SHIFT; index++) {
 		struct page *page;
@@ -497,20 +488,6 @@ int zram_init_device(struct zram *zram)
 
 	zram_set_disksize(zram, totalram_pages << PAGE_SHIFT);
 
-	zram->compress_workmem = kzalloc(LZO1X_MEM_COMPRESS, GFP_KERNEL);
-	if (!zram->compress_workmem) {
-		pr_err("Error allocating compressor working memory!\n");
-		ret = -ENOMEM;
-		goto fail;
-	}
-
-	zram->compress_buffer = (void *)__get_free_pages(__GFP_ZERO, 1);
-	if (!zram->compress_buffer) {
-		pr_err("Error allocating compressor buffer space\n");
-		ret = -ENOMEM;
-		goto fail;
-	}
-
 	num_pages = zram->disksize >> PAGE_SHIFT;
 	zram->table = vmalloc(num_pages * sizeof(*zram->table));
 	if (!zram->table) {
@@ -573,7 +550,6 @@ static int create_device(struct zram *zram, int device_id)
 {
 	int ret = 0;
 
-	mutex_init(&zram->lock);
 	mutex_init(&zram->init_lock);
 
 	zram->queue = blk_alloc_queue(GFP_KERNEL);
@@ -649,9 +625,46 @@ static void destroy_device(struct zram *zram)
 		blk_cleanup_queue(zram->queue);
 }
 
+/*
+ * Callback for CPU hotplug events. Allocates and frees the percpu
+ * compression buffers.
+ */
+static int zram_cpu_notify(struct notifier_block *nb, unsigned long action,
+			void *pcpu)
+{
+	int cpu = (long)pcpu;
+
+	switch (action) {
+	case CPU_UP_PREPARE:
+		per_cpu(compress_buffer, cpu) = (void *)__get_free_pages(
+					GFP_KERNEL | __GFP_ZERO, 1);
+		per_cpu(compress_workmem, cpu) = kzalloc(
+					LZO1X_MEM_COMPRESS, GFP_KERNEL);
+
+		break;
+	case CPU_DEAD:
+	case CPU_UP_CANCELED:
+		free_pages((unsigned long)(per_cpu(compress_buffer, cpu)), 1);
+		per_cpu(compress_buffer, cpu) = NULL;
+
+		kfree(per_cpu(compress_workmem, cpu));
+		per_cpu(compress_workmem, cpu) = NULL;
+
+		break;
+	default:
+		break;
+	}
+
+	return NOTIFY_OK;
+}
+
+static struct notifier_block zram_cpu_nb = {
+	.notifier_call = zram_cpu_notify
+};
+
 static int __init zram_init(void)
 {
 	int ret, dev_id;
+	unsigned int cpu;
 
 	if (num_devices > max_num_devices) {
 		pr_warning("Invalid value for num_devices: %u\n",
@@ -660,6 +673,20 @@ static int __init zram_init(void)
 		goto out;
 	}
 
+	ret = register_cpu_notifier(&zram_cpu_nb);
+	if (ret)
+		goto out;
+
+	for_each_online_cpu(cpu) {
+		void *pcpu = (void *)(long)cpu;
+		zram_cpu_notify(&zram_cpu_nb, CPU_UP_PREPARE, pcpu);
+		if (!per_cpu(compress_buffer, cpu) ||
+					!per_cpu(compress_workmem, cpu)) {
+			ret = -ENOMEM;
+			goto out;
+		}
+	}
+
 	zram_major = register_blkdev(0, "zram");
 	if (zram_major <= 0) {
 		pr_warning("Unable to get major number\n");
@@ -694,12 +721,18 @@ free_devices:
 unregister:
 	unregister_blkdev(zram_major, "zram");
 out:
+	for_each_online_cpu(cpu) {
+		void *pcpu = (void *)(long)cpu;
+		zram_cpu_notify(&zram_cpu_nb, CPU_UP_CANCELED, pcpu);
+	}
+
 	return ret;
 }
 
 static void __exit zram_exit(void)
 {
 	int i;
+	unsigned int cpu;
 	struct zram *zram;
 
 	for (i = 0; i < num_devices; i++) {
@@ -712,6 +745,13 @@ static void __exit zram_exit(void)
 
 	unregister_blkdev(zram_major, "zram");
 
+	for_each_online_cpu(cpu) {
+		void *pcpu = (void *)(long)cpu;
+		zram_cpu_notify(&zram_cpu_nb, CPU_UP_CANCELED, pcpu);
+	}
+
+	unregister_cpu_notifier(&zram_cpu_nb);
+
 	kfree(devices);
 	pr_debug("Cleanup done!\n");
 }
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index 88fddb4..21c97f6 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -103,11 +103,7 @@ struct zram_stats_cpu {
 
 struct zram {
 	struct xv_pool *mem_pool;
-	void *compress_workmem;
-	void *compress_buffer;
 	struct table *table;
-	struct mutex lock;	/* protect compression buffers against
-				 * concurrent writes */
 	struct request_queue *queue;
 	struct gendisk *disk;
 	int init_done;
-- 
1.7.2.1


^ permalink raw reply related	[flat|nested] 137+ messages in thread


* [PATCH 05/10] Reduce per table entry overhead by 4 bytes
  2010-08-09 17:26 ` Nitin Gupta
@ 2010-08-09 17:26   ` Nitin Gupta
  -1 siblings, 0 replies; 137+ messages in thread
From: Nitin Gupta @ 2010-08-09 17:26 UTC (permalink / raw)
  To: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH
  Cc: Linux Driver Project, linux-mm, linux-kernel

Each zram device maintains an array (table) that maps an index
within the device to the location of the corresponding compressed
chunk. Currently we store a 'struct page' pointer, the offset within
the page, and various flags separately, which takes 12 bytes per
table entry. Now all of these are encoded in a single 'phys_addr_t'
value, which results in a savings of 4 bytes per entry (except on
PAE systems).

Unfortunately, cleanups related to some variable renames were mixed
into this patch, so please bear with some additional noise.
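
For illustration, the encoding amounts to helpers like these
(hypothetical names; the patch open-codes the same logic in
zram_insert_obj() and zram_find_obj()). A PFN shifted left by
PAGE_SHIFT leaves the low PAGE_SHIFT bits free, so the offset within
the page is stored there; bit 0 further serves as the zero-page
marker, which is safe because stored objects are word-aligned and a
real offset never sets it:

#include <linux/mm.h>

/* bit 0 is reserved for the zero-page marker */
static inline phys_addr_t ex_pack(struct page *page, u32 offset)
{
	return ((phys_addr_t)page_to_pfn(page) << PAGE_SHIFT) |
		(offset & ~PAGE_MASK);
}

static inline struct page *ex_page(phys_addr_t addr)
{
	return pfn_to_page(addr >> PAGE_SHIFT);
}

static inline u32 ex_offset(phys_addr_t addr)
{
	return addr & ~PAGE_MASK;
}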

Signed-off-by: Nitin Gupta <ngupta@vflare.org>
---
 drivers/staging/zram/zram_drv.c |  256 ++++++++++++++++++++------------------
 drivers/staging/zram/zram_drv.h |   24 +---
 2 files changed, 140 insertions(+), 140 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index c16e09a..efe9c93 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -42,6 +42,13 @@ struct zram *devices;
 /* Module params (documentation at end) */
 unsigned int num_devices;
 
+/*
+ * We do not allocate any memory for zero-filled pages.
+ * Rather, we simply mark them in corresponding table
+ * entry by setting this bit.
+ */
+#define ZRAM_ZERO_PAGE_MARK_BIT		(1 << 0)
+
 static void zram_add_stat(struct zram *zram,
 			enum zram_stats_index idx, s64 val)
 {
@@ -65,37 +72,62 @@ static void zram_dec_stat(struct zram *zram, enum zram_stats_index idx)
 	zram_add_stat(zram, idx, -1);
 }
 
-static int zram_test_flag(struct zram *zram, u32 index,
-			enum zram_pageflags flag)
+static int page_zero_filled(void *ptr)
 {
-	return zram->table[index].flags & BIT(flag);
+	unsigned int pos;
+	unsigned long *page;
+
+	page = (unsigned long *)ptr;
+
+	for (pos = 0; pos != PAGE_SIZE / sizeof(*page); pos++) {
+		if (page[pos])
+			return 0;
+	}
+
+	return 1;
 }
 
-static void zram_set_flag(struct zram *zram, u32 index,
-			enum zram_pageflags flag)
+static int zram_is_zero_page(struct zram *zram, u32 index)
 {
-	zram->table[index].flags |= BIT(flag);
+	phys_addr_t addr = zram->table[index].addr;
+
+	return addr & ZRAM_ZERO_PAGE_MARK_BIT;
 }
 
-static void zram_clear_flag(struct zram *zram, u32 index,
-			enum zram_pageflags flag)
+static void zram_set_zero_page(struct zram *zram, u32 index)
 {
-	zram->table[index].flags &= ~BIT(flag);
+	zram->table[index].addr |= ZRAM_ZERO_PAGE_MARK_BIT;
 }
 
-static int page_zero_filled(void *ptr)
+static void zram_clear_zero_page(struct zram *zram, u32 index)
 {
-	unsigned int pos;
-	unsigned long *page;
+	zram->table[index].addr &= ~ZRAM_ZERO_PAGE_MARK_BIT;
+}
 
-	page = (unsigned long *)ptr;
+static void zram_find_obj(struct zram *zram, u32 index, struct page **page,
+			u32 *offset)
+{
+	phys_addr_t addr = zram->table[index].addr;
 
-	for (pos = 0; pos != PAGE_SIZE / sizeof(*page); pos++) {
-		if (page[pos])
-			return 0;
+	if (!addr) {
+		*page = NULL;
+		*offset = 0;
+		return;
 	}
 
-	return 1;
+	*page = pfn_to_page(addr >> PAGE_SHIFT);
+	*offset = addr & ~PAGE_MASK;
+}
+
+static void zram_insert_obj(struct zram *zram, u32 index, struct page *page,
+			u32 offset)
+{
+	phys_addr_t addr;
+
+	addr = page_to_pfn(page) << PAGE_SHIFT;
+	addr |= (offset & ~PAGE_MASK);
+
+	zram->table[index].addr = addr;
 }
 
 static void zram_set_disksize(struct zram *zram, size_t totalram_bytes)
@@ -129,44 +161,44 @@ static void zram_set_disksize(struct zram *zram, size_t totalram_bytes)
 
 static void zram_free_page(struct zram *zram, size_t index)
 {
-	int clen;
+	int zlen;
 	void *obj;
+	u32 offset;
+	struct page *page;
 
-	struct page *page = zram->table[index].page;
-	u32 offset = zram->table[index].offset;
-
-	if (unlikely(!page)) {
-		/*
-		 * No memory is allocated for zero filled pages.
-		 * Simply clear zero page flag.
-		 */
-		if (zram_test_flag(zram, index, ZRAM_ZERO)) {
-			zram_clear_flag(zram, index, ZRAM_ZERO);
-			zram_dec_stat(zram, ZRAM_STAT_PAGES_ZERO);
-		}
+	/*
+	 * No memory is allocated for zero filled pages.
+	 * Simply clear corresponding table entry.
+	 */
+	if (zram_is_zero_page(zram, index)) {
+		zram_clear_zero_page(zram, index);
+		zram_dec_stat(zram, ZRAM_STAT_PAGES_ZERO);
 		return;
 	}
 
-	if (unlikely(zram_test_flag(zram, index, ZRAM_UNCOMPRESSED))) {
-		clen = PAGE_SIZE;
+	zram_find_obj(zram, index, &page, &offset);
+	if (!page)
+		return;
+
+	/* Uncompressed pages consume a whole page, so offset is zero */
+	if (unlikely(!offset)) {
+		zlen = PAGE_SIZE;
 		__free_page(page);
-		zram_clear_flag(zram, index, ZRAM_UNCOMPRESSED);
 		zram_dec_stat(zram, ZRAM_STAT_PAGES_EXPAND);
 		goto out;
 	}
 
 	obj = kmap_atomic(page, KM_USER0) + offset;
-	clen = xv_get_object_size(obj) - sizeof(struct zobj_header);
+	zlen = xv_get_object_size(obj);
 	kunmap_atomic(obj, KM_USER0);
 
 	xv_free(zram->mem_pool, page, offset);
 
 out:
-	zram_add_stat(zram, ZRAM_STAT_COMPR_SIZE, -clen);
+	zram_add_stat(zram, ZRAM_STAT_COMPR_SIZE, -zlen);
 	zram_dec_stat(zram, ZRAM_STAT_PAGES_STORED);
 
-	zram->table[index].page = NULL;
-	zram->table[index].offset = 0;
+	zram->table[index].addr = 0;
 }
 
 static void handle_zero_page(struct page *page)
@@ -181,24 +213,27 @@ static void handle_zero_page(struct page *page)
 }
 
 static void handle_uncompressed_page(struct zram *zram,
-				struct page *page, u32 index)
+				struct page *bio_page, u32 index)
 {
-	unsigned char *user_mem, *cmem;
+	u32 zoffset;
+	struct page *zpage;
+	unsigned char *bio_mem, *zmem;
 
-	user_mem = kmap_atomic(page, KM_USER0);
-	cmem = kmap_atomic(zram->table[index].page, KM_USER1) +
-			zram->table[index].offset;
+	zram_find_obj(zram, index, &zpage, &zoffset);
+	BUG_ON(zoffset);
 
-	memcpy(user_mem, cmem, PAGE_SIZE);
-	kunmap_atomic(user_mem, KM_USER0);
-	kunmap_atomic(cmem, KM_USER1);
+	bio_mem = kmap_atomic(bio_page, KM_USER0);
+	zmem = kmap_atomic(zpage, KM_USER1);
 
-	flush_dcache_page(page);
+	memcpy(bio_mem, zmem, PAGE_SIZE);
+	kunmap_atomic(bio_mem, KM_USER0);
+	kunmap_atomic(zmem, KM_USER1);
+
+	flush_dcache_page(bio_page);
 }
 
 static int zram_read(struct zram *zram, struct bio *bio)
 {
-
 	int i;
 	u32 index;
 	struct bio_vec *bvec;
@@ -214,54 +249,54 @@ static int zram_read(struct zram *zram, struct bio *bio)
 
 	bio_for_each_segment(bvec, bio, i) {
 		int ret;
-		size_t clen;
-		struct page *page;
-		struct zobj_header *zheader;
-		unsigned char *user_mem, *cmem;
+		size_t zlen;
+		u32 zoffset;
+		struct page *bio_page, *zpage;
+		unsigned char *bio_mem, *zmem;
 
-		page = bvec->bv_page;
+		bio_page = bvec->bv_page;
 
-		if (zram_test_flag(zram, index, ZRAM_ZERO)) {
-			handle_zero_page(page);
+		if (zram_is_zero_page(zram, index)) {
+			handle_zero_page(bio_page);
 			continue;
 		}
 
+		zram_find_obj(zram, index, &zpage, &zoffset);
+
 		/* Requested page is not present in compressed area */
-		if (unlikely(!zram->table[index].page)) {
-			pr_debug("Read before write: sector=%lu, size=%u",
+		if (unlikely(!zpage)) {
+			pr_debug("Read before write on swap device: "
+				"sector=%lu, size=%u",
 				(ulong)(bio->bi_sector), bio->bi_size);
 			/* Do nothing */
 			continue;
 		}
 
 		/* Page is stored uncompressed since it's incompressible */
-		if (unlikely(zram_test_flag(zram, index, ZRAM_UNCOMPRESSED))) {
-			handle_uncompressed_page(zram, page, index);
+		if (unlikely(!zoffset)) {
+			handle_uncompressed_page(zram, bio_page, index);
 			continue;
 		}
 
-		user_mem = kmap_atomic(page, KM_USER0);
-		clen = PAGE_SIZE;
+		bio_mem = kmap_atomic(bio_page, KM_USER0);
+		zlen = PAGE_SIZE;
 
-		cmem = kmap_atomic(zram->table[index].page, KM_USER1) +
-				zram->table[index].offset;
+		zmem = kmap_atomic(zpage, KM_USER1) + zoffset;
 
-		ret = lzo1x_decompress_safe(
-			cmem + sizeof(*zheader),
-			xv_get_object_size(cmem) - sizeof(*zheader),
-			user_mem, &clen);
+		ret = lzo1x_decompress_safe(zmem, xv_get_object_size(zmem),
+					bio_mem, &zlen);
 
-		kunmap_atomic(user_mem, KM_USER0);
-		kunmap_atomic(cmem, KM_USER1);
+		kunmap_atomic(bio_mem, KM_USER0);
+		kunmap_atomic(zmem, KM_USER1);
 
-		/* Should NEVER happen. Return bio error if it does. */
+		/* This should NEVER happen - return bio error if it does! */
 		if (unlikely(ret != LZO_E_OK)) {
 			pr_err("Decompression failed! err=%d, page=%u\n",
 				ret, index);
 			goto out;
 		}
 
-		flush_dcache_page(page);
+		flush_dcache_page(bio_page);
 		index++;
 	}
 
@@ -290,22 +325,19 @@ static int zram_write(struct zram *zram, struct bio *bio)
 	index = bio->bi_sector >> SECTORS_PER_PAGE_SHIFT;
 
 	bio_for_each_segment(bvec, bio, i) {
-		u32 offset;
-		size_t clen;
-		struct zobj_header *zheader;
-		struct page *page, *page_store;
+		size_t zlen;
+		u32 zoffset;
+		struct page *bio_page, *zpage;
 		unsigned char *zbuffer, *zworkmem;
-		unsigned char *user_mem, *cmem, *src;
+		unsigned char *bio_mem, *zmem, *src;
 
-		page = bvec->bv_page;
+		bio_page = bvec->bv_page;
 
 		/*
 		 * System overwrites unused sectors. Free memory associated
-		 * with this sector now.
+		 * with this sector now (if used).
 		 */
-		if (zram->table[index].page ||
-				zram_test_flag(zram, index, ZRAM_ZERO))
-			zram_free_page(zram, index);
+		zram_free_page(zram, index);
 
 		preempt_disable();
 		zbuffer = __get_cpu_var(compress_buffer);
@@ -316,19 +348,19 @@ static int zram_write(struct zram *zram, struct bio *bio)
 		}
 
 		src = zbuffer;
-		user_mem = kmap_atomic(page, KM_USER0);
-		if (page_zero_filled(user_mem)) {
-			kunmap_atomic(user_mem, KM_USER0);
+		bio_mem = kmap_atomic(bio_page, KM_USER0);
+		if (page_zero_filled(bio_mem)) {
+			kunmap_atomic(bio_mem, KM_USER0);
 			preempt_enable();
 			zram_inc_stat(zram, ZRAM_STAT_PAGES_ZERO);
-			zram_set_flag(zram, index, ZRAM_ZERO);
+			zram_set_zero_page(zram, index);
 			continue;
 		}
 
-		ret = lzo1x_1_compress(user_mem, PAGE_SIZE, src, &clen,
+		ret = lzo1x_1_compress(bio_mem, PAGE_SIZE, src, &zlen,
 					zworkmem);
 
-		kunmap_atomic(user_mem, KM_USER0);
+		kunmap_atomic(bio_mem, KM_USER0);
 
 		if (unlikely(ret != LZO_E_OK)) {
 			preempt_enable();
@@ -337,50 +369,45 @@ static int zram_write(struct zram *zram, struct bio *bio)
 		}
 
 		 /* Page is incompressible. Store it as-is (uncompressed) */
-		if (unlikely(clen > max_zpage_size)) {
-			clen = PAGE_SIZE;
-			page_store = alloc_page(GFP_NOWAIT | __GFP_HIGHMEM);
-			if (unlikely(!page_store)) {
+		if (unlikely(zlen > max_zpage_size)) {
+			zlen = PAGE_SIZE;
+			zpage = alloc_page(GFP_NOWAIT | __GFP_HIGHMEM);
+			if (unlikely(!zpage)) {
 				preempt_enable();
 				pr_info("Error allocating memory for "
 					"incompressible page: %u\n", index);
 				goto out;
 			}
 
-			offset = 0;
-			zram_set_flag(zram, index, ZRAM_UNCOMPRESSED);
+			zoffset = 0;
 			zram_inc_stat(zram, ZRAM_STAT_PAGES_EXPAND);
-			zram->table[index].page = page_store;
-			src = kmap_atomic(page, KM_USER0);
+			src = kmap_atomic(zpage, KM_USER0);
 			goto memstore;
 		}
 
-		if (xv_malloc(zram->mem_pool, clen + sizeof(*zheader),
-				&zram->table[index].page, &offset,
+		if (xv_malloc(zram->mem_pool, zlen, &zpage, &zoffset,
 				GFP_NOWAIT | __GFP_HIGHMEM)) {
 			preempt_enable();
 			pr_info("Error allocating memory for compressed "
-				"page: %u, size=%zu\n", index, clen);
+				"page: %u, size=%zu\n", index, zlen);
 			goto out;
 		}
 
 memstore:
-		zram->table[index].offset = offset;
+		zmem = kmap_atomic(zpage, KM_USER1) + zoffset;
 
-		cmem = kmap_atomic(zram->table[index].page, KM_USER1) +
-				zram->table[index].offset;
-
-		memcpy(cmem, src, clen);
-		kunmap_atomic(cmem, KM_USER1);
+		memcpy(zmem, src, zlen);
+		kunmap_atomic(zmem, KM_USER1);
 		preempt_enable();
 
-		if (unlikely(zram_test_flag(zram, index, ZRAM_UNCOMPRESSED)))
+		if (unlikely(!zoffset))
 			kunmap_atomic(src, KM_USER0);
 
 		/* Update stats */
-		zram_add_stat(zram, ZRAM_STAT_COMPR_SIZE, clen);
+		zram_add_stat(zram, ZRAM_STAT_COMPR_SIZE, zlen);
 		zram_inc_stat(zram, ZRAM_STAT_PAGES_STORED);
 
+		zram_insert_obj(zram, index, zpage, zoffset);
 		index++;
 	}
 
@@ -445,21 +472,8 @@ void zram_reset_device(struct zram *zram)
 	zram->init_done = 0;
 
 	/* Free all pages that are still in this zram device */
-	for (index = 0; index < zram->disksize >> PAGE_SHIFT; index++) {
-		struct page *page;
-		u16 offset;
-
-		page = zram->table[index].page;
-		offset = zram->table[index].offset;
-
-		if (!page)
-			continue;
-
-		if (unlikely(zram_test_flag(zram, index, ZRAM_UNCOMPRESSED)))
-			__free_page(page);
-		else
-			xv_free(zram->mem_pool, page, offset);
-	}
+	for (index = 0; index < zram->disksize >> PAGE_SHIFT; index++)
+		zram_free_page(zram, index);
 
 	vfree(zram->table);
 	zram->table = NULL;
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index 21c97f6..65e512d 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -62,26 +62,12 @@ static const unsigned max_zpage_size = PAGE_SIZE / 4 * 3;
 #define SECTORS_PER_PAGE_SHIFT	(PAGE_SHIFT - SECTOR_SHIFT)
 #define SECTORS_PER_PAGE	(1 << SECTORS_PER_PAGE_SHIFT)
 
-/* Flags for zram pages (table[page_no].flags) */
-enum zram_pageflags {
-	/* Page is stored uncompressed */
-	ZRAM_UNCOMPRESSED,
-
-	/* Page consists entirely of zeros */
-	ZRAM_ZERO,
-
-	__NR_ZRAM_PAGEFLAGS,
-};
-
-/*-- Data structures */
-
-/* Allocated for each disk page */
+/*
+ * Maintains swap slot to compressed object mapping.
+ */
 struct table {
-	struct page *page;
-	u16 offset;
-	u8 count;	/* object ref count (not yet used) */
-	u8 flags;
-} __attribute__((aligned(4)));
+	phys_addr_t addr;	/* location of [compressed] object */
+};
 
 enum zram_stats_index {
 	ZRAM_STAT_COMPR_SIZE,	/* compressed size of pages stored */
-- 
1.7.2.1


^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 06/10] Block discard support
  2010-08-09 17:26 ` Nitin Gupta
@ 2010-08-09 17:26   ` Nitin Gupta
  -1 siblings, 0 replies; 137+ messages in thread
From: Nitin Gupta @ 2010-08-09 17:26 UTC (permalink / raw)
  To: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH
  Cc: Linux Driver Project, linux-mm, linux-kernel

A 'discard' bio request provides information to zram disks
about blocks which are no longer in use by the filesystem.
This allows the driver to free the memory allocated for such
blocks.
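
As an illustration, the freeing boils down to simple sector-to-
page-index arithmetic. A minimal userspace sketch (not part of
the patch; assumes 4 KiB pages and 512-byte sectors):

#include <stdio.h>

#define PAGE_SHIFT	12	/* assuming 4 KiB pages */
#define SECTOR_SHIFT	9	/* 512-byte sectors */
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define SECTORS_PER_PAGE_SHIFT	(PAGE_SHIFT - SECTOR_SHIFT)

int main(void)
{
	unsigned long sector = 32;		/* hypothetical request */
	unsigned long bytes = 2 * PAGE_SIZE;

	/* One table entry (page index) is freed per PAGE_SIZE worth
	 * of discarded sectors: indices 4 and 5 in this example. */
	while (bytes >= PAGE_SIZE) {
		printf("free page index %lu\n",
			sector >> SECTORS_PER_PAGE_SHIFT);
		sector += PAGE_SIZE >> SECTOR_SHIFT;
		bytes -= PAGE_SIZE;
	}
	return 0;
}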

When zram devices are used as swap disks, we already have
a callback (block_device_operations->swap_slot_free_notify).
So, discard support is useful only when zram is used as a
generic (non-swap) disk.
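
For reference, the swap-path callback is wired up through
block_device_operations; a sketch of such a registration (field
names as in the 2.6.35-era block layer, shown here only for
context, not as the exact registration in this driver):

static const struct block_device_operations zram_devops = {
	.swap_slot_free_notify = zram_slot_free_notify,
	.owner = THIS_MODULE
};

/* ... with zram->disk->fops = &zram_devops at device creation */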

Signed-off-by: Nitin Gupta <ngupta@vflare.org>
---
 drivers/staging/zram/zram_drv.c   |   25 +++++++++++++++++++++++++
 drivers/staging/zram/zram_sysfs.c |   11 +++++++++++
 2 files changed, 36 insertions(+), 0 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index efe9c93..0f9785f 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -420,6 +420,20 @@ out:
 	return 0;
 }
 
+static void zram_discard(struct zram *zram, struct bio *bio)
+{
+	size_t bytes = bio->bi_size;
+	sector_t sector = bio->bi_sector;
+
+	while (bytes >= PAGE_SIZE) {
+		zram_free_page(zram, sector >> SECTORS_PER_PAGE_SHIFT);
+		sector += PAGE_SIZE >> SECTOR_SHIFT;
+		bytes -= PAGE_SIZE;
+	}
+
+	bio_endio(bio, 0);
+}
+
 /*
  * Check if request is within bounds and page aligned.
  */
@@ -451,6 +465,12 @@ static int zram_make_request(struct request_queue *queue, struct bio *bio)
 		return 0;
 	}
 
+	if (unlikely(bio_rw_flagged(bio, BIO_RW_DISCARD))) {
+		zram_inc_stat(zram, ZRAM_STAT_DISCARD);
+		zram_discard(zram, bio);
+		return 0;
+	}
+
 	switch (bio_data_dir(bio)) {
 	case READ:
 		ret = zram_read(zram, bio);
@@ -606,6 +626,11 @@ static int create_device(struct zram *zram, int device_id)
 	blk_queue_io_min(zram->disk->queue, PAGE_SIZE);
 	blk_queue_io_opt(zram->disk->queue, PAGE_SIZE);
 
+	zram->disk->queue->limits.discard_granularity = PAGE_SIZE;
+	zram->disk->queue->limits.max_discard_sectors = UINT_MAX;
+	zram->disk->queue->limits.discard_zeroes_data = 1;
+	queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, zram->queue);
+
 	add_disk(zram->disk);
 
 #ifdef CONFIG_SYSFS
diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
index 43bcdd4..74971c0 100644
--- a/drivers/staging/zram/zram_sysfs.c
+++ b/drivers/staging/zram/zram_sysfs.c
@@ -165,6 +165,15 @@ static ssize_t notify_free_show(struct device *dev,
 		zram_get_stat(zram, ZRAM_STAT_NOTIFY_FREE));
 }
 
+static ssize_t discard_show(struct device *dev,
+		struct device_attribute *attr, char *buf)
+{
+	struct zram *zram = dev_to_zram(dev);
+
+	return sprintf(buf, "%llu\n",
+		zram_get_stat(zram, ZRAM_STAT_DISCARD));
+}
+
 static ssize_t zero_pages_show(struct device *dev,
 		struct device_attribute *attr, char *buf)
 {
@@ -215,6 +224,7 @@ static DEVICE_ATTR(num_reads, S_IRUGO, num_reads_show, NULL);
 static DEVICE_ATTR(num_writes, S_IRUGO, num_writes_show, NULL);
 static DEVICE_ATTR(invalid_io, S_IRUGO, invalid_io_show, NULL);
 static DEVICE_ATTR(notify_free, S_IRUGO, notify_free_show, NULL);
+static DEVICE_ATTR(discard, S_IRUGO, discard_show, NULL);
 static DEVICE_ATTR(zero_pages, S_IRUGO, zero_pages_show, NULL);
 static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
 static DEVICE_ATTR(compr_data_size, S_IRUGO, compr_data_size_show, NULL);
@@ -228,6 +238,7 @@ static struct attribute *zram_disk_attrs[] = {
 	&dev_attr_num_writes.attr,
 	&dev_attr_invalid_io.attr,
 	&dev_attr_notify_free.attr,
+	&dev_attr_discard.attr,
 	&dev_attr_zero_pages.attr,
 	&dev_attr_orig_data_size.attr,
 	&dev_attr_compr_data_size.attr,
-- 
1.7.2.1


^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 07/10] Increase compressed page size threshold
  2010-08-09 17:26 ` Nitin Gupta
@ 2010-08-09 17:26   ` Nitin Gupta
  -1 siblings, 0 replies; 137+ messages in thread
From: Nitin Gupta @ 2010-08-09 17:26 UTC (permalink / raw)
  To: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH
  Cc: Linux Driver Project, linux-mm, linux-kernel

Compression takes much more time than decompression. So, it's quite
wasteful, in terms of both CPU cycles and memory usage, to have a very
low compressed page size threshold and thereby store such not-so-well
compressible pages as-is (uncompressed). So, increase it from
PAGE_SIZE/4*3 to PAGE_SIZE/8*7. A low threshold was useful when we had
"backing swap" support, where we could forward such pages to the backing
device (applicable only when zram was used as a swap disk).

It is not yet configurable through sysfs but may be exported in the
future, along with a threshold for the average compression ratio.
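
As a rough illustration of the effect (not from the patch itself;
assumes 4 KiB pages):

#include <stdio.h>

#define PAGE_SIZE 4096UL	/* assuming 4 KiB pages */

int main(void)
{
	unsigned long old_thr = PAGE_SIZE / 4 * 3;	/* 3072 */
	unsigned long new_thr = PAGE_SIZE / 8 * 7;	/* 3584 */
	unsigned long zlen = 3400;	/* hypothetical compressed size */

	/* A page compressing to 3400 bytes was previously stored
	 * as a full page; it now stays compressed. */
	printf("old=%lu new=%lu bytes saved=%lu\n",
		old_thr, new_thr, PAGE_SIZE - zlen);
	return 0;
}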

Signed-off-by: Nitin Gupta <ngupta@vflare.org>
---
 drivers/staging/zram/zram_drv.h |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index 65e512d..bcc51ea 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -47,7 +47,7 @@ static const unsigned default_disksize_perc_ram = 25;
  * Pages that compress to size greater than this are stored
  * uncompressed in memory.
  */
-static const unsigned max_zpage_size = PAGE_SIZE / 4 * 3;
+static const unsigned max_zpage_size = PAGE_SIZE / 8 * 7;
 
 /*
  * NOTE: max_zpage_size must be less than or equal to:
-- 
1.7.2.1


^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 08/10] Some cleanups
  2010-08-09 17:26 ` Nitin Gupta
@ 2010-08-09 17:26   ` Nitin Gupta
  -1 siblings, 0 replies; 137+ messages in thread
From: Nitin Gupta @ 2010-08-09 17:26 UTC (permalink / raw)
  To: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH
  Cc: Linux Driver Project, linux-mm, linux-kernel

 - xvmalloc: Remove unnecessary stat_{inc,dec} helpers and update
   total_pages directly
 - xvmalloc: Initialize pointers with NULL instead of 0
 - zram: Remove verbose message when the user sets an insane disksize
   (the default-size arithmetic is sketched after this list)
 - zram: Mark some messages as pr_debug
 - zram: Refine some comments
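
A worked example of the default-size arithmetic introduced below
(a userspace sketch; assumes 4 KiB pages and 2 GiB of RAM):

#include <stdio.h>

#define PAGE_SHIFT	12	/* assuming 4 KiB pages */
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define PAGE_MASK	(~(PAGE_SIZE - 1))

static const unsigned default_disksize_perc_ram = 25;

int main(void)
{
	unsigned long totalram_pages = 524288;	/* 2 GiB of RAM */
	unsigned long long disksize;

	/* Mirrors zram_default_disksize(): 25% of RAM, page aligned */
	disksize = default_disksize_perc_ram * (totalram_pages / 100);
	disksize = (disksize << PAGE_SHIFT) & PAGE_MASK;

	printf("default disksize: %llu bytes (~512 MiB)\n", disksize);
	return 0;
}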

Signed-off-by: Nitin Gupta <ngupta@vflare.org>
---
 drivers/staging/zram/xvmalloc.c |   22 ++++-------------
 drivers/staging/zram/zram_drv.c |   49 +++++++++++++-------------------------
 drivers/staging/zram/zram_drv.h |   26 ++++----------------
 3 files changed, 28 insertions(+), 69 deletions(-)

diff --git a/drivers/staging/zram/xvmalloc.c b/drivers/staging/zram/xvmalloc.c
index 3fdbb8a..6268f65 100644
--- a/drivers/staging/zram/xvmalloc.c
+++ b/drivers/staging/zram/xvmalloc.c
@@ -20,16 +20,6 @@
 #include "xvmalloc.h"
 #include "xvmalloc_int.h"
 
-static void stat_inc(u64 *value)
-{
-	*value = *value + 1;
-}
-
-static void stat_dec(u64 *value)
-{
-	*value = *value - 1;
-}
-
 static int test_flag(struct block_header *block, enum blockflags flag)
 {
 	return block->prev & BIT(flag);
@@ -187,7 +177,7 @@ static void insert_block(struct xv_pool *pool, struct page *page, u32 offset,
 	slindex = get_index_for_insert(block->size);
 	flindex = slindex / BITS_PER_LONG;
 
-	block->link.prev_page = 0;
+	block->link.prev_page = NULL;
 	block->link.prev_offset = 0;
 	block->link.next_page = pool->freelist[slindex].page;
 	block->link.next_offset = pool->freelist[slindex].offset;
@@ -217,7 +207,7 @@ static void remove_block_head(struct xv_pool *pool,
 
 	pool->freelist[slindex].page = block->link.next_page;
 	pool->freelist[slindex].offset = block->link.next_offset;
-	block->link.prev_page = 0;
+	block->link.prev_page = NULL;
 	block->link.prev_offset = 0;
 
 	if (!pool->freelist[slindex].page) {
@@ -232,7 +222,7 @@ static void remove_block_head(struct xv_pool *pool,
 		 */
 		tmpblock = get_ptr_atomic(pool->freelist[slindex].page,
 				pool->freelist[slindex].offset, KM_USER1);
-		tmpblock->link.prev_page = 0;
+		tmpblock->link.prev_page = NULL;
 		tmpblock->link.prev_offset = 0;
 		put_ptr_atomic(tmpblock, KM_USER1);
 	}
@@ -284,7 +274,7 @@ static int grow_pool(struct xv_pool *pool, gfp_t flags)
 	if (unlikely(!page))
 		return -ENOMEM;
 
-	stat_inc(&pool->total_pages);
+	pool->total_pages++;
 
 	spin_lock(&pool->lock);
 	block = get_ptr_atomic(page, 0, KM_USER0);
@@ -361,8 +351,6 @@ int xv_malloc(struct xv_pool *pool, u32 size, struct page **page,
 
 	if (!*page) {
 		spin_unlock(&pool->lock);
-		if (flags & GFP_NOWAIT)
-			return -ENOMEM;
 		error = grow_pool(pool, flags);
 		if (unlikely(error))
 			return error;
@@ -472,7 +460,7 @@ void xv_free(struct xv_pool *pool, struct page *page, u32 offset)
 		spin_unlock(&pool->lock);
 
 		__free_page(page);
-		stat_dec(&pool->total_pages);
+		pool->total_pages--;
 		return;
 	}
 
diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
index 0f9785f..42aa271 100644
--- a/drivers/staging/zram/zram_drv.c
+++ b/drivers/staging/zram/zram_drv.c
@@ -9,7 +9,7 @@
  * Released under the terms of 3-clause BSD License
  * Released under the terms of GNU General Public License Version 2.0
  *
- * Project home: http://compcache.googlecode.com
+ * Project home: http://compcache.googlecode.com/
  */
 
 #define KMSG_COMPONENT "zram"
@@ -130,33 +130,15 @@ static void zram_insert_obj(struct zram *zram, u32 index, struct page *page,
 	zram->table[index].addr = addr;
 }
 
-static void zram_set_disksize(struct zram *zram, size_t totalram_bytes)
+static u64 zram_default_disksize(void)
 {
-	if (!zram->disksize) {
-		pr_info(
-		"disk size not provided. You can use disksize_kb module "
-		"param to specify size.\nUsing default: (%u%% of RAM).\n",
-		default_disksize_perc_ram
-		);
-		zram->disksize = default_disksize_perc_ram *
-					(totalram_bytes / 100);
-	}
+	u64 disksize;
 
-	if (zram->disksize > 2 * (totalram_bytes)) {
-		pr_info(
-		"There is little point creating a zram of greater than "
-		"twice the size of memory since we expect a 2:1 compression "
-		"ratio. Note that zram uses about 0.1%% of the size of "
-		"the disk when not in use so a huge zram is "
-		"wasteful.\n"
-		"\tMemory Size: %zu kB\n"
-		"\tSize you selected: %llu kB\n"
-		"Continuing anyway ...\n",
-		totalram_bytes >> 10, zram->disksize
-		);
-	}
+	disksize = default_disksize_perc_ram *
+			(totalram_pages / 100);
+	disksize = (disksize << PAGE_SHIFT) & PAGE_MASK;
 
-	zram->disksize &= PAGE_MASK;
+	return disksize;
 }
 
 static void zram_free_page(struct zram *zram, size_t index)
@@ -459,7 +441,7 @@ static int zram_make_request(struct request_queue *queue, struct bio *bio)
 	int ret = 0;
 	struct zram *zram = queue->queuedata;
 
-	if (!valid_io_request(zram, bio)) {
+	if (unlikely(!valid_io_request(zram, bio))) {
 		zram_inc_stat(zram, ZRAM_STAT_INVALID_IO);
 		bio_io_error(bio);
 		return 0;
@@ -504,7 +486,7 @@ void zram_reset_device(struct zram *zram)
 	/* Reset stats */
 	memset(&zram->stats, 0, sizeof(zram->stats));
 
-	zram->disksize = 0;
+	zram->disksize = zram_default_disksize();
 	mutex_unlock(&zram->init_lock);
 }
 
@@ -520,7 +502,8 @@ int zram_init_device(struct zram *zram)
 		return 0;
 	}
 
-	zram_set_disksize(zram, totalram_pages << PAGE_SHIFT);
+	if (!zram->disksize)
+		zram->disksize = zram_default_disksize();
 
 	num_pages = zram->disksize >> PAGE_SHIFT;
 	zram->table = vmalloc(num_pages * sizeof(*zram->table));
@@ -566,7 +549,8 @@ fail:
 	return ret;
 }
 
-void zram_slot_free_notify(struct block_device *bdev, unsigned long index)
+static void zram_slot_free_notify(struct block_device *bdev,
+				unsigned long index)
 {
 	struct zram *zram;
 
@@ -614,8 +598,9 @@ static int create_device(struct zram *zram, int device_id)
 	zram->disk->private_data = zram;
 	snprintf(zram->disk->disk_name, 16, "zram%d", device_id);
 
-	/* Actual capacity set using syfs (/sys/block/zram<id>/disksize */
-	set_capacity(zram->disk, 0);
+	/* Custom size can be set using sysfs (/sys/block/zram<id>/disksize) */
+	zram->disksize = zram_default_disksize();
+	set_capacity(zram->disk, zram->disksize >> SECTOR_SHIFT);
 
 	/*
 	 * To ensure that we always get PAGE_SIZE aligned
@@ -739,7 +724,7 @@ static int __init zram_init(void)
 	}
 
 	/* Allocate the device array and initialize each one */
-	pr_info("Creating %u devices ...\n", num_devices);
+	pr_debug("Creating %u devices ...\n", num_devices);
 	devices = kzalloc(num_devices * sizeof(struct zram), GFP_KERNEL);
 	if (!devices) {
 		ret = -ENOMEM;
diff --git a/drivers/staging/zram/zram_drv.h b/drivers/staging/zram/zram_drv.h
index bcc51ea..2cf36db 100644
--- a/drivers/staging/zram/zram_drv.h
+++ b/drivers/staging/zram/zram_drv.h
@@ -9,7 +9,7 @@
  * Released under the terms of 3-clause BSD License
  * Released under the terms of GNU General Public License Version 2.0
  *
- * Project home: http://compcache.googlecode.com
+ * Project home: http://compcache.googlecode.com/
  */
 
 #ifndef _ZRAM_DRV_H_
@@ -26,18 +26,6 @@
  */
 static const unsigned max_num_devices = 32;
 
-/*
- * Stored at beginning of each compressed object.
- *
- * It stores back-reference to table entry which points to this
- * object. This is required to support memory defragmentation.
- */
-struct zobj_header {
-#if 0
-	u32 table_idx;
-#endif
-};
-
 /*-- Configurable parameters */
 
 /* Default zram disk size: 25% of total RAM */
@@ -50,8 +38,7 @@ static const unsigned default_disksize_perc_ram = 25;
 static const unsigned max_zpage_size = PAGE_SIZE / 8 * 7;
 
 /*
- * NOTE: max_zpage_size must be less than or equal to:
- *   XV_MAX_ALLOC_SIZE - sizeof(struct zobj_header)
+ * NOTE: max_zpage_size must be less than or equal to XV_MAX_ALLOC_SIZE
  * otherwise, xv_malloc() would always return failure.
  */
 
@@ -92,16 +79,15 @@ struct zram {
 	struct table *table;
 	struct request_queue *queue;
 	struct gendisk *disk;
-	int init_done;
+	unsigned int init_done;
 	/* Prevent concurrent execution of device init and reset */
 	struct mutex init_lock;
 	/*
 	 * This is the limit on amount of *uncompressed* worth of data
-	 * we can store in a disk.
+	 * we can store in a disk (in bytes).
 	 */
-	u64 disksize;	/* bytes */
-
-	struct zram_stats_cpu *stats;
+	u64 disksize;
+	struct zram_stats_cpu *stats;	/* percpu stats */
 };
 
 extern struct zram *devices;
-- 
1.7.2.1


^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 09/10] Update zram documentation
  2010-08-09 17:26 ` Nitin Gupta
@ 2010-08-09 17:26   ` Nitin Gupta
  -1 siblings, 0 replies; 137+ messages in thread
From: Nitin Gupta @ 2010-08-09 17:26 UTC (permalink / raw)
  To: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH
  Cc: Linux Driver Project, linux-mm, linux-kernel

Update zram documentation to reflect the transition from
the ioctl to the sysfs interface.

Signed-off-by: Nitin Gupta <ngupta@vflare.org>
---
 drivers/staging/zram/zram.txt |   58 +++++++++++++++++++++++++---------------
 1 files changed, 36 insertions(+), 22 deletions(-)

diff --git a/drivers/staging/zram/zram.txt b/drivers/staging/zram/zram.txt
index 520edc1..5f75d29 100644
--- a/drivers/staging/zram/zram.txt
+++ b/drivers/staging/zram/zram.txt
@@ -5,33 +5,35 @@ Project home: http://compcache.googlecode.com/
 
 * Introduction
 
-The zram module creates RAM based block devices: /dev/ramX (X = 0, 1, ...).
-Pages written to these disks are compressed and stored in memory itself.
-These disks allow very fast I/O and compression provides good amounts of
-memory savings.
+The zram module creates RAM based block devices named /dev/zram<id>
+(<id> = 0, 1, ...). Pages written to these disks are compressed and stored
+in memory itself. These disks allow very fast I/O and compression provides
+good amounts of memory savings. Some of the use cases include /tmp storage,
+use as swap disks, various caches under /var and maybe many more :)
 
-See project home for use cases, performance numbers and a lot more.
-
-Individual zram devices are configured and initialized using zramconfig
-userspace utility as shown in examples below. See zramconfig man page for
-more details.
+Statistics for individual zram devices are exported through sysfs nodes at
+/sys/block/zram<id>/
 
 * Usage
 
 Following shows a typical sequence of steps for using zram.
 
-1) Load Modules:
+1) Load Module:
 	modprobe zram num_devices=4
-	This creates 4 (uninitialized) devices: /dev/zram{0,1,2,3}
+	This creates 4 devices: /dev/zram{0,1,2,3}
 	(num_devices parameter is optional. Default: 1)
 
-2) Initialize:
-	Use zramconfig utility to configure and initialize individual
-	zram devices. For example:
-	zramconfig /dev/zram0 --init # uses default value of disksize_kb
-	zramconfig /dev/zram1 --disksize_kb=102400 # 100MB /dev/zram1
+2) Set Disksize (Optional):
+	Set disk size by writing the value to sysfs node 'disksize'
+	(in bytes). If disksize is not given, a default value of 25%
+	of RAM is used.
+
+	# Initialize /dev/zram0 with 50MB disksize
+	echo $((50*1024*1024)) > /sys/block/zram0/disksize
 
-	*See zramconfig man page for more details and examples*
+	NOTE: disksize cannot be changed if the disk contains any
+	data. So, for such a disk, you need to issue 'reset' (see below)
+	before you can change its disksize.
 
 3) Activate:
 	mkswap /dev/zram0
@@ -41,17 +43,29 @@ Following shows a typical sequence of steps for using zram.
 	mount /dev/zram1 /tmp
 
 4) Stats:
-	zramconfig /dev/zram0 --stats
-	zramconfig /dev/zram1 --stats
+	Per-device statistics are exported as various nodes under
+	/sys/block/zram<id>/
+		disksize
+		num_reads
+		num_writes
+		invalid_io
+		notify_free
+		discard
+		zero_pages
+		orig_data_size
+		compr_data_size
+		mem_used_total
 
 5) Deactivate:
 	swapoff /dev/zram0
 	umount /dev/zram1
 
 6) Reset:
-	zramconfig /dev/zram0 --reset
-	zramconfig /dev/zram1 --reset
-	(This frees memory allocated for the given device).
+	Write any positive value to 'reset' sysfs node
+	echo 1 > /sys/block/zram0/reset
+	echo 1 > /sys/block/zram1/reset
+
+	(This frees all the memory allocated for the given device).
 
 
 Please report any problems at:
-- 
1.7.2.1


^ permalink raw reply related	[flat|nested] 137+ messages in thread

* [PATCH 10/10] Document sysfs entries
  2010-08-09 17:26 ` Nitin Gupta
@ 2010-08-09 17:26   ` Nitin Gupta
  -1 siblings, 0 replies; 137+ messages in thread
From: Nitin Gupta @ 2010-08-09 17:26 UTC (permalink / raw)
  To: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH
  Cc: Linux Driver Project, linux-mm, linux-kernel

Signed-off-by: Nitin Gupta <ngupta@vflare.org>
---
 Documentation/ABI/testing/sysfs-block-zram |   99 ++++++++++++++++++++++++++++
 1 files changed, 99 insertions(+), 0 deletions(-)
 create mode 100644 Documentation/ABI/testing/sysfs-block-zram

diff --git a/Documentation/ABI/testing/sysfs-block-zram b/Documentation/ABI/testing/sysfs-block-zram
new file mode 100644
index 0000000..c8b3b48
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-block-zram
@@ -0,0 +1,99 @@
+What:		/sys/block/zram<id>/disksize
+Date:		August 2010
+Contact:	Nitin Gupta <ngupta@vflare.org>
+Description:
+		The disksize file is read-write and specifies the disk size
+		which represents the limit on the *uncompressed* worth of data
+		that can be stored in this disk.
+
+What:		/sys/block/zram<id>/initstate
+Date:		August 2010
+Contact:	Nitin Gupta <ngupta@vflare.org>
+Description:
+		The initstate file is read-only and shows the initialization
+		state of the device.
+
+What:		/sys/block/zram<id>/reset
+Date:		August 2010
+Contact:	Nitin Gupta <ngupta@vflare.org>
+Description:
+		The reset file is write-only and allows resetting the
+		device. The reset operation frees all the memory associated
+		with this device.
+
+What:		/sys/block/zram<id>/num_reads
+Date:		August 2010
+Contact:	Nitin Gupta <ngupta@vflare.org>
+Description:
+		The num_reads file is read-only and specifies the number of
+		reads (failed or successful) done on this device.
+
+What:		/sys/block/zram<id>/num_writes
+Date:		August 2010
+Contact:	Nitin Gupta <ngupta@vflare.org>
+Description:
+		The num_writes file is read-only and specifies the number of
+		writes (failed or successful) done on this device.
+
+What:		/sys/block/zram<id>/invalid_io
+Date:		August 2010
+Contact:	Nitin Gupta <ngupta@vflare.org>
+Description:
+		The invalid_io file is read-only and specifies the number of
+		non-page-size-aligned I/O requests issued to this device.
+
+What:		/sys/block/zram<id>/notify_free
+Date:		August 2010
+Contact:	Nitin Gupta <ngupta@vflare.org>
+Description:
+		The notify_free file is read-only and specifies the number of
+		swap slot free notifications received by this device. These
+		notifications are sent to a swap block device when a swap slot
+		is freed. This statistic is applicable only when this disk is
+		being used as a swap disk.
+
+What:		/sys/block/zram<id>/discard
+Date:		August 2010
+Contact:	Nitin Gupta <ngupta@vflare.org>
+Description:
+		The discard file is read-only and specifies the number of
+		discard requests received by this device. These requests
+		provide information to the block device regarding blocks
+		which are no longer used by the filesystem.
+
+What:		/sys/block/zram<id>/zero_pages
+Date:		August 2010
+Contact:	Nitin Gupta <ngupta@vflare.org>
+Description:
+		The zero_pages file is read-only and specifies the number of
+		zero-filled pages written to this disk. No memory is allocated
+		for such pages.
+
+What:		/sys/block/zram<id>/orig_data_size
+Date:		August 2010
+Contact:	Nitin Gupta <ngupta@vflare.org>
+Description:
+		The orig_data_size file is read-only and specifies the
+		uncompressed size of data stored in this disk. This excludes
+		zero-filled pages (zero_pages) since no memory is allocated
+		for them.
+		Unit: bytes
+
+What:		/sys/block/zram<id>/compr_data_size
+Date:		August 2010
+Contact:	Nitin Gupta <ngupta@vflare.org>
+Description:
+		The compr_data_size file is read-only and specifies the
+		compressed size of data stored in this disk. The compression
+		ratio can be calculated from orig_data_size and this statistic.
+		Unit: bytes
+
+What:		/sys/block/zram<id>/mem_used_total
+Date:		August 2010
+Contact:	Nitin Gupta <ngupta@vflare.org>
+Description:
+		The mem_used_total file is read-only and specifies the amount
+		of memory, including allocator fragmentation and metadata
+		overhead, allocated for this disk. The allocator space
+		efficiency can be calculated from compr_data_size and this
+		statistic.
+		Unit: bytes
\ No newline at end of file
-- 
1.7.2.1


^ permalink raw reply related	[flat|nested] 137+ messages in thread

* Re: [PATCH 07/10] Increase compressed page size threshold
  2010-08-09 17:26   ` Nitin Gupta
@ 2010-08-09 18:32     ` Pekka Enberg
  -1 siblings, 0 replies; 137+ messages in thread
From: Pekka Enberg @ 2010-08-09 18:32 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

Hi Nitin,

On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <ngupta@vflare.org> wrote:
> Compression takes much more time than decompression. So, it's quite
> wasteful in terms of both CPU cycles and memory usage to have a very
> low compressed page size threshold and thereby storing such not-so-well
> compressible pages as-is (uncompressed). So, increasing it from
> PAGE_SIZE/2 to PAGE_SIZE/8*7. A low threshold was useful when we had
> "backing swap" support where we could forward such pages to the backing
> device (applicable only when zram was used as swap disk).
>
> It is not yet configurable through sysfs but may be exported in future,
> along with threshold for average compression ratio.
>
> Signed-off-by: Nitin Gupta <ngupta@vflare.org>

The description makes sense but lacks any real data. What kind of
workloads did you test this with? Where does it help most? How much?

                        Pekka

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 01/10] Replace ioctls with sysfs interface
  2010-08-09 17:26   ` Nitin Gupta
@ 2010-08-09 18:34     ` Pekka Enberg
  -1 siblings, 0 replies; 137+ messages in thread
From: Pekka Enberg @ 2010-08-09 18:34 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <ngupta@vflare.org> wrote:
> Creates per-device sysfs nodes in /sys/block/zram<id>/
> Currently following stats are exported:
>  - disksize
>  - num_reads
>  - num_writes
>  - invalid_io
>  - zero_pages
>  - orig_data_size
>  - compr_data_size
>  - mem_used_total
>
> By default, disksize is set to 0. So, to start using
> a zram device, first write a disksize value and then
> initialize device by writing any positive value to
> initstate. For example:
>
>        # initialize /dev/zram0 with 50MB disksize
>        echo 50*1024*1024 | bc > /sys/block/zram0/disksize
>        echo 1 > /sys/block/zram0/initstate
>
> When done using a disk, issue reset to free its memory
> by writing any positive value to reset node:
>
>        echo 1 > /sys/block/zram0/reset
>
> This change also obviates the need for 'rzscontrol' utility.
>
> Signed-off-by: Nitin Gupta <ngupta@vflare.org>

Looks good to me (but I'm not a sysfs guy).

Acked-by: Pekka Enberg <penberg@kernel.org>

>  /* Module params (documentation at end) */
> -static unsigned int num_devices;
> +unsigned int num_devices;
> +
> +static void zram_stat_inc(u32 *v)
> +{
> +       *v = *v + 1;
> +}
> +
> +static void zram_stat_dec(u32 *v)
> +{
> +       *v = *v - 1;
> +}
> +
> +static void zram_stat64_add(struct zram *zram, u64 *v, u64 inc)
> +{
> +       spin_lock(&zram->stat64_lock);
> +       *v = *v + inc;
> +       spin_unlock(&zram->stat64_lock);
> +}
> +
> +static void zram_stat64_sub(struct zram *zram, u64 *v, u64 dec)
> +{
> +       spin_lock(&zram->stat64_lock);
> +       *v = *v - dec;
> +       spin_unlock(&zram->stat64_lock);
> +}
> +
> +static void zram_stat64_inc(struct zram *zram, u64 *v)
> +{
> +       zram_stat64_add(zram, v, 1);
> +}

These could probably use atomic_inc(), atomic64_inc(), and friends, no?
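
For illustration, a minimal sketch of what those helpers could look like
with atomics (the struct and field names here are hypothetical, not taken
from the patch):

	#include <asm/atomic.h>

	struct zram_stats {
		atomic_t	pages_zero;	/* 32-bit counter */
		atomic64_t	num_reads;	/* 64-bit, no spinlock needed */
	};

	static void zram_stat_inc(atomic_t *v)
	{
		atomic_inc(v);
	}

	static void zram_stat64_add(atomic64_t *v, u64 inc)
	{
		atomic64_add(inc, v);	/* replaces the stat64_lock dance */
	}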

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 02/10] Remove need for explicit device initialization
  2010-08-09 17:26   ` Nitin Gupta
@ 2010-08-09 18:36     ` Pekka Enberg
  -1 siblings, 0 replies; 137+ messages in thread
From: Pekka Enberg @ 2010-08-09 18:36 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <ngupta@vflare.org> wrote:
> Currently, the user has to explicitly write a positive value to
> initstate sysfs node before the device can be used. This event
> triggers allocation of per-device metadata like memory pool,
> table array and so on.
>
> We do not pre-initialize all zram devices since the 'table' array,
> mapping disk blocks to compressed chunks, takes considerable amount
> of memory (8 bytes per page). So, pre-initializing all devices will
> be quite wasteful if only few or none of the devices are actually
> used.
>
> This explicit device initialization from user is an odd requirement and
> can be easily avoided. We now initialize the device when first write is
> done to the device.
>
> Signed-off-by: Nitin Gupta <ngupta@vflare.org>

AFAICT, most hardware block device drivers do things like this in the
probe function. Why can't we do that for zram as well and drop the
->init_done and ->init_lock parts?

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 03/10] Use percpu stats
  2010-08-09 17:26   ` Nitin Gupta
@ 2010-08-09 18:44     ` Pekka Enberg
  -1 siblings, 0 replies; 137+ messages in thread
From: Pekka Enberg @ 2010-08-09 18:44 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <ngupta@vflare.org> wrote:
> Also remove references to removed stats (ex: good_compress).
>
> Signed-off-by: Nitin Gupta <ngupta@vflare.org>

Acked-by: Pekka Enberg <penberg@kernel.org>

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 04/10] Use percpu buffers
  2010-08-09 17:26   ` Nitin Gupta
@ 2010-08-09 18:57     ` Pekka Enberg
  -1 siblings, 0 replies; 137+ messages in thread
From: Pekka Enberg @ 2010-08-09 18:57 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <ngupta@vflare.org> wrote:
> @@ -303,38 +307,41 @@ static int zram_write(struct zram *zram, struct bio *bio)
>                                zram_test_flag(zram, index, ZRAM_ZERO))
>                        zram_free_page(zram, index);
>
> -               mutex_lock(&zram->lock);
> +               preempt_disable();
> +               zbuffer = __get_cpu_var(compress_buffer);
> +               zworkmem = __get_cpu_var(compress_workmem);
> +               if (unlikely(!zbuffer || !zworkmem)) {
> +                       preempt_enable();
> +                       goto out;
> +               }

The per-CPU buffer thing with this preempt_disable() trickery looks
overkill to me. Most block device drivers seem to use mempool_alloc()
for this sort of thing. Is there some reason you can't use that here?
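
For reference, the usual mempool pattern looks roughly like this (a sketch,
not zram code; the pool depth and buffer size are illustrative):

	#include <linux/mempool.h>
	#include <linux/slab.h>

	#define ZBUF_SIZE	(2 * PAGE_SIZE)	/* illustrative */

	static mempool_t *zbuf_pool;

	static int zbuf_pool_create(void)
	{
		/* Keep >= 2 buffers preallocated so mempool_alloc() can
		 * always make progress under memory pressure. */
		zbuf_pool = mempool_create_kmalloc_pool(2, ZBUF_SIZE);
		return zbuf_pool ? 0 : -ENOMEM;
	}

	static void *zbuf_get(void)
	{
		/* GFP_NOIO: we are called from the block I/O path */
		return mempool_alloc(zbuf_pool, GFP_NOIO);
	}

	static void zbuf_put(void *buf)
	{
		mempool_free(buf, zbuf_pool);
	}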

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 05/10] Reduce per table entry overhead by 4 bytes
  2010-08-09 17:26   ` Nitin Gupta
@ 2010-08-09 18:59     ` Pekka Enberg
  -1 siblings, 0 replies; 137+ messages in thread
From: Pekka Enberg @ 2010-08-09 18:59 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <ngupta@vflare.org> wrote:
> Each zram device maintains an array (table) that maps
> index within the device to the location of corresponding
> compressed chunk. Currently we store 'struct page' pointer,
> offset with page and various flags separately which takes
> 12 bytes per table entry. Now all these are encoded in a
> single 'phys_add_t' value which results in savings of 4 bytes
> per entry (except on PAE systems).
>
> Unfortunately, cleanups related to some variable renames
> were mixed in this patch. So, please bear some additional
> noise.

The noise makes this patch pretty difficult to review properly. Care
to split the patch into two pieces?

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 08/10] Some cleanups
  2010-08-09 17:26   ` Nitin Gupta
@ 2010-08-09 19:02     ` Pekka Enberg
  -1 siblings, 0 replies; 137+ messages in thread
From: Pekka Enberg @ 2010-08-09 19:02 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <ngupta@vflare.org> wrote:
>  - xvmalloc: Remove unnecessary stat_{inc,dec} and increment
>   pages_stored directly
>  - xvmalloc: Initialize pointers with NULL instead of 0
>  - zram: Remove verbose message when user sets an insane disksize
>  - zram: Mark some messages as pr_debug
>  - zram: Refine some comments
>
> Signed-off-by: Nitin Gupta <ngupta@vflare.org>

Acked-by: Pekka Enberg <penberg@kernel.org>

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 10/10] Document sysfs entries
  2010-08-09 17:26   ` Nitin Gupta
@ 2010-08-09 19:02     ` Pekka Enberg
  -1 siblings, 0 replies; 137+ messages in thread
From: Pekka Enberg @ 2010-08-09 19:02 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <ngupta@vflare.org> wrote:
> Signed-off-by: Nitin Gupta <ngupta@vflare.org>

Acked-by: Pekka Enberg <penberg@kernel.org>

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 06/10] Block discard support
  2010-08-09 17:26   ` Nitin Gupta
@ 2010-08-09 19:03     ` Pekka Enberg
  -1 siblings, 0 replies; 137+ messages in thread
From: Pekka Enberg @ 2010-08-09 19:03 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel, linux-fsdevel,
	jaxboe

On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <ngupta@vflare.org> wrote:
> The 'discard' bio discard request provides information to
> zram disks regarding blocks which are no longer in use by
> the filesystem. This allows freeing memory allocated for such
> blocks.
>
> When zram devices are used as swap disks, we already have
> a callback (block_device_operations->swap_slot_free_notify).
> So, the discard support is useful only when used as generic
> (non-swap) disk.
>
> Signed-off-by: Nitin Gupta <ngupta@vflare.org>

Lets CC fsdevel and Jens for this.

> ---
>  drivers/staging/zram/zram_drv.c   |   25 +++++++++++++++++++++++++
>  drivers/staging/zram/zram_sysfs.c |   11 +++++++++++
>  2 files changed, 36 insertions(+), 0 deletions(-)
>
> diff --git a/drivers/staging/zram/zram_drv.c b/drivers/staging/zram/zram_drv.c
> index efe9c93..0f9785f 100644
> --- a/drivers/staging/zram/zram_drv.c
> +++ b/drivers/staging/zram/zram_drv.c
> @@ -420,6 +420,20 @@ out:
>        return 0;
>  }
>
> +static void zram_discard(struct zram *zram, struct bio *bio)
> +{
> +       size_t bytes = bio->bi_size;
> +       sector_t sector = bio->bi_sector;
> +
> +       while (bytes >= PAGE_SIZE) {
> +               zram_free_page(zram, sector >> SECTORS_PER_PAGE_SHIFT);
> +               sector += PAGE_SIZE >> SECTOR_SHIFT;
> +               bytes -= PAGE_SIZE;
> +       }
> +
> +       bio_endio(bio, 0);
> +}
> +
>  /*
>  * Check if request is within bounds and page aligned.
>  */
> @@ -451,6 +465,12 @@ static int zram_make_request(struct request_queue *queue, struct bio *bio)
>                return 0;
>        }
>
> +       if (unlikely(bio_rw_flagged(bio, BIO_RW_DISCARD))) {
> +               zram_inc_stat(zram, ZRAM_STAT_DISCARD);
> +               zram_discard(zram, bio);
> +               return 0;
> +       }
> +
>        switch (bio_data_dir(bio)) {
>        case READ:
>                ret = zram_read(zram, bio);
> @@ -606,6 +626,11 @@ static int create_device(struct zram *zram, int device_id)
>        blk_queue_io_min(zram->disk->queue, PAGE_SIZE);
>        blk_queue_io_opt(zram->disk->queue, PAGE_SIZE);
>
> +       zram->disk->queue->limits.discard_granularity = PAGE_SIZE;
> +       zram->disk->queue->limits.max_discard_sectors = UINT_MAX;
> +       zram->disk->queue->limits.discard_zeroes_data = 1;
> +       queue_flag_set_unlocked(QUEUE_FLAG_DISCARD, zram->queue);
> +
>        add_disk(zram->disk);
>
>  #ifdef CONFIG_SYSFS
> diff --git a/drivers/staging/zram/zram_sysfs.c b/drivers/staging/zram/zram_sysfs.c
> index 43bcdd4..74971c0 100644
> --- a/drivers/staging/zram/zram_sysfs.c
> +++ b/drivers/staging/zram/zram_sysfs.c
> @@ -165,6 +165,15 @@ static ssize_t notify_free_show(struct device *dev,
>                zram_get_stat(zram, ZRAM_STAT_NOTIFY_FREE));
>  }
>
> +static ssize_t discard_show(struct device *dev,
> +               struct device_attribute *attr, char *buf)
> +{
> +       struct zram *zram = dev_to_zram(dev);
> +
> +       return sprintf(buf, "%llu\n",
> +               zram_get_stat(zram, ZRAM_STAT_DISCARD));
> +}
> +
>  static ssize_t zero_pages_show(struct device *dev,
>                struct device_attribute *attr, char *buf)
>  {
> @@ -215,6 +224,7 @@ static DEVICE_ATTR(num_reads, S_IRUGO, num_reads_show, NULL);
>  static DEVICE_ATTR(num_writes, S_IRUGO, num_writes_show, NULL);
>  static DEVICE_ATTR(invalid_io, S_IRUGO, invalid_io_show, NULL);
>  static DEVICE_ATTR(notify_free, S_IRUGO, notify_free_show, NULL);
> +static DEVICE_ATTR(discard, S_IRUGO, discard_show, NULL);
>  static DEVICE_ATTR(zero_pages, S_IRUGO, zero_pages_show, NULL);
>  static DEVICE_ATTR(orig_data_size, S_IRUGO, orig_data_size_show, NULL);
>  static DEVICE_ATTR(compr_data_size, S_IRUGO, compr_data_size_show, NULL);
> @@ -228,6 +238,7 @@ static struct attribute *zram_disk_attrs[] = {
>        &dev_attr_num_writes.attr,
>        &dev_attr_invalid_io.attr,
>        &dev_attr_notify_free.attr,
> +       &dev_attr_discard.attr,
>        &dev_attr_zero_pages.attr,
>        &dev_attr_orig_data_size.attr,
>        &dev_attr_compr_data_size.attr,
> --
> 1.7.2.1
>
> --
> To unsubscribe, send a message with 'unsubscribe linux-mm' in
> the body to majordomo@kvack.org.  For more info on Linux MM,
> see: http://www.linux-mm.org/ .
> Don't email: <a href=mailto:"dont@kvack.org"> email@kvack.org </a>
>

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 06/10] Block discard support
  2010-08-09 19:03     ` Pekka Enberg
@ 2010-08-10  2:23       ` Jens Axboe
  -1 siblings, 0 replies; 137+ messages in thread
From: Jens Axboe @ 2010-08-10  2:23 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Nitin Gupta, Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel, linux-fsdevel

On 08/09/2010 03:03 PM, Pekka Enberg wrote:
> On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <ngupta@vflare.org> wrote:
>> The 'discard' bio discard request provides information to
>> zram disks regarding blocks which are no longer in use by
>> the filesystem. This allows freeing memory allocated for such
>> blocks.
>>
>> When zram devices are used as swap disks, we already have
>> a callback (block_device_operations->swap_slot_free_notify).
>> So, the discard support is useful only when used as generic
>> (non-swap) disk.
>>
>> Signed-off-by: Nitin Gupta <ngupta@vflare.org>
> 
> Lets CC fsdevel and Jens for this.

Looks OK from a quick look. One comment, though:

>> +static void zram_discard(struct zram *zram, struct bio *bio)
>> +{
>> +       size_t bytes = bio->bi_size;
>> +       sector_t sector = bio->bi_sector;
>> +
>> +       while (bytes >= PAGE_SIZE) {
>> +               zram_free_page(zram, sector >> SECTORS_PER_PAGE_SHIFT);
>> +               sector += PAGE_SIZE >> SECTOR_SHIFT;
>> +               bytes -= PAGE_SIZE;
>> +       }
>> +
>> +       bio_endio(bio, 0);
>> +}
>> +

So freeing the page here will guarantee zeroed return on read?
And since you set PAGE_SIZE as the discard granularity, the above loop
could be coded more readable with the knowledge that ->bi_size is always
a multiple of the page size.
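
Under that assumption the loop could shrink to something like this (a
sketch only; it also assumes bi_sector is page-aligned, as the PAGE_SIZE
discard granularity implies):

	static void zram_discard(struct zram *zram, struct bio *bio)
	{
		size_t npages = bio->bi_size >> PAGE_SHIFT;
		u32 index = bio->bi_sector >> SECTORS_PER_PAGE_SHIFT;

		while (npages--)
			zram_free_page(zram, index++);

		bio_endio(bio, 0);
	}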

-- 
Jens Axboe



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 01/10] Replace ioctls with sysfs interface
  2010-08-09 18:34     ` Pekka Enberg
@ 2010-08-10  3:06       ` Nitin Gupta
  -1 siblings, 0 replies; 137+ messages in thread
From: Nitin Gupta @ 2010-08-10  3:06 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

On 08/10/2010 12:04 AM, Pekka Enberg wrote:
> On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <ngupta@vflare.org> wrote:
>> Creates per-device sysfs nodes in /sys/block/zram<id>/
>> Currently following stats are exported:
>>  - disksize
>>  - num_reads
>>  - num_writes
>>  - invalid_io
>>  - zero_pages
>>  - orig_data_size
>>  - compr_data_size
>>  - mem_used_total
>>
<snip>
>>
>> Signed-off-by: Nitin Gupta <ngupta@vflare.org>
> 
> Looks good to me (but I'm not a sysfs guy).
> 
> Acked-by: Pekka Enberg <penberg@kernel.org>
> 

Thanks!

>>  /* Module params (documentation at end) */
>> -static unsigned int num_devices;
>> +unsigned int num_devices;
>> +
>> +static void zram_stat_inc(u32 *v)
>> +{
>> +       *v = *v + 1;
>> +}
>> +
>> +static void zram_stat_dec(u32 *v)
>> +{
>> +       *v = *v - 1;
>> +}
>> +
>> +static void zram_stat64_add(struct zram *zram, u64 *v, u64 inc)
>> +{
>> +       spin_lock(&zram->stat64_lock);
>> +       *v = *v + inc;
>> +       spin_unlock(&zram->stat64_lock);
>> +}
>> +
>> +static void zram_stat64_sub(struct zram *zram, u64 *v, u64 dec)
>> +{
>> +       spin_lock(&zram->stat64_lock);
>> +       *v = *v - dec;
>> +       spin_unlock(&zram->stat64_lock);
>> +}
>> +
>> +static void zram_stat64_inc(struct zram *zram, u64 *v)
>> +{
>> +       zram_stat64_add(zram, v, 1);
>> +}
> 
> These could probably use atomic_inc(), atomic64_inc(), and friends, no?
> 

Yes, I think we could use them. Anyway, they are replaced by percpu stats in
patch 3, so probably this can be left as-is.

Thanks,
Nitin

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 02/10] Remove need for explicit device initialization
  2010-08-09 18:36     ` Pekka Enberg
@ 2010-08-10  3:38       ` Nitin Gupta
  -1 siblings, 0 replies; 137+ messages in thread
From: Nitin Gupta @ 2010-08-10  3:38 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

On 08/10/2010 12:06 AM, Pekka Enberg wrote:
> On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <ngupta@vflare.org> wrote:
>> Currently, the user has to explicitly write a positive value to
>> initstate sysfs node before the device can be used. This event
>> triggers allocation of per-device metadata like memory pool,
>> table array and so on.
>>
>> We do not pre-initialize all zram devices since the 'table' array,
>> mapping disk blocks to compressed chunks, takes considerable amount
>> of memory (8 bytes per page). So, pre-initializing all devices will
>> be quite wasteful if only few or none of the devices are actually
>> used.
>>
>> This explicit device initialization from user is an odd requirement and
>> can be easily avoided. We now initialize the device when first write is
>> done to the device.
>>
>> Signed-off-by: Nitin Gupta <ngupta@vflare.org>
> 
> AFAICT, most hardware block device drivers do things like this in the
> probe function. Why can't we do that for zram as well and drop the
> ->init_done and ->init_lock parts?
> 

I think probe is only for PCI devices? Maybe we should hook into the open
function in struct block_device_operations, roughly as sketched below. That
way, we can also drop the init_done and init_lock parts.
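
A minimal sketch of that idea (zram_meta_alloc() is a hypothetical helper
standing in for the pool/table allocation; assumes opens remain serialized
on bdev->bd_mutex by the block layer):

	static int zram_open(struct block_device *bdev, fmode_t mode)
	{
		struct zram *zram = bdev->bd_disk->private_data;

		if (zram->table)	/* already initialized */
			return 0;

		/* No init_lock needed: the block layer serializes
		 * open() calls on this bdev. */
		return zram_meta_alloc(zram);
	}

	static const struct block_device_operations zram_devops = {
		.owner	= THIS_MODULE,
		.open	= zram_open,
	};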

Thanks,
Nitin

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 03/10] Use percpu stats
  2010-08-09 17:26   ` Nitin Gupta
@ 2010-08-10  4:34     ` Andrew Morton
  -1 siblings, 0 replies; 137+ messages in thread
From: Andrew Morton @ 2010-08-10  4:34 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Pekka Enberg, Minchan Kim, Greg KH, Linux Driver Project,
	linux-mm, linux-kernel

On Mon,  9 Aug 2010 22:56:49 +0530 Nitin Gupta <ngupta@vflare.org> wrote:

> +/*
> + * Individual percpu values can go negative but the sum across all CPUs
> + * must always be positive (we store various counts). So, return sum as
> + * unsigned value.
> + */
> +static u64 zram_get_stat(struct zram *zram, enum zram_stats_index idx)
>  {
> -	u64 val;
> -
> -	spin_lock(&zram->stat64_lock);
> -	val = *v;
> -	spin_unlock(&zram->stat64_lock);
> +	int cpu;
> +	s64 val = 0;
> +
> +	for_each_possible_cpu(cpu) {
> +		s64 temp;
> +		unsigned int start;
> +		struct zram_stats_cpu *stats;
> +
> +		stats = per_cpu_ptr(zram->stats, cpu);
> +		do {
> +			start = u64_stats_fetch_begin(&stats->syncp);
> +			temp = stats->count[idx];
> +		} while (u64_stats_fetch_retry(&stats->syncp, start));
> +		val += temp;
> +	}
>  
> +	WARN_ON(val < 0);
>  	return val;
>  }

That reimplements include/linux/percpu_counter.h, poorly.

Please see the June discussion "[PATCH v2 1/2] tmpfs: Quick token
library to allow scalable retrieval of tokens from token jar" for some
discussion.
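
For reference, a rough sketch of the percpu_counter pattern (the exact
percpu_counter_init() signature varies across kernel versions; the counter
shown is hypothetical):

	#include <linux/percpu_counter.h>

	static struct percpu_counter zram_num_reads;

	static int zram_stats_init(void)
	{
		return percpu_counter_init(&zram_num_reads, 0);
	}

	static void zram_count_read(void)
	{
		percpu_counter_inc(&zram_num_reads);	/* batched per-cpu */
	}

	static s64 zram_reads(void)
	{
		/* Sums the per-cpu deltas; never returns a negative value */
		return percpu_counter_sum_positive(&zram_num_reads);
	}

	static void zram_stats_exit(void)
	{
		percpu_counter_destroy(&zram_num_reads);
	}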


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 04/10] Use percpu buffers
  2010-08-09 18:57     ` Pekka Enberg
@ 2010-08-10  4:47       ` Nitin Gupta
  -1 siblings, 0 replies; 137+ messages in thread
From: Nitin Gupta @ 2010-08-10  4:47 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

On 08/10/2010 12:27 AM, Pekka Enberg wrote:
> On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <ngupta@vflare.org> wrote:
>> @@ -303,38 +307,41 @@ static int zram_write(struct zram *zram, struct bio *bio)
>>                                zram_test_flag(zram, index, ZRAM_ZERO))
>>                        zram_free_page(zram, index);
>>
>> -               mutex_lock(&zram->lock);
>> +               preempt_disable();
>> +               zbuffer = __get_cpu_var(compress_buffer);
>> +               zworkmem = __get_cpu_var(compress_workmem);
>> +               if (unlikely(!zbuffer || !zworkmem)) {
>> +                       preempt_enable();
>> +                       goto out;
>> +               }
> 
> The per-CPU buffer thing with this preempt_disable() trickery looks
> overkill to me. Most block device drivers seem to use mempool_alloc()
> for this sort of thing. Is there some reason you can't use that here?
> 

Other block drivers allocate relatively small structs using
mempool_alloc(). However, in the case of zram, these buffers are quite
large (compress_workmem is 64K!), so allocating them on every write
would probably be much slower than using pre-allocated per-cpu buffers,
sketched below.
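
For context, the pre-allocated scheme looks roughly like this (a sketch;
LZO1X_MEM_COMPRESS from <linux/lzo.h> is the ~64K workmem mentioned above,
and error unwinding is elided):

	#include <linux/percpu.h>
	#include <linux/lzo.h>
	#include <linux/vmalloc.h>

	static DEFINE_PER_CPU(void *, compress_workmem);

	static int zram_workmem_alloc(void)
	{
		int cpu;

		for_each_possible_cpu(cpu) {
			void *mem = vmalloc(LZO1X_MEM_COMPRESS);

			if (!mem)
				return -ENOMEM;	/* caller unwinds */
			per_cpu(compress_workmem, cpu) = mem;
		}
		return 0;
	}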

Thanks,
Nitin

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 06/10] Block discard support
  2010-08-10  2:23       ` Jens Axboe
  (?)
@ 2010-08-10  4:54         ` Nitin Gupta
  -1 siblings, 0 replies; 137+ messages in thread
From: Nitin Gupta @ 2010-08-10  4:54 UTC (permalink / raw)
  To: Jens Axboe
  Cc: Pekka Enberg, Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel, linux-fsdevel

On 08/10/2010 07:53 AM, Jens Axboe wrote:
> On 08/09/2010 03:03 PM, Pekka Enberg wrote:
>> On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <ngupta@vflare.org> wrote:
>>> The 'discard' bio discard request provides information to
>>> zram disks regarding blocks which are no longer in use by
>>> filesystem. This allows freeing memory allocated for such
>>> blocks.
>>>
>>> When zram devices are used as swap disks, we already have
>>> a callback (block_device_operations->swap_slot_free_notify).
>>> So, the discard support is useful only when used as generic
>>> (non-swap) disk.
>>>
>>> Signed-off-by: Nitin Gupta <ngupta@vflare.org>
>>
>> Let's CC fsdevel and Jens for this.
> 
> Looks OK from a quick look. One comment, though:
> 
>>> +static void zram_discard(struct zram *zram, struct bio *bio)
>>> +{
>>> +       size_t bytes = bio->bi_size;
>>> +       sector_t sector = bio->bi_sector;
>>> +
>>> +       while (bytes >= PAGE_SIZE) {
>>> +               zram_free_page(zram, sector >> SECTORS_PER_PAGE_SHIFT);
>>> +               sector += PAGE_SIZE >> SECTOR_SHIFT;
>>> +               bytes -= PAGE_SIZE;
>>> +       }
>>> +
>>> +       bio_endio(bio, 0);
>>> +}
>>> +
> 
> So freeing the page here will guarantee zeroed return on read?

For reads on freed/unwritten sectors, it simply returns success and
does not touch the bio page. Is it better to zero the page in such
cases?

> And since you set PAGE_SIZE as the discard granularity, the above loop
> could be coded more readable with the knowledge that ->bi_size is always
> a multiple of the page size.
> 

Ok, I will clean it up.
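
Something along these lines, assuming ->bi_size is always a whole
number of pages as you point out (just a sketch):

	static void zram_discard(struct zram *zram, struct bio *bio)
	{
		u32 index = bio->bi_sector >> SECTORS_PER_PAGE_SHIFT;
		size_t n_pages = bio->bi_size >> PAGE_SHIFT;

		while (n_pages--)
			zram_free_page(zram, index++);

		bio_endio(bio, 0);
	}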

Thanks for comments.
Nitin

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 05/10] Reduce per table entry overhead by 4 bytes
  2010-08-09 18:59     ` Pekka Enberg
@ 2010-08-10  4:55       ` Nitin Gupta
  -1 siblings, 0 replies; 137+ messages in thread
From: Nitin Gupta @ 2010-08-10  4:55 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

On 08/10/2010 12:29 AM, Pekka Enberg wrote:
> On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <ngupta@vflare.org> wrote:
>> Each zram device maintains an array (table) that maps
>> index within the device to the location of corresponding
>> compressed chunk. Currently we store 'struct page' pointer,
>> offset within page and various flags separately, which takes
>> 12 bytes per table entry. Now all these are encoded in a
>> single 'phys_addr_t' value, which results in a savings of 4 bytes
>> per entry (except on PAE systems).
>>
>> Unfortunately, cleanups related to some variable renames
>> were mixed in this patch. So, please bear with some additional
>> noise.
> 
> The noise makes this patch pretty difficult to review properly. Care
> to split the patch into two pieces?
> 

Ok, I will split them as separate patches.

Thanks for all the reviews and Acks.
Nitin


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 04/10] Use percpu buffers
  2010-08-10  4:47       ` Nitin Gupta
@ 2010-08-10  5:05         ` Pekka Enberg
  -1 siblings, 0 replies; 137+ messages in thread
From: Pekka Enberg @ 2010-08-10  5:05 UTC (permalink / raw)
  To: ngupta
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

  Hi Nitin,

On 10.8.2010 7.47, Nitin Gupta wrote:
> On 08/10/2010 12:27 AM, Pekka Enberg wrote:
>> On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta<ngupta@vflare.org>  wrote:
>>> @@ -303,38 +307,41 @@ static int zram_write(struct zram *zram, struct bio *bio)
>>>                                 zram_test_flag(zram, index, ZRAM_ZERO))
>>>                         zram_free_page(zram, index);
>>>
>>> -               mutex_lock(&zram->lock);
>>> +               preempt_disable();
>>> +               zbuffer = __get_cpu_var(compress_buffer);
>>> +               zworkmem = __get_cpu_var(compress_workmem);
>>> +               if (unlikely(!zbuffer || !zworkmem)) {
>>> +                       preempt_enable();
>>> +                       goto out;
>>> +               }
>> The per-CPU buffer thing with this preempt_disable() trickery looks
>> overkill to me. Most block device drivers seem to use mempool_alloc()
>> for this sort of thing. Is there some reason you can't use that here?
>>
> Other block drivers are allocating relatively small structs using
> mempool_alloc(). However, in case of zram, these buffers are quite
> large (compress_workmem is 64K!). So, allocating them on every write
> would probably be much slower than using a pre-allocated per-cpu buffer.
The mempool API is precisely for that - using pre-allocated buffers
instead of allocating every time. The preempt_disable() games make the
code complex and have the downside of higher scheduling latencies, so
why not give mempools a try?
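
Roughly along these lines (a sketch only; the pool size and function
names are illustrative):

	#include <linux/mempool.h>
	#include <linux/lzo.h>

	static mempool_t *workmem_pool;

	static int zram_create_workmem_pool(void)
	{
		/* keep a couple of LZO work buffers pre-allocated */
		workmem_pool = mempool_create_kmalloc_pool(2, LZO1X_MEM_COMPRESS);
		return workmem_pool ? 0 : -ENOMEM;
	}

	static int zram_compress_page(void)
	{
		/* falls back to a pre-allocated element under memory pressure */
		void *workmem = mempool_alloc(workmem_pool, GFP_NOIO);

		if (!workmem)
			return -ENOMEM;
		/* ... run the compressor with workmem as scratch ... */
		mempool_free(workmem, workmem_pool);
		return 0;
	}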

             Pekka

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 04/10] Use percpu buffers
  2010-08-10  5:05         ` Pekka Enberg
@ 2010-08-10  5:32           ` Nitin Gupta
  -1 siblings, 0 replies; 137+ messages in thread
From: Nitin Gupta @ 2010-08-10  5:32 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

Hi,

On 08/10/2010 10:35 AM, Pekka Enberg wrote:
> On 10.8.2010 7.47, Nitin Gupta wrote:
>> On 08/10/2010 12:27 AM, Pekka Enberg wrote:
>>> On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta<ngupta@vflare.org>  wrote:
>>>> @@ -303,38 +307,41 @@ static int zram_write(struct zram *zram, struct bio *bio)
>>>>                                 zram_test_flag(zram, index, ZRAM_ZERO))
>>>>                         zram_free_page(zram, index);
>>>>
>>>> -               mutex_lock(&zram->lock);
>>>> +               preempt_disable();
>>>> +               zbuffer = __get_cpu_var(compress_buffer);
>>>> +               zworkmem = __get_cpu_var(compress_workmem);
>>>> +               if (unlikely(!zbuffer || !zworkmem)) {
>>>> +                       preempt_enable();
>>>> +                       goto out;
>>>> +               }
>>> The per-CPU buffer thing with this preempt_disable() trickery looks
>>> overkill to me. Most block device drivers seem to use mempool_alloc()
>>> for this sort of thing. Is there some reason you can't use that here?
>>>
>> Other block drivers are allocating relatively small structs using
>> mempool_alloc(). However, in case of zram, these buffers are quite
>> large (compress_workmem is 64K!). So, allocating them on every write
>> would probably be much slower than using a pre-allocated per-cpu buffer.
> The mempool API is precisely for that - using pre-allocated buffers instead of allocating every time. The preempt_disable() games make the code complex and have the downside of higher scheduling latencies so why not give mempools a try?
> 

mempool_alloc() first calls alloc_fn with ~(__GFP_WAIT | __GFP_IO)
and *then* falls back to the pre-allocated buffers. So, it will always
be slower than directly using pre-allocated buffers as is done
currently.

One trick we can use is to have an alloc_fn that always returns
failure when __GFP_WAIT is cleared and does the actual allocation
otherwise. But it still seems like an unnecessary cost.
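
i.e. something like this (sketch; the function name is made up):

	static void *workmem_mempool_alloc(gfp_t gfp_mask, void *pool_data)
	{
		/*
		 * mempool_alloc() first tries alloc_fn without __GFP_WAIT;
		 * failing that attempt forces an immediate fallback to the
		 * pre-allocated elements.
		 */
		if (!(gfp_mask & __GFP_WAIT))
			return NULL;
		return kmalloc(LZO1X_MEM_COMPRESS, gfp_mask);
	}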

Thanks,
Nitin
 



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 04/10] Use percpu buffers
  2010-08-10  5:32           ` Nitin Gupta
@ 2010-08-10  7:36             ` Pekka Enberg
  -1 siblings, 0 replies; 137+ messages in thread
From: Pekka Enberg @ 2010-08-10  7:36 UTC (permalink / raw)
  To: ngupta, jaxboe
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

Hi Nitin,

On 8/10/10 8:32 AM, Nitin Gupta wrote:
> Other block drivers are allocating relatively small structs using
>>> mempool_alloc(). However, in case of zram, these buffers are quite
>>> large (compress_workmem is 64K!). So, allocating them on every write
>>> would probably be much slower than using a pre-allocated per-cpu buffer.
>>>        
>> The mempool API is precisely for that - using pre-allocated buffers instead of allocating every time. The preempt_disable() games make the code complex and have the downside of higher scheduling latencies so why not give mempools a try?
>>      
> mempool_alloc() first calls alloc_fn with ~(__GFP_WAIT | __GFP_IO)
> and *then* falls down to pre-allocated buffers. So, it will always
> be slower than directly using pre-allocated buffers as is done
> currently.
>
> One trick we can use is to have alloc_fn such that it always returns
> failure with ~__GFP_WAIT and do actual allocation otherwise. But still
> it seems like unnecessary cost.
>    
We can always extend the mempool API with a mempool_prealloc() function
if that turns out to be a problem. The per-CPU buffer with
preempt_disable() trickery isn't really the proper thing to do here. It
doesn't make much sense to disable preemption for compression that's
purely CPU-bound.

                     Pekka

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 06/10] Block discard support
  2010-08-10  4:54         ` Nitin Gupta
  (?)
@ 2010-08-10 15:54           ` Jens Axboe
  -1 siblings, 0 replies; 137+ messages in thread
From: Jens Axboe @ 2010-08-10 15:54 UTC (permalink / raw)
  To: ngupta
  Cc: Pekka Enberg, Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel, linux-fsdevel

On 08/10/2010 12:54 AM, Nitin Gupta wrote:
> On 08/10/2010 07:53 AM, Jens Axboe wrote:
>> On 08/09/2010 03:03 PM, Pekka Enberg wrote:
>>> On Mon, Aug 9, 2010 at 8:26 PM, Nitin Gupta <ngupta@vflare.org> wrote:
>>>> The 'discard' bio discard request provides information to
>>>> zram disks regarding blocks which are no longer in use by
>>>> filesystem. This allows freeing memory allocated for such
>>>> blocks.
>>>>
>>>> When zram devices are used as swap disks, we already have
>>>> a callback (block_device_operations->swap_slot_free_notify).
>>>> So, the discard support is useful only when used as generic
>>>> (non-swap) disk.
>>>>
>>>> Signed-off-by: Nitin Gupta <ngupta@vflare.org>
>>>
>>> Let's CC fsdevel and Jens for this.
>>
>> Looks OK from a quick look. One comment, though:
>>
>>>> +static void zram_discard(struct zram *zram, struct bio *bio)
>>>> +{
>>>> +       size_t bytes = bio->bi_size;
>>>> +       sector_t sector = bio->bi_sector;
>>>> +
>>>> +       while (bytes >= PAGE_SIZE) {
>>>> +               zram_free_page(zram, sector >> SECTORS_PER_PAGE_SHIFT);
>>>> +               sector += PAGE_SIZE >> SECTOR_SHIFT;
>>>> +               bytes -= PAGE_SIZE;
>>>> +       }
>>>> +
>>>> +       bio_endio(bio, 0);
>>>> +}
>>>> +
>>
>> So freeing the page here will guarantee zeroed return on read?
> 
> For reads on freed/unwritten sectors, it simply returns success and
> does not touch the bio page. Is it better to zero the page in such
> cases?

Well, you told the kernel that you return zeroes on discarded ranges:

        zram->disk->queue->limits.discard_zeroes_data = 1;

So yes, if you intend to keep that, then you need to zero the
incoming pages that have been explicitly trimmed by a discard.
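
Roughly, in the read path (a sketch; the helper name is made up):

	/* ensure a discarded or never-written sector reads back as zeroes */
	static void zram_zero_bio_page(struct page *page)
	{
		void *user_mem = kmap_atomic(page, KM_USER0);

		memset(user_mem, 0, PAGE_SIZE);
		kunmap_atomic(user_mem, KM_USER0);
		flush_dcache_page(page);
	}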

-- 
Jens Axboe


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 03/10] Use percpu stats
  2010-08-10  4:34     ` Andrew Morton
@ 2010-08-11 16:39       ` Nitin Gupta
  -1 siblings, 0 replies; 137+ messages in thread
From: Nitin Gupta @ 2010-08-11 16:39 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Pekka Enberg, Minchan Kim, Greg KH, Linux Driver Project,
	linux-mm, linux-kernel

On 08/10/2010 10:04 AM, Andrew Morton wrote:
> On Mon,  9 Aug 2010 22:56:49 +0530 Nitin Gupta <ngupta@vflare.org> wrote:
> 
>> +/*
>> + * Individual percpu values can go negative but the sum across all CPUs
>> + * must always be positive (we store various counts). So, return sum as
>> + * unsigned value.
>> + */
>> +static u64 zram_get_stat(struct zram *zram, enum zram_stats_index idx)
>>  {
>> -	u64 val;
>> -
>> -	spin_lock(&zram->stat64_lock);
>> -	val = *v;
>> -	spin_unlock(&zram->stat64_lock);
>> +	int cpu;
>> +	s64 val = 0;
>> +
>> +	for_each_possible_cpu(cpu) {
>> +		s64 temp;
>> +		unsigned int start;
>> +		struct zram_stats_cpu *stats;
>> +
>> +		stats = per_cpu_ptr(zram->stats, cpu);
>> +		do {
>> +			start = u64_stats_fetch_begin(&stats->syncp);
>> +			temp = stats->count[idx];
>> +		} while (u64_stats_fetch_retry(&stats->syncp, start));
>> +		val += temp;
>> +	}
>>  
>> +	WARN_ON(val < 0);
>>  	return val;
>>  }
> 
> That reimplements include/linux/percpu_counter.h, poorly.
> 
> Please see the June discussion "[PATCH v2 1/2] tmpfs: Quick token
> library to allow scalable retrieval of tokens from token jar" for some
> discussion.
> 
> 

I read the discussion you pointed out but still fail to see how
percpu_counters, with all their overhead, are better than the simple
percpu variables used in the current version. What is the advantage?

Thanks,
Nitin


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 03/10] Use percpu stats
  2010-08-11 16:39       ` Nitin Gupta
@ 2010-08-11 17:18         ` Andrew Morton
  -1 siblings, 0 replies; 137+ messages in thread
From: Andrew Morton @ 2010-08-11 17:18 UTC (permalink / raw)
  To: ngupta
  Cc: Pekka Enberg, Minchan Kim, Greg KH, Linux Driver Project,
	linux-mm, linux-kernel

On Wed, 11 Aug 2010 22:09:29 +0530 Nitin Gupta <ngupta@vflare.org> wrote:

> On 08/10/2010 10:04 AM, Andrew Morton wrote:
> > On Mon,  9 Aug 2010 22:56:49 +0530 Nitin Gupta <ngupta@vflare.org> wrote:
> > 
> >> +/*
> >> + * Individual percpu values can go negative but the sum across all CPUs
> >> + * must always be positive (we store various counts). So, return sum as
> >> + * unsigned value.
> >> + */
> >> +static u64 zram_get_stat(struct zram *zram, enum zram_stats_index idx)
> >>  {
> >> -	u64 val;
> >> -
> >> -	spin_lock(&zram->stat64_lock);
> >> -	val = *v;
> >> -	spin_unlock(&zram->stat64_lock);
> >> +	int cpu;
> >> +	s64 val = 0;
> >> +
> >> +	for_each_possible_cpu(cpu) {
> >> +		s64 temp;
> >> +		unsigned int start;
> >> +		struct zram_stats_cpu *stats;
> >> +
> >> +		stats = per_cpu_ptr(zram->stats, cpu);
> >> +		do {
> >> +			start = u64_stats_fetch_begin(&stats->syncp);
> >> +			temp = stats->count[idx];
> >> +		} while (u64_stats_fetch_retry(&stats->syncp, start));
> >> +		val += temp;
> >> +	}
> >>  
> >> +	WARN_ON(val < 0);
> >>  	return val;
> >>  }
> > 
> > That reimplements include/linux/percpu_counter.h, poorly.
> > 
> > Please see the June discussion "[PATCH v2 1/2] tmpfs: Quick token
> > library to allow scalable retrieval of tokens from token jar" for some
> > discussion.
> > 
> > 
> 
> I read the discussion you pointed out but still fail to see how percpu_counters,
> with all their overhead,

What overhead?  Send numbers.  Then extrapolate those numbers to a
machine which has 128 possible CPUs and 4 present CPUs.

> are better than simple pcpu variable used in current
> version. What is the advantage?

Firstly, they'd have saved all the time you spent duplicating them.

Secondly, getting additional users of the standard facility results in
more testing and perhaps enhancement of that facility, thus benefiting
other users too.

Thirdly, using the standard facility permits your code to leverage
enhancements which others add. 

Fourthly, they would result in a smaller kernel.

You didn't really need me to teach you the benefits of code reuse did
you?


Please do not merge this code unless there is a good reason to do so
and it has been shown that the standard facility cannot be suitably
fixed or enhanced to address the deficiency.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 03/10] Use percpu stats
  2010-08-09 17:26   ` Nitin Gupta
@ 2010-08-30 16:20     ` Christoph Lameter
  -1 siblings, 0 replies; 137+ messages in thread
From: Christoph Lameter @ 2010-08-30 16:20 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

On Mon, 9 Aug 2010, Nitin Gupta wrote:

> -static void zram_stat_inc(u32 *v)
> +static void zram_add_stat(struct zram *zram,
> +			enum zram_stats_index idx, s64 val)
>  {
> -	*v = *v + 1;
> +	struct zram_stats_cpu *stats;
> +
> +	preempt_disable();
> +	stats = __this_cpu_ptr(zram->stats);
> +	u64_stats_update_begin(&stats->syncp);
> +	stats->count[idx] += val;
> +	u64_stats_update_end(&stats->syncp);
> +	preempt_enable();

Maybe do

#define zram_add_stat(zram, index, val) \
		this_cpu_add(zram->stats->count[index], val)

instead? It creates an add in a single "atomic" per-cpu instruction and
deals with the fallback scenarios for processors that cannot handle
64-bit adds.


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 03/10] Use percpu stats
  2010-08-09 17:26   ` Nitin Gupta
@ 2010-08-31  5:36     ` Anton Blanchard
  -1 siblings, 0 replies; 137+ messages in thread
From: Anton Blanchard @ 2010-08-31  5:36 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel


Hi,

> +	zram->stats = alloc_percpu(struct zram_stats_cpu);
> +	if (!zram->stats) {
> +		pr_err("Error allocating percpu stats\n");
> +		ret = -ENOMEM;
> +		goto fail;
> +	}

There doesn't seem to be a free_percpu() in the module exit path. Something
like this perhaps?

Anton
--

zram: Free percpu data on module exit.

Signed-off-by: Anton Blanchard <anton@samba.org>
---

Index: powerpc.git/drivers/staging/zram/zram_drv.c
===================================================================
--- powerpc.git.orig/drivers/staging/zram/zram_drv.c	2010-08-31 15:15:59.344290847 +1000
+++ powerpc.git/drivers/staging/zram/zram_drv.c	2010-08-31 15:17:00.383045836 +1000
@@ -483,8 +483,7 @@ void zram_reset_device(struct zram *zram
 	xv_destroy_pool(zram->mem_pool);
 	zram->mem_pool = NULL;
 
-	/* Reset stats */
-	memset(&zram->stats, 0, sizeof(zram->stats));
+	free_percpu(zram->stats);
 
 	zram->disksize = zram_default_disksize();
 	mutex_unlock(&zram->init_lock);

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 03/10] Use percpu stats
  2010-08-30 16:20     ` Christoph Lameter
@ 2010-08-31 20:31       ` Nitin Gupta
  -1 siblings, 0 replies; 137+ messages in thread
From: Nitin Gupta @ 2010-08-31 20:31 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

On Mon, Aug 30, 2010 at 12:20 PM, Christoph Lameter <cl@linux.com> wrote:
> On Mon, 9 Aug 2010, Nitin Gupta wrote:
>
>> -static void zram_stat_inc(u32 *v)
>> +static void zram_add_stat(struct zram *zram,
>> +                     enum zram_stats_index idx, s64 val)
>>  {
>> -     *v = *v + 1;
>> +     struct zram_stats_cpu *stats;
>> +
>> +     preempt_disable();
>> +     stats = __this_cpu_ptr(zram->stats);
>> +     u64_stats_update_begin(&stats->syncp);
>> +     stats->count[idx] += val;
>> +     u64_stats_update_end(&stats->syncp);
>> +     preempt_enable();
>
> Maybe do
>
> #define zram_add_stat(zram, index, val)
>                this_cpu_add(zram->stats->count[index], val)
>
> instead? It creates an add in a single "atomic" per cpu instruction and
> deals with the fallback scenarios for processors that cannot handle 64
> bit adds.
>
>

Yes, this_cpu_add() seems sufficient. I can't recall why I used u64_stats_*,
but if it's not required for atomic access to 64-bit counters, then why was
it added to mainline in the first place?

Thanks,
Nitin

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 03/10] Use percpu stats
  2010-08-31 20:31       ` Nitin Gupta
@ 2010-08-31 21:28         ` Eric Dumazet
  -1 siblings, 0 replies; 137+ messages in thread
From: Eric Dumazet @ 2010-08-31 21:28 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Christoph Lameter, Pekka Enberg, Minchan Kim, Andrew Morton,
	Greg KH, Linux Driver Project, linux-mm, linux-kernel

On Tuesday, 31 August 2010 at 16:31 -0400, Nitin Gupta wrote:
> On Mon, Aug 30, 2010 at 12:20 PM, Christoph Lameter <cl@linux.com> wrote:
> > On Mon, 9 Aug 2010, Nitin Gupta wrote:
> >
> >> -static void zram_stat_inc(u32 *v)
> >> +static void zram_add_stat(struct zram *zram,
> >> +                     enum zram_stats_index idx, s64 val)
> >>  {
> >> -     *v = *v + 1;
> >> +     struct zram_stats_cpu *stats;
> >> +
> >> +     preempt_disable();
> >> +     stats = __this_cpu_ptr(zram->stats);
> >> +     u64_stats_update_begin(&stats->syncp);
> >> +     stats->count[idx] += val;
> >> +     u64_stats_update_end(&stats->syncp);
> >> +     preempt_enable();
> >
> > Maybe do
> >
> > #define zram_add_stat(zram, index, val)
> >                this_cpu_add(zram->stats->count[index], val)
> >
> > instead? It creates an add in a single "atomic" per cpu instruction and
> > deals with the fallback scenarios for processors that cannot handle 64
> > bit adds.
> >
> >
> 
> Yes, this_cpu_add() seems sufficient. I can't recall why I used u64_stats_*
> but if it's not required for atomic access to 64-bit then why was it added to
> the mainline in the first place?

Because we wanted fast 64-bit counters, even on 32-bit arches, and
this has little to do with being 'atomic' on one entity; it is about a
group of counters (check drivers/net/loopback.c, lines 91-94). No lock
prefix is used in the fast path.

We also wanted readers to read correct values, not a value being changed
by a writer with inconsistent 32-bit halves. SNMP applications want
monotonically increasing counters.

this_cpu_add()/this_cpu_read() doesn't fit.

Even for a single counter, this_cpu_read() of a 64-bit value does not use
an RMW (cmpxchg8b) instruction, so you can get very strange results when
the low-order 32 bits wrap.
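
The loopback pattern referenced above is roughly (paraphrased, not the
exact lines; lb_stats is that driver's per-cpu stats struct):

	/* writer side: group of counters updated under one u64_stats seqcount */
	u64_stats_update_begin(&lb_stats->syncp);
	lb_stats->bytes += len;
	lb_stats->packets++;
	u64_stats_update_end(&lb_stats->syncp);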




^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 03/10] Use percpu stats
  2010-08-31 21:28         ` Eric Dumazet
@ 2010-08-31 21:35           ` Christoph Lameter
  -1 siblings, 0 replies; 137+ messages in thread
From: Christoph Lameter @ 2010-08-31 21:35 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Nitin Gupta, Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

On Tue, 31 Aug 2010, Eric Dumazet wrote:

> > Yes, this_cpu_add() seems sufficient. I can't recall why I used u64_stats_*
> > but if it's not required for atomic access to 64-bit then why was it added to
> > the mainline in the first place?
>
> Because we wanted to have fast 64bit counters, even on 32bit arches, and
> this has litle to do with 'atomic' on one entity, but a group of
> counters. (check drivers/net/loopback.c, lines 91-94). No lock prefix
> used in fast path.
>
> We also wanted readers to read correct values, not a value being changed
> by a writer, with inconsistent 32bit halves. SNMP applications want
> monotonically increasing counters.
>
> this_cpu_add()/this_cpu_read() doesnt fit.
>
> Even for single counter, this_cpu_read(64bit) is not using an RMW
> (cmpxchg8) instruction, so you can get very strange results when low
> order 32bit wraps.

How about fixing it so that everyone benefits?


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 03/10] Use percpu stats
  2010-08-31 21:35           ` Christoph Lameter
@ 2010-08-31 21:41             ` Eric Dumazet
  -1 siblings, 0 replies; 137+ messages in thread
From: Eric Dumazet @ 2010-08-31 21:41 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Nitin Gupta, Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

On Tuesday 31 August 2010 at 16:35 -0500, Christoph Lameter wrote:
> On Tue, 31 Aug 2010, Eric Dumazet wrote:
> 
> > > Yes, this_cpu_add() seems sufficient. I can't recall why I used u64_stats_*
> > > but if it's not required for atomic access to 64-bit then why was it added to
> > > the mainline in the first place?
> >
> > Because we wanted to have fast 64bit counters, even on 32bit arches, and
> > this has little to do with 'atomic' on one entity, but a group of
> > counters. (check drivers/net/loopback.c, lines 91-94). No lock prefix
> > used in fast path.
> >
> > We also wanted readers to read correct values, not a value being changed
> > by a writer, with inconsistent 32bit halves. SNMP applications want
> > monotonically increasing counters.
> >
> > this_cpu_add()/this_cpu_read() doesn't fit.
> >
> > Even for a single counter, this_cpu_read(64bit) is not using an RMW
> > (cmpxchg8b) instruction, so you can get very strange results when the
> > low-order 32bit wraps.
> 
> How about fixing it so that everyone benefits?
> 

IMHO, this_cpu_read() is fine as is: a _read_ operation.

Don't pretend it can be used in every context; it's not true.




^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 00/10] zram: various improvements and cleanups
  2010-08-09 17:26 ` Nitin Gupta
@ 2010-08-31 22:37   ` Greg KH
  -1 siblings, 0 replies; 137+ messages in thread
From: Greg KH @ 2010-08-31 22:37 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Linux Driver Project,
	linux-mm, linux-kernel

On Mon, Aug 09, 2010 at 10:56:46PM +0530, Nitin Gupta wrote:
> The zram module creates RAM based block devices named /dev/zram<id>
> (<id> = 0, 1, ...). Pages written to these disks are compressed and stored
> in memory itself.
> 
> One of the major changes done is the replacement of ioctls with sysfs
> interface. One of the advantages of this approach is we no longer depend on the
> userspace tool (rzscontrol) which was used to set various parameters and check
> statistics. Maintaining updated version of rzscontrol as changes were done to
> ioctls, statistics exported etc. was a major pain.
> 
> Another significant change is the introduction of percpu stats and compression
> buffers. Earlier, we had per-device buffers protected by a mutex. This was a
> major bottleneck on multi-core systems. With these changes, benchmarks with
> fio[1] showed a speedup of about 20% for write performance on dual-core
> system (see patch 4/10 description for details).
> 
> 
> For easier testing, a single patch against 2.6.35-git8 has been uploaded at:
> http://compcache.googlecode.com/hg/sub-projects/mainline/zram_2.6.36-rc0.patch

I applied the first 2 and last 2 of these patches and they will show up
in the linux-next tree tomorrow.  I stopped there due to Andrew's
complaints about the per-cpu variable stuff.  Please resolve this and
redo the patch set and I will be glad to apply them.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 01/10] Replace ioctls with sysfs interface
  2010-08-09 17:26   ` Nitin Gupta
@ 2010-08-31 23:06     ` Dave Hansen
  -1 siblings, 0 replies; 137+ messages in thread
From: Dave Hansen @ 2010-08-31 23:06 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

On Mon, 2010-08-09 at 22:56 +0530, Nitin Gupta wrote:
> Creates per-device sysfs nodes in /sys/block/zram<id>/
> Currently following stats are exported:
>  - disksize
>  - num_reads
>  - num_writes
>  - invalid_io
>  - zero_pages
>  - orig_data_size
>  - compr_data_size
>  - mem_used_total
> 
> By default, disksize is set to 0. So, to start using
> a zram device, first write a disksize value and then
> initialize the device by writing any positive value to
> initstate. For example:
> 
>         # initialize /dev/zram0 with 50MB disksize
>         echo 50*1024*1024 | bc > /sys/block/zram0/disksize
>         echo 1 > /sys/block/zram0/initstate
> 
> When done using a disk, issue reset to free its memory
> by writing any positive value to reset node:
> 
>         echo 1 > /sys/block/zram0/reset

Maybe I'm just a weirdo, but I don't really use modules much.  That
effectively means that I'm stuck at boot with one zram device.

Making it a read-only module param also means that someone can't add a
second at runtime while the first is still in use.

It doesn't seem to be used very pervasively, but there is a
module_param_cb() function so you can register callbacks when the param
gets updated.  Might come in handy for this.
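
A sketch of how that could look for a hypothetical num_devices parameter
(the callback and variable names are made up; param_set_uint() and
param_get_uint() are the stock helpers):

#include <linux/moduleparam.h>

static unsigned int num_devices = 1;

static int num_devices_set(const char *val, const struct kernel_param *kp)
{
	/* a real handler would also create/destroy gendisks on resize */
	return param_set_uint(val, kp);
}

static const struct kernel_param_ops num_devices_ops = {
	.set	= num_devices_set,
	.get	= param_get_uint,
};

module_param_cb(num_devices, &num_devices_ops, &num_devices, 0644);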

-- Dave


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 00/10] zram: various improvements and cleanups
  2010-08-09 17:26 ` Nitin Gupta
@ 2010-09-01  3:32   ` Anton Blanchard
  -1 siblings, 0 replies; 137+ messages in thread
From: Anton Blanchard @ 2010-09-01  3:32 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel


Hi Nitin,

I gave zram a try on a ppc64 box with a 64kB PAGE_SIZE. It looks like the
xvmalloc allocator fails when we add in a large enough block (in this case
65532 bytes).

flindex ends up as 127, which is larger than BITS_PER_LONG. We continually call
grow_block inside find_block and fail:

zram: Error allocating memory for compressed page: 0, size=467
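
Roughly what seems to go wrong, reconstructed from the report above (the
index math is simplified and the constants are assumptions, not xvmalloc
verbatim):

/*
 * find_block() maps a request to a two-level freelist index:
 *
 *	slindex = second-level list for the (rounded-up) size;
 *	flindex = slindex / BITS_PER_LONG;
 *
 * The first-level bitmap is a single unsigned long, so flindex must
 * stay below BITS_PER_LONG.  The tables are sized with 4kB pages in
 * mind; with 64kB pages a near-PAGE_SIZE block (65532 bytes here)
 * pushes slindex high enough that flindex = 127 >= 64, the bitmap
 * test can never succeed, and find_block() keeps calling grow_block()
 * until the allocation fails.
 */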

Anton

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 03/10] Use percpu stats
  2010-08-31  5:36     ` Anton Blanchard
@ 2010-09-01  3:41       ` Anton Blanchard
  -1 siblings, 0 replies; 137+ messages in thread
From: Anton Blanchard @ 2010-09-01  3:41 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

 
Hi Nitin,

> There doesn't seem to be a free_percpu() in the module exit path. Something
> like this perhaps?

Sorry, I was having other zram issues on ppc64 that prevented me from testing
this. Stupid bug; removed the &.

Anton

zram: Free percpu data on module exit.

Signed-off-by: Anton Blanchard <anton@samba.org>
---

Index: powerpc.git/drivers/staging/zram/zram_drv.c
===================================================================
--- powerpc.git.orig/drivers/staging/zram/zram_drv.c	2010-08-31 15:15:59.344290847 +1000
+++ powerpc.git/drivers/staging/zram/zram_drv.c	2010-09-01 12:35:02.964893575 +1000
@@ -483,8 +483,7 @@ void zram_reset_device(struct zram *zram
 	xv_destroy_pool(zram->mem_pool);
 	zram->mem_pool = NULL;
 
-	/* Reset stats */
-	memset(&zram->stats, 0, sizeof(zram->stats));
+	free_percpu(zram->stats);
 
 	zram->disksize = zram_default_disksize();
 	mutex_unlock(&zram->init_lock);
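
For context, a sketch of the alloc/free pairing this hunk restores
(assuming patch 3/10 made zram->stats a percpu pointer; the struct name
is illustrative):

/* device init */
zram->stats = alloc_percpu(struct zram_stats_cpu);
if (!zram->stats)
	return -ENOMEM;

/* ... per-cpu updates during the device's lifetime ... */

/* device reset / module exit */
free_percpu(zram->stats);
zram->stats = NULL;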

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 03/10] Use percpu stats
  2010-08-09 17:26   ` Nitin Gupta
@ 2010-09-01  3:51     ` Anton Blanchard
  -1 siblings, 0 replies; 137+ messages in thread
From: Anton Blanchard @ 2010-09-01  3:51 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel


Hi,

> Also remove references to removed stats (ex: good_compress).

I'm getting an oops when running mkfs on zram:

NIP [d0000000030e0340] .zram_inc_stat+0x58/0x84 [zram]
[c00000006d58f720] [d0000000030e091c] .zram_make_request+0xa8/0x6a0 [zram]
[c00000006d58f840] [c00000000035795c] .generic_make_request+0x390/0x434
[c00000006d58f950] [c000000000357b14] .submit_bio+0x114/0x140
[c00000006d58fa20] [c000000000361778] .blkdev_issue_discard+0x1ac/0x250
[c00000006d58fb10] [c000000000361f68] .blkdev_ioctl+0x358/0x7fc
[c00000006d58fbd0] [c0000000001c1c1c] .block_ioctl+0x6c/0x90
[c00000006d58fc70] [c0000000001984c4] .do_vfs_ioctl+0x660/0x6d4
[c00000006d58fd70] [c0000000001985a0] .SyS_ioctl+0x68/0xb0

Since disksize no longer starts as 0, it looks like we can call
zram_make_request before the device has been initialised. The patch below
fixes the immediate problem but this would go away if we move the
initialisation function elsewhere (as suggested in another thread).

Signed-off-by: Anton Blanchard <anton@samba.org>
---

Index: powerpc.git/drivers/staging/zram/zram_drv.c
===================================================================
--- powerpc.git.orig/drivers/staging/zram/zram_drv.c	2010-09-01 12:35:14.286515175 +1000
+++ powerpc.git/drivers/staging/zram/zram_drv.c	2010-09-01 12:35:24.167930504 +1000
@@ -441,6 +441,12 @@ static int zram_make_request(struct requ
 	int ret = 0;
 	struct zram *zram = queue->queuedata;
 
+	if (unlikely(!zram->init_done)) {
+		set_bit(BIO_UPTODATE, &bio->bi_flags);
+		bio_endio(bio, 0);
+		return 0;
+	}
+
 	if (unlikely(!valid_io_request(zram, bio))) {
 		zram_inc_stat(zram, ZRAM_STAT_INVALID_IO);
 		bio_io_error(bio);

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 03/10] Use percpu stats
  2010-08-31 21:41             ` Eric Dumazet
@ 2010-09-01 20:05               ` Christoph Lameter
  -1 siblings, 0 replies; 137+ messages in thread
From: Christoph Lameter @ 2010-09-01 20:05 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Nitin Gupta, Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

On Tue, 31 Aug 2010, Eric Dumazet wrote:

> > > Even for a single counter, this_cpu_read(64bit) is not using an RMW
> > > (cmpxchg8b) instruction, so you can get very strange results when the
> > > low-order 32bit wraps.
> >
> > How about fixing it so that everyone benefits?
> >
>
> IMHO, this_cpu_read() is fine as is: a _read_ operation.
>
> Don't pretend it can be used in every context; it's not true.

The problem only exists on 32 bit platforms using 64 bit counters. If you
would provide this functionality for the fallback case of 64 bit counters
(here x86) in 32 bit arch code then you could use the this_cpu_*
operations in all contexts without your special code being replicated in
other places.

The additional advantage would be that for the 64bit case you would have
much faster and more compact code.



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 03/10] Use percpu stats
  2010-09-01 20:05               ` Christoph Lameter
@ 2010-09-01 20:38                 ` Eric Dumazet
  -1 siblings, 0 replies; 137+ messages in thread
From: Eric Dumazet @ 2010-09-01 20:38 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Nitin Gupta, Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

On Wednesday 1 September 2010 at 15:05 -0500, Christoph Lameter wrote:

> The problem only exists on 32 bit platforms using 64 bit counters. If you
> would provide this functionality for the fallback case of 64 bit counters
> (here x86) in 32 bit arch code then you could use the this_cpu_*
> operations in all context without your special code being replicated in
> ohter places.
> 
> The additional advantage would be that for the 64bit case you would have
> much faster and more compact code.
> 
> 

My implementation is portable and uses existing infrastructure, at the
time it was coded. BTW, it's fast on 64bit too, as fast as the previous
implementation. No extra code added. Please double-check.

If you believe you can do better, please do so.

Of course, we added 64bit network stats to all 32bit arches only because
cost was acceptable. (I say all 32bit arches, because you seem to think
only x86 was the target)

Using a this_cpu_{add|read}() fallback with atomic ops or spinlocks would
be slower than the actual implementation (smp_wmb() (a nop on x86) and
increments).




^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 03/10] Use percpu stats
  2010-09-01 20:38                 ` Eric Dumazet
@ 2010-09-02  0:34                   ` Christoph Lameter
  -1 siblings, 0 replies; 137+ messages in thread
From: Christoph Lameter @ 2010-09-02  0:34 UTC (permalink / raw)
  To: Eric Dumazet
  Cc: Nitin Gupta, Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

On Wed, 1 Sep 2010, Eric Dumazet wrote:

> > The additional advantage would be that for the 64bit case you would have
> > much faster and more compact code.
>
> My implementation is portable and uses existing infrastructure, at the
> time it was coded. BTW, it's fast on 64bit too, as fast as the previous
> implementation. No extra code added. Please double-check.

Well, if you use this_cpu_add() on 64 bit you have a single instruction
without the preempt off/on dance.
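
For instance, with a plain percpu counter (an illustrative sketch, not
zram's actual fields), the fast path is just:

#include <linux/percpu.h>

static DEFINE_PER_CPU(u64, nr_writes);

static inline void count_write(void)
{
	/* on x86-64: one %gs-prefixed add, no preempt_disable() */
	this_cpu_inc(nr_writes);
}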

> If you believe you can do better, please do so.

It's right there in the x86_64 implementation of the this_cpu_ ops.

> Of course, we added 64bit network stats to all 32bit arches only because
> cost was acceptable. (I say all 32bit arches, because you seem to think
> only x86 was the target)

Ok then add your 64 bit on 32 bit implementation as a default for all 32
bit configurations. The point is to not have special implementations for
particular counters. It would be great if the logic you developed for the
network counters could be used in general for all who have issues with 64
bit counter increments.

> Using a this_cpu_{add|read}() fallback with atomic ops or spinlocks would
> be slower than the actual implementation (smp_wmb() (a nop on x86) and
> increments).

That would be bad. But then I do not see anyone around that wants
to implement such fallback behavior.

The current generic fallback for 64 bit ops (if the arch does not provide
it, which 32 bit arches generally do not) is to switch off preempt and
then perform the 64 bit op. There is an irqsafe version as well. Have a
look at include/linux/percpu.h.
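
That fallback is, much simplified, the following shape (a sketch, not
the verbatim percpu.h macro):

/* generic this_cpu_add() fallback: disable preemption, do the
 * (possibly 64bit) op on this cpu's copy, re-enable.
 */
#define my_this_cpu_add(pcp, val)			\
do {							\
	preempt_disable();				\
	*this_cpu_ptr(&(pcp)) += (val);			\
	preempt_enable();				\
} while (0)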




^ permalink raw reply	[flat|nested] 137+ messages in thread

* OOM panics with zram
  2010-08-09 17:26 ` Nitin Gupta
@ 2010-09-09 17:24   ` Dave Hansen
  -1 siblings, 0 replies; 137+ messages in thread
From: Dave Hansen @ 2010-09-09 17:24 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

Hi Nitin,

I've been playing with using zram (from -staging) to back some qemu
guest memory directly.  Basically mmap()'ing the device in instead of
using anonymous memory.  The old code with the backing swap devices
seemed to work pretty well, but I'm running into a problem with the new
code.
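
(For reference, the setup is roughly the userspace sketch below; the
device path and error handling are assumptions:)

#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>

static void *map_zram_backing(size_t guest_size)
{
	int fd = open("/dev/zram0", O_RDWR);
	void *mem;

	if (fd < 0)
		return NULL;
	/* MAP_SHARED so guest stores land in the compressed store,
	 * not in private anonymous copies */
	mem = mmap(NULL, guest_size, PROT_READ | PROT_WRITE,
		   MAP_SHARED, fd, 0);
	return mem == MAP_FAILED ? NULL : mem;
}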

I have plenty of swap on the system, and I'd been running with compcache
nicely for a while.  But, I went to go tar up (and gzip) a pretty large
directory in my qemu guest.  It panic'd the qemu host system:

[703826.003126] Kernel panic - not syncing: Out of memory and no killable processes...
[703826.003127] 
[703826.012350] Pid: 25508, comm: cat Not tainted 2.6.36-rc3-00114-g9b9913d #29
[703826.019385] Call Trace:
[703826.021928]  [<ffffffff8104032a>] panic+0xba/0x1e0
[703826.026801]  [<ffffffff810bb4a1>] ? next_online_pgdat+0x21/0x50
[703826.032799]  [<ffffffff810a7713>] ? find_lock_task_mm+0x23/0x60
[703826.038795]  [<ffffffff810a79ab>] ? dump_header+0x19b/0x1b0
[703826.044446]  [<ffffffff810a8157>] out_of_memory+0x297/0x2d0
[703826.050098]  [<ffffffff810abbaf>] __alloc_pages_nodemask+0x72f/0x740
[703826.056528]  [<ffffffff81110d4e>] ? __set_page_dirty+0x6e/0xc0
[703826.062438]  [<ffffffff810da477>] alloc_pages_current+0x87/0xd0
[703826.068438]  [<ffffffff810a533b>] __page_cache_alloc+0xb/0x10
[703826.074263]  [<ffffffff810ae2ff>] __do_page_cache_readahead+0xdf/0x220
[703826.080865]  [<ffffffff810ae45c>] ra_submit+0x1c/0x20
[703826.085998]  [<ffffffff810ae5f8>] ondemand_readahead+0xa8/0x1d0
[703826.091994]  [<ffffffff810ae797>] page_cache_async_readahead+0x77/0xc0
[703826.098595]  [<ffffffff810a6489>] generic_file_aio_read+0x259/0x6d0
[703826.104941]  [<ffffffff810eac21>] do_sync_read+0xd1/0x110
[703826.110418]  [<ffffffff810eb3f6>] vfs_read+0xc6/0x170
[703826.115547]  [<ffffffff810eb860>] sys_read+0x50/0x90
[703826.120591]  [<ffffffff81002c2b>] system_call_fastpath+0x16/0x1b

I have the feeling that the compcache device all of a sudden lost its
efficiency.  It can't do much about having non-compressible data stuck
in it, of course.

But, it used to be able to write things out to backing storage.  It
tries to return I/O errors when it runs out of space, but my system
didn't get that far.  It panic'd before it got the chance.

This seems like an issue that will probably crop up when we use zram as
a swap device too.  A panic seems like pretty undesirable behavior when
you've simply changed the kind of data being used.  Have you run into
this at all?

-- Dave


^ permalink raw reply	[flat|nested] 137+ messages in thread

* [patch -rc] oom: always return a badness score of non-zero for eligible tasks
  2010-09-09 17:24   ` Dave Hansen
@ 2010-09-09 19:07     ` David Rientjes
  -1 siblings, 0 replies; 137+ messages in thread
From: David Rientjes @ 2010-09-09 19:07 UTC (permalink / raw)
  To: Andrew Morton, Dave Hansen
  Cc: Nitin Gupta, Pekka Enberg, Minchan Kim, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

On Thu, 9 Sep 2010, Dave Hansen wrote:

> Hi Nitin,
> 
> I've been playing with using zram (from -staging) to back some qemu
> guest memory directly.  Basically mmap()'ing the device in instead of
> using anonymous memory.  The old code with the backing swap devices
> seemed to work pretty well, but I'm running into a problem with the new
> code.
> 
> I have plenty of swap on the system, and I'd been running with compcache
> nicely for a while.  But, I went to go tar up (and gzip) a pretty large
> directory in my qemu guest.  It panic'd the qemu host system:
> 
> [703826.003126] Kernel panic - not syncing: Out of memory and no killable processes...
> [703826.003127] 
> [703826.012350] Pid: 25508, comm: cat Not tainted 2.6.36-rc3-00114-g9b9913d #29

I'm curious why there are no killable processes on the system; it seems 
like the triggering task here, cat, would at least be killable itself.  
Could you post the tasklist dump that precedes this (or, if you've 
disabled it try echo 1 > /proc/sys/vm/oom_dump_tasks first)?

It's possible that if you have enough swap that none of the eligible tasks 
actually have non-zero badness scores either because they are being run as 
root or because the amount of RAM or swap is sufficiently high such that 
(task's rss + swap) / (total rss + swap) is never non-zero.  And, since 
root tasks have a 3% bonus, it's possible these are all root tasks and no 
single task uses more than 3% of rss and swap.
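
(For scale, using the machine size reported later in this thread: with
~64GB of RAM plus ~30GB of swap, one point of badness is 0.1% of ~94GB,
so a task needs roughly 95MB of rss+swap before it scores at all, and
the 3% root bonus swallows anything under roughly 2.8GB.)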

While this may not be the issue in your case, and can be confirmed with 
the tasklist dump if you can get it, we need to protect against these 
situations where eligible tasks may not be killed.

Andrew, I'd like to propose this patch for 2.6.36-rc-series since the 
worst case is that the machine will panic if there are an exceptionally 
large number of tasks, each with little memory usage at the time of oom.



oom: always return a badness score of non-zero for eligible tasks

A task's badness score is roughly a proportion of its rss and swap
compared to the system's capacity.  The scale ranges from 0 to 1000 with
the highest score chosen for kill.  Thus, this scale operates on a
resolution of 0.1% of RAM + swap.  Admin tasks are also given a 3% bonus,
so the badness score of an admin task using 3% of memory, for example,
would still be 0.

It's possible that an exceptionally large number of tasks will combine to 
exhaust all resources but never have a single task that uses more than
0.1% of RAM and swap (or 3.0% for admin tasks).

This patch ensures that the badness score of any eligible task is never 0
so the machine doesn't unnecessarily panic because it cannot find a task
to kill.

Signed-off-by: David Rientjes <rientjes@google.com>
---
 mm/oom_kill.c |    9 +++++++--
 1 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/mm/oom_kill.c b/mm/oom_kill.c
--- a/mm/oom_kill.c
+++ b/mm/oom_kill.c
@@ -208,8 +208,13 @@ unsigned int oom_badness(struct task_struct *p, struct mem_cgroup *mem,
 	 */
 	points += p->signal->oom_score_adj;
 
-	if (points < 0)
-		return 0;
+	/*
+	 * Never return 0 for an eligible task that may be killed since it's
+	 * possible that no single user task uses more than 0.1% of memory and
+	 * no single admin task uses more than 3.0%.
+	 */
+	if (points <= 0)
+		return 1;
 	return (points < 1000) ? points : 1000;
 }
 

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [patch -rc] oom: always return a badness score of non-zero for eligible tasks
  2010-09-09 19:07     ` David Rientjes
@ 2010-09-09 19:48       ` Dave Hansen
  -1 siblings, 0 replies; 137+ messages in thread
From: Dave Hansen @ 2010-09-09 19:48 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Nitin Gupta, Pekka Enberg, Minchan Kim, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

On Thu, 2010-09-09 at 12:07 -0700, David Rientjes wrote:
> On Thu, 9 Sep 2010, Dave Hansen wrote:
> 
> > Hi Nitin,
> > 
> > I've been playing with using zram (from -staging) to back some qemu
> > guest memory directly.  Basically mmap()'ing the device in instead of
> > using anonymous memory.  The old code with the backing swap devices
> > seemed to work pretty well, but I'm running into a problem with the new
> > code.
> > 
> > I have plenty of swap on the system, and I'd been running with compcache
> > nicely for a while.  But, I went to go tar up (and gzip) a pretty large
> > directory in my qemu guest.  It panic'd the qemu host system:
> > 
> > [703826.003126] Kernel panic - not syncing: Out of memory and no killable processes...
> > [703826.003127] 
> > [703826.012350] Pid: 25508, comm: cat Not tainted 2.6.36-rc3-00114-g9b9913d #29
> 
> I'm curious why there are no killable processes on the system; it seems 
> like the triggering task here, cat, would at least be killable itself.  
> Could you post the tasklist dump that precedes this (or, if you've 
> disabled it try echo 1 > /proc/sys/vm/oom_dump_tasks first)?

That was one odd part here.  I didn't disable the tasklist dump, and
there was none in the dump.

> It's possible that if you have enough swap that none of the eligible tasks 
> actually have non-zero badness scores either because they are being run as 
> root or because the amount of RAM or swap is sufficiently high such that 
> (task's rss + swap) / (total rss + swap) is never non-zero.  And, since 
> root tasks have a 3% bonus, it's possible these are all root tasks and no 
> single task uses more than 3% of rss and swap.

It's a 64GB machine with ~30GB of swap and very little RSS.  Your
hypothesis seems correct.  Just grepping through /proc/[0-9]*/oom_score
shows nothing other than 0's.

Trying this again, I just hung the system instead of OOM'ing straight
away like last time.

Your patch makes a lot of sense to me in any case where there aren't
large-RSS tasks around using memory.  That definitely applies here
because of the amount in the compcache store and might also apply with
ramfs and hugetlbfs.

-- Dave


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [patch -rc] oom: always return a badness score of non-zero for eligible tasks
  2010-09-09 19:48       ` Dave Hansen
@ 2010-09-09 21:00         ` David Rientjes
  -1 siblings, 0 replies; 137+ messages in thread
From: David Rientjes @ 2010-09-09 21:00 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andrew Morton, Nitin Gupta, Pekka Enberg, Minchan Kim, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

On Thu, 9 Sep 2010, Dave Hansen wrote:

> > I'm curious why there are no killable processes on the system; it seems 
> > like the triggering task here, cat, would at least be killable itself.  
> > Could you post the tasklist dump that precedes this (or, if you've 
> > disabled it try echo 1 > /proc/sys/vm/oom_dump_tasks first)?
> 
> That was one odd part here.  I didn't disable the tasklist dump, and
> there was none in the dump.
> 

Hmm, could you verify that /proc/sys/vm/oom_dump_tasks is set?  Perhaps it's 
getting cleared by something else before you use zram.  The sysctl should 
default to on as of 2.6.36-rc1.

> > It's possible that if you have enough swap that none of the eligible tasks 
> > actually have non-zero badness scores either because they are being run as 
> > root or because the amount of RAM or swap is sufficiently high such that 
> > (task's rss + swap) / (total rss + swap) is never non-zero.  And, since 
> > root tasks have a 3% bonus, it's possible these are all root tasks and no 
> > single task uses more than 3% of rss and swap.
> 
> It's a 64GB machine with ~30GB of swap and very little RSS.  Your
> hypothesis seems correct.  Just grepping through /proc/[0-9]*/oom_score
> shows nothing other than 0's.
> 

Presumably you're not using a large amount of swap, either, or that would 
be accounted for in oom_score.

> Trying this again, I just hung the system instead of OOM'ing straight
> away like last time.
> 

With the patch, you should still be calling the oom killer, and instead of 
panicking it will go on a serial killing spree because everything that it 
wasn't judging as a candidate before (oom_score of 0) now is if it's truly 
killable (oom_score of 1).  The patch is definitely needed for correctness 
since an oom_score of 0 implies the task is unkillable.

We're apparently hanging in the exit path for the oom killed task or 
something is constantly respawning threads that repeatedly get killed.  It 
appears as though nothing is actually a worthwhile target for the oom 
killer, however, and this is a bad configuration.

> Your patch makes a lot of sense to me in any case where there aren't
> large-RSS tasks around using memory.  That definitely applies here
> because of the amount in the compcache store and might also apply with
> ramfs and hugetlbfs.
> 

Agreed, we'll need to address hugepages specifically because they don't 
get accounted for in rss but do free memory when the task is killed.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [patch -rc] oom: always return a badness score of non-zero for eligible tasks
  2010-09-09 21:00         ` David Rientjes
@ 2010-09-09 21:10           ` Dave Hansen
  -1 siblings, 0 replies; 137+ messages in thread
From: Dave Hansen @ 2010-09-09 21:10 UTC (permalink / raw)
  To: David Rientjes
  Cc: Andrew Morton, Nitin Gupta, Pekka Enberg, Minchan Kim, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

On Thu, 2010-09-09 at 14:00 -0700, David Rientjes wrote:
> On Thu, 9 Sep 2010, Dave Hansen wrote:
> > > I'm curious why there are no killable processes on the system; it seems 
> > > like the triggering task here, cat, would at least be killable itself.  
> > > Could you post the tasklist dump that preceeds this (or, if you've 
> > > disabled it try echo 1 > /proc/sys/vm/oom_dump_tasks first)?
> > 
> > That was one odd part here.  I didn't disable the tasklist dump, and
> > there was none in the dump.
> 
> Hmm, could you verify that /proc/sys/vm/oom_dump_tasks is set?  Perhaps it's 
> getting cleared by something else before you use zram.  The sysctl should 
> default to on as of 2.6.36-rc1.

I double-checked.  It defaults to on and remains that way.

> > > It's possible that if you have enough swap that none of the eligible tasks 
> > > actually have non-zero badness scores either because they are being run as 
> > > root or because the amount of RAM or swap is sufficiently high such that 
> > > (task's rss + swap) / (total rss + swap) is never non-zero.  And, since 
> > > root tasks have a 3% bonus, it's possible these are all root tasks and no 
> > > single task uses more than 3% of rss and swap.
> > 
> > It's a 64GB machine with ~30GB of swap and very little RSS.  Your
> > hypothesis seems correct.  Just grepping through /proc/[0-9]*/oom_score
> > shows nothing other than 0's.
> 
> Presumably you're not using a large amount of swap, either, or that would 
> be accounted for in oom_score.

Nope.  There's very little happening on the system except for me toying
with the compcache device.

> > Trying this again, I just hung the system instead of OOM'ing straight
> > away like last time.
> 
> With the patch, you should still be calling the oom killer, and instead of 
> panicking it will go on a serial killing spree: everything it wasn't 
> judging as a candidate before (oom_score of 0) now is, if it's truly 
> killable (oom_score of 1).  The patch is definitely needed for correctness 
> since an oom_score of 0 implies the task is unkillable.
> 
> We're apparently hanging in the exit path for the oom killed task or 
> something is constantly respawning threads that repeatedly get killed.  It 
> appears as though nothing is actually a worthwhile target for the oom 
> killer, however, and this is a bad configuration.

I'll give the patch a shot and see if I get any better behavior.  But, I
really do think the root cause here is compcache exhausting the system
when you feed incompressible pages into it.  We can kill all the tasks
we want, but I think it'll continue to gobble memory up as fast as we
free it.

We either need to put some upper bounds on the amount of memory that
compcache uses for its backing store, or reintroduce the code that lets
it fall back to swap.
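
A minimal sketch of the first option, assuming a hypothetical per-device
cap ('mem_limit' and 'compr_size' are made-up fields here, not part of
the driver in this series):

	/* hypothetical guard: refuse new pages once the store hits a cap */
	static int zram_check_limit(struct zram *zram, size_t clen)
	{
		if (zram->compr_size + clen > zram->mem_limit)
			return -ENOMEM;	/* caller fails the write instead */
		return 0;
	}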

> > Your patch makes a lot of sense to me in any case where there aren't
> > large-RSS tasks around using memory.  That definitely applies here
> > because of the amount in the compcache store and might also apply with
> > ramfs and hugetlbfs.
> > 
> 
> Agreed, we'll need to address hugepages specifically because they don't 
> get accounted for in rss but do free memory when the task is killed.

They do sometimes.  But, if they're preallocated, or stuck in a linked
file on the filesystem, killing the task doesn't do any good.

-- Dave


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [patch -rc] oom: always return a badness score of non-zero for eligible tasks
  2010-09-09 21:10           ` Dave Hansen
@ 2010-09-09 21:40             ` David Rientjes
  -1 siblings, 0 replies; 137+ messages in thread
From: David Rientjes @ 2010-09-09 21:40 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Andrew Morton, Nitin Gupta, Pekka Enberg, Minchan Kim, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

On Thu, 9 Sep 2010, Dave Hansen wrote:

> > Hmm, could you very that /proc/sys/vm/oom_dump_tasks is set?  Perhaps it's 
> > getting cleared by something else before you use zram.  The sysctl should 
> > default to on as of 2.6.36-rc1.
> 
> I double-checked.  It defaults to on and remains that way.
> 

Ok, I assume you aren't getting the typical "cat invoked oom-killer..." 
message, the memory state dump, etc., either, so there's something strange 
with your log level such that nothing under KERN_WARNING is getting 
through or you can't access the actual kernel log due to the panic.  I can 
capture all that information with a netdump on panic with 2.6.36-rc3.

> I'll give the patch a shot and see if I get any better behavior.  But, I
> really do think the root cause here is compcache exhausting the system
> when you feed incompressible pages into it.  We can kill all the tasks
> we want, but I think it'll continue to gobble memory up as fast as we
> free it.
> 

That certainly seems to be the case and is the true topic of this thread, 
so I don't want to hijack it any further since it's outside the scope of 
the oom killer :)

But I'm still curious as to why the machine is hanging and not eventually 
panicking when we run out of killable tasks.  It seems as though something 
is hanging in the exit path, meaning memory reserves aren't even safe from 
compcache, or there's something wrong in the oom killer retry logic, or 
you're simply forking more tasks, perhaps as a response to threads getting 
killed by the kernel, than we can kill.

We'd certainly prefer to panic the machine if no work is getting done 
rather than simply kill everything that gets forked.  The problem before 
was that we panicked too early, before we had killed anything; now we 
don't know when to panic appropriately.

> > Agreed, we'll need to address hugepages specifically because they don't 
> > get accounted for in rss but do free memory when the task is killed.
> 
> They do sometimes.  But, if they're preallocated, or stuck in a linked
> file on the filesystem, killing the task doesn't do any good.
> 

Indeed you're right, I meant s/hugepages/transparent hugepages/, sorry.  
It appears as though they get included in the rss of the allocating task, 
though, via MM_ANONPAGES, so this is already represented in the task's 
badness score.

Thanks for trying the patch out, Dave; I hope we can add your Tested-by 
line and get it pushed into the rc series.

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: [PATCH 03/10] Use percpu stats
  2010-09-01  3:51     ` Anton Blanchard
@ 2010-09-17 20:59       ` Andrew Morton
  -1 siblings, 0 replies; 137+ messages in thread
From: Andrew Morton @ 2010-09-17 20:59 UTC (permalink / raw)
  To: Anton Blanchard
  Cc: Nitin Gupta, Pekka Enberg, Minchan Kim, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

On Wed, 1 Sep 2010 13:51:35 +1000
Anton Blanchard <anton@samba.org> wrote:

> 
> Hi,
> 
> > Also remove references to removed stats (e.g., good_compress).
> 
> I'm getting an oops when running mkfs on zram:
> 
> NIP [d0000000030e0340] .zram_inc_stat+0x58/0x84 [zram]
> [c00000006d58f720] [d0000000030e091c] .zram_make_request+0xa8/0x6a0 [zram]
> [c00000006d58f840] [c00000000035795c] .generic_make_request+0x390/0x434
> [c00000006d58f950] [c000000000357b14] .submit_bio+0x114/0x140
> [c00000006d58fa20] [c000000000361778] .blkdev_issue_discard+0x1ac/0x250
> [c00000006d58fb10] [c000000000361f68] .blkdev_ioctl+0x358/0x7fc
> [c00000006d58fbd0] [c0000000001c1c1c] .block_ioctl+0x6c/0x90
> [c00000006d58fc70] [c0000000001984c4] .do_vfs_ioctl+0x660/0x6d4
> [c00000006d58fd70] [c0000000001985a0] .SyS_ioctl+0x68/0xb0
> 
> Since disksize no longer starts as 0 it looks like we can call
> zram_make_request before the device has been initialised. The patch below
> fixes the immediate problem but this would go away if we move the
> initialisation function elsewhere (as suggested in another thread).
> 
> Signed-off-by: Anton Blanchard <anton@samba.org>
> ---
> 
> Index: powerpc.git/drivers/staging/zram/zram_drv.c
> ===================================================================
> --- powerpc.git.orig/drivers/staging/zram/zram_drv.c	2010-09-01 12:35:14.286515175 +1000
> +++ powerpc.git/drivers/staging/zram/zram_drv.c	2010-09-01 12:35:24.167930504 +1000
> @@ -441,6 +441,12 @@ static int zram_make_request(struct requ
>  	int ret = 0;
>  	struct zram *zram = queue->queuedata;
>  
> +	if (unlikely(!zram->init_done)) {
> +		set_bit(BIO_UPTODATE, &bio->bi_flags);
> +		bio_endio(bio, 0);
> +		return 0;
> +	}
> +
>  	if (unlikely(!valid_io_request(zram, bio))) {
>  		zram_inc_stat(zram, ZRAM_STAT_INVALID_IO);
>  		bio_io_error(bio);

So... what happened with this and your other bugfix in
zram_reset_device()?

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: OOM panics with zram
  2010-09-09 17:24   ` Dave Hansen
@ 2010-10-03 18:41     ` Nitin Gupta
  -1 siblings, 0 replies; 137+ messages in thread
From: Nitin Gupta @ 2010-10-03 18:41 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel

Hi Dave,

Sorry for the late reply.  For the last month I haven't had a chance
to work on this project.

On 9/9/2010 1:24 PM, Dave Hansen wrote:
> 
> I've been playing with using zram (from -staging) to back some qemu
> guest memory directly.  Basically mmap()'ing the device in instead of
> using anonymous memory.  The old code with the backing swap devices
> seemed to work pretty well, but I'm running into a problem with the new
> code.
> 
> I have plenty of swap on the system, and I'd been running with compcache
> nicely for a while.  But, I went to go tar up (and gzip) a pretty large
> directory in my qemu guest.  It panic'd the qemu host system:
> 
> [703826.003126] Kernel panic - not syncing: Out of memory and no killable processes...
> [703826.003127] 
> [703826.012350] Pid: 25508, comm: cat Not tainted 2.6.36-rc3-00114-g9b9913d #29
> [703826.019385] Call Trace:
> [703826.021928]  [<ffffffff8104032a>] panic+0xba/0x1e0
> [703826.026801]  [<ffffffff810bb4a1>] ? next_online_pgdat+0x21/0x50
> [703826.032799]  [<ffffffff810a7713>] ? find_lock_task_mm+0x23/0x60
> [703826.038795]  [<ffffffff810a79ab>] ? dump_header+0x19b/0x1b0
> [703826.044446]  [<ffffffff810a8157>] out_of_memory+0x297/0x2d0
> [703826.050098]  [<ffffffff810abbaf>] __alloc_pages_nodemask+0x72f/0x740
> [703826.056528]  [<ffffffff81110d4e>] ? __set_page_dirty+0x6e/0xc0
> [703826.062438]  [<ffffffff810da477>] alloc_pages_current+0x87/0xd0
> [703826.068438]  [<ffffffff810a533b>] __page_cache_alloc+0xb/0x10
> [703826.074263]  [<ffffffff810ae2ff>] __do_page_cache_readahead+0xdf/0x220
> [703826.080865]  [<ffffffff810ae45c>] ra_submit+0x1c/0x20
> [703826.085998]  [<ffffffff810ae5f8>] ondemand_readahead+0xa8/0x1d0
> [703826.091994]  [<ffffffff810ae797>] page_cache_async_readahead+0x77/0xc0
> [703826.098595]  [<ffffffff810a6489>] generic_file_aio_read+0x259/0x6d0
> [703826.104941]  [<ffffffff810eac21>] do_sync_read+0xd1/0x110
> [703826.110418]  [<ffffffff810eb3f6>] vfs_read+0xc6/0x170
> [703826.115547]  [<ffffffff810eb860>] sys_read+0x50/0x90
> [703826.120591]  [<ffffffff81002c2b>] system_call_fastpath+0x16/0x1b
> 
> I have the feeling that the compcache device all of a sudden lost its
> efficiency.  It can't do much about having non-compressible data stuck
> in it, of course.
> 
> But, it used to be able to write things out to backing storage.  It
> tries to return I/O errors when it runs out of space, but my system
> didn't get that far.  It panic'd before it got the chance.
> 
> This seems like an issue that will probably crop up when we use zram as
> a swap device too.  A panic seems like pretty undesirable behavior when
> you've simply changed the kind of data being used.  Have you run into
> this at all?
> 
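
A minimal userspace sketch of the setup described above, i.e. mapping
the zram device directly rather than using anonymous memory (illustrative
only; the device name and sizing are assumptions):

	#include <fcntl.h>
	#include <sys/mman.h>
	#include <unistd.h>

	void *map_zram(size_t size)
	{
		int fd = open("/dev/zram0", O_RDWR);
		if (fd < 0)
			return NULL;

		/* MAP_SHARED so stores reach the (compressed) device */
		void *mem = mmap(NULL, size, PROT_READ | PROT_WRITE,
				 MAP_SHARED, fd, 0);
		close(fd);	/* the mapping holds its own reference */
		return mem == MAP_FAILED ? NULL : mem;
	}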


The ability to write out zram (compressed) memory to a backing disk
seems really useful.  However, considering the lkml reviews, I had to
drop this feature.  Anyway, I will try to push it again.

Also, please do not use the linux-next/mainline version of compcache.
Instead, use the version in the project repository:
hg clone https://compcache.googlecode.com/hg/ compcache 

This is updated much more frequently and has many more bug fixes than
the mainline.  It is also easier to fix bugs and add features quickly
in this repo than to send them through lkml, which can take a long
time.

Thanks,
Nitin

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: OOM panics with zram
  2010-10-03 18:41     ` Nitin Gupta
@ 2010-10-03 19:27       ` Dave Hansen
  -1 siblings, 0 replies; 137+ messages in thread
From: Dave Hansen @ 2010-10-03 19:27 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel, Greg KH - Meetings

On Sun, 2010-10-03 at 14:41 -0400, Nitin Gupta wrote:
> Ability to write out zram (compressed) memory to a backing disk seems
> really useful. However considering lkml reviews, I had to drop this
> feature. Anyways, I guess I will try to push this feature again.

I'd argue that zram is pretty useless without some ability to write to a
backing store, unless you *really* know what is going to be stored in it
and you trust the user.  Otherwise, it's just too easy to OOM the
system.

I've been investigating backing the xvmalloc space with a tmpfs file.
Instead of keeping page/offset pairs, you just keep a linear address
inside the tmpfile file.  There's an extra step needed to look up and
lock the page cache page into place each time you go into the xvmalloc
store, but it does seem to basically work.  The patches are really rough
and not quite functional, but I'm happy to share if you want to see them
now.
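
A rough sketch of that approach, assuming the in-kernel shmem interfaces
as they exist around 2.6.36 (shmem_file_setup() and find_or_create_page());
illustrative only, not the actual patches:

	#include <linux/mm.h>
	#include <linux/pagemap.h>

	/* one-time setup: create the tmpfs file backing the whole store */
	static struct file *create_store(loff_t size)
	{
		return shmem_file_setup("xvmalloc-store", size, 0);
	}

	/*
	 * Look up (allocating on demand) and lock the page cache page
	 * behind a linear offset into the store.  The caller must
	 * unlock_page() and page_cache_release() the page when done.
	 */
	static struct page *get_store_page(struct file *store,
					   unsigned long offset)
	{
		return find_or_create_page(store->f_mapping,
					   offset >> PAGE_SHIFT, GFP_KERNEL);
	}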

> Also, please do not use linux-next/mainline version of compcache. Instead
> just use version in the project repository here:
> hg clone https://compcache.googlecode.com/hg/ compcache 
> 
> This is updated much more frequently and has many more bug fixes over
> the mainline. It will also be easier to fix bugs/add features much more
> quickly in this repo rather than sending them to lkml which can take
> long time.

That looks like just a clone of the code needed to build the module.  

Kernel developers are pretty used to _some_ kernel tree being the
authoritative source.  Also, having it in a kernel tree makes it
possible to get testing in places like linux-next, and it makes it
easier for people to make patches or kernel trees on top of your work. 

There's not really a point to the code being in -staging if it isn't
somewhat up-to-date or people can't generate patches to it.  It sounds
to me like we need to take it out of -staging.

-- Dave


^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: OOM panics with zram
  2010-10-03 19:27       ` Dave Hansen
@ 2010-10-03 19:40         ` Nitin Gupta
  -1 siblings, 0 replies; 137+ messages in thread
From: Nitin Gupta @ 2010-10-03 19:40 UTC (permalink / raw)
  To: Dave Hansen
  Cc: Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel, Greg KH - Meetings

On 10/3/2010 3:27 PM, Dave Hansen wrote:
> On Sun, 2010-10-03 at 14:41 -0400, Nitin Gupta wrote:
>> Ability to write out zram (compressed) memory to a backing disk seems
>> really useful. However considering lkml reviews, I had to drop this
>> feature. Anyways, I guess I will try to push this feature again.
> 
> I'd argue that zram is pretty useless without some ability to write to a
> backing store, unless you *really* know what is going to be stored in it
> and you trust the user.  Otherwise, it's just too easy to OOM the
> system.
>
> I've been investigating backing the xvmalloc space with a tmpfs file.
> Instead of keeping page/offset pairs, you just keep a linear address
> inside the tmpfs file.  There's an extra step needed to look up and
> lock the page cache page into place each time you go into the xvmalloc
> store, but it does seem to basically work.  The patches are really rough
> and not quite functional, but I'm happy to share if you want to see them
> now.
>

Yes, I would be really interested in looking at them.  Thanks.

 
>> Also, please do not use linux-next/mainline version of compcache. Instead
>> just use version in the project repository here:
>> hg clone https://compcache.googlecode.com/hg/ compcache 
>>
>> This is updated much more frequently and has many more bug fixes over
>> the mainline. It will also be easier to fix bugs/add features much more
>> quickly in this repo rather than sending them to lkml which can take
>> long time.
> 
> That looks like just a clone of the code needed to build the module.  
> 
> Kernel developers are pretty used to _some_ kernel tree being the
> authoritative source.  Also, having it in a kernel tree makes it
> possible to get testing in places like linux-next, and it makes it
> easier for people to make patches or kernel trees on top of your work. 
> 
> There's not really a point to the code being in -staging if it isn't
> somewhat up-to-date or people can't generate patches to it.  It sounds
> to me like we need to take it out of -staging.
> 

I will try sending patches to sync the mainline and hg code (along with
some changes in the pipeline), or maybe just take it out of -staging and
send a fresh patch series.

Thanks,
Nitin

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: OOM panics with zram
  2010-10-03 19:40         ` Nitin Gupta
@ 2010-10-04 11:08           ` Ed Tomlinson
  -1 siblings, 0 replies; 137+ messages in thread
From: Ed Tomlinson @ 2010-10-04 11:08 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Dave Hansen, Pekka Enberg, Minchan Kim, Andrew Morton, Greg KH,
	Linux Driver Project, linux-mm, linux-kernel, Greg KH - Meetings

On Sunday 03 October 2010 15:40:55 Nitin Gupta wrote:
> On 10/3/2010 3:27 PM, Dave Hansen wrote:
> > On Sun, 2010-10-03 at 14:41 -0400, Nitin Gupta wrote:
> >> Ability to write out zram (compressed) memory to a backing disk seems
> >> really useful. However considering lkml reviews, I had to drop this
> >> feature. Anyways, I guess I will try to push this feature again.
> > 
> > I'd argue that zram is pretty useless without some ability to write to a
> > backing store, unless you *really* know what is going to be stored in it
> > and you trust the user.  Otherwise, it's just too easy to OOM the
> > system.
> >
> > I've been investigating backing the xvmalloc space with a tmpfs file.
> > Instead of keeping page/offset pairs, you just keep a linear address
> > inside the tmpfs file.  There's an extra step needed to look up and
> > lock the page cache page into place each time you go into the xvmalloc
> > store, but it does seem to basically work.  The patches are really rough
> > and not quite functional, but I'm happy to share if you want to see them
> > now.
> >
> 
> Yes, I would be really interested to look at them. Thanks.
> 
>  
> >> Also, please do not use linux-next/mainline version of compcache. Instead
> >> just use version in the project repository here:
> >> hg clone https://compcache.googlecode.com/hg/ compcache 
> >>
> >> This is updated much more frequently and has many more bug fixes over
> >> the mainline. It will also be easier to fix bugs/add features much more
> >> quickly in this repo rather than sending them to lkml which can take
> >> long time.
> > 
> > That looks like just a clone of the code needed to build the module.  
> > 
> > Kernel developers are pretty used to _some_ kernel tree being the
> > authoritative source.  Also, having it in a kernel tree makes it
> > possible to get testing in places like linux-next, and it makes it
> > easier for people to make patches or kernel trees on top of your work. 
> > 
> > There's not really a point to the code being in -staging if it isn't
> > somewhat up-to-date or people can't generate patches to it.  It sounds
> > to me like we need to take it out of -staging.
> > 
> 
> I will try sending patches to sync mainline and hg code (along with
> some changes in pipeline), or maybe just take it out of -staging and
> send fresh patch series.

Or move it to a git tree.  Then generating patches becomes trivial for all
of us, and keeping staging up to date becomes easier for you.

Ed

> Thanks,
> Nitin
> --
> To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> Please read the FAQ at  http://www.tux.org/lkml/
> 
> 

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: OOM panics with zram
  2010-10-03 18:41     ` Nitin Gupta
@ 2010-10-05 23:43       ` Greg KH
  -1 siblings, 0 replies; 137+ messages in thread
From: Greg KH @ 2010-10-05 23:43 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Dave Hansen, Pekka Enberg, Minchan Kim, Andrew Morton,
	Linux Driver Project, linux-mm, linux-kernel

On Sun, Oct 03, 2010 at 02:41:09PM -0400, Nitin Gupta wrote:
> Also, please do not use linux-next/mainline version of compcache. Instead
> just use version in the project repository here:
> hg clone https://compcache.googlecode.com/hg/ compcache 

What?  No, the reason we put this into the kernel was so that _everyone_
could work on it, including the original developers.  Going off and
doing development somewhere else just isn't ok.  Should I just delete
this driver from the staging tree as you don't seem to want to work with
the community at this point in time?

> This is updated much more frequently and has many more bug fixes over
> the mainline. It will also be easier to fix bugs/add features much more
> quickly in this repo rather than sending them to lkml which can take
> long time.

Yes, developing in your own sandbox can always be faster, but there is
no feedback loop.

sad,

greg k-h

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: OOM panics with zram
  2010-10-05 23:43       ` Greg KH
@ 2010-10-06  2:29         ` Nitin Gupta
  -1 siblings, 0 replies; 137+ messages in thread
From: Nitin Gupta @ 2010-10-06  2:29 UTC (permalink / raw)
  To: Greg KH
  Cc: Dave Hansen, Pekka Enberg, Minchan Kim, Andrew Morton,
	Linux Driver Project, linux-mm, linux-kernel

On 10/5/2010 7:43 PM, Greg KH wrote:
> On Sun, Oct 03, 2010 at 02:41:09PM -0400, Nitin Gupta wrote:
>> Also, please do not use linux-next/mainline version of compcache. Instead
>> just use version in the project repository here:
>> hg clone https://compcache.googlecode.com/hg/ compcache 
> 
> What?  No, the reason we put this into the kernel was so that _everyone_
> could work on it, including the original developers.  Going off and
> doing development somewhere else just isn't ok.  Should I just delete
> this driver from the staging tree as you don't seem to want to work with
> the community at this point in time?
>

Getting it out of -staging wasn't my intent.  The community is the reason
this project still exists.


>> This is updated much more frequently and has many more bug fixes over
>> the mainline. It will also be easier to fix bugs/add features much more
>> quickly in this repo rather than sending them to lkml which can take
>> long time.
> 
> Yes, developing in your own sandbox can always be faster, but there is
> no feedback loop.
> 

I was finding it really hard to find time to properly discuss each patch
on LKML, so I thought of shifting focus to the local project repository
and then going through proper reviews later.

Thanks,
Nitin

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: OOM panics with zram
  2010-10-06  2:29         ` Nitin Gupta
@ 2010-10-06  2:36           ` Greg KH
  -1 siblings, 0 replies; 137+ messages in thread
From: Greg KH @ 2010-10-06  2:36 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Dave Hansen, Pekka Enberg, Minchan Kim, Andrew Morton,
	Linux Driver Project, linux-mm, linux-kernel

On Tue, Oct 05, 2010 at 10:29:34PM -0400, Nitin Gupta wrote:
> On 10/5/2010 7:43 PM, Greg KH wrote:
> > On Sun, Oct 03, 2010 at 02:41:09PM -0400, Nitin Gupta wrote:
> >> Also, please do not use linux-next/mainline version of compcache. Instead
> >> just use version in the project repository here:
> >> hg clone https://compcache.googlecode.com/hg/ compcache 
> > 
> > What?  No, the reason we put this into the kernel was so that _everyone_
> > could work on it, including the original developers.  Going off and
> > doing development somewhere else just isn't ok.  Should I just delete
> > this driver from the staging tree as you don't seem to want to work with
> > the community at this point in time?
> >
> 
> Getting it out of -staging wasn't my intent. Community is the reason
> that this project still exists.
> 
> 
> >> This is updated much more frequently and has many more bug fixes over
> >> the mainline. It will also be easier to fix bugs/add features much more
> >> quickly in this repo rather than sending them to lkml which can take
> >> long time.
> > 
> > Yes, developing in your own sandbox can always be faster, but there is
> > no feedback loop.
> > 
> 
> I was finding it real hard to find time to properly discuss each patch
> over LKML, so I thought of shifting focus to local project repository
> and then later go through proper reviews.

So, should I delete the version in staging, or are you going to send
patches to sync it up with your development version?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: OOM panics with zram
  2010-10-06  2:36           ` Greg KH
@ 2010-10-06  4:30             ` Nitin Gupta
  -1 siblings, 0 replies; 137+ messages in thread
From: Nitin Gupta @ 2010-10-06  4:30 UTC (permalink / raw)
  To: Greg KH
  Cc: Dave Hansen, Pekka Enberg, Minchan Kim, Andrew Morton,
	Linux Driver Project, linux-mm, linux-kernel

On 10/5/2010 10:36 PM, Greg KH wrote:
> On Tue, Oct 05, 2010 at 10:29:34PM -0400, Nitin Gupta wrote:
>> On 10/5/2010 7:43 PM, Greg KH wrote:
>>> On Sun, Oct 03, 2010 at 02:41:09PM -0400, Nitin Gupta wrote:
>>>> Also, please do not use linux-next/mainline version of compcache. Instead
>>>> just use version in the project repository here:
>>>> hg clone https://compcache.googlecode.com/hg/ compcache 
>>>
>>> What?  No, the reason we put this into the kernel was so that _everyone_
>>> could work on it, including the original developers.  Going off and
>>> doing development somewhere else just isn't ok.  Should I just delete
>>> this driver from the staging tree as you don't seem to want to work with
>>> the community at this point in time?
>>>
>>
>> Getting it out of -staging wasn't my intent. Community is the reason
>> that this project still exists.
>>
>>
>>>> This is updated much more frequently and has many more bug fixes over
>>>> the mainline. It will also be easier to fix bugs/add features much more
>>>> quickly in this repo rather than sending them to lkml which can take
>>>> long time.
>>>
>>> Yes, developing in your own sandbox can always be faster, but there is
>>> no feedback loop.
>>>
>>
>> I was finding it real hard to find time to properly discuss each patch
>> over LKML, so I thought of shifting focus to local project repository
>> and then later go through proper reviews.
> 
> So, should I delete the version in staging, or are you going to send
> patches to sync it up with your development version?
> 

Deleting it from staging would not help much.  Much more helpful would
be to sync at least the mainline and linux-next versions of the driver,
so it's easier to develop against those kernel trees.  Initially, I
thought -staging meant that any reviewed change could quickly make it
to *both* linux-next and, more importantly, -staging in mainline.
Working and testing against mainline is much smoother than against
linux-next.

Thanks,
Nitin



^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: OOM panics with zram
  2010-10-06  4:30             ` Nitin Gupta
@ 2010-10-06  7:38               ` Pekka Enberg
  -1 siblings, 0 replies; 137+ messages in thread
From: Pekka Enberg @ 2010-10-06  7:38 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Greg KH, Dave Hansen, Pekka Enberg, Minchan Kim, Andrew Morton,
	Linux Driver Project, linux-mm, linux-kernel

Hi,

On Wed, Oct 6, 2010 at 7:30 AM, Nitin Gupta <ngupta@vflare.org> wrote:
> Deleting it from staging would not help much. Much more helpful would
> be to sync at least the mainline and linux-next versions of the driver
> so it's easier to develop against those kernel trees.  Initially, I
> thought -staging meant that any reviewed change could quickly make it
> to *both* linux-next and, more importantly, -staging in mainline.
> Working and testing against mainline is much smoother than against
> linux-next.

We can't push the patches immediately to mainline because we need to
respect the merge window. You shouldn't need to rely on linux-next for
testing, though; you can work directly against Greg's staging tree.
Greg, where's the official tree, btw? The tree at

  http://kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/

seems empty.

                        Pekka

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: OOM panics with zram
  2010-10-06  4:30             ` Nitin Gupta
@ 2010-10-06 14:02               ` Greg KH
  -1 siblings, 0 replies; 137+ messages in thread
From: Greg KH @ 2010-10-06 14:02 UTC (permalink / raw)
  To: Nitin Gupta
  Cc: Dave Hansen, Pekka Enberg, Minchan Kim, Andrew Morton,
	Linux Driver Project, linux-mm, linux-kernel

On Wed, Oct 06, 2010 at 12:30:39AM -0400, Nitin Gupta wrote:
> > So, should I delete the version in staging, or are you going to send
> > patches to sync it up with your development version?
> > 
> 
> Deleting it from staging would not help much. Much more helpful would
> be to sync at least the mainline and linux-next versions of the driver
> so it's easier to develop against those kernel trees.  Initially, I
> thought -staging meant that any reviewed change could quickly make it
> to *both* linux-next and, more importantly, -staging in mainline.

It makes it to linux-next quickly, right?

Nothing goes "quickly" to mainline except through the normal
development process, which is well documented.

> Working and testing against mainline is much smoother than against
> linux-next.

Sure it's "easier", but that's not how kernel development is done,
sorry.  See Documentation/development-process/ for details.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: OOM panics with zram
  2010-10-06  7:38               ` Pekka Enberg
@ 2010-10-06 14:03                 ` Greg KH
  -1 siblings, 0 replies; 137+ messages in thread
From: Greg KH @ 2010-10-06 14:03 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Nitin Gupta, Dave Hansen, Pekka Enberg, Minchan Kim,
	Andrew Morton, Linux Driver Project, linux-mm, linux-kernel

On Wed, Oct 06, 2010 at 10:38:59AM +0300, Pekka Enberg wrote:
> Hi,
> 
> On Wed, Oct 6, 2010 at 7:30 AM, Nitin Gupta <ngupta@vflare.org> wrote:
> > Deleting it from staging would not help much. Much more helpful would
> > be to sync at least the mainline and linux-next versions of the driver
> > so it's easier to develop against those kernel trees.  Initially, I
> > thought -staging meant that any reviewed change could quickly make it
> > to *both* linux-next and, more importantly, -staging in mainline.
> > Working and testing against mainline is much smoother than against
> > linux-next.
> 
> We can't push the patches immediately to mainline because we need to
> respect the merge window. You shouldn't need to rely on linux-next for
> testing, though; you can work directly against Greg's staging tree.
> Greg, where's the official tree, btw? The tree at
> 
>   http://kernel.org/pub/linux/kernel/people/gregkh/gregkh-2.6/
> 
> seems empty.

Oops, I need to update the MAINTAINERS file; the proper place for the
staging tree is now in git, at

git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-next-2.6.git

which feeds directly into the linux-next tree.

hope this helps,

greg k-h

^ permalink raw reply	[flat|nested] 137+ messages in thread
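
As a concrete illustration of the workflow Greg describes, here is a
minimal sketch of building and testing zram against that tree. The
config symbol, menu location, num_devices module parameter, and the
disksize sysfs attribute reflect the zram driver of this era as best
understood; treat the exact steps as assumptions, not instructions
taken from this thread:

  git clone git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-next-2.6.git
  cd staging-next-2.6
  make menuconfig    # enable CONFIG_ZRAM=m (Device Drivers -> Staging drivers)
  make -j4 && sudo make modules_install install
  # after rebooting into the new kernel:
  sudo modprobe zram num_devices=1
  # as root, size the device and use it as compressed swap
  # (disksize is one of the sysfs attributes added by this series):
  echo $((256*1024*1024)) > /sys/block/zram0/disksize
  mkswap /dev/zram0
  swapon /dev/zram0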

* Re: OOM panics with zram
  2010-10-06 14:03                 ` Greg KH
@ 2010-10-06 14:16                   ` Pekka Enberg
  -1 siblings, 0 replies; 137+ messages in thread
From: Pekka Enberg @ 2010-10-06 14:16 UTC (permalink / raw)
  To: Greg KH
  Cc: Nitin Gupta, Dave Hansen, Pekka Enberg, Minchan Kim,
	Andrew Morton, Linux Driver Project, linux-mm, linux-kernel

  On 6.10.2010 17.03, Greg KH wrote:
> Oops, I need to update the MAINTAINERS file; the proper place for the
> staging tree is now in git, at
>
> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-next-2.6.git
>
> which feeds directly into the linux-next tree.

Excellent! Nitin, can you develop and test zram against this tree?

             Pekka

^ permalink raw reply	[flat|nested] 137+ messages in thread

* Re: OOM panics with zram
  2010-10-06 14:16                   ` Pekka Enberg
@ 2010-10-06 14:53                     ` Nitin Gupta
  -1 siblings, 0 replies; 137+ messages in thread
From: Nitin Gupta @ 2010-10-06 14:53 UTC (permalink / raw)
  To: Pekka Enberg
  Cc: Greg KH, Dave Hansen, Pekka Enberg, Minchan Kim, Andrew Morton,
	Linux Driver Project, linux-mm, linux-kernel

On 10/6/2010 10:16 AM, Pekka Enberg wrote:
>  On 6.10.2010 17.03, Greg KH wrote:
>> Oops, I need to update the MAINTAINERS file; the proper place for the
>> staging tree is now in git, at
>>
>> git://git.kernel.org/pub/scm/linux/kernel/git/gregkh/staging-next-2.6.git
>>
>> which feeds directly into the linux-next tree.
> 
> Excellent! Nitin, can you develop and test zram against this tree?
> 

This seems like the ideal tree to develop against.

Thanks!
Nitin

^ permalink raw reply	[flat|nested] 137+ messages in thread

end of thread

Thread overview: 137+ messages
2010-08-09 17:26 [PATCH 00/10] zram: various improvements and cleanups Nitin Gupta
2010-08-09 17:26 ` [PATCH 01/10] Replace ioctls with sysfs interface Nitin Gupta
2010-08-09 18:34   ` Pekka Enberg
2010-08-10  3:06     ` Nitin Gupta
2010-08-31 23:06   ` Dave Hansen
2010-08-09 17:26 ` [PATCH 02/10] Remove need for explicit device initialization Nitin Gupta
2010-08-09 18:36   ` Pekka Enberg
2010-08-10  3:38     ` Nitin Gupta
2010-08-09 17:26 ` [PATCH 03/10] Use percpu stats Nitin Gupta
2010-08-09 18:44   ` Pekka Enberg
2010-08-10  4:34   ` Andrew Morton
2010-08-11 16:39     ` Nitin Gupta
2010-08-11 17:18       ` Andrew Morton
2010-08-30 16:20   ` Christoph Lameter
2010-08-31 20:31     ` Nitin Gupta
2010-08-31 21:28       ` Eric Dumazet
2010-08-31 21:35         ` Christoph Lameter
2010-08-31 21:41           ` Eric Dumazet
2010-09-01 20:05             ` Christoph Lameter
2010-09-01 20:38               ` Eric Dumazet
2010-09-02  0:34                 ` Christoph Lameter
2010-08-31  5:36   ` Anton Blanchard
2010-09-01  3:41     ` Anton Blanchard
2010-09-01  3:51   ` Anton Blanchard
2010-09-17 20:59     ` Andrew Morton
2010-08-09 17:26 ` [PATCH 04/10] Use percpu buffers Nitin Gupta
2010-08-09 18:57   ` Pekka Enberg
2010-08-10  4:47     ` Nitin Gupta
2010-08-10  5:05       ` Pekka Enberg
2010-08-10  5:32         ` Nitin Gupta
2010-08-10  7:36           ` Pekka Enberg
2010-08-09 17:26 ` [PATCH 05/10] Reduce per table entry overhead by 4 bytes Nitin Gupta
2010-08-09 18:59   ` Pekka Enberg
2010-08-10  4:55     ` Nitin Gupta
2010-08-09 17:26 ` [PATCH 06/10] Block discard support Nitin Gupta
2010-08-09 19:03   ` Pekka Enberg
2010-08-10  2:23     ` Jens Axboe
2010-08-10  4:54       ` Nitin Gupta
2010-08-10 15:54         ` Jens Axboe
2010-08-09 17:26 ` [PATCH 07/10] Increase compressed page size threshold Nitin Gupta
2010-08-09 18:32   ` Pekka Enberg
2010-08-09 17:26 ` [PATCH 08/10] Some cleanups Nitin Gupta
2010-08-09 19:02   ` Pekka Enberg
2010-08-09 17:26 ` [PATCH 09/10] Update zram documentation Nitin Gupta
2010-08-09 17:26 ` [PATCH 10/10] Document sysfs entries Nitin Gupta
2010-08-09 19:02   ` Pekka Enberg
2010-08-31 22:37 ` [PATCH 00/10] zram: various improvements and cleanups Greg KH
2010-09-01  3:32 ` Anton Blanchard
2010-09-09 17:24 ` OOM panics with zram Dave Hansen
2010-09-09 19:07   ` [patch -rc] oom: always return a badness score of non-zero for eligible tasks David Rientjes
2010-09-09 19:48     ` Dave Hansen
2010-09-09 21:00       ` David Rientjes
2010-09-09 21:10         ` Dave Hansen
2010-09-09 21:40           ` David Rientjes
2010-10-03 18:41   ` OOM panics with zram Nitin Gupta
2010-10-03 19:27     ` Dave Hansen
2010-10-03 19:40       ` Nitin Gupta
2010-10-04 11:08         ` Ed Tomlinson
2010-10-05 23:43     ` Greg KH
2010-10-06  2:29       ` Nitin Gupta
2010-10-06  2:36         ` Greg KH
2010-10-06  4:30           ` Nitin Gupta
2010-10-06  7:38             ` Pekka Enberg
2010-10-06 14:03               ` Greg KH
2010-10-06 14:16                 ` Pekka Enberg
2010-10-06 14:53                   ` Nitin Gupta
2010-10-06 14:02             ` Greg KH
