linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 000 of 8] md/bitmap: Introduction - rework management of bitmap files.
@ 2006-05-12  6:07 NeilBrown
  2006-05-12  6:07 ` [PATCH 001 of 8] md/bitmap: Fix online removal of file-backed bitmaps NeilBrown
                   ` (7 more replies)
  0 siblings, 8 replies; 22+ messages in thread
From: NeilBrown @ 2006-05-12  6:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid, linux-kernel, Paul Clements

I thought it was time to review the md/bitmap code - partly because I
wasn't comfortable with how it was handling writing to files.  The more
I learnt about the VM/VFS, the more I realised it was wrong....

I found plenty to do...

The last patch in this series of 8 is the big one.  It substantially
changes the way bitmap files are handled.  The key change is that it
now works more like swapfile: bmap() is used to find where the blocks
are and write goes direct to storage bypassing the filesystem.

These are *not* for 2.6.17, but should be ok for when 2.6.18 opens.

I've done some testing and it seems to work OK, but it is a big change
and more testing wouldn't be a bad thing :-)

 [PATCH 001 of 8] md/bitmap: Fix online removal of file-backed bitmaps
 [PATCH 002 of 8] md/bitmap: Remove bitmap writeback daemon.
 [PATCH 003 of 8] md/bitmap: Cleaner separation of page attribute handlers in md/bitmap.
 [PATCH 004 of 8] md/bitmap: Use set_bit etc for bitmap page attributes.
 [PATCH 005 of 8] md/bitmap: Remove unnecessary page reference manipulations from md/bitmap code.
 [PATCH 006 of 8] md/bitmap: Remove dead code from md/bitmap.
 [PATCH 007 of 8] md/bitmap: Tidy up i_writecount handling in md/bitmap
 [PATCH 008 of 8] md/bitmap: Change md/bitmap file handling to use bmap to file blocks.

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 001 of 8] md/bitmap: Fix online removal of file-backed bitmaps
  2006-05-12  6:07 [PATCH 000 of 8] md/bitmap: Introduction - rework management of bitmap files NeilBrown
@ 2006-05-12  6:07 ` NeilBrown
  2006-05-12  6:07 ` [PATCH 002 of 8] md/bitmap: Remove bitmap writeback daemon NeilBrown
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: NeilBrown @ 2006-05-12  6:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid, linux-kernel, Paul Clements


When "mdadm --grow /dev/mdX --bitmap=none" is used to remove
a filebacked bitmap, the bitmap was disconnected from the array,
but the file wasn't closed (until the array was stopped).

The file also wasn't closed if adding the bitmap file failed.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/md.c |    7 +++++--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2006-05-12 15:49:10.000000000 +1000
+++ ./drivers/md/md.c	2006-05-12 15:54:48.000000000 +1000
@@ -3588,10 +3588,13 @@ static int set_bitmap_file(mddev_t *mdde
 		mddev->pers->quiesce(mddev, 1);
 		if (fd >= 0)
 			err = bitmap_create(mddev);
-		if (fd < 0 || err)
+		if (fd < 0 || err) {
 			bitmap_destroy(mddev);
+			fd = -1; /* make sure to put the file */
+		}
 		mddev->pers->quiesce(mddev, 0);
-	} else if (fd < 0) {
+	}
+	if (fd < 0) {
 		if (mddev->bitmap_file)
 			fput(mddev->bitmap_file);
 		mddev->bitmap_file = NULL;

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 002 of 8] md/bitmap: Remove bitmap writeback daemon.
  2006-05-12  6:07 [PATCH 000 of 8] md/bitmap: Introduction - rework management of bitmap files NeilBrown
  2006-05-12  6:07 ` [PATCH 001 of 8] md/bitmap: Fix online removal of file-backed bitmaps NeilBrown
@ 2006-05-12  6:07 ` NeilBrown
  2006-05-12 17:40   ` Andrew Morton
  2006-05-12  6:07 ` [PATCH 003 of 8] md/bitmap: Cleaner separation of page attribute handlers in md/bitmap NeilBrown
                   ` (5 subsequent siblings)
  7 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2006-05-12  6:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid, linux-kernel, Paul Clements


md/bitmap currently has a separate thread to wait for writes
to the bitmap file to complete (as we cannot get a callback
on that action).
However this isn't needed as bitmap_unplug is called from 
process context and waits for the writeback thread to do it's
work.  The same result can be achieved by doing the waiting
directly in bitmap_unplug.


Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/bitmap.c         |  115 ++----------------------------------------
 ./include/linux/raid/bitmap.h |    6 --
 2 files changed, 8 insertions(+), 113 deletions(-)

diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~	2006-05-12 15:49:10.000000000 +1000
+++ ./drivers/md/bitmap.c	2006-05-12 15:55:37.000000000 +1000
@@ -7,7 +7,6 @@
  * additions, Copyright (C) 2003-2004, Paul Clements, SteelEye Technology, Inc.:
  * - added disk storage for bitmap
  * - changes to allow various bitmap chunk sizes
- * - added bitmap daemon (to asynchronously clear bitmap bits from disk)
  */
 
 /*
@@ -330,14 +329,13 @@ static int write_page(struct bitmap *bit
 	set_page_dirty(page); /* force it to be written out */
 
 	if (!wait) {
-		/* add to list to be waited for by daemon */
+		/* add to list to be waited for */
 		struct page_list *item = mempool_alloc(bitmap->write_pool, GFP_NOIO);
 		item->page = page;
 		get_page(page);
 		spin_lock(&bitmap->write_lock);
 		list_add(&item->list, &bitmap->complete_pages);
 		spin_unlock(&bitmap->write_lock);
-		md_wakeup_thread(bitmap->writeback_daemon);
 	}
 	return write_one_page(page, wait);
 }
@@ -621,8 +619,6 @@ static void bitmap_file_unmap(struct bit
 	safe_put_page(sb_page);
 }
 
-static void bitmap_stop_daemon(struct bitmap *bitmap);
-
 /* dequeue the next item in a page list -- don't call from irq context */
 static struct page_list *dequeue_page(struct bitmap *bitmap)
 {
@@ -648,8 +644,6 @@ static void drain_write_queues(struct bi
 		put_page(item->page);
 		mempool_free(item, bitmap->write_pool);
 	}
-
-	wake_up(&bitmap->write_wait);
 }
 
 static void bitmap_file_put(struct bitmap *bitmap)
@@ -663,8 +657,6 @@ static void bitmap_file_put(struct bitma
 	bitmap->file = NULL;
 	spin_unlock_irqrestore(&bitmap->lock, flags);
 
-	bitmap_stop_daemon(bitmap);
-
 	drain_write_queues(bitmap);
 
 	bitmap_file_unmap(bitmap);
@@ -770,6 +762,8 @@ static void bitmap_file_set_bit(struct b
 
 }
 
+static void bitmap_writeback(struct bitmap *bitmap);
+
 /* this gets called when the md device is ready to unplug its underlying
  * (slave) device queues -- before we let any writes go down, we need to
  * sync the dirty pages of the bitmap file to disk */
@@ -812,13 +806,9 @@ int bitmap_unplug(struct bitmap *bitmap)
 		}
 	}
 	if (wait) { /* if any writes were performed, we need to wait on them */
-		if (bitmap->file) {
-			spin_lock_irq(&bitmap->write_lock);
-			wait_event_lock_irq(bitmap->write_wait,
-					    list_empty(&bitmap->complete_pages), bitmap->write_lock,
-					    wake_up_process(bitmap->writeback_daemon->tsk));
-			spin_unlock_irq(&bitmap->write_lock);
-		} else
+		if (bitmap->file)
+			bitmap_writeback(bitmap);
+		else
 			md_super_wait(bitmap->mddev);
 	}
 	return 0;
@@ -1126,41 +1116,12 @@ int bitmap_daemon_work(struct bitmap *bi
 	return err;
 }
 
-static void daemon_exit(struct bitmap *bitmap, mdk_thread_t **daemon)
+static void bitmap_writeback(struct bitmap *bitmap)
 {
-	mdk_thread_t *dmn;
-	unsigned long flags;
-
-	/* if no one is waiting on us, we'll free the md thread struct
-	 * and exit, otherwise we let the waiter clean things up */
-	spin_lock_irqsave(&bitmap->lock, flags);
-	if ((dmn = *daemon)) { /* no one is waiting, cleanup and exit */
-		*daemon = NULL;
-		spin_unlock_irqrestore(&bitmap->lock, flags);
-		kfree(dmn);
-		complete_and_exit(NULL, 0); /* do_exit not exported */
-	}
-	spin_unlock_irqrestore(&bitmap->lock, flags);
-}
-
-static void bitmap_writeback_daemon(mddev_t *mddev)
-{
-	struct bitmap *bitmap = mddev->bitmap;
 	struct page *page;
 	struct page_list *item;
 	int err = 0;
 
-	if (signal_pending(current)) {
-		printk(KERN_INFO
-		       "%s: bitmap writeback daemon got signal, exiting...\n",
-		       bmname(bitmap));
-		err = -EINTR;
-		goto out;
-	}
-	if (bitmap == NULL)
-		/* about to be stopped. */
-		return;
-
 	PRINTK("%s: bitmap writeback daemon woke up...\n", bmname(bitmap));
 	/* wait on bitmap page writebacks */
 	while ((item = dequeue_page(bitmap))) {
@@ -1177,59 +1138,9 @@ static void bitmap_writeback_daemon(mdde
 			       "failed (page %lu): %d\n",
 			       bmname(bitmap), page->index, err);
 			bitmap_file_kick(bitmap);
-			goto out;
+			break;
 		}
 	}
- out:
-	wake_up(&bitmap->write_wait);
-	if (err) {
-		printk(KERN_INFO "%s: bitmap writeback daemon exiting (%d)\n",
-		       bmname(bitmap), err);
-		daemon_exit(bitmap, &bitmap->writeback_daemon);
-	}
-}
-
-static mdk_thread_t *bitmap_start_daemon(struct bitmap *bitmap,
-				void (*func)(mddev_t *), char *name)
-{
-	mdk_thread_t *daemon;
-	char namebuf[32];
-
-#ifdef INJECT_FATAL_FAULT_2
-	daemon = NULL;
-#else
-	sprintf(namebuf, "%%s_%s", name);
-	daemon = md_register_thread(func, bitmap->mddev, namebuf);
-#endif
-	if (!daemon) {
-		printk(KERN_ERR "%s: failed to start bitmap daemon\n",
-			bmname(bitmap));
-		return ERR_PTR(-ECHILD);
-	}
-
-	md_wakeup_thread(daemon); /* start it running */
-
-	PRINTK("%s: %s daemon (pid %d) started...\n",
-		bmname(bitmap), name, daemon->tsk->pid);
-
-	return daemon;
-}
-
-static void bitmap_stop_daemon(struct bitmap *bitmap)
-{
-	/* the daemon can't stop itself... it'll just exit instead... */
-	if (bitmap->writeback_daemon && ! IS_ERR(bitmap->writeback_daemon) &&
-	    current->pid != bitmap->writeback_daemon->tsk->pid) {
-		mdk_thread_t *daemon;
-		unsigned long flags;
-
-		spin_lock_irqsave(&bitmap->lock, flags);
-		daemon = bitmap->writeback_daemon;
-		bitmap->writeback_daemon = NULL;
-		spin_unlock_irqrestore(&bitmap->lock, flags);
-		if (daemon && ! IS_ERR(daemon))
-			md_unregister_thread(daemon); /* destroy the thread */
-	}
 }
 
 static bitmap_counter_t *bitmap_get_counter(struct bitmap *bitmap,
@@ -1553,7 +1464,6 @@ int bitmap_create(mddev_t *mddev)
 
 	spin_lock_init(&bitmap->write_lock);
 	INIT_LIST_HEAD(&bitmap->complete_pages);
-	init_waitqueue_head(&bitmap->write_wait);
 	bitmap->write_pool = mempool_create_kmalloc_pool(WRITE_POOL_SIZE,
 						sizeof(struct page_list));
 	err = -ENOMEM;
@@ -1613,15 +1523,6 @@ int bitmap_create(mddev_t *mddev)
 
 	mddev->bitmap = bitmap;
 
-	if (file)
-		/* kick off the bitmap writeback daemon */
-		bitmap->writeback_daemon =
-			bitmap_start_daemon(bitmap,
-					    bitmap_writeback_daemon,
-					    "bitmap_wb");
-
-	if (IS_ERR(bitmap->writeback_daemon))
-		return PTR_ERR(bitmap->writeback_daemon);
 	mddev->thread->timeout = bitmap->daemon_sleep * HZ;
 
 	return bitmap_update_sb(bitmap);

diff ./include/linux/raid/bitmap.h~current~ ./include/linux/raid/bitmap.h
--- ./include/linux/raid/bitmap.h~current~	2006-05-12 15:49:10.000000000 +1000
+++ ./include/linux/raid/bitmap.h	2006-05-12 15:55:37.000000000 +1000
@@ -244,13 +244,7 @@ struct bitmap {
 	unsigned long daemon_lastrun; /* jiffies of last run */
 	unsigned long daemon_sleep; /* how many seconds between updates? */
 
-	/*
-	 * bitmap_writeback_daemon waits for file-pages that have been written,
-	 * as there is no way to get a call-back when a page write completes.
-	 */
-	mdk_thread_t *writeback_daemon;
 	spinlock_t write_lock;
-	wait_queue_head_t write_wait;
 	struct list_head complete_pages;
 	mempool_t *write_pool;
 };

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 003 of 8] md/bitmap: Cleaner separation of page attribute handlers in md/bitmap.
  2006-05-12  6:07 [PATCH 000 of 8] md/bitmap: Introduction - rework management of bitmap files NeilBrown
  2006-05-12  6:07 ` [PATCH 001 of 8] md/bitmap: Fix online removal of file-backed bitmaps NeilBrown
  2006-05-12  6:07 ` [PATCH 002 of 8] md/bitmap: Remove bitmap writeback daemon NeilBrown
@ 2006-05-12  6:07 ` NeilBrown
  2006-05-12  6:07 ` [PATCH 004 of 8] md/bitmap: Use set_bit etc for bitmap page attributes NeilBrown
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: NeilBrown @ 2006-05-12  6:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid, linux-kernel, Paul Clements


md/bitmap has some attributes per-page.  Handling of these
attributes in largely abstracted in set_page_attr and
clear_page_attr.  However get_page_attr exposes the 
format used to store them.  So prior to changing that
format, introduce test_page_attr instead of get_page_attr,
and make appropriate usage changes.


Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/bitmap.c |   41 ++++++++++++++++++++++-------------------
 1 file changed, 22 insertions(+), 19 deletions(-)

diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~	2006-05-12 15:55:37.000000000 +1000
+++ ./drivers/md/bitmap.c	2006-05-12 15:56:51.000000000 +1000
@@ -717,9 +717,10 @@ static inline void clear_page_attr(struc
 	bitmap->filemap_attr[page->index] &= ~attr;
 }
 
-static inline unsigned long get_page_attr(struct bitmap *bitmap, struct page *page)
+static inline unsigned long test_page_attr(struct bitmap *bitmap, struct page *page,
+					   enum bitmap_page_attr attr)
 {
-	return bitmap->filemap_attr[page->index];
+	return bitmap->filemap_attr[page->index] & attr;
 }
 
 /*
@@ -745,7 +746,7 @@ static void bitmap_file_set_bit(struct b
 
 
 	/* make sure the page stays cached until it gets written out */
-	if (! (get_page_attr(bitmap, page) & BITMAP_PAGE_DIRTY))
+	if (! test_page_attr(bitmap, page, BITMAP_PAGE_DIRTY))
 		get_page(page);
 
  	/* set the bit */
@@ -769,7 +770,8 @@ static void bitmap_writeback(struct bitm
  * sync the dirty pages of the bitmap file to disk */
 int bitmap_unplug(struct bitmap *bitmap)
 {
-	unsigned long i, attr, flags;
+	unsigned long i, flags;
+	int dirty, need_write;
 	struct page *page;
 	int wait = 0;
 	int err;
@@ -786,17 +788,18 @@ int bitmap_unplug(struct bitmap *bitmap)
 			return 0;
 		}
 		page = bitmap->filemap[i];
-		attr = get_page_attr(bitmap, page);
+		dirty = test_page_attr(bitmap, page, BITMAP_PAGE_DIRTY);
+		need_write = test_page_attr(bitmap, page, BITMAP_PAGE_NEEDWRITE);
 		clear_page_attr(bitmap, page, BITMAP_PAGE_DIRTY);
 		clear_page_attr(bitmap, page, BITMAP_PAGE_NEEDWRITE);
-		if ((attr & BITMAP_PAGE_DIRTY))
+		if (dirty)
 			wait = 1;
 		spin_unlock_irqrestore(&bitmap->lock, flags);
 
-		if (attr & (BITMAP_PAGE_DIRTY | BITMAP_PAGE_NEEDWRITE)) {
+		if (dirty | need_write) {
 			err = write_page(bitmap, page, 0);
 			if (err == -EAGAIN) {
-				if (attr & BITMAP_PAGE_DIRTY)
+				if (dirty)
 					err = write_page(bitmap, page, 1);
 				else
 					err = 0;
@@ -961,12 +964,11 @@ void bitmap_write_all(struct bitmap *bit
 	/* We don't actually write all bitmap blocks here,
 	 * just flag them as needing to be written
 	 */
+	int i;
 
-	unsigned long chunks = bitmap->chunks;
-	unsigned long bytes = (chunks+7)/8 + sizeof(bitmap_super_t);
-	unsigned long num_pages = (bytes + PAGE_SIZE-1) / PAGE_SIZE;
-	while (num_pages--)
-		bitmap->filemap_attr[num_pages] |= BITMAP_PAGE_NEEDWRITE;
+	for (i=0; i < bitmap->file_pages; i++)
+		set_page_attr(bitmap, bitmap->filemap[i],
+			      BITMAP_PAGE_NEEDWRITE);
 }
 
 
@@ -997,7 +999,6 @@ int bitmap_daemon_work(struct bitmap *bi
 	struct page *page = NULL, *lastpage = NULL;
 	int err = 0;
 	int blocks;
-	int attr;
 	void *paddr;
 
 	if (bitmap == NULL)
@@ -1019,13 +1020,15 @@ int bitmap_daemon_work(struct bitmap *bi
 
 		if (page != lastpage) {
 			/* skip this page unless it's marked as needing cleaning */
-			if (!((attr=get_page_attr(bitmap, page)) & BITMAP_PAGE_CLEAN)) {
-				if (attr & BITMAP_PAGE_NEEDWRITE) {
+			if (!test_page_attr(bitmap, page, BITMAP_PAGE_CLEAN)) {
+				int need_write = test_page_attr(bitmap, page,
+								BITMAP_PAGE_NEEDWRITE);
+				if (need_write) {
 					get_page(page);
 					clear_page_attr(bitmap, page, BITMAP_PAGE_NEEDWRITE);
 				}
 				spin_unlock_irqrestore(&bitmap->lock, flags);
-				if (attr & BITMAP_PAGE_NEEDWRITE) {
+				if (need_write) {
 					switch (write_page(bitmap, page, 0)) {
 					case -EAGAIN:
 						set_page_attr(bitmap, page, BITMAP_PAGE_NEEDWRITE);
@@ -1043,7 +1046,7 @@ int bitmap_daemon_work(struct bitmap *bi
 			/* grab the new page, sync and release the old */
 			get_page(page);
 			if (lastpage != NULL) {
-				if (get_page_attr(bitmap, lastpage) & BITMAP_PAGE_NEEDWRITE) {
+				if (test_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE)) {
 					clear_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE);
 					spin_unlock_irqrestore(&bitmap->lock, flags);
 					err = write_page(bitmap, lastpage, 0);
@@ -1097,7 +1100,7 @@ int bitmap_daemon_work(struct bitmap *bi
 	/* now sync the final page */
 	if (lastpage != NULL) {
 		spin_lock_irqsave(&bitmap->lock, flags);
-		if (get_page_attr(bitmap, lastpage) &BITMAP_PAGE_NEEDWRITE) {
+		if (test_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE)) {
 			clear_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE);
 			spin_unlock_irqrestore(&bitmap->lock, flags);
 			err = write_page(bitmap, lastpage, 0);

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 004 of 8] md/bitmap: Use set_bit etc for bitmap page attributes.
  2006-05-12  6:07 [PATCH 000 of 8] md/bitmap: Introduction - rework management of bitmap files NeilBrown
                   ` (2 preceding siblings ...)
  2006-05-12  6:07 ` [PATCH 003 of 8] md/bitmap: Cleaner separation of page attribute handlers in md/bitmap NeilBrown
@ 2006-05-12  6:07 ` NeilBrown
  2006-05-12  6:07 ` [PATCH 005 of 8] md/bitmap: Remove unnecessary page reference manipulations from md/bitmap code NeilBrown
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: NeilBrown @ 2006-05-12  6:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid, linux-kernel, Paul Clements


In particular, this means that we use 4 bits per page instead of a
whole unsigned long.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/bitmap.c |   19 ++++++++++++-------
 1 file changed, 12 insertions(+), 7 deletions(-)

diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~	2006-05-12 15:56:51.000000000 +1000
+++ ./drivers/md/bitmap.c	2006-05-12 15:58:22.000000000 +1000
@@ -700,27 +700,27 @@ static void bitmap_file_kick(struct bitm
 }
 
 enum bitmap_page_attr {
-	BITMAP_PAGE_DIRTY = 1, // there are set bits that need to be synced
-	BITMAP_PAGE_CLEAN = 2, // there are bits that might need to be cleared
-	BITMAP_PAGE_NEEDWRITE=4, // there are cleared bits that need to be synced
+	BITMAP_PAGE_DIRTY = 0, // there are set bits that need to be synced
+	BITMAP_PAGE_CLEAN = 1, // there are bits that might need to be cleared
+	BITMAP_PAGE_NEEDWRITE=2, // there are cleared bits that need to be synced
 };
 
 static inline void set_page_attr(struct bitmap *bitmap, struct page *page,
 				enum bitmap_page_attr attr)
 {
-	bitmap->filemap_attr[page->index] |= attr;
+	__set_bit((page->index<<2) + attr, bitmap->filemap_attr);
 }
 
 static inline void clear_page_attr(struct bitmap *bitmap, struct page *page,
 				enum bitmap_page_attr attr)
 {
-	bitmap->filemap_attr[page->index] &= ~attr;
+	__clear_bit((page->index<<2) + attr, bitmap->filemap_attr);
 }
 
 static inline unsigned long test_page_attr(struct bitmap *bitmap, struct page *page,
 					   enum bitmap_page_attr attr)
 {
-	return bitmap->filemap_attr[page->index] & attr;
+	return test_bit((page->index<<2) + attr, bitmap->filemap_attr);
 }
 
 /*
@@ -872,7 +872,12 @@ static int bitmap_init_from_disk(struct 
 	if (!bitmap->filemap)
 		goto out;
 
-	bitmap->filemap_attr = kzalloc(sizeof(long) * num_pages, GFP_KERNEL);
+	/* We need 4 bits per page, rounded up to a multiple of sizeof(unsigned long) */
+	bitmap->filemap_attr = kzalloc(
+		(((num_pages*4/8)+sizeof(unsigned long)-1)
+		 /sizeof(unsigned long))
+		*sizeof(unsigned long),
+		GFP_KERNEL);
 	if (!bitmap->filemap_attr)
 		goto out;
 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 005 of 8] md/bitmap: Remove unnecessary page reference manipulations from md/bitmap code.
  2006-05-12  6:07 [PATCH 000 of 8] md/bitmap: Introduction - rework management of bitmap files NeilBrown
                   ` (3 preceding siblings ...)
  2006-05-12  6:07 ` [PATCH 004 of 8] md/bitmap: Use set_bit etc for bitmap page attributes NeilBrown
@ 2006-05-12  6:07 ` NeilBrown
  2006-05-12  6:07 ` [PATCH 006 of 8] md/bitmap: Remove dead code from md/bitmap NeilBrown
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 22+ messages in thread
From: NeilBrown @ 2006-05-12  6:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid, linux-kernel, Paul Clements


md/bitmap gets a collection of pages representing the bitmap when it
initialises the bitmap, and puts all the references when discarding
the bitmap.

It also occasionally takes extra references without any good reason,
and sometimes drops them ... though it doesn't always drop them, which
can result in a memory leak.

This patch removes the unnecessary 'get_page' calls, and the
corresponding 'put_page' calls.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/bitmap.c |   20 ++------------------
 1 file changed, 2 insertions(+), 18 deletions(-)

diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~	2006-05-12 15:58:22.000000000 +1000
+++ ./drivers/md/bitmap.c	2006-05-12 15:58:57.000000000 +1000
@@ -332,7 +332,6 @@ static int write_page(struct bitmap *bit
 		/* add to list to be waited for */
 		struct page_list *item = mempool_alloc(bitmap->write_pool, GFP_NOIO);
 		item->page = page;
-		get_page(page);
 		spin_lock(&bitmap->write_lock);
 		list_add(&item->list, &bitmap->complete_pages);
 		spin_unlock(&bitmap->write_lock);
@@ -548,7 +547,6 @@ static void bitmap_mask_state(struct bit
 		spin_unlock_irqrestore(&bitmap->lock, flags);
 		return;
 	}
-	get_page(bitmap->sb_page);
 	spin_unlock_irqrestore(&bitmap->lock, flags);
 	sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0);
 	switch (op) {
@@ -559,7 +557,6 @@ static void bitmap_mask_state(struct bit
 		default: BUG();
 	}
 	kunmap_atomic(sb, KM_USER0);
-	put_page(bitmap->sb_page);
 }
 
 /*
@@ -641,7 +638,6 @@ static void drain_write_queues(struct bi
 
 	while ((item = dequeue_page(bitmap))) {
 		/* don't bother to wait */
-		put_page(item->page);
 		mempool_free(item, bitmap->write_pool);
 	}
 }
@@ -744,11 +740,6 @@ static void bitmap_file_set_bit(struct b
 	page = filemap_get_page(bitmap, chunk);
 	bit = file_page_offset(chunk);
 
-
-	/* make sure the page stays cached until it gets written out */
-	if (! test_page_attr(bitmap, page, BITMAP_PAGE_DIRTY))
-		get_page(page);
-
  	/* set the bit */
 	kaddr = kmap_atomic(page, KM_USER0);
 	if (bitmap->flags & BITMAP_HOSTENDIAN)
@@ -1028,10 +1019,9 @@ int bitmap_daemon_work(struct bitmap *bi
 			if (!test_page_attr(bitmap, page, BITMAP_PAGE_CLEAN)) {
 				int need_write = test_page_attr(bitmap, page,
 								BITMAP_PAGE_NEEDWRITE);
-				if (need_write) {
-					get_page(page);
+				if (need_write)
 					clear_page_attr(bitmap, page, BITMAP_PAGE_NEEDWRITE);
-				}
+
 				spin_unlock_irqrestore(&bitmap->lock, flags);
 				if (need_write) {
 					switch (write_page(bitmap, page, 0)) {
@@ -1043,13 +1033,11 @@ int bitmap_daemon_work(struct bitmap *bi
 					default:
 						bitmap_file_kick(bitmap);
 					}
-					put_page(page);
 				}
 				continue;
 			}
 
 			/* grab the new page, sync and release the old */
-			get_page(page);
 			if (lastpage != NULL) {
 				if (test_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE)) {
 					clear_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE);
@@ -1063,7 +1051,6 @@ int bitmap_daemon_work(struct bitmap *bi
 					set_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE);
 					spin_unlock_irqrestore(&bitmap->lock, flags);
 				}
-				put_page(lastpage);
 				if (err)
 					bitmap_file_kick(bitmap);
 			} else
@@ -1117,8 +1104,6 @@ int bitmap_daemon_work(struct bitmap *bi
 			set_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE);
 			spin_unlock_irqrestore(&bitmap->lock, flags);
 		}
-
-		put_page(lastpage);
 	}
 
 	return err;
@@ -1140,7 +1125,6 @@ static void bitmap_writeback(struct bitm
 		PRINTK("finished page writeback: %p\n", page);
 
 		err = PageError(page);
-		put_page(page);
 		if (err) {
 			printk(KERN_WARNING "%s: bitmap file writeback "
 			       "failed (page %lu): %d\n",

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 006 of 8] md/bitmap: Remove dead code from md/bitmap.
  2006-05-12  6:07 [PATCH 000 of 8] md/bitmap: Introduction - rework management of bitmap files NeilBrown
                   ` (4 preceding siblings ...)
  2006-05-12  6:07 ` [PATCH 005 of 8] md/bitmap: Remove unnecessary page reference manipulations from md/bitmap code NeilBrown
@ 2006-05-12  6:07 ` NeilBrown
  2006-05-12  6:08 ` [PATCH 007 of 8] md/bitmap: Tidy up i_writecount handling in md/bitmap NeilBrown
  2006-05-12  6:08 ` [PATCH 008 of 8] md/bitmap: Change md/bitmap file handling to use bmap to file blocks NeilBrown
  7 siblings, 0 replies; 22+ messages in thread
From: NeilBrown @ 2006-05-12  6:07 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid, linux-kernel, Paul Clements


bitmap_active is never called, and the BITMAP_ACTIVE flag is
never users or tested, so discard them both.

Also remove some out-of-date 'todo' comments.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/bitmap.c |   22 ----------------------
 1 file changed, 22 deletions(-)

diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~	2006-05-12 15:58:57.000000000 +1000
+++ ./drivers/md/bitmap.c	2006-05-12 15:59:53.000000000 +1000
@@ -14,9 +14,6 @@
  *
  * flush after percent set rather than just time based. (maybe both).
  * wait if count gets too high, wake when it drops to half.
- * allow bitmap to be mirrored with superblock (before or after...)
- * allow hot-add to re-instate a current device.
- * allow hot-add of bitmap after quiessing device
  */
 
 #include <linux/module.h>
@@ -70,23 +67,6 @@ static inline char * bmname(struct bitma
 	return bitmap->mddev ? mdname(bitmap->mddev) : "mdX";
 }
 
-
-/*
- * test if the bitmap is active
- */
-int bitmap_active(struct bitmap *bitmap)
-{
-	unsigned long flags;
-	int res = 0;
-
-	if (!bitmap)
-		return res;
-	spin_lock_irqsave(&bitmap->lock, flags);
-	res = bitmap->flags & BITMAP_ACTIVE;
-	spin_unlock_irqrestore(&bitmap->lock, flags);
-	return res;
-}
-
 #define WRITE_POOL_SIZE 256
 
 /*
@@ -1496,8 +1476,6 @@ int bitmap_create(mddev_t *mddev)
 	if (!bitmap->bp)
 		goto error;
 
-	bitmap->flags |= BITMAP_ACTIVE;
-
 	/* now that we have some pages available, initialize the in-memory
 	 * bitmap from the on-disk bitmap */
 	start = 0;

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 007 of 8] md/bitmap: Tidy up i_writecount handling in md/bitmap
  2006-05-12  6:07 [PATCH 000 of 8] md/bitmap: Introduction - rework management of bitmap files NeilBrown
                   ` (5 preceding siblings ...)
  2006-05-12  6:07 ` [PATCH 006 of 8] md/bitmap: Remove dead code from md/bitmap NeilBrown
@ 2006-05-12  6:08 ` NeilBrown
  2006-05-12  6:08 ` [PATCH 008 of 8] md/bitmap: Change md/bitmap file handling to use bmap to file blocks NeilBrown
  7 siblings, 0 replies; 22+ messages in thread
From: NeilBrown @ 2006-05-12  6:08 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid, linux-kernel, Paul Clements


md/bitmap modifies i_writecount of a bitmap file to make sure
that no-one else writes to it.  
The reverting of the change is sometimes done twice, and there
is one error path where it is omitted.

This patch tidies that up.

Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/bitmap.c |    8 +-------
 ./drivers/md/md.c     |   49 ++++++++++++++++++++++++++++++-------------------
 2 files changed, 31 insertions(+), 26 deletions(-)

diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~	2006-05-12 15:59:53.000000000 +1000
+++ ./drivers/md/bitmap.c	2006-05-12 16:00:03.000000000 +1000
@@ -625,7 +625,6 @@ static void drain_write_queues(struct bi
 static void bitmap_file_put(struct bitmap *bitmap)
 {
 	struct file *file;
-	struct inode *inode;
 	unsigned long flags;
 
 	spin_lock_irqsave(&bitmap->lock, flags);
@@ -637,13 +636,8 @@ static void bitmap_file_put(struct bitma
 
 	bitmap_file_unmap(bitmap);
 
-	if (file) {
-		inode = file->f_mapping->host;
-		spin_lock(&inode->i_lock);
-		atomic_set(&inode->i_writecount, 1); /* allow writes again */
-		spin_unlock(&inode->i_lock);
+	if (file)
 		fput(file);
-	}
 }
 
 

diff ./drivers/md/md.c~current~ ./drivers/md/md.c
--- ./drivers/md/md.c~current~	2006-05-12 15:54:48.000000000 +1000
+++ ./drivers/md/md.c	2006-05-12 16:00:03.000000000 +1000
@@ -2864,6 +2864,32 @@ out:
 	return err;
 }
 
+/* similar to deny_write_access, but accounts for our holding a reference
+ * to the file ourselves */
+static int deny_bitmap_write_access(struct file * file)
+{
+	struct inode *inode = file->f_mapping->host;
+
+	spin_lock(&inode->i_lock);
+	if (atomic_read(&inode->i_writecount) > 1) {
+		spin_unlock(&inode->i_lock);
+		return -ETXTBSY;
+	}
+	atomic_set(&inode->i_writecount, -1);
+	spin_unlock(&inode->i_lock);
+
+	return 0;
+}
+
+static void restore_bitmap_write_access(struct file *file)
+{
+	struct inode *inode = file->f_mapping->host;
+
+	spin_lock(&inode->i_lock);
+	atomic_set(&inode->i_writecount, 1);
+	spin_unlock(&inode->i_lock);
+}
+
 static int do_md_stop(mddev_t * mddev, int ro)
 {
 	int err = 0;
@@ -2927,7 +2953,7 @@ static int do_md_stop(mddev_t * mddev, i
 
 		bitmap_destroy(mddev);
 		if (mddev->bitmap_file) {
-			atomic_set(&mddev->bitmap_file->f_dentry->d_inode->i_writecount, 1);
+			restore_bitmap_write_access(mddev->bitmap_file);
 			fput(mddev->bitmap_file);
 			mddev->bitmap_file = NULL;
 		}
@@ -3531,23 +3557,6 @@ abort_export:
 	return err;
 }
 
-/* similar to deny_write_access, but accounts for our holding a reference
- * to the file ourselves */
-static int deny_bitmap_write_access(struct file * file)
-{
-	struct inode *inode = file->f_mapping->host;
-
-	spin_lock(&inode->i_lock);
-	if (atomic_read(&inode->i_writecount) > 1) {
-		spin_unlock(&inode->i_lock);
-		return -ETXTBSY;
-	}
-	atomic_set(&inode->i_writecount, -1);
-	spin_unlock(&inode->i_lock);
-
-	return 0;
-}
-
 static int set_bitmap_file(mddev_t *mddev, int fd)
 {
 	int err;
@@ -3595,8 +3604,10 @@ static int set_bitmap_file(mddev_t *mdde
 		mddev->pers->quiesce(mddev, 0);
 	}
 	if (fd < 0) {
-		if (mddev->bitmap_file)
+		if (mddev->bitmap_file) {
+			restore_bitmap_write_access(mddev->bitmap_file);
 			fput(mddev->bitmap_file);
+		}
 		mddev->bitmap_file = NULL;
 	}
 

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [PATCH 008 of 8] md/bitmap: Change md/bitmap file handling to use bmap to file blocks.
  2006-05-12  6:07 [PATCH 000 of 8] md/bitmap: Introduction - rework management of bitmap files NeilBrown
                   ` (6 preceding siblings ...)
  2006-05-12  6:08 ` [PATCH 007 of 8] md/bitmap: Tidy up i_writecount handling in md/bitmap NeilBrown
@ 2006-05-12  6:08 ` NeilBrown
  2006-05-12 17:47   ` Andrew Morton
  7 siblings, 1 reply; 22+ messages in thread
From: NeilBrown @ 2006-05-12  6:08 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid, linux-kernel, Paul Clements


If md is asked to store a bitmap in a file, it tries to hold onto the
page cache pages for that file, manipulate them directly, and call a
cocktail of operations to write the file out.  I don't believe this is
a supportable approach.

This patch changes the approach to use the same approach as swap files.
i.e. bmap is used to enumerate all the block address of parts of the file
and we write directly to those blocks of the device.

swapfile only uses parts of the file that provide a full pages at
contiguous addresses.  We don't have that luxury so we have to cope
with pages that are non-contiguous in storage.  To handle this 
we attach buffers to each page, and store the addresses in those buffers.

With this approach the pagecache may contain data which is inconsistent with 
what is on disk.  To alleviate the problems this can cause, md invalidates
the pagecache when releasing the file.  If the file is to be examined
while the array is active (a non-critical but occasionally useful function),
O_DIRECT io must be used.  And new version of mdadm will have support for this.

This approach simplifies a lot of code:
 - we no longer need to keep a list of pages which we need to wait for,
   as the b_endio function can keep track of how many outstanding
   writes there are.  This saves a mempool.
 - -EAGAIN returns from write_page are no longer possible (not sure if
    they ever were actually).



Signed-off-by: Neil Brown <neilb@suse.de>

### Diffstat output
 ./drivers/md/bitmap.c         |  283 ++++++++++++++++++++----------------------
 ./include/linux/raid/bitmap.h |    7 -
 2 files changed, 139 insertions(+), 151 deletions(-)

diff ./drivers/md/bitmap.c~current~ ./drivers/md/bitmap.c
--- ./drivers/md/bitmap.c~current~	2006-05-12 16:00:03.000000000 +1000
+++ ./drivers/md/bitmap.c	2006-05-12 16:01:06.000000000 +1000
@@ -67,7 +67,6 @@ static inline char * bmname(struct bitma
 	return bitmap->mddev ? mdname(bitmap->mddev) : "mdX";
 }
 
-#define WRITE_POOL_SIZE 256
 
 /*
  * just a placeholder - calls kmalloc for bitmap pages
@@ -279,75 +278,137 @@ static int write_sb_page(mddev_t *mddev,
  */
 static int write_page(struct bitmap *bitmap, struct page *page, int wait)
 {
-	int ret = -ENOMEM;
+	struct buffer_head *bh;
 
 	if (bitmap->file == NULL)
 		return write_sb_page(bitmap->mddev, bitmap->offset, page, wait);
 
-	flush_dcache_page(page); /* make sure visible to anyone reading the file */
+	bh = page_buffers(page);
 
-	if (wait)
-		lock_page(page);
-	else {
-		if (TestSetPageLocked(page))
-			return -EAGAIN; /* already locked */
-		if (PageWriteback(page)) {
-			unlock_page(page);
-			return -EAGAIN;
-		}
+	while (bh && bh->b_blocknr) {
+		atomic_inc(&bitmap->pending_writes);
+		set_buffer_locked(bh);
+		set_buffer_mapped(bh);
+		submit_bh(WRITE, bh);
+		bh = bh->b_this_page;
+	}
+
+	if (wait) {
+		wait_event(bitmap->write_wait,
+			   atomic_read(&bitmap->pending_writes)==0);
+		return (bitmap->flags & BITMAP_WRITE_ERROR) ? -EIO : 0;
 	}
+	return 0;
+}
+
+static void end_bitmap_write(struct buffer_head *bh, int uptodate)
+{
+	struct bitmap *bitmap = bh->b_private;
+	unsigned long flags;
 
-	ret = page->mapping->a_ops->prepare_write(bitmap->file, page, 0, PAGE_SIZE);
-	if (!ret)
-		ret = page->mapping->a_ops->commit_write(bitmap->file, page, 0,
-			PAGE_SIZE);
-	if (ret) {
-		unlock_page(page);
-		return ret;
+	if (!uptodate) {
+		spin_lock_irqsave(&bitmap->lock, flags);
+		bitmap->flags |= BITMAP_WRITE_ERROR;
+		spin_unlock_irqrestore(&bitmap->lock, flags);
 	}
+	if (atomic_dec_and_test(&bitmap->pending_writes))
+		wake_up(&bitmap->write_wait);
+}
 
-	set_page_dirty(page); /* force it to be written out */
+/* copied from buffer.c */
+static void
+__clear_page_buffers(struct page *page)
+{
+	ClearPagePrivate(page);
+	set_page_private(page, 0);
+	page_cache_release(page);
+}
+static void free_buffers(struct page *page)
+{
+	struct buffer_head *bh = page_buffers(page);
 
-	if (!wait) {
-		/* add to list to be waited for */
-		struct page_list *item = mempool_alloc(bitmap->write_pool, GFP_NOIO);
-		item->page = page;
-		spin_lock(&bitmap->write_lock);
-		list_add(&item->list, &bitmap->complete_pages);
-		spin_unlock(&bitmap->write_lock);
+	while (bh) {
+		struct buffer_head *next = bh->b_this_page;
+		free_buffer_head(bh);
+		bh = next;
 	}
-	return write_one_page(page, wait);
+	__clear_page_buffers(page);
+	put_page(page);
 }
 
-/* read a page from a file, pinning it into cache, and return bytes_read */
+/* read a page from a file.
+ * We both read the page, and attach buffers to the page to record the
+ * address of each block (using bmap).  These addresses will be used
+ * to write the block later, completely bypassing the filesystem.
+ * This usage is similar to how swap files are handled, and allows us
+ * to write to a file with no concerns of memory allocation failing.
+ */
 static struct page *read_page(struct file *file, unsigned long index,
-					unsigned long *bytes_read)
+			      struct bitmap *bitmap,
+			      unsigned long count)
 {
-	struct inode *inode = file->f_mapping->host;
 	struct page *page = NULL;
-	loff_t isize = i_size_read(inode);
-	unsigned long end_index = isize >> PAGE_SHIFT;
+	struct inode *inode = file->f_dentry->d_inode;
+	loff_t pos = index << PAGE_SHIFT;
+	int ret;
+	struct buffer_head *bh;
+	sector_t block;
+	mm_segment_t oldfs;
 
 	PRINTK("read bitmap file (%dB @ %Lu)\n", (int)PAGE_SIZE,
 			(unsigned long long)index << PAGE_SHIFT);
 
-	page = read_cache_page(inode->i_mapping, index,
-			(filler_t *)inode->i_mapping->a_ops->readpage, file);
+	page = alloc_page(GFP_KERNEL);
+	if (!page)
+		page = ERR_PTR(-ENOMEM);
 	if (IS_ERR(page))
 		goto out;
-	wait_on_page_locked(page);
-	if (!PageUptodate(page) || PageError(page)) {
+
+	oldfs = get_fs();
+	set_fs(KERNEL_DS);
+	ret = vfs_read(file, (char __user*) page_address(page), count, &pos);
+	set_fs(oldfs);
+
+	if (ret >= 0 && ret != count)
+		ret = -EIO;
+	if (ret < 0) {
 		put_page(page);
-		page = ERR_PTR(-EIO);
+		page = ERR_PTR(ret);
 		goto out;
 	}
+	bh = alloc_page_buffers(page, 1<<inode->i_blkbits, 0);
+	if (!bh) {
+		put_page(page);
+		page = ERR_PTR(-ENOMEM);
+		goto out;
+	}
+	attach_page_buffers(page, bh);
+	block = index << (PAGE_SHIFT - inode->i_blkbits);
+	while (bh) {
+		if (count == 0)
+			bh->b_blocknr = 0;
+		else {
+			bh->b_blocknr = bmap(inode, block);
+			if (bh->b_blocknr == 0) {
+				/* Cannot use this file! */
+				free_buffers(page);
+				page = ERR_PTR(-EINVAL);
+				goto out;
+			}
+			bh->b_bdev = inode->i_sb->s_bdev;
+			if (count < (1<<inode->i_blkbits))
+				count = 0;
+			else
+				count -= (1<<inode->i_blkbits);
 
-	if (index > end_index) /* we have read beyond EOF */
-		*bytes_read = 0;
-	else if (index == end_index) /* possible short read */
-		*bytes_read = isize & ~PAGE_MASK;
-	else
-		*bytes_read = PAGE_SIZE; /* got a full page */
+			bh->b_end_io = end_bitmap_write;
+			bh->b_private = bitmap;
+		}
+		block++;
+		bh = bh->b_this_page;
+	}
+
+	page->index = index;
 out:
 	if (IS_ERR(page))
 		printk(KERN_ALERT "md: bitmap read error: (%dB @ %Lu): %ld\n",
@@ -418,16 +479,14 @@ static int bitmap_read_sb(struct bitmap 
 	char *reason = NULL;
 	bitmap_super_t *sb;
 	unsigned long chunksize, daemon_sleep, write_behind;
-	unsigned long bytes_read;
 	unsigned long long events;
 	int err = -EINVAL;
 
 	/* page 0 is the superblock, read it... */
 	if (bitmap->file)
-		bitmap->sb_page = read_page(bitmap->file, 0, &bytes_read);
+		bitmap->sb_page = read_page(bitmap->file, 0, bitmap, PAGE_SIZE);
 	else {
 		bitmap->sb_page = read_sb_page(bitmap->mddev, bitmap->offset, 0);
-		bytes_read = PAGE_SIZE;
 	}
 	if (IS_ERR(bitmap->sb_page)) {
 		err = PTR_ERR(bitmap->sb_page);
@@ -437,13 +496,6 @@ static int bitmap_read_sb(struct bitmap 
 
 	sb = (bitmap_super_t *)kmap_atomic(bitmap->sb_page, KM_USER0);
 
-	if (bytes_read < sizeof(*sb)) { /* short read */
-		printk(KERN_INFO "%s: bitmap file superblock truncated\n",
-			bmname(bitmap));
-		err = -ENOSPC;
-		goto out;
-	}
-
 	chunksize = le32_to_cpu(sb->chunksize);
 	daemon_sleep = le32_to_cpu(sb->daemon_sleep);
 	write_behind = le32_to_cpu(sb->write_behind);
@@ -589,37 +641,12 @@ static void bitmap_file_unmap(struct bit
 
 	while (pages--)
 		if (map[pages]->index != 0) /* 0 is sb_page, release it below */
-			put_page(map[pages]);
+			free_buffers(map[pages]);
 	kfree(map);
 	kfree(attr);
 
-	safe_put_page(sb_page);
-}
-
-/* dequeue the next item in a page list -- don't call from irq context */
-static struct page_list *dequeue_page(struct bitmap *bitmap)
-{
-	struct page_list *item = NULL;
-	struct list_head *head = &bitmap->complete_pages;
-
-	spin_lock(&bitmap->write_lock);
-	if (list_empty(head))
-		goto out;
-	item = list_entry(head->prev, struct page_list, list);
-	list_del(head->prev);
-out:
-	spin_unlock(&bitmap->write_lock);
-	return item;
-}
-
-static void drain_write_queues(struct bitmap *bitmap)
-{
-	struct page_list *item;
-
-	while ((item = dequeue_page(bitmap))) {
-		/* don't bother to wait */
-		mempool_free(item, bitmap->write_pool);
-	}
+	if (sb_page)
+		free_buffers(sb_page);
 }
 
 static void bitmap_file_put(struct bitmap *bitmap)
@@ -632,12 +659,16 @@ static void bitmap_file_put(struct bitma
 	bitmap->file = NULL;
 	spin_unlock_irqrestore(&bitmap->lock, flags);
 
-	drain_write_queues(bitmap);
-
+	if (file)
+		wait_event(bitmap->write_wait,
+			   atomic_read(&bitmap->pending_writes)==0);
 	bitmap_file_unmap(bitmap);
 
-	if (file)
+	if (file) {
+		struct inode *inode = file->f_dentry->d_inode;
+		invalidate_inode_pages(inode->i_mapping);
 		fput(file);
+	}
 }
 
 
@@ -728,8 +759,6 @@ static void bitmap_file_set_bit(struct b
 
 }
 
-static void bitmap_writeback(struct bitmap *bitmap);
-
 /* this gets called when the md device is ready to unplug its underlying
  * (slave) device queues -- before we let any writes go down, we need to
  * sync the dirty pages of the bitmap file to disk */
@@ -761,24 +790,18 @@ int bitmap_unplug(struct bitmap *bitmap)
 			wait = 1;
 		spin_unlock_irqrestore(&bitmap->lock, flags);
 
-		if (dirty | need_write) {
+		if (dirty | need_write)
 			err = write_page(bitmap, page, 0);
-			if (err == -EAGAIN) {
-				if (dirty)
-					err = write_page(bitmap, page, 1);
-				else
-					err = 0;
-			}
-			if (err)
-				return 1;
-		}
 	}
 	if (wait) { /* if any writes were performed, we need to wait on them */
 		if (bitmap->file)
-			bitmap_writeback(bitmap);
+			wait_event(bitmap->write_wait,
+				   atomic_read(&bitmap->pending_writes)==0);
 		else
 			md_super_wait(bitmap->mddev);
 	}
+	if (bitmap->flags & BITMAP_WRITE_ERROR)
+		bitmap_file_kick(bitmap);
 	return 0;
 }
 
@@ -800,7 +823,7 @@ static int bitmap_init_from_disk(struct 
 	struct page *page = NULL, *oldpage = NULL;
 	unsigned long num_pages, bit_cnt = 0;
 	struct file *file;
-	unsigned long bytes, offset, dummy;
+	unsigned long bytes, offset;
 	int outofdate;
 	int ret = -ENOSPC;
 	void *paddr;
@@ -853,7 +876,12 @@ static int bitmap_init_from_disk(struct 
 		index = file_page_index(i);
 		bit = file_page_offset(i);
 		if (index != oldindex) { /* this is a new page, read it in */
+			int count;
 			/* unmap the old page, we're done with it */
+			if (index == num_pages-1)
+				count = bytes - index * PAGE_SIZE;
+			else
+				count = PAGE_SIZE;
 			if (index == 0) {
 				/*
 				 * if we're here then the superblock page
@@ -863,7 +891,7 @@ static int bitmap_init_from_disk(struct 
 				page = bitmap->sb_page;
 				offset = sizeof(bitmap_super_t);
 			} else if (file) {
-				page = read_page(file, index, &dummy);
+				page = read_page(file, index, bitmap, count);
 				offset = 0;
 			} else {
 				page = read_sb_page(bitmap->mddev, bitmap->offset, index);
@@ -999,9 +1027,6 @@ int bitmap_daemon_work(struct bitmap *bi
 				spin_unlock_irqrestore(&bitmap->lock, flags);
 				if (need_write) {
 					switch (write_page(bitmap, page, 0)) {
-					case -EAGAIN:
-						set_page_attr(bitmap, page, BITMAP_PAGE_NEEDWRITE);
-						break;
 					case 0:
 						break;
 					default:
@@ -1017,10 +1042,6 @@ int bitmap_daemon_work(struct bitmap *bi
 					clear_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE);
 					spin_unlock_irqrestore(&bitmap->lock, flags);
 					err = write_page(bitmap, lastpage, 0);
-					if (err == -EAGAIN) {
-						err = 0;
-						set_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE);
-					}
 				} else {
 					set_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE);
 					spin_unlock_irqrestore(&bitmap->lock, flags);
@@ -1070,10 +1091,6 @@ int bitmap_daemon_work(struct bitmap *bi
 			clear_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE);
 			spin_unlock_irqrestore(&bitmap->lock, flags);
 			err = write_page(bitmap, lastpage, 0);
-			if (err == -EAGAIN) {
-				set_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE);
-				err = 0;
-			}
 		} else {
 			set_page_attr(bitmap, lastpage, BITMAP_PAGE_NEEDWRITE);
 			spin_unlock_irqrestore(&bitmap->lock, flags);
@@ -1083,32 +1100,6 @@ int bitmap_daemon_work(struct bitmap *bi
 	return err;
 }
 
-static void bitmap_writeback(struct bitmap *bitmap)
-{
-	struct page *page;
-	struct page_list *item;
-	int err = 0;
-
-	PRINTK("%s: bitmap writeback daemon woke up...\n", bmname(bitmap));
-	/* wait on bitmap page writebacks */
-	while ((item = dequeue_page(bitmap))) {
-		page = item->page;
-		mempool_free(item, bitmap->write_pool);
-		PRINTK("wait on page writeback: %p\n", page);
-		wait_on_page_writeback(page);
-		PRINTK("finished page writeback: %p\n", page);
-
-		err = PageError(page);
-		if (err) {
-			printk(KERN_WARNING "%s: bitmap file writeback "
-			       "failed (page %lu): %d\n",
-			       bmname(bitmap), page->index, err);
-			bitmap_file_kick(bitmap);
-			break;
-		}
-	}
-}
-
 static bitmap_counter_t *bitmap_get_counter(struct bitmap *bitmap,
 					    sector_t offset, int *blocks,
 					    int create)
@@ -1377,8 +1368,6 @@ static void bitmap_free(struct bitmap *b
 
 	/* free all allocated memory */
 
-	mempool_destroy(bitmap->write_pool);
-
 	if (bp) /* deallocate the page memory */
 		for (k = 0; k < pages; k++)
 			if (bp[k].map && !bp[k].hijacked)
@@ -1428,17 +1417,13 @@ int bitmap_create(mddev_t *mddev)
 	spin_lock_init(&bitmap->lock);
 	bitmap->mddev = mddev;
 
-	spin_lock_init(&bitmap->write_lock);
-	INIT_LIST_HEAD(&bitmap->complete_pages);
-	bitmap->write_pool = mempool_create_kmalloc_pool(WRITE_POOL_SIZE,
-						sizeof(struct page_list));
-	err = -ENOMEM;
-	if (!bitmap->write_pool)
-		goto error;
-
 	bitmap->file = file;
 	bitmap->offset = mddev->bitmap_offset;
 	if (file) get_file(file);
+
+	/* Ensure we read fresh data */
+	invalidate_inode_pages(file->f_dentry->d_inode->i_mapping);
+
 	/* read superblock from bitmap file (this sets bitmap->chunksize) */
 	err = bitmap_read_sb(bitmap);
 	if (err)
@@ -1461,6 +1446,9 @@ int bitmap_create(mddev_t *mddev)
 
 	bitmap->syncchunk = ~0UL;
 
+	atomic_set(&bitmap->pending_writes, 0);
+	init_waitqueue_head(&bitmap->write_wait);
+
 #ifdef INJECT_FATAL_FAULT_1
 	bitmap->bp = NULL;
 #else
@@ -1503,4 +1491,3 @@ EXPORT_SYMBOL(bitmap_start_sync);
 EXPORT_SYMBOL(bitmap_end_sync);
 EXPORT_SYMBOL(bitmap_unplug);
 EXPORT_SYMBOL(bitmap_close_sync);
-EXPORT_SYMBOL(bitmap_daemon_work);

diff ./include/linux/raid/bitmap.h~current~ ./include/linux/raid/bitmap.h
--- ./include/linux/raid/bitmap.h~current~	2006-05-12 15:55:37.000000000 +1000
+++ ./include/linux/raid/bitmap.h	2006-05-12 16:01:06.000000000 +1000
@@ -140,6 +140,7 @@ typedef __u16 bitmap_counter_t;
 enum bitmap_state {
 	BITMAP_ACTIVE = 0x001, /* the bitmap is in use */
 	BITMAP_STALE  = 0x002,  /* the bitmap file is out of date or had -EIO */
+	BITMAP_WRITE_ERROR = 0x004, /* A write error has occurred */
 	BITMAP_HOSTENDIAN = 0x8000,
 };
 
@@ -244,9 +245,9 @@ struct bitmap {
 	unsigned long daemon_lastrun; /* jiffies of last run */
 	unsigned long daemon_sleep; /* how many seconds between updates? */
 
-	spinlock_t write_lock;
-	struct list_head complete_pages;
-	mempool_t *write_pool;
+	atomic_t pending_writes; /* pending writes to the bitmap file */
+	wait_queue_head_t write_wait;
+
 };
 
 /* the bitmap API */

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 002 of 8] md/bitmap: Remove bitmap writeback daemon.
  2006-05-12  6:07 ` [PATCH 002 of 8] md/bitmap: Remove bitmap writeback daemon NeilBrown
@ 2006-05-12 17:40   ` Andrew Morton
  2006-05-13  3:14     ` Neil Brown
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2006-05-12 17:40 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid, linux-kernel, paul.clements

NeilBrown <neilb@suse.de> wrote:
>
>  ./drivers/md/bitmap.c         |  115 ++----------------------------------------

hmm.  I hope we're not doing any of that filesystem I/O within the context
of submit_bio() or kblockd or anything like that.  Looks OK from a quick
scan.

a_ops->commit_write() already ran set_page_dirty(), so you don't need that
in there.

I assume this always works in units of a complete page?  It's strange to do
prepare_write() followed immediately by commit_write().  Normally
prepare_write() will do some prereading, but it's smart enough to not do
that if the caller is preparing to write the whole page.

We normally use PAGE_CACHE_SIZE for these things, not PAGE_SIZE.  Same diff.

If you have a page and you want to write the whole thing out then there's
really no need to run prepare_write or commit_write at all.  Just
initialise the whole page, run set_page_dirty() then write_one_page().

Perhaps it should check that the backing filesystem actually implements
commit_write(), prepare_write(), readpage(), etc.  Some might not, and the
user will get taught not to do that via an oops.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 008 of 8] md/bitmap: Change md/bitmap file handling to use bmap to file blocks.
  2006-05-12  6:08 ` [PATCH 008 of 8] md/bitmap: Change md/bitmap file handling to use bmap to file blocks NeilBrown
@ 2006-05-12 17:47   ` Andrew Morton
  2006-05-13  3:46     ` Neil Brown
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2006-05-12 17:47 UTC (permalink / raw)
  To: NeilBrown; +Cc: linux-raid, linux-kernel, paul.clements

NeilBrown <neilb@suse.de> wrote:
>
> If md is asked to store a bitmap in a file, it tries to hold onto the
> page cache pages for that file, manipulate them directly, and call a
> cocktail of operations to write the file out.  I don't believe this is
> a supportable approach.

erk.  I think it's better than...

> This patch changes the approach to use the same approach as swap files.
> i.e. bmap is used to enumerate all the block address of parts of the file
> and we write directly to those blocks of the device.

That's going in at a much lower level.  Even swapfiles don't assume
buffer_heads.

When playing with bmap() one needs to be careful that nobody has truncated
the file on you, else you end up writing to disk blocks which aren't part
of the file any more.

All this (and a set_fs(KERNEL_DS), ug) looks like a step backwards to me. 
Operating at the pagecache a_ops level looked better, and more
filesystem-independent.

I haven't looked at this patch at all closely yet.  Do I really need to?

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 002 of 8] md/bitmap: Remove bitmap writeback daemon.
  2006-05-12 17:40   ` Andrew Morton
@ 2006-05-13  3:14     ` Neil Brown
  2006-05-13  6:59       ` Andrew Morton
  0 siblings, 1 reply; 22+ messages in thread
From: Neil Brown @ 2006-05-13  3:14 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid, linux-kernel, paul.clements

On Friday May 12, akpm@osdl.org wrote:
> NeilBrown <neilb@suse.de> wrote:
> >
> >  ./drivers/md/bitmap.c         |  115 ++----------------------------------------
> 
> hmm.  I hope we're not doing any of that filesystem I/O within the context
> of submit_bio() or kblockd or anything like that.  Looks OK from a quick
> scan.

No.  We do all the I/O from the context of the per-array thread.
However some IO requests cannot complete until the filesystem I/O
completes, so we need to be sure that the filesystem I/O won't block
waiting for memory, or fail with -ENOMEM.

> 
> a_ops->commit_write() already ran set_page_dirty(), so you don't need that
> in there.

Is that documented somewhere..... but yes, that seems to be right. Thanks.

> 
> I assume this always works in units of a complete page?  It's strange to do
> prepare_write() followed immediately by commit_write().  Normally
> prepare_write() will do some prereading, but it's smart enough to not do
> that if the caller is preparing to write the whole page.
> 

Yes, it is strange.  That was one of the things that made me want to
review this code and figure out how to do it "properly".

As far as I can see, much of 'address_space' is really an internal
interface to support routines used by the filesystem.  A filesystem
may choose to use address spaces, and has a fair degree of freedom
when it comes to which bits to make use of and exactly what they
mean.
About the only thing that *has* to be supported is ->writepages --
which has a fair degree of latitude in exactly what it does -- and
->writepage -- which can only be called after locking a page and
rechecking the ->mapping.

bitmap.c is currently trying to do something every different.
It uses ->readpage to get pages in the page cache (even though some
address spaces don't define ->readpage) and then holds onto those
pages without holding the page lock, and then calls ->writepage to
flush them out from time to time.
Before calling writepage it gets the pagelock, but doesn't re-check
that ->mapping is correct (there is nothing much it can do if it isn't
correct..).

I noticed this is particularly a problem with tmpfs.  When you call
writepage on a tmpfs page, the page is swizzled into the swap cache,
and ->mapping becomes NULL - not the behaviour that bitmap is
expecting.

Now I agree that tmpfs is an unusual case, and that storing a bitmap
in tmpfs doesn't make a lot of sense (though it can make some...) but
the point is that if a filesystem is allowed to move pages around like
that, then bitmap cannot hold on to pages in the page cache like it
wants to.  It simply isn't a well defined thing to do.


> We normally use PAGE_CACHE_SIZE for these things, not PAGE_SIZE.  Same diff.
> 

Yeah.... why is that?  Why have two names for exactly the same value?
How does a poor develop know when to use one and when the other?  More
particularly, how does one remember.
I would argue that the "same diff" should be no difference - not even
in spelling....

> If you have a page and you want to write the whole thing out then there's
> really no need to run prepare_write or commit_write at all.  Just
> initialise the whole page, run set_page_dirty() then write_one_page().
> 

I see that now.  But only after locking the page, and rechecking that
->mapping is correct, and if it isn't .... well, more work is involved
that bitmap is in a position to do.

> Perhaps it should check that the backing filesystem actually implements
> commit_write(), prepare_write(), readpage(), etc.  Some might not, and the
> user will get taught not to do that via an oops.
> 

Might help, but as I think you've gathered, I really want a whole
different approach to writing to the file.  One that I can justify as
being a "correct" use of interfaces, and also that I can be certain
will not block or fail on a kmalloc or similar.  Hence the bmap thing
later.


Thanks for your feedback.

NeilBrown

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 008 of 8] md/bitmap: Change md/bitmap file handling to use bmap to file blocks.
  2006-05-12 17:47   ` Andrew Morton
@ 2006-05-13  3:46     ` Neil Brown
  2006-05-13  6:59       ` Andrew Morton
  0 siblings, 1 reply; 22+ messages in thread
From: Neil Brown @ 2006-05-13  3:46 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid, linux-kernel, paul.clements

On Friday May 12, akpm@osdl.org wrote:
> NeilBrown <neilb@suse.de> wrote:
> >
> > If md is asked to store a bitmap in a file, it tries to hold onto the
> > page cache pages for that file, manipulate them directly, and call a
> > cocktail of operations to write the file out.  I don't believe this is
> > a supportable approach.
> 
> erk.  I think it's better than...
> 
> > This patch changes the approach to use the same approach as swap files.
> > i.e. bmap is used to enumerate all the block address of parts of the file
> > and we write directly to those blocks of the device.
> 
> That's going in at a much lower level.  Even swapfiles don't assume
> buffer_heads.

I'm not "assuming" buffer_heads.  I'm creating buffer heads and using
them for my own purposes.  These are my pages and my buffer heads.
None of them belong to the filesystem.
The buffer_heads are simply a convenient data-structure to record the
several block addresses for each page.  I could have equally created
an array storing all the addresses, and built the required bios by
hand at write time.  But buffer_heads did most of the work for me, so
I used them.

Yes, it is a lower level, but
 1/ I am certain that there will be no kmalloc problems and
 2/ Because it is exactly the level used by swapfile, I know that it
    is sufficiently 'well defined' that no-one is going to break it.

> 
> When playing with bmap() one needs to be careful that nobody has truncated
> the file on you, else you end up writing to disk blocks which aren't part
> of the file any more.

Well we currently play games with i_write_count to ensure that no-one
else has the file open for write.  And if no-one else can get write
access, then it cannot be truncated.
I did think about adding the S_SWAPFILE flag, but decided to leave
that for a separate patch and review different approaches to
preventing write access first (e.g. can I use a lease?).

> 
> All this (and a set_fs(KERNEL_DS), ug) looks like a step backwards to me. 
> Operating at the pagecache a_ops level looked better, and more
> filesystem-independent.

If you really want filesystem independence, you need to use vfs_read
and vfs_write to read/write the file.  I have a patch which did that,
but decided that the possibility of kmalloc failure at awkward times
would make that not suitable.
So I now use vfs_read to read in the file (just like knfsd) and
bmap/submit_bh to write out the file (just like swapfile).

I don't think a_ops really provides an interface that I can use, partly
because, as I said in a previous email, it isn't really a public
interface to a filesystem.

> 
> I haven't looked at this patch at all closely yet.  Do I really need to?

I assume you are asking that because you hope I will retract the
patch.  While I'm always open to being educated, I am not yet
convinced that there is any better way, or even any other usable way,
to do what needs to be done, so I am not inclined to retract the
patch.

I'd like to say that you don't need to read it because it is perfect,
but unfortunately history suggests that is unlikely to be true.

Whether you look more closely is of course up to you, but I'm convinced
that patch is in the right direction, and your review and comments are
always very valuable.

NeilBrown

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 002 of 8] md/bitmap: Remove bitmap writeback daemon.
  2006-05-13  3:14     ` Neil Brown
@ 2006-05-13  6:59       ` Andrew Morton
  0 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2006-05-13  6:59 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid, linux-kernel, paul.clements

Neil Brown <neilb@suse.de> wrote:
>
> On Friday May 12, akpm@osdl.org wrote:
> > NeilBrown <neilb@suse.de> wrote:
> > >
> > >  ./drivers/md/bitmap.c         |  115 ++----------------------------------------
> > 
> > hmm.  I hope we're not doing any of that filesystem I/O within the context
> > of submit_bio() or kblockd or anything like that.  Looks OK from a quick
> > scan.
> 
> No.  We do all the I/O from the context of the per-array thread.

OK.

> However some IO requests cannot complete until the filesystem I/O
> completes, so we need to be sure that the filesystem I/O won't block
> waiting for memory, or fail with -ENOMEM.

That sounds like a complex deadlock.  Suppose the bitmap writeout requres
some writeback to happen before it can get enough memory to proceed.

> bitmap.c is currently trying to do something every different.
> It uses ->readpage to get pages in the page cache (even though some
> address spaces don't define ->readpage) and then holds onto those
> pages without holding the page lock, and then calls ->writepage to
> flush them out from time to time.

That's OK.

> Before calling writepage it gets the pagelock, but doesn't re-check
> that ->mapping is correct (there is nothing much it can do if it isn't
> correct..).

We do that to find out if the page was truncated while we ewre waiting for
the page lock.

> I noticed this is particularly a problem with tmpfs.  When you call
> writepage on a tmpfs page, the page is swizzled into the swap cache,
> and ->mapping becomes NULL - not the behaviour that bitmap is
> expecting.

You shouldn't call writepage on tmpfs.  If !bdi_cap_writeback_dirty(), your
write_page() should immediately give up and "succeed".  Because this
filesystem has no permanent backing store anyway.

tmpfs doesn't implement bmap(), so your new patch fixes that problem ;)

> > We normally use PAGE_CACHE_SIZE for these things, not PAGE_SIZE.  Same diff.
> > 
> 
> Yeah.... why is that?  Why have two names for exactly the same value?

I guess at one time it was thought that we could use a different unit size
for pagecache.  Nowadays we use it by convention - it has little more than
documentary value.

> How does a poor develop know when to use one and when the other?  More
> particularly, how does one remember.

If the page is a pagecache page, use PAGE_CACHE_SIZE.  There are a few
spots where it blurs, but not many.

> > If you have a page and you want to write the whole thing out then there's
> > really no need to run prepare_write or commit_write at all.  Just
> > initialise the whole page, run set_page_dirty() then write_one_page().
> > 
> 
> I see that now.  But only after locking the page, and rechecking that
> ->mapping is correct, and if it isn't .... well, more work is involved
> that bitmap is in a position to do.

If you've bumped i_writecount to block truncate then there's no need to
check for truncation after locking the page.

i_mutex could be used similarly.



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 008 of 8] md/bitmap: Change md/bitmap file handling to use bmap to file blocks.
  2006-05-13  3:46     ` Neil Brown
@ 2006-05-13  6:59       ` Andrew Morton
  2006-05-13 15:29         ` Paul Clements
  2006-05-15  0:26         ` Neil Brown
  0 siblings, 2 replies; 22+ messages in thread
From: Andrew Morton @ 2006-05-13  6:59 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid, linux-kernel, paul.clements

Neil Brown <neilb@suse.de> wrote:
>
> On Friday May 12, akpm@osdl.org wrote:
> > NeilBrown <neilb@suse.de> wrote:
> > >
> > > If md is asked to store a bitmap in a file, it tries to hold onto the
> > > page cache pages for that file, manipulate them directly, and call a
> > > cocktail of operations to write the file out.  I don't believe this is
> > > a supportable approach.
> > 
> > erk.  I think it's better than...
> > 
> > > This patch changes the approach to use the same approach as swap files.
> > > i.e. bmap is used to enumerate all the block address of parts of the file
> > > and we write directly to those blocks of the device.
> > 
> > That's going in at a much lower level.  Even swapfiles don't assume
> > buffer_heads.
> 
> I'm not "assuming" buffer_heads.  I'm creating buffer heads and using
> them for my own purposes.  These are my pages and my buffer heads.
> None of them belong to the filesystem.

Right, so it's incoherent with pagecache and userspace can no longer
usefully read this file.

> The buffer_heads are simply a convenient data-structure to record the
> several block addresses for each page.  I could have equally created
> an array storing all the addresses, and built the required bios by
> hand at write time.  But buffer_heads did most of the work for me, so
> I used them.

OK.

> Yes, it is a lower level, but
>  1/ I am certain that there will be no kmalloc problems and
>  2/ Because it is exactly the level used by swapfile, I know that it
>     is sufficiently 'well defined' that no-one is going to break it.

It would be nicer of course to actually use the mm/page_io.c code.  That
would involve implementing swap_bmap() and reimplementing the
get_swap_bio() stuff in terms of a_ops->bmap().

But the swap code can afford to skip blockruns which aren't page-sized and
it uses that capability nicely.  You cannot do that.

> > 
> > All this (and a set_fs(KERNEL_DS), ug) looks like a step backwards to me. 
> > Operating at the pagecache a_ops level looked better, and more
> > filesystem-independent.
> 
> If you really want filesystem independence, you need to use vfs_read
> and vfs_write to read/write the file.

yup.

>  I have a patch which did that,
> but decided that the possibility of kmalloc failure at awkward times
> would make that not suitable.

submit_bh() can and will allocate memory, although most decent device
drivers should be OK.

There are tricks we can do with writepage.  If the backing filesystem uses
buffer_heads and if you hold a ref on the page then we know that there
won't be any buffer_head allocations nor any disk reads in the writepage
path.  It'll go direct into bio_alloc+submit_bio, just as you're doing now.
IOW: no gain.

> So I now use vfs_read to read in the file (just like knfsd) and
> bmap/submit_bh to write out the file (just like swapfile).
> 
> I don't think a_ops really provides an interface that I can use, partly
> because, as I said in a previous email, it isn't really a public
> interface to a filesystem.

It's publicer than bmap+submit_bh!

> > 
> > I haven't looked at this patch at all closely yet.  Do I really need to?
> 
> I assume you are asking that because you hope I will retract the
> patch.

Was kinda hoping that would be the outcome.  It's rather gruesome, using
set_fs()+vfs_read() on one side and submit_bh() on the other.

Are you sure the handling at EOF for a non-multiple-of-PAGE_SIZE file
is OK?

The loss of pagecache coherency seems sad.  I assume there's never a
requirement for userspace to read this file.

invalidate_inode_pages() is best-effort.  If someone else has the page
locked or if the page is mmapped then the attempt to take down the
pagecache will fail.  That's relatively-OK, because it'll just lead to
userspace seeing wrong stuff, and we've already conceded that.

But if the pagecache is dirty then invalidate_inode_pages() will leave it
dirty and the VM will later write it back, corrupting your bitmap file. 
You should get i_writecount, fsync the file and then run
invalidate_inode_pages().

Or not run invalidate_inode_pages() - it doesn't gain anything and will
just reduce the observeability of bugs.  Better off leaving the pagecache
there all the time so that any rarely-occurring bugs become all-the-time
bugs.

You might as well use kernel_read() too, if you insist on begin oddball ;)

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 008 of 8] md/bitmap: Change md/bitmap file handling to use bmap to file blocks.
  2006-05-13  6:59       ` Andrew Morton
@ 2006-05-13 15:29         ` Paul Clements
  2006-05-13 15:42           ` Andrew Morton
  2006-05-15  0:26         ` Neil Brown
  1 sibling, 1 reply; 22+ messages in thread
From: Paul Clements @ 2006-05-13 15:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Neil Brown, linux-raid, linux-kernel

Andrew Morton wrote:

> The loss of pagecache coherency seems sad.  I assume there's never a
> requirement for userspace to read this file.

Actually, there is. mdadm reads the bitmap file, so that would be 
broken. Also, it's just useful for a user to be able to read the bitmap 
(od -x, or similar) to figure out approximately how much more he's got 
to resync to get an array in-sync. Other than reading the bitmap file, I 
don't know of any way to determine that.

--
Paul


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 008 of 8] md/bitmap: Change md/bitmap file handling to use bmap to file blocks.
  2006-05-13 15:29         ` Paul Clements
@ 2006-05-13 15:42           ` Andrew Morton
  2006-05-14 11:15             ` Neil Brown
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2006-05-13 15:42 UTC (permalink / raw)
  To: Paul Clements; +Cc: neilb, linux-raid, linux-kernel

Paul Clements <paul.clements@steeleye.com> wrote:
>
> Andrew Morton wrote:
> 
> > The loss of pagecache coherency seems sad.  I assume there's never a
> > requirement for userspace to read this file.
> 
> Actually, there is. mdadm reads the bitmap file, so that would be 
> broken. Also, it's just useful for a user to be able to read the bitmap 
> (od -x, or similar) to figure out approximately how much more he's got 
> to resync to get an array in-sync. Other than reading the bitmap file, I 
> don't know of any way to determine that.

Read it with O_DIRECT :(

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 008 of 8] md/bitmap: Change md/bitmap file handling to use bmap to file blocks.
  2006-05-13 15:42           ` Andrew Morton
@ 2006-05-14 11:15             ` Neil Brown
  2006-05-14 11:22               ` Andrew Morton
  0 siblings, 1 reply; 22+ messages in thread
From: Neil Brown @ 2006-05-14 11:15 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Paul Clements, linux-raid, linux-kernel

On Saturday May 13, akpm@osdl.org wrote:
> Paul Clements <paul.clements@steeleye.com> wrote:
> >
> > Andrew Morton wrote:
> > 
> > > The loss of pagecache coherency seems sad.  I assume there's never a
> > > requirement for userspace to read this file.
> > 
> > Actually, there is. mdadm reads the bitmap file, so that would be 
> > broken. Also, it's just useful for a user to be able to read the bitmap 
> > (od -x, or similar) to figure out approximately how much more he's got 
> > to resync to get an array in-sync. Other than reading the bitmap file, I 
> > don't know of any way to determine that.
> 
> Read it with O_DIRECT :(

Which is exactly what the next release of mdadm does.
As the patch comment said:

: With this approach the pagecache may contain data which is inconsistent with 
: what is on disk.  To alleviate the problems this can cause, md invalidates
: the pagecache when releasing the file.  If the file is to be examined
: while the array is active (a non-critical but occasionally useful function),
: O_DIRECT io must be used.  And new version of mdadm will have support for this.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 008 of 8] md/bitmap: Change md/bitmap file handling to use bmap to file blocks.
  2006-05-14 11:15             ` Neil Brown
@ 2006-05-14 11:22               ` Andrew Morton
  0 siblings, 0 replies; 22+ messages in thread
From: Andrew Morton @ 2006-05-14 11:22 UTC (permalink / raw)
  To: Neil Brown; +Cc: paul.clements, linux-raid, linux-kernel

Neil Brown <neilb@suse.de> wrote:
>
> On Saturday May 13, akpm@osdl.org wrote:
> > Paul Clements <paul.clements@steeleye.com> wrote:
> > >
> > > Andrew Morton wrote:
> > > 
> > > > The loss of pagecache coherency seems sad.  I assume there's never a
> > > > requirement for userspace to read this file.
> > > 
> > > Actually, there is. mdadm reads the bitmap file, so that would be 
> > > broken. Also, it's just useful for a user to be able to read the bitmap 
> > > (od -x, or similar) to figure out approximately how much more he's got 
> > > to resync to get an array in-sync. Other than reading the bitmap file, I 
> > > don't know of any way to determine that.
> > 
> > Read it with O_DIRECT :(
> 
> Which is exactly what the next release of mdadm does.
> As the patch comment said:
> 
> : With this approach the pagecache may contain data which is inconsistent with 
> : what is on disk.  To alleviate the problems this can cause, md invalidates
> : the pagecache when releasing the file.  If the file is to be examined
> : while the array is active (a non-critical but occasionally useful function),
> : O_DIRECT io must be used.  And new version of mdadm will have support for this.

Which doesn't help `od -x' and is going to cause older mdadm userspace to
mysteriously and subtly fail.  Or does the user<->kernel interface have
versioning which will prevent this?



^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 008 of 8] md/bitmap: Change md/bitmap file handling to use bmap to file blocks.
  2006-05-13  6:59       ` Andrew Morton
  2006-05-13 15:29         ` Paul Clements
@ 2006-05-15  0:26         ` Neil Brown
  2006-05-15 21:04           ` Andrew Morton
  1 sibling, 1 reply; 22+ messages in thread
From: Neil Brown @ 2006-05-15  0:26 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid, linux-kernel, paul.clements


(replying to bits of several emails)

On Friday May 12, akpm@osdl.org wrote:
> Neil Brown <neilb@suse.de> wrote:

> > However some IO requests cannot complete until the filesystem I/O
> > completes, so we need to be sure that the filesystem I/O won't block
> > waiting for memory, or fail with -ENOMEM.
> 
> That sounds like a complex deadlock.  Suppose the bitmap writeout requres
> some writeback to happen before it can get enough memory to proceed.
> 

Exactly. Bitmap writeout must not block on fs-writeback.  It can block
on device writeout (e.g. queue congestion or mempool exhaustion) but
it must complete without waiting in the fs layer or above, and without
the possibility of any error other -EIO.  Otherwise we can get
deadlocked writing to the raid array. bh_submit (or bio_submit) is
certain to be safe in this respect.  I'm not so confident about
anything at the fs level.

> > > Read it with O_DIRECT :(
> > 
> > Which is exactly what the next release of mdadm does.
> > As the patch comment said:
> > 
> > : With this approach the pagecache may contain data which is inconsistent with 
> > : what is on disk.  To alleviate the problems this can cause, md invalidates
> > : the pagecache when releasing the file.  If the file is to be examined
> > : while the array is active (a non-critical but occasionally useful function),
> > : O_DIRECT io must be used.  And new version of mdadm will have support for this.
> 
> Which doesn't help `od -x' and is going to cause older mdadm userspace to
> mysteriously and subtly fail.  Or does the user<->kernel interface have
> versioning which will prevent this?
> 

As I said: 'non-critical'.  Nothing important breaks if reading the
file gets old data.  Reading the file while the array is active is
purely a curiosity thing.  There is information in /proc/mdstat which
gives a fairly coarse view of the same data.  It could lead to some
confusion, but if a compliant mdadm comes out before this gets into a
mainline kernel, I doubt there will be any significant issue.

Read/writing the bitmap needs to work reliably when the array is not
active, but suitable sync/invalidate calls in the kernel should make
that work perfectly.

I know this is technically a regression in user-space interface, and
you don't like such regression with good reason.... Maybe I could call
invalidate_inode_pages every few seconds or whenever the atime
changes, just to be on the safe side :-)

> >  I have a patch which did that,
> > but decided that the possibility of kmalloc failure at awkward times
> > would make that not suitable.
> 
> submit_bh() can and will allocate memory, although most decent device
> drivers should be OK.
> 

submit_bh (like all decent device drivers) uses a mempool for memory
allocation so we can be sure that the delay in getting memory is
bounded by the delay for a few IO requests to complete, and we can be
sure the allocation won't fail.  This is perfectly fine.

> > 
> > I don't think a_ops really provides an interface that I can use, partly
> > because, as I said in a previous email, it isn't really a public
> > interface to a filesystem.
> 
> It's publicer than bmap+submit_bh!
> 

I don't know how you can say that.

bmap is so public that it is exported to userspace through an IOCTL
and is used by lilo (admitted only for reading, not writing).  More
significantly it is used by swapfile which is a completely independent
subsystem from the filesystem.  Contrast this with a_ops.  The primary
users of a_ops are routines like generic_file_{read,write} and
friends.  These are tools - library routines - that are used by
filesystems to implement their 'file_operations' which are the real
public interface.  As far as these uses go, it is not a public
interface.  Where a filesystem doesn't use some library routines, it
does not need to implement the matching functionality in the a_op
interface.

The other main user is the 'VM' which might try to flush out or
invalidate pages.  However the VM isn't using this interface to
interact with files, but only to interact with pages, and it doesn't
much care what is done with the pages providing they get clean, or get
released, or whatever.

The way I perceive Linux design/development, active usage is far more
significant than documented design.  If some feature of an interface
isn't being actively used - by in-kernel code - then you cannot be
sure that feature will be uniformly implemented, or that it won't
change subtly next week.

So when I went looking for the best way to get md/bitmap to write to a
file, I didn't just look at the interface specs (which are pretty
poorly documented anyway), I looked at existing code.
I can find 3 different parts of the kernel that write to a file.
They are
   swap-file
   loop
   nfsd

nfsd uses vfs_read/vfs_write  which have too many failure/delay modes
  for me to safely use.
loop uses prepare_write/commit_write (if available) or f_op->write
  (not vfs_write - I wonder why) which is not much better than what
  nfsd uses.  And as far as I can tell loop never actually flushes data to 
  storage (hope no-one thinks a journalling filesystem on loop is a
  good idea) so it isn't a good model to follow.
  [[If loop was a good model to follow, I would have rejected the
  code for writing to a file in the first place, only accepted code
  for writing to a device, and insisted on using loop to close the gap]]

That leaves swap-file as the only in-kernel user of a filesystem that
accesses the file in a way similar to what I need.  Is it any surprise
that I chose to copy it?


> > > 
> > > I haven't looked at this patch at all closely yet.  Do I really need to?
> > 
> > I assume you are asking that because you hope I will retract the
> > patch.
> 
> Was kinda hoping that would be the outcome.  It's rather gruesome, using
> set_fs()+vfs_read() on one side and submit_bh() on the other.
> 

More gruesome than a_op->readpage on one side and bmap/submit_bio
on the other side like swapfile does?  However if it makes you happy I
can change the read side to anything that you suggest - submit_bh
would be the 'obvious' choice I guess.

> Are you sure the handling at EOF for a non-multiple-of-PAGE_SIZE file
> is OK?

I think so yes, but I will review it.
I assume that if bmap gave me a non-zero block number for the last
(partial) block, then it is safe to write that whole block.  I guess
that is a subtlety in semantics that no other user would be using.
However if a filesystem were packing the tail, then it would not be
able to return an integral block number for some fragments, so
hopefully it wouldn't for any.

> 
> But if the pagecache is dirty then invalidate_inode_pages() will leave it
> dirty and the VM will later write it back, corrupting your bitmap file. 
> You should get i_writecount, fsync the file and then run
> invalidate_inode_pages().

Well, as I am currently reading the file through the pagecache.... but
yes, I should put an fsync in there (and the invalidate before read is
fairly pointless.  It is the invalidate after close that is important).

> 
> You might as well use kernel_read() too, if you insist on begin oddball ;)

I didn't know about that one..

If you would feel more comfortable with kernel_read I would be more
than happy to use it.

In fact, if you can assure me that if I have an inode, and I have
  confirmed that
    bdi_cap_writeback_dirty(inode->i_mapping->backing_dev_info)
   && inode->i_mapping->a_ops->readpage != NULL

  and have disabled truncation, either via i_write_count or otherwise,
  then
   - read_cache_page( .... ..a_ops->readpage ....)
      will return a page that will, as long as I hold a counted reference,
      remain in the page cache for that file at the appropriate address
      (even though I don't hold it locked)
   - set_page_dirty and write_one_page
      will not fail, except with -EIO, and will not block waiting for
      any writeback except for writeback of that file (e.g. won't do
      any unprotected kmallocs)

then I'll be happy to go back to using that interface, though I guess I'd
have to figure out what to do about redirty_page_for_writepage calls...

But if you can assure me of the above, then I'd be curious as to why
swapfile doesn't use the readpage/writepage interface....

NeilBrown

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 008 of 8] md/bitmap: Change md/bitmap file handling to use bmap to file blocks.
  2006-05-15  0:26         ` Neil Brown
@ 2006-05-15 21:04           ` Andrew Morton
  2006-05-15 23:03             ` Neil Brown
  0 siblings, 1 reply; 22+ messages in thread
From: Andrew Morton @ 2006-05-15 21:04 UTC (permalink / raw)
  To: Neil Brown; +Cc: linux-raid, linux-kernel, paul.clements

Neil Brown <neilb@suse.de> wrote:
>
> ...
> 
> > >  I have a patch which did that,
> > > but decided that the possibility of kmalloc failure at awkward times
> > > would make that not suitable.
> > 
> > submit_bh() can and will allocate memory, although most decent device
> > drivers should be OK.
> > 
> 
> submit_bh (like all decent device drivers) uses a mempool for memory
> allocation so we can be sure that the delay in getting memory is
> bounded by the delay for a few IO requests to complete, and we can be
> sure the allocation won't fail.  This is perfectly fine.

My point is that some device drivers will apparently call into the mamory
allocator (should be GFP_NOIO) on the request submission path.  I don't
recall whcih drivers - but not mainstream ones.

> > > 
> > > I don't think a_ops really provides an interface that I can use, partly
> > > because, as I said in a previous email, it isn't really a public
> > > interface to a filesystem.
> > 
> > It's publicer than bmap+submit_bh!
> > 
> 
> I don't know how you can say that.

bmap() is a userspace API, not really a kernel one.  And it assumes a
block-backed filesystem.

> bmap is so public that it is exported to userspace through an IOCTL
> and is used by lilo (admitted only for reading, not writing).  More
> significantly it is used by swapfile which is a completely independent
> subsystem from the filesystem.  Contrast this with a_ops.  The primary
> users of a_ops are routines like generic_file_{read,write} and
> friends.  These are tools - library routines - that are used by
> filesystems to implement their 'file_operations' which are the real
> public interface.  As far as these uses go, it is not a public
> interface.  Where a filesystem doesn't use some library routines, it
> does not need to implement the matching functionality in the a_op
> interface.

Well yeah.  If one wants to do filesystem I/O in-kernel then one should use
the filesystem I/O entry points: read() and write() (and get flamed ;)).

If one wants to cut in at a lower level than that then we get into a pickle
because different types of filesytems do quite different things to
implement read() and write().

> The other main user is the 'VM' which might try to flush out or
> invalidate pages.  However the VM isn't using this interface to
> interact with files, but only to interact with pages, and it doesn't
> much care what is done with the pages providing they get clean, or get
> released, or whatever.
> 
> The way I perceive Linux design/development, active usage is far more
> significant than documented design.  If some feature of an interface
> isn't being actively used - by in-kernel code - then you cannot be
> sure that feature will be uniformly implemented, or that it won't
> change subtly next week.
> 
> So when I went looking for the best way to get md/bitmap to write to a
> file, I didn't just look at the interface specs (which are pretty
> poorly documented anyway), I looked at existing code.
> I can find 3 different parts of the kernel that write to a file.
> They are
>    swap-file
>    loop
>    nfsd
> 
> nfsd uses vfs_read/vfs_write  which have too many failure/delay modes
>   for me to safely use.
> loop uses prepare_write/commit_write (if available) or f_op->write
>   (not vfs_write - I wonder why) which is not much better than what
>   nfsd uses.  And as far as I can tell loop never actually flushes data to 
>   storage (hope no-one thinks a journalling filesystem on loop is a
>   good idea) so it isn't a good model to follow.
>   [[If loop was a good model to follow, I would have rejected the
>   code for writing to a file in the first place, only accepted code
>   for writing to a device, and insisted on using loop to close the gap]]
> 
> That leaves swap-file as the only in-kernel user of a filesystem that
> accesses the file in a way similar to what I need.  Is it any surprise
> that I chose to copy it?

swapfile doesn't use vfs_read() for swapin...

> 
> > 
> > But if the pagecache is dirty then invalidate_inode_pages() will leave it
> > dirty and the VM will later write it back, corrupting your bitmap file. 
> > You should get i_writecount, fsync the file and then run
> > invalidate_inode_pages().
> 
> Well, as I am currently reading the file through the pagecache.... but
> yes, I should put an fsync in there (and the invalidate before read is
> fairly pointless.  It is the invalidate after close that is important).

umm, the code in its present form _will_ corrupt MD bitmap files. 
invalidate_inode_pages() will not invalidate dirty pagecache, and that
dirty pagecache will later get written back, undoing any of your
intervening direct-io writes.  Most of the time, MD will do another
direct-io write _after_ the VM has done that writeback and things will get
fixed up.  But there is a window.  So that fsync() is required.

> > 
> > You might as well use kernel_read() too, if you insist on begin oddball ;)
> 
> I didn't know about that one..
> 
> If you would feel more comfortable with kernel_read I would be more
> than happy to use it.

We might as well save some code.

> In fact, if you can assure me that if I have an inode, and I have
>   confirmed that
>     bdi_cap_writeback_dirty(inode->i_mapping->backing_dev_info)
>    && inode->i_mapping->a_ops->readpage != NULL
> 
>   and have disabled truncation, either via i_write_count or otherwise,
>   then
>    - read_cache_page( .... ..a_ops->readpage ....)
>       will return a page that will, as long as I hold a counted reference,
>       remain in the page cache for that file at the appropriate address
>       (even though I don't hold it locked)

yup, as long as the page is pinned with get_page().

>    - set_page_dirty and write_one_page
>       will not fail, except with -EIO, and will not block waiting for
>       any writeback except for writeback of that file (e.g. won't do
>       any unprotected kmallocs)

Mostly.  There is a risk that ->writepage() will need to read filesystem
metadata to locate the page's blocks.  Most of the time that won't happen
because the page still has buffers attached.  You'd have to keep the pages
locked to prevent that.  But writepage() will unlock the page so there's
still a teeny window where the page could get its buffers taken away.  Plus
we don't know that the filesystem caches the disk mapping in the page's
buffers anyway (ext2 -o nobh, for example).


> then I'll be happy to go back to using that interface, though I guess I'd
> have to figure out what to do about redirty_page_for_writepage calls...
> 
> But if you can assure me of the above, then I'd be curious as to why
> swapfile doesn't use the readpage/writepage interface....

If we did that we'd need to memcpy the data between the swapfile pagecache
and swapcache whenever doing swapin and swapout.  Plus swap goes
direct-to-BIO to minimise the risk of doing memory allocations on the
swapout path.  And the swap code treats swapdevs and swapfiles identically.
And the swap code cannot pin all of the swapfile's pagecache in memory
(would make it rather pointless), whereas the MD bitmap code can afford to
do this.

Ho hum, I give up.  I don't think, in practice, this code fixes any
demonstrable bug though.


^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH 008 of 8] md/bitmap: Change md/bitmap file handling to use bmap to file blocks.
  2006-05-15 21:04           ` Andrew Morton
@ 2006-05-15 23:03             ` Neil Brown
  0 siblings, 0 replies; 22+ messages in thread
From: Neil Brown @ 2006-05-15 23:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-raid, linux-kernel, paul.clements

On Monday May 15, akpm@osdl.org wrote:
> 
> Ho hum, I give up.

Thankyou :-)  I found our debate very valuable - it helped me clarify
my understanding of some areas of linux filesystem semantics (and as I
am trying to write a filesystem in my 'spare time', that will turn out
to be very useful).  It also revealed some problems in the code!

>                     I don't think, in practice, this code fixes any
> demonstrable bug though.

I thought it was our job to kill the bugs *before* they were
demonstrated :-)

I'm still convinced that the previous code could lead to deadlocks or
worse under sufficiently sustained high memory pressure and fs
activity.

I'll send a patch shortly that fixes the known problems and
awkwardnesses in the new code.

Thanks again,
NeilBrown

^ permalink raw reply	[flat|nested] 22+ messages in thread

end of thread, other threads:[~2006-05-15 23:03 UTC | newest]

Thread overview: 22+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2006-05-12  6:07 [PATCH 000 of 8] md/bitmap: Introduction - rework management of bitmap files NeilBrown
2006-05-12  6:07 ` [PATCH 001 of 8] md/bitmap: Fix online removal of file-backed bitmaps NeilBrown
2006-05-12  6:07 ` [PATCH 002 of 8] md/bitmap: Remove bitmap writeback daemon NeilBrown
2006-05-12 17:40   ` Andrew Morton
2006-05-13  3:14     ` Neil Brown
2006-05-13  6:59       ` Andrew Morton
2006-05-12  6:07 ` [PATCH 003 of 8] md/bitmap: Cleaner separation of page attribute handlers in md/bitmap NeilBrown
2006-05-12  6:07 ` [PATCH 004 of 8] md/bitmap: Use set_bit etc for bitmap page attributes NeilBrown
2006-05-12  6:07 ` [PATCH 005 of 8] md/bitmap: Remove unnecessary page reference manipulations from md/bitmap code NeilBrown
2006-05-12  6:07 ` [PATCH 006 of 8] md/bitmap: Remove dead code from md/bitmap NeilBrown
2006-05-12  6:08 ` [PATCH 007 of 8] md/bitmap: Tidy up i_writecount handling in md/bitmap NeilBrown
2006-05-12  6:08 ` [PATCH 008 of 8] md/bitmap: Change md/bitmap file handling to use bmap to file blocks NeilBrown
2006-05-12 17:47   ` Andrew Morton
2006-05-13  3:46     ` Neil Brown
2006-05-13  6:59       ` Andrew Morton
2006-05-13 15:29         ` Paul Clements
2006-05-13 15:42           ` Andrew Morton
2006-05-14 11:15             ` Neil Brown
2006-05-14 11:22               ` Andrew Morton
2006-05-15  0:26         ` Neil Brown
2006-05-15 21:04           ` Andrew Morton
2006-05-15 23:03             ` Neil Brown

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).