linux-kernel.vger.kernel.org archive mirror
* CMA page migration failure due to buffers on bh_lru
@ 2012-08-30  1:03 Laura Abbott
  2012-08-31 20:46 ` Laura Abbott
  0 siblings, 1 reply; 2+ messages in thread
From: Laura Abbott @ 2012-08-30  1:03 UTC
  To: linux-fsdevel, Marek Szyprowski
  Cc: linux-kernel, linaro-mm-sig, linux-arm-kernel, linux-arm-msm

Hi,

I've been observing a high rate of failures with CMA allocations on my
ARM system. I've set up a test case with a 56MB CMA region that
essentially does the following:

	total_failures = 0;
	loop forever:
		loop_failure = 0;
		for (i = 0; i < 56; i++)
			chunk[i] = dma_allocate(&cma_dev, 1MB)
			if (!chunk[i])
				loop_failure = 1

		if (loop_failure)
			total_failures++
			loop_failure = 0

		for (i = 0; i < 56; i++)
			dma_free(&cma_dev, chunk[i], 1MB)

In the background, I also have a process doing some amount of filesystem
activity (adb push/pull, since this is an Android system). In a typical
run I see ~8500 loops and ~450 failed loops (roughly 5%), where a failed
loop is one in which one or more buffers could not be allocated. This is
unacceptably high for our use cases.

In every case the allocation failure was ultimately due to a migration
failure: the pages contained buffers which could not be dropped because
the buffers were busy (move_to_new_page -> fallback_migrate_page ->
try_to_release_page -> try_to_free_buffers -> drop_buffers ->
buffer_busy). Each time, the b_count on the busy buffer head was 1.
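
As a rough illustration of why a b_count of 1 is enough to block the
drop, here is a small user-space model of the busy check. The names
(model_bh, model_buffer_busy, model_drop_buffers) and the flag values
are hypothetical stand-ins, not the kernel's definitions; the sketch
only assumes that a buffer counts as busy when it has a nonzero
reference count or is dirty or locked, and that one busy buffer is
enough to keep the whole page from being released.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the buffer state bits; values are arbitrary. */
enum {
	MODEL_BH_DIRTY = 1u << 0,
	MODEL_BH_LOCK  = 1u << 1,
};

/* Simplified model of a buffer head: only the fields the check needs. */
struct model_bh {
	unsigned int state;	/* dirty/lock bits */
	int count;		/* models b_count */
};

/* Busy if anyone holds a reference, or the buffer is dirty or locked. */
static bool model_buffer_busy(const struct model_bh *bh)
{
	return bh->count != 0 ||
	       (bh->state & (MODEL_BH_DIRTY | MODEL_BH_LOCK)) != 0;
}

/* All-or-nothing: a single busy buffer keeps the whole page pinned. */
static bool model_drop_buffers(const struct model_bh *bhs, int n)
{
	for (int i = 0; i < n; i++)
		if (model_buffer_busy(&bhs[i]))
			return false;
	return true;
}
```

In this model a page with one clean, unlocked buffer whose count is 1
cannot be dropped, which matches the failure pattern described above.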

The problem arises because of the LRU lists for buffer heads:

__getblk
     __getblk_slow
         grow_buffers
             grow_dev_page
                 find_or_create_page -- create a possibly movable page
         __find_get_block
             __find_get_block_slow
                 find_get_page -- return the movable page
             bh_lru_install
                 get_bh -- buffer head now has a reference

The reference taken in bh_lru_install won't be dropped until the bh is
evicted from the LRU, so the page cannot be migrated as long as the
buffer sits on an LRU list. Unless the buffer gets evicted quickly, the
page can remain non-migratable for long periods of time, which makes
the CMA region unusable for just as long. We generally don't want to
size CMA regions any larger than necessary, so any such failure causes
a problem.
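
The pinning effect can be sketched with a small user-space model of a
single LRU. Everything here is an assumption made for illustration:
the model_* names are hypothetical, MODEL_LRU_SIZE is an arbitrary
value, and the real bh_lru_install does more (per-CPU data, interrupt
handling). The only point is that the LRU holds its own reference, and
that reference goes away only when enough other buffers are installed
to push the entry off the end.

```c
#include <stddef.h>

#define MODEL_LRU_SIZE 8	/* arbitrary size for this sketch */

struct model_bh {
	int count;		/* models b_count */
};

struct model_lru {
	struct model_bh *bhs[MODEL_LRU_SIZE];	/* slot 0 is most recent */
};

/*
 * Put bh at the front of the LRU. A new entry gains a reference that
 * is held until the entry falls off the end; an entry that is already
 * present is just moved to the front.
 */
static void model_lru_install(struct model_lru *lru, struct model_bh *bh)
{
	struct model_bh *evicted = NULL;
	int pos = MODEL_LRU_SIZE - 1;	/* slot to vacate: last by default */
	int i;

	for (i = 0; i < MODEL_LRU_SIZE; i++) {
		if (lru->bhs[i] == bh) {
			pos = i;
			break;
		}
	}

	if (lru->bhs[pos] != bh) {
		evicted = lru->bhs[pos];	/* may be NULL if slot is empty */
		bh->count++;			/* the LRU's own reference */
	}

	for (i = pos; i > 0; i--)
		lru->bhs[i] = lru->bhs[i - 1];
	lru->bhs[0] = bh;

	if (evicted)
		evicted->count--;		/* reference dropped on eviction */
}
```

With this model, a buffer installed once keeps a count of 1 until
MODEL_LRU_SIZE other buffers have been installed on the same LRU. On a
mostly idle CPU that can take arbitrarily long.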

My quick and dirty workaround for testing is to remove the __GFP_MOVABLE
flag from the find_or_create_page call, but this seems significantly
less than optimal. Ideally, it seems like the buffers should be evicted
from the LRU when trying to drop them (expand on invalidate_bh_lru?),
but I'm not familiar enough with the code path to know if this is a
good approach.
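
One way to picture the eviction idea is the user-space model below. It
is only a sketch under stated assumptions: the model_* names,
MODEL_NR_CPUS and MODEL_LRU_SIZE are hypothetical, and a plain array
stands in for the per-CPU LRUs. In the kernel each CPU's LRU would have
to be touched from that CPU, so a loop like this is not directly
applicable there.

```c
#include <stdbool.h>
#include <stddef.h>

#define MODEL_NR_CPUS	4	/* arbitrary for this sketch */
#define MODEL_LRU_SIZE	8	/* arbitrary for this sketch */

struct model_bh {
	int count;		/* models b_count */
};

struct model_lru {
	struct model_bh *bhs[MODEL_LRU_SIZE];
};

/* A plain array stands in for the per-CPU LRUs. */
static struct model_lru model_lrus[MODEL_NR_CPUS];

/*
 * Remove bh from every modeled LRU and release the reference each
 * LRU held. Returns the number of references released.
 */
static int model_evict_bh(struct model_bh *bh)
{
	int released = 0;

	for (int cpu = 0; cpu < MODEL_NR_CPUS; cpu++) {
		for (int i = 0; i < MODEL_LRU_SIZE; i++) {
			if (model_lrus[cpu].bhs[i] == bh) {
				model_lrus[cpu].bhs[i] = NULL;
				bh->count--;
				released++;
			}
		}
	}
	return released;
}

/* Evict first, then check: only non-LRU references keep bh busy. */
static bool model_try_drop(struct model_bh *bh)
{
	model_evict_bh(bh);
	return bh->count == 0;
}
```

The design choice modeled here is to evict only the one buffer being
dropped rather than flushing every LRU, so unrelated cached buffers
stay in place.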

Any suggestions/feedback is appreciated. Thanks.

Laura
-- 
Sent by an employee of the Qualcomm Innovation Center, Inc.
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum.


* Re: CMA page migration failure due to buffers on bh_lru
  2012-08-30  1:03 CMA page migration failure due to buffers on bh_lru Laura Abbott
@ 2012-08-31 20:46 ` Laura Abbott
  0 siblings, 0 replies; 2+ messages in thread
From: Laura Abbott @ 2012-08-31 20:46 UTC
  To: linux-fsdevel, Marek Szyprowski
  Cc: linaro-mm-sig, linux-arm-msm, linux-kernel, linux-arm-kernel

On 8/29/2012 6:03 PM, Laura Abbott wrote:

> My quick and dirty workaround for testing is to remove the __GFP_MOVABLE
> flag from the find_or_create_page call, but this seems significantly
> less than optimal. Ideally, it seems like the buffers should be evicted
> from the LRU when trying to drop them (expand on invalidate_bh_lru?),
> but I'm not familiar enough with the code path to know if this is a
> good approach.
>
> Any suggestions/feedback is appreciated. Thanks.
>
> Laura

I came up with what I think is a reasonable fix to this. Feedback is 
appreciated. Thanks.

Laura


8<---

diff --git a/fs/buffer.c b/fs/buffer.c
index ad5938c..daa0c3d 100644
--- a/fs/buffer.c
+++ b/fs/buffer.c
@@ -1399,12 +1399,49 @@ static bool has_bh_in_lru(int cpu, void *dummy)
  	return 0;
  }

+static void __evict_bh_lru(void *arg)
+{
+	struct bh_lru *b = &get_cpu_var(bh_lrus);
+	struct buffer_head *bh = arg;
+	int i;
+
+	for (i = 0; i < BH_LRU_SIZE; i++) {
+		if (b->bhs[i] == bh) {
+			brelse(b->bhs[i]);
+			b->bhs[i] = NULL;
+			goto out;
+		}
+	}
+out:
+	put_cpu_var(bh_lrus);
+}
+
+static bool bh_exists_in_lru(int cpu, void *arg)
+{
+	struct bh_lru *b = per_cpu_ptr(&bh_lrus, cpu);
+	struct buffer_head *bh = arg;
+	int i;
+
+	for (i = 0; i < BH_LRU_SIZE; i++) {
+		if (b->bhs[i] == bh)
+			return 1;
+	}
+
+	return 0;
+
+}
  void invalidate_bh_lrus(void)
  {
  	on_each_cpu_cond(has_bh_in_lru, invalidate_bh_lru, NULL, 1, GFP_KERNEL);
  }
  EXPORT_SYMBOL_GPL(invalidate_bh_lrus);

+void evict_bh_lrus(struct buffer_head *bh)
+{
+	on_each_cpu_cond(bh_exists_in_lru, __evict_bh_lru, bh, 1, GFP_ATOMIC);
+}
+EXPORT_SYMBOL_GPL(evict_bh_lrus);
+
  void set_bh_page(struct buffer_head *bh,
  		struct page *page, unsigned long offset)
  {
@@ -3052,6 +3089,7 @@ drop_buffers(struct page *page, struct buffer_head **buffers_to_free)

  	bh = head;
  	do {
+		evict_bh_lrus(bh);
  		if (buffer_write_io_error(bh) && page->mapping)
  			set_bit(AS_EIO, &page->mapping->flags);
  		if (buffer_busy(bh))
-- 

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
hosted by The Linux Foundation

