staging: zcache: remove zcache_direct_reclaim_lock

Message ID 1318448460-5930-1-git-send-email-sjenning@linux.vnet.ibm.com
State New, archived

Commit Message

Seth Jennings Oct. 12, 2011, 7:41 p.m. UTC
zcache_do_preload() currently does a spin_trylock() on the
zcache_direct_reclaim_lock. Holding this lock intends to prevent
shrink_zcache_memory() from evicting zbud pages as a result
of a preload.

However, it also prevents two threads from
executing zcache_do_preload() at the same time.  The first
thread will obtain the lock and the second thread's spin_trylock()
will fail (an aborted preload) causing the page to be either lost
(cleancache) or pushed out to the swap device (frontswap). It
also doesn't ensure that the call to shrink_zcache_memory() is
on the same thread as the call to zcache_do_preload().
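The aborted-preload behaviour described above can be illustrated with a small user-space sketch. It is not the zcache code: the names `reclaim_lock`, `preload_begin()`, `preload_end()` and `overlapping_preloads()` are hypothetical, and a C11 `atomic_flag` stands in for the kernel spinlock and `spin_trylock()`. The overlap of two threads is modelled sequentially so the outcome is deterministic.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these names exist in zcache. */
static atomic_flag reclaim_lock = ATOMIC_FLAG_INIT;
static unsigned long aborted_preload;   /* plain counter, single-threaded demo */

/* Trylock semantics: take the lock if it is free, never wait. */
static bool try_lock(atomic_flag *l)
{
	return !atomic_flag_test_and_set(l);
}

static void unlock(atomic_flag *l)
{
	atomic_flag_clear(l);
}

/* Start a preload: 0 if the lock was taken, -1 if the preload is aborted. */
static int preload_begin(void)
{
	if (!try_lock(&reclaim_lock)) {
		aborted_preload++;
		return -1;
	}
	return 0;
}

/* Finish a preload that was started successfully. */
static void preload_end(void)
{
	unlock(&reclaim_lock);
}

/*
 * Model two preloads that overlap in time: the second one starts while
 * the first still holds the lock. Returns the number of aborted preloads.
 */
static unsigned long overlapping_preloads(void)
{
	int first = preload_begin();    /* "thread A" takes the lock */
	int second = preload_begin();   /* "thread B" arrives too early */

	assert(first == 0);
	assert(second == -1);           /* B gives up; its page is not stored */
	preload_end();
	return aborted_preload;
}
```

Calling `overlapping_preloads()` from a `main()` shows the second caller failing even though it never intended to reclaim anything, which is the cost the commit message objects to.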

Additionally, there is no need for this mechanism because all
zcache_do_preload() calls that come down from cleancache already
have PF_MEMALLOC set in the process flags which prevents
direct reclaim in the memory manager. If the zcache_do_preload()
call is done from the frontswap path, we _want_ reclaim to be
done (which it isn't right now).

This patch removes the zcache_direct_reclaim_lock and related
statistics in zcache.

Based on v3.1-rc8

Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
Reviewed-by: Dave Hansen <dave@linux.vnet.ibm.com>
---
 drivers/staging/zcache/zcache-main.c |   31 ++++---------------------------
 1 files changed, 4 insertions(+), 27 deletions(-)

Comments

Dan Magenheimer Oct. 12, 2011, 8:39 p.m. UTC | #1
> From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
> Subject: [PATCH] staging: zcache: remove zcache_direct_reclaim_lock
> 
> zcache_do_preload() currently does a spin_trylock() on the
> zcache_direct_reclaim_lock. Holding this lock intends to prevent
> shrink_zcache_memory() from evicting zbud pages as a result
> of a preload.
>
> However, it also prevents two threads from
> executing zcache_do_preload() at the same time.  The first
> thread will obtain the lock and the second thread's spin_trylock()
> will fail (an aborted preload) causing the page to be either lost
> (cleancache) or pushed out to the swap device (frontswap). It
> also doesn't ensure that the call to shrink_zcache_memory() is
> on the same thread as the call to zcache_do_preload().

Yes, this looks to be leftover code from early in kztmem/zcache
development.  Good analysis.
 
> Additionally, there is no need for this mechanism because all
> zcache_do_preload() calls that come down from cleancache already
> have PF_MEMALLOC set in the process flags which prevents
> direct reclaim in the memory manager. If the zcache_do_preload()

Might it be worthwhile to add a BUG/ASSERT for the presence
of PF_MEMALLOC, or at least a comment in the code?

> call is done from the frontswap path, we _want_ reclaim to be
> done (which it isn't right now).
> 
> This patch removes the zcache_direct_reclaim_lock and related
> statistics in zcache.
> 
> Based on v3.1-rc8
> 
> Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
> Reviewed-by: Dave Hansen <dave@linux.vnet.ibm.com>

With added code/comment per above...
Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/
Seth Jennings Oct. 14, 2011, 5:04 p.m. UTC | #2
On 10/12/2011 03:39 PM, Dan Magenheimer wrote:
>> From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
>> Subject: [PATCH] staging: zcache: remove zcache_direct_reclaim_lock
>>
>> zcache_do_preload() currently does a spin_trylock() on the
>> zcache_direct_reclaim_lock. Holding this lock intends to prevent
>> shrink_zcache_memory() from evicting zbud pages as a result
>> of a preload.
>>
>> However, it also prevents two threads from
>> executing zcache_do_preload() at the same time.  The first
>> thread will obtain the lock and the second thread's spin_trylock()
>> will fail (an aborted preload) causing the page to be either lost
>> (cleancache) or pushed out to the swap device (frontswap). It
>> also doesn't ensure that the call to shrink_zcache_memory() is
>> on the same thread as the call to zcache_do_preload().
> 
> Yes, this looks to be leftover code from early in kztmem/zcache
> development.  Good analysis.
>  
>> Additionally, there is no need for this mechanism because all
>> zcache_do_preload() calls that come down from cleancache already
>> have PF_MEMALLOC set in the process flags which prevents
>> direct reclaim in the memory manager. If the zcache_do_preload()
> 
> Might it be worthwhile to add a BUG/ASSERT for the presence
> of PF_MEMALLOC, or at least a comment in the code?

I was mistaken in my commit comments. Not all cleancache calls have
PF_MEMALLOC set.  One exception is calls from the cgroup code paths.

However, there isn't a way for the code to loop back on itself.

Regardless of whether or not PF_MEMALLOC is set coming into
the preload, the call path only goes one way:

zcache_do_preload()
  -> kmem_cache_alloc()
    -> possibly reclaim and call to shrink_zcache_memory()
      -> zbud_evict_pages()

Nothing done in zbud_evict_pages() can result in a call back to
zcache_do_preload().  So there isn't a threat of recursion.
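As a rough illustration of the one-way call path argued above, here is a user-space sketch. All function names are hypothetical models of the kernel functions named in this message, not their implementations. A depth counter records how many preload invocations are active at once; because the eviction step never calls back into the preload, the depth cannot exceed 1.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of the call chain; not the zcache implementation. */
static int preload_depth;       /* preload calls currently on the stack */
static int max_preload_depth;   /* highest depth ever observed */
static int evictions;           /* pages evicted by the modelled shrinker */

/* Models zbud_evict_pages(): frees pages and calls nothing upstream. */
static void model_evict_pages(int nr)
{
	assert(preload_depth <= 1);     /* never reached through a nested preload */
	evictions += nr;
}

/* Models shrink_zcache_memory(): only forwards to eviction. */
static void model_shrinker(int nr)
{
	model_evict_pages(nr);
}

/* Models kmem_cache_alloc(): may trigger reclaim under memory pressure. */
static void model_cache_alloc(bool under_pressure)
{
	if (under_pressure)
		model_shrinker(1);
}

/* Models zcache_do_preload(): tracks its own nesting depth. */
static void model_do_preload(bool under_pressure)
{
	preload_depth++;
	if (preload_depth > max_preload_depth)
		max_preload_depth = preload_depth;
	model_cache_alloc(under_pressure);
	preload_depth--;
}

/* Run one preload without and one with memory pressure. */
static int max_depth_after_preloads(void)
{
	model_do_preload(false);
	model_do_preload(true);
	return max_preload_depth;
}
```

The model only encodes the claim made in the message: if the real `zbud_evict_pages()` ever gained a path back into the preload, the depth would exceed 1 and the recursion concern would return.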

NOW, if the logic you are trying to implement is: "Don't kick
out zbud pages as the result of preload allocations" then that's
a different story.

If the preload is called with PF_MEMALLOC set, then 
the shrinker will not be run during a kmem_cache_alloc().

However if the preload is called with PF_MEMALLOC being set
then there is a chance that some zbud pages might be reclaimed
as a result.  BUT, I'm not convinced that is a bad thing.
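The two cases can be sketched in user space as follows, reading the second case as "without PF_MEMALLOC", as corrected in the follow-up message in this thread. This is a simplified model, not kernel code: `SIM_PF_MEMALLOC`, `sim_alloc()`, `sim_shrinker()` and `evictions_for()` are hypothetical names, and the flag value is arbitrary.

```c
#include <assert.h>

/* Hypothetical flag bit standing in for PF_MEMALLOC; the value is arbitrary. */
#define SIM_PF_MEMALLOC 0x1u

static unsigned int evicted;    /* pages evicted by the modelled shrinker */

/* Models the shrinker evicting one zbud page. */
static void sim_shrinker(unsigned int task_flags)
{
	/* In this model, reclaim never runs for a task with the flag set. */
	assert(!(task_flags & SIM_PF_MEMALLOC));
	evicted++;
}

/*
 * Models an allocation made during a preload: under memory pressure it
 * enters direct reclaim only if the calling task does not carry the flag.
 */
static void sim_alloc(unsigned int task_flags, int low_memory)
{
	if (low_memory && !(task_flags & SIM_PF_MEMALLOC))
		sim_shrinker(task_flags);
}

/* Number of pages evicted by one low-memory allocation with these flags. */
static unsigned int evictions_for(unsigned int task_flags)
{
	unsigned int before = evicted;

	sim_alloc(task_flags, 1);
	return evicted - before;
}
```

In this model `evictions_for(SIM_PF_MEMALLOC)` evicts nothing, while `evictions_for(0)` evicts one page, which is the situation the message argues may be acceptable.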

> 
>> call is done from the frontswap path, we _want_ reclaim to be
>> done (which it isn't right now).
>>
>> This patch removes the zcache_direct_reclaim_lock and related
>> statistics in zcache.
>>
>> Based on v3.1-rc8
>>
>> Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
>> Reviewed-by: Dave Hansen <dave@linux.vnet.ibm.com>
> 
> With added code/comment per above...
> Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>
> 

Seth Jennings Oct. 14, 2011, 6:21 p.m. UTC | #3
On 10/14/2011 12:04 PM, Seth Jennings wrote:
> On 10/12/2011 03:39 PM, Dan Magenheimer wrote:
>>> From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
>>> Subject: [PATCH] staging: zcache: remove zcache_direct_reclaim_lock
> 
> If the preload is called with PF_MEMALLOC set, then 
> the shrinker will not be run during a kmem_cache_alloc().
> 
> However if the preload is called with PF_MEMALLOC being set

Sorry, that should have been "*without* PF_MEMALLOC being set".
Dan Magenheimer Oct. 17, 2011, 8:14 p.m. UTC | #4
> From: Dan Magenheimer
> Sent: Wednesday, October 12, 2011 2:39 PM
> To: Seth Jennings; gregkh@suse.de
> Cc: cascardo@holoscopio.com; rdunlap@xenotime.net; devel@driverdev.osuosl.org;
> linux-kernel@vger.kernel.org; linux-mm@kvack.org; rcj@linux.vnet.ibm.com; brking@linux.vnet.ibm.com
> Subject: RE: [PATCH] staging: zcache: remove zcache_direct_reclaim_lock
> 
> > From: Seth Jennings [mailto:sjenning@linux.vnet.ibm.com]
> > Subject: [PATCH] staging: zcache: remove zcache_direct_reclaim_lock
> >
> > zcache_do_preload() currently does a spin_trylock() on the
> > zcache_direct_reclaim_lock. Holding this lock intends to prevent
> > shrink_zcache_memory() from evicting zbud pages as a result
> > of a preload.
> >
> > However, it also prevents two threads from
> > executing zcache_do_preload() at the same time.  The first
> > thread will obtain the lock and the second thread's spin_trylock()
> > will fail (an aborted preload) causing the page to be either lost
> > (cleancache) or pushed out to the swap device (frontswap). It
> > also doesn't ensure that the call to shrink_zcache_memory() is
> > on the same thread as the call to zcache_do_preload().
> 
> Yes, this looks to be leftover code from early in kztmem/zcache
> development.  Good analysis.
> 
> > Additionally, there is no need for this mechanism because all
> > zcache_do_preload() calls that come down from cleancache already
> > have PF_MEMALLOC set in the process flags which prevents
> > direct reclaim in the memory manager. If the zcache_do_preload()
> 
> Might it be worthwhile to add a BUG/ASSERT for the presence
> of PF_MEMALLOC, or at least a comment in the code?
> 
> > call is done from the frontswap path, we _want_ reclaim to be
> > done (which it isn't right now).
> >
> > This patch removes the zcache_direct_reclaim_lock and related
> > statistics in zcache.
> >
> > Based on v3.1-rc8
> >
> > Signed-off-by: Seth Jennings <sjenning@linux.vnet.ibm.com>
> > Reviewed-by: Dave Hansen <dave@linux.vnet.ibm.com>
> 
> With added code/comment per above...
> Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>

After Seth's further analysis, ignore my conditional and
consider v1 of this patch:

Acked-by: Dan Magenheimer <dan.magenheimer@oracle.com>

Patch

diff --git a/drivers/staging/zcache/zcache-main.c b/drivers/staging/zcache/zcache-main.c
index 462fbc2..a61b267 100644
--- a/drivers/staging/zcache/zcache-main.c
+++ b/drivers/staging/zcache/zcache-main.c
@@ -962,15 +962,6 @@  out:
 static unsigned long zcache_failed_get_free_pages;
 static unsigned long zcache_failed_alloc;
 static unsigned long zcache_put_to_flush;
-static unsigned long zcache_aborted_preload;
-static unsigned long zcache_aborted_shrink;
-
-/*
- * Ensure that memory allocation requests in zcache don't result
- * in direct reclaim requests via the shrinker, which would cause
- * an infinite loop.  Maybe a GFP flag would be better?
- */
-static DEFINE_SPINLOCK(zcache_direct_reclaim_lock);
 
 /*
  * for now, used named slabs so can easily track usage; later can
@@ -1009,10 +1000,6 @@  static int zcache_do_preload(struct tmem_pool *pool)
 		goto out;
 	if (unlikely(zcache_obj_cache == NULL))
 		goto out;
-	if (!spin_trylock(&zcache_direct_reclaim_lock)) {
-		zcache_aborted_preload++;
-		goto out;
-	}
 	preempt_disable();
 	kp = &__get_cpu_var(zcache_preloads);
 	while (kp->nr < ARRAY_SIZE(kp->objnodes)) {
@@ -1021,7 +1008,7 @@  static int zcache_do_preload(struct tmem_pool *pool)
 				ZCACHE_GFP_MASK);
 		if (unlikely(objnode == NULL)) {
 			zcache_failed_alloc++;
-			goto unlock_out;
+			goto out;
 		}
 		preempt_disable();
 		kp = &__get_cpu_var(zcache_preloads);
@@ -1034,13 +1021,13 @@  static int zcache_do_preload(struct tmem_pool *pool)
 	obj = kmem_cache_alloc(zcache_obj_cache, ZCACHE_GFP_MASK);
 	if (unlikely(obj == NULL)) {
 		zcache_failed_alloc++;
-		goto unlock_out;
+		goto out;
 	}
 	page = (void *)__get_free_page(ZCACHE_GFP_MASK);
 	if (unlikely(page == NULL)) {
 		zcache_failed_get_free_pages++;
 		kmem_cache_free(zcache_obj_cache, obj);
-		goto unlock_out;
+		goto out;
 	}
 	preempt_disable();
 	kp = &__get_cpu_var(zcache_preloads);
@@ -1053,8 +1040,6 @@  static int zcache_do_preload(struct tmem_pool *pool)
 	else
 		free_page((unsigned long)page);
 	ret = 0;
-unlock_out:
-	spin_unlock(&zcache_direct_reclaim_lock);
 out:
 	return ret;
 }
@@ -1423,8 +1408,6 @@  ZCACHE_SYSFS_RO(evicted_buddied_pages);
 ZCACHE_SYSFS_RO(failed_get_free_pages);
 ZCACHE_SYSFS_RO(failed_alloc);
 ZCACHE_SYSFS_RO(put_to_flush);
-ZCACHE_SYSFS_RO(aborted_preload);
-ZCACHE_SYSFS_RO(aborted_shrink);
 ZCACHE_SYSFS_RO(compress_poor);
 ZCACHE_SYSFS_RO(mean_compress_poor);
 ZCACHE_SYSFS_RO_ATOMIC(zbud_curr_raw_pages);
@@ -1466,8 +1449,6 @@  static struct attribute *zcache_attrs[] = {
 	&zcache_failed_get_free_pages_attr.attr,
 	&zcache_failed_alloc_attr.attr,
 	&zcache_put_to_flush_attr.attr,
-	&zcache_aborted_preload_attr.attr,
-	&zcache_aborted_shrink_attr.attr,
 	&zcache_zbud_unbuddied_list_counts_attr.attr,
 	&zcache_zbud_cumul_chunk_counts_attr.attr,
 	&zcache_zv_curr_dist_counts_attr.attr,
@@ -1507,11 +1488,7 @@  static int shrink_zcache_memory(struct shrinker *shrink,
 		if (!(gfp_mask & __GFP_FS))
 			/* does this case really need to be skipped? */
 			goto out;
-		if (spin_trylock(&zcache_direct_reclaim_lock)) {
-			zbud_evict_pages(nr);
-			spin_unlock(&zcache_direct_reclaim_lock);
-		} else
-			zcache_aborted_shrink++;
+		zbud_evict_pages(nr);
 	}
 	ret = (int)atomic_read(&zcache_zbud_curr_raw_pages);
 out: