linux-bcache.vger.kernel.org archive mirror
* Using vmalloc instead of get_free_pages?
@ 2014-12-27 10:52 Stephen R. van den Berg
  2014-12-29  4:09 ` Slava Pestov
  0 siblings, 1 reply; 3+ messages in thread
From: Stephen R. van den Berg @ 2014-12-27 10:52 UTC
  To: linux-bcache

I have a system with currently:
4 x 6TB HDD backing devices  (it will expand to 12 x 8TB HDD)
2 x 490GB SSD caching devices
16GB RAM
16GB Swap

I intend to:
a. Use BTRFS using ncopies=3 (three way mirroring) for data and metadata
   on the backing devices.
b. Use the two SSD caches as a non-redundant (striped) bcache for the
   whole HDD set.

In trying to do this I notice:

- That the amount of memory being allocated for a single caching device
  exceeds the maximum amount allocatable by get_free_pages(), so I changed
  that to use vzalloc(), which seems to work (patch included).
  Is there any direct I/O from that area which requires special handling?

- bcache does not allow more than one cache device per set.  Is simply
  allowing this in make-bcache enough to get this working, or does it need
  code changes in the driver to allow for this?
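
To make the first point concrete, here is a minimal userspace sketch of
the size arithmetic.  It assumes 4 KiB pages and MAX_ORDER = 11 (common
defaults, but configuration dependent), so the largest single
__get_free_pages() request is order 10, i.e. 4 MiB.  The SKETCH_*
constants, the helper names and the bucket sizes mentioned below are
hypothetical; they are not taken from the bcache source or from the
system described above.

```c
#include <assert.h>

/* Assumptions for illustration only: 4 KiB pages and MAX_ORDER = 11.
 * Both values depend on the architecture and kernel configuration. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_MAX_ORDER  11

/* Integer log2, matching what ilog2() yields for a power of two. */
static unsigned int sketch_ilog2(unsigned long n)
{
	unsigned int order = 0;

	while (n >>= 1)
		order++;
	return order;
}

/* Returns 1 if a bucket of this size fits in one page-allocator
 * request (order < MAX_ORDER), 0 otherwise. */
static int bucket_fits_buddy_allocator(unsigned long bucket_bytes)
{
	unsigned long pages;

	/* A bucket smaller than one page makes no sense here. */
	assert(bucket_bytes >= (1UL << SKETCH_PAGE_SHIFT));

	pages = bucket_bytes >> SKETCH_PAGE_SHIFT;
	return sketch_ilog2(pages) < SKETCH_MAX_ORDER;
}
```

Under these assumptions a 4 MiB bucket (order 10) still fits, while an
8 MiB bucket needs order 11 and cannot be served by the page allocator.
vzalloc() has no such order limit because it does not need physically
contiguous pages.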

commit 65e977a48967804a63487273a47ad39e26f39970
Author: Stephen R. van den Berg <srb@cuci.nl>
Date:   Thu Dec 25 13:54:42 2014 +0100

    Use vmalloc for alloc_bucket_pages.

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 4dd2bb7..cbba7ec 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1339,6 +1339,8 @@ void bch_cache_set_release(struct kobject *kobj)
 	module_put(THIS_MODULE);
 }
 
+#define free_bucket_pages(p, c)		(vfree((void*) (p)))
+
 static void cache_set_free(struct closure *cl)
 {
 	struct cache_set *c = container_of(cl, struct cache_set, cl);
@@ -1360,7 +1362,7 @@ static void cache_set_free(struct closure *cl)
 		}
 
 	bch_bset_sort_state_free(&c->sort);
-	free_pages((unsigned long) c->uuids, ilog2(bucket_pages(c)));
+	free_bucket_pages(c->uuids, c);
 
 	if (c->moving_gc_wq)
 		destroy_workqueue(c->moving_gc_wq);
@@ -1462,7 +1464,7 @@ void bch_cache_set_unregister(struct cache_set *c)
 }
 
 #define alloc_bucket_pages(gfp, c)			\
-	((void *) __get_free_pages(__GFP_ZERO|gfp, ilog2(bucket_pages(c))))
+	((void *) vzalloc(bucket_pages(c)<<PAGE_SHIFT))
 
 struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
 {
@@ -1801,7 +1803,7 @@ void bch_cache_release(struct kobject *kobj)
 
 	bio_split_pool_free(&ca->bio_split_hook);
 
-	free_pages((unsigned long) ca->disk_buckets, ilog2(bucket_pages(ca)));
+	free_bucket_pages(ca->disk_buckets, ca);
 	kfree(ca->prio_buckets);
 	vfree(ca->buckets);
 
-- 
Stephen.


* Re: Using vmalloc instead of get_free_pages?
  2014-12-27 10:52 Using vmalloc instead of get_free_pages? Stephen R. van den Berg
@ 2014-12-29  4:09 ` Slava Pestov
  2014-12-29  9:06   ` Stephen R. van den Berg
  0 siblings, 1 reply; 3+ messages in thread
From: Slava Pestov @ 2014-12-29  4:09 UTC
  To: Stephen R. van den Berg; +Cc: linux-bcache

Hi Stephen,

On Sat, Dec 27, 2014 at 2:52 AM, Stephen R. van den Berg <srb@cuci.nl> wrote:
> I have a system with currently:
> 4 x 6TB HDD backing devices  (it will expand to 12 x 8TB HDD)
> 2 x 490GB SSD caching devices
> 16GB RAM
> 16GB Swap
>
> I intend to:
> a. Use BTRFS using ncopies=3 (three way mirroring) for data and metadata
>    on the backing devices.
> b. Use the two SSD caches as a non-redundant (striped) bcache for the
>    whole HDD set.
>
> In trying to do this I notice:
>
> - That the amount of memory being allocated for a single caching device
>   exceeds the maximum amount allocatable by get_free_pages(), so I changed
>   that to use vzalloc(), which seems to work (patch included).
>   Is there any direct I/O from that area which requires special handling?

As you've already noticed, this is fixed in bcache-dev, but that brings
in a huge set of changes. I think this change can be upstreamed pretty
easily in isolation. Kent and I have been busy with other work, but I
hope he can prepare another pull request soon. There are a number of
important bug fixes that need to get cherry-picked from bcache-dev.
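
For reference, the isolated change boils down to swapping an order
argument for a byte count. A minimal userspace sketch, assuming 4 KiB
pages; the function names are made up for illustration and do not exist
in the kernel:

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_PAGE_SHIFT 12	/* assumed 4 KiB pages */

/* Integer log2, rounding down like ilog2(). */
static unsigned int sketch_ilog2(unsigned long n)
{
	unsigned int order = 0;

	while (n >>= 1)
		order++;
	return order;
}

/* Bytes covered by an order-based request: 2^order pages. */
static size_t bytes_from_order(unsigned long bucket_pages)
{
	assert(bucket_pages > 0);
	return ((size_t)1 << sketch_ilog2(bucket_pages)) << SKETCH_PAGE_SHIFT;
}

/* Bytes covered by a size-based request, as in the vzalloc() form. */
static size_t bytes_from_size(unsigned long bucket_pages)
{
	return (size_t)bucket_pages << SKETCH_PAGE_SHIFT;
}
```

For power-of-two page counts both forms ask for the same number of
bytes, so the quoted patch does not change how much memory is
requested, only where it comes from.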

>
> - bcache does not allow more than one cache device per set.  Is simply
>   allowing this in make-bcache enough to get this working, or does it need
>   code changes in the driver to allow for this?

Support for multiple cache devices is in bcache-dev; it requires
changes to both make-bcache and the kernel. I'm not sure what the time
frame for upstreaming this is, since the changes are more involved.

>
> commit 65e977a48967804a63487273a47ad39e26f39970
> Author: Stephen R. van den Berg <srb@cuci.nl>
> Date:   Thu Dec 25 13:54:42 2014 +0100
>
>     Use vmalloc for alloc_bucket_pages.
>
> diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
> index 4dd2bb7..cbba7ec 100644
> --- a/drivers/md/bcache/super.c
> +++ b/drivers/md/bcache/super.c
> @@ -1339,6 +1339,8 @@ void bch_cache_set_release(struct kobject *kobj)
>         module_put(THIS_MODULE);
>  }
>
> +#define free_bucket_pages(p, c)                (vfree((void*) (p)))
> +
>  static void cache_set_free(struct closure *cl)
>  {
>         struct cache_set *c = container_of(cl, struct cache_set, cl);
> @@ -1360,7 +1362,7 @@ static void cache_set_free(struct closure *cl)
>                 }
>
>         bch_bset_sort_state_free(&c->sort);
> -       free_pages((unsigned long) c->uuids, ilog2(bucket_pages(c)));
> +       free_bucket_pages(c->uuids, c);
>
>         if (c->moving_gc_wq)
>                 destroy_workqueue(c->moving_gc_wq);
> @@ -1462,7 +1464,7 @@ void bch_cache_set_unregister(struct cache_set *c)
>  }
>
>  #define alloc_bucket_pages(gfp, c)                     \
> -       ((void *) __get_free_pages(__GFP_ZERO|gfp, ilog2(bucket_pages(c))))
> +       ((void *) vzalloc(bucket_pages(c)<<PAGE_SHIFT))
>
>  struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
>  {
> @@ -1801,7 +1803,7 @@ void bch_cache_release(struct kobject *kobj)
>
>         bio_split_pool_free(&ca->bio_split_hook);
>
> -       free_pages((unsigned long) ca->disk_buckets, ilog2(bucket_pages(ca)));
> +       free_bucket_pages(ca->disk_buckets, ca);
>         kfree(ca->prio_buckets);
>         vfree(ca->buckets);
>
> --
> Stephen.


* Re: Using vmalloc instead of get_free_pages?
  2014-12-29  4:09 ` Slava Pestov
@ 2014-12-29  9:06   ` Stephen R. van den Berg
  0 siblings, 0 replies; 3+ messages in thread
From: Stephen R. van den Berg @ 2014-12-29  9:06 UTC
  To: Slava Pestov; +Cc: linux-bcache

Slava Pestov wrote:
>On Sat, Dec 27, 2014 at 2:52 AM, Stephen R. van den Berg <srb@cuci.nl> wrote:
>> - That the amount of memory being allocated for a single caching device
>> exceeds the maximum amount allocatable by get_free_pages(), so I changed

>As you've already noticed, this is fixed in bcache-dev, but that brings in a

I had hoped it would be, but since the on-disk format changed,
I couldn't fully confirm this yet.  I don't mind changing the on-disk format,
BTW, because I am still able to reformat the disks before the servers
go into production.

>Support for multiple cache devices is in bcache-dev; it requires changes
>to both make-bcache and the kernel. I'm not sure what the time frame for
>upstreaming this is, since the changes are more involved.

Well, I don't mind running a custom kernel and custom bcache userspace
tools for a while.  I have been rolling my own kernels since 1992.

I presume that eventually all changes will be upstreamed (including the
changes in the on-disk format)?  So, until then, I'll simply rebase bcache-dev
on top of the most recent kernels manually.

In that case, I *would* be interested in the new bcache userspace tools, though.
-- 
Stephen.

