* Using vmalloc instead of get_free_pages?
From: Stephen R. van den Berg @ 2014-12-27 10:52 UTC
  To: linux-bcache

I have a system that currently has:
4 x 6TB HDD backing devices  (it will expand to 12 x 8TB HDD)
2 x 490GB SSD caching devices
16GB RAM
16GB Swap

I intend to:
a. Use BTRFS with ncopies=3 (three-way mirroring) for data and metadata
   on the backing devices.
b. Use the two SSD caches as a non-redundant (striped) bcache for the
   whole HDD set.

In trying to do this I notice:

- That the amount of memory being allocated for a single caching device
  exceeds the maximum amount allocatable by get_free_pages(), so I changed
  that to use vzalloc(), which seems to work (patch included).
  Is there any direct I/O from that area which requires special handling?

- bcache does not allow more than one cache device per set.  Is simply
  allowing this in make-bcache enough to get it working, or does the
  driver need code changes as well?
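
The allocation limit in the first point can be illustrated with a small
userspace sketch. It is not bcache code: the sketch_* names are hypothetical,
and the 4 KiB page size and MAX_ORDER of 11 are assumptions (common defaults
on kernels of this period) rather than values stated in this thread. The
bucket size of the affected device is not given here either, so the numbers
below are only examples.

```c
#include <assert.h>

/* Assumed for illustration: 4 KiB pages and MAX_ORDER = 11, so the page
 * allocator can hand out at most 2^10 contiguous pages (4 MiB). */
#define SKETCH_PAGE_SHIFT   12
#define SKETCH_MAX_ORDER    11
#define SKETCH_PAGE_SECTORS ((1UL << SKETCH_PAGE_SHIFT) / 512)

/* floor(log2(n)) for n > 0, the value the kernel's ilog2() yields. */
static inline unsigned int sketch_ilog2(unsigned long n)
{
	unsigned int order = 0;

	assert(n > 0);
	while (n >>= 1)
		order++;
	return order;
}

/* Pages per bucket for a bucket size given in 512-byte sectors. */
static inline unsigned long sketch_bucket_pages(unsigned long bucket_sectors)
{
	return bucket_sectors / SKETCH_PAGE_SECTORS;
}

/* 1 if an order-based allocation of one bucket stays within the assumed
 * limit, 0 if it would need an order the page allocator cannot serve. */
static inline int sketch_fits_buddy_allocator(unsigned long bucket_sectors)
{
	unsigned long pages = sketch_bucket_pages(bucket_sectors);

	return pages > 0 && sketch_ilog2(pages) < SKETCH_MAX_ORDER;
}
```

Under these assumptions a 4 MiB bucket (8192 sectors) needs order 10 and
still fits, while an 8 MiB bucket (16384 sectors) needs order 11 and does
not. vzalloc() takes a byte count and has no such order limit, at the cost
of returning memory that is only virtually contiguous.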

commit 65e977a48967804a63487273a47ad39e26f39970
Author: Stephen R. van den Berg <srb@cuci.nl>
Date:   Thu Dec 25 13:54:42 2014 +0100

    Use vmalloc for alloc_bucket_pages.

diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 4dd2bb7..cbba7ec 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1339,6 +1339,8 @@ void bch_cache_set_release(struct kobject *kobj)
 	module_put(THIS_MODULE);
 }
 
+#define free_bucket_pages(p, c)		(vfree((void*) (p)))
+
 static void cache_set_free(struct closure *cl)
 {
 	struct cache_set *c = container_of(cl, struct cache_set, cl);
@@ -1360,7 +1362,7 @@ static void cache_set_free(struct closure *cl)
 		}
 
 	bch_bset_sort_state_free(&c->sort);
-	free_pages((unsigned long) c->uuids, ilog2(bucket_pages(c)));
+	free_bucket_pages(c->uuids, c);
 
 	if (c->moving_gc_wq)
 		destroy_workqueue(c->moving_gc_wq);
@@ -1462,7 +1464,7 @@ void bch_cache_set_unregister(struct cache_set *c)
 }
 
 #define alloc_bucket_pages(gfp, c)			\
-	((void *) __get_free_pages(__GFP_ZERO|gfp, ilog2(bucket_pages(c))))
+	((void *) vzalloc(bucket_pages(c)<<PAGE_SHIFT))
 
 struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
 {
@@ -1801,7 +1803,7 @@ void bch_cache_release(struct kobject *kobj)
 
 	bio_split_pool_free(&ca->bio_split_hook);
 
-	free_pages((unsigned long) ca->disk_buckets, ilog2(bucket_pages(ca)));
+	free_bucket_pages(ca->disk_buckets, ca);
 	kfree(ca->prio_buckets);
 	vfree(ca->buckets);
 
-- 
Stephen.

* Re: Using vmalloc instead of get_free_pages?
From: Slava Pestov @ 2014-12-29  4:09 UTC
  To: Stephen R. van den Berg; +Cc: linux-bcache

Hi Stephen,

On Sat, Dec 27, 2014 at 2:52 AM, Stephen R. van den Berg <srb@cuci.nl> wrote:
> I have a system that currently has:
> 4 x 6TB HDD backing devices  (it will expand to 12 x 8TB HDD)
> 2 x 490GB SSD caching devices
> 16GB RAM
> 16GB Swap
>
> I intend to:
> a. Use BTRFS with ncopies=3 (three-way mirroring) for data and metadata
>    on the backing devices.
> b. Use the two SSD caches as a non-redundant (striped) bcache for the
>    whole HDD set.
>
> In trying to do this I notice:
>
> - That the amount of memory being allocated for a single caching device
>   exceeds the maximum amount allocatable by get_free_pages(), so I changed
>   that to use vzalloc(), which seems to work (patch included).
>   Is there any direct I/O from that area which requires special handling?

As you've already noticed, this is fixed in bcache-dev, but that brings
in a huge set of changes. I think this change can be upstreamed pretty
easily in isolation. Kent and I have been busy with other work, but I
hope he can prepare another pull request soon. There are a number of
important bug fixes that need to get cherry-picked from bcache-dev.

>
> - bcache does not allow more than one cache device per set.  Is simply
>   allowing this in make-bcache enough to get it working, or does the
>   driver need code changes as well?

Support for multiple cache devices is in bcache-dev; it requires
changes to both make-bcache and the kernel. I'm not sure what the time
frame for upstreaming this is, since the changes are more involved.


* Re: Using vmalloc instead of get_free_pages?
From: Stephen R. van den Berg @ 2014-12-29  9:06 UTC
  To: Slava Pestov; +Cc: linux-bcache

Slava Pestov wrote:
>On Sat, Dec 27, 2014 at 2:52 AM, Stephen R. van den Berg <srb@cuci.nl> wrote:
>> - That the amount of memory being allocated for a single caching device
>> exceeds the maximum amount allocatable by get_free_pages(), so I changed

>As you've already noticed this is fixed in bcache-dev, but that brings in a

I had hoped it would be, but since the on-disk format changed, I have not
been able to fully confirm this yet.  I don't mind changing the on-disk
format, BTW, because I can still reformat the disks before the servers
go into production.

>Support for multiple cache devices is in bcache-dev; it requires changes
>to both make-bcache and the kernel. I'm not sure what the time frame for
>upstreaming this is, since the changes are more involved.

Well, I don't mind running a custom kernel and custom bcache userspace
tools for a while.  I have been rolling my own kernels since 1992.

I presume that eventually all changes will be upstreamed (including the
changes in the on-disk format)?  Until then, I'll simply rebase bcache-dev
on top of the most recent kernels manually.

In that case, though, I *would* be interested in the new bcache userspace
tools.
-- 
Stephen.
