* Using vmalloc instead of get_free_pages?
From: Stephen R. van den Berg @ 2014-12-27 10:52 UTC
To: linux-bcache
I have a system with currently:
4 x 6TB HDD backing devices (it will expand to 12 x 8TB HDD)
2 x 490GB SSD caching devices
16GB RAM
16GB Swap
I intend to:
a. Use BTRFS using ncopies=3 (three way mirroring) for data and metadata
on the backing devices.
b. Use the two SSD caches as a non-redundant (striped) bcache for the
whole HDD set.
In trying to do this I notice:
- That the amount of memory being allocated for a single caching device
exceeds the maximum amount allocatable by get_free_pages(), so I changed
that to use vzalloc(), which seems to work (patch included).
Is there any direct I/O from that area which requires special handling?
- bcache does not allow more than one cache device per set. Is simply
allowing this in make-bcache enough to get this working, or does it need
code changes in the driver to allow for this?
commit 65e977a48967804a63487273a47ad39e26f39970
Author: Stephen R. van den Berg <srb@cuci.nl>
Date: Thu Dec 25 13:54:42 2014 +0100
Use vmalloc for alloc_bucket_pages.
diff --git a/drivers/md/bcache/super.c b/drivers/md/bcache/super.c
index 4dd2bb7..cbba7ec 100644
--- a/drivers/md/bcache/super.c
+++ b/drivers/md/bcache/super.c
@@ -1339,6 +1339,8 @@ void bch_cache_set_release(struct kobject *kobj)
module_put(THIS_MODULE);
}
+#define free_bucket_pages(p, c) (vfree((void*) (p)))
+
static void cache_set_free(struct closure *cl)
{
struct cache_set *c = container_of(cl, struct cache_set, cl);
@@ -1360,7 +1362,7 @@ static void cache_set_free(struct closure *cl)
}
bch_bset_sort_state_free(&c->sort);
- free_pages((unsigned long) c->uuids, ilog2(bucket_pages(c)));
+ free_bucket_pages(c->uuids, c);
if (c->moving_gc_wq)
destroy_workqueue(c->moving_gc_wq);
@@ -1462,7 +1464,7 @@ void bch_cache_set_unregister(struct cache_set *c)
}
#define alloc_bucket_pages(gfp, c) \
- ((void *) __get_free_pages(__GFP_ZERO|gfp, ilog2(bucket_pages(c))))
+ ((void *) vzalloc(bucket_pages(c)<<PAGE_SHIFT))
struct cache_set *bch_cache_set_alloc(struct cache_sb *sb)
{
@@ -1801,7 +1803,7 @@ void bch_cache_release(struct kobject *kobj)
bio_split_pool_free(&ca->bio_split_hook);
- free_pages((unsigned long) ca->disk_buckets, ilog2(bucket_pages(ca)));
+ free_bucket_pages(ca->disk_buckets, ca);
kfree(ca->prio_buckets);
vfree(ca->buckets);
--
Stephen.
* Re: Using vmalloc instead of get_free_pages?
From: Slava Pestov @ 2014-12-29 4:09 UTC
To: Stephen R. van den Berg; +Cc: linux-bcache
Hi Stephen,
On Sat, Dec 27, 2014 at 2:52 AM, Stephen R. van den Berg <srb@cuci.nl> wrote:
> I have a system with currently:
> 4 x 6TB HDD backing devices (it will expand to 12 x 8TB HDD)
> 2 x 490GB SSD caching devices
> 16GB RAM
> 16GB Swap
>
> I intend to:
> a. Use BTRFS using ncopies=3 (three way mirroring) for data and metadata
> on the backing devices.
> b. Use the two SSD caches as a non-redundant (striped) bcache for the
> whole HDD set.
>
> In trying to do this I notice:
>
> - That the amount of memory being allocated for a single caching device
> exceeds the maximum amount allocatable by get_free_pages(), so I changed
> that to use vzalloc(), which seems to work (patch included).
> Is there any direct I/O from that area which requires special handling?
As you've already noticed, this is fixed in bcache-dev, but that brings
in a huge set of changes. I think this change can be upstreamed pretty
easily in isolation. Kent (and I) have been busy with other work, but I
hope he can prepare another pull request soon. There are a number of
important bug fixes that need to be cherry-picked from bcache-dev.
>
> - bcache does not allow more than one cache device per set. Is simply
> allowing this in make-bcache enough to get this working, or does it need
> code changes in the driver to allow for this?
Support for multiple cache devices is in bcache-dev; it requires
changes to both make-bcache and the kernel. I'm not sure what the time
frame for upstreaming this is, since the changes are more involved.
* Re: Using vmalloc instead of get_free_pages?
From: Stephen R. van den Berg @ 2014-12-29 9:06 UTC
To: Slava Pestov; +Cc: linux-bcache
Slava Pestov wrote:
>On Sat, Dec 27, 2014 at 2:52 AM, Stephen R. van den Berg <srb@cuci.nl> wrote:
>> - That the amount of memory being allocated for a single caching device
>> exceeds the maximum amount allocatable by get_free_pages(), so I changed
>As you've already noticed, this is fixed in bcache-dev, but that brings in a huge set of changes.
I had already hoped it would be, but since the on-disk format changed,
I couldn't fully confirm this yet. I don't mind changing the on-disk
format, BTW, because I am still able to reformat the disks before the
servers go into production.
>Support for multiple cache devices is in bcache-dev, it requires changes
>to both make-bcache and the kernel. I'm not sure what the time frame for
>upstreaming this is, since the changes are more involved.
Well, I don't mind running a custom kernel and custom bcache userspace
tools for a while. I have been rolling my own kernels since 1992.
I presume that eventually all changes will be upstreamed (including the
changes in the on-disk format)? So, until then, I'll simply rebase bcache-dev
on top of the most recent kernels manually.
I *would* be interested in the new bcache userspace tools in the meantime, though.
--
Stephen.