linux-block.vger.kernel.org archive mirror
From: Coly Li <colyli@suse.de>
To: Hannes Reinecke <hare@suse.de>
Cc: linux-bcache@vger.kernel.org, axboe@kernel.dk,
	linux-block@vger.kernel.org, Jianpeng Ma <jianpeng.ma@intel.com>,
	Qiaowei Ren <qiaowei.ren@intel.com>
Subject: Re: [PATCH 06/14] bcache: bch_nvm_alloc_pages() of the buddy
Date: Wed, 23 Jun 2021 14:02:17 +0800
Message-ID: <e5b642b5-47d1-ce30-7931-817d4ec4cbdc@suse.de>
In-Reply-To: <34dc388c-ccbf-3b09-8254-188d183c3d26@suse.de>

On 6/22/21 6:51 PM, Hannes Reinecke wrote:
> On 6/15/21 7:49 AM, Coly Li wrote:
>> From: Jianpeng Ma <jianpeng.ma@intel.com>
>>
>> This patch implements the bch_nvm_alloc_pages() of the buddy.
>> In terms of function, this func is like current-page-buddy-alloc.
>> But the differences are:
>> a: it need owner_uuid as parameter which record owner info. And it
>> make those info persistence.
>> b: it don't need flags like GFP_*. All allocs are the equal.
>> c: it don't trigger other ops etc swap/recycle.
>>
>> Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
>> Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
>> Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
>> Signed-off-by: Coly Li <colyli@suse.de>
>> ---
>>  drivers/md/bcache/nvm-pages.c   | 174 ++++++++++++++++++++++++++++++++
>>  drivers/md/bcache/nvm-pages.h   |   6 ++
>>  include/uapi/linux/bcache-nvm.h |   6 +-
>>  3 files changed, 184 insertions(+), 2 deletions(-)
>>
>> diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
>> index 804ee66e97be..5d095d241483 100644
>> --- a/drivers/md/bcache/nvm-pages.c
>> +++ b/drivers/md/bcache/nvm-pages.c
>> @@ -74,6 +74,180 @@ static inline void remove_owner_space(struct bch_nvm_namespace *ns,
>>  	}
>>  }
>>  
>> +/* If not found, it will create if create == true */
>> +static struct bch_nvm_pages_owner_head *find_owner_head(const char *owner_uuid, bool create)
>> +{
>> +	struct bch_owner_list_head *owner_list_head = only_set->owner_list_head;
>> +	struct bch_nvm_pages_owner_head *owner_head = NULL;
>> +	int i;
>> +
>> +	if (owner_list_head == NULL)
>> +		goto out;
>> +
>> +	for (i = 0; i < only_set->owner_list_used; i++) {
>> +		if (!memcmp(owner_uuid, owner_list_head->heads[i].uuid, 16)) {
>> +			owner_head = &(owner_list_head->heads[i]);
>> +			break;
>> +		}
>> +	}
>> +
> Please, don't name is 'heads'. If this is supposed to be a linked list,
> use the standard list implementation and initialize the pointers correctly.
> If it isn't use an array (as you know in advance how many array entries
> you can allocate).

heads is an array to store the heads of all owner lists. Each element in
array heads[] is a head of an owner list.

An owner is identified by its uuid. When nvm pages are allocated from the
nvm-pages allocator, the owner's uuid is provided, and all the nvm pages
allocated to that owner are tracked by its owner list. Typically the owner
is a device driver that uses nvm pages, such as bcache.

After reboot, bcache will ask the nvm-pages allocator to return the whole
owner list, using the uuid that the bcache driver provided previously. Then
it is the bcache driver's duty to restore its data layout from the nvm pages
which are tracked by the returned owner list.

So the array is named heads because it stores the heads of all the owner
lists.
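
To make the layout concrete, here is a rough userspace sketch of the idea:
a fixed-size array of per-owner heads, searched by a 16-byte uuid, with an
optional find-or-create step. It is not the patch code. MAX_OWNERS,
first_rec and the simplified struct names are made up for illustration,
and the sketch ignores persistence (no memcpy_flushcache()) and locking.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define UUID_LEN   16
#define MAX_OWNERS 8	/* hypothetical capacity, not taken from the patch */

/* One head per owner; first_rec stands in for that owner's record list. */
struct owner_head {
	unsigned char uuid[UUID_LEN];
	void *first_rec;
};

/* Fixed-size array of heads plus the number of slots already in use. */
struct owner_list_head {
	unsigned int used;
	struct owner_head heads[MAX_OWNERS];
};

/* Return the head matching uuid; if absent and create is set, claim a slot. */
static struct owner_head *find_owner_head(struct owner_list_head *olh,
					  const unsigned char *uuid,
					  bool create)
{
	unsigned int i;

	for (i = 0; i < olh->used; i++)
		if (!memcmp(uuid, olh->heads[i].uuid, UUID_LEN))
			return &olh->heads[i];

	/* Not found: give up unless asked to create and a slot is free. */
	if (!create || olh->used == MAX_OWNERS)
		return NULL;

	memcpy(olh->heads[olh->used].uuid, uuid, UUID_LEN);
	olh->heads[olh->used].first_rec = NULL;
	return &olh->heads[olh->used++];
}
```

With this shape, the lookup after reboot is the create == false case: the
driver passes the same uuid again and gets back the head it used before.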


>> +	if (!owner_head && create) {
>> +		u32 used = only_set->owner_list_used;
>> +
>> +		if (only_set->owner_list_size > used) {
>> +			memcpy_flushcache(owner_list_head->heads[used].uuid, owner_uuid, 16);
>> +			only_set->owner_list_used++;
>> +
>> +			owner_list_head->used++;
>> +			owner_head = &(owner_list_head->heads[used]);
>> +		} else
>> +			pr_info("no free bch_nvm_pages_owner_head\n");
>> +	}
>> +
>> +out:
>> +	return owner_head;
>> +}
>> +
>> +static struct bch_nvm_pgalloc_recs *find_empty_pgalloc_recs(void)
>> +{
>> +	unsigned int start;
>> +	struct bch_nvm_namespace *ns = only_set->nss[0];
>> +	struct bch_nvm_pgalloc_recs *recs;
>> +
>> +	start = bitmap_find_next_zero_area(ns->pgalloc_recs_bitmap, BCH_MAX_PGALLOC_RECS, 0, 1, 0);
>> +	if (start > BCH_MAX_PGALLOC_RECS) {
>> +		pr_info("no free struct bch_nvm_pgalloc_recs\n");
>> +		return NULL;
>> +	}
>> +
>> +	bitmap_set(ns->pgalloc_recs_bitmap, start, 1);
>> +	recs = (struct bch_nvm_pgalloc_recs *)(ns->kaddr + BCH_NVM_PAGES_SYS_RECS_HEAD_OFFSET)
>> +		+ start;
>> +	return recs;
>> +}
>> +
>> +static struct bch_nvm_pgalloc_recs *find_nvm_pgalloc_recs(struct bch_nvm_namespace *ns,
>> +		struct bch_nvm_pages_owner_head *owner_head, bool create)
>> +{
>> +	int ns_nr = ns->sb->this_namespace_nr;
>> +	struct bch_nvm_pgalloc_recs *prev_recs = NULL, *recs = owner_head->recs[ns_nr];
>> +
>> +	/* If create=false, we return recs[nr] */
>> +	if (!create)
>> +		return recs;
>> +
>> +	/*
>> +	 * If create=true, it mean we need a empty struct bch_pgalloc_rec
>> +	 * So we should find non-empty struct bch_nvm_pgalloc_recs or alloc
>> +	 * new struct bch_nvm_pgalloc_recs. And return this bch_nvm_pgalloc_recs
>> +	 */
>> +	while (recs && (recs->used == recs->size)) {
>> +		prev_recs = recs;
>> +		recs = recs->next;
>> +	}
>> +
>> +	/* Found empty struct bch_nvm_pgalloc_recs */
>> +	if (recs)
>> +		return recs;
>> +	/* Need alloc new struct bch_nvm_galloc_recs */
>> +	recs = find_empty_pgalloc_recs();
>> +	if (recs) {
>> +		recs->next = NULL;
>> +		recs->owner = owner_head;
>> +		memcpy_flushcache(recs->magic, bch_nvm_pages_pgalloc_magic, 16);
>> +		memcpy_flushcache(recs->owner_uuid, owner_head->uuid, 16);
>> +		recs->size = BCH_MAX_RECS;
>> +		recs->used = 0;
>> +
>> +		if (prev_recs)
>> +			prev_recs->next = recs;
>> +		else
>> +			owner_head->recs[ns_nr] = recs;
>> +	}
>> +
> Wouldn't it be easier if the bitmap covers the entire range, and not
> just the non-empty ones?
> Eventually (ie if the NVM set becomes full) it'll cover it anyway, so
> can't we save ourselves some time to allocate a large enough bitmap
> upfront and only use it do figure out empty recs?

Yes, we will do it later. We don't do it now because a struct
bch_nvm_pgalloc_recs may contain 1000+ records, and all current
code uses only 1 record, for the bcache journal. Later, when I start to
store bcache btree nodes on NVDIMM, I can test the suggested
bitmap optimization with a real workload.
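
For reference, one possible reading of the suggestion is a bitmap with one
bit per record slot, sized upfront for the whole range, so that finding an
empty slot is a bit search instead of a list walk. The userspace sketch
below only illustrates that reading. NR_SLOTS, struct slot_bitmap,
slot_alloc() and slot_free() are hypothetical names, and kernel code would
use the existing bitmap helpers rather than an open-coded loop.

```c
#include <limits.h>
#include <stddef.h>

#define NR_SLOTS      1024	/* hypothetical total number of record slots */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define NR_WORDS      ((NR_SLOTS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* One bit per slot: set means in use, clear means free. */
struct slot_bitmap {
	unsigned long words[NR_WORDS];
};

/* Claim the lowest free slot; return its index, or -1 if none is free. */
static long slot_alloc(struct slot_bitmap *bm)
{
	size_t i;

	for (i = 0; i < NR_SLOTS; i++) {
		unsigned long mask = 1UL << (i % BITS_PER_WORD);

		if (!(bm->words[i / BITS_PER_WORD] & mask)) {
			bm->words[i / BITS_PER_WORD] |= mask;
			return (long)i;
		}
	}
	return -1;
}

/* Mark slot i as free again. */
static void slot_free(struct slot_bitmap *bm, size_t i)
{
	bm->words[i / BITS_PER_WORD] &= ~(1UL << (i % BITS_PER_WORD));
}
```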

Thanks for the suggestion.


>
>> +	return recs;
>> +}
>> +
>> +static void add_pgalloc_rec(struct bch_nvm_pgalloc_recs *recs, void *kaddr, int order)
>> +{
>> +	int i;
>> +
>> +	for (i = 0; i < recs->size; i++) {
>> +		if (recs->recs[i].pgoff == 0) {
>> +			recs->recs[i].pgoff = (unsigned long)kaddr >> PAGE_SHIFT;
>> +			recs->recs[i].order = order;
>> +			recs->used++;
>> +			break;
>> +		}
>> +	}
>> +	BUG_ON(i == recs->size);
>> +}
>> +
>> +void *bch_nvm_alloc_pages(int order, const char *owner_uuid)
>> +{
>> +	void *kaddr = NULL;
>> +	struct bch_nvm_pgalloc_recs *pgalloc_recs;
>> +	struct bch_nvm_pages_owner_head *owner_head;
>> +	int i, j;
>> +
>> +	mutex_lock(&only_set->lock);
>> +	owner_head = find_owner_head(owner_uuid, true);
>> +
>> +	if (!owner_head) {
>> +		pr_err("can't find bch_nvm_pgalloc_recs by(uuid=%s)\n", owner_uuid);
>> +		goto unlock;
>> +	}
>> +
>> +	for (j = 0; j < only_set->total_namespaces_nr; j++) {
>> +		struct bch_nvm_namespace *ns = only_set->nss[j];
>> +
>> +		if (!ns || (ns->free < (1L << order)))
>> +			continue;
>> +
>> +		for (i = order; i < BCH_MAX_ORDER; i++) {
>> +			struct list_head *list;
>> +			struct page *page, *buddy_page;
>> +
>> +			if (list_empty(&ns->free_area[i]))
>> +				continue;
>> +
>> +			list = ns->free_area[i].next;
> list_first_entry()?

Copied. It will be updated in the next post.
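
For anyone following along, list_first_entry() takes the list head, the
containing type and the member name, and returns the first element rather
than a raw struct list_head pointer. Below is a minimal userspace
re-creation of the idea, not the kernel headers themselves. struct
free_block and its fields are made up for illustration, and the macros
here skip the type checking that the kernel versions do.

```c
#include <stddef.h>

struct list_head {
	struct list_head *next, *prev;
};

/* Recover the enclosing structure from a pointer to one of its members. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* First element of a non-empty list, as its containing type. */
#define list_first_entry(head, type, member) \
	container_of((head)->next, type, member)

static void list_init(struct list_head *h)
{
	h->next = h->prev = h;
}

static void list_add_tail(struct list_head *n, struct list_head *h)
{
	n->prev = h->prev;
	n->next = h;
	h->prev->next = n;
	h->prev = n;
}

/* Hypothetical element type standing in for a free block on a free list. */
struct free_block {
	int order;
	struct list_head lru;
};
```

As with the kernel macro, the caller must check that the list is not empty
first, which the quoted loop already does with list_empty().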

Thanks for your review.

Coly Li
