linux-bcache.vger.kernel.org archive mirror
From: Coly Li <colyli@suse.de>
To: Hannes Reinecke <hare@suse.de>
Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org,
	Jianpeng Ma <jianpeng.ma@intel.com>,
	kernel test robot <lkp@intel.com>,
	Dan Carpenter <dan.carpenter@oracle.com>,
	axboe@kernel.dk, Qiaowei Ren <qiaowei.ren@intel.com>
Subject: Re: [PATCH 05/14] bcache: initialization of the buddy
Date: Wed, 23 Jun 2021 13:35:21 +0800
Message-ID: <e66262c1-7ce1-cd67-b48b-982b6d1ea1d1@suse.de>
In-Reply-To: <bfa10634-b144-e180-c66a-5bf839c5ce71@suse.de>

On 6/22/21 6:45 PM, Hannes Reinecke wrote:
> On 6/15/21 7:49 AM, Coly Li wrote:
>> From: Jianpeng Ma <jianpeng.ma@intel.com>
>>
>> This nvm pages allocator will implement a simple buddy to manage the
>> nvm address space. This patch initializes this buddy for a new namespace.
>>
> Please use 'buddy allocator' instead of just 'buddy'.

Will update in the next post.
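To make the terminology concrete: initializing a buddy allocator means splitting the namespace's free page range into naturally aligned power-of-two blocks and putting each block on the free list of its order. Below is a minimal userspace sketch of that step under stated assumptions, not the code in this patch: sketch_buddy, sketch_buddy_init(), SKETCH_MAX_ORDER and SKETCH_MAX_BLOCKS are hypothetical names, and fixed-size arrays stand in for the struct list_head free lists that the patch threads through struct page.

```c
#include <assert.h>

/* Hypothetical limits for this sketch; not taken from the patch. */
#define SKETCH_MAX_ORDER  10   /* largest block = 2^10 pages */
#define SKETCH_MAX_BLOCKS 64   /* per-order free-list capacity */

/* One free list per order; each entry is the pgoff of a free block. */
struct sketch_buddy {
	unsigned long free_pgoff[SKETCH_MAX_ORDER + 1][SKETCH_MAX_BLOCKS];
	unsigned int nr_free[SKETCH_MAX_ORDER + 1];
};

/*
 * Carve the free page range [start, end) into the largest naturally
 * aligned power-of-two blocks: a block of order k must start at a pgoff
 * that is a multiple of 2^k and must not run past end.
 */
static void sketch_buddy_init(struct sketch_buddy *b,
			      unsigned long start, unsigned long end)
{
	for (int k = 0; k <= SKETCH_MAX_ORDER; k++)
		b->nr_free[k] = 0;

	while (start < end) {
		int order = SKETCH_MAX_ORDER;

		/* shrink the order until the block is aligned and fits */
		while (order > 0 &&
		       ((start & ((1UL << order) - 1)) ||
			start + (1UL << order) > end))
			order--;

		assert(b->nr_free[order] < SKETCH_MAX_BLOCKS);
		b->free_pgoff[order][b->nr_free[order]++] = start;
		start += 1UL << order;
	}
}
```

For example, the range [3, 20) yields an order-0 block at pgoff 3, order-2 blocks at 4 and 16, and an order-3 block at 8, i.e. 17 pages in total.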


>
>> The unit of alloc/free of the buddy is a page. A DAX device has its
>> struct page (in DRAM or PMEM).
>>
>>         struct {        /* ZONE_DEVICE pages */
>>                 /** @pgmap: Points to the hosting device page map. */
>>                 struct dev_pagemap *pgmap;
>>                 void *zone_device_data;
>>                 /*
>>                  * ZONE_DEVICE private pages are counted as being
>>                  * mapped so the next 3 words hold the mapping, index,
>>                  * and private fields from the source anonymous or
>>                  * page cache page while the page is migrated to device
>>                  * private memory.
>>                  * ZONE_DEVICE MEMORY_DEVICE_FS_DAX pages also
>>                  * use the mapping, index, and private fields when
>>                  * pmem backed DAX files are mapped.
>>                  */
>>         };
>>
>> ZONE_DEVICE pages only use pgmap; the other 4 words [16/32 bytes] are
>> unused. So the second/third words will be used as a 'struct list_head'
>> which links the page into the buddy free list. The fourth word (that is,
>> normal struct page::index) stores pgoff, the page offset within the DAX
>> device. And the fifth word (that is, normal struct page::private) stores
>> the buddy order. page_type will be used to store buddy flags.
>>
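The field reuse described above can be pictured with a mock structure. This is a hypothetical userspace sketch, not the real struct page layout (which is a union); sketch_page, SKETCH_PAGE_BUDDY and the helpers are illustrative names. The field order follows the word numbering in the commit message, counting pgmap as the first word, with page_type simply placed at the end. It also shows why keeping pgoff and the order in the page is convenient: in the standard buddy scheme, the buddy of an order-k block at pgoff starts at pgoff XOR 2^k.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the kernel's struct list_head. */
struct sketch_list_head {
	struct sketch_list_head *next, *prev;
};

/* Hypothetical mirror of the reused words; NOT the real struct page. */
struct sketch_page {
	void *pgmap;                  /* word 1: still used by ZONE_DEVICE */
	struct sketch_list_head lru;  /* words 2-3: buddy free-list linkage */
	unsigned long index;          /* word 4: pgoff within the DAX device */
	unsigned long private;        /* word 5: buddy order */
	unsigned int page_type;       /* buddy flags */
};

#define SKETCH_PAGE_BUDDY 0x1u    /* hypothetical "head of a free block" flag */

/* Record that this page heads a free block of 2^order pages at pgoff. */
static void sketch_mark_free(struct sketch_page *p, unsigned long pgoff,
			     unsigned int order)
{
	p->index = pgoff;
	p->private = order;
	p->page_type |= SKETCH_PAGE_BUDDY;
	p->lru.next = p->lru.prev = &p->lru;  /* self-linked, not yet on a list */
}

static bool sketch_is_free(const struct sketch_page *p)
{
	return p->page_type & SKETCH_PAGE_BUDDY;
}

/* The buddy of an order-k block at pgoff starts at pgoff XOR 2^k. */
static unsigned long sketch_buddy_pgoff(const struct sketch_page *p)
{
	return p->index ^ (1UL << p->private);
}
```

For instance, the buddy of the order-3 block at pgoff 8 is the block at pgoff 0, and the buddy of the order-2 block at pgoff 16 is the block at pgoff 20.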
>> Reported-by: kernel test robot <lkp@intel.com>
>> Reported-by: Dan Carpenter <dan.carpenter@oracle.com>
>> Signed-off-by: Jianpeng Ma <jianpeng.ma@intel.com>
>> Co-developed-by: Qiaowei Ren <qiaowei.ren@intel.com>
>> Signed-off-by: Qiaowei Ren <qiaowei.ren@intel.com>
>> Signed-off-by: Coly Li <colyli@suse.de>
>> ---
>>  drivers/md/bcache/nvm-pages.c   | 156 +++++++++++++++++++++++++++++++-
>>  drivers/md/bcache/nvm-pages.h   |   6 ++
>>  include/uapi/linux/bcache-nvm.h |  10 +-
>>  3 files changed, 165 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/md/bcache/nvm-pages.c b/drivers/md/bcache/nvm-pages.c
>> index 18fdadbc502f..804ee66e97be 100644
>> --- a/drivers/md/bcache/nvm-pages.c
>> +++ b/drivers/md/bcache/nvm-pages.c
>> @@ -34,6 +34,10 @@ static void release_nvm_namespaces(struct bch_nvm_set *nvm_set)
>>  	for (i = 0; i < nvm_set->total_namespaces_nr; i++) {
>>  		ns = nvm_set->nss[i];
>>  		if (ns) {
>> +			kvfree(ns->pages_bitmap);
>> +			if (ns->pgalloc_recs_bitmap)
>> +				bitmap_free(ns->pgalloc_recs_bitmap);
>> +
>>  			blkdev_put(ns->bdev, FMODE_READ|FMODE_WRITE|FMODE_EXEC);
>>  			kfree(ns);
>>  		}
>> @@ -48,17 +52,130 @@ static void release_nvm_set(struct bch_nvm_set *nvm_set)
>>  	kfree(nvm_set);
>>  }
>>  
>> +static struct page *nvm_vaddr_to_page(struct bch_nvm_namespace *ns, void *addr)
>> +{
>> +	return virt_to_page(addr);
>> +}
>> +
>> +static void *nvm_pgoff_to_vaddr(struct bch_nvm_namespace *ns, pgoff_t pgoff)
>> +{
>> +	return ns->kaddr + (pgoff << PAGE_SHIFT);
>> +}
>> +
>> +static inline void remove_owner_space(struct bch_nvm_namespace *ns,
>> +					pgoff_t pgoff, u64 nr)
>> +{
>> +	while (nr > 0) {
>> +		unsigned int num = nr > UINT_MAX ? UINT_MAX : nr;
>> +
>> +		bitmap_set(ns->pages_bitmap, pgoff, num);
>> +		nr -= num;
>> +		pgoff += num;
>> +	}
>> +}
>> +
>> +#define BCH_PGOFF_TO_KVADDR(pgoff) ((void *)((unsigned long)(pgoff) << PAGE_SHIFT))
>> +
>>  static int init_owner_info(struct bch_nvm_namespace *ns)
>>  {
>>  	struct bch_owner_list_head *owner_list_head = ns->sb->owner_list_head;
>> +	struct bch_nvm_pgalloc_recs *sys_recs;
>> +	int i, j, k, rc = 0;
>>  
>>  	mutex_lock(&only_set->lock);
>>  	only_set->owner_list_head = owner_list_head;
>>  	only_set->owner_list_size = owner_list_head->size;
>>  	only_set->owner_list_used = owner_list_head->used;
>> +
>> +	/* remove used space */
>> +	remove_owner_space(ns, 0, div_u64(ns->pages_offset, ns->page_size));
>> +
>> +	sys_recs = ns->kaddr + BCH_NVM_PAGES_SYS_RECS_HEAD_OFFSET;
>> +	/* suppose no hole in array */
>> +	for (i = 0; i < owner_list_head->used; i++) {
>> +		struct bch_nvm_pages_owner_head *head = &owner_list_head->heads[i];
>> +
>> +		for (j = 0; j < BCH_NVM_PAGES_NAMESPACES_MAX; j++) {
>> +			struct bch_nvm_pgalloc_recs *pgalloc_recs = head->recs[j];
>> +			unsigned long offset = (unsigned long)ns->kaddr >> PAGE_SHIFT;
>> +			struct page *page;
>> +
>> +			while (pgalloc_recs) {
>> +				u32 pgalloc_recs_pos = (unsigned int)(pgalloc_recs - sys_recs);
>> +
>> +				if (memcmp(pgalloc_recs->magic, bch_nvm_pages_pgalloc_magic, 16)) {
>> +					pr_info("invalid bch_nvm_pages_pgalloc_magic\n");
>> +					rc = -EINVAL;
>> +					goto unlock;
>> +				}
>> +				if (memcmp(pgalloc_recs->owner_uuid, head->uuid, 16)) {
>> +					pr_info("invalid owner_uuid in bch_nvm_pgalloc_recs\n");
>> +					rc = -EINVAL;
>> +					goto unlock;
>> +				}
>> +				if (pgalloc_recs->owner != head) {
>> +					pr_info("invalid owner in bch_nvm_pgalloc_recs\n");
>> +					rc = -EINVAL;
>> +					goto unlock;
>> +				}
>> +
>> +				/* recs array can has hole */
> can have holes ?

It means the valid records are not always stored contiguously in the
recs[] array of struct bch_nvm_pgalloc_recs. This is because, currently,
only an 8-byte write to an 8-byte-aligned address on NVDIMM is atomic
with respect to power failure.

When a record is removed from the recs[] array because a block of NVDIMM
pages is freed, moving the following valid records forward to keep all
records contiguous would not be atomic with respect to power failure. We
would then need a more complicated method to keep the metadata consistent
across a power failure.

Allowing holes (records may be stored non-contiguously in the recs[]
array) makes things much simpler here.
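A minimal sketch of the hole-tolerant scheme described above, assuming each record fits in one naturally aligned 64-bit word and that a zero word marks an empty slot. sketch_rec, sketch_recs and the helpers are hypothetical names, not the bch_nvm_pgalloc_recs code. A real NVDIMM implementation would also need WRITE_ONCE()-style single stores plus cache flushing and ordering to make each update durable; this userspace sketch omits all of that.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_RECS 8   /* hypothetical capacity of recs[] */

/* Hypothetical record: one aligned 64-bit word; 0 means "hole". */
struct sketch_rec {
	uint64_t val;
};

struct sketch_recs {
	struct sketch_rec recs[SKETCH_NR_RECS];
};

/* Fill the first hole; returns the slot index, or -1 if the array is full. */
static int sketch_rec_add(struct sketch_recs *r, uint64_t val)
{
	assert(val != 0);  /* 0 is reserved to mark a hole */
	for (int i = 0; i < SKETCH_NR_RECS; i++) {
		if (r->recs[i].val == 0) {
			r->recs[i].val = val;  /* intended as one aligned 8-byte store */
			return i;
		}
	}
	return -1;
}

/*
 * Remove a record by punching a hole: a single 8-byte store and no
 * shifting of later records, so an interrupted update leaves either
 * the old or the new state of that one slot.
 */
static void sketch_rec_del(struct sketch_recs *r, int i)
{
	assert(i >= 0 && i < SKETCH_NR_RECS);
	r->recs[i].val = 0;
}

/* Readers must scan the whole array and skip holes. */
static int sketch_rec_count(const struct sketch_recs *r)
{
	int n = 0;

	for (int i = 0; i < SKETCH_NR_RECS; i++)
		if (r->recs[i].val != 0)
			n++;
	return n;
}
```

The trade-off is that deletion never moves other records, at the cost of add and lookup having to scan past holes.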

Thanks for your review.

Coly Li



Thread overview: 60+ messages
2021-06-15  5:49 [PATCH 00/14] bcache patches for Linux v5.14 Coly Li
2021-06-15  5:49 ` [PATCH 01/14] bcache: fix error info in register_bcache() Coly Li
2021-06-22  9:47   ` Hannes Reinecke
2021-06-15  5:49 ` [PATCH 02/14] md: bcache: Fix spelling of 'acquire' Coly Li
2021-06-22 10:03   ` Hannes Reinecke
2021-06-15  5:49 ` [PATCH 03/14] bcache: add initial data structures for nvm pages Coly Li
2021-06-21 16:17   ` Ask help for code review (was Re: [PATCH 03/14] bcache: add initial data structures for nvm pages) Coly Li
2021-06-22  8:41     ` Huang, Ying
2021-06-23  4:32       ` Coly Li
2021-06-23  6:53         ` Huang, Ying
2021-06-23  7:04           ` Christoph Hellwig
2021-06-23  7:19             ` Coly Li
2021-06-23  7:21               ` Christoph Hellwig
2021-06-23 10:05                 ` Coly Li
2021-06-23 11:16                   ` Coly Li
2021-06-23 11:49                   ` Christoph Hellwig
2021-06-23 12:09                     ` Coly Li
2021-06-22 10:19   ` [PATCH 03/14] bcache: add initial data structures for nvm pages Hannes Reinecke
2021-06-23  7:09     ` Coly Li
2021-06-15  5:49 ` [PATCH 04/14] bcache: initialize the nvm pages allocator Coly Li
2021-06-22 10:39   ` Hannes Reinecke
2021-06-23  5:26     ` Coly Li
2021-06-23  9:16       ` Hannes Reinecke
2021-06-23  9:34         ` Coly Li
2021-06-15  5:49 ` [PATCH 05/14] bcache: initialization of the buddy Coly Li
2021-06-22 10:45   ` Hannes Reinecke
2021-06-23  5:35     ` Coly Li [this message]
2021-06-23  5:46       ` Re[2]: " Pavel Goran
2021-06-23  6:03         ` Coly Li
2021-06-15  5:49 ` [PATCH 06/14] bcache: bch_nvm_alloc_pages() " Coly Li
2021-06-22 10:51   ` Hannes Reinecke
2021-06-23  6:02     ` Coly Li
2021-06-15  5:49 ` [PATCH 07/14] bcache: bch_nvm_free_pages() " Coly Li
2021-06-22 10:53   ` Hannes Reinecke
2021-06-23  6:06     ` Coly Li
2021-06-15  5:49 ` [PATCH 08/14] bcache: get allocated pages from specific owner Coly Li
2021-06-22 10:54   ` Hannes Reinecke
2021-06-23  6:08     ` Coly Li
2021-06-15  5:49 ` [PATCH 09/14] bcache: use bucket index to set GC_MARK_METADATA for journal buckets in bch_btree_gc_finish() Coly Li
2021-06-22 10:55   ` Hannes Reinecke
2021-06-23  6:09     ` Coly Li
2021-06-15  5:49 ` [PATCH 10/14] bcache: add BCH_FEATURE_INCOMPAT_NVDIMM_META into incompat feature set Coly Li
2021-06-22 10:59   ` Hannes Reinecke
2021-06-23  6:09     ` Coly Li
2021-06-15  5:49 ` [PATCH 11/14] bcache: initialize bcache journal for NVDIMM meta device Coly Li
2021-06-22 11:01   ` Hannes Reinecke
2021-06-23  6:17     ` Coly Li
2021-06-23  9:20       ` Hannes Reinecke
2021-06-23 10:14         ` Coly Li
2021-06-15  5:49 ` [PATCH 12/14] bcache: support storing bcache journal into " Coly Li
2021-06-22 11:03   ` Hannes Reinecke
2021-06-23  6:19     ` Coly Li
2021-06-15  5:49 ` [PATCH 13/14] bcache: read jset from NVDIMM pages for journal replay Coly Li
2021-06-22 11:04   ` Hannes Reinecke
2021-06-23  6:21     ` Coly Li
2021-06-15  5:49 ` [PATCH 14/14] bcache: add sysfs interface register_nvdimm_meta to register NVDIMM meta device Coly Li
2021-06-22 11:04   ` Hannes Reinecke
2021-06-21 15:14 ` [PATCH 00/14] bcache patches for Linux v5.14 Jens Axboe
2021-06-21 15:25   ` Coly Li
2021-06-21 15:27     ` Jens Axboe
