All of lore.kernel.org
 help / color / mirror / Atom feed
From: Coly Li <colyli@suse.de>
To: Christoph Hellwig <hch@lst.de>
Cc: axboe@kernel.dk, linux-bcache@vger.kernel.org,
	linux-block@vger.kernel.org, stable@vger.kernel.org
Subject: Re: [PATCH] bcache: Revert "bcache: use bvec_virt"
Date: Thu, 4 Nov 2021 12:25:25 +0800	[thread overview]
Message-ID: <9e23e103-40ff-963f-739f-730da4d5fe3f@suse.de> (raw)
In-Reply-To: <20211103161955.GA394@lst.de>

On 11/4/21 12:19 AM, Christoph Hellwig wrote:
> On Thu, Nov 04, 2021 at 12:11:45AM +0800, Coly Li wrote:
>>> fresh page for each vec, and bio_for_each_segment_all iterates page
>>> by page.  IFF there is an offset there is proble in the surrounding
>>> code as bch_bio_alloc_pages assumes that it is called on a freshly
>>> allocate and initialized bio.
>> Yes, the offset is modified in bch_bio_alloc_pages().
> Where?   In my upstream copy of bch_bio_alloc_pages there is no bv_offset
> manipulation, and I could not see how such a manipulation would make
> sense.

Forgive my typo. It was bch_bio_map() before bch_bio_alloc_pages(), both 
in do_btree_node_write() and in util.c, bv->bv_offset is set by,
     bv->bv_offset = base ? offset_in_page(base) : 0;

Here base is bset *i which is initialized in do_btree_node_write() as,
     struct bset *i = btree_bset_last(b);

The unit size of bset is 1 bcache-defined-block size, minimized value is 
512.

>> Normally the bcache
>> defined block size is 4KB so the issue was not triggered frequently. I
>> found it during testing my nvdimm enabling code for bcache, where I happen
>> to make the bcache defined block size to non-4KB. The offset is from the
>> previous written bkey set, which the minimized unit size is 1
>> bcache-defined-block-size.
> So you have some out of tree changes here?  Copying a PAGE_SIZE into
> a 'segment' bvec just does not make any sense if there is an offset,
> as segments are defined as bvecs that do not span page boundaries.
>
> I suspect the best thing to do in do_btree_node_write would be something
> like the patch below instead of poking into the internals here, but I'd
> also really like to understand the root cause as it does point to a bug
> somewhere else.
>
>
> diff --git a/drivers/md/bcache/btree.c b/drivers/md/bcache/btree.c
> index 93b67b8d31c3d..f69914848f32f 100644
> --- a/drivers/md/bcache/btree.c
> +++ b/drivers/md/bcache/btree.c
> @@ -378,8 +378,8 @@ static void do_btree_node_write(struct btree *b)
>   		struct bvec_iter_all iter_all;
>   
>   		bio_for_each_segment_all(bv, b->bio, iter_all) {
> -			memcpy(bvec_virt(bv), addr, PAGE_SIZE);
> -			addr += PAGE_SIZE;
> +			memcpy_to_bvec(bvec_virt(bv), addr);
> +			addr += bv->bv_len;
>   		}
>   
>   		bch_submit_bbio(b->bio, b->c, &k.key, 0);

The above change doesn't work, still observe panic[1].

Before calling bio_for_each_segment_all(), void *addr is calculated by,
     void *addr = (void *) ((unsigned long) i & ~(PAGE_SIZE - 1));
which is a page size aligned address.  When writing down a dirty btree 
node, it requires whole page to be written. Your original patch works 
fine when there is not previously unwirtten keys in the page as previous 
bkey set (and corrupts memory when bv->bv_offset is non-zero). The above 
change seems missing the part in offset [0, bv->bv_offset] inside the 
dirty page, I am not sure how the bellowed panic happens by the above 
change, it seems like wild pointer from the missing part of btree node 
when iterating the btree.

If you don't want to directly access members inside struct bio_vec, is 
there something like page_base(vb) which returns bv->bv_page ?

Thanks.

Coly Li



[1] the panic message starts like,

[ 1926.350362] ------------[ cut here ]------------
[ 1926.405603] kernel BUG at ./include/linux/highmem.h:316!
[ 1926.469172] invalid opcode: 0000 [#1] SMP DEBUG_PAGEALLOC NOPTI
[ 1926.540006] CPU: 10 PID: 477 Comm: kworker/10:2 Kdump: loaded 
Tainted: G            E     5.15.0-59.27-default+ #57
[ 1926.350362 1926.540009] Hardware name: Lenovo ThinkSystem SR650 
-[7X05CTO1WW]-/-[7X05CTO1WW]-, BIOS -[IVE164L-2.80]- 10/23/2020
[ 1926.540010] Workqueue: bcache bch_data_insert_keys [bcache]
[ 1926.806482] RIP: 0010:__bch_btree_node_write+0x662/0x690 [bcache]
<snipped>

  reply	other threads:[~2021-11-04  4:25 UTC|newest]

Thread overview: 10+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-11-03 15:10 [PATCH] bcache: Revert "bcache: use bvec_virt" Coly Li
2021-11-03 15:46 ` Christoph Hellwig
2021-11-03 16:11   ` Coly Li
2021-11-03 16:19     ` Christoph Hellwig
2021-11-04  4:25       ` Coly Li [this message]
2021-11-04 20:23         ` Christoph Hellwig
2021-11-06 13:36           ` Coly Li
2021-11-08  8:16 ` Coly Li
2021-11-09  8:03   ` Christoph Hellwig
2021-11-08 13:23 ` Jens Axboe

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=9e23e103-40ff-963f-739f-730da4d5fe3f@suse.de \
    --to=colyli@suse.de \
    --cc=axboe@kernel.dk \
    --cc=hch@lst.de \
    --cc=linux-bcache@vger.kernel.org \
    --cc=linux-block@vger.kernel.org \
    --cc=stable@vger.kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.