From: Christoph Hellwig <hch@lst.de>
To: Coly Li <colyli@suse.de>
Cc: linux-bcache@vger.kernel.org, linux-block@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	Diego Ercolani <diego.ercolani@gmail.com>,
	Jan Szubiak <jan.szubiak@linuxpolska.pl>,
	Marco Rebhan <me@dblsaiko.net>,
	Matthias Ferdinand <bcache@mfedv.net>,
	Thorsten Knabe <linux@thorsten-knabe.de>,
	Victor Westerhuis <victor@westerhu.is>,
	Vojtech Pavlik <vojtech@suse.cz>,
	stable@vger.kernel.org, Christoph Hellwig <hch@lst.de>,
	Kent Overstreet <kent.overstreet@gmail.com>,
	Takashi Iwai <tiwai@suse.com>
Subject: Re: [PATCH v4] bcache: avoid oversized read request in cache missing code path
Date: Thu, 27 May 2021 13:36:29 +0200
Message-ID: <20210527113629.GA17266@lst.de>
In-Reply-To: <20210526151450.45211-1-colyli@suse.de>

On Wed, May 26, 2021 at 11:14:50PM +0800, Coly Li wrote:
> In the cache miss code path of a cached device, if a matching location
> in the internal B+ tree is found for a cache miss range, the function
> cached_dev_cache_miss() is called from cache_lookup_fn() in the
> following code block,
> [code block 1]
>   526         unsigned int sectors = KEY_INODE(k) == s->iop.inode
>   527                 ? min_t(uint64_t, INT_MAX,
>   528                         KEY_START(k) - bio->bi_iter.bi_sector)
>   529                 : INT_MAX;
>   530         int ret = s->d->cache_miss(b, s, bio, sectors);
> 
> Here s->d->cache_miss() is a callback function pointer initialized to
> cached_dev_cache_miss(). The last parameter, 'sectors', is an important
> hint for calculating the size of the read request sent to the backing
> device for the missing cache data.
> 
> The current calculation in the above code block may generate an oversized
> value of 'sectors', which in turn may trigger two different kernel
> panics, by BUG() or BUG_ON(), as listed below,
> 
> 1) BUG_ON() inside bch_btree_insert_key(),
> [code block 2]
>    886         BUG_ON(b->ops->is_extents && !KEY_SIZE(k));
> 2) BUG() inside biovec_slab(),
> [code block 3]
>    51         default:
>    52                 BUG();
>    53                 return NULL;
> 
> Both panics above originate from cached_dev_cache_miss() being called
> with an oversized 'sectors' parameter.
> 
> Inside cached_dev_cache_miss(), the parameter 'sectors' is used to
> calculate the size of the data read from the backing device for the
> cache miss. This size is stored in s->insert_bio_sectors by the
> following line of code,
> [code block 4]
>   909    s->insert_bio_sectors = min(sectors, bio_sectors(bio) + reada);
> 
> Then the actual key to be inserted into the internal B+ tree is generated
> and stored in s->iop.replace_key by the following lines of code,
> [code block 5]
>   911   s->iop.replace_key = KEY(s->iop.inode,
>   912                    bio->bi_iter.bi_sector + s->insert_bio_sectors,
>   913                    s->insert_bio_sectors);
> The oversized parameter 'sectors' may trigger panic 1) by BUG_ON() from
> the above code block.
> 
> And the bio sent to the backing device for the missing data is allocated
> with a size hint from s->insert_bio_sectors by the following lines of code,
> [code block 6]
>   926    cache_bio = bio_alloc_bioset(GFP_NOWAIT,
>   927                 DIV_ROUND_UP(s->insert_bio_sectors, PAGE_SECTORS),
>   928                 &dc->disk.bio_split);
> The oversized parameter 'sectors' may trigger panic 2) by BUG() from the
> above code block.
> 
> Now let me explain how the panics happen with the oversized 'sectors'.
> In code block 5, replace_key is generated by macro KEY(). From the
> definition of macro KEY(),
> [code block 7]
>   71 #define KEY(inode, offset, size)                                  \
>   72 ((struct bkey) {                                                  \
>   73      .high = (1ULL << 63) | ((__u64) (size) << 20) | (inode),     \
>   74      .low = (offset)                                              \
>   75 })
> 
> Here 'size' is a 16-bit field embedded in the 64-bit member 'high' of
> struct bkey. But in code block 1, "KEY_START(k) - bio->bi_iter.bi_sector"
> is very likely to be larger than (1<<16) - 1, which makes the bkey size
> calculation in code block 5 overflow. In one bug report the value of
> parameter 'sectors' is 131072 (= 1 << 17); the oversized 'sectors'
> results in an oversized s->insert_bio_sectors in code block 4, which then
> makes the size field of s->iop.replace_key 0 in code block 5. The
> zero-sized s->iop.replace_key is then inserted into the internal B+ tree
> as the cache missing check key (a special key to detect and avoid a race
> between a normal write request and a cache missing read request) as,
> [code block 8]
>   915   ret = bch_btree_insert_check_key(b, &s->op, &s->iop.replace_key);
> 
> Then the 0-sized s->iop.replace_key as 3rd parameter triggers the bkey
> size check BUG_ON() in code block 2, and causes the kernel panic 1).
> 
> The other kernel panic comes from code block 6, where an oversized
> s->insert_bio_sectors from code block 4,
> 	min(sectors, bio_sectors(bio) + reada)
> produces an invalid bvecs number. There are two possibilities for the
> oversized result,
> - bio_sectors(bio) is valid, but bio_sectors(bio) + reada is oversized.
> - sectors < bio_sectors(bio) + reada, but sectors is oversized.
> 
> From a bug report, the result of "DIV_ROUND_UP(s->insert_bio_sectors,
> PAGE_SECTORS)" in code block 6 can be 344, 282, 946, 342 and many
> other values larger than BIO_MAX_VECS (i.e. 256). When
> bio_alloc_bioset() is called with such a larger-than-256 value as its
> 2nd parameter, the value is eventually passed to biovec_slab() as the
> parameter 'nr_vecs' along the following code path,
>    bio_alloc_bioset() ==> bvec_alloc() ==> biovec_slab()
> Because 'nr_vecs' is larger than 256, the BUG() in code block 3 is
> triggered inside biovec_slab().
> 
> From the above analysis, we know that the 4th parameter 'sectors' passed
> into cached_dev_cache_miss() may cause overflows in code blocks 5 and 6,
> and finally cause kernel panics in code blocks 2 and 3. And if the result
> of bio_sectors(bio) + reada exceeds the valid bvecs number, it may also
> trigger the kernel panic in code block 3 via code block 6.
> 
> In this patch, the above two panics are avoided by the following
> changes,
> - If DIV_ROUND_UP(bio_sectors(bio) + reada, PAGE_SECTORS) exceeds the
>   maximum bvecs count, reduce reada to make sure the DIV_ROUND_UP()
>   result won't generate an oversized s->insert_bio_sectors that causes
>   an invalid bvecs number for cache_bio.
> - If sectors exceeds the maximum bkey size, clamp sectors to the
>   maximum valid bkey size.
> 
> With the above changes, the size value passed to the KEY() macro in code
> block 5 will always be in the valid range. Likewise, in code block 6 the
> nr_iovecs parameter of bio_alloc_bioset(), calculated by
> DIV_ROUND_UP(s->insert_bio_sectors, PAGE_SECTORS), will always be a valid
> bvecs number. Neither panic will happen anymore.
> 
> The problematic code can be partially found since Linux v5.13-rc1,
> therefore all maintained stable kernels should try to apply this fix.
> 
> Reported-by: Diego Ercolani <diego.ercolani@gmail.com>
> Reported-by: Jan Szubiak <jan.szubiak@linuxpolska.pl>
> Reported-by: Marco Rebhan <me@dblsaiko.net>
> Reported-by: Matthias Ferdinand <bcache@mfedv.net>
> Reported-by: Thorsten Knabe <linux@thorsten-knabe.de>
> Reported-by: Victor Westerhuis <victor@westerhu.is>
> Reported-by: Vojtech Pavlik <vojtech@suse.cz>
> Signed-off-by: Coly Li <colyli@suse.de>
> Cc: stable@vger.kernel.org
> Cc: Christoph Hellwig <hch@lst.de>
> Cc: Kent Overstreet <kent.overstreet@gmail.com>
> Cc: Takashi Iwai <tiwai@suse.com>
> ---
> Changelog:
> v4, do not access BIO_MAX_VECS directly, and reduce reada to avoid an
>     oversized bvecs number, per a hint from Christoph Hellwig.
> v3, fix typo in v2.
> v2, fix the bypass bio size calculation in v1.
> v1, the initial version
> 
>  drivers/md/bcache/request.c | 19 +++++++++++++++++++
>  1 file changed, 19 insertions(+)
> 
> diff --git a/drivers/md/bcache/request.c b/drivers/md/bcache/request.c
> index 29c231758293..054948f037ed 100644
> --- a/drivers/md/bcache/request.c
> +++ b/drivers/md/bcache/request.c
> @@ -883,6 +883,7 @@ static int cached_dev_cache_miss(struct btree *b, struct search *s,
>  	unsigned int reada = 0;
>  	struct cached_dev *dc = container_of(s->d, struct cached_dev, disk);
>  	struct bio *miss, *cache_bio;
> +	unsigned int nr_bvecs, max_segs;
>  
>  	s->cache_missed = 1;
>  
> @@ -899,6 +900,24 @@ static int cached_dev_cache_miss(struct btree *b, struct search *s,
>  			      get_capacity(bio->bi_bdev->bd_disk) -
>  			      bio_end_sector(bio));
>  
> +	/*
> +	 * If "bio_sectors(bio) + reada" may cause an oversized bio bvecs
> +	 * number, reada size must be reduced to make sure the following
> +	 * calculated s->insert_bio_sectors won't cause oversized bvecs number
> +	 * to cache_bio.
> +	 */
> +	nr_bvecs = DIV_ROUND_UP(bio_sectors(bio) + reada, PAGE_SECTORS);

Can't this overflow if bio_sectors(bio) is close to UINT_MAX already?

> +	/*
> +	 * Make sure sectors won't exceed (1 << KEY_SIZE_BITS) - 1, which is
> +	 * the maximum bkey size in unit of sector. Then s->insert_bio_sectors
> +	 * will always be a valid bio in valid bkey size range.
> +	 */
> +	if (sectors > ((1 << KEY_SIZE_BITS) - 1))
> +		sectors = (1 << KEY_SIZE_BITS) - 1;

This should use min() or min_t().


Thread overview: 4+ messages
2021-05-26 15:14 [PATCH v4] bcache: avoid oversized read request in cache missing code path Coly Li
2021-05-26 15:59 ` Nix
2021-05-27  2:55   ` Coly Li
2021-05-27 11:36 ` Christoph Hellwig [this message]
