Linux-bcache Archive on lore.kernel.org
* bcache integer overflow for large devices w/small io_opt
From: Ken Raeburn @ 2020-07-10 22:47 UTC
  To: linux-bcache


The long version is written up at
https://bugzilla.redhat.com/show_bug.cgi?id=1783075 but the short
version:

There are devices out there which set q->limits.io_opt to small values
like 4096 bytes, causing bcache to use that for the stripe size, but the
device size could still be large enough that the computed stripe count
is 2**32 or more. That value gets stuffed into a 32-bit (unsigned int)
field, throwing away the high bits, and then that truncated value is
range-checked and used. This can result in memory corruption or faults
in some cases.
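
The truncation described above can be reproduced in user space. This is a minimal sketch, not the bcache code itself: the function names are hypothetical, sizes are assumed to be in 512-byte sectors, and the 18 TiB device size is only an illustrative value.

```c
#include <stdint.h>

/* Full-width stripe count: ceil(sectors / stripe_sectors). */
static uint64_t stripe_count_u64(uint64_t sectors, uint32_t stripe_sectors)
{
	return sectors / stripe_sectors + (sectors % stripe_sectors != 0);
}

/*
 * What survives when the quotient is stored directly in a 32-bit field:
 * the high bits are silently discarded before any range check can see them.
 */
static uint32_t stripe_count_truncated(uint64_t sectors, uint32_t stripe_sectors)
{
	return (uint32_t)stripe_count_u64(sectors, stripe_sectors);
}
```

For an 18 TiB device (38654705664 sectors) and a 4096-byte stripe (8 sectors), stripe_count_u64() gives 4831838208, while stripe_count_truncated() gives 536870912. The truncated value looks plausible, so a range check applied afterwards does not catch it.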

The problem was brought up with us on Red Hat's VDO driver team by a
bcache user on a 4.17.8 kernel, has been demonstrated in the Fedora
5.3.15-300.fc31 kernel, and by inspection appears to be present in
Linus's tree as of this morning.

The easy fix would be to keep the quotient in a 64-bit variable until
it's validated, but that would simply limit the size of such devices as
bcache backing storage (in this case, limiting VDO volumes to under 8
TB). Is there a way to still be able to use larger devices? Perhaps
scale up the stripe size from io_opt to the point where the stripe count
falls in the allowed range?
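
A rough sketch of the scaling idea, under stated assumptions: pick_stripe_size() is a hypothetical helper, not an existing bcache function, sizes are in 512-byte sectors, and the upper bound on the stripe count is passed in as max_stripes rather than taken from the driver. The quotient stays 64 bits wide until it has been checked, and the stripe size is doubled only as far as needed.

```c
#include <stdint.h>

/*
 * Hypothetical helper: start from the io_opt-derived stripe size and double
 * it until the 64-bit stripe count fits under max_stripes.
 * Returns 0 on success, -1 if no usable stripe size exists.
 */
static int pick_stripe_size(uint64_t sectors, uint32_t io_opt_sectors,
			    uint64_t max_stripes,
			    uint32_t *stripe_sectors, uint32_t *nr_stripes)
{
	uint64_t size = io_opt_sectors;
	uint64_t n;

	if (!sectors || !size || !max_stripes || max_stripes > UINT32_MAX)
		return -1;

	for (;;) {
		/* Round up without risking overflow in the numerator. */
		n = sectors / size + (sectors % size != 0);
		if (n <= max_stripes)
			break;
		size *= 2;
		if (size > UINT32_MAX)
			return -1;
	}

	/* Both values are now known to fit in 32 bits. */
	*stripe_sectors = (uint32_t)size;
	*nr_stripes = (uint32_t)n;
	return 0;
}
```

With an illustrative 18 TiB device, 8-sector io_opt, and a max_stripes of 2147483647 (an assumed limit, chosen because it matches the 8 TB figure for 4096-byte stripes), this settles on 32-sector (16 KiB) stripes and 1207959552 stripes.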

Ken Raeburn
(Red Hat VDO driver developer)



* Re: bcache integer overflow for large devices w/small io_opt
From: Coly Li @ 2020-07-11 15:28 UTC
  To: Ken Raeburn; +Cc: linux-bcache

We cannot extend the bit width of nr_stripes, because
d->full_dirty_stripes memory allocation depends on it.

For an 18T volume with a 4KB stripe_size, there are 4831838208
stripes. The size of d->full_dirty_stripes will then be
4831838208*sizeof(atomic_t), about 18GB. This is too large for kernel
memory allocation.
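
The arithmetic can be checked with a small user-space sketch. The helper names are hypothetical, and the 4-byte element size assumes that atomic_t wraps a 32-bit int, as it does on Linux.

```c
#include <stdint.h>

/* Number of stripes needed to cover a device; both arguments in bytes. */
static uint64_t stripes_for(uint64_t device_bytes, uint64_t stripe_bytes)
{
	return device_bytes / stripe_bytes + (device_bytes % stripe_bytes != 0);
}

/* Bytes needed for one 32-bit counter per stripe (assumed sizeof(atomic_t) == 4). */
static uint64_t counter_array_bytes(uint64_t nr_stripes)
{
	return nr_stripes * (uint64_t)sizeof(int32_t);
}
```

For an 18 TiB device with 4 KiB stripes this gives 4831838208 counters and 19327352832 bytes (18 GiB). Raising the stripe size to 1 MiB brings it down to 18874368 counters and 75497472 bytes (72 MiB).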

Would it help if we had an option in bcache-tools to specify a
stripe_size that overrides q->limits.io_opt? Then you could specify a
larger stripe size, which would avoid the nr_stripes overflow.

Thanks for the report.

Coly Li




* Re: bcache integer overflow for large devices w/small io_opt
From: Ken Raeburn @ 2020-07-12  3:06 UTC
  To: Coly Li; +Cc: linux-bcache

On 7/11/20 11:28 AM, Coly Li wrote:

> We cannot extend the bit width of nr_stripes, because
> d->full_dirty_stripes memory allocation depends on it.
>
> For an 18T volume with a 4KB stripe_size, there are 4831838208
> stripes. The size of d->full_dirty_stripes will then be
> 4831838208*sizeof(atomic_t), about 18GB. This is too large for kernel
> memory allocation.

I didn't intend for nr_stripes to be made 64 bits. Just a temporary
variable for the purposes of validation, to ensure that you won't be
losing high bits when coercing to unsigned int.

> Would it help if we had an option in bcache-tools to specify a
> stripe_size that overrides q->limits.io_opt? Then you could specify a
> larger stripe size, which would avoid the nr_stripes overflow.
>
> Thanks for the report.
>
> Coly Li
>
Yes, I think letting the user choose a stripe size would be a good way 
to address the problem. Of course, the driver must still defend itself 
against memory corruption anyway, if the user doesn't do so, by 
rejecting or adjusting the values. But whereas I wouldn't recommend the 
driver alter the stripe size by more than necessary to make the stripe 
count fit, the user can make it as big as they want, if they want to 
bring the memory requirement down further, or if they've done some 
performance measurements of different configurations, or they know 
something interesting about their workload's access patterns, etc.
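
The split of responsibilities described above could look roughly like this. It is only a sketch: both helpers are hypothetical, sizes are in 512-byte sectors, and a user value of 0 is assumed to mean "not set". The user's choice takes precedence, but the driver still validates the resulting count at full width before narrowing it.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: a user-supplied stripe size (0 = not set) overrides io_opt. */
static uint32_t effective_stripe_sectors(uint32_t user_sectors,
					 uint32_t io_opt_sectors)
{
	return user_sectors ? user_sectors : io_opt_sectors;
}

/*
 * Driver-side defense: compute the stripe count in 64 bits and refuse
 * anything that is zero or above the limit before storing it in 32 bits.
 */
static bool stripe_count_ok(uint64_t sectors, uint32_t stripe_sectors,
			    uint64_t max_stripes, uint32_t *nr_stripes)
{
	uint64_t n;

	if (!stripe_sectors)
		return false;

	n = sectors / stripe_sectors + (sectors % stripe_sectors != 0);
	if (!n || n > max_stripes || n > UINT32_MAX)
		return false;

	*nr_stripes = (uint32_t)n;
	return true;
}
```

With an illustrative 18 TiB device and an assumed limit of 2147483647 stripes, the default of 8 sectors is rejected, while a user-chosen 2048 sectors (1 MiB) is accepted with 18874368 stripes.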

Ken



* Re: bcache integer overflow for large devices w/small io_opt
From: Coly Li @ 2020-07-12  9:31 UTC
  To: Ken Raeburn; +Cc: linux-bcache

Copied. Correct me if I am wrong, but I will make two fixes to solve
the problem:
1. A quick fix to avoid the kernel panic reported in the bugzilla.
2. Permit people to set their own stripe_size to override the default
one, so that bcache may continue to work on devices with a small
q->limits.io_opt.

Coly Li
