* bcache integer overflow for large devices w/small io_opt
From: Ken Raeburn @ 2020-07-10 22:47 UTC
To: linux-bcache

The long version is written up at
https://bugzilla.redhat.com/show_bug.cgi?id=1783075 but the short
version:

There are devices out there which set q->limits.io_opt to small values
like 4096 bytes, causing bcache to use that for the stripe size, but the
device size could still be large enough that the computed stripe count
is 2**32 or more. That value gets stuffed into a 32-bit (unsigned int)
field, throwing away the high bits, and then that truncated value is
range-checked and used. This can result in memory corruption or faults
in some cases.

The problem was brought up with us on Red Hat's VDO driver team by a
bcache user on a 4.17.8 kernel, has been demonstrated in the Fedora
5.3.15-300.fc31 kernel, and by inspection appears to be present in
Linus's tree as of this morning.

The easy fix would be to keep the quotient in a 64-bit variable until
it's validated, but that would simply limit the size of such devices as
bcache backing storage (in this case, limiting VDO volumes to under 8
TB). Is there a way to still be able to use larger devices? Perhaps
scale up the stripe size from io_opt to the point where the stripe count
falls in the allowed range?

Ken Raeburn
(Red Hat VDO driver developer)
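The truncation described in the message above can be illustrated with a small user-space sketch. This is not the bcache source: the function names, the MAX_STRIPES bound (taken here as INT_MAX) and the use of 512-byte sectors are assumptions made for illustration. It contrasts narrowing the quotient before the range check with checking the 64-bit quotient first.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's upper bound on the stripe count. */
#define MAX_STRIPES ((uint64_t)INT_MAX)

/* Round-up division on 64-bit values. */
static uint64_t div_round_up_u64(uint64_t n, uint64_t d)
{
	return n / d + (n % d != 0);
}

/* Problem pattern: narrow to 32 bits first, range-check afterwards. */
static int stripes_truncate_then_check(uint64_t dev_sectors,
				       uint32_t stripe_sectors,
				       uint32_t *nr_stripes)
{
	if (stripe_sectors == 0)
		return -1;
	*nr_stripes = (uint32_t)div_round_up_u64(dev_sectors, stripe_sectors);
	if (*nr_stripes == 0 || *nr_stripes > MAX_STRIPES)
		return -1;
	return 0;	/* accepted, even if high bits were already dropped */
}

/* Safer pattern: range-check the 64-bit quotient, then narrow. */
static int stripes_check_then_narrow(uint64_t dev_sectors,
				     uint32_t stripe_sectors,
				     uint32_t *nr_stripes)
{
	uint64_t n;

	if (stripe_sectors == 0)
		return -1;
	n = div_round_up_u64(dev_sectors, stripe_sectors);
	if (n == 0 || n > MAX_STRIPES)
		return -1;
	*nr_stripes = (uint32_t)n;
	return 0;
}

/* Usage check with an 18 TiB device and 4096-byte (8-sector) stripes. */
static void check_18t_example(void)
{
	uint64_t dev_sectors = (18ULL << 40) / 512;
	uint32_t n = 0;

	/* 4831838208 stripes wrap to 536870912, which passes the range check. */
	assert(stripes_truncate_then_check(dev_sectors, 8, &n) == 0);
	assert(n == 536870912u);
	/* The 64-bit check rejects the same configuration. */
	assert(stripes_check_then_narrow(dev_sectors, 8, &n) == -1);
}
```

With a bound of INT_MAX stripes and 4096-byte stripes, the checked variant accepts devices only up to 8 TiB, which matches the limit mentioned in the message.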
* Re: bcache integer overflow for large devices w/small io_opt
From: Coly Li @ 2020-07-11 15:28 UTC
To: Ken Raeburn; +Cc: linux-bcache

On 2020/7/11 06:47, Ken Raeburn wrote:
>
> The long version is written up at
> https://bugzilla.redhat.com/show_bug.cgi?id=1783075 but the short
> version:
>
> There are devices out there which set q->limits.io_opt to small values
> like 4096 bytes, causing bcache to use that for the stripe size, but the
> device size could still be large enough that the computed stripe count
> is 2**32 or more. That value gets stuffed into a 32-bit (unsigned int)
> field, throwing away the high bits, and then that truncated value is
> range-checked and used. This can result in memory corruption or faults
> in some cases.
>
> The problem was brought up with us on Red Hat's VDO driver team by a
> bcache user on a 4.17.8 kernel, has been demonstrated in the Fedora
> 5.3.15-300.fc31 kernel, and by inspection appears to be present in
> Linus's tree as of this morning.
>
> The easy fix would be to keep the quotient in a 64-bit variable until
> it's validated, but that would simply limit the size of such devices as
> bcache backing storage (in this case, limiting VDO volumes to under 8
> TB). Is there a way to still be able to use larger devices? Perhaps
> scale up the stripe size from io_opt to the point where the stripe count
> falls in the allowed range?
>
> Ken Raeburn
> (Red Hat VDO driver developer)
>

We cannot extend the bit width of nr_stripes, because
d->full_dirty_stripes memory allocation depends on it.

For the 18T volume with a 4KB stripe_size, there are 4831838208
stripes. Then the size of d->full_dirty_stripes will be
4831838208*sizeof(atomic_t) > 140GB. This is too large for kernel memory
allocation.

Would it help if we had an option in bcache-tools to specify a
stripe_size number to override limit->io_opt? Then you could specify a
larger stripe size, which may avoid the nr_stripes overflow.

Thanks for the report.

Coly Li
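To see how strongly the stripe size drives the per-stripe bookkeeping cost, here is a rough sketch of the arithmetic. It assumes one 4-byte counter per stripe (sizeof(atomic_t) taken as 4, which is an assumption and not something stated in the thread), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: one 4-byte counter per stripe (sizeof(atomic_t) taken as 4). */
#define ASSUMED_COUNTER_BYTES 4u

/* Number of stripes needed to cover the device, rounding up. */
static uint64_t stripe_count(uint64_t dev_bytes, uint64_t stripe_bytes)
{
	return dev_bytes / stripe_bytes + (dev_bytes % stripe_bytes != 0);
}

/* Bytes needed for an array holding one counter per stripe. */
static uint64_t counter_array_bytes(uint64_t dev_bytes, uint64_t stripe_bytes)
{
	return stripe_count(dev_bytes, stripe_bytes) * ASSUMED_COUNTER_BYTES;
}

/* Usage check for an 18 TiB device at two stripe sizes. */
static void check_18t_memory(void)
{
	uint64_t dev_bytes = 18ULL << 40;

	/* 4 KiB stripes: the stripe count quoted in the message. */
	assert(stripe_count(dev_bytes, 4096) == 4831838208ULL);
	assert(counter_array_bytes(dev_bytes, 4096) == 19327352832ULL);

	/* 1 MiB stripes: 256 times fewer stripes and counters. */
	assert(stripe_count(dev_bytes, 1u << 20) == 18874368ULL);
	assert(counter_array_bytes(dev_bytes, 1u << 20) == 75497472ULL);
}
```

Under that assumption the 4 KiB case needs tens of gigabytes for the counters alone, while 1 MiB stripes bring the same array down to 72 MiB, which is why a user-selectable stripe size helps with both the overflow and the memory footprint.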
* Re: bcache integer overflow for large devices w/small io_opt
From: Ken Raeburn @ 2020-07-12 3:06 UTC
To: Coly Li; +Cc: linux-bcache

On 7/11/20 11:28 AM, Coly Li wrote:
> On 2020/7/11 06:47, Ken Raeburn wrote:
>> The long version is written up at
>> https://bugzilla.redhat.com/show_bug.cgi?id=1783075 but the short
>> version:
>>
>> There are devices out there which set q->limits.io_opt to small values
>> like 4096 bytes, causing bcache to use that for the stripe size, but the
>> device size could still be large enough that the computed stripe count
>> is 2**32 or more. That value gets stuffed into a 32-bit (unsigned int)
>> field, throwing away the high bits, and then that truncated value is
>> range-checked and used. This can result in memory corruption or faults
>> in some cases.
>>
>> The problem was brought up with us on Red Hat's VDO driver team by a
>> bcache user on a 4.17.8 kernel, has been demonstrated in the Fedora
>> 5.3.15-300.fc31 kernel, and by inspection appears to be present in
>> Linus's tree as of this morning.
>>
>> The easy fix would be to keep the quotient in a 64-bit variable until
>> it's validated, but that would simply limit the size of such devices as
>> bcache backing storage (in this case, limiting VDO volumes to under 8
>> TB). Is there a way to still be able to use larger devices? Perhaps
>> scale up the stripe size from io_opt to the point where the stripe count
>> falls in the allowed range?
>>
>> Ken Raeburn
>> (Red Hat VDO driver developer)
>>
> We cannot extend the bit width of nr_stripes, because
> d->full_dirty_stripes memory allocation depends on it.
>
> For the 18T volume with a 4KB stripe_size, there are 4831838208
> stripes. Then the size of d->full_dirty_stripes will be
> 4831838208*sizeof(atomic_t) > 140GB. This is too large for kernel memory
> allocation.

I didn't intend for nr_stripes to be made 64 bits. Just a temporary
variable for the purposes of validation, to ensure that you won't be
losing high bits when coercing to unsigned int.

> Would it help if we had an option in bcache-tools to specify a
> stripe_size number to override limit->io_opt? Then you could specify a
> larger stripe size, which may avoid the nr_stripes overflow.
>
> Thanks for the report.
>
> Coly Li
>

Yes, I think letting the user choose a stripe size would be a good way
to address the problem. Of course, the driver must still defend itself
against memory corruption anyway, if the user doesn't do so, by
rejecting or adjusting the values. But whereas I wouldn't recommend the
driver alter the stripe size by more than necessary to make the stripe
count fit, the user can make it as big as they want, if they want to
bring the memory requirement down further, or if they've done some
performance measurements of different configurations, or they know
something interesting about their workload's access patterns, etc.

Ken
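One way the "adjust no more than necessary, but let the user override" policy could look is sketched below. The thread does not specify how the driver should scale the stripe size; doubling io_opt until the count fits is an assumption chosen for simplicity, and MAX_STRIPES, fit_stripe_sectors() and choose_stripe_sectors() are hypothetical names, not bcache or bcache-tools interfaces.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Hypothetical stand-in for the driver's upper bound on the stripe count. */
#define MAX_STRIPES ((uint64_t)INT_MAX)

/* Number of stripes needed to cover the device, rounding up. */
static uint64_t stripe_count(uint64_t dev_sectors, uint64_t stripe_sectors)
{
	return dev_sectors / stripe_sectors + (dev_sectors % stripe_sectors != 0);
}

/*
 * Smallest stripe size of the form io_opt * 2^k (in sectors) whose stripe
 * count does not exceed MAX_STRIPES. Returns 0 if no usable size exists.
 */
static uint64_t fit_stripe_sectors(uint64_t dev_sectors, uint64_t io_opt_sectors)
{
	uint64_t stripe = io_opt_sectors;

	if (dev_sectors == 0 || io_opt_sectors == 0)
		return 0;
	while (stripe_count(dev_sectors, stripe) > MAX_STRIPES) {
		if (stripe > UINT64_MAX / 2)
			return 0;	/* doubling again would overflow */
		stripe *= 2;
	}
	return stripe;
}

/*
 * A nonzero user_sectors is taken as given but still validated (rejected
 * with 0 if the count would not fit); otherwise scale up from io_opt.
 */
static uint64_t choose_stripe_sectors(uint64_t dev_sectors,
				      uint64_t io_opt_sectors,
				      uint64_t user_sectors)
{
	if (dev_sectors == 0)
		return 0;
	if (user_sectors != 0)
		return stripe_count(dev_sectors, user_sectors) <= MAX_STRIPES ?
			user_sectors : 0;
	return fit_stripe_sectors(dev_sectors, io_opt_sectors);
}

/* Usage check: 18 TiB device, io_opt of 4096 bytes (8 sectors). */
static void check_18t_scaling(void)
{
	uint64_t dev_sectors = (18ULL << 40) / 512;

	/* Two doublings (8 -> 16 -> 32 sectors) bring the count under the cap. */
	assert(fit_stripe_sectors(dev_sectors, 8) == 32);
	/* A user-chosen 1 MiB stripe (2048 sectors) is accepted unchanged. */
	assert(choose_stripe_sectors(dev_sectors, 8, 2048) == 2048);
	/* A user-chosen 4096-byte stripe is rejected rather than truncated. */
	assert(choose_stripe_sectors(dev_sectors, 8, 8) == 0);
}
```

Rejecting an unusable user value rather than silently adjusting it is a design choice of this sketch; the message allows for either rejecting or adjusting.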
* Re: bcache integer overflow for large devices w/small io_opt
From: Coly Li @ 2020-07-12 9:31 UTC
To: Ken Raeburn; +Cc: linux-bcache

On 2020/7/12 11:06, Ken Raeburn wrote:
> On 7/11/20 11:28 AM, Coly Li wrote:
>
>> On 2020/7/11 06:47, Ken Raeburn wrote:
>>> The long version is written up at
>>> https://bugzilla.redhat.com/show_bug.cgi?id=1783075 but the short
>>> version:
>>>
>>> There are devices out there which set q->limits.io_opt to small values
>>> like 4096 bytes, causing bcache to use that for the stripe size, but the
>>> device size could still be large enough that the computed stripe count
>>> is 2**32 or more. That value gets stuffed into a 32-bit (unsigned int)
>>> field, throwing away the high bits, and then that truncated value is
>>> range-checked and used. This can result in memory corruption or faults
>>> in some cases.
>>>
>>> The problem was brought up with us on Red Hat's VDO driver team by a
>>> bcache user on a 4.17.8 kernel, has been demonstrated in the Fedora
>>> 5.3.15-300.fc31 kernel, and by inspection appears to be present in
>>> Linus's tree as of this morning.
>>>
>>> The easy fix would be to keep the quotient in a 64-bit variable until
>>> it's validated, but that would simply limit the size of such devices as
>>> bcache backing storage (in this case, limiting VDO volumes to under 8
>>> TB). Is there a way to still be able to use larger devices? Perhaps
>>> scale up the stripe size from io_opt to the point where the stripe count
>>> falls in the allowed range?
>>>
>>> Ken Raeburn
>>> (Red Hat VDO driver developer)
>>>
>> We cannot extend the bit width of nr_stripes, because
>> d->full_dirty_stripes memory allocation depends on it.
>>
>> For the 18T volume with a 4KB stripe_size, there are 4831838208
>> stripes. Then the size of d->full_dirty_stripes will be
>> 4831838208*sizeof(atomic_t) > 140GB. This is too large for kernel memory
>> allocation.
> I didn't intend for nr_stripes to be made 64 bits. Just a temporary
> variable for the purposes of validation, to ensure that you won't be
> losing high bits when coercing to unsigned int.
>> Would it help if we had an option in bcache-tools to specify a
>> stripe_size number to override limit->io_opt? Then you could specify a
>> larger stripe size, which may avoid the nr_stripes overflow.
>>
>> Thanks for the report.
>>
>> Coly Li
>>
> Yes, I think letting the user choose a stripe size would be a good way
> to address the problem. Of course, the driver must still defend itself
> against memory corruption anyway, if the user doesn't do so, by
> rejecting or adjusting the values. But whereas I wouldn't recommend the
> driver alter the stripe size by more than necessary to make the stripe
> count fit, the user can make it as big as they want, if they want to
> bring the memory requirement down further, or if they've done some
> performance measurements of different configurations, or they know
> something interesting about their workload's access patterns, etc.

Copied. Correct me if I am wrong; I will do two fixes to solve the
problem:

1. A quick fix to avoid the kernel panic reported in the bugzilla.
2. Permit people to set their own stripe_size to override the default
   one, so that bcache may continue to work on devices with a small
   limit->io_opt.

Coly Li
Thread overview: 4+ messages
2020-07-10 22:47 bcache integer overflow for large devices w/small io_opt Ken Raeburn
2020-07-11 15:28 ` Coly Li
2020-07-12  3:06   ` Ken Raeburn
2020-07-12  9:31     ` Coly Li