* [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
@ 2021-02-16 19:39 Dan Carpenter
2021-02-16 23:33 ` Damien Le Moal
0 siblings, 1 reply; 14+ messages in thread
From: Dan Carpenter @ 2021-02-16 19:39 UTC (permalink / raw)
To: johannes.thumshirn; +Cc: linux-scsi, Michal Hocko
Hello Johannes Thumshirn,
The patch 5795eb443060: "scsi: sd_zbc: emulate ZONE_APPEND commands"
from May 12, 2020, leads to the following static checker warning:
drivers/scsi/sd_zbc.c:741 sd_zbc_revalidate_zones()
error: kvmalloc() only makes sense with GFP_KERNEL
drivers/scsi/sd_zbc.c
721 /*
722 * There is nothing to do for regular disks, including host-aware disks
723 * that have partitions.
724 */
725 if (!blk_queue_is_zoned(q))
726 return 0;
727
728 /*
729 * Make sure revalidate zones are serialized to ensure exclusive
730 * updates of the scsi disk data.
731 */
732 mutex_lock(&sdkp->rev_mutex);
733
734 if (sdkp->zone_blocks == zone_blocks &&
735 sdkp->nr_zones == nr_zones &&
736 disk->queue->nr_zones == nr_zones)
737 goto unlock;
738
739 sdkp->zone_blocks = zone_blocks;
740 sdkp->nr_zones = nr_zones;
741 sdkp->rev_wp_offset = kvcalloc(nr_zones, sizeof(u32), GFP_NOIO);
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We're passing GFP_NOIO here so it just defaults to kcalloc() and will
not vmalloc() the memory.
742 if (!sdkp->rev_wp_offset) {
743 ret = -ENOMEM;
744 goto unlock;
745 }
746
747 ret = blk_revalidate_disk_zones(disk, sd_zbc_revalidate_zones_cb);
748
regards,
dan carpenter
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
2021-02-16 19:39 [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands Dan Carpenter
@ 2021-02-16 23:33 ` Damien Le Moal
2021-02-17 6:42 ` Johannes Thumshirn
0 siblings, 1 reply; 14+ messages in thread
From: Damien Le Moal @ 2021-02-16 23:33 UTC (permalink / raw)
To: Dan Carpenter, Johannes Thumshirn; +Cc: linux-scsi, Michal Hocko
On 2021/02/17 4:42, Dan Carpenter wrote:
> Hello Johannes Thumshirn,
>
> The patch 5795eb443060: "scsi: sd_zbc: emulate ZONE_APPEND commands"
> from May 12, 2020, leads to the following static checker warning:
>
> drivers/scsi/sd_zbc.c:741 sd_zbc_revalidate_zones()
> error: kvmalloc() only makes sense with GFP_KERNEL
>
> drivers/scsi/sd_zbc.c
> 721 /*
> 722 * There is nothing to do for regular disks, including host-aware disks
> 723 * that have partitions.
> 724 */
> 725 if (!blk_queue_is_zoned(q))
> 726 return 0;
> 727
> 728 /*
> 729 * Make sure revalidate zones are serialized to ensure exclusive
> 730 * updates of the scsi disk data.
> 731 */
> 732 mutex_lock(&sdkp->rev_mutex);
> 733
> 734 if (sdkp->zone_blocks == zone_blocks &&
> 735 sdkp->nr_zones == nr_zones &&
> 736 disk->queue->nr_zones == nr_zones)
> 737 goto unlock;
> 738
> 739 sdkp->zone_blocks = zone_blocks;
> 740 sdkp->nr_zones = nr_zones;
> 741 sdkp->rev_wp_offset = kvcalloc(nr_zones, sizeof(u32), GFP_NOIO);
> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> We're passing GFP_NOIO here so it just defaults to kcalloc() and will
> not vmalloc() the memory.
Indeed... And the allocation can get a little too big for kmalloc().
Johannes, I think we need to move that allocation before the rev_mutex locking,
using a local var for the allocated address, and then using GFP_KERNEL should be
safe... But not entirely sure. Using kmalloc would be simpler but on large SMR
drives, that allocation will soon need to be 400K or so (i.e. 100,000 zones or
even more), too large for kmalloc to succeed reliably.
>
> 742 if (!sdkp->rev_wp_offset) {
> 743 ret = -ENOMEM;
> 744 goto unlock;
> 745 }
> 746
> 747 ret = blk_revalidate_disk_zones(disk, sd_zbc_revalidate_zones_cb);
> 748
>
> regards,
> dan carpenter
>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
2021-02-16 23:33 ` Damien Le Moal
@ 2021-02-17 6:42 ` Johannes Thumshirn
2021-02-17 8:00 ` Damien Le Moal
2021-02-17 8:03 ` Michal Hocko
0 siblings, 2 replies; 14+ messages in thread
From: Johannes Thumshirn @ 2021-02-17 6:42 UTC (permalink / raw)
To: Damien Le Moal, Dan Carpenter; +Cc: linux-scsi, Michal Hocko
On 17/02/2021 00:33, Damien Le Moal wrote:
> On 2021/02/17 4:42, Dan Carpenter wrote:
>> Hello Johannes Thumshirn,
>>
>> The patch 5795eb443060: "scsi: sd_zbc: emulate ZONE_APPEND commands"
>> from May 12, 2020, leads to the following static checker warning:
>>
>> drivers/scsi/sd_zbc.c:741 sd_zbc_revalidate_zones()
>> error: kvmalloc() only makes sense with GFP_KERNEL
>>
>> drivers/scsi/sd_zbc.c
>> 721 /*
>> 722 * There is nothing to do for regular disks, including host-aware disks
>> 723 * that have partitions.
>> 724 */
>> 725 if (!blk_queue_is_zoned(q))
>> 726 return 0;
>> 727
>> 728 /*
>> 729 * Make sure revalidate zones are serialized to ensure exclusive
>> 730 * updates of the scsi disk data.
>> 731 */
>> 732 mutex_lock(&sdkp->rev_mutex);
>> 733
>> 734 if (sdkp->zone_blocks == zone_blocks &&
>> 735 sdkp->nr_zones == nr_zones &&
>> 736 disk->queue->nr_zones == nr_zones)
>> 737 goto unlock;
>> 738
>> 739 sdkp->zone_blocks = zone_blocks;
>> 740 sdkp->nr_zones = nr_zones;
>> 741 sdkp->rev_wp_offset = kvcalloc(nr_zones, sizeof(u32), GFP_NOIO);
>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> We're passing GFP_NOIO here so it just defaults to kcalloc() and will
>> not vmalloc() the memory.
>
> Indeed... And the allocation can get a little too big for kmalloc().
>
> Johannes, I think we need to move that allocation before the rev_mutex locking,
> using a local var for the allocated address, and then using GFP_KERNEL should be
> safe... But not entirely sure. Using kmalloc would be simpler but on large SMR
> drives, that allocation will soon need to be 400K or so (i.e. 100,000 zones or
> even more), too large for kmalloc to succeed reliably.
>
No I don't think so. A mutex isn't a spinlock so we can sleep on the allocation.
We can't use GFP_KERNEL as we're about to do I/O. blk_revalidate_disk_zones() called
a few line below also does the memalloc_noio_{save,restore}() dance.
Would a kmem_cache for these revalidations help us in any way?
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
2021-02-17 6:42 ` Johannes Thumshirn
@ 2021-02-17 8:00 ` Damien Le Moal
2021-02-17 8:03 ` Michal Hocko
1 sibling, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2021-02-17 8:00 UTC (permalink / raw)
To: Johannes Thumshirn, Dan Carpenter; +Cc: linux-scsi, Michal Hocko
On 2021/02/17 15:42, Johannes Thumshirn wrote:
> On 17/02/2021 00:33, Damien Le Moal wrote:
>> On 2021/02/17 4:42, Dan Carpenter wrote:
>>> Hello Johannes Thumshirn,
>>>
>>> The patch 5795eb443060: "scsi: sd_zbc: emulate ZONE_APPEND commands"
>>> from May 12, 2020, leads to the following static checker warning:
>>>
>>> drivers/scsi/sd_zbc.c:741 sd_zbc_revalidate_zones()
>>> error: kvmalloc() only makes sense with GFP_KERNEL
>>>
>>> drivers/scsi/sd_zbc.c
>>> 721 /*
>>> 722 * There is nothing to do for regular disks, including host-aware disks
>>> 723 * that have partitions.
>>> 724 */
>>> 725 if (!blk_queue_is_zoned(q))
>>> 726 return 0;
>>> 727
>>> 728 /*
>>> 729 * Make sure revalidate zones are serialized to ensure exclusive
>>> 730 * updates of the scsi disk data.
>>> 731 */
>>> 732 mutex_lock(&sdkp->rev_mutex);
>>> 733
>>> 734 if (sdkp->zone_blocks == zone_blocks &&
>>> 735 sdkp->nr_zones == nr_zones &&
>>> 736 disk->queue->nr_zones == nr_zones)
>>> 737 goto unlock;
>>> 738
>>> 739 sdkp->zone_blocks = zone_blocks;
>>> 740 sdkp->nr_zones = nr_zones;
>>> 741 sdkp->rev_wp_offset = kvcalloc(nr_zones, sizeof(u32), GFP_NOIO);
>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> We're passing GFP_NOIO here so it just defaults to kcalloc() and will
>>> not vmalloc() the memory.
>>
>> Indeed... And the allocation can get a little too big for kmalloc().
>>
>> Johannes, I think we need to move that allocation before the rev_mutex locking,
>> using a local var for the allocated address, and then using GFP_KERNEL should be
>> safe... But not entirely sure. Using kmalloc would be simpler but on large SMR
>> drives, that allocation will soon need to be 400K or so (i.e. 100,000 zones or
>> even more), too large for kmalloc to succeed reliably.
>>
>
>
> No I don't think so. A mutex isn't a spinlock so we can sleep on the allocation.
> We can't use GFP_KERNEL as we're about to do I/O. blk_revalidate_disk_zones() called
> a few line below also does the memalloc_noio_{save,restore}() dance.
Yes, but blk_revalidate_disk_zones() only allocates the zone bitmaps and these
are much smaller. So kmalloc is fine and GFP_NOIO is natural. For the wp array,
I think we really need kvmalloc() due to the potential very large size (and
growing with new drive models) and GFP_NOIO does not work for that. Not sure if
memalloc_noio_{save,restore}() can change that in vmalloc context (I do not
think so).
> Would a kmem_cache for these revalidations help us in any way?
I am not familiar with that... Would need to dig into it.
For this to be safe, we only need to guarantee forward progress, and in this
case this means not causing problems if a GFP_KERNEL allocation causes us to
reenter the scsi driver for I/Os. Since (I think) revalidation never happens in
FS or I/O context, GFP_KERNEL allocation should be safe if done outside of the
rev_mutex lock. Not 100% sure here, just a hunch... We may need to check the
block layer level to check if there are any locks being held when revalidation
triggers.
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
2021-02-17 6:42 ` Johannes Thumshirn
2021-02-17 8:00 ` Damien Le Moal
@ 2021-02-17 8:03 ` Michal Hocko
2021-02-17 9:08 ` Johannes Thumshirn
2021-02-17 9:13 ` Damien Le Moal
1 sibling, 2 replies; 14+ messages in thread
From: Michal Hocko @ 2021-02-17 8:03 UTC (permalink / raw)
To: Johannes Thumshirn; +Cc: Damien Le Moal, Dan Carpenter, linux-scsi
On Wed 17-02-21 06:42:37, Johannes Thumshirn wrote:
> On 17/02/2021 00:33, Damien Le Moal wrote:
> > On 2021/02/17 4:42, Dan Carpenter wrote:
> >> Hello Johannes Thumshirn,
> >>
> >> The patch 5795eb443060: "scsi: sd_zbc: emulate ZONE_APPEND commands"
> >> from May 12, 2020, leads to the following static checker warning:
> >>
> >> drivers/scsi/sd_zbc.c:741 sd_zbc_revalidate_zones()
> >> error: kvmalloc() only makes sense with GFP_KERNEL
> >>
> >> drivers/scsi/sd_zbc.c
> >> 721 /*
> >> 722 * There is nothing to do for regular disks, including host-aware disks
> >> 723 * that have partitions.
> >> 724 */
> >> 725 if (!blk_queue_is_zoned(q))
> >> 726 return 0;
> >> 727
> >> 728 /*
> >> 729 * Make sure revalidate zones are serialized to ensure exclusive
> >> 730 * updates of the scsi disk data.
> >> 731 */
> >> 732 mutex_lock(&sdkp->rev_mutex);
> >> 733
> >> 734 if (sdkp->zone_blocks == zone_blocks &&
> >> 735 sdkp->nr_zones == nr_zones &&
> >> 736 disk->queue->nr_zones == nr_zones)
> >> 737 goto unlock;
> >> 738
> >> 739 sdkp->zone_blocks = zone_blocks;
> >> 740 sdkp->nr_zones = nr_zones;
> >> 741 sdkp->rev_wp_offset = kvcalloc(nr_zones, sizeof(u32), GFP_NOIO);
> >> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >> We're passing GFP_NOIO here so it just defaults to kcalloc() and will
> >> not vmalloc() the memory.
> >
> > Indeed... And the allocation can get a little too big for kmalloc().
> >
> > Johannes, I think we need to move that allocation before the rev_mutex locking,
> > using a local var for the allocated address, and then using GFP_KERNEL should be
> > safe... But not entirely sure. Using kmalloc would be simpler but on large SMR
> > drives, that allocation will soon need to be 400K or so (i.e. 100,000 zones or
> > even more), too large for kmalloc to succeed reliably.
> >
>
>
> No I don't think so. A mutex isn't a spinlock so we can sleep on the allocation.
> We can't use GFP_KERNEL as we're about to do I/O. blk_revalidate_disk_zones() called
> a few line below also does the memalloc_noio_{save,restore}() dance.
You should be extending noio scope then if this allocation falls into
the same category. Ideally the scope should start at the recursion place
and end where the scope really ened.
>
> Would a kmem_cache for these revalidations help us in any way?
I am not sure what you mean here.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
2021-02-17 8:03 ` Michal Hocko
@ 2021-02-17 9:08 ` Johannes Thumshirn
2021-02-17 9:36 ` Michal Hocko
2021-02-17 9:13 ` Damien Le Moal
1 sibling, 1 reply; 14+ messages in thread
From: Johannes Thumshirn @ 2021-02-17 9:08 UTC (permalink / raw)
To: Michal Hocko; +Cc: Damien Le Moal, Dan Carpenter, linux-scsi
On 17/02/2021 09:03, Michal Hocko wrote:
>> No I don't think so. A mutex isn't a spinlock so we can sleep on the allocation.
>> We can't use GFP_KERNEL as we're about to do I/O. blk_revalidate_disk_zones() called
>> a few line below also does the memalloc_noio_{save,restore}() dance.
>
> You should be extending noio scope then if this allocation falls into
> the same category. Ideally the scope should start at the recursion place
> and end where the scope really ened.
That means all callers of blk_revalidate_disk_zones() should do
memalloc_noio_{save,restore}? If yes, can we somehow runtime assert
that this is done, so we don't end up with bad surprises?
>>
>> Would a kmem_cache for these revalidations help us in any way?
>
> I am not sure what you mean here.
>
Using a kmem_cache for the allocations passed into blk_revalidate_disk_zones().
I've looked into kmem_cache_alloc() and I couldn't find anything that speaks
against it, but I'm not too familiar with the code.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
2021-02-17 8:03 ` Michal Hocko
2021-02-17 9:08 ` Johannes Thumshirn
@ 2021-02-17 9:13 ` Damien Le Moal
2021-02-17 9:18 ` Johannes Thumshirn
2021-02-17 9:32 ` Michal Hocko
1 sibling, 2 replies; 14+ messages in thread
From: Damien Le Moal @ 2021-02-17 9:13 UTC (permalink / raw)
To: Michal Hocko, Johannes Thumshirn; +Cc: Dan Carpenter, linux-scsi
On 2021/02/17 17:03, Michal Hocko wrote:
> On Wed 17-02-21 06:42:37, Johannes Thumshirn wrote:
>> On 17/02/2021 00:33, Damien Le Moal wrote:
>>> On 2021/02/17 4:42, Dan Carpenter wrote:
>>>> Hello Johannes Thumshirn,
>>>>
>>>> The patch 5795eb443060: "scsi: sd_zbc: emulate ZONE_APPEND commands"
>>>> from May 12, 2020, leads to the following static checker warning:
>>>>
>>>> drivers/scsi/sd_zbc.c:741 sd_zbc_revalidate_zones()
>>>> error: kvmalloc() only makes sense with GFP_KERNEL
>>>>
>>>> drivers/scsi/sd_zbc.c
>>>> 721 /*
>>>> 722 * There is nothing to do for regular disks, including host-aware disks
>>>> 723 * that have partitions.
>>>> 724 */
>>>> 725 if (!blk_queue_is_zoned(q))
>>>> 726 return 0;
>>>> 727
>>>> 728 /*
>>>> 729 * Make sure revalidate zones are serialized to ensure exclusive
>>>> 730 * updates of the scsi disk data.
>>>> 731 */
>>>> 732 mutex_lock(&sdkp->rev_mutex);
>>>> 733
>>>> 734 if (sdkp->zone_blocks == zone_blocks &&
>>>> 735 sdkp->nr_zones == nr_zones &&
>>>> 736 disk->queue->nr_zones == nr_zones)
>>>> 737 goto unlock;
>>>> 738
>>>> 739 sdkp->zone_blocks = zone_blocks;
>>>> 740 sdkp->nr_zones = nr_zones;
>>>> 741 sdkp->rev_wp_offset = kvcalloc(nr_zones, sizeof(u32), GFP_NOIO);
>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>> We're passing GFP_NOIO here so it just defaults to kcalloc() and will
>>>> not vmalloc() the memory.
>>>
>>> Indeed... And the allocation can get a little too big for kmalloc().
>>>
>>> Johannes, I think we need to move that allocation before the rev_mutex locking,
>>> using a local var for the allocated address, and then using GFP_KERNEL should be
>>> safe... But not entirely sure. Using kmalloc would be simpler but on large SMR
>>> drives, that allocation will soon need to be 400K or so (i.e. 100,000 zones or
>>> even more), too large for kmalloc to succeed reliably.
>>>
>>
>>
>> No I don't think so. A mutex isn't a spinlock so we can sleep on the allocation.
>> We can't use GFP_KERNEL as we're about to do I/O. blk_revalidate_disk_zones() called
>> a few line below also does the memalloc_noio_{save,restore}() dance.
>
> You should be extending noio scope then if this allocation falls into
> the same category. Ideally the scope should start at the recursion place
> and end where the scope really ened.
But it does not look like __vmalloc_node() (fallback in kvmalloc_node() if
kmalloc() fails) cares about the context allocation flags... I can't see
if/where the context allocation flags are taken into account. It looks like only
the gfp_mask argument is used. Am I missing something ?
>>
>> Would a kmem_cache for these revalidations help us in any way?
>
> I am not sure what you mean here.
>
--
Damien Le Moal
Western Digital Research
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
2021-02-17 9:13 ` Damien Le Moal
@ 2021-02-17 9:18 ` Johannes Thumshirn
2021-02-17 11:16 ` Dan Carpenter
2021-02-17 9:32 ` Michal Hocko
1 sibling, 1 reply; 14+ messages in thread
From: Johannes Thumshirn @ 2021-02-17 9:18 UTC (permalink / raw)
To: Damien Le Moal, Michal Hocko; +Cc: Dan Carpenter, linux-scsi
On 17/02/2021 10:13, Damien Le Moal wrote:
> On 2021/02/17 17:03, Michal Hocko wrote:
>> On Wed 17-02-21 06:42:37, Johannes Thumshirn wrote:
>>> On 17/02/2021 00:33, Damien Le Moal wrote:
>>>> On 2021/02/17 4:42, Dan Carpenter wrote:
>>>>> Hello Johannes Thumshirn,
>>>>>
>>>>> The patch 5795eb443060: "scsi: sd_zbc: emulate ZONE_APPEND commands"
>>>>> from May 12, 2020, leads to the following static checker warning:
>>>>>
>>>>> drivers/scsi/sd_zbc.c:741 sd_zbc_revalidate_zones()
>>>>> error: kvmalloc() only makes sense with GFP_KERNEL
>>>>>
>>>>> drivers/scsi/sd_zbc.c
>>>>> 721 /*
>>>>> 722 * There is nothing to do for regular disks, including host-aware disks
>>>>> 723 * that have partitions.
>>>>> 724 */
>>>>> 725 if (!blk_queue_is_zoned(q))
>>>>> 726 return 0;
>>>>> 727
>>>>> 728 /*
>>>>> 729 * Make sure revalidate zones are serialized to ensure exclusive
>>>>> 730 * updates of the scsi disk data.
>>>>> 731 */
>>>>> 732 mutex_lock(&sdkp->rev_mutex);
>>>>> 733
>>>>> 734 if (sdkp->zone_blocks == zone_blocks &&
>>>>> 735 sdkp->nr_zones == nr_zones &&
>>>>> 736 disk->queue->nr_zones == nr_zones)
>>>>> 737 goto unlock;
>>>>> 738
>>>>> 739 sdkp->zone_blocks = zone_blocks;
>>>>> 740 sdkp->nr_zones = nr_zones;
>>>>> 741 sdkp->rev_wp_offset = kvcalloc(nr_zones, sizeof(u32), GFP_NOIO);
>>>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>> We're passing GFP_NOIO here so it just defaults to kcalloc() and will
>>>>> not vmalloc() the memory.
>>>>
>>>> Indeed... And the allocation can get a little too big for kmalloc().
>>>>
>>>> Johannes, I think we need to move that allocation before the rev_mutex locking,
>>>> using a local var for the allocated address, and then using GFP_KERNEL should be
>>>> safe... But not entirely sure. Using kmalloc would be simpler but on large SMR
>>>> drives, that allocation will soon need to be 400K or so (i.e. 100,000 zones or
>>>> even more), too large for kmalloc to succeed reliably.
>>>>
>>>
>>>
>>> No I don't think so. A mutex isn't a spinlock so we can sleep on the allocation.
>>> We can't use GFP_KERNEL as we're about to do I/O. blk_revalidate_disk_zones() called
>>> a few line below also does the memalloc_noio_{save,restore}() dance.
>>
>> You should be extending noio scope then if this allocation falls into
>> the same category. Ideally the scope should start at the recursion place
>> and end where the scope really ened.
>
> But it does not look like __vmalloc_node() (fallback in kvmalloc_node() if
> kmalloc() fails) cares about the context allocation flags... I can't see
> if/where the context allocation flags are taken into account. It looks like only
> the gfp_mask argument is used. Am I missing something ?
It does here:
void *kvmalloc_node(size_t size, gfp_t flags, int node)
{
gfp_t kmalloc_flags = flags;
void *ret;
/*
* vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
* so the given set of flags has to be compatible.
*/
if ((flags & GFP_KERNEL) != GFP_KERNEL)
return kmalloc_node(size, flags, node);
/* [...] */
return __vmalloc_node(size, 1, flags, node,
__builtin_return_address(0));
}
EXPORT_SYMBOL(kvmalloc_node);
We're not reaching __vmalloc_node() if GFP_KERNEL isn't set. Also, from the, function's
documentation:
* @flags: gfp mask for the allocation - must be compatible (superset) with GFP_KERNEL.
So yeah, we should've read that before calling kvcalloc(..., GFP_NOIO).
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
2021-02-17 9:13 ` Damien Le Moal
2021-02-17 9:18 ` Johannes Thumshirn
@ 2021-02-17 9:32 ` Michal Hocko
1 sibling, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2021-02-17 9:32 UTC (permalink / raw)
To: Damien Le Moal; +Cc: Johannes Thumshirn, Dan Carpenter, linux-scsi
On Wed 17-02-21 09:13:57, Damien Le Moal wrote:
> On 2021/02/17 17:03, Michal Hocko wrote:
> > On Wed 17-02-21 06:42:37, Johannes Thumshirn wrote:
> >> On 17/02/2021 00:33, Damien Le Moal wrote:
> >>> On 2021/02/17 4:42, Dan Carpenter wrote:
> >>>> Hello Johannes Thumshirn,
> >>>>
> >>>> The patch 5795eb443060: "scsi: sd_zbc: emulate ZONE_APPEND commands"
> >>>> from May 12, 2020, leads to the following static checker warning:
> >>>>
> >>>> drivers/scsi/sd_zbc.c:741 sd_zbc_revalidate_zones()
> >>>> error: kvmalloc() only makes sense with GFP_KERNEL
> >>>>
> >>>> drivers/scsi/sd_zbc.c
> >>>> 721 /*
> >>>> 722 * There is nothing to do for regular disks, including host-aware disks
> >>>> 723 * that have partitions.
> >>>> 724 */
> >>>> 725 if (!blk_queue_is_zoned(q))
> >>>> 726 return 0;
> >>>> 727
> >>>> 728 /*
> >>>> 729 * Make sure revalidate zones are serialized to ensure exclusive
> >>>> 730 * updates of the scsi disk data.
> >>>> 731 */
> >>>> 732 mutex_lock(&sdkp->rev_mutex);
> >>>> 733
> >>>> 734 if (sdkp->zone_blocks == zone_blocks &&
> >>>> 735 sdkp->nr_zones == nr_zones &&
> >>>> 736 disk->queue->nr_zones == nr_zones)
> >>>> 737 goto unlock;
> >>>> 738
> >>>> 739 sdkp->zone_blocks = zone_blocks;
> >>>> 740 sdkp->nr_zones = nr_zones;
> >>>> 741 sdkp->rev_wp_offset = kvcalloc(nr_zones, sizeof(u32), GFP_NOIO);
> >>>> ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >>>> We're passing GFP_NOIO here so it just defaults to kcalloc() and will
> >>>> not vmalloc() the memory.
> >>>
> >>> Indeed... And the allocation can get a little too big for kmalloc().
> >>>
> >>> Johannes, I think we need to move that allocation before the rev_mutex locking,
> >>> using a local var for the allocated address, and then using GFP_KERNEL should be
> >>> safe... But not entirely sure. Using kmalloc would be simpler but on large SMR
> >>> drives, that allocation will soon need to be 400K or so (i.e. 100,000 zones or
> >>> even more), too large for kmalloc to succeed reliably.
> >>>
> >>
> >>
> >> No I don't think so. A mutex isn't a spinlock so we can sleep on the allocation.
> >> We can't use GFP_KERNEL as we're about to do I/O. blk_revalidate_disk_zones() called
> >> a few line below also does the memalloc_noio_{save,restore}() dance.
> >
> > You should be extending noio scope then if this allocation falls into
> > the same category. Ideally the scope should start at the recursion place
> > and end where the scope really ened.
>
> But it does not look like __vmalloc_node() (fallback in kvmalloc_node() if
> kmalloc() fails) cares about the context allocation flags... I can't see
> if/where the context allocation flags are taken into account. It looks like only
> the gfp_mask argument is used. Am I missing something ?
current_gfp_context in the page allocator. vmalloc doesn't do any
reclaim on its own. It relies on the page allocator for that.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
2021-02-17 9:08 ` Johannes Thumshirn
@ 2021-02-17 9:36 ` Michal Hocko
2021-02-17 9:41 ` Johannes Thumshirn
0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2021-02-17 9:36 UTC (permalink / raw)
To: Johannes Thumshirn; +Cc: Damien Le Moal, Dan Carpenter, linux-scsi
On Wed 17-02-21 09:08:07, Johannes Thumshirn wrote:
> On 17/02/2021 09:03, Michal Hocko wrote:
> >> No I don't think so. A mutex isn't a spinlock so we can sleep on the allocation.
> >> We can't use GFP_KERNEL as we're about to do I/O. blk_revalidate_disk_zones() called
> >> a few line below also does the memalloc_noio_{save,restore}() dance.
> >
> > You should be extending noio scope then if this allocation falls into
> > the same category. Ideally the scope should start at the recursion place
> > and end where the scope really ened.
>
> That means all callers of blk_revalidate_disk_zones() should do
> memalloc_noio_{save,restore}?
I am not really familiar with the IO area to answer this. The base idea
is to start the NOIO scope at the boundary which defines "unsafe to
re-enter or cannot deal with a new IO" from the reclaim path.
> If yes, can we somehow runtime assert that this is done, so we don't
> end up with bad surprises?
Could you elaborate?
> >> Would a kmem_cache for these revalidations help us in any way?
> >
> > I am not sure what you mean here.
> >
>
> Using a kmem_cache for the allocations passed into blk_revalidate_disk_zones().
> I've looked into kmem_cache_alloc() and I couldn't find anything that speaks
> against it, but I'm not too familiar with the code.
kmem_cache_alloc is only an extension to allow to allocate from a
specific cache. I do not really see how it is going to help with larger
allocation and my current understanding is that kvmalloc is used because
the requested allocation size can be large.
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
2021-02-17 9:36 ` Michal Hocko
@ 2021-02-17 9:41 ` Johannes Thumshirn
2021-02-17 10:05 ` Michal Hocko
0 siblings, 1 reply; 14+ messages in thread
From: Johannes Thumshirn @ 2021-02-17 9:41 UTC (permalink / raw)
To: Michal Hocko; +Cc: Damien Le Moal, Dan Carpenter, linux-scsi
On 17/02/2021 10:36, Michal Hocko wrote:
> On Wed 17-02-21 09:08:07, Johannes Thumshirn wrote:
>> On 17/02/2021 09:03, Michal Hocko wrote:
>>>> No I don't think so. A mutex isn't a spinlock so we can sleep on the allocation.
>>>> We can't use GFP_KERNEL as we're about to do I/O. blk_revalidate_disk_zones() called
>>>> a few line below also does the memalloc_noio_{save,restore}() dance.
>>>
>>> You should be extending noio scope then if this allocation falls into
>>> the same category. Ideally the scope should start at the recursion place
>>> and end where the scope really ened.
>>
>> That means all callers of blk_revalidate_disk_zones() should do
>> memalloc_noio_{save,restore}?
>
> I am not really familiar with the IO area to answer this. The base idea
> is to start the NOIO scope at the boundary which defines "unsafe to
> re-enter or cannot deal with a new IO" from the reclaim path.
>
>> If yes, can we somehow runtime assert that this is done, so we don't
>> end up with bad surprises?
>
> Could you elaborate?
I though of lifting the noio scope into the callers of
blk_revalidate_disk_zones() and then "check" in blk_revalidate_disk_zones()
this has been done. But it looks like memalloc_noio_save() can handle nesting,
so this is actually unneeded.
>
>>>> Would a kmem_cache for these revalidations help us in any way?
>>>
>>> I am not sure what you mean here.
>>>
>>
>> Using a kmem_cache for the allocations passed into blk_revalidate_disk_zones().
>> I've looked into kmem_cache_alloc() and I couldn't find anything that speaks
>> against it, but I'm not too familiar with the code.
>
> kmem_cache_alloc is only an extension to allow to allocate from a
> specific cache. I do not really see how it is going to help with larger
> allocation and my current understanding is that kvmalloc is used because
> the requested allocation size can be large.
>
Ah ok so we can't set aside a big enough pool to do allocations from there,
this was a misunderstanding from my side.
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
2021-02-17 9:41 ` Johannes Thumshirn
@ 2021-02-17 10:05 ` Michal Hocko
0 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2021-02-17 10:05 UTC (permalink / raw)
To: Johannes Thumshirn; +Cc: Damien Le Moal, Dan Carpenter, linux-scsi
On Wed 17-02-21 09:41:58, Johannes Thumshirn wrote:
> On 17/02/2021 10:36, Michal Hocko wrote:
> > On Wed 17-02-21 09:08:07, Johannes Thumshirn wrote:
> >> On 17/02/2021 09:03, Michal Hocko wrote:
> >>>> No I don't think so. A mutex isn't a spinlock so we can sleep on the allocation.
> >>>> We can't use GFP_KERNEL as we're about to do I/O. blk_revalidate_disk_zones() called
> >>>> a few line below also does the memalloc_noio_{save,restore}() dance.
> >>>
> >>> You should be extending noio scope then if this allocation falls into
> >>> the same category. Ideally the scope should start at the recursion place
> >>> and end where the scope really ened.
> >>
> >> That means all callers of blk_revalidate_disk_zones() should do
> >> memalloc_noio_{save,restore}?
> >
> > I am not really familiar with the IO area to answer this. The base idea
> > is to start the NOIO scope at the boundary which defines "unsafe to
> > re-enter or cannot deal with a new IO" from the reclaim path.
> >
> >> If yes, can we somehow runtime assert that this is done, so we don't
> >> end up with bad surprises?
> >
> > Could you elaborate?
>
> I though of lifting the noio scope into the callers of
> blk_revalidate_disk_zones() and then "check" in blk_revalidate_disk_zones()
> this has been done. But it looks like memalloc_noio_save() can handle nesting,
> so this is actually unneeded.
Yes, it can handle nesting. As to where to start the scope, this should
be at the boundary of "do not reenter IO path from the reclaim". This
can be a lock which is held for the allocation which will be needed for
an IO induced by the reclaim or any other resouce that will block from
the IO path. Is this more clear now?
--
Michal Hocko
SUSE Labs
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
2021-02-17 9:18 ` Johannes Thumshirn
@ 2021-02-17 11:16 ` Dan Carpenter
2021-02-17 11:18 ` Johannes Thumshirn
0 siblings, 1 reply; 14+ messages in thread
From: Dan Carpenter @ 2021-02-17 11:16 UTC (permalink / raw)
To: Johannes Thumshirn; +Cc: Damien Le Moal, Michal Hocko, linux-scsi
On Wed, Feb 17, 2021 at 09:18:37AM +0000, Johannes Thumshirn wrote:
> * @flags: gfp mask for the allocation - must be compatible (superset) with GFP_KERNEL.
>
> So yeah, we should've read that before calling kvcalloc(..., GFP_NOIO).
This is a Smatch check now and checks are better than comments. :)
regards,
dan carpenter
^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
2021-02-17 11:16 ` Dan Carpenter
@ 2021-02-17 11:18 ` Johannes Thumshirn
0 siblings, 0 replies; 14+ messages in thread
From: Johannes Thumshirn @ 2021-02-17 11:18 UTC (permalink / raw)
To: Dan Carpenter; +Cc: Damien Le Moal, Michal Hocko, linux-scsi
On 17/02/2021 12:17, Dan Carpenter wrote:
> On Wed, Feb 17, 2021 at 09:18:37AM +0000, Johannes Thumshirn wrote:
>> * @flags: gfp mask for the allocation - must be compatible (superset) with GFP_KERNEL.
>>
>> So yeah, we should've read that before calling kvcalloc(..., GFP_NOIO).
>
> This is a Smatch check now and checks are better than comments. :)
Definitively
^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2021-02-17 11:19 UTC | newest]
Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-16 19:39 [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands Dan Carpenter
2021-02-16 23:33 ` Damien Le Moal
2021-02-17 6:42 ` Johannes Thumshirn
2021-02-17 8:00 ` Damien Le Moal
2021-02-17 8:03 ` Michal Hocko
2021-02-17 9:08 ` Johannes Thumshirn
2021-02-17 9:36 ` Michal Hocko
2021-02-17 9:41 ` Johannes Thumshirn
2021-02-17 10:05 ` Michal Hocko
2021-02-17 9:13 ` Damien Le Moal
2021-02-17 9:18 ` Johannes Thumshirn
2021-02-17 11:16 ` Dan Carpenter
2021-02-17 11:18 ` Johannes Thumshirn
2021-02-17 9:32 ` Michal Hocko
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.