All of lore.kernel.org
 help / color / mirror / Atom feed
* [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
@ 2021-02-16 19:39 Dan Carpenter
  2021-02-16 23:33 ` Damien Le Moal
  0 siblings, 1 reply; 14+ messages in thread
From: Dan Carpenter @ 2021-02-16 19:39 UTC (permalink / raw)
  To: johannes.thumshirn; +Cc: linux-scsi, Michal Hocko

Hello Johannes Thumshirn,

The patch 5795eb443060: "scsi: sd_zbc: emulate ZONE_APPEND commands"
from May 12, 2020, leads to the following static checker warning:

	drivers/scsi/sd_zbc.c:741 sd_zbc_revalidate_zones()
	error: kvmalloc() only makes sense with GFP_KERNEL

drivers/scsi/sd_zbc.c
   721          /*
   722           * There is nothing to do for regular disks, including host-aware disks
   723           * that have partitions.
   724           */
   725          if (!blk_queue_is_zoned(q))
   726                  return 0;
   727  
   728          /*
   729           * Make sure revalidate zones are serialized to ensure exclusive
   730           * updates of the scsi disk data.
   731           */
   732          mutex_lock(&sdkp->rev_mutex);
   733  
   734          if (sdkp->zone_blocks == zone_blocks &&
   735              sdkp->nr_zones == nr_zones &&
   736              disk->queue->nr_zones == nr_zones)
   737                  goto unlock;
   738  
   739          sdkp->zone_blocks = zone_blocks;
   740          sdkp->nr_zones = nr_zones;
   741          sdkp->rev_wp_offset = kvcalloc(nr_zones, sizeof(u32), GFP_NOIO);
                                      ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
We're passing GFP_NOIO here so it just defaults to kcalloc() and will
not vmalloc() the memory.

   742          if (!sdkp->rev_wp_offset) {
   743                  ret = -ENOMEM;
   744                  goto unlock;
   745          }
   746  
   747          ret = blk_revalidate_disk_zones(disk, sd_zbc_revalidate_zones_cb);
   748  

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
  2021-02-16 19:39 [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands Dan Carpenter
@ 2021-02-16 23:33 ` Damien Le Moal
  2021-02-17  6:42   ` Johannes Thumshirn
  0 siblings, 1 reply; 14+ messages in thread
From: Damien Le Moal @ 2021-02-16 23:33 UTC (permalink / raw)
  To: Dan Carpenter, Johannes Thumshirn; +Cc: linux-scsi, Michal Hocko

On 2021/02/17 4:42, Dan Carpenter wrote:
> Hello Johannes Thumshirn,
> 
> The patch 5795eb443060: "scsi: sd_zbc: emulate ZONE_APPEND commands"
> from May 12, 2020, leads to the following static checker warning:
> 
> 	drivers/scsi/sd_zbc.c:741 sd_zbc_revalidate_zones()
> 	error: kvmalloc() only makes sense with GFP_KERNEL
> 
> drivers/scsi/sd_zbc.c
>    721          /*
>    722           * There is nothing to do for regular disks, including host-aware disks
>    723           * that have partitions.
>    724           */
>    725          if (!blk_queue_is_zoned(q))
>    726                  return 0;
>    727  
>    728          /*
>    729           * Make sure revalidate zones are serialized to ensure exclusive
>    730           * updates of the scsi disk data.
>    731           */
>    732          mutex_lock(&sdkp->rev_mutex);
>    733  
>    734          if (sdkp->zone_blocks == zone_blocks &&
>    735              sdkp->nr_zones == nr_zones &&
>    736              disk->queue->nr_zones == nr_zones)
>    737                  goto unlock;
>    738  
>    739          sdkp->zone_blocks = zone_blocks;
>    740          sdkp->nr_zones = nr_zones;
>    741          sdkp->rev_wp_offset = kvcalloc(nr_zones, sizeof(u32), GFP_NOIO);
>                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> We're passing GFP_NOIO here so it just defaults to kcalloc() and will
> not vmalloc() the memory.

Indeed... And the allocation can get a little too big for kmalloc().

Johannes, I think we need to move that allocation before the rev_mutex locking,
using a local var for the allocated address, and then using GFP_KERNEL should be
safe... But not entirely sure. Using kmalloc would be simpler but on large SMR
drives, that allocation will soon need to be 400K or so (i.e. 100,000 zones or
even more), too large for kmalloc to succeed reliably.

> 
>    742          if (!sdkp->rev_wp_offset) {
>    743                  ret = -ENOMEM;
>    744                  goto unlock;
>    745          }
>    746  
>    747          ret = blk_revalidate_disk_zones(disk, sd_zbc_revalidate_zones_cb);
>    748  
> 
> regards,
> dan carpenter
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
  2021-02-16 23:33 ` Damien Le Moal
@ 2021-02-17  6:42   ` Johannes Thumshirn
  2021-02-17  8:00     ` Damien Le Moal
  2021-02-17  8:03     ` Michal Hocko
  0 siblings, 2 replies; 14+ messages in thread
From: Johannes Thumshirn @ 2021-02-17  6:42 UTC (permalink / raw)
  To: Damien Le Moal, Dan Carpenter; +Cc: linux-scsi, Michal Hocko

On 17/02/2021 00:33, Damien Le Moal wrote:
> On 2021/02/17 4:42, Dan Carpenter wrote:
>> Hello Johannes Thumshirn,
>>
>> The patch 5795eb443060: "scsi: sd_zbc: emulate ZONE_APPEND commands"
>> from May 12, 2020, leads to the following static checker warning:
>>
>> 	drivers/scsi/sd_zbc.c:741 sd_zbc_revalidate_zones()
>> 	error: kvmalloc() only makes sense with GFP_KERNEL
>>
>> drivers/scsi/sd_zbc.c
>>    721          /*
>>    722           * There is nothing to do for regular disks, including host-aware disks
>>    723           * that have partitions.
>>    724           */
>>    725          if (!blk_queue_is_zoned(q))
>>    726                  return 0;
>>    727  
>>    728          /*
>>    729           * Make sure revalidate zones are serialized to ensure exclusive
>>    730           * updates of the scsi disk data.
>>    731           */
>>    732          mutex_lock(&sdkp->rev_mutex);
>>    733  
>>    734          if (sdkp->zone_blocks == zone_blocks &&
>>    735              sdkp->nr_zones == nr_zones &&
>>    736              disk->queue->nr_zones == nr_zones)
>>    737                  goto unlock;
>>    738  
>>    739          sdkp->zone_blocks = zone_blocks;
>>    740          sdkp->nr_zones = nr_zones;
>>    741          sdkp->rev_wp_offset = kvcalloc(nr_zones, sizeof(u32), GFP_NOIO);
>>                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>> We're passing GFP_NOIO here so it just defaults to kcalloc() and will
>> not vmalloc() the memory.
> 
> Indeed... And the allocation can get a little too big for kmalloc().
> 
> Johannes, I think we need to move that allocation before the rev_mutex locking,
> using a local var for the allocated address, and then using GFP_KERNEL should be
> safe... But not entirely sure. Using kmalloc would be simpler but on large SMR
> drives, that allocation will soon need to be 400K or so (i.e. 100,000 zones or
> even more), too large for kmalloc to succeed reliably.
> 


No I don't think so. A mutex isn't a spinlock so we can sleep on the allocation.
We can't use GFP_KERNEL as we're about to do I/O. blk_revalidate_disk_zones() called
a few line below also does the memalloc_noio_{save,restore}() dance.

Would a kmem_cache for these revalidations help us in any way?

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
  2021-02-17  6:42   ` Johannes Thumshirn
@ 2021-02-17  8:00     ` Damien Le Moal
  2021-02-17  8:03     ` Michal Hocko
  1 sibling, 0 replies; 14+ messages in thread
From: Damien Le Moal @ 2021-02-17  8:00 UTC (permalink / raw)
  To: Johannes Thumshirn, Dan Carpenter; +Cc: linux-scsi, Michal Hocko

On 2021/02/17 15:42, Johannes Thumshirn wrote:
> On 17/02/2021 00:33, Damien Le Moal wrote:
>> On 2021/02/17 4:42, Dan Carpenter wrote:
>>> Hello Johannes Thumshirn,
>>>
>>> The patch 5795eb443060: "scsi: sd_zbc: emulate ZONE_APPEND commands"
>>> from May 12, 2020, leads to the following static checker warning:
>>>
>>> 	drivers/scsi/sd_zbc.c:741 sd_zbc_revalidate_zones()
>>> 	error: kvmalloc() only makes sense with GFP_KERNEL
>>>
>>> drivers/scsi/sd_zbc.c
>>>    721          /*
>>>    722           * There is nothing to do for regular disks, including host-aware disks
>>>    723           * that have partitions.
>>>    724           */
>>>    725          if (!blk_queue_is_zoned(q))
>>>    726                  return 0;
>>>    727  
>>>    728          /*
>>>    729           * Make sure revalidate zones are serialized to ensure exclusive
>>>    730           * updates of the scsi disk data.
>>>    731           */
>>>    732          mutex_lock(&sdkp->rev_mutex);
>>>    733  
>>>    734          if (sdkp->zone_blocks == zone_blocks &&
>>>    735              sdkp->nr_zones == nr_zones &&
>>>    736              disk->queue->nr_zones == nr_zones)
>>>    737                  goto unlock;
>>>    738  
>>>    739          sdkp->zone_blocks = zone_blocks;
>>>    740          sdkp->nr_zones = nr_zones;
>>>    741          sdkp->rev_wp_offset = kvcalloc(nr_zones, sizeof(u32), GFP_NOIO);
>>>                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>> We're passing GFP_NOIO here so it just defaults to kcalloc() and will
>>> not vmalloc() the memory.
>>
>> Indeed... And the allocation can get a little too big for kmalloc().
>>
>> Johannes, I think we need to move that allocation before the rev_mutex locking,
>> using a local var for the allocated address, and then using GFP_KERNEL should be
>> safe... But not entirely sure. Using kmalloc would be simpler but on large SMR
>> drives, that allocation will soon need to be 400K or so (i.e. 100,000 zones or
>> even more), too large for kmalloc to succeed reliably.
>>
> 
> 
> No I don't think so. A mutex isn't a spinlock so we can sleep on the allocation.
> We can't use GFP_KERNEL as we're about to do I/O. blk_revalidate_disk_zones() called
> a few line below also does the memalloc_noio_{save,restore}() dance.

Yes, but blk_revalidate_disk_zones() only allocates the zone bitmaps and these
are much smaller. So kmalloc is fine and GFP_NOIO is natural. For the wp array,
I think we really need kvmalloc() due to the potential very large size (and
growing with new drive models) and GFP_NOIO does not work for that. Not sure if
memalloc_noio_{save,restore}() can change that in vmalloc context (I do not
think so).

> Would a kmem_cache for these revalidations help us in any way?

I am not familiar with that... Would need to dig into it.

For this to be safe, we only need to guarantee forward progress, and in this
case this means not causing problems if a GFP_KERNEL allocation causes us to
reenter the scsi driver for I/Os. Since (I think) revalidation never happens in
FS or I/O context, GFP_KERNEL allocation should be safe if done outside of the
rev_mutex lock. Not 100% sure here, just a hunch... We may need to check the
block layer level to check if there are any locks being held when revalidation
triggers.



-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
  2021-02-17  6:42   ` Johannes Thumshirn
  2021-02-17  8:00     ` Damien Le Moal
@ 2021-02-17  8:03     ` Michal Hocko
  2021-02-17  9:08       ` Johannes Thumshirn
  2021-02-17  9:13       ` Damien Le Moal
  1 sibling, 2 replies; 14+ messages in thread
From: Michal Hocko @ 2021-02-17  8:03 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: Damien Le Moal, Dan Carpenter, linux-scsi

On Wed 17-02-21 06:42:37, Johannes Thumshirn wrote:
> On 17/02/2021 00:33, Damien Le Moal wrote:
> > On 2021/02/17 4:42, Dan Carpenter wrote:
> >> Hello Johannes Thumshirn,
> >>
> >> The patch 5795eb443060: "scsi: sd_zbc: emulate ZONE_APPEND commands"
> >> from May 12, 2020, leads to the following static checker warning:
> >>
> >> 	drivers/scsi/sd_zbc.c:741 sd_zbc_revalidate_zones()
> >> 	error: kvmalloc() only makes sense with GFP_KERNEL
> >>
> >> drivers/scsi/sd_zbc.c
> >>    721          /*
> >>    722           * There is nothing to do for regular disks, including host-aware disks
> >>    723           * that have partitions.
> >>    724           */
> >>    725          if (!blk_queue_is_zoned(q))
> >>    726                  return 0;
> >>    727  
> >>    728          /*
> >>    729           * Make sure revalidate zones are serialized to ensure exclusive
> >>    730           * updates of the scsi disk data.
> >>    731           */
> >>    732          mutex_lock(&sdkp->rev_mutex);
> >>    733  
> >>    734          if (sdkp->zone_blocks == zone_blocks &&
> >>    735              sdkp->nr_zones == nr_zones &&
> >>    736              disk->queue->nr_zones == nr_zones)
> >>    737                  goto unlock;
> >>    738  
> >>    739          sdkp->zone_blocks = zone_blocks;
> >>    740          sdkp->nr_zones = nr_zones;
> >>    741          sdkp->rev_wp_offset = kvcalloc(nr_zones, sizeof(u32), GFP_NOIO);
> >>                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >> We're passing GFP_NOIO here so it just defaults to kcalloc() and will
> >> not vmalloc() the memory.
> > 
> > Indeed... And the allocation can get a little too big for kmalloc().
> > 
> > Johannes, I think we need to move that allocation before the rev_mutex locking,
> > using a local var for the allocated address, and then using GFP_KERNEL should be
> > safe... But not entirely sure. Using kmalloc would be simpler but on large SMR
> > drives, that allocation will soon need to be 400K or so (i.e. 100,000 zones or
> > even more), too large for kmalloc to succeed reliably.
> > 
> 
> 
> No I don't think so. A mutex isn't a spinlock so we can sleep on the allocation.
> We can't use GFP_KERNEL as we're about to do I/O. blk_revalidate_disk_zones() called
> a few line below also does the memalloc_noio_{save,restore}() dance.

You should be extending noio scope then if this allocation falls into
the same category. Ideally the scope should start at the recursion place
and end where the scope really ened.
> 
> Would a kmem_cache for these revalidations help us in any way?

I am not sure what you mean here.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
  2021-02-17  8:03     ` Michal Hocko
@ 2021-02-17  9:08       ` Johannes Thumshirn
  2021-02-17  9:36         ` Michal Hocko
  2021-02-17  9:13       ` Damien Le Moal
  1 sibling, 1 reply; 14+ messages in thread
From: Johannes Thumshirn @ 2021-02-17  9:08 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Damien Le Moal, Dan Carpenter, linux-scsi

On 17/02/2021 09:03, Michal Hocko wrote:
>> No I don't think so. A mutex isn't a spinlock so we can sleep on the allocation.
>> We can't use GFP_KERNEL as we're about to do I/O. blk_revalidate_disk_zones() called
>> a few line below also does the memalloc_noio_{save,restore}() dance.
> 
> You should be extending noio scope then if this allocation falls into
> the same category. Ideally the scope should start at the recursion place
> and end where the scope really ened.

That means all callers of blk_revalidate_disk_zones() should do 
memalloc_noio_{save,restore}? If yes, can we somehow runtime assert
that this is done, so we don't end up with bad surprises?

>>
>> Would a kmem_cache for these revalidations help us in any way?
> 
> I am not sure what you mean here.
> 

Using a kmem_cache for the allocations passed into blk_revalidate_disk_zones().
I've looked into kmem_cache_alloc() and I couldn't find anything that speaks 
against it, but I'm not too familiar with the code.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
  2021-02-17  8:03     ` Michal Hocko
  2021-02-17  9:08       ` Johannes Thumshirn
@ 2021-02-17  9:13       ` Damien Le Moal
  2021-02-17  9:18         ` Johannes Thumshirn
  2021-02-17  9:32         ` Michal Hocko
  1 sibling, 2 replies; 14+ messages in thread
From: Damien Le Moal @ 2021-02-17  9:13 UTC (permalink / raw)
  To: Michal Hocko, Johannes Thumshirn; +Cc: Dan Carpenter, linux-scsi

On 2021/02/17 17:03, Michal Hocko wrote:
> On Wed 17-02-21 06:42:37, Johannes Thumshirn wrote:
>> On 17/02/2021 00:33, Damien Le Moal wrote:
>>> On 2021/02/17 4:42, Dan Carpenter wrote:
>>>> Hello Johannes Thumshirn,
>>>>
>>>> The patch 5795eb443060: "scsi: sd_zbc: emulate ZONE_APPEND commands"
>>>> from May 12, 2020, leads to the following static checker warning:
>>>>
>>>> 	drivers/scsi/sd_zbc.c:741 sd_zbc_revalidate_zones()
>>>> 	error: kvmalloc() only makes sense with GFP_KERNEL
>>>>
>>>> drivers/scsi/sd_zbc.c
>>>>    721          /*
>>>>    722           * There is nothing to do for regular disks, including host-aware disks
>>>>    723           * that have partitions.
>>>>    724           */
>>>>    725          if (!blk_queue_is_zoned(q))
>>>>    726                  return 0;
>>>>    727  
>>>>    728          /*
>>>>    729           * Make sure revalidate zones are serialized to ensure exclusive
>>>>    730           * updates of the scsi disk data.
>>>>    731           */
>>>>    732          mutex_lock(&sdkp->rev_mutex);
>>>>    733  
>>>>    734          if (sdkp->zone_blocks == zone_blocks &&
>>>>    735              sdkp->nr_zones == nr_zones &&
>>>>    736              disk->queue->nr_zones == nr_zones)
>>>>    737                  goto unlock;
>>>>    738  
>>>>    739          sdkp->zone_blocks = zone_blocks;
>>>>    740          sdkp->nr_zones = nr_zones;
>>>>    741          sdkp->rev_wp_offset = kvcalloc(nr_zones, sizeof(u32), GFP_NOIO);
>>>>                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>> We're passing GFP_NOIO here so it just defaults to kcalloc() and will
>>>> not vmalloc() the memory.
>>>
>>> Indeed... And the allocation can get a little too big for kmalloc().
>>>
>>> Johannes, I think we need to move that allocation before the rev_mutex locking,
>>> using a local var for the allocated address, and then using GFP_KERNEL should be
>>> safe... But not entirely sure. Using kmalloc would be simpler but on large SMR
>>> drives, that allocation will soon need to be 400K or so (i.e. 100,000 zones or
>>> even more), too large for kmalloc to succeed reliably.
>>>
>>
>>
>> No I don't think so. A mutex isn't a spinlock so we can sleep on the allocation.
>> We can't use GFP_KERNEL as we're about to do I/O. blk_revalidate_disk_zones() called
>> a few line below also does the memalloc_noio_{save,restore}() dance.
> 
> You should be extending noio scope then if this allocation falls into
> the same category. Ideally the scope should start at the recursion place
> and end where the scope really ened.

But it does not look like __vmalloc_node() (fallback in kvmalloc_node() if
kmalloc() fails) cares about the context allocation flags... I can't see
if/where the context allocation flags are taken into account. It looks like only
the gfp_mask argument is used. Am I missing something ?

>>
>> Would a kmem_cache for these revalidations help us in any way?
> 
> I am not sure what you mean here.
> 


-- 
Damien Le Moal
Western Digital Research

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
  2021-02-17  9:13       ` Damien Le Moal
@ 2021-02-17  9:18         ` Johannes Thumshirn
  2021-02-17 11:16           ` Dan Carpenter
  2021-02-17  9:32         ` Michal Hocko
  1 sibling, 1 reply; 14+ messages in thread
From: Johannes Thumshirn @ 2021-02-17  9:18 UTC (permalink / raw)
  To: Damien Le Moal, Michal Hocko; +Cc: Dan Carpenter, linux-scsi

On 17/02/2021 10:13, Damien Le Moal wrote:
> On 2021/02/17 17:03, Michal Hocko wrote:
>> On Wed 17-02-21 06:42:37, Johannes Thumshirn wrote:
>>> On 17/02/2021 00:33, Damien Le Moal wrote:
>>>> On 2021/02/17 4:42, Dan Carpenter wrote:
>>>>> Hello Johannes Thumshirn,
>>>>>
>>>>> The patch 5795eb443060: "scsi: sd_zbc: emulate ZONE_APPEND commands"
>>>>> from May 12, 2020, leads to the following static checker warning:
>>>>>
>>>>> 	drivers/scsi/sd_zbc.c:741 sd_zbc_revalidate_zones()
>>>>> 	error: kvmalloc() only makes sense with GFP_KERNEL
>>>>>
>>>>> drivers/scsi/sd_zbc.c
>>>>>    721          /*
>>>>>    722           * There is nothing to do for regular disks, including host-aware disks
>>>>>    723           * that have partitions.
>>>>>    724           */
>>>>>    725          if (!blk_queue_is_zoned(q))
>>>>>    726                  return 0;
>>>>>    727  
>>>>>    728          /*
>>>>>    729           * Make sure revalidate zones are serialized to ensure exclusive
>>>>>    730           * updates of the scsi disk data.
>>>>>    731           */
>>>>>    732          mutex_lock(&sdkp->rev_mutex);
>>>>>    733  
>>>>>    734          if (sdkp->zone_blocks == zone_blocks &&
>>>>>    735              sdkp->nr_zones == nr_zones &&
>>>>>    736              disk->queue->nr_zones == nr_zones)
>>>>>    737                  goto unlock;
>>>>>    738  
>>>>>    739          sdkp->zone_blocks = zone_blocks;
>>>>>    740          sdkp->nr_zones = nr_zones;
>>>>>    741          sdkp->rev_wp_offset = kvcalloc(nr_zones, sizeof(u32), GFP_NOIO);
>>>>>                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
>>>>> We're passing GFP_NOIO here so it just defaults to kcalloc() and will
>>>>> not vmalloc() the memory.
>>>>
>>>> Indeed... And the allocation can get a little too big for kmalloc().
>>>>
>>>> Johannes, I think we need to move that allocation before the rev_mutex locking,
>>>> using a local var for the allocated address, and then using GFP_KERNEL should be
>>>> safe... But not entirely sure. Using kmalloc would be simpler but on large SMR
>>>> drives, that allocation will soon need to be 400K or so (i.e. 100,000 zones or
>>>> even more), too large for kmalloc to succeed reliably.
>>>>
>>>
>>>
>>> No I don't think so. A mutex isn't a spinlock so we can sleep on the allocation.
>>> We can't use GFP_KERNEL as we're about to do I/O. blk_revalidate_disk_zones() called
>>> a few line below also does the memalloc_noio_{save,restore}() dance.
>>
>> You should be extending noio scope then if this allocation falls into
>> the same category. Ideally the scope should start at the recursion place
>> and end where the scope really ened.
> 
> But it does not look like __vmalloc_node() (fallback in kvmalloc_node() if
> kmalloc() fails) cares about the context allocation flags... I can't see
> if/where the context allocation flags are taken into account. It looks like only
> the gfp_mask argument is used. Am I missing something ?


It does here:

void *kvmalloc_node(size_t size, gfp_t flags, int node)                                                                                                                              
{                                                                                                                                                                                    
        gfp_t kmalloc_flags = flags;                                                                                                                                                 
        void *ret;                                                                                                                                                                   
                                                                                                                                                                                     
        /*                                                                                                                                                                           
         * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)                                                                                                   
         * so the given set of flags has to be compatible.                                                                                                                           
         */                                                                                                                                                                          
        if ((flags & GFP_KERNEL) != GFP_KERNEL)                                                                                                                                      
                return kmalloc_node(size, flags, node);

	/* [...] */

	return __vmalloc_node(size, 1, flags, node,                                                                                                                                  
                        __builtin_return_address(0));                                                                                                                                
}                                                                                                                                                                                    
EXPORT_SYMBOL(kvmalloc_node);

We're not reaching __vmalloc_node() if GFP_KERNEL isn't set. Also, from the, function's
documentation:

 * @flags: gfp mask for the allocation - must be compatible (superset) with GFP_KERNEL.                                                                                              

So yeah, we should've read that before calling kvcalloc(..., GFP_NOIO).

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
  2021-02-17  9:13       ` Damien Le Moal
  2021-02-17  9:18         ` Johannes Thumshirn
@ 2021-02-17  9:32         ` Michal Hocko
  1 sibling, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2021-02-17  9:32 UTC (permalink / raw)
  To: Damien Le Moal; +Cc: Johannes Thumshirn, Dan Carpenter, linux-scsi

On Wed 17-02-21 09:13:57, Damien Le Moal wrote:
> On 2021/02/17 17:03, Michal Hocko wrote:
> > On Wed 17-02-21 06:42:37, Johannes Thumshirn wrote:
> >> On 17/02/2021 00:33, Damien Le Moal wrote:
> >>> On 2021/02/17 4:42, Dan Carpenter wrote:
> >>>> Hello Johannes Thumshirn,
> >>>>
> >>>> The patch 5795eb443060: "scsi: sd_zbc: emulate ZONE_APPEND commands"
> >>>> from May 12, 2020, leads to the following static checker warning:
> >>>>
> >>>> 	drivers/scsi/sd_zbc.c:741 sd_zbc_revalidate_zones()
> >>>> 	error: kvmalloc() only makes sense with GFP_KERNEL
> >>>>
> >>>> drivers/scsi/sd_zbc.c
> >>>>    721          /*
> >>>>    722           * There is nothing to do for regular disks, including host-aware disks
> >>>>    723           * that have partitions.
> >>>>    724           */
> >>>>    725          if (!blk_queue_is_zoned(q))
> >>>>    726                  return 0;
> >>>>    727  
> >>>>    728          /*
> >>>>    729           * Make sure revalidate zones are serialized to ensure exclusive
> >>>>    730           * updates of the scsi disk data.
> >>>>    731           */
> >>>>    732          mutex_lock(&sdkp->rev_mutex);
> >>>>    733  
> >>>>    734          if (sdkp->zone_blocks == zone_blocks &&
> >>>>    735              sdkp->nr_zones == nr_zones &&
> >>>>    736              disk->queue->nr_zones == nr_zones)
> >>>>    737                  goto unlock;
> >>>>    738  
> >>>>    739          sdkp->zone_blocks = zone_blocks;
> >>>>    740          sdkp->nr_zones = nr_zones;
> >>>>    741          sdkp->rev_wp_offset = kvcalloc(nr_zones, sizeof(u32), GFP_NOIO);
> >>>>                                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
> >>>> We're passing GFP_NOIO here so it just defaults to kcalloc() and will
> >>>> not vmalloc() the memory.
> >>>
> >>> Indeed... And the allocation can get a little too big for kmalloc().
> >>>
> >>> Johannes, I think we need to move that allocation before the rev_mutex locking,
> >>> using a local var for the allocated address, and then using GFP_KERNEL should be
> >>> safe... But not entirely sure. Using kmalloc would be simpler but on large SMR
> >>> drives, that allocation will soon need to be 400K or so (i.e. 100,000 zones or
> >>> even more), too large for kmalloc to succeed reliably.
> >>>
> >>
> >>
> >> No I don't think so. A mutex isn't a spinlock so we can sleep on the allocation.
> >> We can't use GFP_KERNEL as we're about to do I/O. blk_revalidate_disk_zones() called
> >> a few line below also does the memalloc_noio_{save,restore}() dance.
> > 
> > You should be extending noio scope then if this allocation falls into
> > the same category. Ideally the scope should start at the recursion place
> > and end where the scope really ened.
> 
> But it does not look like __vmalloc_node() (fallback in kvmalloc_node() if
> kmalloc() fails) cares about the context allocation flags... I can't see
> if/where the context allocation flags are taken into account. It looks like only
> the gfp_mask argument is used. Am I missing something ?

current_gfp_context in the page allocator. vmalloc doesn't do any
reclaim on its own. It relies on the page allocator for that.
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
  2021-02-17  9:08       ` Johannes Thumshirn
@ 2021-02-17  9:36         ` Michal Hocko
  2021-02-17  9:41           ` Johannes Thumshirn
  0 siblings, 1 reply; 14+ messages in thread
From: Michal Hocko @ 2021-02-17  9:36 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: Damien Le Moal, Dan Carpenter, linux-scsi

On Wed 17-02-21 09:08:07, Johannes Thumshirn wrote:
> On 17/02/2021 09:03, Michal Hocko wrote:
> >> No I don't think so. A mutex isn't a spinlock so we can sleep on the allocation.
> >> We can't use GFP_KERNEL as we're about to do I/O. blk_revalidate_disk_zones() called
> >> a few line below also does the memalloc_noio_{save,restore}() dance.
> > 
> > You should be extending noio scope then if this allocation falls into
> > the same category. Ideally the scope should start at the recursion place
> > and end where the scope really ened.
> 
> That means all callers of blk_revalidate_disk_zones() should do 
> memalloc_noio_{save,restore}?

I am not really familiar with the IO area to answer this. The base idea
is to start the NOIO scope at the boundary which defines "unsafe to
re-enter or cannot deal with a new IO" from the reclaim path.

> If yes, can we somehow runtime assert that this is done, so we don't
> end up with bad surprises?

Could you elaborate?
 
> >> Would a kmem_cache for these revalidations help us in any way?
> > 
> > I am not sure what you mean here.
> > 
> 
> Using a kmem_cache for the allocations passed into blk_revalidate_disk_zones().
> I've looked into kmem_cache_alloc() and I couldn't find anything that speaks 
> against it, but I'm not too familiar with the code.

kmem_cache_alloc is only an extension to allow to allocate from a
specific cache. I do not really see how it is going to help with larger
allocation and my current understanding is that kvmalloc is used because
the requested allocation size can be large.

-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
  2021-02-17  9:36         ` Michal Hocko
@ 2021-02-17  9:41           ` Johannes Thumshirn
  2021-02-17 10:05             ` Michal Hocko
  0 siblings, 1 reply; 14+ messages in thread
From: Johannes Thumshirn @ 2021-02-17  9:41 UTC (permalink / raw)
  To: Michal Hocko; +Cc: Damien Le Moal, Dan Carpenter, linux-scsi

On 17/02/2021 10:36, Michal Hocko wrote:
> On Wed 17-02-21 09:08:07, Johannes Thumshirn wrote:
>> On 17/02/2021 09:03, Michal Hocko wrote:
>>>> No I don't think so. A mutex isn't a spinlock so we can sleep on the allocation.
>>>> We can't use GFP_KERNEL as we're about to do I/O. blk_revalidate_disk_zones() called
>>>> a few line below also does the memalloc_noio_{save,restore}() dance.
>>>
>>> You should be extending noio scope then if this allocation falls into
>>> the same category. Ideally the scope should start at the recursion place
>>> and end where the scope really ened.
>>
>> That means all callers of blk_revalidate_disk_zones() should do 
>> memalloc_noio_{save,restore}?
> 
> I am not really familiar with the IO area to answer this. The base idea
> is to start the NOIO scope at the boundary which defines "unsafe to
> re-enter or cannot deal with a new IO" from the reclaim path.
> 
>> If yes, can we somehow runtime assert that this is done, so we don't
>> end up with bad surprises?
> 
> Could you elaborate?

I though of lifting the noio scope into the callers of 
blk_revalidate_disk_zones() and then "check" in blk_revalidate_disk_zones()
this has been done. But it looks like memalloc_noio_save() can handle nesting,
so this is actually unneeded.


>  
>>>> Would a kmem_cache for these revalidations help us in any way?
>>>
>>> I am not sure what you mean here.
>>>
>>
>> Using a kmem_cache for the allocations passed into blk_revalidate_disk_zones().
>> I've looked into kmem_cache_alloc() and I couldn't find anything that speaks 
>> against it, but I'm not too familiar with the code.
> 
> kmem_cache_alloc is only an extension to allow to allocate from a
> specific cache. I do not really see how it is going to help with larger
> allocation and my current understanding is that kvmalloc is used because
> the requested allocation size can be large.
> 

Ah ok so we can't set aside a big enough pool to do allocations from there,
this was a misunderstanding from my side.

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
  2021-02-17  9:41           ` Johannes Thumshirn
@ 2021-02-17 10:05             ` Michal Hocko
  0 siblings, 0 replies; 14+ messages in thread
From: Michal Hocko @ 2021-02-17 10:05 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: Damien Le Moal, Dan Carpenter, linux-scsi

On Wed 17-02-21 09:41:58, Johannes Thumshirn wrote:
> On 17/02/2021 10:36, Michal Hocko wrote:
> > On Wed 17-02-21 09:08:07, Johannes Thumshirn wrote:
> >> On 17/02/2021 09:03, Michal Hocko wrote:
> >>>> No I don't think so. A mutex isn't a spinlock so we can sleep on the allocation.
> >>>> We can't use GFP_KERNEL as we're about to do I/O. blk_revalidate_disk_zones() called
> >>>> a few line below also does the memalloc_noio_{save,restore}() dance.
> >>>
> >>> You should be extending noio scope then if this allocation falls into
> >>> the same category. Ideally the scope should start at the recursion place
> >>> and end where the scope really ened.
> >>
> >> That means all callers of blk_revalidate_disk_zones() should do 
> >> memalloc_noio_{save,restore}?
> > 
> > I am not really familiar with the IO area to answer this. The base idea
> > is to start the NOIO scope at the boundary which defines "unsafe to
> > re-enter or cannot deal with a new IO" from the reclaim path.
> > 
> >> If yes, can we somehow runtime assert that this is done, so we don't
> >> end up with bad surprises?
> > 
> > Could you elaborate?
> 
> I though of lifting the noio scope into the callers of 
> blk_revalidate_disk_zones() and then "check" in blk_revalidate_disk_zones()
> this has been done. But it looks like memalloc_noio_save() can handle nesting,
> so this is actually unneeded.

Yes, it can handle nesting. As to where to start the scope, this should
be at the boundary of "do not reenter IO path from the reclaim". This
can be a lock which is held for the allocation which will be needed for
an IO induced by the reclaim or any other resouce that will block from
the IO path. Is this more clear now?
-- 
Michal Hocko
SUSE Labs

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
  2021-02-17  9:18         ` Johannes Thumshirn
@ 2021-02-17 11:16           ` Dan Carpenter
  2021-02-17 11:18             ` Johannes Thumshirn
  0 siblings, 1 reply; 14+ messages in thread
From: Dan Carpenter @ 2021-02-17 11:16 UTC (permalink / raw)
  To: Johannes Thumshirn; +Cc: Damien Le Moal, Michal Hocko, linux-scsi

On Wed, Feb 17, 2021 at 09:18:37AM +0000, Johannes Thumshirn wrote:
>  * @flags: gfp mask for the allocation - must be compatible (superset) with GFP_KERNEL.
> 
> So yeah, we should've read that before calling kvcalloc(..., GFP_NOIO).

This is a Smatch check now and checks are better than comments.  :)

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands
  2021-02-17 11:16           ` Dan Carpenter
@ 2021-02-17 11:18             ` Johannes Thumshirn
  0 siblings, 0 replies; 14+ messages in thread
From: Johannes Thumshirn @ 2021-02-17 11:18 UTC (permalink / raw)
  To: Dan Carpenter; +Cc: Damien Le Moal, Michal Hocko, linux-scsi

On 17/02/2021 12:17, Dan Carpenter wrote:
> On Wed, Feb 17, 2021 at 09:18:37AM +0000, Johannes Thumshirn wrote:
>>  * @flags: gfp mask for the allocation - must be compatible (superset) with GFP_KERNEL.
>>
>> So yeah, we should've read that before calling kvcalloc(..., GFP_NOIO).
> 
> This is a Smatch check now and checks are better than comments.  :)

Definitively

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-02-17 11:19 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-16 19:39 [bug report] scsi: sd_zbc: emulate ZONE_APPEND commands Dan Carpenter
2021-02-16 23:33 ` Damien Le Moal
2021-02-17  6:42   ` Johannes Thumshirn
2021-02-17  8:00     ` Damien Le Moal
2021-02-17  8:03     ` Michal Hocko
2021-02-17  9:08       ` Johannes Thumshirn
2021-02-17  9:36         ` Michal Hocko
2021-02-17  9:41           ` Johannes Thumshirn
2021-02-17 10:05             ` Michal Hocko
2021-02-17  9:13       ` Damien Le Moal
2021-02-17  9:18         ` Johannes Thumshirn
2021-02-17 11:16           ` Dan Carpenter
2021-02-17 11:18             ` Johannes Thumshirn
2021-02-17  9:32         ` Michal Hocko

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.