All of lore.kernel.org
 help / color / mirror / Atom feed
From: "Christian König" <christian.koenig@amd.com>
To: Arunpravin Paneer Selvam <arunpravin.paneerselvam@amd.com>
Cc: "intel-gfx@lists.freedesktop.org"
	<intel-gfx@lists.freedesktop.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
	"Deucher, Alexander" <alexander.deucher@amd.com>,
	Matthew Auld <matthew.auld@intel.com>
Subject: Re: [PATCH v11] drm/amdgpu: add drm buddy support to amdgpu
Date: Tue, 29 Mar 2022 13:24:02 +0200	[thread overview]
Message-ID: <417bc262-17db-551f-1334-3cfafe997c60@amd.com> (raw)
In-Reply-To: <2b77dca5-df7e-5a65-eb3e-f186e1037e4d@amd.com>

Am 29.03.22 um 13:19 schrieb Arunpravin Paneer Selvam:
> [SNIP]
>>> +	pages_left = node->base.num_pages;
>>>    
>>>    	i = 0;
>>> -	spin_lock(&mgr->lock);
>>>    	while (pages_left) {
>>> -		uint32_t alignment = tbo->page_alignment;
>>> +		if (tbo->page_alignment)
>>> +			min_page_size = tbo->page_alignment << PAGE_SHIFT;
>>> +		else
>>> +			min_page_size = mgr->default_page_size;
>> The handling here looks extremely awkward to me.
>>
>> min_page_size should be determined outside of the loop, based on default_page_size, alignment and contiguous flag.
> I kept min_page_size determine logic inside the loop for cases 2GiB+
> requirements, Since now we are round up the size to the required
> alignment, I modified the min_page_size determine logic outside of the
> loop in v12. Please review.

Ah! So do we only have the loop so that each allocation isn't bigger 
than 2GiB? If yes couldn't we instead add a max_alloc_size or something 
similar?

BTW: I strongly suggest that you rename min_page_size to min_alloc_size. 
Otherwise somebody could think that those numbers are in pages and not 
bytes.

>> Then why do you drop the lock and grab it again inside the loop? And what is "i" actually good for?
> modified the lock/unlock placement in v12.
>
> "i" is to track when there is 2GiB+ contiguous allocation request, first
> we allocate 2GiB (due to SG table limit) continuously and the remaining
> pages in the next iteration, hence this request can't be a continuous.
> To set the placement flag we make use of "i" value. In our case "i"
> value becomes 2 and we don't set the below flag.
> node->base.placement |= TTM_PL_FLAG_CONTIGUOUS;
>
> If we don't get such requests, I will remove "i".

I'm not sure if that works.

As far as I can see drm_buddy_alloc_blocks() can allocate multiple 
blocks at the same time, but i is only incremented when we loop.

So what you should do instead is to check if node->blocks just contain 
exactly one element after the allocation but before the trim.

>>> +
>>> +		/* Limit maximum size to 2GB due to SG table limitations */
>>> +		pages = min(pages_left, 2UL << (30 - PAGE_SHIFT));
>>>    
>>>    		if (pages >= pages_per_node)
>>> -			alignment = pages_per_node;
>>> -
>>> -		r = drm_mm_insert_node_in_range(mm, &node->mm_nodes[i], pages,
>>> -						alignment, 0, place->fpfn,
>>> -						lpfn, mode);
>>> -		if (unlikely(r)) {
>>> -			if (pages > pages_per_node) {
>>> -				if (is_power_of_2(pages))
>>> -					pages = pages / 2;
>>> -				else
>>> -					pages = rounddown_pow_of_two(pages);
>>> -				continue;
>>> -			}
>>> -			goto error_free;
>>> +			min_page_size = pages_per_node << PAGE_SHIFT;
>>> +
>>> +		if (!is_contiguous && !IS_ALIGNED(pages, min_page_size >> PAGE_SHIFT))
>>> +			is_contiguous = 1;
>>> +
>>> +		if (is_contiguous) {
>>> +			pages = roundup_pow_of_two(pages);
>>> +			min_page_size = pages << PAGE_SHIFT;
>>> +
>>> +			if (pages > lpfn)
>>> +				lpfn = pages;
>>>    		}
>>>    
>>> -		vis_usage += amdgpu_vram_mgr_vis_size(adev, &node->mm_nodes[i]);
>>> -		amdgpu_vram_mgr_virt_start(&node->base, &node->mm_nodes[i]);
>>> -		pages_left -= pages;
>>> +		BUG_ON(min_page_size < mm->chunk_size);
>>> +
>>> +		mutex_lock(&mgr->lock);
>>> +		r = drm_buddy_alloc_blocks(mm, (u64)place->fpfn << PAGE_SHIFT,
>>> +					   (u64)lpfn << PAGE_SHIFT,
>>> +					   (u64)pages << PAGE_SHIFT,
>>> +					   min_page_size,
>>> +					   &node->blocks,
>>> +					   node->flags);
>>> +		mutex_unlock(&mgr->lock);
>>> +		if (unlikely(r))
>>> +			goto error_free_blocks;
>>> +
>>>    		++i;
>>>    
>>>    		if (pages > pages_left)
>>> -			pages = pages_left;
>>> +			pages_left = 0;
>>> +		else
>>> +			pages_left -= pages;
>>>    	}
>>> -	spin_unlock(&mgr->lock);
>>>    
>>> -	if (i == 1)
>>> +	/* Free unused pages for contiguous allocation */
>>> +	if (is_contiguous) {
>> Well that looks really odd, why is trimming not part of
>> drm_buddy_alloc_blocks() ?
> we didn't place trim function part of drm_buddy_alloc_blocks since we
> thought this function can be a generic one and it can be used by any
> other application as well. For example, now we are using it for trimming
> the last block in case of size non-alignment with min_page_size.

Good argument. Another thing I just realized is that we probably want to 
double check if we only allocated one block before the trim.

Thanks,
Christian.

WARNING: multiple messages have this Message-ID (diff)
From: "Christian König" <christian.koenig@amd.com>
To: Arunpravin Paneer Selvam <arunpravin.paneerselvam@amd.com>
Cc: "intel-gfx@lists.freedesktop.org"
	<intel-gfx@lists.freedesktop.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
	"Deucher, Alexander" <alexander.deucher@amd.com>,
	Matthew Auld <matthew.auld@intel.com>
Subject: Re: [Intel-gfx] [PATCH v11] drm/amdgpu: add drm buddy support to amdgpu
Date: Tue, 29 Mar 2022 13:24:02 +0200	[thread overview]
Message-ID: <417bc262-17db-551f-1334-3cfafe997c60@amd.com> (raw)
In-Reply-To: <2b77dca5-df7e-5a65-eb3e-f186e1037e4d@amd.com>

Am 29.03.22 um 13:19 schrieb Arunpravin Paneer Selvam:
> [SNIP]
>>> +	pages_left = node->base.num_pages;
>>>    
>>>    	i = 0;
>>> -	spin_lock(&mgr->lock);
>>>    	while (pages_left) {
>>> -		uint32_t alignment = tbo->page_alignment;
>>> +		if (tbo->page_alignment)
>>> +			min_page_size = tbo->page_alignment << PAGE_SHIFT;
>>> +		else
>>> +			min_page_size = mgr->default_page_size;
>> The handling here looks extremely awkward to me.
>>
>> min_page_size should be determined outside of the loop, based on default_page_size, alignment and contiguous flag.
> I kept min_page_size determine logic inside the loop for cases 2GiB+
> requirements, Since now we are round up the size to the required
> alignment, I modified the min_page_size determine logic outside of the
> loop in v12. Please review.

Ah! So do we only have the loop so that each allocation isn't bigger 
than 2GiB? If yes couldn't we instead add a max_alloc_size or something 
similar?

BTW: I strongly suggest that you rename min_page_size to min_alloc_size. 
Otherwise somebody could think that those numbers are in pages and not 
bytes.

>> Then why do you drop the lock and grab it again inside the loop? And what is "i" actually good for?
> modified the lock/unlock placement in v12.
>
> "i" is to track when there is 2GiB+ contiguous allocation request, first
> we allocate 2GiB (due to SG table limit) continuously and the remaining
> pages in the next iteration, hence this request can't be a continuous.
> To set the placement flag we make use of "i" value. In our case "i"
> value becomes 2 and we don't set the below flag.
> node->base.placement |= TTM_PL_FLAG_CONTIGUOUS;
>
> If we don't get such requests, I will remove "i".

I'm not sure if that works.

As far as I can see drm_buddy_alloc_blocks() can allocate multiple 
blocks at the same time, but i is only incremented when we loop.

So what you should do instead is to check if node->blocks just contain 
exactly one element after the allocation but before the trim.

>>> +
>>> +		/* Limit maximum size to 2GB due to SG table limitations */
>>> +		pages = min(pages_left, 2UL << (30 - PAGE_SHIFT));
>>>    
>>>    		if (pages >= pages_per_node)
>>> -			alignment = pages_per_node;
>>> -
>>> -		r = drm_mm_insert_node_in_range(mm, &node->mm_nodes[i], pages,
>>> -						alignment, 0, place->fpfn,
>>> -						lpfn, mode);
>>> -		if (unlikely(r)) {
>>> -			if (pages > pages_per_node) {
>>> -				if (is_power_of_2(pages))
>>> -					pages = pages / 2;
>>> -				else
>>> -					pages = rounddown_pow_of_two(pages);
>>> -				continue;
>>> -			}
>>> -			goto error_free;
>>> +			min_page_size = pages_per_node << PAGE_SHIFT;
>>> +
>>> +		if (!is_contiguous && !IS_ALIGNED(pages, min_page_size >> PAGE_SHIFT))
>>> +			is_contiguous = 1;
>>> +
>>> +		if (is_contiguous) {
>>> +			pages = roundup_pow_of_two(pages);
>>> +			min_page_size = pages << PAGE_SHIFT;
>>> +
>>> +			if (pages > lpfn)
>>> +				lpfn = pages;
>>>    		}
>>>    
>>> -		vis_usage += amdgpu_vram_mgr_vis_size(adev, &node->mm_nodes[i]);
>>> -		amdgpu_vram_mgr_virt_start(&node->base, &node->mm_nodes[i]);
>>> -		pages_left -= pages;
>>> +		BUG_ON(min_page_size < mm->chunk_size);
>>> +
>>> +		mutex_lock(&mgr->lock);
>>> +		r = drm_buddy_alloc_blocks(mm, (u64)place->fpfn << PAGE_SHIFT,
>>> +					   (u64)lpfn << PAGE_SHIFT,
>>> +					   (u64)pages << PAGE_SHIFT,
>>> +					   min_page_size,
>>> +					   &node->blocks,
>>> +					   node->flags);
>>> +		mutex_unlock(&mgr->lock);
>>> +		if (unlikely(r))
>>> +			goto error_free_blocks;
>>> +
>>>    		++i;
>>>    
>>>    		if (pages > pages_left)
>>> -			pages = pages_left;
>>> +			pages_left = 0;
>>> +		else
>>> +			pages_left -= pages;
>>>    	}
>>> -	spin_unlock(&mgr->lock);
>>>    
>>> -	if (i == 1)
>>> +	/* Free unused pages for contiguous allocation */
>>> +	if (is_contiguous) {
>> Well that looks really odd, why is trimming not part of
>> drm_buddy_alloc_blocks() ?
> we didn't place trim function part of drm_buddy_alloc_blocks since we
> thought this function can be a generic one and it can be used by any
> other application as well. For example, now we are using it for trimming
> the last block in case of size non-alignment with min_page_size.

Good argument. Another thing I just realized is that we probably want to 
double check if we only allocated one block before the trim.

Thanks,
Christian.

WARNING: multiple messages have this Message-ID (diff)
From: "Christian König" <christian.koenig@amd.com>
To: Arunpravin Paneer Selvam <arunpravin.paneerselvam@amd.com>
Cc: "intel-gfx@lists.freedesktop.org"
	<intel-gfx@lists.freedesktop.org>,
	"dri-devel@lists.freedesktop.org"
	<dri-devel@lists.freedesktop.org>,
	"amd-gfx@lists.freedesktop.org" <amd-gfx@lists.freedesktop.org>,
	"daniel@ffwll.ch" <daniel@ffwll.ch>,
	"Deucher, Alexander" <alexander.deucher@amd.com>,
	Matthew Auld <matthew.auld@intel.com>
Subject: Re: [PATCH v11] drm/amdgpu: add drm buddy support to amdgpu
Date: Tue, 29 Mar 2022 13:24:02 +0200	[thread overview]
Message-ID: <417bc262-17db-551f-1334-3cfafe997c60@amd.com> (raw)
In-Reply-To: <2b77dca5-df7e-5a65-eb3e-f186e1037e4d@amd.com>

Am 29.03.22 um 13:19 schrieb Arunpravin Paneer Selvam:
> [SNIP]
>>> +	pages_left = node->base.num_pages;
>>>    
>>>    	i = 0;
>>> -	spin_lock(&mgr->lock);
>>>    	while (pages_left) {
>>> -		uint32_t alignment = tbo->page_alignment;
>>> +		if (tbo->page_alignment)
>>> +			min_page_size = tbo->page_alignment << PAGE_SHIFT;
>>> +		else
>>> +			min_page_size = mgr->default_page_size;
>> The handling here looks extremely awkward to me.
>>
>> min_page_size should be determined outside of the loop, based on default_page_size, alignment and contiguous flag.
> I kept min_page_size determine logic inside the loop for cases 2GiB+
> requirements, Since now we are round up the size to the required
> alignment, I modified the min_page_size determine logic outside of the
> loop in v12. Please review.

Ah! So do we only have the loop so that each allocation isn't bigger 
than 2GiB? If yes couldn't we instead add a max_alloc_size or something 
similar?

BTW: I strongly suggest that you rename min_page_size to min_alloc_size. 
Otherwise somebody could think that those numbers are in pages and not 
bytes.

>> Then why do you drop the lock and grab it again inside the loop? And what is "i" actually good for?
> modified the lock/unlock placement in v12.
>
> "i" is to track when there is 2GiB+ contiguous allocation request, first
> we allocate 2GiB (due to SG table limit) continuously and the remaining
> pages in the next iteration, hence this request can't be a continuous.
> To set the placement flag we make use of "i" value. In our case "i"
> value becomes 2 and we don't set the below flag.
> node->base.placement |= TTM_PL_FLAG_CONTIGUOUS;
>
> If we don't get such requests, I will remove "i".

I'm not sure if that works.

As far as I can see drm_buddy_alloc_blocks() can allocate multiple 
blocks at the same time, but i is only incremented when we loop.

So what you should do instead is to check if node->blocks just contain 
exactly one element after the allocation but before the trim.

>>> +
>>> +		/* Limit maximum size to 2GB due to SG table limitations */
>>> +		pages = min(pages_left, 2UL << (30 - PAGE_SHIFT));
>>>    
>>>    		if (pages >= pages_per_node)
>>> -			alignment = pages_per_node;
>>> -
>>> -		r = drm_mm_insert_node_in_range(mm, &node->mm_nodes[i], pages,
>>> -						alignment, 0, place->fpfn,
>>> -						lpfn, mode);
>>> -		if (unlikely(r)) {
>>> -			if (pages > pages_per_node) {
>>> -				if (is_power_of_2(pages))
>>> -					pages = pages / 2;
>>> -				else
>>> -					pages = rounddown_pow_of_two(pages);
>>> -				continue;
>>> -			}
>>> -			goto error_free;
>>> +			min_page_size = pages_per_node << PAGE_SHIFT;
>>> +
>>> +		if (!is_contiguous && !IS_ALIGNED(pages, min_page_size >> PAGE_SHIFT))
>>> +			is_contiguous = 1;
>>> +
>>> +		if (is_contiguous) {
>>> +			pages = roundup_pow_of_two(pages);
>>> +			min_page_size = pages << PAGE_SHIFT;
>>> +
>>> +			if (pages > lpfn)
>>> +				lpfn = pages;
>>>    		}
>>>    
>>> -		vis_usage += amdgpu_vram_mgr_vis_size(adev, &node->mm_nodes[i]);
>>> -		amdgpu_vram_mgr_virt_start(&node->base, &node->mm_nodes[i]);
>>> -		pages_left -= pages;
>>> +		BUG_ON(min_page_size < mm->chunk_size);
>>> +
>>> +		mutex_lock(&mgr->lock);
>>> +		r = drm_buddy_alloc_blocks(mm, (u64)place->fpfn << PAGE_SHIFT,
>>> +					   (u64)lpfn << PAGE_SHIFT,
>>> +					   (u64)pages << PAGE_SHIFT,
>>> +					   min_page_size,
>>> +					   &node->blocks,
>>> +					   node->flags);
>>> +		mutex_unlock(&mgr->lock);
>>> +		if (unlikely(r))
>>> +			goto error_free_blocks;
>>> +
>>>    		++i;
>>>    
>>>    		if (pages > pages_left)
>>> -			pages = pages_left;
>>> +			pages_left = 0;
>>> +		else
>>> +			pages_left -= pages;
>>>    	}
>>> -	spin_unlock(&mgr->lock);
>>>    
>>> -	if (i == 1)
>>> +	/* Free unused pages for contiguous allocation */
>>> +	if (is_contiguous) {
>> Well that looks really odd, why is trimming not part of
>> drm_buddy_alloc_blocks() ?
> we didn't place trim function part of drm_buddy_alloc_blocks since we
> thought this function can be a generic one and it can be used by any
> other application as well. For example, now we are using it for trimming
> the last block in case of size non-alignment with min_page_size.

Good argument. Another thing I just realized is that we probably want to 
double check if we only allocated one block before the trim.

Thanks,
Christian.

  reply	other threads:[~2022-03-29 11:24 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-03-23  6:25 [PATCH v11] drm/amdgpu: add drm buddy support to amdgpu Arunpravin Paneer Selvam
2022-03-23  6:25 ` Arunpravin Paneer Selvam
2022-03-23  6:25 ` [Intel-gfx] " Arunpravin Paneer Selvam
2022-03-23  6:42 ` Paul Menzel
2022-03-23  6:42   ` Paul Menzel
2022-03-23  6:42   ` [Intel-gfx] " Paul Menzel
2022-03-23  7:42   ` Christian König
2022-03-23  7:42     ` Christian König
2022-03-23  7:42     ` [Intel-gfx] " Christian König
2022-03-23  8:10     ` Commit messages (was: [PATCH v11] drm/amdgpu: add drm buddy support to amdgpu) Paul Menzel
2022-03-23  8:10       ` Paul Menzel
2022-03-23  8:10       ` [Intel-gfx] " Paul Menzel
2022-03-23  8:18       ` Christian König
2022-03-23  8:18         ` Christian König
2022-03-23  8:18         ` [Intel-gfx] " Christian König
2022-03-23 14:00         ` Daniel Stone
2022-03-23 14:00           ` [Intel-gfx] " Daniel Stone
2022-03-23 14:42           ` Alex Deucher
2022-03-23 14:42             ` [Intel-gfx] " Alex Deucher
2022-03-23 15:03             ` Daniel Stone
2022-03-23 15:03               ` [Intel-gfx] " Daniel Stone
2022-03-23 15:14               ` Alex Deucher
2022-03-23 15:14                 ` [Intel-gfx] " Alex Deucher
2022-03-23 15:24                 ` Daniel Stone
2022-03-23 15:24                   ` [Intel-gfx] " Daniel Stone
2022-03-23 15:32                   ` Christian König
2022-03-23 15:32                     ` Christian König
2022-03-24 10:30                     ` [Intel-gfx] " Daniel Vetter
2022-03-24 10:30                       ` Daniel Vetter
2022-03-25 15:56                     ` Commit messages Paul Menzel
2022-03-25 15:56                       ` [Intel-gfx] " Paul Menzel
2022-03-23 15:19           ` Commit messages (was: [PATCH v11] drm/amdgpu: add drm buddy support to amdgpu) Christian König
2022-03-23 15:19             ` [Intel-gfx] " Christian König
2022-03-23  7:16 ` [Intel-gfx] ✗ Fi.CI.BUILD: failure for drm/amdgpu: add drm buddy support to amdgpu (rev2) Patchwork
2022-03-23  7:37 ` [PATCH v11] drm/amdgpu: add drm buddy support to amdgpu Christian König
2022-03-23  7:37   ` [Intel-gfx] " Christian König
2022-03-23  7:37   ` Christian König
     [not found]   ` <MN2PR12MB4342B7FD0CC5C480DEEF8A77E41D9@MN2PR12MB4342.namprd12.prod.outlook.com>
2022-03-29 11:19     ` Arunpravin Paneer Selvam
2022-03-29 11:19       ` Arunpravin Paneer Selvam
2022-03-29 11:19       ` [Intel-gfx] " Arunpravin Paneer Selvam
2022-03-29 11:24       ` Christian König [this message]
2022-03-29 11:24         ` Christian König
2022-03-29 11:24         ` [Intel-gfx] " Christian König
2022-03-29 16:00         ` Arunpravin Paneer Selvam
2022-03-29 16:00           ` Arunpravin Paneer Selvam
2022-03-29 16:00           ` [Intel-gfx] " Arunpravin Paneer Selvam
2022-03-29 19:18           ` Arunpravin Paneer Selvam
2022-03-29 19:18             ` Arunpravin Paneer Selvam
2022-03-29 19:18             ` [Intel-gfx] " Arunpravin Paneer Selvam
2022-03-30  6:53             ` Christian König
2022-03-30  6:53               ` Christian König
2022-03-30  6:53               ` [Intel-gfx] " Christian König
2022-03-23 11:26 ` kernel test robot
2022-03-23 11:26   ` [Intel-gfx] " kernel test robot

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=417bc262-17db-551f-1334-3cfafe997c60@amd.com \
    --to=christian.koenig@amd.com \
    --cc=alexander.deucher@amd.com \
    --cc=amd-gfx@lists.freedesktop.org \
    --cc=arunpravin.paneerselvam@amd.com \
    --cc=dri-devel@lists.freedesktop.org \
    --cc=intel-gfx@lists.freedesktop.org \
    --cc=matthew.auld@intel.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.