> Without correctly kunmap, page table is corrupted. Page entries point > to wrong memory locations. You might call it completely harmless. But > I think this is a severe bug. Leaking memory is better than a > corrupted page table. Think security. We are talking about the page tables for the vmalloc area in the kernel here, so no security problem. Leaking memory is much more problematic. > Would you provide any document and reference by saying" It is > impossible to receive a signal during module load/unload"? For > example, if the unload stuck in a lock, can CTRL+C stop the unload? > No, CTRL+C doesn't abort module load/unload. There have been patches to changes this a while ago, but IIRC it broke a whole bunch of driver relying on this. > What about there is some other return error? What about in future > somebody improve amdgpu_bo_reserve to return other errors, > then function amdgpu_vram_scratch_fini becomes buggy? > Yes, that is indeed an issue. For example -EDEADLK is possible as well. That's why I said we should use amdgpu_bo_free_kernel() instead. > While I am thinking whether there is a better way for the current > situation, would you give a real world example that my patch really > not working? Then we can address it. > I don't think there is because the driver can't receive a signal during load/unload, but the problem is rather that the patch doesn't improve the situation at all. Regards, Christian. Am 19.04.2017 um 13:37 schrieb Xie, AlexBin: > > Hi Christian, > > > Without correctly kunmap, page table is corrupted. Page entries point > to wrong memory locations. You might call it completely harmless. But > I think this is a severe bug. Leaking memory is better than a > corrupted page table. Think security. > > > Would you provide any document and reference by saying" It is > impossible to receive a signal during module load/unload"? For > example, if the unload stuck in a lock, can CTRL+C stop the unload? > > > If "It is impossible to receive a signal during module load/unload", > interruptible waiting is fine too, because function amdgpu_bo_reserve > will return successfully. > > > What about there is some other return error? What about in future > somebody improve amdgpu_bo_reserve to return other errors, > then function amdgpu_vram_scratch_fini becomes buggy? > > > While I am thinking whether there is a better way for the current > situation, would you give a real world example that my patch really > not working? Then we can address it. > > > Thanks, > > Alex Bin > > > ------------------------------------------------------------------------ > *From:* Christian König > *Sent:* Wednesday, April 19, 2017 2:35 AM > *To:* Xie, AlexBin; Zhou, David(ChunMing); amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org > *Subject:* Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO > Hi AlexBin, > > the answer is ttm_bo_kunmap isn't called at all and yes in the case of > an iomap we leak the address space reserved. > > But that is completely harmless on a 64bit system compared to leaking > the memory backing the address space. > > Using amdgpu_bo_free_kernel() instead of openly coding it here is > probably a good idea. > > Additional to that it's probably a good idea to set the no_intr flag > when reserving kernel BOs. It is impossible to receive a signal during > module load/unload, but it's probably better to document that in the > code as well. > > Regards, > Christian. > > Am 18.04.2017 um 20:54 schrieb Xie, AlexBin: >> Hi Christian, >> >> Have you found how/where/when? When you said "mapping will just be >> released a bit later on", you must know the answer. >> >> It is difficult to prove something does not exist. Anyway, I will >> give it a try to prove such "later on" does not exist. >> >> Function ttm_bo_kunmap is the only function to unmap. To prove this, >> search ttm_bo_map_iomap, only ttm_bo_kunmap use this enum to >> correctly kunmap. >> >> Function ttm_bo_kunmap is not called by ttm itself. This is a hint >> that all TTM delay delete mechanism or unref mechanism will NOT >> kunmap BO later on. >> >> Function ttm_bo_kunmap is called by AMDGPU function amdgpu_bo_kunmap >> and amdgpu_gem_prime_vunmap. >> >> Search AMDGPU for amdgpu_bo_kunmap. All matches do not kunmap for >> scratch VRAM BO. amdgpu_bo_free_kernel is a suspect but the answer is >> still NO. >> >> So all possibilities are searched. Did I miss anything? >> >> Thanks, >> Alex Bin Xie >> >> ------------------------------------------------------------------------ >> *From:* Xie, AlexBin >> *Sent:* Tuesday, April 18, 2017 2:04:33 PM >> *To:* Christian König; Zhou, David(ChunMing); >> amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org >> *Subject:* Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO >> >> Hi Christian, >> >> >> Would you point out where/when will kunmap happen for this BO when >> release? It must be somewhere in some function calls. >> >> >> I checked before I asked for review. But I did not see such obvious >> kunmap function call. >> >> >> If so, there should be a comment in function amdgpu_vram_scratch_fini >> to avoid future confusion. >> >> >> Thanks, >> Alex Bin Xie >> ------------------------------------------------------------------------ >> *From:* Christian König >> *Sent:* Tuesday, April 18, 2017 1:46 PM >> *To:* Xie, AlexBin; Zhou, David(ChunMing); amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org >> *Subject:* Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO >> Hi AlexBin, >> >> No, David is right. This is a very common coding pattern in the >> kernel module. >> >> Freeing up a BO while there still exists a kernel mapping is >> perfectly ok, the mapping will just be released a bit later on. >> >> So this code is actually perfectly ok and just an optimization, but >> your patch breaks it and creates a memory leak. >> >> Regards, >> Christian. >> >> Am 18.04.2017 um 17:17 schrieb Xie, AlexBin: >>> >>> Hi David, >>> >>> >>> When amdgpu_bo_reserve return errors, we cannot release the BO. This >>> is not a memory leak. General speaking, memory leak is unnoticed and >>> unintentional. >>> >>> >>> The caller of function amdgpu_vram_scratch_fini ignores the return >>> error value... >>> >>> >>> The "memory leak" is not caused by my patch. It is caused because >>> reserving BO fails. >>> >>> >>> This patch only aim to make function amdgpu_vram_scratch_fini behave >>> correctly. >>> >>> >>> To follow up, we can add a warning message when amdgpu_bo_reserve >>> error happens in a different patch. >>> >>> >>> If function call amdgpu_bo_reserve is changed to uninterruptible, >>> this changes driver behaviour. Without a substantial issue, I would >>> be cautious for such a change. >>> >>> >>> Thanks, >>> >>> Alex Bin Xie >>> >>> >>> ------------------------------------------------------------------------ >>> *From:* Zhou, David(ChunMing) >>> *Sent:* Monday, April 17, 2017 10:38 PM >>> *To:* Xie, AlexBin; amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org >>> *Subject:* Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO >>> >>> >>> On 2017年04月17日 22:54, Xie, AlexBin wrote: >>>> >>>> Hi David, >>>> >>>> >>>> Thanks for the comments. However, please have look at >>>> amdgpu_bo_reserve definition. >>>> >>>> static inline int amdgpu_bo_reserve(struct amdgpu_bo *bo, bool no_intr) >>>> >>> Ah, this is a wired wrapper for ttm_bo_reserve. >>> >>>> >>>> When we call this function like the following: >>>> >>>> r = amdgpu_bo_reserve(adev->vram_scratch.robj, false); >>>> The false means interruptible. >>>> >>>> >>>> On the other hand, when amdgpu_bo_reserve function return error, >>>> why do we unref BO without kunmap and unpin the BO? This is wrong >>>> implementation when amdgpu_bo_reserve return any error. >>>> >>> Yeah, I see your mean, it's in driver un-loading, How about changing >>> to no interruptible? Your patch will make a memleak if bo_reserve >>> fails, but it seems not matter. I have no strong preference. >>> >>> Regards, >>> David Zhou >>>> >>>> >>>> Thanks, >>>> Alex Bin Xie >>>> >>>> ------------------------------------------------------------------------ >>>> *From:* Zhou, David(ChunMing) >>>> *Sent:* Friday, April 14, 2017 12:00 AM >>>> *To:* Xie, AlexBin; amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org >>>> *Subject:* Re: [PATCH] dmr/amdgpu: Fix wrongly unref of BO >>>> >>>> >>>> On 2017年04月14日 05:34, Alex Xie wrote: >>>> > According to comment of amdgpu_bo_reserve, amdgpu_bo_reserve >>>> > can return with -ERESTARTSYS. When this function was interrupted >>>> > by a signal, BO should not be unref. Otherwise the BO might be >>>> > released while is kmapped and pinned, or BO MIGHT be deref >>>> > multiple times, etc. >>>> r = amdgpu_bo_reserve(adev->vram_scratch.robj, false); >>>> we have specified interruptible to false, so -ERESTARTSYS isn't >>>> possible >>>> here. >>>> >>>> Thanks, >>>> David Zhou >>>> > >>>> > Change-Id: If76071a768950a0d3ad9d5da7fcae04881807621 >>>> > Signed-off-by: Alex Xie >>>> > --- >>>> > drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 2 +- >>>> > 1 file changed, 1 insertion(+), 1 deletion(-) >>>> > >>>> > diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> > index 53996e3..1dcc2d1 100644 >>>> > --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> > +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c >>>> > @@ -355,8 +355,8 @@ static void amdgpu_vram_scratch_fini(struct >>>> amdgpu_device *adev) >>>> > amdgpu_bo_kunmap(adev->vram_scratch.robj); >>>> > amdgpu_bo_unpin(adev->vram_scratch.robj); >>>> > amdgpu_bo_unreserve(adev->vram_scratch.robj); >>>> > + amdgpu_bo_unref(&adev->vram_scratch.robj); >>>> > } >>>> > - amdgpu_bo_unref(&adev->vram_scratch.robj); >>>> > } >>>> > >>>> > /** >>>> >>> >>> >>> >>> _______________________________________________ >>> amd-gfx mailing list >>> amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org >>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx >> >> >> >> >> _______________________________________________ >> amd-gfx mailing list >> amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org >> https://lists.freedesktop.org/mailman/listinfo/amd-gfx > > > > > _______________________________________________ > amd-gfx mailing list > amd-gfx-PD4FTy7X32lNgt0PjOBp9y5qC8QIuHrW@public.gmane.org > https://lists.freedesktop.org/mailman/listinfo/amd-gfx