All of lore.kernel.org
 help / color / mirror / Atom feed
* It appears drm-next TTM cleanup broke something . . .
@ 2020-10-18 19:15 Kevin Brace
  2020-10-18 19:50 ` Dave Airlie
  2020-10-18 21:04 ` Sam Ravnborg
  0 siblings, 2 replies; 16+ messages in thread
From: Kevin Brace @ 2020-10-18 19:15 UTC (permalink / raw)
  To: dri-devel, Dave Airlie

Hi Dave,

It is a little urgent, so I am writing this right now.
As usual, I pulled in DRM repository code for an out of tree OpenChrome DRM repository a few days ago.
While going through the changes I need to make to OpenChrome DRM to compile with the latest Linux kernel, I noticed that ttm_bo_init_mm() was discontinued, and it was replaced with ttm_range_man_init().
ttm_range_man_init() has a parameter called "bool use_tt", but honestly, I do not think it is functioning correctly.
If I keep "ttm_tt_create" member of ttm_bo_driver struct null by not specifying it, TTM still tries to call it, and crashes due to a null pointer access.
The workaround I found so far is to specify the "ttm_tt_create" member by copying bo_driver_ttm_tt_create() from drm/drm_gem_vram_helper.c.
This is what the call trace looks like without specifying the "ttm_tt_create" member (i.e., this member is null).

_______________________________________________
. . .
kernel: [   34.310674] [drm:openchrome_bo_create [openchrome]] Entered openchrome_bo_create.
kernel: [   34.310697] [drm:openchrome_ttm_domain_to_placement [openchrome]] Entered openchrome_ttm_domain_to_placement.
kernel: [   34.310706] [drm:openchrome_ttm_domain_to_placement [openchrome]] Exiting openchrome_ttm_domain_to_placement.
kernel: [   34.310737] BUG: kernel NULL pointer dereference, address: 0000000000000000
kernel: [   34.310742] #PF: supervisor instruction fetch in kernel mode
kernel: [   34.310745] #PF: error_code(0x0010) - not-present page
. . .
kernel: [   34.310807] Call Trace:
kernel: [   34.310827]  ttm_tt_create+0x5f/0xa0 [ttm]
kernel: [   34.310839]  ttm_bo_validate+0xb8/0x140 [ttm]
kernel: [   34.310886]  ? drm_vma_offset_add+0x56/0x70 [drm]
kernel: [   34.310897]  ? openchrome_gem_create_ioctl+0x150/0x150 [openchrome]
. . .
_______________________________________________

The erroneous call to  "ttm_tt_create" member happens right after TTM placement is performed (openchrome_ttm_domain_to_placement()).
Currently, OpenChrome DRM's TTM implementation does not use "ttm_tt_create" member, and this arrangement worked fine until Linux 5.9's drm-next code.
It appears that Linux 5.10's drm-next code broke the code.

Regards,

Kevin Brace
Brace Computer Laboratory blog
https://bracecomputerlab.com

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: It appears drm-next TTM cleanup broke something . . .
  2020-10-18 19:15 It appears drm-next TTM cleanup broke something . . Kevin Brace
@ 2020-10-18 19:50 ` Dave Airlie
  2020-10-19  7:23   ` Kevin Brace
  2020-10-18 21:04 ` Sam Ravnborg
  1 sibling, 1 reply; 16+ messages in thread
From: Dave Airlie @ 2020-10-18 19:50 UTC (permalink / raw)
  To: Kevin Brace, Christian König; +Cc: Dave Airlie, dri-devel

On Mon, 19 Oct 2020 at 05:15, Kevin Brace <kevinbrace@gmx.com> wrote:
>
> Hi Dave,
>
> It is a little urgent, so I am writing this right now.
> As usual, I pulled in DRM repository code for an out of tree OpenChrome DRM repository a few days ago.
> While going through the changes I need to make to OpenChrome DRM to compile with the latest Linux kernel, I noticed that ttm_bo_init_mm() was discontinued, and it was replaced with ttm_range_man_init().
> ttm_range_man_init() has a parameter called "bool use_tt", but honestly, I do not think it is functioning correctly.
> If I keep "ttm_tt_create" member of ttm_bo_driver struct null by not specifying it, TTM still tries to call it, and crashes due to a null pointer access.
> The workaround I found so far is to specify the "ttm_tt_create" member by copying bo_driver_ttm_tt_create() from drm/drm_gem_vram_helper.c.
> This is what the call trace looks like without specifying the "ttm_tt_create" member (i.e., this member is null).

cc'ing Christian,

I can't remember if we did this deliberately or if just worked by
accident previously.

Either way, you should probably need a ttm_tt_create going forward.

Dave.

>
> _______________________________________________
> . . .
> kernel: [   34.310674] [drm:openchrome_bo_create [openchrome]] Entered openchrome_bo_create.
> kernel: [   34.310697] [drm:openchrome_ttm_domain_to_placement [openchrome]] Entered openchrome_ttm_domain_to_placement.
> kernel: [   34.310706] [drm:openchrome_ttm_domain_to_placement [openchrome]] Exiting openchrome_ttm_domain_to_placement.
> kernel: [   34.310737] BUG: kernel NULL pointer dereference, address: 0000000000000000
> kernel: [   34.310742] #PF: supervisor instruction fetch in kernel mode
> kernel: [   34.310745] #PF: error_code(0x0010) - not-present page
> . . .
> kernel: [   34.310807] Call Trace:
> kernel: [   34.310827]  ttm_tt_create+0x5f/0xa0 [ttm]
> kernel: [   34.310839]  ttm_bo_validate+0xb8/0x140 [ttm]
> kernel: [   34.310886]  ? drm_vma_offset_add+0x56/0x70 [drm]
> kernel: [   34.310897]  ? openchrome_gem_create_ioctl+0x150/0x150 [openchrome]
> . . .
> _______________________________________________
>
> The erroneous call to  "ttm_tt_create" member happens right after TTM placement is performed (openchrome_ttm_domain_to_placement()).
> Currently, OpenChrome DRM's TTM implementation does not use "ttm_tt_create" member, and this arrangement worked fine until Linux 5.9's drm-next code.
> It appears that Linux 5.10's drm-next code broke the code.
>
> Regards,
>
> Kevin Brace
> Brace Computer Laboratory blog
> https://bracecomputerlab.com
>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: It appears drm-next TTM cleanup broke something . . .
  2020-10-18 19:15 It appears drm-next TTM cleanup broke something . . Kevin Brace
  2020-10-18 19:50 ` Dave Airlie
@ 2020-10-18 21:04 ` Sam Ravnborg
  2020-10-19 19:43   ` Kevin Brace
  1 sibling, 1 reply; 16+ messages in thread
From: Sam Ravnborg @ 2020-10-18 21:04 UTC (permalink / raw)
  To: Kevin Brace; +Cc: Dave Airlie, dri-devel

Hi Kevin.

On Sun, Oct 18, 2020 at 09:15:17PM +0200, Kevin Brace wrote:
> As usual, I pulled in DRM repository code for an out of tree OpenChrome DRM repository a few days ago.

I know you have been working on and off on the openchrome driver for a
long time now. Any chance we will see the driver submitted for upstream soon?

	Sam
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: It appears drm-next TTM cleanup broke something . . .
  2020-10-18 19:50 ` Dave Airlie
@ 2020-10-19  7:23   ` Kevin Brace
  2020-10-19 10:13     ` Christian König
  0 siblings, 1 reply; 16+ messages in thread
From: Kevin Brace @ 2020-10-19  7:23 UTC (permalink / raw)
  To: Dave Airlie, dri-devel; +Cc: Christian König

Hi Dave,

Yeah, with the workaround I mentioned in my previous e-mail, OpenChrome DRM does not crash for "ttm_tt_create" member being null.
It is still not able to boot X Server due to some other TTM related memory allocation issue it is suffering from.
I think making huge changes to TTM during this development cycle broke OpenChrome DRM.
    Following up on the question I raised during the previous e-mail.
Shouldn't "use_tt" parameter being "false" for ttm_range_man_init() disable TTM TT functionality?
I feel like that should be the expected behavior.
Again, there is only 5 to 6 more days left until Linux 5.10-rc2, so I decided to contact you on Sunday (I consider this bug to be urgent.).
Assuming what I am asserting is correct, I think the reason why this was not discovered earlier was due to the following reasons.

1) nouveau, radeon, and amdgpu already use TTM TT functionality.
2) ast uses GEM VRAM helper that internally uses TTM. It populates "ttm_tt_create" and "ttm_tt_destroy" members, hence, the developers did not notice the breakage.
3) OpenChrome DRM is still not in the mainline tree, so no one other than myself noticed the problem until now.


Regarding the TTM TT functionality, OpenChrome DRM currently does not support acceleration, hence, I did not believe it was necessary to populate "ttm_tt_create" and "ttm_tt_destroy" members.
That implementation worked fine until the previous development cycle code.
Of course, I will eventually add support for acceleration, hence, TTM TT functionality will be utilized at some point.

Regards,

Kevin Brace
Brace Computer Laboratory blog
https://bracecomputerlab.com


> Sent: Sunday, October 18, 2020 at 12:50 PM
> From: "Dave Airlie" <airlied@gmail.com>
> To: "Kevin Brace" <kevinbrace@gmx.com>, "Christian König" <ckoenig.leichtzumerken@gmail.com>
> Cc: "dri-devel" <dri-devel@lists.freedesktop.org>, "Dave Airlie" <airlied@redhat.com>
> Subject: Re: It appears drm-next TTM cleanup broke something . . .
>
> On Mon, 19 Oct 2020 at 05:15, Kevin Brace <kevinbrace@gmx.com> wrote:
> >
> > Hi Dave,
> >
> > It is a little urgent, so I am writing this right now.
> > As usual, I pulled in DRM repository code for an out of tree OpenChrome DRM repository a few days ago.
> > While going through the changes I need to make to OpenChrome DRM to compile with the latest Linux kernel, I noticed that ttm_bo_init_mm() was discontinued, and it was replaced with ttm_range_man_init().
> > ttm_range_man_init() has a parameter called "bool use_tt", but honestly, I do not think it is functioning correctly.
> > If I keep "ttm_tt_create" member of ttm_bo_driver struct null by not specifying it, TTM still tries to call it, and crashes due to a null pointer access.
> > The workaround I found so far is to specify the "ttm_tt_create" member by copying bo_driver_ttm_tt_create() from drm/drm_gem_vram_helper.c.
> > This is what the call trace looks like without specifying the "ttm_tt_create" member (i.e., this member is null).
> 
> cc'ing Christian,
> 
> I can't remember if we did this deliberately or if just worked by
> accident previously.
> 
> Either way, you should probably need a ttm_tt_create going forward.
> 
> Dave.
> 
> >
> > _______________________________________________
> > . . .
> > kernel: [   34.310674] [drm:openchrome_bo_create [openchrome]] Entered openchrome_bo_create.
> > kernel: [   34.310697] [drm:openchrome_ttm_domain_to_placement [openchrome]] Entered openchrome_ttm_domain_to_placement.
> > kernel: [   34.310706] [drm:openchrome_ttm_domain_to_placement [openchrome]] Exiting openchrome_ttm_domain_to_placement.
> > kernel: [   34.310737] BUG: kernel NULL pointer dereference, address: 0000000000000000
> > kernel: [   34.310742] #PF: supervisor instruction fetch in kernel mode
> > kernel: [   34.310745] #PF: error_code(0x0010) - not-present page
> > . . .
> > kernel: [   34.310807] Call Trace:
> > kernel: [   34.310827]  ttm_tt_create+0x5f/0xa0 [ttm]
> > kernel: [   34.310839]  ttm_bo_validate+0xb8/0x140 [ttm]
> > kernel: [   34.310886]  ? drm_vma_offset_add+0x56/0x70 [drm]
> > kernel: [   34.310897]  ? openchrome_gem_create_ioctl+0x150/0x150 [openchrome]
> > . . .
> > _______________________________________________
> >
> > The erroneous call to  "ttm_tt_create" member happens right after TTM placement is performed (openchrome_ttm_domain_to_placement()).
> > Currently, OpenChrome DRM's TTM implementation does not use "ttm_tt_create" member, and this arrangement worked fine until Linux 5.9's drm-next code.
> > It appears that Linux 5.10's drm-next code broke the code.
> >
> > Regards,
> >
> > Kevin Brace
> > Brace Computer Laboratory blog
> > https://bracecomputerlab.com
> >
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: It appears drm-next TTM cleanup broke something . . .
  2020-10-19  7:23   ` Kevin Brace
@ 2020-10-19 10:13     ` Christian König
  2020-10-19 16:20       ` Kevin Brace
  0 siblings, 1 reply; 16+ messages in thread
From: Christian König @ 2020-10-19 10:13 UTC (permalink / raw)
  To: Kevin Brace, Dave Airlie, dri-devel

Hi Kevin,

the basic problem you are facing is that ttm_tt_create/destroy is 
mandatory (It always was). You need an implementation or otherwise you 
won't be able to use the system domain (additional to the optional GTT 
domain).

My best guess is that the difference is that we now force to initiate 
the system domain for all drivers.

If that is correct you just that you never ran into because you never 
correctly initialized TTM to support buffer moves.

I'm not sure what exactly the OpenChrome DRM driver is doing, but I 
strongly suggest to just drop TTM support completely and use the GEM 
VRAM helper layer instead.

Regards,
Christian.

Am 19.10.20 um 09:23 schrieb Kevin Brace:
> Hi Dave,
>
> Yeah, with the workaround I mentioned in my previous e-mail, OpenChrome DRM does not crash for "ttm_tt_create" member being null.
> It is still not able to boot X Server due to some other TTM related memory allocation issue it is suffering from.
> I think making huge changes to TTM during this development cycle broke OpenChrome DRM.
>      Following up on the question I raised during the previous e-mail.
> Shouldn't "use_tt" parameter being "false" for ttm_range_man_init() disable TTM TT functionality?
> I feel like that should be the expected behavior.
> Again, there is only 5 to 6 more days left until Linux 5.10-rc2, so I decided to contact you on Sunday (I consider this bug to be urgent.).
> Assuming what I am asserting is correct, I think the reason why this was not discovered earlier was due to the following reasons.
>
> 1) nouveau, radeon, and amdgpu already use TTM TT functionality.
> 2) ast uses GEM VRAM helper that internally uses TTM. It populates "ttm_tt_create" and "ttm_tt_destroy" members, hence, the developers did not notice the breakage.
> 3) OpenChrome DRM is still not in the mainline tree, so no one other than myself noticed the problem until now.
>
>
> Regarding the TTM TT functionality, OpenChrome DRM currently does not support acceleration, hence, I did not believe it was necessary to populate "ttm_tt_create" and "ttm_tt_destroy" members.
> That implementation worked fine until the previous development cycle code.
> Of course, I will eventually add support for acceleration, hence, TTM TT functionality will be utilized at some point.
>
> Regards,
>
> Kevin Brace
> Brace Computer Laboratory blog
> https://bracecomputerlab.com
>
>
>> Sent: Sunday, October 18, 2020 at 12:50 PM
>> From: "Dave Airlie" <airlied@gmail.com>
>> To: "Kevin Brace" <kevinbrace@gmx.com>, "Christian König" <ckoenig.leichtzumerken@gmail.com>
>> Cc: "dri-devel" <dri-devel@lists.freedesktop.org>, "Dave Airlie" <airlied@redhat.com>
>> Subject: Re: It appears drm-next TTM cleanup broke something . . .
>>
>> On Mon, 19 Oct 2020 at 05:15, Kevin Brace <kevinbrace@gmx.com> wrote:
>>> Hi Dave,
>>>
>>> It is a little urgent, so I am writing this right now.
>>> As usual, I pulled in DRM repository code for an out of tree OpenChrome DRM repository a few days ago.
>>> While going through the changes I need to make to OpenChrome DRM to compile with the latest Linux kernel, I noticed that ttm_bo_init_mm() was discontinued, and it was replaced with ttm_range_man_init().
>>> ttm_range_man_init() has a parameter called "bool use_tt", but honestly, I do not think it is functioning correctly.
>>> If I keep "ttm_tt_create" member of ttm_bo_driver struct null by not specifying it, TTM still tries to call it, and crashes due to a null pointer access.
>>> The workaround I found so far is to specify the "ttm_tt_create" member by copying bo_driver_ttm_tt_create() from drm/drm_gem_vram_helper.c.
>>> This is what the call trace looks like without specifying the "ttm_tt_create" member (i.e., this member is null).
>> cc'ing Christian,
>>
>> I can't remember if we did this deliberately or if just worked by
>> accident previously.
>>
>> Either way, you should probably need a ttm_tt_create going forward.
>>
>> Dave.
>>
>>> _______________________________________________
>>> . . .
>>> kernel: [   34.310674] [drm:openchrome_bo_create [openchrome]] Entered openchrome_bo_create.
>>> kernel: [   34.310697] [drm:openchrome_ttm_domain_to_placement [openchrome]] Entered openchrome_ttm_domain_to_placement.
>>> kernel: [   34.310706] [drm:openchrome_ttm_domain_to_placement [openchrome]] Exiting openchrome_ttm_domain_to_placement.
>>> kernel: [   34.310737] BUG: kernel NULL pointer dereference, address: 0000000000000000
>>> kernel: [   34.310742] #PF: supervisor instruction fetch in kernel mode
>>> kernel: [   34.310745] #PF: error_code(0x0010) - not-present page
>>> . . .
>>> kernel: [   34.310807] Call Trace:
>>> kernel: [   34.310827]  ttm_tt_create+0x5f/0xa0 [ttm]
>>> kernel: [   34.310839]  ttm_bo_validate+0xb8/0x140 [ttm]
>>> kernel: [   34.310886]  ? drm_vma_offset_add+0x56/0x70 [drm]
>>> kernel: [   34.310897]  ? openchrome_gem_create_ioctl+0x150/0x150 [openchrome]
>>> . . .
>>> _______________________________________________
>>>
>>> The erroneous call to  "ttm_tt_create" member happens right after TTM placement is performed (openchrome_ttm_domain_to_placement()).
>>> Currently, OpenChrome DRM's TTM implementation does not use "ttm_tt_create" member, and this arrangement worked fine until Linux 5.9's drm-next code.
>>> It appears that Linux 5.10's drm-next code broke the code.
>>>
>>> Regards,
>>>
>>> Kevin Brace
>>> Brace Computer Laboratory blog
>>> https://bracecomputerlab.com
>>>
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: It appears drm-next TTM cleanup broke something . . .
  2020-10-19 10:13     ` Christian König
@ 2020-10-19 16:20       ` Kevin Brace
  2020-10-19 16:37         ` Christian König
  0 siblings, 1 reply; 16+ messages in thread
From: Kevin Brace @ 2020-10-19 16:20 UTC (permalink / raw)
  To: christian.koenig; +Cc: dri-devel

Hi Christian,

I looked into a few more things, and figured out why OpenChrome DRM was not booting X Server.
Now the situation is under control in my side of the world (OpenChrome development world), and I got X Server working again with drm-next 5.10 code and OpenChrome DRM.
Code will be committed into OpenChrome DRM upstream repository's drm-next-5.10 branch in the next few days.
    I can see that OpenChrome DRM "just happened to" work without ttm_tt_create/destroy callbacks being specified.
I can fix this issue easily.
The other issue I was facing was that commit 48e07c23cbeba2a2cda7ca73be0015e727818536 (drm/ttm: nuke memory type flags) discontinued TTM_PL_FLAG_* flags.
This caused incompatibility between OpenChrome DDX and the updated OpenChrome DRM that incorporated the changes made with commit 48e07c2.
OpenChrome DDX was sending TTM_PL_FLAG_* based flags to OpenChrome DRM.
I changed OpenChrome DDX code to use TTM_PL_* instead for allocating memory via TTM, and OpenChrome DRM based Linux kernel was able to boot X Server properly.
Again, the code change to the DDX will happen in a few days.
    Before I moved on from this issue, I will like to ask for one request regarding TTM and its callbacks.
Is it too much to ask for using more BUG_ON null pointer assertions for TTM callbacks?
Although I did not bring it up with you back then, I did get hit with an issue similar to this 3 years ago.

https://cgit.freedesktop.org/openchrome/drm-openchrome/commit/?h=drm-next-4.13&id=3c08ec601bb1ccd5ff58a9101317b728aa7204bd
https://cgit.freedesktop.org/openchrome/drm-openchrome/commit/?h=drm-next-4.13&id=2064175f977d9859831be653df16e3ea10415a8a


I did not send you an e-mail about it, I do recall complaining to Alex Deucher about the above io_mem_pfn null pointer bug at XDC2017 (I did attend it, and I presented about the state of OpenChrome Project.).
I brought this up with Alex because I stuck for more than 2 weeks to figure out the issue (the commit for overcoming io_mem_pfn callback issue was made on 2017-09-13, but the commit prior to that was on 2017-08-28).

https://cgit.freedesktop.org/openchrome/drm-openchrome/log/?h=drm-next-4.13&ofs=50


Anyway, someone else got hit by the same issue a month or two later

https://lists.freedesktop.org/archives/dri-devel/2017-November/159002.html
https://lists.freedesktop.org/archives/dri-devel/2017-December/161168.html


These are the commits that fixed the issue.

https://cgit.freedesktop.org/drm/drm/commit/?id=ea642c3216cb2a60d1c0e760ae47ee85c9c16447
https://cgit.freedesktop.org/drm/drm/commit/?id=c67fa6edc8b11afe22c88a23963170bf5f151acf


If you compare the commit date, I figured it out io_mem_pfn null pointer bug 2 to 3 months earlier.
I think the problem we got with the latest code change to the TTM is, many TTM callbacks do not check for a null pointer for mandatory callbacks.
Also, the ttm_bo_driver struct comment section inside ttm_bo_driver.h does not make it 100% clear that what is mandatory and what is not.
If these were done, I am sure I would have left ttm_tt_create/destroy callbacks intact.

https://cgit.freedesktop.org/openchrome/drm-openchrome/commit/?h=drm-next-5.10&id=64947142a1cf8b83faa73da7aa35a17f6a24568a
(scroll down to openchrome/openchrome_ttm.c)


    Regarding your suggestion that I should abandon OpenChrome DRM's current GEM / TTM implementation and instead I should use the GEM VRAM helpers, since I was able to figure out the issues with some suggestions from you and Dave, I do not think it is worth switching to GEM VRAM helpers at this point.
Obviously, the current implementation does not support acceleration, hence, it appears to be a good candidate for switching to GEM VRAM helpers.
While that may be true, I do plan to implement acceleration later, and this is why I do not want to settle with the GEM VRAM helpers.

Regards,

Kevin Brace
Brace Computer Laboratory blog
https://bracecomputerlab.com


> Sent: Monday, October 19, 2020 at 3:13 AM
> From: "Christian König" <ckoenig.leichtzumerken@gmail.com>
> To: "Kevin Brace" <kevinbrace@gmx.com>, "Dave Airlie" <airlied@gmail.com>, "dri-devel" <dri-devel@lists.freedesktop.org>
> Subject: Re: It appears drm-next TTM cleanup broke something . . .
>
> Hi Kevin,
> 
> the basic problem you are facing is that ttm_tt_create/destroy is 
> mandatory (It always was). You need an implementation or otherwise you 
> won't be able to use the system domain (additional to the optional GTT 
> domain).
> 
> My best guess is that the difference is that we now force to initiate 
> the system domain for all drivers.
> 
> If that is correct you just that you never ran into because you never 
> correctly initialized TTM to support buffer moves.
> 
> I'm not sure what exactly the OpenChrome DRM driver is doing, but I 
> strongly suggest to just drop TTM support completely and use the GEM 
> VRAM helper layer instead.
> 
> Regards,
> Christian.
> 
> Am 19.10.20 um 09:23 schrieb Kevin Brace:
> > Hi Dave,
> >
> > Yeah, with the workaround I mentioned in my previous e-mail, OpenChrome DRM does not crash for "ttm_tt_create" member being null.
> > It is still not able to boot X Server due to some other TTM related memory allocation issue it is suffering from.
> > I think making huge changes to TTM during this development cycle broke OpenChrome DRM.
> >      Following up on the question I raised during the previous e-mail.
> > Shouldn't "use_tt" parameter being "false" for ttm_range_man_init() disable TTM TT functionality?
> > I feel like that should be the expected behavior.
> > Again, there is only 5 to 6 more days left until Linux 5.10-rc2, so I decided to contact you on Sunday (I consider this bug to be urgent.).
> > Assuming what I am asserting is correct, I think the reason why this was not discovered earlier was due to the following reasons.
> >
> > 1) nouveau, radeon, and amdgpu already use TTM TT functionality.
> > 2) ast uses GEM VRAM helper that internally uses TTM. It populates "ttm_tt_create" and "ttm_tt_destroy" members, hence, the developers did not notice the breakage.
> > 3) OpenChrome DRM is still not in the mainline tree, so no one other than myself noticed the problem until now.
> >
> >
> > Regarding the TTM TT functionality, OpenChrome DRM currently does not support acceleration, hence, I did not believe it was necessary to populate "ttm_tt_create" and "ttm_tt_destroy" members.
> > That implementation worked fine until the previous development cycle code.
> > Of course, I will eventually add support for acceleration, hence, TTM TT functionality will be utilized at some point.
> >
> > Regards,
> >
> > Kevin Brace
> > Brace Computer Laboratory blog
> > https://bracecomputerlab.com
> >
> >
> >> Sent: Sunday, October 18, 2020 at 12:50 PM
> >> From: "Dave Airlie" <airlied@gmail.com>
> >> To: "Kevin Brace" <kevinbrace@gmx.com>, "Christian König" <ckoenig.leichtzumerken@gmail.com>
> >> Cc: "dri-devel" <dri-devel@lists.freedesktop.org>, "Dave Airlie" <airlied@redhat.com>
> >> Subject: Re: It appears drm-next TTM cleanup broke something . . .
> >>
> >> On Mon, 19 Oct 2020 at 05:15, Kevin Brace <kevinbrace@gmx.com> wrote:
> >>> Hi Dave,
> >>>
> >>> It is a little urgent, so I am writing this right now.
> >>> As usual, I pulled in DRM repository code for an out of tree OpenChrome DRM repository a few days ago.
> >>> While going through the changes I need to make to OpenChrome DRM to compile with the latest Linux kernel, I noticed that ttm_bo_init_mm() was discontinued, and it was replaced with ttm_range_man_init().
> >>> ttm_range_man_init() has a parameter called "bool use_tt", but honestly, I do not think it is functioning correctly.
> >>> If I keep "ttm_tt_create" member of ttm_bo_driver struct null by not specifying it, TTM still tries to call it, and crashes due to a null pointer access.
> >>> The workaround I found so far is to specify the "ttm_tt_create" member by copying bo_driver_ttm_tt_create() from drm/drm_gem_vram_helper.c.
> >>> This is what the call trace looks like without specifying the "ttm_tt_create" member (i.e., this member is null).
> >> cc'ing Christian,
> >>
> >> I can't remember if we did this deliberately or if just worked by
> >> accident previously.
> >>
> >> Either way, you should probably need a ttm_tt_create going forward.
> >>
> >> Dave.
> >>
> >>> _______________________________________________
> >>> . . .
> >>> kernel: [   34.310674] [drm:openchrome_bo_create [openchrome]] Entered openchrome_bo_create.
> >>> kernel: [   34.310697] [drm:openchrome_ttm_domain_to_placement [openchrome]] Entered openchrome_ttm_domain_to_placement.
> >>> kernel: [   34.310706] [drm:openchrome_ttm_domain_to_placement [openchrome]] Exiting openchrome_ttm_domain_to_placement.
> >>> kernel: [   34.310737] BUG: kernel NULL pointer dereference, address: 0000000000000000
> >>> kernel: [   34.310742] #PF: supervisor instruction fetch in kernel mode
> >>> kernel: [   34.310745] #PF: error_code(0x0010) - not-present page
> >>> . . .
> >>> kernel: [   34.310807] Call Trace:
> >>> kernel: [   34.310827]  ttm_tt_create+0x5f/0xa0 [ttm]
> >>> kernel: [   34.310839]  ttm_bo_validate+0xb8/0x140 [ttm]
> >>> kernel: [   34.310886]  ? drm_vma_offset_add+0x56/0x70 [drm]
> >>> kernel: [   34.310897]  ? openchrome_gem_create_ioctl+0x150/0x150 [openchrome]
> >>> . . .
> >>> _______________________________________________
> >>>
> >>> The erroneous call to  "ttm_tt_create" member happens right after TTM placement is performed (openchrome_ttm_domain_to_placement()).
> >>> Currently, OpenChrome DRM's TTM implementation does not use "ttm_tt_create" member, and this arrangement worked fine until Linux 5.9's drm-next code.
> >>> It appears that Linux 5.10's drm-next code broke the code.
> >>>
> >>> Regards,
> >>>
> >>> Kevin Brace
> >>> Brace Computer Laboratory blog
> >>> https://bracecomputerlab.com
> >>>
> >>> _______________________________________________
> >>> dri-devel mailing list
> >>> dri-devel@lists.freedesktop.org
> >>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: It appears drm-next TTM cleanup broke something . . .
  2020-10-19 16:20       ` Kevin Brace
@ 2020-10-19 16:37         ` Christian König
  0 siblings, 0 replies; 16+ messages in thread
From: Christian König @ 2020-10-19 16:37 UTC (permalink / raw)
  To: Kevin Brace; +Cc: dri-devel

Hi Kevin,

> OpenChrome DDX was sending TTM_PL_FLAG_* based flags to OpenChrome DRM.

Ugh, that would be an absolute no-go for upstreaming.

> Is it too much to ask for using more BUG_ON null pointer assertions for TTM callbacks?

I don't think that this is useful at all. See a BUG_ON() has the same 
result as a NULL pointer dereference.

> While that may be true, I do plan to implement acceleration later, and this is why I do not want to settle with the GEM VRAM helpers.

TTM is quite a mess and the effort here is essential to clean it up and 
kill most of the driver specific workarounds we have in there.

As the maintainer of all of this I would probably reject any newly added 
driver which is using the layer directly instead of the VRAM GEM wrapper.

Regards,
Christian.

Am 19.10.20 um 18:20 schrieb Kevin Brace:
> Hi Christian,
>
> I looked into a few more things, and figured out why OpenChrome DRM was not booting X Server.
> Now the situation is under control in my side of the world (OpenChrome development world), and I got X Server working again with drm-next 5.10 code and OpenChrome DRM.
> Code will be committed into OpenChrome DRM upstream repository's drm-next-5.10 branch in the next few days.
>      I can see that OpenChrome DRM "just happened to" work without ttm_tt_create/destroy callbacks being specified.
> I can fix this issue easily.
> The other issue I was facing was that commit 48e07c23cbeba2a2cda7ca73be0015e727818536 (drm/ttm: nuke memory type flags) discontinued TTM_PL_FLAG_* flags.
> This caused incompatibility between OpenChrome DDX and the updated OpenChrome DRM that incorporated the changes made with commit 48e07c2.
> OpenChrome DDX was sending TTM_PL_FLAG_* based flags to OpenChrome DRM.
> I changed OpenChrome DDX code to use TTM_PL_* instead for allocating memory via TTM, and OpenChrome DRM based Linux kernel was able to boot X Server properly.
> Again, the code change to the DDX will happen in a few days.
>      Before I moved on from this issue, I will like to ask for one request regarding TTM and its callbacks.
> Is it too much to ask for using more BUG_ON null pointer assertions for TTM callbacks?
> Although I did not bring it up with you back then, I did get hit with an issue similar to this 3 years ago.
>
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcgit.freedesktop.org%2Fopenchrome%2Fdrm-openchrome%2Fcommit%2F%3Fh%3Ddrm-next-4.13%26id%3D3c08ec601bb1ccd5ff58a9101317b728aa7204bd&amp;data=02%7C01%7Cchristian.koenig%40amd.com%7C053fff6809b14e2976d408d8744aee28%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637387212462965991&amp;sdata=wmGsqFinBsTLJSgp6G5ygoW7J%2Fppph4CzgO9q7q4h34%3D&amp;reserved=0
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcgit.freedesktop.org%2Fopenchrome%2Fdrm-openchrome%2Fcommit%2F%3Fh%3Ddrm-next-4.13%26id%3D2064175f977d9859831be653df16e3ea10415a8a&amp;data=02%7C01%7Cchristian.koenig%40amd.com%7C053fff6809b14e2976d408d8744aee28%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637387212462965991&amp;sdata=1Pl3PZ%2FY8P2aFE9in3oYCgl1CW6Nq%2FIQiHe75U8Dp0I%3D&amp;reserved=0
>
>
> I did not send you an e-mail about it, I do recall complaining to Alex Deucher about the above io_mem_pfn null pointer bug at XDC2017 (I did attend it, and I presented about the state of OpenChrome Project.).
> I brought this up with Alex because I stuck for more than 2 weeks to figure out the issue (the commit for overcoming io_mem_pfn callback issue was made on 2017-09-13, but the commit prior to that was on 2017-08-28).
>
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcgit.freedesktop.org%2Fopenchrome%2Fdrm-openchrome%2Flog%2F%3Fh%3Ddrm-next-4.13%26ofs%3D50&amp;data=02%7C01%7Cchristian.koenig%40amd.com%7C053fff6809b14e2976d408d8744aee28%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637387212462965991&amp;sdata=zsLWqjAPDFhokHptCAHl4XRCZJSjcWsnvAimoUyb7Mk%3D&amp;reserved=0
>
>
> Anyway, someone else got hit by the same issue a month or two later
>
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Farchives%2Fdri-devel%2F2017-November%2F159002.html&amp;data=02%7C01%7Cchristian.koenig%40amd.com%7C053fff6809b14e2976d408d8744aee28%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637387212462965991&amp;sdata=wbkUKUL0XVOviw%2Bvt0WFGvSyJiOF%2Bdz%2FZVSMSPTwuwI%3D&amp;reserved=0
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Farchives%2Fdri-devel%2F2017-December%2F161168.html&amp;data=02%7C01%7Cchristian.koenig%40amd.com%7C053fff6809b14e2976d408d8744aee28%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637387212462965991&amp;sdata=nJ8n0lFRTX%2FoxPsKCvV%2FYvtUvmvzxLGVikUWtBsZy5U%3D&amp;reserved=0
>
>
> These are the commits that fixed the issue.
>
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcgit.freedesktop.org%2Fdrm%2Fdrm%2Fcommit%2F%3Fid%3Dea642c3216cb2a60d1c0e760ae47ee85c9c16447&amp;data=02%7C01%7Cchristian.koenig%40amd.com%7C053fff6809b14e2976d408d8744aee28%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637387212462965991&amp;sdata=aL1C28MKl5SPRsP%2F9A6XWFMmtXmxymmGic3vLZe5%2FNw%3D&amp;reserved=0
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcgit.freedesktop.org%2Fdrm%2Fdrm%2Fcommit%2F%3Fid%3Dc67fa6edc8b11afe22c88a23963170bf5f151acf&amp;data=02%7C01%7Cchristian.koenig%40amd.com%7C053fff6809b14e2976d408d8744aee28%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637387212462965991&amp;sdata=%2BY1z8sgjIf%2FPiJG9PBIOnsHuRCFq1uHNSVJodWQz%2B5k%3D&amp;reserved=0
>
>
> If you compare the commit date, I figured it out io_mem_pfn null pointer bug 2 to 3 months earlier.
> I think the problem we got with the latest code change to the TTM is, many TTM callbacks do not check for a null pointer for mandatory callbacks.
> Also, the ttm_bo_driver struct comment section inside ttm_bo_driver.h does not make it 100% clear that what is mandatory and what is not.
> If these were done, I am sure I would have left ttm_tt_create/destroy callbacks intact.
>
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fcgit.freedesktop.org%2Fopenchrome%2Fdrm-openchrome%2Fcommit%2F%3Fh%3Ddrm-next-5.10%26id%3D64947142a1cf8b83faa73da7aa35a17f6a24568a&amp;data=02%7C01%7Cchristian.koenig%40amd.com%7C053fff6809b14e2976d408d8744aee28%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637387212462965991&amp;sdata=egysIQ%2BgxKi3ujKmzyRNRvI93SY%2BFI5dlW%2FvyGglAn8%3D&amp;reserved=0
> (scroll down to openchrome/openchrome_ttm.c)
>
>
>      Regarding your suggestion that I should abandon OpenChrome DRM's current GEM / TTM implementation and instead I should use the GEM VRAM helpers, since I was able to figure out the issues with some suggestions from you and Dave, I do not think it is worth switching to GEM VRAM helpers at this point.
> Obviously, the current implementation does not support acceleration, hence, it appears to be a good candidate for switching to GEM VRAM helpers.
> While that may be true, I do plan to implement acceleration later, and this is why I do not want to settle with the GEM VRAM helpers.
>
> Regards,
>
> Kevin Brace
> Brace Computer Laboratory blog
> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbracecomputerlab.com%2F&amp;data=02%7C01%7Cchristian.koenig%40amd.com%7C053fff6809b14e2976d408d8744aee28%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637387212462965991&amp;sdata=a11wNdmtVlgNBWHuBnxT%2BhCZRC%2BCtZw2xrVebNylSLU%3D&amp;reserved=0
>
>
>> Sent: Monday, October 19, 2020 at 3:13 AM
>> From: "Christian König" <ckoenig.leichtzumerken@gmail.com>
>> To: "Kevin Brace" <kevinbrace@gmx.com>, "Dave Airlie" <airlied@gmail.com>, "dri-devel" <dri-devel@lists.freedesktop.org>
>> Subject: Re: It appears drm-next TTM cleanup broke something . . .
>>
>> Hi Kevin,
>>
>> the basic problem you are facing is that ttm_tt_create/destroy is
>> mandatory (It always was). You need an implementation or otherwise you
>> won't be able to use the system domain (additional to the optional GTT
>> domain).
>>
>> My best guess is that the difference is that we now force to initiate
>> the system domain for all drivers.
>>
>> If that is correct you just that you never ran into because you never
>> correctly initialized TTM to support buffer moves.
>>
>> I'm not sure what exactly the OpenChrome DRM driver is doing, but I
>> strongly suggest to just drop TTM support completely and use the GEM
>> VRAM helper layer instead.
>>
>> Regards,
>> Christian.
>>
>> Am 19.10.20 um 09:23 schrieb Kevin Brace:
>>> Hi Dave,
>>>
>>> Yeah, with the workaround I mentioned in my previous e-mail, OpenChrome DRM does not crash for "ttm_tt_create" member being null.
>>> It is still not able to boot X Server due to some other TTM related memory allocation issue it is suffering from.
>>> I think making huge changes to TTM during this development cycle broke OpenChrome DRM.
>>>       Following up on the question I raised during the previous e-mail.
>>> Shouldn't "use_tt" parameter being "false" for ttm_range_man_init() disable TTM TT functionality?
>>> I feel like that should be the expected behavior.
>>> Again, there is only 5 to 6 more days left until Linux 5.10-rc2, so I decided to contact you on Sunday (I consider this bug to be urgent.).
>>> Assuming what I am asserting is correct, I think the reason why this was not discovered earlier was due to the following reasons.
>>>
>>> 1) nouveau, radeon, and amdgpu already use TTM TT functionality.
>>> 2) ast uses GEM VRAM helper that internally uses TTM. It populates "ttm_tt_create" and "ttm_tt_destroy" members, hence, the developers did not notice the breakage.
>>> 3) OpenChrome DRM is still not in the mainline tree, so no one other than myself noticed the problem until now.
>>>
>>>
>>> Regarding the TTM TT functionality, OpenChrome DRM currently does not support acceleration, hence, I did not believe it was necessary to populate "ttm_tt_create" and "ttm_tt_destroy" members.
>>> That implementation worked fine until the previous development cycle code.
>>> Of course, I will eventually add support for acceleration, hence, TTM TT functionality will be utilized at some point.
>>>
>>> Regards,
>>>
>>> Kevin Brace
>>> Brace Computer Laboratory blog
>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbracecomputerlab.com%2F&amp;data=02%7C01%7Cchristian.koenig%40amd.com%7C053fff6809b14e2976d408d8744aee28%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637387212462965991&amp;sdata=a11wNdmtVlgNBWHuBnxT%2BhCZRC%2BCtZw2xrVebNylSLU%3D&amp;reserved=0
>>>
>>>
>>>> Sent: Sunday, October 18, 2020 at 12:50 PM
>>>> From: "Dave Airlie" <airlied@gmail.com>
>>>> To: "Kevin Brace" <kevinbrace@gmx.com>, "Christian König" <ckoenig.leichtzumerken@gmail.com>
>>>> Cc: "dri-devel" <dri-devel@lists.freedesktop.org>, "Dave Airlie" <airlied@redhat.com>
>>>> Subject: Re: It appears drm-next TTM cleanup broke something . . .
>>>>
>>>> On Mon, 19 Oct 2020 at 05:15, Kevin Brace <kevinbrace@gmx.com> wrote:
>>>>> Hi Dave,
>>>>>
>>>>> It is a little urgent, so I am writing this right now.
>>>>> As usual, I pulled in DRM repository code for an out of tree OpenChrome DRM repository a few days ago.
>>>>> While going through the changes I need to make to OpenChrome DRM to compile with the latest Linux kernel, I noticed that ttm_bo_init_mm() was discontinued, and it was replaced with ttm_range_man_init().
>>>>> ttm_range_man_init() has a parameter called "bool use_tt", but honestly, I do not think it is functioning correctly.
>>>>> If I keep "ttm_tt_create" member of ttm_bo_driver struct null by not specifying it, TTM still tries to call it, and crashes due to a null pointer access.
>>>>> The workaround I found so far is to specify the "ttm_tt_create" member by copying bo_driver_ttm_tt_create() from drm/drm_gem_vram_helper.c.
>>>>> This is what the call trace looks like without specifying the "ttm_tt_create" member (i.e., this member is null).
>>>> cc'ing Christian,
>>>>
>>>> I can't remember if we did this deliberately or if just worked by
>>>> accident previously.
>>>>
>>>> Either way, you should probably need a ttm_tt_create going forward.
>>>>
>>>> Dave.
>>>>
>>>>> _______________________________________________
>>>>> . . .
>>>>> kernel: [   34.310674] [drm:openchrome_bo_create [openchrome]] Entered openchrome_bo_create.
>>>>> kernel: [   34.310697] [drm:openchrome_ttm_domain_to_placement [openchrome]] Entered openchrome_ttm_domain_to_placement.
>>>>> kernel: [   34.310706] [drm:openchrome_ttm_domain_to_placement [openchrome]] Exiting openchrome_ttm_domain_to_placement.
>>>>> kernel: [   34.310737] BUG: kernel NULL pointer dereference, address: 0000000000000000
>>>>> kernel: [   34.310742] #PF: supervisor instruction fetch in kernel mode
>>>>> kernel: [   34.310745] #PF: error_code(0x0010) - not-present page
>>>>> . . .
>>>>> kernel: [   34.310807] Call Trace:
>>>>> kernel: [   34.310827]  ttm_tt_create+0x5f/0xa0 [ttm]
>>>>> kernel: [   34.310839]  ttm_bo_validate+0xb8/0x140 [ttm]
>>>>> kernel: [   34.310886]  ? drm_vma_offset_add+0x56/0x70 [drm]
>>>>> kernel: [   34.310897]  ? openchrome_gem_create_ioctl+0x150/0x150 [openchrome]
>>>>> . . .
>>>>> _______________________________________________
>>>>>
>>>>> The erroneous call to  "ttm_tt_create" member happens right after TTM placement is performed (openchrome_ttm_domain_to_placement()).
>>>>> Currently, OpenChrome DRM's TTM implementation does not use "ttm_tt_create" member, and this arrangement worked fine until Linux 5.9's drm-next code.
>>>>> It appears that Linux 5.10's drm-next code broke the code.
>>>>>
>>>>> Regards,
>>>>>
>>>>> Kevin Brace
>>>>> Brace Computer Laboratory blog
>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fbracecomputerlab.com%2F&amp;data=02%7C01%7Cchristian.koenig%40amd.com%7C053fff6809b14e2976d408d8744aee28%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637387212462975986&amp;sdata=oQr07H953zWryuBQJyPgFsqtOkXAtfprsxiA4ny%2F4LE%3D&amp;reserved=0
>>>>>
>>>>> _______________________________________________
>>>>> dri-devel mailing list
>>>>> dri-devel@lists.freedesktop.org
>>>>> https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Flists.freedesktop.org%2Fmailman%2Flistinfo%2Fdri-devel&amp;data=02%7C01%7Cchristian.koenig%40amd.com%7C053fff6809b14e2976d408d8744aee28%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C637387212462975986&amp;sdata=gq2l57u0hZD2arzcWU2ppTuA0mgcshZHI%2BVvHYJ8hcs%3D&amp;reserved=0
>>

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: It appears drm-next TTM cleanup broke something . . .
  2020-10-18 21:04 ` Sam Ravnborg
@ 2020-10-19 19:43   ` Kevin Brace
  2020-10-19 20:28     ` Sam Ravnborg
                       ` (2 more replies)
  0 siblings, 3 replies; 16+ messages in thread
From: Kevin Brace @ 2020-10-19 19:43 UTC (permalink / raw)
  To: Sam Ravnborg; +Cc: Dave Airlie, dri-devel

Hi Sam,

Thanks for asking the question.
The current OpenChrome DRM code has these two major issues.

1) It does not support atomic modesetting

I do internally have working code to support atomic modesetting, but it is not ready for committing into the upstream OpenChrome DRM repository.
In particular, it suffers from a freeze relating to a cursor plane.
The freeze is a bad kind that kern.log does not really tell me what is wrong.
If I disable hardware cursor, the atomic modesetting based OpenChrome DRM appears to work okay.
In other words, I am getting close to getting atomic modesetting working, but I am stuck.


2) Double allocation of visible portion of frame buffer

This is a big problem left behind from the previous developer who developed OpenChrome prior to me.
For some reason, the developer wanted to allocate visible portion of the frame buffer to be the maximum possible size supported by the detected monitor when initializing the frame buffer inside OpenChrome DRM code.
I believe Radeon DRM does something similar to that.
The problem is, OpenChrome DDX allocates an equal sized frame buffer visible portion during the DDX's initialization.
This means that we got two same sized visible portions allocated, but OpenChrome DDX and OpenChrome DRM combined should really be allocating only one.
At this point, OpenChrome is not supporting double buffering.
This double allocation of a visible portion of the frame buffer contributes to a X Server crash when the screen is resized and 16 MB or less (i.e., 8 MB) shared frame buffer is reserved by the system via BIOS setup.
I personally think letting OpenChrome DRM allocate the visible portion of the frame buffer is the way to go, but if so, how do I get the DDX or shadow FB to access the frame buffer visible portion allocated by OpenChrome DRM?
Any suggestions on what to do about this issue will be greatly appreciated.
Perhaps, I should post a question to dri-devel regarding this issue.
I really do not know what I should do at this point.

Regards,

Kevin Brace
Brace Computer Laboratory blog
https://bracecomputerlab.com


> Sent: Sunday, October 18, 2020 at 2:04 PM
> From: "Sam Ravnborg" <sam@ravnborg.org>
> To: "Kevin Brace" <kevinbrace@gmx.com>
> Cc: dri-devel@lists.freedesktop.org, "Dave Airlie" <airlied@redhat.com>
> Subject: Re: It appears drm-next TTM cleanup broke something . . .
>
> Hi Kevin.
>
> On Sun, Oct 18, 2020 at 09:15:17PM +0200, Kevin Brace wrote:
> > As usual, I pulled in DRM repository code for an out of tree OpenChrome DRM repository a few days ago.
>
> I know you have been working on and off on the openchrome driver for a
> long time now. Any chance we will see the driver submitted for upstream soon?
>
> 	Sam
>
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: It appears drm-next TTM cleanup broke something . . .
  2020-10-19 19:43   ` Kevin Brace
@ 2020-10-19 20:28     ` Sam Ravnborg
  2020-10-20  6:45       ` Thomas Zimmermann
  2020-10-20 17:17     ` Alex Deucher
  2020-10-21  8:03     ` Thomas Zimmermann
  2 siblings, 1 reply; 16+ messages in thread
From: Sam Ravnborg @ 2020-10-19 20:28 UTC (permalink / raw)
  To: Kevin Brace; +Cc: Dave Airlie, dri-devel

Hi Kevin.

On Mon, Oct 19, 2020 at 09:43:08PM +0200, Kevin Brace wrote:
> Hi Sam,
> 
> Thanks for asking the question.
> The current OpenChrome DRM code has these two major issues.
> 
> 1) It does not support atomic modesetting
> 
> I do internally have working code to support atomic modesetting, but it is not ready for committing into the upstream OpenChrome DRM repository.
> In particular, it suffers from a freeze relating to a cursor plane.
> The freeze is a bad kind that kern.log does not really tell me what is wrong.
> If I disable hardware cursor, the atomic modesetting based OpenChrome DRM appears to work okay.
> In other words, I am getting close to getting atomic modesetting working, but I am stuck.
Maybe posting what you have now - and explain that it has this defect.
Chances are that you will receive feedback that may help you on your way
to fix this.

With all the infrastructure improvements made the last years I would be
suprised if you have managed to include it all and maybe some of the
infrastructure may help you.

Also I know we have seems some cursor plane related discussions the last
months so maybe there are something to gain from the people involved
there.


> 2) Double allocation of visible portion of frame buffer
> 
> This is a big problem left behind from the previous developer who developed OpenChrome prior to me.
> For some reason, the developer wanted to allocate visible portion of the frame buffer to be the maximum possible size supported by the detected monitor when initializing the frame buffer inside OpenChrome DRM code.
> I believe Radeon DRM does something similar to that.
> The problem is, OpenChrome DDX allocates an equal sized frame buffer visible portion during the DDX's initialization.
> This means that we got two same sized visible portions allocated, but OpenChrome DDX and OpenChrome DRM combined should really be allocating only one.
> At this point, OpenChrome is not supporting double buffering.
> This double allocation of a visible portion of the frame buffer contributes to a X Server crash when the screen is resized and 16 MB or less (i.e., 8 MB) shared frame buffer is reserved by the system via BIOS setup.
> I personally think letting OpenChrome DRM allocate the visible portion of the frame buffer is the way to go, but if so, how do I get the DDX or shadow FB to access the frame buffer visible portion allocated by OpenChrome DRM?
> Any suggestions on what to do about this issue will be greatly appreciated.
> Perhaps, I should post a question to dri-devel regarding this issue.
> I really do not know what I should do at this point.
Likewise.

But obviously you shall not post it to dri-devel unless you are prepared
to handle the feedback that you *may* get.

I promise to take a look - but that will cover mostly trivial stuff.
You have to rely on others for all the stuff around atomic modestetting
and the memory handling etc. - the areas where you have challenges now.

	Sam
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: It appears drm-next TTM cleanup broke something . . .
  2020-10-19 20:28     ` Sam Ravnborg
@ 2020-10-20  6:45       ` Thomas Zimmermann
  2020-10-20  9:30         ` Sam Ravnborg
  0 siblings, 1 reply; 16+ messages in thread
From: Thomas Zimmermann @ 2020-10-20  6:45 UTC (permalink / raw)
  To: Sam Ravnborg, Kevin Brace; +Cc: Dave Airlie, dri-devel

Hi

On 19.10.20 22:28, Sam Ravnborg wrote:
> Hi Kevin.
> 
> On Mon, Oct 19, 2020 at 09:43:08PM +0200, Kevin Brace wrote:
>> Hi Sam,
>>
>> Thanks for asking the question.
>> The current OpenChrome DRM code has these two major issues.
>>
>> 1) It does not support atomic modesetting
>>
>> I do internally have working code to support atomic modesetting, but it is not ready for committing into the upstream OpenChrome DRM repository.
>> In particular, it suffers from a freeze relating to a cursor plane.
>> The freeze is a bad kind that kern.log does not really tell me what is wrong.
>> If I disable hardware cursor, the atomic modesetting based OpenChrome DRM appears to work okay.
>> In other words, I am getting close to getting atomic modesetting working, but I am stuck.
> Maybe posting what you have now - and explain that it has this defect.
> Chances are that you will receive feedback that may help you on your way
> to fix this.
> 
> With all the infrastructure improvements made the last years I would be
> suprised if you have managed to include it all and maybe some of the
> infrastructure may help you.
> 
> Also I know we have seems some cursor plane related discussions the last
> months so maybe there are something to gain from the people involved
> there.

I'd be interested in this as well. If you could share an URL to the
repo, I'd take a look. I think I even have a Via machine somewhere to
give it a try.

Best regards
Thomas

> 
> 
>> 2) Double allocation of visible portion of frame buffer
>>
>> This is a big problem left behind from the previous developer who developed OpenChrome prior to me.
>> For some reason, the developer wanted to allocate visible portion of the frame buffer to be the maximum possible size supported by the detected monitor when initializing the frame buffer inside OpenChrome DRM code.
>> I believe Radeon DRM does something similar to that.
>> The problem is, OpenChrome DDX allocates an equal sized frame buffer visible portion during the DDX's initialization.
>> This means that we got two same sized visible portions allocated, but OpenChrome DDX and OpenChrome DRM combined should really be allocating only one.
>> At this point, OpenChrome is not supporting double buffering.
>> This double allocation of a visible portion of the frame buffer contributes to a X Server crash when the screen is resized and 16 MB or less (i.e., 8 MB) shared frame buffer is reserved by the system via BIOS setup.
>> I personally think letting OpenChrome DRM allocate the visible portion of the frame buffer is the way to go, but if so, how do I get the DDX or shadow FB to access the frame buffer visible portion allocated by OpenChrome DRM?
>> Any suggestions on what to do about this issue will be greatly appreciated.
>> Perhaps, I should post a question to dri-devel regarding this issue.
>> I really do not know what I should do at this point.
> Likewise.
> 
> But obviously you shall not post it to dri-devel unless you are prepared
> to handle the feedback that you *may* get.
> 
> I promise to take a look - but that will cover mostly trivial stuff.
> You have to rely on others for all the stuff around atomic modestetting
> and the memory handling etc. - the areas where you have challenges now.
> 
> 	Sam
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: It appears drm-next TTM cleanup broke something . . .
  2020-10-20  6:45       ` Thomas Zimmermann
@ 2020-10-20  9:30         ` Sam Ravnborg
  0 siblings, 0 replies; 16+ messages in thread
From: Sam Ravnborg @ 2020-10-20  9:30 UTC (permalink / raw)
  To: Thomas Zimmermann; +Cc: Dave Airlie, dri-devel, Kevin Brace

Hi Thomas.

> I'd be interested in this as well. If you could share an URL to the
> repo, I'd take a look. I think I even have a Via machine somewhere to
> give it a try.

You can find it here:
git://anongit.freedesktop.org/openchrome/drm-openchrome

	Sam
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: It appears drm-next TTM cleanup broke something . . .
  2020-10-19 19:43   ` Kevin Brace
  2020-10-19 20:28     ` Sam Ravnborg
@ 2020-10-20 17:17     ` Alex Deucher
  2020-10-20 18:03       ` Ilia Mirkin
  2020-10-21  8:03     ` Thomas Zimmermann
  2 siblings, 1 reply; 16+ messages in thread
From: Alex Deucher @ 2020-10-20 17:17 UTC (permalink / raw)
  To: Kevin Brace; +Cc: Dave Airlie, Sam Ravnborg, Maling list - DRI developers

On Mon, Oct 19, 2020 at 3:43 PM Kevin Brace <kevinbrace@gmx.com> wrote:
>
> Hi Sam,
>
> Thanks for asking the question.
> The current OpenChrome DRM code has these two major issues.
>
> 1) It does not support atomic modesetting
>
> I do internally have working code to support atomic modesetting, but it is not ready for committing into the upstream OpenChrome DRM repository.
> In particular, it suffers from a freeze relating to a cursor plane.
> The freeze is a bad kind that kern.log does not really tell me what is wrong.
> If I disable hardware cursor, the atomic modesetting based OpenChrome DRM appears to work okay.
> In other words, I am getting close to getting atomic modesetting working, but I am stuck.
>
>
> 2) Double allocation of visible portion of frame buffer
>
> This is a big problem left behind from the previous developer who developed OpenChrome prior to me.
> For some reason, the developer wanted to allocate visible portion of the frame buffer to be the maximum possible size supported by the detected monitor when initializing the frame buffer inside OpenChrome DRM code.
> I believe Radeon DRM does something similar to that.
> The problem is, OpenChrome DDX allocates an equal sized frame buffer visible portion during the DDX's initialization.
> This means that we got two same sized visible portions allocated, but OpenChrome DDX and OpenChrome DRM combined should really be allocating only one.
> At this point, OpenChrome is not supporting double buffering.
> This double allocation of a visible portion of the frame buffer contributes to a X Server crash when the screen is resized and 16 MB or less (i.e., 8 MB) shared frame buffer is reserved by the system via BIOS setup.
> I personally think letting OpenChrome DRM allocate the visible portion of the frame buffer is the way to go, but if so, how do I get the DDX or shadow FB to access the frame buffer visible portion allocated by OpenChrome DRM?
> Any suggestions on what to do about this issue will be greatly appreciated.

All drivers do this.  You have one buffer for the fbdev console and
then userspace allocates buffers it needs in addition, otherwise,
you'd overwrite your console.  You can disable it by not enabling the
fbdev emulation if you don't want a console.

Alex


> Perhaps, I should post a question to dri-devel regarding this issue.
> I really do not know what I should do at this point.
>
> Regards,
>
> Kevin Brace
> Brace Computer Laboratory blog
> https://bracecomputerlab.com
>
>
> > Sent: Sunday, October 18, 2020 at 2:04 PM
> > From: "Sam Ravnborg" <sam@ravnborg.org>
> > To: "Kevin Brace" <kevinbrace@gmx.com>
> > Cc: dri-devel@lists.freedesktop.org, "Dave Airlie" <airlied@redhat.com>
> > Subject: Re: It appears drm-next TTM cleanup broke something . . .
> >
> > Hi Kevin.
> >
> > On Sun, Oct 18, 2020 at 09:15:17PM +0200, Kevin Brace wrote:
> > > As usual, I pulled in DRM repository code for an out of tree OpenChrome DRM repository a few days ago.
> >
> > I know you have been working on and off on the openchrome driver for a
> > long time now. Any chance we will see the driver submitted for upstream soon?
> >
> >       Sam
> >
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: It appears drm-next TTM cleanup broke something . . .
  2020-10-20 17:17     ` Alex Deucher
@ 2020-10-20 18:03       ` Ilia Mirkin
  0 siblings, 0 replies; 16+ messages in thread
From: Ilia Mirkin @ 2020-10-20 18:03 UTC (permalink / raw)
  To: Alex Deucher
  Cc: Dave Airlie, Sam Ravnborg, Maling list - DRI developers, Kevin Brace

On Tue, Oct 20, 2020 at 1:17 PM Alex Deucher <alexdeucher@gmail.com> wrote:
>
> On Mon, Oct 19, 2020 at 3:43 PM Kevin Brace <kevinbrace@gmx.com> wrote:
> >
> > Hi Sam,
> >
> > Thanks for asking the question.
> > The current OpenChrome DRM code has these two major issues.
> >
> > 1) It does not support atomic modesetting
> >
> > I do internally have working code to support atomic modesetting, but it is not ready for committing into the upstream OpenChrome DRM repository.
> > In particular, it suffers from a freeze relating to a cursor plane.
> > The freeze is a bad kind that kern.log does not really tell me what is wrong.
> > If I disable hardware cursor, the atomic modesetting based OpenChrome DRM appears to work okay.
> > In other words, I am getting close to getting atomic modesetting working, but I am stuck.
> >
> >
> > 2) Double allocation of visible portion of frame buffer
> >
> > This is a big problem left behind from the previous developer who developed OpenChrome prior to me.
> > For some reason, the developer wanted to allocate visible portion of the frame buffer to be the maximum possible size supported by the detected monitor when initializing the frame buffer inside OpenChrome DRM code.
> > I believe Radeon DRM does something similar to that.
> > The problem is, OpenChrome DDX allocates an equal sized frame buffer visible portion during the DDX's initialization.
> > This means that we got two same sized visible portions allocated, but OpenChrome DDX and OpenChrome DRM combined should really be allocating only one.
> > At this point, OpenChrome is not supporting double buffering.
> > This double allocation of a visible portion of the frame buffer contributes to a X Server crash when the screen is resized and 16 MB or less (i.e., 8 MB) shared frame buffer is reserved by the system via BIOS setup.
> > I personally think letting OpenChrome DRM allocate the visible portion of the frame buffer is the way to go, but if so, how do I get the DDX or shadow FB to access the frame buffer visible portion allocated by OpenChrome DRM?
> > Any suggestions on what to do about this issue will be greatly appreciated.
>
> All drivers do this.  You have one buffer for the fbdev console and
> then userspace allocates buffers it needs in addition, otherwise,
> you'd overwrite your console.  You can disable it by not enabling the
> fbdev emulation if you don't want a console.

I don't think disabling fbdev emulation is a practical thing to do for
standard PCs.

Not sure what the capabilities of the underlying hardware are, but on
nouveau, we use 16bpp/8bpp framebuffers by default (for fbdev) on
boards with limited VRAM (64/32/16 MB, I don't remember precisely).
You'd have to add C8 drm format support for this, but assuming you
have that, the core supports it OK.

Ideally you'd have logic which is able to manage buffers and move them
between GPU-accessible and GPU-inaccessible memory (e.g. TTM, or
perhaps some evolution of it, I'm not up on all the latest).

Cheers,

  -ilia
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: It appears drm-next TTM cleanup broke something . . .
  2020-10-19 19:43   ` Kevin Brace
  2020-10-19 20:28     ` Sam Ravnborg
  2020-10-20 17:17     ` Alex Deucher
@ 2020-10-21  8:03     ` Thomas Zimmermann
  2020-10-21  8:14       ` Daniel Vetter
  2 siblings, 1 reply; 16+ messages in thread
From: Thomas Zimmermann @ 2020-10-21  8:03 UTC (permalink / raw)
  To: Kevin Brace, Sam Ravnborg; +Cc: Dave Airlie, dri-devel


[-- Attachment #1.1.1.1: Type: text/plain, Size: 4424 bytes --]

Hi

On 19.10.20 21:43, Kevin Brace wrote:
> Hi Sam,
> 
> Thanks for asking the question.
> The current OpenChrome DRM code has these two major issues.
> 
> 1) It does not support atomic modesetting
> 
> I do internally have working code to support atomic modesetting, but it is not ready for committing into the upstream OpenChrome DRM repository.
> In particular, it suffers from a freeze relating to a cursor plane.
> The freeze is a bad kind that kern.log does not really tell me what is wrong.
> If I disable hardware cursor, the atomic modesetting based OpenChrome DRM appears to work okay.
> In other words, I am getting close to getting atomic modesetting working, but I am stuck.

This could be related to the memory problems. See below. Otherwise, I
suggest to reduce the driver to the minimum functionality that is
required for modesetting (even without HW cursors) and submit this code
for review and merging.

> 
> 
> 2) Double allocation of visible portion of frame buffer
> 
> This is a big problem left behind from the previous developer who developed OpenChrome prior to me.
> For some reason, the developer wanted to allocate visible portion of the frame buffer to be the maximum possible size supported by the detected monitor when initializing the frame buffer inside OpenChrome DRM code.
> I believe Radeon DRM does something similar to that.
> The problem is, OpenChrome DDX allocates an equal sized frame buffer visible portion during the DDX's initialization.
> This means that we got two same sized visible portions allocated, but OpenChrome DDX and OpenChrome DRM combined should really be allocating only one.
> At this point, OpenChrome is not supporting double buffering.
> This double allocation of a visible portion of the frame buffer contributes to a X Server crash when the screen is resized and 16 MB or less (i.e., 8 MB) shared frame buffer is reserved by the system via BIOS setup.
> I personally think letting OpenChrome DRM allocate the visible portion of the frame buffer is the way to go, but if so, how do I get the DDX or shadow FB to access the frame buffer visible portion allocated by OpenChrome DRM?
> Any suggestions on what to do about this issue will be greatly appreciated.
> Perhaps, I should post a question to dri-devel regarding this issue.
> I really do not know what I should do at this point.

The double allocation is expected. Atomic modesetting requires two
framebuffers in video memory during the pageflip. You have to sort out
the modes where 2 framebuffers do not fit into video memory at the same
time.

The mode_valid callback in struct drm_mode_config_funcs [1] is a good
place to do this. Check the mode's pixels with the maximum BPC against
the available memory. Example code is at [2]. You should also plane for
common additional layers, such as HW cursors, to require video memory.
So maybe test the mode against 80% of the video memory.

Best regards
Thomas

[1]
https://cgit.freedesktop.org/openchrome/drm-openchrome/tree/drivers/gpu/drm/openchrome/openchrome_fb.c?h=drm-next-5.10&id=22e0ee2460b4b70cde562b7a3818ae94c2786f46#n102

[2]
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/drm_gem_vram_helper.c?h=v5.9#n1285

> 
> Regards,
> 
> Kevin Brace
> Brace Computer Laboratory blog
> https://bracecomputerlab.com
> 
> 
>> Sent: Sunday, October 18, 2020 at 2:04 PM
>> From: "Sam Ravnborg" <sam@ravnborg.org>
>> To: "Kevin Brace" <kevinbrace@gmx.com>
>> Cc: dri-devel@lists.freedesktop.org, "Dave Airlie" <airlied@redhat.com>
>> Subject: Re: It appears drm-next TTM cleanup broke something . . .
>>
>> Hi Kevin.
>>
>> On Sun, Oct 18, 2020 at 09:15:17PM +0200, Kevin Brace wrote:
>>> As usual, I pulled in DRM repository code for an out of tree OpenChrome DRM repository a few days ago.
>>
>> I know you have been working on and off on the openchrome driver for a
>> long time now. Any chance we will see the driver submitted for upstream soon?
>>
>> 	Sam
>>
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer

[-- Attachment #1.1.1.2: OpenPGP_0x680DC11D530B7A23.asc --]
[-- Type: application/pgp-keys, Size: 4259 bytes --]

[-- Attachment #1.2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 495 bytes --]

[-- Attachment #2: Type: text/plain, Size: 160 bytes --]

_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: It appears drm-next TTM cleanup broke something . . .
  2020-10-21  8:03     ` Thomas Zimmermann
@ 2020-10-21  8:14       ` Daniel Vetter
  2020-10-21  8:30         ` Thomas Zimmermann
  0 siblings, 1 reply; 16+ messages in thread
From: Daniel Vetter @ 2020-10-21  8:14 UTC (permalink / raw)
  To: Thomas Zimmermann; +Cc: Dave Airlie, Sam Ravnborg, dri-devel, Kevin Brace

On Wed, Oct 21, 2020 at 10:03 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>
> Hi
>
> On 19.10.20 21:43, Kevin Brace wrote:
> > Hi Sam,
> >
> > Thanks for asking the question.
> > The current OpenChrome DRM code has these two major issues.
> >
> > 1) It does not support atomic modesetting
> >
> > I do internally have working code to support atomic modesetting, but it is not ready for committing into the upstream OpenChrome DRM repository.
> > In particular, it suffers from a freeze relating to a cursor plane.
> > The freeze is a bad kind that kern.log does not really tell me what is wrong.
> > If I disable hardware cursor, the atomic modesetting based OpenChrome DRM appears to work okay.
> > In other words, I am getting close to getting atomic modesetting working, but I am stuck.
>
> This could be related to the memory problems. See below. Otherwise, I
> suggest to reduce the driver to the minimum functionality that is
> required for modesetting (even without HW cursors) and submit this code
> for review and merging.
>
> >
> >
> > 2) Double allocation of visible portion of frame buffer
> >
> > This is a big problem left behind from the previous developer who developed OpenChrome prior to me.
> > For some reason, the developer wanted to allocate visible portion of the frame buffer to be the maximum possible size supported by the detected monitor when initializing the frame buffer inside OpenChrome DRM code.
> > I believe Radeon DRM does something similar to that.
> > The problem is, OpenChrome DDX allocates an equal sized frame buffer visible portion during the DDX's initialization.
> > This means that we got two same sized visible portions allocated, but OpenChrome DDX and OpenChrome DRM combined should really be allocating only one.
> > At this point, OpenChrome is not supporting double buffering.
> > This double allocation of a visible portion of the frame buffer contributes to a X Server crash when the screen is resized and 16 MB or less (i.e., 8 MB) shared frame buffer is reserved by the system via BIOS setup.
> > I personally think letting OpenChrome DRM allocate the visible portion of the frame buffer is the way to go, but if so, how do I get the DDX or shadow FB to access the frame buffer visible portion allocated by OpenChrome DRM?
> > Any suggestions on what to do about this issue will be greatly appreciated.
> > Perhaps, I should post a question to dri-devel regarding this issue.
> > I really do not know what I should do at this point.
>
> The double allocation is expected. Atomic modesetting requires two
> framebuffers in video memory during the pageflip. You have to sort out
> the modes where 2 framebuffers do not fit into video memory at the same
> time.

What we have done on severly restricted discrete gpu is to keep one
framebuffer for everyone in vram, and blt the kms framebuffers over as
needed. With all the dirty tracking helpers for atomic that's like a
one-liner to set up (or just slightly more). I think cirrus works like
that (but it's using cpu memcpy since that's the only thing that
exists, I guess openchrome could even use the blitter for this).

The more usual approach is what nouveau guys already explained: Just
run fbcon at very low resolution so it doesn't consume too much space.
-Daniel

> The mode_valid callback in struct drm_mode_config_funcs [1] is a good
> place to do this. Check the mode's pixels with the maximum BPC against
> the available memory. Example code is at [2]. You should also plane for
> common additional layers, such as HW cursors, to require video memory.
> So maybe test the mode against 80% of the video memory.
>
> Best regards
> Thomas
>
> [1]
> https://cgit.freedesktop.org/openchrome/drm-openchrome/tree/drivers/gpu/drm/openchrome/openchrome_fb.c?h=drm-next-5.10&id=22e0ee2460b4b70cde562b7a3818ae94c2786f46#n102
>
> [2]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/drm_gem_vram_helper.c?h=v5.9#n1285
>
> >
> > Regards,
> >
> > Kevin Brace
> > Brace Computer Laboratory blog
> > https://bracecomputerlab.com
> >
> >
> >> Sent: Sunday, October 18, 2020 at 2:04 PM
> >> From: "Sam Ravnborg" <sam@ravnborg.org>
> >> To: "Kevin Brace" <kevinbrace@gmx.com>
> >> Cc: dri-devel@lists.freedesktop.org, "Dave Airlie" <airlied@redhat.com>
> >> Subject: Re: It appears drm-next TTM cleanup broke something . . .
> >>
> >> Hi Kevin.
> >>
> >> On Sun, Oct 18, 2020 at 09:15:17PM +0200, Kevin Brace wrote:
> >>> As usual, I pulled in DRM repository code for an out of tree OpenChrome DRM repository a few days ago.
> >>
> >> I know you have been working on and off on the openchrome driver for a
> >> long time now. Any chance we will see the driver submitted for upstream soon?
> >>
> >>      Sam
> >>
> > _______________________________________________
> > dri-devel mailing list
> > dri-devel@lists.freedesktop.org
> > https://lists.freedesktop.org/mailman/listinfo/dri-devel
> >
>
> --
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Felix Imendörffer
> _______________________________________________
> dri-devel mailing list
> dri-devel@lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/dri-devel



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

* Re: It appears drm-next TTM cleanup broke something . . .
  2020-10-21  8:14       ` Daniel Vetter
@ 2020-10-21  8:30         ` Thomas Zimmermann
  0 siblings, 0 replies; 16+ messages in thread
From: Thomas Zimmermann @ 2020-10-21  8:30 UTC (permalink / raw)
  To: Daniel Vetter; +Cc: Dave Airlie, Sam Ravnborg, dri-devel, Kevin Brace

Hi

On 21.10.20 10:14, Daniel Vetter wrote:
> On Wed, Oct 21, 2020 at 10:03 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>>
>> Hi
>>
>> On 19.10.20 21:43, Kevin Brace wrote:
>>> Hi Sam,
>>>
>>> Thanks for asking the question.
>>> The current OpenChrome DRM code has these two major issues.
>>>
>>> 1) It does not support atomic modesetting
>>>
>>> I do internally have working code to support atomic modesetting, but it is not ready for committing into the upstream OpenChrome DRM repository.
>>> In particular, it suffers from a freeze relating to a cursor plane.
>>> The freeze is a bad kind that kern.log does not really tell me what is wrong.
>>> If I disable hardware cursor, the atomic modesetting based OpenChrome DRM appears to work okay.
>>> In other words, I am getting close to getting atomic modesetting working, but I am stuck.
>>
>> This could be related to the memory problems. See below. Otherwise, I
>> suggest to reduce the driver to the minimum functionality that is
>> required for modesetting (even without HW cursors) and submit this code
>> for review and merging.
>>
>>>
>>>
>>> 2) Double allocation of visible portion of frame buffer
>>>
>>> This is a big problem left behind from the previous developer who developed OpenChrome prior to me.
>>> For some reason, the developer wanted to allocate visible portion of the frame buffer to be the maximum possible size supported by the detected monitor when initializing the frame buffer inside OpenChrome DRM code.
>>> I believe Radeon DRM does something similar to that.
>>> The problem is, OpenChrome DDX allocates an equal sized frame buffer visible portion during the DDX's initialization.
>>> This means that we got two same sized visible portions allocated, but OpenChrome DDX and OpenChrome DRM combined should really be allocating only one.
>>> At this point, OpenChrome is not supporting double buffering.
>>> This double allocation of a visible portion of the frame buffer contributes to a X Server crash when the screen is resized and 16 MB or less (i.e., 8 MB) shared frame buffer is reserved by the system via BIOS setup.
>>> I personally think letting OpenChrome DRM allocate the visible portion of the frame buffer is the way to go, but if so, how do I get the DDX or shadow FB to access the frame buffer visible portion allocated by OpenChrome DRM?
>>> Any suggestions on what to do about this issue will be greatly appreciated.
>>> Perhaps, I should post a question to dri-devel regarding this issue.
>>> I really do not know what I should do at this point.
>>
>> The double allocation is expected. Atomic modesetting requires two
>> framebuffers in video memory during the pageflip. You have to sort out
>> the modes where 2 framebuffers do not fit into video memory at the same
>> time.
> 
> What we have done on severly restricted discrete gpu is to keep one
> framebuffer for everyone in vram, and blt the kms framebuffers over as
> needed. With all the dirty tracking helpers for atomic that's like a
> one-liner to set up (or just slightly more). I think cirrus works like
> that (but it's using cpu memcpy since that's the only thing that
> exists, I guess openchrome could even use the blitter for this).

A yes, converting to SHMEM is also an option. But it prevents any kind
of 3d acceleration, if you want to add that as well.

> 
> The more usual approach is what nouveau guys already explained: Just
> run fbcon at very low resolution so it doesn't consume too much space.

A better approach is to kill openchromes custom fbdev implementation
entirely. During a review, we'd probably ask you to do this anyway. :)

Go for the generic fbdev emulation and set struct
drm_mode_config.prefer_shadow_fbdev to true. This will give you a shadow
buffer for the console, and the actual fbdev framebuffer can be evicted
from video memory when the space is required.

Best regards
Thomas

> -Daniel
> 
>> The mode_valid callback in struct drm_mode_config_funcs [1] is a good
>> place to do this. Check the mode's pixels with the maximum BPC against
>> the available memory. Example code is at [2]. You should also plane for
>> common additional layers, such as HW cursors, to require video memory.
>> So maybe test the mode against 80% of the video memory.
>>
>> Best regards
>> Thomas
>>
>> [1]
>> https://cgit.freedesktop.org/openchrome/drm-openchrome/tree/drivers/gpu/drm/openchrome/openchrome_fb.c?h=drm-next-5.10&id=22e0ee2460b4b70cde562b7a3818ae94c2786f46#n102
>>
>> [2]
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/drm_gem_vram_helper.c?h=v5.9#n1285
>>
>>>
>>> Regards,
>>>
>>> Kevin Brace
>>> Brace Computer Laboratory blog
>>> https://bracecomputerlab.com
>>>
>>>
>>>> Sent: Sunday, October 18, 2020 at 2:04 PM
>>>> From: "Sam Ravnborg" <sam@ravnborg.org>
>>>> To: "Kevin Brace" <kevinbrace@gmx.com>
>>>> Cc: dri-devel@lists.freedesktop.org, "Dave Airlie" <airlied@redhat.com>
>>>> Subject: Re: It appears drm-next TTM cleanup broke something . . .
>>>>
>>>> Hi Kevin.
>>>>
>>>> On Sun, Oct 18, 2020 at 09:15:17PM +0200, Kevin Brace wrote:
>>>>> As usual, I pulled in DRM repository code for an out of tree OpenChrome DRM repository a few days ago.
>>>>
>>>> I know you have been working on and off on the openchrome driver for a
>>>> long time now. Any chance we will see the driver submitted for upstream soon?
>>>>
>>>>      Sam
>>>>
>>> _______________________________________________
>>> dri-devel mailing list
>>> dri-devel@lists.freedesktop.org
>>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
>>>
>>
>> --
>> Thomas Zimmermann
>> Graphics Driver Developer
>> SUSE Software Solutions Germany GmbH
>> Maxfeldstr. 5, 90409 Nürnberg, Germany
>> (HRB 36809, AG Nürnberg)
>> Geschäftsführer: Felix Imendörffer
>> _______________________________________________
>> dri-devel mailing list
>> dri-devel@lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/dri-devel
> 
> 
> 

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Felix Imendörffer
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel

^ permalink raw reply	[flat|nested] 16+ messages in thread

end of thread, other threads:[~2020-10-21  8:30 UTC | newest]

Thread overview: 16+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-10-18 19:15 It appears drm-next TTM cleanup broke something . . Kevin Brace
2020-10-18 19:50 ` Dave Airlie
2020-10-19  7:23   ` Kevin Brace
2020-10-19 10:13     ` Christian König
2020-10-19 16:20       ` Kevin Brace
2020-10-19 16:37         ` Christian König
2020-10-18 21:04 ` Sam Ravnborg
2020-10-19 19:43   ` Kevin Brace
2020-10-19 20:28     ` Sam Ravnborg
2020-10-20  6:45       ` Thomas Zimmermann
2020-10-20  9:30         ` Sam Ravnborg
2020-10-20 17:17     ` Alex Deucher
2020-10-20 18:03       ` Ilia Mirkin
2020-10-21  8:03     ` Thomas Zimmermann
2020-10-21  8:14       ` Daniel Vetter
2020-10-21  8:30         ` Thomas Zimmermann

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.