* [RFC]: Arbitrated system memory bandwidth workarounds implementation for watermark.
From: Mahesh Kumar @ 2017-03-27 15:52 UTC
  To: intel-gfx, mahesh1.kumar, Zanoni, Paulo R, Maarten Lankhorst,
	Matt Roper, daniel.vetter



*Arbitrated system bandwidth workarounds for watermark.*

All GEN-9 based platforms require watermark-related workarounds (WA) to be
enabled if the display memory bandwidth requirement exceeds XX% of the
total available system memory bandwidth.
This XX% depends on multiple factors.
*e.g.* if all the enabled planes use X-tiled or linear memory then
                     XX = 60
         if any Y-tiled plane is enabled then
                     XX = 20 etc.
In the current implementation we always enable the maximum WA (i.e. add
15us of latency during WM calculation), whether or not the workaround is
required.
The total display bandwidth requirement is the sum of the requirements of
the individual pipes. To calculate the correct BW requirement, the plane
configuration of every pipe must not change during the calculation.
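
The threshold rule above can be sketched as follows. This is only an illustration: the function and constant names are hypothetical, and only the two example thresholds given in this mail (60 and 20) are modelled, although the mail says more conditions exist ("etc.").

```python
from enum import Enum


class Tiling(Enum):
    LINEAR = "linear"
    X_TILED = "x"
    Y_TILED = "y"


# Example thresholds from this mail; the full set of conditions is larger.
THRESHOLD_PCT_X_OR_LINEAR = 60
THRESHOLD_PCT_ANY_Y_TILED = 20


def wa_threshold_pct(plane_tilings):
    """Return XX, the bandwidth percentage above which the WA is needed."""
    if any(t is Tiling.Y_TILED for t in plane_tilings):
        return THRESHOLD_PCT_ANY_Y_TILED
    return THRESHOLD_PCT_X_OR_LINEAR


def wa_required(display_bw, system_bw, plane_tilings):
    """True if display bandwidth exceeds XX% of system bandwidth."""
    return display_bw * 100 > wa_threshold_pct(plane_tilings) * system_bw
```

Both bandwidth arguments must be in the same unit. The tiling list has to cover the enabled planes of every pipe, which is why the plane configuration must stay stable while the sum is computed.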

To implement & optimize the above requirement several implementations are
possible; I'm proposing a few options.
Please review & let me know which option is better for implementing the WAs.

*Option 1:*

    Use connection_mutex (this will change to i915 specific lock only
    that is available in atomic design) to serialize all the commits.
    If the memory bandwidth WA is changing, then get all crtc_states for
    calculating the watermark values.
    *Pros:*

      * In each flip the optimum WM values (not more than the required
        value) will be used.

    *Cons:*

      * This approach serializes all the flips, so there will be a
        performance impact. With blocking commits the impact will be
        even worse, e.g. with three displays at refresh rates of 30fps,
        60fps & 90fps.
      * If a commit is in progress on the 30fps display, all other flips
        will be blocked & frames on the 60 & 90fps displays will be
        dropped/blocked.

*Option 2:*

    Use two levels of system bandwidth check: once during calculation &
    again during commit.
    During intel_atomic_check (as part of compute_ddb) don’t hold any
    system-level mutex; instead hold the WM mutex & compute the system
    bandwidth requirement. If the WA is changing then get the crtc_state
    of all other pipes & go ahead with the commit.
    During intel_atomic_commit, take wm_mutex again & recalculate the
    complete system bandwidth requirement. If the requirement has changed
    in a way that the computed WMs are no longer valid, fail the flip.
    Update the bandwidth requirement for each plane in global state
    (dev_priv->wm) so other flips don’t need to recalculate it.

    *Pros:*

      * It reduces critical section time.
      * Available DDB is still used optimally & optimum WM values are used.

    *Cons:*

      * If the memory bandwidth WA changes very frequently then there
        will be many flip failures, which will impact performance.
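
The two-level check of Option 2 could look roughly like the toy model below. It is a sketch under assumptions, not i915 code: `WmState`, `FlipRejected` and the method names are hypothetical, "computed WMs are no longer valid" is simplified to "the WA decision changed", and the 20/60 thresholds are the example values from this mail.

```python
import threading


class FlipRejected(Exception):
    """The WA decision made at check time no longer holds at commit time."""


class WmState:
    """Toy model of the shared bandwidth bookkeeping (hypothetical names)."""

    def __init__(self, system_bw):
        self.system_bw = system_bw
        self.wm_mutex = threading.Lock()
        self.planes = {}  # (pipe, plane) -> (bandwidth, is_y_tiled)

    def _wa_needed(self, planes):
        # Example thresholds: 20% with any Y-tiled plane, 60% otherwise.
        total = sum(bw for bw, _ in planes.values())
        any_y = any(is_y for _, is_y in planes.values())
        return total * 100 > (20 if any_y else 60) * self.system_bw

    def atomic_check(self, updates):
        """First check: WA decision if `updates` were applied right now."""
        with self.wm_mutex:
            return self._wa_needed({**self.planes, **updates})

    def atomic_commit(self, updates, wa_at_check):
        """Second check: recompute, fail the flip if the decision changed."""
        with self.wm_mutex:
            merged = {**self.planes, **updates}
            if self._wa_needed(merged) != wa_at_check:
                raise FlipRejected("bandwidth WA changed since atomic_check")
            self.planes = merged  # publish so other flips can reuse it
```

If two flips are both checked before either commits, the second commit can see a different total and get rejected, which is the flip-failure cost listed under Cons.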


*Option 3:*

    Compute the maximum bandwidth requirement during modeset.
    i.e. if the modeset is 1080p @60fps & the maximum number of planes
    in the CRTC is 3, with a maximum supported downscale amount “XX.YY”
    (defined by the min of cdclk/crtc_clock & max(hscale x vscale)), then
    the max bandwidth requirement for the CRTC will be
    (1080p x 60 x 3 x XX.YY).

    Now during a flip, if there is any change which will change the WA
    (e.g. a tiling change), then take the wm_mutex lock & recalculate the
    complete bandwidth requirement. If the WA is changing then get the
    crtc_state of all other pipes & go ahead with the commit. (If the
    total display memory BW % is less than the lowest % that enables a
    WA, i.e. 20%, then there is no need to recompute.)
    Update the per-CRTC bandwidth requirement in global state so other
    flips don’t need to recalculate each time.

    *Pros:*

      * All CRTCs can flip independently until there is a change which
        impacts the WA.
      * No locking until a potential WM WA change.

    *Cons:*

      * If the memory bandwidth WA is changing very frequently then there
        will be a slight performance impact.
      * We may not be programming the optimum WM values, which may have
        some power impact.
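
A minimal sketch of the Option 3 bookkeeping, assuming hypothetical helper names. The worst case follows the (1080p x 60 x 3 x XX.YY) formula literally, so it is in pixels per second; the mail does not state units, and comparing against a system bandwidth in bytes would need a bytes-per-pixel factor that is not modelled here.

```python
def crtc_max_bw(width, height, refresh_hz, max_planes, max_downscale):
    """Worst-case requirement of one CRTC, computed once at modeset."""
    return width * height * refresh_hz * max_planes * max_downscale


def needs_recompute(per_crtc_max_bw, system_bw, lowest_wa_threshold_pct=20):
    """Fast path: below the lowest WA threshold no recompute (or lock) is needed.

    `per_crtc_max_bw` maps a CRTC name to its cached worst-case value.
    """
    total = sum(per_crtc_max_bw.values())
    return total * 100 >= lowest_wa_threshold_pct * system_bw
```

Because the cached values are worst cases, the fast path can only err towards recomputing too often, never too rarely. That matches the trade-off listed under Cons.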


If you think any other approach should be used, please let me know that as well.


Regards,
-Mahesh


_______________________________________________
Intel-gfx mailing list
Intel-gfx@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/intel-gfx


* Re: [RFC]: Arbitrated system memory bandwidth workarounds implementation for watermark.
From: Maarten Lankhorst @ 2017-03-28  8:08 UTC
  To: Mahesh Kumar, intel-gfx, Zanoni, Paulo R, Matt Roper, daniel.vetter



On 27-03-17 at 17:52, Mahesh Kumar wrote:
> [...]
Option 4:
	Check whether the watermarks for the current pipe need global adjustment between the last commit and the current one; if not, do nothing.

	If they do, we could do one of the following:
		1. Blindly grab all other crtc states and do watermark reprogramming.
		2. Grab all other crtcs' mutexes and see if we need to adjust watermark state. If we do, grab the other affected crtcs' states as well to perform watermark reprogramming.

	Perhaps add some elements of option 3 too? I like that one too.
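
The decision in Option 4 can be sketched as a small function. The names are hypothetical and the sketch only models which pipes' states end up in the commit, not the actual locking calls.

```python
def pipes_in_commit(current_pipe, wa_before, wa_after, all_pipes,
                    is_affected=None):
    """Return the set of pipes whose state the commit must include.

    wa_before / wa_after: global WA decision at the last commit and now.
    is_affected: optional predicate, evaluated with the other pipes' locks
    held, saying whether a pipe's watermarks need reprogramming.
    """
    if wa_before == wa_after:
        return {current_pipe}  # no global adjustment: nothing extra to do
    others = set(all_pipes) - {current_pipe}
    if is_affected is None:
        return {current_pipe} | others  # variant 1: blindly grab everything
    # Variant 2: only pull in the pipes that actually need new watermarks.
    return {current_pipe} | {p for p in others if is_affected(p)}
```

Variant 2 still has to take every other crtc's mutex to evaluate `is_affected`, but it keeps unaffected pipes out of the reprogramming step.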

~Maarten



* Re: [RFC]: Arbitrated system memory bandwidth workarounds implementation for watermark.
From: Mahesh Kumar @ 2017-04-03  7:56 UTC
  To: Maarten Lankhorst, intel-gfx, Zanoni, Paulo R, Matt Roper, daniel.vetter



Hi Maarten,
sorry for the delay in replying...

In Option 3:

We know the maximum number of planes for any given CRTC. We also know the
maximum downscaling supported (only downscaling affects WM) per
pipe/plane.

     Maximum downscaling per plane can be:

             max plane hscale * max plane vscale,    which is 2.99 x 2.99 in GEN9

     This scaling should also be less than cdclk / pixel clock.

     The same limitation applies to pipe downscaling as well.

     The following patch implements the limitation related to
     cdclk/pixel_clock (max supported pixel rate):

         https://patchwork.freedesktop.org/patch/141210/

So our final downscaling-related limitation will be something like

             min((max_plane_hscale * max_plane_vscale) *
                 (max_pipe_hscale * max_pipe_vscale), (cdclk / pixel_clock))

             min(2.99 * 2.99 * 2.99 * 2.99, (cdclk / pixel_clock))
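
As a quick illustration of the min() above, here is a small sketch. The constant and function names are made up for the example; the 2.99 figures are the GEN9 values quoted in this mail.

```python
# GEN9 scaler limits quoted above; the same limit is assumed for the pipe.
MAX_PLANE_HSCALE = MAX_PLANE_VSCALE = 2.99
MAX_PIPE_HSCALE = MAX_PIPE_VSCALE = 2.99


def max_downscale(cdclk_khz, pixel_clock_khz):
    """Largest total downscale factor allowed for one pipe."""
    scaler_limit = (MAX_PLANE_HSCALE * MAX_PLANE_VSCALE
                    * MAX_PIPE_HSCALE * MAX_PIPE_VSCALE)
    clock_limit = cdclk_khz / pixel_clock_khz
    return min(scaler_limit, clock_limit)
```

For high pixel clocks the cdclk ratio is the binding term; the scaler product (about 79.9) only matters when the pixel clock is a small fraction of cdclk.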

During modeset we can compute the same & enable the WA.

One of the memory bandwidth limitations is: if Y-tile is enabled in any
plane & the total display bandwidth is > 20%, then enable the Y-tile
specific WA. The 20% mark will be hit only if the connected DRAM has a
lower frequency OR high-resolution & high-refresh-rate monitors are
connected.

For the X-tile WA this % is 35% OR 60%, so we have pretty slim chances of
hitting the situation.

E.g. a 4K@60 display will have a pixel clock of about 540-545MHz, & cdclk
will be 594MHz.

If 1600MHz dual-channel DRAM is connected to the system, then the available
system bandwidth will be:

     1600 * 2 * 8 = 25600,

If 3 planes are enabled & all 3 pipes are enabled, the total display
bandwidth requirement will be approx.

     545 * 3 * 3 = 4905, which is roughly 20% (19.16%) of the total
available bandwidth, & the Y-tile WA may be needed.

If downscaling is enabled, the max supported downscaling will be
(594 / 545), roughly 1.08x.

In that case the max display bandwidth requirement may reach

     545 * 1.08 * 3 * 3 = 5297.4, which is 20.69%, & the Y-tile WA will be
needed.

For higher-frequency DRAM this % will be even lower.
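
The arithmetic above can be reproduced with a few helper functions. The helper names are hypothetical, and the numbers are used exactly as written in this mail, with no unit or bytes-per-pixel conversion added. The 2133MHz DRAM in the usage note is an assumed value, only there to show the trend for faster memory.

```python
def system_bw(dram_mhz, channels, bytes_per_transfer=8):
    """Available system bandwidth, as in 1600 * 2 * 8."""
    return dram_mhz * channels * bytes_per_transfer


def display_bw(pixel_clock_mhz, planes, pipes, downscale=1.0):
    """Total display requirement, as in 545 * 1.08 * 3 * 3."""
    return pixel_clock_mhz * downscale * planes * pipes


def pct(display, system):
    """Display requirement as a percentage of system bandwidth."""
    return display * 100 / system
```

With these, `pct(display_bw(545, 3, 3), system_bw(1600, 2))` gives about 19.16 and the downscaled case `pct(display_bw(545, 3, 3, 1.08), system_bw(1600, 2))` gives about 20.69. With an assumed 2133MHz dual-channel DRAM the first figure drops to roughly 14.4.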

So only when the total bandwidth goes > 20% & Y-tile is enabled may we
need to take the mutex of all CRTCs, so there will be fairly few chances
of holding any lock.

Regards,

-Mahesh



