Hi Thomas, 

On Mon, Aug 12, 2019 at 03:25:45PM +0800, Feng Tang wrote:
> Hi Thomas,
> 
> On Fri, Aug 09, 2019 at 04:12:29PM +0800, Rong Chen wrote:
> > Hi,
> > 
> > >>Actually we run the benchmark as a background process, do we need to
> > >>disable the cursor and test again?
> > >There's a worker thread that updates the display from the shadow buffer.
> > >The blinking cursor periodically triggers the worker thread, but the
> > >actual update is just the size of one character.
> > >
> > >The point of the test without output is to see if the regression comes
> > >from the buffer update (i.e., the memcpy from shadow buffer to VRAM), or
> > >from the worker thread. If the regression goes away after disabling the
> > >blinking cursor, then the worker thread is the problem. If it already
> > >goes away if there's simply no output from the test, the screen update
> > >is the problem. On my machine I have to disable the blinking cursor, so
> > >I think the worker causes the performance drop.
> > 
> > We disabled redirecting stdout/stderr to /dev/kmsg,  and the regression is
> > gone.
> > 
> > commit:
> >   f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console
> >   90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic framebuffer
> > emulation
> > 
> > f1f8555dfb9a70a2  90f479ae51afa45efab97afdde testcase/testparams/testbox
> > ----------------  -------------------------- ---------------------------
> >          %stddev      change         %stddev
> >              \          |                \
> >      43785                       44481
> > vm-scalability/300s-8T-anon-cow-seq-hugetlb/lkp-knm01
> >      43785                       44481        GEO-MEAN vm-scalability.median
> 
> Till now, from Rong's tests:
> 1. Disabling cursor blinking doesn't cure the regression.
> 2. Disabling printint test results to console can workaround the
> regression.
> 
> Also if we set the perfer_shadown to 0, the regression is also
> gone.

We also did some further break down for the time consumed by the
new code.

The drm_fb_helper_dirty_work() calls sequentially 
1. drm_client_buffer_vmap	  (290 us)
2. drm_fb_helper_dirty_blit_real  (19240 us)
3. helper->fb->funcs->dirty()    ---> NULL for mgag200 driver
4. drm_client_buffer_vunmap       (215 us)

The average run time is listed after the function names.

From it, we can see drm_fb_helper_dirty_blit_real() takes too long
time (about 20ms for each run). I guess this is the root cause
of this regression, as the original code doesn't use this dirty worker.

As said in last email, setting the prefer_shadow to 0 can avoid
the regrssion. Could it be an option?

Thanks,
Feng

> 
> --- a/drivers/gpu/drm/mgag200/mgag200_main.c
> +++ b/drivers/gpu/drm/mgag200/mgag200_main.c
> @@ -167,7 +167,7 @@ int mgag200_driver_load(struct drm_device *dev, unsigned long flags)
>  		dev->mode_config.preferred_depth = 16;
>  	else
>  		dev->mode_config.preferred_depth = 32;
> -	dev->mode_config.prefer_shadow = 1;
> +	dev->mode_config.prefer_shadow = 0;
> 
> And from the perf data, one obvious difference is good case don't
> call drm_fb_helper_dirty_work(), while bad case calls.
> 
> Thanks,
> Feng
> 
> > Best Regards,
> > Rong Chen