Hi,

On 8/7/19 6:42 PM, Thomas Zimmermann wrote:
> Hi Rong
>
> On 06.08.19 at 14:59, Chen, Rong A wrote:
>> Hi,
>>
>> On 8/5/2019 6:25 PM, Thomas Zimmermann wrote:
>>> Hi
>>>
>>> On 05.08.19 at 09:28, Rong Chen wrote:
>>>> Hi,
>>>>
>>>> On 8/5/19 3:02 PM, Feng Tang wrote:
>>>>> Hi Thomas,
>>>>>
>>>>> On Sun, Aug 04, 2019 at 08:39:19PM +0200, Thomas Zimmermann wrote:
>>>>>> Hi
>>>>>>
>>>>>> I did some further analysis on this problem and found that the blinking
>>>>>> cursor affects performance of the vm-scalability test case.
>>>>>>
>>>>>> I only have a 4-core machine, so scalability is not really testable. Yet
>>>>>> I see the effects of running vm-scalability against drm-tip, a revert of
>>>>>> the mgag200 patch and the vmap fixes that I posted a few days ago.
>>>>>>
>>>>>> After reverting the mgag200 patch, running the test as described in the
>>>>>> report
>>>>>>
>>>>>>    bin/lkp run job.yaml
>>>>>>
>>>>>> gives results like
>>>>>>
>>>>>>    2019-08-02 19:34:37  ./case-anon-cow-seq-hugetlb
>>>>>>    2019-08-02 19:34:37  ./usemem --runtime 300 -n 4 --prealloc --prefault
>>>>>>      -O -U 815395225
>>>>>>    917319627 bytes / 756534 usecs = 1184110 KB/s
>>>>>>    917319627 bytes / 764675 usecs = 1171504 KB/s
>>>>>>    917319627 bytes / 766414 usecs = 1168846 KB/s
>>>>>>    917319627 bytes / 777990 usecs = 1151454 KB/s
>>>>>>
>>>>>> Running the test against current drm-tip gives slightly worse results,
>>>>>> such as
>>>>>>
>>>>>>    2019-08-03 19:17:06  ./case-anon-cow-seq-hugetlb
>>>>>>    2019-08-03 19:17:06  ./usemem --runtime 300 -n 4 --prealloc --prefault
>>>>>>      -O -U 815394406
>>>>>>    917318700 bytes / 871607 usecs = 1027778 KB/s
>>>>>>    917318700 bytes / 894173 usecs = 1001840 KB/s
>>>>>>    917318700 bytes / 919694 usecs = 974040 KB/s
>>>>>>    917318700 bytes / 923341 usecs = 970193 KB/s
>>>>>>
>>>>>> The test puts out roughly one result per second.
>>>>>> Strangely, sending the output to /dev/null can make results
>>>>>> significantly worse.
>>>>>>
>>>>>>    bin/lkp run job.yaml > /dev/null
>>>>>>
>>>>>>    2019-08-03 19:23:04  ./case-anon-cow-seq-hugetlb
>>>>>>    2019-08-03 19:23:04  ./usemem --runtime 300 -n 4 --prealloc --prefault
>>>>>>      -O -U 815394406
>>>>>>    917318700 bytes / 1207358 usecs = 741966 KB/s
>>>>>>    917318700 bytes / 1210456 usecs = 740067 KB/s
>>>>>>    917318700 bytes / 1216572 usecs = 736346 KB/s
>>>>>>    917318700 bytes / 1239152 usecs = 722929 KB/s
>>>>>>
>>>>>> I realized that there's still a blinking cursor on the screen, which I
>>>>>> disabled with
>>>>>>
>>>>>>    tput civis
>>>>>>
>>>>>> or alternatively
>>>>>>
>>>>>>    echo 0 > /sys/devices/virtual/graphics/fbcon/cursor_blink
>>>>>>
>>>>>> Running the test now gives the original or even better results,
>>>>>> such as
>>>>>>
>>>>>>    bin/lkp run job.yaml > /dev/null
>>>>>>
>>>>>>    2019-08-03 19:29:17  ./case-anon-cow-seq-hugetlb
>>>>>>    2019-08-03 19:29:17  ./usemem --runtime 300 -n 4 --prealloc --prefault
>>>>>>      -O -U 815394406
>>>>>>    917318700 bytes / 659419 usecs = 1358497 KB/s
>>>>>>    917318700 bytes / 659658 usecs = 1358005 KB/s
>>>>>>    917318700 bytes / 659916 usecs = 1357474 KB/s
>>>>>>    917318700 bytes / 660168 usecs = 1356956 KB/s
>>>>>>
>>>>>> Rong, Feng, could you confirm this by disabling the cursor or blinking?
>>>>> Glad to know this method restored the drop. Rong is running the case.
>>>> I set "echo 0 > /sys/devices/virtual/graphics/fbcon/cursor_blink" for
>>>> both commits,
>>>> and the regression has no obvious change.
>>> Ah, I see. Thank you for testing. There are two questions that come to
>>> my mind: did you send the regular output to /dev/null? And what happens
>>> if you disable the cursor with 'tput civis'?
>> I didn't send the output to /dev/null because we need to collect data
>> from the output,
> You can send it to any file, as long as it doesn't show up on the
> console. I also found the latest results in the file result/vm-scalability.
>
>
>> Actually we run the benchmark as a background process, do we need to
>> disable the cursor and test again?
> There's a worker thread that updates the display from the shadow buffer.
> The blinking cursor periodically triggers the worker thread, but the
> actual update is just the size of one character.
>
> The point of the test without output is to see if the regression comes
> from the buffer update (i.e., the memcpy from shadow buffer to VRAM), or
> from the worker thread. If the regression goes away after disabling the
> blinking cursor, then the worker thread is the problem. If it already
> goes away if there's simply no output from the test, the screen update
> is the problem. On my machine I have to disable the blinking cursor, so
> I think the worker causes the performance drop.

We disabled redirecting stdout/stderr to /dev/kmsg, and the regression
is gone.

commit:
  f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console
  90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic framebuffer emulation

f1f8555dfb9a70a2  90f479ae51afa45efab97afdde testcase/testparams/testbox
----------------  -------------------------- ---------------------------
         %stddev      change         %stddev
             \          |                \
     43785                       44481        vm-scalability/300s-8T-anon-cow-seq-hugetlb/lkp-knm01
     43785                       44481        GEO-MEAN vm-scalability.median

Best Regards,
Rong Chen

>
> Best regards
> Thomas
>
>> Best Regards,
>> Rong Chen
>>
>>> If there is absolutely nothing changing on the screen, I don't see how
>>> the regression could persist.
>>>
>>> Best regards
>>> Thomas
>>>
>>>
>>>> commit:
>>>>   f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console
>>>>   90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic
>>>> framebuffer emulation
>>>>
>>>> f1f8555dfb9a70a2  90f479ae51afa45efab97afdde testcase/testparams/testbox
>>>> ----------------  -------------------------- ---------------------------
>>>>          %stddev      change         %stddev
>>>>              \          |                \
>>>>      43394             -20%      34575 ±  3%  vm-scalability/performance-300s-8T-anon-cow-seq-hugetlb/lkp-knm01
>>>>      43393             -20%      34575        GEO-MEAN vm-scalability.median
>>>>
>>>> Best Regards,
>>>> Rong Chen
>>>>
>>>>> I have another finding: I noticed that your patch changed the bpp from
>>>>> 24 to 32. I had a patch to change it back to 24 and ran the case over
>>>>> the weekend; the -18% regression was reduced to about -5%. Could this
>>>>> be related?
>>>>>
>>>>> commit:
>>>>>    f1f8555dfb9 drm/bochs: Use shadow buffer for bochs framebuffer console
>>>>>    90f479ae51a drm/mgag200: Replace struct mga_fbdev with generic
>>>>> framebuffer emulation
>>>>>    01e75fea0d5 mgag200: restore the depth back to 24
>>>>>
>>>>> f1f8555dfb9a70a2 90f479ae51afa45efab97afdde9 01e75fea0d5ff39d3e588c20ec5
>>>>> ---------------- --------------------------- ---------------------------
>>>>>       43921 ±  2%     -18.3%      35884            -4.8%      41826        vm-scalability.median
>>>>>    14889337           -17.5%   12291029            -4.1%   14278574        vm-scalability.throughput
>>>>>
>>>>> commit 01e75fea0d5ff39d3e588c20ec52e7a4e6588a74
>>>>> Author: Feng Tang
>>>>> Date:   Fri Aug 2 15:09:19 2019 +0800
>>>>>
>>>>>      mgag200: restore the depth back to 24
>>>>>
>>>>>      Signed-off-by: Feng Tang
>>>>>
>>>>> diff --git a/drivers/gpu/drm/mgag200/mgag200_main.c b/drivers/gpu/drm/mgag200/mgag200_main.c
>>>>> index a977333..ac8f6c9 100644
>>>>> --- a/drivers/gpu/drm/mgag200/mgag200_main.c
>>>>> +++ b/drivers/gpu/drm/mgag200/mgag200_main.c
>>>>> @@ -162,7 +162,7 @@ int mgag200_driver_load(struct drm_device *dev, unsigned long flags)
>>>>>       if (IS_G200_SE(mdev) && mdev->mc.vram_size < (2048*1024))
>>>>>           dev->mode_config.preferred_depth = 16;
>>>>>       else
>>>>> -        dev->mode_config.preferred_depth = 32;
>>>>> +        dev->mode_config.preferred_depth = 24;
>>>>>       dev->mode_config.prefer_shadow = 1;
>>>>>
>>>>>       r = mgag200_modeset_init(mdev);
>>>>>
>>>>> Thanks,
>>>>> Feng
>>>>>
>>>>>> The difference between mgag200's original fbdev support and generic
>>>>>> fbdev emulation is generic fbdev's worker task that updates the VRAM
>>>>>> buffer from the shadow buffer. mgag200 does this immediately, but relies
>>>>>> on drm_can_sleep(), which is deprecated.
>>>>>>
>>>>>> I think that the worker task interferes with the test case, as the
>>>>>> worker has been in fbdev emulation since forever and no performance
>>>>>> regressions have been reported so far.
>>>>>>
>>>>>>
>>>>>> So unless there's a report where this problem happens in a real-world
>>>>>> use case, I'd like to keep code as it is. And apparently there's always
>>>>>> the workaround of disabling the cursor blinking.
>>>>>>
>>>>>> Best regards
>>>>>> Thomas
>>>>>>
>>> _______________________________________________
>>> LKP mailing list
>>> LKP@lists.01.org
>>> https://lists.01.org/mailman/listinfo/lkp
_______________________________________________
dri-devel mailing list
dri-devel@lists.freedesktop.org
https://lists.freedesktop.org/mailman/listinfo/dri-devel
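
For anyone cross-checking the figures quoted in this thread, the arithmetic can be sketched in a few lines of Python. This is only a sketch under assumptions: it supposes that usemem reports bytes per second divided by 1024 and truncated to an integer (this reproduces the quoted lines, but it is not taken from the usemem source), and the helper names kb_per_sec and pct_change are hypothetical, not part of lkp or vm-scalability.

```python
def kb_per_sec(nbytes: int, usecs: int) -> int:
    """Throughput in KB/s (1 KB = 1024 bytes), truncated to an integer.

    Assumption: this mirrors how the quoted usemem lines were computed.
    """
    bytes_per_sec = nbytes * 1_000_000 // usecs
    return bytes_per_sec // 1024


def pct_change(base: float, new: float) -> float:
    """Relative change of `new` against `base`, in percent."""
    return (new - base) / base * 100.0


if __name__ == "__main__":
    # usemem lines copied from the thread.
    print(kb_per_sec(917319627, 756534))  # mgag200 patch reverted: 1184110
    print(kb_per_sec(917318700, 659419))  # cursor blinking disabled: 1358497

    # vm-scalability.median from the three-commit comparison.
    print(round(pct_change(43921, 35884), 1))  # mgag200 patch: -18.3
    print(round(pct_change(43921, 41826), 1))  # depth restored to 24: -4.8
```

The same pct_change helper applied to 43394 and 34575 gives about -20.3, which the report rounds to -20%.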