All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2] fbdev: Significantly improve performance of fbdefio mmap
@ 2022-02-10 14:11 ` Thomas Zimmermann
  0 siblings, 0 replies; 28+ messages in thread
From: Thomas Zimmermann @ 2022-02-10 14:11 UTC (permalink / raw)
  To: daniel, javierm, noralf, andriy.shevchenko, deller, bernie, jayalk
  Cc: dri-devel, linux-fbdev, linux-staging, Thomas Zimmermann

Remove the bubble sort from fbdev defered I/O page tracking. Most
drivers only want to know which pages have been written to. The exact
order is not important.

Tested on simpledrm.

Thomas Zimmermann (2):
  fbdev/defio: Early-out if page is already enlisted
  fbdev: Don't sorting deferred-I/O pages by default

 drivers/staging/fbtft/fbtft-core.c  |  1 +
 drivers/video/fbdev/broadsheetfb.c  |  1 +
 drivers/video/fbdev/core/fb_defio.c | 38 +++++++++++++++++------------
 drivers/video/fbdev/metronomefb.c   |  1 +
 drivers/video/fbdev/udlfb.c         |  1 +
 include/linux/fb.h                  |  1 +
 6 files changed, 28 insertions(+), 15 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 0/2] fbdev: Significantly improve performance of fbdefio mmap
@ 2022-02-10 14:11 ` Thomas Zimmermann
  0 siblings, 0 replies; 28+ messages in thread
From: Thomas Zimmermann @ 2022-02-10 14:11 UTC (permalink / raw)
  To: daniel, javierm, noralf, andriy.shevchenko, deller, bernie, jayalk
  Cc: linux-fbdev, linux-staging, Thomas Zimmermann, dri-devel

Remove the bubble sort from fbdev defered I/O page tracking. Most
drivers only want to know which pages have been written to. The exact
order is not important.

Tested on simpledrm.

Thomas Zimmermann (2):
  fbdev/defio: Early-out if page is already enlisted
  fbdev: Don't sorting deferred-I/O pages by default

 drivers/staging/fbtft/fbtft-core.c  |  1 +
 drivers/video/fbdev/broadsheetfb.c  |  1 +
 drivers/video/fbdev/core/fb_defio.c | 38 +++++++++++++++++------------
 drivers/video/fbdev/metronomefb.c   |  1 +
 drivers/video/fbdev/udlfb.c         |  1 +
 include/linux/fb.h                  |  1 +
 6 files changed, 28 insertions(+), 15 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 28+ messages in thread

* [PATCH 1/2] fbdev/defio: Early-out if page is already enlisted
  2022-02-10 14:11 ` Thomas Zimmermann
@ 2022-02-10 14:11   ` Thomas Zimmermann
  -1 siblings, 0 replies; 28+ messages in thread
From: Thomas Zimmermann @ 2022-02-10 14:11 UTC (permalink / raw)
  To: daniel, javierm, noralf, andriy.shevchenko, deller, bernie, jayalk
  Cc: dri-devel, linux-fbdev, linux-staging, Thomas Zimmermann

Return early if a page is already in the list of dirty pages for
deferred I/O. This can be detected if the page's list head is not
empty. Keep the list head initialized while the page is not enlisted
to make this work reliably.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
---
 drivers/video/fbdev/core/fb_defio.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
index a591d291b231..3727b1ca87b1 100644
--- a/drivers/video/fbdev/core/fb_defio.c
+++ b/drivers/video/fbdev/core/fb_defio.c
@@ -59,6 +59,7 @@ static vm_fault_t fb_deferred_io_fault(struct vm_fault *vmf)
 		printk(KERN_ERR "no mapping available\n");
 
 	BUG_ON(!page->mapping);
+	INIT_LIST_HEAD(&page->lru);
 	page->index = vmf->pgoff;
 
 	vmf->page = page;
@@ -122,17 +123,19 @@ static vm_fault_t fb_deferred_io_mkwrite(struct vm_fault *vmf)
 	 */
 	lock_page(page);
 
+	/*
+	 * This check is to catch the case where a new process could start
+	 * writing to the same page through a new pte. this new access
+	 * can cause the mkwrite even when the original ps's pte is marked
+	 * writable.
+	 */
+	if (!list_empty(&page->lru))
+		goto page_already_added;
+
 	/* we loop through the pagelist before adding in order
 	to keep the pagelist sorted */
 	list_for_each_entry(cur, &fbdefio->pagelist, lru) {
-		/* this check is to catch the case where a new
-		process could start writing to the same page
-		through a new pte. this new access can cause the
-		mkwrite even when the original ps's pte is marked
-		writable */
-		if (unlikely(cur == page))
-			goto page_already_added;
-		else if (cur->index > page->index)
+		if (cur->index > page->index)
 			break;
 	}
 
@@ -194,7 +197,7 @@ static void fb_deferred_io_work(struct work_struct *work)
 
 	/* clear the list */
 	list_for_each_safe(node, next, &fbdefio->pagelist) {
-		list_del(node);
+		list_del_init(node);
 	}
 	mutex_unlock(&fbdefio->lock);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 1/2] fbdev/defio: Early-out if page is already enlisted
@ 2022-02-10 14:11   ` Thomas Zimmermann
  0 siblings, 0 replies; 28+ messages in thread
From: Thomas Zimmermann @ 2022-02-10 14:11 UTC (permalink / raw)
  To: daniel, javierm, noralf, andriy.shevchenko, deller, bernie, jayalk
  Cc: linux-fbdev, linux-staging, Thomas Zimmermann, dri-devel

Return early if a page is already in the list of dirty pages for
deferred I/O. This can be detected if the page's list head is not
empty. Keep the list head initialized while the page is not enlisted
to make this work reliably.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
---
 drivers/video/fbdev/core/fb_defio.c | 21 ++++++++++++---------
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
index a591d291b231..3727b1ca87b1 100644
--- a/drivers/video/fbdev/core/fb_defio.c
+++ b/drivers/video/fbdev/core/fb_defio.c
@@ -59,6 +59,7 @@ static vm_fault_t fb_deferred_io_fault(struct vm_fault *vmf)
 		printk(KERN_ERR "no mapping available\n");
 
 	BUG_ON(!page->mapping);
+	INIT_LIST_HEAD(&page->lru);
 	page->index = vmf->pgoff;
 
 	vmf->page = page;
@@ -122,17 +123,19 @@ static vm_fault_t fb_deferred_io_mkwrite(struct vm_fault *vmf)
 	 */
 	lock_page(page);
 
+	/*
+	 * This check is to catch the case where a new process could start
+	 * writing to the same page through a new pte. this new access
+	 * can cause the mkwrite even when the original ps's pte is marked
+	 * writable.
+	 */
+	if (!list_empty(&page->lru))
+		goto page_already_added;
+
 	/* we loop through the pagelist before adding in order
 	to keep the pagelist sorted */
 	list_for_each_entry(cur, &fbdefio->pagelist, lru) {
-		/* this check is to catch the case where a new
-		process could start writing to the same page
-		through a new pte. this new access can cause the
-		mkwrite even when the original ps's pte is marked
-		writable */
-		if (unlikely(cur == page))
-			goto page_already_added;
-		else if (cur->index > page->index)
+		if (cur->index > page->index)
 			break;
 	}
 
@@ -194,7 +197,7 @@ static void fb_deferred_io_work(struct work_struct *work)
 
 	/* clear the list */
 	list_for_each_safe(node, next, &fbdefio->pagelist) {
-		list_del(node);
+		list_del_init(node);
 	}
 	mutex_unlock(&fbdefio->lock);
 }
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/2] fbdev: Don't sort deferred-I/O pages by default
  2022-02-10 14:11 ` Thomas Zimmermann
@ 2022-02-10 14:11   ` Thomas Zimmermann
  -1 siblings, 0 replies; 28+ messages in thread
From: Thomas Zimmermann @ 2022-02-10 14:11 UTC (permalink / raw)
  To: daniel, javierm, noralf, andriy.shevchenko, deller, bernie, jayalk
  Cc: dri-devel, linux-fbdev, linux-staging, Thomas Zimmermann

Fbdev's deferred I/O sorts all dirty pages by default, which incurs a
significant overhead. Make the sorting step optional and update the few
drivers that require it. Use a FIFO list by default.

Sorting pages by memory offset for deferred I/O performs an implicit
bubble-sort step on the list of dirty pages. The algorithm goes through
the list of dirty pages and inserts each new page according to its
index field. Even worse, list traversal always starts at the first
entry. As video memory is most likely updated scanline by scanline, the
algorithm traverses through the complete list for each updated page.

For example, with 1024x768x32bpp a page covers exactly one scanline.
Writing a single screen update from top to bottom requires updating
768 pages. With an average list length of 384 entries, a screen update
creates (768 * 384 =) 294912 compare operation.

Fix this by making the sorting step opt-in and update the few drivers
that require it. All other drivers work with unsorted page lists. Pages
are appended to the list. Therefore, in the common case of writing the
framebuffer top to bottom, pages are still sorted by offset, which may
have a positive effect on performance.

Playing a video [1] in mplayer's benchmark mode shows the difference
(i7-4790, FullHD, simpledrm, kernel with debugging).

  mplayer -benchmark -nosound -vo fbdev ./big_buck_bunny_720p_stereo.ogg

With sorted page lists:

  BENCHMARKs: VC:  32.960s VO:  73.068s A:   0.000s Sys:   2.413s =  108.441s
  BENCHMARK%: VC: 30.3947% VO: 67.3802% A:  0.0000% Sys:  2.2251% = 100.0000%

With unsorted page lists:

  BENCHMARKs: VC:  31.005s VO:  42.889s A:   0.000s Sys:   2.256s =   76.150s
  BENCHMARK%: VC: 40.7156% VO: 56.3219% A:  0.0000% Sys:  2.9625% = 100.0000%

VC shows the overhead of video decoding, VO shows the overhead of the
video output. Using unsorted page lists reduces the benchmark's run time
by ~32s/~25%.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://download.blender.org/peach/bigbuckbunny_movies/big_buck_bunny_720p_stereo.ogg # [1]
---
 drivers/staging/fbtft/fbtft-core.c  |  1 +
 drivers/video/fbdev/broadsheetfb.c  |  1 +
 drivers/video/fbdev/core/fb_defio.c | 19 ++++++++++++-------
 drivers/video/fbdev/metronomefb.c   |  1 +
 drivers/video/fbdev/udlfb.c         |  1 +
 include/linux/fb.h                  |  1 +
 6 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/fbtft/fbtft-core.c b/drivers/staging/fbtft/fbtft-core.c
index f2684d2d6851..4a35347b3020 100644
--- a/drivers/staging/fbtft/fbtft-core.c
+++ b/drivers/staging/fbtft/fbtft-core.c
@@ -654,6 +654,7 @@ struct fb_info *fbtft_framebuffer_alloc(struct fbtft_display *display,
 	fbops->fb_blank     =      fbtft_fb_blank;
 
 	fbdefio->delay =           HZ / fps;
+	fbdefio->sort_pagelist =   true;
 	fbdefio->deferred_io =     fbtft_deferred_io;
 	fb_deferred_io_init(info);
 
diff --git a/drivers/video/fbdev/broadsheetfb.c b/drivers/video/fbdev/broadsheetfb.c
index fd66f4d4a621..b9054f658838 100644
--- a/drivers/video/fbdev/broadsheetfb.c
+++ b/drivers/video/fbdev/broadsheetfb.c
@@ -1059,6 +1059,7 @@ static const struct fb_ops broadsheetfb_ops = {
 
 static struct fb_deferred_io broadsheetfb_defio = {
 	.delay		= HZ/4,
+	.sort_pagelist	= true,
 	.deferred_io	= broadsheetfb_dpy_deferred_io,
 };
 
diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
index 3727b1ca87b1..1f672cf253b2 100644
--- a/drivers/video/fbdev/core/fb_defio.c
+++ b/drivers/video/fbdev/core/fb_defio.c
@@ -132,15 +132,20 @@ static vm_fault_t fb_deferred_io_mkwrite(struct vm_fault *vmf)
 	if (!list_empty(&page->lru))
 		goto page_already_added;
 
-	/* we loop through the pagelist before adding in order
-	to keep the pagelist sorted */
-	list_for_each_entry(cur, &fbdefio->pagelist, lru) {
-		if (cur->index > page->index)
-			break;
+	if (fbdefio->sort_pagelist) {
+		/*
+		 * We loop through the pagelist before adding in order
+		 * to keep the pagelist sorted.
+		 */
+		list_for_each_entry(cur, &fbdefio->pagelist, lru) {
+			if (cur->index > page->index)
+				break;
+		}
+		list_add_tail(&page->lru, &cur->lru);
+	} else {
+		list_add_tail(&page->lru, &fbdefio->pagelist);
 	}
 
-	list_add_tail(&page->lru, &cur->lru);
-
 page_already_added:
 	mutex_unlock(&fbdefio->lock);
 
diff --git a/drivers/video/fbdev/metronomefb.c b/drivers/video/fbdev/metronomefb.c
index 952826557a0c..af858dd23ea6 100644
--- a/drivers/video/fbdev/metronomefb.c
+++ b/drivers/video/fbdev/metronomefb.c
@@ -568,6 +568,7 @@ static const struct fb_ops metronomefb_ops = {
 
 static struct fb_deferred_io metronomefb_defio = {
 	.delay		= HZ,
+	.sort_pagelist	= true,
 	.deferred_io	= metronomefb_dpy_deferred_io,
 };
 
diff --git a/drivers/video/fbdev/udlfb.c b/drivers/video/fbdev/udlfb.c
index b9cdd02c1000..184bb8433b78 100644
--- a/drivers/video/fbdev/udlfb.c
+++ b/drivers/video/fbdev/udlfb.c
@@ -980,6 +980,7 @@ static int dlfb_ops_open(struct fb_info *info, int user)
 
 		if (fbdefio) {
 			fbdefio->delay = DL_DEFIO_WRITE_DELAY;
+			fbdefio->sort_pagelist = true;
 			fbdefio->deferred_io = dlfb_dpy_deferred_io;
 		}
 
diff --git a/include/linux/fb.h b/include/linux/fb.h
index 3d7306c9a706..9a77ab615c36 100644
--- a/include/linux/fb.h
+++ b/include/linux/fb.h
@@ -204,6 +204,7 @@ struct fb_pixmap {
 struct fb_deferred_io {
 	/* delay between mkwrite and deferred handler */
 	unsigned long delay;
+	bool sort_pagelist; /* sort pagelist by offset */
 	struct mutex lock; /* mutex that protects the page list */
 	struct list_head pagelist; /* list of touched pages */
 	/* callback */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* [PATCH 2/2] fbdev: Don't sort deferred-I/O pages by default
@ 2022-02-10 14:11   ` Thomas Zimmermann
  0 siblings, 0 replies; 28+ messages in thread
From: Thomas Zimmermann @ 2022-02-10 14:11 UTC (permalink / raw)
  To: daniel, javierm, noralf, andriy.shevchenko, deller, bernie, jayalk
  Cc: linux-fbdev, linux-staging, Thomas Zimmermann, dri-devel

Fbdev's deferred I/O sorts all dirty pages by default, which incurs a
significant overhead. Make the sorting step optional and update the few
drivers that require it. Use a FIFO list by default.

Sorting pages by memory offset for deferred I/O performs an implicit
bubble-sort step on the list of dirty pages. The algorithm goes through
the list of dirty pages and inserts each new page according to its
index field. Even worse, list traversal always starts at the first
entry. As video memory is most likely updated scanline by scanline, the
algorithm traverses through the complete list for each updated page.

For example, with 1024x768x32bpp a page covers exactly one scanline.
Writing a single screen update from top to bottom requires updating
768 pages. With an average list length of 384 entries, a screen update
creates (768 * 384 =) 294912 compare operation.

Fix this by making the sorting step opt-in and update the few drivers
that require it. All other drivers work with unsorted page lists. Pages
are appended to the list. Therefore, in the common case of writing the
framebuffer top to bottom, pages are still sorted by offset, which may
have a positive effect on performance.

Playing a video [1] in mplayer's benchmark mode shows the difference
(i7-4790, FullHD, simpledrm, kernel with debugging).

  mplayer -benchmark -nosound -vo fbdev ./big_buck_bunny_720p_stereo.ogg

With sorted page lists:

  BENCHMARKs: VC:  32.960s VO:  73.068s A:   0.000s Sys:   2.413s =  108.441s
  BENCHMARK%: VC: 30.3947% VO: 67.3802% A:  0.0000% Sys:  2.2251% = 100.0000%

With unsorted page lists:

  BENCHMARKs: VC:  31.005s VO:  42.889s A:   0.000s Sys:   2.256s =   76.150s
  BENCHMARK%: VC: 40.7156% VO: 56.3219% A:  0.0000% Sys:  2.9625% = 100.0000%

VC shows the overhead of video decoding, VO shows the overhead of the
video output. Using unsorted page lists reduces the benchmark's run time
by ~32s/~25%.

Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
Link: https://download.blender.org/peach/bigbuckbunny_movies/big_buck_bunny_720p_stereo.ogg # [1]
---
 drivers/staging/fbtft/fbtft-core.c  |  1 +
 drivers/video/fbdev/broadsheetfb.c  |  1 +
 drivers/video/fbdev/core/fb_defio.c | 19 ++++++++++++-------
 drivers/video/fbdev/metronomefb.c   |  1 +
 drivers/video/fbdev/udlfb.c         |  1 +
 include/linux/fb.h                  |  1 +
 6 files changed, 17 insertions(+), 7 deletions(-)

diff --git a/drivers/staging/fbtft/fbtft-core.c b/drivers/staging/fbtft/fbtft-core.c
index f2684d2d6851..4a35347b3020 100644
--- a/drivers/staging/fbtft/fbtft-core.c
+++ b/drivers/staging/fbtft/fbtft-core.c
@@ -654,6 +654,7 @@ struct fb_info *fbtft_framebuffer_alloc(struct fbtft_display *display,
 	fbops->fb_blank     =      fbtft_fb_blank;
 
 	fbdefio->delay =           HZ / fps;
+	fbdefio->sort_pagelist =   true;
 	fbdefio->deferred_io =     fbtft_deferred_io;
 	fb_deferred_io_init(info);
 
diff --git a/drivers/video/fbdev/broadsheetfb.c b/drivers/video/fbdev/broadsheetfb.c
index fd66f4d4a621..b9054f658838 100644
--- a/drivers/video/fbdev/broadsheetfb.c
+++ b/drivers/video/fbdev/broadsheetfb.c
@@ -1059,6 +1059,7 @@ static const struct fb_ops broadsheetfb_ops = {
 
 static struct fb_deferred_io broadsheetfb_defio = {
 	.delay		= HZ/4,
+	.sort_pagelist	= true,
 	.deferred_io	= broadsheetfb_dpy_deferred_io,
 };
 
diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
index 3727b1ca87b1..1f672cf253b2 100644
--- a/drivers/video/fbdev/core/fb_defio.c
+++ b/drivers/video/fbdev/core/fb_defio.c
@@ -132,15 +132,20 @@ static vm_fault_t fb_deferred_io_mkwrite(struct vm_fault *vmf)
 	if (!list_empty(&page->lru))
 		goto page_already_added;
 
-	/* we loop through the pagelist before adding in order
-	to keep the pagelist sorted */
-	list_for_each_entry(cur, &fbdefio->pagelist, lru) {
-		if (cur->index > page->index)
-			break;
+	if (fbdefio->sort_pagelist) {
+		/*
+		 * We loop through the pagelist before adding in order
+		 * to keep the pagelist sorted.
+		 */
+		list_for_each_entry(cur, &fbdefio->pagelist, lru) {
+			if (cur->index > page->index)
+				break;
+		}
+		list_add_tail(&page->lru, &cur->lru);
+	} else {
+		list_add_tail(&page->lru, &fbdefio->pagelist);
 	}
 
-	list_add_tail(&page->lru, &cur->lru);
-
 page_already_added:
 	mutex_unlock(&fbdefio->lock);
 
diff --git a/drivers/video/fbdev/metronomefb.c b/drivers/video/fbdev/metronomefb.c
index 952826557a0c..af858dd23ea6 100644
--- a/drivers/video/fbdev/metronomefb.c
+++ b/drivers/video/fbdev/metronomefb.c
@@ -568,6 +568,7 @@ static const struct fb_ops metronomefb_ops = {
 
 static struct fb_deferred_io metronomefb_defio = {
 	.delay		= HZ,
+	.sort_pagelist	= true,
 	.deferred_io	= metronomefb_dpy_deferred_io,
 };
 
diff --git a/drivers/video/fbdev/udlfb.c b/drivers/video/fbdev/udlfb.c
index b9cdd02c1000..184bb8433b78 100644
--- a/drivers/video/fbdev/udlfb.c
+++ b/drivers/video/fbdev/udlfb.c
@@ -980,6 +980,7 @@ static int dlfb_ops_open(struct fb_info *info, int user)
 
 		if (fbdefio) {
 			fbdefio->delay = DL_DEFIO_WRITE_DELAY;
+			fbdefio->sort_pagelist = true;
 			fbdefio->deferred_io = dlfb_dpy_deferred_io;
 		}
 
diff --git a/include/linux/fb.h b/include/linux/fb.h
index 3d7306c9a706..9a77ab615c36 100644
--- a/include/linux/fb.h
+++ b/include/linux/fb.h
@@ -204,6 +204,7 @@ struct fb_pixmap {
 struct fb_deferred_io {
 	/* delay between mkwrite and deferred handler */
 	unsigned long delay;
+	bool sort_pagelist; /* sort pagelist by offset */
 	struct mutex lock; /* mutex that protects the page list */
 	struct list_head pagelist; /* list of touched pages */
 	/* callback */
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] fbdev: Don't sort deferred-I/O pages by default
  2022-02-10 14:11   ` Thomas Zimmermann
@ 2022-02-10 16:15     ` Andy Shevchenko
  -1 siblings, 0 replies; 28+ messages in thread
From: Andy Shevchenko @ 2022-02-10 16:15 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: daniel, javierm, noralf, deller, bernie, jayalk, dri-devel,
	linux-fbdev, linux-staging

On Thu, Feb 10, 2022 at 03:11:13PM +0100, Thomas Zimmermann wrote:
> Fbdev's deferred I/O sorts all dirty pages by default, which incurs a
> significant overhead. Make the sorting step optional and update the few
> drivers that require it. Use a FIFO list by default.
> 
> Sorting pages by memory offset for deferred I/O performs an implicit
> bubble-sort step on the list of dirty pages. The algorithm goes through
> the list of dirty pages and inserts each new page according to its
> index field. Even worse, list traversal always starts at the first
> entry. As video memory is most likely updated scanline by scanline, the
> algorithm traverses through the complete list for each updated page.
> 
> For example, with 1024x768x32bpp a page covers exactly one scanline.
> Writing a single screen update from top to bottom requires updating
> 768 pages. With an average list length of 384 entries, a screen update
> creates (768 * 384 =) 294912 compare operation.
> 
> Fix this by making the sorting step opt-in and update the few drivers
> that require it. All other drivers work with unsorted page lists. Pages
> are appended to the list. Therefore, in the common case of writing the
> framebuffer top to bottom, pages are still sorted by offset, which may
> have a positive effect on performance.
> 
> Playing a video [1] in mplayer's benchmark mode shows the difference
> (i7-4790, FullHD, simpledrm, kernel with debugging).
> 
>   mplayer -benchmark -nosound -vo fbdev ./big_buck_bunny_720p_stereo.ogg
> 
> With sorted page lists:
> 
>   BENCHMARKs: VC:  32.960s VO:  73.068s A:   0.000s Sys:   2.413s =  108.441s
>   BENCHMARK%: VC: 30.3947% VO: 67.3802% A:  0.0000% Sys:  2.2251% = 100.0000%
> 
> With unsorted page lists:
> 
>   BENCHMARKs: VC:  31.005s VO:  42.889s A:   0.000s Sys:   2.256s =   76.150s
>   BENCHMARK%: VC: 40.7156% VO: 56.3219% A:  0.0000% Sys:  2.9625% = 100.0000%
> 
> VC shows the overhead of video decoding, VO shows the overhead of the
> video output. Using unsorted page lists reduces the benchmark's run time
> by ~32s/~25%.

Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
from fbtft perspective, thanks!

> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> Link: https://download.blender.org/peach/bigbuckbunny_movies/big_buck_bunny_720p_stereo.ogg # [1]
> ---
>  drivers/staging/fbtft/fbtft-core.c  |  1 +
>  drivers/video/fbdev/broadsheetfb.c  |  1 +
>  drivers/video/fbdev/core/fb_defio.c | 19 ++++++++++++-------
>  drivers/video/fbdev/metronomefb.c   |  1 +
>  drivers/video/fbdev/udlfb.c         |  1 +
>  include/linux/fb.h                  |  1 +
>  6 files changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/staging/fbtft/fbtft-core.c b/drivers/staging/fbtft/fbtft-core.c
> index f2684d2d6851..4a35347b3020 100644
> --- a/drivers/staging/fbtft/fbtft-core.c
> +++ b/drivers/staging/fbtft/fbtft-core.c
> @@ -654,6 +654,7 @@ struct fb_info *fbtft_framebuffer_alloc(struct fbtft_display *display,
>  	fbops->fb_blank     =      fbtft_fb_blank;
>  
>  	fbdefio->delay =           HZ / fps;
> +	fbdefio->sort_pagelist =   true;
>  	fbdefio->deferred_io =     fbtft_deferred_io;
>  	fb_deferred_io_init(info);
>  
> diff --git a/drivers/video/fbdev/broadsheetfb.c b/drivers/video/fbdev/broadsheetfb.c
> index fd66f4d4a621..b9054f658838 100644
> --- a/drivers/video/fbdev/broadsheetfb.c
> +++ b/drivers/video/fbdev/broadsheetfb.c
> @@ -1059,6 +1059,7 @@ static const struct fb_ops broadsheetfb_ops = {
>  
>  static struct fb_deferred_io broadsheetfb_defio = {
>  	.delay		= HZ/4,
> +	.sort_pagelist	= true,
>  	.deferred_io	= broadsheetfb_dpy_deferred_io,
>  };
>  
> diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
> index 3727b1ca87b1..1f672cf253b2 100644
> --- a/drivers/video/fbdev/core/fb_defio.c
> +++ b/drivers/video/fbdev/core/fb_defio.c
> @@ -132,15 +132,20 @@ static vm_fault_t fb_deferred_io_mkwrite(struct vm_fault *vmf)
>  	if (!list_empty(&page->lru))
>  		goto page_already_added;
>  
> -	/* we loop through the pagelist before adding in order
> -	to keep the pagelist sorted */
> -	list_for_each_entry(cur, &fbdefio->pagelist, lru) {
> -		if (cur->index > page->index)
> -			break;
> +	if (fbdefio->sort_pagelist) {
> +		/*
> +		 * We loop through the pagelist before adding in order
> +		 * to keep the pagelist sorted.
> +		 */
> +		list_for_each_entry(cur, &fbdefio->pagelist, lru) {
> +			if (cur->index > page->index)
> +				break;
> +		}
> +		list_add_tail(&page->lru, &cur->lru);
> +	} else {
> +		list_add_tail(&page->lru, &fbdefio->pagelist);
>  	}
>  
> -	list_add_tail(&page->lru, &cur->lru);
> -
>  page_already_added:
>  	mutex_unlock(&fbdefio->lock);
>  
> diff --git a/drivers/video/fbdev/metronomefb.c b/drivers/video/fbdev/metronomefb.c
> index 952826557a0c..af858dd23ea6 100644
> --- a/drivers/video/fbdev/metronomefb.c
> +++ b/drivers/video/fbdev/metronomefb.c
> @@ -568,6 +568,7 @@ static const struct fb_ops metronomefb_ops = {
>  
>  static struct fb_deferred_io metronomefb_defio = {
>  	.delay		= HZ,
> +	.sort_pagelist	= true,
>  	.deferred_io	= metronomefb_dpy_deferred_io,
>  };
>  
> diff --git a/drivers/video/fbdev/udlfb.c b/drivers/video/fbdev/udlfb.c
> index b9cdd02c1000..184bb8433b78 100644
> --- a/drivers/video/fbdev/udlfb.c
> +++ b/drivers/video/fbdev/udlfb.c
> @@ -980,6 +980,7 @@ static int dlfb_ops_open(struct fb_info *info, int user)
>  
>  		if (fbdefio) {
>  			fbdefio->delay = DL_DEFIO_WRITE_DELAY;
> +			fbdefio->sort_pagelist = true;
>  			fbdefio->deferred_io = dlfb_dpy_deferred_io;
>  		}
>  
> diff --git a/include/linux/fb.h b/include/linux/fb.h
> index 3d7306c9a706..9a77ab615c36 100644
> --- a/include/linux/fb.h
> +++ b/include/linux/fb.h
> @@ -204,6 +204,7 @@ struct fb_pixmap {
>  struct fb_deferred_io {
>  	/* delay between mkwrite and deferred handler */
>  	unsigned long delay;
> +	bool sort_pagelist; /* sort pagelist by offset */
>  	struct mutex lock; /* mutex that protects the page list */
>  	struct list_head pagelist; /* list of touched pages */
>  	/* callback */
> -- 
> 2.34.1
> 

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] fbdev: Don't sort deferred-I/O pages by default
@ 2022-02-10 16:15     ` Andy Shevchenko
  0 siblings, 0 replies; 28+ messages in thread
From: Andy Shevchenko @ 2022-02-10 16:15 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: linux-fbdev, deller, linux-staging, javierm, dri-devel, bernie,
	noralf, jayalk

On Thu, Feb 10, 2022 at 03:11:13PM +0100, Thomas Zimmermann wrote:
> Fbdev's deferred I/O sorts all dirty pages by default, which incurs a
> significant overhead. Make the sorting step optional and update the few
> drivers that require it. Use a FIFO list by default.
> 
> Sorting pages by memory offset for deferred I/O performs an implicit
> bubble-sort step on the list of dirty pages. The algorithm goes through
> the list of dirty pages and inserts each new page according to its
> index field. Even worse, list traversal always starts at the first
> entry. As video memory is most likely updated scanline by scanline, the
> algorithm traverses through the complete list for each updated page.
> 
> For example, with 1024x768x32bpp a page covers exactly one scanline.
> Writing a single screen update from top to bottom requires updating
> 768 pages. With an average list length of 384 entries, a screen update
> creates (768 * 384 =) 294912 compare operation.
> 
> Fix this by making the sorting step opt-in and update the few drivers
> that require it. All other drivers work with unsorted page lists. Pages
> are appended to the list. Therefore, in the common case of writing the
> framebuffer top to bottom, pages are still sorted by offset, which may
> have a positive effect on performance.
> 
> Playing a video [1] in mplayer's benchmark mode shows the difference
> (i7-4790, FullHD, simpledrm, kernel with debugging).
> 
>   mplayer -benchmark -nosound -vo fbdev ./big_buck_bunny_720p_stereo.ogg
> 
> With sorted page lists:
> 
>   BENCHMARKs: VC:  32.960s VO:  73.068s A:   0.000s Sys:   2.413s =  108.441s
>   BENCHMARK%: VC: 30.3947% VO: 67.3802% A:  0.0000% Sys:  2.2251% = 100.0000%
> 
> With unsorted page lists:
> 
>   BENCHMARKs: VC:  31.005s VO:  42.889s A:   0.000s Sys:   2.256s =   76.150s
>   BENCHMARK%: VC: 40.7156% VO: 56.3219% A:  0.0000% Sys:  2.9625% = 100.0000%
> 
> VC shows the overhead of video decoding, VO shows the overhead of the
> video output. Using unsorted page lists reduces the benchmark's run time
> by ~32s/~25%.

Acked-by: Andy Shevchenko <andriy.shevchenko@linux.intel.com>
from fbtft perspective, thanks!

> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> Link: https://download.blender.org/peach/bigbuckbunny_movies/big_buck_bunny_720p_stereo.ogg # [1]
> ---
>  drivers/staging/fbtft/fbtft-core.c  |  1 +
>  drivers/video/fbdev/broadsheetfb.c  |  1 +
>  drivers/video/fbdev/core/fb_defio.c | 19 ++++++++++++-------
>  drivers/video/fbdev/metronomefb.c   |  1 +
>  drivers/video/fbdev/udlfb.c         |  1 +
>  include/linux/fb.h                  |  1 +
>  6 files changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/staging/fbtft/fbtft-core.c b/drivers/staging/fbtft/fbtft-core.c
> index f2684d2d6851..4a35347b3020 100644
> --- a/drivers/staging/fbtft/fbtft-core.c
> +++ b/drivers/staging/fbtft/fbtft-core.c
> @@ -654,6 +654,7 @@ struct fb_info *fbtft_framebuffer_alloc(struct fbtft_display *display,
>  	fbops->fb_blank     =      fbtft_fb_blank;
>  
>  	fbdefio->delay =           HZ / fps;
> +	fbdefio->sort_pagelist =   true;
>  	fbdefio->deferred_io =     fbtft_deferred_io;
>  	fb_deferred_io_init(info);
>  
> diff --git a/drivers/video/fbdev/broadsheetfb.c b/drivers/video/fbdev/broadsheetfb.c
> index fd66f4d4a621..b9054f658838 100644
> --- a/drivers/video/fbdev/broadsheetfb.c
> +++ b/drivers/video/fbdev/broadsheetfb.c
> @@ -1059,6 +1059,7 @@ static const struct fb_ops broadsheetfb_ops = {
>  
>  static struct fb_deferred_io broadsheetfb_defio = {
>  	.delay		= HZ/4,
> +	.sort_pagelist	= true,
>  	.deferred_io	= broadsheetfb_dpy_deferred_io,
>  };
>  
> diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
> index 3727b1ca87b1..1f672cf253b2 100644
> --- a/drivers/video/fbdev/core/fb_defio.c
> +++ b/drivers/video/fbdev/core/fb_defio.c
> @@ -132,15 +132,20 @@ static vm_fault_t fb_deferred_io_mkwrite(struct vm_fault *vmf)
>  	if (!list_empty(&page->lru))
>  		goto page_already_added;
>  
> -	/* we loop through the pagelist before adding in order
> -	to keep the pagelist sorted */
> -	list_for_each_entry(cur, &fbdefio->pagelist, lru) {
> -		if (cur->index > page->index)
> -			break;
> +	if (fbdefio->sort_pagelist) {
> +		/*
> +		 * We loop through the pagelist before adding in order
> +		 * to keep the pagelist sorted.
> +		 */
> +		list_for_each_entry(cur, &fbdefio->pagelist, lru) {
> +			if (cur->index > page->index)
> +				break;
> +		}
> +		list_add_tail(&page->lru, &cur->lru);
> +	} else {
> +		list_add_tail(&page->lru, &fbdefio->pagelist);
>  	}
>  
> -	list_add_tail(&page->lru, &cur->lru);
> -
>  page_already_added:
>  	mutex_unlock(&fbdefio->lock);
>  
> diff --git a/drivers/video/fbdev/metronomefb.c b/drivers/video/fbdev/metronomefb.c
> index 952826557a0c..af858dd23ea6 100644
> --- a/drivers/video/fbdev/metronomefb.c
> +++ b/drivers/video/fbdev/metronomefb.c
> @@ -568,6 +568,7 @@ static const struct fb_ops metronomefb_ops = {
>  
>  static struct fb_deferred_io metronomefb_defio = {
>  	.delay		= HZ,
> +	.sort_pagelist	= true,
>  	.deferred_io	= metronomefb_dpy_deferred_io,
>  };
>  
> diff --git a/drivers/video/fbdev/udlfb.c b/drivers/video/fbdev/udlfb.c
> index b9cdd02c1000..184bb8433b78 100644
> --- a/drivers/video/fbdev/udlfb.c
> +++ b/drivers/video/fbdev/udlfb.c
> @@ -980,6 +980,7 @@ static int dlfb_ops_open(struct fb_info *info, int user)
>  
>  		if (fbdefio) {
>  			fbdefio->delay = DL_DEFIO_WRITE_DELAY;
> +			fbdefio->sort_pagelist = true;
>  			fbdefio->deferred_io = dlfb_dpy_deferred_io;
>  		}
>  
> diff --git a/include/linux/fb.h b/include/linux/fb.h
> index 3d7306c9a706..9a77ab615c36 100644
> --- a/include/linux/fb.h
> +++ b/include/linux/fb.h
> @@ -204,6 +204,7 @@ struct fb_pixmap {
>  struct fb_deferred_io {
>  	/* delay between mkwrite and deferred handler */
>  	unsigned long delay;
> +	bool sort_pagelist; /* sort pagelist by offset */
>  	struct mutex lock; /* mutex that protects the page list */
>  	struct list_head pagelist; /* list of touched pages */
>  	/* callback */
> -- 
> 2.34.1
> 

-- 
With Best Regards,
Andy Shevchenko



^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] fbdev/defio: Early-out if page is already enlisted
  2022-02-10 14:11   ` Thomas Zimmermann
@ 2022-02-10 21:00     ` Sam Ravnborg
  -1 siblings, 0 replies; 28+ messages in thread
From: Sam Ravnborg @ 2022-02-10 21:00 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: linux-fbdev, deller, linux-staging, javierm, dri-devel, bernie,
	noralf, andriy.shevchenko, jayalk

Hi Thomas,

On Thu, Feb 10, 2022 at 03:11:11PM +0100, Thomas Zimmermann wrote:
> Return early if a page is already in the list of dirty pages for
> deferred I/O. This can be detected if the page's list head is not
> empty. Keep the list head initialized while the page is not enlisted
> to make this work reliably.
> 
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> ---
>  drivers/video/fbdev/core/fb_defio.c | 21 ++++++++++++---------
>  1 file changed, 12 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
> index a591d291b231..3727b1ca87b1 100644
> --- a/drivers/video/fbdev/core/fb_defio.c
> +++ b/drivers/video/fbdev/core/fb_defio.c
> @@ -59,6 +59,7 @@ static vm_fault_t fb_deferred_io_fault(struct vm_fault *vmf)
>  		printk(KERN_ERR "no mapping available\n");
>  
>  	BUG_ON(!page->mapping);
> +	INIT_LIST_HEAD(&page->lru);
>  	page->index = vmf->pgoff;
>  
>  	vmf->page = page;
> @@ -122,17 +123,19 @@ static vm_fault_t fb_deferred_io_mkwrite(struct vm_fault *vmf)
>  	 */
>  	lock_page(page);
>  
> +	/*
> +	 * This check is to catch the case where a new process could start
> +	 * writing to the same page through a new pte. this new access
> +	 * can cause the mkwrite even when the original ps's pte is marked
> +	 * writable.
> +	 */
When moving this comment it would be prudent to also fix the wording a
bit.
- Capital in start of sentence and after a period
- Spell out process and do not shorten ps


> +	if (!list_empty(&page->lru))
> +		goto page_already_added;
> +

This check says that if the page already has something in the parge->lru
then this is added by defio and thus is already added.
This matches your commit description - OK.

Maybe add something like:
* Pages added will have their lru set, and it is clered again in the
* deferred work handler.



>  	/* we loop through the pagelist before adding in order
>  	to keep the pagelist sorted */
>  	list_for_each_entry(cur, &fbdefio->pagelist, lru) {
> -		/* this check is to catch the case where a new
> -		process could start writing to the same page
> -		through a new pte. this new access can cause the
> -		mkwrite even when the original ps's pte is marked
> -		writable */
> -		if (unlikely(cur == page))
> -			goto page_already_added;
> -		else if (cur->index > page->index)
> +		if (cur->index > page->index)
>  			break;
>  	}
>  
> @@ -194,7 +197,7 @@ static void fb_deferred_io_work(struct work_struct *work)
>  
>  	/* clear the list */
>  	list_for_each_safe(node, next, &fbdefio->pagelist) {
> -		list_del(node);
> +		list_del_init(node);
>  	}
>  	mutex_unlock(&fbdefio->lock);
>  }

With the comment adjusted as you see fit
Acked-by: Sam Ravnborg <sam@ravnborg.org>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] fbdev/defio: Early-out if page is already enlisted
@ 2022-02-10 21:00     ` Sam Ravnborg
  0 siblings, 0 replies; 28+ messages in thread
From: Sam Ravnborg @ 2022-02-10 21:00 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: daniel, javierm, noralf, andriy.shevchenko, deller, bernie,
	jayalk, linux-fbdev, linux-staging, dri-devel

Hi Thomas,

On Thu, Feb 10, 2022 at 03:11:11PM +0100, Thomas Zimmermann wrote:
> Return early if a page is already in the list of dirty pages for
> deferred I/O. This can be detected if the page's list head is not
> empty. Keep the list head initialized while the page is not enlisted
> to make this work reliably.
> 
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> ---
>  drivers/video/fbdev/core/fb_defio.c | 21 ++++++++++++---------
>  1 file changed, 12 insertions(+), 9 deletions(-)
> 
> diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
> index a591d291b231..3727b1ca87b1 100644
> --- a/drivers/video/fbdev/core/fb_defio.c
> +++ b/drivers/video/fbdev/core/fb_defio.c
> @@ -59,6 +59,7 @@ static vm_fault_t fb_deferred_io_fault(struct vm_fault *vmf)
>  		printk(KERN_ERR "no mapping available\n");
>  
>  	BUG_ON(!page->mapping);
> +	INIT_LIST_HEAD(&page->lru);
>  	page->index = vmf->pgoff;
>  
>  	vmf->page = page;
> @@ -122,17 +123,19 @@ static vm_fault_t fb_deferred_io_mkwrite(struct vm_fault *vmf)
>  	 */
>  	lock_page(page);
>  
> +	/*
> +	 * This check is to catch the case where a new process could start
> +	 * writing to the same page through a new pte. this new access
> +	 * can cause the mkwrite even when the original ps's pte is marked
> +	 * writable.
> +	 */
When moving this comment it would be prudent to also fix the wording a
bit.
- Capital in start of sentence and after a period
- Spell out process and do not shorten ps


> +	if (!list_empty(&page->lru))
> +		goto page_already_added;
> +

This check says that if the page already has something in the parge->lru
then this is added by defio and thus is already added.
This matches your commit description - OK.

Maybe add something like:
* Pages added will have their lru set, and it is clered again in the
* deferred work handler.



>  	/* we loop through the pagelist before adding in order
>  	to keep the pagelist sorted */
>  	list_for_each_entry(cur, &fbdefio->pagelist, lru) {
> -		/* this check is to catch the case where a new
> -		process could start writing to the same page
> -		through a new pte. this new access can cause the
> -		mkwrite even when the original ps's pte is marked
> -		writable */
> -		if (unlikely(cur == page))
> -			goto page_already_added;
> -		else if (cur->index > page->index)
> +		if (cur->index > page->index)
>  			break;
>  	}
>  
> @@ -194,7 +197,7 @@ static void fb_deferred_io_work(struct work_struct *work)
>  
>  	/* clear the list */
>  	list_for_each_safe(node, next, &fbdefio->pagelist) {
> -		list_del(node);
> +		list_del_init(node);
>  	}
>  	mutex_unlock(&fbdefio->lock);
>  }

With the comment adjusted as you see fit
Acked-by: Sam Ravnborg <sam@ravnborg.org>

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] fbdev: Don't sort deferred-I/O pages by default
  2022-02-10 14:11   ` Thomas Zimmermann
@ 2022-02-10 21:16     ` Sam Ravnborg
  -1 siblings, 0 replies; 28+ messages in thread
From: Sam Ravnborg @ 2022-02-10 21:16 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: daniel, javierm, noralf, andriy.shevchenko, deller, bernie,
	jayalk, linux-fbdev, linux-staging, dri-devel

Hi Thomas,

On Thu, Feb 10, 2022 at 03:11:13PM +0100, Thomas Zimmermann wrote:
> Fbdev's deferred I/O sorts all dirty pages by default, which incurs a
> significant overhead. Make the sorting step optional and update the few
> drivers that require it. Use a FIFO list by default.
> 
> Sorting pages by memory offset for deferred I/O performs an implicit
> bubble-sort step on the list of dirty pages. The algorithm goes through
> the list of dirty pages and inserts each new page according to its
> index field. Even worse, list traversal always starts at the first
> entry. As video memory is most likely updated scanline by scanline, the
> algorithm traverses through the complete list for each updated page.
> 
> For example, with 1024x768x32bpp a page covers exactly one scanline.
> Writing a single screen update from top to bottom requires updating
> 768 pages. With an average list length of 384 entries, a screen update
> creates (768 * 384 =) 294912 compare operation.
> 
> Fix this by making the sorting step opt-in and update the few drivers
> that require it. All other drivers work with unsorted page lists. Pages
> are appended to the list. Therefore, in the common case of writing the
> framebuffer top to bottom, pages are still sorted by offset, which may
> have a positive effect on performance.
> 
> Playing a video [1] in mplayer's benchmark mode shows the difference
> (i7-4790, FullHD, simpledrm, kernel with debugging).
> 
>   mplayer -benchmark -nosound -vo fbdev ./big_buck_bunny_720p_stereo.ogg
> 
> With sorted page lists:
> 
>   BENCHMARKs: VC:  32.960s VO:  73.068s A:   0.000s Sys:   2.413s =  108.441s
>   BENCHMARK%: VC: 30.3947% VO: 67.3802% A:  0.0000% Sys:  2.2251% = 100.0000%
> 
> With unsorted page lists:
> 
>   BENCHMARKs: VC:  31.005s VO:  42.889s A:   0.000s Sys:   2.256s =   76.150s
>   BENCHMARK%: VC: 40.7156% VO: 56.3219% A:  0.0000% Sys:  2.9625% = 100.0000%
> 
> VC shows the overhead of video decoding, VO shows the overhead of the
> video output. Using unsorted page lists reduces the benchmark's run time
> by ~32s/~25%.
Nice!

> 
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> Link: https://download.blender.org/peach/bigbuckbunny_movies/big_buck_bunny_720p_stereo.ogg # [1]
> ---
>  drivers/staging/fbtft/fbtft-core.c  |  1 +
>  drivers/video/fbdev/broadsheetfb.c  |  1 +
>  drivers/video/fbdev/core/fb_defio.c | 19 ++++++++++++-------
>  drivers/video/fbdev/metronomefb.c   |  1 +
>  drivers/video/fbdev/udlfb.c         |  1 +
>  include/linux/fb.h                  |  1 +
>  6 files changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/staging/fbtft/fbtft-core.c b/drivers/staging/fbtft/fbtft-core.c
> index f2684d2d6851..4a35347b3020 100644
> --- a/drivers/staging/fbtft/fbtft-core.c
> +++ b/drivers/staging/fbtft/fbtft-core.c
> @@ -654,6 +654,7 @@ struct fb_info *fbtft_framebuffer_alloc(struct fbtft_display *display,
>  	fbops->fb_blank     =      fbtft_fb_blank;
>  
>  	fbdefio->delay =           HZ / fps;
> +	fbdefio->sort_pagelist =   true;
>  	fbdefio->deferred_io =     fbtft_deferred_io;
>  	fb_deferred_io_init(info);
>  
> diff --git a/drivers/video/fbdev/broadsheetfb.c b/drivers/video/fbdev/broadsheetfb.c
> index fd66f4d4a621..b9054f658838 100644
> --- a/drivers/video/fbdev/broadsheetfb.c
> +++ b/drivers/video/fbdev/broadsheetfb.c
> @@ -1059,6 +1059,7 @@ static const struct fb_ops broadsheetfb_ops = {
>  
>  static struct fb_deferred_io broadsheetfb_defio = {
>  	.delay		= HZ/4,
> +	.sort_pagelist	= true,
>  	.deferred_io	= broadsheetfb_dpy_deferred_io,
>  };
>  
> diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
> index 3727b1ca87b1..1f672cf253b2 100644
> --- a/drivers/video/fbdev/core/fb_defio.c
> +++ b/drivers/video/fbdev/core/fb_defio.c
> @@ -132,15 +132,20 @@ static vm_fault_t fb_deferred_io_mkwrite(struct vm_fault *vmf)
>  	if (!list_empty(&page->lru))
>  		goto page_already_added;
>  
> -	/* we loop through the pagelist before adding in order
> -	to keep the pagelist sorted */
> -	list_for_each_entry(cur, &fbdefio->pagelist, lru) {
> -		if (cur->index > page->index)
> -			break;
> +	if (fbdefio->sort_pagelist) {
> +		/*
> +		 * We loop through the pagelist before adding in order
> +		 * to keep the pagelist sorted.
> +		 */
> +		list_for_each_entry(cur, &fbdefio->pagelist, lru) {
> +			if (cur->index > page->index)
> +				break;
> +		}
> +		list_add_tail(&page->lru, &cur->lru);
> +	} else {
> +		list_add_tail(&page->lru, &fbdefio->pagelist);
>  	}
Bikeshedding - my personal style is to have the likely part first.
This makes reading the code easier.


The following drivers uses deferred io but are not listed as
they need the page list sorted:

- hecubafb
- hyperv_fb
- sh_mobile_lcdcfb
- smscufx
- ssd1307fb 
- xen-fbfront

It would be nice with some info in the commit log that they do not need
the pages sorted.
To make the list complete include the drm stuff too.

It did not jump to me why they did not need sorted pages,
so some sort of reassurance that they have been checked would be nice.

With the following addressed:
Acked-by: Sam Ravnborg <sam@ravnborg.org>

I hope someone else looks that can verify that the list of drivers
without sort_pagelist is correct so someone knowledgeable have looked
too.

	Sam

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] fbdev: Don't sort deferred-I/O pages by default
@ 2022-02-10 21:16     ` Sam Ravnborg
  0 siblings, 0 replies; 28+ messages in thread
From: Sam Ravnborg @ 2022-02-10 21:16 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: linux-fbdev, deller, linux-staging, javierm, dri-devel, bernie,
	noralf, andriy.shevchenko, jayalk

Hi Thomas,

On Thu, Feb 10, 2022 at 03:11:13PM +0100, Thomas Zimmermann wrote:
> Fbdev's deferred I/O sorts all dirty pages by default, which incurs a
> significant overhead. Make the sorting step optional and update the few
> drivers that require it. Use a FIFO list by default.
> 
> Sorting pages by memory offset for deferred I/O performs an implicit
> bubble-sort step on the list of dirty pages. The algorithm goes through
> the list of dirty pages and inserts each new page according to its
> index field. Even worse, list traversal always starts at the first
> entry. As video memory is most likely updated scanline by scanline, the
> algorithm traverses through the complete list for each updated page.
> 
> For example, with 1024x768x32bpp a page covers exactly one scanline.
> Writing a single screen update from top to bottom requires updating
> 768 pages. With an average list length of 384 entries, a screen update
> creates (768 * 384 =) 294912 compare operation.
> 
> Fix this by making the sorting step opt-in and update the few drivers
> that require it. All other drivers work with unsorted page lists. Pages
> are appended to the list. Therefore, in the common case of writing the
> framebuffer top to bottom, pages are still sorted by offset, which may
> have a positive effect on performance.
> 
> Playing a video [1] in mplayer's benchmark mode shows the difference
> (i7-4790, FullHD, simpledrm, kernel with debugging).
> 
>   mplayer -benchmark -nosound -vo fbdev ./big_buck_bunny_720p_stereo.ogg
> 
> With sorted page lists:
> 
>   BENCHMARKs: VC:  32.960s VO:  73.068s A:   0.000s Sys:   2.413s =  108.441s
>   BENCHMARK%: VC: 30.3947% VO: 67.3802% A:  0.0000% Sys:  2.2251% = 100.0000%
> 
> With unsorted page lists:
> 
>   BENCHMARKs: VC:  31.005s VO:  42.889s A:   0.000s Sys:   2.256s =   76.150s
>   BENCHMARK%: VC: 40.7156% VO: 56.3219% A:  0.0000% Sys:  2.9625% = 100.0000%
> 
> VC shows the overhead of video decoding, VO shows the overhead of the
> video output. Using unsorted page lists reduces the benchmark's run time
> by ~32s/~25%.
Nice!

> 
> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
> Link: https://download.blender.org/peach/bigbuckbunny_movies/big_buck_bunny_720p_stereo.ogg # [1]
> ---
>  drivers/staging/fbtft/fbtft-core.c  |  1 +
>  drivers/video/fbdev/broadsheetfb.c  |  1 +
>  drivers/video/fbdev/core/fb_defio.c | 19 ++++++++++++-------
>  drivers/video/fbdev/metronomefb.c   |  1 +
>  drivers/video/fbdev/udlfb.c         |  1 +
>  include/linux/fb.h                  |  1 +
>  6 files changed, 17 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/staging/fbtft/fbtft-core.c b/drivers/staging/fbtft/fbtft-core.c
> index f2684d2d6851..4a35347b3020 100644
> --- a/drivers/staging/fbtft/fbtft-core.c
> +++ b/drivers/staging/fbtft/fbtft-core.c
> @@ -654,6 +654,7 @@ struct fb_info *fbtft_framebuffer_alloc(struct fbtft_display *display,
>  	fbops->fb_blank     =      fbtft_fb_blank;
>  
>  	fbdefio->delay =           HZ / fps;
> +	fbdefio->sort_pagelist =   true;
>  	fbdefio->deferred_io =     fbtft_deferred_io;
>  	fb_deferred_io_init(info);
>  
> diff --git a/drivers/video/fbdev/broadsheetfb.c b/drivers/video/fbdev/broadsheetfb.c
> index fd66f4d4a621..b9054f658838 100644
> --- a/drivers/video/fbdev/broadsheetfb.c
> +++ b/drivers/video/fbdev/broadsheetfb.c
> @@ -1059,6 +1059,7 @@ static const struct fb_ops broadsheetfb_ops = {
>  
>  static struct fb_deferred_io broadsheetfb_defio = {
>  	.delay		= HZ/4,
> +	.sort_pagelist	= true,
>  	.deferred_io	= broadsheetfb_dpy_deferred_io,
>  };
>  
> diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
> index 3727b1ca87b1..1f672cf253b2 100644
> --- a/drivers/video/fbdev/core/fb_defio.c
> +++ b/drivers/video/fbdev/core/fb_defio.c
> @@ -132,15 +132,20 @@ static vm_fault_t fb_deferred_io_mkwrite(struct vm_fault *vmf)
>  	if (!list_empty(&page->lru))
>  		goto page_already_added;
>  
> -	/* we loop through the pagelist before adding in order
> -	to keep the pagelist sorted */
> -	list_for_each_entry(cur, &fbdefio->pagelist, lru) {
> -		if (cur->index > page->index)
> -			break;
> +	if (fbdefio->sort_pagelist) {
> +		/*
> +		 * We loop through the pagelist before adding in order
> +		 * to keep the pagelist sorted.
> +		 */
> +		list_for_each_entry(cur, &fbdefio->pagelist, lru) {
> +			if (cur->index > page->index)
> +				break;
> +		}
> +		list_add_tail(&page->lru, &cur->lru);
> +	} else {
> +		list_add_tail(&page->lru, &fbdefio->pagelist);
>  	}
Bikeshedding - my personal style is to have the likely part first.
This makes reading the code easier.


The following drivers uses deferred io but are not listed as
they need the page list sorted:

- hecubafb
- hyperv_fb
- sh_mobile_lcdcfb
- smscufx
- ssd1307fb 
- xen-fbfront

It would be nice with some info in the commit log that they do not need
the pages sorted.
To make the list complete include the drm stuff too.

It did not jump to me why they did not need sorted pages,
so some sort of reassurance that they have been checked would be nice.

With the following addressed:
Acked-by: Sam Ravnborg <sam@ravnborg.org>

I hope someone else looks that can verify that the list of drivers
without sort_pagelist is correct so someone knowledgeable have looked
too.

	Sam

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] fbdev: Don't sort deferred-I/O pages by default
  2022-02-10 21:16     ` Sam Ravnborg
@ 2022-02-11  7:58       ` Dan Carpenter
  -1 siblings, 0 replies; 28+ messages in thread
From: Dan Carpenter @ 2022-02-11  7:58 UTC (permalink / raw)
  To: Sam Ravnborg
  Cc: linux-fbdev, Thomas Zimmermann, deller, linux-staging, javierm,
	dri-devel, bernie, noralf, andriy.shevchenko, jayalk

On Thu, Feb 10, 2022 at 10:16:45PM +0100, Sam Ravnborg wrote:
> > diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
> > index 3727b1ca87b1..1f672cf253b2 100644
> > --- a/drivers/video/fbdev/core/fb_defio.c
> > +++ b/drivers/video/fbdev/core/fb_defio.c
> > @@ -132,15 +132,20 @@ static vm_fault_t fb_deferred_io_mkwrite(struct vm_fault *vmf)
> >  	if (!list_empty(&page->lru))
> >  		goto page_already_added;
> >  
> > -	/* we loop through the pagelist before adding in order
> > -	to keep the pagelist sorted */
> > -	list_for_each_entry(cur, &fbdefio->pagelist, lru) {
> > -		if (cur->index > page->index)
> > -			break;
> > +	if (fbdefio->sort_pagelist) {
> > +		/*
> > +		 * We loop through the pagelist before adding in order
> > +		 * to keep the pagelist sorted.
> > +		 */
> > +		list_for_each_entry(cur, &fbdefio->pagelist, lru) {
> > +			if (cur->index > page->index)
> > +				break;
> > +		}
> > +		list_add_tail(&page->lru, &cur->lru);
> > +	} else {
> > +		list_add_tail(&page->lru, &fbdefio->pagelist);
> >  	}
> Bikeshedding - my personal style is to have the likely part first.
> This makes reading the code easier.

I've thought about this quite a bit...  I guess my rule is to avoid
negatives as much as possible so I prefer the original code.  My rules
right now are:

1) Always do error handling.  Don't do success handling.
2) Return as quickly as possible and pull the code in an indent.
3) Avoid negatives.  Never had negatives in the variable names.

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] fbdev: Don't sort deferred-I/O pages by default
@ 2022-02-11  7:58       ` Dan Carpenter
  0 siblings, 0 replies; 28+ messages in thread
From: Dan Carpenter @ 2022-02-11  7:58 UTC (permalink / raw)
  To: Sam Ravnborg
  Cc: Thomas Zimmermann, daniel, javierm, noralf, andriy.shevchenko,
	deller, bernie, jayalk, linux-fbdev, linux-staging, dri-devel

On Thu, Feb 10, 2022 at 10:16:45PM +0100, Sam Ravnborg wrote:
> > diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
> > index 3727b1ca87b1..1f672cf253b2 100644
> > --- a/drivers/video/fbdev/core/fb_defio.c
> > +++ b/drivers/video/fbdev/core/fb_defio.c
> > @@ -132,15 +132,20 @@ static vm_fault_t fb_deferred_io_mkwrite(struct vm_fault *vmf)
> >  	if (!list_empty(&page->lru))
> >  		goto page_already_added;
> >  
> > -	/* we loop through the pagelist before adding in order
> > -	to keep the pagelist sorted */
> > -	list_for_each_entry(cur, &fbdefio->pagelist, lru) {
> > -		if (cur->index > page->index)
> > -			break;
> > +	if (fbdefio->sort_pagelist) {
> > +		/*
> > +		 * We loop through the pagelist before adding in order
> > +		 * to keep the pagelist sorted.
> > +		 */
> > +		list_for_each_entry(cur, &fbdefio->pagelist, lru) {
> > +			if (cur->index > page->index)
> > +				break;
> > +		}
> > +		list_add_tail(&page->lru, &cur->lru);
> > +	} else {
> > +		list_add_tail(&page->lru, &fbdefio->pagelist);
> >  	}
> Bikeshedding - my personal style is to have the likely part first.
> This makes reading the code easier.

I've thought about this quite a bit...  I guess my rule is to avoid
negatives as much as possible so I prefer the original code.  My rules
right now are:

1) Always do error handling.  Don't do success handling.
2) Return as quickly as possible and pull the code in an indent.
3) Avoid negatives.  Never had negatives in the variable names.

regards,
dan carpenter

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] fbdev: Don't sort deferred-I/O pages by default
  2022-02-10 21:16     ` Sam Ravnborg
@ 2022-02-11  8:21       ` Thomas Zimmermann
  -1 siblings, 0 replies; 28+ messages in thread
From: Thomas Zimmermann @ 2022-02-11  8:21 UTC (permalink / raw)
  To: Sam Ravnborg
  Cc: daniel, javierm, noralf, andriy.shevchenko, deller, bernie,
	jayalk, linux-fbdev, linux-staging, dri-devel


[-- Attachment #1.1: Type: text/plain, Size: 6445 bytes --]

Hi Sam

Am 10.02.22 um 22:16 schrieb Sam Ravnborg:
> Hi Thomas,
> 
> On Thu, Feb 10, 2022 at 03:11:13PM +0100, Thomas Zimmermann wrote:
>> Fbdev's deferred I/O sorts all dirty pages by default, which incurs a
>> significant overhead. Make the sorting step optional and update the few
>> drivers that require it. Use a FIFO list by default.
>>
>> Sorting pages by memory offset for deferred I/O performs an implicit
>> bubble-sort step on the list of dirty pages. The algorithm goes through
>> the list of dirty pages and inserts each new page according to its
>> index field. Even worse, list traversal always starts at the first
>> entry. As video memory is most likely updated scanline by scanline, the
>> algorithm traverses through the complete list for each updated page.
>>
>> For example, with 1024x768x32bpp a page covers exactly one scanline.
>> Writing a single screen update from top to bottom requires updating
>> 768 pages. With an average list length of 384 entries, a screen update
>> creates (768 * 384 =) 294912 compare operation.
>>
>> Fix this by making the sorting step opt-in and update the few drivers
>> that require it. All other drivers work with unsorted page lists. Pages
>> are appended to the list. Therefore, in the common case of writing the
>> framebuffer top to bottom, pages are still sorted by offset, which may
>> have a positive effect on performance.
>>
>> Playing a video [1] in mplayer's benchmark mode shows the difference
>> (i7-4790, FullHD, simpledrm, kernel with debugging).
>>
>>    mplayer -benchmark -nosound -vo fbdev ./big_buck_bunny_720p_stereo.ogg
>>
>> With sorted page lists:
>>
>>    BENCHMARKs: VC:  32.960s VO:  73.068s A:   0.000s Sys:   2.413s =  108.441s
>>    BENCHMARK%: VC: 30.3947% VO: 67.3802% A:  0.0000% Sys:  2.2251% = 100.0000%
>>
>> With unsorted page lists:
>>
>>    BENCHMARKs: VC:  31.005s VO:  42.889s A:   0.000s Sys:   2.256s =   76.150s
>>    BENCHMARK%: VC: 40.7156% VO: 56.3219% A:  0.0000% Sys:  2.9625% = 100.0000%
>>
>> VC shows the overhead of video decoding, VO shows the overhead of the
>> video output. Using unsorted page lists reduces the benchmark's run time
>> by ~32s/~25%.
> Nice!
> 
>>
>> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
>> Link: https://download.blender.org/peach/bigbuckbunny_movies/big_buck_bunny_720p_stereo.ogg # [1]
>> ---
>>   drivers/staging/fbtft/fbtft-core.c  |  1 +
>>   drivers/video/fbdev/broadsheetfb.c  |  1 +
>>   drivers/video/fbdev/core/fb_defio.c | 19 ++++++++++++-------
>>   drivers/video/fbdev/metronomefb.c   |  1 +
>>   drivers/video/fbdev/udlfb.c         |  1 +
>>   include/linux/fb.h                  |  1 +
>>   6 files changed, 17 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/staging/fbtft/fbtft-core.c b/drivers/staging/fbtft/fbtft-core.c
>> index f2684d2d6851..4a35347b3020 100644
>> --- a/drivers/staging/fbtft/fbtft-core.c
>> +++ b/drivers/staging/fbtft/fbtft-core.c
>> @@ -654,6 +654,7 @@ struct fb_info *fbtft_framebuffer_alloc(struct fbtft_display *display,
>>   	fbops->fb_blank     =      fbtft_fb_blank;
>>   
>>   	fbdefio->delay =           HZ / fps;
>> +	fbdefio->sort_pagelist =   true;
>>   	fbdefio->deferred_io =     fbtft_deferred_io;
>>   	fb_deferred_io_init(info);
>>   
>> diff --git a/drivers/video/fbdev/broadsheetfb.c b/drivers/video/fbdev/broadsheetfb.c
>> index fd66f4d4a621..b9054f658838 100644
>> --- a/drivers/video/fbdev/broadsheetfb.c
>> +++ b/drivers/video/fbdev/broadsheetfb.c
>> @@ -1059,6 +1059,7 @@ static const struct fb_ops broadsheetfb_ops = {
>>   
>>   static struct fb_deferred_io broadsheetfb_defio = {
>>   	.delay		= HZ/4,
>> +	.sort_pagelist	= true,
>>   	.deferred_io	= broadsheetfb_dpy_deferred_io,
>>   };
>>   
>> diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
>> index 3727b1ca87b1..1f672cf253b2 100644
>> --- a/drivers/video/fbdev/core/fb_defio.c
>> +++ b/drivers/video/fbdev/core/fb_defio.c
>> @@ -132,15 +132,20 @@ static vm_fault_t fb_deferred_io_mkwrite(struct vm_fault *vmf)
>>   	if (!list_empty(&page->lru))
>>   		goto page_already_added;
>>   
>> -	/* we loop through the pagelist before adding in order
>> -	to keep the pagelist sorted */
>> -	list_for_each_entry(cur, &fbdefio->pagelist, lru) {
>> -		if (cur->index > page->index)
>> -			break;
>> +	if (fbdefio->sort_pagelist) {
>> +		/*
>> +		 * We loop through the pagelist before adding in order
>> +		 * to keep the pagelist sorted.
>> +		 */
>> +		list_for_each_entry(cur, &fbdefio->pagelist, lru) {
>> +			if (cur->index > page->index)
>> +				break;
>> +		}
>> +		list_add_tail(&page->lru, &cur->lru);
>> +	} else {
>> +		list_add_tail(&page->lru, &fbdefio->pagelist);
>>   	}
> Bikeshedding - my personal style is to have the likely part first.
> This makes reading the code easier.

I'll change this a bit to leave out the else branch.

> 
> 
> The following drivers uses deferred io but are not listed as
> they need the page list sorted:
> 
> - hecubafb
> - hyperv_fb
> - sh_mobile_lcdcfb
> - smscufx
> - ssd1307fb
> - xen-fbfront
> 
> It would be nice with some info in the commit log that they do not need
> the pages sorted.
> To make the list complete include the drm stuff too.
> 
> It did not jump to me why they did not need sorted pages,
> so some sort of reassurance that they have been checked would be nice.

Most drivers build a bounding rectangle around the dirty pages or simply 
flush the whole screen. The only two affected DRM drivers, generic fbdev 
and vmwgfx, both use the bounding rectangle.  In those cases, the exact 
order of the pages doesn't matter.  The other drivers look at the page 
index or handle pages one-by-one. I set the sort_pagelist flag for 
those, even though some of them would probably work correctly without 
sorting.

I'll add this information to the commit description.

Best regards
Thomas

> 
> With the following addressed:
> Acked-by: Sam Ravnborg <sam@ravnborg.org>
> 
> I hope someone else looks that can verify that the list of drivers
> without sort_pagelist is correct so someone knowledgeable have looked
> too.
> 
> 	Sam

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] fbdev: Don't sort deferred-I/O pages by default
@ 2022-02-11  8:21       ` Thomas Zimmermann
  0 siblings, 0 replies; 28+ messages in thread
From: Thomas Zimmermann @ 2022-02-11  8:21 UTC (permalink / raw)
  To: Sam Ravnborg
  Cc: linux-fbdev, deller, linux-staging, javierm, dri-devel, bernie,
	noralf, andriy.shevchenko, jayalk


[-- Attachment #1.1: Type: text/plain, Size: 6445 bytes --]

Hi Sam

Am 10.02.22 um 22:16 schrieb Sam Ravnborg:
> Hi Thomas,
> 
> On Thu, Feb 10, 2022 at 03:11:13PM +0100, Thomas Zimmermann wrote:
>> Fbdev's deferred I/O sorts all dirty pages by default, which incurs a
>> significant overhead. Make the sorting step optional and update the few
>> drivers that require it. Use a FIFO list by default.
>>
>> Sorting pages by memory offset for deferred I/O performs an implicit
>> bubble-sort step on the list of dirty pages. The algorithm goes through
>> the list of dirty pages and inserts each new page according to its
>> index field. Even worse, list traversal always starts at the first
>> entry. As video memory is most likely updated scanline by scanline, the
>> algorithm traverses through the complete list for each updated page.
>>
>> For example, with 1024x768x32bpp a page covers exactly one scanline.
>> Writing a single screen update from top to bottom requires updating
>> 768 pages. With an average list length of 384 entries, a screen update
>> creates (768 * 384 =) 294912 compare operation.
>>
>> Fix this by making the sorting step opt-in and update the few drivers
>> that require it. All other drivers work with unsorted page lists. Pages
>> are appended to the list. Therefore, in the common case of writing the
>> framebuffer top to bottom, pages are still sorted by offset, which may
>> have a positive effect on performance.
>>
>> Playing a video [1] in mplayer's benchmark mode shows the difference
>> (i7-4790, FullHD, simpledrm, kernel with debugging).
>>
>>    mplayer -benchmark -nosound -vo fbdev ./big_buck_bunny_720p_stereo.ogg
>>
>> With sorted page lists:
>>
>>    BENCHMARKs: VC:  32.960s VO:  73.068s A:   0.000s Sys:   2.413s =  108.441s
>>    BENCHMARK%: VC: 30.3947% VO: 67.3802% A:  0.0000% Sys:  2.2251% = 100.0000%
>>
>> With unsorted page lists:
>>
>>    BENCHMARKs: VC:  31.005s VO:  42.889s A:   0.000s Sys:   2.256s =   76.150s
>>    BENCHMARK%: VC: 40.7156% VO: 56.3219% A:  0.0000% Sys:  2.9625% = 100.0000%
>>
>> VC shows the overhead of video decoding, VO shows the overhead of the
>> video output. Using unsorted page lists reduces the benchmark's run time
>> by ~32s/~25%.
> Nice!
> 
>>
>> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
>> Link: https://download.blender.org/peach/bigbuckbunny_movies/big_buck_bunny_720p_stereo.ogg # [1]
>> ---
>>   drivers/staging/fbtft/fbtft-core.c  |  1 +
>>   drivers/video/fbdev/broadsheetfb.c  |  1 +
>>   drivers/video/fbdev/core/fb_defio.c | 19 ++++++++++++-------
>>   drivers/video/fbdev/metronomefb.c   |  1 +
>>   drivers/video/fbdev/udlfb.c         |  1 +
>>   include/linux/fb.h                  |  1 +
>>   6 files changed, 17 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/staging/fbtft/fbtft-core.c b/drivers/staging/fbtft/fbtft-core.c
>> index f2684d2d6851..4a35347b3020 100644
>> --- a/drivers/staging/fbtft/fbtft-core.c
>> +++ b/drivers/staging/fbtft/fbtft-core.c
>> @@ -654,6 +654,7 @@ struct fb_info *fbtft_framebuffer_alloc(struct fbtft_display *display,
>>   	fbops->fb_blank     =      fbtft_fb_blank;
>>   
>>   	fbdefio->delay =           HZ / fps;
>> +	fbdefio->sort_pagelist =   true;
>>   	fbdefio->deferred_io =     fbtft_deferred_io;
>>   	fb_deferred_io_init(info);
>>   
>> diff --git a/drivers/video/fbdev/broadsheetfb.c b/drivers/video/fbdev/broadsheetfb.c
>> index fd66f4d4a621..b9054f658838 100644
>> --- a/drivers/video/fbdev/broadsheetfb.c
>> +++ b/drivers/video/fbdev/broadsheetfb.c
>> @@ -1059,6 +1059,7 @@ static const struct fb_ops broadsheetfb_ops = {
>>   
>>   static struct fb_deferred_io broadsheetfb_defio = {
>>   	.delay		= HZ/4,
>> +	.sort_pagelist	= true,
>>   	.deferred_io	= broadsheetfb_dpy_deferred_io,
>>   };
>>   
>> diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
>> index 3727b1ca87b1..1f672cf253b2 100644
>> --- a/drivers/video/fbdev/core/fb_defio.c
>> +++ b/drivers/video/fbdev/core/fb_defio.c
>> @@ -132,15 +132,20 @@ static vm_fault_t fb_deferred_io_mkwrite(struct vm_fault *vmf)
>>   	if (!list_empty(&page->lru))
>>   		goto page_already_added;
>>   
>> -	/* we loop through the pagelist before adding in order
>> -	to keep the pagelist sorted */
>> -	list_for_each_entry(cur, &fbdefio->pagelist, lru) {
>> -		if (cur->index > page->index)
>> -			break;
>> +	if (fbdefio->sort_pagelist) {
>> +		/*
>> +		 * We loop through the pagelist before adding in order
>> +		 * to keep the pagelist sorted.
>> +		 */
>> +		list_for_each_entry(cur, &fbdefio->pagelist, lru) {
>> +			if (cur->index > page->index)
>> +				break;
>> +		}
>> +		list_add_tail(&page->lru, &cur->lru);
>> +	} else {
>> +		list_add_tail(&page->lru, &fbdefio->pagelist);
>>   	}
> Bikeshedding - my personal style is to have the likely part first.
> This makes reading the code easier.

I'll change this a bit to leave out the else branch.

> 
> 
> The following drivers uses deferred io but are not listed as
> they need the page list sorted:
> 
> - hecubafb
> - hyperv_fb
> - sh_mobile_lcdcfb
> - smscufx
> - ssd1307fb
> - xen-fbfront
> 
> It would be nice with some info in the commit log that they do not need
> the pages sorted.
> To make the list complete include the drm stuff too.
> 
> It did not jump to me why they did not need sorted pages,
> so some sort of reassurance that they have been checked would be nice.

Most drivers build a bounding rectangle around the dirty pages or simply 
flush the whole screen. The only two affected DRM drivers, generic fbdev 
and vmwgfx, both use the bounding rectangle.  In those cases, the exact 
order of the pages doesn't matter.  The other drivers look at the page 
index or handle pages one-by-one. I set the sort_pagelist flag for 
those, even though some of them would probably work correctly without 
sorting.

I'll add this information to the commit description.

Best regards
Thomas

> 
> With the following addressed:
> Acked-by: Sam Ravnborg <sam@ravnborg.org>
> 
> I hope someone else looks that can verify that the list of drivers
> without sort_pagelist is correct so someone knowledgeable have looked
> too.
> 
> 	Sam

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] fbdev: Don't sort deferred-I/O pages by default
  2022-02-11  7:58       ` Dan Carpenter
@ 2022-02-11  8:25         ` Thomas Zimmermann
  -1 siblings, 0 replies; 28+ messages in thread
From: Thomas Zimmermann @ 2022-02-11  8:25 UTC (permalink / raw)
  To: Dan Carpenter, Sam Ravnborg
  Cc: linux-fbdev, deller, linux-staging, javierm, dri-devel, bernie,
	noralf, andriy.shevchenko, jayalk


[-- Attachment #1.1: Type: text/plain, Size: 2179 bytes --]

Hi

Am 11.02.22 um 08:58 schrieb Dan Carpenter:
> On Thu, Feb 10, 2022 at 10:16:45PM +0100, Sam Ravnborg wrote:
>>> diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
>>> index 3727b1ca87b1..1f672cf253b2 100644
>>> --- a/drivers/video/fbdev/core/fb_defio.c
>>> +++ b/drivers/video/fbdev/core/fb_defio.c
>>> @@ -132,15 +132,20 @@ static vm_fault_t fb_deferred_io_mkwrite(struct vm_fault *vmf)
>>>   	if (!list_empty(&page->lru))
>>>   		goto page_already_added;
>>>   
>>> -	/* we loop through the pagelist before adding in order
>>> -	to keep the pagelist sorted */
>>> -	list_for_each_entry(cur, &fbdefio->pagelist, lru) {
>>> -		if (cur->index > page->index)
>>> -			break;
>>> +	if (fbdefio->sort_pagelist) {
>>> +		/*
>>> +		 * We loop through the pagelist before adding in order
>>> +		 * to keep the pagelist sorted.
>>> +		 */
>>> +		list_for_each_entry(cur, &fbdefio->pagelist, lru) {
>>> +			if (cur->index > page->index)
>>> +				break;
>>> +		}
>>> +		list_add_tail(&page->lru, &cur->lru);
>>> +	} else {
>>> +		list_add_tail(&page->lru, &fbdefio->pagelist);
>>>   	}
>> Bikeshedding - my personal style is to have the likely part first.
>> This makes reading the code easier.
> 
> I've thought about this quite a bit...  I guess my rule is to avoid
> negatives as much as possible so I prefer the original code.  My rules
> right now are:
> 
> 1) Always do error handling.  Don't do success handling.
> 2) Return as quickly as possible and pull the code in an indent.
> 3) Avoid negatives.  Never had negatives in the variable names.

 From what I know, CPUs' branch prediction prefers backward jumps (e.g., 
loops) but avoids forward jumps. Compilers arrange the code to optimize 
for this pattern. So I tend to put the exception or error handling into 
the if branch.  But I have no idea if that really makes a difference at 
runtime.

Best regards
Thomas

> 
> regards,
> dan carpenter

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] fbdev: Don't sort deferred-I/O pages by default
@ 2022-02-11  8:25         ` Thomas Zimmermann
  0 siblings, 0 replies; 28+ messages in thread
From: Thomas Zimmermann @ 2022-02-11  8:25 UTC (permalink / raw)
  To: Dan Carpenter, Sam Ravnborg
  Cc: linux-fbdev, deller, linux-staging, bernie, dri-devel, javierm,
	noralf, andriy.shevchenko, jayalk


[-- Attachment #1.1: Type: text/plain, Size: 2179 bytes --]

Hi

Am 11.02.22 um 08:58 schrieb Dan Carpenter:
> On Thu, Feb 10, 2022 at 10:16:45PM +0100, Sam Ravnborg wrote:
>>> diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
>>> index 3727b1ca87b1..1f672cf253b2 100644
>>> --- a/drivers/video/fbdev/core/fb_defio.c
>>> +++ b/drivers/video/fbdev/core/fb_defio.c
>>> @@ -132,15 +132,20 @@ static vm_fault_t fb_deferred_io_mkwrite(struct vm_fault *vmf)
>>>   	if (!list_empty(&page->lru))
>>>   		goto page_already_added;
>>>   
>>> -	/* we loop through the pagelist before adding in order
>>> -	to keep the pagelist sorted */
>>> -	list_for_each_entry(cur, &fbdefio->pagelist, lru) {
>>> -		if (cur->index > page->index)
>>> -			break;
>>> +	if (fbdefio->sort_pagelist) {
>>> +		/*
>>> +		 * We loop through the pagelist before adding in order
>>> +		 * to keep the pagelist sorted.
>>> +		 */
>>> +		list_for_each_entry(cur, &fbdefio->pagelist, lru) {
>>> +			if (cur->index > page->index)
>>> +				break;
>>> +		}
>>> +		list_add_tail(&page->lru, &cur->lru);
>>> +	} else {
>>> +		list_add_tail(&page->lru, &fbdefio->pagelist);
>>>   	}
>> Bikeshedding - my personal style is to have the likely part first.
>> This makes reading the code easier.
> 
> I've thought about this quite a bit...  I guess my rule is to avoid
> negatives as much as possible so I prefer the original code.  My rules
> right now are:
> 
> 1) Always do error handling.  Don't do success handling.
> 2) Return as quickly as possible and pull the code in an indent.
> 3) Avoid negatives.  Never had negatives in the variable names.

 From what I know, CPUs' branch prediction prefers backward jumps (e.g., 
loops) but avoids forward jumps. Compilers arrange the code to optimize 
for this pattern. So I tend to put the exception or error handling into 
the if branch.  But I have no idea if that really makes a difference at 
runtime.

Best regards
Thomas

> 
> regards,
> dan carpenter

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] fbdev/defio: Early-out if page is already enlisted
  2022-02-10 21:00     ` Sam Ravnborg
@ 2022-02-11  9:12       ` Thomas Zimmermann
  -1 siblings, 0 replies; 28+ messages in thread
From: Thomas Zimmermann @ 2022-02-11  9:12 UTC (permalink / raw)
  To: Sam Ravnborg
  Cc: linux-fbdev, deller, linux-staging, javierm, dri-devel, bernie,
	noralf, andriy.shevchenko, jayalk


[-- Attachment #1.1: Type: text/plain, Size: 3459 bytes --]

Hi Sam

Am 10.02.22 um 22:00 schrieb Sam Ravnborg:
> Hi Thomas,
> 
> On Thu, Feb 10, 2022 at 03:11:11PM +0100, Thomas Zimmermann wrote:
>> Return early if a page is already in the list of dirty pages for
>> deferred I/O. This can be detected if the page's list head is not
>> empty. Keep the list head initialized while the page is not enlisted
>> to make this work reliably.
>>
>> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
>> ---
>>   drivers/video/fbdev/core/fb_defio.c | 21 ++++++++++++---------
>>   1 file changed, 12 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
>> index a591d291b231..3727b1ca87b1 100644
>> --- a/drivers/video/fbdev/core/fb_defio.c
>> +++ b/drivers/video/fbdev/core/fb_defio.c
>> @@ -59,6 +59,7 @@ static vm_fault_t fb_deferred_io_fault(struct vm_fault *vmf)
>>   		printk(KERN_ERR "no mapping available\n");
>>   
>>   	BUG_ON(!page->mapping);
>> +	INIT_LIST_HEAD(&page->lru);
>>   	page->index = vmf->pgoff;
>>   
>>   	vmf->page = page;
>> @@ -122,17 +123,19 @@ static vm_fault_t fb_deferred_io_mkwrite(struct vm_fault *vmf)
>>   	 */
>>   	lock_page(page);
>>   
>> +	/*
>> +	 * This check is to catch the case where a new process could start
>> +	 * writing to the same page through a new pte. this new access
>> +	 * can cause the mkwrite even when the original ps's pte is marked
>> +	 * writable.
>> +	 */
> When moving this comment it would be prudent to also fix the wording a
> bit.
> - Capital in start of sentence and after a period
> - Spell out process and do not shorten ps

Ok.

> 
> 
>> +	if (!list_empty(&page->lru))
>> +		goto page_already_added;
>> +
> 
> This check says that if the page already has something in the parge->lru
> then this is added by defio and thus is already added.
> This matches your commit description - OK.
> 
> Maybe add something like:
> * Pages added will have their lru set, and it is clered again in the
> * deferred work handler.

I'll add a related TODO to the comment because we actually want to 
remove the use of the lru field. It's owned by the page cache. I already 
have a prototype patch that implements the page tracking with a separate 
list.

Best regards
Thomas

> 
> 
> 
>>   	/* we loop through the pagelist before adding in order
>>   	to keep the pagelist sorted */
>>   	list_for_each_entry(cur, &fbdefio->pagelist, lru) {
>> -		/* this check is to catch the case where a new
>> -		process could start writing to the same page
>> -		through a new pte. this new access can cause the
>> -		mkwrite even when the original ps's pte is marked
>> -		writable */
>> -		if (unlikely(cur == page))
>> -			goto page_already_added;
>> -		else if (cur->index > page->index)
>> +		if (cur->index > page->index)
>>   			break;
>>   	}
>>   
>> @@ -194,7 +197,7 @@ static void fb_deferred_io_work(struct work_struct *work)
>>   
>>   	/* clear the list */
>>   	list_for_each_safe(node, next, &fbdefio->pagelist) {
>> -		list_del(node);
>> +		list_del_init(node);
>>   	}
>>   	mutex_unlock(&fbdefio->lock);
>>   }
> 
> With the comment adjusted as you see fit
> Acked-by: Sam Ravnborg <sam@ravnborg.org>

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 1/2] fbdev/defio: Early-out if page is already enlisted
@ 2022-02-11  9:12       ` Thomas Zimmermann
  0 siblings, 0 replies; 28+ messages in thread
From: Thomas Zimmermann @ 2022-02-11  9:12 UTC (permalink / raw)
  To: Sam Ravnborg
  Cc: linux-fbdev, deller, linux-staging, bernie, dri-devel, javierm,
	noralf, andriy.shevchenko, jayalk


[-- Attachment #1.1: Type: text/plain, Size: 3459 bytes --]

Hi Sam

Am 10.02.22 um 22:00 schrieb Sam Ravnborg:
> Hi Thomas,
> 
> On Thu, Feb 10, 2022 at 03:11:11PM +0100, Thomas Zimmermann wrote:
>> Return early if a page is already in the list of dirty pages for
>> deferred I/O. This can be detected if the page's list head is not
>> empty. Keep the list head initialized while the page is not enlisted
>> to make this work reliably.
>>
>> Signed-off-by: Thomas Zimmermann <tzimmermann@suse.de>
>> ---
>>   drivers/video/fbdev/core/fb_defio.c | 21 ++++++++++++---------
>>   1 file changed, 12 insertions(+), 9 deletions(-)
>>
>> diff --git a/drivers/video/fbdev/core/fb_defio.c b/drivers/video/fbdev/core/fb_defio.c
>> index a591d291b231..3727b1ca87b1 100644
>> --- a/drivers/video/fbdev/core/fb_defio.c
>> +++ b/drivers/video/fbdev/core/fb_defio.c
>> @@ -59,6 +59,7 @@ static vm_fault_t fb_deferred_io_fault(struct vm_fault *vmf)
>>   		printk(KERN_ERR "no mapping available\n");
>>   
>>   	BUG_ON(!page->mapping);
>> +	INIT_LIST_HEAD(&page->lru);
>>   	page->index = vmf->pgoff;
>>   
>>   	vmf->page = page;
>> @@ -122,17 +123,19 @@ static vm_fault_t fb_deferred_io_mkwrite(struct vm_fault *vmf)
>>   	 */
>>   	lock_page(page);
>>   
>> +	/*
>> +	 * This check is to catch the case where a new process could start
>> +	 * writing to the same page through a new pte. this new access
>> +	 * can cause the mkwrite even when the original ps's pte is marked
>> +	 * writable.
>> +	 */
> When moving this comment it would be prudent to also fix the wording a
> bit.
> - Capital in start of sentence and after a period
> - Spell out process and do not shorten ps

Ok.

> 
> 
>> +	if (!list_empty(&page->lru))
>> +		goto page_already_added;
>> +
> 
> This check says that if the page already has something in the parge->lru
> then this is added by defio and thus is already added.
> This matches your commit description - OK.
> 
> Maybe add something like:
> * Pages added will have their lru set, and it is clered again in the
> * deferred work handler.

I'll add a related TODO to the comment because we actually want to 
remove the use of the lru field. It's owned by the page cache. I already 
have a prototype patch that implements the page tracking with a separate 
list.

Best regards
Thomas

> 
> 
> 
>>   	/* we loop through the pagelist before adding in order
>>   	to keep the pagelist sorted */
>>   	list_for_each_entry(cur, &fbdefio->pagelist, lru) {
>> -		/* this check is to catch the case where a new
>> -		process could start writing to the same page
>> -		through a new pte. this new access can cause the
>> -		mkwrite even when the original ps's pte is marked
>> -		writable */
>> -		if (unlikely(cur == page))
>> -			goto page_already_added;
>> -		else if (cur->index > page->index)
>> +		if (cur->index > page->index)
>>   			break;
>>   	}
>>   
>> @@ -194,7 +197,7 @@ static void fb_deferred_io_work(struct work_struct *work)
>>   
>>   	/* clear the list */
>>   	list_for_each_safe(node, next, &fbdefio->pagelist) {
>> -		list_del(node);
>> +		list_del_init(node);
>>   	}
>>   	mutex_unlock(&fbdefio->lock);
>>   }
> 
> With the comment adjusted as you see fit
> Acked-by: Sam Ravnborg <sam@ravnborg.org>

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] fbdev: Don't sort deferred-I/O pages by default
  2022-02-10 14:11   ` Thomas Zimmermann
@ 2022-02-14  8:05     ` Geert Uytterhoeven
  -1 siblings, 0 replies; 28+ messages in thread
From: Geert Uytterhoeven @ 2022-02-14  8:05 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: daniel, javierm, noralf, andriy.shevchenko, deller, bernie,
	jayalk, dri-devel, linux-fbdev, linux-staging

Hi Thomas,

On Thu, Feb 10, 2022 at 4:24 PM Thomas Zimmermann <tzimmermann@suse.de> wrote:
> Fbdev's deferred I/O sorts all dirty pages by default, which incurs a
> significant overhead. Make the sorting step optional and update the few
> drivers that require it. Use a FIFO list by default.
>
> Sorting pages by memory offset for deferred I/O performs an implicit
> bubble-sort step on the list of dirty pages. The algorithm goes through
> the list of dirty pages and inserts each new page according to its
> index field. Even worse, list traversal always starts at the first
> entry. As video memory is most likely updated scanline by scanline, the
> algorithm traverses through the complete list for each updated page.
>
> For example, with 1024x768x32bpp a page covers exactly one scanline.
> Writing a single screen update from top to bottom requires updating
> 768 pages. With an average list length of 384 entries, a screen update
> creates (768 * 384 =) 294912 compare operation.

What about using folios?
If consecutive pages are merged into a single entry, there's much less
(or nothing in the example above) to sort.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] fbdev: Don't sort deferred-I/O pages by default
@ 2022-02-14  8:05     ` Geert Uytterhoeven
  0 siblings, 0 replies; 28+ messages in thread
From: Geert Uytterhoeven @ 2022-02-14  8:05 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: linux-fbdev, deller, linux-staging, javierm, dri-devel, bernie,
	noralf, andriy.shevchenko, jayalk

Hi Thomas,

On Thu, Feb 10, 2022 at 4:24 PM Thomas Zimmermann <tzimmermann@suse.de> wrote:
> Fbdev's deferred I/O sorts all dirty pages by default, which incurs a
> significant overhead. Make the sorting step optional and update the few
> drivers that require it. Use a FIFO list by default.
>
> Sorting pages by memory offset for deferred I/O performs an implicit
> bubble-sort step on the list of dirty pages. The algorithm goes through
> the list of dirty pages and inserts each new page according to its
> index field. Even worse, list traversal always starts at the first
> entry. As video memory is most likely updated scanline by scanline, the
> algorithm traverses through the complete list for each updated page.
>
> For example, with 1024x768x32bpp a page covers exactly one scanline.
> Writing a single screen update from top to bottom requires updating
> 768 pages. With an average list length of 384 entries, a screen update
> creates (768 * 384 =) 294912 compare operation.

What about using folios?
If consecutive pages are merged into a single entry, there's much less
(or nothing in the example above) to sort.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] fbdev: Don't sort deferred-I/O pages by default
  2022-02-14  8:05     ` Geert Uytterhoeven
@ 2022-02-14  8:28       ` Thomas Zimmermann
  -1 siblings, 0 replies; 28+ messages in thread
From: Thomas Zimmermann @ 2022-02-14  8:28 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: linux-fbdev, deller, linux-staging, javierm, dri-devel, bernie,
	noralf, andriy.shevchenko, jayalk


[-- Attachment #1.1: Type: text/plain, Size: 2025 bytes --]

Hi

Am 14.02.22 um 09:05 schrieb Geert Uytterhoeven:
> Hi Thomas,
> 
> On Thu, Feb 10, 2022 at 4:24 PM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>> Fbdev's deferred I/O sorts all dirty pages by default, which incurs a
>> significant overhead. Make the sorting step optional and update the few
>> drivers that require it. Use a FIFO list by default.
>>
>> Sorting pages by memory offset for deferred I/O performs an implicit
>> bubble-sort step on the list of dirty pages. The algorithm goes through
>> the list of dirty pages and inserts each new page according to its
>> index field. Even worse, list traversal always starts at the first
>> entry. As video memory is most likely updated scanline by scanline, the
>> algorithm traverses through the complete list for each updated page.
>>
>> For example, with 1024x768x32bpp a page covers exactly one scanline.
>> Writing a single screen update from top to bottom requires updating
>> 768 pages. With an average list length of 384 entries, a screen update
>> creates (768 * 384 =) 294912 compare operation.
> 
> What about using folios?
> If consecutive pages are merged into a single entry, there's much less
> (or nothing in the example above) to sort.

How would the code know that? Calls to page_mkwrite happen 
pagefault-by-pagefault in any order AFAICT.

Best regards
Thomas

[1] 
https://elixir.bootlin.com/linux/v5.16.9/source/include/linux/mm_types.h#L258


> 
> Gr{oetje,eeting}s,
> 
>                          Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                  -- Linus Torvalds

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] fbdev: Don't sort deferred-I/O pages by default
@ 2022-02-14  8:28       ` Thomas Zimmermann
  0 siblings, 0 replies; 28+ messages in thread
From: Thomas Zimmermann @ 2022-02-14  8:28 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: linux-fbdev, deller, linux-staging, bernie, dri-devel, javierm,
	noralf, andriy.shevchenko, jayalk


[-- Attachment #1.1: Type: text/plain, Size: 2025 bytes --]

Hi

Am 14.02.22 um 09:05 schrieb Geert Uytterhoeven:
> Hi Thomas,
> 
> On Thu, Feb 10, 2022 at 4:24 PM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>> Fbdev's deferred I/O sorts all dirty pages by default, which incurs a
>> significant overhead. Make the sorting step optional and update the few
>> drivers that require it. Use a FIFO list by default.
>>
>> Sorting pages by memory offset for deferred I/O performs an implicit
>> bubble-sort step on the list of dirty pages. The algorithm goes through
>> the list of dirty pages and inserts each new page according to its
>> index field. Even worse, list traversal always starts at the first
>> entry. As video memory is most likely updated scanline by scanline, the
>> algorithm traverses through the complete list for each updated page.
>>
>> For example, with 1024x768x32bpp a page covers exactly one scanline.
>> Writing a single screen update from top to bottom requires updating
>> 768 pages. With an average list length of 384 entries, a screen update
>> creates (768 * 384 =) 294912 compare operation.
> 
> What about using folios?
> If consecutive pages are merged into a single entry, there's much less
> (or nothing in the example above) to sort.

How would the code know that? Calls to page_mkwrite happen 
pagefault-by-pagefault in any order AFAICT.

Best regards
Thomas

[1] 
https://elixir.bootlin.com/linux/v5.16.9/source/include/linux/mm_types.h#L258


> 
> Gr{oetje,eeting}s,
> 
>                          Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                  -- Linus Torvalds

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] fbdev: Don't sort deferred-I/O pages by default
  2022-02-14  8:28       ` Thomas Zimmermann
@ 2022-02-14  9:05         ` Geert Uytterhoeven
  -1 siblings, 0 replies; 28+ messages in thread
From: Geert Uytterhoeven @ 2022-02-14  9:05 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: linux-fbdev, deller, linux-staging, javierm, dri-devel, bernie,
	noralf, andriy.shevchenko, jayalk

Hi Thomas,

On Mon, Feb 14, 2022 at 9:28 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
> Am 14.02.22 um 09:05 schrieb Geert Uytterhoeven:
> > On Thu, Feb 10, 2022 at 4:24 PM Thomas Zimmermann <tzimmermann@suse.de> wrote:
> >> Fbdev's deferred I/O sorts all dirty pages by default, which incurs a
> >> significant overhead. Make the sorting step optional and update the few
> >> drivers that require it. Use a FIFO list by default.
> >>
> >> Sorting pages by memory offset for deferred I/O performs an implicit
> >> bubble-sort step on the list of dirty pages. The algorithm goes through
> >> the list of dirty pages and inserts each new page according to its
> >> index field. Even worse, list traversal always starts at the first
> >> entry. As video memory is most likely updated scanline by scanline, the
> >> algorithm traverses through the complete list for each updated page.
> >>
> >> For example, with 1024x768x32bpp a page covers exactly one scanline.
> >> Writing a single screen update from top to bottom requires updating
> >> 768 pages. With an average list length of 384 entries, a screen update
> >> creates (768 * 384 =) 294912 compare operation.
> >
> > What about using folios?
> > If consecutive pages are merged into a single entry, there's much less
> > (or nothing in the example above) to sort.
>
> How would the code know that? Calls to page_mkwrite happen
> pagefault-by-pagefault in any order AFAICT.

fb_deferred_io_mkwrite() would still be called for a page, but an
adjacent page can be merged with an existing entry while adding it
to the list.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] fbdev: Don't sort deferred-I/O pages by default
@ 2022-02-14  9:05         ` Geert Uytterhoeven
  0 siblings, 0 replies; 28+ messages in thread
From: Geert Uytterhoeven @ 2022-02-14  9:05 UTC (permalink / raw)
  To: Thomas Zimmermann
  Cc: linux-fbdev, deller, linux-staging, bernie, dri-devel, javierm,
	noralf, andriy.shevchenko, jayalk

Hi Thomas,

On Mon, Feb 14, 2022 at 9:28 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
> Am 14.02.22 um 09:05 schrieb Geert Uytterhoeven:
> > On Thu, Feb 10, 2022 at 4:24 PM Thomas Zimmermann <tzimmermann@suse.de> wrote:
> >> Fbdev's deferred I/O sorts all dirty pages by default, which incurs a
> >> significant overhead. Make the sorting step optional and update the few
> >> drivers that require it. Use a FIFO list by default.
> >>
> >> Sorting pages by memory offset for deferred I/O performs an implicit
> >> bubble-sort step on the list of dirty pages. The algorithm goes through
> >> the list of dirty pages and inserts each new page according to its
> >> index field. Even worse, list traversal always starts at the first
> >> entry. As video memory is most likely updated scanline by scanline, the
> >> algorithm traverses through the complete list for each updated page.
> >>
> >> For example, with 1024x768x32bpp a page covers exactly one scanline.
> >> Writing a single screen update from top to bottom requires updating
> >> 768 pages. With an average list length of 384 entries, a screen update
> >> creates (768 * 384 =) 294912 compare operation.
> >
> > What about using folios?
> > If consecutive pages are merged into a single entry, there's much less
> > (or nothing in the example above) to sort.
>
> How would the code know that? Calls to page_mkwrite happen
> pagefault-by-pagefault in any order AFAICT.

fb_deferred_io_mkwrite() would still be called for a page, but an
adjacent page can be merged with an existing entry while adding it
to the list.

Gr{oetje,eeting}s,

                        Geert

--
Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org

In personal conversations with technical people, I call myself a hacker. But
when I'm talking to journalists I just say "programmer" or something like that.
                                -- Linus Torvalds

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] fbdev: Don't sort deferred-I/O pages by default
  2022-02-14  9:05         ` Geert Uytterhoeven
@ 2022-02-14 13:29           ` Thomas Zimmermann
  -1 siblings, 0 replies; 28+ messages in thread
From: Thomas Zimmermann @ 2022-02-14 13:29 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: linux-fbdev, deller, linux-staging, bernie, dri-devel, javierm,
	noralf, andriy.shevchenko, jayalk


[-- Attachment #1.1: Type: text/plain, Size: 2520 bytes --]

Hi

Am 14.02.22 um 10:05 schrieb Geert Uytterhoeven:
> Hi Thomas,
> 
> On Mon, Feb 14, 2022 at 9:28 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>> Am 14.02.22 um 09:05 schrieb Geert Uytterhoeven:
>>> On Thu, Feb 10, 2022 at 4:24 PM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>>>> Fbdev's deferred I/O sorts all dirty pages by default, which incurs a
>>>> significant overhead. Make the sorting step optional and update the few
>>>> drivers that require it. Use a FIFO list by default.
>>>>
>>>> Sorting pages by memory offset for deferred I/O performs an implicit
>>>> bubble-sort step on the list of dirty pages. The algorithm goes through
>>>> the list of dirty pages and inserts each new page according to its
>>>> index field. Even worse, list traversal always starts at the first
>>>> entry. As video memory is most likely updated scanline by scanline, the
>>>> algorithm traverses through the complete list for each updated page.
>>>>
>>>> For example, with 1024x768x32bpp a page covers exactly one scanline.
>>>> Writing a single screen update from top to bottom requires updating
>>>> 768 pages. With an average list length of 384 entries, a screen update
>>>> creates (768 * 384 =) 294912 compare operation.
>>>
>>> What about using folios?
>>> If consecutive pages are merged into a single entry, there's much less
>>> (or nothing in the example above) to sort.
>>
>> How would the code know that? Calls to page_mkwrite happen
>> pagefault-by-pagefault in any order AFAICT.
> 
> fb_deferred_io_mkwrite() would still be called for a page, but an
> adjacent page can be merged with an existing entry while adding it
> to the list.

I still don't understand how we'd use it to our advantage. Most drivers 
don't need sorted pages at all. A folio has strong alignment 
requirements for size and offset AFAICT. We might end up flushing way 
too much of the display memory.

Best regards
Thomas

> 
> Gr{oetje,eeting}s,
> 
>                          Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                  -- Linus Torvalds

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

* Re: [PATCH 2/2] fbdev: Don't sort deferred-I/O pages by default
@ 2022-02-14 13:29           ` Thomas Zimmermann
  0 siblings, 0 replies; 28+ messages in thread
From: Thomas Zimmermann @ 2022-02-14 13:29 UTC (permalink / raw)
  To: Geert Uytterhoeven
  Cc: linux-fbdev, deller, linux-staging, javierm, dri-devel, bernie,
	noralf, andriy.shevchenko, jayalk


[-- Attachment #1.1: Type: text/plain, Size: 2520 bytes --]

Hi

Am 14.02.22 um 10:05 schrieb Geert Uytterhoeven:
> Hi Thomas,
> 
> On Mon, Feb 14, 2022 at 9:28 AM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>> Am 14.02.22 um 09:05 schrieb Geert Uytterhoeven:
>>> On Thu, Feb 10, 2022 at 4:24 PM Thomas Zimmermann <tzimmermann@suse.de> wrote:
>>>> Fbdev's deferred I/O sorts all dirty pages by default, which incurs a
>>>> significant overhead. Make the sorting step optional and update the few
>>>> drivers that require it. Use a FIFO list by default.
>>>>
>>>> Sorting pages by memory offset for deferred I/O performs an implicit
>>>> bubble-sort step on the list of dirty pages. The algorithm goes through
>>>> the list of dirty pages and inserts each new page according to its
>>>> index field. Even worse, list traversal always starts at the first
>>>> entry. As video memory is most likely updated scanline by scanline, the
>>>> algorithm traverses through the complete list for each updated page.
>>>>
>>>> For example, with 1024x768x32bpp a page covers exactly one scanline.
>>>> Writing a single screen update from top to bottom requires updating
>>>> 768 pages. With an average list length of 384 entries, a screen update
>>>> creates (768 * 384 =) 294912 compare operation.
>>>
>>> What about using folios?
>>> If consecutive pages are merged into a single entry, there's much less
>>> (or nothing in the example above) to sort.
>>
>> How would the code know that? Calls to page_mkwrite happen
>> pagefault-by-pagefault in any order AFAICT.
> 
> fb_deferred_io_mkwrite() would still be called for a page, but an
> adjacent page can be merged with an existing entry while adding it
> to the list.

I still don't understand how we'd use it to our advantage. Most drivers 
don't need sorted pages at all. A folio has strong alignment 
requirements for size and offset AFAICT. We might end up flushing way 
too much of the display memory.

Best regards
Thomas

> 
> Gr{oetje,eeting}s,
> 
>                          Geert
> 
> --
> Geert Uytterhoeven -- There's lots of Linux beyond ia32 -- geert@linux-m68k.org
> 
> In personal conversations with technical people, I call myself a hacker. But
> when I'm talking to journalists I just say "programmer" or something like that.
>                                  -- Linus Torvalds

-- 
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 840 bytes --]

^ permalink raw reply	[flat|nested] 28+ messages in thread

end of thread, other threads:[~2022-02-14 13:29 UTC | newest]

Thread overview: 28+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-10 14:11 [PATCH 0/2] fbdev: Significantly improve performance of fbdefio mmap Thomas Zimmermann
2022-02-10 14:11 ` Thomas Zimmermann
2022-02-10 14:11 ` [PATCH 1/2] fbdev/defio: Early-out if page is already enlisted Thomas Zimmermann
2022-02-10 14:11   ` Thomas Zimmermann
2022-02-10 21:00   ` Sam Ravnborg
2022-02-10 21:00     ` Sam Ravnborg
2022-02-11  9:12     ` Thomas Zimmermann
2022-02-11  9:12       ` Thomas Zimmermann
2022-02-10 14:11 ` [PATCH 2/2] fbdev: Don't sort deferred-I/O pages by default Thomas Zimmermann
2022-02-10 14:11   ` Thomas Zimmermann
2022-02-10 16:15   ` Andy Shevchenko
2022-02-10 16:15     ` Andy Shevchenko
2022-02-10 21:16   ` Sam Ravnborg
2022-02-10 21:16     ` Sam Ravnborg
2022-02-11  7:58     ` Dan Carpenter
2022-02-11  7:58       ` Dan Carpenter
2022-02-11  8:25       ` Thomas Zimmermann
2022-02-11  8:25         ` Thomas Zimmermann
2022-02-11  8:21     ` Thomas Zimmermann
2022-02-11  8:21       ` Thomas Zimmermann
2022-02-14  8:05   ` Geert Uytterhoeven
2022-02-14  8:05     ` Geert Uytterhoeven
2022-02-14  8:28     ` Thomas Zimmermann
2022-02-14  8:28       ` Thomas Zimmermann
2022-02-14  9:05       ` Geert Uytterhoeven
2022-02-14  9:05         ` Geert Uytterhoeven
2022-02-14 13:29         ` Thomas Zimmermann
2022-02-14 13:29           ` Thomas Zimmermann

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.