linux-mm.kvack.org archive mirror
* [PATCH] mm/filemap.c: clear page error before actual read
@ 2020-03-04 10:47 Xianting Tian
  2020-03-04 12:15 ` Matthew Wilcox
  0 siblings, 1 reply; 7+ messages in thread
From: Xianting Tian @ 2020-03-04 10:47 UTC
  To: akpm, willy; +Cc: linux-mm, linux-kernel

A mount failure occurs in the following scenario: an application
forks dozens of threads, each mounting its own cramfs image inside
docker, and several of the mounts fail with high probability. The
mount fails because the page holding the superblock, read from the
loop device, is not uptodate after wait_on_page_locked(page) returns
in cramfs_read():
   wait_on_page_locked(page);
   if (!PageUptodate(page)) {
      ...
   }

The page is not uptodate for the following reason. systemd-udevd
reads the loopX device before the mount. Because loopX is still in
the Lo_unbound state at that point, loop_make_request() fails the
request immediately, which invokes the I/O completion handler
end_buffer_async_read(), and that handler calls SetPageError(page).
With PG_error set, end_buffer_async_read() cannot mark the page
uptodate:
   if (page_uptodate && !PageError(page)) {
      SetPageUptodate(page);
   }
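The flag interaction described above can be sketched in user space. This is a minimal model, not kernel code: `struct model_page`, `model_end_read()` and `model_read()` are hypothetical names, the three booleans stand in for PG_locked, PG_uptodate and PG_error, and the per-buffer bookkeeping of end_buffer_async_read() is collapsed into a single success flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical user-space stand-in for PG_locked, PG_uptodate, PG_error. */
struct model_page {
	bool locked;
	bool uptodate;
	bool error;
};

/*
 * Completion rule modeled on end_buffer_async_read(): a failed read
 * sets PG_error, and the page becomes uptodate only if the I/O
 * succeeded AND PG_error is clear. Nothing here clears PG_error.
 */
static void model_end_read(struct model_page *p, bool io_ok)
{
	assert(p->locked);          /* completion runs on a locked page */
	if (!io_ok)
		p->error = true;        /* SetPageError(page) */
	if (io_ok && !p->error)
		p->uptodate = true;     /* SetPageUptodate(page) */
	p->locked = false;          /* unlock the page */
}

/* Lock the page, "submit" the read and complete it with 'io_ok'. */
static void model_read(struct model_page *p, bool io_ok)
{
	p->locked = true;
	model_end_read(p, io_ok);
}
```

Under these assumptions, once a failed read has set `error`, a later read whose I/O succeeds still leaves the page not uptodate, which is the state cramfs_read() trips over.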
The mount then runs and uses the same page that systemd-udevd just
accessed. Because the page is not uptodate, the mount issues an
actual read via submit_bh() and waits for it with
wait_on_page_locked(page). When that I/O completes,
end_buffer_async_read() is called again. Nothing on the mount's read
path has cleared the PG_error left behind by the systemd-udevd read,
so the page is still marked PageError, end_buffer_async_read() cannot
set it uptodate, and the mount fails.

However, the mount sometimes succeeds even though systemd-udevd read
loopX just before it. In those cases systemd-udevd issued another
read of loopX between steps 3.1 and 3.2 below:
1. loopX starts in its default state, Lo_unbound;
2. systemd-udevd reads loopX (the page gets PG_error set);
3. the mount operation:
   1) sets loopX to Lo_bound;
   ==> systemd-udevd reads loopX <==
   2) reads loopX (the page has no error);
   3) succeeds.
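The timeline above can be replayed with a small hypothetical model. It assumes, as the description states, that I/O to an unbound loop device fails immediately and that systemd-udevd's read(2) goes through a path that clears PG_error first, while the unpatched mount path does not; `model_mount_sequence()` and the other `model_*` names are illustrative only.

```c
#include <assert.h>
#include <stdbool.h>

enum model_lo_state { MODEL_LO_UNBOUND, MODEL_LO_BOUND };

struct model_page { bool uptodate; bool error; };

/* Assumption: a request to an unbound loop device fails immediately. */
static bool model_loop_io(enum model_lo_state st)
{
	return st == MODEL_LO_BOUND;
}

/* Completion rule quoted from end_buffer_async_read(). */
static void model_complete(struct model_page *p, bool io_ok)
{
	assert(p != 0);
	if (!io_ok)
		p->error = true;
	if (io_ok && !p->error)
		p->uptodate = true;
}

/* systemd-udevd's read(2): the generic path clears PG_error first. */
static void model_udevd_read(struct model_page *p, enum model_lo_state st)
{
	p->error = false;
	model_complete(p, model_loop_io(st));
}

/* The mount's read without the patch: no ClearPageError(). */
static void model_mount_read(struct model_page *p, enum model_lo_state st)
{
	if (!p->uptodate)
		model_complete(p, model_loop_io(st));
}

/* Replays steps 1-3; returns true if the mount sees an uptodate page. */
static bool model_mount_sequence(bool udevd_reads_in_window)
{
	struct model_page p = { false, false };
	enum model_lo_state st = MODEL_LO_UNBOUND;   /* step 1 */

	model_udevd_read(&p, st);                    /* step 2: sets error */
	st = MODEL_LO_BOUND;                         /* step 3.1 */
	if (udevd_reads_in_window)
		model_udevd_read(&p, st);                /* clears error, read succeeds */
	model_mount_read(&p, st);                    /* step 3.2 */
	return p.uptodate;                           /* step 3.3 */
}
```

`model_mount_sequence(false)` reports a failed mount, while `model_mount_sequence(true)`, with the extra systemd-udevd read landing between steps 3.1 and 3.2, reports success.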
Because loopX is already Lo_bound after step 3.1, this extra
systemd-udevd read goes through the whole I/O stack. Part of its
call trace:
   SYS_read
      vfs_read
          do_sync_read
              blkdev_aio_read
                 generic_file_aio_read
                     do_generic_file_read:
                        ClearPageError(page);
                        mapping->a_ops->readpage(filp, page);
Here, mapping->a_ops->readpage() is blkdev_readpage(). In recent
kernels some of these functions have been renamed; the equivalent
call trace is:
   blkdev_read_iter
      generic_file_read_iter
         generic_file_buffered_read:
            /*
             * A previous I/O error may have been due to temporary
             * failures, eg. multipath errors.
             * PG_error will be set again if readpage fails.
             */
            ClearPageError(page);
            /* Start the actual read. The read will unlock the page. */
            error = mapping->a_ops->readpage(filp, page);
ClearPageError(page) is called before the actual read, which is why
the read in step 3.2 succeeds.
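The ordering in generic_file_buffered_read() (clear the flag, then call ->readpage()) can be illustrated with a device that fails once and then recovers, the kind of temporary failure the kernel comment has in mind. `struct model_aops`, `model_flaky_readpage()` and `model_buffered_read()` are hypothetical; the real address_space_operations table has many more members, and the flaky device is an assumption made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_page { bool locked, uptodate, error; };

/* Hypothetical stand-in for struct address_space_operations. */
struct model_aops {
	int (*readpage)(void *file, struct model_page *page);
};

/* Completion rule modeled on end_buffer_async_read(); unlocks the page. */
static void model_end_io(struct model_page *p, bool ok)
{
	if (!ok)
		p->error = true;
	if (ok && !p->error)
		p->uptodate = true;
	p->locked = false;
}

/* Hypothetical device: the first request fails, later ones succeed. */
static int model_flaky_calls;
static int model_flaky_readpage(void *file, struct model_page *p)
{
	(void)file;
	model_end_io(p, model_flaky_calls++ > 0);
	return 0;
}

/* Sketch of the step quoted from generic_file_buffered_read(). */
static void model_buffered_read(const struct model_aops *aops, void *filp,
				struct model_page *p, bool clear_first)
{
	assert(aops != NULL && aops->readpage != NULL);
	p->locked = true;
	if (clear_first)
		p->error = false;       /* ClearPageError(page) */
	aops->readpage(filp, p);    /* the read unlocks the page */
}
```

Retrying with the clear in place makes the page uptodate on the second attempt; retrying without it leaves the page stuck in the error state even though the second I/O succeeded.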

This patch adds a ClearPageError() call just before the actual read
on the cramfs mount read path.
Without the patch, the call trace for a cramfs mount is:
   do_mount
      cramfs_read
         cramfs_blkdev_read
            read_cache_page
               do_read_cache_page:
                  filler(data, page);
                  or
                  mapping->a_ops->readpage(data, page);
With the patch, the call trace for the mount is:
   do_mount
      cramfs_read
         cramfs_blkdev_read
            read_cache_page:
               do_read_cache_page:
                  ClearPageError(page); <== newly added
                  filler(data, page);
                  or
                  mapping->a_ops->readpage(data, page);

With the patch, the mount path calls ClearPageError(page) before the
actual read, so the page ends up without an error unless the new I/O
itself fails.
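A sketch of the patched mount path, under the same simplifications as before: `model_read_cache_page()` stands in for the non-uptodate branch of do_read_cache_page(), and `model_cramfs_superblock_read()` stands in for the check in cramfs_read(). Both are hypothetical, and returning -EIO is only an illustrative way of reporting the failure; the description does not say how cramfs propagates it.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

struct model_page { bool uptodate, error; };

/* Completion rule quoted from end_buffer_async_read(). */
static void model_end_io(struct model_page *p, bool ok)
{
	if (!ok)
		p->error = true;
	if (ok && !p->error)
		p->uptodate = true;
}

/* Non-uptodate branch of do_read_cache_page(); 'patched' adds the clear. */
static void model_read_cache_page(struct model_page *p, bool io_ok, bool patched)
{
	assert(p != 0);
	if (p->uptodate)
		return;                 /* cached copy is already good */
	if (patched)
		p->error = false;       /* ClearPageError(page) */
	model_end_io(p, io_ok);     /* filler() or ->readpage() completes */
}

/* cramfs_read()-style check: a page that is not uptodate is a failure. */
static int model_cramfs_superblock_read(struct model_page *p, bool patched)
{
	model_read_cache_page(p, true, patched);   /* loop is bound: I/O succeeds */
	return p->uptodate ? 0 : -EIO;
}
```

A page left with a stale PG_error by the earlier systemd-udevd read makes the unpatched path fail and the patched path succeed.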

Signed-off-by: Xianting Tian <xianting_tian@126.com>
---
 mm/filemap.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/mm/filemap.c b/mm/filemap.c
index 1784478..77c370d 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2823,6 +2823,14 @@ static struct page *do_read_cache_page(struct address_space *mapping,
 		unlock_page(page);
 		goto out;
 	}
+
+	/*
+	 * A previous I/O error may have been due to temporary
+	 * failures.
+	 * Clear page error before actual read, PG_error will be
+	 * set again if read page fails.
+	 */
+	ClearPageError(page);
 	goto filler;
 
 out:
-- 
1.8.3.1



* [PATCH] mm/filemap.c: clear page error before actual read
@ 2020-03-03 15:09 Xianting Tian
  2020-03-03 16:43 ` Matthew Wilcox
  0 siblings, 1 reply; 7+ messages in thread
From: Xianting Tian @ 2020-03-03 15:09 UTC
  To: akpm; +Cc: linux-mm, linux-kernel

A mount failure occurs in the following scenario: an application
forks dozens of threads, each mounting its own cramfs image inside
docker, and several of the mounts fail with high probability. The
mount fails because the page holding the superblock, read from the
loop device, is not uptodate after wait_on_page_locked(page) returns
in cramfs_read():
   wait_on_page_locked(page);
   if (!PageUptodate(page)) {
      ...
   }

The page is not uptodate for the following reason. systemd-udevd
reads the loopX device before the mount. Because loopX is still in
the Lo_unbound state at that point, loop_make_request() fails the
request immediately, which invokes the I/O completion handler
end_buffer_async_read(), and that handler calls SetPageError(page).
With PG_error set, end_buffer_async_read() cannot mark the page
uptodate:
   if (page_uptodate && !PageError(page)) {
      SetPageUptodate(page);
   }
The mount then runs and uses the same page that systemd-udevd just
accessed. Because the page is not uptodate, the mount issues an
actual read via submit_bh() and waits for it with
wait_on_page_locked(page). When that I/O completes,
end_buffer_async_read() is called again. Nothing on the mount's read
path has cleared the PG_error left behind by the systemd-udevd read,
so the page is still marked PageError, end_buffer_async_read() cannot
set it uptodate, and the mount fails.

However, the mount sometimes succeeds even though systemd-udevd read
loopX just before it. In those cases systemd-udevd issued another
read of loopX between steps 3.1 and 3.2 below:
1. loopX starts in its default state, Lo_unbound;
2. systemd-udevd reads loopX (the page gets PG_error set);
3. the mount operation:
   1) sets loopX to Lo_bound;
   ==> systemd-udevd reads loopX <==
   2) reads loopX (the page has no error);
   3) succeeds.
Because loopX is already Lo_bound after step 3.1, this extra
systemd-udevd read goes through the whole I/O stack. Part of its
call trace:
   SYS_read
      vfs_read
          do_sync_read
              blkdev_aio_read
                 generic_file_aio_read
                     do_generic_file_read:
                        ClearPageError(page);
                        mapping->a_ops->readpage(filp, page);
Here, mapping->a_ops->readpage() is blkdev_readpage(). In recent
kernels some of these functions have been renamed; the equivalent
call trace is:
   blkdev_read_iter
      generic_file_read_iter
         generic_file_buffered_read:
            /*
             * A previous I/O error may have been due to temporary
             * failures, eg. multipath errors.
             * PG_error will be set again if readpage fails.
             */
            ClearPageError(page);
            /* Start the actual read. The read will unlock the page. */
            error = mapping->a_ops->readpage(filp, page);
ClearPageError(page) is called before the actual read, which is why
the read in step 3.2 succeeds.

This patch adds a ClearPageError() call just before the actual read
on the cramfs mount read path.
Without the patch, the call trace for a cramfs mount is:
   do_mount
      cramfs_read
         cramfs_blkdev_read
            read_cache_page
               do_read_cache_page:
                  filler(data, page);
                  or
                  mapping->a_ops->readpage(data, page);
With the patch, the call trace for the mount is:
   do_mount
      cramfs_read
         cramfs_blkdev_read
            read_cache_page:
               do_read_cache_page:
                  ClearPageError(page); <== newly added
                  filler(data, page);
                  or
                  mapping->a_ops->readpage(data, page);

With the patch, the mount path calls ClearPageError(page) before the
actual read, so the page ends up without an error unless the new I/O
itself fails.
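In this revision the ClearPageError() call sits right after the `filler:` label, whereas the 2020-03-04 10:47 revision in this thread puts it just before `goto filler`. The heavily simplified, hypothetical model below compares the two placements. It assumes that a page not yet in the cache is freshly allocated with PG_error clear and jumps straight to the label, and that an existing, non-uptodate page reaches the label via `goto filler`; none of the `model_*` names exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_page { bool uptodate; bool error; };

/* Where the hypothetical model clears PG_error. */
enum model_clear_site {
	MODEL_CLEAR_AT_FILLER_LABEL,    /* placement in this revision */
	MODEL_CLEAR_BEFORE_RETRY_GOTO,  /* placement in the 10:47 revision */
};

/*
 * Simplified control flow of do_read_cache_page(). 'cached' is NULL
 * when the page is not in the page cache, in which case 'fresh'
 * (assumed to have all flags clear) is used. 'io_ok' is the outcome
 * of the read started at filler:.
 */
static struct model_page *model_do_read_cache_page(struct model_page *cached,
						   struct model_page *fresh,
						   enum model_clear_site site,
						   bool io_ok)
{
	struct model_page *page;

	assert(cached != NULL || fresh != NULL);
	if (cached == NULL) {
		page = fresh;                 /* newly allocated page */
		goto filler;
	}

	page = cached;
	if (page->uptodate)
		return page;                  /* nothing to read */

	if (site == MODEL_CLEAR_BEFORE_RETRY_GOTO)
		page->error = false;          /* ClearPageError(page) */
	goto filler;

filler:
	if (site == MODEL_CLEAR_AT_FILLER_LABEL)
		page->error = false;          /* ClearPageError(page) */

	/* Completion rule quoted from end_buffer_async_read(). */
	if (!io_ok)
		page->error = true;
	if (io_ok && !page->error)
		page->uptodate = true;
	return page;
}
```

Under these assumptions both placements clear the stale flag on the retry path, which is the one involved in the reported failure; on the fresh-page path, clearing at the label is simply redundant.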

Signed-off-by: Xianting Tian <xianting_tian@126.com>
---
 mm/filemap.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/filemap.c b/mm/filemap.c
index 1784478..d65428f 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2755,6 +2755,13 @@ static struct page *do_read_cache_page(struct address_space *mapping,
 		}
 
 filler:
+		/*
+		 * A previous I/O error may have been due to temporary
+		 * failures.
+		 * Clear page error before actual read, PG_error will be
+		 * set again if read page fails.
+		 */
+		ClearPageError(page);
 		if (filler)
 			err = filler(data, page);
 		else
-- 
1.8.3.1



* [PATCH] mm/filemap.c: clear page error before actual read
@ 2020-03-03  8:25 Xianting Tian
  2020-04-21  3:42 ` Andrew Morton
  0 siblings, 1 reply; 7+ messages in thread
From: Xianting Tian @ 2020-03-03  8:25 UTC
  To: akpm; +Cc: linux-mm, linux-kernel, Xianting Tian

A mount failure occurs in the following scenario: an application
forks dozens of threads, each mounting its own cramfs image inside
docker, and several of the mounts fail with high probability. The
mount fails because the page holding the superblock, read from the
loop device, is not uptodate after wait_on_page_locked(page) returns
in cramfs_read():
   wait_on_page_locked(page);
   if (!PageUptodate(page)) {
      ...
   }

The page is not uptodate for the following reason. systemd-udevd
reads the loopX device before the mount. Because loopX is still in
the Lo_unbound state at that point, loop_make_request() fails the
request immediately, which invokes the I/O completion handler
end_buffer_async_read(), and that handler calls SetPageError(page).
With PG_error set, end_buffer_async_read() cannot mark the page
uptodate:
   if (page_uptodate && !PageError(page)) {
      SetPageUptodate(page);
   }
The mount then runs and uses the same page that systemd-udevd just
accessed. Because the page is not uptodate, the mount issues an
actual read via submit_bh() and waits for it with
wait_on_page_locked(page). When that I/O completes,
end_buffer_async_read() is called again. Nothing on the mount's read
path has cleared the PG_error left behind by the systemd-udevd read,
so the page is still marked PageError, end_buffer_async_read() cannot
set it uptodate, and the mount fails.

However, the mount sometimes succeeds even though systemd-udevd read
loopX just before it. In those cases systemd-udevd issued another
read of loopX between steps 3.1 and 3.2 below:
1. loopX starts in its default state, Lo_unbound;
2. systemd-udevd reads loopX (the page gets PG_error set);
3. the mount operation:
   1) sets loopX to Lo_bound;
   ==> systemd-udevd reads loopX <==
   2) reads loopX (the page has no error);
   3) succeeds.
Because loopX is already Lo_bound after step 3.1, this extra
systemd-udevd read goes through the whole I/O stack. Part of its
call trace:
   SYS_read
      vfs_read
          do_sync_read
              blkdev_aio_read
                 generic_file_aio_read
                     do_generic_file_read:
                         ClearPageError(page);
                         mapping->a_ops->readpage(filp, page);
Here, mapping->a_ops->readpage() is blkdev_readpage(). In recent
kernels some of these functions have been renamed; the equivalent
call trace is:
   blkdev_read_iter
      generic_file_read_iter
         generic_file_buffered_read:
            /*
             * A previous I/O error may have been due to temporary
             * failures, eg. multipath errors.
             * PG_error will be set again if readpage fails.
             */
            ClearPageError(page);
            /* Start the actual read. The read will unlock the page. */
            error = mapping->a_ops->readpage(filp, page);

ClearPageError(page) is called before the actual read, which is why
the read in step 3.2 succeeds and the page has no error.

This patch adds a ClearPageError() call just before the actual read
on the mount read path. Without the patch, the call trace for the
mount is:
  do_mount
     cramfs_read
       cramfs_blkdev_read
          read_mapping_page
             read_cache_page
                 do_read_cache_page:
                    filler(data, page);
                    or mapping->a_ops->readpage(data, page);
With the patch, the call trace for the mount is:
  do_mount
     cramfs_read
        cramfs_blkdev_read
           read_mapping_page
              read_cache_page
                 do_read_cache_page:
                    ClearPageError(page); <== newly added
                    filler(data, page);
                    or mapping->a_ops->readpage(data, page);
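The `filler(data, page)` / `mapping->a_ops->readpage(data, page)` alternative in these traces is a function-pointer dispatch: when no filler is supplied, the mapping's ->readpage() is used. The sketch below models that dispatch with the patched ClearPageError() in front of it. It assumes, as the trace suggests, that read_mapping_page() reaches do_read_cache_page() without a custom filler; all `model_*` names, including the stand-in for blkdev_readpage(), are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct model_page { bool uptodate, error; int read_by; };

typedef int (*model_filler_t)(void *data, struct model_page *page);

struct model_mapping {
	model_filler_t readpage;    /* stands in for mapping->a_ops->readpage */
};

/* Stand-in for blkdev_readpage(): I/O assumed to succeed. */
static int model_blkdev_readpage(void *data, struct model_page *p)
{
	(void)data;
	p->read_by = 1;
	p->uptodate = !p->error;    /* a stale PG_error would still win */
	return 0;
}

/* Stand-in for a caller-supplied filler: I/O assumed to succeed. */
static int model_custom_filler(void *data, struct model_page *p)
{
	(void)data;
	p->read_by = 2;
	p->uptodate = !p->error;
	return 0;
}

/* Dispatch at the filler: label, with the patch applied. */
static int model_fill(struct model_mapping *m, model_filler_t filler,
		      void *data, struct model_page *p)
{
	assert(filler != NULL || m->readpage != NULL);
	p->error = false;                   /* ClearPageError(page) */
	if (filler)
		return filler(data, p);
	return m->readpage(data, p);
}

/* read_mapping_page()-style wrapper: passes no filler. */
static int model_read_mapping_page(struct model_mapping *m, void *data,
				   struct model_page *p)
{
	return model_fill(m, NULL, data, p);
}
```

Whichever callback performs the read, the stale flag has already been cleared by the time it runs, so a successful read leaves the page uptodate.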

With the patch, the mount path calls ClearPageError(page) before the
actual read, so the page ends up without an error unless the new I/O
itself fails.

Signed-off-by: Xianting Tian <tian.xianting@h3c.com>
---
 mm/filemap.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/filemap.c b/mm/filemap.c
index 178447827..d65428f26 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2755,6 +2755,13 @@ static struct page *do_read_cache_page(struct address_space *mapping,
                }

 filler:
+               /*
+                * A previous I/O error may have been due to temporary
+                * failures.
+                * Clear page error before actual read, PG_error will be
+                * set again if read page fails.
+                */
+               ClearPageError(page);
                if (filler)
                        err = filler(data, page);
                else
--
2.17.1

-------------------------------------------------------------------------------------------------------------------------------------
This e-mail and its attachments contain confidential information from New H3C, which is
intended only for the person or entity whose address is listed above. Any use of the
information contained herein in any way (including, but not limited to, total or partial
disclosure, reproduction, or dissemination) by persons other than the intended
recipient(s) is prohibited. If you receive this e-mail in error, please notify the sender
by phone or email immediately and delete it!


* [PATCH] mm/filemap.c: clear page error before actual read
@ 2020-03-03  7:18 Xianting Tian
  0 siblings, 0 replies; 7+ messages in thread
From: Xianting Tian @ 2020-03-03  7:18 UTC
  To: linux-mm, linux-kernel; +Cc: yubin, Xianting Tian

A mount failure occurs in the following scenario: an application
forks dozens of threads, each mounting its own cramfs image inside
docker, and several of the mounts fail with high probability. The
mount fails because the page holding the superblock, read from the
loop device, is not uptodate after wait_on_page_locked(page) returns
in cramfs_read():
   wait_on_page_locked(page);
   if (!PageUptodate(page)) {
      ...
   }

The page is not uptodate for the following reason. systemd-udevd
reads the loopX device before the mount. Because loopX is still in
the Lo_unbound state at that point, loop_make_request() fails the
request immediately, which invokes the I/O completion handler
end_buffer_async_read(), and that handler calls SetPageError(page).
With PG_error set, end_buffer_async_read() cannot mark the page
uptodate:
   if (page_uptodate && !PageError(page)) {
      SetPageUptodate(page);
   }
The mount then runs and uses the same page that systemd-udevd just
accessed. Because the page is not uptodate, the mount issues an
actual read via submit_bh() and waits for it with
wait_on_page_locked(page). When that I/O completes,
end_buffer_async_read() is called again. Nothing on the mount's read
path has cleared the PG_error left behind by the systemd-udevd read,
so the page is still marked PageError, end_buffer_async_read() cannot
set it uptodate, and the mount fails.

However, the mount sometimes succeeds even though systemd-udevd read
loopX just before it. In those cases systemd-udevd issued another
read of loopX between steps 3.1 and 3.2 below:
1. loopX starts in its default state, Lo_unbound;
2. systemd-udevd reads loopX (the page gets PG_error set);
3. the mount operation:
   1) sets loopX to Lo_bound;
   ==> systemd-udevd reads loopX <==
   2) reads loopX (the page has no error);
   3) succeeds.
Because loopX is already Lo_bound after step 3.1, this extra
systemd-udevd read goes through the whole I/O stack. Part of its
call trace:
   SYS_read
      vfs_read
          do_sync_read
              blkdev_aio_read
                 generic_file_aio_read
                     do_generic_file_read:
                         ClearPageError(page);
                         mapping->a_ops->readpage(filp, page);
Here, mapping->a_ops->readpage() is blkdev_readpage(). In recent
kernels some of these functions have been renamed; the equivalent
call trace is:
   blkdev_read_iter
      generic_file_read_iter
         generic_file_buffered_read:
            /*
             * A previous I/O error may have been due to temporary
             * failures, eg. multipath errors.
             * PG_error will be set again if readpage fails.
             */
            ClearPageError(page);
            /* Start the actual read. The read will unlock the page. */
            error = mapping->a_ops->readpage(filp, page);

ClearPageError(page) is called before the actual read, which is why
the read in step 3.2 succeeds and the page has no error.

This patch adds a ClearPageError() call just before the actual read
on the mount read path. Without the patch, the call trace for the
mount is:
  do_mount
     cramfs_read
       cramfs_blkdev_read
          read_mapping_page
             read_cache_page
                 do_read_cache_page:
                    filler(data, page);
                    or mapping->a_ops->readpage(data, page);
With the patch, the call trace for the mount is:
  do_mount
     cramfs_read
        cramfs_blkdev_read
           read_mapping_page
              read_cache_page
                 do_read_cache_page:
                    ClearPageError(page); <== newly added
                    filler(data, page);
                    or mapping->a_ops->readpage(data, page);

With the patch, the mount path calls ClearPageError(page) before the
actual read, so the page ends up without an error unless the new I/O
itself fails.
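Clearing the flag does not hide a real failure: as the comment added by the patch says, PG_error is set again if the read fails. A minimal hypothetical model of the patched read shows both outcomes; `model_patched_read()` and `model_end_io()` are illustrative names, and the completion logic is reduced to the rule quoted from end_buffer_async_read().

```c
#include <assert.h>
#include <stdbool.h>

struct model_page { bool uptodate, error; };

/* Completion rule modeled on end_buffer_async_read(). */
static void model_end_io(struct model_page *p, bool ok)
{
	if (!ok)
		p->error = true;        /* SetPageError(page) */
	if (ok && !p->error)
		p->uptodate = true;     /* SetPageUptodate(page) */
}

/* Patched read: drop any stale flag, then issue the read. */
static void model_patched_read(struct model_page *p, bool io_ok)
{
	assert(p != 0);
	p->error = false;           /* ClearPageError(page) */
	model_end_io(p, io_ok);
}
```

A read that fails after the clear still leaves the page with PG_error set and not uptodate, so callers such as cramfs_read() continue to see genuine I/O errors; only a stale flag from an earlier, unrelated read is discarded.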

Signed-off-by: Xianting Tian <tian.xianting@h3c.com>
---
 mm/filemap.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/mm/filemap.c b/mm/filemap.c
index 178447827..d65428f26 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -2755,6 +2755,13 @@ static struct page *do_read_cache_page(struct address_space *mapping,
                }

 filler:
+               /*
+                * A previous I/O error may have been due to temporary
+                * failures.
+                * Clear page error before actual read, PG_error will be
+                * set again if read page fails.
+                */
+               ClearPageError(page);
                if (filler)
                        err = filler(data, page);
                else
--
2.17.1





Thread overview: 7+ messages
2020-03-04 10:47 [PATCH] mm/filemap.c: clear page error before actual read Xianting Tian
2020-03-04 12:15 ` Matthew Wilcox
  -- strict thread matches above, loose matches on Subject: below --
2020-03-03 15:09 Xianting Tian
2020-03-03 16:43 ` Matthew Wilcox
2020-03-03  8:25 Xianting Tian
2020-04-21  3:42 ` Andrew Morton
2020-03-03  7:18 Xianting Tian
