linux-lvm.redhat.com archive mirror
From: Heming Zhao <heming.zhao@suse.com>
To: David Teigland <teigland@redhat.com>
Cc: Gang He <GHe@suse.com>, "linux-lvm@redhat.com" <linux-lvm@redhat.com>
Subject: Re: [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512"
Date: Mon, 14 Oct 2019 03:13:13 +0000
Message-ID: <bf4a79f3-3604-9981-b6f2-5ad702c8d09a@suse.com>
In-Reply-To: <d8f2f0af-0b54-76ca-6a44-adabc73f1a08@suse.com>

The issue in bcache_flush is related to cache->errored.

Here is my fix. I believe there may be a better solution than mine.

Solution:
Keep cache->errored, but use it only to hold blocks whose writes failed;
that data is never resent.
bcache_flush checks cache->errored: when the errored list is not empty,
bcache_flush returns false, which tells the caller (the upper layer) to do
the cleanup.

```
commit 17e959c0ba58edc67b6caa7669444ecffa40a16f (HEAD -> master)
Author: Zhao Heming <heming.zhao@suse.com>
Date:   Mon Oct 14 10:57:54 2019 +0800

     The fd of a block in cache->errored may already have been closed by the
     time bcache_flush is called, so bcache_flush shouldn't rewrite the data
     in cache->errored. The current solution is to return an error to the
     caller when cache->errored is not empty, and the caller should do all
     the cleanup.
     
     Signed-off-by: Zhao Heming <heming.zhao@suse.com>

diff --git a/lib/device/bcache.c b/lib/device/bcache.c
index cfe01bac2f..2eb3f0ee34 100644
--- a/lib/device/bcache.c
+++ b/lib/device/bcache.c
@@ -897,16 +897,20 @@ static bool _wait_io(struct bcache *cache)
   * High level IO handling
   *--------------------------------------------------------------*/
  
-static void _wait_all(struct bcache *cache)
+static bool _wait_all(struct bcache *cache)
  {
+       bool ret = true;
         while (!dm_list_empty(&cache->io_pending))
-               _wait_io(cache);
+               if (!_wait_io(cache)) ret = false;
+       return ret;
  }
  
-static void _wait_specific(struct block *b)
+static bool _wait_specific(struct block *b)
  {
+       bool ret = true;
         while (_test_flags(b, BF_IO_PENDING))
-               _wait_io(b->cache);
+               if (!_wait_io(b->cache)) ret = false;
+       return ret;
  }
  
  static unsigned _writeback(struct bcache *cache, unsigned count)
@@ -1262,10 +1266,7 @@ void bcache_put(struct block *b)
  
  bool bcache_flush(struct bcache *cache)
  {
-       // Only dirty data is on the errored list, since bad read blocks get
-       // recycled straight away.  So we put these back on the dirty list, and
-       // try and rewrite everything.
-       dm_list_splice(&cache->dirty, &cache->errored);
+       bool ret = true;
  
         while (!dm_list_empty(&cache->dirty)) {
                 struct block *b = dm_list_item(_list_pop(&cache->dirty), struct block);
@@ -1275,11 +1276,18 @@ bool bcache_flush(struct bcache *cache)
                 }
  
                 _issue_write(b);
+               if (b->error) ret = false;
         }
  
-       _wait_all(cache);
+       if (!_wait_all(cache)) ret = false;
  
-       return dm_list_empty(&cache->errored);
+       // merge the errored list into dirty and return false so that the
+       // caller cleans them up.
+       if (!dm_list_empty(&cache->errored)) {
+               dm_list_splice(&cache->dirty, &cache->errored);
+               ret = false;
+       }
+       return ret;
  }
  
  //----------------------------------------------------------------
```

Thread overview: 16+ messages
2019-09-11  9:17 [linux-lvm] pvresize will cause a meta-data corruption with error message "Error writing device at 4096 length 512" Gang He
2019-09-11 10:01 ` Ilia Zykov
2019-09-11 10:03 ` Ilia Zykov
2019-09-11 10:10   ` Ingo Franzki
2019-09-11 10:20     ` Gang He
2019-10-11  8:11 ` Heming Zhao
2019-10-11  9:22   ` Heming Zhao
2019-10-11 10:38     ` Zdenek Kabelac
2019-10-11 11:50       ` Heming Zhao
2019-10-11 15:14   ` David Teigland
2019-10-12  3:23     ` Gang He
2019-10-12  6:34     ` Heming Zhao
2019-10-12  7:11       ` Heming Zhao
2019-10-14  3:07         ` Heming Zhao
2019-10-14  3:13         ` Heming Zhao [this message]
2019-10-16  8:50           ` Heming Zhao
