From: lixiaokeng <lixiaokeng@huawei.com>
To: Martin Wilck <mwilck@suse.com>,
	Benjamin Marzinski <bmarzins@redhat.com>,
	 Christophe Varoqui <christophe.varoqui@opensvc.com>,
	dm-devel mailing list <dm-devel@redhat.com>
Cc: linfeilong <linfeilong@huawei.com>,
	"liuzhiqiang (I)" <liuzhiqiang26@huawei.com>,
	lihaotian9@huawei.com
Subject: Re: [dm-devel] [QUESTION]: multipath device with wrong path lead to metadata err
Date: Wed, 20 Jan 2021 10:30:58 +0800
Message-ID: <d8ba8118-ce98-249a-cafd-021f0c1831a5@huawei.com>
In-Reply-To: <f86753b17cc7e85e7e0f7e711adec349323a7c5a.camel@suse.com>

Hi Martin:
    Thanks for your reply.


> verify_paths() would detect this. We do call verify_paths() in
> coalesce_paths() before calling domap(), but not immediately before.
> Perhaps we should move the verify_paths() call down to immediately
> before the domap() call. That would at least minimize the time window
> for this race. It's hard to avoid it entirely. The way multipathd is
> written, the vecs lock is held all the time during coalesce_paths(), 
> and thus no uevents can be processed. We could also consider calling
> verify_paths() before *and* after domap().

Can calling verify_paths() before *and* after domap() deal with this entirely?
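
To make the question concrete, here is a minimal toy sketch of the
"verify, load, verify again" ordering. It is not multipath-tools code:
every name in it (toy_path, toy_map, toy_verify_paths, toy_domap,
toy_reload) is hypothetical, and the `present` flag merely stands in for
whatever the kernel currently reports about a device.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PATHS 8

/* Hypothetical stand-in for a path device, not the real struct path. */
struct toy_path {
	const char *name;	/* e.g. "sdi" */
	bool present;		/* what the "kernel" reports right now */
};

/* Hypothetical stand-in for a multipath map. */
struct toy_map {
	struct toy_path *paths[MAX_PATHS];
	size_t npaths;
};

/* Drop paths that are no longer present; return how many were dropped. */
static size_t toy_verify_paths(struct toy_map *m)
{
	size_t kept = 0, dropped = 0;

	assert(m->npaths <= MAX_PATHS);
	for (size_t i = 0; i < m->npaths; i++) {
		if (m->paths[i]->present)
			m->paths[kept++] = m->paths[i];
		else
			dropped++;
	}
	m->npaths = kept;
	return dropped;
}

/*
 * Pretend to load the table. If `vanishes` is non-NULL, that device
 * disappears while the load is in progress, which simulates the race.
 * Returns the number of paths that were loaded.
 */
static size_t toy_domap(const struct toy_map *m, struct toy_path *vanishes)
{
	if (vanishes)
		vanishes->present = false;
	return m->npaths;
}

enum toy_result { TOY_OK, TOY_STALE_DETECTED };

/* Verify immediately before the load and once more right after it. */
static enum toy_result toy_reload(struct toy_map *m, struct toy_path *vanishes)
{
	(void)toy_verify_paths(m);	/* check just before loading */
	(void)toy_domap(m, vanishes);	/* window: a device may vanish here */
	if (toy_verify_paths(m) > 0)	/* check again afterwards */
		return TOY_STALE_DETECTED; /* caller would have to reload */
	return TOY_OK;
}
```

In this toy model the second check only detects a path that vanished
during the load, so the map would still have to be reloaded afterwards.
A device that disappears after the second check is not covered at all.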

> Was this a map creation or a map reload? Was the map removed after the
> failure? Do you observe the message "ignoring map" or "removing map"?
>
> Do you observe a "remove" uevent for sdi? 

This was a map reload, but sdi was not in the old map. The "removing map"
message was observed. No "remove" uevent for sdi was observed.

> I wonder if you'd see the issue also if you run the same test without
> the "multipath -F; multipath -r" loop, or with just one. Ok, one
> multipath_query() loop simulates an admin working on the system, but 2
> parallel loops - 2 admins working in parallel, plus the intensive
> sequence of actions done in multipathd_query at the same time? The
> repeated "multipath -r" calls and multipathd commands will cause
> multipathd to spend a lot of time in reconfigure() and in cli_* calls
> holding the vecs lock, which makes it likely that uevents are missed or
> processed late.

As you said, there were lots of cli_* calls but no uevents in the log when
the error occurred. After those calls finished, hundreds of uevents showed
up at once (for example, "Forwarding 201 uevents" in the log).

> Don't get me wrong, I don't argue against tough testing. But we should
> be aware that there are always time intervals during which multipathd's
> picture of the present devices is different from what the kernel sees.

What you said is very reasonable. When we found this problem, I thought it
would be difficult to solve entirely, even though it rarely happens. I will
discuss with the testers whether the test scripts are reasonable.

> There's definitely room for improvement in multipathd wrt locking and
> event processing in general, but that's a BIG piece of work.

Thanks again!
Regards
Lixiaokeng


--
dm-devel mailing list
dm-devel@redhat.com
https://www.redhat.com/mailman/listinfo/dm-devel

Thread overview: 18+ messages
2021-01-18 11:08 [dm-devel] [QUESTION]: multipath device with wrong path lead to metadata err lixiaokeng
2021-01-19  9:41 ` Martin Wilck
2021-01-19 12:46   ` lixiaokeng
2021-01-19 21:57 ` Martin Wilck
2021-01-20  2:30   ` lixiaokeng [this message]
2021-01-20 14:07     ` Martin Wilck
2021-01-25  1:33       ` lixiaokeng
2021-01-25 12:28         ` Martin Wilck
2021-01-26  6:40           ` lixiaokeng
2021-01-26 11:14           ` lixiaokeng
2021-01-26 23:11             ` Martin Wilck
2021-01-28  8:27               ` lixiaokeng
2021-01-28 21:15                 ` Martin Wilck
2021-02-04 11:25               ` lixiaokeng
2021-02-04 14:56                 ` Martin Wilck
2021-02-05 11:49                   ` lixiaokeng
2021-01-20 13:02   ` Roger Heflin
2021-01-20 20:45     ` Martin Wilck
