* master - multipathd: fix fd leak when iscsi device logs in
From: lixiaokeng @ 2020-07-09  7:02 UTC
  To: lvm-devel

When one iscsi device logs in and logs out several times, the
number of fds pointing to '/dev/mapper/control' increases in
/proc/<multipathd-pid>/fd, as follows:
[root@localhost fd]# ll | grep control
lrwx------. 1 root root 64 Jul  6 16:30 10 -> /dev/mapper/control
lrwx------. 1 root root 64 Jul  6 16:30 11 -> /dev/mapper/control
lrwx------. 1 root root 64 Jul  6 16:30 12 -> /dev/mapper/control
lrwx------. 1 root root 64 Jul  6 16:30 13 -> /dev/mapper/control
lrwx------. 1 root root 64 Jul  6 16:30 5 -> /dev/mapper/control
lrwx------. 1 root root 64 Jul  6 16:30 9 -> /dev/mapper/control

This is because both the wait_dmevents thread and the uevqloop thread in
multipathd concurrently open and use the shared _control_fd, which points
to '/dev/mapper/control', by calling the _open_and_assign_control_fd func.
Although the validity of _control_fd is checked in the _open_control func
before '/dev/mapper/control' is opened, there is no lock protection: the
wait_dmevents thread and the uevqloop thread may concurrently find
_control_fd invalid, concurrently open '/dev/mapper/control', and each
reset the shared _control_fd to its own new fd. The fd stored by the
faster thread is then overwritten and leaks.
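
The race can be illustrated with a small standalone program (a sketch
only, not libdm code; /dev/null stands in for /dev/mapper/control):

/* Sketch of the race: both threads can pass the check-then-open
 * sequence, so one of the two opened fds is lost. */
#include <fcntl.h>
#include <pthread.h>
#include <unistd.h>

static int _fd = -1;	/* stands in for _control_fd */

static void *opener(void *arg)
{
	(void)arg;
	if (_fd == -1) {
		/* both threads may see -1 here; the later assignment
		 * then overwrites the earlier one, and that fd leaks */
		_fd = open("/dev/null", O_RDWR);
	}
	return NULL;
}

int main(void)
{
	pthread_t t1, t2;

	pthread_create(&t1, NULL, opener, NULL);
	pthread_create(&t2, NULL, opener, NULL);
	pthread_join(t1, NULL);
	pthread_join(t2, NULL);
	close(_fd);	/* at most one of the opened fds gets closed */
	return 0;
}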

The detailed call chains are as follows:
1. wait_dmevents thread
wait_dmevents
->dmevent_loop
    ->dm_get_events
        ->dm_task_run
            ->_open_control
                ->_open_and_assign_control_fd

2. uevqloop thread
uevqloop
->uevent_dispatch
    ->service_uevq
        ->ev_add_path
            ->__setup_multipath
                ->dm_get_info
                    ->dm_task_run
                        ->_open_control
                            ->_open_and_assign_control_fd

3. _open_control func
_open_control
    -> check whether _control_fd is valid; if it is, return 1
    -> _create_control
    ->_open_and_assign_control_fd
        -> open '/dev/mapper/control' and set _control_fd to the new fd

Here, we add _control_fd_mutex to protect the procedure of
open('/dev/mapper/control') in the _open_and_assign_control_fd func.

Reported-by: Tianxiong Lu <lutianxiong@huawei.com>
Signed-off-by: lixiaokeng <lixiaokeng@huawei.com>
Signed-off-by: Guanghao Wu <wuguanghao3@huawei.com>
Signed-off-by: Zhiqiang Liu <liuzhiqiang26@huawei.com>
---
 libdm/ioctl/libdm-iface.c | 11 +++++++++++
 1 file changed, 11 insertions(+)

diff --git a/libdm/ioctl/libdm-iface.c b/libdm/ioctl/libdm-iface.c
index 7ad549c..7168369 100644
--- a/libdm/ioctl/libdm-iface.c
+++ b/libdm/ioctl/libdm-iface.c
@@ -23,6 +23,7 @@
 #include <sys/ioctl.h>
 #include <sys/utsname.h>
 #include <limits.h>
+#include <pthread.h>

 #ifdef __linux__
 #  include "libdm/misc/kdev_t.h"
@@ -81,6 +82,7 @@ static dm_bitset_t _dm_bitset = NULL;
 static uint32_t _dm_device_major = 0;

 static int _control_fd = -1;
+static pthread_mutex_t _control_fd_mutex = PTHREAD_MUTEX_INITIALIZER;
 static int _hold_control_fd_open = 0;
 static int _version_checked = 0;
 static int _version_ok = 1;
@@ -404,10 +406,19 @@ static void _close_control_fd(void)
 #ifdef DM_IOCTLS
 static int _open_and_assign_control_fd(const char *control)
 {
+	pthread_mutex_lock(&_control_fd_mutex);
+
+	if (_control_fd != -1) {
+		pthread_mutex_unlock(&_control_fd_mutex);
+		return 1;
+	}
+
 	if ((_control_fd = open(control, O_RDWR)) < 0) {
 		log_sys_error("open", control);
+		pthread_mutex_unlock(&_control_fd_mutex);
 		return 0;
 	}
+	pthread_mutex_unlock(&_control_fd_mutex);

 	return 1;
 }
-- 
1.8.3.1




* master - multipathd: fix fd leak when iscsi device logs in
From: Zdenek Kabelac @ 2020-07-09  8:54 UTC
  To: lvm-devel

On 09. 07. 20 at 9:02, lixiaokeng wrote:
> When one iscsi device logs in and logs out several times, the
> number of fds pointing to '/dev/mapper/control' increases in
> /proc/<multipathd-pid>/fd, as follows:
> [root@localhost fd]# ll | grep control

> diff --git a/libdm/ioctl/libdm-iface.c b/libdm/ioctl/libdm-iface.c
> index 7ad549c..7168369 100644
> --- a/libdm/ioctl/libdm-iface.c
> +++ b/libdm/ioctl/libdm-iface.c
> @@ -23,6 +23,7 @@
>   #include <sys/ioctl.h>
>   #include <sys/utsname.h>
>   #include <limits.h>
> +#include <pthread.h>
> 
>   #ifdef __linux__
>   #  include "libdm/misc/kdev_t.h"
> @@ -81,6 +82,7 @@ static dm_bitset_t _dm_bitset = NULL;
>   static uint32_t _dm_device_major = 0;
> 
>   static int _control_fd = -1;
> +static pthread_mutex_t _control_fd_mutex = PTHREAD_MUTEX_INITIALIZER;
>   static int _hold_control_fd_open = 0;
>   static int _version_checked = 0;
>   static int _version_ok = 1;
> @@ -404,10 +406,19 @@ static void _close_control_fd(void)
>   #ifdef DM_IOCTLS
>   static int _open_and_assign_control_fd(const char *control)
>   {
> +	pthread_mutex_lock(&_control_fd_mutex);
> +
> +	if (_control_fd != -1) {



Hi

libdm is not a pthread-aware/safe library.

So the fix needs to happen on the libdm user side, to prevent racing
calls of this function.


Zdenek




* Re: [lvm-devel] master - multipathd: fix fd leak when iscsi device logs in
From: lixiaokeng @ 2020-07-13  2:15 UTC
  To: Zdenek Kabelac, Christophe Varoqui, Martin Wilck, Benjamin Marzinski
  Cc: linfeilong, dm-devel, liuzhiqiang26, lutianxiong

Hi

Now the number of fds pointing to /dev/mapper/control in the multipathd
process increases when an iscsi device logs in. The reason is that the
wait_dmevents thread and the uevqloop thread call
_open_and_assign_control_fd concurrently. If a lock is added to the
_open_and_assign_control_fd func in lvm2/libdm/libdm-iface.c, the
trouble is solved easily, but Zdenek Kabelac said that libdm is not a
pthread-aware/safe library, so the lock cannot be added to libdm. If
the lock is added to multipath-tools instead, there are a lot of places
where dm_task_run is called and the lock would have to be taken, which
may degrade multipath-tools' performance. I don't have any other good
idea about this trouble. Do you have one? There is another question:
multipathd is a multi-threaded process and libdm is not a
pthread-aware/safe library, so why does multipathd use libdm with no
protection? Thanks.
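
To illustrate the multipath-tools-side option: the lock could be one
wrapper around dm_task_run (a rough sketch only; the name
libmp_dm_task_run is made up here and does not exist in multipath-tools
today):

/* Sketch of a user-side lock: serialize dm_task_run() so that libdm's
 * static state, including _control_fd, is never touched concurrently. */
#include <pthread.h>
#include <libdevmapper.h>

static pthread_mutex_t libdm_mutex = PTHREAD_MUTEX_INITIALIZER;

int libmp_dm_task_run(struct dm_task *dmt)
{
	int r;

	pthread_mutex_lock(&libdm_mutex);
	r = dm_task_run(dmt);
	pthread_mutex_unlock(&libdm_mutex);
	return r;
}

Every dm_task_run() call site would then have to be switched to the
wrapper, which is exactly the churn I am worried about.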

The detailed call chains when the fd leak occurs are as follows:
1. wait_dmevents thread
wait_dmevents
->dmevent_loop
    ->dm_get_events (->dm_is_mpath)
        ->dm_task_run
            ->_open_control
                ->_open_and_assign_control_fd

2. uevqloop thread
uevqloop
->uevent_dispatch
    ->service_uevq
        ->ev_add_path
            ->__setup_multipath
                ->dm_get_info
                    ->dm_task_run
                        ->_open_control
                            ->_open_and_assign_control_fd

Lixiaokeng

On 2020/7/9 16:54, Zdenek Kabelac wrote:
> On 09. 07. 20 at 9:02, lixiaokeng wrote:
>> When one iscsi device logs in and logs out several times, the
>> number of fds pointing to '/dev/mapper/control' increases in
>> /proc/<multipathd-pid>/fd, as follows:
>> [root@localhost fd]# ll | grep control
> 
>> diff --git a/libdm/ioctl/libdm-iface.c b/libdm/ioctl/libdm-iface.c
>> index 7ad549c..7168369 100644
>> --- a/libdm/ioctl/libdm-iface.c
>> +++ b/libdm/ioctl/libdm-iface.c
>> @@ -23,6 +23,7 @@
>>   #include <sys/ioctl.h>
>>   #include <sys/utsname.h>
>>   #include <limits.h>
>> +#include <pthread.h>
>>
>>   #ifdef __linux__
>>   #  include "libdm/misc/kdev_t.h"
>> @@ -81,6 +82,7 @@ static dm_bitset_t _dm_bitset = NULL;
>>   static uint32_t _dm_device_major = 0;
>>
>>   static int _control_fd = -1;
>> +static pthread_mutex_t _control_fd_mutex = PTHREAD_MUTEX_INITIALIZER;
>>   static int _hold_control_fd_open = 0;
>>   static int _version_checked = 0;
>>   static int _version_ok = 1;
>> @@ -404,10 +406,19 @@ static void _close_control_fd(void)
>>   #ifdef DM_IOCTLS
>>   static int _open_and_assign_control_fd(const char *control)
>>   {
>> +    pthread_mutex_lock(&_control_fd_mutex);
>> +
>> +    if (_control_fd != -1) {
> 
> 
> 
> Hi
> 
> libdm is not a pthread-aware/safe library.
> 
> So the fix needs to happen on the libdm user side, to prevent racing
> calls of this function.
> 
> 
> Zdenek
> 
> 
> .



* Re: [lvm-devel] master - multipathd: fix fd leak when iscsi device logs in
From: Martin Wilck @ 2020-07-13  9:21 UTC
  To: lixiaokeng, Zdenek Kabelac, Christophe Varoqui, Benjamin Marzinski
  Cc: linfeilong, dm-devel, liuzhiqiang26, lutianxiong

[-- Attachment #1: Type: text/plain, Size: 4981 bytes --]

Hi Lixiaokeng,

On Mon, 2020-07-13 at 10:15 +0800, lixiaokeng wrote:
> Hi
> 
> Now the number of fds pointing to /dev/mapper/control in the
> multipathd process increases when an iscsi device logs in. The reason
> is that the wait_dmevents thread and the uevqloop thread call
> _open_and_assign_control_fd concurrently. If a lock is added to the
> _open_and_assign_control_fd func in lvm2/libdm/libdm-iface.c, the
> trouble is solved easily, but Zdenek Kabelac said that libdm is not a
> pthread-aware/safe library, so the lock cannot be added to libdm. If
> the lock is added to multipath-tools instead, there are a lot of
> places where dm_task_run is called and the lock would have to be
> taken, which may degrade multipath-tools' performance. I don't have
> any other good idea about this trouble. Do you have one? There is
> another question: multipathd is a multi-threaded process and libdm is
> not a pthread-aware/safe library, so why does multipathd use libdm
> with no protection? Thanks.
> 
> The detailed call chains when the fd leak occurs are as follows:
> 1. wait_dmevents thread
> wait_dmevents
> ->dmevent_loop
>     ->dm_get_events (->dm_is_mpath)
>         ->dm_task_run
>             ->_open_control
>                 ->_open_and_assign_control_fd
> 
> 2. uevqloop thread
> uevqloop
> ->uevent_dispatch
>     ->service_uevq
>         ->ev_add_path
>             ->__setup_multipath
>                 ->dm_get_info
>                     ->dm_task_run
>                         ->_open_control
>                             ->_open_and_assign_control_fd
> 
> Lixiaokeng

Thanks for the report and the excellent analysis. The general problem
may perhaps not be so bad currently, as multipathd uses the "vecs lock"
to protect its own data structures, and most libdm calls are made under
this lock. dmevent_loop() is one example where the vecs lock is not
taken.

I'm attaching a tentative patch for the particular race you reported,
which resorts to simply taking the vecs lock in dmevent_loop(). Please
review/test. This is tentative, I haven't tested it myself beyond
making sure that it passes the unit test.

To be sure we aren't missing anything, we'd need to assess whether there
are other libdm calls which are not made under the vecs lock. Short-term
I have no time for a complete assessment. My guess would be that there
are a few, but not many, and most of them not prone to races.

In the long run, we need to think about the general problem of libdm
not being thread-safe. So far we've had very few reports like this
(actually, I'm aware of none), so perhaps the vecs lock protects us
quite well already. OTOH, we've discussed repeatedly that the locking
in multipathd is too monolithic and should be more fine-grained; as
soon as we drop the monolithic lock, this might hit us hard. If we
introduce an additional lock for libdm, we'll have to think about
potential deadlock situations (probably easy, as the new lock would
just wrap libdm calls and would thus, by definition, be at the bottom
of any lock hierarchy in libmultipath).

@Zdenek, do we have to protect every libdm call, or is it sufficient
to protect only dm_task_run(), as lixiaokeng suggested?

@Ben, please take a look as well, as you're the main author of the
dmevents code and know libdm better than me.

Regards,
Martin


> 
> On 2020/7/9 16:54, Zdenek Kabelac wrote:
> > On 09. 07. 20 at 9:02, lixiaokeng wrote:
> > > When one iscsi device logs in and logs out several times, the
> > > number of fds pointing to '/dev/mapper/control' increases in
> > > /proc/<multipathd-pid>/fd, as follows:
> > > [root@localhost fd]# ll | grep control
> > > diff --git a/libdm/ioctl/libdm-iface.c b/libdm/ioctl/libdm-
> > > iface.c
> > > index 7ad549c..7168369 100644
> > > --- a/libdm/ioctl/libdm-iface.c
> > > +++ b/libdm/ioctl/libdm-iface.c
> > > @@ -23,6 +23,7 @@
> > >   #include <sys/ioctl.h>
> > >   #include <sys/utsname.h>
> > >   #include <limits.h>
> > > +#include <pthread.h>
> > > 
> > >   #ifdef __linux__
> > >   #  include "libdm/misc/kdev_t.h"
> > > @@ -81,6 +82,7 @@ static dm_bitset_t _dm_bitset = NULL;
> > >   static uint32_t _dm_device_major = 0;
> > > 
> > >   static int _control_fd = -1;
> > > +static pthread_mutex_t _control_fd_mutex =
> > > PTHREAD_MUTEX_INITIALIZER;
> > >   static int _hold_control_fd_open = 0;
> > >   static int _version_checked = 0;
> > >   static int _version_ok = 1;
> > > @@ -404,10 +406,19 @@ static void _close_control_fd(void)
> > >   #ifdef DM_IOCTLS
> > >   static int _open_and_assign_control_fd(const char *control)
> > >   {
> > > +    pthread_mutex_lock(&_control_fd_mutex);
> > > +
> > > +    if (_control_fd != -1) {
> > 
> > 
> > Hi
> > 
> > libdm is not a pthread-aware/safe library.
> > 
> > So the fix needs to happen on the libdm user side, to prevent racing
> > calls of this function.
> > 
> > 
> > Zdenek
> > 
> > 
> > .


[-- Attachment #2: 0001-libmultipath-dmevent_loop-hold-vecs-lock.patch --]
[-- Type: text/x-patch, Size: 7160 bytes --]

From 05115217286dbbd1679e71c4aaceabe42afb5f7f Mon Sep 17 00:00:00 2001
From: Martin Wilck <mwilck@suse.com>
Date: Mon, 13 Jul 2020 11:03:43 +0200
Subject: [PATCH] libmultipath: dmevent_loop: hold vecs lock

The libdm calls in the event waiter, in particular dm_get_events(),
may race with libdm calls from other threads, leading to an fd leak
of the libdm control fd.

The detailed call chains when the fd leak occurs are as follows:

1. wait_dmevents thread
wait_dmevents
->dmevent_loop
    ->dm_get_events (->dm_is_mpath)
        ->dm_task_run
            ->_open_control
                ->_open_and_assign_control_fd

2. uevqloop thread
uevqloop
->uevent_dispatch
    ->service_uevq
        ->ev_add_path
            ->__setup_multipath
                ->dm_get_info
                    ->dm_task_run
                        ->_open_control
                            ->_open_and_assign_control_fd

In the long run, we may need to protect libdm access with a separate
lock. For now, work around this particular race by grabbing the
vecs lock for an entire round of events processing.

This also implies changes in the dmevents test code, because poll()
is not called from dmevent_loop() any more.

Reported-by: lixiaokeng <lixiaokeng@huawei.com>
---
 multipathd/dmevents.c | 32 ++++++++++++++++++--------------
 tests/dmevents.c      | 35 ++++-------------------------------
 2 files changed, 22 insertions(+), 45 deletions(-)

diff --git a/multipathd/dmevents.c b/multipathd/dmevents.c
index b235dd4..be7e122 100644
--- a/multipathd/dmevents.c
+++ b/multipathd/dmevents.c
@@ -289,24 +289,15 @@ static void unwatch_dmevents(char *name)
 static int dmevent_loop (void)
 {
 	int r, i = 0;
-	struct pollfd pfd;
 	struct dev_event *dev_evt;
 
-	pfd.fd = waiter->fd;
-	pfd.events = POLLIN;
-	r = poll(&pfd, 1, -1);
-	if (r <= 0) {
-		condlog(0, "failed polling for dm events: %s", strerror(errno));
-		/* sleep 1s and hope things get better */
-		return 1;
-	}
-
 	if (arm_dm_event_poll(waiter->fd) != 0) {
 		condlog(0, "Cannot re-arm event polling: %s", strerror(errno));
 		/* sleep 1s and hope things get better */
 		return 1;
 	}
 
+
 	if (dm_get_events() != 0) {
 		condlog(0, "failed getting dm events: %s", strerror(errno));
 		/* sleep 1s and hope things get better */
@@ -352,15 +343,12 @@ static int dmevent_loop (void)
 		 * 4) a path reinstate : nothing to do
 		 * 5) a switch group : nothing to do
 		 */
-		pthread_cleanup_push(cleanup_lock, &waiter->vecs->lock);
-		lock(&waiter->vecs->lock);
 		pthread_testcancel();
 		r = 0;
 		if (curr_dev.action == EVENT_REMOVE)
 			remove_map_by_alias(curr_dev.name, waiter->vecs, 1);
 		else
 			r = update_multipath(waiter->vecs, curr_dev.name, 1);
-		pthread_cleanup_pop(1);
 
 		if (r) {
 			condlog(2, "%s: stopped watching dmevents",
@@ -392,7 +380,23 @@ void *wait_dmevents (__attribute__((unused)) void *unused)
 	mlockall(MCL_CURRENT | MCL_FUTURE);
 
 	while (1) {
-		r = dmevent_loop();
+		struct pollfd pfd;
+		pfd.fd = waiter->fd;
+
+		pfd.events = POLLIN;
+		r = poll(&pfd, 1, -1);
+
+		if (r <= 0) {
+			condlog(0, "failed polling for dm events: %s", strerror(errno));
+			/* sleep 1s and hope things get better */
+			r = 1;
+		} else {
+			pthread_cleanup_push(cleanup_lock, &waiter->vecs->lock);
+			lock(&waiter->vecs->lock);
+			pthread_testcancel();
+			r = dmevent_loop();
+			pthread_cleanup_pop(1);
+		}
 
 		if (r < 0)
 			break;
diff --git a/tests/dmevents.c b/tests/dmevents.c
index bee117a..6857963 100644
--- a/tests/dmevents.c
+++ b/tests/dmevents.c
@@ -627,9 +627,8 @@ static void test_get_events_good1(void **state)
 	assert_int_equal(VECTOR_SIZE(waiter->events), 3);
 }
 
-/* poll does not return an event. nothing happens. The
- * devices remain after this test */
-static void test_dmevent_loop_bad0(void **state)
+/* arm_dm_event_poll's ioctl fails. Nothing happens */
+static void test_dmevent_loop_bad1(void **state)
 {
 	struct dm_device *dev;
 	struct dev_event *dev_evt;
@@ -637,34 +636,13 @@ static void test_dmevent_loop_bad0(void **state)
 	if (datap == NULL)
 		skip();
 
+	// will_return(__wrap_poll, 1);
 	remove_all_dm_device_events();
 	unwatch_all_dmevents();
 	assert_int_equal(add_dm_device_event("foo", 1, 5), 0);
 	will_return(__wrap_dm_geteventnr, 0);
 	assert_int_equal(watch_dmevents("foo"), 0);
 	assert_int_equal(add_dm_device_event("foo", 1, 6), 0);
-	will_return(__wrap_poll, 0);
-	assert_int_equal(dmevent_loop(), 1);
-	dev_evt = find_dmevents("foo");
-	assert_ptr_not_equal(dev_evt, NULL);
-	assert_int_equal(dev_evt->evt_nr, 5);
-	assert_int_equal(dev_evt->action, EVENT_NOTHING);
-	dev = find_dm_device("foo");
-	assert_ptr_not_equal(dev, NULL);
-	assert_int_equal(dev->evt_nr, 6);
-	assert_int_equal(dev->update_nr, 5);
-}
-
-/* arm_dm_event_poll's ioctl fails. Nothing happens */
-static void test_dmevent_loop_bad1(void **state)
-{
-	struct dm_device *dev;
-	struct dev_event *dev_evt;
-	struct test_data *datap = (struct test_data *)(*state);
-	if (datap == NULL)
-		skip();
-
-	will_return(__wrap_poll, 1);
 	will_return(__wrap_ioctl, -1);
 	assert_int_equal(dmevent_loop(), 1);
 	dev_evt = find_dmevents("foo");
@@ -686,7 +664,7 @@ static void test_dmevent_loop_bad2(void **state)
 	if (datap == NULL)
 		skip();
 
-	will_return(__wrap_poll, 1);
+	// will_return(__wrap_poll, 1);
 	will_return(__wrap_ioctl, 0);
 	will_return(__wrap_libmp_dm_task_create, NULL);
 	assert_int_equal(dmevent_loop(), 1);
@@ -710,7 +688,6 @@ static void test_dmevent_loop_good0(void **state)
 
 	remove_all_dm_device_events();
 	unwatch_all_dmevents();
-	will_return(__wrap_poll, 1);
 	will_return(__wrap_ioctl, 0);
 	will_return(__wrap_libmp_dm_task_create, &data);
 	will_return(__wrap_dm_task_no_open_count, 1);
@@ -746,7 +723,6 @@ static void test_dmevent_loop_good1(void **state)
 	assert_int_equal(watch_dmevents("xyzzy"), 0);
 	assert_int_equal(add_dm_device_event("foo", 1, 6), 0);
 	assert_int_equal(remove_dm_device_event("xyzzy"), 0);
-	will_return(__wrap_poll, 1);
 	will_return(__wrap_ioctl, 0);
 	will_return(__wrap_libmp_dm_task_create, &data);
 	will_return(__wrap_dm_task_no_open_count, 1);
@@ -794,7 +770,6 @@ static void test_dmevent_loop_good2(void **state)
 	will_return(__wrap_dm_geteventnr, 0);
 	assert_int_equal(watch_dmevents("baz"), 0);
 	assert_int_equal(add_dm_device_event("baz", 1, 14), 0);
-	will_return(__wrap_poll, 1);
 	will_return(__wrap_ioctl, 0);
 	will_return(__wrap_libmp_dm_task_create, &data);
 	will_return(__wrap_dm_task_no_open_count, 1);
@@ -838,7 +813,6 @@ static void test_dmevent_loop_good3(void **state)
 
 	assert_int_equal(remove_dm_device_event("foo"), 0);
 	unwatch_dmevents("bar");
-	will_return(__wrap_poll, 1);
 	will_return(__wrap_ioctl, 0);
 	will_return(__wrap_libmp_dm_task_create, &data);
 	will_return(__wrap_dm_task_no_open_count, 1);
@@ -896,7 +870,6 @@ int test_dmevents(void)
 		cmocka_unit_test(test_get_events_good0),
 		cmocka_unit_test(test_get_events_good1),
 		cmocka_unit_test(test_arm_poll),
-		cmocka_unit_test(test_dmevent_loop_bad0),
 		cmocka_unit_test(test_dmevent_loop_bad1),
 		cmocka_unit_test(test_dmevent_loop_bad2),
 		cmocka_unit_test(test_dmevent_loop_good0),
-- 
2.26.2



* Re: [lvm-devel] master - multipathd: fix fd leak when iscsi device logs in
From: Zdenek Kabelac @ 2020-07-13  9:56 UTC
  To: Martin Wilck, lixiaokeng, Christophe Varoqui, Benjamin Marzinski
  Cc: linfeilong, dm-devel, liuzhiqiang26, lutianxiong

On 13. 07. 20 at 11:21, Martin Wilck wrote:
> Hi Lixiaokeng,
> 
>
> @Zdenek, do we have to protect every libdm call, or is it sufficient
> to protect only dm_task_run(), as lixiaokeng suggested?
>

Hi

It's actually hard to answer it in a simple way.
Several properties are held in library static variables. So converting libdm
into a fully threaded 'version' would basically require duplicating all API
functions with an extended 'context' structure passed in - where all buffers
can be maintained properly (and it's getting more complicated with signal
handling and debug logging).

ATM it doesn't look like there is a big need for threaded support of DM usage,
as the majority of tools spend most of their time outside, thus 'serialization'
of lvm2 or dmeventd on libdm access doesn't look like a big issue
(let's say there are far bigger fishes to hunt).

As for the issue of keeping control_fd open - there should be a synchronized 
call of dm_hold_control_dev(0/1) -  see the codebase of  dmeventd.c in lvm2 
code base - how we solve the control_fd handling.
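
Roughly like this (just a sketch of the idea, not the actual dmeventd
code; in dmeventd the call happens before any worker threads exist):

/* Sketch: hold the control fd open for the daemon's lifetime.
 * Must run while the process is still single-threaded. */
#include <libdevmapper.h>

static void hold_control_fd(void)
{
	char version[80];

	dm_hold_control_dev(1);	/* keep the fd open across dm_lib_release() */
	/* the first ioctl opens /dev/mapper/control exactly once */
	dm_driver_version(version, sizeof(version));
}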

Zdenek


* Re: [lvm-devel] master - multipathd: fix fd leak when iscsi device logs in
From: Martin Wilck @ 2020-07-13 10:08 UTC
  To: Zdenek Kabelac, lixiaokeng, Christophe Varoqui, Benjamin Marzinski
  Cc: linfeilong, dm-devel, liuzhiqiang26, lutianxiong

Hi Zdenek,

On Mon, 2020-07-13 at 11:56 +0200, Zdenek Kabelac wrote:
> On 13. 07. 20 at 11:21, Martin Wilck wrote:
> > Hi Lixiaokeng,
> > 
> > 
> > @Zdenek, do we have to protect every libdm call, or is it
> > sufficient
> > to protect only dm_task_run(), as lixiaokeng suggested?
> > 
> 
> Hi
> 
> It's actually hard to answer it in a simple way.
> Several properties are held in library static variables. So converting
> libdm into a fully threaded 'version' would basically require
> duplicating all API functions with an extended 'context' structure
> passed in - where all buffers can be maintained properly (and it's
> getting more complicated with signal handling and debug logging).

Sure. I don't think that's what we're going for. But we need to
figure out how to handle this properly in libmultipath.
I take it that we must really protect *every* libdm call in
libmultipath and multipathd if we want to do it right.

> ATM it doesn't look like there is a big need for threaded support of
> DM usage, as the majority of tools spend most of their time outside,
> thus 'serialization' of lvm2 or dmeventd on libdm access doesn't look
> like a big issue (let's say there are far bigger fishes to hunt).

Agreed.

> As for the issue of keeping control_fd open - there should be a
> synchronized 
> call of dm_hold_control_dev(0/1) -  see the codebase of  dmeventd.c
> in lvm2 
> code base - how we solve the control_fd handling.

Ben has already added support for dm_hold_control_dev() in libmultipath
(e24d8b1 ("libmutipath: don't close fd on dm_lib_release")). But this
doesn't protect us from calling _open_control() simultaneously in
separate code paths, as Lixiaokeng has pointed out.

I don't see how it would, as dm_hold_control_dev() also just changes
a static variable.

Thanks for your quick answer,
Martin


* Re: [lvm-devel] master - multipathd: fix fd leak when iscsi device logs in
From: Zdenek Kabelac @ 2020-07-13 10:13 UTC
  To: Martin Wilck, Benjamin Marzinski; +Cc: dm-devel

On 13. 07. 20 at 12:08, Martin Wilck wrote:
> Hi Zdenek,
> 
> On Mon, 2020-07-13 at 11:56 +0200, Zdenek Kabelac wrote:
>> On 13. 07. 20 at 11:21, Martin Wilck wrote:
>>> Hi Lixiaokeng,
>>>
>>>
>>> @Zdenek, do we have to protect every libdm call, or is it
>>> sufficient
>>> to protect only dm_task_run(), as lixiaokeng suggested?
>>>
>>
>> Hi
>>
>> It's actually hard to answer it in a simple way.
>> Several properties are held in library static variables. So
>> converting libdm into a fully threaded 'version' would basically
>> require duplicating all API functions with an extended 'context'
>> structure passed in - where all buffers can be maintained properly
>> (and it's getting more complicated with signal handling and debug
>> logging).
> 
> Sure. I don't think that's what we're going for. But we need to
> figure out how to handle this properly in libmultipath.
> I take it that we must really protect *every* libdm call in
> libmultipath and multipathd if we want to do it right.
> 
>> ATM it doesn't look like there is a big need for threaded support of
>> DM usage, as the majority of tools spend most of their time outside,
>> thus 'serialization' of lvm2 or dmeventd on libdm access doesn't look
>> like a big issue (let's say there are far bigger fishes to hunt).
> 
> Agreed.
> 
>> As for the issue of keeping control_fd open - there should be a
>> synchronized
>> call of dm_hold_control_dev(0/1) -  see the codebase of  dmeventd.c
>> in lvm2
>> code base - how we solve the control_fd handling.
> 
> Ben has already added support for dm_hold_control_dev() in libmultipath
> (e24d8b1 ("libmutipath: don't close fd on dm_lib_release")). But this
> doesn't protect us from calling _open_control() simultaneously in
> separate code paths, as Lixiaokeng has pointed out.
> 
> I don't see how it would, as dm_hold_control_dev() also just changes
> a static variable.


Here I'm not familiar with the multipath codebase (more a question for Ben
likely), but is 'dm_hold...()' being used in a non-threaded way - i.e. in
dmeventd these functions are called before threads are started or after
threads are finished - so there would be no way two threads can call this
function in parallel?

Zdenek


* Re: [lvm-devel] master - multipathd: fix fd leak when iscsi device logs in
From: Martin Wilck @ 2020-07-13 10:27 UTC
  To: Zdenek Kabelac, Benjamin Marzinski, lixiaokeng; +Cc: dm-devel

On Mon, 2020-07-13 at 12:13 +0200, Zdenek Kabelac wrote:
> On 13. 07. 20 at 12:08, Martin Wilck wrote:
> > 
> > Ben has already added support for dm_hold_control_dev() in
> > libmultipath
> > (e24d8b1 ("libmutipath: don't close fd on dm_lib_release")). But
> > this
> > doesn't protect us from calling _open_control() simultaneously in
> > separate code paths, as Lixiaokeng has pointed out.
> > 
> > I don't see how it would, as dm_hold_control_dev() also just
> > changes
> > a static variable.
> 
> Here I'm not familiar with the multipath codebase (more a question for
> Ben likely), but is 'dm_hold...()' being used in a non-threaded way -
> i.e. in dmeventd these functions are called before threads are started
> or after threads are finished - so there would be no way two threads
> can call this function in parallel?

Yes, it's called via pthread_once().

@Lixiaokeng, that would suggest a simpler solution to your fd leak
problem than what I posted before. You'd just need to apply Ben's patch
"libmutipath: don't close fd on dm_lib_release" submitted March
24th.

Contrary to what I said before, this patch is NOT merged yet. The
commit id e24d8b1 was from my "upstream-queue" branch only
(https://github.com/openSUSE/multipath-tools/tree/upstream-queue).

Could you try with that patch (and without mine sent previously)?

Martin


* Re: [lvm-devel] master - multipathd: fix fd leak when iscsi device logs in
From: Martin Wilck @ 2020-07-13 19:59 UTC
  To: Zdenek Kabelac, lixiaokeng, Christophe Varoqui, Benjamin Marzinski
  Cc: linfeilong, dm-devel, liuzhiqiang26, lutianxiong

On Mon, 2020-07-13 at 11:56 +0200, Zdenek Kabelac wrote:
> 
> > @Zdenek, do we have to protect every libdm call, or is it
> > sufficient
> > to protect only dm_task_run(), as lixiaokeng suggested?
> > 
> 
> Hi
> 
> It's actually hard to answer it in a simple way.
> Several properties are held in library static variables.
>
> ...
>
> As for the issue of keeping control_fd open - there should be a
> synchronized 
> call of dm_hold_control_dev(0/1) -  see the codebase of  dmeventd.c
> in lvm2 
> code base - how we solve the control_fd handling.

I've made an assessment of the libdm calls that multipath-tools use.

Most of them are trivial getters and setters and need no locking,
because they don't operate on any static or global data structures.

The exceptions I found are listed here:

dm_driver_version
dm_hold_control_dev
dm_is_dm_major
dm_lib_exit
dm_lib_release
dm_log_init
dm_log_init_verbose
dm_task_create
dm_task_run
dm_task_set_cookie
dm_udev_set_sync_support
dm_udev_wait

The reported race around _control_fd could probably be fixed simply
by calling dm_hold_control_dev() and dm_driver_version() early on e.g.
via pthread_once(), because dm_driver_version() calls dm_task_create()
and dm_task_run(), and thus implicitly initializes the _control_fd.
(libmultipath currently doesn't do these calls in the right order).
This should also avoid the need for locking dm_is_dm_major() (access to
_bitset) and dm_task_create (check_version()). The early init function
could also call dm_log_init(), dm_log_init_verbose(), and
dm_udev_set_sync_support(), setting up the libdm parameters once and
for all.
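
A sketch of what such an early init could look like (libmp_dm_init()
is a made-up name; error handling and the dm_log_init*() calls are
omitted for brevity):

/* Sketch of a pthread_once() based libdm init, as described above. */
#include <pthread.h>
#include <libdevmapper.h>

static pthread_once_t dm_initialized = PTHREAD_ONCE_INIT;

static void _dm_init(void)
{
	char version[80];

	dm_hold_control_dev(1);
	/* implicitly opens /dev/mapper/control, exactly once */
	dm_driver_version(version, sizeof(version));
	dm_udev_set_sync_support(1);
}

void libmp_dm_init(void)
{
	pthread_once(&dm_initialized, _dm_init);
}

Calling libmp_dm_init() at the top of every thread that uses libdm
would then guarantee the initialization has happened before the first
dm_task_run().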

However, there are other possible races, as you noted. Mainly related
to manipulating the node_ops stack - this concerns dm_task_run(),
dm_udev_wait(), dm_lib_release(), dm_lib_exit(). I'm uncertain about 
dm_task_set_cookie(). I suppose it doesn't need protecting, but I tend
to be on the safe side.

Anyway, that's my summary: the 4 or 5 functions mentioned in the
previous paragraph would need wrapping under a lock. The rest can be
handled by making sure the initialization is made early on, and only
once.

Comments welcome.

Regards
Martin


* Re: [lvm-devel] master - multipathd: fix fd leak when iscsi device logs in
From: Benjamin Marzinski @ 2020-07-14 22:22 UTC
  To: Martin Wilck
  Cc: lixiaokeng, linfeilong, dm-devel, Zdenek Kabelac, liuzhiqiang26,
	lutianxiong

On Mon, Jul 13, 2020 at 09:59:43PM +0200, Martin Wilck wrote:
> On Mon, 2020-07-13 at 11:56 +0200, Zdenek Kabelac wrote:
> > 
> > > @Zdenek, do we have to protect every libdm call, or is it
> > > sufficient
> > > to protect only dm_task_run(), as lixiaokeng suggested?
> > > 
> > 
> > Hi
> > 
> > It's actually hard to answer it in a simple way.
> > Several properties are held in library static variables.
> >
> > ...
> >
> > As for the issue of keeping control_fd open - there should be a
> > synchronized 
> > call of dm_hold_control_dev(0/1) -  see the codebase of  dmeventd.c
> > in lvm2 
> > code base - how we solve the control_fd handling.
> 
> I've made an assessment of the libdm calls that multipath-tools use.
> 
> Most of them are trivial getters and setters and need no locking,
> because they don't operate on any static or global data structures.
> 
> The exceptions I found are listed here:
> 
> dm_driver_version
> dm_hold_control_dev
> dm_is_dm_major
> dm_lib_exit
> dm_lib_release
> dm_log_init
> dm_log_init_verbose
> dm_task_create
> dm_task_run
> dm_task_set_cookie
> dm_udev_set_sync_support
> dm_udev_wait
> 
> The reported race around _control_fd could probably be fixed simply
> by calling dm_hold_control_dev() and dm_driver_version() early on e.g.
> via pthread_once(), because dm_driver_version() calls dm_task_create()
> and dm_task_run(), and thus implicitly initializes the _control_fd.
> (libmultipath currently doesn't do these calls in the right order).
> This should also avoid the need for locking dm_is_dm_major() (access to
> _bitset) and dm_task_create (check_version()). The early init function
> could also call dm_log_init(), dm_log_init_verbose(), and
> dm_udev_set_sync_support(), setting up the libdm parameters once and
> for all.
> 
> However, there are other possible races, as you noted. Mainly related
> to manipulating the node_ops stack - this concerns dm_task_run(),
> dm_udev_wait(), dm_lib_release(), dm_lib_exit(). I'm uncertain about 
> dm_task_set_cookie(). I suppose it doesn't need protecting, but I tend
> to be on the safe side.
> 
> Anyway, that's my summary: the 4 or 5 functions mentioned in the
> previous paragraph would need wrapping under a lock. The rest can be
> handled by making sure the initialization is made early on, and only
> once.
> 
> Comments welcome.

We've agreed that dm_lib_release() is unnecessary, since we always call
dm_udev_wait() when update_devs() is needed. dm_lib_exit() should
already be safe, since we only call it after cleaning up the other
threads. dm_task_set_cookie() looks safe to me as well, assuming that we
have early initialization. So we are really talking about dm_task_run()
and dm_udev_wait(). Clearly dm_udev_wait() always needs to run under a
lock. Two threads trying to remove items from a list at the same time
isn't safe, and there's no way to know if there will be items on the
_node_ops list. The one thing I'm not sure of, is whether every call to
dm_task_run() always needs to be locked. clearly we could, and it would
be safer. Also, clearly for calls that add elements to _node_ops it
must. But for calls like DM_DEVICE_TABLE or DM_DEVICE_LIST, the only
issue that I can see with concurrent calls is the possibility that
_ioctl_buffer_double_factor gets increased too much. Perhaps that's
enough of a problem that we should just lock all dm_task_run() calls as
well.

-Ben
 
> Regards
> Martin
> 


* Re: master - multipathd: fix fd leak when iscsi device logs in
From: Martin Wilck @ 2020-07-15 17:41 UTC
  To: Benjamin Marzinski
  Cc: lixiaokeng, linfeilong, dm-devel, Zdenek Kabelac, liuzhiqiang26,
	lutianxiong

On Tue, 2020-07-14 at 17:22 -0500, Benjamin Marzinski wrote:
> On Mon, Jul 13, 2020 at 09:59:43PM +0200, Martin Wilck wrote:
> > Comments welcome.
> 
> We've agreed that dm_lib_release() is unnecessary, since we always
> call
> dm_udev_wait() when update_devs() is needed. dm_lib_exit() should
> already be safe, since we only call it after cleaning up the other
> threads. dm_task_set_cookie() looks safe to me as well, assuming that
> we
> have early initialization. So we are really talking about
> dm_task_run()
> and dm_udev_wait(). Clearly dm_udev_wait() always needs to run under
> a
> lock. Two threads trying to remove items from a list at the same time
> isn't safe, and there's no way to know if there will be items on the
> _node_ops list. The one thing I'm not sure of, is whether every call
> to
> dm_task_run() always needs to be locked. clearly we could, and it
> would
> be safer. Also, clearly for calls that add elements to _node_ops it
> must. But for calls like DM_DEVICE_TABLE or DM_DEVICE_LIST, the only
> issue that I can see with concurrent calls is the possibility that
> _ioctl_buffer_double_factor gets increased too much. Perhaps that's
> enough of a problem that we should just lock all dm_task_run() calls
> as
> well.

Yes. We shouldn't try to be too smart. It's easier to verify that every
call is wrapped under a lock than to figure out later whether we made
the case distinction right.

So, action plan:

 - remove dm_lib_release() calls
 - Combine all possibly racy calls related to libdm initialization in
   some early init call, protected by pthread_once()
 - make sure dm_lib_exit() is handled correctly
 - add locking around dm_task_run() and dm_udev_wait()
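
For the last item, a cancellation-safe variant of the wrapper idea from
earlier in the thread could look like this (a sketch only; the wrapper
names are made up, and the push-before-lock ordering mirrors how we
already handle the vecs lock, which is safe with deferred cancellation):

/* Sketch: one mutex wrapping the two racy libdm calls. */
#include <pthread.h>
#include <stdint.h>
#include <libdevmapper.h>

static pthread_mutex_t libdm_lock = PTHREAD_MUTEX_INITIALIZER;

static void cleanup_libdm_lock(void *arg)
{
	pthread_mutex_unlock((pthread_mutex_t *)arg);
}

int libmp_dm_task_run(struct dm_task *dmt)
{
	int r;

	pthread_cleanup_push(cleanup_libdm_lock, &libdm_lock);
	pthread_mutex_lock(&libdm_lock);
	r = dm_task_run(dmt);
	pthread_cleanup_pop(1);	/* unlocks, even on cancellation */
	return r;
}

int libmp_udev_wait(uint32_t cookie)
{
	int r;

	pthread_cleanup_push(cleanup_libdm_lock, &libdm_lock);
	pthread_mutex_lock(&libdm_lock);
	r = dm_udev_wait(cookie);
	pthread_cleanup_pop(1);
	return r;
}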

Regards
Martin


* Re: master - multipathd: fix fd leak when iscsi device logs in
From: Zhiqiang Liu @ 2020-07-20  6:19 UTC
  To: Martin Wilck, Benjamin Marzinski
  Cc: lixiaokeng, linfeilong, dm-devel, lutianxiong, Zdenek Kabelac



On 2020/7/16 1:41, Martin Wilck wrote:
> On Tue, 2020-07-14 at 17:22 -0500, Benjamin Marzinski wrote:
>> On Mon, Jul 13, 2020 at 09:59:43PM +0200, Martin Wilck wrote:
>>> Comments welcome.
>>
>> We've agreed that dm_lib_release() is unnecessary, since we always
>> call
>> dm_udev_wait() when update_devs() is needed. dm_lib_exit() should
>> already be safe, since we only call it after cleaning up the other
>> threads. dm_task_set_cookie() looks safe to me as well, assuming that
>> we
>> have early initialization. So we are really talking about
>> dm_task_run()
>> and dm_udev_wait(). Clearly dm_udev_wait() always needs to run under
>> a
>> lock. Two threads trying to remove items from a list at the same time
>> isn't safe, and there's no way to know if there will be items on the
>> _node_ops list. The one thing I'm not sure of, is whether every call
>> to
>> dm_task_run() always needs to be locked. clearly we could, and it
>> would
>> be safer. Also, clearly for calls that add elements to _node_ops it
>> must. But for calls like DM_DEVICE_TABLE or DM_DEVICE_LIST, the only
>> issue that I can see with concurrent calls is the possibility that
>> _ioctl_buffer_double_factor gets increased too much. Perhaps that's
>> enough of a problem that we should just lock all dm_task_run() calls
>> as
>> well.
> 
> Yes. We shouldn't try to be too smart. It's easier to verify that every
> call is wrapped under a lock than to figure out later whether we made
> the case distinction right.
> 
> So, action plan:
> 
>  - remove dm_lib_release() calls
>  - Combine all possibly racy calls related to libdm initialization in
>    some early init call, protected by pthread_once()
>  - make sure dm_lib_exit() is handled correctly
>  - add locking around dm_task_run() and dm_udev_wait()
> 
> Regards
> Martin
> 

Thanks for your detailed analysis.

Looking forward to the new version that solves the issue.

Regards,
Zhiqiang Liu
