* [PATCH v3] tgtd: refresh ready fds of event loop after event deletion
@ 2014-02-26  5:13 Hitoshi Mitake
  2014-02-26  5:22 ` FUJITA Tomonori
  2014-03-04 15:14 ` Or Gerlitz
  0 siblings, 2 replies; 5+ messages in thread
From: Hitoshi Mitake @ 2014-02-26  5:13 UTC (permalink / raw)
  To: stgt; +Cc: mitake.hitoshi, Hitoshi Mitake

The main event loop of tgtd (event_loop()) can cause a segmentation
fault. The problem is triggered by the following sequence:

1. Events A and B are both ready, so epoll_wait(2) returns.
2. The handler of event A is called. Within that handler, event B is
   deleted with tgt_event_del().
3. event_loop() then tries to call the handler of event B. This causes
   a segfault because the event struct has already been removed and
   freed.

To avoid this problem, this patch adds a new global variable,
event_need_refresh. After each handler returns, event_loop() checks
the flag; if it is 1, the loop clears it and calls epoll_wait(2) again
to refresh the list of ready fds. This patch also makes
tgt_event_del() set the flag at its tail.
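
The sequence and the fix can be sketched as a small simulation. This
is a minimal sketch under stated assumptions, not tgtd code:
sim_event, sim_poll(), sim_event_del(), sim_handler() and
run_scenario() are hypothetical names, sim_poll() stands in for
epoll_wait(2), and deletion only marks a slot instead of calling
free(), so a stale dispatch can be counted safely instead of crashing.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_EVENTS 4

/* Hypothetical stand-in for an entry in tgtd's event table. */
struct sim_event {
	bool registered;	/* cleared on "delete" (tombstone, no free()) */
	bool ready;		/* would be reported by the poll call */
	int victim;		/* index of the event this handler deletes, or -1 */
};

static struct sim_event table[MAX_EVENTS];
static int need_refresh;	/* plays the role of event_need_refresh */

static void sim_event_del(int idx)
{
	assert(idx >= 0 && idx < MAX_EVENTS);
	table[idx].registered = false;
	need_refresh = 1;	/* set at the tail, as in the patch */
}

/* Stand-in for epoll_wait(2): snapshot the indices that are ready now. */
static int sim_poll(int *ready)
{
	int n = 0;

	for (int i = 0; i < MAX_EVENTS; i++)
		if (table[i].registered && table[i].ready)
			ready[n++] = i;
	return n;
}

static void sim_handler(int idx)
{
	struct sim_event *ev = &table[idx];

	ev->ready = false;	/* consume the readiness */
	if (ev->victim >= 0 && table[ev->victim].registered) {
		sim_event_del(ev->victim);
		ev->victim = -1;
	}
}

/* Returns how many dispatches hit an event that was already deleted. */
int run_scenario(bool use_refresh)
{
	int ready[MAX_EVENTS];
	int n, stale = 0;

	for (int i = 0; i < MAX_EVENTS; i++)
		table[i] = (struct sim_event){ .victim = -1 };
	/* Event 0 is "A" and deletes event 1, "B"; both start out ready. */
	table[0] = (struct sim_event){ true, true, 1 };
	table[1] = (struct sim_event){ true, true, -1 };
	need_refresh = 0;

retry:
	n = sim_poll(ready);
	for (int i = 0; i < n; i++) {
		if (!table[ready[i]].registered) {
			stale++;	/* the real loop would use freed memory here */
			continue;
		}
		sim_handler(ready[i]);
		if (use_refresh && need_refresh) {
			need_refresh = 0;
			goto retry;	/* drop the stale snapshot and poll again */
		}
	}
	return stale;
}
```

In this simulation, run_scenario(false) returns 1 because event B is
still dispatched from the old snapshot, while run_scenario(true)
returns 0 because the snapshot is discarded right after the deletion.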

For example, we can reproduce a segfault of tgtd under heavy load.
Below is a backtrace obtained from the core file:
 (gdb) bt
 #0  0x0000000000000000 in ?? ()
 #1  0x0000000000411419 in event_loop () at tgtd.c:414
 #2  0x0000000000411b65 in main (argc=<value optimized out>, argv=<value optimized out>) at tgtd.c:591

To be honest, I have not yet found an event handler that calls
tgt_event_del() for other fds. With this modification, however, the
above segfault no longer occurs, so the change seems to be effective.

Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
---
v3:
 - update commit log for backtrace
 - remove needless extern declaration in tgtd.h

v2: use the existing label "retry" instead of adding a new label

 usr/tgtd.c |    9 +++++++++
 1 file changed, 9 insertions(+)

diff --git a/usr/tgtd.c b/usr/tgtd.c
index 50e1c83..04b31dc 100644
--- a/usr/tgtd.c
+++ b/usr/tgtd.c
@@ -212,6 +212,8 @@ static struct event_data *tgt_event_lookup(int fd)
 	return NULL;
 }
 
+int event_need_refresh;
+
 void tgt_event_del(int fd)
 {
 	struct event_data *tev;
@@ -229,6 +231,8 @@ void tgt_event_del(int fd)
 
 	list_del(&tev->e_list);
 	free(tev);
+
+	event_need_refresh = 1;
 }
 
 int tgt_event_modify(int fd, int events)
@@ -426,6 +430,11 @@ retry:
 		for (i = 0; i < nevent; i++) {
 			tev = (struct event_data *) events[i].data.ptr;
 			tev->handler(tev->fd, events[i].events, tev->data);
+
+			if (event_need_refresh) {
+				event_need_refresh = 0;
+				goto retry;
+			}
 		}
 	}
 
-- 
1.7.10.4

* Re: [PATCH v3] tgtd: refresh ready fds of event loop after event deletion
  2014-02-26  5:13 [PATCH v3] tgtd: refresh ready fds of event loop after event deletion Hitoshi Mitake
@ 2014-02-26  5:22 ` FUJITA Tomonori
  2014-02-26  5:33   ` Hitoshi Mitake
  2014-03-04 15:14 ` Or Gerlitz
  1 sibling, 1 reply; 5+ messages in thread
From: FUJITA Tomonori @ 2014-02-26  5:22 UTC (permalink / raw)
  To: mitake.hitoshi; +Cc: stgt, mitake.hitoshi

On Wed, 26 Feb 2014 14:13:22 +0900
Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> wrote:

> Current main event loop (event_loop()) of tgtd has a possibility of
> segmentation fault. The problem is caused by the below sequence:
> 
> 1. Event A, B are ready so epoll_wait(2) returns.
> 2. The handler of the event A is called. In the event handler, the
>    event B is deleted with tgt_event_del()
> 3. event_loop() tries to call the handler of the event B. It causes
>    segfault because the event struct is already removed and freed.
> 
> For avoid this problem, this patch adds a new global variable
> event_need_refresh. If the value of this variable is 1, event_loop()
> calls epoll_wait(2) again for refreshing ready fd list. This patch
> also lets tgt_event_del() to turn on the flag in its tail.
> 
> For example, we can produce segfault of tgtd under heavy load. Below
> is a backtrace obtained from the core file:
>  (gdb) bt
>  #0  0x0000000000000000 in ?? ()
>  #1  0x0000000000411419 in event_loop () at tgtd.c:414
>  #2  0x0000000000411b65 in main (argc=<value optimized out>, argv=<value optimized out>) at tgtd.c:591
> 
> To be honest, I still don't find an event handler which calls
> tgt_event_del() for other fds. But with this modification, the above
> segfault is avoided. The change seems to be effective.

As you pointed out off-line, closing a connection via tgtadm might be
such a case. Anyway, I think that the handler API should allow a
handler to remove other handlers. Merged, thanks a lot!
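
A minimal sketch of that property with real epoll(7) calls, assuming
Linux. It is not tgtd code: demo_event, demo_event_del(),
demo_handler() and run_epoll_demo() are hypothetical names, and the
handler signature is simplified. Two pipes are made readable, and
whichever handler runs first deletes and frees the other event; the
refresh flag then forces a new epoll_wait(2) so the freed pointer left
in events[] is never dereferenced.

```c
#include <assert.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical event record, loosely modelled on the patch's usage. */
struct demo_event {
	int fd;
	struct demo_event **peer;	/* slot of the other event */
};

static int need_refresh;	/* same role as event_need_refresh */
static int handler_calls;

/* Remove an event from epoll, free it, and request a refresh. */
static void demo_event_del(int ep, struct demo_event **slot)
{
	struct demo_event *ev = *slot;

	if (!ev)
		return;
	epoll_ctl(ep, EPOLL_CTL_DEL, ev->fd, NULL);
	close(ev->fd);
	free(ev);
	*slot = NULL;
	need_refresh = 1;
}

/* Handler: drain one byte, then delete the *other* event. */
static void demo_handler(int ep, struct demo_event *self)
{
	char c;
	ssize_t r = read(self->fd, &c, 1);

	(void)r;
	handler_calls++;
	demo_event_del(ep, self->peer);
}

/* Returns the number of handler calls, or -1 on setup failure
 * (the error paths do not clean up; this is only a sketch). */
int run_epoll_demo(void)
{
	struct demo_event *slots[2] = { NULL, NULL };
	struct epoll_event events[2];
	int wfd[2] = { -1, -1 };
	int n, ep = epoll_create1(0);

	if (ep < 0)
		return -1;
	for (int i = 0; i < 2; i++) {
		int p[2];
		struct epoll_event ee = { .events = EPOLLIN };

		if (pipe(p) < 0 || write(p[1], "x", 1) != 1)
			return -1;
		wfd[i] = p[1];
		slots[i] = malloc(sizeof(*slots[i]));
		if (!slots[i])
			return -1;
		slots[i]->fd = p[0];
		slots[i]->peer = &slots[1 - i];
		ee.data.ptr = slots[i];
		if (epoll_ctl(ep, EPOLL_CTL_ADD, p[0], &ee) < 0)
			return -1;
	}
	need_refresh = 0;
	handler_calls = 0;

retry:
	n = epoll_wait(ep, events, 2, 0);
	for (int i = 0; i < n; i++) {
		struct demo_event *ev = events[i].data.ptr;

		demo_handler(ep, ev);
		if (need_refresh) {
			/* events[] may now hold a freed pointer: discard it */
			need_refresh = 0;
			goto retry;
		}
	}

	for (int i = 0; i < 2; i++) {
		demo_event_del(ep, &slots[i]);
		close(wfd[i]);
	}
	close(ep);
	return handler_calls;
}
```

With the refresh check in place, exactly one handler runs, so
run_epoll_demo() returns 1. Without the check, the second iteration
would call demo_handler() on the freed peer.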

> +int event_need_refresh;
> +

static?

>  void tgt_event_del(int fd)
>  {
>  	struct event_data *tev;
> @@ -229,6 +231,8 @@ void tgt_event_del(int fd)
>  
>  	list_del(&tev->e_list);
>  	free(tev);
> +
> +	event_need_refresh = 1;
>  }
>  
>  int tgt_event_modify(int fd, int events)
> @@ -426,6 +430,11 @@ retry:
>  		for (i = 0; i < nevent; i++) {
>  			tev = (struct event_data *) events[i].data.ptr;
>  			tev->handler(tev->fd, events[i].events, tev->data);
> +
> +			if (event_need_refresh) {
> +				event_need_refresh = 0;
> +				goto retry;
> +			}
>  		}
>  	}
>  
> -- 
> 1.7.10.4

* Re: [PATCH v3] tgtd: refresh ready fds of event loop after event deletion
  2014-02-26  5:22 ` FUJITA Tomonori
@ 2014-02-26  5:33   ` Hitoshi Mitake
  0 siblings, 0 replies; 5+ messages in thread
From: Hitoshi Mitake @ 2014-02-26  5:33 UTC (permalink / raw)
  To: FUJITA Tomonori; +Cc: mitake.hitoshi, stgt, mitake.hitoshi

At Wed, 26 Feb 2014 14:22:21 +0900,
FUJITA Tomonori wrote:
> 
> On Wed, 26 Feb 2014 14:13:22 +0900
> Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> wrote:
> 
> > Current main event loop (event_loop()) of tgtd has a possibility of
> > segmentation fault. The problem is caused by the below sequence:
> > 
> > 1. Event A, B are ready so epoll_wait(2) returns.
> > 2. The handler of the event A is called. In the event handler, the
> >    event B is deleted with tgt_event_del()
> > 3. event_loop() tries to call the handler of the event B. It causes
> >    segfault because the event struct is already removed and freed.
> > 
> > For avoid this problem, this patch adds a new global variable
> > event_need_refresh. If the value of this variable is 1, event_loop()
> > calls epoll_wait(2) again for refreshing ready fd list. This patch
> > also lets tgt_event_del() to turn on the flag in its tail.
> > 
> > For example, we can produce segfault of tgtd under heavy load. Below
> > is a backtrace obtained from the core file:
> >  (gdb) bt
> >  #0  0x0000000000000000 in ?? ()
> >  #1  0x0000000000411419 in event_loop () at tgtd.c:414
> >  #2  0x0000000000411b65 in main (argc=<value optimized out>, argv=<value optimized out>) at tgtd.c:591
> > 
> > To be honest, I still don't find an event handler which calls
> > tgt_event_del() for other fds. But with this modification, the above
> > segfault is avoided. The change seems to be effective.
> 
> As you pointed out off-line, making a connection closed via tgtadm
> might be the case. Anyway, I think that the handler API should allow
> removing other handlers from a handler. Merged, thanks a lot!
> 
> > +int event_need_refresh;
> > +
> 
> static?

Ah, yes, the variable should be static. I'll send a patch later.
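
A minimal sketch of what the static version could look like. This is
an illustration, not the follow-up patch: mark_refresh_needed() and
test_and_clear_refresh() are hypothetical helpers, shown only to make
the point that the flag itself gets internal linkage and is reachable
solely through code in the same file.

```c
#include <assert.h>

/* File-scope static: internal linkage, so no other translation unit
 * can name or modify the flag directly. */
static int event_need_refresh;

/* Hypothetical helper: call where an event is removed. */
void mark_refresh_needed(void)
{
	event_need_refresh = 1;
}

/* Hypothetical helper: report whether a refresh was requested and
 * clear the request in the same step. */
int test_and_clear_refresh(void)
{
	int was_set = event_need_refresh;

	event_need_refresh = 0;
	return was_set;
}
```

Since both the writer (tgt_event_del()) and the reader (event_loop())
live in usr/tgtd.c in the patch, the flag can simply be declared
static there, with or without such helpers.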

Thanks,
Hitoshi

* Re: [PATCH v3] tgtd: refresh ready fds of event loop after event deletion
  2014-02-26  5:13 [PATCH v3] tgtd: refresh ready fds of event loop after event deletion Hitoshi Mitake
  2014-02-26  5:22 ` FUJITA Tomonori
@ 2014-03-04 15:14 ` Or Gerlitz
  2014-03-06  6:54   ` Hitoshi Mitake
  1 sibling, 1 reply; 5+ messages in thread
From: Or Gerlitz @ 2014-03-04 15:14 UTC (permalink / raw)
  To: Hitoshi Mitake, stgt; +Cc: mitake.hitoshi, Roi Dayan

On 26/02/2014 07:13, Hitoshi Mitake wrote:
> For example, we can produce segfault of tgtd under heavy load. Below
> is a backtrace obtained from the core file:
>   (gdb) bt
>   #0  0x0000000000000000 in ?? ()
>   #1  0x0000000000411419 in event_loop () at tgtd.c:414
>   #2  0x0000000000411b65 in main (argc=<value optimized out>, argv=<value optimized out>) at tgtd.c:591
>
> To be honest, I still don't find an event handler which calls
> tgt_event_del() for other fds. But with this modification, the above
> segfault is avoided. The change seems to be effective.

Just want to make sure I follow: you do have a way to reproduce the
bug, but from code inspection you didn't find an event handler in tgt
that calls tgt_event_del() for "other" fds, which is the trigger for
the bug, right?

Can you please provide the steps to reproduce the bug?

Or.

* Re: [PATCH v3] tgtd: refresh ready fds of event loop after event deletion
  2014-03-04 15:14 ` Or Gerlitz
@ 2014-03-06  6:54   ` Hitoshi Mitake
  0 siblings, 0 replies; 5+ messages in thread
From: Hitoshi Mitake @ 2014-03-06  6:54 UTC (permalink / raw)
  To: Or Gerlitz; +Cc: Hitoshi Mitake, stgt, mitake.hitoshi, Roi Dayan

At Tue, 4 Mar 2014 17:14:41 +0200,
Or Gerlitz wrote:
> 
> On 26/02/2014 07:13, Hitoshi Mitake wrote:
> > For example, we can produce segfault of tgtd under heavy load. Below
> > is a backtrace obtained from the core file:
> >   (gdb) bt
> >   #0  0x0000000000000000 in ?? ()
> >   #1  0x0000000000411419 in event_loop () at tgtd.c:414
> >   #2  0x0000000000411b65 in main (argc=<value optimized out>, argv=<value optimized out>) at tgtd.c:591
> >
> > To be honest, I still don't find an event handler which calls
> > tgt_event_del() for other fds. But with this modification, the above
> > segfault is avoided. The change seems to be effective.
> 
> Just want to make sure I follow --- you do have a way to reproduce the 
> bug, but from code inspection you didn't find an event handler in tgt 
> which calls  tgt_event_del() for "other" fds which is the trigger for 
> the bug, right?

Yes. To be more precise, here is my understanding:

1. Some event handlers can close "other" fds, e.g.
mtask_recv_send_handler(). It is invoked by input on the unix domain
socket, and it closes the fds of TCP connections when the user runs
"tgtadm --op delete --mode target".

2. But we can reproduce the above segfault without using it...

> 
> Can you please provide the steps to reproduce the bug?

We are using a 4-node cluster connected via 10 Gbps Ethernet. One node
runs tgtd to provide the iSCSI target. The backing store is
sheepdog. All nodes run 4 VMs, which read an ISO file (about 4 GB)
from a single logical unit.

# The test mimics a thin-client environment. We need multiple
# dd processes to saturate the 10 Gbps network.

When we run the above test several times, the segfault occurs. Of
course, we don't invoke "tgtadm --op delete --mode target" during the
testing.

Thanks,
Hitoshi
