* [PATCH v3] tgtd: refresh ready fds of event loop after event deletion
@ 2014-02-26 5:13 Hitoshi Mitake
2014-02-26 5:22 ` FUJITA Tomonori
2014-03-04 15:14 ` Or Gerlitz
0 siblings, 2 replies; 5+ messages in thread
From: Hitoshi Mitake @ 2014-02-26 5:13 UTC
To: stgt; +Cc: mitake.hitoshi, Hitoshi Mitake
The current main event loop of tgtd (event_loop()) can crash with a
segmentation fault. The problem is caused by the following sequence:
1. Events A and B are ready, so epoll_wait(2) returns.
2. The handler of event A is called. In that handler, event B is
deleted with tgt_event_del().
3. event_loop() tries to call the handler of event B. This causes a
segfault because the event struct has already been removed and freed.
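The sequence above can be modeled without real file descriptors. The sketch below is a hypothetical simplification: struct model_event, the table array, handle() and dispatch_stale_snapshot() are invented for illustration and are not tgtd code. It captures the ready list before any handler runs, the way event_loop() works from the array filled by epoll_wait(2), and counts how many entries refer to an event that an earlier handler already deleted. Where the model merely skips the dead slot, tgtd would dereference a freed struct event_data and jump through its stale handler pointer, which would be consistent with a crash at address 0x0.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model: each registered event is a slot with an "alive"
 * flag instead of a heap object, so the stale access is observable
 * without actually touching freed memory. */
struct model_event {
	bool alive;
	int calls;
};

static struct model_event table[2];	/* index 0 = event A, 1 = event B */

static void handle(int idx)
{
	table[idx].calls++;
	if (idx == 0)
		table[1].alive = false;	/* A's handler deletes B (step 2) */
}

/* Dispatch a ready list captured before any handler ran.
 * Returns how many entries referred to an already-deleted event. */
static int dispatch_stale_snapshot(void)
{
	int ready[2] = { 0, 1 };	/* step 1: A and B are both ready */
	int stale = 0;

	table[0] = (struct model_event){ .alive = true, .calls = 0 };
	table[1] = (struct model_event){ .alive = true, .calls = 0 };

	for (size_t i = 0; i < 2; i++) {
		int idx = ready[i];

		if (!table[idx].alive) {
			/* step 3: in tgtd this entry points at freed memory
			 * and its handler would still be called */
			stale++;
			continue;
		}
		handle(idx);
	}
	return stale;
}
```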
To avoid this problem, this patch adds a new global variable,
event_need_refresh. If its value is 1, event_loop() calls
epoll_wait(2) again to refresh the ready fd list. This patch also
makes tgt_event_del() set the flag at its end.
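A minimal, Linux-only sketch of the same fix outside tgtd, assuming level-triggered epoll registrations (the default) and two pipes standing in for connection fds. struct ev, ev_add(), ev_del(), del_all_except(), peer_killer(), run_loop() and demo() are hypothetical names, not tgtd's API. Each handler deletes the other event, so whichever runs first makes the second entry of the ready list stale; setting the flag in the deletion path and jumping back to epoll_wait(2) means that stale entry is never dereferenced.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <sys/epoll.h>
#include <unistd.h>

/* Hypothetical stand-in for tgtd's struct event_data. */
struct ev {
	int fd;
	void (*handler)(struct ev *self);
};

static int ep_fd = -1;
static int event_need_refresh;	/* set whenever an event is deleted */
static struct ev *evs[2];	/* registered events, NULL once deleted */
static int calls;		/* number of handler invocations */

/* Deletion path: mirrors tgt_event_del() plus the patch's flag. */
static void ev_del(struct ev *e)
{
	epoll_ctl(ep_fd, EPOLL_CTL_DEL, e->fd, NULL);
	close(e->fd);
	free(e);
	event_need_refresh = 1;
}

static struct ev *ev_add(int fd, void (*h)(struct ev *))
{
	struct ev *e = malloc(sizeof(*e));
	struct epoll_event ee = { .events = EPOLLIN, .data.ptr = e };

	if (!e)
		return NULL;
	e->fd = fd;
	e->handler = h;
	if (epoll_ctl(ep_fd, EPOLL_CTL_ADD, fd, &ee) < 0) {
		free(e);
		return NULL;
	}
	return e;
}

/* Delete every registered event except keep (NULL deletes all). */
static void del_all_except(struct ev *keep)
{
	for (int k = 0; k < 2; k++) {
		if (evs[k] && evs[k] != keep) {
			ev_del(evs[k]);
			evs[k] = NULL;
		}
	}
}

/* Consumes its own byte, then tears down the other "connection". */
static void peer_killer(struct ev *self)
{
	char c;
	ssize_t n = read(self->fd, &c, 1);

	(void)n;
	calls++;
	del_all_except(self);
}

/* Loop shaped like the patched event_loop(); timeout 0 returns when idle. */
static void run_loop(void)
{
	struct epoll_event events[8];
	int nevent, i;

retry:
	nevent = epoll_wait(ep_fd, events, 8, 0);
	for (i = 0; i < nevent; i++) {
		struct ev *e = events[i].data.ptr;

		e->handler(e);
		if (event_need_refresh) {
			event_need_refresh = 0;
			goto retry;	/* the rest of events[] may be stale */
		}
	}
}

/* Returns the number of handler calls: 1 means the stale entry was skipped. */
static int demo(void)
{
	int pa[2], pb[2];

	ep_fd = epoll_create1(0);
	if (ep_fd < 0 || pipe(pa) < 0 || pipe(pb) < 0)
		return -1;
	evs[0] = ev_add(pa[0], peer_killer);
	evs[1] = ev_add(pb[0], peer_killer);
	if (!evs[0] || !evs[1] || write(pa[1], "x", 1) != 1 ||
	    write(pb[1], "x", 1) != 1)
		return -1;

	calls = 0;
	run_loop();

	del_all_except(NULL);	/* release whatever is still registered */
	event_need_refresh = 0;
	close(pa[1]);
	close(pb[1]);
	close(ep_fd);
	return calls;
}
```

demo() returning 1 means only one handler ran. The trade-off of this design is that every deletion throws away the rest of the current ready list: with level-triggered registrations the kernel reports still-ready fds again on the next epoll_wait(2), so no input is lost, whereas an edge-triggered (EPOLLET) registration would not be re-reported in the same way.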
For example, we can trigger a segfault of tgtd under heavy load. Below
is a backtrace obtained from the core file:
(gdb) bt
#0 0x0000000000000000 in ?? ()
#1 0x0000000000411419 in event_loop () at tgtd.c:414
#2 0x0000000000411b65 in main (argc=<value optimized out>, argv=<value optimized out>) at tgtd.c:591
To be honest, I still haven't found an event handler that calls
tgt_event_del() for other fds. But with this modification, the above
segfault is avoided, so the change seems to be effective.
Signed-off-by: Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp>
---
v3:
- update commit log for backtrace
- remove needless extern declaration in tgtd.h
v2: use the existing label "retry" instead of adding a new label
usr/tgtd.c | 9 +++++++++
1 file changed, 9 insertions(+)
diff --git a/usr/tgtd.c b/usr/tgtd.c
index 50e1c83..04b31dc 100644
--- a/usr/tgtd.c
+++ b/usr/tgtd.c
@@ -212,6 +212,8 @@ static struct event_data *tgt_event_lookup(int fd)
 	return NULL;
 }
 
+int event_need_refresh;
+
 void tgt_event_del(int fd)
 {
 	struct event_data *tev;
@@ -229,6 +231,8 @@ void tgt_event_del(int fd)
 
 	list_del(&tev->e_list);
 	free(tev);
+
+	event_need_refresh = 1;
 }
 
 int tgt_event_modify(int fd, int events)
@@ -426,6 +430,11 @@ retry:
 		for (i = 0; i < nevent; i++) {
 			tev = (struct event_data *) events[i].data.ptr;
 			tev->handler(tev->fd, events[i].events, tev->data);
+
+			if (event_need_refresh) {
+				event_need_refresh = 0;
+				goto retry;
+			}
 		}
 	}
 
--
1.7.10.4
* Re: [PATCH v3] tgtd: refresh ready fds of event loop after event deletion
From: FUJITA Tomonori @ 2014-02-26 5:22 UTC
To: mitake.hitoshi; +Cc: stgt, mitake.hitoshi
On Wed, 26 Feb 2014 14:13:22 +0900
Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> wrote:
> To be honest, I still haven't found an event handler that calls
> tgt_event_del() for other fds. But with this modification, the above
> segfault is avoided, so the change seems to be effective.
As you pointed out off-line, closing a connection via tgtadm might be
such a case. Anyway, I think the handler API should allow a handler to
remove other handlers. Merged, thanks a lot!
> +int event_need_refresh;
> +
static?
* Re: [PATCH v3] tgtd: refresh ready fds of event loop after event deletion
From: Hitoshi Mitake @ 2014-02-26 5:33 UTC
To: FUJITA Tomonori; +Cc: mitake.hitoshi, stgt, mitake.hitoshi
At Wed, 26 Feb 2014 14:22:21 +0900,
FUJITA Tomonori wrote:
>
> On Wed, 26 Feb 2014 14:13:22 +0900
> Hitoshi Mitake <mitake.hitoshi@lab.ntt.co.jp> wrote:
> > +int event_need_refresh;
> > +
>
> static?
Ah, the variable should be static. I'll send a patch later.
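For reference, a file-scope variable defined without static has external linkage, so the name event_need_refresh is exported from tgtd.c's object file and can collide with an identically named object elsewhere in the program. A minimal sketch of the suggested change follows; request_refresh() and take_refresh_request() are hypothetical helpers that only illustrate the set, then test-and-clear pattern the patch uses, and the actual follow-up patch is not part of this thread.

```c
#include <assert.h>

/* Internal linkage: only this translation unit can see or modify it. */
static int event_need_refresh;

/* Hypothetical helper: called from the deletion path. */
static void request_refresh(void)
{
	event_need_refresh = 1;
}

/* Hypothetical helper: the loop tests and clears the flag before
 * jumping back to epoll_wait(2). */
static int take_refresh_request(void)
{
	int pending = event_need_refresh;

	event_need_refresh = 0;
	return pending;
}
```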
Thanks,
Hitoshi
* Re: [PATCH v3] tgtd: refresh ready fds of event loop after event deletion
From: Or Gerlitz @ 2014-03-04 15:14 UTC
To: Hitoshi Mitake, stgt; +Cc: mitake.hitoshi, Roi Dayan
On 26/02/2014 07:13, Hitoshi Mitake wrote:
> For example, we can produce segfault of tgtd under heavy load. Below
> is a backtrace obtained from the core file:
> (gdb) bt
> #0 0x0000000000000000 in ?? ()
> #1 0x0000000000411419 in event_loop () at tgtd.c:414
> #2 0x0000000000411b65 in main (argc=<value optimized out>, argv=<value optimized out>) at tgtd.c:591
>
> To be honest, I still haven't found an event handler that calls
> tgt_event_del() for other fds. But with this modification, the above
> segfault is avoided, so the change seems to be effective.
Just want to make sure I follow --- you do have a way to reproduce the
bug, but from code inspection you didn't find an event handler in tgt
that calls tgt_event_del() for "other" fds, which would be the trigger
for the bug, right?
Can you please provide the steps to reproduce the bug?
Or.
* Re: [PATCH v3] tgtd: refresh ready fds of event loop after event deletion
From: Hitoshi Mitake @ 2014-03-06 6:54 UTC
To: Or Gerlitz; +Cc: Hitoshi Mitake, stgt, mitake.hitoshi, Roi Dayan
At Tue, 4 Mar 2014 17:14:41 +0200,
Or Gerlitz wrote:
>
> Just want to make sure I follow --- you do have a way to reproduce the
> bug, but from code inspection you didn't find an event handler in tgt
> which calls tgt_event_del() for "other" fds which is the trigger for
> the bug, right?
Yes. To be more precise, here is my understanding:
1. There are some event handlers that can close "other" fds, e.g.
mtask_recv_send_handler(). It can be invoked by input on the unix
domain socket, and it can close the fds of TCP connections when the
user runs "tgtadm --op delete --mode target".
2. However, we can trigger the above segfault without using it.
>
> Can you please provide the steps to reproduce the bug?
We are using a 4-node cluster connected via 10Gbps Ethernet. One node
runs tgtd to provide the iSCSI target, and the backing store is
sheepdog. Every node runs 4 VMs, and the VMs read an ISO file (about
4GB) from a single logical unit.
# The test mimics a thin-client environment. We need multiple
# dd processes to saturate the 10Gbps network.
When we run the above test several times, the segfault occurs. Of
course, we don't invoke "tgtadm --op delete --mode target" during
the test.
Thanks,
Hitoshi