* Re: Not matching event states in ./msg/async/AsyncConnection.cc [not found] <0a8a87d5-777c-7e78-0400-cf1978217928@digiware.nl> @ 2016-06-22 13:47 ` Willem Jan Withagen 2016-06-22 14:36 ` Haomai Wang 0 siblings, 1 reply; 13+ messages in thread From: Willem Jan Withagen @ 2016-06-22 13:47 UTC (permalink / raw) To: Ceph Development, Haomai Wang This time also to the list. On 20-6-2016 04:40, Haomai Wang wrote: > On Mon, Jun 20, 2016 at 5:28 AM, Willem Jan Withagen <wjw@digiware.nl> wrote: >> Hi, >> >> I 've just rebased, but now I'm getting BUG warnings in del_event() >> calls. And I suspect that it is because of the snippet below... >> >> In STATE_CONNECTING EVENT_READABLE is set on sd, but we try a bit latter >> to remove EVENT_WRITABLE. Which abort since asserts were added to the >> kevent code. >> >> Could it be that in STATE_CONNECTING_RE EVENT_READABLE needs to be removed? > > https://github.com/ceph/ceph/pull/9086/commits/a74ce419133881ff8618733a0501c4a47e1368e3 'mmmm, That piece of code is already in the code that I'm testing. So that could be the source of problems. I don't know. Not going to pretend I understand anything of the statmachine for setting up connections. >> ./msg/async/AsyncConnection.cc at line 1014 >> case STATE_CONNECTING_RE: >> { >> r = net.reconnect(get_peer_addr(), sd); >> if (r < 0) { >> ldout(async_msgr->cct, 1) << __func__ << " reconnect failed " >> << dendl; >> goto fail; >> } else if (r > 0) { >> ldout(async_msgr->cct, 10) << __func__ << " nonblock connect >> inprogress" << dendl; >> center->create_file_event(sd, EVENT_WRITABLE, read_handler); >> break; >> } >> >> lderr(async_msgr->cct) << __func__ << ":" <<__LINE__ >> << " delete_file_event(" << sd << ", EVENT_WRITABLE)" >> << dendl; >> center->delete_file_event(sd, EVENT_WRITABLE); >> state = STATE_CONNECTING_WAIT_BANNER; >> break; >> } If I put debug code in Event and {Epoll,Kqueue}Event, and the trace for this file handle is: 2016-06-22 12:54:59.183258 811653300 -1 Event(0x81164f448 owner=0x811653300 nevent=5000 time_id=1).create_file_event create event started fd=13 mask=1 original mask is 0 2016-06-22 12:54:59.183279 811653300 -1 KqueueDriver.add_event add event fd = 13 to kqfd = 10 cur_mask = 0 add_mask = 1 2016-06-22 12:54:59.183286 811653300 -1 Event(0x81164f448 owner=0x811653300 nevent=5000 time_id=1).create_file_event create event end fd=13 mask=1 original mask is 1 2016-06-22 12:54:59.183293 811653300 -1 Event(0x81164f448 owner=0x811653300 nevent=5000 time_id=1).delete_file_event delete event started fd=13 mask=2 original mask is 1 2016-06-22 12:54:59.183304 811653300 -1 KqueueDriver.del_event delete event fd = 13 to kqfd = 10 cur_mask = 1 delmask = 2 2016-06-22 12:54:59.183308 811653300 -1 KqueueDriver.del_event unable to delete event: (2) No such file or directory. for handle: 13 kqfd handle: 10 And then we hit assert(BUG) Looking at the e_poll output from Linux, it looks like: 2016-06-22 15:28:15.237564 7ff58d7fa700 -1 Event(0x7ff5900b1038 owner=140692617668352 nevent=5000 time_id=1).create_file_event create event started fd=13 mask=1 original mask is 0 2016-06-22 15:28:15.237595 7ff58d7fa700 -1 Event(0x7ff5900b1038 owner=140692617668352 nevent=5000 time_id=1).create_file_event create event end fd=13 mask=1 original mask is 1 2016-06-22 15:28:15.237609 7ff58d7fa700 -1 Event(0x7ff5900b1038 owner=140692617668352 nevent=5000 time_id=1).delete_file_event delete event started fd=13 mask=2 original mask is 1 2016-06-22 15:28:15.237613 7ff58d7fa700 -1 EpollDriver.del_event delete event fd = 13 to epfd = 10 cur_mask = 1 delmask = 2 The access pattern is the same, but that does not generate an error. I think the error message actually means that there is no event in the Keventqueue, that actually matches. And that is because EVENT_WRITABLE is asked to be removed, but the event actually only has EVENT_READ activated. So the EV_DELETE fails. I have not tried this trace on Linux to see what is the result there. Perhaps Linux e_poll stuff does not really care if you want a non-existent filter? But the assert is rather detrimental on FreeBSD code :( But the other strange part for me is that the code tries to remove events that are really not inserted.... Wether that is a problem in the current code? I do not know, but it certainly strange. --WjW ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Not matching event states in ./msg/async/AsyncConnection.cc 2016-06-22 13:47 ` Not matching event states in ./msg/async/AsyncConnection.cc Willem Jan Withagen @ 2016-06-22 14:36 ` Haomai Wang 2016-06-22 15:25 ` Willem Jan Withagen 2016-06-22 22:16 ` Willem Jan Withagen 0 siblings, 2 replies; 13+ messages in thread From: Haomai Wang @ 2016-06-22 14:36 UTC (permalink / raw) To: Willem Jan Withagen; +Cc: Ceph Development Oh, sorry. I still realize you are testing on kqueue event backend. I submit a pr to fix this. plz help to verify whether it works for you since I don't have bsd handy... https://github.com/ceph/ceph/pull/9869 On Wed, Jun 22, 2016 at 9:47 PM, Willem Jan Withagen <wjw@digiware.nl> wrote: > This time also to the list. > > On 20-6-2016 04:40, Haomai Wang wrote: >> On Mon, Jun 20, 2016 at 5:28 AM, Willem Jan Withagen <wjw@digiware.nl> wrote: >>> Hi, >>> >>> I 've just rebased, but now I'm getting BUG warnings in del_event() >>> calls. And I suspect that it is because of the snippet below... >>> >>> In STATE_CONNECTING EVENT_READABLE is set on sd, but we try a bit latter >>> to remove EVENT_WRITABLE. Which abort since asserts were added to the >>> kevent code. >>> >>> Could it be that in STATE_CONNECTING_RE EVENT_READABLE needs to be removed? >> >> https://github.com/ceph/ceph/pull/9086/commits/a74ce419133881ff8618733a0501c4a47e1368e3 > > 'mmmm, > > That piece of code is already in the code that I'm testing. > So that could be the source of problems. I don't know. > Not going to pretend I understand anything of the statmachine for > setting up connections. > >>> ./msg/async/AsyncConnection.cc at line 1014 >>> case STATE_CONNECTING_RE: >>> { >>> r = net.reconnect(get_peer_addr(), sd); >>> if (r < 0) { >>> ldout(async_msgr->cct, 1) << __func__ << " reconnect failed " >>> << dendl; >>> goto fail; >>> } else if (r > 0) { >>> ldout(async_msgr->cct, 10) << __func__ << " nonblock connect >>> inprogress" << dendl; >>> center->create_file_event(sd, EVENT_WRITABLE, read_handler); >>> break; >>> } >>> >>> lderr(async_msgr->cct) << __func__ << ":" <<__LINE__ >>> << " delete_file_event(" << sd << ", EVENT_WRITABLE)" >>> << dendl; >>> center->delete_file_event(sd, EVENT_WRITABLE); >>> state = STATE_CONNECTING_WAIT_BANNER; >>> break; >>> } > > If I put debug code in Event and {Epoll,Kqueue}Event, and the trace for > this file handle is: > 2016-06-22 12:54:59.183258 811653300 -1 Event(0x81164f448 > owner=0x811653300 nevent=5000 time_id=1).create_file_event create event > started fd=13 mask=1 original mask is 0 > 2016-06-22 12:54:59.183279 811653300 -1 KqueueDriver.add_event add event > fd = 13 to kqfd = 10 cur_mask = 0 add_mask = 1 > 2016-06-22 12:54:59.183286 811653300 -1 Event(0x81164f448 > owner=0x811653300 nevent=5000 time_id=1).create_file_event create event > end fd=13 mask=1 original mask is 1 > 2016-06-22 12:54:59.183293 811653300 -1 Event(0x81164f448 > owner=0x811653300 nevent=5000 time_id=1).delete_file_event delete event > started fd=13 mask=2 original mask is 1 > 2016-06-22 12:54:59.183304 811653300 -1 KqueueDriver.del_event delete > event fd = 13 to kqfd = 10 cur_mask = 1 delmask = 2 > 2016-06-22 12:54:59.183308 811653300 -1 KqueueDriver.del_event unable to > delete event: (2) No such file or directory. for handle: 13 kqfd handle: 10 > > And then we hit assert(BUG) > > Looking at the e_poll output from Linux, it looks like: > 2016-06-22 15:28:15.237564 7ff58d7fa700 -1 Event(0x7ff5900b1038 > owner=140692617668352 nevent=5000 time_id=1).create_file_event create > event started fd=13 mask=1 original mask is 0 > 2016-06-22 15:28:15.237595 7ff58d7fa700 -1 Event(0x7ff5900b1038 > owner=140692617668352 nevent=5000 time_id=1).create_file_event create > event end fd=13 mask=1 original mask is 1 > 2016-06-22 15:28:15.237609 7ff58d7fa700 -1 Event(0x7ff5900b1038 > owner=140692617668352 nevent=5000 time_id=1).delete_file_event delete > event started fd=13 mask=2 original mask is 1 > 2016-06-22 15:28:15.237613 7ff58d7fa700 -1 EpollDriver.del_event delete > event fd = 13 to epfd = 10 cur_mask = 1 delmask = 2 > > The access pattern is the same, but that does not generate an error. > > I think the error message actually means that there is no event in the > Keventqueue, that actually matches. And that is because EVENT_WRITABLE > is asked to be removed, but the event actually only has EVENT_READ > activated. So the EV_DELETE fails. > > I have not tried this trace on Linux to see what is the result there. > Perhaps Linux e_poll stuff does not really care if you want a > non-existent filter? > But the assert is rather detrimental on FreeBSD code :( > > But the other strange part for me is that the code tries to remove > events that are really not inserted.... > Wether that is a problem in the current code? I do not know, but it > certainly strange. > > --WjW > > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Not matching event states in ./msg/async/AsyncConnection.cc 2016-06-22 14:36 ` Haomai Wang @ 2016-06-22 15:25 ` Willem Jan Withagen 2016-06-22 22:09 ` Willem Jan Withagen 2016-06-22 22:16 ` Willem Jan Withagen 1 sibling, 1 reply; 13+ messages in thread From: Willem Jan Withagen @ 2016-06-22 15:25 UTC (permalink / raw) To: Haomai Wang; +Cc: Ceph Development On 22-6-2016 16:36, Haomai Wang wrote: > Oh, sorry. I still realize you are testing on kqueue event backend. > > I submit a pr to fix this. plz help to verify whether it works for you > since I don't have bsd handy... > > https://github.com/ceph/ceph/pull/9869 Yup, Saw this flying by. Got to finish the running 'make check', and then I'll put your stuff in and let you know. --WjW > On Wed, Jun 22, 2016 at 9:47 PM, Willem Jan Withagen <wjw@digiware.nl> wrote: >> This time also to the list. >> >> On 20-6-2016 04:40, Haomai Wang wrote: >>> On Mon, Jun 20, 2016 at 5:28 AM, Willem Jan Withagen <wjw@digiware.nl> wrote: >>>> Hi, >>>> >>>> I 've just rebased, but now I'm getting BUG warnings in del_event() >>>> calls. And I suspect that it is because of the snippet below... >>>> >>>> In STATE_CONNECTING EVENT_READABLE is set on sd, but we try a bit latter >>>> to remove EVENT_WRITABLE. Which abort since asserts were added to the >>>> kevent code. >>>> >>>> Could it be that in STATE_CONNECTING_RE EVENT_READABLE needs to be removed? >>> >>> https://github.com/ceph/ceph/pull/9086/commits/a74ce419133881ff8618733a0501c4a47e1368e3 >> >> 'mmmm, >> >> That piece of code is already in the code that I'm testing. >> So that could be the source of problems. I don't know. >> Not going to pretend I understand anything of the statmachine for >> setting up connections. >> >>>> ./msg/async/AsyncConnection.cc at line 1014 >>>> case STATE_CONNECTING_RE: >>>> { >>>> r = net.reconnect(get_peer_addr(), sd); >>>> if (r < 0) { >>>> ldout(async_msgr->cct, 1) << __func__ << " reconnect failed " >>>> << dendl; >>>> goto fail; >>>> } else if (r > 0) { >>>> ldout(async_msgr->cct, 10) << __func__ << " nonblock connect >>>> inprogress" << dendl; >>>> center->create_file_event(sd, EVENT_WRITABLE, read_handler); >>>> break; >>>> } >>>> >>>> lderr(async_msgr->cct) << __func__ << ":" <<__LINE__ >>>> << " delete_file_event(" << sd << ", EVENT_WRITABLE)" >>>> << dendl; >>>> center->delete_file_event(sd, EVENT_WRITABLE); >>>> state = STATE_CONNECTING_WAIT_BANNER; >>>> break; >>>> } >> >> If I put debug code in Event and {Epoll,Kqueue}Event, and the trace for >> this file handle is: >> 2016-06-22 12:54:59.183258 811653300 -1 Event(0x81164f448 >> owner=0x811653300 nevent=5000 time_id=1).create_file_event create event >> started fd=13 mask=1 original mask is 0 >> 2016-06-22 12:54:59.183279 811653300 -1 KqueueDriver.add_event add event >> fd = 13 to kqfd = 10 cur_mask = 0 add_mask = 1 >> 2016-06-22 12:54:59.183286 811653300 -1 Event(0x81164f448 >> owner=0x811653300 nevent=5000 time_id=1).create_file_event create event >> end fd=13 mask=1 original mask is 1 >> 2016-06-22 12:54:59.183293 811653300 -1 Event(0x81164f448 >> owner=0x811653300 nevent=5000 time_id=1).delete_file_event delete event >> started fd=13 mask=2 original mask is 1 >> 2016-06-22 12:54:59.183304 811653300 -1 KqueueDriver.del_event delete >> event fd = 13 to kqfd = 10 cur_mask = 1 delmask = 2 >> 2016-06-22 12:54:59.183308 811653300 -1 KqueueDriver.del_event unable to >> delete event: (2) No such file or directory. for handle: 13 kqfd handle: 10 >> >> And then we hit assert(BUG) >> >> Looking at the e_poll output from Linux, it looks like: >> 2016-06-22 15:28:15.237564 7ff58d7fa700 -1 Event(0x7ff5900b1038 >> owner=140692617668352 nevent=5000 time_id=1).create_file_event create >> event started fd=13 mask=1 original mask is 0 >> 2016-06-22 15:28:15.237595 7ff58d7fa700 -1 Event(0x7ff5900b1038 >> owner=140692617668352 nevent=5000 time_id=1).create_file_event create >> event end fd=13 mask=1 original mask is 1 >> 2016-06-22 15:28:15.237609 7ff58d7fa700 -1 Event(0x7ff5900b1038 >> owner=140692617668352 nevent=5000 time_id=1).delete_file_event delete >> event started fd=13 mask=2 original mask is 1 >> 2016-06-22 15:28:15.237613 7ff58d7fa700 -1 EpollDriver.del_event delete >> event fd = 13 to epfd = 10 cur_mask = 1 delmask = 2 >> >> The access pattern is the same, but that does not generate an error. >> >> I think the error message actually means that there is no event in the >> Keventqueue, that actually matches. And that is because EVENT_WRITABLE >> is asked to be removed, but the event actually only has EVENT_READ >> activated. So the EV_DELETE fails. >> >> I have not tried this trace on Linux to see what is the result there. >> Perhaps Linux e_poll stuff does not really care if you want a >> non-existent filter? >> But the assert is rather detrimental on FreeBSD code :( >> >> But the other strange part for me is that the code tries to remove >> events that are really not inserted.... >> Wether that is a problem in the current code? I do not know, but it >> certainly strange. >> >> --WjW >> >> ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Not matching event states in ./msg/async/AsyncConnection.cc 2016-06-22 15:25 ` Willem Jan Withagen @ 2016-06-22 22:09 ` Willem Jan Withagen 0 siblings, 0 replies; 13+ messages in thread From: Willem Jan Withagen @ 2016-06-22 22:09 UTC (permalink / raw) To: Haomai Wang; +Cc: Ceph Development On 22-6-2016 17:25, Willem Jan Withagen wrote: > On 22-6-2016 16:36, Haomai Wang wrote: >> Oh, sorry. I still realize you are testing on kqueue event backend. >> >> I submit a pr to fix this. plz help to verify whether it works for you >> since I don't have bsd handy... >> >> https://github.com/ceph/ceph/pull/9869 > > Yup, > Saw this flying by. > Got to finish the running 'make check', and then I'll put your stuff in > and let you know. Slightly better, but still not quite there yet: 2016-06-22 23:49:22.862340 811653300 -1 KqueueDriver.add_event add event fd = 13 to kqfd = 10 cur_mask = 0 add_mask = 1 2016-06-22 23:49:22.862346 811653300 -1 Event(0x81164f448 owner=0x811653300 nevent=5000 time_id=1).create_file_event create event end fd=13 mask=1 original mask is 1 2016-06-22 23:49:22.862353 811653300 -1 Event(0x81164f448 owner=0x811653300 nevent=5000 time_id=1).delete_file_event delete event started fd=13 mask=2 original mask is 1 2016-06-22 23:49:22.862364 811653300 -1 KqueueDriver.del_event delete event fd = 13 to kqfd = 10 cur_mask = 1 delmask = 2 2016-06-22 23:49:22.862367 811653300 -1 Event(0x81164f448 owner=0x811653300 nevent=5000 time_id=1).delete_file_event delete event end fd=13 mask=2 original mask is 1 2016-06-22 23:49:22.862382 811653300 -1 Event(0x81164f448 owner=0x811653300 nevent=5000 time_id=1).create_file_event create event started fd=13 mask=2 original mask is 1 2016-06-22 23:49:22.862386 811653300 -1 KqueueDriver.add_event add event fd = 13 to kqfd = 10 cur_mask = 1 add_mask = 2 2016-06-22 23:49:22.862391 811653300 -1 Event(0x81164f448 owner=0x811653300 nevent=5000 time_id=1).create_file_event create event end fd=13 mask=2 original mask is 3 2016-06-22 23:49:22.863084 811653300 -1 Event(0x81164f448 owner=0x811653300 nevent=5000 time_id=1).delete_file_event delete event started fd=13 mask=2 original mask is 3 2016-06-22 23:49:22.863095 811653300 -1 KqueueDriver.del_event delete event fd = 13 to kqfd = 10 cur_mask = 3 delmask = 2 2016-06-22 23:49:22.863100 811653300 -1 KqueueDriver.del_event unable to delete event: (22) Invalid argument. for handle: 13 kqfd handle: 10 msg/async/Event.cc: In function 'void EventCenter::delete_file_event(int, int)' thread 811653300 time 2016-06-22 23:49:22.863112 msg/async/Event.cc: 210: FAILED assert(0 == "del_event BUG!") ceph version Development (no_version) But there is a small type in the patch, the last two lines must be: if (mask & EVENT_WRITABLE) filter |= EVFILT_WRITE; After which there are no more asserts. So you need to fix this. --WjW ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Not matching event states in ./msg/async/AsyncConnection.cc 2016-06-22 14:36 ` Haomai Wang 2016-06-22 15:25 ` Willem Jan Withagen @ 2016-06-22 22:16 ` Willem Jan Withagen 2016-06-23 7:22 ` Willem Jan Withagen 1 sibling, 1 reply; 13+ messages in thread From: Willem Jan Withagen @ 2016-06-22 22:16 UTC (permalink / raw) To: Haomai Wang; +Cc: Ceph Development On 22-6-2016 16:36, Haomai Wang wrote: > Oh, sorry. I still realize you are testing on kqueue event backend. > > I submit a pr to fix this. plz help to verify whether it works for you > since I don't have bsd handy... > > https://github.com/ceph/ceph/pull/9869 I think add_event needs about the same treatment. Trying, Testing ATM.... --WjW ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Not matching event states in ./msg/async/AsyncConnection.cc 2016-06-22 22:16 ` Willem Jan Withagen @ 2016-06-23 7:22 ` Willem Jan Withagen 2016-06-23 7:44 ` Haomai Wang 0 siblings, 1 reply; 13+ messages in thread From: Willem Jan Withagen @ 2016-06-23 7:22 UTC (permalink / raw) To: Haomai Wang; +Cc: Ceph Development On 23-6-2016 00:16, Willem Jan Withagen wrote: > On 22-6-2016 16:36, Haomai Wang wrote: >> Oh, sorry. I still realize you are testing on kqueue event backend. >> >> I submit a pr to fix this. plz help to verify whether it works for you >> since I don't have bsd handy... >> >> https://github.com/ceph/ceph/pull/9869 > > I think add_event needs about the same treatment. > Trying, Testing ATM.... errgh, not quite... It also generates errors when trying to delete EVFILT_READ (mask=2) from an eventfilter that has both READ and WRITE set (mask=3). Next to that the ms_async_messenger threads seem to be busy_waiting looping and loading a full CPU core per thread. So I think I need some testing code to see what the requirements of kqueue actually are, compared to what async_messenger does. Could be that if we want to go from (READ|WRITE) to either READ or WRITE the event needs to be deleted first and than added anew. --WjW ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Not matching event states in ./msg/async/AsyncConnection.cc 2016-06-23 7:22 ` Willem Jan Withagen @ 2016-06-23 7:44 ` Haomai Wang 2016-06-23 9:40 ` Willem Jan Withagen 0 siblings, 1 reply; 13+ messages in thread From: Haomai Wang @ 2016-06-23 7:44 UTC (permalink / raw) To: Willem Jan Withagen; +Cc: Ceph Development On Thu, Jun 23, 2016 at 3:22 PM, Willem Jan Withagen <wjw@digiware.nl> wrote: > On 23-6-2016 00:16, Willem Jan Withagen wrote: >> On 22-6-2016 16:36, Haomai Wang wrote: >>> Oh, sorry. I still realize you are testing on kqueue event backend. >>> >>> I submit a pr to fix this. plz help to verify whether it works for you >>> since I don't have bsd handy... >>> >>> https://github.com/ceph/ceph/pull/9869 >> >> I think add_event needs about the same treatment. >> Trying, Testing ATM.... > > errgh, not quite... > > It also generates errors when trying to delete EVFILT_READ (mask=2) from > an eventfilter that has both READ and WRITE set (mask=3). > Next to that the ms_async_messenger threads seem to be busy_waiting > looping and loading a full CPU core per thread. > > So I think I need some testing code to see what the requirements of > kqueue actually are, compared to what async_messenger does. > > Could be that if we want to go from (READ|WRITE) to either READ or WRITE > the event needs to be deleted first and than added anew. Hmm, I think I need to reread kqueue man page to figure out the problem... > > --WjW > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Not matching event states in ./msg/async/AsyncConnection.cc 2016-06-23 7:44 ` Haomai Wang @ 2016-06-23 9:40 ` Willem Jan Withagen 2016-06-23 12:28 ` Haomai Wang 0 siblings, 1 reply; 13+ messages in thread From: Willem Jan Withagen @ 2016-06-23 9:40 UTC (permalink / raw) To: Haomai Wang; +Cc: Ceph Development On 23-6-2016 09:44, Haomai Wang wrote: > On Thu, Jun 23, 2016 at 3:22 PM, Willem Jan Withagen <wjw@digiware.nl> wrote: >> On 23-6-2016 00:16, Willem Jan Withagen wrote: >>> On 22-6-2016 16:36, Haomai Wang wrote: >>>> Oh, sorry. I still realize you are testing on kqueue event backend. >>>> >>>> I submit a pr to fix this. plz help to verify whether it works for you >>>> since I don't have bsd handy... >>>> >>>> https://github.com/ceph/ceph/pull/9869 >>> >>> I think add_event needs about the same treatment. >>> Trying, Testing ATM.... >> >> errgh, not quite... >> >> It also generates errors when trying to delete EVFILT_READ (mask=2) from >> an eventfilter that has both READ and WRITE set (mask=3). >> Next to that the ms_async_messenger threads seem to be busy_waiting >> looping and loading a full CPU core per thread. >> >> So I think I need some testing code to see what the requirements of >> kqueue actually are, compared to what async_messenger does. >> >> Could be that if we want to go from (READ|WRITE) to either READ or WRITE >> the event needs to be deleted first and than added anew. > > Hmm, I think I need to reread kqueue man page to figure out the problem... Hi, Perhaps we should take the Ceph-dev list off the CC, since I doubt that they would like to read all the details. Feel free to do so, and mail me private until there is a resolution. Eh, perhaps it is in the manual page but it is rather vague in how it deals with the filters internally. And how you can modify event-filters already in the kqueue filterlist. You currently assume that the filters you assign are internally summarised such that you can do add(read) add(write) delete(read|write) Or add(read|write) delete(write) delete(read) What I read in the man page: EV_ADD Adds the event to the kqueue. Re-adding an existing event will modify the parameters of the original event, and not result in a duplicate entry. Adding an event automatically enables it, unless overridden by the EV_DISABLE flag. EV_DELETE Removes the event from the kqueue. Events which are attached to file descriptors are automatically deleted on the last close of the descriptor. So I would read that as that you need to do any changes to the filter-flags doing EV_ADD. And only removing the full event should be done with EV_DELETE. And an event is identified by its "ident" only: ident Value used to identify this event. The exact interpretation is determined by the attached filter, but often is a file descriptor. So any of the other settings are not used in determining which event to add/modify/delete. Now the problem lies in the "modify" word in EV_ADD. I woul;d read that as overwrite. Your code assumes that flags are merged when doing ADD or DELETE. I think the code should use ADD with the settings/mask that needs to be there, and only DELETE when the required filtermask is actually 0. Does this make sense? --WjW ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Not matching event states in ./msg/async/AsyncConnection.cc 2016-06-23 9:40 ` Willem Jan Withagen @ 2016-06-23 12:28 ` Haomai Wang 2016-06-23 13:16 ` Willem Jan Withagen 0 siblings, 1 reply; 13+ messages in thread From: Haomai Wang @ 2016-06-23 12:28 UTC (permalink / raw) To: Willem Jan Withagen; +Cc: Ceph Development On Thu, Jun 23, 2016 at 5:40 PM, Willem Jan Withagen <wjw@digiware.nl> wrote: > On 23-6-2016 09:44, Haomai Wang wrote: >> On Thu, Jun 23, 2016 at 3:22 PM, Willem Jan Withagen <wjw@digiware.nl> wrote: >>> On 23-6-2016 00:16, Willem Jan Withagen wrote: >>>> On 22-6-2016 16:36, Haomai Wang wrote: >>>>> Oh, sorry. I still realize you are testing on kqueue event backend. >>>>> >>>>> I submit a pr to fix this. plz help to verify whether it works for you >>>>> since I don't have bsd handy... >>>>> >>>>> https://github.com/ceph/ceph/pull/9869 >>>> >>>> I think add_event needs about the same treatment. >>>> Trying, Testing ATM.... >>> >>> errgh, not quite... >>> >>> It also generates errors when trying to delete EVFILT_READ (mask=2) from >>> an eventfilter that has both READ and WRITE set (mask=3). >>> Next to that the ms_async_messenger threads seem to be busy_waiting >>> looping and loading a full CPU core per thread. >>> >>> So I think I need some testing code to see what the requirements of >>> kqueue actually are, compared to what async_messenger does. >>> >>> Could be that if we want to go from (READ|WRITE) to either READ or WRITE >>> the event needs to be deleted first and than added anew. >> >> Hmm, I think I need to reread kqueue man page to figure out the problem... > > Hi, > > Perhaps we should take the Ceph-dev list off the CC, since I doubt that > they would like to read all the details. Feel free to do so, and mail me > private until there is a resolution. > > Eh, perhaps it is in the manual page but it is rather vague in how it > deals with the filters internally. And how you can modify event-filters > already in the kqueue filterlist. > > You currently assume that the filters you assign are internally > summarised such that you can do > add(read) > add(write) > delete(read|write) > > Or add(read|write) > delete(write) > delete(read) > > What I read in the man page: > EV_ADD Adds the event to the kqueue. Re-adding an existing event > will modify the parameters of the original event, and not > result in a duplicate entry. Adding an event automatically > enables it, unless overridden by the EV_DISABLE flag. > > EV_DELETE Removes the event from the kqueue. Events which are > attached to file descriptors are automatically deleted on > the last close of the descriptor. > > So I would read that as that you need to do any changes to the > filter-flags doing EV_ADD. And only removing the full event should be > done with EV_DELETE. > And an event is identified by its "ident" only: > ident Value used to identify this event. The exact interpretation > is determined by the attached filter, but often is a file > descriptor. > So any of the other settings are not used in determining which event to > add/modify/delete. > > Now the problem lies in the "modify" word in EV_ADD. I woul;d read that > as overwrite. Your code assumes that flags are merged when doing ADD or > DELETE. > > I think the code should use ADD with the settings/mask that needs to be > there, and only DELETE when the required filtermask is actually 0. > I just remember four years ago(I'm a bsd fan that time) I have written kqueue codes(https://github.com/yuyuyu101/twemproxy/commit/68d04283ba8d58142e544868842307d66d8f4600#diff-ea1f3d7f51a1def56821fefd14eb8560R193). EV_SET(&event, fd, EVFILT_READ|EV_CLEAR, EV_ADD, 0, 0, data); I think we can follow this.... > Does this make sense? > > --WjW > > ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Not matching event states in ./msg/async/AsyncConnection.cc 2016-06-23 12:28 ` Haomai Wang @ 2016-06-23 13:16 ` Willem Jan Withagen 2016-06-23 15:16 ` Willem Jan Withagen 0 siblings, 1 reply; 13+ messages in thread From: Willem Jan Withagen @ 2016-06-23 13:16 UTC (permalink / raw) To: Haomai Wang; +Cc: Ceph Development On 23-6-2016 14:28, Haomai Wang wrote: > On Thu, Jun 23, 2016 at 5:40 PM, Willem Jan Withagen <wjw@digiware.nl> wrote: >> On 23-6-2016 09:44, Haomai Wang wrote: >>> On Thu, Jun 23, 2016 at 3:22 PM, Willem Jan Withagen <wjw@digiware.nl> wrote: >>>> On 23-6-2016 00:16, Willem Jan Withagen wrote: >>>>> On 22-6-2016 16:36, Haomai Wang wrote: >>>>>> Oh, sorry. I still realize you are testing on kqueue event backend. >>>>>> >>>>>> I submit a pr to fix this. plz help to verify whether it works for you >>>>>> since I don't have bsd handy... >>>>>> >>>>>> https://github.com/ceph/ceph/pull/9869 >>>>> >>>>> I think add_event needs about the same treatment. >>>>> Trying, Testing ATM.... >>>> >>>> errgh, not quite... >>>> >>>> It also generates errors when trying to delete EVFILT_READ (mask=2) from >>>> an eventfilter that has both READ and WRITE set (mask=3). >>>> Next to that the ms_async_messenger threads seem to be busy_waiting >>>> looping and loading a full CPU core per thread. >>>> >>>> So I think I need some testing code to see what the requirements of >>>> kqueue actually are, compared to what async_messenger does. >>>> >>>> Could be that if we want to go from (READ|WRITE) to either READ or WRITE >>>> the event needs to be deleted first and than added anew. >>> >>> Hmm, I think I need to reread kqueue man page to figure out the problem... >> >> Hi, >> >> Perhaps we should take the Ceph-dev list off the CC, since I doubt that >> they would like to read all the details. Feel free to do so, and mail me >> private until there is a resolution. >> >> Eh, perhaps it is in the manual page but it is rather vague in how it >> deals with the filters internally. And how you can modify event-filters >> already in the kqueue filterlist. >> >> You currently assume that the filters you assign are internally >> summarised such that you can do >> add(read) >> add(write) >> delete(read|write) >> >> Or add(read|write) >> delete(write) >> delete(read) >> >> What I read in the man page: >> EV_ADD Adds the event to the kqueue. Re-adding an existing event >> will modify the parameters of the original event, and not >> result in a duplicate entry. Adding an event automatically >> enables it, unless overridden by the EV_DISABLE flag. >> >> EV_DELETE Removes the event from the kqueue. Events which are >> attached to file descriptors are automatically deleted on >> the last close of the descriptor. >> >> So I would read that as that you need to do any changes to the >> filter-flags doing EV_ADD. And only removing the full event should be >> done with EV_DELETE. >> And an event is identified by its "ident" only: >> ident Value used to identify this event. The exact interpretation >> is determined by the attached filter, but often is a file >> descriptor. >> So any of the other settings are not used in determining which event to >> add/modify/delete. >> >> Now the problem lies in the "modify" word in EV_ADD. I woul;d read that >> as overwrite. Your code assumes that flags are merged when doing ADD or >> DELETE. >> >> I think the code should use ADD with the settings/mask that needs to be >> there, and only DELETE when the required filtermask is actually 0. >> > > I just remember four years ago(I'm a bsd fan that time) I have written > kqueue codes(https://github.com/yuyuyu101/twemproxy/commit/68d04283ba8d58142e544868842307d66d8f4600#diff-ea1f3d7f51a1def56821fefd14eb8560R193). > > EV_SET(&event, fd, EVFILT_READ|EV_CLEAR, EV_ADD, 0, 0, data); > > I think we can follow this.... Anything you throw at me, I will test. ATM busy reworking some of the other compat code I have in a pull, so when ever you have something, let me know. --WjW ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Not matching event states in ./msg/async/AsyncConnection.cc 2016-06-23 13:16 ` Willem Jan Withagen @ 2016-06-23 15:16 ` Willem Jan Withagen 0 siblings, 0 replies; 13+ messages in thread From: Willem Jan Withagen @ 2016-06-23 15:16 UTC (permalink / raw) To: Haomai Wang; +Cc: Ceph Development On 23-6-2016 15:16, Willem Jan Withagen wrote: > On 23-6-2016 14:28, Haomai Wang wrote: >> On Thu, Jun 23, 2016 at 5:40 PM, Willem Jan Withagen <wjw@digiware.nl> wrote: >>> On 23-6-2016 09:44, Haomai Wang wrote: >>>> On Thu, Jun 23, 2016 at 3:22 PM, Willem Jan Withagen <wjw@digiware.nl> wrote: >>>>> On 23-6-2016 00:16, Willem Jan Withagen wrote: >>>>>> On 22-6-2016 16:36, Haomai Wang wrote: >>>>>>> Oh, sorry. I still realize you are testing on kqueue event backend. >>>>>>> >>>>>>> I submit a pr to fix this. plz help to verify whether it works for you >>>>>>> since I don't have bsd handy... >>>>>>> >>>>>>> https://github.com/ceph/ceph/pull/9869 >>>>>> >>>>>> I think add_event needs about the same treatment. >>>>>> Trying, Testing ATM.... >>>>> >>>>> errgh, not quite... >>>>> >>>>> It also generates errors when trying to delete EVFILT_READ (mask=2) from >>>>> an eventfilter that has both READ and WRITE set (mask=3). >>>>> Next to that the ms_async_messenger threads seem to be busy_waiting >>>>> looping and loading a full CPU core per thread. >>>>> >>>>> So I think I need some testing code to see what the requirements of >>>>> kqueue actually are, compared to what async_messenger does. >>>>> >>>>> Could be that if we want to go from (READ|WRITE) to either READ or WRITE >>>>> the event needs to be deleted first and than added anew. >>>> >>>> Hmm, I think I need to reread kqueue man page to figure out the problem... >>> >>> Hi, >>> >>> Perhaps we should take the Ceph-dev list off the CC, since I doubt that >>> they would like to read all the details. Feel free to do so, and mail me >>> private until there is a resolution. >>> >>> Eh, perhaps it is in the manual page but it is rather vague in how it >>> deals with the filters internally. And how you can modify event-filters >>> already in the kqueue filterlist. >>> >>> You currently assume that the filters you assign are internally >>> summarised such that you can do >>> add(read) >>> add(write) >>> delete(read|write) >>> >>> Or add(read|write) >>> delete(write) >>> delete(read) >>> >>> What I read in the man page: >>> EV_ADD Adds the event to the kqueue. Re-adding an existing event >>> will modify the parameters of the original event, and not >>> result in a duplicate entry. Adding an event automatically >>> enables it, unless overridden by the EV_DISABLE flag. >>> >>> EV_DELETE Removes the event from the kqueue. Events which are >>> attached to file descriptors are automatically deleted on >>> the last close of the descriptor. >>> >>> So I would read that as that you need to do any changes to the >>> filter-flags doing EV_ADD. And only removing the full event should be >>> done with EV_DELETE. >>> And an event is identified by its "ident" only: >>> ident Value used to identify this event. The exact interpretation >>> is determined by the attached filter, but often is a file >>> descriptor. >>> So any of the other settings are not used in determining which event to >>> add/modify/delete. >>> >>> Now the problem lies in the "modify" word in EV_ADD. I woul;d read that >>> as overwrite. Your code assumes that flags are merged when doing ADD or >>> DELETE. >>> >>> I think the code should use ADD with the settings/mask that needs to be >>> there, and only DELETE when the required filtermask is actually 0. >>> >> >> I just remember four years ago(I'm a bsd fan that time) I have written >> kqueue codes(https://github.com/yuyuyu101/twemproxy/commit/68d04283ba8d58142e544868842307d66d8f4600#diff-ea1f3d7f51a1def56821fefd14eb8560R193). >> >> EV_SET(&event, fd, EVFILT_READ|EV_CLEAR, EV_ADD, 0, 0, data); >> >> I think we can follow this.... > > Anything you throw at me, I will test. > ATM busy reworking some of the other compat code I have in a pull, so > when ever you have something, let me know. I'm still sort of woried about this type of sequence: 1) 2016-06-23 17:01:27.496757 811653300 -1 Event(0x81164f448 owner=0x811653300 nevent=5000 time_id=1).create_file_event create event started fd=13 mask=1 original mask is 0 2) 2016-06-23 17:01:27.496788 811653300 -1 KqueueDriver.add_event add event fd = 13 to kqfd = 10 cur_mask = 0 add_mask = 1 3) 2016-06-23 17:01:27.496792 811653300 -1 Event(0x81164f448 owner=0x811653300 nevent=5000 time_id=1).create_file_event create event end fd=13 mask=1 original mask is 1 4) 2016-06-23 17:01:27.496808 811653300 -1 Event(0x81164f448 owner=0x811653300 nevent=5000 time_id=1).delete_file_event delete event started fd=13 mask=2 original mask is 1 5) 2016-06-23 17:01:27.496821 811653300 -1 KqueueDriver.del_event delete event fd = 13 to kqfd = 10 cur_mask = 1 delmask = 2 We create an event with add_mask = 1, cur_mask = 0, as expected the we add an event with add_mask = 1, cur_mask = 1, that will work but WHY does line 4 want to delete with delmask = 2, and cur_mask = 1 It looks to me that there is something out of sync in the layers higher up in the stack. You cannot delete something that is not set?? --WjW ^ permalink raw reply [flat|nested] 13+ messages in thread
* Not matching event states in ./msg/async/AsyncConnection.cc @ 2016-06-19 21:28 Willem Jan Withagen 2016-06-20 2:40 ` Haomai Wang 0 siblings, 1 reply; 13+ messages in thread From: Willem Jan Withagen @ 2016-06-19 21:28 UTC (permalink / raw) To: Ceph Development Hi, I 've just rebased, but now I'm getting BUG warnings in del_event() calls. And I suspect that it is because of the snippet below... In STATE_CONNECTING EVENT_READABLE is set on sd, but we try a bit latter to remove EVENT_WRITABLE. Which abort since asserts were added to the kevent code. Could it be that in STATE_CONNECTING_RE EVENT_READABLE needs to be removed? --WjW ./msg/async/AsyncConnection.cc at line 1014 case STATE_CONNECTING_RE: { r = net.reconnect(get_peer_addr(), sd); if (r < 0) { ldout(async_msgr->cct, 1) << __func__ << " reconnect failed " << dendl; goto fail; } else if (r > 0) { ldout(async_msgr->cct, 10) << __func__ << " nonblock connect inprogress" << dendl; center->create_file_event(sd, EVENT_WRITABLE, read_handler); break; } lderr(async_msgr->cct) << __func__ << ":" <<__LINE__ << " delete_file_event(" << sd << ", EVENT_WRITABLE)" << dendl; center->delete_file_event(sd, EVENT_WRITABLE); state = STATE_CONNECTING_WAIT_BANNER; break; } ^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Not matching event states in ./msg/async/AsyncConnection.cc 2016-06-19 21:28 Willem Jan Withagen @ 2016-06-20 2:40 ` Haomai Wang 0 siblings, 0 replies; 13+ messages in thread From: Haomai Wang @ 2016-06-20 2:40 UTC (permalink / raw) To: Willem Jan Withagen; +Cc: Ceph Development On Mon, Jun 20, 2016 at 5:28 AM, Willem Jan Withagen <wjw@digiware.nl> wrote: > Hi, > > I 've just rebased, but now I'm getting BUG warnings in del_event() > calls. And I suspect that it is because of the snippet below... > > In STATE_CONNECTING EVENT_READABLE is set on sd, but we try a bit latter > to remove EVENT_WRITABLE. Which abort since asserts were added to the > kevent code. > > Could it be that in STATE_CONNECTING_RE EVENT_READABLE needs to be removed? https://github.com/ceph/ceph/pull/9086/commits/a74ce419133881ff8618733a0501c4a47e1368e3 > > --WjW > > > ./msg/async/AsyncConnection.cc at line 1014 > case STATE_CONNECTING_RE: > { > r = net.reconnect(get_peer_addr(), sd); > if (r < 0) { > ldout(async_msgr->cct, 1) << __func__ << " reconnect failed " > << dendl; > goto fail; > } else if (r > 0) { > ldout(async_msgr->cct, 10) << __func__ << " nonblock connect > inprogress" << dendl; > center->create_file_event(sd, EVENT_WRITABLE, read_handler); > break; > } > > lderr(async_msgr->cct) << __func__ << ":" <<__LINE__ > << " delete_file_event(" << sd << ", EVENT_WRITABLE)" > << dendl; > center->delete_file_event(sd, EVENT_WRITABLE); > state = STATE_CONNECTING_WAIT_BANNER; > break; > } > > -- > To unsubscribe from this list: send the line "unsubscribe ceph-devel" in > the body of a message to majordomo@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html ^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2016-06-23 15:16 UTC | newest] Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <0a8a87d5-777c-7e78-0400-cf1978217928@digiware.nl> 2016-06-22 13:47 ` Not matching event states in ./msg/async/AsyncConnection.cc Willem Jan Withagen 2016-06-22 14:36 ` Haomai Wang 2016-06-22 15:25 ` Willem Jan Withagen 2016-06-22 22:09 ` Willem Jan Withagen 2016-06-22 22:16 ` Willem Jan Withagen 2016-06-23 7:22 ` Willem Jan Withagen 2016-06-23 7:44 ` Haomai Wang 2016-06-23 9:40 ` Willem Jan Withagen 2016-06-23 12:28 ` Haomai Wang 2016-06-23 13:16 ` Willem Jan Withagen 2016-06-23 15:16 ` Willem Jan Withagen 2016-06-19 21:28 Willem Jan Withagen 2016-06-20 2:40 ` Haomai Wang
This is an external index of several public inboxes, see mirroring instructions on how to clone and mirror all data and code used by this external index.