* Re: Not matching event states in ./msg/async/AsyncConnection.cc
[not found] <0a8a87d5-777c-7e78-0400-cf1978217928@digiware.nl>
@ 2016-06-22 13:47 ` Willem Jan Withagen
2016-06-22 14:36 ` Haomai Wang
0 siblings, 1 reply; 13+ messages in thread
From: Willem Jan Withagen @ 2016-06-22 13:47 UTC (permalink / raw)
To: Ceph Development, Haomai Wang
This time also to the list.
On 20-6-2016 04:40, Haomai Wang wrote:
> On Mon, Jun 20, 2016 at 5:28 AM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>> Hi,
>>
>> I 've just rebased, but now I'm getting BUG warnings in del_event()
>> calls. And I suspect that it is because of the snippet below...
>>
>> In STATE_CONNECTING EVENT_READABLE is set on sd, but we try a bit latter
>> to remove EVENT_WRITABLE. Which abort since asserts were added to the
>> kevent code.
>>
>> Could it be that in STATE_CONNECTING_RE EVENT_READABLE needs to be removed?
>
> https://github.com/ceph/ceph/pull/9086/commits/a74ce419133881ff8618733a0501c4a47e1368e3
'mmmm,
That piece of code is already in the code that I'm testing.
So that could be the source of problems. I don't know.
Not going to pretend I understand anything of the statmachine for
setting up connections.
>> ./msg/async/AsyncConnection.cc at line 1014
>> case STATE_CONNECTING_RE:
>> {
>> r = net.reconnect(get_peer_addr(), sd);
>> if (r < 0) {
>> ldout(async_msgr->cct, 1) << __func__ << " reconnect failed "
>> << dendl;
>> goto fail;
>> } else if (r > 0) {
>> ldout(async_msgr->cct, 10) << __func__ << " nonblock connect
>> inprogress" << dendl;
>> center->create_file_event(sd, EVENT_WRITABLE, read_handler);
>> break;
>> }
>>
>> lderr(async_msgr->cct) << __func__ << ":" <<__LINE__
>> << " delete_file_event(" << sd << ", EVENT_WRITABLE)"
>> << dendl;
>> center->delete_file_event(sd, EVENT_WRITABLE);
>> state = STATE_CONNECTING_WAIT_BANNER;
>> break;
>> }
If I put debug code in Event and {Epoll,Kqueue}Event, and the trace for
this file handle is:
2016-06-22 12:54:59.183258 811653300 -1 Event(0x81164f448
owner=0x811653300 nevent=5000 time_id=1).create_file_event create event
started fd=13 mask=1 original mask is 0
2016-06-22 12:54:59.183279 811653300 -1 KqueueDriver.add_event add event
fd = 13 to kqfd = 10 cur_mask = 0 add_mask = 1
2016-06-22 12:54:59.183286 811653300 -1 Event(0x81164f448
owner=0x811653300 nevent=5000 time_id=1).create_file_event create event
end fd=13 mask=1 original mask is 1
2016-06-22 12:54:59.183293 811653300 -1 Event(0x81164f448
owner=0x811653300 nevent=5000 time_id=1).delete_file_event delete event
started fd=13 mask=2 original mask is 1
2016-06-22 12:54:59.183304 811653300 -1 KqueueDriver.del_event delete
event fd = 13 to kqfd = 10 cur_mask = 1 delmask = 2
2016-06-22 12:54:59.183308 811653300 -1 KqueueDriver.del_event unable to
delete event: (2) No such file or directory. for handle: 13 kqfd handle: 10
And then we hit assert(BUG)
Looking at the e_poll output from Linux, it looks like:
2016-06-22 15:28:15.237564 7ff58d7fa700 -1 Event(0x7ff5900b1038
owner=140692617668352 nevent=5000 time_id=1).create_file_event create
event started fd=13 mask=1 original mask is 0
2016-06-22 15:28:15.237595 7ff58d7fa700 -1 Event(0x7ff5900b1038
owner=140692617668352 nevent=5000 time_id=1).create_file_event create
event end fd=13 mask=1 original mask is 1
2016-06-22 15:28:15.237609 7ff58d7fa700 -1 Event(0x7ff5900b1038
owner=140692617668352 nevent=5000 time_id=1).delete_file_event delete
event started fd=13 mask=2 original mask is 1
2016-06-22 15:28:15.237613 7ff58d7fa700 -1 EpollDriver.del_event delete
event fd = 13 to epfd = 10 cur_mask = 1 delmask = 2
The access pattern is the same, but that does not generate an error.
I think the error message actually means that there is no event in the
Keventqueue, that actually matches. And that is because EVENT_WRITABLE
is asked to be removed, but the event actually only has EVENT_READ
activated. So the EV_DELETE fails.
I have not tried this trace on Linux to see what is the result there.
Perhaps Linux e_poll stuff does not really care if you want a
non-existent filter?
But the assert is rather detrimental on FreeBSD code :(
But the other strange part for me is that the code tries to remove
events that are really not inserted....
Wether that is a problem in the current code? I do not know, but it
certainly strange.
--WjW
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Not matching event states in ./msg/async/AsyncConnection.cc
2016-06-22 13:47 ` Not matching event states in ./msg/async/AsyncConnection.cc Willem Jan Withagen
@ 2016-06-22 14:36 ` Haomai Wang
2016-06-22 15:25 ` Willem Jan Withagen
2016-06-22 22:16 ` Willem Jan Withagen
0 siblings, 2 replies; 13+ messages in thread
From: Haomai Wang @ 2016-06-22 14:36 UTC (permalink / raw)
To: Willem Jan Withagen; +Cc: Ceph Development
Oh, sorry. I still realize you are testing on kqueue event backend.
I submit a pr to fix this. plz help to verify whether it works for you
since I don't have bsd handy...
https://github.com/ceph/ceph/pull/9869
On Wed, Jun 22, 2016 at 9:47 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
> This time also to the list.
>
> On 20-6-2016 04:40, Haomai Wang wrote:
>> On Mon, Jun 20, 2016 at 5:28 AM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>>> Hi,
>>>
>>> I 've just rebased, but now I'm getting BUG warnings in del_event()
>>> calls. And I suspect that it is because of the snippet below...
>>>
>>> In STATE_CONNECTING EVENT_READABLE is set on sd, but we try a bit latter
>>> to remove EVENT_WRITABLE. Which abort since asserts were added to the
>>> kevent code.
>>>
>>> Could it be that in STATE_CONNECTING_RE EVENT_READABLE needs to be removed?
>>
>> https://github.com/ceph/ceph/pull/9086/commits/a74ce419133881ff8618733a0501c4a47e1368e3
>
> 'mmmm,
>
> That piece of code is already in the code that I'm testing.
> So that could be the source of problems. I don't know.
> Not going to pretend I understand anything of the statmachine for
> setting up connections.
>
>>> ./msg/async/AsyncConnection.cc at line 1014
>>> case STATE_CONNECTING_RE:
>>> {
>>> r = net.reconnect(get_peer_addr(), sd);
>>> if (r < 0) {
>>> ldout(async_msgr->cct, 1) << __func__ << " reconnect failed "
>>> << dendl;
>>> goto fail;
>>> } else if (r > 0) {
>>> ldout(async_msgr->cct, 10) << __func__ << " nonblock connect
>>> inprogress" << dendl;
>>> center->create_file_event(sd, EVENT_WRITABLE, read_handler);
>>> break;
>>> }
>>>
>>> lderr(async_msgr->cct) << __func__ << ":" <<__LINE__
>>> << " delete_file_event(" << sd << ", EVENT_WRITABLE)"
>>> << dendl;
>>> center->delete_file_event(sd, EVENT_WRITABLE);
>>> state = STATE_CONNECTING_WAIT_BANNER;
>>> break;
>>> }
>
> If I put debug code in Event and {Epoll,Kqueue}Event, and the trace for
> this file handle is:
> 2016-06-22 12:54:59.183258 811653300 -1 Event(0x81164f448
> owner=0x811653300 nevent=5000 time_id=1).create_file_event create event
> started fd=13 mask=1 original mask is 0
> 2016-06-22 12:54:59.183279 811653300 -1 KqueueDriver.add_event add event
> fd = 13 to kqfd = 10 cur_mask = 0 add_mask = 1
> 2016-06-22 12:54:59.183286 811653300 -1 Event(0x81164f448
> owner=0x811653300 nevent=5000 time_id=1).create_file_event create event
> end fd=13 mask=1 original mask is 1
> 2016-06-22 12:54:59.183293 811653300 -1 Event(0x81164f448
> owner=0x811653300 nevent=5000 time_id=1).delete_file_event delete event
> started fd=13 mask=2 original mask is 1
> 2016-06-22 12:54:59.183304 811653300 -1 KqueueDriver.del_event delete
> event fd = 13 to kqfd = 10 cur_mask = 1 delmask = 2
> 2016-06-22 12:54:59.183308 811653300 -1 KqueueDriver.del_event unable to
> delete event: (2) No such file or directory. for handle: 13 kqfd handle: 10
>
> And then we hit assert(BUG)
>
> Looking at the e_poll output from Linux, it looks like:
> 2016-06-22 15:28:15.237564 7ff58d7fa700 -1 Event(0x7ff5900b1038
> owner=140692617668352 nevent=5000 time_id=1).create_file_event create
> event started fd=13 mask=1 original mask is 0
> 2016-06-22 15:28:15.237595 7ff58d7fa700 -1 Event(0x7ff5900b1038
> owner=140692617668352 nevent=5000 time_id=1).create_file_event create
> event end fd=13 mask=1 original mask is 1
> 2016-06-22 15:28:15.237609 7ff58d7fa700 -1 Event(0x7ff5900b1038
> owner=140692617668352 nevent=5000 time_id=1).delete_file_event delete
> event started fd=13 mask=2 original mask is 1
> 2016-06-22 15:28:15.237613 7ff58d7fa700 -1 EpollDriver.del_event delete
> event fd = 13 to epfd = 10 cur_mask = 1 delmask = 2
>
> The access pattern is the same, but that does not generate an error.
>
> I think the error message actually means that there is no event in the
> Keventqueue, that actually matches. And that is because EVENT_WRITABLE
> is asked to be removed, but the event actually only has EVENT_READ
> activated. So the EV_DELETE fails.
>
> I have not tried this trace on Linux to see what is the result there.
> Perhaps Linux e_poll stuff does not really care if you want a
> non-existent filter?
> But the assert is rather detrimental on FreeBSD code :(
>
> But the other strange part for me is that the code tries to remove
> events that are really not inserted....
> Wether that is a problem in the current code? I do not know, but it
> certainly strange.
>
> --WjW
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Not matching event states in ./msg/async/AsyncConnection.cc
2016-06-22 14:36 ` Haomai Wang
@ 2016-06-22 15:25 ` Willem Jan Withagen
2016-06-22 22:09 ` Willem Jan Withagen
2016-06-22 22:16 ` Willem Jan Withagen
1 sibling, 1 reply; 13+ messages in thread
From: Willem Jan Withagen @ 2016-06-22 15:25 UTC (permalink / raw)
To: Haomai Wang; +Cc: Ceph Development
On 22-6-2016 16:36, Haomai Wang wrote:
> Oh, sorry. I still realize you are testing on kqueue event backend.
>
> I submit a pr to fix this. plz help to verify whether it works for you
> since I don't have bsd handy...
>
> https://github.com/ceph/ceph/pull/9869
Yup,
Saw this flying by.
Got to finish the running 'make check', and then I'll put your stuff in
and let you know.
--WjW
> On Wed, Jun 22, 2016 at 9:47 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>> This time also to the list.
>>
>> On 20-6-2016 04:40, Haomai Wang wrote:
>>> On Mon, Jun 20, 2016 at 5:28 AM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>>>> Hi,
>>>>
>>>> I 've just rebased, but now I'm getting BUG warnings in del_event()
>>>> calls. And I suspect that it is because of the snippet below...
>>>>
>>>> In STATE_CONNECTING EVENT_READABLE is set on sd, but we try a bit latter
>>>> to remove EVENT_WRITABLE. Which abort since asserts were added to the
>>>> kevent code.
>>>>
>>>> Could it be that in STATE_CONNECTING_RE EVENT_READABLE needs to be removed?
>>>
>>> https://github.com/ceph/ceph/pull/9086/commits/a74ce419133881ff8618733a0501c4a47e1368e3
>>
>> 'mmmm,
>>
>> That piece of code is already in the code that I'm testing.
>> So that could be the source of problems. I don't know.
>> Not going to pretend I understand anything of the statmachine for
>> setting up connections.
>>
>>>> ./msg/async/AsyncConnection.cc at line 1014
>>>> case STATE_CONNECTING_RE:
>>>> {
>>>> r = net.reconnect(get_peer_addr(), sd);
>>>> if (r < 0) {
>>>> ldout(async_msgr->cct, 1) << __func__ << " reconnect failed "
>>>> << dendl;
>>>> goto fail;
>>>> } else if (r > 0) {
>>>> ldout(async_msgr->cct, 10) << __func__ << " nonblock connect
>>>> inprogress" << dendl;
>>>> center->create_file_event(sd, EVENT_WRITABLE, read_handler);
>>>> break;
>>>> }
>>>>
>>>> lderr(async_msgr->cct) << __func__ << ":" <<__LINE__
>>>> << " delete_file_event(" << sd << ", EVENT_WRITABLE)"
>>>> << dendl;
>>>> center->delete_file_event(sd, EVENT_WRITABLE);
>>>> state = STATE_CONNECTING_WAIT_BANNER;
>>>> break;
>>>> }
>>
>> If I put debug code in Event and {Epoll,Kqueue}Event, and the trace for
>> this file handle is:
>> 2016-06-22 12:54:59.183258 811653300 -1 Event(0x81164f448
>> owner=0x811653300 nevent=5000 time_id=1).create_file_event create event
>> started fd=13 mask=1 original mask is 0
>> 2016-06-22 12:54:59.183279 811653300 -1 KqueueDriver.add_event add event
>> fd = 13 to kqfd = 10 cur_mask = 0 add_mask = 1
>> 2016-06-22 12:54:59.183286 811653300 -1 Event(0x81164f448
>> owner=0x811653300 nevent=5000 time_id=1).create_file_event create event
>> end fd=13 mask=1 original mask is 1
>> 2016-06-22 12:54:59.183293 811653300 -1 Event(0x81164f448
>> owner=0x811653300 nevent=5000 time_id=1).delete_file_event delete event
>> started fd=13 mask=2 original mask is 1
>> 2016-06-22 12:54:59.183304 811653300 -1 KqueueDriver.del_event delete
>> event fd = 13 to kqfd = 10 cur_mask = 1 delmask = 2
>> 2016-06-22 12:54:59.183308 811653300 -1 KqueueDriver.del_event unable to
>> delete event: (2) No such file or directory. for handle: 13 kqfd handle: 10
>>
>> And then we hit assert(BUG)
>>
>> Looking at the e_poll output from Linux, it looks like:
>> 2016-06-22 15:28:15.237564 7ff58d7fa700 -1 Event(0x7ff5900b1038
>> owner=140692617668352 nevent=5000 time_id=1).create_file_event create
>> event started fd=13 mask=1 original mask is 0
>> 2016-06-22 15:28:15.237595 7ff58d7fa700 -1 Event(0x7ff5900b1038
>> owner=140692617668352 nevent=5000 time_id=1).create_file_event create
>> event end fd=13 mask=1 original mask is 1
>> 2016-06-22 15:28:15.237609 7ff58d7fa700 -1 Event(0x7ff5900b1038
>> owner=140692617668352 nevent=5000 time_id=1).delete_file_event delete
>> event started fd=13 mask=2 original mask is 1
>> 2016-06-22 15:28:15.237613 7ff58d7fa700 -1 EpollDriver.del_event delete
>> event fd = 13 to epfd = 10 cur_mask = 1 delmask = 2
>>
>> The access pattern is the same, but that does not generate an error.
>>
>> I think the error message actually means that there is no event in the
>> Keventqueue, that actually matches. And that is because EVENT_WRITABLE
>> is asked to be removed, but the event actually only has EVENT_READ
>> activated. So the EV_DELETE fails.
>>
>> I have not tried this trace on Linux to see what is the result there.
>> Perhaps Linux e_poll stuff does not really care if you want a
>> non-existent filter?
>> But the assert is rather detrimental on FreeBSD code :(
>>
>> But the other strange part for me is that the code tries to remove
>> events that are really not inserted....
>> Wether that is a problem in the current code? I do not know, but it
>> certainly strange.
>>
>> --WjW
>>
>>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Not matching event states in ./msg/async/AsyncConnection.cc
2016-06-22 15:25 ` Willem Jan Withagen
@ 2016-06-22 22:09 ` Willem Jan Withagen
0 siblings, 0 replies; 13+ messages in thread
From: Willem Jan Withagen @ 2016-06-22 22:09 UTC (permalink / raw)
To: Haomai Wang; +Cc: Ceph Development
On 22-6-2016 17:25, Willem Jan Withagen wrote:
> On 22-6-2016 16:36, Haomai Wang wrote:
>> Oh, sorry. I still realize you are testing on kqueue event backend.
>>
>> I submit a pr to fix this. plz help to verify whether it works for you
>> since I don't have bsd handy...
>>
>> https://github.com/ceph/ceph/pull/9869
>
> Yup,
> Saw this flying by.
> Got to finish the running 'make check', and then I'll put your stuff in
> and let you know.
Slightly better, but still not quite there yet:
2016-06-22 23:49:22.862340 811653300 -1 KqueueDriver.add_event add event
fd = 13 to kqfd = 10 cur_mask = 0 add_mask = 1
2016-06-22 23:49:22.862346 811653300 -1 Event(0x81164f448
owner=0x811653300 nevent=5000 time_id=1).create_file_event create event
end fd=13 mask=1 original mask
is 1
2016-06-22 23:49:22.862353 811653300 -1 Event(0x81164f448
owner=0x811653300 nevent=5000 time_id=1).delete_file_event delete event
started fd=13 mask=2 original
mask is 1
2016-06-22 23:49:22.862364 811653300 -1 KqueueDriver.del_event delete
event fd = 13 to kqfd = 10 cur_mask = 1 delmask = 2
2016-06-22 23:49:22.862367 811653300 -1 Event(0x81164f448
owner=0x811653300 nevent=5000 time_id=1).delete_file_event delete event
end fd=13 mask=2 original mask
is 1
2016-06-22 23:49:22.862382 811653300 -1 Event(0x81164f448
owner=0x811653300 nevent=5000 time_id=1).create_file_event create event
started fd=13 mask=2 original
mask is 1
2016-06-22 23:49:22.862386 811653300 -1 KqueueDriver.add_event add event
fd = 13 to kqfd = 10 cur_mask = 1 add_mask = 2
2016-06-22 23:49:22.862391 811653300 -1 Event(0x81164f448
owner=0x811653300 nevent=5000 time_id=1).create_file_event create event
end fd=13 mask=2 original mask
is 3
2016-06-22 23:49:22.863084 811653300 -1 Event(0x81164f448
owner=0x811653300 nevent=5000 time_id=1).delete_file_event delete event
started fd=13 mask=2 original
mask is 3
2016-06-22 23:49:22.863095 811653300 -1 KqueueDriver.del_event delete
event fd = 13 to kqfd = 10 cur_mask = 3 delmask = 2
2016-06-22 23:49:22.863100 811653300 -1 KqueueDriver.del_event unable to
delete event: (22) Invalid argument. for handle: 13 kqfd handle: 10
msg/async/Event.cc: In function 'void
EventCenter::delete_file_event(int, int)' thread 811653300 time
2016-06-22 23:49:22.863112
msg/async/Event.cc: 210: FAILED assert(0 == "del_event BUG!")
ceph version Development (no_version)
But there is a small type in the patch, the last two lines must be:
if (mask & EVENT_WRITABLE)
filter |= EVFILT_WRITE;
After which there are no more asserts.
So you need to fix this.
--WjW
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Not matching event states in ./msg/async/AsyncConnection.cc
2016-06-22 14:36 ` Haomai Wang
2016-06-22 15:25 ` Willem Jan Withagen
@ 2016-06-22 22:16 ` Willem Jan Withagen
2016-06-23 7:22 ` Willem Jan Withagen
1 sibling, 1 reply; 13+ messages in thread
From: Willem Jan Withagen @ 2016-06-22 22:16 UTC (permalink / raw)
To: Haomai Wang; +Cc: Ceph Development
On 22-6-2016 16:36, Haomai Wang wrote:
> Oh, sorry. I still realize you are testing on kqueue event backend.
>
> I submit a pr to fix this. plz help to verify whether it works for you
> since I don't have bsd handy...
>
> https://github.com/ceph/ceph/pull/9869
I think add_event needs about the same treatment.
Trying, Testing ATM....
--WjW
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Not matching event states in ./msg/async/AsyncConnection.cc
2016-06-22 22:16 ` Willem Jan Withagen
@ 2016-06-23 7:22 ` Willem Jan Withagen
2016-06-23 7:44 ` Haomai Wang
0 siblings, 1 reply; 13+ messages in thread
From: Willem Jan Withagen @ 2016-06-23 7:22 UTC (permalink / raw)
To: Haomai Wang; +Cc: Ceph Development
On 23-6-2016 00:16, Willem Jan Withagen wrote:
> On 22-6-2016 16:36, Haomai Wang wrote:
>> Oh, sorry. I still realize you are testing on kqueue event backend.
>>
>> I submit a pr to fix this. plz help to verify whether it works for you
>> since I don't have bsd handy...
>>
>> https://github.com/ceph/ceph/pull/9869
>
> I think add_event needs about the same treatment.
> Trying, Testing ATM....
errgh, not quite...
It also generates errors when trying to delete EVFILT_READ (mask=2) from
an eventfilter that has both READ and WRITE set (mask=3).
Next to that the ms_async_messenger threads seem to be busy_waiting
looping and loading a full CPU core per thread.
So I think I need some testing code to see what the requirements of
kqueue actually are, compared to what async_messenger does.
Could be that if we want to go from (READ|WRITE) to either READ or WRITE
the event needs to be deleted first and than added anew.
--WjW
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Not matching event states in ./msg/async/AsyncConnection.cc
2016-06-23 7:22 ` Willem Jan Withagen
@ 2016-06-23 7:44 ` Haomai Wang
2016-06-23 9:40 ` Willem Jan Withagen
0 siblings, 1 reply; 13+ messages in thread
From: Haomai Wang @ 2016-06-23 7:44 UTC (permalink / raw)
To: Willem Jan Withagen; +Cc: Ceph Development
On Thu, Jun 23, 2016 at 3:22 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
> On 23-6-2016 00:16, Willem Jan Withagen wrote:
>> On 22-6-2016 16:36, Haomai Wang wrote:
>>> Oh, sorry. I still realize you are testing on kqueue event backend.
>>>
>>> I submit a pr to fix this. plz help to verify whether it works for you
>>> since I don't have bsd handy...
>>>
>>> https://github.com/ceph/ceph/pull/9869
>>
>> I think add_event needs about the same treatment.
>> Trying, Testing ATM....
>
> errgh, not quite...
>
> It also generates errors when trying to delete EVFILT_READ (mask=2) from
> an eventfilter that has both READ and WRITE set (mask=3).
> Next to that the ms_async_messenger threads seem to be busy_waiting
> looping and loading a full CPU core per thread.
>
> So I think I need some testing code to see what the requirements of
> kqueue actually are, compared to what async_messenger does.
>
> Could be that if we want to go from (READ|WRITE) to either READ or WRITE
> the event needs to be deleted first and than added anew.
Hmm, I think I need to reread kqueue man page to figure out the problem...
>
> --WjW
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Not matching event states in ./msg/async/AsyncConnection.cc
2016-06-23 7:44 ` Haomai Wang
@ 2016-06-23 9:40 ` Willem Jan Withagen
2016-06-23 12:28 ` Haomai Wang
0 siblings, 1 reply; 13+ messages in thread
From: Willem Jan Withagen @ 2016-06-23 9:40 UTC (permalink / raw)
To: Haomai Wang; +Cc: Ceph Development
On 23-6-2016 09:44, Haomai Wang wrote:
> On Thu, Jun 23, 2016 at 3:22 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>> On 23-6-2016 00:16, Willem Jan Withagen wrote:
>>> On 22-6-2016 16:36, Haomai Wang wrote:
>>>> Oh, sorry. I still realize you are testing on kqueue event backend.
>>>>
>>>> I submit a pr to fix this. plz help to verify whether it works for you
>>>> since I don't have bsd handy...
>>>>
>>>> https://github.com/ceph/ceph/pull/9869
>>>
>>> I think add_event needs about the same treatment.
>>> Trying, Testing ATM....
>>
>> errgh, not quite...
>>
>> It also generates errors when trying to delete EVFILT_READ (mask=2) from
>> an eventfilter that has both READ and WRITE set (mask=3).
>> Next to that the ms_async_messenger threads seem to be busy_waiting
>> looping and loading a full CPU core per thread.
>>
>> So I think I need some testing code to see what the requirements of
>> kqueue actually are, compared to what async_messenger does.
>>
>> Could be that if we want to go from (READ|WRITE) to either READ or WRITE
>> the event needs to be deleted first and than added anew.
>
> Hmm, I think I need to reread kqueue man page to figure out the problem...
Hi,
Perhaps we should take the Ceph-dev list off the CC, since I doubt that
they would like to read all the details. Feel free to do so, and mail me
private until there is a resolution.
Eh, perhaps it is in the manual page but it is rather vague in how it
deals with the filters internally. And how you can modify event-filters
already in the kqueue filterlist.
You currently assume that the filters you assign are internally
summarised such that you can do
add(read)
add(write)
delete(read|write)
Or add(read|write)
delete(write)
delete(read)
What I read in the man page:
EV_ADD Adds the event to the kqueue. Re-adding an existing event
will modify the parameters of the original event, and not
result in a duplicate entry. Adding an event automatically
enables it, unless overridden by the EV_DISABLE flag.
EV_DELETE Removes the event from the kqueue. Events which are
attached to file descriptors are automatically deleted on
the last close of the descriptor.
So I would read that as that you need to do any changes to the
filter-flags doing EV_ADD. And only removing the full event should be
done with EV_DELETE.
And an event is identified by its "ident" only:
ident Value used to identify this event. The exact interpretation
is determined by the attached filter, but often is a file
descriptor.
So any of the other settings are not used in determining which event to
add/modify/delete.
Now the problem lies in the "modify" word in EV_ADD. I woul;d read that
as overwrite. Your code assumes that flags are merged when doing ADD or
DELETE.
I think the code should use ADD with the settings/mask that needs to be
there, and only DELETE when the required filtermask is actually 0.
Does this make sense?
--WjW
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Not matching event states in ./msg/async/AsyncConnection.cc
2016-06-23 9:40 ` Willem Jan Withagen
@ 2016-06-23 12:28 ` Haomai Wang
2016-06-23 13:16 ` Willem Jan Withagen
0 siblings, 1 reply; 13+ messages in thread
From: Haomai Wang @ 2016-06-23 12:28 UTC (permalink / raw)
To: Willem Jan Withagen; +Cc: Ceph Development
On Thu, Jun 23, 2016 at 5:40 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
> On 23-6-2016 09:44, Haomai Wang wrote:
>> On Thu, Jun 23, 2016 at 3:22 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>>> On 23-6-2016 00:16, Willem Jan Withagen wrote:
>>>> On 22-6-2016 16:36, Haomai Wang wrote:
>>>>> Oh, sorry. I still realize you are testing on kqueue event backend.
>>>>>
>>>>> I submit a pr to fix this. plz help to verify whether it works for you
>>>>> since I don't have bsd handy...
>>>>>
>>>>> https://github.com/ceph/ceph/pull/9869
>>>>
>>>> I think add_event needs about the same treatment.
>>>> Trying, Testing ATM....
>>>
>>> errgh, not quite...
>>>
>>> It also generates errors when trying to delete EVFILT_READ (mask=2) from
>>> an eventfilter that has both READ and WRITE set (mask=3).
>>> Next to that the ms_async_messenger threads seem to be busy_waiting
>>> looping and loading a full CPU core per thread.
>>>
>>> So I think I need some testing code to see what the requirements of
>>> kqueue actually are, compared to what async_messenger does.
>>>
>>> Could be that if we want to go from (READ|WRITE) to either READ or WRITE
>>> the event needs to be deleted first and than added anew.
>>
>> Hmm, I think I need to reread kqueue man page to figure out the problem...
>
> Hi,
>
> Perhaps we should take the Ceph-dev list off the CC, since I doubt that
> they would like to read all the details. Feel free to do so, and mail me
> private until there is a resolution.
>
> Eh, perhaps it is in the manual page but it is rather vague in how it
> deals with the filters internally. And how you can modify event-filters
> already in the kqueue filterlist.
>
> You currently assume that the filters you assign are internally
> summarised such that you can do
> add(read)
> add(write)
> delete(read|write)
>
> Or add(read|write)
> delete(write)
> delete(read)
>
> What I read in the man page:
> EV_ADD Adds the event to the kqueue. Re-adding an existing event
> will modify the parameters of the original event, and not
> result in a duplicate entry. Adding an event automatically
> enables it, unless overridden by the EV_DISABLE flag.
>
> EV_DELETE Removes the event from the kqueue. Events which are
> attached to file descriptors are automatically deleted on
> the last close of the descriptor.
>
> So I would read that as that you need to do any changes to the
> filter-flags doing EV_ADD. And only removing the full event should be
> done with EV_DELETE.
> And an event is identified by its "ident" only:
> ident Value used to identify this event. The exact interpretation
> is determined by the attached filter, but often is a file
> descriptor.
> So any of the other settings are not used in determining which event to
> add/modify/delete.
>
> Now the problem lies in the "modify" word in EV_ADD. I woul;d read that
> as overwrite. Your code assumes that flags are merged when doing ADD or
> DELETE.
>
> I think the code should use ADD with the settings/mask that needs to be
> there, and only DELETE when the required filtermask is actually 0.
>
I just remember four years ago(I'm a bsd fan that time) I have written
kqueue codes(https://github.com/yuyuyu101/twemproxy/commit/68d04283ba8d58142e544868842307d66d8f4600#diff-ea1f3d7f51a1def56821fefd14eb8560R193).
EV_SET(&event, fd, EVFILT_READ|EV_CLEAR, EV_ADD, 0, 0, data);
I think we can follow this....
> Does this make sense?
>
> --WjW
>
>
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Not matching event states in ./msg/async/AsyncConnection.cc
2016-06-23 12:28 ` Haomai Wang
@ 2016-06-23 13:16 ` Willem Jan Withagen
2016-06-23 15:16 ` Willem Jan Withagen
0 siblings, 1 reply; 13+ messages in thread
From: Willem Jan Withagen @ 2016-06-23 13:16 UTC (permalink / raw)
To: Haomai Wang; +Cc: Ceph Development
On 23-6-2016 14:28, Haomai Wang wrote:
> On Thu, Jun 23, 2016 at 5:40 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>> On 23-6-2016 09:44, Haomai Wang wrote:
>>> On Thu, Jun 23, 2016 at 3:22 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>>>> On 23-6-2016 00:16, Willem Jan Withagen wrote:
>>>>> On 22-6-2016 16:36, Haomai Wang wrote:
>>>>>> Oh, sorry. I still realize you are testing on kqueue event backend.
>>>>>>
>>>>>> I submit a pr to fix this. plz help to verify whether it works for you
>>>>>> since I don't have bsd handy...
>>>>>>
>>>>>> https://github.com/ceph/ceph/pull/9869
>>>>>
>>>>> I think add_event needs about the same treatment.
>>>>> Trying, Testing ATM....
>>>>
>>>> errgh, not quite...
>>>>
>>>> It also generates errors when trying to delete EVFILT_READ (mask=2) from
>>>> an eventfilter that has both READ and WRITE set (mask=3).
>>>> Next to that the ms_async_messenger threads seem to be busy_waiting
>>>> looping and loading a full CPU core per thread.
>>>>
>>>> So I think I need some testing code to see what the requirements of
>>>> kqueue actually are, compared to what async_messenger does.
>>>>
>>>> Could be that if we want to go from (READ|WRITE) to either READ or WRITE
>>>> the event needs to be deleted first and than added anew.
>>>
>>> Hmm, I think I need to reread kqueue man page to figure out the problem...
>>
>> Hi,
>>
>> Perhaps we should take the Ceph-dev list off the CC, since I doubt that
>> they would like to read all the details. Feel free to do so, and mail me
>> private until there is a resolution.
>>
>> Eh, perhaps it is in the manual page but it is rather vague in how it
>> deals with the filters internally. And how you can modify event-filters
>> already in the kqueue filterlist.
>>
>> You currently assume that the filters you assign are internally
>> summarised such that you can do
>> add(read)
>> add(write)
>> delete(read|write)
>>
>> Or add(read|write)
>> delete(write)
>> delete(read)
>>
>> What I read in the man page:
>> EV_ADD Adds the event to the kqueue. Re-adding an existing event
>> will modify the parameters of the original event, and not
>> result in a duplicate entry. Adding an event automatically
>> enables it, unless overridden by the EV_DISABLE flag.
>>
>> EV_DELETE Removes the event from the kqueue. Events which are
>> attached to file descriptors are automatically deleted on
>> the last close of the descriptor.
>>
>> So I would read that as that you need to do any changes to the
>> filter-flags doing EV_ADD. And only removing the full event should be
>> done with EV_DELETE.
>> And an event is identified by its "ident" only:
>> ident Value used to identify this event. The exact interpretation
>> is determined by the attached filter, but often is a file
>> descriptor.
>> So any of the other settings are not used in determining which event to
>> add/modify/delete.
>>
>> Now the problem lies in the "modify" word in EV_ADD. I woul;d read that
>> as overwrite. Your code assumes that flags are merged when doing ADD or
>> DELETE.
>>
>> I think the code should use ADD with the settings/mask that needs to be
>> there, and only DELETE when the required filtermask is actually 0.
>>
>
> I just remember four years ago(I'm a bsd fan that time) I have written
> kqueue codes(https://github.com/yuyuyu101/twemproxy/commit/68d04283ba8d58142e544868842307d66d8f4600#diff-ea1f3d7f51a1def56821fefd14eb8560R193).
>
> EV_SET(&event, fd, EVFILT_READ|EV_CLEAR, EV_ADD, 0, 0, data);
>
> I think we can follow this....
Anything you throw at me, I will test.
ATM busy reworking some of the other compat code I have in a pull, so
when ever you have something, let me know.
--WjW
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Not matching event states in ./msg/async/AsyncConnection.cc
2016-06-23 13:16 ` Willem Jan Withagen
@ 2016-06-23 15:16 ` Willem Jan Withagen
0 siblings, 0 replies; 13+ messages in thread
From: Willem Jan Withagen @ 2016-06-23 15:16 UTC (permalink / raw)
To: Haomai Wang; +Cc: Ceph Development
On 23-6-2016 15:16, Willem Jan Withagen wrote:
> On 23-6-2016 14:28, Haomai Wang wrote:
>> On Thu, Jun 23, 2016 at 5:40 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>>> On 23-6-2016 09:44, Haomai Wang wrote:
>>>> On Thu, Jun 23, 2016 at 3:22 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>>>>> On 23-6-2016 00:16, Willem Jan Withagen wrote:
>>>>>> On 22-6-2016 16:36, Haomai Wang wrote:
>>>>>>> Oh, sorry. I still realize you are testing on kqueue event backend.
>>>>>>>
>>>>>>> I submit a pr to fix this. plz help to verify whether it works for you
>>>>>>> since I don't have bsd handy...
>>>>>>>
>>>>>>> https://github.com/ceph/ceph/pull/9869
>>>>>>
>>>>>> I think add_event needs about the same treatment.
>>>>>> Trying, Testing ATM....
>>>>>
>>>>> errgh, not quite...
>>>>>
>>>>> It also generates errors when trying to delete EVFILT_READ (mask=2) from
>>>>> an eventfilter that has both READ and WRITE set (mask=3).
>>>>> Next to that the ms_async_messenger threads seem to be busy_waiting
>>>>> looping and loading a full CPU core per thread.
>>>>>
>>>>> So I think I need some testing code to see what the requirements of
>>>>> kqueue actually are, compared to what async_messenger does.
>>>>>
>>>>> Could be that if we want to go from (READ|WRITE) to either READ or WRITE
>>>>> the event needs to be deleted first and than added anew.
>>>>
>>>> Hmm, I think I need to reread kqueue man page to figure out the problem...
>>>
>>> Hi,
>>>
>>> Perhaps we should take the Ceph-dev list off the CC, since I doubt that
>>> they would like to read all the details. Feel free to do so, and mail me
>>> private until there is a resolution.
>>>
>>> Eh, perhaps it is in the manual page but it is rather vague in how it
>>> deals with the filters internally. And how you can modify event-filters
>>> already in the kqueue filterlist.
>>>
>>> You currently assume that the filters you assign are internally
>>> summarised such that you can do
>>> add(read)
>>> add(write)
>>> delete(read|write)
>>>
>>> Or add(read|write)
>>> delete(write)
>>> delete(read)
>>>
>>> What I read in the man page:
>>> EV_ADD Adds the event to the kqueue. Re-adding an existing event
>>> will modify the parameters of the original event, and not
>>> result in a duplicate entry. Adding an event automatically
>>> enables it, unless overridden by the EV_DISABLE flag.
>>>
>>> EV_DELETE Removes the event from the kqueue. Events which are
>>> attached to file descriptors are automatically deleted on
>>> the last close of the descriptor.
>>>
>>> So I would read that as that you need to do any changes to the
>>> filter-flags doing EV_ADD. And only removing the full event should be
>>> done with EV_DELETE.
>>> And an event is identified by its "ident" only:
>>> ident Value used to identify this event. The exact interpretation
>>> is determined by the attached filter, but often is a file
>>> descriptor.
>>> So any of the other settings are not used in determining which event to
>>> add/modify/delete.
>>>
>>> Now the problem lies in the "modify" word in EV_ADD. I woul;d read that
>>> as overwrite. Your code assumes that flags are merged when doing ADD or
>>> DELETE.
>>>
>>> I think the code should use ADD with the settings/mask that needs to be
>>> there, and only DELETE when the required filtermask is actually 0.
>>>
>>
>> I just remember four years ago(I'm a bsd fan that time) I have written
>> kqueue codes(https://github.com/yuyuyu101/twemproxy/commit/68d04283ba8d58142e544868842307d66d8f4600#diff-ea1f3d7f51a1def56821fefd14eb8560R193).
>>
>> EV_SET(&event, fd, EVFILT_READ|EV_CLEAR, EV_ADD, 0, 0, data);
>>
>> I think we can follow this....
>
> Anything you throw at me, I will test.
> ATM busy reworking some of the other compat code I have in a pull, so
> when ever you have something, let me know.
I'm still sort of woried about this type of sequence:
1) 2016-06-23 17:01:27.496757 811653300 -1 Event(0x81164f448
owner=0x811653300 nevent=5000 time_id=1).create_file_event create event
started fd=13 mask=1 original mask is 0
2) 2016-06-23 17:01:27.496788 811653300 -1 KqueueDriver.add_event add
event fd = 13 to kqfd = 10 cur_mask = 0 add_mask = 1
3) 2016-06-23 17:01:27.496792 811653300 -1 Event(0x81164f448
owner=0x811653300 nevent=5000 time_id=1).create_file_event create event
end fd=13 mask=1 original mask is 1
4) 2016-06-23 17:01:27.496808 811653300 -1 Event(0x81164f448
owner=0x811653300 nevent=5000 time_id=1).delete_file_event delete event
started fd=13 mask=2 original mask is 1
5) 2016-06-23 17:01:27.496821 811653300 -1 KqueueDriver.del_event delete
event fd = 13 to kqfd = 10 cur_mask = 1 delmask = 2
We create an event with add_mask = 1, cur_mask = 0, as expected
the we add an event with add_mask = 1, cur_mask = 1, that will work
but WHY does line 4 want to delete with delmask = 2, and cur_mask = 1
It looks to me that there is something out of sync in the layers higher
up in the stack. You cannot delete something that is not set??
--WjW
^ permalink raw reply [flat|nested] 13+ messages in thread
* Re: Not matching event states in ./msg/async/AsyncConnection.cc
2016-06-19 21:28 Willem Jan Withagen
@ 2016-06-20 2:40 ` Haomai Wang
0 siblings, 0 replies; 13+ messages in thread
From: Haomai Wang @ 2016-06-20 2:40 UTC (permalink / raw)
To: Willem Jan Withagen; +Cc: Ceph Development
On Mon, Jun 20, 2016 at 5:28 AM, Willem Jan Withagen <wjw@digiware.nl> wrote:
> Hi,
>
> I 've just rebased, but now I'm getting BUG warnings in del_event()
> calls. And I suspect that it is because of the snippet below...
>
> In STATE_CONNECTING EVENT_READABLE is set on sd, but we try a bit latter
> to remove EVENT_WRITABLE. Which abort since asserts were added to the
> kevent code.
>
> Could it be that in STATE_CONNECTING_RE EVENT_READABLE needs to be removed?
https://github.com/ceph/ceph/pull/9086/commits/a74ce419133881ff8618733a0501c4a47e1368e3
>
> --WjW
>
>
> ./msg/async/AsyncConnection.cc at line 1014
> case STATE_CONNECTING_RE:
> {
> r = net.reconnect(get_peer_addr(), sd);
> if (r < 0) {
> ldout(async_msgr->cct, 1) << __func__ << " reconnect failed "
> << dendl;
> goto fail;
> } else if (r > 0) {
> ldout(async_msgr->cct, 10) << __func__ << " nonblock connect
> inprogress" << dendl;
> center->create_file_event(sd, EVENT_WRITABLE, read_handler);
> break;
> }
>
> lderr(async_msgr->cct) << __func__ << ":" <<__LINE__
> << " delete_file_event(" << sd << ", EVENT_WRITABLE)"
> << dendl;
> center->delete_file_event(sd, EVENT_WRITABLE);
> state = STATE_CONNECTING_WAIT_BANNER;
> break;
> }
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at http://vger.kernel.org/majordomo-info.html
^ permalink raw reply [flat|nested] 13+ messages in thread
* Not matching event states in ./msg/async/AsyncConnection.cc
@ 2016-06-19 21:28 Willem Jan Withagen
2016-06-20 2:40 ` Haomai Wang
0 siblings, 1 reply; 13+ messages in thread
From: Willem Jan Withagen @ 2016-06-19 21:28 UTC (permalink / raw)
To: Ceph Development
Hi,
I 've just rebased, but now I'm getting BUG warnings in del_event()
calls. And I suspect that it is because of the snippet below...
In STATE_CONNECTING EVENT_READABLE is set on sd, but we try a bit latter
to remove EVENT_WRITABLE. Which abort since asserts were added to the
kevent code.
Could it be that in STATE_CONNECTING_RE EVENT_READABLE needs to be removed?
--WjW
./msg/async/AsyncConnection.cc at line 1014
case STATE_CONNECTING_RE:
{
r = net.reconnect(get_peer_addr(), sd);
if (r < 0) {
ldout(async_msgr->cct, 1) << __func__ << " reconnect failed "
<< dendl;
goto fail;
} else if (r > 0) {
ldout(async_msgr->cct, 10) << __func__ << " nonblock connect
inprogress" << dendl;
center->create_file_event(sd, EVENT_WRITABLE, read_handler);
break;
}
lderr(async_msgr->cct) << __func__ << ":" <<__LINE__
<< " delete_file_event(" << sd << ", EVENT_WRITABLE)"
<< dendl;
center->delete_file_event(sd, EVENT_WRITABLE);
state = STATE_CONNECTING_WAIT_BANNER;
break;
}
^ permalink raw reply [flat|nested] 13+ messages in thread
end of thread, other threads:[~2016-06-23 15:16 UTC | newest]
Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
[not found] <0a8a87d5-777c-7e78-0400-cf1978217928@digiware.nl>
2016-06-22 13:47 ` Not matching event states in ./msg/async/AsyncConnection.cc Willem Jan Withagen
2016-06-22 14:36 ` Haomai Wang
2016-06-22 15:25 ` Willem Jan Withagen
2016-06-22 22:09 ` Willem Jan Withagen
2016-06-22 22:16 ` Willem Jan Withagen
2016-06-23 7:22 ` Willem Jan Withagen
2016-06-23 7:44 ` Haomai Wang
2016-06-23 9:40 ` Willem Jan Withagen
2016-06-23 12:28 ` Haomai Wang
2016-06-23 13:16 ` Willem Jan Withagen
2016-06-23 15:16 ` Willem Jan Withagen
2016-06-19 21:28 Willem Jan Withagen
2016-06-20 2:40 ` Haomai Wang
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.