All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: Not matching event states in ./msg/async/AsyncConnection.cc
       [not found] <0a8a87d5-777c-7e78-0400-cf1978217928@digiware.nl>
@ 2016-06-22 13:47 ` Willem Jan Withagen
  2016-06-22 14:36   ` Haomai Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Willem Jan Withagen @ 2016-06-22 13:47 UTC (permalink / raw)
  To: Ceph Development, Haomai Wang

This time also to the list.

On 20-6-2016 04:40, Haomai Wang wrote:
> On Mon, Jun 20, 2016 at 5:28 AM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>> Hi,
>>
>> I 've just rebased, but now I'm getting BUG warnings in del_event()
>> calls. And I suspect that it is because of the snippet below...
>>
>> In STATE_CONNECTING EVENT_READABLE is set on sd, but we try a bit latter
>> to remove EVENT_WRITABLE. Which abort since asserts were added to the
>> kevent code.
>>
>> Could it be that in STATE_CONNECTING_RE EVENT_READABLE needs to be removed?
> 
> https://github.com/ceph/ceph/pull/9086/commits/a74ce419133881ff8618733a0501c4a47e1368e3

'mmmm,

That piece of code is already in the code that I'm testing.
So that could be the source of problems. I don't know.
Not going to pretend I understand anything of the statmachine for
setting up connections.

>> ./msg/async/AsyncConnection.cc at line 1014
>>     case STATE_CONNECTING_RE:
>>       {
>>         r = net.reconnect(get_peer_addr(), sd);
>>         if (r < 0) {
>>           ldout(async_msgr->cct, 1) << __func__ << " reconnect failed "
>> << dendl;
>>           goto fail;
>>         } else if (r > 0) {
>>           ldout(async_msgr->cct, 10) << __func__ << " nonblock connect
>> inprogress" << dendl;
>>           center->create_file_event(sd, EVENT_WRITABLE, read_handler);
>>           break;
>>         }
>>
>>       lderr(async_msgr->cct) << __func__ << ":" <<__LINE__
>>         << " delete_file_event(" << sd << ", EVENT_WRITABLE)"
>>         << dendl;
>>         center->delete_file_event(sd, EVENT_WRITABLE);
>>         state = STATE_CONNECTING_WAIT_BANNER;
>>         break;
>>       }

If I put debug code in Event and {Epoll,Kqueue}Event, and the trace for
this file handle is:
2016-06-22 12:54:59.183258 811653300 -1 Event(0x81164f448
owner=0x811653300 nevent=5000 time_id=1).create_file_event create event
started fd=13 mask=1 original mask is 0
2016-06-22 12:54:59.183279 811653300 -1 KqueueDriver.add_event add event
fd = 13 to kqfd = 10 cur_mask = 0 add_mask = 1
2016-06-22 12:54:59.183286 811653300 -1 Event(0x81164f448
owner=0x811653300 nevent=5000 time_id=1).create_file_event create event
end fd=13 mask=1 original mask is 1
2016-06-22 12:54:59.183293 811653300 -1 Event(0x81164f448
owner=0x811653300 nevent=5000 time_id=1).delete_file_event delete event
started fd=13 mask=2 original mask is 1
2016-06-22 12:54:59.183304 811653300 -1 KqueueDriver.del_event delete
event fd = 13 to kqfd = 10 cur_mask = 1 delmask = 2
2016-06-22 12:54:59.183308 811653300 -1 KqueueDriver.del_event unable to
delete event: (2) No such file or directory. for handle: 13 kqfd handle: 10

And then we hit assert(BUG)

Looking at the e_poll output from Linux, it looks like:
2016-06-22 15:28:15.237564 7ff58d7fa700 -1 Event(0x7ff5900b1038
owner=140692617668352 nevent=5000 time_id=1).create_file_event create
event started fd=13 mask=1 original mask is 0
2016-06-22 15:28:15.237595 7ff58d7fa700 -1 Event(0x7ff5900b1038
owner=140692617668352 nevent=5000 time_id=1).create_file_event create
event end fd=13 mask=1 original mask is 1
2016-06-22 15:28:15.237609 7ff58d7fa700 -1 Event(0x7ff5900b1038
owner=140692617668352 nevent=5000 time_id=1).delete_file_event delete
event started fd=13 mask=2 original mask is 1
2016-06-22 15:28:15.237613 7ff58d7fa700 -1 EpollDriver.del_event delete
event fd = 13 to epfd = 10 cur_mask = 1 delmask = 2

The access pattern is the same, but that does not generate an error.

I think the error message actually means that there is no event in the
Keventqueue, that actually matches. And that is because EVENT_WRITABLE
is asked to be removed, but the event actually only has EVENT_READ
activated. So the EV_DELETE fails.

I have not tried this trace on Linux to see what is the result there.
Perhaps Linux e_poll stuff does not really care if you want a
non-existent filter?
But the assert is rather detrimental on FreeBSD code :(

But the other strange part for me is that the code tries to remove
events that are really not inserted....
Wether that is a problem in the current code? I do not know, but it
certainly strange.

--WjW



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Not matching event states in ./msg/async/AsyncConnection.cc
  2016-06-22 13:47 ` Not matching event states in ./msg/async/AsyncConnection.cc Willem Jan Withagen
@ 2016-06-22 14:36   ` Haomai Wang
  2016-06-22 15:25     ` Willem Jan Withagen
  2016-06-22 22:16     ` Willem Jan Withagen
  0 siblings, 2 replies; 13+ messages in thread
From: Haomai Wang @ 2016-06-22 14:36 UTC (permalink / raw)
  To: Willem Jan Withagen; +Cc: Ceph Development

Oh, sorry. I still realize you are testing on kqueue event backend.

I submit a pr to fix this. plz help to verify whether it works for you
since I don't have bsd handy...

https://github.com/ceph/ceph/pull/9869

On Wed, Jun 22, 2016 at 9:47 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
> This time also to the list.
>
> On 20-6-2016 04:40, Haomai Wang wrote:
>> On Mon, Jun 20, 2016 at 5:28 AM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>>> Hi,
>>>
>>> I 've just rebased, but now I'm getting BUG warnings in del_event()
>>> calls. And I suspect that it is because of the snippet below...
>>>
>>> In STATE_CONNECTING EVENT_READABLE is set on sd, but we try a bit latter
>>> to remove EVENT_WRITABLE. Which abort since asserts were added to the
>>> kevent code.
>>>
>>> Could it be that in STATE_CONNECTING_RE EVENT_READABLE needs to be removed?
>>
>> https://github.com/ceph/ceph/pull/9086/commits/a74ce419133881ff8618733a0501c4a47e1368e3
>
> 'mmmm,
>
> That piece of code is already in the code that I'm testing.
> So that could be the source of problems. I don't know.
> Not going to pretend I understand anything of the statmachine for
> setting up connections.
>
>>> ./msg/async/AsyncConnection.cc at line 1014
>>>     case STATE_CONNECTING_RE:
>>>       {
>>>         r = net.reconnect(get_peer_addr(), sd);
>>>         if (r < 0) {
>>>           ldout(async_msgr->cct, 1) << __func__ << " reconnect failed "
>>> << dendl;
>>>           goto fail;
>>>         } else if (r > 0) {
>>>           ldout(async_msgr->cct, 10) << __func__ << " nonblock connect
>>> inprogress" << dendl;
>>>           center->create_file_event(sd, EVENT_WRITABLE, read_handler);
>>>           break;
>>>         }
>>>
>>>       lderr(async_msgr->cct) << __func__ << ":" <<__LINE__
>>>         << " delete_file_event(" << sd << ", EVENT_WRITABLE)"
>>>         << dendl;
>>>         center->delete_file_event(sd, EVENT_WRITABLE);
>>>         state = STATE_CONNECTING_WAIT_BANNER;
>>>         break;
>>>       }
>
> If I put debug code in Event and {Epoll,Kqueue}Event, and the trace for
> this file handle is:
> 2016-06-22 12:54:59.183258 811653300 -1 Event(0x81164f448
> owner=0x811653300 nevent=5000 time_id=1).create_file_event create event
> started fd=13 mask=1 original mask is 0
> 2016-06-22 12:54:59.183279 811653300 -1 KqueueDriver.add_event add event
> fd = 13 to kqfd = 10 cur_mask = 0 add_mask = 1
> 2016-06-22 12:54:59.183286 811653300 -1 Event(0x81164f448
> owner=0x811653300 nevent=5000 time_id=1).create_file_event create event
> end fd=13 mask=1 original mask is 1
> 2016-06-22 12:54:59.183293 811653300 -1 Event(0x81164f448
> owner=0x811653300 nevent=5000 time_id=1).delete_file_event delete event
> started fd=13 mask=2 original mask is 1
> 2016-06-22 12:54:59.183304 811653300 -1 KqueueDriver.del_event delete
> event fd = 13 to kqfd = 10 cur_mask = 1 delmask = 2
> 2016-06-22 12:54:59.183308 811653300 -1 KqueueDriver.del_event unable to
> delete event: (2) No such file or directory. for handle: 13 kqfd handle: 10
>
> And then we hit assert(BUG)
>
> Looking at the e_poll output from Linux, it looks like:
> 2016-06-22 15:28:15.237564 7ff58d7fa700 -1 Event(0x7ff5900b1038
> owner=140692617668352 nevent=5000 time_id=1).create_file_event create
> event started fd=13 mask=1 original mask is 0
> 2016-06-22 15:28:15.237595 7ff58d7fa700 -1 Event(0x7ff5900b1038
> owner=140692617668352 nevent=5000 time_id=1).create_file_event create
> event end fd=13 mask=1 original mask is 1
> 2016-06-22 15:28:15.237609 7ff58d7fa700 -1 Event(0x7ff5900b1038
> owner=140692617668352 nevent=5000 time_id=1).delete_file_event delete
> event started fd=13 mask=2 original mask is 1
> 2016-06-22 15:28:15.237613 7ff58d7fa700 -1 EpollDriver.del_event delete
> event fd = 13 to epfd = 10 cur_mask = 1 delmask = 2
>
> The access pattern is the same, but that does not generate an error.
>
> I think the error message actually means that there is no event in the
> Keventqueue, that actually matches. And that is because EVENT_WRITABLE
> is asked to be removed, but the event actually only has EVENT_READ
> activated. So the EV_DELETE fails.
>
> I have not tried this trace on Linux to see what is the result there.
> Perhaps Linux e_poll stuff does not really care if you want a
> non-existent filter?
> But the assert is rather detrimental on FreeBSD code :(
>
> But the other strange part for me is that the code tries to remove
> events that are really not inserted....
> Wether that is a problem in the current code? I do not know, but it
> certainly strange.
>
> --WjW
>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Not matching event states in ./msg/async/AsyncConnection.cc
  2016-06-22 14:36   ` Haomai Wang
@ 2016-06-22 15:25     ` Willem Jan Withagen
  2016-06-22 22:09       ` Willem Jan Withagen
  2016-06-22 22:16     ` Willem Jan Withagen
  1 sibling, 1 reply; 13+ messages in thread
From: Willem Jan Withagen @ 2016-06-22 15:25 UTC (permalink / raw)
  To: Haomai Wang; +Cc: Ceph Development

On 22-6-2016 16:36, Haomai Wang wrote:
> Oh, sorry. I still realize you are testing on kqueue event backend.
> 
> I submit a pr to fix this. plz help to verify whether it works for you
> since I don't have bsd handy...
> 
> https://github.com/ceph/ceph/pull/9869

Yup,
Saw this flying by.
Got to finish the running 'make check', and then I'll put your stuff in
and let you know.

--WjW

> On Wed, Jun 22, 2016 at 9:47 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>> This time also to the list.
>>
>> On 20-6-2016 04:40, Haomai Wang wrote:
>>> On Mon, Jun 20, 2016 at 5:28 AM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>>>> Hi,
>>>>
>>>> I 've just rebased, but now I'm getting BUG warnings in del_event()
>>>> calls. And I suspect that it is because of the snippet below...
>>>>
>>>> In STATE_CONNECTING EVENT_READABLE is set on sd, but we try a bit latter
>>>> to remove EVENT_WRITABLE. Which abort since asserts were added to the
>>>> kevent code.
>>>>
>>>> Could it be that in STATE_CONNECTING_RE EVENT_READABLE needs to be removed?
>>>
>>> https://github.com/ceph/ceph/pull/9086/commits/a74ce419133881ff8618733a0501c4a47e1368e3
>>
>> 'mmmm,
>>
>> That piece of code is already in the code that I'm testing.
>> So that could be the source of problems. I don't know.
>> Not going to pretend I understand anything of the statmachine for
>> setting up connections.
>>
>>>> ./msg/async/AsyncConnection.cc at line 1014
>>>>     case STATE_CONNECTING_RE:
>>>>       {
>>>>         r = net.reconnect(get_peer_addr(), sd);
>>>>         if (r < 0) {
>>>>           ldout(async_msgr->cct, 1) << __func__ << " reconnect failed "
>>>> << dendl;
>>>>           goto fail;
>>>>         } else if (r > 0) {
>>>>           ldout(async_msgr->cct, 10) << __func__ << " nonblock connect
>>>> inprogress" << dendl;
>>>>           center->create_file_event(sd, EVENT_WRITABLE, read_handler);
>>>>           break;
>>>>         }
>>>>
>>>>       lderr(async_msgr->cct) << __func__ << ":" <<__LINE__
>>>>         << " delete_file_event(" << sd << ", EVENT_WRITABLE)"
>>>>         << dendl;
>>>>         center->delete_file_event(sd, EVENT_WRITABLE);
>>>>         state = STATE_CONNECTING_WAIT_BANNER;
>>>>         break;
>>>>       }
>>
>> If I put debug code in Event and {Epoll,Kqueue}Event, and the trace for
>> this file handle is:
>> 2016-06-22 12:54:59.183258 811653300 -1 Event(0x81164f448
>> owner=0x811653300 nevent=5000 time_id=1).create_file_event create event
>> started fd=13 mask=1 original mask is 0
>> 2016-06-22 12:54:59.183279 811653300 -1 KqueueDriver.add_event add event
>> fd = 13 to kqfd = 10 cur_mask = 0 add_mask = 1
>> 2016-06-22 12:54:59.183286 811653300 -1 Event(0x81164f448
>> owner=0x811653300 nevent=5000 time_id=1).create_file_event create event
>> end fd=13 mask=1 original mask is 1
>> 2016-06-22 12:54:59.183293 811653300 -1 Event(0x81164f448
>> owner=0x811653300 nevent=5000 time_id=1).delete_file_event delete event
>> started fd=13 mask=2 original mask is 1
>> 2016-06-22 12:54:59.183304 811653300 -1 KqueueDriver.del_event delete
>> event fd = 13 to kqfd = 10 cur_mask = 1 delmask = 2
>> 2016-06-22 12:54:59.183308 811653300 -1 KqueueDriver.del_event unable to
>> delete event: (2) No such file or directory. for handle: 13 kqfd handle: 10
>>
>> And then we hit assert(BUG)
>>
>> Looking at the e_poll output from Linux, it looks like:
>> 2016-06-22 15:28:15.237564 7ff58d7fa700 -1 Event(0x7ff5900b1038
>> owner=140692617668352 nevent=5000 time_id=1).create_file_event create
>> event started fd=13 mask=1 original mask is 0
>> 2016-06-22 15:28:15.237595 7ff58d7fa700 -1 Event(0x7ff5900b1038
>> owner=140692617668352 nevent=5000 time_id=1).create_file_event create
>> event end fd=13 mask=1 original mask is 1
>> 2016-06-22 15:28:15.237609 7ff58d7fa700 -1 Event(0x7ff5900b1038
>> owner=140692617668352 nevent=5000 time_id=1).delete_file_event delete
>> event started fd=13 mask=2 original mask is 1
>> 2016-06-22 15:28:15.237613 7ff58d7fa700 -1 EpollDriver.del_event delete
>> event fd = 13 to epfd = 10 cur_mask = 1 delmask = 2
>>
>> The access pattern is the same, but that does not generate an error.
>>
>> I think the error message actually means that there is no event in the
>> Keventqueue, that actually matches. And that is because EVENT_WRITABLE
>> is asked to be removed, but the event actually only has EVENT_READ
>> activated. So the EV_DELETE fails.
>>
>> I have not tried this trace on Linux to see what is the result there.
>> Perhaps Linux e_poll stuff does not really care if you want a
>> non-existent filter?
>> But the assert is rather detrimental on FreeBSD code :(
>>
>> But the other strange part for me is that the code tries to remove
>> events that are really not inserted....
>> Wether that is a problem in the current code? I do not know, but it
>> certainly strange.
>>
>> --WjW
>>
>>


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Not matching event states in ./msg/async/AsyncConnection.cc
  2016-06-22 15:25     ` Willem Jan Withagen
@ 2016-06-22 22:09       ` Willem Jan Withagen
  0 siblings, 0 replies; 13+ messages in thread
From: Willem Jan Withagen @ 2016-06-22 22:09 UTC (permalink / raw)
  To: Haomai Wang; +Cc: Ceph Development

On 22-6-2016 17:25, Willem Jan Withagen wrote:
> On 22-6-2016 16:36, Haomai Wang wrote:
>> Oh, sorry. I still realize you are testing on kqueue event backend.
>>
>> I submit a pr to fix this. plz help to verify whether it works for you
>> since I don't have bsd handy...
>>
>> https://github.com/ceph/ceph/pull/9869
> 
> Yup,
> Saw this flying by.
> Got to finish the running 'make check', and then I'll put your stuff in
> and let you know.

Slightly better, but still not quite there yet:
2016-06-22 23:49:22.862340 811653300 -1 KqueueDriver.add_event add event
fd = 13 to kqfd = 10 cur_mask = 0 add_mask = 1
2016-06-22 23:49:22.862346 811653300 -1 Event(0x81164f448
owner=0x811653300 nevent=5000 time_id=1).create_file_event create event
end fd=13 mask=1 original mask
 is 1
2016-06-22 23:49:22.862353 811653300 -1 Event(0x81164f448
owner=0x811653300 nevent=5000 time_id=1).delete_file_event delete event
started fd=13 mask=2 original
mask is 1
2016-06-22 23:49:22.862364 811653300 -1 KqueueDriver.del_event delete
event fd = 13 to kqfd = 10 cur_mask = 1 delmask = 2
2016-06-22 23:49:22.862367 811653300 -1 Event(0x81164f448
owner=0x811653300 nevent=5000 time_id=1).delete_file_event delete event
end fd=13 mask=2 original mask
 is 1
2016-06-22 23:49:22.862382 811653300 -1 Event(0x81164f448
owner=0x811653300 nevent=5000 time_id=1).create_file_event create event
started fd=13 mask=2 original
mask is 1
2016-06-22 23:49:22.862386 811653300 -1 KqueueDriver.add_event add event
fd = 13 to kqfd = 10 cur_mask = 1 add_mask = 2
2016-06-22 23:49:22.862391 811653300 -1 Event(0x81164f448
owner=0x811653300 nevent=5000 time_id=1).create_file_event create event
end fd=13 mask=2 original mask
 is 3
2016-06-22 23:49:22.863084 811653300 -1 Event(0x81164f448
owner=0x811653300 nevent=5000 time_id=1).delete_file_event delete event
started fd=13 mask=2 original
mask is 3
2016-06-22 23:49:22.863095 811653300 -1 KqueueDriver.del_event delete
event fd = 13 to kqfd = 10 cur_mask = 3 delmask = 2
2016-06-22 23:49:22.863100 811653300 -1 KqueueDriver.del_event unable to
delete event: (22) Invalid argument. for handle: 13 kqfd handle: 10
msg/async/Event.cc: In function 'void
EventCenter::delete_file_event(int, int)' thread 811653300 time
2016-06-22 23:49:22.863112
msg/async/Event.cc: 210: FAILED assert(0 == "del_event BUG!")
 ceph version Development (no_version)

But there is a small type in the patch, the last two lines must be:
  if (mask & EVENT_WRITABLE)
    filter |= EVFILT_WRITE;

After which there are no more asserts.
So you need to fix this.

--WjW


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Not matching event states in ./msg/async/AsyncConnection.cc
  2016-06-22 14:36   ` Haomai Wang
  2016-06-22 15:25     ` Willem Jan Withagen
@ 2016-06-22 22:16     ` Willem Jan Withagen
  2016-06-23  7:22       ` Willem Jan Withagen
  1 sibling, 1 reply; 13+ messages in thread
From: Willem Jan Withagen @ 2016-06-22 22:16 UTC (permalink / raw)
  To: Haomai Wang; +Cc: Ceph Development

On 22-6-2016 16:36, Haomai Wang wrote:
> Oh, sorry. I still realize you are testing on kqueue event backend.
> 
> I submit a pr to fix this. plz help to verify whether it works for you
> since I don't have bsd handy...
> 
> https://github.com/ceph/ceph/pull/9869

I think add_event needs about the same treatment.
Trying, Testing ATM....

--WjW


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Not matching event states in ./msg/async/AsyncConnection.cc
  2016-06-22 22:16     ` Willem Jan Withagen
@ 2016-06-23  7:22       ` Willem Jan Withagen
  2016-06-23  7:44         ` Haomai Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Willem Jan Withagen @ 2016-06-23  7:22 UTC (permalink / raw)
  To: Haomai Wang; +Cc: Ceph Development

On 23-6-2016 00:16, Willem Jan Withagen wrote:
> On 22-6-2016 16:36, Haomai Wang wrote:
>> Oh, sorry. I still realize you are testing on kqueue event backend.
>>
>> I submit a pr to fix this. plz help to verify whether it works for you
>> since I don't have bsd handy...
>>
>> https://github.com/ceph/ceph/pull/9869
> 
> I think add_event needs about the same treatment.
> Trying, Testing ATM....

errgh, not quite...

It also generates errors when trying to delete EVFILT_READ (mask=2) from
an eventfilter that has both READ and WRITE set (mask=3).
Next to that the ms_async_messenger threads seem to be busy_waiting
looping and loading a full CPU core per thread.

So I think I need some testing code to see what the requirements of
kqueue actually are, compared to what async_messenger does.

Could be that if we want to go from (READ|WRITE) to either READ or WRITE
the event needs to be deleted first and than added anew.

--WjW


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Not matching event states in ./msg/async/AsyncConnection.cc
  2016-06-23  7:22       ` Willem Jan Withagen
@ 2016-06-23  7:44         ` Haomai Wang
  2016-06-23  9:40           ` Willem Jan Withagen
  0 siblings, 1 reply; 13+ messages in thread
From: Haomai Wang @ 2016-06-23  7:44 UTC (permalink / raw)
  To: Willem Jan Withagen; +Cc: Ceph Development

On Thu, Jun 23, 2016 at 3:22 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
> On 23-6-2016 00:16, Willem Jan Withagen wrote:
>> On 22-6-2016 16:36, Haomai Wang wrote:
>>> Oh, sorry. I still realize you are testing on kqueue event backend.
>>>
>>> I submit a pr to fix this. plz help to verify whether it works for you
>>> since I don't have bsd handy...
>>>
>>> https://github.com/ceph/ceph/pull/9869
>>
>> I think add_event needs about the same treatment.
>> Trying, Testing ATM....
>
> errgh, not quite...
>
> It also generates errors when trying to delete EVFILT_READ (mask=2) from
> an eventfilter that has both READ and WRITE set (mask=3).
> Next to that the ms_async_messenger threads seem to be busy_waiting
> looping and loading a full CPU core per thread.
>
> So I think I need some testing code to see what the requirements of
> kqueue actually are, compared to what async_messenger does.
>
> Could be that if we want to go from (READ|WRITE) to either READ or WRITE
> the event needs to be deleted first and than added anew.

Hmm, I think I need to reread kqueue man page to figure out the problem...

>
> --WjW
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Not matching event states in ./msg/async/AsyncConnection.cc
  2016-06-23  7:44         ` Haomai Wang
@ 2016-06-23  9:40           ` Willem Jan Withagen
  2016-06-23 12:28             ` Haomai Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Willem Jan Withagen @ 2016-06-23  9:40 UTC (permalink / raw)
  To: Haomai Wang; +Cc: Ceph Development

On 23-6-2016 09:44, Haomai Wang wrote:
> On Thu, Jun 23, 2016 at 3:22 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>> On 23-6-2016 00:16, Willem Jan Withagen wrote:
>>> On 22-6-2016 16:36, Haomai Wang wrote:
>>>> Oh, sorry. I still realize you are testing on kqueue event backend.
>>>>
>>>> I submit a pr to fix this. plz help to verify whether it works for you
>>>> since I don't have bsd handy...
>>>>
>>>> https://github.com/ceph/ceph/pull/9869
>>>
>>> I think add_event needs about the same treatment.
>>> Trying, Testing ATM....
>>
>> errgh, not quite...
>>
>> It also generates errors when trying to delete EVFILT_READ (mask=2) from
>> an eventfilter that has both READ and WRITE set (mask=3).
>> Next to that the ms_async_messenger threads seem to be busy_waiting
>> looping and loading a full CPU core per thread.
>>
>> So I think I need some testing code to see what the requirements of
>> kqueue actually are, compared to what async_messenger does.
>>
>> Could be that if we want to go from (READ|WRITE) to either READ or WRITE
>> the event needs to be deleted first and than added anew.
> 
> Hmm, I think I need to reread kqueue man page to figure out the problem...

Hi,

Perhaps we should take the Ceph-dev list off the CC, since I doubt that
they would like to read all the details. Feel free to do so, and mail me
private until there is a resolution.

Eh, perhaps it is in the manual page but it is rather vague in how it
deals with the filters internally. And how you can modify event-filters
already in the kqueue filterlist.

You currently assume that the filters you assign are internally
summarised such that you can do
	add(read)
	add(write)
	delete(read|write)

Or	add(read|write)
	delete(write)
	delete(read)

What I read in the man page:
EV_ADD       Adds the event to the kqueue.  Re-adding an existing event
             will modify the parameters of the original event, and not
             result in a duplicate entry.  Adding an event automatically
             enables it, unless overridden by the EV_DISABLE flag.

EV_DELETE    Removes the event from the kqueue.  Events which are
             attached to file descriptors are automatically deleted on
             the last close of the descriptor.

So I would read that as that you need to do any changes to the
filter-flags doing EV_ADD. And only removing the full event should be
done with EV_DELETE.
And an event is identified by its "ident" only:
 ident      Value used to identify this event.  The exact interpretation
            is determined by the attached filter, but often is a file
            descriptor.
So any of the other settings are not used in determining which event to
add/modify/delete.

Now the problem lies in the "modify" word in EV_ADD. I woul;d read that
as overwrite. Your code assumes that flags are merged when doing ADD or
DELETE.

I think the code should use ADD with the settings/mask that needs to be
there, and only DELETE when the required filtermask is actually 0.

Does this make sense?

--WjW



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Not matching event states in ./msg/async/AsyncConnection.cc
  2016-06-23  9:40           ` Willem Jan Withagen
@ 2016-06-23 12:28             ` Haomai Wang
  2016-06-23 13:16               ` Willem Jan Withagen
  0 siblings, 1 reply; 13+ messages in thread
From: Haomai Wang @ 2016-06-23 12:28 UTC (permalink / raw)
  To: Willem Jan Withagen; +Cc: Ceph Development

On Thu, Jun 23, 2016 at 5:40 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
> On 23-6-2016 09:44, Haomai Wang wrote:
>> On Thu, Jun 23, 2016 at 3:22 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>>> On 23-6-2016 00:16, Willem Jan Withagen wrote:
>>>> On 22-6-2016 16:36, Haomai Wang wrote:
>>>>> Oh, sorry. I still realize you are testing on kqueue event backend.
>>>>>
>>>>> I submit a pr to fix this. plz help to verify whether it works for you
>>>>> since I don't have bsd handy...
>>>>>
>>>>> https://github.com/ceph/ceph/pull/9869
>>>>
>>>> I think add_event needs about the same treatment.
>>>> Trying, Testing ATM....
>>>
>>> errgh, not quite...
>>>
>>> It also generates errors when trying to delete EVFILT_READ (mask=2) from
>>> an eventfilter that has both READ and WRITE set (mask=3).
>>> Next to that the ms_async_messenger threads seem to be busy_waiting
>>> looping and loading a full CPU core per thread.
>>>
>>> So I think I need some testing code to see what the requirements of
>>> kqueue actually are, compared to what async_messenger does.
>>>
>>> Could be that if we want to go from (READ|WRITE) to either READ or WRITE
>>> the event needs to be deleted first and than added anew.
>>
>> Hmm, I think I need to reread kqueue man page to figure out the problem...
>
> Hi,
>
> Perhaps we should take the Ceph-dev list off the CC, since I doubt that
> they would like to read all the details. Feel free to do so, and mail me
> private until there is a resolution.
>
> Eh, perhaps it is in the manual page but it is rather vague in how it
> deals with the filters internally. And how you can modify event-filters
> already in the kqueue filterlist.
>
> You currently assume that the filters you assign are internally
> summarised such that you can do
>         add(read)
>         add(write)
>         delete(read|write)
>
> Or      add(read|write)
>         delete(write)
>         delete(read)
>
> What I read in the man page:
> EV_ADD       Adds the event to the kqueue.  Re-adding an existing event
>              will modify the parameters of the original event, and not
>              result in a duplicate entry.  Adding an event automatically
>              enables it, unless overridden by the EV_DISABLE flag.
>
> EV_DELETE    Removes the event from the kqueue.  Events which are
>              attached to file descriptors are automatically deleted on
>              the last close of the descriptor.
>
> So I would read that as that you need to do any changes to the
> filter-flags doing EV_ADD. And only removing the full event should be
> done with EV_DELETE.
> And an event is identified by its "ident" only:
>  ident      Value used to identify this event.  The exact interpretation
>             is determined by the attached filter, but often is a file
>             descriptor.
> So any of the other settings are not used in determining which event to
> add/modify/delete.
>
> Now the problem lies in the "modify" word in EV_ADD. I woul;d read that
> as overwrite. Your code assumes that flags are merged when doing ADD or
> DELETE.
>
> I think the code should use ADD with the settings/mask that needs to be
> there, and only DELETE when the required filtermask is actually 0.
>

I just remember four years ago(I'm a bsd fan that time) I have written
kqueue codes(https://github.com/yuyuyu101/twemproxy/commit/68d04283ba8d58142e544868842307d66d8f4600#diff-ea1f3d7f51a1def56821fefd14eb8560R193).

EV_SET(&event, fd, EVFILT_READ|EV_CLEAR, EV_ADD, 0, 0, data);

I think we can follow this....

> Does this make sense?
>
> --WjW
>
>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Not matching event states in ./msg/async/AsyncConnection.cc
  2016-06-23 12:28             ` Haomai Wang
@ 2016-06-23 13:16               ` Willem Jan Withagen
  2016-06-23 15:16                 ` Willem Jan Withagen
  0 siblings, 1 reply; 13+ messages in thread
From: Willem Jan Withagen @ 2016-06-23 13:16 UTC (permalink / raw)
  To: Haomai Wang; +Cc: Ceph Development

On 23-6-2016 14:28, Haomai Wang wrote:
> On Thu, Jun 23, 2016 at 5:40 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>> On 23-6-2016 09:44, Haomai Wang wrote:
>>> On Thu, Jun 23, 2016 at 3:22 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>>>> On 23-6-2016 00:16, Willem Jan Withagen wrote:
>>>>> On 22-6-2016 16:36, Haomai Wang wrote:
>>>>>> Oh, sorry. I still realize you are testing on kqueue event backend.
>>>>>>
>>>>>> I submit a pr to fix this. plz help to verify whether it works for you
>>>>>> since I don't have bsd handy...
>>>>>>
>>>>>> https://github.com/ceph/ceph/pull/9869
>>>>>
>>>>> I think add_event needs about the same treatment.
>>>>> Trying, Testing ATM....
>>>>
>>>> errgh, not quite...
>>>>
>>>> It also generates errors when trying to delete EVFILT_READ (mask=2) from
>>>> an eventfilter that has both READ and WRITE set (mask=3).
>>>> Next to that the ms_async_messenger threads seem to be busy_waiting
>>>> looping and loading a full CPU core per thread.
>>>>
>>>> So I think I need some testing code to see what the requirements of
>>>> kqueue actually are, compared to what async_messenger does.
>>>>
>>>> Could be that if we want to go from (READ|WRITE) to either READ or WRITE
>>>> the event needs to be deleted first and than added anew.
>>>
>>> Hmm, I think I need to reread kqueue man page to figure out the problem...
>>
>> Hi,
>>
>> Perhaps we should take the Ceph-dev list off the CC, since I doubt that
>> they would like to read all the details. Feel free to do so, and mail me
>> private until there is a resolution.
>>
>> Eh, perhaps it is in the manual page but it is rather vague in how it
>> deals with the filters internally. And how you can modify event-filters
>> already in the kqueue filterlist.
>>
>> You currently assume that the filters you assign are internally
>> summarised such that you can do
>>         add(read)
>>         add(write)
>>         delete(read|write)
>>
>> Or      add(read|write)
>>         delete(write)
>>         delete(read)
>>
>> What I read in the man page:
>> EV_ADD       Adds the event to the kqueue.  Re-adding an existing event
>>              will modify the parameters of the original event, and not
>>              result in a duplicate entry.  Adding an event automatically
>>              enables it, unless overridden by the EV_DISABLE flag.
>>
>> EV_DELETE    Removes the event from the kqueue.  Events which are
>>              attached to file descriptors are automatically deleted on
>>              the last close of the descriptor.
>>
>> So I would read that as that you need to do any changes to the
>> filter-flags doing EV_ADD. And only removing the full event should be
>> done with EV_DELETE.
>> And an event is identified by its "ident" only:
>>  ident      Value used to identify this event.  The exact interpretation
>>             is determined by the attached filter, but often is a file
>>             descriptor.
>> So any of the other settings are not used in determining which event to
>> add/modify/delete.
>>
>> Now the problem lies in the "modify" word in EV_ADD. I woul;d read that
>> as overwrite. Your code assumes that flags are merged when doing ADD or
>> DELETE.
>>
>> I think the code should use ADD with the settings/mask that needs to be
>> there, and only DELETE when the required filtermask is actually 0.
>>
> 
> I just remember four years ago(I'm a bsd fan that time) I have written
> kqueue codes(https://github.com/yuyuyu101/twemproxy/commit/68d04283ba8d58142e544868842307d66d8f4600#diff-ea1f3d7f51a1def56821fefd14eb8560R193).
> 
> EV_SET(&event, fd, EVFILT_READ|EV_CLEAR, EV_ADD, 0, 0, data);
> 
> I think we can follow this....

Anything you throw at me, I will test.
ATM busy reworking some of the other compat code I have in a pull, so
when ever you have something, let me know.

--WjW



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Not matching event states in ./msg/async/AsyncConnection.cc
  2016-06-23 13:16               ` Willem Jan Withagen
@ 2016-06-23 15:16                 ` Willem Jan Withagen
  0 siblings, 0 replies; 13+ messages in thread
From: Willem Jan Withagen @ 2016-06-23 15:16 UTC (permalink / raw)
  To: Haomai Wang; +Cc: Ceph Development

On 23-6-2016 15:16, Willem Jan Withagen wrote:
> On 23-6-2016 14:28, Haomai Wang wrote:
>> On Thu, Jun 23, 2016 at 5:40 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>>> On 23-6-2016 09:44, Haomai Wang wrote:
>>>> On Thu, Jun 23, 2016 at 3:22 PM, Willem Jan Withagen <wjw@digiware.nl> wrote:
>>>>> On 23-6-2016 00:16, Willem Jan Withagen wrote:
>>>>>> On 22-6-2016 16:36, Haomai Wang wrote:
>>>>>>> Oh, sorry. I still realize you are testing on kqueue event backend.
>>>>>>>
>>>>>>> I submit a pr to fix this. plz help to verify whether it works for you
>>>>>>> since I don't have bsd handy...
>>>>>>>
>>>>>>> https://github.com/ceph/ceph/pull/9869
>>>>>>
>>>>>> I think add_event needs about the same treatment.
>>>>>> Trying, Testing ATM....
>>>>>
>>>>> errgh, not quite...
>>>>>
>>>>> It also generates errors when trying to delete EVFILT_READ (mask=2) from
>>>>> an eventfilter that has both READ and WRITE set (mask=3).
>>>>> Next to that the ms_async_messenger threads seem to be busy_waiting
>>>>> looping and loading a full CPU core per thread.
>>>>>
>>>>> So I think I need some testing code to see what the requirements of
>>>>> kqueue actually are, compared to what async_messenger does.
>>>>>
>>>>> Could be that if we want to go from (READ|WRITE) to either READ or WRITE
>>>>> the event needs to be deleted first and than added anew.
>>>>
>>>> Hmm, I think I need to reread kqueue man page to figure out the problem...
>>>
>>> Hi,
>>>
>>> Perhaps we should take the Ceph-dev list off the CC, since I doubt that
>>> they would like to read all the details. Feel free to do so, and mail me
>>> private until there is a resolution.
>>>
>>> Eh, perhaps it is in the manual page but it is rather vague in how it
>>> deals with the filters internally. And how you can modify event-filters
>>> already in the kqueue filterlist.
>>>
>>> You currently assume that the filters you assign are internally
>>> summarised such that you can do
>>>         add(read)
>>>         add(write)
>>>         delete(read|write)
>>>
>>> Or      add(read|write)
>>>         delete(write)
>>>         delete(read)
>>>
>>> What I read in the man page:
>>> EV_ADD       Adds the event to the kqueue.  Re-adding an existing event
>>>              will modify the parameters of the original event, and not
>>>              result in a duplicate entry.  Adding an event automatically
>>>              enables it, unless overridden by the EV_DISABLE flag.
>>>
>>> EV_DELETE    Removes the event from the kqueue.  Events which are
>>>              attached to file descriptors are automatically deleted on
>>>              the last close of the descriptor.
>>>
>>> So I would read that as that you need to do any changes to the
>>> filter-flags doing EV_ADD. And only removing the full event should be
>>> done with EV_DELETE.
>>> And an event is identified by its "ident" only:
>>>  ident      Value used to identify this event.  The exact interpretation
>>>             is determined by the attached filter, but often is a file
>>>             descriptor.
>>> So any of the other settings are not used in determining which event to
>>> add/modify/delete.
>>>
>>> Now the problem lies in the "modify" word in EV_ADD. I woul;d read that
>>> as overwrite. Your code assumes that flags are merged when doing ADD or
>>> DELETE.
>>>
>>> I think the code should use ADD with the settings/mask that needs to be
>>> there, and only DELETE when the required filtermask is actually 0.
>>>
>>
>> I just remember four years ago(I'm a bsd fan that time) I have written
>> kqueue codes(https://github.com/yuyuyu101/twemproxy/commit/68d04283ba8d58142e544868842307d66d8f4600#diff-ea1f3d7f51a1def56821fefd14eb8560R193).
>>
>> EV_SET(&event, fd, EVFILT_READ|EV_CLEAR, EV_ADD, 0, 0, data);
>>
>> I think we can follow this....
> 
> Anything you throw at me, I will test.
> ATM busy reworking some of the other compat code I have in a pull, so
> when ever you have something, let me know.

I'm still sort of woried about this type of sequence:
1) 2016-06-23 17:01:27.496757 811653300 -1 Event(0x81164f448
owner=0x811653300 nevent=5000 time_id=1).create_file_event create event
started fd=13 mask=1 original mask is 0
2) 2016-06-23 17:01:27.496788 811653300 -1 KqueueDriver.add_event add
event fd = 13 to kqfd = 10 cur_mask = 0 add_mask = 1
3) 2016-06-23 17:01:27.496792 811653300 -1 Event(0x81164f448
owner=0x811653300 nevent=5000 time_id=1).create_file_event create event
end fd=13 mask=1 original mask is 1
4) 2016-06-23 17:01:27.496808 811653300 -1 Event(0x81164f448
owner=0x811653300 nevent=5000 time_id=1).delete_file_event delete event
started fd=13 mask=2 original mask is 1
5) 2016-06-23 17:01:27.496821 811653300 -1 KqueueDriver.del_event delete
event fd = 13 to kqfd = 10 cur_mask = 1 delmask = 2

We create an event with add_mask = 1, cur_mask = 0, as expected
the we add an event with add_mask = 1, cur_mask = 1, that will work

but WHY does line 4 want to delete  with delmask = 2, and cur_mask = 1

It looks to me that there is something out of sync in the layers higher
up in the stack. You cannot delete something that is not set??

--WjW




^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: Not matching event states in ./msg/async/AsyncConnection.cc
  2016-06-19 21:28 Willem Jan Withagen
@ 2016-06-20  2:40 ` Haomai Wang
  0 siblings, 0 replies; 13+ messages in thread
From: Haomai Wang @ 2016-06-20  2:40 UTC (permalink / raw)
  To: Willem Jan Withagen; +Cc: Ceph Development

On Mon, Jun 20, 2016 at 5:28 AM, Willem Jan Withagen <wjw@digiware.nl> wrote:
> Hi,
>
> I 've just rebased, but now I'm getting BUG warnings in del_event()
> calls. And I suspect that it is because of the snippet below...
>
> In STATE_CONNECTING EVENT_READABLE is set on sd, but we try a bit latter
> to remove EVENT_WRITABLE. Which abort since asserts were added to the
> kevent code.
>
> Could it be that in STATE_CONNECTING_RE EVENT_READABLE needs to be removed?

https://github.com/ceph/ceph/pull/9086/commits/a74ce419133881ff8618733a0501c4a47e1368e3


>
> --WjW
>
>
> ./msg/async/AsyncConnection.cc at line 1014
>     case STATE_CONNECTING_RE:
>       {
>         r = net.reconnect(get_peer_addr(), sd);
>         if (r < 0) {
>           ldout(async_msgr->cct, 1) << __func__ << " reconnect failed "
> << dendl;
>           goto fail;
>         } else if (r > 0) {
>           ldout(async_msgr->cct, 10) << __func__ << " nonblock connect
> inprogress" << dendl;
>           center->create_file_event(sd, EVENT_WRITABLE, read_handler);
>           break;
>         }
>
>       lderr(async_msgr->cct) << __func__ << ":" <<__LINE__
>         << " delete_file_event(" << sd << ", EVENT_WRITABLE)"
>         << dendl;
>         center->delete_file_event(sd, EVENT_WRITABLE);
>         state = STATE_CONNECTING_WAIT_BANNER;
>         break;
>       }
>
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Not matching event states in ./msg/async/AsyncConnection.cc
@ 2016-06-19 21:28 Willem Jan Withagen
  2016-06-20  2:40 ` Haomai Wang
  0 siblings, 1 reply; 13+ messages in thread
From: Willem Jan Withagen @ 2016-06-19 21:28 UTC (permalink / raw)
  To: Ceph Development

Hi,

I 've just rebased, but now I'm getting BUG warnings in del_event()
calls. And I suspect that it is because of the snippet below...

In STATE_CONNECTING EVENT_READABLE is set on sd, but we try a bit latter
to remove EVENT_WRITABLE. Which abort since asserts were added to the
kevent code.

Could it be that in STATE_CONNECTING_RE EVENT_READABLE needs to be removed?

--WjW


./msg/async/AsyncConnection.cc at line 1014
    case STATE_CONNECTING_RE:
      {
        r = net.reconnect(get_peer_addr(), sd);
        if (r < 0) {
          ldout(async_msgr->cct, 1) << __func__ << " reconnect failed "
<< dendl;
          goto fail;
        } else if (r > 0) {
          ldout(async_msgr->cct, 10) << __func__ << " nonblock connect
inprogress" << dendl;
          center->create_file_event(sd, EVENT_WRITABLE, read_handler);
          break;
        }

      lderr(async_msgr->cct) << __func__ << ":" <<__LINE__
        << " delete_file_event(" << sd << ", EVENT_WRITABLE)"
        << dendl;
        center->delete_file_event(sd, EVENT_WRITABLE);
        state = STATE_CONNECTING_WAIT_BANNER;
        break;
      }


^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-06-23 15:16 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
     [not found] <0a8a87d5-777c-7e78-0400-cf1978217928@digiware.nl>
2016-06-22 13:47 ` Not matching event states in ./msg/async/AsyncConnection.cc Willem Jan Withagen
2016-06-22 14:36   ` Haomai Wang
2016-06-22 15:25     ` Willem Jan Withagen
2016-06-22 22:09       ` Willem Jan Withagen
2016-06-22 22:16     ` Willem Jan Withagen
2016-06-23  7:22       ` Willem Jan Withagen
2016-06-23  7:44         ` Haomai Wang
2016-06-23  9:40           ` Willem Jan Withagen
2016-06-23 12:28             ` Haomai Wang
2016-06-23 13:16               ` Willem Jan Withagen
2016-06-23 15:16                 ` Willem Jan Withagen
2016-06-19 21:28 Willem Jan Withagen
2016-06-20  2:40 ` Haomai Wang

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.