* RTM_NEWLINK not received by application when connecting multiple devices simultaneously
@ 2012-04-19 11:44 Kristian Evensen
  2012-04-19 14:26 ` Ben Greear
  0 siblings, 1 reply; 11+ messages in thread
From: Kristian Evensen @ 2012-04-19 11:44 UTC (permalink / raw)
  To: netdev

Hello,

I am working on my own network manager and have come across a problem
that I have tried to solve for the last couple of days, but with no
luck. My application works by creating a rtnl_handle, joining the
RTMGRP_LINK group, and then using select to do I/O multiplexing (on
the netlink socket) with a pipe used to control the application. I
have written my own function for receiving the packets, called
rtnl_listen_single, which is basically rtnl_listen without the
while(1) loop. When it receives an RTM_NEWLINK message, the application
calls a handler, which configures the interfaces accordingly (for
example by requesting and setting the IP address, route, netmask, ...).
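
A minimal sketch of that select() loop, using plain POSIX calls rather than libnetlink. The names wait_for_input and READY_* are hypothetical, and the control pipe stands in for whatever the application uses to receive commands.

```c
#include <assert.h>
#include <sys/select.h>
#include <unistd.h>

/* Bitmask of descriptors that became readable (hypothetical names). */
enum { READY_NONE = 0, READY_NETLINK = 1, READY_CONTROL = 2 };

/*
 * Wait until the netlink socket and/or the control pipe is readable, or
 * until timeout_ms expires. Returns a READY_* bitmask, or -1 on error.
 */
static int wait_for_input(int nl_fd, int ctl_fd, int timeout_ms)
{
    fd_set rfds;
    struct timeval tv = {
        .tv_sec = timeout_ms / 1000,
        .tv_usec = (timeout_ms % 1000) * 1000,
    };
    int maxfd = nl_fd > ctl_fd ? nl_fd : ctl_fd;
    int mask = READY_NONE;

    assert(nl_fd >= 0 && ctl_fd >= 0 && timeout_ms >= 0);
    FD_ZERO(&rfds);
    FD_SET(nl_fd, &rfds);
    FD_SET(ctl_fd, &rfds);

    /* On timeout select() returns 0 and clears the set, so mask stays 0. */
    if (select(maxfd + 1, &rfds, NULL, NULL, &tv) < 0)
        return -1;
    if (FD_ISSET(nl_fd, &rfds))
        mask |= READY_NETLINK;
    if (FD_ISSET(ctl_fd, &rfds))
        mask |= READY_CONTROL;
    return mask;
}
```

In the application, nl_fd would be the fd member of the rtnl_handle, and a READY_NETLINK result would trigger one call to the receive function (rtnl_listen_single here).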

The application works as intended when I connect interfaces one by
one. However, if I connect two interfaces "simultaneously", the
RTM_NEWLINK message for one of the interfaces is sometimes not
received. Nothing arrives at the handler. It seems to be random which
RTM_NEWLINK actually arrives. I have only been able to recreate this
problem when connecting two USB 3G modems and automatically dialing
the ISP, but I assume it would happen with other technologies as well.
What puzzles me is that both RTM_NEWLINK messages are seen by, for
example, ip monitor. This has led me to the conclusion that there is a
bug in my application. My question is therefore: are there any common
mistakes one can make, or things one can forget, that would cause a
message to get lost or not be received? Or does anyone have any tips
on where I can start looking?

In order to figure out this, I have so far tried the following:

- I have checked the return values of the various functions and made
sure that they are correct. In other words, the processing of each
nlmsg completes and every message is looked at. However, I have yet
to see an rtnetlink message with NLM_F_MULTI set.

- I have made sure that the buffer is large enough to hold a
sufficient number of packets. SO_RCVBUF is set to the default of
32768 bytes.

- The handle is still open and valid, as subsequent messages are
received successfully. For example, if I disconnect another interface,
the RTM_NEWLINK message is received.

- I have also checked that the missing message is not stuck in the
socket, as the next event does not cause any unexpected packets to be
received.
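
One way to confirm the buffer size the kernel actually granted is to read SO_RCVBUF back after setting it. A minimal sketch; set_rcvbuf is a hypothetical helper. Per socket(7), Linux doubles the requested value (after capping it at net.core.rmem_max) to leave room for bookkeeping overhead, and reports the doubled value.

```c
#include <assert.h>
#include <sys/socket.h>

/*
 * Ask for a receive buffer of `bytes` and return what the kernel reports
 * back via getsockopt(), or -1 on error.
 */
static int set_rcvbuf(int fd, int bytes)
{
    int granted = 0;
    socklen_t len = sizeof(granted);

    assert(bytes > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_RCVBUF, &bytes, sizeof(bytes)) < 0)
        return -1;
    if (getsockopt(fd, SOL_SOCKET, SO_RCVBUF, &granted, &len) < 0)
        return -1;
    return granted;
}
```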

The machine I am working on is running the 3.2.0 kernel, with Debian
6.0.4 and iproute-dev version 20100519-3.

Thanks in advance for any help!

-Kristian

* Re: RTM_NEWLINK not received by application when connecting multiple devices simultaneously
  2012-04-19 11:44 RTM_NEWLINK not received by application when connecting multiple devices simultaneously Kristian Evensen
@ 2012-04-19 14:26 ` Ben Greear
  2012-04-19 14:54   ` Stephen Hemminger
  0 siblings, 1 reply; 11+ messages in thread
From: Ben Greear @ 2012-04-19 14:26 UTC (permalink / raw)
  To: Kristian Evensen; +Cc: netdev

On 04/19/2012 04:44 AM, Kristian Evensen wrote:
> Hello,

> The application works as intended when I connect interfaces one by
> one. However, if I connect two interfaces "simultaneously", the
> RTM_NEWLINK message for one of the interfaces is sometimes not
> received. Nothing arrives at the handler. It seems to be random which
> RTM_NEWLINK actually arrives. I have only been able to recreate this
> problem when connecting two USB 3G modems and automatically dialing
> the ISP, but I assume it would happen with other technologies as well.
> What puzzles me is that both RTM_NEWLINK messages are seen by, for
> example, ip monitor. This has led me to the conclusion that there is a
> bug in my application. My question is therefore: are there any common
> mistakes one can make, or things one can forget, that would cause a
> message to get lost or not be received? Or does anyone have any tips
> on where I can start looking?

Multiple netlink msgs can be received in each read of a netlink
socket.  Maybe you are only processing the first one?

Thanks,
Ben

-- 
Ben Greear <greearb@candelatech.com>
Candela Technologies Inc  http://www.candelatech.com

* Re: RTM_NEWLINK not received by application when connecting multiple devices simultaneously
  2012-04-19 14:26 ` Ben Greear
@ 2012-04-19 14:54   ` Stephen Hemminger
  2012-04-19 15:07     ` Kristian Evensen
  0 siblings, 1 reply; 11+ messages in thread
From: Stephen Hemminger @ 2012-04-19 14:54 UTC (permalink / raw)
  To: Ben Greear; +Cc: Kristian Evensen, netdev

On Thu, 19 Apr 2012 07:26:22 -0700
Ben Greear <greearb@candelatech.com> wrote:

> On 04/19/2012 04:44 AM, Kristian Evensen wrote:
> > Hello,
> 
> > The application works as intended when I connect interfaces one by
> > one. However, if I connect two interfaces "simultaneously", the
> > RTM_NEWLINK message for one of the interfaces is sometimes not
> > received. Nothing arrives at the handler. It seems to be random which
> > RTM_NEWLINK actually arrives. I have only been able to recreate this
> > problem when connecting two USB 3G modems and automatically dialing
> > the ISP, but I assume it would happen with other technologies as well.
> > What puzzles me is that both RTM_NEWLINK messages are seen by, for
> > example, ip monitor. This has led me to the conclusion that there is a
> > bug in my application. My question is therefore: are there any common
> > mistakes one can make, or things one can forget, that would cause a
> > message to get lost or not be received? Or does anyone have any tips
> > on where I can start looking?
> 
> Multiple netlink msgs can be received in each read of a netlink
> socket.  Maybe you are only processing the first one?

I recommend using the libmnl library instead of the older
libnetlink. The code is cleaner and it handles error cases better.
In libmnl there is a callback interface for parsing
netlink messages.

* Re: RTM_NEWLINK not received by application when connecting multiple devices simultaneously
  2012-04-19 14:54   ` Stephen Hemminger
@ 2012-04-19 15:07     ` Kristian Evensen
  2012-04-19 19:54       ` Kristian Evensen
  0 siblings, 1 reply; 11+ messages in thread
From: Kristian Evensen @ 2012-04-19 15:07 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Ben Greear, netdev

Thank you very much for both your replies.

>>
>> Multiple netlink msgs can be received in each read of a netlink
>> socket.  Maybe you are only processing the first one?

My code already processes every message in the buffer. I also have
debug output for the test "nlmsg_flags & NLM_F_MULTI", but it never
evaluates to true.

> I recommend using the libmnl library instead of the older
> libnetlink. The code is cleaner and it handles error cases better.
> In libmnl there is a callback interface for parsing
> netlink messages.

Thank you for the tip, I will try this as it will make my code much cleaner.

Btw, I hadn't paid attention to the age of the package I used. However,
upgrading to the latest version available in the Debian repository
(20120319) has no effect.

-Kristian

* Re: RTM_NEWLINK not received by application when connecting multiple devices simultaneously
  2012-04-19 15:07     ` Kristian Evensen
@ 2012-04-19 19:54       ` Kristian Evensen
  2012-04-19 20:36         ` Stephen Hemminger
  0 siblings, 1 reply; 11+ messages in thread
From: Kristian Evensen @ 2012-04-19 19:54 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Ben Greear, netdev

I spent some more time debugging this now. It turns out that the
interface seen by my application is not random; it is always the
first one that is connected. This suggests that the netlink message
contains information about more than one interface, but I have not
been able to prove this.

When I check for the presence of NLM_F_MULTI, it is never set. Also,
the length of the received nlmsg (including payload) always equals the
number of bytes I receive from the netlink socket, i.e., all the data
is received.

Based on my understanding of netlink, an nlmsg will only contain one
packet (for example ifinfomsg), unless the NLM_F_MULTI flag is set. Or
am I mistaken?

-Kristian

* Re: RTM_NEWLINK not received by application when connecting multiple devices simultaneously
  2012-04-19 19:54       ` Kristian Evensen
@ 2012-04-19 20:36         ` Stephen Hemminger
  2012-04-20 14:07           ` Kristian Evensen
  0 siblings, 1 reply; 11+ messages in thread
From: Stephen Hemminger @ 2012-04-19 20:36 UTC (permalink / raw)
  To: Kristian Evensen; +Cc: Ben Greear, netdev

On Thu, 19 Apr 2012 21:54:24 +0200
Kristian Evensen <kristian.evensen@gmail.com> wrote:

> I spent some more time debugging this now. It turns out that the
> interface seen by my application is not random; it is always the
> first one that is connected. This suggests that the netlink message
> contains information about more than one interface, but I have not
> been able to prove this.

There is no filtering. A dump request always returns all interfaces.
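
For reference, a minimal sketch of such a dump request, built by hand with the kernel UAPI headers; struct link_dump_req and build_link_dump are hypothetical names. NLM_F_DUMP asks for every interface, and the zeroed ifinfomsg carries no filter.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>

/* Hypothetical wrapper: netlink header immediately followed by an ifinfomsg. */
struct link_dump_req {
    struct nlmsghdr nlh;
    struct ifinfomsg ifm;
};

/*
 * Fill in an RTM_GETLINK request. NLM_F_DUMP asks for every interface;
 * the zeroed ifinfomsg (family AF_UNSPEC) applies no filter.
 */
static void build_link_dump(struct link_dump_req *req, unsigned int seq)
{
    assert(req != NULL);
    memset(req, 0, sizeof(*req));
    req->nlh.nlmsg_len = NLMSG_LENGTH(sizeof(struct ifinfomsg));
    req->nlh.nlmsg_type = RTM_GETLINK;
    req->nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
    req->nlh.nlmsg_seq = seq;
    req->ifm.ifi_family = AF_UNSPEC;
}
```

The request would be sent with send() on a NETLINK_ROUTE socket; the reply arrives as a series of RTM_NEWLINK messages terminated by NLMSG_DONE.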

> When I check for the presence of NLM_F_MULTI, it is never set. Also,
> the length of the received nlmsg (including payload) always equals the
> number of bytes I receive from the netlink socket, i.e., all the data
> is received.

The flags for nested and multi are relatively new. The original, ancient
rtnetlink message formats don't use them, so don't depend on them.
Since the rtnetlink ABI is cast in concrete, this can't be fixed.

> Based on my understanding of netlink, an nlmsg will only contain one
> packet (for example ifinfomsg), unless the NLM_F_MULTI flag is set. Or
> am I mistaken?

The only reliable way to parse the response to a GET request is
to keep reading until you see NLMSG_DONE (or NLMSG_ERROR).

Look at the example in the libmnl source: examples/rtnl/rtnl-link-dump.c

* Re: RTM_NEWLINK not received by application when connecting multiple devices simultaneously
  2012-04-19 20:36         ` Stephen Hemminger
@ 2012-04-20 14:07           ` Kristian Evensen
  2012-04-20 15:14             ` Stephen Hemminger
  0 siblings, 1 reply; 11+ messages in thread
From: Kristian Evensen @ 2012-04-20 14:07 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Ben Greear, netdev

Thank you very much for all the help.

>> I spent some more time debugging this now. It turns out that the
>> interface seen by my application is not random; it is always the
>> first one that is connected. This suggests that the netlink message
>> contains information about more than one interface, but I have not
>> been able to prove this.
>
> There is no filtering. A dump request always returns all interfaces.

I think maybe I didn't explain myself clearly enough; I am sorry. This
does not happen when I dump interface info, but when two interfaces
are connected at the same time.

I have now converted the application to use libmnl, but I still see
the same error. There is one message that is not received by my
netlink socket, or at least that the application does not process.
Just in case it was bundled together with the message about the first
interface, I tried adding a loop around the receive call (similar to
the one in the rtnl-link-dump example). However, this did not help.

-Kristian

* Re: RTM_NEWLINK not received by application when connecting multiple devices simultaneously
  2012-04-20 14:07           ` Kristian Evensen
@ 2012-04-20 15:14             ` Stephen Hemminger
  2012-04-20 15:54               ` Kristian Evensen
  0 siblings, 1 reply; 11+ messages in thread
From: Stephen Hemminger @ 2012-04-20 15:14 UTC (permalink / raw)
  To: Kristian Evensen; +Cc: Ben Greear, netdev

On Fri, 20 Apr 2012 16:07:42 +0200
Kristian Evensen <kristian.evensen@gmail.com> wrote:

> Thank you very much for all the help.
> 
> >> I spent some more time debugging this now. It turns out that the
> >> interface seen by my application is not random; it is always the
> >> first one that is connected. This suggests that the netlink message
> >> contains information about more than one interface, but I have not
> >> been able to prove this.
> >
> > There is no filtering. A dump request always returns all interfaces.
> 
> I think maybe I didn't explain myself clearly enough; I am sorry. This
> does not happen when I dump interface info, but when two interfaces
> are connected at the same time.
> 
> I have now converted the application to use libmnl, but I still see
> the same error. There is one message that is not received by my
> netlink socket, or at least that the application does not process.
> Just in case it was bundled together with the message about the first
> interface, I tried adding a loop around the receive call (similar to
> the one in the rtnl-link-dump example). However, this did not help.
> 
> -Kristian

Check whether the ip command catches the events (ip monitor).
What type of device is it? And what do you mean by connecting
(i.e. installing, doing admin up, an operstate transition, carrier)?

* Re: RTM_NEWLINK not received by application when connecting multiple devices simultaneously
  2012-04-20 15:14             ` Stephen Hemminger
@ 2012-04-20 15:54               ` Kristian Evensen
  2012-04-20 16:04                 ` Kristian Evensen
  0 siblings, 1 reply; 11+ messages in thread
From: Kristian Evensen @ 2012-04-20 15:54 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Ben Greear, netdev

On Fri, Apr 20, 2012 at 5:14 PM, Stephen Hemminger
<shemminger@vyatta.com> wrote:
>
> Check whether the ip command catches the events (ip monitor).

ip monitor catches the event, also when I run ip monitor link, i.e.,
the message is sent to the RTMGRP_LINK group, which the application
subscribes to.

> What type of device is it? And what do you mean by connecting
> (i.e. installing, doing admin up, an operstate transition, carrier)?

The devices are USB 3G modems (Huawei E173). By connecting, I mean
that ppp has established a successful connection, i.e., the device is
seen as IFF_RUNNING/IF_OPER_UP.

-Kristian

* Re: RTM_NEWLINK not received by application when connecting multiple devices simultaneously
  2012-04-20 15:54               ` Kristian Evensen
@ 2012-04-20 16:04                 ` Kristian Evensen
  2012-04-20 16:15                   ` Stephen Hemminger
  0 siblings, 1 reply; 11+ messages in thread
From: Kristian Evensen @ 2012-04-20 16:04 UTC (permalink / raw)
  To: Stephen Hemminger; +Cc: Ben Greear, netdev

I have uploaded a compressed version of my listener code here:
http://pastebin.com/f1NPYGSh

The other netlink-related operation my application performs is
configuring the sockets. Could it be that the configuration messages
somehow disturb the retrieval of other messages, as they all use the
same mnl_sock? None of them request any replies (they are all
RTM_NEW*/NLM_F_CREATE), but I am not sure about the internal workings
of netlink.

* Re: RTM_NEWLINK not received by application when connecting multiple devices simultaneously
  2012-04-20 16:04                 ` Kristian Evensen
@ 2012-04-20 16:15                   ` Stephen Hemminger
  0 siblings, 0 replies; 11+ messages in thread
From: Stephen Hemminger @ 2012-04-20 16:15 UTC (permalink / raw)
  To: Kristian Evensen; +Cc: Ben Greear, netdev

On Fri, 20 Apr 2012 18:04:49 +0200
Kristian Evensen <kristian.evensen@gmail.com> wrote:

> I have uploaded a compressed version of my listener code here:
> http://pastebin.com/f1NPYGSh
> 
> The other netlink-related operation my application performs is
> configuring the sockets. Could it be that the configuration messages
> somehow disturb the retrieval of other messages, as they all use the
> same mnl_sock? None of them request any replies (they are all
> RTM_NEW*/NLM_F_CREATE), but I am not sure about the internal workings
> of netlink.

Most applications that listen on netlink use two sockets: one for
events, the other for requests. The problem is that your own requests
will generate events, and when using the same socket it is hard to
tell what to expect next: the response (ACK) or the async event.
