All of lore.kernel.org
 help / color / mirror / Atom feed
* Re: DHCP renewal stops after disconnect
@ 2022-05-12 16:37 Denis Kenzior
  0 siblings, 0 replies; 15+ messages in thread
From: Denis Kenzior @ 2022-05-12 16:37 UTC (permalink / raw)
  To: iwd

[-- Attachment #1: Type: text/plain, Size: 1635 bytes --]

Hi Michael,

On 5/12/22 11:19, Michael Johnson wrote:
> Hi Denis,
> 
> I'm trying out the ell patch on some clients and so far it's looking
> good. I also have some clients running without the patch but with

Cool

> IWD_DHCP_DEBUG enabled and I saw a very similar issue occur as the
> original major issue >
> This was caused by the retry timing out so wouldn't have happened with
> the patch but the result is not what I expected in that after we do a
> full rebind, we don't send a T1 renew. I expected that it would do a

So by a full rebind you mean:

May 12 15:03:45 r2-19006 iwd[475]: [DHCPv4] dhcp_client_t2_expired:607
May 12 15:03:45 r2-19006 iwd[475]: [DHCPv4] dhcp_client_t2_expired:614 Entering 
state: DHCP_STATE_REBINDING
May 12 15:05:01 r2-19006 iwd[475]: [DHCPv4] dhcp_client_lease_expired:596

Where we issue a l_dhcp_client_stop and l_dhcp_client_start() again.

> full state refresh and so it should go back to working after the T2
> timeout.

So the problem is that we do not reset the timer for T2 until after the T1 timer 
fires.  Each l_timeout object uses an fd from the kernel, so we try to minimize 
the number of timers active.

As you point out, the timeout value is getting set correctly:

# T1 is set correctly
May 12 15:05:02 r2-19006 iwd[475]: [DHCPv4] dhcp_client_rx_message:943 T1 
expiring in 299927 ms

So the only thing I can think of is that we're leaking the fd somehow, which 
would be pretty embarrassing.  Or we're running out of fds and can't create the 
new timer.  Running under strace or putting in some extra debugs might help?

Regards,
-Denis

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: DHCP renewal stops after disconnect
@ 2022-05-16  9:38 Michael Johnson
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Johnson @ 2022-05-16  9:38 UTC (permalink / raw)
  To: iwd

[-- Attachment #1: Type: text/plain, Size: 1765 bytes --]

Hi Denis,

I ran this over the weekend and with the extra logging I can see that
ACD error is "Bad file descriptor". I captured some additional fd info
while iwd was running in the bad state and it looks like timers are
being leaked somewhere.

```
May 14 03:29:00.008410 r2-19101 iwd[477]: [DHCPv4]
announce_wait_timeout:189 Failed to send ACD announcement: Bad file
descriptor

$ cat /proc/477/limits
Limit Soft Limit Hard Limit Units
...
Max open files 1024 524288 files

$ sudo lsof -a -p 477
...
iwd 477 root 1020u a_inode 0,13 0 13445 [timerfd]
iwd 477 root 1021u a_inode 0,13 0 13445 [timerfd]
iwd 477 root 1022u a_inode 0,13 0 13445 [timerfd]
iwd 477 root 1023u a_inode 0,13 0 13445 [timerfd]
```

https://pastebin.com/BHYVTSXT

Regards,
Michael

On Fri, 13 May 2022 at 12:27, Michael Johnson <mjohnson459(a)gmail.com> wrote:
>
> Hi Denis,
>
> I've updated some of the clients in my test environment to use your
> latest ell debug logging patch and I'll let you know what error I see.
>
> On Thu, 12 May 2022 at 18:50, Denis Kenzior <denkenz(a)gmail.com> wrote:
> > > May 12 15:05:02 r2-19006 iwd[475]: [DHCPv4] announce_wait_timeout:187
> > > Failed to send ACD announcement
> > > ```
> >
> > Yes, I saw that.  Looking at the code, we should fail to start acd if we're
> > running out of fds and this logic wouldn't be triggered...
>
> I'm not sure this is related but I notice that in normal logging there
> are never any retries of this method as this is never printed:
> ```
> announce_wait_timeout: ACD_DEBUG("Done announcing");
> ```
> which makes sense as the timeout object is NULL when "skip_probes" is true.
> ```
> l_acd_start: announce_wait_timeout(NULL, acd);
> ```
>
> Regards,
> Michael

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: DHCP renewal stops after disconnect
@ 2022-05-13 11:27 Michael Johnson
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Johnson @ 2022-05-13 11:27 UTC (permalink / raw)
  To: iwd

[-- Attachment #1: Type: text/plain, Size: 851 bytes --]

Hi Denis,

I've updated some of the clients in my test environment to use your
latest ell debug logging patch and I'll let you know what error I see.

On Thu, 12 May 2022 at 18:50, Denis Kenzior <denkenz(a)gmail.com> wrote:
> > May 12 15:05:02 r2-19006 iwd[475]: [DHCPv4] announce_wait_timeout:187
> > Failed to send ACD announcement
> > ```
>
> Yes, I saw that.  Looking at the code, we should fail to start acd if we're
> running out of fds and this logic wouldn't be triggered...

I'm not sure this is related but I notice that in normal logging there
are never any retries of this method as this is never printed:
```
announce_wait_timeout: ACD_DEBUG("Done announcing");
```
which makes sense as the timeout object is NULL when "skip_probes" is true.
```
l_acd_start: announce_wait_timeout(NULL, acd);
```

Regards,
Michael

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: DHCP renewal stops after disconnect
@ 2022-05-13  9:58 Michael Johnson
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Johnson @ 2022-05-13  9:58 UTC (permalink / raw)
  To: iwd

[-- Attachment #1: Type: text/plain, Size: 1244 bytes --]

Hi Denis,

When enabling the DHCP debug logs I also upgraded the version of iwd
and ell in my test environment to 1.27 and 0.50 respectively so the
most recent logs should be closer to you at least.

Regards,
Michael

On Thu, 12 May 2022 at 18:50, Denis Kenzior <denkenz(a)gmail.com> wrote:
>
> Hi Michael,
>
> On 5/12/22 12:02, Michael Johnson wrote:
> > Hi Denis,
> >
> > Ok, thanks! I'll try to add more debug logging and get back to you.
> > This error might be important if it is related to an fp?
> > ```
> > May 12 15:05:02 r2-19006 iwd[475]: [DHCPv4] announce_wait_timeout:166
> > No conflicts found for 10.101.8.24, announcing address
> > May 12 15:05:02 r2-19006 iwd[475]: [DHCPv4] acd_send_packet:146
> > sending packet with target IP 10.101.8.24
> > May 12 15:05:02 r2-19006 iwd[475]: [DHCPv4] announce_wait_timeout:187
> > Failed to send ACD announcement
> > ```
>
> Yes, I saw that.  Looking at the code, we should fail to start acd if we're
> running out of fds and this logic wouldn't be triggered...
>
> I'm running a full test suite under valgrind with '--track-fds=yes', but it
> hasn't found anything so far.  You are running a different version than I am
> though...
>
> Regards,
> -Denis

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: DHCP renewal stops after disconnect
@ 2022-05-12 17:50 Denis Kenzior
  0 siblings, 0 replies; 15+ messages in thread
From: Denis Kenzior @ 2022-05-12 17:50 UTC (permalink / raw)
  To: iwd

[-- Attachment #1: Type: text/plain, Size: 898 bytes --]

Hi Michael,

On 5/12/22 12:02, Michael Johnson wrote:
> Hi Denis,
> 
> Ok, thanks! I'll try to add more debug logging and get back to you.
> This error might be important if it is related to an fp?
> ```
> May 12 15:05:02 r2-19006 iwd[475]: [DHCPv4] announce_wait_timeout:166
> No conflicts found for 10.101.8.24, announcing address
> May 12 15:05:02 r2-19006 iwd[475]: [DHCPv4] acd_send_packet:146
> sending packet with target IP 10.101.8.24
> May 12 15:05:02 r2-19006 iwd[475]: [DHCPv4] announce_wait_timeout:187
> Failed to send ACD announcement
> ```

Yes, I saw that.  Looking at the code, we should fail to start acd if we're 
running out of fds and this logic wouldn't be triggered...

I'm running a full test suite under valgrind with '--track-fds=yes', but it 
hasn't found anything so far.  You are running a different version than I am 
though...

Regards,
-Denis

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: DHCP renewal stops after disconnect
@ 2022-05-12 17:02 Michael Johnson
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Johnson @ 2022-05-12 17:02 UTC (permalink / raw)
  To: iwd

[-- Attachment #1: Type: text/plain, Size: 2293 bytes --]

Hi Denis,

Ok, thanks! I'll try to add more debug logging and get back to you.
This error might be important if it is related to an fp?
```
May 12 15:05:02 r2-19006 iwd[475]: [DHCPv4] announce_wait_timeout:166
No conflicts found for 10.101.8.24, announcing address
May 12 15:05:02 r2-19006 iwd[475]: [DHCPv4] acd_send_packet:146
sending packet with target IP 10.101.8.24
May 12 15:05:02 r2-19006 iwd[475]: [DHCPv4] announce_wait_timeout:187
Failed to send ACD announcement
```

Regards,
Michael

On Thu, 12 May 2022 at 17:38, Denis Kenzior <denkenz(a)gmail.com> wrote:
>
> Hi Michael,
>
> On 5/12/22 11:19, Michael Johnson wrote:
> > Hi Denis,
> >
> > I'm trying out the ell patch on some clients and so far it's looking
> > good. I also have some clients running without the patch but with
>
> Cool
>
> > IWD_DHCP_DEBUG enabled and I saw a very similar issue occur as the
> > original major issue >
> > This was caused by the retry timing out so wouldn't have happened with
> > the patch but the result is not what I expected in that after we do a
> > full rebind, we don't send a T1 renew. I expected that it would do a
>
> So by a full rebind you mean:
>
> May 12 15:03:45 r2-19006 iwd[475]: [DHCPv4] dhcp_client_t2_expired:607
> May 12 15:03:45 r2-19006 iwd[475]: [DHCPv4] dhcp_client_t2_expired:614 Entering
> state: DHCP_STATE_REBINDING
> May 12 15:05:01 r2-19006 iwd[475]: [DHCPv4] dhcp_client_lease_expired:596
>
> Where we issue a l_dhcp_client_stop and l_dhcp_client_start() again.
>
> > full state refresh and so it should go back to working after the T2
> > timeout.
>
> So the problem is that we do not reset the timer for T2 until after the T1 timer
> fires.  Each l_timeout object uses an fd from the kernel, so we try to minimize
> the number of timers active.
>
> As you point out, the timeout value is getting set correctly:
>
> # T1 is set correctly
> May 12 15:05:02 r2-19006 iwd[475]: [DHCPv4] dhcp_client_rx_message:943 T1
> expiring in 299927 ms
>
> So the only thing I can think of is that we're leaking the fd somehow, which
> would be pretty embarrassing.  Or we're running out of fds and can't create the
> new timer.  Running under strace or putting in some extra debugs might help?
>
> Regards,
> -Denis

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: DHCP renewal stops after disconnect
@ 2022-05-12 16:19 Michael Johnson
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Johnson @ 2022-05-12 16:19 UTC (permalink / raw)
  To: iwd

[-- Attachment #1: Type: text/plain, Size: 832 bytes --]

Hi Denis,

I'm trying out the ell patch on some clients and so far it's looking
good. I also have some clients running without the patch but with
IWD_DHCP_DEBUG enabled and I saw a very similar issue occur as the
original major issue.

This was caused by the retry timing out so wouldn't have happened with
the patch but the result is not what I expected in that after we do a
full rebind, we don't send a T1 renew. I expected that it would do a
full state refresh and so it should go back to working after the T2
timeout.

https://pastebin.com/ERyQZPhr

Regards,
Michael

On Wed, 11 May 2022 at 19:52, Denis Kenzior <denkenz(a)gmail.com> wrote:
>
> Hi Michael,
>
> > I would be happy to contribute something like this if you were open to it.
>
> Sure.  I think that would be useful.
>
> Regards,
> -Denis

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: DHCP renewal stops after disconnect
@ 2022-05-11 18:52 Denis Kenzior
  0 siblings, 0 replies; 15+ messages in thread
From: Denis Kenzior @ 2022-05-11 18:52 UTC (permalink / raw)
  To: iwd

[-- Attachment #1: Type: text/plain, Size: 151 bytes --]

Hi Michael,

> I would be happy to contribute something like this if you were open to it.

Sure.  I think that would be useful.

Regards,
-Denis

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: DHCP renewal stops after disconnect
@ 2022-05-11 18:41 Michael Johnson
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Johnson @ 2022-05-11 18:41 UTC (permalink / raw)
  To: iwd

[-- Attachment #1: Type: text/plain, Size: 2437 bytes --]

Hi Denis,

Right, I was imagining a reduced set of information more user targeted
that is shown by default. Something similar to dhclient:

```
May 08 08:51:16 p3-2019 dhclient[852]: DHCPDISCOVER on wlp2s0 to
255.255.255.255 port 67 interval 3 (xid=0x8f5cca6b)
May 08 08:51:19 p3-2019 dhclient[852]: DHCPDISCOVER on wlp2s0 to
255.255.255.255 port 67 interval 5 (xid=0x8f5cca6b)
May 08 08:51:24 p3-2019 dhclient[852]: DHCPDISCOVER on wlp2s0 to
255.255.255.255 port 67 interval 13 (xid=0x8f5cca6b)
May 08 08:51:25 p3-2019 dhclient[852]: DHCPREQUEST of 10.133.65.248 on
wlp2s0 to 255.255.255.255 port 67 (xid=0x6bca5c8f)
May 08 08:51:25 p3-2019 dhclient[852]: DHCPOFFER of 10.133.65.248 from
10.133.64.2
May 08 08:51:25 p3-2019 dhclient[852]: DHCPACK of 10.133.65.248 from 10.133.64.2
May 08 08:51:25 p3-2019 dhclient[852]: bound to 10.133.65.248 --
renewal in 3419 seconds.
May 08 09:48:24 p3-2019 dhclient[1222]: DHCPREQUEST of 10.133.65.248
on wlp2s0 to 10.133.64.2 port 67 (xid=0x6bca5c8f)
May 08 09:48:24 p3-2019 dhclient[1222]: DHCPACK of 10.133.65.248 from
10.133.64.2
May 08 09:48:24 p3-2019 dhclient[1222]: bound to 10.133.65.248 --
renewal in 3400 seconds.
May 08 10:45:04 p3-2019 dhclient[1222]: DHCPREQUEST of 10.133.65.248
on wlp2s0 to 10.133.64.2 port 67 (xid=0x6bca5c8f)
May 08 10:45:04 p3-2019 dhclient[1222]: DHCPACK of 10.133.65.248 from
10.133.64.2
May 08 10:45:04 p3-2019 dhclient[1222]: bound to 10.133.65.248 --
renewal in 3159 seconds.
```

I would be happy to contribute something like this if you were open to it.

Regards,
Michael

On Wed, 11 May 2022 at 19:21, Denis Kenzior <denkenz(a)gmail.com> wrote:
>
> Hi Michael,
>
> > One thing I was wondering is if some level of info logging would be
> > possible for the dhcp client. This could be similar in verbosity to
> > dhclient or networkd which would make it easier to see what happened
> > with this kind of thing. DHCP happens on a relatively slow basis so it
> > wouldn't spam much. I attempted to add this but I couldn't really find
> > a good place for it.
>
> You can start iwd with IWD_DHCP_DEBUG set but not '-d'.  So it would only print
> the DHCP info and not anything else.
>
> Or do you mean a reduced set of information prints from the dhcp_client?  Should
> be possible by introducing something like l_dhcp_client_set_debug_level().
>
> Or do you mean something else?
>
> Regards,
> -Denis

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: DHCP renewal stops after disconnect
@ 2022-05-11 18:21 Denis Kenzior
  0 siblings, 0 replies; 15+ messages in thread
From: Denis Kenzior @ 2022-05-11 18:21 UTC (permalink / raw)
  To: iwd

[-- Attachment #1: Type: text/plain, Size: 736 bytes --]

Hi Michael,

> One thing I was wondering is if some level of info logging would be
> possible for the dhcp client. This could be similar in verbosity to
> dhclient or networkd which would make it easier to see what happened
> with this kind of thing. DHCP happens on a relatively slow basis so it
> wouldn't spam much. I attempted to add this but I couldn't really find
> a good place for it.

You can start iwd with IWD_DHCP_DEBUG set but not '-d'.  So it would only print 
the DHCP info and not anything else.

Or do you mean a reduced set of information prints from the dhcp_client?  Should 
be possible by introducing something like l_dhcp_client_set_debug_level().

Or do you mean something else?

Regards,
-Denis

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: DHCP renewal stops after disconnect
@ 2022-05-11 18:13 Michael Johnson
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Johnson @ 2022-05-11 18:13 UTC (permalink / raw)
  To: iwd

[-- Attachment #1: Type: text/plain, Size: 988 bytes --]

Hi Denis,

Thank you, I'll try it out.

One thing I was wondering is if some level of info logging would be
possible for the dhcp client. This could be similar in verbosity to
dhclient or networkd which would make it easier to see what happened
with this kind of thing. DHCP happens on a relatively slow basis so it
wouldn't spam much. I attempted to add this but I couldn't really find
a good place for it.

Regards,
Michael

On Wed, 11 May 2022 at 18:32, Denis Kenzior <denkenz(a)gmail.com> wrote:
>
> Hi Michael,
>
>
> > The lease should always be 2 hours if dhcpd is working correctly. The
> > ones I've seen are always this value.
> >
>
> Ok.
>
> > For this one I did have DHCP debug on. You seem to be right, it looks
> > like we didn't renew until we timed out completely.
> >
>
> That would seem to confirm my hunch.  I just sent out a patch to the ell mailing
> list and to you.  See if that one fixes this particular issue.
>
> Regards,
> -Denis

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: DHCP renewal stops after disconnect
@ 2022-05-11 17:32 Denis Kenzior
  0 siblings, 0 replies; 15+ messages in thread
From: Denis Kenzior @ 2022-05-11 17:32 UTC (permalink / raw)
  To: iwd

[-- Attachment #1: Type: text/plain, Size: 443 bytes --]

Hi Michael,


> The lease should always be 2 hours if dhcpd is working correctly. The
> ones I've seen are always this value.
> 

Ok.

> For this one I did have DHCP debug on. You seem to be right, it looks
> like we didn't renew until we timed out completely.
> 

That would seem to confirm my hunch.  I just sent out a patch to the ell mailing 
list and to you.  See if that one fixes this particular issue.

Regards,
-Denis

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: DHCP renewal stops after disconnect
@ 2022-05-11 16:22 Michael Johnson
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Johnson @ 2022-05-11 16:22 UTC (permalink / raw)
  To: iwd

[-- Attachment #1: Type: text/plain, Size: 2843 bytes --]

Hi Denis,

Thanks for the prompt response.

On Wed, 11 May 2022 at 16:56, Denis Kenzior <denkenz(a)gmail.com> wrote:
> Nothing jumps out at me right now.  The IWD_DHCP_DEBUG log would be helpful.

Ok, thanks. I'm still waiting for this to happen again. I tried to
reproduce locally but I've been unsuccessful. None of kicking clients,
restarting the AP, lowering the power, or changing channels caused it
on their own so there must be a hidden variable I'm missing. Or its a
race thing and I've just not hit it.

> Is the lease an inifinite lease by any chance?

The lease should always be 2 hours if dhcpd is working correctly. The
ones I've seen are always this value.

> Yes, we definitely should be doing retransmissions in the RENEWAL state.  We do
> start the retransmission timer dhcp_client_t1_expired(), so the same
> retransmission logic as during DISCOVER/REQUESTING states should apply.
>
> The interesting thing is that T2 timer expiration updates the state, but still
> relies on the retransmission timer to fire so that the REBIND request can be
> sent.  So that implies that retransmission timer is active.
>
> Have you noticed any cases where the lease was not renewed at all?

For this one I did have DHCP debug on. You seem to be right, it looks
like we didn't renew until we timed out completely.

```
May 10 13:03:59 p3-153 iwd[1327599]: [DHCPv4] dhcp_client_t1_expired:628
May 10 13:03:59 p3-153 iwd[1327599]: [DHCPv4]
dhcp_client_t1_expired:630 Entering state: DHCP_STATE_RENEWING
May 10 13:03:59 p3-153 iwd[1327599]: [DHCPv4] dhcp_client_send_request:396
May 10 13:03:59 p3-153 iwd[1327599]: [DHCPv4] dhcp_client_rx_message:791
May 10 13:03:59 p3-153 iwd[1327599]: [DHCPv4] dhcp_client_receive_ack:684
May 10 13:03:59 p3-153 iwd[1327599]: [DHCPv4]
dhcp_client_rx_message:869 Entering state: DHCP_STATE_BOUND
May 10 13:03:59 p3-153 iwd[1327599]:
src/netconfig.c:netconfig_ipv4_dhcp_event_handler() DHCPv4 event 3
May 10 13:03:59 p3-153 iwd[1327599]: [DHCPv4]
dhcp_client_rx_message:903 T1 expiring in 3600841 ms
May 10 14:04:00 p3-153 iwd[1327599]: [DHCPv4] dhcp_client_t1_expired:628
May 10 14:04:00 p3-153 iwd[1327599]: [DHCPv4]
dhcp_client_t1_expired:630 Entering state: DHCP_STATE_RENEWING
May 10 14:04:00 p3-153 iwd[1327599]: [DHCPv4] dhcp_client_send_request:396
# no response
May 10 14:49:01 p3-153 iwd[1327599]: [DHCPv4] dhcp_client_t2_expired:607
May 10 14:49:01 p3-153 iwd[1327599]: [DHCPv4]
dhcp_client_t2_expired:614 Entering state: DHCP_STATE_REBINDING
# why nothing here?
May 10 15:04:01 p3-153 iwd[1327599]: [DHCPv4] dhcp_client_lease_expired:596
May 10 15:04:01 p3-153 iwd[1327599]: [DHCPv4] dhcp_client_send_release:489
May 10 15:04:01 p3-153 iwd[1327599]: [DHCPv4] l_dhcp_client_stop:1219
Entering state: DHCP_STATE_INIT
```

Regards,
Michael

^ permalink raw reply	[flat|nested] 15+ messages in thread

* Re: DHCP renewal stops after disconnect
@ 2022-05-11 15:56 Denis Kenzior
  0 siblings, 0 replies; 15+ messages in thread
From: Denis Kenzior @ 2022-05-11 15:56 UTC (permalink / raw)
  To: iwd

[-- Attachment #1: Type: text/plain, Size: 2551 bytes --]

Hi Michael,

On 5/11/22 05:53, Michael Johnson wrote:
> Hey,
> 
> I'm encountering a couple of issues with the DHCP client in iwd
> specifically when the network environment is flaky.
> 
> The major issue I have is when the iwd client is disconnected either
> from a channel change or just a low signal. The DHCP client gets a new
> lease when it connects but never asks for a lease renewal which
> eventually means it loses it after the expiration. This paste shows
> the logs of this happening https://pastebin.com/yGJbUmVR  (I have
> enabled IWD_DHCP_DEBUG=1 to get more data but I am still waiting for
> it to happen again).

Nothing jumps out at me right now.  The IWD_DHCP_DEBUG log would be helpful.

> 
> My plan is to see if I can reproduce the issue on a smaller scale (we
> have ~130 clients running and it happens maybe twice a day) and once I
> can reproduce I will update iwd and ell to see if that fixes it (I'm
> using version 1.21 iwd and 0.47 ell). I looked through the new commits
> since my versions and there was nothing explicitly about this but
> c78ad1bb and b02c2034 look possibly related?

c78ad1bb should only affect a code path that isn't being used by iwd right now.

I don't think b02c2034 should affect this either.

> 
> For now, I wonder if you know of an obvious place that a disconnect
> might not be cleaning up the DHCP state fully which causes the T1
> timer not to be set?
> 

Is the lease an inifinite lease by any chance?

> 
> As for the second minor issue, it seems like iwd doesn't retry sending
> a DHCPREQUEST once it's passed the T1 timer and in the RENEW state
> which I believe it should based on
> http://www.tcpipguide.com/free/t_DHCPLeaseRenewalandRebindingProcesses-2.htm.
> I have seen the first request being lost due to a bad signal and then
> iwd didn't try again until T2 which is a more complete renewal. This
> doesn't cause any interruption but does cause the lease to be removed
> before being renewed.

Yes, we definitely should be doing retransmissions in the RENEWAL state.  We do 
start the retransmission timer dhcp_client_t1_expired(), so the same 
retransmission logic as during DISCOVER/REQUESTING states should apply.

The interesting thing is that T2 timer expiration updates the state, but still 
relies on the retransmission timer to fire so that the REBIND request can be 
sent.  So that implies that retransmission timer is active.

Have you noticed any cases where the lease was not renewed at all?

Regards,
-Denis

^ permalink raw reply	[flat|nested] 15+ messages in thread

* DHCP renewal stops after disconnect
@ 2022-05-11 10:53 Michael Johnson
  0 siblings, 0 replies; 15+ messages in thread
From: Michael Johnson @ 2022-05-11 10:53 UTC (permalink / raw)
  To: iwd

[-- Attachment #1: Type: text/plain, Size: 1651 bytes --]

Hey,

I'm encountering a couple of issues with the DHCP client in iwd
specifically when the network environment is flaky.

The major issue I have is when the iwd client is disconnected either
from a channel change or just a low signal. The DHCP client gets a new
lease when it connects but never asks for a lease renewal which
eventually means it loses it after the expiration. This paste shows
the logs of this happening https://pastebin.com/yGJbUmVR  (I have
enabled IWD_DHCP_DEBUG=1 to get more data but I am still waiting for
it to happen again).

My plan is to see if I can reproduce the issue on a smaller scale (we
have ~130 clients running and it happens maybe twice a day) and once I
can reproduce I will update iwd and ell to see if that fixes it (I'm
using version 1.21 iwd and 0.47 ell). I looked through the new commits
since my versions and there was nothing explicitly about this but
c78ad1bb and b02c2034 look possibly related?

For now, I wonder if you know of an obvious place that a disconnect
might not be cleaning up the DHCP state fully which causes the T1
timer not to be set?


As for the second minor issue, it seems like iwd doesn't retry sending
a DHCPREQUEST once it's passed the T1 timer and in the RENEW state
which I believe it should based on
http://www.tcpipguide.com/free/t_DHCPLeaseRenewalandRebindingProcesses-2.htm.
I have seen the first request being lost due to a bad signal and then
iwd didn't try again until T2 which is a more complete renewal. This
doesn't cause any interruption but does cause the lease to be removed
before being renewed.


Kind Regards,
Michael Johnson

^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-05-16  9:38 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-05-12 16:37 DHCP renewal stops after disconnect Denis Kenzior
  -- strict thread matches above, loose matches on Subject: below --
2022-05-16  9:38 Michael Johnson
2022-05-13 11:27 Michael Johnson
2022-05-13  9:58 Michael Johnson
2022-05-12 17:50 Denis Kenzior
2022-05-12 17:02 Michael Johnson
2022-05-12 16:19 Michael Johnson
2022-05-11 18:52 Denis Kenzior
2022-05-11 18:41 Michael Johnson
2022-05-11 18:21 Denis Kenzior
2022-05-11 18:13 Michael Johnson
2022-05-11 17:32 Denis Kenzior
2022-05-11 16:22 Michael Johnson
2022-05-11 15:56 Denis Kenzior
2022-05-11 10:53 Michael Johnson

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.