linux-bluetooth.vger.kernel.org archive mirror
* BlueZ/mesh: RX not working after daemon restart (with workaround)
From: Aurelien Jarno @ 2019-11-10 20:08 UTC
  To: linux-bluetooth

Hi all,

On my system (Raspberry Pi 3), the RX path no longer works after a
restart of the bluetooth-meshd daemon. I have tracked that down to the
fact that the receive callbacks are set up before the HCI is fully
initialized. In other words, BT_HCI_CMD_LE_SET_SCAN_PARAMETERS is sent
before BT_HCI_CMD_RESET, and the callback that sends
BT_HCI_CMD_LE_SET_SCAN_ENABLE is never called. This is timing dependent
and probably not reproducible on all hardware.

I have worked around the issue by adding a small delay between the HCI
initialization and the call to node_attach_io_all():

diff --git a/mesh/mesh.c b/mesh/mesh.c
index 9b2b2073b..1c06060f9 100644
--- a/mesh/mesh.c
+++ b/mesh/mesh.c
@@ -167,6 +167,10 @@ bool mesh_init(const char *config_dir, enum mesh_io_type type, void *opts)
 	mesh_io_get_caps(mesh.io, &caps);
 	mesh.max_filters = caps.max_num_filters;
 
+	for (int i = 0 ; i < 100 ; i++) {
+		l_main_iterate(10);
+	}
+
 	node_attach_io_all(mesh.io);
 
 	return true;

I guess there is a better way to do this, either by waiting for the HCI
to be fully initialized before calling node_attach_io_all() or by using
a callback instead. However, I do not know the codebase well enough to
fix it properly.
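
A fixed iteration count either waits longer than needed or not long
enough. As a rough illustration of the "wait until initialized" idea, the
standalone toy model below iterates only until a readiness flag is set,
with an upper bound. It is not BlueZ code: struct fake_io, fake_iterate()
and wait_until_ready() are hypothetical names, and fake_iterate() merely
stands in for one main-loop iteration such as the l_main_iterate(10) call
in the diff.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the controller state; not a BlueZ structure. */
struct fake_io {
	int steps_until_ready;	/* loop iterations the simulated init still needs */
	bool ready;		/* set once the simulated init has completed */
};

/* Simulates one main-loop iteration that advances the pending init. */
static void fake_iterate(struct fake_io *io)
{
	if (io->ready)
		return;

	if (io->steps_until_ready > 0)
		io->steps_until_ready--;

	if (io->steps_until_ready == 0)
		io->ready = true;
}

/*
 * Iterate until the I/O layer reports ready or max_iter is exhausted.
 * Returns the number of iterations used, or -1 on timeout.
 */
static int wait_until_ready(struct fake_io *io, int max_iter)
{
	int i;

	assert(io != NULL && max_iter >= 0);

	for (i = 0; i < max_iter && !io->ready; i++)
		fake_iterate(io);

	return io->ready ? i : -1;
}
```

The caller would attach the nodes only when the return value is not -1.
The bound keeps start-up from hanging if the controller never comes up.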

Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

* Re: BlueZ/mesh: RX not working after daemon restart (with workaround)
From: Steve Brown @ 2019-11-10 20:59 UTC
  To: Aurelien Jarno, linux-bluetooth

I've experienced something similar on my rpi3. I found that on restart,
discover-unprovisioned stopped working.

In my case, it appears that meshd assumes that if there are existing
nodes, scanning has been enabled. Thus, calls from mesh-cfgclient to
discover additional unprovisioned nodes do not need another hci scan
enable at mesh/mesh-io-generic.c:736.

If meshd is restarted with preexisting nodes, scanning is still assumed
to already be enabled, but it's not. This breaks discover-unprovisioned 
for me.

I suspect this is a symptom of a deeper problem where
mesh/mesh-config-json.c:load_node doesn't completely reestablish the
node state that existed when the node was originally added.

Steve

* Re: BlueZ/mesh: RX not working after daemon restart (with workaround)
From: Aurelien Jarno @ 2019-11-10 21:39 UTC
  To: Steve Brown; +Cc: linux-bluetooth

Hi,

On 2019-11-10 13:59, Steve Brown wrote:
> I've experienced something similar on my rpi3. I found that on restart,
> discover-unprovisioned stopped working.

I observe the same in my case.

> In my case, it appears that meshd assumes that if there are existing
> nodes, scanning has been enabled. Thus, calls from mesh-cfgclient to
> discover additional unprovisioned nodes do not need another hci scan
> enable at mesh/mesh-io-generic.c:736.
> 
> If meshd is restarted with preexisting nodes, scanning is still assumed
> to already be enabled, but it's not. This breaks discover-unprovisioned 
> for me.

Yes, I think this is exactly my problem. If there are existing nodes,
recv_register is called before the HCI is configured, and pvt->rx_regs is
filled at mesh/mesh-io-generic.c:738. This means that scanning is later
assumed to be enabled. However, the call to bt_hci_send with
BT_HCI_CMD_LE_SET_SCAN_PARAMETERS fails because the HCI is not yet
initialized, so the callback set_recv_scan_enable(), which is supposed to
enable scanning, is never called.

So when loading a node, scanning is assumed to be enabled, but in
practice it is not.
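
To make the sequence concrete, here is a small standalone model of it. It
is only a sketch of the behaviour described above, not the code in
mesh/mesh-io-generic.c: struct toy_io and the toy_* functions are
hypothetical, and the "checked" variant is just one possible alternative
that decides from the tracked scan state rather than from the length of
the registration list.

```c
#include <stdbool.h>

/* Hypothetical toy model; field and function names are illustrative only. */
struct toy_io {
	bool hci_ready;		/* controller finished its reset/init sequence */
	int rx_regs;		/* number of registered RX filters */
	bool scan_enabled;	/* whether scanning was really turned on */
};

/* Simulated command send: fails while the controller is not initialized. */
static bool toy_send_scan_params(struct toy_io *io)
{
	if (!io->hci_ready)
		return false;	/* the completion callback never runs */

	io->scan_enabled = true;	/* stands in for the scan-enable callback */
	return true;
}

/* Modelled behaviour: a non-empty list is taken to mean "already scanning". */
static void toy_register_buggy(struct toy_io *io)
{
	bool already_scanning = io->rx_regs > 0;

	io->rx_regs++;

	if (!already_scanning)
		toy_send_scan_params(io);	/* failure goes unnoticed */
}

/* Alternative: consult the tracked scan state before skipping the enable. */
static void toy_register_checked(struct toy_io *io)
{
	io->rx_regs++;

	if (!io->scan_enabled)
		toy_send_scan_params(io);
}
```

In the buggy variant, a registration made before the controller is ready
leaves rx_regs non-zero, so every later registration skips the enable and
scanning stays off. In the checked variant, the first registration made
after the controller is ready turns scanning on.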

I believe my workaround should work on your system (maybe after
adjusting the number of iterations of the loop).

Aurelien

* Re: BlueZ/mesh: RX not working after daemon restart (with workaround)
From: Stotland, Inga @ 2019-11-12  6:44 UTC
  To: aurelien, Gix, Brian, sbrown; +Cc: linux-bluetooth

Hi Aurelien,

Thanks for the analysis. I think we should switch to a callback approach,
i.e. initialize the I/O first and register the RX callbacks from the
successful-init callback.
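
A minimal sketch of what such a callback approach could look like, as a
standalone toy model rather than a patch: the types and functions below
(cb_io, cb_mesh, cb_io_init(), cb_io_init_done(), attach_all_on_ready())
are all hypothetical and do not exist in BlueZ. cb_io_init_done() stands
in for the point where the controller initialization has completed, and
the node count stands in for node_attach_io_all().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical "init finished" callback type. */
typedef void (*cb_ready_func_t)(void *user_data, bool success);

/* Hypothetical I/O object that remembers who to notify when ready. */
struct cb_io {
	bool ready;
	cb_ready_func_t ready_cb;
	void *user_data;
};

/* Hypothetical daemon state: nodes loaded from storage vs. attached. */
struct cb_mesh {
	int stored_nodes;
	int attached_nodes;
};

/* Start initialization; nothing is attached at this point. */
static void cb_io_init(struct cb_io *io, cb_ready_func_t cb, void *user_data)
{
	io->ready = false;
	io->ready_cb = cb;
	io->user_data = user_data;
}

/* Invoked when the simulated controller init completes or fails. */
static void cb_io_init_done(struct cb_io *io, bool success)
{
	io->ready = success;

	if (io->ready_cb)
		io->ready_cb(io->user_data, success);
}

/* Ready callback: attach the stored nodes only after a successful init. */
static void attach_all_on_ready(void *user_data, bool success)
{
	struct cb_mesh *mesh = user_data;

	if (!success)
		return;

	mesh->attached_nodes = mesh->stored_nodes;
}
```

With this shape no timing guess is needed, since RX registration cannot
run before the initialization has reported success.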

Best regards,
Inga