All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 1/3] push/pop errno in initrd read file path
@ 2016-03-11 16:28 Josef Bacik
  2016-03-11 16:28 ` [PATCH 2/3] tcp: add a dprintf for opening tcp connections Josef Bacik
                   ` (3 more replies)
  0 siblings, 4 replies; 13+ messages in thread
From: Josef Bacik @ 2016-03-11 16:28 UTC (permalink / raw)
  To: kernel-team, grub-devel

If you try to load an initrd from http and it errors out we will free the initrd
context but continue on because net_tcp_socket_close() will reset the grub_errno
as will grub_initrd_close().  So we'll lose the errno and return GRUB_ERR_NONE
instead of the original error.  Add push/pulls to the appropriate places so we
don't lose our errno.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 grub-core/loader/linux.c | 2 ++
 grub-core/net/http.c     | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/grub-core/loader/linux.c b/grub-core/loader/linux.c
index be6fa0f..bd61ca2 100644
--- a/grub-core/loader/linux.c
+++ b/grub-core/loader/linux.c
@@ -202,7 +202,9 @@ grub_initrd_init (int argc, char *argv[],
       initrd_ctx->components[i].file = grub_file_open (fname);
       if (!initrd_ctx->components[i].file)
 	{
+	  grub_error_push ();
 	  grub_initrd_close (initrd_ctx);
+	  grub_error_pop ();
 	  return grub_errno;
 	}
       initrd_ctx->nfiles++;
diff --git a/grub-core/net/http.c b/grub-core/net/http.c
index 4684f8b..0eeb2f6 100644
--- a/grub-core/net/http.c
+++ b/grub-core/net/http.c
@@ -406,7 +406,9 @@ http_establish (struct grub_file *file, grub_off_t offset, int initial)
   err = grub_net_send_tcp_packet (data->sock, nb, 1);
   if (err)
     {
+      grub_error_push ();
       grub_net_tcp_close (data->sock, GRUB_NET_TCP_ABORT);
+      grub_error_pop ();
       return err;
     }
 
-- 
2.5.0



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 2/3] tcp: add a dprintf for opening tcp connections
  2016-03-11 16:28 [PATCH 1/3] push/pop errno in initrd read file path Josef Bacik
@ 2016-03-11 16:28 ` Josef Bacik
  2016-03-11 16:28 ` [PATCH 3/3] pxenet: process transmit interrupts when out of resources Josef Bacik
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 13+ messages in thread
From: Josef Bacik @ 2016-03-11 16:28 UTC (permalink / raw)
  To: kernel-team, grub-devel

In debugging strange timeouts and other network problems it has been helpful to
see when we're opening new connections to get an idea of where we're having a
breakdown.

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 grub-core/net/tcp.c | 7 +++++++
 1 file changed, 7 insertions(+)

diff --git a/grub-core/net/tcp.c b/grub-core/net/tcp.c
index 2d1706e..cb97d05 100644
--- a/grub-core/net/tcp.c
+++ b/grub-core/net/tcp.c
@@ -632,6 +632,8 @@ grub_net_tcp_open (char *server,
   grub_uint8_t *nbd;
   grub_net_link_level_address_t ll_target_addr;
   grub_size_t headersize;
+  char addr_buf[GRUB_NET_MAX_STR_ADDR_LEN];
+  char gateway_buf[GRUB_NET_MAX_STR_ADDR_LEN];
 
   err = grub_net_resolve_address (server, &addr);
   if (err)
@@ -656,6 +658,11 @@ grub_net_tcp_open (char *server,
   if (socket == NULL)
     return NULL; 
 
+  grub_net_addr_to_str (&addr, addr_buf);
+  grub_net_addr_to_str (&gateway, gateway_buf);
+  grub_dprintf("net", "opening connection to %s on inf %s via gateway %s\n",
+	       addr_buf, inf->name, gateway_buf);
+
   socket->out_port = out_port;
   socket->inf = inf;
   socket->out_nla = addr;
-- 
2.5.0



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* [PATCH 3/3] pxenet: process transmit interrupts when out of resources
  2016-03-11 16:28 [PATCH 1/3] push/pop errno in initrd read file path Josef Bacik
  2016-03-11 16:28 ` [PATCH 2/3] tcp: add a dprintf for opening tcp connections Josef Bacik
@ 2016-03-11 16:28 ` Josef Bacik
  2016-03-12  6:20   ` Andrei Borzenkov
  2016-03-11 17:23 ` [PATCH 1/3] push/pop errno in initrd read file path Vladimir 'phcoder' Serbinenko
  2016-11-10 12:49 ` Daniel Kiper
  3 siblings, 1 reply; 13+ messages in thread
From: Josef Bacik @ 2016-03-11 16:28 UTC (permalink / raw)
  To: kernel-team, grub-devel

We were hitting a problem in production where our bios based machines would
sometimes timeout while pulling down either the kernel/initrd.  It turned out
that sometimes we'd run out of transmit buffers and would then error out while
sending packets.  This is because we don't actually have an interrupt handler in
PXE, we just poll the card when we want to receive.  This works most of the
time, but sometimes we end up with too many transmit interrupts pending and then
we can't add new packets to the transmit buffers.

So rework the whole ISR logic to be able to be called from both transmit and
receive.  If we get OUT_OF_RESOURCES while trying to transmit then we can go
through and process the interrupts and hope that leaves us with space to retry
the transmit.  Unfortunately this puts us in a place where we can trip over a
RECEIVE interrupt, so we have to process that interrupt and leave a netbuff
behind for the next call into recv.

With this patch we are now able to properly provision boxes suffering from this
problem.  Thanks,

Signed-off-by: Josef Bacik <jbacik@fb.com>
---
 grub-core/net/drivers/i386/pc/pxe.c | 155 +++++++++++++++++++++++++-----------
 1 file changed, 108 insertions(+), 47 deletions(-)

diff --git a/grub-core/net/drivers/i386/pc/pxe.c b/grub-core/net/drivers/i386/pc/pxe.c
index 3f4152d..57445b7 100644
--- a/grub-core/net/drivers/i386/pc/pxe.c
+++ b/grub-core/net/drivers/i386/pc/pxe.c
@@ -166,16 +166,11 @@ grub_pxe_scan (void)
   return bangpxe;
 }
 
-static struct grub_net_buff *
-grub_pxe_recv (struct grub_net_card *dev __attribute__ ((unused)))
-{
-  struct grub_pxe_undi_isr *isr;
-  static int in_progress = 0;
-  grub_uint8_t *ptr, *end;
-  struct grub_net_buff *buf;
-
-  isr = (void *) GRUB_MEMORY_MACHINE_SCRATCH_ADDR;
+static int in_progress = 0;
 
+static void
+grub_pxe_process_isr (struct grub_pxe_undi_isr *isr)
+{
   if (!in_progress)
     {
       grub_memset (isr, 0, sizeof (*isr));
@@ -186,13 +181,14 @@ grub_pxe_recv (struct grub_net_card *dev __attribute__ ((unused)))
 	 breaks on intel cards.
        */
       if (isr->status)
-	{
-	  in_progress = 0;
-	  return NULL;
+        {
+	  grub_dprintf("net", "problem pulling isr %d\n", (int)isr->status);
+	  return;
 	}
       grub_memset (isr, 0, sizeof (*isr));
       isr->func_flag = GRUB_PXE_ISR_IN_PROCESS;
       grub_pxe_call (GRUB_PXENV_UNDI_ISR, isr, pxe_rm_entry);
+      in_progress = 1;
     }
   else
     {
@@ -200,18 +196,13 @@ grub_pxe_recv (struct grub_net_card *dev __attribute__ ((unused)))
       isr->func_flag = GRUB_PXE_ISR_IN_GET_NEXT;
       grub_pxe_call (GRUB_PXENV_UNDI_ISR, isr, pxe_rm_entry);
     }
+}
 
-  while (isr->func_flag != GRUB_PXE_ISR_OUT_RECEIVE)
-    {
-      if (isr->status || isr->func_flag == GRUB_PXE_ISR_OUT_DONE)
-	{
-	  in_progress = 0;
-	  return NULL;
-	}
-      grub_memset (isr, 0, sizeof (*isr));
-      isr->func_flag = GRUB_PXE_ISR_IN_GET_NEXT;
-      grub_pxe_call (GRUB_PXENV_UNDI_ISR, isr, pxe_rm_entry);
-    }
+static struct grub_net_buff *
+grub_pxe_make_net_buff (struct grub_pxe_undi_isr *isr)
+{
+  struct grub_net_buff *buf;
+  grub_uint8_t *ptr, *end;
 
   buf = grub_netbuff_alloc (isr->frame_len + 2);
   if (!buf)
@@ -230,48 +221,118 @@ grub_pxe_recv (struct grub_net_card *dev __attribute__ ((unused)))
   ptr += isr->buffer_len;
   while (ptr < end)
     {
-      grub_memset (isr, 0, sizeof (*isr));
-      isr->func_flag = GRUB_PXE_ISR_IN_GET_NEXT;
-      grub_pxe_call (GRUB_PXENV_UNDI_ISR, isr, pxe_rm_entry);
+      grub_pxe_process_isr (isr);
       if (isr->status || isr->func_flag != GRUB_PXE_ISR_OUT_RECEIVE)
 	{
-	  in_progress = 1;
+	  grub_dprintf("net", "half processed packet\n");
 	  grub_netbuff_free (buf);
 	  return NULL;
 	}
-
       grub_memcpy (ptr, LINEAR (isr->buffer), isr->buffer_len);
       ptr += isr->buffer_len;
     }
-  in_progress = 1;
-
   return buf;
 }
 
+static struct grub_net_buff *
+grub_pxe_recv (struct grub_net_card *dev)
+{
+  struct grub_pxe_undi_isr *isr;
+
+  if (dev->data)
+    {
+      struct grub_net_buff *buf = dev->data;
+      grub_dprintf("net", "Pulling existing receive packet\n");
+      dev->data = NULL;
+      return buf;
+    }
+
+  isr = (void *) GRUB_MEMORY_MACHINE_SCRATCH_ADDR;
+  grub_pxe_process_isr (isr);
+  while (isr->func_flag != GRUB_PXE_ISR_OUT_RECEIVE)
+    {
+      if (isr->status || isr->func_flag == GRUB_PXE_ISR_OUT_DONE)
+	{
+	  if (isr->status)
+	    grub_dprintf("net", "error pulling next %d\n", (int)isr->status);
+	  in_progress = 0;
+	  return 0;
+	}
+      grub_memset (isr, 0, sizeof (*isr));
+      isr->func_flag = GRUB_PXE_ISR_IN_GET_NEXT;
+      grub_pxe_call (GRUB_PXENV_UNDI_ISR, isr, pxe_rm_entry);
+    }
+  return grub_pxe_make_net_buff (isr);
+}
+
 static grub_err_t 
-grub_pxe_send (struct grub_net_card *dev __attribute__ ((unused)),
-	       struct grub_net_buff *pack)
+grub_pxe_send (struct grub_net_card *dev, struct grub_net_buff *pack)
 {
   struct grub_pxe_undi_transmit *trans;
   struct grub_pxe_undi_tbd *tbd;
   char *buf;
 
-  trans = (void *) GRUB_MEMORY_MACHINE_SCRATCH_ADDR;
-  grub_memset (trans, 0, sizeof (*trans));
-  tbd = (void *) (GRUB_MEMORY_MACHINE_SCRATCH_ADDR + 128);
-  grub_memset (tbd, 0, sizeof (*tbd));
-  buf = (void *) (GRUB_MEMORY_MACHINE_SCRATCH_ADDR + 256);
-  grub_memcpy (buf, pack->data, pack->tail - pack->data);
-
-  trans->tbd = SEGOFS ((grub_addr_t) tbd);
-  trans->protocol = 0;
-  tbd->len = pack->tail - pack->data;
-  tbd->buf = SEGOFS ((grub_addr_t) buf);
-
-  grub_pxe_call (GRUB_PXENV_UNDI_TRANSMIT, trans, pxe_rm_entry);
-  if (trans->status)
-    return grub_error (GRUB_ERR_IO, N_("couldn't send network packet"));
+  while (1)
+    {
+      int made_progress = 0;
+
+      trans = (void *) GRUB_MEMORY_MACHINE_SCRATCH_ADDR;
+      grub_memset (trans, 0, sizeof (*trans));
+      tbd = (void *) (GRUB_MEMORY_MACHINE_SCRATCH_ADDR + 128);
+      grub_memset (tbd, 0, sizeof (*tbd));
+      buf = (void *) (GRUB_MEMORY_MACHINE_SCRATCH_ADDR + 256);
+      grub_memcpy (buf, pack->data, pack->tail - pack->data);
+
+      trans->tbd = SEGOFS ((grub_addr_t) tbd);
+      trans->protocol = 0;
+      tbd->len = pack->tail - pack->data;
+      tbd->buf = SEGOFS ((grub_addr_t) buf);
+
+      grub_pxe_call (GRUB_PXENV_UNDI_TRANSMIT, trans, pxe_rm_entry);
+      if (!trans->status)
+	break;
+      if (trans->status == GRUB_PXENV_STATUS_OUT_OF_RESOURCES)
+	{
+	  struct grub_pxe_undi_isr *isr;
+
+	  grub_dprintf("net", "Out of transmit buffers, processing "
+		       "interrupts\n");
+	  /* Process any outstanding transmit interrupts. */
+	  isr = (void *) GRUB_MEMORY_MACHINE_SCRATCH_ADDR;
+	  do
+	    {
+	      grub_pxe_process_isr (isr);
+	      if (isr->status)
+		{
+		  grub_dprintf("net", "process interrupts errored %d,"
+			       "made_progress %d\n", (int)isr->status, made_progress);
+		  if (made_progress)
+		    break;
+		  else
+		    goto out_err;
+		}
+	      if (isr->func_flag == GRUB_PXE_ISR_OUT_DONE)
+		in_progress = 0;
+	      made_progress = 1;
+	    } while (isr->func_flag == GRUB_PXE_ISR_OUT_TRANSMIT);
+
+	  /* If we had a receive interrupt in the queue we need to copy this
+	     buffer out otherwise we'll lose it. */
+	  if (isr->func_flag == GRUB_PXE_ISR_OUT_RECEIVE)
+	    {
+	      if (dev->data)
+		grub_dprintf("net", "dropping packet, already have a "
+			     "pending packet.\n");
+	      else
+	        dev->data = grub_pxe_make_net_buff (isr);
+	    }
+	}
+      else
+        goto out_err;
+    }
   return 0;
+out_err:
+  return grub_error (GRUB_ERR_IO, N_("couldn't send network packet"));
 }
 
 static void
-- 
2.5.0



^ permalink raw reply related	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] push/pop errno in initrd read file path
  2016-03-11 16:28 [PATCH 1/3] push/pop errno in initrd read file path Josef Bacik
  2016-03-11 16:28 ` [PATCH 2/3] tcp: add a dprintf for opening tcp connections Josef Bacik
  2016-03-11 16:28 ` [PATCH 3/3] pxenet: process transmit interrupts when out of resources Josef Bacik
@ 2016-03-11 17:23 ` Vladimir 'phcoder' Serbinenko
  2016-03-11 18:13   ` Josef Bacik
  2016-11-10 12:49 ` Daniel Kiper
  3 siblings, 1 reply; 13+ messages in thread
From: Vladimir 'phcoder' Serbinenko @ 2016-03-11 17:23 UTC (permalink / raw)
  To: The development of GNU GRUB; +Cc: kernel-team

[-- Attachment #1: Type: text/plain, Size: 1983 bytes --]

On Friday, March 11, 2016, Josef Bacik <jbacik@fb.com> wrote:

> If you try to load an initrd from http and it errors out we will free the
> initrd
> context but continue on because net_tcp_socket_close() will reset the
> grub_errno
> as will grub_initrd_close().  So we'll lose the errno and return
> GRUB_ERR_NONE
> instead of the original error.  Add push/pulls to the appropriate places
> so we
> don't lose our errno.  Thanks,
>
Close functions shouldn't do this. Can you fix them instead? Also please
add [2.02] to the subjectwhen appropriate, like in this case.

>
> Signed-off-by: Josef Bacik <jbacik@fb.com <javascript:;>>
> ---
>  grub-core/loader/linux.c | 2 ++
>  grub-core/net/http.c     | 2 ++
>  2 files changed, 4 insertions(+)
>
> diff --git a/grub-core/loader/linux.c b/grub-core/loader/linux.c
> index be6fa0f..bd61ca2 100644
> --- a/grub-core/loader/linux.c
> +++ b/grub-core/loader/linux.c
> @@ -202,7 +202,9 @@ grub_initrd_init (int argc, char *argv[],
>        initrd_ctx->components[i].file = grub_file_open (fname);
>        if (!initrd_ctx->components[i].file)
>         {
> +         grub_error_push ();
>           grub_initrd_close (initrd_ctx);
> +         grub_error_pop ();
>           return grub_errno;
>         }
>        initrd_ctx->nfiles++;
> diff --git a/grub-core/net/http.c b/grub-core/net/http.c
> index 4684f8b..0eeb2f6 100644
> --- a/grub-core/net/http.c
> +++ b/grub-core/net/http.c
> @@ -406,7 +406,9 @@ http_establish (struct grub_file *file, grub_off_t
> offset, int initial)
>    err = grub_net_send_tcp_packet (data->sock, nb, 1);
>    if (err)
>      {
> +      grub_error_push ();
>        grub_net_tcp_close (data->sock, GRUB_NET_TCP_ABORT);
> +      grub_error_pop ();
>        return err;
>      }
>
> --
> 2.5.0
>
>
> _______________________________________________
> Grub-devel mailing list
> Grub-devel@gnu.org <javascript:;>
> https://lists.gnu.org/mailman/listinfo/grub-devel
>


-- 
Regards
Vladimir 'phcoder' Serbinenko

[-- Attachment #2: Type: text/html, Size: 2776 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] push/pop errno in initrd read file path
  2016-03-11 17:23 ` [PATCH 1/3] push/pop errno in initrd read file path Vladimir 'phcoder' Serbinenko
@ 2016-03-11 18:13   ` Josef Bacik
  2016-03-11 19:34     ` Vladimir 'phcoder' Serbinenko
  0 siblings, 1 reply; 13+ messages in thread
From: Josef Bacik @ 2016-03-11 18:13 UTC (permalink / raw)
  To: Vladimir 'phcoder' Serbinenko, The development of GNU GRUB
  Cc: kernel-team

On 03/11/2016 12:23 PM, Vladimir 'phcoder' Serbinenko wrote:
>
>
> On Friday, March 11, 2016, Josef Bacik <jbacik@fb.com
> <mailto:jbacik@fb.com>> wrote:
>
>     If you try to load an initrd from http and it errors out we will
>     free the initrd
>     context but continue on because net_tcp_socket_close() will reset
>     the grub_errno
>     as will grub_initrd_close().  So we'll lose the errno and return
>     GRUB_ERR_NONE
>     instead of the original error.  Add push/pulls to the appropriate
>     places so we
>     don't lose our errno.  Thanks,
>
> Close functions shouldn't do this. Can you fix them instead? Also please
> add [2.02] to the subjectwhen appropriate, like in this case.
>

So do we not want close functions to do grub_error() at all?  Seems like 
there may be some cases where we want to know there was an error closing 
a tcp socket or the initrd?  Maybe not, just want to make sure before I 
go make these two functions void.  Thanks,

Josef



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] push/pop errno in initrd read file path
  2016-03-11 18:13   ` Josef Bacik
@ 2016-03-11 19:34     ` Vladimir 'phcoder' Serbinenko
  2016-03-11 20:34       ` Josef Bacik
  0 siblings, 1 reply; 13+ messages in thread
From: Vladimir 'phcoder' Serbinenko @ 2016-03-11 19:34 UTC (permalink / raw)
  To: Josef Bacik, The development of GRUB 2; +Cc: kernel-team

[-- Attachment #1: Type: text/plain, Size: 1216 bytes --]

Le ven. 11 mars 2016 19:13, Josef Bacik <jbacik@fb.com> a écrit :

> On 03/11/2016 12:23 PM, Vladimir 'phcoder' Serbinenko wrote:
> >
> >
> > On Friday, March 11, 2016, Josef Bacik <jbacik@fb.com
> > <mailto:jbacik@fb.com>> wrote:
> >
> >     If you try to load an initrd from http and it errors out we will
> >     free the initrd
> >     context but continue on because net_tcp_socket_close() will reset
> >     the grub_errno
> >     as will grub_initrd_close().  So we'll lose the errno and return
> >     GRUB_ERR_NONE
> >     instead of the original error.  Add push/pulls to the appropriate
> >     places so we
> >     don't lose our errno.  Thanks,
> >
> > Close functions shouldn't do this. Can you fix them instead? Also please
> > add [2.02] to the subjectwhen appropriate, like in this case.
> >
>
> So do we not want close functions to do grub_error() at all?  Seems like
> there may be some cases where we want to know there was an error closing
> a tcp socket or the initrd?  Maybe not, just want to make sure before I
> go make these two functions void.

How can a failure occur in close routines? What can we do with the failure
anyway?

> Thanks,
>
> Josef
>
>

[-- Attachment #2: Type: text/html, Size: 1858 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] push/pop errno in initrd read file path
  2016-03-11 19:34     ` Vladimir 'phcoder' Serbinenko
@ 2016-03-11 20:34       ` Josef Bacik
  2016-03-11 22:00         ` Vladimir 'phcoder' Serbinenko
  0 siblings, 1 reply; 13+ messages in thread
From: Josef Bacik @ 2016-03-11 20:34 UTC (permalink / raw)
  To: Vladimir 'phcoder' Serbinenko, The development of GRUB 2
  Cc: kernel-team

On 03/11/2016 02:34 PM, Vladimir 'phcoder' Serbinenko wrote:
>
>
> Le ven. 11 mars 2016 19:13, Josef Bacik <jbacik@fb.com
> <mailto:jbacik@fb.com>> a écrit :
>
>     On 03/11/2016 12:23 PM, Vladimir 'phcoder' Serbinenko wrote:
>      >
>      >
>      > On Friday, March 11, 2016, Josef Bacik <jbacik@fb.com
>     <mailto:jbacik@fb.com>
>      > <mailto:jbacik@fb.com <mailto:jbacik@fb.com>>> wrote:
>      >
>      >     If you try to load an initrd from http and it errors out we will
>      >     free the initrd
>      >     context but continue on because net_tcp_socket_close() will reset
>      >     the grub_errno
>      >     as will grub_initrd_close().  So we'll lose the errno and return
>      >     GRUB_ERR_NONE
>      >     instead of the original error.  Add push/pulls to the appropriate
>      >     places so we
>      >     don't lose our errno.  Thanks,
>      >
>      > Close functions shouldn't do this. Can you fix them instead? Also
>     please
>      > add [2.02] to the subjectwhen appropriate, like in this case.
>      >
>
>     So do we not want close functions to do grub_error() at all?  Seems like
>     there may be some cases where we want to know there was an error closing
>     a tcp socket or the initrd?  Maybe not, just want to make sure before I
>     go make these two functions void.
>
> How can a failure occur in close routines? What can we do with the
> failure anyway?

So sending the FIN packet for the tcp close was failing for example.  I 
don't think we can do anything really, I just don't like doing a patch 4 
times, so I want to make sure turning these close functions into void's 
is ok.  Thanks,

Josef



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] push/pop errno in initrd read file path
  2016-03-11 20:34       ` Josef Bacik
@ 2016-03-11 22:00         ` Vladimir 'phcoder' Serbinenko
  2016-03-12  7:01           ` Andrei Borzenkov
  0 siblings, 1 reply; 13+ messages in thread
From: Vladimir 'phcoder' Serbinenko @ 2016-03-11 22:00 UTC (permalink / raw)
  To: Josef Bacik; +Cc: The development of GRUB 2, kernel-team

[-- Attachment #1: Type: text/plain, Size: 2024 bytes --]

On Friday, March 11, 2016, Josef Bacik <jbacik@fb.com> wrote:

> On 03/11/2016 02:34 PM, Vladimir 'phcoder' Serbinenko wrote:
>
>>
>>
>> Le ven. 11 mars 2016 19:13, Josef Bacik <jbacik@fb.com
>> <mailto:jbacik@fb.com>> a écrit :
>>
>>     On 03/11/2016 12:23 PM, Vladimir 'phcoder' Serbinenko wrote:
>>      >
>>      >
>>      > On Friday, March 11, 2016, Josef Bacik <jbacik@fb.com
>>     <mailto:jbacik@fb.com>
>>      > <mailto:jbacik@fb.com <mailto:jbacik@fb.com>>> wrote:
>>      >
>>      >     If you try to load an initrd from http and it errors out we
>> will
>>      >     free the initrd
>>      >     context but continue on because net_tcp_socket_close() will
>> reset
>>      >     the grub_errno
>>      >     as will grub_initrd_close().  So we'll lose the errno and
>> return
>>      >     GRUB_ERR_NONE
>>      >     instead of the original error.  Add push/pulls to the
>> appropriate
>>      >     places so we
>>      >     don't lose our errno.  Thanks,
>>      >
>>      > Close functions shouldn't do this. Can you fix them instead? Also
>>     please
>>      > add [2.02] to the subjectwhen appropriate, like in this case.
>>      >
>>
>>     So do we not want close functions to do grub_error() at all?  Seems
>> like
>>     there may be some cases where we want to know there was an error
>> closing
>>     a tcp socket or the initrd?  Maybe not, just want to make sure before
>> I
>>     go make these two functions void.
>>
>> How can a failure occur in close routines? What can we do with the
>> failure anyway?
>>
>
> So sending the FIN packet for the tcp close was failing for example.  I
> don't think we can do anything really, I just don't like doing a patch 4
> times, so I want to make sure turning these close functions into void's is
> ok.  Thanks,
>
> I prefer void close functions.  Give it another day and if nobody objects
by then, then it's fine for everyone.

> Josef
>
>

-- 
Regards
Vladimir 'phcoder' Serbinenko

[-- Attachment #2: Type: text/html, Size: 2738 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 3/3] pxenet: process transmit interrupts when out of resources
  2016-03-11 16:28 ` [PATCH 3/3] pxenet: process transmit interrupts when out of resources Josef Bacik
@ 2016-03-12  6:20   ` Andrei Borzenkov
  0 siblings, 0 replies; 13+ messages in thread
From: Andrei Borzenkov @ 2016-03-12  6:20 UTC (permalink / raw)
  To: The development of GNU GRUB, kernel-team

11.03.2016 19:28, Josef Bacik пишет:
> We were hitting a problem in production where our bios based machines would
> sometimes timeout while pulling down either the kernel/initrd.  It turned out
> that sometimes we'd run out of transmit buffers and would then error out while
> sending packets.  This is because we don't actually have an interrupt handler in
> PXE, we just poll the card when we want to receive.  This works most of the
> time, but sometimes we end up with too many transmit interrupts pending and then
> we can't add new packets to the transmit buffers.
> 
> So rework the whole ISR logic to be able to be called from both transmit and
> receive.  If we get OUT_OF_RESOURCES while trying to transmit then we can go
> through and process the interrupts and hope that leaves us with space to retry
> the transmit.  Unfortunately this puts us in a place where we can trip over a
> RECEIVE interrupt, so we have to process that interrupt and leave a netbuff
> behind for the next call into recv.
> 
> With this patch we are now able to properly provision boxes suffering from this
> problem.  Thanks,
> 
> Signed-off-by: Josef Bacik <jbacik@fb.com>
> ---
>  grub-core/net/drivers/i386/pc/pxe.c | 155 +++++++++++++++++++++++++-----------
>  1 file changed, 108 insertions(+), 47 deletions(-)
> 
> diff --git a/grub-core/net/drivers/i386/pc/pxe.c b/grub-core/net/drivers/i386/pc/pxe.c
> index 3f4152d..57445b7 100644
> --- a/grub-core/net/drivers/i386/pc/pxe.c
> +++ b/grub-core/net/drivers/i386/pc/pxe.c
> @@ -166,16 +166,11 @@ grub_pxe_scan (void)
>    return bangpxe;
>  }
>  
> -static struct grub_net_buff *
> -grub_pxe_recv (struct grub_net_card *dev __attribute__ ((unused)))
> -{
> -  struct grub_pxe_undi_isr *isr;
> -  static int in_progress = 0;
> -  grub_uint8_t *ptr, *end;
> -  struct grub_net_buff *buf;
> -
> -  isr = (void *) GRUB_MEMORY_MACHINE_SCRATCH_ADDR;
> +static int in_progress = 0;
>  

You need to reset it in ->open or ->close.

> +static void
> +grub_pxe_process_isr (struct grub_pxe_undi_isr *isr)
> +{
>    if (!in_progress)
>      {
>        grub_memset (isr, 0, sizeof (*isr));
> @@ -186,13 +181,14 @@ grub_pxe_recv (struct grub_net_card *dev __attribute__ ((unused)))
>  	 breaks on intel cards.
>         */
>        if (isr->status)
> -	{
> -	  in_progress = 0;
> -	  return NULL;
> +        {
> +	  grub_dprintf("net", "problem pulling isr %d\n", (int)isr->status);

"pxe" would be better for dedicated debugging, "net" is too generic. We
can also set debug="pxe net" if needed. Also PXE spec lists status
values as hex, would be better to print it this way too.

> +	  return;
>  	}
>        grub_memset (isr, 0, sizeof (*isr));
>        isr->func_flag = GRUB_PXE_ISR_IN_PROCESS;
>        grub_pxe_call (GRUB_PXENV_UNDI_ISR, isr, pxe_rm_entry);
> +      in_progress = 1;
>      }
>    else
>      {
> @@ -200,18 +196,13 @@ grub_pxe_recv (struct grub_net_card *dev __attribute__ ((unused)))
>        isr->func_flag = GRUB_PXE_ISR_IN_GET_NEXT;
>        grub_pxe_call (GRUB_PXENV_UNDI_ISR, isr, pxe_rm_entry);
>      }
> +}
>  
> -  while (isr->func_flag != GRUB_PXE_ISR_OUT_RECEIVE)
> -    {
> -      if (isr->status || isr->func_flag == GRUB_PXE_ISR_OUT_DONE)
> -	{
> -	  in_progress = 0;
> -	  return NULL;
> -	}
> -      grub_memset (isr, 0, sizeof (*isr));
> -      isr->func_flag = GRUB_PXE_ISR_IN_GET_NEXT;
> -      grub_pxe_call (GRUB_PXENV_UNDI_ISR, isr, pxe_rm_entry);
> -    }
> +static struct grub_net_buff *
> +grub_pxe_make_net_buff (struct grub_pxe_undi_isr *isr)
> +{
> +  struct grub_net_buff *buf;
> +  grub_uint8_t *ptr, *end;
>  
>    buf = grub_netbuff_alloc (isr->frame_len + 2);
>    if (!buf)
> @@ -230,48 +221,118 @@ grub_pxe_recv (struct grub_net_card *dev __attribute__ ((unused)))
>    ptr += isr->buffer_len;
>    while (ptr < end)
>      {
> -      grub_memset (isr, 0, sizeof (*isr));
> -      isr->func_flag = GRUB_PXE_ISR_IN_GET_NEXT;
> -      grub_pxe_call (GRUB_PXENV_UNDI_ISR, isr, pxe_rm_entry);
> +      grub_pxe_process_isr (isr);
>        if (isr->status || isr->func_flag != GRUB_PXE_ISR_OUT_RECEIVE)
>  	{
> -	  in_progress = 1;
> +	  grub_dprintf("net", "half processed packet\n");
>  	  grub_netbuff_free (buf);
>  	  return NULL;
>  	}
> -
>        grub_memcpy (ptr, LINEAR (isr->buffer), isr->buffer_len);
>        ptr += isr->buffer_len;
>      }
> -  in_progress = 1;
> -
>    return buf;
>  }
>  
> +static struct grub_net_buff *
> +grub_pxe_recv (struct grub_net_card *dev)
> +{
> +  struct grub_pxe_undi_isr *isr;
> +
> +  if (dev->data)
> +    {
> +      struct grub_net_buff *buf = dev->data;
> +      grub_dprintf("net", "Pulling existing receive packet\n");
> +      dev->data = NULL;
> +      return buf;
> +    }
> +
> +  isr = (void *) GRUB_MEMORY_MACHINE_SCRATCH_ADDR;
> +  grub_pxe_process_isr (isr);
> +  while (isr->func_flag != GRUB_PXE_ISR_OUT_RECEIVE)
> +    {
> +      if (isr->status || isr->func_flag == GRUB_PXE_ISR_OUT_DONE)
> +	{
> +	  if (isr->status)
> +	    grub_dprintf("net", "error pulling next %d\n", (int)isr->status);
> +	  in_progress = 0;
> +	  return 0;
> +	}
> +      grub_memset (isr, 0, sizeof (*isr));
> +      isr->func_flag = GRUB_PXE_ISR_IN_GET_NEXT;
> +      grub_pxe_call (GRUB_PXENV_UNDI_ISR, isr, pxe_rm_entry);
> +    }
> +  return grub_pxe_make_net_buff (isr);
> +}
> +
>  static grub_err_t 
> -grub_pxe_send (struct grub_net_card *dev __attribute__ ((unused)),
> -	       struct grub_net_buff *pack)
> +grub_pxe_send (struct grub_net_card *dev, struct grub_net_buff *pack)
>  {
>    struct grub_pxe_undi_transmit *trans;
>    struct grub_pxe_undi_tbd *tbd;
>    char *buf;
>  
> -  trans = (void *) GRUB_MEMORY_MACHINE_SCRATCH_ADDR;
> -  grub_memset (trans, 0, sizeof (*trans));
> -  tbd = (void *) (GRUB_MEMORY_MACHINE_SCRATCH_ADDR + 128);
> -  grub_memset (tbd, 0, sizeof (*tbd));
> -  buf = (void *) (GRUB_MEMORY_MACHINE_SCRATCH_ADDR + 256);
> -  grub_memcpy (buf, pack->data, pack->tail - pack->data);
> -
> -  trans->tbd = SEGOFS ((grub_addr_t) tbd);
> -  trans->protocol = 0;
> -  tbd->len = pack->tail - pack->data;
> -  tbd->buf = SEGOFS ((grub_addr_t) buf);
> -
> -  grub_pxe_call (GRUB_PXENV_UNDI_TRANSMIT, trans, pxe_rm_entry);
> -  if (trans->status)
> -    return grub_error (GRUB_ERR_IO, N_("couldn't send network packet"));
> +  while (1)
> +    {
> +      int made_progress = 0;
> +
> +      trans = (void *) GRUB_MEMORY_MACHINE_SCRATCH_ADDR;
> +      grub_memset (trans, 0, sizeof (*trans));
> +      tbd = (void *) (GRUB_MEMORY_MACHINE_SCRATCH_ADDR + 128);
> +      grub_memset (tbd, 0, sizeof (*tbd));
> +      buf = (void *) (GRUB_MEMORY_MACHINE_SCRATCH_ADDR + 256);
> +      grub_memcpy (buf, pack->data, pack->tail - pack->data);
> +
> +      trans->tbd = SEGOFS ((grub_addr_t) tbd);
> +      trans->protocol = 0;
> +      tbd->len = pack->tail - pack->data;
> +      tbd->buf = SEGOFS ((grub_addr_t) buf);
> +
> +      grub_pxe_call (GRUB_PXENV_UNDI_TRANSMIT, trans, pxe_rm_entry);
> +      if (!trans->status)
> +	break;

This looks like the only terminating condition. What if hardware is
stuck and cannot make progress anymore and continues to return error here?

> +      if (trans->status == GRUB_PXENV_STATUS_OUT_OF_RESOURCES)
> +	{
> +	  struct grub_pxe_undi_isr *isr;
> +
> +	  grub_dprintf("net", "Out of transmit buffers, processing "
> +		       "interrupts\n");
> +	  /* Process any outstanding transmit interrupts. */
> +	  isr = (void *) GRUB_MEMORY_MACHINE_SCRATCH_ADDR;
> +	  do
> +	    {
> +	      grub_pxe_process_isr (isr);
> +	      if (isr->status)
> +		{
> +		  grub_dprintf("net", "process interrupts errored %d,"
> +			       "made_progress %d\n", (int)isr->status, made_progress);
> +		  if (made_progress)
> +		    break;
> +		  else
> +		    goto out_err;
> +		}
> +	      if (isr->func_flag == GRUB_PXE_ISR_OUT_DONE)
> +		in_progress = 0;
> +	      made_progress = 1;
> +	    } while (isr->func_flag == GRUB_PXE_ISR_OUT_TRANSMIT);
> +
> +	  /* If we had a receive interrupt in the queue we need to copy this
> +	     buffer out otherwise we'll lose it. */
> +	  if (isr->func_flag == GRUB_PXE_ISR_OUT_RECEIVE)

It probably should check isr->status, we come here also if we got an
error previously. Such packets are ignored everywhere else.

> +	    {
> +	      if (dev->data)
> +		grub_dprintf("net", "dropping packet, already have a "
> +			     "pending packet.\n");
> +	      else
> +	        dev->data = grub_pxe_make_net_buff (isr);
> +	    }
> +	}
> +      else
> +        goto out_err;
> +    }
>    return 0;
> +out_err:
> +  return grub_error (GRUB_ERR_IO, N_("couldn't send network packet"));
>  }
>  
>  static void
> 



^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] push/pop errno in initrd read file path
  2016-03-11 22:00         ` Vladimir 'phcoder' Serbinenko
@ 2016-03-12  7:01           ` Andrei Borzenkov
  2016-03-15 17:26             ` Vladimir 'phcoder' Serbinenko
  0 siblings, 1 reply; 13+ messages in thread
From: Andrei Borzenkov @ 2016-03-12  7:01 UTC (permalink / raw)
  To: The development of GNU GRUB, Josef Bacik; +Cc: kernel-team

12.03.2016 01:00, Vladimir 'phcoder' Serbinenko пишет:
> On Friday, March 11, 2016, Josef Bacik <jbacik@fb.com> wrote:
> 
>> On 03/11/2016 02:34 PM, Vladimir 'phcoder' Serbinenko wrote:
>>
>>>
>>>
>>> Le ven. 11 mars 2016 19:13, Josef Bacik <jbacik@fb.com
>>> <mailto:jbacik@fb.com>> a écrit :
>>>
>>>     On 03/11/2016 12:23 PM, Vladimir 'phcoder' Serbinenko wrote:
>>>      >
>>>      >
>>>      > On Friday, March 11, 2016, Josef Bacik <jbacik@fb.com
>>>     <mailto:jbacik@fb.com>
>>>      > <mailto:jbacik@fb.com <mailto:jbacik@fb.com>>> wrote:
>>>      >
>>>      >     If you try to load an initrd from http and it errors out we
>>> will
>>>      >     free the initrd
>>>      >     context but continue on because net_tcp_socket_close() will
>>> reset
>>>      >     the grub_errno
>>>      >     as will grub_initrd_close().  So we'll lose the errno and
>>> return
>>>      >     GRUB_ERR_NONE
>>>      >     instead of the original error.  Add push/pulls to the
>>> appropriate
>>>      >     places so we
>>>      >     don't lose our errno.  Thanks,
>>>      >
>>>      > Close functions shouldn't do this. Can you fix them instead? Also
>>>     please
>>>      > add [2.02] to the subjectwhen appropriate, like in this case.
>>>      >
>>>
>>>     So do we not want close functions to do grub_error() at all?  Seems
>>> like
>>>     there may be some cases where we want to know there was an error
>>> closing
>>>     a tcp socket or the initrd?  Maybe not, just want to make sure before
>>> I
>>>     go make these two functions void.
>>>
>>> How can a failure occur in close routines? What can we do with the
>>> failure anyway?
>>>

It is not about problems in close routines themselves, but we call them
to cleanup and lose errors that caused us to clean up.

>>
>> So sending the FIN packet for the tcp close was failing for example.  I
>> don't think we can do anything really, I just don't like doing a patch 4
>> times, so I want to make sure turning these close functions into void's is
>> ok.  Thanks,
>>
>> I prefer void close functions.  Give it another day and if nobody objects
> by then, then it's fine for everyone.
> 

May be I miss something obvious here - which close functions do you
mean? grub_net_tcp_close() is void already, like is grub_initrd_close.
So what exactly do you suggest to change?

The problem is that we lose error indication from upper level protocols
or preceding code every time we close file or socket due to previous error.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] push/pop errno in initrd read file path
  2016-03-12  7:01           ` Andrei Borzenkov
@ 2016-03-15 17:26             ` Vladimir 'phcoder' Serbinenko
  0 siblings, 0 replies; 13+ messages in thread
From: Vladimir 'phcoder' Serbinenko @ 2016-03-15 17:26 UTC (permalink / raw)
  To: The development of GNU GRUB, Josef Bacik; +Cc: kernel-team

[-- Attachment #1: Type: text/plain, Size: 3252 bytes --]

Le Fri, Mar 11, 2016 à 11:01 PM, Andrei Borzenkov <arvidjaar@gmail.com> a
écrit :

> 12.03.2016 01:00, Vladimir 'phcoder' Serbinenko пишет:
> > On Friday, March 11, 2016, Josef Bacik <jbacik@fb.com> wrote:
> >
> >> On 03/11/2016 02:34 PM, Vladimir 'phcoder' Serbinenko wrote:
> >>
> >>>
> >>>
> >>> Le ven. 11 mars 2016 19:13, Josef Bacik <jbacik@fb.com
> >>> <mailto:jbacik@fb.com>> a écrit :
> >>>
> >>>     On 03/11/2016 12:23 PM, Vladimir 'phcoder' Serbinenko wrote:
> >>>      >
> >>>      >
> >>>      > On Friday, March 11, 2016, Josef Bacik <jbacik@fb.com
> >>>     <mailto:jbacik@fb.com>
> >>>      > <mailto:jbacik@fb.com <mailto:jbacik@fb.com>>> wrote:
> >>>      >
> >>>      >     If you try to load an initrd from http and it errors out we
> >>> will
> >>>      >     free the initrd
> >>>      >     context but continue on because net_tcp_socket_close() will
> >>> reset
> >>>      >     the grub_errno
> >>>      >     as will grub_initrd_close().  So we'll lose the errno and
> >>> return
> >>>      >     GRUB_ERR_NONE
> >>>      >     instead of the original error.  Add push/pulls to the
> >>> appropriate
> >>>      >     places so we
> >>>      >     don't lose our errno.  Thanks,
> >>>      >
> >>>      > Close functions shouldn't do this. Can you fix them instead?
> Also
> >>>     please
> >>>      > add [2.02] to the subjectwhen appropriate, like in this case.
> >>>      >
> >>>
> >>>     So do we not want close functions to do grub_error() at all?  Seems
> >>> like
> >>>     there may be some cases where we want to know there was an error
> >>> closing
> >>>     a tcp socket or the initrd?  Maybe not, just want to make sure
> before
> >>> I
> >>>     go make these two functions void.
> >>>
> >>> How can a failure occur in close routines? What can we do with the
> >>> failure anyway?
> >>>
>
> It is not about problems in close routines themselves, but we call them
> to cleanup and lose errors that caused us to clean up.
>
> Yes, and if we don't put the logic to preserve errors in the close
functions, we'll have to repeat this code at every call site which is
error-prone and is code duplication.

> >>
> >> So sending the FIN packet for the tcp close was failing for example.  I
> >> don't think we can do anything really, I just don't like doing a patch 4
> >> times, so I want to make sure turning these close functions into void's
> is
> >> ok.  Thanks,
> >>
> >> I prefer void close functions.  Give it another day and if nobody
> objects
> > by then, then it's fine for everyone.
> >
>
> May be I miss something obvious here - which close functions do you
> mean? grub_net_tcp_close() is void already, like is grub_initrd_close.
> So what exactly do you suggest to change?
>
I suggest to make them preserve grub_errno properly.

>
> The problem is that we lose error indication from upper level protocols
> or preceding code every time we close file or socket due to previous error.
>
Yes, and we need to change close routines, to preserve grub_errno

>
> _______________________________________________
> Grub-devel mailing list
> Grub-devel@gnu.org
> https://lists.gnu.org/mailman/listinfo/grub-devel
>

[-- Attachment #2: Type: text/html, Size: 5215 bytes --]

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] push/pop errno in initrd read file path
  2016-03-11 16:28 [PATCH 1/3] push/pop errno in initrd read file path Josef Bacik
                   ` (2 preceding siblings ...)
  2016-03-11 17:23 ` [PATCH 1/3] push/pop errno in initrd read file path Vladimir 'phcoder' Serbinenko
@ 2016-11-10 12:49 ` Daniel Kiper
  2016-11-18 14:51   ` Josef Bacik
  3 siblings, 1 reply; 13+ messages in thread
From: Daniel Kiper @ 2016-11-10 12:49 UTC (permalink / raw)
  To: jbacik; +Cc: kernel-team, grub-devel

On Fri, Mar 11, 2016 at 11:28:33AM -0500, Josef Bacik wrote:
> If you try to load an initrd from http and it errors out we will free the initrd
> context but continue on because net_tcp_socket_close() will reset the grub_errno
> as will grub_initrd_close().  So we'll lose the errno and return GRUB_ERR_NONE
> instead of the original error.  Add push/pulls to the appropriate places so we
> don't lose our errno.  Thanks,
>
> Signed-off-by: Josef Bacik <jbacik@fb.com>

Josef, could you repost whole patch series taking into account Andrei and
Vladimir comments? We would like to have it in upcoming release.

Daniel


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH 1/3] push/pop errno in initrd read file path
  2016-11-10 12:49 ` Daniel Kiper
@ 2016-11-18 14:51   ` Josef Bacik
  0 siblings, 0 replies; 13+ messages in thread
From: Josef Bacik @ 2016-11-18 14:51 UTC (permalink / raw)
  To: Daniel Kiper; +Cc: kernel-team, grub-devel

On 11/10/2016 07:49 AM, Daniel Kiper wrote:
> On Fri, Mar 11, 2016 at 11:28:33AM -0500, Josef Bacik wrote:
>> If you try to load an initrd from http and it errors out we will free the initrd
>> context but continue on because net_tcp_socket_close() will reset the grub_errno
>> as will grub_initrd_close().  So we'll lose the errno and return GRUB_ERR_NONE
>> instead of the original error.  Add push/pulls to the appropriate places so we
>> don't lose our errno.  Thanks,
>>
>> Signed-off-by: Josef Bacik <jbacik@fb.com>
>
> Josef, could you repost whole patch series taking into account Andrei and
> Vladimir comments? We would like to have it in upcoming release.
>

We stopped using grub2 internally, if you want to update them go ahead but I'm 
not working on grub2 anymore.  Thanks,

Josef



^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2016-11-18 14:51 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2016-03-11 16:28 [PATCH 1/3] push/pop errno in initrd read file path Josef Bacik
2016-03-11 16:28 ` [PATCH 2/3] tcp: add a dprintf for opening tcp connections Josef Bacik
2016-03-11 16:28 ` [PATCH 3/3] pxenet: process transmit interrupts when out of resources Josef Bacik
2016-03-12  6:20   ` Andrei Borzenkov
2016-03-11 17:23 ` [PATCH 1/3] push/pop errno in initrd read file path Vladimir 'phcoder' Serbinenko
2016-03-11 18:13   ` Josef Bacik
2016-03-11 19:34     ` Vladimir 'phcoder' Serbinenko
2016-03-11 20:34       ` Josef Bacik
2016-03-11 22:00         ` Vladimir 'phcoder' Serbinenko
2016-03-12  7:01           ` Andrei Borzenkov
2016-03-15 17:26             ` Vladimir 'phcoder' Serbinenko
2016-11-10 12:49 ` Daniel Kiper
2016-11-18 14:51   ` Josef Bacik

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.