All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] ntb_hw_switchtec: Force down the peer's link before initializing
@ 2017-11-28 22:01 Logan Gunthorpe
  2017-11-28 22:05 ` Logan Gunthorpe
  0 siblings, 1 reply; 7+ messages in thread
From: Logan Gunthorpe @ 2017-11-28 22:01 UTC (permalink / raw)
  To: linux-ntb, ThanhTuThai; +Cc: jdmason, Logan Gunthorpe

If one host crashes and soft reboots, the other host may not see a
link down event. Then when the crashed host comes back up, the
surviving host may not know the link was reset and the ntb clients
may not work without being reset.

To solve this, we send a LINK_FORCE_DOWN message to each peer every
time we come up, before we register the ntb device. If a surving
host still thinks the link is up it will take it down immediately.
In this way, once the crashed host comes up fully, it will send a
regular link up event as per usual and the link will be properly
restarted.

While we are in the area, this also fixes the MSG_LINK_UP message that
was in the link down function that was reported by Doug Meyers.

Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
Reported-by: ThanhTuThai <cruisethai@gmail.com>
---
 drivers/ntb/hw/mscc/ntb_hw_switchtec.c | 33 ++++++++++++++++++++++++++-------
 1 file changed, 26 insertions(+), 7 deletions(-)

diff --git a/drivers/ntb/hw/mscc/ntb_hw_switchtec.c b/drivers/ntb/hw/mscc/ntb_hw_switchtec.c
index d71744904b19..5cfcd4deadaf 100644
--- a/drivers/ntb/hw/mscc/ntb_hw_switchtec.c
+++ b/drivers/ntb/hw/mscc/ntb_hw_switchtec.c
@@ -463,18 +463,30 @@ static void switchtec_ntb_set_link_speed(struct switchtec_ntb *sndev)
 	sndev->link_width = min(self_width, peer_width);
 }

-enum {
+enum switchtec_msg {
 	LINK_MESSAGE = 0,
 	MSG_LINK_UP = 1,
 	MSG_LINK_DOWN = 2,
 	MSG_CHECK_LINK = 3,
+	MSG_LINK_FORCE_DOWN = 4,
 };

-static void switchtec_ntb_check_link(struct switchtec_ntb *sndev)
+static void switchtec_ntb_check_link(struct switchtec_ntb *sndev,
+				     enum switchtec_msg msg)
 {
 	int link_sta;
 	int old = sndev->link_is_up;

+	if (msg == MSG_LINK_FORCE_DOWN) {
+		if (sndev->link_is_up) {
+			sndev->link_is_up = 0;
+			ntb_link_event(&sndev->ntb);
+			dev_info(&sndev->stdev->dev, "ntb link forced down\n");
+		}
+
+		return;
+	}
+
 	link_sta = sndev->self_shared->link_sta;
 	if (link_sta) {
 		u64 peer = ioread64(&sndev->peer_shared->magic);
@@ -500,7 +512,7 @@ static void switchtec_ntb_link_notification(struct switchtec_dev *stdev)
 {
 	struct switchtec_ntb *sndev = stdev->sndev;

-	switchtec_ntb_check_link(sndev);
+	switchtec_ntb_check_link(sndev, MSG_CHECK_LINK);
 }

 static u64 switchtec_ntb_link_is_up(struct ntb_dev *ntb,
@@ -528,7 +540,7 @@ static int switchtec_ntb_link_enable(struct ntb_dev *ntb,
 	sndev->self_shared->link_sta = 1;
 	switchtec_ntb_send_msg(sndev, LINK_MESSAGE, MSG_LINK_UP);

-	switchtec_ntb_check_link(sndev);
+	switchtec_ntb_check_link(sndev, MSG_CHECK_LINK);

 	return 0;
 }
@@ -540,9 +552,9 @@ static int switchtec_ntb_link_disable(struct ntb_dev *ntb)
 	dev_dbg(&sndev->stdev->dev, "disabling link\n");

 	sndev->self_shared->link_sta = 0;
-	switchtec_ntb_send_msg(sndev, LINK_MESSAGE, MSG_LINK_UP);
+	switchtec_ntb_send_msg(sndev, LINK_MESSAGE, MSG_LINK_DOWN);

-	switchtec_ntb_check_link(sndev);
+	switchtec_ntb_check_link(sndev, MSG_CHECK_LINK);

 	return 0;
 }
@@ -1061,7 +1073,7 @@ static irqreturn_t switchtec_ntb_message_isr(int irq, void *dev)
 			iowrite8(1, &sndev->mmio_self_dbmsg->imsg[i].status);

 			if (i == LINK_MESSAGE)
-				switchtec_ntb_check_link(sndev);
+				switchtec_ntb_check_link(sndev, msg);
 		}
 	}

@@ -1159,6 +1171,13 @@ static int switchtec_ntb_add(struct device *dev,
 	if (rc)
 		goto deinit_shared_and_exit;

+	/*
+	 * If this host crashed, the other host may think the link is
+	 * still up. Tell them to force it down (it will go back up
+	 * once we register the ntb device).
+	 */
+	switchtec_ntb_send_msg(sndev, LINK_MESSAGE, MSG_LINK_FORCE_DOWN);
+
 	rc = ntb_register_device(&sndev->ntb);
 	if (rc)
 		goto deinit_and_exit;
--
2.11.0

^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] ntb_hw_switchtec: Force down the peer's link before initializing
  2017-11-28 22:01 [PATCH] ntb_hw_switchtec: Force down the peer's link before initializing Logan Gunthorpe
@ 2017-11-28 22:05 ` Logan Gunthorpe
  2017-11-29  3:09   ` Jon Mason
  0 siblings, 1 reply; 7+ messages in thread
From: Logan Gunthorpe @ 2017-11-28 22:05 UTC (permalink / raw)
  To: linux-ntb, ThanhTuThai; +Cc: jdmason

Hey Thai Thu,

Please test with this patch and report back if it fixes your issue.

If it doesn't, can you please send the dmesg output of the two hosts?

Thanks,

Logan

On 28/11/17 03:01 PM, Logan Gunthorpe wrote:
> If one host crashes and soft reboots, the other host may not see a
> link down event. Then when the crashed host comes back up, the
> surviving host may not know the link was reset and the ntb clients
> may not work without being reset.
> 
> To solve this, we send a LINK_FORCE_DOWN message to each peer every
> time we come up, before we register the ntb device. If a surving
> host still thinks the link is up it will take it down immediately.
> In this way, once the crashed host comes up fully, it will send a
> regular link up event as per usual and the link will be properly
> restarted.
> 
> While we are in the area, this also fixes the MSG_LINK_UP message that
> was in the link down function that was reported by Doug Meyers.
> 
> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> Reported-by: ThanhTuThai <cruisethai@gmail.com>
> ---
>   drivers/ntb/hw/mscc/ntb_hw_switchtec.c | 33 ++++++++++++++++++++++++++-------
>   1 file changed, 26 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/ntb/hw/mscc/ntb_hw_switchtec.c b/drivers/ntb/hw/mscc/ntb_hw_switchtec.c
> index d71744904b19..5cfcd4deadaf 100644
> --- a/drivers/ntb/hw/mscc/ntb_hw_switchtec.c
> +++ b/drivers/ntb/hw/mscc/ntb_hw_switchtec.c
> @@ -463,18 +463,30 @@ static void switchtec_ntb_set_link_speed(struct switchtec_ntb *sndev)
>   	sndev->link_width = min(self_width, peer_width);
>   }
> 
> -enum {
> +enum switchtec_msg {
>   	LINK_MESSAGE = 0,
>   	MSG_LINK_UP = 1,
>   	MSG_LINK_DOWN = 2,
>   	MSG_CHECK_LINK = 3,
> +	MSG_LINK_FORCE_DOWN = 4,
>   };
> 
> -static void switchtec_ntb_check_link(struct switchtec_ntb *sndev)
> +static void switchtec_ntb_check_link(struct switchtec_ntb *sndev,
> +				     enum switchtec_msg msg)
>   {
>   	int link_sta;
>   	int old = sndev->link_is_up;
> 
> +	if (msg == MSG_LINK_FORCE_DOWN) {
> +		if (sndev->link_is_up) {
> +			sndev->link_is_up = 0;
> +			ntb_link_event(&sndev->ntb);
> +			dev_info(&sndev->stdev->dev, "ntb link forced down\n");
> +		}
> +
> +		return;
> +	}
> +
>   	link_sta = sndev->self_shared->link_sta;
>   	if (link_sta) {
>   		u64 peer = ioread64(&sndev->peer_shared->magic);
> @@ -500,7 +512,7 @@ static void switchtec_ntb_link_notification(struct switchtec_dev *stdev)
>   {
>   	struct switchtec_ntb *sndev = stdev->sndev;
> 
> -	switchtec_ntb_check_link(sndev);
> +	switchtec_ntb_check_link(sndev, MSG_CHECK_LINK);
>   }
> 
>   static u64 switchtec_ntb_link_is_up(struct ntb_dev *ntb,
> @@ -528,7 +540,7 @@ static int switchtec_ntb_link_enable(struct ntb_dev *ntb,
>   	sndev->self_shared->link_sta = 1;
>   	switchtec_ntb_send_msg(sndev, LINK_MESSAGE, MSG_LINK_UP);
> 
> -	switchtec_ntb_check_link(sndev);
> +	switchtec_ntb_check_link(sndev, MSG_CHECK_LINK);
> 
>   	return 0;
>   }
> @@ -540,9 +552,9 @@ static int switchtec_ntb_link_disable(struct ntb_dev *ntb)
>   	dev_dbg(&sndev->stdev->dev, "disabling link\n");
> 
>   	sndev->self_shared->link_sta = 0;
> -	switchtec_ntb_send_msg(sndev, LINK_MESSAGE, MSG_LINK_UP);
> +	switchtec_ntb_send_msg(sndev, LINK_MESSAGE, MSG_LINK_DOWN);
> 
> -	switchtec_ntb_check_link(sndev);
> +	switchtec_ntb_check_link(sndev, MSG_CHECK_LINK);
> 
>   	return 0;
>   }
> @@ -1061,7 +1073,7 @@ static irqreturn_t switchtec_ntb_message_isr(int irq, void *dev)
>   			iowrite8(1, &sndev->mmio_self_dbmsg->imsg[i].status);
> 
>   			if (i == LINK_MESSAGE)
> -				switchtec_ntb_check_link(sndev);
> +				switchtec_ntb_check_link(sndev, msg);
>   		}
>   	}
> 
> @@ -1159,6 +1171,13 @@ static int switchtec_ntb_add(struct device *dev,
>   	if (rc)
>   		goto deinit_shared_and_exit;
> 
> +	/*
> +	 * If this host crashed, the other host may think the link is
> +	 * still up. Tell them to force it down (it will go back up
> +	 * once we register the ntb device).
> +	 */
> +	switchtec_ntb_send_msg(sndev, LINK_MESSAGE, MSG_LINK_FORCE_DOWN);
> +
>   	rc = ntb_register_device(&sndev->ntb);
>   	if (rc)
>   		goto deinit_and_exit;
> --
> 2.11.0
> 

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ntb_hw_switchtec: Force down the peer's link before initializing
  2017-11-28 22:05 ` Logan Gunthorpe
@ 2017-11-29  3:09   ` Jon Mason
  2017-11-29  5:28     ` ThanhTuThai
  0 siblings, 1 reply; 7+ messages in thread
From: Jon Mason @ 2017-11-29  3:09 UTC (permalink / raw)
  To: Logan Gunthorpe; +Cc: linux-ntb, ThanhTuThai

On Tue, Nov 28, 2017 at 5:05 PM, Logan Gunthorpe <logang@deltatee.com> wrote:
> Hey Thai Thu,
>
> Please test with this patch and report back if it fixes your issue.

I'll wait on the ack before applying this patch and I'll add the
"tested-by" line (assuming it works).

Thanks,
Jon

>
> If it doesn't, can you please send the dmesg output of the two hosts?
>
> Thanks,
>
> Logan
>
>
> On 28/11/17 03:01 PM, Logan Gunthorpe wrote:
>>
>> If one host crashes and soft reboots, the other host may not see a
>> link down event. Then when the crashed host comes back up, the
>> surviving host may not know the link was reset and the ntb clients
>> may not work without being reset.
>>
>> To solve this, we send a LINK_FORCE_DOWN message to each peer every
>> time we come up, before we register the ntb device. If a surving
>> host still thinks the link is up it will take it down immediately.
>> In this way, once the crashed host comes up fully, it will send a
>> regular link up event as per usual and the link will be properly
>> restarted.
>>
>> While we are in the area, this also fixes the MSG_LINK_UP message that
>> was in the link down function that was reported by Doug Meyers.
>>
>> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>> Reported-by: ThanhTuThai <cruisethai@gmail.com>
>> ---
>>   drivers/ntb/hw/mscc/ntb_hw_switchtec.c | 33
>> ++++++++++++++++++++++++++-------
>>   1 file changed, 26 insertions(+), 7 deletions(-)
>>
>> diff --git a/drivers/ntb/hw/mscc/ntb_hw_switchtec.c
>> b/drivers/ntb/hw/mscc/ntb_hw_switchtec.c
>> index d71744904b19..5cfcd4deadaf 100644
>> --- a/drivers/ntb/hw/mscc/ntb_hw_switchtec.c
>> +++ b/drivers/ntb/hw/mscc/ntb_hw_switchtec.c
>> @@ -463,18 +463,30 @@ static void switchtec_ntb_set_link_speed(struct
>> switchtec_ntb *sndev)
>>         sndev->link_width = min(self_width, peer_width);
>>   }
>>
>> -enum {
>> +enum switchtec_msg {
>>         LINK_MESSAGE = 0,
>>         MSG_LINK_UP = 1,
>>         MSG_LINK_DOWN = 2,
>>         MSG_CHECK_LINK = 3,
>> +       MSG_LINK_FORCE_DOWN = 4,
>>   };
>>
>> -static void switchtec_ntb_check_link(struct switchtec_ntb *sndev)
>> +static void switchtec_ntb_check_link(struct switchtec_ntb *sndev,
>> +                                    enum switchtec_msg msg)
>>   {
>>         int link_sta;
>>         int old = sndev->link_is_up;
>>
>> +       if (msg == MSG_LINK_FORCE_DOWN) {
>> +               if (sndev->link_is_up) {
>> +                       sndev->link_is_up = 0;
>> +                       ntb_link_event(&sndev->ntb);
>> +                       dev_info(&sndev->stdev->dev, "ntb link forced
>> down\n");
>> +               }
>> +
>> +               return;
>> +       }
>> +
>>         link_sta = sndev->self_shared->link_sta;
>>         if (link_sta) {
>>                 u64 peer = ioread64(&sndev->peer_shared->magic);
>> @@ -500,7 +512,7 @@ static void switchtec_ntb_link_notification(struct
>> switchtec_dev *stdev)
>>   {
>>         struct switchtec_ntb *sndev = stdev->sndev;
>>
>> -       switchtec_ntb_check_link(sndev);
>> +       switchtec_ntb_check_link(sndev, MSG_CHECK_LINK);
>>   }
>>
>>   static u64 switchtec_ntb_link_is_up(struct ntb_dev *ntb,
>> @@ -528,7 +540,7 @@ static int switchtec_ntb_link_enable(struct ntb_dev
>> *ntb,
>>         sndev->self_shared->link_sta = 1;
>>         switchtec_ntb_send_msg(sndev, LINK_MESSAGE, MSG_LINK_UP);
>>
>> -       switchtec_ntb_check_link(sndev);
>> +       switchtec_ntb_check_link(sndev, MSG_CHECK_LINK);
>>
>>         return 0;
>>   }
>> @@ -540,9 +552,9 @@ static int switchtec_ntb_link_disable(struct ntb_dev
>> *ntb)
>>         dev_dbg(&sndev->stdev->dev, "disabling link\n");
>>
>>         sndev->self_shared->link_sta = 0;
>> -       switchtec_ntb_send_msg(sndev, LINK_MESSAGE, MSG_LINK_UP);
>> +       switchtec_ntb_send_msg(sndev, LINK_MESSAGE, MSG_LINK_DOWN);
>>
>> -       switchtec_ntb_check_link(sndev);
>> +       switchtec_ntb_check_link(sndev, MSG_CHECK_LINK);
>>
>>         return 0;
>>   }
>> @@ -1061,7 +1073,7 @@ static irqreturn_t switchtec_ntb_message_isr(int
>> irq, void *dev)
>>                         iowrite8(1,
>> &sndev->mmio_self_dbmsg->imsg[i].status);
>>
>>                         if (i == LINK_MESSAGE)
>> -                               switchtec_ntb_check_link(sndev);
>> +                               switchtec_ntb_check_link(sndev, msg);
>>                 }
>>         }
>>
>> @@ -1159,6 +1171,13 @@ static int switchtec_ntb_add(struct device *dev,
>>         if (rc)
>>                 goto deinit_shared_and_exit;
>>
>> +       /*
>> +        * If this host crashed, the other host may think the link is
>> +        * still up. Tell them to force it down (it will go back up
>> +        * once we register the ntb device).
>> +        */
>> +       switchtec_ntb_send_msg(sndev, LINK_MESSAGE, MSG_LINK_FORCE_DOWN);
>> +
>>         rc = ntb_register_device(&sndev->ntb);
>>         if (rc)
>>                 goto deinit_and_exit;
>> --
>> 2.11.0
>>
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ntb_hw_switchtec: Force down the peer's link before initializing
  2017-11-29  3:09   ` Jon Mason
@ 2017-11-29  5:28     ` ThanhTuThai
  2017-11-29  6:57       ` ThanhTuThai
  2017-11-29 16:37       ` Logan Gunthorpe
  0 siblings, 2 replies; 7+ messages in thread
From: ThanhTuThai @ 2017-11-29  5:28 UTC (permalink / raw)
  To: Jon Mason; +Cc: Logan Gunthorpe, linux-ntb

[-- Attachment #1: Type: text/plain, Size: 5727 bytes --]

Dear Logan,

It works only for soft-reboot, but for power-on reset, it doesn't.
I saw the problem is: with the power-on reset, the good peer didn't receive
any message from the crashed peer even the crashed peer already sent out
the message.

Do you think that after power-in reset, the memory-window is modified ?
So it cannot translate the message from the crashed peer to the good peer.

Thanks !


On 29 November 2017 at 11:09, Jon Mason <jdmason@kudzu.us> wrote:

> On Tue, Nov 28, 2017 at 5:05 PM, Logan Gunthorpe <logang@deltatee.com>
> wrote:
> > Hey Thai Thu,
> >
> > Please test with this patch and report back if it fixes your issue.
>
> I'll wait on the ack before applying this patch and I'll add the
> "tested-by" line (assuming it works).
>
> Thanks,
> Jon
>
> >
> > If it doesn't, can you please send the dmesg output of the two hosts?
> >
> > Thanks,
> >
> > Logan
> >
> >
> > On 28/11/17 03:01 PM, Logan Gunthorpe wrote:
> >>
> >> If one host crashes and soft reboots, the other host may not see a
> >> link down event. Then when the crashed host comes back up, the
> >> surviving host may not know the link was reset and the ntb clients
> >> may not work without being reset.
> >>
> >> To solve this, we send a LINK_FORCE_DOWN message to each peer every
> >> time we come up, before we register the ntb device. If a surving
> >> host still thinks the link is up it will take it down immediately.
> >> In this way, once the crashed host comes up fully, it will send a
> >> regular link up event as per usual and the link will be properly
> >> restarted.
> >>
> >> While we are in the area, this also fixes the MSG_LINK_UP message that
> >> was in the link down function that was reported by Doug Meyers.
> >>
> >> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
> >> Reported-by: ThanhTuThai <cruisethai@gmail.com>
> >> ---
> >>   drivers/ntb/hw/mscc/ntb_hw_switchtec.c | 33
> >> ++++++++++++++++++++++++++-------
> >>   1 file changed, 26 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/drivers/ntb/hw/mscc/ntb_hw_switchtec.c
> >> b/drivers/ntb/hw/mscc/ntb_hw_switchtec.c
> >> index d71744904b19..5cfcd4deadaf 100644
> >> --- a/drivers/ntb/hw/mscc/ntb_hw_switchtec.c
> >> +++ b/drivers/ntb/hw/mscc/ntb_hw_switchtec.c
> >> @@ -463,18 +463,30 @@ static void switchtec_ntb_set_link_speed(struct
> >> switchtec_ntb *sndev)
> >>         sndev->link_width = min(self_width, peer_width);
> >>   }
> >>
> >> -enum {
> >> +enum switchtec_msg {
> >>         LINK_MESSAGE = 0,
> >>         MSG_LINK_UP = 1,
> >>         MSG_LINK_DOWN = 2,
> >>         MSG_CHECK_LINK = 3,
> >> +       MSG_LINK_FORCE_DOWN = 4,
> >>   };
> >>
> >> -static void switchtec_ntb_check_link(struct switchtec_ntb *sndev)
> >> +static void switchtec_ntb_check_link(struct switchtec_ntb *sndev,
> >> +                                    enum switchtec_msg msg)
> >>   {
> >>         int link_sta;
> >>         int old = sndev->link_is_up;
> >>
> >> +       if (msg == MSG_LINK_FORCE_DOWN) {
> >> +               if (sndev->link_is_up) {
> >> +                       sndev->link_is_up = 0;
> >> +                       ntb_link_event(&sndev->ntb);
> >> +                       dev_info(&sndev->stdev->dev, "ntb link forced
> >> down\n");
> >> +               }
> >> +
> >> +               return;
> >> +       }
> >> +
> >>         link_sta = sndev->self_shared->link_sta;
> >>         if (link_sta) {
> >>                 u64 peer = ioread64(&sndev->peer_shared->magic);
> >> @@ -500,7 +512,7 @@ static void switchtec_ntb_link_notification(struct
> >> switchtec_dev *stdev)
> >>   {
> >>         struct switchtec_ntb *sndev = stdev->sndev;
> >>
> >> -       switchtec_ntb_check_link(sndev);
> >> +       switchtec_ntb_check_link(sndev, MSG_CHECK_LINK);
> >>   }
> >>
> >>   static u64 switchtec_ntb_link_is_up(struct ntb_dev *ntb,
> >> @@ -528,7 +540,7 @@ static int switchtec_ntb_link_enable(struct ntb_dev
> >> *ntb,
> >>         sndev->self_shared->link_sta = 1;
> >>         switchtec_ntb_send_msg(sndev, LINK_MESSAGE, MSG_LINK_UP);
> >>
> >> -       switchtec_ntb_check_link(sndev);
> >> +       switchtec_ntb_check_link(sndev, MSG_CHECK_LINK);
> >>
> >>         return 0;
> >>   }
> >> @@ -540,9 +552,9 @@ static int switchtec_ntb_link_disable(struct
> ntb_dev
> >> *ntb)
> >>         dev_dbg(&sndev->stdev->dev, "disabling link\n");
> >>
> >>         sndev->self_shared->link_sta = 0;
> >> -       switchtec_ntb_send_msg(sndev, LINK_MESSAGE, MSG_LINK_UP);
> >> +       switchtec_ntb_send_msg(sndev, LINK_MESSAGE, MSG_LINK_DOWN);
> >>
> >> -       switchtec_ntb_check_link(sndev);
> >> +       switchtec_ntb_check_link(sndev, MSG_CHECK_LINK);
> >>
> >>         return 0;
> >>   }
> >> @@ -1061,7 +1073,7 @@ static irqreturn_t switchtec_ntb_message_isr(int
> >> irq, void *dev)
> >>                         iowrite8(1,
> >> &sndev->mmio_self_dbmsg->imsg[i].status);
> >>
> >>                         if (i == LINK_MESSAGE)
> >> -                               switchtec_ntb_check_link(sndev);
> >> +                               switchtec_ntb_check_link(sndev, msg);
> >>                 }
> >>         }
> >>
> >> @@ -1159,6 +1171,13 @@ static int switchtec_ntb_add(struct device *dev,
> >>         if (rc)
> >>                 goto deinit_shared_and_exit;
> >>
> >> +       /*
> >> +        * If this host crashed, the other host may think the link is
> >> +        * still up. Tell them to force it down (it will go back up
> >> +        * once we register the ntb device).
> >> +        */
> >> +       switchtec_ntb_send_msg(sndev, LINK_MESSAGE,
> MSG_LINK_FORCE_DOWN);
> >> +
> >>         rc = ntb_register_device(&sndev->ntb);
> >>         if (rc)
> >>                 goto deinit_and_exit;
> >> --
> >> 2.11.0
> >>
> >
>

[-- Attachment #2: Type: text/html, Size: 8166 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ntb_hw_switchtec: Force down the peer's link before initializing
  2017-11-29  5:28     ` ThanhTuThai
@ 2017-11-29  6:57       ` ThanhTuThai
  2017-11-29 16:37       ` Logan Gunthorpe
  1 sibling, 0 replies; 7+ messages in thread
From: ThanhTuThai @ 2017-11-29  6:57 UTC (permalink / raw)
  To: Jon Mason; +Cc: Logan Gunthorpe, linux-ntb

[-- Attachment #1: Type: text/plain, Size: 6072 bytes --]

The good peer didn't receive any message, the interrupt routine is not
asserted.

Thanks !

On 29 November 2017 at 13:28, ThanhTuThai <cruisethai@gmail.com> wrote:

> Dear Logan,
>
> It works only for soft-reboot, but for power-on reset, it doesn't.
> I saw the problem is: with the power-on reset, the good peer didn't
> receive any message from the crashed peer even the crashed peer already
> sent out the message.
>
> Do you think that after power-in reset, the memory-window is modified ?
> So it cannot translate the message from the crashed peer to the good peer.
>
> Thanks !
>
>
> On 29 November 2017 at 11:09, Jon Mason <jdmason@kudzu.us> wrote:
>
>> On Tue, Nov 28, 2017 at 5:05 PM, Logan Gunthorpe <logang@deltatee.com>
>> wrote:
>> > Hey Thai Thu,
>> >
>> > Please test with this patch and report back if it fixes your issue.
>>
>> I'll wait on the ack before applying this patch and I'll add the
>> "tested-by" line (assuming it works).
>>
>> Thanks,
>> Jon
>>
>> >
>> > If it doesn't, can you please send the dmesg output of the two hosts?
>> >
>> > Thanks,
>> >
>> > Logan
>> >
>> >
>> > On 28/11/17 03:01 PM, Logan Gunthorpe wrote:
>> >>
>> >> If one host crashes and soft reboots, the other host may not see a
>> >> link down event. Then when the crashed host comes back up, the
>> >> surviving host may not know the link was reset and the ntb clients
>> >> may not work without being reset.
>> >>
>> >> To solve this, we send a LINK_FORCE_DOWN message to each peer every
>> >> time we come up, before we register the ntb device. If a surving
>> >> host still thinks the link is up it will take it down immediately.
>> >> In this way, once the crashed host comes up fully, it will send a
>> >> regular link up event as per usual and the link will be properly
>> >> restarted.
>> >>
>> >> While we are in the area, this also fixes the MSG_LINK_UP message that
>> >> was in the link down function that was reported by Doug Meyers.
>> >>
>> >> Signed-off-by: Logan Gunthorpe <logang@deltatee.com>
>> >> Reported-by: ThanhTuThai <cruisethai@gmail.com>
>> >> ---
>> >>   drivers/ntb/hw/mscc/ntb_hw_switchtec.c | 33
>> >> ++++++++++++++++++++++++++-------
>> >>   1 file changed, 26 insertions(+), 7 deletions(-)
>> >>
>> >> diff --git a/drivers/ntb/hw/mscc/ntb_hw_switchtec.c
>> >> b/drivers/ntb/hw/mscc/ntb_hw_switchtec.c
>> >> index d71744904b19..5cfcd4deadaf 100644
>> >> --- a/drivers/ntb/hw/mscc/ntb_hw_switchtec.c
>> >> +++ b/drivers/ntb/hw/mscc/ntb_hw_switchtec.c
>> >> @@ -463,18 +463,30 @@ static void switchtec_ntb_set_link_speed(struct
>> >> switchtec_ntb *sndev)
>> >>         sndev->link_width = min(self_width, peer_width);
>> >>   }
>> >>
>> >> -enum {
>> >> +enum switchtec_msg {
>> >>         LINK_MESSAGE = 0,
>> >>         MSG_LINK_UP = 1,
>> >>         MSG_LINK_DOWN = 2,
>> >>         MSG_CHECK_LINK = 3,
>> >> +       MSG_LINK_FORCE_DOWN = 4,
>> >>   };
>> >>
>> >> -static void switchtec_ntb_check_link(struct switchtec_ntb *sndev)
>> >> +static void switchtec_ntb_check_link(struct switchtec_ntb *sndev,
>> >> +                                    enum switchtec_msg msg)
>> >>   {
>> >>         int link_sta;
>> >>         int old = sndev->link_is_up;
>> >>
>> >> +       if (msg == MSG_LINK_FORCE_DOWN) {
>> >> +               if (sndev->link_is_up) {
>> >> +                       sndev->link_is_up = 0;
>> >> +                       ntb_link_event(&sndev->ntb);
>> >> +                       dev_info(&sndev->stdev->dev, "ntb link forced
>> >> down\n");
>> >> +               }
>> >> +
>> >> +               return;
>> >> +       }
>> >> +
>> >>         link_sta = sndev->self_shared->link_sta;
>> >>         if (link_sta) {
>> >>                 u64 peer = ioread64(&sndev->peer_shared->magic);
>> >> @@ -500,7 +512,7 @@ static void switchtec_ntb_link_notification(struct
>> >> switchtec_dev *stdev)
>> >>   {
>> >>         struct switchtec_ntb *sndev = stdev->sndev;
>> >>
>> >> -       switchtec_ntb_check_link(sndev);
>> >> +       switchtec_ntb_check_link(sndev, MSG_CHECK_LINK);
>> >>   }
>> >>
>> >>   static u64 switchtec_ntb_link_is_up(struct ntb_dev *ntb,
>> >> @@ -528,7 +540,7 @@ static int switchtec_ntb_link_enable(struct
>> ntb_dev
>> >> *ntb,
>> >>         sndev->self_shared->link_sta = 1;
>> >>         switchtec_ntb_send_msg(sndev, LINK_MESSAGE, MSG_LINK_UP);
>> >>
>> >> -       switchtec_ntb_check_link(sndev);
>> >> +       switchtec_ntb_check_link(sndev, MSG_CHECK_LINK);
>> >>
>> >>         return 0;
>> >>   }
>> >> @@ -540,9 +552,9 @@ static int switchtec_ntb_link_disable(struct
>> ntb_dev
>> >> *ntb)
>> >>         dev_dbg(&sndev->stdev->dev, "disabling link\n");
>> >>
>> >>         sndev->self_shared->link_sta = 0;
>> >> -       switchtec_ntb_send_msg(sndev, LINK_MESSAGE, MSG_LINK_UP);
>> >> +       switchtec_ntb_send_msg(sndev, LINK_MESSAGE, MSG_LINK_DOWN);
>> >>
>> >> -       switchtec_ntb_check_link(sndev);
>> >> +       switchtec_ntb_check_link(sndev, MSG_CHECK_LINK);
>> >>
>> >>         return 0;
>> >>   }
>> >> @@ -1061,7 +1073,7 @@ static irqreturn_t switchtec_ntb_message_isr(int
>> >> irq, void *dev)
>> >>                         iowrite8(1,
>> >> &sndev->mmio_self_dbmsg->imsg[i].status);
>> >>
>> >>                         if (i == LINK_MESSAGE)
>> >> -                               switchtec_ntb_check_link(sndev);
>> >> +                               switchtec_ntb_check_link(sndev, msg);
>> >>                 }
>> >>         }
>> >>
>> >> @@ -1159,6 +1171,13 @@ static int switchtec_ntb_add(struct device *dev,
>> >>         if (rc)
>> >>                 goto deinit_shared_and_exit;
>> >>
>> >> +       /*
>> >> +        * If this host crashed, the other host may think the link is
>> >> +        * still up. Tell them to force it down (it will go back up
>> >> +        * once we register the ntb device).
>> >> +        */
>> >> +       switchtec_ntb_send_msg(sndev, LINK_MESSAGE,
>> MSG_LINK_FORCE_DOWN);
>> >> +
>> >>         rc = ntb_register_device(&sndev->ntb);
>> >>         if (rc)
>> >>                 goto deinit_and_exit;
>> >> --
>> >> 2.11.0
>> >>
>> >
>>
>
>

[-- Attachment #2: Type: text/html, Size: 8785 bytes --]

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ntb_hw_switchtec: Force down the peer's link before initializing
  2017-11-29  5:28     ` ThanhTuThai
  2017-11-29  6:57       ` ThanhTuThai
@ 2017-11-29 16:37       ` Logan Gunthorpe
  2017-11-29 16:41         ` Logan Gunthorpe
  1 sibling, 1 reply; 7+ messages in thread
From: Logan Gunthorpe @ 2017-11-29 16:37 UTC (permalink / raw)
  To: ThanhTuThai, Jon Mason; +Cc: linux-ntb

Hey,

> It works only for soft-reboot, but for power-on reset, it doesn't.
> I saw the problem is: with the power-on reset, the good peer didn't 
> receive any message from the crashed peer even the crashed peer already 
> sent out the message.

Hmm, that's unexpected. If the link state changed on one of the ports 
the good host should have gotten an interrupt.

Can you send me the dmesg output on both hosts? Preferably with 
debugging[1] turned on.

Thanks,

Logan


[1] 
https://www.kernel.org/doc/html/latest/admin-guide/dynamic-debug-howto.html

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] ntb_hw_switchtec: Force down the peer's link before initializing
  2017-11-29 16:37       ` Logan Gunthorpe
@ 2017-11-29 16:41         ` Logan Gunthorpe
  0 siblings, 0 replies; 7+ messages in thread
From: Logan Gunthorpe @ 2017-11-29 16:41 UTC (permalink / raw)
  To: ThanhTuThai, Jon Mason; +Cc: linux-ntb

Also,

Can you grab the switchtec management utility[1] and send me the output of:

sudo switchtec gas dump > gas_dump.dat

On the good host after the bad host came back up?

Thanks,

Logan

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2017-11-29 16:41 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2017-11-28 22:01 [PATCH] ntb_hw_switchtec: Force down the peer's link before initializing Logan Gunthorpe
2017-11-28 22:05 ` Logan Gunthorpe
2017-11-29  3:09   ` Jon Mason
2017-11-29  5:28     ` ThanhTuThai
2017-11-29  6:57       ` ThanhTuThai
2017-11-29 16:37       ` Logan Gunthorpe
2017-11-29 16:41         ` Logan Gunthorpe

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.