linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* Re: [PATCH] eeprom: at24: Fix unexpected timeout under high load
@ 2018-08-05 12:26 Jonas Mark (BT-FIR/ENG1)
  2018-08-05 14:09 ` Andy Shevchenko
  0 siblings, 1 reply; 6+ messages in thread
From: Jonas Mark (BT-FIR/ENG1) @ 2018-08-05 12:26 UTC (permalink / raw)
  To: Andy Shevchenko, Bartosz Golaszewski
  Cc: Arnd Bergmann, Greg Kroah-Hartman, linux-i2c,
	Linux Kernel Mailing List, WANG Xin (BT-FIR/ENG1-Zhu),
	Jonas Mark (BT-FIR/ENG1)

Hi Andy,

Thank you for your feedback.
 
> > -#define at24_loop_until_timeout(tout, op_time)                         \
> > -       for (tout = jiffies + msecs_to_jiffies(at24_write_timeout),     \
> > -            op_time = 0;                                               \
> > -            op_time ? time_before(op_time, tout) : true;               \
> > -            usleep_range(1000, 1500), op_time = jiffies)
> 
> This one understandble and represents one operation.

It just has the downside that it will not retry if the timeout is
reached after the usleep_range().

If you have a system which combines high CPU load with repeated EEPROM
writes you will run into the following scenario:

- System makes a successful regmap_bulk_write() to EEPROM.
- System wants to perform another write to EEPROM but EEPROM is still
  busy with the last write.
- Because of high CPU load the usleep_range() will sleep more than
  25 ms (at24_write_timeout).
- Within the over-long sleeping the EEPROM finished the previous write
  operation and is ready again.
- at24_loop_until_timeout() will detect timeout and won't try to write.

The scenario above happens very often on our system and we need a fix.

> > +#define at24_loop_until_timeout_begin(tout, op_time)           \
> > +       tout = jiffies + msecs_to_jiffies(at24_write_timeout);  \
> > +       while (true) {                                          \
> > +               op_time = jiffies;
> > +
> > +#define at24_loop_until_timeout_end(tout, op_time)             \
> > +               if (time_before(tout, op_time))                 \
> > +                       break;                                  \
> > +               usleep_range(1000, 1500);                       \
> > +       }
> 
> Besides `while (true)`, which is a red flag for timeout loops,
> these are done in an hack way. Just open code them in both cases, or
> rewrite original one to keel it's semantics.

I have to admit that I am not sure what you mean.

We kept the macro-style of the loop because we assumed it was good
style in this context.

What does "keel it's semantics" mean?

With "open code them in both cases" do you mean to rid of the macro
and to directly write the loop into the code? Does the following
match your proposals?

ret = 0;
tout = jiffies + msecs_to_jiffies(at24_write_timeout);
do {
	if (ret)
		usleep_range(1000, 1500);

	read_time = jiffies;

	ret = regmap_bulk_read(regmap, offset, buf, count);
	dev_dbg(&client->dev, "read %zu@%d --> %d (%ld)\n",
		count, offset, ret, jiffies);
	if (!ret)
		return count;
} while (!time_before(tout, read_time))

Greetings,
Mark

Building Technologies, Panel Software Fire (BT-FIR/ENG1) 
Bosch Sicherheitssysteme GmbH | Postfach 11 11 | 85626 Grasbrunn | GERMANY | www.boschsecurity.com

Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart HRB 23118 
Aufsichtsratsvorsitzender: Stefan Hartung; Geschäftsführung: Tanja Rückert, Andreas Bartz, Thomas Quante, Bernhard Schuster

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] eeprom: at24: Fix unexpected timeout under high load
  2018-08-05 12:26 [PATCH] eeprom: at24: Fix unexpected timeout under high load Jonas Mark (BT-FIR/ENG1)
@ 2018-08-05 14:09 ` Andy Shevchenko
  0 siblings, 0 replies; 6+ messages in thread
From: Andy Shevchenko @ 2018-08-05 14:09 UTC (permalink / raw)
  To: Jonas Mark (BT-FIR/ENG1)
  Cc: Bartosz Golaszewski, Arnd Bergmann, Greg Kroah-Hartman,
	linux-i2c, Linux Kernel Mailing List, WANG Xin (BT-FIR/ENG1-Zhu)

On Sun, Aug 5, 2018 at 3:26 PM, Jonas Mark (BT-FIR/ENG1)
<Mark.Jonas@de.bosch.com> wrote:
> Hi Andy,
>
> Thank you for your feedback.
>
>> > -#define at24_loop_until_timeout(tout, op_time)                         \
>> > -       for (tout = jiffies + msecs_to_jiffies(at24_write_timeout),     \
>> > -            op_time = 0;                                               \
>> > -            op_time ? time_before(op_time, tout) : true;               \
>> > -            usleep_range(1000, 1500), op_time = jiffies)
>>
>> This one understandble and represents one operation.
>
> It just has the downside that it will not retry if the timeout is
> reached after the usleep_range().
>
> If you have a system which combines high CPU load with repeated EEPROM
> writes you will run into the following scenario:
>
> - System makes a successful regmap_bulk_write() to EEPROM.
> - System wants to perform another write to EEPROM but EEPROM is still
>   busy with the last write.
> - Because of high CPU load the usleep_range() will sleep more than
>   25 ms (at24_write_timeout).
> - Within the over-long sleeping the EEPROM finished the previous write
>   operation and is ready again.
> - at24_loop_until_timeout() will detect timeout and won't try to write.

>
> The scenario above happens very often on our system and we need a fix.

Thanks for explanation why. (it would be good partially move this to
the commit message).


>
>> > +#define at24_loop_until_timeout_begin(tout, op_time)           \
>> > +       tout = jiffies + msecs_to_jiffies(at24_write_timeout);  \
>> > +       while (true) {                                          \
>> > +               op_time = jiffies;
>> > +
>> > +#define at24_loop_until_timeout_end(tout, op_time)             \
>> > +               if (time_before(tout, op_time))                 \
>> > +                       break;                                  \
>> > +               usleep_range(1000, 1500);                       \
>> > +       }
>>
>> Besides `while (true)`, which is a red flag for timeout loops,
>> these are done in an hack way. Just open code them in both cases, or
>> rewrite original one to keel it's semantics.
>
> I have to admit that I am not sure what you mean.
>
> We kept the macro-style of the loop because we assumed it was good
> style in this context.

No way. It's a bad style when you have a macro like you proposing. It
would give you a bottle of sparkling bugs.

> What does "keel it's semantics" mean?

Macro should be standalone piece of code which does something from A
to Z, not from A-K when you need a complementary macro to do L-Z
parts.

> With "open code them in both cases" do you mean to rid of the macro
> and to directly write the loop into the code? Does the following
> match your proposals?
>
> ret = 0;
> tout = jiffies + msecs_to_jiffies(at24_write_timeout);
> do {
>         if (ret)
>                 usleep_range(1000, 1500);
>
>         read_time = jiffies;
>
>         ret = regmap_bulk_read(regmap, offset, buf, count);
>         dev_dbg(&client->dev, "read %zu@%d --> %d (%ld)\n",
>                 count, offset, ret, jiffies);
>         if (!ret)
>                 return count;
> } while (!time_before(tout, read_time))

Yes, though, please, look at the examples in the existing code and
make it slightly better

timeout = ...
do {
ret = ...
if (ret) // or if (!ret)
 ...

usleep_range(...);
} while(time_before(...));

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] eeprom: at24: Fix unexpected timeout under high load
  2018-08-06 11:39 Jonas Mark (BT-FIR/ENG1)
@ 2018-08-14 12:50 ` Andy Shevchenko
  0 siblings, 0 replies; 6+ messages in thread
From: Andy Shevchenko @ 2018-08-14 12:50 UTC (permalink / raw)
  To: Jonas Mark (BT-FIR/ENG1)
  Cc: Bartosz Golaszewski, Arnd Bergmann, Greg Kroah-Hartman,
	linux-i2c, Linux Kernel Mailing List, WANG Xin (BT-FIR/ENG1-Zhu)

On Mon, Aug 6, 2018 at 2:39 PM, Jonas Mark (BT-FIR/ENG1)
<Mark.Jonas@de.bosch.com> wrote:

> tout = jiffies + msecs_to_jiffies(at24_write_timeout);
> do {
>          read_time = jiffies;
>
>          ret = regmap_bulk_read(regmap, offset, buf, count);
>          dev_dbg(&client->dev, "read %zu@%d --> %d (%ld)\n",
>                  count, offset, ret, jiffies);
>          if (!ret)
>                  return count;
>
>          usleep_range(1000, 1500);
> } while (!time_before(tout, read_time))
>
> The advantage of this code is that the usleep_range() is unconditional.

> The disadvantage of the new proposal is that in case of a timeout one
> more unnecessary sleep is made. Is that acceptable?

Yes.

> An alternative would be to duplicate the regmap_bulk_read() and the
> debugging code outside the loop.

> Is this preferable?

No.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] eeprom: at24: Fix unexpected timeout under high load
@ 2018-08-06 11:39 Jonas Mark (BT-FIR/ENG1)
  2018-08-14 12:50 ` Andy Shevchenko
  0 siblings, 1 reply; 6+ messages in thread
From: Jonas Mark (BT-FIR/ENG1) @ 2018-08-06 11:39 UTC (permalink / raw)
  To: Andy Shevchenko, Bartosz Golaszewski
  Cc: Arnd Bergmann, Greg Kroah-Hartman, linux-i2c,
	Linux Kernel Mailing List, WANG Xin (BT-FIR/ENG1-Zhu),
	Jonas Mark (BT-FIR/ENG1)

Hi Andy,

> >> > -#define at24_loop_until_timeout(tout, op_time)                         \
> >> > -       for (tout = jiffies + msecs_to_jiffies(at24_write_timeout),     \
> >> > -            op_time = 0;                                               \
> >> > -            op_time ? time_before(op_time, tout) : true;               \
> >> > -            usleep_range(1000, 1500), op_time = jiffies)
> >>
> >> This one understandble and represents one operation.
> >
> > It just has the downside that it will not retry if the timeout is
> > reached after the usleep_range().
> >
> > If you have a system which combines high CPU load with repeated
> EEPROM
> > writes you will run into the following scenario:
> >
> > - System makes a successful regmap_bulk_write() to EEPROM.
> > - System wants to perform another write to EEPROM but EEPROM is still
> >   busy with the last write.
> > - Because of high CPU load the usleep_range() will sleep more than
> >   25 ms (at24_write_timeout).
> > - Within the over-long sleeping the EEPROM finished the previous write
> >   operation and is ready again.
> > - at24_loop_until_timeout() will detect timeout and won't try to write.
> 
> >
> > The scenario above happens very often on our system and we need a fix.
> 
> Thanks for explanation why. (it would be good partially move this to
> the commit message).

We will improve the commit message in the next revision of the patch.

> >> > +#define at24_loop_until_timeout_begin(tout, op_time)           \
> >> > +       tout = jiffies + msecs_to_jiffies(at24_write_timeout);  \
> >> > +       while (true) {                                          \
> >> > +               op_time = jiffies;
> >> > +
> >> > +#define at24_loop_until_timeout_end(tout, op_time)             \
> >> > +               if (time_before(tout, op_time))                 \
> >> > +                       break;                                  \
> >> > +               usleep_range(1000, 1500);                       \
> >> > +       }
> >>
> >> Besides `while (true)`, which is a red flag for timeout loops,
> >> these are done in an hack way. Just open code them in both cases, or
> >> rewrite original one to keel it's semantics.
> >
> > I have to admit that I am not sure what you mean.
> >
> > We kept the macro-style of the loop because we assumed it was good
> > style in this context.
> 
> No way. It's a bad style when you have a macro like you proposing. It
> would give you a bottle of sparkling bugs.
> 
> > What does "keel it's semantics" mean?
> 
> Macro should be standalone piece of code which does something from A
> to Z, not from A-K when you need a complementary macro to do L-Z
> parts.

I agree, we will do it without a macro then.

> > With "open code them in both cases" do you mean to rid of the macro
> > and to directly write the loop into the code? Does the following
> > match your proposals?
> >
> > ret = 0;
> > tout = jiffies + msecs_to_jiffies(at24_write_timeout);
> > do {
> >         if (ret)
> >                 usleep_range(1000, 1500);
> >
> >         read_time = jiffies;
> >
> >         ret = regmap_bulk_read(regmap, offset, buf, count);
> >         dev_dbg(&client->dev, "read %zu@%d --> %d (%ld)\n",
> >                 count, offset, ret, jiffies);
> >         if (!ret)
> >                 return count;
> > } while (!time_before(tout, read_time))
> 
> Yes, though, please, look at the examples in the existing code and
> make it slightly better
> 
> timeout = ...
> do {
> ret = ...
> if (ret) // or if (!ret)
>  ...
> 
> usleep_range(...);
> } while(time_before(...));

When working on the problem we had an intermediate result were we
came to the same solution as your proposal showed:

tout = jiffies + msecs_to_jiffies(at24_write_timeout);
do {
         read_time = jiffies;

         ret = regmap_bulk_read(regmap, offset, buf, count);
         dev_dbg(&client->dev, "read %zu@%d --> %d (%ld)\n",
                 count, offset, ret, jiffies);
         if (!ret)
                 return count;

         usleep_range(1000, 1500);
} while (!time_before(tout, read_time))

The advantage of this code is that the usleep_range() is unconditional.

(In my older proposal a "if (ret)" condition is required to make sure
that there is not sleep at the very first iteration but only at
follow-up iterations where regmap_bulk_read() failed.)

The disadvantage of the new proposal is that in case of a timeout one
more unnecessary sleep is made. Is that acceptable?

An alternative would be to duplicate the regmap_bulk_read() and the
debugging code outside the loop.

tout = jiffies + msecs_to_jiffies(at24_write_timeout);
read_time = jiffies;

ret = regmap_bulk_read(regmap, offset, buf, count);
dev_dbg(&client->dev, "read %zu@%d --> %d (%ld)\n",
	count, offset, ret, jiffies);

while (ret && !time_before(tout, read_time))
	usleep_range(1000, 1500);

	/*
	 * The timestamp shall be taken before sleep and the actual
	 * operation to avoid a premature timeout in case of high CPU load.
	 */
	read_time = jiffies;

	ret = regmap_bulk_read(regmap, offset, buf, count);
	dev_dbg(&client->dev, "read retry %zu@%d --> %d (%ld)\n",
		count, offset, ret, jiffies);
}

if (!ret)
	return count;

Is this preferable?

Greetings,
Mark

Building Technologies, Panel Software Fire (BT-FIR/ENG1) 
Bosch Sicherheitssysteme GmbH | Postfach 11 11 | 85626 Grasbrunn | GERMANY | www.boschsecurity.com

Sitz: Stuttgart, Registergericht: Amtsgericht Stuttgart HRB 23118 
Aufsichtsratsvorsitzender: Stefan Hartung; Geschäftsführung: Tanja Rückert, Andreas Bartz, Thomas Quante, Bernhard Schuster

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] eeprom: at24: Fix unexpected timeout under high load
  2018-08-04 17:43 Mark Jonas
@ 2018-08-04 20:51 ` Andy Shevchenko
  0 siblings, 0 replies; 6+ messages in thread
From: Andy Shevchenko @ 2018-08-04 20:51 UTC (permalink / raw)
  To: Mark Jonas
  Cc: Bartosz Golaszewski, Arnd Bergmann, Greg Kroah-Hartman,
	linux-i2c, Linux Kernel Mailing List, Wang Xin

On Sat, Aug 4, 2018 at 8:43 PM, Mark Jonas <mark.jonas@de.bosch.com> wrote:


> -#define at24_loop_until_timeout(tout, op_time)                         \
> -       for (tout = jiffies + msecs_to_jiffies(at24_write_timeout),     \
> -            op_time = 0;                                               \
> -            op_time ? time_before(op_time, tout) : true;               \
> -            usleep_range(1000, 1500), op_time = jiffies)

This one understandble and represents one operation.

> +#define at24_loop_until_timeout_begin(tout, op_time)           \
> +       tout = jiffies + msecs_to_jiffies(at24_write_timeout);  \
> +       while (true) {                                          \
> +               op_time = jiffies;
> +
> +#define at24_loop_until_timeout_end(tout, op_time)             \
> +               if (time_before(tout, op_time))                 \
> +                       break;                                  \
> +               usleep_range(1000, 1500);                       \
> +       }

Besides `while (true)`, which is a red flag for timeout loops,
these are done in an hack way. Just open code them in both cases, or
rewrite original one to keel it's semantics.

-- 
With Best Regards,
Andy Shevchenko

^ permalink raw reply	[flat|nested] 6+ messages in thread

* [PATCH] eeprom: at24: Fix unexpected timeout under high load
@ 2018-08-04 17:43 Mark Jonas
  2018-08-04 20:51 ` Andy Shevchenko
  0 siblings, 1 reply; 6+ messages in thread
From: Mark Jonas @ 2018-08-04 17:43 UTC (permalink / raw)
  To: Bartosz Golaszewski, Arnd Bergmann, Greg Kroah-Hartman
  Cc: linux-i2c, linux-kernel, Wang Xin, Mark Jonas

From: Wang Xin <xin.wang7@cn.bosch.com>

Within at24_loop_until_timeout the timestamp used for timeout checking
is recorded after the I2C transfer and sleep_range(). Under high CPU
load either the execution time for I2C transfer or sleep_range() could
actually be larger than the timeout value. Worst case the I2C transfer
is only tried once because the loop will exit due to the timeout
although the EEPROM is now ready.

To fix this issue the timestamp is recorded at the beginning of each
iteration. That is, before I2C transfer and sleep. Then the timeout
is actually checked against the timestamp of the previous iteration.
This makes sure that even if the timeout is reached, there is still one
more chance to try the I2C transfer in case the EEPROM is ready.

Signed-off-by: Wang Xin <xin.wang7@cn.bosch.com>
Signed-off-by: Mark Jonas <mark.jonas@de.bosch.com>
---
 drivers/misc/eeprom/at24.c | 26 +++++++++++++++++++-------
 1 file changed, 19 insertions(+), 7 deletions(-)

diff --git a/drivers/misc/eeprom/at24.c b/drivers/misc/eeprom/at24.c
index f5cc517..6d8c294 100644
--- a/drivers/misc/eeprom/at24.c
+++ b/drivers/misc/eeprom/at24.c
@@ -112,16 +112,26 @@ MODULE_PARM_DESC(at24_write_timeout, "Time (in ms) to try writes (default 25)");
  * write to work while making sure that at least one iteration is run before
  * checking the break condition.
  *
+ * By recording current time at the beginning of one iteration, the timeout
+ * is actually checked against the time in previous iteration. It makes sure,
+ * even the timeout is reached, there is still one more chance to try I2C
+ * transfer in case EEPROM is ready.
+ *
  * It takes two parameters: a variable in which the future timeout in jiffies
  * will be stored and a temporary variable holding the time of the last
  * iteration of processing the request. Both should be unsigned integers
  * holding at least 32 bits.
  */
-#define at24_loop_until_timeout(tout, op_time)				\
-	for (tout = jiffies + msecs_to_jiffies(at24_write_timeout),	\
-	     op_time = 0;						\
-	     op_time ? time_before(op_time, tout) : true;		\
-	     usleep_range(1000, 1500), op_time = jiffies)
+#define at24_loop_until_timeout_begin(tout, op_time)		\
+	tout = jiffies + msecs_to_jiffies(at24_write_timeout);	\
+	while (true) {						\
+		op_time = jiffies;
+
+#define at24_loop_until_timeout_end(tout, op_time)		\
+		if (time_before(tout, op_time))			\
+			break;					\
+		usleep_range(1000, 1500);			\
+	}
 
 struct at24_chip_data {
 	/*
@@ -308,13 +318,14 @@ static ssize_t at24_regmap_read(struct at24_data *at24, char *buf,
 	/* adjust offset for mac and serial read ops */
 	offset += at24->offset_adj;
 
-	at24_loop_until_timeout(timeout, read_time) {
+	at24_loop_until_timeout_begin(timeout, read_time) {
 		ret = regmap_bulk_read(regmap, offset, buf, count);
 		dev_dbg(&client->dev, "read %zu@%d --> %d (%ld)\n",
 			count, offset, ret, jiffies);
 		if (!ret)
 			return count;
 	}
+	at24_loop_until_timeout_end(timeout, read_time)
 
 	return -ETIMEDOUT;
 }
@@ -359,13 +370,14 @@ static ssize_t at24_regmap_write(struct at24_data *at24, const char *buf,
 	client = at24_client->client;
 	count = at24_adjust_write_count(at24, offset, count);
 
-	at24_loop_until_timeout(timeout, write_time) {
+	at24_loop_until_timeout_begin(timeout, write_time) {
 		ret = regmap_bulk_write(regmap, offset, buf, count);
 		dev_dbg(&client->dev, "write %zu@%d --> %d (%ld)\n",
 			count, offset, ret, jiffies);
 		if (!ret)
 			return count;
 	}
+	at24_loop_until_timeout_end(timeout, write_time)
 
 	return -ETIMEDOUT;
 }
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2018-08-14 12:50 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-08-05 12:26 [PATCH] eeprom: at24: Fix unexpected timeout under high load Jonas Mark (BT-FIR/ENG1)
2018-08-05 14:09 ` Andy Shevchenko
  -- strict thread matches above, loose matches on Subject: below --
2018-08-06 11:39 Jonas Mark (BT-FIR/ENG1)
2018-08-14 12:50 ` Andy Shevchenko
2018-08-04 17:43 Mark Jonas
2018-08-04 20:51 ` Andy Shevchenko

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).