[PATCH v2 0/7] rslib: RS decoder is severely broken

* [PATCH v2 0/7] rslib: RS decoder is severely broken
@ 2019-06-20 14:10 Ferdinand Blomqvist
  2019-06-20 14:10 ` [PATCH v2 1/7] rslib: Add tests for the encoder and decoder Ferdinand Blomqvist
                   ` (7 more replies)
  0 siblings, 8 replies; 18+ messages in thread
From: Ferdinand Blomqvist @ 2019-06-20 14:10 UTC (permalink / raw)
  To: linux-kernel; +Cc: Thomas Gleixner

This is the second revision of this series that fixes the bugs/flaws in
the RS decoder. This addresses all of Thomas' comments/suggestions.

Changes in v2:
- Replace code that requires the FPU (one small thing in test_rslib.c)
- More verbose changelog (patch 2/7)
- Clarifying comment (patch 7/7)
- Fixed some small whitespace issues (patches 1 and 5)

------

The Reed_Solomon library used in the kernel is based on Phil Karn's
fec library. When playing with this library I found a couple of bugs. It
turn out that all of these bugs, and some additional flaws, are present
in rslib.

The Reed-Solomon decoder has several bugs/flaws:

- Decoding of shortened codes is broken. The decoder only works as
  expected if there are no erasures.

- The decoder sometimes fails silently, i.e. it announces success but
  returns a word that is not a codeword.

- The return value of the decoder is incoherent with respect to how
  fixed erasures are counted. If the word to be decoded is a codeword,
  then the decoder always returns zero even if some erasures are given.
  On the other hand, if the word to be decoded contains errors, then the
  number of erasures is always included in the count of corrected
  symbols. So the decoder handles erasures without symbol corruption
  inconsistently. This inconsistency probably doesn't affect anyone
  using the decoder, but it is inconsistent with the documentation.

- The error positions returned in eras_pos include all erasures, but the
  corrections are only set in the correction buffer if there actually is
  a symbol error. So if there are erasures without symbol corruption,
  then the correction buffer will contain errors (unless initialized to
  zero before calling the decoder) or some values will be unset (if the
  correction buffer is uninitialized).

- Assumes that the syndromes provided by callers are in index form.
  This is simply undocumented.

- When correcting data in-place the decoder does not correct errors in
  the parity. On the other hand, when returning the errors in correction
  buffers, errors in the parity are included.

This series provides a module with tests for rslib and fixes all the
bugs/flaws. I am not sure that the provided self tests are written in
the 'right way'. I just looked at other self tests in lib and
implemented something similar.

The fixes are tested with the self tests. They should probably also be
tested with drivers etc. that use rslib, but it is unclear to me how to
do this.

I looked a bit on two of the drivers that use rslib:

drivers/mtd/nand/raw/cafe_nand.c
drivers/mtd/nand/raw/diskonchip.c

Both of them seem to do some additional error checking after calling
decode_rs16. Maybe this is needed because of the bugs in the decoder?

Ferdinand Blomqvist (7):
  rslib: Add tests for the encoder and decoder
  rslib: Fix decoding of shortened codes
  rslib: decode_rs: Fix length parameter check
  rslib: decode_rs: code cleanup
  rslib: Fix handling of of caller provided syndrome
  rslib: Update documentation
  rslib: Fix remaining decoder flaws

 lib/Kconfig.debug               |  12 +
 lib/reed_solomon/Makefile       |   2 +-
 lib/reed_solomon/decode_rs.c    | 115 +++++--
 lib/reed_solomon/reed_solomon.c |  12 +-
 lib/reed_solomon/test_rslib.c   | 516 ++++++++++++++++++++++++++++++++
 5 files changed, 622 insertions(+), 35 deletions(-)
 create mode 100644 lib/reed_solomon/test_rslib.c

-- 
2.17.2

^ permalink raw reply	[flat|nested] 18+ messages in thread