From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <linux-kernel-owner@vger.kernel.org>
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S933294AbbLXQ1E (ORCPT <rfc822;w@1wt.eu>);
	Thu, 24 Dec 2015 11:27:04 -0500
Received: from relay1.mentorg.com ([192.94.38.131]:56900 "EHLO
	relay1.mentorg.com" rhost-flags-OK-OK-OK-OK) by vger.kernel.org
	with ESMTP id S1754771AbbLXQ1A (ORCPT
	<rfc822;linux-kernel@vger.kernel.org>);
	Thu, 24 Dec 2015 11:27:00 -0500
From: Andrew Gabbasov <andrew_gabbasov@mentor.com>
To: Jan Kara <jack@suse.com>, <linux-kernel@vger.kernel.org>
Subject: [PATCH v2 0/7] udf: rework name conversions to fix multi-bytes characters support
Date: Thu, 24 Dec 2015 10:25:31 -0600
Message-ID: <1450974338-22762-1-git-send-email-andrew_gabbasov@mentor.com>
X-Mailer: git-send-email 2.1.0
MIME-Version: 1.0
Content-Type: text/plain
Sender: linux-kernel-owner@vger.kernel.org
List-ID: <linux-kernel.vger.kernel.org>
X-Mailing-List: linux-kernel@vger.kernel.org

V2:

The single patch was split into several commits, one per logical
step. Also, some minor fixes were made to the code of the patches.

V1:

The current implementation in unicode.c has several issues, mostly related
to handling multi-byte characters in file names:

- loop ending conditions in udf_CS0toUTF8 and udf_CS0toNLS functions do not
properly catch the end of the output buffer in case of multi-byte
characters, allowing out-of-bounds writes and memory corruption;

- udf_UTF8toCS0 and udf_NLStoCS0 do not check the right boundary of the
output buffer at all, also allowing out-of-bounds writes and memory
corruption;

- udf_translate_to_linux does not take multi-byte characters into account
at all (although it is called after converting to UTF8 or NLS): the maximal
length of the extension is counted as 5 bytes, which may be incorrect with
multi-byte characters; when inserting the CRC and extension for long names
(near the end of the buffer), they are inserted at a fixed place at the end,
which can break into the middle of a multi-byte character;

- when being converted from CS0 to UTF8 (or NLS), the name can be truncated
(even if the sizes in bytes of the input and output buffers are the same),
but the translating function that follows does not know about it and does
not insert the CRC, as the specs expect.
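
For illustration only (this is not code from the patches): a minimal
userspace sketch of the kind of bounds check the first two items call for.
The names utf8_len() and ucs2_to_utf8_bounded() are hypothetical, and
surrogate pairs are ignored for brevity. The point is that the remaining
space is compared against the full encoded length of the next character,
not against a single byte, and that truncation is reported to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: UTF-8 length of one 16-bit code unit.
 * Surrogates are not handled in this sketch. */
static size_t utf8_len(uint16_t c)
{
	if (c < 0x80)
		return 1;
	if (c < 0x800)
		return 2;
	return 3;
}

/* Convert in_len 16-bit characters to UTF-8 without writing past out_max.
 * Returns the number of bytes written; *truncated is set to 1 if some
 * input did not fit, so that the caller knows a CRC may be needed. */
static size_t ucs2_to_utf8_bounded(const uint16_t *in, size_t in_len,
				   uint8_t *out, size_t out_max,
				   int *truncated)
{
	size_t i, o = 0;

	*truncated = 0;
	for (i = 0; i < in_len; i++) {
		uint16_t c = in[i];
		size_t n = utf8_len(c);

		/* Check room for the whole character before writing it. */
		if (n > out_max - o) {
			*truncated = 1;
			break;
		}
		if (n == 1) {
			out[o++] = (uint8_t)c;
		} else if (n == 2) {
			out[o++] = (uint8_t)(0xc0 | (c >> 6));
			out[o++] = (uint8_t)(0x80 | (c & 0x3f));
		} else {
			out[o++] = (uint8_t)(0xe0 | (c >> 12));
			out[o++] = (uint8_t)(0x80 | ((c >> 6) & 0x3f));
			out[o++] = (uint8_t)(0x80 | (c & 0x3f));
		}
	}
	return o;
}
```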

Because of the last item above, it looks like all the checks and
conversions (re-coding and possible CRC insertions) should be done
together in a single function. This means that the listed issues
cannot be fixed independently of each other. So, the whole
conversion and translation support should be reworked.
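
As a rough sketch of why truncation and CRC insertion belong together
(again, not the code from the patches): when the name has to be cut to make
room for a CRC suffix, the cut point must fall on a character boundary. The
helpers utf8_floor() and put_crc_suffix() below are hypothetical, the suffix
layout ('#' plus four hex digits, 5 bytes) is an assumption made for the
example, and extension handling is left out entirely.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

#define CRC_SUFFIX_LEN 5	/* assumed layout: '#' plus 4 hex digits */

/* Largest length <= limit that does not split a UTF-8 sequence in s. */
static size_t utf8_floor(const uint8_t *s, size_t len, size_t limit)
{
	if (limit >= len)
		return len;
	/* s[limit] is the first byte that would be dropped; if it is a
	 * continuation byte, the cut is inside a character, so back up. */
	while (limit > 0 && (s[limit] & 0xc0) == 0x80)
		limit--;
	return limit;
}

/* Append the CRC suffix to name (len bytes used, max bytes available),
 * shortening the name at a character boundary if needed.
 * Returns the new length, or 0 if even the suffix does not fit. */
static size_t put_crc_suffix(uint8_t *name, size_t len, size_t max,
			     uint16_t crc)
{
	char suffix[CRC_SUFFIX_LEN + 1];
	size_t keep;

	if (max < CRC_SUFFIX_LEN)
		return 0;
	keep = utf8_floor(name, len, max - CRC_SUFFIX_LEN);
	snprintf(suffix, sizeof(suffix), "#%04X", (unsigned int)crc);
	memcpy(name + keep, suffix, CRC_SUFFIX_LEN);
	return keep + CRC_SUFFIX_LEN;
}
```

With a fixed insertion offset the suffix could overwrite only part of a
multi-byte character; backing up to the previous lead byte avoids that, at
the cost of a result that may be a byte or two shorter than the maximum.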

The proposed implementation below fixes the listed issues, and also has
some additional features:

- it gets rid of "struct ustr", since it just adds an unneeded extra copy
of the buffer and has no other significant advantage;

- it unifies UTF8 and NLS conversion support, since there is not much
sense in separating these cases;

- the UDF_NAME_LEN constant is adjusted to better reflect actual
restrictions.


Andrew Gabbasov (7):
  udf: Prevent buffer overrun with multi-byte characters
  udf: Check output buffer length when converting name to CS0
  udf: Parameterize output length in udf_put_filename
  udf: Join functions for UTF8 and NLS conversions
  udf: Adjust UDF_NAME_LEN to better reflect actual restrictions
  udf: Remove struct ustr as non-needed intermediate storage
  udf: Merge linux specific translation into CS0 conversion function

 fs/udf/namei.c   |  16 +-
 fs/udf/super.c   |  38 ++--
 fs/udf/udfdecl.h |  21 +-
 fs/udf/unicode.c | 611 ++++++++++++++++++++++---------------------------------
 4 files changed, 274 insertions(+), 412 deletions(-)

-- 
2.1.0