From: Andrew Gabbasov
To: Jan Kara
Subject: [PATCH v2 0/7] udf: rework name conversions to fix multi-bytes characters support
Date: Thu, 24 Dec 2015 10:25:31 -0600
Message-ID: <1450974338-22762-1-git-send-email-andrew_gabbasov@mentor.com>
X-Mailing-List: linux-kernel@vger.kernel.org

V2:

The single patch was split into several commits, one per logical step.
Some minor fixes were also made to the code of the patches.

V1:

The current implementation has several issues in unicode.c, mostly related
to handling multi-byte characters in file names:

- the loop ending conditions in the udf_CS0toUTF8 and udf_CS0toNLS functions
  do not properly catch the end of the output buffer in the case of
  multi-byte characters, allowing out-of-bounds writes and memory
  corruption;

- udf_UTF8toCS0 and udf_NLStoCS0 do not check the right boundary of the
  output buffer at all, also allowing out-of-bounds writes and memory
  corruption;

- udf_translate_to_linux does not take multi-byte characters into account
  at all (although it is called after converting to UTF8 or NLS): the
  maximal length of the extension is counted as 5 bytes, which may be
  incorrect with multi-byte characters; when the CRC and extension are
  inserted for long names (near the end of the buffer), they are placed at
  a fixed position at the end, which can land in the middle of a
  multi-byte character;

- when being converted from CS0 to UTF8 (or NLS), the name can be truncated
  (even if the sizes in bytes of the input and output buffers are the
  same), but the
  following translating function does not know about it and does not insert
  the CRC, as the specs require.

Because of the last item above, it looks like all the checks and conversions
(re-coding and possible CRC insertion) should be done simultaneously in a
single function. This means that the listed issues cannot be fixed
independently and separately, so the whole conversion and translation
support should be reworked.

The proposed implementation below fixes the listed issues and also has some
additional features:

- it gets rid of "struct ustr", since it just makes an unneeded extra copy
  of the buffer and has no other significant advantage;

- it unifies UTF8 and NLS conversion support, since there is not much sense
  in separating these cases;

- the UDF_NAME_LEN constant is adjusted to better reflect actual
  restrictions.

Andrew Gabbasov (7):
  udf: Prevent buffer overrun with multi-byte characters
  udf: Check output buffer length when converting name to CS0
  udf: Parameterize output length in udf_put_filename
  udf: Join functions for UTF8 and NLS conversions
  udf: Adjust UDF_NAME_LEN to better reflect actual restrictions
  udf: Remove struct ustr as non-needed intermediate storage
  udf: Merge linux specific translation into CS0 conversion function

 fs/udf/namei.c   |  16 +-
 fs/udf/super.c   |  38 ++--
 fs/udf/udfdecl.h |  21 +-
 fs/udf/unicode.c | 611 ++++++++++++++++++++++--------------------------------
 4 files changed, 274 insertions(+), 412 deletions(-)

--
2.1.0
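
For illustration only, the kind of check that the first and last items of the
V1 list call for can be sketched as below. This is not the code from the
patches: the names utf8_len and ucs2_to_utf8_bounded are hypothetical, the
input is assumed to be already unpacked into 16-bit characters, and
surrogates are not handled. The point is that the space left in the output
buffer is compared with the full encoded length of a character before any of
its bytes is written, and that truncation is reported to the caller.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from the patches: number of UTF-8 bytes
 * needed for one 16-bit character (surrogates are not handled specially).
 */
static size_t utf8_len(uint16_t c)
{
	if (c < 0x80)
		return 1;
	if (c < 0x800)
		return 2;
	return 3;
}

/*
 * Hypothetical sketch: convert 16-bit characters to UTF-8, checking the
 * space left in the output buffer against the full encoded length of each
 * character before writing any of its bytes.
 *
 * Returns the number of bytes written; *truncated is set to 1 when some
 * input characters did not fit, so that the caller knows the name was cut.
 */
static size_t ucs2_to_utf8_bounded(const uint16_t *in, size_t in_len,
				   uint8_t *out, size_t out_size,
				   int *truncated)
{
	size_t i, o = 0;

	*truncated = 0;
	for (i = 0; i < in_len; i++) {
		uint16_t c = in[i];
		size_t need = utf8_len(c);

		/* Never start a character that cannot be completed. */
		if (need > out_size - o) {
			*truncated = 1;
			break;
		}
		if (need == 1) {
			out[o++] = (uint8_t)c;
		} else if (need == 2) {
			out[o++] = (uint8_t)(0xc0 | (c >> 6));
			out[o++] = (uint8_t)(0x80 | (c & 0x3f));
		} else {
			out[o++] = (uint8_t)(0xe0 | (c >> 12));
			out[o++] = (uint8_t)(0x80 | ((c >> 6) & 0x3f));
			out[o++] = (uint8_t)(0x80 | (c & 0x3f));
		}
	}
	return o;
}
```

With a check of this shape the output always ends on a character boundary,
and a caller that sees the truncated flag set knows that the name was cut
and can decide whether a CRC has to be inserted.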