linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
From: "Pali Rohár" <pali@kernel.org>
To: Konstantin Komarov <almaz.alexandrovich@paragon-software.com>
Cc: linux-fsdevel@vger.kernel.org, viro@zeniv.linux.org.uk,
	linux-kernel@vger.kernel.org, dsterba@suse.cz, aaptel@suse.com,
	willy@infradead.org, rdunlap@infradead.org, joe@perches.com,
	mark@harmstone.com
Subject: Re: [PATCH v3 04/10] fs/ntfs3: Add file operations and implementation
Date: Fri, 4 Sep 2020 13:50:49 +0200	[thread overview]
Message-ID: <20200904115049.i6zjfwba7egalxnp@pali> (raw)
In-Reply-To: <20200828143938.102889-5-almaz.alexandrovich@paragon-software.com>

Hello Konstantin!

On Friday 28 August 2020 07:39:32 Konstantin Komarov wrote:
> +/*
> + * Convert little endian utf16 to UTF-8.

There is mistake in comment. This function converts UTF-16 to some NLS.
It does not have to be UTF-8.

> + */
> +int ntfs_utf16_to_nls(struct ntfs_sb_info *sbi, const struct le_str *uni,
> +		      u8 *buf, int buf_len)
> +{
> +	int ret, uni_len;
> +	const __le16 *ip;
> +	u8 *op;
> +	struct nls_table *nls = sbi->nls;
> +
> +	static_assert(sizeof(wchar_t) == sizeof(__le16));
> +
> +	if (!nls) {
> +		/* utf16 -> utf8 */
> +		ret = utf16s_to_utf8s((wchar_t *)uni->name, uni->len,
> +				      UTF16_HOST_ENDIAN, buf, buf_len);

In comment you wrote that input is little endian, but here you use host
endian. Can you check what should be correct behavior (little or host
endian) and update code or comment?

> +		buf[ret] = '\0';
> +		return ret;
> +	}
> +
> +	ip = uni->name;
> +	op = buf;
> +	uni_len = uni->len;
> +
> +	while (uni_len--) {
> +		u16 ec;
> +		int charlen;
> +
> +		if (buf_len < NLS_MAX_CHARSET_SIZE) {
> +			ntfs_printk(sbi->sb, KERN_WARNING
> +				    "filename was truncated while converting.");
> +			break;
> +		}
> +
> +		ec = le16_to_cpu(*ip++);

In this branch (when nls variable is non-NULL) you expects that input is
in UTF-16 little endian. So probably in above utf16s_to_utf8s() call
should be used UTF-16 little endian too. But please recheck it.

> +		charlen = nls->uni2char(ec, op, buf_len);
> +
> +		if (charlen > 0) {
> +			op += charlen;
> +			buf_len -= charlen;
> +		} else {
> +			*op++ = ':';
> +			op = hex_byte_pack(op, ec >> 8);
> +			op = hex_byte_pack(op, ec);
> +			buf_len -= 5;
> +		}
> +	}
> +
> +	*op = '\0';
> +	return op - buf;
> +}
> +
> +static inline u8 get_digit(u8 d)
> +{
> +	u8 x = d & 0xf;
> +
> +	return x <= 9 ? ('0' + x) : ('A' + x - 10);
> +}
> +
> +#define PLANE_SIZE 0x00010000
> +
> +#define SURROGATE_PAIR 0x0000d800
> +#define SURROGATE_LOW 0x00000400
> +#define SURROGATE_BITS 0x000003ff
> +
> +/*
> + * modified version of 'utf8s_to_utf16s' allows to
> + * - detect -ENAMETOOLONG
> + * - convert problem symbols into triplet %XX

In this UTF-8 context it is not 'symbols', but rather 'bytes'.

Anyway, what is the purpose of converting invalid UTF-8 bytes into
triplet %XX? UNICODE standard defines standard algorithm how to handle
malformed UTF-8 input, so I think we should use it here, instead of
defining new own/custom way. This algorithm decodes malformed UTF-8 byte
sequence as sequence of UNICODE code points U+FFFD.

> + */
> +static int _utf8s_to_utf16s(const u8 *s, int inlen, wchar_t *pwcs, int maxout)
> +{
> +	u16 *op;
> +	int size;
> +	unicode_t u;
> +
> +	op = pwcs;
> +	while (inlen > 0 && *s) {
> +		if (*s & 0x80) {
> +			size = utf8_to_utf32(s, inlen, &u);
> +			if (size < 0) {
> +				if (maxout < 3)
> +					return -ENAMETOOLONG;
> +
> +				op[0] = '%';
> +				op[1] = get_digit(*s >> 4);
> +				op[2] = get_digit(*s >> 0);
> +
> +				op += 3;
> +				maxout -= 3;
> +				inlen--;
> +				s++;
> +				continue;
> +			}
> +
> +			s += size;
> +			inlen -= size;
> +
> +			if (u >= PLANE_SIZE) {
> +				if (maxout < 2)
> +					return -ENAMETOOLONG;
> +				u -= PLANE_SIZE;
> +
> +				op[0] = SURROGATE_PAIR |
> +					((u >> 10) & SURROGATE_BITS);
> +				op[1] = SURROGATE_PAIR | SURROGATE_LOW |
> +					(u & SURROGATE_BITS);
> +				op += 2;
> +				maxout -= 2;
> +			} else {
> +				if (maxout < 1)
> +					return -ENAMETOOLONG;
> +
> +				*op++ = u;
> +				maxout--;
> +			}
> +		} else {
> +			if (maxout < 1)
> +				return -ENAMETOOLONG;
> +
> +			*op++ = *s++;
> +			inlen--;
> +			maxout--;
> +		}
> +	}
> +	return op - pwcs;
> +}
> +
> +/*
> + * Convert input string to utf16
> + *
> + * name, name_len - input name
> + * uni, max_ulen - destination memory
> + * endian - endian of target utf16 string
> + *
> + * This function is called:
> + * - to create ntfs names (max_ulen == NTFS_NAME_LEN == 255)
> + * - to create symlink
> + *
> + * returns utf16 string length or error (if negative)
> + */
> +int ntfs_nls_to_utf16(struct ntfs_sb_info *sbi, const u8 *name, u32 name_len,
> +		      struct cpu_str *uni, u32 max_ulen,
> +		      enum utf16_endian endian)
> +{
> +	int i, ret, slen, warn;
> +	u32 tail;
> +	const u8 *str, *end;
> +	wchar_t *uname = uni->name;
> +	struct nls_table *nls = sbi->nls;
> +
> +	static_assert(sizeof(wchar_t) == sizeof(u16));
> +
> +	if (!nls) {
> +		/* utf8 -> utf16 */
> +		ret = _utf8s_to_utf16s(name, name_len, uname, max_ulen);
> +		if (ret < 0)
> +			return ret;
> +		goto out;
> +	}
> +
> +	str = name;
> +	end = name + name_len;
> +	warn = 0;
> +
> +	while (str < end && *str) {
> +		if (!max_ulen)
> +			return -ENAMETOOLONG;
> +		tail = end - str;
> +
> +		/*str -> uname*/
> +		slen = nls->char2uni(str, tail, uname);
> +		if (slen > 0) {

I'm not sure, but is not zero return value from char2uni also valid
conversion? I'm not sure if some NLSs could use escape sequences and
processing escape sequence would lead to no output, but still it is
valid conversion to UNICODE.

I looked into exfat driver and it treats only negative value from
char2uni as error.

> +			max_ulen -= 1;
> +			uname += 1;
> +			str += slen;
> +			continue;
> +		}
> +
> +		if (!warn) {
> +			warn = 1;
> +			ntfs_printk(
> +				sbi->sb,
> +				KERN_ERR
> +				"%s -> utf16 failed: '%.*s', pos %d, chars %x %x %x",
> +				nls->charset, name_len, name, (int)(str - name),
> +				str[0], tail > 1 ? str[1] : 0,
> +				tail > 2 ? str[2] : 0);
> +		}
> +
> +		if (max_ulen < 3)
> +			return -ENAMETOOLONG;
> +
> +		uname[0] = '%';
> +		uname[1] = get_digit(*str >> 4);
> +		uname[2] = get_digit(*str >> 0);
> +
> +		max_ulen -= 3;
> +		uname += 3;
> +		str += 1;
> +	}
> +
> +	ret = uname - uni->name;
> +out:
> +	uni->len = ret;
> +
> +#ifdef __BIG_ENDIAN
> +	if (endian == UTF16_LITTLE_ENDIAN) {
> +		i = ret;
> +		uname = uni->name;
> +
> +		while (i--) {
> +			__cpu_to_le16s(uname);
> +			uname++;
> +		}
> +	}
> +#else
> +	if (endian == UTF16_BIG_ENDIAN) {
> +		i = ret;
> +		uname = uni->name;
> +
> +		while (i--) {
> +			__cpu_to_be16s(uname);
> +			uname++;
> +		}
> +	}
> +#endif
> +
> +	return ret;
> +}
> +


...

> diff --git a/fs/ntfs3/file.c b/fs/ntfs3/file.c
> new file mode 100644
> index 000000000000..72c6a263b5bc
> --- /dev/null
> +++ b/fs/ntfs3/file.c
> @@ -0,0 +1,1214 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + *  linux/fs/ntfs3/file.c
> + *
> + * Copyright (C) 2019-2020 Paragon Software GmbH, All rights reserved.
> + *
> + *  regular file handling primitives for ntfs-based filesystems
> + */
> +#include <linux/backing-dev.h>
> +#include <linux/buffer_head.h>
> +#include <linux/compat.h>
> +#include <linux/falloc.h>
> +#include <linux/fiemap.h>
> +#include <linux/msdos_fs.h> /* FAT_IOCTL_XXX */
> +#include <linux/nls.h>
> +
> +#include "debug.h"
> +#include "ntfs.h"
> +#include "ntfs_fs.h"
> +
> +static int ntfs_ioctl_fitrim(struct ntfs_sb_info *sbi, unsigned long arg)
> +{
> +	struct fstrim_range __user *user_range;
> +	struct fstrim_range range;
> +	struct request_queue *q = bdev_get_queue(sbi->sb->s_bdev);
> +	int err;
> +
> +	if (!capable(CAP_SYS_ADMIN))
> +		return -EPERM;
> +
> +	if (!blk_queue_discard(q))
> +		return -EOPNOTSUPP;
> +
> +	user_range = (struct fstrim_range __user *)arg;
> +	if (copy_from_user(&range, user_range, sizeof(range)))
> +		return -EFAULT;
> +
> +	range.minlen = max_t(u32, range.minlen, q->limits.discard_granularity);
> +
> +	err = ntfs_trim_fs(sbi, &range);
> +	if (err < 0)
> +		return err;
> +
> +	if (copy_to_user(user_range, &range, sizeof(range)))
> +		return -EFAULT;
> +
> +	return 0;
> +}
> +
> +static long ntfs_ioctl(struct file *filp, u32 cmd, unsigned long arg)
> +{
> +	struct inode *inode = file_inode(filp);
> +	struct ntfs_sb_info *sbi = inode->i_sb->s_fs_info;
> +	u32 __user *user_attr = (u32 __user *)arg;
> +
> +	switch (cmd) {
> +	case FAT_IOCTL_GET_ATTRIBUTES:
> +		return put_user(le32_to_cpu(ntfs_i(inode)->std_fa), user_attr);
> +
> +	case FAT_IOCTL_GET_VOLUME_ID:
> +		return put_user(sbi->volume.ser_num, user_attr);

Question for fs maintainers: Do we want to reuse FAT ioctls in NTFS driver?

> +	case FITRIM:
> +		return ntfs_ioctl_fitrim(sbi, arg);
> +	}
> +	return -ENOTTY; /* Inappropriate ioctl for device */
> +}
> +

  parent reply	other threads:[~2020-09-04 11:51 UTC|newest]

Thread overview: 27+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-28 14:39 [PATCH v3 00/10] NTFS read-write driver GPL implementation by Paragon Software Konstantin Komarov
2020-08-28 14:39 ` [PATCH v3 01/10] fs/ntfs3: Add headers and misc files Konstantin Komarov
2020-08-28 14:39 ` [PATCH v3 02/10] fs/ntfs3: Add initialization of super block Konstantin Komarov
2020-09-04 12:06   ` Pali Rohár
2020-09-11 16:59     ` Konstantin Komarov
2020-08-28 14:39 ` [PATCH v3 03/10] fs/ntfs3: Add bitmap Konstantin Komarov
2020-08-28 14:39 ` [PATCH v3 04/10] fs/ntfs3: Add file operations and implementation Konstantin Komarov
2020-08-28 15:45   ` Al Viro
2020-09-04 12:41     ` Konstantin Komarov
2020-08-28 15:55   ` Al Viro
2020-09-04 12:49     ` Konstantin Komarov
2020-09-04 11:50   ` Pali Rohár [this message]
2020-09-11 16:52     ` Konstantin Komarov
2020-09-21 13:36       ` Pali Rohár
2020-09-22 10:26         ` Aurélien Aptel
2020-09-25 16:39         ` Konstantin Komarov
2020-08-28 14:39 ` [PATCH v3 05/10] fs/ntfs3: Add attrib operations Konstantin Komarov
2020-08-28 16:14   ` Mark Harmstone
2020-08-28 14:39 ` [PATCH v3 06/10] fs/ntfs3: Add compression Konstantin Komarov
2020-08-28 14:39 ` [PATCH v3 07/10] fs/ntfs3: Add NTFS journal Konstantin Komarov
2020-08-28 14:39 ` [PATCH v3 08/10] fs/ntfs3: Add Kconfig, Makefile and doc Konstantin Komarov
2020-08-28 14:39 ` [PATCH v3 09/10] fs/ntfs3: Add NTFS3 in fs/Kconfig and fs/Makefile Konstantin Komarov
2020-09-01  0:37   ` kernel test robot
2020-09-01  4:45   ` kernel test robot
2020-08-28 14:39 ` [PATCH v3 10/10] fs/ntfs3: Add MAINTAINERS Konstantin Komarov
2020-08-29 11:31 ` [PATCH v3 00/10] NTFS read-write driver GPL implementation by Paragon Software Nikolay Borisov
2020-09-04 12:55   ` Konstantin Komarov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20200904115049.i6zjfwba7egalxnp@pali \
    --to=pali@kernel.org \
    --cc=aaptel@suse.com \
    --cc=almaz.alexandrovich@paragon-software.com \
    --cc=dsterba@suse.cz \
    --cc=joe@perches.com \
    --cc=linux-fsdevel@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark@harmstone.com \
    --cc=rdunlap@infradead.org \
    --cc=viro@zeniv.linux.org.uk \
    --cc=willy@infradead.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).