From: Xiubo Li <xiubli@redhat.com>
To: "Luís Henriques" <lhenriques@suse.de>
Cc: Jeff Layton <jlayton@kernel.org>,
	Ilya Dryomov <idryomov@gmail.com>,
	Gregory Farnum <gfarnum@redhat.com>,
	ceph-devel@vger.kernel.org, linux-kernel@vger.kernel.org
Subject: Re: [RFC PATCH v3] ceph: prevent a client from exceeding the MDS maximum xattr size
Date: Thu, 2 Jun 2022 17:42:50 +0800
Message-ID: <289f5136-d2fc-1474-eb0f-521586f241b2@redhat.com>
In-Reply-To: <87h7534dr9.fsf@brahms.olymp>


On 6/2/22 5:26 PM, Luís Henriques wrote:
> Xiubo Li <xiubli@redhat.com> writes:
>
>> On 6/2/22 12:29 AM, Luís Henriques wrote:
>>> The MDS tries to enforce a limit on the total key/values in extended
>>> attributes.  However, this limit is enforced only if doing a synchronous
>>> operation (MDS_OP_SETXATTR) -- if we're buffering the xattrs, the MDS
>>> doesn't have a chance to enforce these limits.
>>>
>>> This patch adds support for decoding the xattrs maximum size setting that is
>>> distributed in the mdsmap.  Then, when setting an xattr, the kernel client
>>> will fall back to a synchronous operation if that maximum size is exceeded.
>>>
>>> While there, fix a dout() that would trigger a printk warning:
>>>
>>> [   98.718078] ------------[ cut here ]------------
>>> [   98.719012] precision 65536 too large
>>> [   98.719039] WARNING: CPU: 1 PID: 3755 at lib/vsprintf.c:2703 vsnprintf+0x5e3/0x600
>>> ...
>>>
>>> URL: https://tracker.ceph.com/issues/55725
>>> Signed-off-by: Luís Henriques <lhenriques@suse.de>
>>> ---
>>>    fs/ceph/mdsmap.c            | 27 +++++++++++++++++++++++----
>>>    fs/ceph/xattr.c             | 12 ++++++++----
>>>    include/linux/ceph/mdsmap.h |  1 +
>>>    3 files changed, 32 insertions(+), 8 deletions(-)
>>>
>>> * Changes since v2
>>>
>>> Well, a lot has changed since v2!  Now the xattr max value setting is
>>> obtained through the mdsmap, which needs to be decoded, and the feature
>>> that was used in the previous revision was dropped.  The drawback is that
>>> the MDS is unable to know in advance if a client is aware of this xattr
>>> max value.
>>>
>>> * Changes since v1
>>>
>>> Added support for new feature bit to get the MDS max_xattr_pairs_size
>>> setting.
>>>
>>> Also note that this patch relies on a patch that hasn't been merged yet
>>> ("ceph: use correct index when encoding client supported features"),
>>> otherwise the new feature bit won't be correctly encoded.
>>>
>>> diff --git a/fs/ceph/mdsmap.c b/fs/ceph/mdsmap.c
>>> index 30387733765d..36b2bc18ca2a 100644
>>> --- a/fs/ceph/mdsmap.c
>>> +++ b/fs/ceph/mdsmap.c
>>> @@ -13,6 +13,12 @@
>>>    #include "super.h"
>>>
>>> +/*
>>> + * Maximum size of xattrs the MDS can handle per inode by default.  This
>>> + * includes the attribute name and 4+4 bytes for the key/value sizes.
>>> + */
>>> +#define MDS_MAX_XATTR_SIZE (1<<16) /* 64K */
>>> +
>>>    #define CEPH_MDS_IS_READY(i, ignore_laggy) \
>>>    	(m->m_info[i].state > 0 && ignore_laggy ? true : !m->m_info[i].laggy)
>>>
>>> @@ -352,12 +358,10 @@ struct ceph_mdsmap *ceph_mdsmap_decode(void **p, void *end, bool msgr2)
>>>    		__decode_and_drop_type(p, end, u8, bad_ext);
>>>    	}
>>>    	if (mdsmap_ev >= 8) {
>>> -		u32 name_len;
>>>    		/* enabled */
>>>    		ceph_decode_8_safe(p, end, m->m_enabled, bad_ext);
>>> -		ceph_decode_32_safe(p, end, name_len, bad_ext);
>>> -		ceph_decode_need(p, end, name_len, bad_ext);
>>> -		*p += name_len;
>>> +		/* fs_name */
>>> +		ceph_decode_skip_string(p, end, bad_ext);
>>>    	}
>>>    	/* damaged */
>>>    	if (mdsmap_ev >= 9) {
>>> @@ -370,6 +374,21 @@ struct ceph_mdsmap *ceph_mdsmap_decode(void **p, void *end, bool msgr2)
>>>    	} else {
>>>    		m->m_damaged = false;
>>>    	}
>>> +	if (mdsmap_ev >= 17) {
>>> +		/* balancer */
>>> +		ceph_decode_skip_string(p, end, bad_ext);
>>> +		/* standby_count_wanted */
>>> +		ceph_decode_skip_32(p, end, bad_ext);
>>> +		/* old_max_mds */
>>> +		ceph_decode_skip_32(p, end, bad_ext);
>>> +		/* min_compat_client */
>>> +		ceph_decode_skip_8(p, end, bad_ext);
>> This is incorrect.
>>
>> If mdsmap_ev == 15 the min_compat_client will be a feature_bitset_t instead of
>> int8_t.
> Hmm... can you point me at where that's done in the code?  As usual, I'm
> confused with that code and simply can't see that.
>
> Also, if that happens only when mdsmap_ev == 15, then there's no problem
> because that branch is only taken if it's >= 17.

Yeah, so you should skip 32 or 32+64 bits here instead, just like:

	/* version >= 3, feature bits */
	ceph_decode_32_safe(&p, end, len, bad);
	if (len) {
		ceph_decode_64_safe(&p, end, features, bad);
		p += len - sizeof(features);
	}

For the Ceph code, please see:

https://github.com/ceph/ceph/blob/main/src/mds/MDSMap.cc#L925
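
To make the "32 or 32+64 bits" point concrete, here is a minimal userspace sketch of skipping a length-prefixed feature bitset. It assumes the same layout as the kernel snippet above (a little-endian u32 byte length, followed by that many bytes, the first 8 of which are the feature word). The helper names get_le32() and skip_feature_bits() are made up for illustration; they are not kernel or Ceph APIs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Read a little-endian u32 and advance *p; returns -1 on overrun. */
static int get_le32(const uint8_t **p, const uint8_t *end, uint32_t *out)
{
	if ((size_t)(end - *p) < 4)
		return -1;
	*out = (uint32_t)(*p)[0] | (uint32_t)(*p)[1] << 8 |
	       (uint32_t)(*p)[2] << 16 | (uint32_t)(*p)[3] << 24;
	*p += 4;
	return 0;
}

/*
 * Skip a length-prefixed feature bitset (hypothetical helper).
 * Assumed layout: u32 len, then len bytes.  With len == 0 only the
 * 32-bit length is consumed; otherwise the first 64 bits are returned
 * in *features and the whole payload is skipped.
 */
static int skip_feature_bits(const uint8_t **p, const uint8_t *end,
			     uint64_t *features)
{
	uint32_t len;
	int i;

	assert(p && *p && end && features && *p <= end);

	*features = 0;
	if (get_le32(p, end, &len))
		return -1;
	if (len == 0)
		return 0;		/* only the 32-bit length was present */
	if (len < sizeof(*features) || (size_t)(end - *p) < len)
		return -1;		/* short or truncated payload */

	for (i = 0; i < 8; i++)
		*features |= (uint64_t)(*p)[i] << (8 * i);
	*p += len;			/* skip any extra words as well */
	return 0;
}
```

Unlike the quoted kernel snippet, this sketch also rejects a non-zero length smaller than 8 bytes; that extra check is my own choice and not something taken from the existing code.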

>>
>>> +		/* required_client_features */
>>> +		ceph_decode_skip_set(p, end, 64, bad_ext);
>>> +		ceph_decode_64_safe(p, end, m->m_max_xattr_size, bad_ext);
>>> +	} else {
>>> +		m->m_max_xattr_size = MDS_MAX_XATTR_SIZE;
>>> +	}
>>>    bad_ext:
>>>    	dout("mdsmap_decode m_enabled: %d, m_damaged: %d, m_num_laggy: %d\n",
>>>    	     !!m->m_enabled, !!m->m_damaged, m->m_num_laggy);
>>> diff --git a/fs/ceph/xattr.c b/fs/ceph/xattr.c
>>> index 8c2dc2c762a4..67f046dac35c 100644
>>> --- a/fs/ceph/xattr.c
>>> +++ b/fs/ceph/xattr.c
>>> @@ -1086,7 +1086,7 @@ static int ceph_sync_setxattr(struct inode *inode, const char *name,
>>>    			flags |= CEPH_XATTR_REMOVE;
>>>    	}
>>>
>>> -	dout("setxattr value=%.*s\n", (int)size, value);
>>> +	dout("setxattr value size: %ld\n", size);
>>>
>>>    	/* do request */
>>>    	req = ceph_mdsc_create_request(mdsc, op, USE_AUTH_MDS);
>>> @@ -1184,8 +1184,14 @@ int __ceph_setxattr(struct inode *inode, const char *name,
>>>    	spin_lock(&ci->i_ceph_lock);
>>>    retry:
>>>    	issued = __ceph_caps_issued(ci, NULL);
>>> -	if (ci->i_xattrs.version == 0 || !(issued & CEPH_CAP_XATTR_EXCL))
>>> +	required_blob_size = __get_required_blob_size(ci, name_len, val_len);
>>> +	if ((ci->i_xattrs.version == 0) || !(issued & CEPH_CAP_XATTR_EXCL) ||
>>> +	    (required_blob_size >= mdsc->mdsmap->m_max_xattr_size)) {
>> Shouldn't it be '>' instead?
> Ok, I'll fix that.
>
>> We'd better always force a sync request with old ceph. Just check whether
>> mdsmap_ev < 17. It's not safe to buffer it because it may be discarded, as your
>> ceph PR does.
> Right, that can be done.  So, I can simply set the m_max_xattr_size to '0'
> if mdsmap_ev < 17.  Then, this 'if' condition will always be evaluated to
> true because required_blob_size will be > 0.  Does that sound OK?

Yeah, sounds good.
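
Just to spell out the logic we agreed on, a rough standalone sketch follows. It assumes the comparison is changed to '>' and that m_max_xattr_size is set to 0 when mdsmap_ev < 17. The function names and plain-integer parameters are hypothetical stand-ins for the real inode and mdsmap fields, not the actual patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Limit the client should use: 0 for old maps, so every setxattr goes sync. */
static uint64_t effective_max_xattr_size(unsigned int mdsmap_ev,
					 uint64_t decoded_max)
{
	return mdsmap_ev >= 17 ? decoded_max : 0;
}

/*
 * Decide whether a setxattr must be sent synchronously to the MDS.
 * Assumption: required_blob_size is always > 0, so a limit of 0
 * forces the synchronous path.
 */
static bool needs_sync_setxattr(uint64_t xattrs_version, bool has_xattr_excl,
				uint64_t required_blob_size,
				uint64_t max_xattr_size)
{
	return xattrs_version == 0 || !has_xattr_excl ||
	       required_blob_size > max_xattr_size;
}

/* A few illustrative cases. */
static void check_examples(void)
{
	/* Old mdsmap: limit becomes 0, so even a small blob goes sync. */
	assert(needs_sync_setxattr(1, true, 100,
				   effective_max_xattr_size(16, 65536)));
	/* New mdsmap, blob exactly at the limit: may still be buffered. */
	assert(!needs_sync_setxattr(1, true, 65536,
				    effective_max_xattr_size(17, 65536)));
	/* New mdsmap, blob over the limit: must go sync. */
	assert(needs_sync_setxattr(1, true, 65537,
				   effective_max_xattr_size(17, 65536)));
}
```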

-- Xiubo


>
> Cheers,


