Cluster Command Parsing Refactor #373

vazois · 2024-05-10T22:34:57Z

This PR addresses regressions due to changes in the semantics of DrainCommand and to adhere to the new parsing restrictions.
It refactors ClusterCommands.cs into multiple files similar to how other Network commands are implemented.
The PR contains the following enchantments:

Separation of cluster subcommand parsing and matching from corresponding execution portion.
I removed the lowercase sequences used for matching the individual cluster subcommands. For all cluster commands, except those used for internode communication, the input token will be converted to upper case command matching is performed.
Added explicit checks for the arguments of each cluster command.
Implemented negative tests to validate parsing edge cases (e.g. wrong number of arguments or partially send packets).

Some notes:

Performance impact should be minimal since cluster commands are generally in the slow path.
I have prioritized checking first APPENDLOG and GOSSIP since these are generally the most frequently executed commands.

PaulusParssinen · 2024-05-10T22:51:17Z

I used MemoryExtensions.SequenceEqual because I read somewhere that it offers better performance ReadOnlySpan.SequenceEqual. I could benchmark it or if someone can chime in that will be great.

They're the same method. MemoryExtensions.SequenceEqual(this ReadOnlySpan<T> a, ReadOnlySpan<T> b) is the extension method that gets binded when you do span.SequenceEqual(other).

There is LINQ Enumerable.SequenceEqual which won't get binded to spans, but you can accidentally bind it to anything that is IEnumerable 😅

In general, cluster commands are in the slow path, so I am wondering if we should use MakeUpperCase to avoid having two versions of each command type when parsing param byte sequence.

IMO yes. And in the cases where we want don't want to try help with a fast path (IIRC I think it was OperationDirection or something that had first fast-path check for already uppercase and then MakerUpperCase + SequenceEquals again) we could have a AsciiUtils.EqualsIgnoreCase helper which does this pattern for us.

out of topic, sorta: MakeUpperCase is potentially dangerous. It has no length limit so it'll keep scanning potentially out of bounds and output the client anything that it got to. Current use of it is okay-ish, as long as nothing goes wrong (I haven't tried to craft malicious payload to it, so I think that method should have some limit..) 😅

edit: Found the place where we have "fast-path" for upper case (if we know to prioritize it for some reason)

garnet/libs/server/Resp/Objects/ObjectStoreUtils.cs

Lines 75 to 89 in 8856dc3

    
           public OperationDirection GetOperationDirection(ArgSlice input) 
        
           { 
        
               // Optimize for the common case 
        
               if (input.ReadOnlySpan.SequenceEqual("LEFT"u8)) 
        
                   return OperationDirection.Left; 
        
               if (input.ReadOnlySpan.SequenceEqual("RIGHT"u8)) 
        
                   return OperationDirection.Right; 
        
               // Rare case: try making upper case and retry 
        
               MakeUpperCase(input.ptr); 
        
               if (input.ReadOnlySpan.SequenceEqual("LEFT"u8)) 
        
                   return OperationDirection.Left; 
        
               if (input.ReadOnlySpan.SequenceEqual("RIGHT"u8)) 
        
                   return OperationDirection.Right; 
        
               return OperationDirection.Unknown; 
        
           }

We could have something like:

namespace Garnet.Common;

public static class AsciiUtils
{
    public static bool EqualsIgnoreCase(byte* start, byte* end /* (or length) */, ReadOnlySpan<byte> other)
    {
        // get the ptrs
        MakeUpperCase(start, end /* so we don't cause a heartbleed 2.0 */); // We expect "other" to be already upper case AND constant because we control it.
        return span.SequenceEquals(other);
    }
}

but I guess in this case where all the comparisons are in the same method, it is worth it to have just one MakeUpperCase and then do comparisons against the constants.

vazois · 2024-05-13T16:28:13Z

I used MemoryExtensions.SequenceEqual because I read somewhere that it offers better performance ReadOnlySpan.SequenceEqual. I could benchmark it or if someone can chime in that will be great.

They're the same method. MemoryExtensions.SequenceEqual(this ReadOnlySpan<T> a, ReadOnlySpan<T> b) is the extension method that gets binded when you do span.SequenceEqual(other).

There is LINQ Enumerable.SequenceEqual which won't get binded to spans, but you can accidentally bind it to anything that is IEnumerable 😅

Thanks for the clarification. Are you saying span.SequenceEnqual is more readable? I don't have a personal preference. I pushed a new commit where I follow the pattern used in ParseCommand (i.e. MemoryMarshal.Read)

In general, cluster commands are in the slow path, so I am wondering if we should use MakeUpperCase to avoid having two versions of each command type when parsing param byte sequence.

IMO yes. And in the cases where we want don't want to try help with a fast path (IIRC I think it was OperationDirection or something that had first fast-path check for already uppercase and then MakerUpperCase + SequenceEquals again) we could have a AsciiUtils.EqualsIgnoreCase helper which does this pattern for us.

out of topic, sorta: MakeUpperCase is potentially dangerous. It has no length limit so it'll keep scanning potentially out of bounds and output the client anything that it got to. Current use of it is okay-ish, as long as nothing goes wrong (I haven't tried to craft malicious payload to it, so I think that method should have some limit..) 😅

edit: Found the place where we have "fast-path" for upper case (if we know to prioritize it for some reason)

garnet/libs/server/Resp/Objects/ObjectStoreUtils.cs

Lines 75 to 89 in 8856dc3

public OperationDirection GetOperationDirection(ArgSlice input)

{

// Optimize for the common case

if (input.ReadOnlySpan.SequenceEqual("LEFT"u8))

return OperationDirection.Left;

if (input.ReadOnlySpan.SequenceEqual("RIGHT"u8))

return OperationDirection.Right;

// Rare case: try making upper case and retry

MakeUpperCase(input.ptr);

if (input.ReadOnlySpan.SequenceEqual("LEFT"u8))

return OperationDirection.Left;

if (input.ReadOnlySpan.SequenceEqual("RIGHT"u8))

return OperationDirection.Right;

return OperationDirection.Unknown;

}

We could have something like:
namespace Garnet.Common;

public static class AsciiUtils
{
    public static bool EqualsIgnoreCase(byte* start, byte* end /* (or length) */, ReadOnlySpan<byte> other)
    {
        // get the ptrs
        MakeUpperCase(start, end /* so we don't cause a heartbleed 2.0 */); // We expect "other" to be already upper case AND constant because we control it.
        return span.SequenceEquals(other);
    }
}
but I guess in this case where all the comparisons are in the same method, it is worth it to have just one MakeUpperCase and then do comparisons against the constants.

I will see if I can address this. For cluster commands, we already pay the cost of extracting a Span from the receive buffer, so it would be easy to have an uppercase bounded by the Span boundaries. I will have to benchmark the case where MakeUpperCase
is used in ProcessMessages.
Also I was thinking, that for internode communication commands, we don't need to call MakeUpperCase because they already formatted like this and external clients are not supposed to call them directly. So maybe I can separate the switch statement into two parts.

vazois · 2024-05-13T21:23:59Z

I used MemoryExtensions.SequenceEqual because I read somewhere that it offers better performance ReadOnlySpan.SequenceEqual. I could benchmark it or if someone can chime in that will be great.

They're the same method. MemoryExtensions.SequenceEqual(this ReadOnlySpan<T> a, ReadOnlySpan<T> b) is the extension method that gets binded when you do span.SequenceEqual(other).
There is LINQ Enumerable.SequenceEqual which won't get binded to spans, but you can accidentally bind it to anything that is IEnumerable 😅

Thanks for the clarification. Are you saying span.SequenceEnqual is more readable? I don't have a personal preference. I pushed a new commit where I follow the pattern used in ParseCommand (i.e. MemoryMarshal.Read)
In general, cluster commands are in the slow path, so I am wondering if we should use MakeUpperCase to avoid having two versions of each command type when parsing param byte sequence.

IMO yes. And in the cases where we want don't want to try help with a fast path (IIRC I think it was OperationDirection or something that had first fast-path check for already uppercase and then MakerUpperCase + SequenceEquals again) we could have a AsciiUtils.EqualsIgnoreCase helper which does this pattern for us.
out of topic, sorta: MakeUpperCase is potentially dangerous. It has no length limit so it'll keep scanning potentially out of bounds and output the client anything that it got to. Current use of it is okay-ish, as long as nothing goes wrong (I haven't tried to craft malicious payload to it, so I think that method should have some limit..) 😅
edit: Found the place where we have "fast-path" for upper case (if we know to prioritize it for some reason)

garnet/libs/server/Resp/Objects/ObjectStoreUtils.cs

Lines 75 to 89 in 8856dc3

public OperationDirection GetOperationDirection(ArgSlice input)

{

// Optimize for the common case

if (input.ReadOnlySpan.SequenceEqual("LEFT"u8))

return OperationDirection.Left;

if (input.ReadOnlySpan.SequenceEqual("RIGHT"u8))

return OperationDirection.Right;

// Rare case: try making upper case and retry

MakeUpperCase(input.ptr);

if (input.ReadOnlySpan.SequenceEqual("LEFT"u8))

return OperationDirection.Left;

if (input.ReadOnlySpan.SequenceEqual("RIGHT"u8))

return OperationDirection.Right;

return OperationDirection.Unknown;

}

We could have something like:
namespace Garnet.Common;

public static class AsciiUtils
{
    public static bool EqualsIgnoreCase(byte* start, byte* end /* (or length) */, ReadOnlySpan<byte> other)
    {
        // get the ptrs
        MakeUpperCase(start, end /* so we don't cause a heartbleed 2.0 */); // We expect "other" to be already upper case AND constant because we control it.
        return span.SequenceEquals(other);
    }
}
but I guess in this case where all the comparisons are in the same method, it is worth it to have just one MakeUpperCase and then do comparisons against the constants.
I will see if I can address this. For cluster commands, we already pay the cost of extracting a Span from the receive buffer, so it would be easy to have an uppercase bounded by the Span boundaries. I will have to benchmark the case where MakeUpperCase is used in ProcessMessages. Also I was thinking, that for internode communication commands, we don't need to call MakeUpperCase because they already formatted like this and external clients are not supposed to call them directly. So maybe I can separate the switch statement into two parts.

I removed MemoryExtensions and reverted back to span.SequenceEqual. Looks much better now 😄

badrishc · 2024-05-14T00:46:44Z

MakeUpperCase is potentially dangerous. It has no length limit so it'll keep scanning potentially out of bounds and output the client anything that it got to

MakeUpperCase does have a length limit of whatever the input buffer contains. It also stops conversion as soon as it sees a protocol character (such as \r and \n), so it should not make its way into user data even within the input buffer.

…ctions

vazois force-pushed the vazois/parsing-regression-testing branch from 29f3969 to 4c07d9b Compare May 10, 2024 22:40

vazois marked this pull request as ready for review May 13, 2024 18:17

vazois force-pushed the vazois/parsing-regression-testing branch from b8a6950 to 62f8648 Compare May 14, 2024 02:55

badrishc approved these changes May 14, 2024

View reviewed changes

vazois added 11 commits May 14, 2024 15:09

minor cleanup and fix regression for error

d82faf4

code cleanup and refactor

2e49870

refactor cluster command resp parser to adhere to new protocol restri…

790f4fc

…ctions

added parsing tests for cluster command

fd05057

formatting fix

e74dade

make uppercase cluster subcommand

6fa780b

ignore case for client accessible cluster commands

8c786c9

make parse cluster command read friendly

f509b4f

reorg cluster component

1f66f44

remove newline

83fd8b7

renamed cluster folder hierarchy

1187128

vazois force-pushed the vazois/parsing-regression-testing branch from 16d9d25 to 1187128 Compare May 14, 2024 22:16

vazois merged commit 51667fd into microsoft:main May 14, 2024
23 checks passed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Cluster Command Parsing Refactor #373

Cluster Command Parsing Refactor #373

vazois commented May 10, 2024 •

edited

PaulusParssinen commented May 10, 2024 •

edited

vazois commented May 13, 2024

vazois commented May 13, 2024

badrishc commented May 14, 2024 •

edited

Cluster Command Parsing Refactor #373

Cluster Command Parsing Refactor #373

Conversation

vazois commented May 10, 2024 • edited

PaulusParssinen commented May 10, 2024 • edited

vazois commented May 13, 2024

vazois commented May 13, 2024

badrishc commented May 14, 2024 • edited

vazois commented May 10, 2024 •

edited

PaulusParssinen commented May 10, 2024 •

edited

badrishc commented May 14, 2024 •

edited