Improve performance of encoding #64

Open
wants to merge 4 commits into
base: main
99 changes: 60 additions & 39 deletions src/lib.rs
@@ -102,6 +102,7 @@ impl<'a> BytesToHexChars<'a> {
impl<'a> Iterator for BytesToHexChars<'a> {
type Item = char;

#[inline]
fn next(&mut self) -> Option<Self::Item> {
match self.next.take() {
Some(current) => Some(current),
@@ -129,7 +130,6 @@ impl<'a> iter::ExactSizeIterator for BytesToHexChars<'a> {
}
}

#[inline]
fn encode_to_iter<T: iter::FromIterator<char>>(table: &'static [u8; 16], source: &[u8]) -> T {
BytesToHexChars::new(source, table).collect()
}
@@ -257,7 +257,10 @@ from_hex_array_impl! {
#[must_use]
#[cfg(feature = "alloc")]
pub fn encode<T: AsRef<[u8]>>(data: T) -> String {
data.encode_hex()
let data = data.as_ref();
let mut out = vec![0; data.len() * 2];
encode_to_slice(data, &mut out).unwrap();
String::from_utf8(out).unwrap()
Comment from the author:
Using `String::from_utf8_unchecked` here improves performance by about 8% (see the benchmark below). It is safe because we emit only hex characters.
However, I didn't apply that change because I don't know this crate's policy on unsafe code.
If unsafe code is acceptable, I'll add that change.

hex_encode              time:   [9.4555 us 9.4828 us 9.5106 us]                        
                        change: [-9.2447% -8.3200% -7.4582%] (p = 0.00 < 0.05)
                        Performance has improved.
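
For illustration, the unchecked variant described in this comment might look roughly like the sketch below. It is not part of the PR: `encode_unchecked` and `HEX_LOWER` are hypothetical names, and the table is a local stand-in for the crate's `HEX_CHARS_LOWER`. The only difference from the proposed `encode` is that UTF-8 validation is skipped.

```rust
/// Lowercase hex alphabet; a local stand-in for the crate's `HEX_CHARS_LOWER`.
const HEX_LOWER: &[u8; 16] = b"0123456789abcdef";

/// Hypothetical variant of `encode` that skips UTF-8 validation.
pub fn encode_unchecked(data: &[u8]) -> String {
    let mut out = vec![0u8; data.len() * 2];

    // Write the high and low nibble of each input byte as two ASCII characters.
    for (&byte, pair) in data.iter().zip(out.chunks_exact_mut(2)) {
        pair[0] = HEX_LOWER[(byte >> 4) as usize];
        pair[1] = HEX_LOWER[(byte & 0x0f) as usize];
    }

    // SAFETY: every byte in `out` was taken from `HEX_LOWER`, which contains
    // only ASCII characters, so the buffer is valid UTF-8.
    unsafe { String::from_utf8_unchecked(out) }
}
```

Calling `encode_unchecked(b"kiwi")` should give `"6b697769"`, matching the doc examples in this diff. Whether the `unsafe` block is acceptable is the policy question raised above.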

Some overlap with #66 here, I think. In particular, I suspect the use of `ExactSizeIterator` will allow us to collect into a `String` efficiently without indirecting through a `Vec`.
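
A minimal sketch of the idea in this comment, under stated assumptions: `HexChars` and `encode_via_iter` are hypothetical names, not the crate's `BytesToHexChars`. The iterator reports an exact `size_hint`, and to my knowledge the standard library's `FromIterator<char>` for `String` reserves capacity from the lower bound of that hint. Whether this matches the speed of the slice-based approach in this PR would need to be benchmarked.

```rust
/// Lowercase hex alphabet; a local stand-in for the crate's table.
const HEX_LOWER: &[u8; 16] = b"0123456789abcdef";

/// Hypothetical iterator yielding two hex characters per input byte.
pub struct HexChars<'a> {
    bytes: std::slice::Iter<'a, u8>,
    /// Low-nibble character waiting to be emitted on the next call.
    pending: Option<char>,
}

impl<'a> HexChars<'a> {
    pub fn new(data: &'a [u8]) -> Self {
        HexChars { bytes: data.iter(), pending: None }
    }
}

impl<'a> Iterator for HexChars<'a> {
    type Item = char;

    #[inline]
    fn next(&mut self) -> Option<char> {
        if let Some(low) = self.pending.take() {
            return Some(low);
        }
        let byte = *self.bytes.next()?;
        self.pending = Some(HEX_LOWER[(byte & 0x0f) as usize] as char);
        Some(HEX_LOWER[(byte >> 4) as usize] as char)
    }

    #[inline]
    fn size_hint(&self) -> (usize, Option<usize>) {
        // Two characters per remaining byte, plus one if a low nibble is pending.
        let len = self.bytes.len() * 2 + usize::from(self.pending.is_some());
        (len, Some(len))
    }
}

impl<'a> ExactSizeIterator for HexChars<'a> {}

/// Collects straight into a `String`, with no intermediate `Vec<u8>`.
pub fn encode_via_iter(data: &[u8]) -> String {
    HexChars::new(data).collect()
}
```

The exact `size_hint` is what makes the up-front reservation possible; without it, `collect` would have to grow the `String` as it goes.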

}

/// Encodes `data` as hex string using uppercase characters.
@@ -273,7 +276,10 @@ pub fn encode<T: AsRef<[u8]>>(data: T) -> String {
#[must_use]
#[cfg(feature = "alloc")]
pub fn encode_upper<T: AsRef<[u8]>>(data: T) -> String {
data.encode_hex_upper()
let data = data.as_ref();
let mut out = vec![0; data.len() * 2];
encode_to_slice_upper(data, &mut out).unwrap();
String::from_utf8(out).unwrap()
}

/// Decodes a hex string into raw bytes.
@@ -326,17 +332,6 @@ pub fn decode_to_slice<T: AsRef<[u8]>>(data: T, out: &mut [u8]) -> Result<(), Fr
Ok(())
}

// generates an iterator like this
// (0, 1)
// (2, 3)
// (4, 5)
// (6, 7)
// ...
#[inline]
fn generate_iter(len: usize) -> impl Iterator<Item = (usize, usize)> {
(0..len).step_by(2).zip((0..len).skip(1).step_by(2))
}

// the inverse of `val`.
#[inline]
#[must_use]
@@ -347,7 +342,25 @@ const fn byte2hex(byte: u8, table: &[u8; 16]) -> (u8, u8) {
(high, low)
}

/// Encodes some bytes into a mutable slice of bytes.
fn encode_to_slice_inner(
input: &[u8],
output: &mut [u8],
table: &[u8; 16],
) -> Result<(), FromHexError> {
if input.len() * 2 != output.len() {
return Err(FromHexError::InvalidStringLength);
}

for (byte, output) in input.iter().zip(output.chunks_exact_mut(2)) {
let (high, low) = byte2hex(*byte, table);
output[0] = high;
output[1] = low;
}

Ok(())
}

/// Encodes some bytes into a mutable slice of bytes using lowercase characters.
///
/// The output buffer, has to be able to hold exactly `input.len() * 2` bytes,
/// otherwise this function will return an error.
@@ -381,56 +394,64 @@ const fn byte2hex(byte: u8, table: &[u8; 16]) -> (u8, u8) {
/// # }
/// ```
pub fn encode_to_slice<T: AsRef<[u8]>>(input: T, output: &mut [u8]) -> Result<(), FromHexError> {
if input.as_ref().len() * 2 != output.len() {
return Err(FromHexError::InvalidStringLength);
}

for (byte, (i, j)) in input
.as_ref()
.iter()
.zip(generate_iter(input.as_ref().len() * 2))
{
let (high, low) = byte2hex(*byte, HEX_CHARS_LOWER);
output[i] = high;
output[j] = low;
}
encode_to_slice_inner(input.as_ref(), output, HEX_CHARS_LOWER)
}

Ok(())
/// Encodes some bytes into a mutable slice of bytes using uppercase characters.
///
/// The output buffer, has to be able to hold exactly `input.len() * 2` bytes,
/// otherwise this function will return an error.
///
/// # Example
///
/// ```
/// # use hex::FromHexError;
/// # fn main() -> Result<(), FromHexError> {
/// let mut bytes = [0u8; 4 * 2];
///
/// hex::encode_to_slice_upper(b"kiwi", &mut bytes)?;
/// assert_eq!(&bytes, b"6B697769");
/// # Ok(())
/// # }
/// ```
pub fn encode_to_slice_upper<T: AsRef<[u8]>>(
input: T,
output: &mut [u8],
) -> Result<(), FromHexError> {
encode_to_slice_inner(input.as_ref(), output, HEX_CHARS_UPPER)
}

#[cfg(test)]
mod test {
use super::*;
#[cfg(feature = "alloc")]
use alloc::string::ToString;
#[cfg(feature = "alloc")]
use alloc::vec;
use pretty_assertions::assert_eq;

#[test]
#[cfg(feature = "alloc")]
fn test_gen_iter() {
let result = vec![(0, 1), (2, 3)];

assert_eq!(generate_iter(5).collect::<Vec<_>>(), result);
}

#[test]
fn test_encode_to_slice() {
let mut output_1 = [0; 4 * 2];
encode_to_slice(b"kiwi", &mut output_1).unwrap();
assert_eq!(&output_1, b"6b697769");
encode_to_slice_upper(b"kiwi", &mut output_1).unwrap();
assert_eq!(&output_1, b"6B697769");

let mut output_2 = [0; 5 * 2];
encode_to_slice(b"kiwis", &mut output_2).unwrap();
assert_eq!(&output_2, b"6b69776973");
encode_to_slice_upper(b"kiwis", &mut output_2).unwrap();
assert_eq!(&output_2, b"6B69776973");

let mut output_3 = [0; 100];

assert_eq!(
encode_to_slice(b"kiwis", &mut output_3),
Err(FromHexError::InvalidStringLength)
);
assert_eq!(
encode_to_slice_upper(b"kiwis", &mut output_3),
Err(FromHexError::InvalidStringLength)
);
}

#[test]
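
The core change in this diff replaces the index pairs produced by `generate_iter` with `chunks_exact_mut(2)` over the output buffer. The sketch below puts the two loop shapes side by side so the equivalence is easy to check. It is a simplified illustration, not the crate's code: `encode_indexed`, `encode_chunked`, and `HEX_LOWER` are hypothetical names, the table is fixed to lowercase, and a `bool` stands in for `Result<(), FromHexError>`. The chunked form may also make it easier for the compiler to drop bounds checks, but that is an assumption, not something measured here.

```rust
/// Lowercase hex alphabet; a local stand-in for the crate's table.
const HEX_LOWER: &[u8; 16] = b"0123456789abcdef";

/// Splits a byte into its high and low hex characters.
fn byte2hex(byte: u8) -> (u8, u8) {
    (HEX_LOWER[(byte >> 4) as usize], HEX_LOWER[(byte & 0x0f) as usize])
}

/// Old loop shape: write through index pairs (0, 1), (2, 3), ...
/// Returns `false` if the output length is not exactly twice the input length.
pub fn encode_indexed(input: &[u8], output: &mut [u8]) -> bool {
    if input.len() * 2 != output.len() {
        return false;
    }
    let len = output.len();
    let pairs = (0..len).step_by(2).zip((0..len).skip(1).step_by(2));
    for (&byte, (i, j)) in input.iter().zip(pairs) {
        let (high, low) = byte2hex(byte);
        output[i] = high;
        output[j] = low;
    }
    true
}

/// New loop shape: write through two-byte mutable chunks of the output.
/// Returns `false` if the output length is not exactly twice the input length.
pub fn encode_chunked(input: &[u8], output: &mut [u8]) -> bool {
    if input.len() * 2 != output.len() {
        return false;
    }
    for (&byte, pair) in input.iter().zip(output.chunks_exact_mut(2)) {
        let (high, low) = byte2hex(byte);
        pair[0] = high;
        pair[1] = low;
    }
    true
}
```

Both functions should fill an 8-byte buffer with `6b697769` for the input `b"kiwi"`, and both reject a 100-byte buffer for `b"kiwis"`, mirroring the test cases in the diff.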