An experiment with string fiddling in C. Is the C++ std::string_view abstraction a good fit for C?

agl-alexglopez/c-str-view

c-str-view

mini-grep

Pictured Above: Mocking up a mini grep program is easy with the right tools. Use mini_grep and any other sample programs as they arrive in the samples/ folder to test the utility and convenience of a str_view.

The str_view type is a simple, copyable, flexible, read only view of const char * data in C. This implementation is experimental for now, lacking any official packaging or robust sample programs. However, this library is well tested and does what is advertised in the interface. The entire implementation can be viewed in str_view/str_view.h/.c and included in any project for some convenient string helpers.

A str_view is a 16-byte struct and, due to this size, is treated throughout the interface as a copyable type. This is neither a trivially cheap nor excessively expensive type to copy. The intention of this library is to abstract away many sharp edges of working with C-strings and provide usage that "just works," rather than to optimize for performance at this time.
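
To make the copy semantics concrete, here is a minimal sketch that does not use the library itself. It declares a stand-in struct with the same two fields as str_view and a hypothetical helper, skip_bytes, which is not part of the interface. The 16-byte figure assumes a typical 64-bit platform where both a pointer and a size_t are 8 bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Stand-in with the same two fields as the str_view in the interface. */
typedef struct
{
    const char *s;
    size_t sz;
} str_view;

/* Hypothetical helper, not part of the library: drops the first n bytes.
   The caller's view is untouched because the struct is passed by value. */
str_view
skip_bytes(str_view v, size_t n)
{
    if (n > v.sz)
    {
        n = v.sz;
    }
    v.s += n;
    v.sz -= n;
    return v;
}

/* Returns 0 if the copy semantics behave as described, nonzero otherwise. */
int
copy_demo(void)
{
    const char *text = "hello world";
    str_view whole = {text, strlen(text)};
    str_view tail = skip_bytes(whole, 6);
    /* Two fields, so the size is at least a pointer plus a size_t. That is
       16 bytes on a typical 64-bit platform but is platform dependent. */
    if (sizeof(str_view) < sizeof(const char *) + sizeof(size_t))
    {
        return 1;
    }
    /* The original view still covers the full string. */
    if (whole.s != text || whole.sz != 11)
    {
        return 2;
    }
    /* The returned copy views "world" with no allocation or byte copying. */
    if (tail.sz != 5 || memcmp(tail.s, "world", 5) != 0)
    {
        return 3;
    }
    return 0;
}
```

Only the pointer and the size are copied when a view is passed or returned, so the underlying characters are never duplicated.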

There are still improvements to be made to this library as time allows for packaging, sample programs, and further experimentation.

Build Instructions

If you wish to build the code in this repository and run the tests or sample programs as they become available, follow these steps.

  1. Clone the repository.
  2. Review the provided CMakePresets.json and CMakeUserPresets.json to tailor the compile flags to your liking. Currently, there is a provided configuration for gcc-12 and clang that should work on many platforms.
    • Build with either the presets (cmake --preset=grel for gcc, cmake --preset=crel for clang) or the provided convenience makefile (make grel for gcc, make crel for clang).
  3. Run the tests with make test-rel.
  4. Any sample programs can be found and run in the build/rel/ folder.
  5. Repeat the above steps, replacing any occurrence of rel with deb to build in debug mode and run programs in the build/deb/ folder.

The CMake configuration is minimal so change the presets and flags to your liking to build the code.

Interface

/* All functions labelled ATTRIB_PURE have no side effects and if given the
   same arguments with the same underlying data, produce the same
   return value. Importantly, these cannot be marked with the const attribute
   as str_view points to const char * data which cannot promise to remain
   unchanged even if str_view is a read only type. The str_view only promises
   that it will not alter data, not that the program will not alter the string
   data to which a str_view points. The ATTRIB_CONST applies only in rare
   cases where no pointers are accessed or dereferenced. The attributes
   relating to null terminators provide stronger compiler warnings if
   supported to ensure that safe string handling occurs on non null and
   null terminated strings. */

/* A str_view is a read-only view of string data in C. It is modeled after
   the C++ std::string_view. It consists of a pointer to const char data
   and a size_t field. Therefore, the exact size of this type may be platform
   dependent but it is small enough that one should prefer to use the provided
   functions when manipulating views. Try to avoid accessing struct fields.
   A str_view is a cheap, copyable type in all functions but swap. */
typedef struct
{
    const char *s;
    size_t sz;
} str_view;

/* Standard three way comparison type in C. See the comparison
   functions for how to interpret the comparison results. ERR
   is returned if bad input is provided to any comparison. */
typedef enum
{
    LES = -1,
    EQL,
    GRT,
    ERR,
} sv_threeway_cmp;

/*==========================  Construction  ================================*/

/* A macro to reduce the chance for errors in repeating oneself when
   constructing an inline or const str_view. The input must be a string
   literal. For example:

      static const str_view prefix = SV("test_");

   One can even use this in code when string literals are used rather than
   saved constants to avoid errors in str_view constructions.

       for (str_view cur = sv_begin_tok(ref_view, SV(" "));
            !sv_end_tok(ref_view, cur);
            cur = sv_next_tok(ref_view, cur, SV(" ")))
       {}

   However saving the str_view in a constant may be more convenient. */
#define SV(str) ((str_view){(str), (sizeof("" str "") - 1)})

/* Constructs and returns a string view from a NULL TERMINATED string.
   It is undefined to construct a str_view from a non terminated string. */
str_view sv(const char str[static const 1]) ATTRIB_NONNULL(1)
    ATTRIB_NULLTERM(1) ATTRIB_PURE;

/* Constructs and returns a string view from a sequence of valid n bytes
   or string length, whichever comes first. The resulting str_view may
   or may not be null terminated at the index of its size. */
str_view sv_n(size_t n, const char str[static const 1]) ATTRIB_NONNULL(2)
    ATTRIB_NULLTERM(2) ATTRIB_PURE;

/* Constructs and returns a string view from a NULL TERMINATED string
   broken on the first occurrence of delimiter if found or null
   terminator if delim cannot be found. This constructor will also
   skip the delimiter if that delimiter starts the string. This is similar
   to the tokenizing function in the iteration section. */
str_view sv_delim(const char str[static const 1],
                  const char delim[static const 1]) ATTRIB_NONNULL(1, 2)
    ATTRIB_NULLTERM(1, 2) ATTRIB_PURE;

/* Creates the substring from position pos for count length. The count is
   the minimum value between count and (str_view.sz - pos). If an invalid
   position is given greater than str_view length an empty view is returned
   positioned at the end of str_view. This position may or may not hold the
   null terminator. */
str_view sv_substr(str_view sv, size_t pos, size_t count) ATTRIB_PURE;

/* A sentinel empty string. Safely dereferenced to view a null terminator.
   This may be returned from various functions when bad input is given
   such as NULL as the underlying str_view string pointer. */
const char *sv_null(void) ATTRIB_PURE;

/* The end of a str_view guaranteed to be greater than or equal to size.
   May be used for the idiomatic check for most string searching function
   return values when something is not found. If a size is returned from
   a searching function it is possible to check it against npos. */
size_t sv_npos(str_view sv) ATTRIB_CONST;

/* Returns true if the provided str_view is empty, false otherwise.
   This is a useful function to check for str_view searches that yield
   an empty view at the end of a str_view when an element cannot be
   found. See sv_match or sv_rmatch as an example. */
bool sv_empty(str_view sv) ATTRIB_CONST;

/* Returns the length of the str_view in O(1) time. The position at
   str_view size is interpreted as the null terminator and not
   counted toward length of a str_view. */
size_t sv_len(str_view sv) ATTRIB_CONST;

/* Returns the bytes of str_view including null terminator. Note that
   string views may not actually be null terminated but the position at
   str_view[str_view.sz] is interpreted as the null terminator and thus
   counts towards the byte count. */
size_t sv_size(str_view sv) ATTRIB_CONST;

/* Returns the bytes of the string pointer to, null terminator included. */
size_t sv_strsize(const char str[static const 1]) ATTRIB_NONNULL(1)
    ATTRIB_NULLTERM(1) ATTRIB_PURE;

/* Swaps the contents of a and b. Because these are read only views
   only pointers and sizes are exchanged. */
void sv_swap(str_view *a, str_view *b) ATTRIB_NONNULL(1, 2);

/* Copies the minimum of str_sz or src_str length into a view, whichever
   ends first. This is the same as sv_n. */
str_view sv_copy(size_t str_sz, const char src_str[static const 1])
    ATTRIB_NONNULL(2) ATTRIB_NULLTERM(2) ATTRIB_PURE;

/* Fills the destination buffer with the minimum between
   destination size and source view size, null terminating
   the string. This may cut off src data if dest_sz < src.sz.
   Returns how many bytes were written to the buffer. */
size_t sv_fill(size_t dest_sz, char dest_buf[static const dest_sz],
               str_view src) ATTRIB_NONNULL(2);

/* Returns a str_view of the entirety of the underlying string, starting
   at the current view pointer position. This guarantees that the str_view
   returned ends at the null terminator of the underlying string as all
   strings used with str_views are assumed to be null terminated. It is
   undefined behavior to provide non null terminated strings to any
   str_view code. */
str_view sv_extend(str_view sv) ATTRIB_PURE;

/*============================  Comparison  ================================*/

/* Returns the standard C threeway comparison between cmp(lhs, rhs)
   between two string views.
   lhs LES( -1  ) rhs (lhs is less than rhs)
   lhs EQL(  0  ) rhs (lhs is equal to rhs)
   lhs GRT(  1  ) rhs (lhs is greater than rhs).
   Comparison is bounded by the shorter str_view length. ERR is
   returned if bad input is provided such as a str_view with a
   NULL pointer field. */
sv_threeway_cmp sv_cmp(str_view lhs, str_view rhs) ATTRIB_PURE;

/* Returns the standard C threeway comparison between cmp(lhs, rhs)
   between a str_view and a c-string.
   str_view LES( -1  ) rhs (str_view is less than str)
   str_view EQL(  0  ) rhs (str_view is equal to str)
   str_view GRT(  1  ) rhs (str_view is greater than str)
   Comparison is bounded by the shorter str_view length. ERR is
   returned if bad input is provided such as a str_view with a
   NULL pointer field. */
sv_threeway_cmp sv_strcmp(str_view lhs, const char rhs[static const 1])
    ATTRIB_NONNULL(2) ATTRIB_NULLTERM(2) ATTRIB_PURE;

/* Returns the standard C threeway comparison between cmp(lhs, rhs)
   between a str_view and the first n bytes (inclusive) of str
   or stops at the null terminator if that is encountered first.
   str_view LES( -1  ) rhs (str_view is less than str)
   str_view EQL(  0  ) rhs (str_view is equal to str)
   str_view GRT(  1  ) rhs (str_view is greater than str)
   Comparison is bounded by the shorter str_view length. ERR is
   returned if bad input is provided such as a str_view with a
   NULL pointer field. */
sv_threeway_cmp sv_strncmp(str_view lhs, const char rhs[static const 1],
                           size_t n) ATTRIB_NONNULL(2)
    ATTRIB_NULLTERM(2) ATTRIB_PURE;

/* Returns the minimum between the string size vs n bytes. */
size_t sv_minlen(const char str[static const 1], size_t n) ATTRIB_NONNULL(1)
    ATTRIB_NULLTERM(1) ATTRIB_PURE;

/*============================  Iteration  ==================================*/

/* For the forward and reverse tokenization use the idiomatic for loop
   to achieve the desired tokenization.

      for (str_view tok = sv_begin_tok(src, delim);
           !sv_end_tok(src, tok);
           tok = sv_next_tok(src, tok, delim))
      {}

      for (str_view tok = sv_rbegin_tok(src, delim);
           !sv_rend_tok(src, tok);
           tok = sv_rnext_tok(src, tok, delim))
      {}

   Other patterns are possible but this is recommended for tokenization.
   The same applies to character iteration.

      for (const char *i = sv_begin(src); i != sv_end(src); i = sv_next(i))
      {}

      for (const char *i = sv_rbegin(src); i != sv_rend(src); i = sv_rnext(i))
      {}

   For character iteration, it is undefined behavior to change the str_view
   being iterated through before the loop terminates. */

/* Finds the first tokenized position in the string view given any length
   delim str_view. Skips leading delimiters in construction. If the
   str_view to be searched stores NULL then sv_null() is returned. If
   delim stores NULL, that is interpreted as a search for the null terminating
   character or empty string and the size zero substring at the final position
   in the str_view is returned which may or may not be the null terminator. If no
   delim is found the entire str_view is returned. */
str_view sv_begin_tok(str_view src, str_view delim) ATTRIB_PURE;

/* Returns true if no further tokens are found and position is at the end
   position, meaning a call to sv_next_tok has yielded a size 0 str_view
   that points at the end of the src str_view which may or may not be null
   terminated. */
bool sv_end_tok(str_view src, str_view tok) ATTRIB_PURE;

/* Advances to the next token in the remaining view separated by the delim.
   Repeating delimiter patterns will be skipped until the next token or end
   of string is found. If str_view stores NULL the sv_null() placeholder
   is returned. If delim stores NULL the end position of the str_view
   is returned which may or may not be the null terminator. The tok is
   bounded by the length of the view between two delimiters or the length
   from a delimiter to the end of src, whichever comes first. */
str_view sv_next_tok(str_view src, str_view tok, str_view delim) ATTRIB_PURE;

/* Obtains the last token in a string in preparation for reverse tokenized
   iteration. Any delimiters that end the string are skipped, as in the
   forward version. If src is NULL sv_null is returned. If delim is null
   the entire src view is returned. Though the str_view is tokenized in
   reverse, the token view will start at the first character and be the
   length of the token found. */
str_view sv_rbegin_tok(str_view src, str_view delim) ATTRIB_PURE;

/* Given the current str_view being iterated through and the current token
   in the iteration returns true if the ending state of a reverse tokenization
   has been reached, false otherwise. */
bool sv_rend_tok(str_view src, str_view tok) ATTRIB_PURE;

/* Advances the token in src to the next token between two delimiters provided
   by delim. Repeating delimiters are skipped until the next token is found.
   If no further tokens can be found an empty str_view is returned with its
   pointer set to the start of the src string being iterated through. Note
   that a multicharacter delimiter may yield different tokens in reverse
   than in the forward direction when partial matches occur and some portion
   of the delimiter is in a token. This is because the string is now being
   parsed from right to left. However, the token returned starts at the first
   character and is read from left to right between two delimiters as is
   in the forward tokenization.  */
str_view sv_rnext_tok(str_view src, str_view tok, str_view delim) ATTRIB_PURE;

/* Returns a read only pointer to the beginning of the string view,
   the first valid character in the view. If the view stores NULL,
   the placeholder sv_null() is returned. */
const char *sv_begin(str_view sv) ATTRIB_PURE;

/* Returns a read only pointer to the end of the string view. This
   may or may not be a null terminated character depending on the
   view. If the view stores NULL, the placeholder sv_null() is returned. */
const char *sv_end(str_view sv) ATTRIB_PURE;

/* Advances the pointer from its previous position. If NULL is provided
   sv_null() is returned. */
const char *sv_next(const char c[static 1]) ATTRIB_NONNULL(1)
    ATTRIB_NULLTERM(1) ATTRIB_PURE;

/* Returns the reverse iterator beginning, the last character of the
   current view. If the view is null sv_null() is returned. If the
   view is sized zero with a valid pointer that pointer in the
   view is returned. */
const char *sv_rbegin(str_view sv) ATTRIB_PURE;

/* The ending position of a reverse iteration. It is undefined
   behavior to access or use rend. It is undefined behavior to
   pass in any str_view not being iterated through as started
   with rbegin. */
const char *sv_rend(str_view sv) ATTRIB_PURE;

/* Advances the iterator to the next character in the str_view
   being iterated through in reverse. It is undefined behavior
   to change the str_view one is iterating through during
   iteration. If the char pointer is null, sv_null() is returned. */
const char *sv_rnext(const char c[static 1]) ATTRIB_NONNULL(1) ATTRIB_PURE;

/* Returns the character pointer at the minimum between the indicated
   position and the end of the string view. If NULL is stored by the
   str_view then sv_null() is returned. */
const char *sv_pos(str_view sv, size_t i) ATTRIB_PURE;

/* The character in the string at position i with bounds checking.
   If i is greater than or equal to the size of str_view the null
   terminator character is returned. */
char sv_at(str_view sv, size_t i) ATTRIB_PURE;

/* The character at the first position of str_view. An empty
   str_view or NULL pointer is valid and will return '\0'. */
char sv_front(str_view sv) ATTRIB_PURE;

/* The character at the last position of str_view. An empty
   str_view or NULL pointer is valid and will return '\0'. */
char sv_back(str_view sv) ATTRIB_PURE;

/*============================  Searching  =================================*/

/* Searches for needle in hay starting from pos. If the needle
   is larger than the hay, or position is greater than hay length,
   then hay length is returned. */
size_t sv_find(str_view hay, size_t pos, str_view needle) ATTRIB_PURE;

/* Searches for the last occurrence of needle in hay starting from pos
   from right to left. If found the starting position of the string
   is returned, the same as find. If not found hay size is returned.
   The only difference from find is the search direction. If needle
   is larger than hay, hay length is returned. If the position is
   larger than the hay, the entire hay is searched. */
size_t sv_rfind(str_view hay, size_t pos, str_view needle) ATTRIB_PURE;

/* Returns true if the needle is found in the hay, false otherwise. */
bool sv_contains(str_view hay, str_view needle) ATTRIB_PURE;

/* Returns a view of the needle found in hay at the first found
   position. If the needle cannot be found the empty view at the
   hay length position is returned. This may or may not be null
   terminated at that position. If needle is greater than
   hay length an empty view at the end of hay is returned. If
   hay is NULL, sv_null is returned (modeled after strstr). */
str_view sv_match(str_view hay, str_view needle) ATTRIB_PURE;

/* Returns a view of the needle found in hay at the last found
   position. If the needle cannot be found the empty view at the
   hay length position is returned. This may or may not be null
   terminated at that position. If needle is greater than
   hay length an empty view at hay size is returned. If hay is
   NULL, sv_null is returned (modeled after strstr). */
str_view sv_rmatch(str_view hay, str_view needle) ATTRIB_PURE;

/* Returns true if a prefix shorter than or equal in length to
   the str_view is present, false otherwise. */
bool sv_starts_with(str_view sv, str_view prefix) ATTRIB_PURE;

/* Removes the minimum between str_view length and n from the start
   of the str_view. It is safe to provide n larger than str_view
   size as that will result in a size 0 view to the end of the
   current view which may or may not be the null terminator. */
str_view sv_remove_prefix(str_view sv, size_t n) ATTRIB_PURE;

/* Returns true if a suffix less or equal in length to str_view is
   present, false otherwise. */
bool sv_ends_with(str_view sv, str_view suffix) ATTRIB_PURE;

/* Removes the minimum between str_view length and n from the end. It
   is safe to provide n larger than str_view and that will result in
   a size 0 view to the end of the current view which may or may not
   be the null terminator. */
str_view sv_remove_suffix(str_view sv, size_t n) ATTRIB_PURE;

/* Finds the first position of an occurrence of any character in set.
   If no occurrence is found hay size is returned. An empty set (NULL)
   is valid and will return position at hay size. An empty hay
   returns 0. */
size_t sv_find_first_of(str_view hay, str_view set) ATTRIB_PURE;

/* Finds the first position at which no characters in set can be found.
   If the string is all characters in set hay length is returned.
   An empty set (NULL) is valid and will return position 0. An empty
   hay returns 0. */
size_t sv_find_first_not_of(str_view hay, str_view set) ATTRIB_PURE;

/* Finds the last position of any character in set in hay. If
   no position is found hay size is returned. An empty set (NULL)
   is valid and returns hay size. An empty hay returns 0. */
size_t sv_find_last_of(str_view hay, str_view set) ATTRIB_PURE;

/* Finds the last position at which no character in set can be found.
   An empty set (NULL) is valid and will return the final character
   in the str_view. An empty hay will return 0. */
size_t sv_find_last_not_of(str_view hay, str_view set) ATTRIB_PURE;

/*============================  Printing  ==================================*/

/* Writes all characters in str_view to specified file such as stdout. */
void sv_print(FILE *f, str_view sv);
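
The tokenization loop recommended in the Iteration section can be tried without building the library. The sketch below is a simplified stand-in, not the library's implementation: demo_begin_tok, demo_end_tok, demo_next_tok, and count_tokens are hypothetical names that only mimic the documented behavior (repeated delimiters are skipped, and the loop ends on an empty view at the end of src). It assumes non-NULL, null terminated input and does not reproduce the library's NULL handling.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in with the same two fields as the str_view in the interface. */
typedef struct
{
    const char *s;
    size_t sz;
} str_view;

/* True if the n bytes remaining at p begin with a non-empty delim. */
static bool
at_delim(const char *p, size_t n, str_view delim)
{
    return delim.sz > 0 && n >= delim.sz && memcmp(p, delim.s, delim.sz) == 0;
}

/* Skips any delimiters found at p, then returns the token that follows.
   Returns an empty view positioned at the end of src if none remains. */
static str_view
demo_tok_from(str_view src, const char *p, str_view delim)
{
    const char *const end = src.s + src.sz;
    while (at_delim(p, (size_t)(end - p), delim))
    {
        p += delim.sz;
    }
    const char *q = p;
    while (q < end && !at_delim(q, (size_t)(end - q), delim))
    {
        ++q;
    }
    return (str_view){p, (size_t)(q - p)};
}

/* Hypothetical stand-ins shaped like sv_begin_tok, sv_end_tok, sv_next_tok. */
str_view
demo_begin_tok(str_view src, str_view delim)
{
    return demo_tok_from(src, src.s, delim);
}

bool
demo_end_tok(str_view src, str_view tok)
{
    return tok.sz == 0 && tok.s == src.s + src.sz;
}

str_view
demo_next_tok(str_view src, str_view tok, str_view delim)
{
    return demo_tok_from(src, tok.s + tok.sz, delim);
}

/* Counts tokens using the same for loop shape the interface recommends. */
size_t
count_tokens(const char *text, const char *delim_str)
{
    str_view src = {text, strlen(text)};
    str_view delim = {delim_str, strlen(delim_str)};
    size_t n = 0;
    for (str_view tok = demo_begin_tok(src, delim); !demo_end_tok(src, tok);
         tok = demo_next_tok(src, tok, delim))
    {
        ++n;
    }
    return n;
}
```

Because each token is a pointer and a length into the original string, the loop never writes to or copies the text, unlike strtok, which overwrites delimiters in place.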

Thanks for reading!
