gowc

Toy clone of coreutils wc in Go

Project Description

gowc is a toy reimplementation of wc in Go, mainly written for fun 😃. It's perfectly functional, well tested and correct, but there's no real benefit to using it over the original (aside from maybe the JSON flag).

The main reason I wrote it was that I discovered you can (sort of) abuse the io.Writer interface to count lines, words etc. The primary benefit is that you can then leverage io.Copy to read from either files or stdin (both of which implement io.Reader).
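
As a rough illustration of the idea (a minimal sketch, not gowc's actual code), a writer can throw away the bytes it is given and only keep running totals. The names Counter, countReader and countString are made up for this example, and word splitting is simplified to ASCII whitespace:

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// Counter is a hypothetical io.Writer that discards its input and tallies it instead.
type Counter struct {
	Bytes, Chars, Lines, Words int
	inWord                     bool // true if the previous byte was part of a word
}

// Write never stores p; it only updates the running totals.
// State is kept between calls, so a word or a multi-byte character
// split across two chunks is still counted exactly once.
func (c *Counter) Write(p []byte) (int, error) {
	c.Bytes += len(p)
	for _, b := range p {
		// Every byte that is not a UTF-8 continuation byte starts a character.
		if b&0xC0 != 0x80 {
			c.Chars++
		}
		if b == '\n' {
			c.Lines++
		}
		switch b {
		case ' ', '\t', '\n', '\r', '\v', '\f': // assumption: ASCII whitespace only
			c.inWord = false
		default:
			if !c.inWord {
				c.Words++
				c.inWord = true
			}
		}
	}
	return len(p), nil
}

// countReader streams any io.Reader (a file, os.Stdin, ...) through a Counter.
func countReader(r io.Reader) (Counter, error) {
	var c Counter
	_, err := io.Copy(&c, r)
	return c, err
}

// countString is a small convenience wrapper for trying the counter out.
func countString(s string) Counter {
	c, _ := countReader(strings.NewReader(s))
	return c
}

func main() {
	c := countString("héllo world\nsecond line\n")
	fmt.Println(c.Bytes, c.Chars, c.Lines, c.Words)
}
```

Because countReader only asks for an io.Reader, the same function would work unchanged with an opened file or os.Stdin.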

Using io.Copy means large files automatically get chunked into 32 KB blocks and streamed through your program, so gowc works seamlessly on enormous files!
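
The chunking can be observed with a small experiment. This is a sketch under a few assumptions: chunkRecorder and chunkSizes are hypothetical names, the reader is wrapped so that io.Copy cannot take its WriteTo shortcut, and the exact buffer size is an implementation detail of the standard library rather than a documented guarantee:

```go
package main

import (
	"fmt"
	"io"
	"strings"
)

// chunkRecorder is a hypothetical io.Writer that remembers the size of every Write call.
type chunkRecorder struct{ sizes []int }

func (c *chunkRecorder) Write(p []byte) (int, error) {
	c.sizes = append(c.sizes, len(p))
	return len(p), nil
}

// chunkSizes streams n bytes through io.Copy and reports how they were chunked.
func chunkSizes(n int) []int {
	// Wrapping in struct{ io.Reader } hides strings.Reader's WriteTo method,
	// so io.Copy falls back to its own internal buffer, as it would for a plain reader.
	src := struct{ io.Reader }{strings.NewReader(strings.Repeat("x", n))}
	rec := &chunkRecorder{}
	if _, err := io.Copy(rec, src); err != nil {
		panic(err)
	}
	return rec.sizes
}

func main() {
	// Each printed number is the length of one slice handed to Write.
	fmt.Println(chunkSizes(100_000))
}
```

The writer only ever sees one buffer's worth of data at a time, which is why memory use stays flat no matter how large the input is.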

So this was a fun experiment to see how far you can take it.

Installation

Compiled binaries for all supported platforms can be found in the GitHub releases. There is also a Homebrew tap:

brew install FollowTheProcess/homebrew-tap/gowc

Quickstart

Pipe from stdin

gowc < moby_dick.txt

# Or
cat moby_dick.txt | gowc
File          Bytes   Chars   Lines Words
moby_dick.txt 1232922 1232922 23243 214132

Read from file

gowc moby_dick.txt
File          Bytes   Chars   Lines Words
moby_dick.txt 1232922 1232922 23243 214132

Multiple files

Multiple files are counted concurrently using a worker pool 🚀

gowc myfiles/*
File                  Bytes   Chars   Lines Words
myfiles/onemore.txt   460     460     2     63
myfiles/another.txt   608     608     2     80
myfiles/moby_dick.txt 1232922 1232922 23243 214132

JSON

gowc -json moby_dick.txt | jq
{
  "name": "moby_dick.txt",
  "lines": 23243,
  "bytes": 1232922,
  "words": 214132,
  "chars": 1232922
}

You can also do multiple files in JSON:

gowc -json myfiles/*
[
  {
    "name": "myfiles/onemore.txt",
    "lines": 2,
    "bytes": 460,
    "words": 63,
    "chars": 460
  },
  {
    "name": "myfiles/another.txt",
    "lines": 2,
    "bytes": 608,
    "words": 80,
    "chars": 608
  },
  {
    "name": "myfiles/moby_dick.txt",
    "lines": 23243,
    "bytes": 1232922,
    "words": 214132,
    "chars": 1232922
  }
]
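
For anyone consuming this output from Go, the shape above maps naturally onto a struct with JSON tags. The following is a sketch only: the Result type and toJSON helper are hypothetical and are not taken from gowc's source, with field names and order simply copied from the output shown above:

```go
package main

import (
	"encoding/json"
	"fmt"
)

// Result is a hypothetical type mirroring one entry of the JSON output.
// encoding/json emits struct fields in declaration order.
type Result struct {
	Name  string `json:"name"`
	Lines int    `json:"lines"`
	Bytes int    `json:"bytes"`
	Words int    `json:"words"`
	Chars int    `json:"chars"`
}

// toJSON encodes a single result as compact JSON.
func toJSON(r Result) string {
	out, err := json.Marshal(r)
	if err != nil {
		panic(err)
	}
	return string(out)
}

func main() {
	r := Result{Name: "moby_dick.txt", Lines: 23243, Bytes: 1232922, Words: 214132, Chars: 1232922}
	fmt.Println(toJSON(r))

	// Several files become a JSON array by encoding a slice of results.
	many, err := json.MarshalIndent([]Result{r, r}, "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(many))
}
```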

Performance

I've not put much effort into optimisation, and there are potentially some gains to be had, but it's fast enough that you wouldn't notice a difference from the original.

Counting multiple files happens concurrently in a worker pool across all your cores, so it performs well even on very large numbers of files:

[Image: benchmark results]

That's 9261 files read, with words, lines, bytes and UTF-8 characters counted, in just over 18ms 🚀
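
The worker pool pattern itself is small. Here is a minimal sketch, assuming one worker per CPU and in-memory strings standing in for files; the job type and countWordsAll function are illustrative names, not gowc's real implementation:

```go
package main

import (
	"fmt"
	"runtime"
	"strings"
	"sync"
)

// job pairs an input with its position so results can be stored in order.
type job struct {
	index int
	text  string
}

// countWordsAll counts the words in every input using a fixed pool of workers.
func countWordsAll(texts []string) []int {
	results := make([]int, len(texts))
	jobs := make(chan job)

	var wg sync.WaitGroup
	for w := 0; w < runtime.NumCPU(); w++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for j := range jobs {
				// Each job writes to its own slot, so no mutex is needed.
				results[j.index] = len(strings.Fields(j.text))
			}
		}()
	}

	// Feed the workers, then signal that no more work is coming.
	for i, t := range texts {
		jobs <- job{index: i, text: t}
	}
	close(jobs)
	wg.Wait()
	return results
}

func main() {
	fmt.Println(countWordsAll([]string{"one two", "three", ""}))
}
```

Capping the number of goroutines at the core count keeps thousands of inputs from each spawning their own goroutine while still keeping every core busy.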

Credits

This package was created with copier and the FollowTheProcess/go_copier project template.