Commit

Fix Docker image; update docs
mre committed Jul 25, 2023
1 parent a9f81c7 commit 31ed837
Showing 4 changed files with 97 additions and 67 deletions.
7 changes: 5 additions & 2 deletions Dockerfile
@@ -59,10 +59,13 @@ COPY --from=builder /usr/local/cargo/bin/ /usr/local/bin/
# Copy tinysearch build directory to be used as the engine (see `--engine-version` option below)
# This is done because we want to use the same image for building and running tinysearch
# and not depend on crates.io for the engine
COPY --from=builder /build/tinysearch/ /app/engine
COPY --from=builder /build/tinysearch/ /engine

# Initialize crate cache
RUN echo '[{"title":"","body":"","url":""}]' > build.json && \
tinysearch --engine-version 'path= "/app/engine"' build.json
tinysearch --engine-version 'path= "/engine"' build.json && \
rm -r build.json wasm_output

ENTRYPOINT ["tinysearch"]
# Use the engine we built above and not the one from crates.io
CMD ["--engine-version", "path= \"/engine\""]
100 changes: 55 additions & 45 deletions README.md
@@ -2,45 +2,51 @@

![CI](https://github.com/mre/tinysearch/workflows/CI/badge.svg)

tinysearch is a lightweight, fast, full-text search engine. It is designed for static websites.
tinysearch is a lightweight, fast, full-text search engine. It is designed for
static websites.

tinysearch is written in Rust, and then compiled to WebAssembly to run in a browser.
It can be used together with static site generators such as [Jekyll](https://jekyllrb.com/),
[Hugo](https://gohugo.io/), [Zola](https://www.getzola.org/),
[Cobalt](https://github.com/cobalt-org/cobalt.rs), or [Pelican](https://getpelican.com).
tinysearch is written in Rust, and then compiled to WebAssembly to run in a
browser.\
It can be used together with static site generators such as
[Jekyll](https://jekyllrb.com/), [Hugo](https://gohugo.io/),
[Zola](https://www.getzola.org/),
[Cobalt](https://github.com/cobalt-org/cobalt.rs), or
[Pelican](https://getpelican.com).

![Demo](tinysearch.gif)

## Is it tiny?

The test index file of my blog with around 40 posts creates a WASM payload of 99kB (49kB gzipped, 40kB brotli).
The test index file of my blog with around 40 posts creates a WASM payload of
99kB (49kB gzipped, 40kB brotli).\
That is smaller than the demo image above; so yes.

## How it works

tinysearch is a Rust/WASM port of the Python code from the article ["Writing a full-text
tinysearch is a Rust/WASM port of the Python code from the article
["Writing a full-text
search engine using Bloom filters"](https://www.stavros.io/posts/bloom-filter-search-engine/).
It can be seen as an alternative to [lunr.js](https://lunrjs.com/) and
[elasticlunr](http://elasticlunr.com/), which are too heavy for smaller websites and
load a lot of JavaScript.
[elasticlunr](http://elasticlunr.com/), which are too heavy for smaller websites
and load a lot of JavaScript.

Under the hood it uses a [Xor Filter](https://arxiv.org/abs/1912.08258) — a
datastructure for fast approximation of set membership that is smaller than
bloom and cuckoo filters. Each blog post gets converted into a filter that will
Under the hood it uses a [Xor Filter](https://arxiv.org/abs/1912.08258) —
a datastructure for fast approximation of set membership that is smaller than
bloom and cuckoo filters. Each blog post gets converted into a filter that will
then be serialized to a binary blob using
[bincode](https://github.com/bincode-org/bincode). Please note that the
[bincode](https://github.com/bincode-org/bincode). Please note that the
underlying technologies are subject to change.
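
The per-post filter idea described above can be sketched in a few lines of Rust. This is an illustration only and not tinysearch's actual code: it swaps in a plain Bloom filter built on the standard library in place of the Xor filter, skips serialization entirely, and every name in it (`PostFilter`, `may_contain`, `search`, the `BITS` and `PROBES` constants) is hypothetical.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Illustrative sizes only; a real engine would tune these per index.
const BITS: usize = 1024;
const PROBES: u64 = 3;

/// One approximate-membership filter per post (a Bloom filter here,
/// standing in for the Xor filter mentioned in the README).
struct PostFilter {
    title: String,
    bits: Vec<bool>,
}

/// Map a word to one bit position; the seed selects the hash function.
fn bit_index(word: &str, seed: u64) -> usize {
    let mut hasher = DefaultHasher::new();
    seed.hash(&mut hasher);
    word.hash(&mut hasher);
    (hasher.finish() % BITS as u64) as usize
}

impl PostFilter {
    /// Index every whitespace-separated word of the title and body.
    fn new(title: &str, body: &str) -> Self {
        let mut bits = vec![false; BITS];
        for word in title.split_whitespace().chain(body.split_whitespace()) {
            let word = word.to_lowercase();
            for seed in 0..PROBES {
                bits[bit_index(&word, seed)] = true;
            }
        }
        PostFilter { title: title.to_string(), bits }
    }

    /// True if the word was indexed; may rarely be true for other words
    /// (false positives), but never false for an indexed word.
    fn may_contain(&self, word: &str) -> bool {
        let word = word.to_lowercase();
        (0..PROBES).all(|seed| self.bits[bit_index(&word, seed)])
    }
}

/// Return the titles of all posts whose filter matches every query word.
fn search<'a>(filters: &'a [PostFilter], query: &str) -> Vec<&'a str> {
    filters
        .iter()
        .filter(|f| query.split_whitespace().all(|w| f.may_contain(w)))
        .map(|f| f.title.as_str())
        .collect()
}

fn main() {
    let filters = vec![
        PostFilter::new("Rust and WebAssembly", "compiling a search engine to wasm"),
        PostFilter::new("Static sites", "hosting a blog with hugo"),
    ];
    println!("{:?}", search(&filters, "wasm"));
}
```

Because only whole words are hashed into the filter, a query has to match an indexed word exactly (after lowercasing), which is the same whole-word behaviour the README lists as a limitation.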

## Limitations

- Only finds entire words. As a consequence there are no search
suggestions (yet). This is a necessary tradeoff for reducing memory usage. A
trie datastructure was about 10x bigger than the xor filters. New research on
- Only finds entire words. As a consequence there are no search suggestions
(yet). This is a necessary tradeoff for reducing memory usage. A trie
datastructure was about 10x bigger than the xor filters. New research on
compact datastructures for prefix searches might lift this limitation in the
future.
- Since we bundle all search indices for all articles into one static binary, we
recommend to only use it for small- to medium-size websites. Expect around 2 kB
uncompressed per article (~1 kb compressed).
recommend to only use it for small- to medium-size websites. Expect around 2
kB uncompressed per article (~1 kb compressed).

## Installation

@@ -66,9 +72,9 @@ you can install it with [homebrew](https://brew.sh/):
brew install binaryen
```

Alternatively, you can download the binary from the [release
page](https://github.com/WebAssembly/binaryen/releases) or use your OS package
manager.
Alternatively, you can download the binary from the
[release page](https://github.com/WebAssembly/binaryen/releases) or use your OS
package manager.

After that, you can install tinysearch itself:

@@ -81,7 +87,8 @@ cargo install tinysearch
A JSON file, which contains the content to index, is required as an input.
Please take a look at the [example file](fixtures/index.json).

ℹ️ The `body` field in the JSON document is optional and can be skipped to just index post titles.
ℹ️ The `body` field in the JSON document is optional and can be skipped to just
index post titles.
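
As a rough sketch of what producing such an input file could look like, the Rust snippet below builds a JSON array of posts with an optional `body`. It assumes the `title`, `body`, and `url` keys seen in the Dockerfile's placeholder index; the authoritative schema is the linked example file, which is not reproduced here. The `Post` struct and the `escape` and `to_index_json` helpers are hypothetical and use only the standard library.

```rust
/// One entry of the index file. The keys mirror the Dockerfile's
/// placeholder index; `body` is optional, as the README notes.
struct Post<'a> {
    title: &'a str,
    body: Option<&'a str>,
    url: &'a str,
}

/// Escape a string for use inside a JSON string literal.
fn escape(s: &str) -> String {
    let mut out = String::with_capacity(s.len());
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters must be \u-escaped in JSON.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out
}

/// Serialize the posts as a compact JSON array, omitting absent bodies.
fn to_index_json(posts: &[Post<'_>]) -> String {
    let entries: Vec<String> = posts
        .iter()
        .map(|p| {
            let mut fields = vec![format!("\"title\":\"{}\"", escape(p.title))];
            if let Some(body) = p.body {
                fields.push(format!("\"body\":\"{}\"", escape(body)));
            }
            fields.push(format!("\"url\":\"{}\"", escape(p.url)));
            format!("{{{}}}", fields.join(","))
        })
        .collect();
    format!("[{}]", entries.join(","))
}

fn main() {
    let posts = [
        Post { title: "Hello", body: Some("First post"), url: "/hello" },
        Post { title: "Titles only", body: None, url: "/titles" },
    ];
    println!("{}", to_index_json(&posts));
}
```

In a real project a JSON library would be the more robust choice; the hand-rolled escaping is only there to keep the sketch dependency-free.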

Once you created the index, you can run

@@ -90,19 +97,18 @@ tinysearch fixtures/index.json
```

This will create a WASM module and the JavaScript glue code to integrate it into
your website. You can open the `demo.html` from any webserver to see the
result.
your website. You can open the `demo.html` from any webserver to see the result.

For example, Python has a built-in webserver that can be used for a quick test:

```
python3 -m http.server
python3 -m http.server
```

then browse to http://0.0.0.0:8000/demo.html to run the demo.

You can also take a look at the code examples for different static site
generators [here](https://github.com/mre/tinysearch/tree/master/howto).
generators [here](https://github.com/mre/tinysearch/tree/master/howto).

## Advanced Usage

@@ -112,37 +118,43 @@ For advanced usage options, run
tinysearch --help
```

Please check what's required to [host WebAssembly in production](https://rustwasm.github.io/book/reference/deploying-to-production.html) -- you will need to explicitly set gzip mime types.
Please check what's required to
[host WebAssembly in production](https://rustwasm.github.io/book/reference/deploying-to-production.html)
-- you will need to explicitly set gzip mime types.

## Docker

If you don't have a full Rust setup available, you can also use our nightly-built Docker images.
If you don't have a full Rust setup available, you can also use our
nightly-built Docker images.

Here is how to quickly try tinysearch with Docker:

```sh
# Download a sample blog index from endler.dev
curl -O https://raw.githubusercontent.com/tinysearch/tinysearch/master/fixtures/index.json
# Create the WASM output
docker run -v $PWD:/tmp tinysearch/cli index.json
docker run -v $PWD:/app tinysearch/cli --engine-version path=\"/engine\" --path /app/wasm_output /app/index.json
```

By default, the most recent stable Alpine Rust image is used. To get nightly, run
By default, the most recent stable Alpine Rust image is used. To get nightly,
run

```sh
docker build --build-arg RUST_IMAGE=rustlang/rust:nightly-alpine -t tinysearch/cli:nightly .
```

### Advanced Docker Build Args

- `WASM_REPO`: Overwrite the wasm-pack repository
- `WASM_BRANCH`: Overwrite the repository branch to use
- `TINY_REPO`: Overwrite repository of tinysearch
- `TINY_BRANCH`: Overwrite tinysearch branch
- `WASM_REPO`: Overwrite the wasm-pack repository
- `WASM_BRANCH`: Overwrite the repository branch to use
- `TINY_REPO`: Overwrite repository of tinysearch
- `TINY_BRANCH`: Overwrite tinysearch branch

## Github action

To integrate tinysearch in continuous deployment pipelines, a [github action](https://github.com/marketplace/actions/tinysearch-action) is available.
To integrate tinysearch in continuous deployment pipelines, a
[github action](https://github.com/marketplace/actions/tinysearch-action) is
available.

```yaml
- name: Build tinysearch
@@ -154,32 +166,30 @@ To integrate tinysearch in continuous deployment pipelines, a [github action](ht
wasm
```


## Users

The following websites use tinysearch:

* [Matthias Endler's blog](https://endler.dev/2019/tinysearch/)
* [OutOfCheeseError](https://out-of-cheese-error.netlify.app/)
* [Museum of Warsaw Archdiocese](https://maw.art.pl/cyfrowemaw/)
- [Matthias Endler's blog](https://endler.dev/2019/tinysearch/)
- [OutOfCheeseError](https://out-of-cheese-error.netlify.app/)
- [Museum of Warsaw Archdiocese](https://maw.art.pl/cyfrowemaw/)

Are you using tinysearch, too? Add your site here!

## Maintainers

* Matthias Endler (@mre)
* Jorge-Luis Betancourt (@jorgelbg)
* Mad Mike (@fluential)
- Matthias Endler (@mre)
- Jorge-Luis Betancourt (@jorgelbg)
- Mad Mike (@fluential)

## License

tinysearch is licensed under either of

* Apache License, Version 2.0, (LICENSE-APACHE or
- Apache License, Version 2.0, (LICENSE-APACHE or
http://www.apache.org/licenses/LICENSE-2.0)
* MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)
- MIT license (LICENSE-MIT or http://opensource.org/licenses/MIT)

at your option.


[wasm-pack]: https://github.com/rustwasm/wasm-pack
2 changes: 1 addition & 1 deletion assets/crate/Cargo.toml
@@ -3,7 +3,7 @@
[package]
name = "THIS_VALUE_SHOULD_BE_FILLED"
authors = ["Matthias Endler <matthias-endler@gmx.net>"]
version = "0.7.0"
version = "0.8.1"
edition = "2021"
description = "A tiny search engine for static websites"
license = "Apache-2.0/MIT"
55 changes: 36 additions & 19 deletions examples/yew-example-crate/Cargo.lock

Some generated files are not rendered by default.
