- I implemented these in my attempt to SIMD optimize Wasm/WASI libc.by ncruces - 1 day ago
https://github.com/ncruces/go-sqlite3/tree/main/sqlite3/libc
For known haystack lengths, and large needles, I found it useful to augment the approach with Quick Search.
- I implemented many different algorithms for searching and splitting strings using SMID methods as well: https://github.com/naver/tamgu/blob/06aedf2f14895925d7b5a8e2... I used a different algorithm than the ones presented here.by clauderoux - 1 day ago
- We know that the libc strstr performs horrible, but musl is fast and state of the art.by rurban - 1 day ago
Now we just need a name, and we can add it to the smart shootout. Wonder how it compares to the best SIMD algos
- missing (2018) ?by mgaunard - 1 day ago
- What is the string sizes thresold for efficiency?by sylware - 1 day ago
Because the SIMD (aligned, size computations, etc) setup code is usually more expensive that byte-oriented basic search for "small" strings.
Yes, it depends heavily on the hardware architecture.
- The swar algorithm has UB because it casts the 1 byte aligned data to 8 byte aligned. Perhaps it suffers some performance issues from unaligned loads?by eska - 1 day ago
- The "AVX2 (generic)" approach is roughly what ripgrep uses (via Rust's `regex` crate) to accelerate most searches. Even something like `\w+\s+Sherlock\s+\w+` will benefit since ripgrep will pluck `Sherlock` out of the regex and search that.by burntsushi - 1 day ago
The actual implementation is here: https://github.com/BurntSushi/memchr?tab=readme-ov-file#algo...
The main difference with the algorithm presented here is that instead of always using the first and last bytes of the needle, a heuristic is used to try to pick two bytes that occur less frequently according to a background frequency distribution.
It ends up being quite a bit faster than just plain Two-Way or even GNU libc's implementation of `memmem`. From the root of the `memchr` repository:
The "prebuilt" benchmarks also demonstrate the API deficiencies of routines like `memmem`. Because of their API, they need to rebuild the state necessary to execute a search for the needle given. But it is often the case that the needle is invariant across repeated calls with different haystacks. So you end up doing that repeated work for no gain.$ rebar rank benchmarks/record/x86_64/2023-12-29.csv -e '^rust/memchr/memmem/(oneshot|prebuilt|twoway)' -e '^libc/memmem/oneshot' Engine Version Geometric mean of speed ratios Benchmark count ------ ------- ------------------------------ --------------- rust/memchr/memmem/prebuilt 2.7.1 1.07 57 rust/memchr/memmem/oneshot 2.7.1 1.39 54 libc/memmem/oneshot unknown 3.15 54 rust/memchr/memmem/twoway 2.7.1 5.26 54
- Now if I can just use SIMD directly from Python without calling out to another language.by azhenley - 1 day ago
- Reminds me of https://github.com/nikneym/hparseby klaussilveira - 1 day ago
Which uses SIMD algos for fast HTTP parsing.
- C# has implemented some SIMD for IndexOf: https://github.com/dotnet/runtime/pull/63285by mwsherman - 1 day ago