blake3
------

[![GoDoc](https://godoc.org/lukechampine.com/blake3?status.svg)](https://godoc.org/lukechampine.com/blake3)
[![Go Report Card](http://goreportcard.com/badge/lukechampine.com/blake3)](https://goreportcard.com/report/lukechampine.com/blake3)

```
go get lukechampine.com/blake3
```

`blake3` implements the [BLAKE3 cryptographic hash function](https://github.com/BLAKE3-team/BLAKE3).
This implementation aims to be performant without sacrificing (too much)
readability, in the hopes of eventually landing in `x/crypto`.

In addition to the pure-Go implementation, this package also contains AVX-512
and AVX2 routines (generated by [`avo`](https://github.com/mmcloughlin/avo))
that greatly increase performance for large inputs and outputs.

Contributions are greatly appreciated.
[All contributors are eligible to receive an Urbit planet.](https://twitter.com/lukechampine/status/1274797924522885134)

## Benchmarks

Tested on a 2020 MacBook Air (i5-7600K @ 3.80GHz). Benchmarks will improve as
soon as I get access to a beefier AVX-512 machine. :wink:

### AVX-512

```
BenchmarkSum256/64          120 ns/op     533.00 MB/s
BenchmarkSum256/1024       2229 ns/op     459.36 MB/s
BenchmarkSum256/65536     16245 ns/op    4034.11 MB/s
BenchmarkWrite              245 ns/op    4177.38 MB/s
BenchmarkXOF                246 ns/op    4159.30 MB/s
```

### AVX2

```
BenchmarkSum256/64          120 ns/op     533.00 MB/s
BenchmarkSum256/1024       2229 ns/op     459.36 MB/s
BenchmarkSum256/65536     31137 ns/op    2104.76 MB/s
BenchmarkWrite              487 ns/op    2103.12 MB/s
BenchmarkXOF                329 ns/op    3111.27 MB/s
```

### Pure Go

```
BenchmarkSum256/64          120 ns/op     533.00 MB/s
BenchmarkSum256/1024       2229 ns/op     459.36 MB/s
BenchmarkSum256/65536    133505 ns/op     490.89 MB/s
BenchmarkWrite             2022 ns/op     506.36 MB/s
BenchmarkXOF               1914 ns/op     534.98 MB/s
```

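
For readers comparing the tables above, the MB/s column can be approximated from the input size and the ns/op column. The sketch below assumes the number after `BenchmarkSum256/` is the input size in bytes and that 1 MB means 10^6 bytes, as in Go's `testing` package; `throughputMBps` and `speedup` are illustrative helper names, not part of this package. Results differ slightly from the printed figures because ns/op is rounded in the tables.

```go
package main

import "fmt"

// throughputMBps converts a benchmark result into MB/s, with 1 MB = 1e6 bytes.
// (Hypothetical helper for illustration; not part of the blake3 package.)
func throughputMBps(bytesPerOp int, nsPerOp float64) float64 {
	// bytes per nanosecond equals GB/s, so scale by 1e3 to get MB/s.
	return float64(bytesPerOp) / nsPerOp * 1e3
}

// speedup reports how many times faster one timing is than a baseline timing.
func speedup(baselineNs, fasterNs float64) float64 {
	return baselineNs / fasterNs
}

func main() {
	// Timings copied from the BenchmarkSum256/65536 rows of the tables.
	const size = 65536
	fmt.Printf("AVX-512: %.0f MB/s\n", throughputMBps(size, 16245))
	fmt.Printf("AVX2:    %.0f MB/s\n", throughputMBps(size, 31137))
	fmt.Printf("Pure Go: %.0f MB/s\n", throughputMBps(size, 133505))

	// Relative gain of each assembly path over pure Go at this input size.
	fmt.Printf("AVX-512 vs pure Go: %.1fx\n", speedup(133505, 16245))
	fmt.Printf("AVX2 vs pure Go:    %.1fx\n", speedup(133505, 31137))
}
```

The 64-byte and 1024-byte rows are identical across all three tables, so the same calculation shows no gain from the assembly routines at those sizes.
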
## Shortcomings

There is no assembly routine for single-block compressions. This is most
noticeable for ~1KB inputs.

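
One plausible reading of why ~1KB inputs are hit hardest: the BLAKE3 specification splits input into 1024-byte chunks of 64-byte blocks, so a 1KB input is a single chunk whose 16 blocks are each compressed individually. The sketch below only counts chunks and blocks for a given input length; `layout` is a hypothetical helper, and the explanation is an assumption drawn from the spec rather than something this README states.

```go
package main

import "fmt"

const (
	blockSize = 64   // bytes per compression-function block (BLAKE3 spec)
	chunkSize = 1024 // bytes per chunk (BLAKE3 spec)
)

// layout reports how many chunks an n-byte input spans and how many block
// compressions are needed to process those chunks. Parent-node compressions
// that merge chunk outputs are not counted.
// (Hypothetical helper for illustration; not part of the blake3 package.)
func layout(n int) (chunks, blocks int) {
	if n == 0 {
		return 1, 1 // the empty input is still hashed as one chunk of one block
	}
	chunks = (n + chunkSize - 1) / chunkSize
	blocks = (n / chunkSize) * (chunkSize / blockSize)
	if rem := n % chunkSize; rem > 0 {
		blocks += (rem + blockSize - 1) / blockSize // partial final chunk
	}
	return chunks, blocks
}

func main() {
	// Input sizes taken from the BenchmarkSum256 rows.
	for _, n := range []int{64, 1024, 65536} {
		chunks, blocks := layout(n)
		fmt.Printf("%6d bytes: %3d chunk(s), %4d block compression(s)\n", n, chunks, blocks)
	}
}
```

Under this reading, a 64-byte input needs only one compression, while a 65536-byte input spans 64 chunks that the AVX routines can work on together; a 1024-byte input sits in between, with 16 compressions and only one chunk.
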
Each assembly routine inlines all 7 rounds, causing thousands of lines of
duplicated code. Ideally the routines could be merged such that only a single
routine is generated for AVX-512 and AVX2, without sacrificing too much
performance.