Rust performance resources
Performance tuning
Configuration
export RUSTFLAGS='-g -C target-cpu=native --emit=asm'
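[profile.release]
# opt-level = 3     # already the default for release builds
# debug = true      # keep debug info in release builds (useful for profiling)
codegen-units = 1   # a single codegen unit: better optimization, slower compiles
lto = "fat"         # full cross-crate link-time optimization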
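-C target-cpu=native lets rustc use every instruction-set extension of the build machine, so the resulting binary may not run on older CPUs. A small sketch to check what actually got enabled: cfg!(target_feature = ...) reflects the compile-time settings, while is_x86_feature_detected! asks the CPU at runtime. AVX2 is only an example feature picked for illustration, and the function names are made up.

```rust
/// True if AVX2 was enabled when this crate was compiled
/// (for example via `-C target-cpu=native` on an AVX2-capable CPU).
fn avx2_compiled_in() -> bool {
    cfg!(target_feature = "avx2")
}

/// True if the CPU running the program supports AVX2 (runtime check).
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn avx2_at_runtime() -> bool {
    is_x86_feature_detected!("avx2")
}

/// Non-x86 targets have no AVX2 at all.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn avx2_at_runtime() -> bool {
    false
}

fn main() {
    println!("AVX2 enabled at compile time: {}", avx2_compiled_in());
    println!("AVX2 supported by this CPU:   {}", avx2_at_runtime());
}
```

Run it once with a plain cargo run --release and once with the RUSTFLAGS above; on an AVX2-capable machine only the second build should report true on the compile-time line.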
Other tips
https://www.youtube.com/watch?v=d2ZQ9-4ZJmQ&t=35s
- Back to fundamentals
  - early exit conditions (e.g. 3*1, 3*0)
  - operational efficiencies (e.g. >> vs /)
  - parallelism (e.g. SIMD)
  - dynamic programming
  - use of efficient types
- Fixed-size slices can perform better
- Inlining can both improve and hinder performance
- Primitive types are (almost) always better
- Consider copy/borrow semantics
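A few of these points in code, as a rough sketch (all function names are made up for illustration): an early exit as soon as a zero factor appears, the semantic difference between >> and / on signed integers, and processing a slice in fixed-size chunks so the inner function takes &[u32; 4] with statically known bounds.

```rust
use std::convert::TryInto;

/// Early exit: once a zero factor is seen the product must be 0,
/// so the rest of the input is never read.
fn product_early_exit(xs: &[i64]) -> i64 {
    let mut acc: i64 = 1;
    for &x in xs {
        if x == 0 {
            return 0;
        }
        // wrapping_mul keeps the sketch panic-free on overflow in debug builds
        acc = acc.wrapping_mul(x);
    }
    acc
}

/// `>>` on a signed integer is an arithmetic shift: it rounds toward -infinity.
fn halve_shift(x: i32) -> i32 {
    x >> 1
}

/// `/` rounds toward zero, so it differs from `>> 1` for negative odd values.
fn halve_div(x: i32) -> i32 {
    x / 2
}

/// Fixed-size input: the length 4 is part of the type,
/// so the compiler knows these indices are in bounds.
fn sum4(a: &[u32; 4]) -> u32 {
    a[0] + a[1] + a[2] + a[3]
}

/// Walk a dynamic slice in fixed-size chunks of 4, then add the leftover tail.
fn sum_chunked(xs: &[u32]) -> u32 {
    let chunks = xs.chunks_exact(4);
    let tail: u32 = chunks.remainder().iter().sum();
    let body: u32 = chunks
        .map(|c| sum4(c.try_into().expect("chunks_exact yields length-4 chunks")))
        .sum();
    body + tail
}

fn main() {
    println!("{}", product_early_exit(&[3, 0, 5]));
    println!("{} {}", halve_shift(-3), halve_div(-3));
    println!("{}", sum_chunked(&[1, 2, 3, 4, 5, 6, 7]));
}
```

For unsigned values the compiler already turns x / 2 into a shift in optimized builds, so the >> tip mainly pays off when you know something the compiler cannot, such as a signed value never being negative.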
Switching to a different allocator
jemallocator
[dependencies]
jemallocator = "0.3.2"
Then add the following at the top of main.rs:
#[global_allocator]
static GLOBAL: jemallocator::Jemalloc = jemallocator::Jemalloc;
mimalloc
[dependencies]
mimalloc = { version = "0.1.17", default-features = false }
Then add the following at the top of main.rs:
#[global_allocator]
static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
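Both crates plug in through std's GlobalAlloc trait and the #[global_allocator] attribute. To see that mechanism without extra dependencies, here is a hypothetical counting allocator (CountingAlloc is not part of either crate) that forwards every request to std::alloc::System and counts allocations. Swapping the static for jemallocator::Jemalloc or mimalloc::MiMalloc is the same one-line change as in the snippets above.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical allocator for illustration: delegates to the system
/// allocator and counts how many allocations were requested.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        // Safety: the caller upholds GlobalAlloc's contract; we pass it through.
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Exactly one #[global_allocator] static is allowed per program.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn allocation_count() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}

fn main() {
    let before = allocation_count();
    let v: Vec<u64> = Vec::with_capacity(1024);
    let after = allocation_count();
    println!("allocations for Vec::with_capacity: {}", after - before);
    drop(v);
}
```

Because the default realloc and alloc_zeroed implementations of GlobalAlloc call alloc, growing a Vec is counted as well.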
Performance comparison
In my own tests:
- System default allocator: 18199.37 QPS (100%)
- jemallocator: 22434.20 QPS (about 123%) https://docs.rs/jemallocator/
- mimalloc: 22085.08 QPS (about 121%) https://docs.rs/mimalloc/
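The relative figures are just each QPS divided by the system-allocator baseline. A tiny sketch of that calculation with the numbers from the list (relative_percent is a made-up helper):

```rust
/// Throughput relative to a baseline, in percent.
fn relative_percent(qps: f64, baseline: f64) -> f64 {
    qps / baseline * 100.0
}

fn main() {
    let baseline = 18199.37; // system default allocator
    for (name, qps) in [("jemallocator", 22434.20), ("mimalloc", 22085.08)] {
        println!("{}: {:.1}%", name, relative_percent(qps, baseline));
    }
}
```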
In the end I went with jemallocator; after all, Redis also uses jemalloc by default (on Linux). :)
(runs away, haha)
Test environment
- OS: Linux Mint 18.3 (Sylvia)
- Kernel: x86_64 Linux 4.10.0-38-generic
- CPU: Intel Core i5-4590 @ 3.7GHz
- GPU: Mesa DRI Intel(R) Haswell Desktop
- RAM: 15917MiB total (4707MiB in use at the time)
- Shell: zsh 5.1.1
- Rust version: 1.45.0
profiler
https://github.com/svenstaro/cargo-profiler#to-install
Flame graphs
https://github.com/flamegraph-rs/flamegraph
perf
Linux only
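# keep debug info (symbols etc.) so perf can resolve function names
export RUSTFLAGS='-g'
# expose kernel symbol addresses so perf can resolve kernel frames (needs root; resets on reboot)
sudo sh -c "echo 0 > /proc/sys/kernel/kptr_restrict"
# then run
perf record --call-graph=dwarf cargo run --release
perf report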
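perf needs something hot to sample. As a sketch, a hypothetical profiling target: a deliberately naive prime counter. #[inline(never)] keeps is_prime and count_primes as separate frames, so they should show up by name in perf report and in a flame graph instead of being inlined into main.

```rust
/// Naive trial division; intentionally slow so it dominates the profile.
/// Keep `n` well below u32::MAX so `d * d` cannot overflow.
#[inline(never)]
fn is_prime(n: u32) -> bool {
    if n < 2 {
        return false;
    }
    let mut d = 2;
    while d * d <= n {
        if n % d == 0 {
            return false;
        }
        d += 1;
    }
    true
}

/// Counts primes in 0..limit; this frame should appear above is_prime.
#[inline(never)]
fn count_primes(limit: u32) -> usize {
    (0..limit).filter(|&n| is_prime(n)).count()
}

fn main() {
    // Raise the limit if perf collects too few samples.
    println!("{}", count_primes(200_000));
}
```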
Another way to benchmark: criterion
https://github.com/bheisler/criterion.rs
Add the following to Cargo.toml:
[dev-dependencies]
criterion = "0.3"

[[bench]]
name = "my_benchmark"
harness = false   # criterion provides its own main via criterion_main!
path = "src/benches/my_benchmark.rs"   # optional; Cargo defaults to benches/my_benchmark.rs
Then run cargo bench; the output should look similar to this:
ip find time: [750.92 ns 751.02 ns 751.14 ns]
change: [-0.5387% -0.2229% -0.0520%] (p = 0.08 > 0.05)
No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
1 (1.00%) low mild
3 (3.00%) high mild
2 (2.00%) high severe
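For comparison, a crude std-only timing loop. This is not how criterion works internally and lacks its warm-up, repeated sampling, confidence intervals and outlier classification shown above. sum_of_squares and mean_time are hypothetical stand-ins for whatever the ip find benchmark measures; std::hint::black_box needs Rust 1.66 or newer, which is later than the 1.45 used in the allocator test.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical function under test.
fn sum_of_squares(n: u64) -> u64 {
    (1..=n).map(|x| x * x).sum()
}

/// Runs `f` `iters` times and returns the mean wall-clock time per call.
/// black_box stops the optimizer from discarding the unused results.
fn mean_time<F: FnMut() -> R, R>(iters: u32, mut f: F) -> Duration {
    assert!(iters > 0, "need at least one iteration");
    let start = Instant::now();
    for _ in 0..iters {
        black_box(f());
    }
    start.elapsed() / iters
}

fn main() {
    let t = mean_time(10_000, || sum_of_squares(black_box(1_000)));
    println!("sum_of_squares(1000): {:?} per call", t);
}
```

A single mean like this is easily skewed by noise from other processes, which is exactly what criterion's statistics and "No change in performance detected" check guard against.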
cargo clippy
https://github.com/rust-lang/rust-clippy
cargo clippy
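It analyzes the code automatically and reports lints (warnings plus suggested fixes); performance-related lints are grouped under clippy::perf.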
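As an example of a performance-related lint: clippy's manual_memcpy (in the clippy::perf group, as far as I know) typically flags index-based copy loops like copy_manual below and suggests copy_from_slice. Both function names are made up for illustration.

```rust
/// Element-by-element copy; the kind of loop manual_memcpy targets.
fn copy_manual(dst: &mut [u8], src: &[u8]) {
    for i in 0..src.len() {
        dst[i] = src[i];
    }
}

/// The suggested form: one bounds check up front, then a bulk copy.
/// Like the loop above, it panics if `dst` is shorter than `src`.
fn copy_idiomatic(dst: &mut [u8], src: &[u8]) {
    dst[..src.len()].copy_from_slice(src);
}

fn main() {
    let src = [1u8, 2, 3, 4];
    let mut a = [0u8; 6];
    let mut b = [0u8; 6];
    copy_manual(&mut a, &src);
    copy_idiomatic(&mut b, &src);
    println!("{:?} {:?}", a, b);
}
```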
References
- https://likebike.com/posts/How_To_Write_Fast_Rust_Code.html
- https://zhuanlan.zhihu.com/p/111426642
- https://lise-henry.github.io/articles/optimising_strings.html
- https://rust-lang.github.io/packed_simd/perf-guide/prof/linux.html
- https://gist.github.com/jFransham/5c19171f898ca3e33eadb30bbb5e4fd6
- https://gist.github.com/jFransham/369a86eff00e5f280ed25121454acec1
- https://blog.anp.lol/rust/2016/07/24/profiling-rust-perf-flamegraph/
- https://deterministic.space/high-performance-rust.html
Debugging
LLDB
lldb /path/to/bin
GDB
gdb /path/to/bin
valgrind
valgrind --tool=[memcheck|massif|cachegrind] /path/to/bin
perf
perf stat --event task-clock,context-switches,page-faults,cycles,instructions,branches,branch-misses,cache-references,cache-misses target/release/naive > /dev/null
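The cache-references / cache-misses counters are easiest to interpret on a program whose memory access pattern you control. A hypothetical stand-in for the naive binary: the same row-major N x N matrix summed row by row and column by column. The column-major walk jumps N * 8 bytes per access and will usually show far more cache misses.

```rust
/// Matrix dimension; the matrix is stored row-major in a flat Vec.
const N: usize = 1024;

/// Row-major walk: consecutive accesses touch consecutive memory.
fn sum_row_major(m: &[u64]) -> u64 {
    let mut s = 0;
    for r in 0..N {
        for c in 0..N {
            s += m[r * N + c];
        }
    }
    s
}

/// Column-major walk over the same data: each access is N * 8 bytes
/// past the previous one, so cache lines are poorly reused.
fn sum_col_major(m: &[u64]) -> u64 {
    let mut s = 0;
    for c in 0..N {
        for r in 0..N {
            s += m[r * N + c];
        }
    }
    s
}

fn main() {
    let m: Vec<u64> = (0..(N * N) as u64).collect();
    // Pass "col" as the first argument to select the cache-unfriendly walk.
    let col = std::env::args().nth(1).as_deref() == Some("col");
    let s = if col { sum_col_major(&m) } else { sum_row_major(&m) };
    println!("{}", s);
}
```

Assuming the package is named naive, build with cargo build --release, then run the perf stat command above once as-is and once with col appended after the binary path; the col argument is this sketch's own convention.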