Performance tuning

Configuration

export RUSTFLAGS='-g -C target-cpu=native --emit=asm'

[profile.release]
#opt-level = 3
#debug=true
codegen-units = 1
lto = "fat"
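To see what `-C target-cpu=native` changes, one option is to check which target features were enabled at compile time. A minimal sketch using `cfg!(target_feature = ...)`; the feature names below are an arbitrary x86 sample chosen for illustration, not a complete list.

```rust
/// Returns the features (from a small, arbitrary x86 sample) that the
/// compiler was allowed to use when this crate was built.
fn enabled_features() -> Vec<&'static str> {
    let mut found = Vec::new();
    // cfg! is evaluated at compile time, so this reflects RUSTFLAGS,
    // not the CPU the binary happens to run on.
    if cfg!(target_feature = "sse2") {
        found.push("sse2");
    }
    if cfg!(target_feature = "sse4.2") {
        found.push("sse4.2");
    }
    if cfg!(target_feature = "avx") {
        found.push("avx");
    }
    if cfg!(target_feature = "avx2") {
        found.push("avx2");
    }
    if cfg!(target_feature = "fma") {
        found.push("fma");
    }
    found
}

fn main() {
    println!("compile-time target features: {:?}", enabled_features());
}
```

Building once with and once without the RUSTFLAGS line should show the difference on a machine that supports the newer extensions.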

Other

https://www.youtube.com/watch?v=d2ZQ9-4ZJmQ&t=35s

Back to fundamentals
- early exit conditions (e.g. 3*1, 3*0)
- operational efficiencies (e.g. >> vs /)
- Parallelism (e.g. SIMD)
- dynamic programming
- use of efficient types
Fixed size slices can perform better
Inline can both improve and hinder
primitive types are (almost) always better
Consider copy/borrow semantics
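A few of the points above, sketched in code. This is my reading of the notes, not code from the talk; all function names are made up for illustration. Note that the compiler usually turns a division by a constant power of two into a shift on its own, so the shift version mainly matters when the divisor is not a compile-time constant.

```rust
/// Early exit: once a zero is seen the product is known, so stop multiplying.
fn product_with_early_exit(values: &[u64]) -> u64 {
    let mut product = 1u64;
    for &v in values {
        if v == 0 {
            return 0;
        }
        product = product.wrapping_mul(v);
    }
    product
}

/// Operational efficiency: for unsigned integers, dividing by 8 is the
/// same as shifting right by 3 bits.
fn div_by_8(x: u32) -> u32 {
    x >> 3
}

/// Fixed-size array: the length is part of the type, so the compiler
/// knows the trip count at compile time.
fn sum4(values: &[u32; 4]) -> u32 {
    values.iter().sum()
}

fn main() {
    println!("{}", product_with_early_exit(&[3, 0, 7]));
    println!("{}", div_by_8(64));
    println!("{}", sum4(&[1, 2, 3, 4]));
}
```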

Switching to another allocator

jemallocator

[dependencies]
jemallocator = "0.3.2"

Then add the following code at the very top of main.rs

#[global_allocator]
static GLOBAL: jemallocator::Jemalloc = jemallocator::Jemalloc;

mimalloc

[dependencies]
mimalloc = { version = "0.1.17", default-features = false }

Then add the following code at the very top of main.rs

#[global_allocator]
static GLOBAL: mimalloc::MiMalloc = mimalloc::MiMalloc;
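`#[global_allocator]` accepts any static whose type implements `std::alloc::GlobalAlloc`, which is how both crates above plug in. A std-only sketch: `CountingAlloc` is a hypothetical wrapper that forwards to the system allocator and counts calls, which can help to see how allocation-heavy a code path is before swapping allocators.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::hint::black_box;
use std::sync::atomic::{AtomicUsize, Ordering};

/// Hypothetical wrapper: forwards to the system allocator and counts calls.
struct CountingAlloc;

static ALLOCATIONS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCATIONS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

// Only one #[global_allocator] may exist in a program.
#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

/// Number of allocations made so far by the whole process.
fn allocation_count() -> usize {
    ALLOCATIONS.load(Ordering::Relaxed)
}

fn main() {
    let before = allocation_count();
    // black_box keeps the optimizer from removing the allocation.
    let v: Vec<u8> = black_box(vec![1u8; 4096]);
    let after = allocation_count();
    println!("allocations while building the Vec: {}", after - before);
    drop(v);
}
```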

Performance comparison

In my own tests:

System default allocator: 18199.37 QPS (100%)
jemallocator: 22434.20 QPS (123%) https://docs.rs/jemallocator/
mimalloc: 22085.08 QPS (121%) https://docs.rs/mimalloc/

In the end I went with jemallocator; after all, Redis uses it by default too. :)
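The percentages in the comparison can be recomputed from the QPS figures, with the system allocator as the 100% baseline. A small sketch; `relative_percent` is a helper name introduced here.

```rust
/// Throughput relative to a baseline, as a percentage.
fn relative_percent(qps: f64, baseline: f64) -> f64 {
    qps / baseline * 100.0
}

fn main() {
    let baseline = 18199.37; // system default allocator
    for (name, qps) in [("jemallocator", 22434.20), ("mimalloc", 22085.08)] {
        println!("{name}: {:.0}%", relative_percent(qps, baseline));
    }
}
```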

Test environment

 MMMMMMMMMMMMMMMMMMMMMMMMMmds+.        OS: Mint 18.3 sylvia
 MMm----::-://////////////oymNMd+`     Kernel: x86_64 Linux 4.10.0-38-generic
 MMd      /++                -sNMd:    Uptime: 7d 1h 9m
 MMNso/`  dMM    `.::-. .-::.` .hMN:   Packages: 2399
 ddddMMh  dMM   :hNMNMNhNMNMNh: `NMm   Shell: zsh 5.1.1
     NMm  dMM  .NMN/-+MMM+-/NMN` dMM   Resolution: 1920x1080
     NMm  dMM  -MMm  `MMM   dMM. dMM   DE: MATE 1.18.2
     NMm  dMM  -MMm  `MMM   dMM. dMM   WM: Metacity (Marco)
     NMm  dMM  .mmd  `mmm   yMM. dMM   GTK Theme: 'Mint-X' [GTK2/3]
     NMm  dMM`  ..`   ...   ydm. dMM   Icon Theme: Mint-X
     hMM- +MMd/-------...-:sdds  dMM   Font: Noto Sans 9
     -NMm- :hNMNNNmdddddddddy/`  dMM   CPU: Intel Core i5-4590 CPU @ 3.7GHz
      -dMNs-``-::::-------.``    dMM   GPU: Mesa DRI Intel(R) Haswell Desktop 
       `/dMNmy+/:-------------:/yMMM   RAM: 4707MiB / 15917MiB
          ./ydNMMMMMMMMMMMMMMMMMMMMM  
             \.MMMMMMMMMMMMMMMMMMM

Rust version: 1.45.0

profiler

https://github.com/svenstaro/cargo-profiler#to-install

Flame graphs

https://github.com/flamegraph-rs/flamegraph

perf

Linux only

# Keep symbols and other debug info
export RUSTFLAGS='-g'

sudo sh -c " echo 0 > /proc/sys/kernel/kptr_restrict"

# Then run
perf record --call-graph=dwarf cargo run --release

perf report
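For trying the perf commands out, a toy workload with one obviously hot function makes the report easy to read. A sketch with made-up function names; `#[inline(never)]` keeps both functions as separate symbols so they show up by name in `perf report`.

```rust
use std::hint::black_box;

/// Deliberately expensive loop; this should dominate the profile.
#[inline(never)]
fn hot_path(n: u64) -> u64 {
    let mut acc = 0u64;
    for i in 0..n {
        acc = acc.wrapping_add(i.wrapping_mul(i) % 7);
    }
    acc
}

/// Cheap by comparison; should barely register.
#[inline(never)]
fn cold_path(n: u64) -> u64 {
    (0..n).sum()
}

fn main() {
    // black_box hides the constants from the optimizer.
    let hot = hot_path(black_box(200_000_000));
    let cold = cold_path(black_box(1_000));
    println!("{hot} {cold}");
}
```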

Another kind of benchmark

https://github.com/bheisler/criterion.rs

Add the following to Cargo.toml

[dev-dependencies]
criterion = "0.3"

[[bench]]
name = "my_benchmark"
harness = false
path = "src/benches/my_benchmark.rs"

Then run cargo bench; the output looks something like this

ip find                 time:   [750.92 ns 751.02 ns 751.14 ns]
                        change: [-0.5387% -0.2229% -0.0520%] (p = 0.08 > 0.05)
                        No change in performance detected.
Found 6 outliers among 100 measurements (6.00%)
  1 (1.00%) low mild
  3 (3.00%) high mild
  2 (2.00%) high severe
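The source of the "ip find" benchmark is not in these notes. To show what is being measured, here is a std-only sketch of the timing loop a harness runs: `ip_to_u32` is a hypothetical stand-in for the function under test and `mean_time` is a helper introduced here. In a real criterion bench file the same closure would be passed to `Bencher::iter` inside `Criterion::bench_function`, with `criterion_group!` and `criterion_main!` generating `main`; criterion also adds warm-up, outlier detection and the statistics shown above, which this sketch does not.

```rust
use std::hint::black_box;
use std::time::{Duration, Instant};

/// Hypothetical function under test: parses a dotted-quad IPv4 string.
fn ip_to_u32(ip: &str) -> Option<u32> {
    let mut out = 0u32;
    let mut parts = 0;
    for part in ip.split('.') {
        let octet: u8 = part.parse().ok()?;
        out = (out << 8) | u32::from(octet);
        parts += 1;
    }
    if parts == 4 {
        Some(out)
    } else {
        None
    }
}

/// Runs `f` `iters` times and returns the mean wall-clock time per call.
/// `iters` must be non-zero.
fn mean_time<T>(iters: u32, mut f: impl FnMut() -> T) -> Duration {
    let start = Instant::now();
    for _ in 0..iters {
        // black_box keeps the result from being optimized away.
        black_box(f());
    }
    start.elapsed() / iters
}

fn main() {
    let per_call = mean_time(1_000_000, || ip_to_u32(black_box("192.168.1.1")));
    println!("ip_to_u32: {:?} per call", per_call);
}
```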

cargo clippy

https://github.com/rust-lang/rust-clippy

cargo clippy

It analyzes the code automatically.
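As an example of the kind of thing it reports: the index-based loop below triggers Clippy's `needless_range_loop` lint, and the iterator version is the form it suggests moving towards. Both function names are made up for illustration.

```rust
/// Index-based loop; Clippy warns here (needless_range_loop).
fn sum_indexed(values: &[i32]) -> i32 {
    let mut total = 0;
    for i in 0..values.len() {
        total += values[i];
    }
    total
}

/// Iterator form: no manual indexing, so no per-element bounds check to elide.
fn sum_iter(values: &[i32]) -> i32 {
    values.iter().sum()
}

fn main() {
    let data = [1, 2, 3];
    println!("{} {}", sum_indexed(&data), sum_iter(&data));
}
```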

Resources

Debugging

LLDB

lldb /path/to/bin

GDB

gdb /path/to/bin

valgrind

valgrind --tool=[memcheck|massif|cachegrind] /path/to/bin

perf

perf stat --event task-clock,context-switches,page-faults,cycles,instructions,branches,branch-misses,cache-references,cache-misses target/release/naive > /dev/null
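The cache-references and cache-misses counters are easiest to interpret with a program whose access pattern is known. A sketch, not taken from the original notes: the same square grid summed row by row (sequential memory access) and column by column (strided access). Running each variant under `perf stat` or `valgrind --tool=cachegrind` should show a difference in cache misses for large enough grids.

```rust
/// Sums an n*n grid stored row-major, walking memory sequentially.
fn sum_row_major(grid: &[u32], n: usize) -> u64 {
    let mut total = 0u64;
    for row in 0..n {
        for col in 0..n {
            total += u64::from(grid[row * n + col]);
        }
    }
    total
}

/// Sums the same grid column by column, jumping n elements per step.
fn sum_col_major(grid: &[u32], n: usize) -> u64 {
    let mut total = 0u64;
    for col in 0..n {
        for row in 0..n {
            total += u64::from(grid[row * n + col]);
        }
    }
    total
}

fn main() {
    let n = 4096; // 64 MiB of u32, larger than typical CPU caches
    let grid = vec![1u32; n * n];
    // Pass any argument to run the strided variant instead.
    let strided = std::env::args().nth(1).is_some();
    let total = if strided {
        sum_col_major(&grid, n)
    } else {
        sum_row_major(&grid, n)
    };
    println!("{total}");
}
```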

AFL

https://github.com/google/AFL

Rust performance resources
