Introduction

When deploying neural networks or other data science projects, it is important to benchmark all components to identify bottlenecks. This matters even more on the deployment side of a project than on the development/training side. That said, there is no point in choosing a much slower solution on the development/training side just to avoid a few changes.

Most of my notes contain some micro-benchmarks. The list here covers a wider range of benchmarks that may be worth considering when architecting a data science or machine learning project. NB: these benchmarks provide initial impressions, but they do not free us from micro-benchmarking our own source code to identify further bottlenecks.
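
For quick micro-benchmarks of our own code, Python's built-in timeit module is often a good first tool. A minimal sketch with a made-up toy workload (the function names and sizes are illustrative, not taken from any of the benchmarks listed below):

```python
import timeit

# Toy workload: sum 100,000 floats two different ways
# (purely illustrative; not from any benchmark below).
data = [float(i) for i in range(100_000)]

def builtin_sum():
    return sum(data)

def manual_sum():
    total = 0.0
    for x in data:
        total += x
    return total

# timeit calls each function `number` times and returns the total wall time.
print("sum():   ", timeit.timeit(builtin_sum, number=100))
print("for loop:", timeit.timeit(manual_sum, number=100))
```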

Database/Data Frame Benchmarks

  • Database-like ops benchmark
    • benchmarks for database-like operations at various data sizes on data frames such as pandas, data.table and others (see the pandas sketch after this list)
  • modin-benchmark
    • a benchmark I wrote in 2019, shortly after Modin was released
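
To get a feel for what a database-like ops benchmark measures, here is a minimal sketch that times a single groupby-mean in pandas; the frame size, column names and seed are arbitrary assumptions, and a real comparison would repeat this across libraries and data sizes:

```python
import time

import numpy as np
import pandas as pd

# Synthetic frame: 1e6 rows, 1,000 groups -- sizes are arbitrary.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "key": rng.integers(0, 1_000, size=1_000_000),
    "value": rng.random(1_000_000),
})

start = time.perf_counter()
result = df.groupby("key", sort=False)["value"].mean()
print(f"groupby-mean over 1e6 rows: {time.perf_counter() - start:.3f} s")
```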

I/O Benchmarks

  • CSV readers
    • benchmarks for reading CSV files with Julia, Python and Rust (a minimal pandas timing sketch follows below)
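
As a rough illustration of such an I/O benchmark, the sketch below writes a synthetic CSV and times pandas.read_csv on it; the file name and shape are made up, and a serious comparison would also cover the Julia and Rust readers from the linked benchmark:

```python
import time

import numpy as np
import pandas as pd

# Write a synthetic CSV first so the snippet is self-contained;
# the shape (500,000 x 10) is arbitrary.
pd.DataFrame(np.random.rand(500_000, 10)).to_csv("synthetic.csv", index=False)

start = time.perf_counter()
df = pd.read_csv("synthetic.csv")
print(f"pandas.read_csv: {time.perf_counter() - start:.3f} s for {len(df)} rows")
```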

General Computation Benchmarks

Graph/Network Benchmarks

Hardware Benchmarks

  • Deep Learning GPU Benchmarks
    • deep learning benchmarks across NVIDIA GPUs, varying the number of GPUs, the batch size and the model (a minimal PyTorch timing sketch follows below)
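
As a minimal sketch of how such a GPU timing can be taken by hand, assuming PyTorch is installed (matrix size and repeat count are arbitrary; benchmarks like the one linked above measure full training steps instead):

```python
import time

import torch

# Fall back to CPU when no CUDA device is available.
device = "cuda" if torch.cuda.is_available() else "cpu"
a = torch.randn(4096, 4096, device=device)
b = torch.randn(4096, 4096, device=device)

torch.matmul(a, b)  # warm-up so one-time setup cost does not skew the timing
if device == "cuda":
    torch.cuda.synchronize()

start = time.perf_counter()
for _ in range(10):
    torch.matmul(a, b)
if device == "cuda":
    torch.cuda.synchronize()  # CUDA kernels run async; wait before stopping the clock
print(f"10 matmuls of 4096x4096 on {device}: {time.perf_counter() - start:.3f} s")
```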

Machine Learning Framework Benchmarks

Machine Learning Model Benchmarks

Web Framework Benchmarks