Projects

Research and Practical Aspects of AI/ML

Common Mistakes in Machine Learning and Data Science

This project aims to provide a more holistic approach to minimizing errors. Unfortunately, I keep seeing these very basic mistakes repeated over and over. New mistakes are added from time to time.

Concatenated MNIST (CMNIST) Dataset

MNIST-like datasets are still used for a lot of fundamental research, especially with respect to shallow models that may end up running on tiny microcontrollers. Outside such experiments, the dataset should not be taken too seriously ;). CMNIST turns 784 pixels into something truly ridiculous and challenging for such models by concatenating various permutations of MNIST-like datasets into well-specified, new, and large datasets.
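A minimal sketch of the concatenation idea in Python with NumPy. It assumes each source dataset is an `(X, y)` pair of 28x28 images and integer labels, that panels are concatenated side by side along the width axis in the order given by a permutation, and that each concatenated sample keeps one label per panel. The function name `make_cmnist`, the seeded random sampling of images, and the label layout are illustrative assumptions, not the actual CMNIST specification.

```python
import itertools

import numpy as np


def make_cmnist(datasets, order, n_samples, seed=0):
    """Concatenate images from several MNIST-like datasets side by side.

    Hypothetical helper. datasets: list of (X, y) with X of shape (n, 28, 28).
    order: a permutation of dataset indices, e.g. (2, 0, 1).
    Returns X_cat of shape (n_samples, 28, 28 * k) and y_cat of shape
    (n_samples, k), where k = len(order) and column j is the label of panel j.
    """
    rng = np.random.default_rng(seed)  # fixed seed -> reproducible dataset
    xs, ys = [], []
    for i in order:
        X, y = datasets[i]
        # Sample distinct images from this source dataset.
        idx = rng.choice(len(X), size=n_samples, replace=False)
        xs.append(X[idx])
        ys.append(y[idx])
    X_cat = np.concatenate(xs, axis=2)  # join panels along the width axis
    y_cat = np.stack(ys, axis=1)        # one label column per panel
    return X_cat, y_cat


if __name__ == "__main__":
    # Random stand-ins for three MNIST-like datasets (assumption: real data
    # would be loaded from MNIST, Fashion-MNIST, etc.).
    rng = np.random.default_rng(42)
    fake = [(rng.random((100, 28, 28)), rng.integers(0, 10, size=100))
            for _ in range(3)]
    # One concatenated dataset per permutation of the three sources.
    for order in itertools.permutations(range(3)):
        X, y = make_cmnist(fake, order, n_samples=50)
        print(order, X.reshape(len(X), -1).shape, y.shape)
```

With three sources, each flattened sample has 3 x 784 = 2352 features, which is the kind of input growth that makes the task harder for shallow models.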

Revisiting ML/BruteForceML

There are quite a few blog posts that fall into this category. The idea behind this micro project is to deploy a simple brute-force approach that follows best practices, uses fixed splits and pre-processing, and runs under strict time limits, and then to compare it against the baseline reported in the publication a given dataset originates from. Relatively few AutoML tactics were used, yet the approach often outperformed AutoML frameworks such as auto-sklearn, auto-keras, and TPOT by quite a margin. For practical deployments, the results of this micro project have some nice implications. Most of the experiments were conducted in 2018/2019. As of December 2022, most of the code has been migrated to newer versions of some of the libraries used and restructured into a proper project instead of independent notebooks. Sometime in 2023, I will most likely either release a technical report on this or turn it into a proper peer-reviewed paper ;).
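A rough sketch of the brute-force idea in Python with NumPy only. It assumes a fixed, seeded train/validation/test split, preprocessing fitted on the training split only, a plain grid over a hypothetical search space (optional standardization times a few values of k for a k-nearest-neighbour classifier), and a wall-clock budget after which no further configurations are tried. The model family, the grid, and all function names are placeholders chosen for illustration; the actual experiments may have used entirely different models and libraries.

```python
import itertools
import time

import numpy as np


def fixed_split(X, y, val_frac=0.2, test_frac=0.2, seed=0):
    """Deterministic train/val/test split so every candidate sees the same data."""
    idx = np.random.default_rng(seed).permutation(len(X))
    n_val, n_test = int(len(X) * val_frac), int(len(X) * test_frac)
    val, test, train = idx[:n_val], idx[n_val:n_val + n_test], idx[n_val + n_test:]
    return (X[train], y[train]), (X[val], y[val]), (X[test], y[test])


def standardize(X_fit, X_other):
    """Fit scaling on the training split only to avoid leakage."""
    mu, sd = X_fit.mean(axis=0), X_fit.std(axis=0) + 1e-12
    return (X_fit - mu) / sd, (X_other - mu) / sd


def knn_predict(X_train, y_train, X_test, k):
    # Squared Euclidean distance between every test and training sample.
    d = ((X_test[:, None, :] - X_train[None, :, :]) ** 2).sum(axis=2)
    nearest = np.argsort(d, axis=1)[:, :k]
    # Majority vote (labels assumed to be non-negative integers).
    return np.array([np.bincount(v).argmax() for v in y_train[nearest]])


def evaluate(cfg, train, other):
    (X_a, y_a), (X_b, y_b) = train, other
    if cfg["standardize"]:
        X_a, X_b = standardize(X_a, X_b)
    return float((knn_predict(X_a, y_a, X_b, cfg["k"]) == y_b).mean())


def brute_force(X, y, time_budget_s=5.0):
    train, val, test = fixed_split(X, y)
    start = time.monotonic()
    best_acc, best_cfg = -1.0, None
    # Exhaustively walk the (hypothetical) grid in a fixed order.
    for scale, k in itertools.product([False, True], [1, 3, 5, 7, 9]):
        cfg = {"standardize": scale, "k": k}
        acc = evaluate(cfg, train, val)  # model selection on validation only
        if acc > best_acc:
            best_acc, best_cfg = acc, cfg
        if time.monotonic() - start > time_budget_s:
            break  # hard time limit: stop trying new configurations
    # Touch the test split exactly once, for the selected configuration.
    return best_cfg, best_acc, evaluate(best_cfg, train, test)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = np.vstack([rng.normal(0, 1, (100, 2)), rng.normal(3, 1, (100, 2))])
    y = np.array([0] * 100 + [1] * 100)
    print(brute_force(X, y, time_budget_s=2.0))
```

Selecting on the validation split and touching the test split only once keeps the reported number comparable to a published baseline; selecting directly on the test split would inflate it.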

Software

zenodo-dl
- CLI to download Zenodo records
mish-cuda-dummy
various utils


Build and Packaging Scripts