“Capsule networks without routing procedures” by Chen et al. was under review for ICLR 2020 but was ultimately withdrawn by the authors. (I have to note that the reviews were even worse than what the authors criticize in the paper; like I keep saying: peer review is broken beyond repair.) While I consider the paper badly written and somewhat flawed, I'd like to investigate some of its ideas, because not everything they describe fits the original CapsNet approach, yet the ideas look interesting and come with some interesting results. So anyhow, let's see what they present/propose in their paper.
As a professional, I like to use neural network architectures that are less common but make a lot of sense from a physical point of view. They usually require orders of magnitude fewer trainable parameters and need less data to achieve state-of-the-art performance. Complex- and hypercomplex-valued neural networks (complex numbers, quaternions, octonions) help a lot to reduce model size and increase inference speed by quite a large margin whenever we are dealing with something we would express with complex numbers anyhow to reduce (manual) calculations. For images, however, capsule networks are the most promising thing around. They have shown good results for segmentation (e.g. SegCaps, MatwoCaps) as well as for character recognition and 3D object detection. The major downside of capsule networks is their routing procedure: it slows down the whole training process.

A general disadvantage of less common architectures is that they often come with good empirical evidence that they work, but most papers on them are poorly written, sometimes lack a bit of mathematical correctness, or require math that most people holding a CS degree are afraid of. Combined with the lack of easy-to-use implementations for PyTorch and TensorFlow, they are often talked down because implementing them is too much work or too challenging for most people working in this field.
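To make the parameter-sharing argument concrete, here is a minimal complex-valued linear layer in PyTorch. This is my own sketch, not taken from any of the papers mentioned above: one complex weight matrix $W_r + iW_i$ serves all four real matrix products, so the layer needs half the parameters of a real-valued layer acting on the stacked real and imaginary parts.

```python
import torch
import torch.nn as nn


class ComplexLinear(nn.Module):
    """Minimal complex-valued linear layer (sketch, not from the paper).

    A complex product (x_r + i*x_i)(W_r + i*W_i) reuses two real weight
    matrices across four real matmuls; a plain real layer on the stacked
    input [x_r, x_i] would need twice as many parameters.
    """

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        self.w_real = nn.Linear(in_features, out_features, bias=False)
        self.w_imag = nn.Linear(in_features, out_features, bias=False)

    def forward(self, x_real: torch.Tensor, x_imag: torch.Tensor):
        # (x_r + i x_i)(W_r + i W_i)
        #   = (x_r W_r - x_i W_i) + i (x_r W_i + x_i W_r)
        real = self.w_real(x_real) - self.w_imag(x_imag)
        imag = self.w_imag(x_real) + self.w_real(x_imag)
        return real, imag
```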
To overcome some of these issues, the authors propose P-CapsNets. P-Caps come with two distinct features:
- no routing algorithm
- no up-front 2D convolutional layers
According to the paper, P-CapsNets generalize much better than CNNs and show a much smaller generalization gap. However, they have one major disadvantage: they are more susceptible to white-noise adversarial attacks.
So, what are they proposing? They propose to treat routing by agreement as a linear combination and to store capsules in 3D tensors (yes, proper tensors). From there on, it is just tensor calculus instead of an iterative routing procedure, which speeds up training quite a bit.
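To illustrate what that means in practice, here is a rough PyTorch sketch of a routing-free capsule layer. This is my interpretation, not the authors' code (the paper keeps capsules as 3D tensors inside convolutions; I use plain vector capsules to keep it short). The point is that the routing loop collapses into a single tensor contraction: every input capsule votes for every output capsule through a learned matrix, and the votes are summed with fixed coefficients instead of iteratively re-weighted ones.

```python
import torch
import torch.nn as nn


class LinearCapsLayer(nn.Module):
    """Routing-free capsule layer (sketch): votes are combined by a
    plain sum, i.e. a fixed linear combination, instead of an
    iterative routing-by-agreement loop."""

    def __init__(self, in_caps: int, in_dim: int, out_caps: int, out_dim: int):
        super().__init__()
        # One transformation matrix per (input capsule, output capsule) pair.
        self.weight = nn.Parameter(
            0.01 * torch.randn(in_caps, out_caps, in_dim, out_dim)
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, in_caps, in_dim) -> out: (batch, out_caps, out_dim)
        # One einsum computes all votes and sums them over the input
        # capsules in a single shot; no routing iterations needed.
        out = torch.einsum('bid,iode->boe', x, self.weight)
        # Squash non-linearity keeps capsule lengths in [0, 1).
        norm = out.norm(dim=-1, keepdim=True)
        return (norm ** 2 / (1.0 + norm ** 2)) * out / (norm + 1e-8)


# Usage: 32 input capsules of size 8 mapped to 10 output capsules of size 16.
layer = LinearCapsLayer(in_caps=32, in_dim=8, out_caps=10, out_dim=16)
out = layer(torch.randn(4, 32, 8))  # -> torch.Size([4, 10, 16])
```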
Once I have some time slots available, I'll test P-CapsNets on the CMNIST dataset and see how well they perform in a gray-scale MNIST-like environment.