Introduction
We investigate the computational power of transformer architectures through the lens of circuit complexity. Specifically, we ask: can transformers solve problems that require super-constant (i.e., ω(1)) depth in classical Boolean circuit models?
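To make "super-constant depth" concrete, here is a minimal sketch in our own notation (a standard illustration, not a formalization fixed by the text above): the canonical example is parity, which no constant-depth, polynomial-size circuit family with unbounded fan-in AND/OR/NOT gates can compute:

\[
\mathrm{PARITY}(x_1,\dots,x_n) \;=\; x_1 \oplus x_2 \oplus \cdots \oplus x_n,
\qquad
\mathrm{PARITY} \notin \mathsf{AC}^0 .
\]

Under this reading, the question asks whether a transformer of fixed depth can compute functions, such as parity, that provably lie outside constant-depth circuit classes.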
This question sits at the intersection of classical complexity theory and modern machine learning, offering insights into the fundamental capabilities and limitations of attention-based architectures.