Topological Deep Learning
An In-Depth Survey and Research Proposal
Executive Summary
Topological Deep Learning (TDL) works on richer structures such as simplicial complexes and hypergraphs, which represent multi-way connections directly, something standard graphs cannot do. This report surveys the major TDL developments since 2021 and proposes a project that applies a combinatorial complex neural network to detecting fake communities on social media, a problem where modeling groups rather than pairs offers a practical edge.
Introduction: What Graphs Cannot Do
A graph gives you nodes and edges — N entities and the pairwise relationships between them. That covers a lot of the world, which is why Graph Neural Networks took off the way they did. But pairwise relationships are, almost by definition, a simplification. Real systems have higher-order interactions that fall apart when you try to reduce them to pairs.
A few examples make this concrete.
In academic collaboration networks, a paper with four co-authors is not just six bilateral relationships. There is something about the group dynamic, the division of labor, the shared intellectual project that lives at the level of the group and not in any individual edge. In social media, a coordinated bot campaign involves accounts acting together — posting at the same time, retweeting in chains — and that coordination belongs to the group, not to any pair of accounts. In biology, gene regulatory networks have clusters of genes that co-regulate each other, where the functional unit is the cluster, not the gene pair.
In all three cases, forcing the data into pairwise edges means throwing away structure that actually matters. Topological Deep Learning holds onto it.
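The co-authorship example can be made concrete with a small sketch. The helper `clique_expansion` and the author names are hypothetical and introduced only for illustration. It shows that one four-author paper and six separate two-author papers collapse to the same pairwise graph, so the group-level distinction is lost once the data is reduced to edges.

```python
from itertools import combinations


def clique_expansion(hyperedges):
    """Project each group onto the set of unordered pairs it contains."""
    pairs = set()
    for group in hyperedges:
        for u, v in combinations(sorted(group), 2):
            pairs.add((u, v))
    return pairs


authors = ["ana", "bo", "chen", "dee"]  # hypothetical names

# Case A: one paper written jointly by all four authors.
one_paper = [set(authors)]

# Case B: six separate papers, one per pair of authors.
six_papers = [set(pair) for pair in combinations(authors, 2)]

graph_a = clique_expansion(one_paper)
graph_b = clique_expansion(six_papers)

print(len(graph_a))        # number of pairwise edges in case A
print(graph_a == graph_b)  # whether the two cases give the same graph
```

Both cases yield the same six edges, while the hypergraphs `one_paper` and `six_papers` remain distinct objects. That difference is what a higher-order representation keeps.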
How We Got Here: A Brief History of TDL
TDL went from scattered theoretical ideas to a real research field in just a few years.
| Year | What happened |
|---|---|
| 2021 | Bodnar et al. built CW Networks, running message passing on cell complexes. Yang et al. developed Simplicial CNNs that operate across nodes, edges, and triangles. |
| 2022 | Hajij et al. unified graphs, hypergraphs, and simplicial complexes under one attention-based framework. Ramamurthy et al. showed you could capture topology through a contrastive loss instead of message passing. |
| 2023 | Papillon et al. standardized the terminology in a survey and organized the ICML 2023 TDL Challenge. |
| 2024 | Papamarkou et al. positioned TDL as the next frontier for relational learning. Maggs et al. achieved universal approximation using exterior calculus. Montagna et al. adapted the Mamba sequence model to simplicial complexes. Telyatnikov et al. released TopoBench to bring some standardization to evaluation. |
Survey of Major TDL Architectures
A few things are worth saying upfront before getting into specific papers. Several of these works claim theoretical universality but test only on narrow benchmarks. Computational cost is consistently underreported. And almost nothing has been done on unsupervised or generative tasks: the field is still heavily focused on supervised classification.
Architectures
Hajij et al. built the first framework that genuinely covers graphs, hypergraphs, and cell complexes under a single attention-based model. It is expensive and mostly tested on geometric data, but it is foundational — the first serious attempt to unify these structures.
Giusti et al. brought learned attention into the simplicial setting, which is a real improvement over uniform aggregation. Before this, all neighbors contributed equally regardless of relevance.
Ramamurthy et al. showed topology could be encoded through a contrastive loss without any message passing. This is a genuinely different approach, though so far it only works at the node level.
Montagna et al. adapted the Mamba sequence model to treat simplicial neighborhoods as sequences. This opens a path to more efficient long-range reasoning without the quadratic cost of attention.
Maggs et al. achieved universal approximation using exterior calculus with no message passing at all — a theoretically significant result that points in a very different architectural direction.
Infrastructure and Benchmarking
Papillon et al. produced a survey that standardized the inconsistent terminology across papers and organized the ICML 2023 TDL Challenge to push reproducibility forward.
Telyatnikov et al. released TopoBench, a unified evaluation platform that makes it possible to actually compare methods across the same datasets and tasks.
Papamarkou et al. framed TDL as the next frontier for relational learning in a widely co-authored position paper that has helped bring the community together around shared goals.
What Is Still Missing
The gaps here are worth naming directly. There is no serious work on dynamic or temporal structures. Interpretability is almost completely ignored. And very little attention has been paid to what happens when the input topology is noisy or incomplete — which in most real applications it will be.
Detecting Fake Social Media Communities Using TDL
The Problem
Standard GNNs struggle with coordinated inauthentic behavior because coordination is a group-level property. When you model it through individual edges, the signal gets diluted. Representing communities as hyperedges gives the model a direct handle on the right level of abstraction — the group — rather than trying to reconstruct it from pairs.
Model Design
The architecture is a Combinatorial Complex Neural Network (CCNN) built in four stages.
Stage 1 — Graph Lifting
The social graph is lifted into a combinatorial complex where identified communities become hyperedges. The result is a three-level structure: nodes, edges, and community-level cells.
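A minimal sketch of the lifting step follows, assuming the communities have already been produced by some upstream detector (the proposal does not fix one). The class `CombinatorialComplex`, the function `lift_graph`, and the toy accounts are hypothetical names for illustration, not the API of any TDL library.

```python
from dataclasses import dataclass, field


@dataclass
class CombinatorialComplex:
    """Cells grouped by rank: 0 = accounts, 1 = pairwise edges, 2 = communities."""
    cells: dict = field(default_factory=lambda: {0: [], 1: [], 2: []})

    def incidence(self, low, high):
        """B[i][j] = 1 if the i-th cell of rank `low` is contained in the j-th cell of rank `high`."""
        return [[1 if lo <= hi else 0 for hi in self.cells[high]]
                for lo in self.cells[low]]


def lift_graph(nodes, edges, communities):
    """Lift a graph into a three-level complex; each community becomes a rank-2 cell."""
    cc = CombinatorialComplex()
    cc.cells[0] = [frozenset([n]) for n in nodes]
    cc.cells[1] = [frozenset(e) for e in edges]
    cc.cells[2] = [frozenset(c) for c in communities]
    return cc


# Toy social graph with two overlapping communities (assumed detector output).
nodes = ["a", "b", "c", "d", "e"]
edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d"), ("d", "e")]
communities = [{"a", "b", "c"}, {"c", "d", "e"}]

cc = lift_graph(nodes, edges, communities)
node_to_community = cc.incidence(0, 2)  # 5 x 2 membership matrix
edge_to_community = cc.incidence(1, 2)  # 5 x 2 containment matrix
print(node_to_community)
```

The incidence matrices are what later layers would use to route messages between ranks. Account `c` belongs to both communities, which a hard graph partition could not express.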
Stage 2 — Feature Extraction
Node features come from account metadata: age, posting frequency, follower ratios. Hyperedge features capture group-level properties: internal density and temporal coordination scores.
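The two hyperedge features can be sketched as follows. Internal density uses the standard definition (edges inside the group divided by the number of possible pairs). The temporal coordination score is an assumed definition chosen for illustration, since the proposal does not specify one: the fraction of member pairs that posted within `window` seconds of each other at least once. The function names, the `window` parameter, and the timestamps are hypothetical.

```python
from itertools import combinations


def internal_density(members, edges):
    """Edges with both endpoints inside the community, divided by k*(k-1)/2.

    Assumes `edges` is an undirected edge list without duplicates.
    """
    members = set(members)
    k = len(members)
    if k < 2:
        return 0.0
    inside = sum(1 for u, v in edges if u in members and v in members)
    return inside / (k * (k - 1) / 2)


def temporal_coordination(members, post_times, window=60.0):
    """Assumed score: share of member pairs with two posts at most `window` seconds apart."""
    pairs = list(combinations(sorted(members), 2))
    if not pairs:
        return 0.0

    def close(u, v):
        return any(abs(s - t) <= window
                   for s in post_times.get(u, [])
                   for t in post_times.get(v, []))

    return sum(1 for u, v in pairs if close(u, v)) / len(pairs)


edges = [("a", "b"), ("b", "c"), ("a", "c"), ("c", "d")]
post_times = {"a": [0, 500], "b": [10, 900], "c": [30], "d": [4000]}  # seconds

tight = {"a", "b", "c"}
loose = {"b", "c", "d"}
print(internal_density(tight, edges), temporal_coordination(tight, post_times))
print(internal_density(loose, edges), temporal_coordination(loose, post_times))
```

In this toy data the first group is fully connected and posts in near lockstep, so both features are at their maximum. The second group scores lower on both.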
Stage 3 — CCNN Layers
Several CCNN layers with learned attention aggregate information across all cell dimensions simultaneously, following the SAN design from Giusti et al.
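To illustrate what learned attention changes relative to uniform aggregation, here is a heavily simplified single-head sketch of pooling member-node features into one community cell. It is not the SAN layer as published by Giusti et al., and the `query` vector stands in for parameters that would be learned. All names and values are assumptions.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(member_feats, query):
    """Weight each member by softmax(query . h_i) and return the weighted sum."""
    scores = [sum(q * x for q, x in zip(query, h)) for h in member_feats]
    weights = softmax(scores)
    dim = len(member_feats[0])
    pooled = [sum(w * h[d] for w, h in zip(weights, member_feats))
              for d in range(dim)]
    return pooled, weights


# Three member accounts of one community, two features each (toy values).
member_feats = [[2.0, 0.0], [0.0, 1.0], [0.0, 1.0]]

# A query that favors the first feature (stand-in for learned parameters).
pooled, weights = attend(member_feats, query=[1.0, 0.0])

# A zero query gives every member the same weight, i.e. plain mean pooling.
mean_pooled, uniform = attend(member_feats, query=[0.0, 0.0])

print(weights)
print(uniform)
```

With a zero query the scheme reduces to uniform aggregation, which makes that setting a natural reference point for the attention ablation.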
Stage 4 — Classification Head
Community embeddings are pooled and passed through a two-layer MLP for binary classification. There is an optional contrastive loss to push fake and real community embeddings apart in the representation space.
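A minimal sketch of this stage, under stated assumptions: mean pooling, a ReLU hidden layer, a sigmoid output, and a margin-based pairwise contrastive loss. The proposal does not specify the pooling operator, the activation, or the form of the contrastive term, so each of these is one plausible choice rather than the design itself. The weights below are fixed toy values, not trained parameters.

```python
import math


def mean_pool(vectors):
    """Average a list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[d] for v in vectors) / n for d in range(len(vectors[0]))]


def linear(x, weights, bias):
    """Dense layer: one output per row of `weights`."""
    return [sum(w * xi for w, xi in zip(row, x)) + b
            for row, b in zip(weights, bias)]


def mlp_head(x, w1, b1, w2, b2):
    """Two-layer MLP with ReLU, returning the probability of the 'fake' class."""
    hidden = [max(0.0, h) for h in linear(x, w1, b1)]
    logit = linear(hidden, w2, b2)[0]
    return 1.0 / (1.0 + math.exp(-logit))


def contrastive_pair_loss(z_a, z_b, same_label, margin=1.0):
    """Pull same-label embeddings together; push different-label ones at least `margin` apart."""
    dist = math.sqrt(sum((a - b) ** 2 for a, b in zip(z_a, z_b)))
    if same_label:
        return dist ** 2
    return max(0.0, margin - dist) ** 2


# Pool two toy cell embeddings into one community embedding.
z = mean_pool([[1.0, 0.0], [0.0, 1.0]])

# Toy weights (assumed values for illustration only).
w1, b1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]
w2, b2 = [[2.0, 2.0]], [-1.0]
prob_fake = mlp_head(z, w1, b1, w2, b2)

pull = contrastive_pair_loss([0.0, 0.0], [3.0, 4.0], same_label=True)
push = contrastive_pair_loss([0.0, 0.0], [3.0, 4.0], same_label=False, margin=6.0)
print(prob_fake, pull, push)
```

In training, the contrastive term would be added to the classification loss with a weighting coefficient, which is one more hyperparameter for the ablation to cover.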
Data and Evaluation
| Component | Details |
|---|---|
| Controlled experiments | Synthetic stochastic block model (SBM) graphs |
| Primary real-world dataset | TwiBot-20 |
| Supplementary data | Kaggle and SNAP network datasets |
| Primary metrics | F1, AUC-ROC (weighted heavily given class imbalance) |
| Baselines | Standard GCN, HyperGCN |
| Ablations | Hyperedges, attention, contrastive loss — isolated individually |
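The two primary metrics can be computed directly from their definitions, as in the sketch below. The labels and scores are made-up toy values and the 0.5 threshold is an assumption. The example also shows why plain accuracy is not among the chosen metrics: on imbalanced data a classifier that never flags anything still looks accurate.

```python
def f1_score(y_true, y_pred):
    """Harmonic mean of precision and recall for the positive (fake) class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def auc_roc(y_true, scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy imbalanced sample: 8 real communities (0) and 2 fake ones (1).
y_true = [0] * 8 + [1, 1]
scores = [0.1, 0.2, 0.1, 0.3, 0.2, 0.4, 0.1, 0.7, 0.9, 0.6]
y_pred = [1 if s >= 0.5 else 0 for s in scores]

# A degenerate classifier that never predicts 'fake'.
always_real = [0] * len(y_true)
accuracy = sum(1 for t, p in zip(y_true, always_real) if t == p) / len(y_true)

print(f1_score(y_true, y_pred), auc_roc(y_true, scores))
print(accuracy, f1_score(y_true, always_real))
```

The degenerate classifier reaches 80% accuracy with an F1 of zero, which is the failure mode the metric choice is meant to expose.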
What Could Go Wrong
Community detection heuristics are noisy. Poor hyperedge structure means the model starts with corrupted topology, and there is no clean way to fix that downstream. The model is more complex than a GCN and more sensitive to hyperparameter choices. Real bot datasets carry noisy labels that cannot be fully engineered away.
We plan to document failed experiments alongside the ones that work — both because it is good practice and because the venues we are targeting expect it.
Conclusion
TDL has grown from scattered ideas into a recognizable field with surveys, competitions, and benchmarking infrastructure. The case for higher-order modeling is solid, and there is enough evidence now that it delivers real gains when the domain genuinely has group-level structure.
The gaps are still real. Most papers test on narrow tasks, scalability is understudied, interpretability is almost entirely ignored, and the assumption of clean input topology is too optimistic for most practical settings.
The fake community detection project deals with both the empirical and practical sides at the same time. It is a problem with real stakes and a clear structural argument for why higher-order modeling belongs here. If the results hold up, it would be a grounded demonstration that TDL offers something a standard GCN cannot.
Next steps: implement the pipeline, run experiments, write up. Target venues are the ICML and NeurIPS topology workshops for early findings, then a full paper at IEEE Transactions on Network Science and Engineering or the ACM Web Conference. All code will be released publicly.
References
- Hajij, M. et al. (2022). Topological Deep Learning: Going Beyond Graph Data. arXiv:2206.00606. https://arxiv.org/abs/2206.00606
- Papillon, M. et al. (2023). Architectures of Topological Deep Learning: A Survey of Message-Passing Topological Neural Networks. arXiv:2304.10031. https://arxiv.org/abs/2304.10031
- Papamarkou, T. et al. (2024). Position: Topological Deep Learning is the New Frontier for Relational Learning. arXiv:2402.08871. https://arxiv.org/abs/2402.08871
- Montagna, F. et al. (2024). Topological Deep Learning with State-Space Models: A Mamba Approach for Simplicial Complexes. arXiv:2409.12033. https://arxiv.org/abs/2409.12033
Research Assistance
- Anthropic. (2025). Claude Sonnet 4.6. Used for literature synthesis and writing assistance. https://claude.ai
- OpenAI. (2025). ChatGPT Deep Research. Used for background research and paper discovery. https://openai.com