
Commit 5d77f2c

minor fix
1 parent 488618d commit 5d77f2c

File tree

1 file changed: +1 -1 lines changed


_data/publications.yaml

Lines changed: 1 addition & 1 deletion
@@ -10,7 +10,7 @@
 - gradient-based optimization
 - graph structure learning
 - latent random variables
-abstract: Latent categorical variables are frequently found in deep learning architectures. They can model actions in discrete reinforcement-learning environments, represent categories in latent-variable models, or express relations in graph neural networks. Despite their widespread use, their discrete nature poses significant challenges to gradient-descent learning algorithms. While a substantial body of work has offered improved gradient estimation techniques, we take a complementary approach. Specifically, we: 1) revisit the ubiquitous softmax function and demonstrate its limitations from an information-geometric perspective; 2) replace the softmax with the catnat function, a function composed by a sequence of hierarchical binary splits; we prove that this choice offers significant advantages to gradient descent due to the resulting diagonal Fisher Information Matrix. A rich set of experiments — including graph structure learning, variational autoencoders, and reinforcement learning — empirically show that the proposed function improves the learning efficiency and yields models characterized by consistently higher test performance. Catnat is simple to implement and seamlessly integrates into existing codebases. Moreover, it remains compatible with standard training stabilization techniques and, as such, offers a better alternative to the softmax function.
+abstract: Latent categorical variables are frequently found in deep learning architectures. They can model actions in discrete reinforcement-learning environments, represent categories in latent-variable models, or express relations in graph neural networks. Despite their widespread use, their discrete nature poses significant challenges to gradient-descent learning algorithms. While a substantial body of work has offered improved gradient estimation techniques, we take a complementary approach. Specifically, we: 1) revisit the ubiquitous softmax function and demonstrate its limitations from an information-geometric perspective; 2) replace the softmax with the catnat function, a function composed by a sequence of hierarchical binary splits; we prove that this choice offers significant advantages to gradient descent due to the resulting diagonal Fisher Information Matrix. A rich set of experiments - including graph structure learning, variational autoencoders, and reinforcement learning - empirically show that the proposed function improves the learning efficiency and yields models characterized by consistently higher test performance. Catnat is simple to implement and seamlessly integrates into existing codebases. Moreover, it remains compatible with standard training stabilization techniques and, as such, offers a better alternative to the softmax function.
 bibtex: >
   @article{manenti2025beyond,
   title={Beyond Softmax: A Natural Parameterization for Categorical Random Variables},
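
For context, the abstract above describes replacing the softmax with a sequence of hierarchical binary splits. Below is a minimal sketch of that idea, assuming a stick-breaking-style chain of sigmoid splits over K categories; the function name, split topology, and PyTorch usage here are illustrative assumptions, not the paper's actual catnat implementation.

```python
import torch
import torch.nn.functional as F

def hierarchical_binary_splits(logits: torch.Tensor) -> torch.Tensor:
    """Map K-1 unconstrained logits to a K-way categorical distribution
    through a chain of sigmoid "stop / continue" binary splits.

    Illustrative sketch only: the paper's catnat function may use a
    different split structure (e.g. a balanced binary tree).
    """
    stop = torch.sigmoid(logits)              # P(stop at split i), shape (..., K-1)
    cont = torch.cumprod(1.0 - stop, dim=-1)  # P(continue past splits 1..i)
    # Category i < K: continue past the first i-1 splits, then stop at split i.
    head = stop * F.pad(cont[..., :-1], (1, 0), value=1.0)
    # Last category: continue past every split, absorbing the remaining mass.
    tail = cont[..., -1:]
    return torch.cat([head, tail], dim=-1)    # shape (..., K), sums to 1

# Example: 4 categories parameterized by 3 logits.
probs = hierarchical_binary_splits(torch.zeros(3))
print(probs, probs.sum())  # tensor([0.5000, 0.2500, 0.1250, 0.1250]) tensor(1.)
```

Because each category probability factors into independent binary decisions, each logit only influences its own split, which is the intuition behind the diagonal Fisher Information Matrix claim in the abstract.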
