A detailed breakdown of the Transformer architecture.

transformer attention_mechanism positional_encoding paper_reading maths

The softmax function is widely used in the output layer of neural-network-based models. However, it suffers from high time complexity, which results from normalizing over the whole vocabulary in NLP applications (e.g. Word2Vec). To deal with this, various approaches for approximating or replacing softmax have been proposed; this post introduces some of them, along with the maths behind them.

softmax softmax_variants normalization cross_entropy maximum_likelihood_estimation monte_carlo_estimation sampling maths
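
As a rough illustration of the normalization cost mentioned above, here is a minimal sketch of a plain softmax in Python. The function name and the toy logits are assumptions made for this example; the point is only that the denominator needs one term per vocabulary entry.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)                          # subtract the max to avoid overflow
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)                        # one term per vocabulary entry: O(|V|)
    return [e / total for e in exps]

# Toy "vocabulary" of three words; a real vocabulary may have many thousands.
probs = softmax([2.0, 1.0, 0.1])
```

With a large vocabulary this sum has to be recomputed for every prediction, which is the cost that approximation methods try to avoid.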

Cross-entropy is a measure of the difference between two probability distributions for a given random variable or set of events.

information_theory cross_entropy loss_function maths
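
For reference, the standard definition for two discrete distributions can be written as follows. The symbols $p$ (true distribution), $q$ (estimated distribution) and $x$ (an event) are notation chosen for this sketch.

```latex
H(p, q) = -\sum_{x} p(x) \log q(x) = H(p) + D_{\mathrm{KL}}(p \,\|\, q)
```

Since $D_{\mathrm{KL}}(p \,\|\, q) \ge 0$, the cross-entropy is never smaller than the entropy $H(p)$, with equality when $q = p$.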

The Word2Vec model has two variants, i.e. skip-gram and CBOW. The skip-gram model predicts context words given the target word and CBOW does the reverse.

word2vec word_embedding skip_gram continuous_bag_of_words paper_reading

Notes on the paper about node embeddings enhanced with semantic proximity.

graph_embedding node_embedding semantic_proximity paper_reading

A proof of the Contraction Mapping Theorem, with notes.

contraction_mapping_theorem topology maths

A collection of definitions and concise points from a long survey of Knowledge Graphs (KGs).

knowledge_graph kg literature_review

Notes on the original Graph Neural Network (GNN).

graph_neural_network gnn literature_review

We discuss the three common Pythagorean means (i.e. the *Arithmetic Mean*, *Geometric Mean* and *Harmonic Mean*) in this post, with emphasis on the interpretation of the Harmonic Mean.

pythagorean arithmetic_mean geometric_mean harmonic_mean maths
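
A minimal sketch of the three means in Python, assuming positive inputs. The function names and the speed example are illustrative choices made here: travelling the same distance at 30 and then at 60 units of speed gives an average speed equal to the harmonic mean, not the arithmetic mean.

```python
import math

def arithmetic_mean(xs):
    return sum(xs) / len(xs)

def geometric_mean(xs):
    # exp of the mean log; requires every x > 0
    return math.exp(sum(math.log(x) for x in xs) / len(xs))

def harmonic_mean(xs):
    # reciprocal of the mean reciprocal; requires every x > 0
    return len(xs) / sum(1.0 / x for x in xs)

speeds = [30.0, 60.0]
am = arithmetic_mean(speeds)   # 45.0
gm = geometric_mean(speeds)    # about 42.43
hm = harmonic_mean(speeds)     # 40.0, the average speed over equal distances
```

For positive inputs the three values always satisfy HM ≤ GM ≤ AM.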
