Word2vec is a set of models used to produce word embeddings, that is, a high-dimensional vector space (usually hundreds of dimensions) in which each individual word is a vector (or a coordinate). The space is designed so that words occurring in similar contexts in the source corpora appear close to each other. Not only do similar words appear in clusters, but the algorithm preserves certain semantic relationships in a remarkable fashion (e.g. the vector that leads from the singular to the plural form of a noun remains stable throughout the space, as does the vector between masculine and feminine versions of entities, etc.).
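
To make that offset idea concrete, here is a minimal sketch using gensim's KeyedVectors. The vectors file, its path and the example words are illustrative assumptions, not tied to any particular corpus or result.

```python
# A minimal sketch of the "stable offset" property, using gensim.
# The path below is hypothetical: any pretrained word2vec file
# (e.g. the GoogleNews binary) would do.
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)

# The offset between a singular and its plural tends to be reusable:
# cat -> cats applied to dog should land near dogs.
offset = kv["cats"] - kv["cat"]
print(kv.similar_by_vector(kv["dog"] + offset, topn=3))

# The classic analogy form: king - man + woman ~ queen.
print(kv.most_similar(positive=["king", "woman"], negative=["man"], topn=3))
```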

This Stanford lecture provides a comprehensive introduction to the model:


Allison Parrish has integrated these models into her practice (see this post), with some exciting research into phonetic similarity vectors. Her ‘Average Novel’, which uses word2vec to produce ‘average words’ (by averaging vector values) from a large corpus of Gutenberg fiction, left me a little disappointed, and I think there is more work and exploration to be done with these models on a semantic level.
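
For illustration, here is a rough sketch of the vector-averaging idea (not Parrish's actual code): average the vectors of a group of words, then snap the result back to the nearest word in the vocabulary. The model path and the helper function are hypothetical.

```python
# Sketch of producing an 'average word' by averaging vectors.
import numpy as np
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)  # hypothetical path

def average_word(words):
    """Return the vocabulary word closest to the mean of the given words' vectors."""
    vectors = [kv[w] for w in words if w in kv]
    mean = np.mean(vectors, axis=0)
    return kv.similar_by_vector(mean, topn=1)[0][0]

print(average_word(["it", "was", "a", "dark", "and", "stormy", "night"]))
```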

I have already experimented with word2vec when trying to produce visualisations of various features in wordsquares. However, I haven’t really used these models for creative purposes yet (one could imagine, for instance, various ways of tracing paths, or creating shapes, in the space, hopping from cluster to cluster using certain predefined steps…).
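
As a hedged sketch of that path-tracing idea: starting from a seed word, one could repeatedly add a fixed step vector to the current position and snap to the nearest unvisited word, yielding a walk through the space. The step definition and model path below are assumptions chosen for illustration.

```python
# Sketch of a 'walk' through the embedding space using a fixed step vector.
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format("vectors.bin", binary=True)  # hypothetical path

def walk(seed, step, hops=5):
    """Hop through the space by repeatedly applying `step` and snapping to the nearest word."""
    word, path = seed, [seed]
    for _ in range(hops):
        candidates = kv.similar_by_vector(kv[word] + step, topn=10)
        # Skip words already visited so the walk keeps moving.
        word = next(w for w, _ in candidates if w not in path)
        path.append(word)
    return path

step = kv["queen"] - kv["king"]  # one possible predefined step direction
print(walk("man", step))
```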