June 26, 2015

How do machines learn meaning?

Computers consist of on/off switches and process meaningless symbols. So how is it that we can hope that computers might understand the meaning of words, products, actions and documents? If most of us consider machine learning to be magic, it is because we don’t yet have an answer to this question. Here, I’ll provide an answer in the context of machines learning the meaning of words. But as we’ll see, the approach is the same everywhere.

Keywords are good and bad

Firstly, some motivation. Why would you want a machine to understand the meaning of a word? Consider the case where it doesn’t, and words are treated as meaningless symbols. In this case, the only way to compare two words is to check if they are the same word. So the machine considers the word ship to be totally unrelated to the word boat: these two words would be as unrelated to one another as the words cat and dynamite.

This "keyword autism" has the advantage of precision: it is useful in keyword search, for instance, if you know the document you want and its exact title. But it is catastrophic for document discovery. Imagine having a research assistant who, when asked to find documents about "nordic boat building", deliberately ignored an article on "scandinavian ship construction" because the words weren’t exactly the same. It’d be time for a new research assistant.

At Lateral, we’ve built a tool for document discovery, for surfacing relevant documents that you didn’t know were there. For us, then, keyword autism was the enemy. Our machines needed to understand that the words ship and boat represent very similar concepts. Our machines needed to learn word meaning.

So, how can a machine understand the meaning of a word?

The key insight, made at the beginning of the information age, is to replace word meaning with something that machines can actually measure. It’s called the Distributional Hypothesis, and claims:

“words are characterised by the company that they keep”

This means, for example, that the words ship and boat must represent related notions because they both occur often with the words stern, sail and sea, but almost never with glycerine or meow. On the other hand, the words that occur with dynamite are very different from those that occur with cat, so these words must represent unrelated notions. Much has been made of this of late in the machine learning community (e.g. word2vec), but the idea is in fact seventy years old.

Familiar idea, different clothing

If the distributional hypothesis seems at all familiar, it's because the same approach is applied in different domains. Consider, for example, building a recommender for an e-Commerce website. Two products are related to the extent that they tend to be purchased together, and two customers are similar to the extent that they buy similar products. The fundamental insight is that objects (be they words or products) are related to one another by their use. The relationships between objects are used as a proxy for any intrinsic meaning the objects might have. Mathematicians will find this point of view familiar from abstract algebra and category theory.
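To make the recommender analogy concrete, here is a minimal sketch of relating products by how often they are bought together. The baskets, product names and the helper functions `copurchase_counts` and `related_products` are all invented for illustration; a real recommender would normalise for product popularity and work on far more data.

```python
from collections import Counter
from itertools import combinations

# Hypothetical purchase baskets, invented for this example.
BASKETS = [
    {"tent", "sleeping bag", "camping stove"},
    {"tent", "sleeping bag"},
    {"tent", "camping stove"},
    {"novel", "reading lamp"},
    {"novel", "bookmark", "reading lamp"},
]


def copurchase_counts(baskets):
    """Count how many baskets contain each unordered pair of products."""
    pair_counts = Counter()
    for basket in baskets:
        # Sorting gives every pair a single canonical (a, b) key.
        for a, b in combinations(sorted(basket), 2):
            pair_counts[(a, b)] += 1
    return pair_counts


def related_products(product, pair_counts, top_n=3):
    """Return the products most often bought together with `product`."""
    scores = Counter()
    for (a, b), n in pair_counts.items():
        if a == product:
            scores[b] += n
        elif b == product:
            scores[a] += n
    return [name for name, _ in scores.most_common(top_n)]


pair_counts = copurchase_counts(BASKETS)
print(related_products("novel", pair_counts))
print(related_products("tent", pair_counts))
```

Nothing in the code knows what a tent or a novel is. The relationships come entirely from how the products are used, which is the same move the distributional hypothesis makes for words.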

Formalisation

So the machine can learn which words are related by processing text, sentence by sentence, and seeing which words occur together. Formally, we are trying to estimate the probability that a word w occurs, given e.g. that the word cat occurs:

$$P(w \mid \text{cat})$$

We can now forget about word meaning and use these probability distributions, which can be estimated by the machine, in its place. The word cat is then represented by a vector consisting of its co-occurrence probabilities. These vectors live in a very high dimensional vector space, but we can use dimension reduction to make this representation more robust and tractable.
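As a rough illustration of the idea, the sketch below estimates co-occurrence probabilities from a toy corpus and compares the resulting word vectors with cosine similarity. The sentences, the stopword list and the function names are assumptions made for this example, not a description of Lateral's system, and the estimator (normalised sentence-level co-occurrence counts) is just one simple choice. Dimension reduction is left out to keep the example short.

```python
from collections import Counter, defaultdict
from itertools import permutations
from math import sqrt

# Toy corpus, invented for illustration; a real system needs far more text.
SENTENCES = [
    "the ship left the harbour under full sail",
    "the boat left the harbour under full sail",
    "the ship rolled on the open sea",
    "the boat rolled on the open sea",
    "the cat chased the mouse and said meow",
    "the cat slept on the warm sofa",
    "the dynamite exploded in the old quarry",
]

# Assumption: very common words are dropped, since they co-occur with everything.
STOPWORDS = {"the", "on", "and", "in"}


def cooccurrence_counts(sentences):
    """For each word, count the sentences it shares with every other word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        words = set(sentence.split()) - STOPWORDS  # one count per sentence
        for target, context in permutations(words, 2):
            counts[target][context] += 1
    return counts


def conditional_probabilities(counts, target):
    """Estimate P(w | target) by normalising the target's co-occurrence counts."""
    total = sum(counts[target].values())
    return {w: n / total for w, n in counts[target].items()}


def cosine(p, q):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(p[w] * q[w] for w in p.keys() & q.keys())
    norm = sqrt(sum(v * v for v in p.values())) * sqrt(sum(v * v for v in q.values()))
    return dot / norm if norm else 0.0


counts = cooccurrence_counts(SENTENCES)
vectors = {w: conditional_probabilities(counts, w)
           for w in ("ship", "boat", "cat", "dynamite")}

print("ship vs boat:    ", cosine(vectors["ship"], vectors["boat"]))
print("cat vs dynamite: ", cosine(vectors["cat"], vectors["dynamite"]))
```

In this toy corpus ship and boat appear in the same contexts, so their vectors point in the same direction, while cat and dynamite share no context words at all. With real text the vectors have one entry per vocabulary word, which is where dimension reduction becomes useful.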

How we do this at Lateral

A task for us, then, is collecting lots of text, so that the machine has an understanding of word relationships from a wide variety of disciplines. In this way, we built an artificial mind that seems to have studied every degree at University. It has studied the news, political science, mathematics, pharmacology, geology and has read patents and case law. You can use it for your own applications with our API, or check out some of the demos.

I hope that has helped demystify our machine learning somewhat!


