Breaking documents into “chunks”, such as sections and subsections, is easy for humans but surprisingly hard for computers. In this post we explain why that is, why it’s a valuable problem to solve, and introduce our new solution.
This post describes a simple principle for splitting documents into coherent segments using word embeddings.
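To give a flavour of the idea, here is a minimal sketch (not necessarily the post’s exact method): represent each sentence by the average of its word vectors, and start a new segment wherever the similarity between adjacent sentences drops. The `vectors` lookup and the threshold value are assumptions for illustration.

```python
# Sketch of embedding-based segmentation: average word vectors per
# sentence, split where adjacent-sentence cosine similarity is low.
import numpy as np

def sentence_vector(sentence, vectors):
    # `vectors` maps a word to a NumPy array (e.g. gensim KeyedVectors
    # or a plain dict); out-of-vocabulary words are skipped.
    words = [w for w in sentence.lower().split() if w in vectors]
    if not words:
        return None
    return np.mean([vectors[w] for w in words], axis=0)

def segment(sentences, vectors, threshold=0.4):
    segments, current = [], [sentences[0]]
    prev = sentence_vector(sentences[0], vectors)
    for sent in sentences[1:]:
        vec = sentence_vector(sent, vectors)
        if prev is not None and vec is not None:
            sim = np.dot(prev, vec) / (np.linalg.norm(prev) * np.linalg.norm(vec))
            if sim < threshold:  # likely topic shift: close the segment
                segments.append(current)
                current = []
        current.append(sent)
        if vec is not None:
            prev = vec
    segments.append(current)
    return segments
```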
In this blog post we describe an experiment in constructing semantic trees and show how they can improve the quality of learned word embeddings on common word analogy and similarity tasks.
How can you learn a map from a German-language to an English-language word vectorisation model, enabling cross-lingual document comparison?
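One standard recipe, due to Mikolov et al. (2013) and which may or may not be the approach taken in the post, is to fit a linear map from German vectors to English vectors by least squares over a small bilingual dictionary. The vector lookups and seed dictionary below are assumptions for illustration.

```python
# Sketch: learn a linear map W with W @ x_de ≈ x_en, fit by least
# squares over (German, English) translation pairs.
import numpy as np

def fit_translation_matrix(pairs, de_vecs, en_vecs):
    # `pairs` is a hypothetical seed dictionary of (German, English)
    # word pairs; `de_vecs`/`en_vecs` map words to NumPy vectors.
    X = np.array([de_vecs[de] for de, en in pairs])  # German side
    Y = np.array([en_vecs[en] for de, en in pairs])  # English side
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)        # solves X @ B ≈ Y
    return B.T                                       # so that W @ x ≈ y

def translate(word, W, de_vecs):
    # Map a German vector into the English space; the nearest English
    # neighbours of the result are candidate translations.
    return W @ de_vecs[word]
```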
By labelling documents with the users who read them, we used fastText to hack together a “hybrid recommender” system.
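In fastText’s supervised mode this amounts to treating each reader as a label on the documents they read, so that predicting labels for a new document recommends it to likely readers. The file name and hyperparameters below are assumptions, not the post’s exact setup.

```python
# Rough sketch of the "hybrid recommender" idea with fastText's
# supervised classifier. Training file format, one document per line:
#   __label__user42 __label__user7 the text of the document ...
import fasttext

model = fasttext.train_supervised(input="reads.txt", epoch=25, wordNgrams=2)

# Top-5 users most likely to want to read an unseen document.
labels, scores = model.predict("text of an unseen document", k=5)
print(labels, scores)
```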
What kind of language do British parliamentarians use? We used the Lateral API to provide an overview by clustering debates and creating word clouds.
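The post itself uses the Lateral API; as a generic illustration of the same pipeline (cluster, then build a word cloud per cluster), the sketch below substitutes TF-IDF with k-means and the `wordcloud` package, with `debates` assumed to be a list of debate transcripts.

```python
# Generic cluster-then-word-cloud pipeline (not the Lateral API).
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.cluster import KMeans
from wordcloud import WordCloud

vectoriser = TfidfVectorizer(stop_words="english", max_features=5000)
X = vectoriser.fit_transform(debates)  # `debates`: list of strings

km = KMeans(n_clusters=10, random_state=0).fit(X)

for cluster in range(km.n_clusters):
    # Pool the text of each cluster and render one word cloud per topic.
    text = " ".join(d for d, c in zip(debates, km.labels_) if c == cluster)
    WordCloud(width=800, height=400).generate(text).to_file(f"cluster_{cluster}.png")
```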
Previously we wrote about how machines can learn meaning. An exciting consequence of this approach is that it also enables us to teach machines new languages.
The arXiv is a repository of over 1 million preprints. It is truly open access and excellent for testing language-modelling and machine-learning prototypes.
Computers consist of on/off switches and process meaningless symbols. So how can we hope that machines will learn the meaning of words and documents?
If a machine is to learn about humans from Wikipedia, it must experience the corpus as a human sees it and ignore the mass of robot-generated pages.