
August 15, 2015

Teaching machines new languages

Previously, we've written about how machines can learn meaning. One exciting consequence of this approach is that they can also learn new languages very quickly. All you need is enough text data. Wikipedia offers a great starting point, and partnering with content providers lets us gather additional data quickly. We have recently started working on supporting new languages and thought we would share some initial impressions here.

No one on the team needs to speak the language

While it would be awesome to have a speaker of every language on the team (we currently cover about seven), this isn't always possible. What's amazing about teaching a machine a new language is that the team doesn't need a native speaker to do it.

With a mixture of standardised testing and some Google Translate quality control, anyone can train the machine to learn a new language. It's a simple observation, but one that I think is pretty cool.
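To give a feel for what a language-independent test could look like, here is a minimal sketch in Python. It is not our actual test suite: the "odd one out" task, the function names and the toy German vectors are all assumptions made up for illustration. The point is only that the same test harness can score a model in any language, as long as someone has prepared the test cases once.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def odd_one_out(words, vectors):
    """Return the word that is least similar, on average, to the others."""
    def mean_similarity(word):
        others = [w for w in words if w != word]
        return sum(cosine(vectors[word], vectors[o]) for o in others) / len(others)

    return min(words, key=mean_similarity)


def run_test_suite(test_cases, vectors):
    """Fraction of test cases where the model picks the expected outlier."""
    correct = sum(
        odd_one_out(words, vectors) == expected for words, expected in test_cases
    )
    return correct / len(test_cases)


# Hypothetical two-dimensional word vectors, invented for this example.
# A real model would supply vectors learned from text in the target language.
toy_vectors = {
    "hund": [0.9, 0.1],
    "katze": [0.8, 0.2],
    "pferd": [0.85, 0.15],
    "auto": [0.1, 0.9],
}

# Each case: a list of words and the one that should not belong.
test_cases = [(["hund", "katze", "pferd", "auto"], "auto")]

score = run_test_suite(test_cases, toy_vectors)
print(f"odd-one-out accuracy: {score:.2f}")
```

Nobody running this needs to know that "Hund", "Katze" and "Pferd" are animals and "Auto" is not. They only need to compare the score against the one from a language they already trust.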

The tendency towards English

Another simple observation: since many machine learning-based language services work for English only, there are often gaps that can readily be filled by providers whose software is natively language-agnostic.

For new companies entering this space, I would recommend considering new languages early on. It is the kind of thing that is easy to keep putting off, even once it has become worthwhile.

Next steps

We will be releasing our first new language APIs publicly in the near future. If you have text content in languages other than English that you would be interested in recommending, please let us know. We'd also love to hear from you if you have a lot of text content in any language and would like to share it with us to help us train a recommender model.

If there are any languages you would like to see supported, especially ones you feel are poorly supported in general, click here to suggest one.

If you're working on machine learning solutions for multiple languages, or are considering training new languages and have any questions, please get in touch. It would be great to share notes, including on opportunities for understanding multiple languages!

Finally, if you know of any large open-access text databases in academia or law for any language, we would love to hear about them.

