Facebook’s Artificial Intelligence Research Lab Releases Open Source FastText on GitHub

John Mannes | TechCrunch | June 18, 2016

Every day, billions of pieces of content are shared on Facebook. To keep up with the data, Facebook has been using a variety of tools to classify text. Traditional methods of classification, like deep neural networks, are accurate but have demanding training requirements. In an effort to classify text both accurately and easily, Facebook’s Artificial Intelligence Research (FAIR) lab developed fastText.

Today, fastText is going open source so developers can use its libraries anywhere. FastText supports both text classification and the learning of word vector representations, using techniques like bag of words and subword information. For word vectors, it builds on the skip-gram model: each word is represented as a bag of character n-grams, each n-gram has its own vector, and a word's vector is the sum of its n-gram vectors.
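To make the subword idea concrete, here is a minimal sketch of splitting a word into character n-grams with `<` and `>` boundary markers and building a word vector by summing n-gram vectors. It is an illustration under stated assumptions, not fastText's actual code: the names `char_ngrams`, `ngram_vector`, `word_vector`, `DIM` and the lookup table are hypothetical, the n-gram vectors are random placeholders rather than learned values, and the default n-gram range of 3 to 6 characters is an assumption for this example.

```python
import random
from typing import Dict, List

DIM = 4  # hypothetical embedding size, kept tiny for illustration
_ngram_table: Dict[str, List[float]] = {}  # hypothetical n-gram -> vector lookup


def char_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """Return the character n-grams of `word` with '<' and '>' boundary markers.

    The markers let prefixes and suffixes be told apart from the same
    letters in the middle of a word. The whole marked word is kept as an
    extra feature.
    """
    marked = f"<{word}>"
    grams = [
        marked[i:i + n]
        for n in range(minn, maxn + 1)
        for i in range(len(marked) - n + 1)
    ]
    if marked not in grams:
        grams.append(marked)
    return grams


def ngram_vector(gram: str) -> List[float]:
    # Hypothetical stand-in for a learned table: a random vector seeded by
    # the n-gram string, so the same n-gram always maps to the same vector.
    if gram not in _ngram_table:
        rng = random.Random(gram)
        _ngram_table[gram] = [rng.uniform(-1.0, 1.0) for _ in range(DIM)]
    return _ngram_table[gram]


def word_vector(word: str) -> List[float]:
    # A word's vector is the sum of its n-gram vectors, so words that share
    # n-grams (e.g. "where" and "wherever") share part of their representation.
    vec = [0.0] * DIM
    for gram in char_ngrams(word):
        for k, x in enumerate(ngram_vector(gram)):
            vec[k] += x
    return vec


print(char_ngrams("where", minn=3, maxn=3))
# ['<wh', 'whe', 'her', 'ere', 're>', '<where>']
```

Because the representation is built from pieces, this scheme can still produce a vector for a word it never saw during training, as long as some of its n-grams are known.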

For those less artificially intelligent, the bag-of-words approach is fast because it ignores word order and simply counts how often each word occurs. “Words” are represented in a multidimensional space, and linear algebra is used to calculate the relationship between a query and a categorized set of words. Remember that when we feed a computer text, we are starting from scratch. To adults, grammar is intuitive: we know what words are, where they end and where they begin...
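The bag-of-words idea and the "linear algebra" step can be sketched with word counts treated as vectors and cosine similarity used to pick the closest category. This is a generic textbook technique, not fastText's classifier (which learns embeddings and a linear model). The functions and the two example categories below are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter
from typing import Dict


def bag_of_words(text: str) -> Counter:
    # Lowercase and split on whitespace; word order is discarded, only counts remain.
    return Counter(text.lower().split())


def cosine_similarity(a: Counter, b: Counter) -> float:
    # Treat each bag as a sparse vector (one dimension per word) and
    # compare directions: dot product divided by the product of the lengths.
    dot = sum(count * b[word] for word, count in a.items() if word in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(query: str, categories: Dict[str, str]) -> str:
    # Return the label whose example text is most similar to the query.
    q = bag_of_words(query)
    return max(categories, key=lambda label: cosine_similarity(q, bag_of_words(categories[label])))


# Hypothetical categories, each described by a handful of example words.
CATEGORIES = {
    "sports": "game team score win match",
    "cooking": "recipe oven bake flour sugar",
}

print(bag_of_words("dog bites man") == bag_of_words("man bites dog"))  # True
print(classify("the team won the match", CATEGORIES))  # sports
```

The first line of output shows the trade-off the article describes: "dog bites man" and "man bites dog" become identical bags, which is what makes the method fast and also what it gives up.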