Text: it's everywhere. It fills up our social feeds, clutters our inboxes, and commands our attention like nothing else. It is oh so familiar, and yet, to a programmer, it is oh so strange. We learn the basics of spoken and written language at a very young age and its more formal side in high school and college, yet most of us never get beyond very simple processing rules when it comes to handling text in our applications. Meanwhile, by most accounts, unstructured content, which is almost always text or at least has a text component, makes up the vast majority of the data we encounter. Don't you think it is time you upgraded your skills to better handle text?
Thankfully, open source is chock-full of high-quality libraries that solve common text-processing problems like sentiment analysis, topic identification, automatic labeling of content, and more. More importantly, open source also provides many building-block libraries that make it easy for you to innovate without having to reinvent the wheel. If all of this stuff is giving you flashbacks to your high school grammar classes, not to worry: we've included some useful resources at the end to help you brush up on your knowledge and to explain some of the key concepts in natural language processing (NLP). To begin your journey, check out these projects:
If all of this talk of parsing, tokenization, and named entities has left you wondering how to get started, be sure to check out the following books:
Once you've graduated to more advanced NLP tasks, you may also wish to check out projects like Apache cTAKES (aimed at medical NLP), Apache Mahout, and MALLET from UMass Amherst. If you are looking to try out new approaches using big data analysis and complex machine learning, be sure to check out the Deeplearning4j project.
With a little practice and creativity, combined with the power of open source and the projects above, your next application just might be at the forefront of truly making language processing as natural as handling all those zeros and ones!
"5 Open Source Natural Language Processing Tools" was authored by Grant Ingersoll and published on Opensource.com. It is being republished by Open Health News under the terms of the Creative Commons Attribution-ShareAlike 4.0 International License. The original copy of the article can be found on Opensource.com.