The transition into the information age has brought a massive proliferation of data, but fortunately there has also been rapid innovation in the tools used to analyze it. In the past, most large-scale data analysis solutions focused on metrics that are easily measured: the number of visitors to a webpage, what products visitors purchase, how many ‘likes’ certain posts get. However, focusing only on things that are easy to measure can mean missing the most important data.
New methodologies in data analysis seek to change that by tapping the potential of a much wider range of data sources.
One of these data analysis techniques is text analytics, also known as text mining: a way of transforming raw, unstructured text into structured data, which can then be measured and analyzed scientifically. It seeks to quantify sprawling masses of text, such as product reviews, customer service interactions, or comments on a product page, and turn them into measurable data, identifying the “who,” “what,” “when,” “where,” and “why,” as well as the emotional tone of conversations.
Tasks included in text analysis
Categorizing information
Counting the number of times subjects are mentioned
Identifying the sentiments of text
Summarizing documents
Statistically analyzing blocks of text
Extracting concepts and themes
Drawing connections between different hyperlinked web pages
Identifying the relationships between entities in the text
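To make a couple of the tasks above more concrete, here is a minimal Python sketch of counting subject mentions and estimating sentiment. It is an illustration under stated assumptions, not how any particular product works: the sample reviews, the list of subjects, and the tiny positive/negative word lists are all invented for this example, and real systems rely on far richer language models.

```python
import re
from collections import Counter

# Hypothetical sample reviews, invented for illustration only.
REVIEWS = [
    "The battery life is great, but the screen scratches easily.",
    "Terrible battery. The screen is great though.",
    "Great camera and great price.",
]

# Hypothetical subjects to track and a toy sentiment lexicon (assumptions).
SUBJECTS = {"battery", "screen", "camera", "price"}
POSITIVE = {"great", "good", "excellent"}
NEGATIVE = {"terrible", "bad", "poor"}


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def analyze(text):
    """Turn one raw string into a small structured record."""
    tokens = tokenize(text)
    # Count how often each tracked subject is mentioned.
    mentions = Counter(t for t in tokens if t in SUBJECTS)
    # Crude sentiment: positive word count minus negative word count.
    score = sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)
    return {"mentions": dict(mentions), "sentiment_score": score}


# One structured record per review, plus mention totals across all reviews.
records = [analyze(review) for review in REVIEWS]
total_mentions = sum((Counter(r["mentions"]) for r in records), Counter())

for review, record in zip(REVIEWS, records):
    print(record, "<-", review)
print("Total mentions:", dict(total_mentions))
```

Each free-form review becomes a record with countable fields, which is the basic move behind text analytics: once the text is structured, it can be aggregated and compared like any other metric.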
The importance of text analytics is highlighted by its use by major companies.
Facebook recently released ‘Topic Data’, a system that anonymously analyzes comments and posts about subjects relevant to specific products. On the product's page, Facebook gives the example of a company selling hair de-frizzing products that can harvest data from users’ posts about how humidity affects their hair.
IBM also recently purchased AlchemyAPI to augment the analytics of its Watson platform, and Microsoft recently purchased the text analytics company Equivio.
In addition, most email providers use text analytics in their anti-spam filters. While these filters never seem to be perfect, their increasing rate of correctly identifying spam highlights the effectiveness of text analytics.
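As a rough illustration of how text can drive a spam decision, the sketch below uses a naive Bayes classifier, a classic textbook technique for this problem. It does not describe any particular email provider's filter; the training messages and labels are invented for the example, and production filters use far more data and many additional signals.

```python
import math
import re
from collections import Counter

# Hypothetical labeled messages, invented for illustration only.
TRAINING = [
    ("win a free prize now", "spam"),
    ("free money click now", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch on monday with the team", "ham"),
]


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z]+", text.lower())


def train(examples):
    """Count word frequencies per label and the number of messages per label."""
    word_counts = {"spam": Counter(), "ham": Counter()}
    label_counts = Counter()
    for text, label in examples:
        label_counts[label] += 1
        word_counts[label].update(tokenize(text))
    return word_counts, label_counts


def classify(text, word_counts, label_counts):
    """Return the label with the highest log-probability under naive Bayes."""
    vocab = set(word_counts["spam"]) | set(word_counts["ham"])
    total_messages = sum(label_counts.values())
    scores = {}
    for label in word_counts:
        total_words = sum(word_counts[label].values())
        # Start from the prior: how common this label is in the training data.
        score = math.log(label_counts[label] / total_messages)
        for token in tokenize(text):
            # Add-one (Laplace) smoothing so unseen words do not zero out the score.
            count = word_counts[label][token] + 1
            score += math.log(count / (total_words + len(vocab)))
        scores[label] = score
    return max(scores, key=scores.get)


word_counts, label_counts = train(TRAINING)
for message in ["free money now", "team meeting monday"]:
    print(message, "->", classify(message, word_counts, label_counts))
```

The classifier never understands the message; it only compares how often each word appeared in known spam versus known legitimate mail, which is why such filters improve as they see more labeled examples.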
Other practical uses of text analytics
Identifying consumer attitude towards brands and products
Checking for plagiarism
‘Electronic discovery’ process in legal investigations
Determining automatic advertisement placements
Monitoring online conversations for national security
Indexing large publication databases in academic and scientific fields
Thus, text analytics can be valuable for everyone from small businesses to multinational corporations.
As it can be a complicated field, companies can benefit from outside help in the form of a technology consultant with expertise in this area. A good technology consulting firm can advise on the most appropriate software and help organizations get the most value from its use. Since it is such a new and diverse field, we still do not know all potential uses of text analytics, and as such, businesses could be surprised by innovative ways in which it could help them.
While it is difficult to say for certain, most estimates say that more than 80% of data is in the form of text. This suggests that there is enormous commercial potential in the field of text mining. While text mining was originally developed by intelligence agencies during the Second World War, it is only in recent years that the technology has truly begun to come into its own.
And due to its complexity, it is a field with huge potential for growth, as machines learn to read more and more like their human counterparts.
In the end, we can only guess at how effective the technique will become, but its potential is truly revolutionary, and we are already beginning to realize it through the many diverse uses of text analytics available today.