How Much Does Sentiment Analysis in the Cloud Actually Cost?
Deciphering the Cost of Running Cloud-based Text Analytics with Amazon, Google, IBM, and Microsoft
These are exciting times we live in, especially for the data scientist charged with rapidly analyzing volumes of text. In addition to free packages for IDEs like RStudio and myriad off-the-shelf software, the big cloud players have put their machine learning tools at your disposal via API. Taken individually, the cloud services work well, but combining them (as we have recently) yields even better results: we found that combining the sentiment analysis results of Amazon Comprehend, Google Cloud, IBM Watson, and Microsoft Azure could correctly predict sentiment at a rate of 78 percent.
Using the services is easy enough. Figuring out how much they cost, and how those costs compare, is another matter entirely. In this article we’ll compare costs of the four biggest cloud players across a variety of scenarios.
Comparing Apples, Oranges, and Bananas
Amazon, whose pricing is notoriously complicated, is one of the few cloud services with an open API to offer a concrete pricing example, so we’ll use it as a benchmark.
Here’s the scenario: You have 10,000 customer reviews, each averaging 550 characters. You’re just getting started and want to keep things simple, so you just want to know the overall sentiment of each review. More complicated analytical techniques are possible on each platform, including topic analysis and custom modeling. But we don’t need that right now.
For the sake of comparison, we’ll exclude storage costs, since those also vary by platform.
Amazon Comprehend
Though this simple example would be free to perform on Amazon Comprehend, we’ll assume the lowest standard pricing tier (it is also worth noting that the free tier is limited to 12 months). Amazon counts each 100 characters as one “unit.”
10,000 requests X 6 units/request (550 characters, rounded up to the next 100) = 60,000 units
60,000 units X $0.0001 per unit = $6
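The rounding step is easy to miss: 550 characters is 5.5 units, and the $6 total implies each review is billed as 6. Here is a minimal Python sketch of that arithmetic, using only the unit size and price quoted above. The function name is ours, and the per-request rounding is an assumption inferred from the 60,000-unit figure.

```python
import math
from decimal import Decimal

UNIT_SIZE = 100                      # characters per unit (from the text)
PRICE_PER_UNIT = Decimal("0.0001")   # lowest standard tier, USD (from the text)

def comprehend_cost(num_requests, chars_per_request):
    """Estimate cost, assuming each request is rounded up to whole units."""
    units_per_request = math.ceil(chars_per_request / UNIT_SIZE)
    total_units = num_requests * units_per_request
    return total_units, total_units * PRICE_PER_UNIT

units, cost = comprehend_cost(10_000, 550)
print(units, cost)  # 60000 6.0000
```

Decimal is used instead of float so that tiny per-unit prices multiply out exactly.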
Google Cloud Natural Language
The Google Cloud Natural Language free tier tops out at 5,000 records, so you’d be forced to pay for even this simple 10,000-record example. Each record has a character limit of 1,000, however, so each record in our example would require just one unit.
10,000 requests X 1 unit/request (550 characters fit within the 1,000-character limit) = 10,000 units
10,000 units X $1 per 1,000 units = $10
IBM Watson
IBM Watson Natural Language Understanding allows for up to 30,000 “Natural Language Units (NLUs)” per month in its free pricing tier, but we’ll consider the lowest pricing tier for this comparison. Each NLU allows for 10,000 characters and two “enrichment features,” so each 550-character review needs one NLU and our example would require 10,000 NLUs.
10,000 requests X 1 NLU/request = 10,000 NLUs
10,000 NLUs X $0.003 per NLU = $30
Microsoft Azure Cognitive Services
Like Google, the Microsoft Azure Cognitive Services free tier is limited to 5,000 “transactions,” so we’ll analyze the lowest paid tier. Each transaction includes sentiment analysis, key phrase extraction, language detection, and entity identification for up to 5,000 characters. Microsoft does not break down pricing for individual elements and only offers pricing based on tiers rather than volume.
10,000 requests X 1 transaction/request = 10,000 transactions
10,000 transactions require the “Standard S0 pricing tier” = $74.71
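Because the tier is a flat fee, the effective price per document depends on how much of the tier you actually use. Here is a small Python sketch of that idea, based only on the $74.71 figure and the 5,000-character transaction limit quoted above. The function names are ours, splitting longer documents into several transactions is an assumption, and the tier's transaction allowance is not stated here, so it is not modeled.

```python
import math
from decimal import Decimal

S0_FEE = Decimal("74.71")           # Standard S0 price quoted in the text, USD
MAX_CHARS_PER_TRANSACTION = 5_000   # per-transaction limit from the text

def transactions_needed(num_docs, chars_per_doc):
    """Assumption: a document over the limit is split into several transactions."""
    return num_docs * math.ceil(chars_per_doc / MAX_CHARS_PER_TRANSACTION)

def effective_price_per_transaction(num_transactions):
    """Spread the flat tier fee over the transactions actually sent."""
    return S0_FEE / num_transactions

n = transactions_needed(10_000, 550)
print(n, effective_price_per_transaction(n))  # 10000 0.007471
```

At roughly three quarters of a cent per review, the flat tier is expensive for this small job, but the same fee covers the other three features bundled into each transaction.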
More Scenarios, More Fruit to Compare
Judging from the comparison above, you’d think Amazon is clearly the most aggressive on pricing and that’s the end of the story. Not quite. Things get a little more complicated when you increase the volume of records or the size of the records.
In the first example above, we compared sentiment analysis of 10,000 customer reviews. But what if we wanted to analyze 1,000,000 tweets? Or 5,000 academic papers with 10,000 characters each?
While Amazon is generally ahead of the competition on pricing, a few oddities become apparent as the pricing scales. IBM, for example, looks like a good place to run your analysis if you have a relatively low number of very large documents. Microsoft’s lowest pricing tier seems expensive for a small job, but it looks more appealing when the documents are larger.
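To see how the ranking can shift, here is a rough Python sketch that reprices these scenarios using only the per-unit figures from the first example. It assumes flat unit prices (the real price sheets include free tiers and volume discounts), takes 280 characters as a stand-in tweet length, reads the academic papers as 10,000 characters each, and leaves Microsoft out because its tier allowances are not given here. All names in the code are illustrative.

```python
import math
from decimal import Decimal

# (characters per billing unit, USD per unit), as quoted earlier in this article
SERVICES = {
    "Amazon": (100, Decimal("0.0001")),
    "Google": (1_000, Decimal("0.001")),
    "IBM": (10_000, Decimal("0.003")),
}

# (number of documents, characters per document); the tweet length is an assumption
SCENARIOS = {
    "10,000 reviews": (10_000, 550),
    "1,000,000 tweets": (1_000_000, 280),
    "5,000 papers": (5_000, 10_000),
}

def flat_cost(service, num_docs, chars_per_doc):
    """Round each document up to whole units, then apply a flat unit price."""
    unit_size, price = SERVICES[service]
    return num_docs * math.ceil(chars_per_doc / unit_size) * price

def cheapest(num_docs, chars_per_doc):
    """Return the service with the lowest flat-price estimate."""
    return min(SERVICES, key=lambda name: flat_cost(name, num_docs, chars_per_doc))

for label, (docs, chars) in SCENARIOS.items():
    summary = ", ".join(
        f"{name} ${flat_cost(name, docs, chars):.2f}" for name in SERVICES
    )
    print(f"{label}: {summary} (cheapest: {cheapest(docs, chars)})")
```

Under these assumptions Amazon stays cheapest for short documents, while IBM's large 10,000-character unit makes it cheapest for the long papers. Treat the output as a way to explore the unit sizes, not as a quote.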
We recommend you try them all out for yourself with test data sets and see how they perform. How much they cost is irrelevant if they don’t work.