At RailsConf I’ll be giving a talk on Natural Language Processing (NLP). As part of my preparation, I’ve been reading up on the history of NLP and experimenting with Google’s Natural Language API. The Ruby gem for this API is still alpha, but it works well enough for basic experimentation.
One of the things that intrigue me most about NLP is syntax analysis. Static analysis of English is tricky, and even determining a word’s part of speech can be hard. For example, in the sentence “I’m leaving work,” the word “work” is a noun: I’m leaving a physical place. But in the sentence “I work on my talk,” “work” is a verb. Words such as “very” can be either adverbs or adjectives depending on where they are placed in a sentence.
Despite these challenges, knowing the part of speech can be useful. At last week’s Seattle Ruby Brigade meeting we worked on a text generator that we’ll eventually use for a chat bot. We used simple Markov chains for our bots, but sometimes that produced grammatically incorrect sentences. If we had been able to ensure that each sentence had a verb and a subject, the generated text might have been better.
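A minimal sketch of that kind of filter, assuming each candidate sentence has already been sent through syntax analysis and reduced to simple token records. `Token` and `grammatical?` are hypothetical names and the tokens are hand-built for illustration; the tag and label values (`VERB`, `NSUBJ`, and so on) are names from the API’s part-of-speech and dependency-label enums.

```ruby
# Hypothetical, simplified token record: the word, its part-of-speech tag,
# and its dependency label, loosely mirroring what syntax analysis returns.
Token = Struct.new(:text, :tag, :label)

# Accept a generated sentence only if it has at least one verb
# and at least one nominal subject.
def grammatical?(tokens)
  has_verb    = tokens.any? { |t| t.tag == :VERB }
  has_subject = tokens.any? { |t| t.label == :NSUBJ }
  has_verb && has_subject
end

# Hand-built tokens (illustrative, not real API output).
good = [
  Token.new("The",   :DET,   :DET),
  Token.new("cat",   :NOUN,  :NSUBJ),
  Token.new("plays", :VERB,  :ROOT),
  Token.new(".",     :PUNCT, :P)
]

fragment = [
  Token.new("The", :DET,   :DET),
  Token.new("toy", :NOUN,  :ROOT),
  Token.new(".",   :PUNCT, :P)
]

puts grammatical?(good)     # prints true
puts grammatical?(fragment) # prints false
```

A generator could call `grammatical?` on each candidate and simply generate again whenever it returns false.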
The Natural Language API breaks input text into tokens (words and punctuation) and then provides information about each token. Here’s some basic code that uses the Natural Language API to identify the part of speech of each word in the input.
I ran this code against the sentence “The cat plays.” and got this output.
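The heart of that kind of program is a loop over the returned tokens. This is a minimal sketch of just the formatting step, assuming the tokens have already been reduced to simple word/tag pairs; `SyntaxToken` and `describe_tokens` are hypothetical names, and the tokens for “The cat plays.” are hand-built (matching the tags described below) rather than fetched from the API.

```ruby
# Hypothetical stand-in for a token from a syntax analysis response:
# just the word and its part-of-speech tag.
SyntaxToken = Struct.new(:text, :tag)

# Format one "word: TAG" line per token.
def describe_tokens(tokens)
  tokens.map { |t| "#{t.text}: #{t.tag}" }
end

# Hand-built tokens for "The cat plays." (illustrative, not real API output).
tokens = [
  SyntaxToken.new("The",   :DET),
  SyntaxToken.new("cat",   :NOUN),
  SyntaxToken.new("plays", :VERB),
  SyntaxToken.new(".",     :PUNCT)
]

# Prints one line per token, e.g. "cat: NOUN".
puts describe_tokens(tokens)
```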
The enum that maps these tags to the labels we’re more familiar with is in the API documentation. Running the code against a slightly longer sentence, “The cat plays with the toy.”, gives me this.
In both examples, the API identifies “cat” as a noun and “plays” as a verb. “The” is identified as a determiner; you may know this as an article. In the longer sentence, “with” is identified as an “Adposition (preposition and postposition).”
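Translating the tags into familiar grammar terms is just a lookup table. This sketch uses the tag names from the API’s part-of-speech enum; the plain-English labels are my own wording, `TAG_LABELS` and `friendly_tag` are hypothetical names, and the tags for the example sentence are hand-built for illustration.

```ruby
# Part-of-speech tag names from the Natural Language API's enum, mapped to
# plain-English labels (the labels themselves are illustrative wording).
TAG_LABELS = {
  UNKNOWN: "unknown",
  ADJ:     "adjective",
  ADP:     "adposition (preposition or postposition)",
  ADV:     "adverb",
  CONJ:    "conjunction",
  DET:     "determiner (e.g. an article)",
  NOUN:    "noun",
  NUM:     "number",
  PRON:    "pronoun",
  PRT:     "particle",
  PUNCT:   "punctuation",
  VERB:    "verb",
  X:       "other",
  AFFIX:   "affix"
}.freeze

# Look up a tag (symbol or string), falling back to the raw tag name.
def friendly_tag(tag)
  TAG_LABELS.fetch(tag.to_sym, tag.to_s)
end

words = %w[The cat plays with the toy .]
tags  = %i[DET NOUN VERB ADP DET NOUN PUNCT]

words.zip(tags).each do |word, tag|
  puts "#{word}: #{friendly_tag(tag)}"
end
```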
The API can also identify the role that each word plays in a sentence; that information is in the token’s “label” property.
And here are the results of running the longer sentence from above through the API.
“Cat” is identified as the subject, “plays” as the root of the sentence, and “with the toy” as a prepositional phrase (preposition, determiner, and prepositional object).
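Each token’s dependency information boils down to a label plus the index of the token it attaches to (its head); the root token’s head is itself. This sketch prints those relationships for “The cat plays with the toy.” The head indices are my reconstruction of a typical parse consistent with the description above, not captured API output, and `ParsedToken` and `dependencies` are hypothetical names.

```ruby
# Hypothetical token record: word, dependency label, and the index of its
# head token. A root token's head index is its own index.
ParsedToken = Struct.new(:text, :label, :head)

# Hand-built parse of "The cat plays with the toy." (illustrative only).
PARSE = [
  ParsedToken.new("The",   :DET,   1), # determiner of "cat"
  ParsedToken.new("cat",   :NSUBJ, 2), # subject of "plays"
  ParsedToken.new("plays", :ROOT,  2), # root points at itself
  ParsedToken.new("with",  :PREP,  2), # preposition attached to "plays"
  ParsedToken.new("the",   :DET,   5), # determiner of "toy"
  ParsedToken.new("toy",   :POBJ,  3), # object of the preposition "with"
  ParsedToken.new(".",     :P,     2)  # punctuation
].freeze

# Describe each token as "word (LABEL) -> head word".
def dependencies(tokens)
  tokens.map { |t| "#{t.text} (#{t.label}) -> #{tokens[t.head].text}" }
end

puts dependencies(PARSE)
```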
I’ve been enjoying playing around with the API just to learn it and see where the edges are. Here’s a diagram I made using the graph gem and the information returned from the syntax analysis call.
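The diagram itself was drawn with the graph gem. As a dependency-free stand-in (not the graph gem’s API), this sketch writes Graphviz DOT text straight from head-index data, which `dot -Tpng` can then render. `to_dot` is a hypothetical helper, and the parse of “The cat plays with the toy.” is hand-built for illustration, not API output.

```ruby
# Build Graphviz DOT text for a dependency tree. Each edge runs from a head
# word to its dependent and is labeled with the dependency label; the root's
# self-reference is skipped. Nodes are keyed by index so repeated words
# ("the") stay distinct. Quotes inside words are not escaped.
def to_dot(words, labels, heads)
  lines = ["digraph sentence {"]
  words.each_with_index do |word, i|
    lines << %(  n#{i} [label="#{word}"];)
  end
  heads.each_with_index do |head, i|
    next if head == i # the root token points at itself
    lines << %(  n#{head} -> n#{i} [label="#{labels[i]}"];)
  end
  lines << "}"
  lines.join("\n")
end

# Hand-built parse of "The cat plays with the toy." (illustrative only).
words  = %w[The cat plays with the toy .]
labels = %i[DET NSUBJ ROOT PREP DET POBJ P]
heads  = [1, 2, 2, 2, 5, 3, 2]

puts to_dot(words, labels, heads)
```

Saving the output to `sentence.dot` and running `dot -Tpng sentence.dot -o sentence.png` produces a tree rooted at “plays.”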
In my RailsConf talk I’ll show other methods of sentence diagramming and go into more detail about what all these grammar terms mean, for those who have forgotten middle school grammar.
At RubyConf in San Antonio, I gave a talk entitled Stupid Ideas for Many Computers. In that talk I did very hacky sentiment analysis on tweets by assigning values to various emoji, extracting the emoji from tweets, and adding the whole thing up. It was an incredibly stupid idea, but that was the purpose of the talk.
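The emoji hack amounts to a lookup table and a sum. This is a minimal reconstruction of the idea, not the talk’s actual code: the emoji chosen and their scores are made up, and `EMOJI_SCORES` and `emoji_sentiment` are hypothetical names.

```ruby
# Made-up sentiment values for a handful of single-codepoint emoji.
EMOJI_SCORES = {
  "😀" => 1,
  "😍" => 2,
  "😢" => -1,
  "😡" => -2
}.freeze

# Extract the scored emoji from a tweet and add up their values.
# Characters without a score (including ordinary text) count as zero.
def emoji_sentiment(text)
  text.each_char.sum { |ch| EMOJI_SCORES.fetch(ch, 0) }
end

puts emoji_sentiment("New puppy at work 😍😀") # prints 3
puts emoji_sentiment("Build broke again 😡😢")  # prints -3
```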
I’ll be reprising this code at RailsConf, but this time I’ll be using the Natural Language API’s sentiment analysis. The code is similar to the syntax analysis code above.
Score is a number between -1.0 (negative sentiment) and 1.0 (positive sentiment). Magnitude is a non-negative measure of “how much” emotion, positive or negative, the message contains; it isn’t normalized, so longer text tends to have a larger magnitude. I ran some tweets through the sentiment analyzer.
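Turning the two numbers into the kind of rough English used below takes a couple of thresholds. The cutoffs here are arbitrary choices for illustration, not values from the API documentation, and `describe_sentiment` is a hypothetical helper.

```ruby
# Arbitrary, illustrative cutoffs. Score runs from -1.0 to 1.0; magnitude is
# non-negative and grows with the amount of emotional content in the text.
def describe_sentiment(score, magnitude)
  tone = if score >= 0.25
           "positive"
         elsif score <= -0.25
           "negative"
         else
           "neutral"
         end
  strength = magnitude >= 1.0 ? "strongly" : "mildly"
  "#{strength} #{tone}"
end

puts describe_sentiment(0.7, 1.5) # prints strongly positive
puts describe_sentiment(0.1, 1.3) # prints strongly neutral
```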
One of the best parts of dogs at work: Listening to everyone's voice go up an octave when they say "HI PUPPY! YOU ARE SO CUTE!"
(Aja Hammerly, @the_thagomizer, March 27, 2017)
I cherry-picked this one because I was confident the sentiment would be positive. I got a sentiment score of 0.7 and a magnitude of 1.5, which in rough English is “a lot of pretty darn positive.” I also tried the tweet that the Seattle Ruby Brigade sends out to remind us about meetings.
so like... we're meeting 'n stuff. you should show up. code. talk. have coffee. vivace 7-9. rawr. https://t.co/awW4m6VqrI
(Seattle.rb, @seattlerb, March 28, 2017)
The sentiment score for this one was 0.1, so almost neutral, and the magnitude was 1.3. Together that is approximately “pretty strongly neutral.”
If you liked these examples, I encourage you to try out the Cloud Natural Language API library and experiment with all the different types of analysis it supports. If you are at RailsConf, you can see more examples in my talk or stop by the Google Cloud booth to try it out in a codelab.