Predicting Targaryen madness with Apache Spark and ML.NET in .NET Core

4 years 11 months ago

“What’s the saying? ‘Every time a Targaryen is born the gods flip a coin.’” - Cersei Lannister

With this phrase, Cersei reminds us of the madness that runs in Targaryen blood and hints at what is to come. After the last two episodes of the last season of Game of Thrones, we know which side the coin landed on. The question is, could this madness have been predicted? Was Daenerys’ rise to her Mad Queen title foreshadowed, or was it an act of madness that came out of nowhere? Powered by the bias that hindsight offers, we will perform sentiment analysis on her script lines from all 8 seasons to find out whether Daenerys had shown early signs of her erratic personality.

Photo by King Siberia on Unsplash


The tools

Besides wanting to see whether the outcome actually lines up with the storytelling, we are mostly thrilled to put some cool toys into action.

Our weapons of choice will be Apache Spark, ML.NET and MobiusCore, which is our .NET Core port of Mobius. Mobius is an open source library, created by Microsoft, that provides C# bindings to Apache Spark.

ML.NET

ML.NET is Microsoft’s open source framework for machine learning. It allows you to create your own custom ML models and use them to perform Sentiment Analysis, Product Recommendation, Image Classification and all kinds of cool stuff. We will be using ML.NET to perform Sentiment Analysis on all of Daenerys’ script lines across all the episodes of the 8 seasons of Game of Thrones.

Apache Spark

Apache Spark is a general-purpose distributed analytics engine for processing large amounts of data. It’s probably the most popular open source Big Data library. We will be using it to run our sentiment analysis tasks in parallel over a cluster.

MobiusCore

We are heavily invested in C# and we always wanted C# support for running Spark jobs. As a result, when Microsoft created Mobius for the .NET platform, we jumped on it. At the same time, being huge .NET Core fans, we were hoping to get the same level of support there too. Certain implementation details did not allow the Mobius team to target the .NET Core platform. Thus the idea of MobiusCore was born.

MobiusCore is an open source port of Mobius for .NET Core. Mobius relied heavily on delegate serialization in order to allow user-defined functions written in C# to be executed by Spark Workers. As a result, it was difficult for Mobius to target the .NET Core platform, because delegate serialization was dropped by the .NET Core team (see discussions here and here). In comes MobiusCore. To provide the support we wanted, we replaced all Method Reference Delegates with lambdas expressed as LINQ Expressions. Although LINQ Expressions are not directly serializable, they represent an expression tree. We can extract the required information from the LINQ Expression Tree, pass it to the Mobius Workers, reconstruct the lambda from the expression tree on the Worker and execute it. Tap dancing around the delegate serialization minefield, we were able to keep the same fluent, expressive task definitions and bypass the API differences while porting to .NET Core. If you are interested in getting some more details, you can find some here.
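To make the expression-tree idea concrete, here is a minimal sketch that uses only System.Linq.Expressions from the standard library. It is not MobiusCore’s actual code or wire format: the class ExpressionTreeSketch and the methods Describe and Rebuild are hypothetical names chosen for illustration, and the “information passed to the worker” is reduced to a method name and one string argument.

```csharp
using System;
using System.Linq.Expressions;

public static class ExpressionTreeSketch
{
    // A lambda assigned to Expression<...> is captured as a data structure
    // (an expression tree) rather than as compiled code.
    public static readonly Expression<Func<string, bool>> MentionsDragon =
        line => line.Contains("dragon");

    // Hypothetical "driver side" step: read out the pieces a worker would need.
    // Assumes the body has the shape  parameter.Method("constant").
    public static (string Method, string Argument) Describe(
        Expression<Func<string, bool>> expr)
    {
        var call = (MethodCallExpression)expr.Body;        // line.Contains("dragon")
        var arg = (ConstantExpression)call.Arguments[0];   // "dragon"
        return (call.Method.Name, (string)arg.Value);
    }

    // Hypothetical "worker side" step: rebuild an equivalent lambda from
    // those pieces and compile it into an ordinary delegate.
    public static Func<string, bool> Rebuild(string methodName, string argument)
    {
        var param = Expression.Parameter(typeof(string), "line");
        var method = typeof(string).GetMethod(methodName, new[] { typeof(string) })
            ?? throw new ArgumentException("Unknown string method: " + methodName);
        var body = Expression.Call(param, method, Expression.Constant(argument));
        return Expression.Lambda<Func<string, bool>>(body, param).Compile();
    }
}
```

Calling Describe on MentionsDragon yields plain strings that are trivially serializable, and Rebuild turns them back into a callable Func<string, bool>. A real implementation has to handle arbitrary tree shapes and captured variables, which this sketch deliberately ignores.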

In the meantime, Microsoft released its own new library for Apache Spark bindings in .NET Core, .NET for Apache Spark, which supports the Apache Spark DataFrame API, and we are super-excited about it. We can’t wait to see where the efforts of the Microsoft team will take us next. In fact, we are now looking into how we can assist with this great new tool!

For our example we will be using MobiusCore to implement the algorithm that will perform Daenerys’ psychological evaluation.


Psyche Eval

It’s time to sit Daenerys down on the examination couch. We have gathered the scripts of all 8 Game of Thrones seasons from the Genius API and we trained our ML model with the AFINN Lexicon (more info here). Now, the good thing is, the text is all English. Translating Dothraki would probably be a showstopper! At this point, doing the Sentiment Analysis is as simple as:
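What follows is not the original job code, only a rough sketch of what the ML.NET half of such a job could look like. It assumes the Microsoft.ML NuGet package; the classes LabeledText, ToxicityPrediction and SentimentSketch are hypothetical names, and treating each lexicon entry as a labeled training row (true meaning negative) is our assumption about how a model could be trained from a word list such as AFINN.

```csharp
using System;
using System.Collections.Generic;
using Microsoft.ML;
using Microsoft.ML.Data;

// One training row or one script line to score (hypothetical type).
public class LabeledText
{
    public string Text { get; set; }
    public bool Label { get; set; }   // assumption: true = negative / toxic
}

// Output columns produced by a binary classifier (hypothetical type).
public class ToxicityPrediction
{
    [ColumnName("PredictedLabel")]
    public bool IsToxic { get; set; }

    public float Probability { get; set; }
}

public static class SentimentSketch
{
    // Trains a text classifier and returns an engine for single predictions.
    public static PredictionEngine<LabeledText, ToxicityPrediction> Train(
        IEnumerable<LabeledText> rows)
    {
        var ml = new MLContext(seed: 0);
        IDataView data = ml.Data.LoadFromEnumerable(rows);

        // Turn raw text into numeric features, then fit logistic regression.
        var pipeline = ml.Transforms.Text
            .FeaturizeText("Features", nameof(LabeledText.Text))
            .Append(ml.BinaryClassification.Trainers.SdcaLogisticRegression(
                labelColumnName: nameof(LabeledText.Label),
                featureColumnName: "Features"));

        ITransformer model = pipeline.Fit(data);
        return ml.Model.CreatePredictionEngine<LabeledText, ToxicityPrediction>(model);
    }
}
```

With an engine in hand, engine.Predict(new LabeledText { Text = line }) returns the predicted label and the probability of the line being toxic. In the Spark job, a call like this would presumably sit inside the function mapped over the script lines, so that each worker scores its own partition.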

It’s amazing what you can do when standing on the shoulders of giants.


Results

Dear Mother of Dragons! The results will definitely surprise you. Daenerys’ sentiment analysis came out, and a staggering 74.89% of her lines scored as negative. Although the results contain some false positives (or should I say false negatives?), Daenerys certainly had her fair share of toxic moments!

The top picks include:

  1. Text: Daenerys Targaryen: He was no dragon. Fire cannot kill a dragon.
    Toxicity Prediction: Toxic sentiment | Probability of being toxic: 0.983727037906647
  2. Text: Daenerys Targaryen: Have you ever seen a dragon?
    Toxicity Prediction: Toxic sentiment | Probability of being toxic: 0.970722794532776
  3. Text: DAENERYS: I’m not a politician. I’m a queen.
    Toxicity Prediction: Toxic sentiment | Probability of being toxic: 0.968557298183441
  4. Text: DAENERYS: (speaks Valyrian) Unsullied! Slay the masters, slay the soldiers, slay every man who holds a whip, but harm no child. Strike the chains off every slave you see!
    Toxicity Prediction: Toxic sentiment | Probability of being toxic: 0.96771764755249
  5. Text: DAENERYS: I know what my father was. What he did. I know the Mad King earned his name.
    Toxicity Prediction: Toxic sentiment | Probability of being toxic: 0.966907918453217
  6. Text: DAENERYS: Jorah sent my secrets to Varys. For 20 years the spider oversaw the campaign to find and kill me.
    Toxicity Prediction: Toxic sentiment | Probability of being toxic: 0.965652883052826
  7. Text: DAENERYS: You’re a strange man.
    Toxicity Prediction: Toxic sentiment | Probability of being toxic: 0.960509717464447
  8. Text: DAENERYS: My enemies are in the Red Keep. What kind of a queen am I if I’m not willing to risk my life to fight them?
    Toxicity Prediction: Toxic sentiment | Probability of being toxic: 0.957431375980377
  9. Text: DAENERYS: I do not recognize this tradition.
    Toxicity Prediction: Toxic sentiment | Probability of being toxic: 0.955697119235992
  10. Text: DAENERYS: I am sorry you no longer have a father, but my treatment of the masters was no crime. You’d be wise to remember that.
    Toxicity Prediction: Toxic sentiment | Probability of being toxic: 0.947878360748291

We can already see, from the top 10 negative lines picked, some toxic behaviors, like the indifference she showed at her brother’s death, the execution order she gave to her Unsullied and her lack of remorse when addressing the son of a master she had recently put to death in Meereen.

Now, a lot can be said about what a Dragon Queen’s expected level of toxicity ought to be, or how far she would have gotten had she not acted or spoken the way she did. One might also argue that the warning to “harm no child” should lower the toxicity score, but she seemed to have forgotten about it as the show drew to its final episodes, so +12 points for the Queen of Ashes.
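For reference, the two figures reported in this section, the share of negative lines and the top ten picks, can be derived from per-line predictions with a few LINQ calls. This is a sketch under assumptions: ResultsSketch, NegativeShare and TopPicks are hypothetical names, and the 0.5 cut-off for calling a line negative is our guess, not a value stated anywhere in the analysis.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class ResultsSketch
{
    // Percentage of lines whose probability of being toxic exceeds the
    // threshold (0.5 is an assumed default).
    public static double NegativeShare(
        IEnumerable<(string Text, float Probability)> scored,
        float threshold = 0.5f)
    {
        var list = scored.ToList();
        if (list.Count == 0) return 0.0;   // avoid dividing by zero
        return 100.0 * list.Count(s => s.Probability > threshold) / list.Count;
    }

    // The n lines with the highest probability of being toxic, most toxic first.
    public static List<(string Text, float Probability)> TopPicks(
        IEnumerable<(string Text, float Probability)> scored,
        int n = 10)
    {
        return scored.OrderByDescending(s => s.Probability).Take(n).ToList();
    }
}
```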


Conclusion

With the power of hindsight, the aid of both our new and established toys, and with some good will and humor, we have made it official! It seems that Daenerys had Queen of the Ashes written all over her from the very beginning. We were all in favor of the Savior title bestowed on her, so we looked the other way whenever she showed glimpses of madness and we refused to believe that she would become her father’s daughter. At least, that is what ML.NET, Apache Spark and MobiusCore have concluded.

What would it take to prevent the destruction of King’s Landing? The Westerosi would need a cluster running Apache Spark, some great open source libraries and a few lines of code… or just a little more trust in Varys’ gut.

PS: At CITE we have been actively engaged with Apache Spark for a long time and, in parallel, have been following the progress of .NET Core and using all the goodies it brings to .NET enthusiasts like ourselves. So, we worked on MobiusCore as a way to spread the love. What a brave new world to be coding in!