Text Similarity and Electability

Introduction

In early 2019 I completed the requirements for the Master’s in Predictive Analytics program at Northwestern University. One of those requirements was a thesis paper, in which I wrote about several text data analysis methods that can be applied to the question of partisan framing.

A Few Charts Illustrating the Hokies’ Recent Woes

I have never lived a year of life on Planet Earth without the Virginia Tech Hokies playing in a college football bowl game. The team went from a small-time program that hired some coach from Southwest Virginia in the 1980s to a significant national force. But since Tyrod Taylor went to the NFL and beloved coach Frank Beamer retired, we haven’t really felt all that relevant nationally. Last year we were in real danger of losing our longest-in-the-nation active post-season streak. Towards the end of the season I downloaded 18 years’ worth of game data from Sports-Reference.com to try to figure out why the program has struggled in recent years - and the answer lies in our defense.

How Google Trends Tell the Story of a World Cup Match

We tell Google about everything - what our weird rashes look like, which restaurants we eat at, where we need driving directions to get to, and who we want more information about right now. Seth Stephens-Davidowitz writes about using Google and other internet data to answer questions ranging from “How many people in the US identify as LGBT?” to “When do people in Edmonton go to the bathroom during a big hockey game?” His book Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are sparked my interest in using Google Trends data as a low-cost market research tool. Here I outline the data available from the gtrendsR package - a popular R package for accessing Google Trends data - and show how they describe the excitement and buzz around the opening Group F match of the 2018 World Cup between Germany and Mexico.

Balancing Individual and Team Talent Among Champions

One of the more common criteria for judging the greatness of basketball players is to simply ask who has the most championship rings. It’s a natural starting place, because only a fraction of the players who have donned NBA jerseys have won even one championship and, of those who have, only about half saw significant playing time during their championship seasons.

Exploring LeBron James' Point Share

Depending on who you ask, LeBron James is the greatest basketball player of all time or at least in the top three. He is constantly compared to Michael Jordan and has, at the time of publication, played in the last 9 NBA Finals and won both championships and MVP recognitions in 2012, 2013, and 2016. LeBron James was drafted first overall out of St. Vincent-St. Mary High School in the 2003 NBA draft by the Cleveland Cavaliers. He had an immediate impact, dropping 27 points in his first game, then winning Rookie of the Year that season and contributing to an 18-win improvement for the Cavs over the prior year.

Differences in Perceptions of Racial Fairness at the Community and National Levels

Americans don’t like to think of ourselves as racists - even when we acknowledge the existence of racism in the country. That’s not terribly controversial, but some data¹ published by the Pew Research Center in 2016 illustrates the divide by asking a battery of questions about areas where blacks are treated unfairly compared to whites. Half of respondents were asked about racism in their own communities and half were asked about racism in the country as a whole. While this post doesn’t go so far as to assign causality to differences in perceptions of racial unfairness, the differences between each measure are an interesting look at how people perceive race in their lives.

Are You Better Off Now Than You Were 50 Years Ago?

On July 2, 1964, Lyndon Johnson sat down in front of an array of leaders in Congress and the Civil Rights movement to sign the Civil Rights Act of 1964. The law, he acknowledged, would turn the South Republican for a generation, but it was too important not to sign. He said that because he knew southern white voters would feel a sense of loss as their black and brown neighbors gained rights and status in the country and that they would blame Democrats for stripping away their supremacy. LBJ was spot on, and the Civil Rights Act was not the only instance of societal change since then. Women have gained increased rights to make their own healthcare decisions, LGBT Americans have gained increasing acceptance in society, and the Vietnam War effectively brought an end to the military draft. We’ve also seen massive shifts in economic dynamics in terms of automation and globalization. At each major milestone there were people who saw progress and felt left behind and people who saw progress and felt more free.

Measuring the Madness

UMBC’s historic and dominant upset of UVA in the first round of the 2018 NCAA Men’s Basketball tournament instantly made a case that 2018 was the maddest March Madness ever.

Predicting Charitable Contributions with Gradient Tree Boosting

Excessive overhead spending has long been seen as a cardinal sin for charitable organizations. Money spent on corporate salaries and donor outreach is a key part of how charities are evaluated by organizations like Charity Navigator that signal to donors that certain organizations are worthy of their dollars. One way to reduce overhead is to boost the efficiency of donor outreach by limiting outreach to those most likely to donate to your organization. This post demonstrates the use of tree-based methods for predicting donor likelihood for a fictional charity (code).

A Pair of Text Analysis Explorations

After about a year of learning text analysis techniques from Text Mining with R (Silge and Robinson 2017), I had two big questions that I wanted to explore. First, the tidytext R package taught in Silge and Robinson’s book has four different ways of measuring polarity (positivity) of text. Robinson wrote a blog post validating the performance of one method and I’ve extended his analysis to compare all four models to one another. Second, how much text do we need to provide reliable sentiment estimates?

Cluster Analysis of Players in Division I Men’s College Basketball Teams

Basketball teams are teams for a reason. Some players are great pure shooters and others are stronger defensive players. Thousands of players ride the bench and rarely, if ever, see any playing time. Coaches and athletic directors assemble varying combinations of players in hopes of optimizing their win percentages.

Reverse Coattails in the VA Governor’s Race

One of the more controversial hypotheses coming out of the 2017 statewide elections in Virginia is the idea of reverse coattails — a phenomenon where House of Delegates candidates boosted the vote share for the top of the ticket. Most writing on this topic has focused on whether or not Ralph Northam got a greater share of the vote in precincts with contested house races. I tend to agree with previous work that argues that down-ballot competition doesn’t really affect support for the top of the ticket. But elections are about vote share and turnout. Perhaps down-ballot competition increases turnout in helpful ways for statewide candidates.

When Can I Start Relying on Basketball Stats?

One challenge in predicting basketball outcomes is the unpredictability inherent in college sports. Players get injured or suspended, teams have great nights and off nights, and, particularly at the beginning of the season, not all match-ups are on the same competitive level.

Examining FiveThirtyEight’s Soccer Power Index Ratings

FiveThirtyEight recently released their newest batch of soccer power index (SPI) ratings for over 400 soccer teams around the world. I don’t know much about soccer, but I love data and enjoy the occasional game when it’s on, so I copied the scores into an Excel file, cleaned up the data a bit, and loaded it into R to see what I could learn! Code for preparing the data and visuals is included below.

Looking Ahead to Virginia in 2017

One of the first chances Democrats will have to strike back against Trumpism will be the statewide races in Virginia for Governor, Lieutenant Governor, and Attorney General.

Sentiment Analysis from the Second Presidential Debate

After reading this article by David Robinson demonstrating that Donald Trump’s most vitriolic and nasty tweets were coming from his Android while more traditional campaign tweets came from staffers using an iPhone, I’ve been itching for a good excuse to apply Robinson’s sentiment analysis on my own. The second presidential debate offered just such an opportunity. All of my code and data are here if you want to follow along at home.