How Yellfy developed its prediction AI for NFL and College Football

To build our artificial intelligence NFL prediction model, we had to work on several different pieces. First, we developed an algorithm to calculate game outcomes. Then, we determined which data factors the algorithm needed to compute. Finally, we ran all of the required data through that calculation to produce winner and point spread recommendations for each NFL game this year. In this section, I go a little deeper into each of the parts that make up our prediction model.

NOTE: A tremendous amount of detailed information has been left out, because we have sold our IP and are under a strict NDA. Everything you see here has already been disclosed in our public channels and investor presentations.


An algorithm is defined as a set of rules to be followed in calculations or other problem-solving operations, especially by a computer. Algorithms can be applied to anything that follows a procedure; in a sense, even a cooking recipe is an algorithm. One area that has been growing quite rapidly is sports betting algorithms, which are designed to predict the outcome of sporting events. That's precisely what we've developed with the help of artificial intelligence: a sophisticated algorithm that helps predict the winners of NFL games along with the expected margin of victory.

Data Factors

We gathered every NFL box score from 2003 to 2018 for our machine learning model. The dataset consists only of quantifiable data.

IMPORTANT: We did incorporate data such as injuries, team psychology, trades, weather, and one-off details, because each of these factors played a role in how players performed in the game.

Once we had gathered everything we wanted, we cleaned and processed the data, leaving us with over 4,000 games and 50 descriptive statistics for each game.

Some of the descriptive statistics used in our artificial intelligence model include:

  • First downs
  • Fourth down attempts
  • Fourth down conversions
  • Rush yards
  • Passing yards
  • Penalties
  • Third down attempts
  • Third down conversions
  • Pass completions
  • Yards lost from sacks
  • Las Vegas betting line/betting total
  • Return yardage (kick, punt, interception)
  • and many more
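To make the cleaning step a little more concrete, here is a minimal Python sketch of how raw box scores might be turned into numeric rows, dropping any game that is missing a statistic. The field names in `FEATURES` are hypothetical stand-ins for a handful of the statistics listed above (the full set of 50 is not public), and "drop incomplete games" is an assumed cleaning rule for illustration, not a description of our actual pipeline.

```python
from typing import Optional

# Hypothetical field names standing in for a few of the listed statistics.
FEATURES = [
    "first_downs",
    "rush_yards",
    "passing_yards",
    "penalties",
    "third_down_attempts",
    "third_down_conversions",
    "vegas_line",
]

def to_feature_row(box_score: dict) -> Optional[list[float]]:
    """Turn one raw box score into a numeric feature row.

    Returns None when any required statistic is missing, so the game
    can be dropped during cleaning (an assumed rule).
    """
    row = []
    for name in FEATURES:
        value = box_score.get(name)
        if value is None:
            return None
        row.append(float(value))
    return row

def clean(box_scores: list[dict]) -> list[list[float]]:
    """Keep only the games that have every required statistic."""
    rows = [to_feature_row(game) for game in box_scores]
    return [row for row in rows if row is not None]
```

Each surviving game becomes one row with one column per statistic, which is the shape the regression models below expect.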

If you're not interested in nerdy math talk, this section probably isn't for you (LOL). However, I know many people are enormous data junkies, just like my staff here. If you're one of our fellow statistics geeks, get ready to geek out with me for a few moments. Otherwise, STOP READING and just trust us that our brainiacs have done some fantastic work with math and artificial intelligence to develop this powerful prediction tool.

Every machine learning process involves applying an algorithm to a problem. We built several models, including logistic and linear regression models, to come up with our predictions. Regression, in general, involves analyzing the relationship between two or more variables and understanding how they collectively contribute to a particular outcome.

With logistic regression, you are trying to determine the outcome of a dependent variable by analyzing one or more independent variables.

PLEASE NOTE: The dependent variable has a binary result (i.e., the outcome is either 1 or 0, such as a win or a loss).

The goal is to fit the model to the data as well as possible, so that it accurately describes the relationship between the outcome variable and the predictor variables.
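For the curious, here is a minimal logistic regression sketch in plain Python, fit by batch gradient descent on log loss. It assumes a 1 means the home team won and a 0 means it lost; the function names, learning rate, and epoch count are illustrative choices, not our production settings.

```python
import math

def sigmoid(z: float) -> float:
    """Squash any real number into a probability between 0 and 1 (numerically stable)."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(weights, intercept, x):
    """Probability that the outcome is 1 (assumed here: the home team wins)."""
    z = intercept + sum(w * xi for w, xi in zip(weights, x))
    return sigmoid(z)

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit one weight per predictor plus an intercept by batch gradient descent on log loss."""
    n_features = len(X[0])
    weights = [0.0] * n_features
    intercept = 0.0
    m = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * n_features
        grad_b = 0.0
        for x, target in zip(X, y):
            # Gradient of log loss w.r.t. z is (predicted probability - actual label).
            error = predict_proba(weights, intercept, x) - target
            for j in range(n_features):
                grad_w[j] += error * x[j]
            grad_b += error
        weights = [w - lr * g / m for w, g in zip(weights, grad_w)]
        intercept -= lr * grad_b / m
    return weights, intercept

# Toy usage with one hypothetical predictor (e.g. a scaled yardage differential).
w, b = fit_logistic([[-3.0], [-1.0], [1.0], [3.0]], [0, 0, 1, 1])
print(predict_proba(w, b, [2.0]))  # a probability above 0.5
```

The model never outputs a bare 0 or 1; it outputs a probability, and a pick is made by checking which side of 0.5 it falls on.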

Linear regression also tries to determine the outcome of a dependent variable by analyzing one or more independent variables.

NOTICE: Where linear and logistic regression differ is that logistic regression predicts a binary outcome, while linear regression predicts a continuous variable (e.g., the result can be 1, 4.72, 10, 54672, 934, etc.).
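To see the difference, here is a minimal sketch of simple linear regression with a single predictor, using the closed-form least-squares formulas. Unlike logistic regression, its output can be any real number, such as a predicted margin of victory in points. The single predictor and the toy numbers in the usage lines are hypothetical.

```python
def fit_simple_linear(xs, ys):
    """Least-squares slope and intercept for one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = covariance(x, y) / variance(x); the 1/n factors cancel out.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(slope, intercept, x):
    """Continuous prediction, e.g. a margin of victory in points."""
    return slope * x + intercept

slope, intercept = fit_simple_linear([1, 2, 3, 4], [3, 5, 7, 9])
print(slope, intercept)  # 2.0 1.0
print(predict(slope, intercept, 2.5))  # 6.0
```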

We split our data into training and testing sets. A general rule of thumb is an 80/20 training/testing split. We fit our training data to our model to get a coefficient value for each independent (predictor) variable, along with an intercept value.
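Here is a sketch of that split-then-fit step: shuffle the games, hold out 20% for testing, and fit an ordinary least squares model on the remaining 80% via the normal equations, which yields one coefficient per predictor plus an intercept. The helper names, the fixed random seed, and the hand-rolled solver are assumptions for illustration; in practice a statistics library would usually handle this.

```python
import random

def train_test_split(rows, targets, test_fraction=0.2, seed=42):
    """Shuffle the games, then hold out test_fraction of them for testing."""
    indices = list(range(len(rows)))
    random.Random(seed).shuffle(indices)
    n_test = int(round(len(rows) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]
    return ([rows[i] for i in train_idx], [targets[i] for i in train_idx],
            [rows[i] for i in test_idx], [targets[i] for i in test_idx])

def solve(a, b):
    """Solve the square linear system a @ x = b by Gaussian elimination."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]  # augmented matrix
    for col in range(n):
        # Partial pivoting: move the row with the largest entry into place.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x

def fit_linear(X, y):
    """Ordinary least squares via the normal equations (A^T A) beta = A^T y.

    Returns (intercept, coefficients), one coefficient per predictor.
    """
    A = [[1.0] + list(row) for row in X]  # leading 1.0 column -> intercept
    k = len(A[0])
    ata = [[sum(r[i] * r[j] for r in A) for j in range(k)] for i in range(k)]
    aty = [sum(r[i] * t for r, t in zip(A, y)) for i in range(k)]
    beta = solve(ata, aty)
    return beta[0], beta[1:]

# Toy usage: targets built exactly as 3 + 2*x1 - 1*x2 from two hypothetical predictors.
X = [[float(i), float((i * i) % 7)] for i in range(10)]
y = [3.0 + 2.0 * a - 1.0 * b for a, b in X]
X_train, y_train, X_test, y_test = train_test_split(X, y)
intercept, coefs = fit_linear(X_train, y_train)
print(intercept, coefs)  # approximately 3.0 [2.0, -1.0]
```

Keeping the test games out of the fit is the whole point: they let you measure how the coefficients perform on games the model has never seen.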

Once we have fitted the model to our training data, we test it with our testing data set. The fitted model makes predictions for the target variable using the line of best fit calculated from the training set. We then check these predictions against the actual results to compute our error scores.
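Here is a minimal sketch of scoring a fitted linear model on held-out games. It assumes the model is described by an intercept and a list of coefficients, and it reports mean absolute error (MAE) and root mean squared error (RMSE). Which error scores we actually track isn't covered here, so treat these two as common examples rather than our exact metrics.

```python
import math

def predict(intercept, coefs, x):
    """Linear model prediction: intercept plus each coefficient times its predictor."""
    return intercept + sum(c * xi for c, xi in zip(coefs, x))

def error_scores(intercept, coefs, X_test, y_test):
    """Compare test-set predictions with the actual results."""
    errors = [predict(intercept, coefs, x) - actual for x, actual in zip(X_test, y_test)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return {"mae": mae, "rmse": rmse}

# Hypothetical fitted model and two held-out games.
print(error_scores(1.0, [2.0], [[1.0], [2.0]], [4.0, 4.0]))  # {'mae': 1.0, 'rmse': 1.0}
```

MAE reads directly as "off by this many points on average", while RMSE penalizes the occasional big miss more heavily.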

Phew! That's a ton of statistical talk about the behind-the-scenes goodness of our prediction model. For those of you who stuck with me through that, you now have a better understanding of everything that goes into our artificial intelligence NFL picks.