Insight into Yellfy Predictive Modeling For Game Winner/Point Spread Predictions
The applications that generate our AI insights include Cloud Foundry applications as well as stateless computing functions. The system is written in Python and JavaScript. The Python applications handle the AI algorithms, including data cleansing, normalization, model training and testing, fairness evaluation, and multimedia management. The JavaScript applications, run through Node.js, combine finished data artifacts to generate content in support of the user experience. Finally, the front-end React application is built on demand and released through the multi-CDN. If any of the applications experience slowness or a service outage, our continuous service monitoring sends out alerts via email, Slack, and mobile.
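The alerting behavior described above can be sketched as a small routing function. The thresholds and channel names here are illustrative assumptions, not the actual monitoring configuration:

```python
# Hypothetical sketch of alert routing: a full outage pages every channel,
# while slowness alerts only email and Slack. Thresholds are made up.

def route_alert(service: str, latency_ms: float, is_up: bool,
                slow_threshold_ms: float = 2000.0) -> list[str]:
    """Return the notification channels that should receive an alert."""
    if not is_up:
        # A service outage is the most urgent case: page all channels.
        return ["email", "slack", "mobile"]
    if latency_ms > slow_threshold_ms:
        # Slowness is less urgent: skip the mobile page.
        return ["email", "slack"]
    return []  # Healthy service: no alert.
```

In a real deployment this decision would feed a notifier (e.g. an SMTP client and a Slack webhook) rather than return a list.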
The Python application, Cloud Foundry Natural Language Container, encapsulates the majority of the machine learning. This app pulls news articles and statistics from Google and other stats providers and enrolls the data into a custom collection with our statistical entity detector. The engine pulls sources from Google Cloud Object Storage, STATS LLC, SPORTS RADAR, and RotoWire. Unstructured information is combined with traditional statistics so our system can reason about players. The job runs as a daily batch and incrementally throughout the day as each player’s state changes. The incremental runs are driven by a Python app that detects whether a player’s projections or actuals have changed and, if so, sends a POST request to the Cloud Foundry Natural Language Container app, which runs the player through the machine learning pipeline. In parallel, and on the hour, a player pre-processor updates player information such as trade, injury, suspension, and bye status. When data changes, a stateless function updates Tableau Dashboard Embedded.
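The incremental trigger boils down to a snapshot comparison. The sketch below, with illustrative field names, shows one way the change-detection app might decide which players to re-run; the actual detection logic is not described in detail here:

```python
# Hypothetical sketch: compare the previous snapshot of each player's
# projections/actuals against the latest snapshot and collect the players
# whose data changed (including newly seen players).

def players_to_rerun(previous: dict, latest: dict) -> list[str]:
    """Return IDs of players whose projections or actuals changed."""
    changed = []
    for player_id, stats in latest.items():
        if previous.get(player_id) != stats:
            changed.append(player_id)
    return changed

# In production, each changed player would be sent as an HTTP POST to the
# Cloud Foundry Natural Language Container app, e.g. with requests.post(...).
```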
Yellfy’s AI is trained by several human annotators who associate text within 1,200 articles to 13 entity types. The 1,200 articles are a sample drawn from 5,568,714 documents about specific players over a previous fantasy football year. Ten football dictionaries were used to pre-annotate the articles. This approach accelerated the annotation process by suggesting annotations for a human to review and correct. On a daily basis, the group of human annotators met to discuss their relative understanding of each entity. The kappa statistic was used to measure the group’s disagreement over each entity. Over time, the annotators converged on a common definition for each entity. Entities included player, team, contract, injury, performance, gear, etc.
After the documents were annotated, a statistical entity recognizer was trained. During training, the results of k-fold cross-validation indicated that the entity model was ready for deployment. The model was deployed to Yellfy Discovery™ for custom machine reading.
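K-fold cross-validation partitions the annotated documents into k folds, training on k-1 folds and testing on the held-out fold, rotated k times. A minimal index-splitting sketch (the real pipeline's model and scoring are stand-ins, not shown):

```python
# Sketch of k-fold cross-validation index generation in pure Python.
# Each fold serves once as the held-out test set while the remaining
# folds form the training set.

def k_fold_indices(n_samples: int, k: int = 5) -> list[tuple[list[int], list[int]]]:
    """Return (train_indices, test_indices) pairs for each of k folds."""
    folds = [list(range(i, n_samples, k)) for i in range(k)]
    splits = []
    for i, test_idx in enumerate(folds):
        train_idx = [j for fold in folds[:i] + folds[i + 1:] for j in fold]
        splits.append((train_idx, test_idx))
    return splits
```

Averaging the per-fold scores gives a more reliable estimate of how the entity recognizer will perform on unseen documents than a single train/test split.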
Each document found by querying Yellfy Discovery news, along with transcripts from podcasts and videos, was enrolled in a Yellfy Discovery custom collection. A second query was issued against the custom collection to extract fantasy football-related entities. After machine reading, each document had results containing a list of keywords, concepts, and entities. To semantically understand each word, two Word2Vec models projected the words into a high-dimensional space for spatial representation.
The broad Word2Vec model was trained on 94 GB of text that represented slang and general use of fantasy football terms. The precise Word2Vec model was focused on football dictionaries. Each word from each of the lists of keywords, concepts, and entities was input into both models, producing a large floating-point feature representation of each word. The vectors within each list were averaged together to give Yellfy an understanding of the word meanings. The three averaged vectors (one each for keywords, concepts, and entities) were then combined in preparation for machine understanding.
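The combination step above can be sketched in miniature. The toy 3-dimensional embeddings below stand in for real Word2Vec models (which would typically come from a library such as gensim), and the averaging scheme shown is one plausible reading of the text, not the confirmed implementation:

```python
# Sketch: look up each word in both the broad and the precise model,
# average the two embeddings per word, then average across the list to
# get one feature vector per list. Toy vectors; real embeddings are
# hundreds of dimensions.

def average_vectors(vectors: list[list[float]]) -> list[float]:
    """Element-wise mean of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]

def embed_word_list(words, broad_model, precise_model):
    """One averaged feature vector for a list of keywords/concepts/entities."""
    per_word = []
    for w in words:
        if w in broad_model and w in precise_model:
            per_word.append(average_vectors([broad_model[w], precise_model[w]]))
    return average_vectors(per_word) if per_word else []
```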
Before the next phase, we had to evaluate whether Yellfy was comprehending the words. Two tests were applied to the Word2Vec models. The first was a keyword lookup test. For example, given player 1’s name, the result should be the player’s team name. On the keyword test between players and teams, 80 percent of the questions were answered correctly when the correct answer appeared in the top 1 percent of ranked answers. On the keyword test between team and location, 75 percent were correct when the correct answer appeared in the top 1 percent of results. Next, Yellfy was given two analogy tests: between players and teams, and between teams and locations. An answer counted as correct if it appeared in the top 500 results, which corresponds to the top 1 percent of the data. With player-to-team testing, 100 percent of the correct analogies were found. For team-to-location, 93.48 percent were correct. The results of machine comprehension were outstanding.
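A top-k lookup test of this kind typically ranks the whole vocabulary by cosine similarity to the query vector and checks where the expected answer lands. A minimal sketch with made-up 2-dimensional vectors (the real test ran over the full Word2Vec vocabulary):

```python
# Sketch of a keyword lookup test: rank vocabulary words by cosine
# similarity to a query embedding and check whether the expected answer
# falls within the top k ranked results.
import math

def cosine(u: list[float], v: list[float]) -> float:
    """Cosine similarity between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def in_top_k(query_vec, answer_word, vocab: dict, k: int) -> bool:
    """True if answer_word is among the k nearest vocabulary words."""
    ranked = sorted(vocab, key=lambda w: cosine(query_vec, vocab[w]),
                    reverse=True)
    return answer_word in ranked[:k]
```

Counting how often `in_top_k` succeeds over many (player name, team name) pairs yields accuracy figures like the 80 percent and 75 percent reported above.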