I began to do research after I read this Clay Travis article. His main thesis was that to win a national championship, a team needs to have a Top 10 recruiting class at least once in a 4 year span. My initial reaction was that a blue moon comes around every few years as well, but that fact has little to do with championships. Later, Clay strengthens the argument by stating that, over the last 20 years, national champions have had a remarkable average of 2.8 Top 10 recruiting classes in the four years prior to winning a Natty. Now that is an exciting claim that needs to be investigated further. But at the center of my data scientist mind was that before one can truly say there is a correlation between Top 10 recruiting classes and National Championships, one needs to examine the number of Type I errors that occurred, that is, the number of false positives: teams that had the Top 10 classes but did not win a title. For example, a pregnancy test is no good if it has many false positives, even if it is 98% accurate on the true positives.
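To make the false positive point concrete, here is a minimal Python sketch. The counts are made up purely for illustration (they are not from Clay's article or from any recruiting data), and the function name `precision` is my own. It shows how a test that catches 98 of 100 real cases can still be wrong about a third of the time it says "yes."

```python
def precision(true_positives, false_positives):
    """Share of all positive calls that were actually correct."""
    total_calls = true_positives + false_positives
    if total_calls == 0:
        raise ValueError("no positive calls to evaluate")
    return true_positives / total_calls


# Hypothetical numbers, for illustration only:
# the test flags 98 of 100 real cases, but also flags 50 cases that are not real.
true_pos = 98
false_pos = 50

share_correct = precision(true_pos, false_pos)
print(f"Positive calls that were right: {share_correct:.1%}")
```

The same question applies to recruiting: of all the teams that stacked up Top 10 classes, how many actually won the title?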
So, I set out to collect every case where a team had at least two Top 10 recruiting classes in a rolling 4 year cycle and to look at its highest ranking and its 4 year rolling AP finish. That was the path I started down. Because Clay Travis used Rivals team rankings, I was going to use the same database. Clay Travis had access to data that isn’t publicly available; the public data set only went back to 2002. I quickly found errors in the Rivals data. In 2004, Rivals staffers forgot to do the data entry for the #3, #5 and #7 teams in their rankings. If you have trouble counting from 1 to 10, that sets off major red flags in terms of data integrity. (I tested that site from various computers at least 4 times. See below.)
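For anyone who wants to try the same search, a rough sketch of the rolling window check might look like the following. The function name, its parameters and the example years are all hypothetical; this is one way to do it, not the exact process I ran.

```python
def qualifying_windows(top10_years, first_year, last_year, window=4, min_classes=2):
    """Return the end year of every rolling window that holds
    at least `min_classes` Top 10 recruiting classes."""
    hits = []
    for end in range(first_year + window - 1, last_year + 1):
        start = end - window + 1
        count = sum(1 for year in top10_years if start <= year <= end)
        if count >= min_classes:
            hits.append(end)
    return hits


# Hypothetical team with Top 10 classes in these years (illustration only).
example_years = {2003, 2004, 2009}
print(qualifying_windows(example_years, 2002, 2014))  # [2005, 2006]
```

Each year returned marks the end of a 4 year stretch that qualifies, which is the point where you would then look up the team's AP finish.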
So I turned to the 247 Composite rankings. I was going to be able to go further back, and all of their classes were sequentially numbered! On a hunch, I started to compare 247 signee counts to Rivals. This is where my original hypothesis got derailed. It swiftly became apparent that the numbers did not jibe with one another – didn’t “geehaw,” as my grandfather used to say.
I did consider that maybe one source, by some buried definition, was counting gray shirts while the other left those out of its counts. I also considered that maybe one source was counting only the NLIs that actually made it into school. In both of those scenarios, one source would always be higher than the other. Those plausible explanations were ruled out because higher tallies occurred on both sides of the ledger. I just know that inaccuracies occurred, and they were occurring at astonishing rates.
From 2002 to 2014, among the teams common to both the Rivals and 247 Composite Top 10 lists, the two sites disagreed on the number of signees more than two-thirds of the time (71 out of 105 common teams = 67.62%). That is a simple head count, not a subjective 4 or 5 star rating! In this recruiting class (2015), they disagreed over the number that Clemson signed. If these organizations can’t nail down the number of recruits coming into a program, how in the world can they be trusted with subjective scoring of an athlete or a ranking list? In 2010, the two sites agreed on all Top 10 teams, yet in nine instances they disagreed on the number of recruits. In 2006 and 2005, they disagreed on the counts for 100% of the common Top 10 teams (8 and 9 common teams, respectively).
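The comparison itself is simple enough to sketch in a few lines of Python. The helper names and the toy "Team A" style entries below are hypothetical and only there to show the mechanics; the 71 and 105 at the end are the totals reported above.

```python
def count_disagreements(site_a, site_b):
    """Compare signee counts for the (year, team) keys both sites share.
    Returns (number of differing counts, number of common teams)."""
    common = site_a.keys() & site_b.keys()
    differing = sum(1 for key in common if site_a[key] != site_b[key])
    return differing, len(common)


def disagreement_rate(differing, common):
    """Fraction of common teams whose counts do not match."""
    return differing / common if common else 0.0


# Toy data with placeholder teams (illustration only, not real counts).
toy_a = {(2010, "Team A"): 25, (2010, "Team B"): 22, (2010, "Team C"): 27}
toy_b = {(2010, "Team A"): 25, (2010, "Team B"): 24, (2010, "Team D"): 20}
print(count_disagreements(toy_a, toy_b))  # (1, 2)

# Totals reported in this post: 71 disagreements among 105 common teams.
print(f"{disagreement_rate(71, 105):.2%}")  # 67.62%
```

Only teams that appear in both Top 10 lists are compared, which is why "Team C" and "Team D" drop out of the toy example.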
The most glaring discrepancies:
- 2010 Auburn: Rivals 32; 247 Composite 44 – a difference of 12
- 2007 Notre Dame: Rivals 18; 247 Composite 35 – a difference of 17, or a complete class for a team on probation
- 2008 Notre Dame: Rivals 23; 247 Composite 34 – a difference of 11
- 2009 Ohio State: Rivals 25; 247 Composite 36 – another double digit difference of 11
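If you want to check my subtraction, the four cases listed above can be ranked by gap with a short script. The figures are the ones in the list; the variable names are just mine.

```python
# (year, team, Rivals count, 247 Composite count) as listed above
listed = [
    (2010, "Auburn", 32, 44),
    (2007, "Notre Dame", 18, 35),
    (2008, "Notre Dame", 23, 34),
    (2009, "Ohio State", 25, 36),
]

# Absolute gap between the two sites, largest first
gaps = sorted(
    ((abs(composite - rivals), year, team) for year, team, rivals, composite in listed),
    reverse=True,
)

for gap, year, team in gaps:
    print(f"{year} {team}: off by {gap}")
```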
For one organization or industry to have an error rate of ~68% (even on a small sample), that is garbage. I don’t know whether the fault lies with one organization or with a combination of both, but I do know that type of quality would be completely unacceptable in any other organization or industry. Would you take your car to a shop that got the repair wrong that often? How about a pilot landing at the wrong airport a large percentage of the time? Wait, my wife had a Nova, and I flew Eastern as a child a couple of times will… Just be a skeptic even though you are dialed in.
Hopefully this makes a compelling case for having reliable data sources. The one I would propose is the same one that gave the College Football Playoff committee all of its data. It is called Sport Source Analytics. Three of the four C-suite people played baseball for Vanderbilt. They have been vetted by the SEC and the Big Ten, as well as others. All of this costs money, not an overwhelming amount, but it would need sponsorship. If you are interested in a real stats feature right here on BI that would compare to national publications, contact us.