How to evaluate if an NFL Draft was a success

02/28/2023

For many NFL fans, draft season has just begun. Most fans don’t really start to dive into college prospects until after their favorite team’s season is over, and that’s certainly understandable, but many others have been tracking this year’s crop of draftees since last August or even longer. NFL scouts and respectable draftniks have already spent thousands of hours detailing their reports and breakdowns and most will spend hundreds of hours more between now and the NFL Draft starting on April 27th.

How do we determine if all of those hours are worth it? How do we ascertain whether a draft has been successful?

Next week, we’ll look at how some aspects of the methodologies introduced below can be combined to possibly offer a more nuanced tool to answer the question each team’s draft poses: Was it good or bad?

For this week, we’ll get introduced to a few of the types of methods the football cognoscenti have used to reach conclusions about drafts. The title of this piece might be a little misleading so, if you’re coming here to find out the secret sauce behind how Cover 1’s draft crew assesses college prospects through film study, take your eyeballs and head home now. If you want to examine some of the tools for evaluating draft success, dig in.

Various Methods For Evaluating Drafts

Approximate Value

Uber Hansen recently did a great article on evaluating how Brandon Beane and the Bills have drafted, which introduced many of you to Approximate Value (AV). AV was created by the co-founder of Pro Football Reference, Doug Drinen, when he was intrigued by Bill James’ work on Value Approximation Method in baseball, which attempted to simplify comparing one era of baseball to another. In a similar vein, Drinen’s work with AV was intended to group levels of play for comparison’s sake. If you’re interested, you can get into the weeds of how AV is calculated, but suffice it to say for the purposes here, AV lets us make generalizations about groups of players with similar AV scores, whether for a season or career.

For example, in the 2019 draft, the Raiders picked Clelin Ferrell at No. 4, which, in part, gave the Bills a chance to draft Ed Oliver at No. 9. The Raiders’ pick was largely considered a surprising reach at the time, and Ferrell’s career overall has not lived up to the expectations of a fourth overall pick. Looking at AV for each year of their respective careers, and for their careers in total, gives a general impression of how they have performed.

Consider 2021 for an example. There is a clear demarcation in the level of play. Ferrell scored 1 in AV, while Oliver scored 10. What AV lets us say is that players who scored 10 generally outperformed players who scored 1 by a significant margin. However, AV doesn’t let us say with specificity that Oliver definitively outplayed Ferrell in 2021 because the metric isn’t designed to do so. The gap between a 10 AV and a 1 AV makes for an overstated example, but the point remains that AV is not trying to compare individual players; it compares groups of players at particular levels to groups of players at other levels. AV looks across broader scopes of time for purposes that are ultimately different from evaluating a certain year’s draft.

This cross-era contextualization was intended to get beyond the simple use of games started, Pro Bowls, etc. for comparing Hall of Fame resumes, and it does provide a framework for general classification, but it can’t be the sole evaluative tool.

Grading Scales

The scale below is from the Bleacher Report player scouting series, which included work from Nate Tice.

The BR scale comes from the other end of the draft process – expectations instead of results – but those expectations are built off the results teams are reasonably anticipating. Theoretically then, a player drafted in the first round who isn’t at least an immediate starter has underperformed. There can be a variety of factors that dramatically impact those outcomes though. In Kaiir Elam’s case, he was not an immediate starter for the Bills. Was that situation just about Elam’s abilities, or did something like Sean McDermott’s history of bringing rookies along slowly have enough influence on the result to cloud the evaluation? The grading scale is another good tool, but, while AV was too generalized, there is too much potential subjectivity in grades.

Cover 1’s own Anthony Prohaska created his own rubric for his recent evaluation of Bills’ draft picks, and it falls along a grading scale. Ant’s scale isn’t as hard and fast as the others we are looking at, and that’s effective because there is room for overlap and discussion, which is a necessary aspect of evaluating a game where 22 players fly around and many of us doing the evaluating don’t fully understand each player’s responsibilities (**Spoiler Alert** Next week’s evaluation tool will probably be less cut and dried than you might have hoped). Ant’s definition of success for each round:

  • Round 1 – Starter – B+ to A- player – Team Pillar – Earns second contract
  • Round 2 – Starter – B to B+ – Earns second contract or becomes a consistent quality starter
  • Round 3 – B- to B or above – Functional starter O/D
  • Round 4 – C+ to B- or above – Rotational player to functional starter O/D
  • Round 5 – Makes the team – Offers quality depth in any phase O/D/ST
  • Round 6 – Makes the team – Offers any level of depth in any phase
  • Round 7 – Makes the team

This list is helpful in its clear definitions and easy accessibility. The room for nuance means folks might disagree, but spotting the difference between a functional starter and a rotational player is something anyone can do.
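As a rough illustration, the letter-grade portion of Ant’s rubric can be written down as a simple lookup. This is only a sketch under stated assumptions: the names `GRADES`, `MIN_GRADE_BY_ROUND`, and `meets_rubric` are hypothetical, the low end of each round’s grade range is treated as the passing bar (the rubric says “or above” only for Rounds 3 and 4), and the second-contract, role, and depth-quality criteria are left out entirely.

```python
# Letter grades ordered from lowest to highest (illustrative list).
GRADES = ["D", "C-", "C", "C+", "B-", "B", "B+", "A-", "A", "A+"]

# Assumption: the low end of each round's grade range is the passing bar.
MIN_GRADE_BY_ROUND = {1: "B+", 2: "B", 3: "B-", 4: "C+"}


def meets_rubric(draft_round, grade=None, made_team=False):
    """Return True if a pick clears the simplified bar for its round."""
    if draft_round in MIN_GRADE_BY_ROUND:
        if grade is None:
            return False  # Rounds 1-4 need a grade to be judged here.
        minimum = MIN_GRADE_BY_ROUND[draft_round]
        return GRADES.index(grade) >= GRADES.index(minimum)
    if draft_round in (5, 6, 7):
        # Rounds 5-7 all start from "makes the team" in the rubric.
        return made_team
    raise ValueError(f"Unknown draft round: {draft_round}")


# Hypothetical picks, not real evaluations.
first_round_hit = meets_rubric(1, grade="A-")
first_round_miss = meets_rubric(1, grade="B")
seventh_round_hit = meets_rubric(7, made_team=True)
```

Reducing the rubric to a threshold like this throws away exactly the room for overlap and discussion that makes it useful, so treat it as a starting point for an argument rather than a verdict.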

A first issue with these grading tools is they offer little to no scope of time. For a fourth-round pick, at what point does the player need to have become a role player or spot starter? Year two? Year four? If we need to wait four years before a player has earned a second contract or had their fifth-year option exercised, then draft evaluation will have to occur after a lot of the front offices that made the picks have already been fired. Again, that statement is hyperbolic, but if there isn’t a time element baked into the method, its usefulness is diluted.

A second issue is the potential for subjectivity. The component scores that generate the overall grade are subjectively assessed, as all film grades are, and sometimes the grade is a product of the evaluator as much as the evaluatee. This is also a good opportunity to review the range of objectivity to subjectivity in statistics and metrics. The graphic below from an old Nick and Nolan show on Buffalo Rumblings is an excellent overview of how numbers can move from essentially inarguable facts to very close to pure opinion.

Find the pod here: Nick and Nolan – Methods of Measurement.

Draft Capital

There are a variety of ways to capture draft capital, beginning with the now long-fabled Jimmy Johnson trade chart, followed by the Rich Hill chart, and the Chase Stuart chart. They each have pros and cons but their goal is important – find translatable values for draft picks. Football Outsiders has done extensive work, and their results can be found in Ben Ellinger’s NFL Drafting Efficiency article from 2020.

Ellinger’s article begins with determining each team’s amount of draft capital based on the Chase Stuart chart. Each pick a team has that year is assigned a value (there are small differences year over year because of the compensatory pick formula), and then each team’s total is divided by the total for the full draft, generating a percentage of draft capital for each team. For the return on how each team spent those draft resources, they took the Career AV of every draftee, summed it for each team’s draftees from that particular year, then divided each team’s total by the total Career AV from that year’s draft. This process created a percentage depicting how much of the possible AV from that year’s draft each team garnered. Football Outsiders went the final step and divided draft return by draft capital, year by year.
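The arithmetic described above can be sketched in a few lines of Python. This is a minimal sketch, not Football Outsiders’ actual code: the function name `draft_efficiency`, the tuple layout, and every number in `sample_picks` are made up for illustration, and real inputs would be Chase Stuart chart values and Career AV from Pro Football Reference.

```python
from collections import defaultdict


def draft_efficiency(picks):
    """Return each team's (share of AV) / (share of draft capital).

    picks: iterable of (team, pick_value, career_av) for ONE draft year.
    """
    capital = defaultdict(float)
    career_av = defaultdict(float)
    for team, pick_value, av in picks:
        capital[team] += pick_value
        career_av[team] += av

    total_capital = sum(capital.values())
    total_av = sum(career_av.values())

    efficiency = {}
    for team in capital:
        capital_share = capital[team] / total_capital  # % of draft capital
        return_share = career_av[team] / total_av      # % of draft return
        efficiency[team] = return_share / capital_share
    return efficiency


# Made-up values for three hypothetical teams in one draft year.
sample_picks = [
    ("Team A", 30.0, 40),
    ("Team A", 10.0, 20),
    ("Team B", 20.0, 10),
    ("Team B", 20.0, 10),
    ("Team C", 20.0, 20),
]
efficiency = draft_efficiency(sample_picks)
for team, ratio in sorted(efficiency.items()):
    print(f"{team}: {ratio:.2f}")
```

A ratio above 1 means a team captured a larger share of the draft’s total AV than its share of draft capital would predict, and a ratio below 1 means the opposite. In the made-up data, Team A holds 40% of the capital and earns 60% of the AV, so it comes out at roughly 1.5.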

Maybe you already noticed the difficulty in this method, but, if not, basing all of this work on AV is problematic. By its own definition, AV is not intended to be player-to-player specific, yet comparing drafts in this methodology essentially boils down to comparing player to player.

(Side note: Other than an outlier in 2015, notice how the Bills’ drafts began performing much better on average in 2017. Baseline organizational competence is so nice.)

Total Points Earned

Total Points Earned (TPE) is an SIS metric based on Expected Points Added (EPA), which was discussed in some detail in my piece on the Top 5 Plays of the 2022 Season by EPA. TPE uses an EPA-like formula where, on each play, average play scores a zero, poor play creates a negative score, and unusually good play scores positive. The scores are then scaled, and you can read more about that process in A Primer on Total Points.

TPE is obviously a more thorough metric than AV, encompassing more of how a player has performed specifically. TPE is still a relatively young metric, having been introduced in 2018 and only going back to 2016. That timeline works in our favor though, since the timeline we are primarily interested in starts in 2017. ESPN recently used TPE to grade the 2022 draft classes, but it wasn’t adjusted by invested draft capital. That venture is a big part of what we’ll look at next.

Conclusion

The draft evaluation methods we have available have their relative strengths and weaknesses, but there might be better tools if we can find ways to combine some of the ever-evolving advanced analytics now available. Next week, we’re going to look at combining TPE with Chase Stuart’s draft chart to see if we can derive a more thorough and specific draft value comparison tool for the quantitative side of the evaluation. For the qualitative side, we’ll see if we can build a series of qualifiers position by position to generate an exhaustive evaluation rubric. The calculations are not complete yet, so, if it doesn’t work out, next week’s piece might be really short and amount to an admission of defeat. If it does work though, we might have a more universal tool for assessing good versus bad drafts.

You can find Chris on Twitter (@lowbuffa), getting dirty in #MafiaGardens, or watching football. Go Bills!
