Getty Images/Ringer illustration

How to Avoid a Box Office Disaster

We use analytics to predict baseball, elections, and even pop music. But can an algorithm foretell a movie’s success or failure, even before it’s made?

Welcome to Future of Movies Week. Too often this year we’ve been left baffled at the multiplex. It’s been 10 months, and we’re struggling to come up with a viable top-10 list. Streaming platforms are encroaching on Hollywood’s share of our collective attention, preexisting intellectual property is providing diminishing returns, and moviegoers largely skipped Jack Reacher: Never Go Back. Wild days.

November will be different. It’s packed with interesting releases — Oscar contenders like Loving and Arrival and Manchester by the Sea, blockbusters from Marvel (Doctor Strange) and J.K. Rowling (Fantastic Beasts and Where to Find Them), a Disney movie with Lin-Manuel Miranda and the Rock (Moana), and old-fashioned fare from big-name directors like Robert Zemeckis (Allied) and Warren Beatty (Rules Don’t Apply).

This week, we’re looking at the future — of film school, horror, the Marvel Universe, movie stars, and the medium itself.

Do you remember when you realized that 2013’s supposed-to-be-blockbuster R.I.P.D. wouldn’t be a big hit? Maybe it was when you watched the trailer and saw yet another city get CGI-destroyed. Maybe it was when you spotted the posters, which spelled out "Rest in Peace Department" (as if we wouldn’t get it). Maybe it wasn’t until you read the rest of this sentence: R.I.P.D., which cost $130 million to make, earned a 13 percent rating on Rotten Tomatoes, ranked seventh in sales on its opening weekend, and grossed only $33.6 million domestically.

Joshua Lynn knew R.I.P.D. would be DOA — as every review’s headline declared — before you did, with a high degree of certainty. He may have known even before R.I.P.D.’s distributor, Universal Pictures. At least six months before the movie came out, Lynn, the president of West Hollywood–based Piedmont Media Research, looked at a spreadsheet and saw an unexpectedly low number next to R.I.P.D.: 137. To him, that number told a disconcerting story. "It came in scoring at, like, half of what a typical movie of that size and scope ends up coming in at," Lynn says.

As Wesley Morris wrote at the time, "R.I.P.D. isn’t an accident. Someone said ‘yes.’" That "yes" didn’t have to happen. "Instead of guessing and looking at R.I.P.D. and kind of thinking like, ‘Oh yeah, this will be fantastic, because it’s just like Ghostbusters and Men in Black,’ you can actually find out beforehand," Lynn says. For the right price, he’s happy to send out his surveys, dissect the responses, and forecast the future.

"Once we get the results in for any given study," Lynn says, "I know that we have specific information and knowledge about something potentially incredibly major — and where many millions of dollars are on the line — that no one else necessarily knows." If that sounds a little like a thriller, in which the protagonist sees disaster looming but is powerless to stop it, Lynn won’t mind. His stats say that thrillers sell well.

In his seminal 1983 memoir, Adventures in the Screen Trade, screenwriter William Goldman disclosed what he believed to be "the single most important fact, perhaps, of the entire movie industry." He wrote it in all caps, twice on the same page, one paragraph apart: "NOBODY KNOWS ANYTHING." Below, he narrowed "anything" down: "Nobody, nobody — not now, not ever — knows the least goddam thing about what is or isn’t going to work at the box office."

Goldman quotes a former studio executive, the ironically surnamed David Picker, as saying, "If I had said yes to all the projects I turned down, and no to all the ones I took, it would have worked out about the same." In Goldman’s experience, studio heads were no more effective than mutual-fund managers. Some had hot streaks, but none could keep beating the market. It wasn’t because they were bad at their jobs; it was because their jobs required the wisdom of crowds, not the wisdom of one person. "They’re trying to predict public taste three years ahead and it’s just not possible," Goldman wrote.

It’s hard enough to predict public taste in the present; even the Romans knew there was no accounting for it. I couldn’t stand Synecdoche, New York, but you may have admired it (although we probably both agreed that it wouldn’t do well at the box office). Imagine, then, trying to predict what the world will think of an unmade movie’s box office revenue, months before its first dailies. If you were facing that sort of uncertainty, you might want to rely less on the people operating on only feel — especially because, as Blumhouse president of feature films Couper Samuelson told my colleague Chris Ryan this week, "there are fewer and bigger movies being made," which means missing is more costly than ever.

Turn on a New York Yankees radio broadcast, and you can still catch 78-year-old play-by-play man John Sterling insisting that "you can’t predict baseball." When he started calling baseball games, right around the time Goldman was completing his memoir, Sterling was probably right. Now, of course, you can predict baseball; not nearly perfectly, but better than the typical pundit consistently could by eyeballing Opening Day rosters.

As our machines have in some ways surpassed us, we’ve become accustomed to seeking out algorithms for everything. It’s comforting to find concrete reasons why some efforts flourish and others fizzle. Hit Song Science and its successors try to predict popular music. The Bestseller Code claims to detect which books will break through. Politics sites, whose election models update hourly, reassure you that the presidential candidate you dislike the least will win (although it’s hard to tell which to trust when their predictions and methodologies differ).

In Piedmont, Hollywood has its equivalent, although Lynn’s algorithm has a human heart. He cofounded Piedmont in mid-2011 and spent two years building up a database big and refined enough not to fall prey to overfitting. "Film has a lot of variables," he says. "So with more inputs, you need a larger number of films in order for the model to be accurate." The company now has about 10 total employees. Lynn won’t disclose his clients’ identities, but he claims to have worked with "several of the major studios" and "many other finance/production entities," as well as individuals. A studio might consult him in preproduction to decide whether to green-light an idea, or, later, to crowdsource a marketing campaign; a writer might solicit advice on how to package a script, and a director might ask for help picking or casting a project.

As always, the cinema is under assault from competing entertainment. Domestic ticket totals have been flat or falling for the past 15 years, and more and more small screens are stealing potential sales. As Michael D. Smith, coauthor of Streaming, Sharing, Stealing: Big Data and the Future of Entertainment, told Yahoo Finance in September, streaming services are "using the algorithms not so much to change the content, but to take the content that exists and market it to exactly the right audience." Smith’s segment was called "Big Data to Disrupt Hollywood." However, Hollywood could use data to disrupt right back — not with AromaRama, but by steering clear of some future big-budget bombs like the many they failed to defuse in 2016.

Studios have focus-grouped films and gathered postrelease reactions for decades. They still do, using services ranging from the mainstay, CinemaScore (which distributes surveys to moviegoers after a film is in theaters), to newcomers like Canvs, which scans social media for emotional responses attached to certain titles. Lynn believes he can do better than that. Whereas other surveyors have to wait for a finished (or near-finished) film to get feedback, Lynn can work with a figment, testing the strength of the audience’s connection to a concept alone and then assessing how it varies with tweaks to the director and cast.

Once a client commissions a survey, Piedmont distributes its proprietary questions to approximately 3,000 respondents, whose makeup is meant to mirror the demographics of the moviegoing public. That process, which typically takes one to two weeks, produces a Consumer Engagement score, which forms the foundation of the predictive model. The CE tells Lynn how the movie’s elevator pitch resonates with the respondents as a unit — and as smaller cohorts splintered by sex, age, income, education, geographic location, and race, some of which have disproportionate impacts in the model — although if he wants to know why a movie yielded the reaction it did, he has to follow up with additional inquiries.

One of his conclusions is that very few actors are reliable draws, independent of the projects they pick. "Denzel Washington is about the only actor whose name adds value in a big way consistently, no matter what it is he’s in," Lynn says. For other stars, the lift varies by project. "[Sandra Bullock] and Melissa McCarthy in The Heat, it was a 90 percent jump with their names together in it," he says. "But Sandra Bullock and George Clooney in Gravity, that was about a 17 percent jump with their names. You’re able to really see where most of the value is for a project. In the case of Gravity, most of the value actually came in what that concept was."

With rare exceptions — Spielberg, Tarantino — the same goes for directors. "The real value of the director is being able to put out a better product, meaning, something that might score a 90 percent on Rotten Tomatoes versus something that might score a 40 percent on Rotten Tomatoes," Lynn says.

The interpretation of a movie’s CE score depends on its budget; a movie with a modest CE score that cost much less to make could be a bigger success, relative to expectations, than a film with a higher CE that was still dwarfed by its budget. A successful summer blockbuster should score at least a 250, a line of demarcation that separates Jason Bourne (370), Star Trek Beyond (364), Ghostbusters (338), and The Legend of Tarzan (292) from Warcraft (212), The BFG (204), and Ben-Hur (178). "Horror films tend to score higher than other genres in general," Lynn says, which backs up the Blumhouse formula; Don’t Breathe’s concept scored a 240, while Lights Out checked in at 214. Comedies also do well, although they’re especially volatile, suffering steep penalties when reviews report that they don’t deliver laughs. Dramas (130–160) and indie films (below 130) bring up the rear. Even Oscar notoriety can’t save some films with low CE scores; Whiplash (97) and Nebraska (74) didn’t cost much to make, but they weren’t windfalls, either.
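The 250 cutoff can be sketched in a few lines of Python. The scores are the ones quoted above; the function and variable names are hypothetical, and this is only an illustration of the threshold, not Piedmont’s actual model, which also weighs budget and other inputs.

```python
# CE scores for 2016 summer releases, as quoted in the article.
SUMMER_CE_SCORES = {
    "Jason Bourne": 370,
    "Star Trek Beyond": 364,
    "Ghostbusters": 338,
    "The Legend of Tarzan": 292,
    "Warcraft": 212,
    "The BFG": 204,
    "Ben-Hur": 178,
}

# The article's stated bar for a successful summer blockbuster.
BLOCKBUSTER_THRESHOLD = 250


def clears_blockbuster_bar(ce_score, threshold=BLOCKBUSTER_THRESHOLD):
    """Return True if a CE score meets the summer-blockbuster cutoff."""
    return ce_score >= threshold


# Split the titles on the cutoff, preserving the order they were listed in.
above = [t for t, ce in SUMMER_CE_SCORES.items() if clears_blockbuster_bar(ce)]
below = [t for t, ce in SUMMER_CE_SCORES.items() if not clears_blockbuster_bar(ce)]

print("At or above 250:", above)
print("Below 250:", below)
```

A single cutoff like this only applies to films of blockbuster size; as noted above, a modest score attached to a small budget can still be a good result.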

Lynn says that Piedmont’s CE scores have a 0.73 correlation to opening-weekend dollars (where 1.0 would indicate a perfect, positive relationship between the two), and only a slightly lower correlation to total box office earnings. One weakness of this stat is that it doesn’t tell us what the typical studio exec would get just by going with his or her gut; what we’d really like to know is what value the model provides over a replacement-level Hollywood lifer. Still, Lynn points out that based on data on every major release dating to 1995, CE correlates more closely to opening-weekend box office than production budget (0.58), screen count (0.57, which itself has a close correlation to budget), "quality" as measured by review aggregators (0.35), whether the movie is a sequel (0.22), and whether it’s rated R (minus-0.21; the more exclusive the rating, the worse the box office forecast). The model’s "r-squared," or coefficient of determination, says that its inputs explain more than 86 percent of the observed variation in box office results.
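To make the two statistics concrete, here is a minimal Python sketch of how a correlation coefficient is computed and how it relates to r-squared. The sample numbers and the function name pearson_r are invented for illustration; they are not Piedmont’s data or code. One thing the arithmetic shows: a single input with a 0.73 correlation would, on its own, account for about 53 percent of the variation (0.73 squared), so the 86 percent figure presumably describes the full model with all of its inputs rather than CE alone.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 items")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


# Invented toy data: survey-style scores and opening weekends in $ millions.
toy_scores = [100, 150, 200, 250, 300]
toy_openings = [10, 12, 30, 28, 45]

r = pearson_r(toy_scores, toy_openings)
r_squared = r ** 2  # share of variation explained by this one input

print(f"toy r = {r:.2f}, toy r-squared = {r_squared:.2f}")

# The article's figure: a 0.73 correlation for CE by itself.
implied_share = round(0.73 ** 2, 2)
print(f"a 0.73 correlation alone explains about {implied_share:.0%}")
```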

Lynn doesn’t claim to be as infallible about film as, say, Meat Loaf claims to be about fantasy football. He acknowledges that unforeseen circumstances can skew a movie’s stats. Fantastic Four, for instance, scored a solid 326 CE but has a 9 percent rating on Rotten Tomatoes, a problem compounded by a tweet from director Josh Trank, who all but disowned the film’s final cut on the eve of its opening. Lynn believes that if Fantastic Four had been "decent," it would have more than doubled its $25.7 million opening-weekend gross.

On the other end of the spectrum, Mad Max: Fury Road (213 CE) recorded a 97 percent Rotten Tomatoes rating and roughly doubled the low-to-middle $20 million opening-weekend total that Piedmont projected. "Movies that score on the very higher end in terms of quality can often shoot past their linear expectations," says Lynn, who also puts Deadpool and Guardians of the Galaxy in that group. "This happens about 5–10 times at most a year, where a major movie is just rated that well and you see it snowball in that way." Unbridled enthusiasm from one subset of the audience can also increase earnings; Assassin’s Creed, which comes out in December, has a 345 overall CE but a 559 CE with males 12–17 (who’ll be able to buy their way in, thanks to its PG-13 rating).

Piedmont sends out its surveys at least six months before the films in question reach theaters, although when working with clients, they’ve predated debuts by two years or more. Lynn says the results aren’t less reliable at any stage before the audience becomes widely aware of the movie, assuming its cast avoids scandals and its premise doesn’t come uncomfortably close to horrific real-life events that precede its release. "Literally, you could make up an idea right now and I can get the numbers back to you within days," he says. "And unless something changed about what we tested in the time until release, they should be exactly the same."

One might think that the spectacular downfall of the original Relativity Media, the now-notorious company Ryan Kavanaugh founded in 2004, would make Lynn’s message less likely to be heard. Like Lynn, Kavanaugh — a broker/financier who willed a self-contained but unsustainable studio into existence — presented a vision of an objective solution to the Goldman dilemma, and his skills as a salesman fueled a rapid rise. Kavanaugh, though, was a glad-hander, a profligate spender, and an unstable businessman. Worse, he proved to be something of a huckster, with a cherry-picked Monte Carlo model that the company manipulated to get deals done. Relativity filed for Chapter 11 last year, although Kavanaugh is trying to make a comeback. "I think people are able to separate out the issues with Relativity with the idea of modeling, in general," Lynn says. "I really haven’t sort of run into issues on that front." It’s hard to assess how embraced he’s been, because studios don’t have any incentive to say. (Multiple studios declined or didn’t respond to The Ringer’s requests for comment.) No one who’s skeptical wants to look like a Luddite, and no one who’s sold wants to surrender what could be a competitive advantage. "The industry is sort of opening up a little bit, but I think it’s opening up cautiously," Lynn says.

Perhaps the greater challenge is that some execs whose careers began before the algorithmic era perceive Piedmont’s statistical techniques as a threat, even though they’re based on the responses of actual people — stats and scouts, as the sabermetricians would say. "If you’re somebody who makes a living off of your great taste and your ability to pick, and you’re known for that, this is sort of a threat to what you do," Lynn says. "A lot of this industry is run on opinions … There is a sort of confirmation bias that happens a lot, where people will tend to remember the things they were involved with that went well, and then if there was something that they put a certain actor in a project and it didn’t do well, they’ll sort of discount that."

If Piedmont’s approach gains currency, it won’t mean the end of unprofitable movies. Producers will still make passion projects, and studios will still take short-term losses with long-term gains in mind. "Sometimes a movie will get made that a studio will know will likely not do as well as they needed it to, but it’s starring Tom Cruise, and they want to be in the Tom Cruise business," Lynn says. "They’ll toss somebody a little something just to know that they will have the next one, which will more than cover it."

Nor does Lynn believe there’s a specter of sameness lurking within his outwardly innocent innovation. Audiences value variety, and he doesn’t think greater awareness of CE scores would encourage studios to make more movies from the same mold. "I think what you might see is smarter allocations of money so that you’re not putting out $200 million on a wing and a prayer, thinking other, similar movies have done this well so we’re looking about the same," Lynn says. "It’s giving you a smarter sense of where your starting-off point is. So that movie may still get made, but now for $120 million or $80 million, as opposed to $200 million." In other words, Oscar bait will still exist alongside filler like R.I.P.D. The schlock just won’t have as big a budget.

Lynn doesn’t denigrate the Goldman maxim. "I don’t want to say that that’s not true and that we know stuff," he says. "It’s not that I have a secret whatever, it’s not that I know more than the next guy. [What] I think doesn’t matter. I’m just a Caucasian male, 41 years old, living in Los Angeles. It’s just that we’re looking at … a representative sample across the ticket-buying population for movies."

Goldman closed his "NOBODY KNOWS ANYTHING" section with one last assertion. For studio execs, he wrote, insomnia goes with the territory. Lynn is trying to sell them a better night’s sleep.
