Expert opinion or simple model: Which is better?

I saw a very interesting talk at work today about decision making in oil and gas businesses, and thought it had some pretty neat applications for decision making in general. I’d just like to summarise the research of David Newman, who is completing his PhD at the University of Adelaide in the Australian School of Petroleum and has 35 years’ experience in the oil and gas industry and in decision making. Unfortunately I don’t have full references for a lot of the work due to the format of the presentation, but I have tried to provide credit where possible.


The premise is that oil and gas projects (the exploration, development, drilling and production of petroleum) often fail, in hindsight, to achieve their promised economic outcomes. Research has shown that a good predictor of outcomes is the level of front end loading (FEL), that is, the exploration, feasibility studies and analysis completed by the final investment decision (FID), the point at which the full blown project is given the final go-ahead.

The value of FEL is well known and many individuals and companies advocate its use, but in practice it is either not used or used poorly. More commonly, expert opinion is relied upon instead. A typical situation is an expert overruling a piece of analysis by claiming that this project in particular is somehow ‘different’ or ‘unique’ compared to other projects.

As we know from research in the non-profit sector, expert opinion is very often wrong and is not a substitute for data and analysis, so it is no surprise that it holds little value in other industries as well.

However, Newman proposes that expert opinion may be a viable substitute if and only if it passes four tests (a rough checklist sketch follows the list):

  • Familiarity test – Is the situation similar to previous known examples?
  • Feedback test – Is there good ongoing feedback on the accuracy of the opinion? If evidence emerges that expert opinion is not working for the given situation, review it immediately. This is notoriously difficult for projects with multi-year lifespans, such as oil and gas projects and charity programs.
  • Emotions test – Is there a possibility that emotions are clouding the expert’s judgement?
  • Bias test – Is there a possibility that the expert is succumbing to some kind of bias? It is hard to be a dispassionate expert on an issue.
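
To make the tests concrete, here is a minimal sketch of how they might be used as a gate before relying on expert opinion. This is my own illustration, not Newman’s; the field names and the all-four-must-pass rule are simply a literal reading of the list above.

```python
# A minimal sketch (my own, not Newman's) of treating the four tests as a gate:
# expert opinion is only relied upon if every test passes.

from dataclasses import dataclass


@dataclass
class ExpertOpinionTests:
    familiar_situation: bool   # Familiarity: similar to previous known examples?
    good_feedback_loop: bool   # Feedback: accuracy can actually be checked over time?
    emotions_unlikely: bool    # Emotions: judgement unlikely to be clouded?
    bias_unlikely: bool        # Bias: no obvious bias at play?

    def opinion_is_viable(self) -> bool:
        """Expert opinion is a viable substitute only if all four tests pass."""
        return all([
            self.familiar_situation,
            self.good_feedback_loop,
            self.emotions_unlikely,
            self.bias_unlikely,
        ])


# Example: a multi-year project with a weak feedback loop fails the gate.
print(ExpertOpinionTests(True, False, True, True).opinion_is_viable())  # False
```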

There is a belief that data and models only predict outcomes better than expert opinion if they are complex and advanced. Meehl’s work shows that even simple models beat expert opinion in the majority of cases: around 60% of comparisons showed the simple model was better, and the majority of the remaining 40% showed something close to a draw.

To understand the phenomena at play, Newman and his colleagues interviewed 34 senior personnel from oil and gas companies with an average of over 25 years’ experience in the industry. The personnel were a mix of executives (vice president level or equivalent), managers and technical professionals (leaders in their own disciplines).

The survey data showed that ~80% saw FEL as very important and ~10% as important, with none saying it was not important.* However, none of those surveyed use the results from FEL as a hard criterion; that is to say, none are willing to approve or reject a project based on FEL data alone. Many used FEL as a soft criterion, in that it guided their final decision but had no veto power. The results of this survey are not statistically significant due to the small sample size, but according to Newman they may be seen as indicative.

Interestingly, the executives tended to rate their understanding of the technical details of projects higher than the actual technical experts did. Either the executives are overconfident, the technical staff are underconfident, a combination of both, or, seemingly less likely, the executives really are more competent in technical matters.

Newman proposes the following set of solutions to overcome the problems discussed here.

Apply correction factors to predict likely outcomes based on FEL benchmarking (comparison to other projects). This is difficult in oil and gas due to the differing nature of projects, and is expected to be a problem for charity programs as well. It might be worthwhile looking at programs that have done similar work, or at least previous programs within the same organisation, in an attempt to benchmark.

Benchmarking can take the form of a checklist scored against set criteria. For example, a dispassionate outsider can be brought in to answer pre-determined questions and provide an assessment based on data (and only data, without interpretations) from the team. They might also rate individual categories as poor, fair, good or best.

The adjustment factors will vary significantly between different types of projects, but the table below provides an example for two factors, cost and schedule, as rated by an external auditor. If the schedule has been rated as poor, meaning schedule pressure is likely biasing results (being behind schedule makes staff more likely to say the project is complete), you should adjust the relevant estimate by a scalar of 1.1-1.5 (or its inverse). My interpretation is that if long-term costs are expected to be $100/week and a scalar of 1.4 is selected because the project is behind schedule, the true cost should be estimated as $140/week. The ranges are examples only; the ideal values for a given type of project can only be determined through extensive analysis of that type of project, which makes this kind of adjustment hard to apply meaningfully if substantial data isn’t available.

Rating | Cost        | Schedule
Best   | 0.9 - 1.15  | 0.9 - 1.15
Good   | 0.95 - 1.2  | 0.95 - 1.25
Fair   | 1.0 - 1.3   | 1.05 - 1.4
Poor   | 1.05 - 1.45 | 1.1 - 1.5
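
To make the arithmetic concrete, here is a small sketch of how these correction factors might be applied. This is my own illustration, not part of Newman’s work; the ranges are taken from the example table above, and falling back to the midpoint of a range when no specific scalar is chosen is just a placeholder assumption.

```python
# A sketch of applying FEL-based correction factors to a raw estimate.
# Ranges come from the example table above; they are illustrative only.

ADJUSTMENT_RANGES = {
    "cost":     {"best": (0.9, 1.15), "good": (0.95, 1.2),  "fair": (1.0, 1.3),  "poor": (1.05, 1.45)},
    "schedule": {"best": (0.9, 1.15), "good": (0.95, 1.25), "fair": (1.05, 1.4), "poor": (1.1, 1.5)},
}


def adjust_estimate(raw_estimate, factor, rating, scalar=None):
    """Scale a raw estimate by a correction factor for the given rating.

    If the assessor has not chosen a specific scalar, the midpoint of the
    range is used as a placeholder."""
    low, high = ADJUSTMENT_RANGES[factor][rating]
    if scalar is None:
        scalar = (low + high) / 2
    if not low <= scalar <= high:
        raise ValueError(f"scalar {scalar} is outside the {rating} range {low}-{high}")
    return raw_estimate * scalar


# The $100/week example from the text: schedule rated poor, scalar of 1.4 chosen.
print(adjust_estimate(100, "schedule", "poor", scalar=1.4))  # 140.0
```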

Apply post-mortem analyses, or reviews of projects after completion.

Apply pre-mortem analyses. This involves asking everyone involved in the project to imagine that the project has concluded its life and a disaster has occurred; they are then asked to propose why the project failed. This increases the chances of identifying key risks by 30% (no source beyond Newman for this unfortunately, but it’s a huge result). The reasoning is that it legitimises uncertainty and makes staff more likely to raise obscure lines of thought, or things that might be considered rude to bring up under different circumstances; calling a team member’s work a risk would be uncomfortable in most other situations.

I’d be interested to see some of these techniques applied more in non-profits and EA organisations, if they aren’t already, especially the pre-mortem technique. If the data is to be believed, it is a highly effective exercise. I’m also interested to hear your thoughts on how they could be applied, or whether you think they are useful in the first place.

Again, there are several references to the work of other researchers that I would have loved to include, but was unable to as they were not provided in the presentation.


*In my personal opinion, the way these surveys are structured may introduce some bias itself. For example, the four choices for this part of the survey were ‘very important’, ‘important’, ‘neutral’ and ‘not important’. It seems unlikely that anyone perceived to be an expert would rate a concept known to be important as anything less than ‘very important’.