Combining forecasts


I have been suggesting that the best statistical approach, when confronted with conflicting signals such as the employment estimates from the BLS payroll survey, the separate BLS household survey, or the huge database from the private company Automatic Data Processing, is not to selectively throw some of the data out but rather to combine the different measures. Judging from some of the comments this suggestion has received at Econbrowser, Calculated Risk and Outside the Beltway, I thought it might be useful to say a little more about the benefits of combining forecasts.

Suppose we have available two polls that have surveyed voters for a particular election. The first surveyed 1,000 voters, and found that 52% of those surveyed favored candidate Jones, with a margin of error of plus or minus 3.2%. [By the way, in case you’ve forgotten your Stat 101, those margins of error for purposes of evaluating the null hypothesis of no difference between the candidates can be approximated as (1/N)^0.5, or 0.032 when N = 1,000.] The second poll surveyed 500 voters, of whom 54% favored candidate Jones, with the margin of error for the second poll of plus or minus 4.5%. Would you (a) throw out the second poll, because it’s less reliable than the first, and (b) then conclude that the evidence for Candidate Jones is unpersuasive, because the null hypothesis of no difference between the candidates is within the first poll’s margin of error?

If that’s the conclusion you reach, you’re really not making proper use of the data in hand. You should instead be reasoning that, between the two polls, we have in fact surveyed 1,500 voters, of whom a total of 520 + 270 = 790 or 52.7% favor Jones. In a poll of 1,500 people, the margin of error would be plus or minus 2.6%. So, even though neither poll alone is entirely convincing, the two taken together make a pretty good case that Jones is in the lead.
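For readers who want to check the arithmetic, here is a minimal sketch of the pooling calculation. It uses the same rough (1/N)^0.5 margin of error as above; the function and variable names are mine, chosen for illustration.

```python
import math

def margin_of_error(n):
    """Rough margin of error used in the post: (1/N)^0.5."""
    return math.sqrt(1.0 / n)

# (sample size, share favoring Jones) for each poll in the example
polls = [(1000, 0.52), (500, 0.54)]

# Pool the two polls as if they were one big sample.
total_n = sum(n for n, _ in polls)
total_for_jones = sum(n * share for n, share in polls)
pooled_share = total_for_jones / total_n
pooled_margin = margin_of_error(total_n)

print(f"pooled share favoring Jones: {pooled_share:.3f}")
print(f"pooled margin of error:      {pooled_margin:.3f}")
print("lead exceeds margin:", pooled_share - 0.5 > pooled_margin)
```

Jones's pooled lead over 50% is about 2.7 points, which sits just outside the 2.6-point margin for the combined sample.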

In the above example, it’s pretty obvious how to combine the two polls, just by counting the raw number of people covered by each poll and then combining the two as if it were one big sample. But this example illustrates a statistical procedure that works in more general settings as well. We have two different estimates, 0.52 and 0.54, of the same object. We know that the variance of the first estimate is (0.5)^2/1000, while the variance of the second estimate is (0.5)^2/500 [again, does that sound familiar from Stat 101?]. If we followed the general principle of taking a weighted average of the two, with weights inversely proportional to the variances, that would mean in this case calculating [(1000)(0.52) + (500)(0.54)]/(1000 + 500) = 0.527, which amounts to combining the two estimates in exactly the way that common sense requires for the two-poll example. That principle, of taking a weighted average of different estimates, with weights inversely proportional to the sampling variance of each, turns out to be a good way not just to combine two polls but also to combine independent estimates that may have come from a wide range of different statistical problems.
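The inverse-variance rule can be sketched in a few lines. This is only an illustration under the assumption that the estimates are independent and unbiased; the helper name `inverse_variance_combine` is hypothetical.

```python
def inverse_variance_combine(estimates, variances):
    """Weighted average with weights inversely proportional to each variance.

    Assumes the estimates are independent and unbiased. Returns the combined
    estimate and its variance.
    """
    precisions = [1.0 / v for v in variances]
    total_precision = sum(precisions)
    combined = sum(p * e for p, e in zip(precisions, estimates)) / total_precision
    return combined, 1.0 / total_precision

# The two polls from the example: shares and their sampling variances.
estimates = [0.52, 0.54]
variances = [0.5**2 / 1000, 0.5**2 / 500]

combined, combined_variance = inverse_variance_combine(estimates, variances)
print(f"combined estimate: {combined:.3f}")
print(f"combined variance: {combined_variance:.6f}")
```

The combined variance works out to (0.5)^2/1500, the same figure you would get by treating the two polls as a single sample of 1,500.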

But what if the second poll not only covered fewer people, but is also less reliable because it is a week older? One way to think about the issue in that case is to notice that the second poll’s estimate differs from the true population proportion because of the contribution of two terms. The first is the sampling error in the original poll (correctly measured by the (0.5)^2/500 formula), and the second is the change in that population proportion over the last week. If we knew the variance governing how much public preferences are likely to change within a week, we would just add this to the sampling variance to get the total variance associated with the second estimate, and use this total variance rather than (0.5)^2/500 to figure out how strongly to downweight the earlier poll. The earlier poll would then get much less weight than the newer one, but you’d still be better off making some use of the data rather than throwing it out altogether.
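To see how the downweighting plays out, here is a small sketch. The weekly drift figure (a standard deviation of 2 percentage points) is purely hypothetical, picked for illustration; nothing in the example tells us its true value.

```python
def inverse_variance_weights(variances):
    """Normalized weights inversely proportional to each total variance."""
    precisions = [1.0 / v for v in variances]
    total = sum(precisions)
    return [p / total for p in precisions]

sampling_var_new = 0.5**2 / 1000   # this week's poll, N = 1,000
sampling_var_old = 0.5**2 / 500    # last week's poll, N = 500
drift_var = 0.02**2                # HYPOTHETICAL variance of a one-week opinion shift

# Without drift, the older poll gets weight 500/1500 = 1/3.
weights_no_drift = inverse_variance_weights([sampling_var_new, sampling_var_old])

# With drift, the older poll's total variance is sampling variance plus drift variance.
weights_with_drift = inverse_variance_weights(
    [sampling_var_new, sampling_var_old + drift_var]
)

combined = weights_with_drift[0] * 0.52 + weights_with_drift[1] * 0.54
print(f"weight on older poll, no drift:   {weights_no_drift[1]:.3f}")
print(f"weight on older poll, with drift: {weights_with_drift[1]:.3f}")
print(f"combined estimate with drift:     {combined:.4f}")
```

Under that assumed drift, the older poll's weight falls from about a third to roughly 0.22, but it does not fall to zero.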

And what if you believe that one of the polls is systematically biased, but you’re not sure by how much? Many statisticians in that case might give you the OK to go ahead and ignore the second poll. On the other hand, there are many of us who would still want to make some use of that data, accepting some bias in the estimate in order to achieve a smaller mean squared error. In doing so, we acknowledge that we may make a systematic error in inference that you will avoid, but we will nevertheless be closer to the truth most of the time than you will if there are substantial benefits to bringing in extra data.
Examples where such an approach is quite well-established are estimating a spectrum (where we use the value of the periodogram at nearby frequencies, even though we know it would be a biased estimate of the spectrum at the point of interest) and nonparametric regression (where we use the value when x takes on values other than the one we’re interested in, even though again our assumption in doing so necessarily introduces some bias to the final estimate).
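The bias-for-variance trade can be made concrete with a short calculation. The sketch below assumes, hypothetically, that the second poll is off by 1 percentage point and that the two polls are independent; in practice the size of the bias is exactly what we are unsure about, so treat the numbers as illustrative only.

```python
def combined_mse(w, var_unbiased, var_biased, bias):
    """Mean squared error of w * unbiased + (1 - w) * biased (independent estimates)."""
    variance = w**2 * var_unbiased + (1 - w)**2 * var_biased
    squared_bias = ((1 - w) * bias)**2
    return variance + squared_bias

var_first = 0.5**2 / 1000    # unbiased poll, N = 1,000
var_second = 0.5**2 / 500    # possibly biased poll, N = 500
assumed_bias = 0.01          # HYPOTHETICAL bias of the second poll (1 point)

# The MSE-minimizing weight treats the squared bias like extra variance.
second_total = var_second + assumed_bias**2
w_best = second_total / (var_first + second_total)

mse_first_only = combined_mse(1.0, var_first, var_second, assumed_bias)
mse_combined = combined_mse(w_best, var_first, var_second, assumed_bias)

print(f"weight on the unbiased poll: {w_best:.3f}")
print(f"MSE using first poll only:   {mse_first_only:.6f}")
print(f"MSE using both polls:        {mse_combined:.6f}")
```

With these assumed numbers, the combined estimate is slightly biased but still has a lower mean squared error than the unbiased first poll on its own.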

Robert Clemen, in a paper in the International Journal of Forecasting in 1989, surveyed over 200 different academic studies and concluded:

Consider what we have learned about the combination of forecasts over the past twenty years…. The results have been virtually unanimous: combining multiple forecasts leads to increased forecast accuracy. This has been the result whether the forecasts are judgmental or statistical, econometric or extrapolation. Furthermore, in many cases one can make dramatic performance improvements by simply averaging the forecasts.

If I ask you what you think U.S. employment growth was in December, and your answer is the December BLS payroll number, one could say you have decided that the optimal weights to use for “combining” the payroll, household survey, and ADP estimates are 1.0, 0.0, and 0.0 respectively. But there’s an awful lot of statistical theory and practical experience to suggest those aren’t the best possible weights.

Or to put it another way, even though the payroll numbers were encouraging, the fact that ADP estimates that the U.S. lost 40,000 jobs in December should surely make you a little less confident about the robustness of employment growth than you otherwise would have been.

Thoughts on Weather Bill


See this article on the new firm, which will offer daily weather prediction market-like contracts on temperature and precipitation (our host mentioned them earlier today).

A couple thoughts:

1. I really don’t see that many accredited investors having so much exposure to daily weather risk that they’ll feel the need to hedge at prices they’ll assume imply negative expected value (given that both Weather Bill and the hedgies they offload risk on need their pounds of flesh).

  • Movie theaters are a nice example of someone who is long rain, but they are mostly owned by large chains. Portfolio theory suggests that public companies have no business paying to hedge risks that are not large enough to threaten bankruptcy.
  • The law of large #s helps people out with weather. Yes, rain is bad for a golf course, but really they care about a rainy summer (or decade) more than a rainy day. And memberships provide a means of offloading some of that risk on the golfers.

2. Weather isn’t the kind of thing that a lot of people think they know a lot about (unlike, say, sports and politics). So I’m not sure “betting” is going to save them.

3. Part of why the wholesale weather futures market hasn’t taken off and has devolved into an OTC affair is an absence of liquidity trading. Only a few big utilities have a real need to hedge temperature (many are still hedged by the regulatory environment they operate in), and in an open market, there is the worry that you’ll always be trading against someone with a better model than you.

4. What I think is most innovative is the idea of marketing a prediction market contract as “insurance.” But I’d have started with housing. Sell me “insurance” against a 10% or greater decline in the SF property market, and then dynamically hedge with the new CME futures.

Would prediction registries obtain 80% of the benefits of prediction markets?


Smart *** Mike Linksvayer (who has an opinion on anything) doubts that the number is that high. (How does he know??? How dare he contradict Robin Hanson, the master of the Universe???) What do you think, guys and gals?

(((And I renamed the Midas Oracle post category, “Track Records”, to read now “Prediction Registries”. See, even though I roast Mike Linksvayer publicly, he has a profound impact on moi.)))

Icons. Geniuses. Mavericks. – REDUX

Last time, I told you about TED 2007 and I noticed Nathan Myhrvold (former Microsoft CTO – he founded Microsoft Research) in the speaker list. That prompted me to visit his site, Intellectual Ventures, home of his “invention company”:

Intellectual Ventures is an invention company. We conceive and patent our own inventions in-house through a world-renowned staff of internal and external scientists and engineers. We also acquire and license patented inventions from other inventors around the world. Our network of invention sources includes: large and small businesses, governments, academia, and individual inventors. These inventions span a diverse range of technologies including: software, semiconductors, wireless, consumer electronics, networking, lasers, biotechnology, and medical devices. Our current focus is on developing our invention portfolio. Over time, we intend to market our portfolio on a broad and non-exclusive basis through a variety of channels including spin-out companies.

In the November issue of IP Investor (PDF), it is said that Intellectual Ventures has filed about 400 patents, and Nathan Myhrvold slams eBay for having filed only “10 patents or 11 patents”; big companies need patents “for the future”, he says. It would be interesting to know how many patents BetFair has filed. BetFair, like eBay, is a peer-to-peer exchange.

Of course, once you have transitioned from “invention” to “innovation”, the road does not end there: more than 75% of new products fail.

All this to come back to Robin Hanson’s invention, the concept of decision markets (a market-generated automatism to extract the wisdom of crowds and make it a better decision tool than managers or politicians), which he spams at each conference he is invited to. As I wrote two times, it’s a solution in search of a problem. I would see only two applications (last time I said “only one”, but I got a new idea):

– As I said, series of decisions that are NEVER taken by senior executives.

– In the coming two decades, imaginary universes (like Second Life) are going to meet with prediction markets. Once we are there, we can imagine a Second Life sub-universe filled with libertarians, playing with imaginary prediction markets. Then, yes, if the gamers agree, imaginary decision markets could rule —both at the macro and micro levels.

More Resources On Innovation:

The Business Innovation Insider blog – (highly recommended, especially to my marketing guru Guy Kawasaki)

Innovators share the lessons they’ve learned during 2006 – (a hodgepodge of ideas taken from various thinkers)

HSX = an advanced indicator for TradeSports traders??? – REDUX 2


This is a recent picture of traitor Mike Linksvayer (much better than the New York Times picture where he looked like an angry chimp):

[Photo: Mike Linksvayer]
--

First, Mike Linksvayer sided with moi:

[Hollywood Stock Exchange] could be an indicator that the [TradeSports] price is wrong and in which direction, which is enough for a [TradeSports] trader to place an order.

Then, as soon as I turned my back, he sold me down the river (I should have known!!!!!!):

I agree with everything you [you = Sacha Peter] write in the last comment (and your original challenge to CFM to put his money where his mouth is).

--

#1. HSX “prices” are predictive (see The effectiveness of pre-release advertising for motion pictures – PDF – by Anita Elberse and Bharat Anand – 2005-03-05):

Despite the fact that the simulation does not offer any real monetary incentives, collectively, HSX traders generally produce relatively good forecasts of actual box office returns (e.g. Elberse and Eliashberg 2003, Spann and Skiera 2003). According to Pennock, Lawrence, Giles and Nielsen (2001a; 2001b), who analyzed HSX’s efficiency and forecast accuracy, arbitrage closure on HSX is quantitatively weaker, but qualitatively similar, relative to a real-money market. Moreover, in direct comparisons with expert judges, HSX forecasts perform competitively.

#2. You can investigate accuracy and precision scientifically; no need to back one’s research paper with one’s money.

#3. My original question was: “Should I use HSX info to speculate at TradeSports?”. Looking at HSX and TradeSports odds, maybe The Departed is correctly priced or underpriced at TradeSports, and maybe Babel is underpriced at TradeSports.

Iran – Is Something Really Up?


Both Spook86 and Michael Ledeen suggested a few days ago that the USA might be adopting a stronger position towards Iran. Are we?

Look at Tradesports’ price history for its AIRSTRIKE.IRAN.DEC07 contract:

[Chart: Tradesports price history for the AIRSTRIKE.IRAN.DEC07 contract]

So what does this combination of an increase in stern American and British rhetoric, and stagnant odds in the geopolitical wagering market, mean? I think it’s clear. The rhetoric is most likely not intended as a prelude to action by us. It is intended as a substitute for action. This is business as usual and not at all encouraging.

(See also this post.)

Cross-posted at Chicago Boyz, an Intrade affiliate.

NewsFutures’ explainer on prediction markets

No definition, but there’s an example here (static, alas).

(((In passing, I see that NewsFutures “will use the Brookings Institution’s Iraq Index as a source to measure troop levels.” Good. But what if they fail to deliver, like the US DOD in the NKM scandal?)))


Prediction Markets for Science?


There’s a set of Robin Hanson slides that are much more interesting than the presentation he gave at Yahoo! Confab.

– It’s bigger (67 slides vs. 21) and it covers prediction market issues in a more comprehensive way (including MSR).

– The decision market concept takes a minor place, whereas at Confab, Robin Hanson made the mistake of focusing most of his speech on it. It is an interesting concept, but that was the wrong audience (Confab attendees wanted specific answers about basic prediction market questions).

– Here are excerpts, but I recommend that you download the presentation file, read it from A to Z, and share it with your friends and colleagues. (And if you had downloaded his Confab slides, send them to your computer’s trash can – don’t keep 7 megabytes of useless bits of information.)

--

Prediction Markets for Science? – (PPT) – by Robin Hanson – 2006-12-XX

--

Today’s Science Prices (Play $ Alas) – [Foresight Exchange]
11-14% P != NP proven by 2010
16-18% Cancer cured by 2010
16-19% Cold fusion works by 2015
28-29% Mammal immortality by 2015
28-31% Eventual universe collapse
68-70% Fusion energy sold by 2045
74-76% Extraterrestrial life by 2050
91-95% A gamma ray burst w/in 33Mly
93-95% Cosmo constant > 0
93-96% Neutrino mass > 0

Science Decision Markets
E[ Iraq civil war | US moves troops out? ]
E[ Sea levels | Raise CO2 tax ]
E[ Lifespan | Health care reform ]

E[ Murders | More gun control ]
E[ Cancer deaths | More research funding ]
E[ Firm stock price | fire CEO? ]
E[ Succeed? | Fund project ]
E[ Publications | Hire candidate ]
E[ Citations | Publish ]

Theory I – Old
“Strong Efficient Markets” is straw man

No info – Supply and Demand
Assume beliefs not respond to prices
Price is weighted average of beliefs
More influence: risk takers, rich
Info, Static – Rational Expectations
Price clears, but beliefs depend on price
No trade if not expect “noise traders”
Price not reveal all info
More influence: info holders

Theory II – Market Microstructure
Info, Dynamic – Game Theory

Example – Kyle ’85
X – Informed trader(s) – risk averse
Y – Noise trader – fool or liquidity pref
Market makers – no info, deep pockets
If many compete, Price = E[value|x+y]
Info markets – use risk-neutral limit
If Y larger, X larger to compensate more info gathered, so more accuracy!

Theory III – Behavioral Finance
Humans are overconfident

Far more speculative trade than need
Mere fact of disagreement shows
Overconfidence varies with person, experience, consequence severity
Implications
Price in part an ave of beliefs?
Adds noise to price aggregates?
Prices more honest than talk, polls, …

Ask the Right Questions
Cost independent of topic, but value not!
Seek high value to more accurate estimates!
Relevant standard: beat existing institutions
Where suspect more accuracy is possible
Suspect info is withheld, or not sure who has it
Prefer fun, easy to explain and judge
Prefer can let many know best estimates
Not fear reveal secrets, use fear to motivate
Avoid inducing foul play

Eight Design Issues
How avoid self-defeating prophecies?
How handle billions of possible combos?
What if terrorists lose $ to mislead us?
What if terrorists gain $ by give us info?
How not alarm public, inform terrorists?
Price can mislead if deciders know more.
Will markets induce people to lie?
Will markets help employees embezzle?

--

Thoughts About “Decision Markets”:

Decision Market for Science
--

#1. Robin Hanson first wanted to apply his decision market concept to refine the democratic process: “vote on values but bet on beliefs”. He calls that “futarchy” (PDF). As far as I can see, only some libertarian wackos like Chris Hibbert or Peter McCluskey bought the idea. (I’m not even sure his paper was accepted somewhere for publication.)

#2. Now, Robin Hanson tries to plug his decision market concept as a management decision tool. Here’s an excerpt from his Confab presentation:

Decision Market Applications

E[ Revenue | Switch ad agency? ]
E[ Revenue | Raise price 10%? ]
E[ Project done date | Drop feature? ]
E[ Project done date | Add personnel? ]
E[ Stock price | Fire CEO? ]
E[ Stock price | Acquire firm X? ]

The guy doesn’t have the slightest chance that his envisioned applications will ever see the light of day (that is, before his head gets chopped off and frozen, shortly after his death). Basically, he wants the senior executives to be replaced with a market-generated automatism. Even if he can prove that it would lead to better management decisions (and I trust him on that), he’ll encounter entrenched resistance from the same people his decision tool was created to compete with. I’d short-sell Robin Hanson on that one, with all my might, and I’d bet George Soros would see an opportunity here.

#3. The only chance the guy has would be to dig the field of management science for areas where a series of micro decisions are taken by mid-level executives or technologists or scientists or other employees, but NEVER by senior executives. In that perspective, all the examples of applications he gave above are worthless; send them to your computer’s trash can (and select “empty the trash can”, to make sure they disappear for good).

#4. Robin Hanson (who is a bright inventor) is totally incapable of seeing the light of what could be a successful innovation. The only chance the guy has would be for him to network with Silicon Valley’s geeks-turned-IT-executives, and, after a series of pitches, maybe he would get feedback from someone who can find a mutant idea: an idea that is original, almost bizarre; an idea nobody ever thought of before. Robin Hanson, on his own, is totally incapable of thinking creatively in terms of innovation. He likes big ideas but he doesn’t get people and marketing (including internet usability). Which is why I’m suggesting to him to go West and to find his complement(s) there. That’s his only chance.