A crash course by Herr Professor Panos Ipeirotis:
[...] We have two things to measure: a. how much better than random the predictions are, i.e., “predictive power” (accuracy, loss functions, etc.), and b. how consistent the confidence metrics for these predictions are.
Most of the papers focus on (a), often ignoring (b). But for prediction markets (where prices are supposed to estimate probabilities) it is important to examine rigorously how good the estimates themselves are, not only how good the predictions were.
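To make the (a)/(b) distinction concrete, here is a minimal Python sketch. It assumes binary events (1 = the event happened) and forecasts given as probabilities, such as market prices. For (b) it uses calibration, a common check that is not necessarily the one Ipeirotis has in mind: among all events forecast at roughly 70%, about 70% should actually happen. All function names, the 0.5 decision threshold, the equal-width binning, and the two forecasters are hypothetical choices made for illustration.

```python
import math
from typing import List, Sequence, Tuple


def accuracy(probs: Sequence[float], outcomes: Sequence[int], threshold: float = 0.5) -> float:
    """(a) Fraction of events where thresholding the forecast picks the right outcome."""
    hits = sum((p >= threshold) == bool(y) for p, y in zip(probs, outcomes))
    return hits / len(outcomes)


def log_loss(probs: Sequence[float], outcomes: Sequence[int], eps: float = 1e-15) -> float:
    """(a) Mean negative log-likelihood of what actually happened (lower is better)."""
    total = 0.0
    for p, y in zip(probs, outcomes):
        p = min(max(p, eps), 1 - eps)  # clip so log(0) cannot occur
        total += -(y * math.log(p) + (1 - y) * math.log(1 - p))
    return total / len(outcomes)


def calibration_table(probs: Sequence[float], outcomes: Sequence[int],
                      n_bins: int = 10) -> List[Tuple[float, float, int]]:
    """(b) Group forecasts into equal-width bins.

    Returns (mean forecast, observed frequency, count) for each non-empty bin;
    for a well-calibrated forecaster the first two numbers should be close.
    """
    bins: List[List[Tuple[float, int]]] = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        idx = min(int(p * n_bins), n_bins - 1)  # a forecast of 1.0 goes in the last bin
        bins[idx].append((p, y))
    table = []
    for members in bins:
        if members:
            mean_p = sum(p for p, _ in members) / len(members)
            freq = sum(y for _, y in members) / len(members)
            table.append((mean_p, freq, len(members)))
    return table


def expected_calibration_error(probs: Sequence[float], outcomes: Sequence[int],
                               n_bins: int = 10) -> float:
    """(b) Count-weighted average gap between mean forecast and observed frequency."""
    table = calibration_table(probs, outcomes, n_bins)
    n = sum(count for _, _, count in table)
    return sum(count * abs(mean_p - freq) for mean_p, freq, count in table) / n


# Two hypothetical forecasters on the same ten events, seven of which happened.
outcomes = [1] * 7 + [0] * 3
honest = [0.7] * 10          # says 70%, and 70% of the events happen
overconfident = [0.99] * 10  # makes the same calls, but claims near-certainty

# An uninformative 0.5 forecast scores log(2) on log loss: a "random" reference point.
baseline = log_loss([0.5] * len(outcomes), outcomes)
print("coin-flip log loss", round(baseline, 3))

for name, probs in [("honest", honest), ("overconfident", overconfident)]:
    print(name,
          accuracy(probs, outcomes),
          round(log_loss(probs, outcomes), 3),
          round(expected_calibration_error(probs, outcomes), 3))
```

Both forecasters have the same accuracy (0.7), so a study that looks only at (a) through accuracy would rate them as equal. The calibration check separates them: the honest forecaster's calibration error is essentially zero, while the overconfident one is off by about 0.29. Log loss also penalizes the overconfidence, at roughly 1.39 versus 0.61, which is even worse than the coin-flip reference of about 0.693. This gap between being right and quoting the right probability is what matters when market prices are read as probability estimates.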