Tag Archives: News

Bets As Signals of Article Quality

On October 15, I talked at the Rutgers Foundation of Probability Seminar on Uncommon Priors Require Origin Disputes. While visiting that day, I talked to seminar host Harry Crane about how the academic replication crisis might be addressed by prediction markets, and by his related proposal to have authors offer bets supporting their papers. I mentioned to him that I’m now part of a project that will induce a great many replication attempts and set up prediction markets about them beforehand, and that we would love to get journals to include our market prices in their review process. (I’ll say more about this when I can.)

When the scheduled speaker for the seminar’s next weekly slot cancelled, Crane took the opening to give a talk comparing our two approaches (video & links here). He focused on papers for which a replication attempt is possible, and said “We don’t need journals anymore.” That is, he argued that we should not use which journal is willing to publish a paper as a signal of paper quality, but should instead use the signal of what bet authors offer in support of their paper.

That author betting offer would specify what counts as a replication attempt and as a successful replication, and would include an escrowed amount of cash plus betting odds, which set the amount a challenger must put up to try to win that escrowed amount. If the replication fails, the challenger wins both amounts, minus the cost of doing the replication attempt; if it succeeds, the authors win the amount the challenger put up.
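
To make the payoff structure concrete, here is a minimal sketch in Python of such an offer; the class name, the numbers, and the exact treatment of the challenger’s stake are illustrative assumptions, not details of Crane’s proposal.

```python
# Minimal sketch of an author bet offer; names and numbers are hypothetical.
from dataclasses import dataclass

@dataclass
class AuthorBetOffer:
    escrow: float            # cash the authors put up
    odds: float              # odds in favor of replication, e.g. 4.0 means 4:1
    replication_cost: float  # estimated cost of one replication attempt

    def challenger_stake(self) -> float:
        # The odds set how much a challenger must risk to win the escrow.
        return self.escrow / self.odds

    def challenger_net(self, replicated: bool) -> float:
        # The challenger pays for the replication attempt either way.
        if replicated:
            return -self.challenger_stake() - self.replication_cost
        return self.escrow - self.replication_cost

    def author_net(self, replicated: bool) -> float:
        return self.challenger_stake() if replicated else -self.escrow

offer = AuthorBetOffer(escrow=10_000, odds=4.0, replication_cost=3_000)
print(offer.challenger_net(replicated=False))  # 7000.0: failed replication pays the challenger
print(offer.author_net(replicated=True))       # 2500.0: successful replication pays the authors
```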

In his talk, Crane contrasted his approach with an alternative in which the quality signal would be the odds, in an open prediction market, of replication conditional on a replication attempt. In comparing the two, Crane seems to think that authors would not usually participate in setting market odds. He lists three advantages of author bets over betting market odds: 1) Author bets give authors better incentives to produce non-misleading papers. 2) Market odds are less informed, because market participants know less than paper authors about their paper. 3) Relying on market odds allows a mistaken consensus to suppress surprising new results. In the rest of this post, I’ll respond.

I am agnostic on whether journal quality should remain as a signal of article quality. If that signal goes away, then we are talking about which other signals can be useful, and how useful. And if that signal remains, then we are talking about other signals that might be used by journals to make their decisions, and also by other observers to evaluate article quality. But whatever signals are used, I’m pretty sure that most observers will demand that a few simple, easy-to-interpret signals be distilled from the many complex signals available. Tenure review committees, for example, will need signals nearly as simple as journal prestige.

Let me also point out that these two approaches, market odds and author bets, can be applied to non-academic articles, such as news articles, and to many other kinds of quality signals. For example, we could have author or market bets on how many future citations or how much news coverage an article will get, whether any contained math proofs will be shown to be in error, whether any names or dates will be shown to have been misreported in the article, or whether coding errors will be found in supporting statistical analysis. Judges or committees might also evaluate overall article quality at some distant future date. Bets on any of these could be conditional on whether serious attempts were made in that category.

Now, on the comparison between author and market bets, an obvious alternative is to offer both author bets and market odds as signals, either to ultimate readers or to journals reviewing articles. After all, it is hard to justify suppressing any potentially useful signal. If a market exists, authors could easily make betting offers via that market, and those offers could easily be flagged for market observers to take as signals.

I see market odds as easier for observers to interpret than author bet offers. First, author bets are more easily corrupted via authors arranging for a collaborating shill to accept their bet. Second, it can be hard for observers to judge how author risk-aversion influences author odds, and how replication costs and author wealth influence author bet amounts. For market odds, in contrast, amounts take care of themselves via opposing bets, and observers need only judge any overall differences in wealth and risk-aversion between the two sides, differences that tend to be smaller, vary less, and matter less for market odds.

Also, authors would usually participate in any open market on their paper, giving those authors betting incentives and making market odds include their info. The reason authors will bet is that other participants will expect authors to bet to puff up their odds, and so will push the odds down to compensate. So if authors don’t in fact participate, the odds will tend to look bad for them. Yes, market odds will be influenced by views other than those of the authors, but when evaluating papers we want our quality signals to be based on the views of people other than paper authors. That is why we use peer review, after all.

When there are many possible quality metrics on which bets could be offered, article authors are unlikely to offer bets on all of them. But in an open market, anyone could offer to bet on any of those metrics. So an open market could show estimates regarding any metric for which anyone made an offer to bet. This allows a much larger range of quality metrics to be available under the market odds approach.

While the simple market approach merely bets conditional on someone making a replication attempt, an audit lottery variation that I’ve proposed would instead use a small fixed percentage of amounts bet to pay for replication attempts. If the amount collected is insufficient, then it and all betting amounts are gambled so that either a sufficient amount is created, or all these assets disappear.
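
As a rough sketch, and under the assumption that the fee pool and all stakes are scaled up at fair odds when the gamble succeeds, the audit lottery might look like this (parameters are purely illustrative):

```python
import random

def audit_lottery(stakes, fee_rate, replication_cost, rng=random.random):
    """Sketch of the audit lottery variation; parameters are hypothetical.

    A fixed percentage of all amounts bet goes into a replication fund.
    If the fund covers the replication cost, it is spent directly.
    Otherwise the fund and all stakes are gambled at fair odds: with
    probability fund/cost everything scales up so the fund just covers
    the cost; with the remaining probability all these assets disappear.
    """
    fund = fee_rate * sum(stakes)
    net_stakes = [s * (1 - fee_rate) for s in stakes]
    if fund >= replication_cost:
        return net_stakes, replication_cost       # replication funded outright
    p = fund / replication_cost
    if rng() < p:                                 # fair gamble: expected values unchanged
        return [s / p for s in net_stakes], replication_cost
    return [0.0 for _ in net_stakes], 0.0         # everything disappears

stakes_after, funding = audit_lottery([200, 300, 500], fee_rate=0.05, replication_cost=400)
print(stakes_after, funding)
```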

Just as 5% significance is treated as a threshold today for publication evaluation, I can imagine particular bet reliability thresholds becoming important for evaluating article quality. News articles might even be filtered, or show simple icons, based on a reliability category. In this case the author-bet and market-odds approaches would tend to merge.

For example, an article might be considered “good enough” if it had no more than a 5% chance of being wrong, if checked. The standard for checking this might be if anyone was currently offering to bet at 19-1 odds in favor of reliability. For as long as the author or anyone else maintained such offers, the article would qualify as at least that reliable, and so could be shown via filters or icons as meeting that standard. For this approach we don’t need to support a market with varying prices; we only need to keep track of how much has been offered and accepted on either side of this fixed odds bet.
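
As a sketch of how little bookkeeping this requires, the toy class below (an illustrative design, with hypothetical names and numbers) tracks offers and acceptances at fixed 19-1 odds and reports whether an article currently qualifies for the icon.

```python
# Toy bookkeeping for a fixed-odds reliability bet (illustrative design only).
class FixedOddsBook:
    def __init__(self, odds_for=19.0):
        self.odds_for = odds_for     # e.g. 19-1 in favor of reliability
        self.offered_for = 0.0       # cash offered backing reliability
        self.matched_against = 0.0   # challenger cash accepting those offers

    def offer_for(self, amount):
        self.offered_for += amount

    def accept_against(self, amount):
        # A challenger stakes `amount` to claim odds_for times that amount
        # if the article is later judged unreliable.
        capacity = self.offered_for / self.odds_for - self.matched_against
        taken = min(amount, max(capacity, 0.0))
        self.matched_against += taken
        return taken

    def qualifies(self):
        # The article keeps its "95% reliable" icon while some of the
        # 19-1 offer remains unaccepted.
        return self.offered_for / self.odds_for > self.matched_against

book = FixedOddsBook()
book.offer_for(1900.0)       # an author escrows 1900 at 19-1
book.accept_against(50.0)    # a challenger takes part of the offer
print(book.qualifies())      # True: 100 of challenger capacity, only 50 matched
```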


News Accuracy Bonds

Fake news is a type of yellow journalism or propaganda that consists of deliberate misinformation or hoaxes spread via traditional print and broadcast news media or online social media. This false information is mainly distributed by social media, but is periodically circulated through mainstream media. Fake news is written and published with the intent to mislead in order to damage an agency, entity, or person, and/or gain financially or politically, often using sensationalist, dishonest, or outright fabricated headlines to increase readership, online sharing, and Internet click revenue. (more)

One problem with news is that sometimes readers who want truth instead read (or watch) and believe news that is provably false. That is, a news article may contain claims that others are capable of proving wrong to a sufficiently expert and attentive neutral judge, and some readers may be fooled against their wishes into believing such news.

Yes, news can have other problems. For example, there can be readers who don’t care much about truth, and who promote false news and its apparent implications. Or readers who do care about truth may be persuaded by writing whose mistakes are too abstract or subtle to prove wrong now to a judge. I’ve suggested prediction markets as a partial solution to this; such markets could promote accurate consensus estimates on many topics which are subtle today, but which will eventually become sufficiently clear.

In this post, however, I want to describe what seems to me the simple obvious solution to the more basic problem of truth-seekers believing provably-false news: bonds. Those who publish or credential an article could offer bonds payable to anyone who shows their article to be false. The larger the bond, the higher their declared confidence in their article. With standard icons for standard categories of such bonds, readers could easily note the confidence associated with each news article, and choose their reading and skepticism accordingly.

That’s the basic idea; the rest of this post will try to work out the details.

While articles backed by larger bonds should be more accurate on average, the correlation would not be exact. Statistical models built on the dataset of bonded articles, some of which eventually pay bonds, could give useful rough estimates of accuracy. To get more precise estimates of the chance that an article will be shown to be in error, one could create prediction markets on the chance that an individual article will pay a bond, with initial prices set at statistical model estimates.

Of course the same article should have a higher chance of paying a bond when its bond amount is larger. So even better estimates of article accuracy would come from prediction markets on the chance of paying a bond, conditional on a large bond amount being randomly set for that article (for example) a week after it is published. Such conditional estimates could be informative even if only one article in a thousand is chosen for such a very large bond. However, since there are now legal barriers to introducing prediction markets, and none to introducing simple bonds, I return to focusing on simple bonds.

Independent judging organizations would be needed to evaluate claims of error. A limited set of such judging organizations might be certified to qualify an article for any given news bond icon. Someone who claimed that a bonded article was in error would have to submit their evidence, and be paid the bond only after a valid judging organization endorsed their claim.

Bond amounts should be held in escrow or guaranteed in some other way. News firms could limit their risk by buying insurance, or by limiting how many bonds they’d pay on all their articles in a given time period. Say no more than two bonds paid on each day’s news. Another option is to have the bond amount offered be a function of the (posted) number of readers of an article.

As a news article isn’t simply all true or all false, one could distinguish degrees of error. A simple approach could go sentence by sentence. For example, a bond might pay according to some function of the number of sentences (or maybe sentence clauses) in an article shown to be false. Alternatively, sentence-level errors might be combined to produce categories of overall article error, with bonds paying different amounts to those who prove each different category. One might excuse editorial sentences that do not intend to make verifiable newsy claims, and distinguish background claims from claims central to the original news of the article. One could also distinguish degrees of error within a sentence, and pay in proportion to that degree. For example, a quote that is completely made up might be rated as completely false, while a quote modified in a way that leaves the meaning mostly the same might count as a small fractional error.
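
To illustrate one way such aggregation might work, here is a small sketch; the weighting scheme and the numbers are assumptions for illustration, not part of the proposal.

```python
# Illustrative aggregation of sentence-level errors into a bond payout;
# the weighting scheme and numbers are assumed for illustration.

def bond_payout(sentence_errors, bond_amount):
    """sentence_errors: list of (error_degree, weight) pairs, where
    error_degree is in [0, 1] (1 = completely false) and weight reflects
    how central that sentence is to the article's core claims."""
    total_weight = sum(w for _, w in sentence_errors) or 1.0
    error_score = sum(d * w for d, w in sentence_errors) / total_weight
    return bond_amount * error_score

# A fabricated quote (degree 1.0) central to the story, a slightly modified
# quote (degree 0.1) in background material, and one correct sentence.
print(bond_payout([(1.0, 3.0), (0.1, 1.0), (0.0, 1.0)], bond_amount=10_000))  # 6200.0
```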

To the extent that it is possible to verify partisan slants across large sets of articles, for example in how people or organizations are labeled, publishers might also offer bonds payable to those who can show that a publisher has taken a consistent partisan slant.

A subtle problem is: who pays the cost to judge a claim? On the one hand, judges can’t just offer to evaluate all claims presented to them for free. But on the other hand, we don’t want to let big judging fees stop people from claiming errors when errors exist. To make a reasonable tradeoff, I suggest a system wherein claim submissions include a fee to pay for judging, a fee that is refunded double if that claim is verified.

That is, each bond specifies a maximum amount it will pay to judge that bond, and which judging organizations it will accept. Each judging organization specifies a max cost to judge claims of various types. A bond is void if no acceptable judge’s max is below that bond’s max. Each submission asking to be paid a bond then submits this max judging fee. If the judges don’t spend all of this max judging fee evaluating the case, the remainder is refunded to the submitter. It is the amount of the fee that the judges actually spend that is refunded double if the claim is supported. A public dataset of past bonds and their actual judging fees could help everyone estimate future fees.
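
The sketch below walks through this fee flow for a single claim; the simple validity check and the concrete numbers are illustrative assumptions.

```python
# Walkthrough of the judging-fee flow for one claim; the validity rule and
# the concrete numbers are assumed for illustration.

def settle_claim(bond_max_fee, judge_max_fee, fee_spent, claim_supported, bond_amount):
    """Returns (net payment to the claimant, whether the bond was valid)."""
    if judge_max_fee > bond_max_fee:
        return 0, False                # bond void: no acceptable judge is cheap enough
    submitted_fee = bond_max_fee       # claimant submits the bond's max judging fee
    unspent_refund = submitted_fee - fee_spent
    if claim_supported:
        # The spent fee is refunded double, plus the bond itself.
        gross = unspent_refund + 2 * fee_spent + bond_amount
    else:
        gross = unspent_refund
    return gross - submitted_fee, True

# Claim verified: judges spend 400 of a 500 max fee, on a 5000 bond.
print(settle_claim(500, 450, 400, True, 5000))   # (5400, True)
# Claim rejected: the claimant is out the 400 actually spent on judging.
print(settle_claim(500, 450, 400, False, 5000))  # (-400, True)
```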

Those are the main subtleties that I’ve considered. While there are ways to set up such a system better or worse, the basic idea seems robust: news publishers who post bonds payable if their news is shown to be wrong thereby credential their news as more accurate. This can allow readers to more easily avoid believing provably-false news.

A system like the one I’ve just proposed has long been feasible; why hasn’t it been adopted already? One possible theory is that publishers don’t offer bonds because doing so would remind readers of typical high error rates:

The largest accuracy study of U.S. papers was published in 2007 and found one of the highest error rates on record — just over 59% of articles contained some type of error, according to sources. Charnley’s first study [70 years ago] found a rate of roughly 50%. (more)

If bonds paid mostly for small errors, then bond amounts per error would have to be very small, and calling reader attention to a bond system would mostly remind them of high error rates, and discourage them from consuming news.

However, it seems to me that it should be possible to aggregate individual article errors into measures of overall article error, and to focus bond payouts on the most mistaken “fake news” type articles. That is, news error bonds should mostly pay out on articles that are wrong overall, or at least quite misleading regarding their core claims. Yes, a bit more judgment might be required to set up a system that can do this. But it seems to me that doing so is well within our capabilities.

A second possible theory to explain the lack of such a system today is the usual idea that innovation is hard and takes time. Maybe no one ever tried this with sufficient effort, persistence, or coordination across news firms. So maybe it will finally take some folks who try this hard, long, and wide enough to make it work. Maybe, and I’m willing to work with innovation attempts based on this second theory.

But we should also keep a third theory in mind: that most news consumers just don’t care much for accuracy. As we discuss in our book The Elephant in the Brain, the main function of news in our lives may be to offer “topics in fashion” that we can each riff on in our local conversations, to show off our mental backpacks of tools and resources. For that purpose, it doesn’t much matter how accurate such news is. In fact, it might be easier to show off with more fake news in the mix, as we can then show off by commenting on which news is fake. In this case, news bonds would be another example of an innovation designed to give us more of what we say we want, which is not adopted because we at some level know that we have hidden motives and actually want something else.


News As If Info Mattered

In our new book, we argue that most talk, including mass media news and academic talk, isn’t really about info, at least the obvious base-level info. But to study talk, it helps to think about what it would in fact look like if it were mostly about info. And as with effective altruism, such an exercise can also be useful for those who see themselves as having unusually sincere preferences, i.e., who actually care about info. So in this post let’s consider what info based talk would actually look like.

From an info perspective, a piece of “news” is a package that includes a claim that can be true or false, a sufficient explanation of what this claim means, and some support, perhaps implicit, to convince the reader of this claim. Here are a few relevant aspects of each such claim:

Surprise – how low a probability a reader would have previously assigned to this claim.
Confidence – how high a probability a reader comes to assign after reading this news.
Importance – how much the probability of this claim matters to the reader.
Commonality – how many potential readers consider this topic important.
Recency – how recently this news became available.
Support Type – what kind of support is offered for a reader to believe this claim.
Support Space – how many words it takes to show the support to a reader.
Definition Space – how many words it takes to explain what this claim means.
Bandwidth – number of channels of communication used at once to tell reader about this news.
Chunk – size of a hard-to-divide unit containing news, such as a tweet or a book.

Okay, the amount of info that some news gives a reader on a claim is the ratio of its confidence to its surprise. The value of this info multiplies this info amount by the claim’s importance to that reader. The total value of this news to all readers (roughly) multiplies this individual value by its commonality. Valuable news tells many people to put high confidence in claims that they previously thought rather unlikely, on topics they consider important.
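
Rendered literally as formulas (one reading, which treats “surprise” as the low prior probability a reader would have assigned), this amounts to:

```python
# Literal rendering of the paragraph above, treating "surprise" as the low
# prior probability a reader would have assigned to the claim.

def info_amount(confidence, surprise):
    return confidence / surprise                      # roughly posterior / prior

def reader_value(confidence, surprise, importance):
    return info_amount(confidence, surprise) * importance

def total_value(confidence, surprise, importance, commonality):
    return reader_value(confidence, surprise, importance) * commonality

# A claim readers thought 5% likely, now 90% likely, of moderate importance,
# with a million interested readers (all numbers purely illustrative).
print(total_value(0.9, 0.05, importance=2.0, commonality=1_000_000))  # 36000000.0
```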

A reader who knew most everything that is currently known would focus mostly on recent news. Real people, however, who know very little of what is known, would in contrast focus mostly on much less recent news. Waiting to process recent news allows time for many small pieces of news to be integrated into large chunks that share common elements of definition and support, and that make better use of higher bandwidth.

In a world mainly interested in getting news for its info, most news would be produced by specialists in particular news topics. And there’d be far more news on topics of common interest to many readers, relative to niche topics of interest only to smaller sets of readers.

The cost of reading news to a reader is any financial cost, plus a time cost for reading (or watching etc.). This time cost is mostly set by the space required for that news, divided by the effective bandwidth used. Total space is roughly definition space plus support space. If the claim offered is a small variation on many similar previous claims already seen by a reader, little space may be required for its definition. In contrast, claims strange to a reader may take a lot more space to explain.
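
A minimal rendering of this cost account, where converting reading time into a dollar-like cost is an added assumption for illustration:

```python
# Simple rendering of the reading-cost account above; converting reading time
# into a dollar-like cost is an added assumption.

def reading_cost(price, definition_space, support_space, bandwidth,
                 words_per_minute=250.0, value_per_minute=0.5):
    total_space = definition_space + support_space            # in words
    minutes = total_space / (bandwidth * words_per_minute)
    return price + minutes * value_per_minute

# A free article: 200 words of definition, 800 of support, one channel.
print(reading_cost(0.0, 200, 800, bandwidth=1))  # 2.0
```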

When the support offered for a claim is popularity or authority, such support may be seen as weak, but it can often be given quite concisely. However, when the support offered is an explicit argument, that can seem strong, but it can also take a lot more space. Some claims are self-evident to readers upon being merely stated, or after a single example. If prediction markets were common, market odds could offer concise yet strong support for many claims. The smallest news items will usually not come with arguments.

Given the big advantages of modularity, in news as in anything else, we need a big gain to justify the modularity costs of clumping news together into hard-to-divide units, like articles and books. There are two obvious gain cases here: 1) many related claims, and 2) one focus claim requiring much explanation or support. The first case has a high correlation in reader interest across a set of claims, at least for a certain set of readers. Here a sufficient degree of shared explanation or support across these claims could justify a package that explains and supports them all together.

The second case is where a single focal claim requires either a great deal of explanation to even make clear what is being claimed, or it requires extensive detailed arguments to persuade readers. Or both. Of course there can be mixes of these two cases. For example, if in making the effort to support one main claim, one has already done most of the work needed to support a related but less important claim, one might include that related claim in the same chunk.

For most readers, most of the claims that are important enough to be the focus of a large chunk are also relatively easy to understand. As a result, most of the space in most large focused chunks is devoted to support. And as argument is the main support that requires a lot of space, most of the space in big chunks focused on a central claim is devoted to supporting arguments. Also, to justify the cost of a large chunk with a large value for the reader, most large focused chunks focus on claims to which readers initially assign a low probability.

So how does all this compare to our actual world of talk today? There are a lot of parallels, but also some big deviations. Our real world has a lot of local artisan production on topics of narrow interest. That is, people just chat with each other about random stuff. Even for news produced by efficient specialists, an awful lot of it seems to be on topics of relatively low importance to readers. Readers seem to care more about commonality than about importance. And there’s a huge puzzling focus on the most recently available news.

Books are some of our largest common chunks of news today, and each one usually purports to offer recent news on arguments supporting a central claim that is relatively easy to understand. It seems puzzling that so few big chunks are explicitly justified via shared explanation and justification of many related small claims, or that so many big chunks seem to cover neither many related claims nor a single central claim. It also seems puzzling that most focal claims of books are not very surprising to most readers. Readers do not seem to be proportionally more interested in books with more surprising focal claims. And given how much space is devoted to arguments for focal claims, it is somewhat surprising that books often neglect to even mention other kinds of support, such as popularity or authority.

While I do think alternative theories, in which news is not mainly about info, can explain many of these puzzles, a discussion of that will have to wait for another post.


News of What?

Today’s New York Times has a 7000 word article by Amy Harmon on cryonics, brain scanning, and brain emulation. Now these are subjects of great interest to me; my first book comes out in spring on the third topic. And 7000 words is space to say a great deal, even if you add the constraint that what you say must be understandable to the typical NYT reader.

So I’m struck by the fact that I have almost nothing to say in response to anything particular said in this article. Ms. Harmon gives the most space to one particular young cryonics patient who got others to donate to pay for her freezing. This patient hopes to return via brain emulation. Ms. Harmon discusses some history of the Brain Preservation Prize, highlighting Ken Hayworth personally, and quotes a few experts saying we are nowhere close to being able to emulate brains. At one point she says,

The questions the couple faced may ultimately confront more of us with implications that could be preposterously profound.

Yet she discusses no such implications. She discusses no arguments on if emulation would be feasible or desirable or what implications it might have. I’ll give her the benefit of the doubt and presume that her priorities accurately reflect the priorities of New York Times readers. But those priorities are so different from mine as to highlight the question: what exactly do news readers want?

For a topic like this, it seems readers want colorful characters described in detail, and quotes from experts with related prestige. They don’t want to hear about arguments for or against the claims made, or to discuss further implications of those claims. It seems they will enjoy talking to others about the colorful characters described, and perhaps enjoy taking a position on the claims those characters make. But these aren’t the sort of topics where anyone expects to care about the quality or care of the arguments one might offer. It is enough to just have opinions.

Added 14Sep: Amy posted a related article that is a technical review of brain emulation tech. I’m glad it exists, but I also have nothing particular to say in response.


Why News?

Google Alerts has failed me. For years I’d been trusting it to tell me about new news that cites me, and for the last few years it has just not been doing that. So when I happened to go searching for news that mentions me, I found 135 new articles, listed on my press page. I’d probably find more, if I spent a few more hours searching.

Consider for the moment what would have happened if I had put up a blog post about each of those press articles, as they appeared. Even if I didn’t say much beyond a link and a short quote, some of you would have followed that link. And the sum total of those follows across all 135 articles would be far more than the number of you who today are going to go browsing my press page now that you know it has 135 new entries.

Similarly, I now have 2829 scholarly citations of my work, most of which appeared while I was doing this blog, and this blog has had 3640 posts, many of which were written by others when this was a group blog. So I might plausibly have doubled the number of my posts on this blog by putting up a post on each paper that cited one of my papers. Or more reasonably, I might have made one post a month listing such articles.

For both news and academic articles that cite me, I expect readers to pay vastly more attention to them if I announce them soon after they appear than if I give a single link to a set of them a few years later. Yet I don’t think, and I don’t think readers think, that the fundamental interest or importance of these articles declines remotely as fast as reader interest. This is also suggested by the fact that readers follow so many news sources, like blogs, instead of looking at only the ‘best of’ sections of far more sources.

Bottom line, readers show a strong interest in reading and discussing articles soon after they appear, an interest not explained by an increased fundamental importance of recent articles. Instead a plausible hypothesis is that readers care greatly about reading and talking about the same articles that others will read and talk about, at near the time when those others will do that reading and talking. In substantial part, we like news in order to support talking about the news, and not so much because news communicates important information or insights.


Trends Rarely Inform Policy

I’d like to try to make a point here that I’ve made before, but hopefully make it more clearly this time. My point is: trend tracking and policy analysis have little relevance for each other.

You can discuss education policy, or you can discuss education trends. You can discuss medical policy or you can discuss medical trends. You can discuss immigration policy, or you can discuss immigration trends. And you can discuss redistribution and inequality trends, or you can discuss redistribution and inequality policy. But in all of these cases, and many more, the trend and policy topics have little relevance for each other.

On trends, we collect a lot of data, usually on parameters that are relatively close to what we can easily measure, and also close to summary outcomes that we care about, like income, mortality, or employment. Many are interested in explaining past trends, and in forecasting future trends. Such trend tracking supports the familiar human need for news to discuss and fret about. And when a trend looks worrisome, that naturally leads people to want to discuss what, oh what, we might do about it.

On policy, we have lots of thoughtful theoretical analysis of policies, which tries to judge which policies are better. And we have lots of relevant data analysis that tries to distinguish relevant theories. Such analysis usually ends up identifying a few key parameters on which policy decisions should depend. But those tend to be abstract parameters, close to theoretical fundamentals. They usually have only a distant relation to the parameters which are tracked so eagerly as trends.

To repeat for emphasis: the easy-to-measure parameters whose trends are most eagerly tracked are rarely close to the key theoretical parameters that determine which policies are best. They are in fact usually so far away that it is hard to judge even the sign of the relation between them. This makes it unlikely that a change in one of these policies is a reasonable response to noticing some tracked-parameter trend.

For example, which policies are best in medicine depends on key theoretical parameters like risk-aversion, asymmetric info on risks, meddling preferences, market power of hospitals, customer irrationality, where learning happens, etc. But the trends we usually track are things like mortality, rates of new drug introduction, and amounts, fractions, and variance of spending. These latter parameters are just not very relevant for inferring the former. People may find it fascinating to track trends in doctor salaries, cancer deaths, or how many are signed up for Obamacare. But those are pretty irrelevant to which policies are best.

As another example, debates on immigration refer to many relevant theoretical parameters, including meddling preferences, demand elasticity for low wage workers, and the intelligence, cultural norms, and cultural plasticity of immigrants. In contrast, trend trackers talk about trends in immigration, low-skill wages, wage inequality, labor share of income, voter participation, etc. Which might be fascinating topics, but they are just not very relevant for whether immigration is a good or bad idea. So it just doesn’t make sense to suggest changing immigration policy in response to noticing particular trends in these tracked parameters.

Alas, most people are a lot more interested in tracking trends than in analyzing policies. So well-meaning people with smart things to say about policy often try to make their points seem more newsworthy by suggesting those policies as answers to the problems posed by troublesome trends. But in doing so they usually mislead their audiences, and often themselves. Trends just aren’t very relevant for policy. If you want to talk policy, talk policy, and skip the trends.
