Tag Archives: Academia

How School Goes Wrong

I’ve been teaching for over two decades, but haven’t yet posted much on my theoretical view of school. Talking recently to an entering education Ph.D. student has inspired me to fill that gap.

The obvious usual purpose for school is to help people learn how to do useful tasks in life. And the obvious way to help with that is to show students various useful tools, show examples of their use, and then have students practice trying related tasks with related tools. Finally, score students on how well they do these practice tasks, to help others judge their suitability for various positions.

In this view, the big question is: how far and in what ways should school tasks differ from the later life tasks for which students are preparing? School tasks can differ from life tasks in many ways, such as in how long they take, how wide a scope of subproblems they encompass, how clearly performance on them can be judged, how many others have previously completed similar tasks, how connected each new task is to one’s recent tasks, what sort of teams take on tasks, and what sort of other distractions one must deal with while working on each task.

It seems obvious to me that school tasks must differ greatly from life tasks, at least when kids are young. It is also obvious that choosing school tasks well is hard, but that this can offer huge gains. We should search well the vast space of possibilities for the best student tasks.

Furthermore, it seems obvious that student tasks often complement each other strongly. Often learning one task helps a lot in learning another task. So we want all the tasks that students eventually take on to fit into a total package where the parts fit well together, and where that package fits well with later life tasks. Which can justify a lot of coordination between the teaching of related topics, and between schools and those who manage life tasks. In addition, there are often scale and scope economies from having many students do similar tasks, especially regarding evaluation. (This coordination isn’t obviously better when governments run schools.)

Our simplest general task tool is inference, supported by related “facts”. That is, one tells students about key facts related to a task class, and shows them examples of drawing relevant inferences from such facts. This “book learning” is far from the only useful tool, but it is useful often enough to make fact-telling a big fraction of learning for most topics. Yes, it is somewhat possible to teach better general inference, but the scope for this seems vastly overrated.

Not only is it hard to choose the package of learning tasks well, it is even harder for non-experts to judge the quality of such packages. And even when one can judge the quality of particular school tasks, their fitting together into large integrated packages makes it hard to push for particular changes. (Such as the long-overdue switch from geometry to statistics in high school.) If schools competed fiercely on measured student outcomes, they might try harder to find the best packages. But such outcomes are usually not measured well, and many schools are funded and managed by customers who are not very outcome-oriented.

The net result is that teachers and schools can have a lot of slack regarding their choices of student tasks and supporting tools. Which suggests that schools may allow other priorities, besides preparing students for life tasks, to influence their choices. For example, when the world changes, teachers with status tied to their expertise regarding particular student tasks may have insufficient incentives to change those tasks to better fit a changed world. As another example, teachers who seek to push ideologies may over-emphasize teaching facts, and try to infuse those ideologies into the facts they present, even when that cuts student performance.

When schools face stronger selection pressures regarding the perceived quality of their students, relative to preparing students for life tasks, then such schools may pick tasks with less evaluation noise and higher perceived prestige, even if those tasks help less for common life tasks. Especially for students likely to go into industries where the main product sold to customers is affiliation with worker prestige. In that case, schools mainly just need to agree on how prestige is measured, and then pick school tasks that fit well with those prestige concepts. Here the social value of such schooling seems far less than its private value; we should tax, not subsidize, such school.

When accusations of teacher bias are important, schools may emphasize tasks that can be more clearly and objectively evaluated, even if those tasks are otherwise less useful. And when an accusation of school bias against particular subgroups is salient, schools may emphasize tasks on which those particular subgroups do better. Some have suggested that accusations of bias against girls have induced schools to switch more to tasks on which girls do better. (Even though the directly measured biases seem to be against boys.)

Over the last few decades there seems to have been a move away from giving students “hard” tasks, where one cannot offer clear procedures to follow to succeed. On such hard tasks, teachers show students related tools and examples of prior successful performance, and can offer suggestions on how to improve tasks in progress. But students must flounder and search for how to achieve excellence, and most students will not so achieve. Some have claimed that such hard tasks favor boys, who are less risk-averse.

One of the main tasks for grad students is to write research papers. And my grad classes are focused overwhelmingly on this task. This is a hard task, where many will fail, and where evaluations are more subjective. And it is a big task chunk, which takes a long time and is not easily broken down into subtasks that can be evaluated independently. But it is also a task clearly and directly relevant to their future life, at least if they move near academic circles. While academics are willing to water down many school tasks to satisfy various outside pressures, they have so far drawn the line at how they train their own replacements.

When teaching undergrads, I usually split the class grade into four quizzes and four short papers. The quizzes are more fact-based, and have more parts and thus less noise in their overall evaluation. With quizzes, I give students more of what they, their parents, and their schools want. The papers are harder and have more evaluation noise, but are closer to a life task that I value: using economic tools to argue for a policy position of their choice on a topic that I choose. For papers, I grade using a point system designed to ignore my personal opinions on paper topics.

My teaching strategy roughly matches my theory of teaching; I get as close as I can to having my students practice a real life task related to the class topic that I have been assigned. Even if those tasks are hard, even if that makes my evaluations of students more noisy, and even if students like it less. I accept that schools have mostly devolved to sorting students by prestige, instead of preparing them for life tasks. But in my classes, I do what I can to resist that trend.

Added 11a: The main obstacle to replacing college with real jobs is finding ways to standardize across such jobs the topics learned and performance evaluation. That will just require a lot of trial and error to figure out. Don’t invest in a firm that claims to know the answers if you aren’t willing to pay for lots of trial and error time.

Shoulda-Listened Futures

Over the decades I have written many times on how prediction markets might help the intellectual world. But usually my pitch has been to those who want to get better actionable info out of intellectuals, or to help the world to make better intellectual progress in the long run. Problem is, such customers seem pretty scarce. So in this post I want to outline an idea that is a bit closer to a business proposal, in that I can better identify concrete customers who might pay for it.

For every successful intellectual there are (at least) hundreds of failures. People who started out along a path, but then were not sufficiently rewarded or encouraged, and so then either quit or persisted in relative obscurity. And a great many of these (maybe even a majority) think that the world done them wrong, that their intellectual contributions were underrated. And no doubt many of them are right. Such malcontents are my intended customers.

These “world shoulda listened to me” customers might pay to have some of their works evaluated by posterity. For example, for every $1 saved now that gains a 3% real rate of return, $19 in real assets are available in a century to pay historians for evaluations. At a 6% rate of return (or, roughly, 3% for 2 centuries), that’s $339. Furthermore, if future historians needed only to randomly evaluate 1% of the works assigned them, then if malcontents paid $10 per work to be maybe evaluated, historians could spend $20K (or $339K) per work they evaluate. Considering all the added knowledge and tools to which future historians may have access, that seems enough to do a substantial evaluation, especially if they evaluate several related works at the same time.
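The compounding arithmetic above can be checked with a short sketch. The function names are illustrative only, and the $10 fee, the return rates, and the 1% sampling rate are simply the figures assumed in the paragraph above.

```python
def growth_factor(real_rate: float, years: int) -> float:
    """Real value after `years` of one dollar compounding annually at `real_rate`."""
    return (1.0 + real_rate) ** years


def budget_per_evaluated_work(fee: float, real_rate: float, years: int,
                              evaluation_chance: float) -> float:
    """Future budget per evaluated work when only a random fraction
    `evaluation_chance` of the paid-for works is actually evaluated."""
    return fee * growth_factor(real_rate, years) / evaluation_chance


if __name__ == "__main__":
    # Scenarios taken from the text: 3% and 6% for a century, 3% for two centuries.
    for rate, years in [(0.03, 100), (0.06, 100), (0.03, 200)]:
        factor = growth_factor(rate, years)
        budget = budget_per_evaluated_work(10.0, rate, years, 0.01)
        print(f"{rate:.0%} for {years} years: x{factor:,.0f}, "
              f"budget per evaluated work ${budget:,.0f}")
```

Under these assumptions, 3% over a century gives a factor of about 19 and 6% gives about 339, while 3% over two centuries comes out somewhat higher, near 369.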

Given a substantial chance (1% will do) that a work might be evaluated by historians in a century or two, we could then create (conditional) prediction markets now estimating those future evaluations. So a customer might pay their $20 now, and get an immediate prediction market estimate of that future evaluation for their work. That $20 might pay $10 for the (chance of a) future evaluation and another $10 to establish and subsidize a prediction market over the coming centuries until resolution.

Finally, if customers thought the market estimates regarding their works looked too low, then they could of course try to bet to raise those estimates. Skeptics would no doubt lie in wait to bet against them, and on average this tendency of authors to bet to support their works would probably subsidize these markets, and so lower the fees that the system needs to charge.

Of course even with big budgets for evaluations, if we want future historians to make reliable enough formal estimates that we can bet on in advance, then we will need to give them a well-defined-enough task to accomplish. And we need to define this task in a way that discourages future historians from expressing their gratitude to all these people who funded their work by giving them all an A+.

I suggest we have future historians estimate each work’s ideal attention: how much attention each particular work should have been given during some time period. So we should pick some measure of attention, a measure that we can calculate for works when they are submitted, and track over time. This measure should weigh whether the dissertation was approved, whether and where the paper was published, how many cites it got, etc. If we add up all the initial attention for submitted works, then we can assign historians the task of (counterfactually) reallocating this total attention across all the submitted works. So to give more attention to some, they’d have to take away attention from others.
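One way to picture this constant-sum constraint is the following minimal sketch. It assumes, purely for illustration, that historians hand back a non-negative score per work and that attention is then split in proportion to those scores; the function name and the proportional rule are hypothetical choices, not part of the proposal itself.

```python
def reallocate_attention(initial: dict[str, float],
                         ideal_scores: dict[str, float]) -> dict[str, float]:
    """Spread the total initial attention across works in proportion to
    the historians' non-negative scores, so the total is unchanged."""
    if any(ideal_scores[work] < 0 for work in initial):
        raise ValueError("scores must be non-negative")
    total_attention = sum(initial.values())
    total_score = sum(ideal_scores[work] for work in initial)
    if total_score <= 0:
        raise ValueError("at least one work needs a positive score")
    return {work: total_attention * ideal_scores[work] / total_score
            for work in initial}


if __name__ == "__main__":
    # Hypothetical attention measured at submission time.
    initial = {"A": 50.0, "B": 30.0, "C": 20.0}
    # Hypothetical historian scores: C deserved twice what A or B did.
    scores = {"A": 1.0, "B": 1.0, "C": 2.0}
    print(reallocate_attention(initial, scores))
```

Because the outputs always sum to the original total, raising one work's shoulda-been attention necessarily lowers another's, which is what keeps bet payoffs bounded.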

Okay, so now they can’t give every work an A+. (And we ensure that bet assets have bounded values.) But our job isn’t done. We also need to give them a principle to follow when allocating attention among all these prior works. What objective would they be trying to accomplish via this reallocation of attention?

I suggest that the objective just be intellectual progress, toward the world having access to more accurate and useful beliefs. A set of works should have gotten more attention if in that case the world would have been more likely to have more quickly come to appreciate valuable truths. And this task is probably easier if we ask future historians to use their future values in this task, instead of asking them to try to judge according to our values today.

These evaluation tasks probably get easier if historians randomly pick related sets of works to evaluate together, instead of independently picking each work to evaluate. And this system can probably offer scaled fees, wherein the chance that your work gets evaluated rises linearly with the price you paid for that chance. There are probably a lot more details to work out, but I expect I’ve already said enough for most people to decide roughly how much they like this idea.

Once there were many works in this system, and many prediction markets estimating their shoulda-been attention, then we could look to see if market speculators see any overall biases in today’s intellectual worlds. That is, topics, methods, disciplines, genders, etc. to which speculators estimate that the world today is giving too little attention. That could be pretty dramatic and damning evidence of bias, by someone, evidence to which we’d all be wise to attend.

One obvious test of this approach would be to assign historians today the task of reallocating attention among papers published a century or two ago. Perhaps assign multiple independent groups, and see how correlated are their evaluations, and how that correlation varies across topic areas. Perhaps repeating in a decade or two, to see how much evaluations drift over time.

Showing these correlations to potential customers might convince them that there’s a good enough chance that such a system will later correctly vindicate their neglected contributions. And these tests may show good scopes to use, for related works and time periods to evaluate together, and how narrow or broad should be the expertise of the evaluators.

This whole shoulda-listened-futures approach could of course also be applied to many other kinds of works, not just intellectual works. You’d just have to establish your standards for how future historians are to allocate shoulda attention, and trust them to actually follow those standards. Doing tests on works from centuries ago here could also help to show if this is a viable approach for these kinds of works.

Added 7am 28Apr: On average more assets will be available to pay for future evaluations if the fees paid are invested in risky assets. So instead of promising a particular percentage chance of evaluation, it may make more sense to specify how fees will be invested, set the (real) amount to be spent on each evaluation, and then promise that the chance of evaluation for each work will be set by the investment return relative to the initial fee paid. Yes, that induces more evaluations in states of the world where investments do better, but customers are already accepting a big chance that their work will never be directly evaluated.
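A rough sketch of this rule, under the assumption that the chance of evaluation is just the fee's grown value divided by the fixed real cost of one evaluation (capped at certainty). The function name and the example numbers are hypothetical.

```python
def evaluation_chance(fee: float, gross_real_return: float,
                      cost_per_evaluation: float) -> float:
    """Chance that a work is evaluated, given the fee paid, the realized
    gross real return on the invested fee (e.g. 20.0 means each dollar
    became twenty), and the fixed real budget spent per evaluation."""
    value_at_resolution = fee * gross_real_return
    return min(1.0, value_at_resolution / cost_per_evaluation)


if __name__ == "__main__":
    # A $10 fee that grows 20-fold, with $20K spent per evaluation.
    print(evaluation_chance(10.0, 20.0, 20_000.0))
    # The same fee in a world where investments did twice as well.
    print(evaluation_chance(10.0, 40.0, 20_000.0))
```

Note that the chance also rises linearly with the fee paid, which matches the scaled-fee idea mentioned earlier.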

Do Your Thoughts Scale?

Most intellectuals don’t pick their topics based on fundamental value. They instead opportunistically read the many clues around them regarding on which topics they are more likely to be rewarded. Now if you, in contrast, have the slack and inclination to instead pursue what seems fundamentally important, I salute you. And to help you, I now review some related considerations that you might overlook:

  • Rewards: You don’t want to focus only on topics where others offer rewards, but that does help, so don’t ignore it.
  • Impressive: In particular, if your work can help you look impressive, that can help you get more support later.
  • Generality: The more general your topic, the more different useful applications you and others might later find.
  • Approachable: It is not enough for insights on X to be valuable, you need some ideas for how to get insights on X.
  • Pioneering: Due to diminishing returns, the 10th insight in an area offers more gains relative to costs than the 1000th.
  • Advantage: If you will compete with others on your topic, seek some sort of comparative advantage relative to them.
  • Actionable: Cosmically big topics are insufficient; you also need key concrete actions which your results could inform.
  • Near-term: The sooner that relevant actions could be taken the better; actions in a century matter a lot less.
  • Scales-well: You want to join an intellectual community that will achieve big scale economies in accumulating insights.

This last consideration is so important, and so oft overlooked, that I will now spend the rest of this post on it. The world gains vastly more when intellectuals can organize themselves via a division of labor to each look into different topics and then combine all their efforts into a unified total perspective. So that over time their efforts accumulate into progress. Most intellectuals pretend that their usual habits ensure this, but this isn’t remotely true.

Our Default Info System: Status And Gossip

Around 1988-1990, I was working on the idea of “hypertext publishing”, which today we call the web. I was invited to give a talk to a few (<10) academics working on computer based info systems, I think at Xerox PARC. I argued that we then were hampered by our poor systems for finding out what other people had done and said.

One of the audience members said that, via gossip, he had no problem finding out what others were doing in his field. If anything was important, he’d hear about it via gossip, and if someone didn’t have enough status to get people to gossip about his work, it couldn’t be important enough for him to attend to.

Today, a physics academic told me (and a few others) that it isn’t a problem that physicists can’t be persuaded by contrarian arguments published in respectable peer reviewed physics journals, as they won’t read or consider such arguments if they go against their prior expectations. He said what really matters is your status, not whether you’ve published or where. Gossip about high status people gets their arguments considered even without publication, and no one else’s arguments matter anyway. Low status people can contribute by working out the details of high status people’s arguments.

And from a sociological point of view, of course, they are both correct. In a world that has decided that only arguments from high status people are worthy of considering, each one of them can safely ignore all the others. Even if some low status person somehow forces the world to hear and be persuaded by their argument, the high status people can and will close ranks to ensure that this low status person gains minimal concrete advantages from it, to make sure everyone learns the lesson about going through proper channels.

I presume you can see the social problem here, of insufficient information aggregation and intellectual progress. They can probably see it too, if forced to think on it. But why should they, and even if they saw the problem why should they risk personal prestige to change things, as success just makes it easier for others to compete with them?

School Vouchers As Pandemic Response

Politico asked me and 17 others:

If you were in charge of your school district or university, how would you design the fall semester?

My answer:

Let 1,000 vouchers bloom. Schools face very difficult choices this fall, between higher risks of infection and worse learning outcomes. We should admit we don’t know how to make these choices well collectively, and empower parents to choose instead. Take the per-student school budget and offer a big fraction of it to parents as a voucher, to pay for home schooling they run themselves, for a neighbor to set up a one-house schoolhouse, for a larger private school, or to use at a qualifying local public school. Each option would set its own learning policies and also policies on distancing and testing. Let parents weigh family infection risks against learning quality risks, using what they know about available options, and their children’s risks, learning styles and learning priorities.

Yes, schools may suffer a large initial revenue shortfall this way; maybe they could rent out some rooms to new private school ventures. Yes, some children will end up with regrettable schooling outcomes, though that seems inevitable no matter what we do. Yes, there should be some limits on teaching quality, but we should be forgiving at first; after all, public schools don’t know how to ensure quality here either. And maybe let any allowed option start a month or two late, if they also end later next summer; after all, we aren’t giving them much time to get organized.

Toward A University Department of Generalists

The hard problem then is how to get specialists to credit you for advancing their field when they don’t see you as a high status one of them. (more)

Many of my most beloved colleagues, and also I, are intellectual polymaths. That is, we have published in many different areas, and usefully integrated results from diverse areas. Academia tends to neglect integration and generality, which hurts not only intellectual progress, but also myself and my colleagues. Which makes me especially interested in fixing this problem.

The key problem is that academics and their research are mostly evaluated by those who work on very similar topics and methods. To the extent that these are evaluated by folks at a larger distance, it is by those who control one of the limited number of standard “disciplines” (math, physics, literature, econ, etc.).

Thus we have a poor system for evaluating work and people that sit between disciplines, or that cover many disciplines. Making it harder to evaluate work that combines areas A and B, and maybe also C and D. You might be able to get an A person to evaluate the A parts, and then a B person for the B parts, but that is more work, and the person who knows how to pick a good A evaluator may not know how to pick a good B evaluator. Academics tend to think that interdisciplinary groups do worse work, held to lower standards, and this is a big part of why.

Furthermore, even when specialists can evaluate such things well enough, they have an incentive to say “Maybe that should be supported, but not with our resources.” That is, for people and work that combines A and B, the A folks say it should be supported by the B budget, and vice versa. Often to be accepted by people in A, you must do as much good work in A as someone who only ever works in A, regardless of how much good work you also do in B, C, etc.

Yet generality still gains substantial prestige among intellectuals, which gives me hope. For example, there are usually fights to write more general summaries, such as review articles and textbooks, fights usually won by the highest in status. And Nobel prize winners, upon winning, often famously wax philosophic and general, pontificating (usually badly) on a much wider range of topics than they did previously.

Academic disciplines and departments usually need to do two things: (1) evaluate people to say who can join and stay in them, and (2) train new candidates in a way that makes it likely that many will later be evaluated positively in part (1). I’m not sure there is a way to do part (2) well here, but I think I at least know of a way to do part (1).

I propose that one university (and eventually many) create a Department of Generalists. (Maybe there’s a better name for it.) To apply to join this department, you must first get tenure in some other department. You submit your publication record, and from that they can calculate a measure of the range of your publications. Weighted by quality of course. Folks with very high range are assumed to be shoo-ins, folks with low ranges are routinely rejected, and existing department members have discretion on borderline cases.

How could we calculate publication range? I’ve posted before on using citation data to construct maps of academia. From such maps it seems straightforward to create robust metrics describing the volume in that space encompassed by a person’s research. And something like citations could be used to weigh publications in this metric. No doubt there is room for disagreement on exact metrics, and I’m not pushing to get too mechanical here. My point is that it is feasible to evaluate generality, as we know how to mechanically get a decent first cut measure of a researcher’s range.
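As a rough illustration of what a first-cut range measure could look like, the sketch below assumes each publication already has 2-D coordinates on some citation-based map of academia and a citation count used as its weight. It uses citation-weighted spread around the centroid rather than an enclosed volume, which is a simpler stand-in; the function name, the coordinates, and the weights are all hypothetical.

```python
import math


def publication_range(points: list[tuple[float, float]],
                      weights: list[float]) -> float:
    """Citation-weighted root-mean-square distance of a researcher's
    publications from their weighted centroid on a 2-D map."""
    if len(points) != len(weights):
        raise ValueError("need one weight per publication")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive number")
    # Weighted centroid of the researcher's publications.
    cx = sum(w * x for (x, _), w in zip(points, weights)) / total
    cy = sum(w * y for (_, y), w in zip(points, weights)) / total
    # Weighted mean squared distance from that centroid.
    variance = sum(w * ((x - cx) ** 2 + (y - cy) ** 2)
                   for (x, y), w in zip(points, weights)) / total
    return math.sqrt(variance)


if __name__ == "__main__":
    # A specialist: every paper sits at the same spot on the map.
    print(publication_range([(1.0, 1.0), (1.0, 1.0)], [5.0, 3.0]))
    # A generalist: equally cited papers far apart on the map.
    print(publication_range([(0.0, 0.0), (4.0, 0.0)], [1.0, 1.0]))
```

A real metric would need to handle higher-dimensional maps and guard against a single stray paper inflating the score, which is where the room for disagreement on exact metrics comes in.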

So what do people in the Department of Generalists do exactly? Well of course they continue with their research, and can continue to serve the departments from which they came. But they are encouraged to do more general research than do folks in other departments. They can now more easily talk with other generalists, work together on more general projects, and invite outside generalist speakers.

Maybe they experiment with training or mentoring other professors at the university to be generalists, people who hope to later apply to join this generalist department. They might be preferred candidates to write those prestigious general summaries, such as review articles and textbooks, and to teach generalist courses, like big introductory courses. And especially to review more generalist work by others.

It would of course be hard work to get such a department going. And you’d need to start it at a university where there are already many generalists who could get along. But I have high hopes, again from the fact that academics so often fight to appear general, as in fighting to write summaries and to pontificate on more general issues. Once there was a widespread perception that people in the Department of Generalists were in fact better at being generalists, as well as meeting the usual criteria of at least one regular department, they would naturally be seen as an elite. A group that others aspire to join, patrons aspire to fund, reporters aspire to interview, and students aspire to learn under.

And then academia would less neglect work on integration, synthesis, and generality, and work between existing disciplines. Oh academia would still neglect those things, don’t get me wrong, just less. And that seems a goal worth pursuing.

Our Prestige Obsession

Long ago our distant ancestors lived through both good times and bad. In bad times, they did their best to survive, while in good times they asked themselves, “What can I invest in now to help me in coming bad times?” The obvious answer was: good relations and reputations. So they had kids, worked to raise their personal status, and worked to collect and maintain good allies.

This has long been my favored explanation for why we now invest so much in medicine and education, and why those investments have risen so much over the last century. We subconsciously treat medicine as a way to show that we care about others, and to let others show they care about us. As we get richer, we devote a larger fraction of our resources to this plan, and to other ways of showing off.

I’d never thought about it until yesterday, but this theory also predicts that, as we get rich, we put an increasing priority on associating with prestigious doctors and teachers. In better times, we focus more on gaining prestige via closer associations with more prestigious people. So as we get rich, we not only spend more on medicine, we more want that spending to connect us to especially prestigious medical professionals.

This increasing-focus-on-prestige effect can also help us to understand some larger economic patterns. Over the last half century, rising wage inequality has been driven to a large extent by a limited number of unusual services, such as medicine, education, law, firm management, management consulting, and investment management. And these services tend to share a common pattern.

As a fraction of the economy, spending on these services has increased greatly over the last half century or so. The public face of each service tends to be key high status individuals, e.g., doctors, teachers, lawyers, managers, who are seen as driving key service choices for customers. Customers often interact directly with these faces, and develop personal relations with them. There are an increasing number of these key face individuals, their pay is high, and it has been rising faster than has average pay, contributing to rising wage inequality.

For each of these services, we see customers knowing and caring more about the prestige of key service faces, relative to their service track records. Customers seem surprisingly uninterested in big ways in which these services are inefficient and could be greatly improved, such as via tech. And these services tend to be more highly regulated.

For example, since 1960, the US has roughly doubled its number of doctors and nurses, and their pay has roughly tripled, a far larger increase than seen in median pay. As a result, the fraction of total income spent on medicine has risen greatly. Randomized trials comparing paramedics and nurse practitioners to general practice doctors find that they all produce similar results, even though doctors cost far more. While student health centers often save by having one doctor supervise many nurses who do most of the care, most people dislike this and insist on direct doctor care.

We see very little correlation between having more medicine and more health, suggesting that there is much excess care and inefficiency. Patients prefer expensive complex treatments, and are suspicious of simple cheap treatments. Patients tend to be more aware of and interested in their doctor’s prestigious schools and jobs than of their treatment track record. While medicine is highly regulated overall, the much less regulated world of animal medicine has seen spending rise at a similar rate.

In education, since 1960 we’ve seen big rises in the number of students, the number of teachers and other workers per student, and in the wages of teachers relative to workers elsewhere. Teachers make relatively high wages. While most schools are government run, spending at private schools has risen at a similar rate to public schools. We see a strong push for more highly educated teachers, even though teachers with less schooling seem adequate for learning. Students don’t actually remember much of what they are taught, and most of what they do learn isn’t actually useful. Students seem to know and care more about the prestige of their teachers than about their track records at teaching. College students prefer worse teachers who have done more prestigious research.

In law, since 1960 we’ve similarly seen big increases in the number of court cases, the number of lawyers employed, and in lawyer incomes. While two centuries ago most people could go to court without a lawyer, law is now far more complex. Yet it is far from clear whether we are better off with our more complex and expensive legal system. Most customers know far more about the school and job prestige of the lawyers they consider than they do about such lawyers’ court track records.

Management consultants have greatly increased in number and wages. While it is often possible to predict what they would recommend at a lower cost, such consultants are often hired because their prestige can cow internal opponents to not resist proposed changes. Management consultants tend to hire new graduates from top schools to impress clients with their prestige.

People who manage investment funds have greatly increased in number and pay. Once their management fees are taken into account, they tend to give lower returns than simple index funds. Investors seem willing to accept such lower expected returns in trade for a chance to brag about their association should returns happen to be high. They enjoy associating with prestigious fund managers, and tend to insist that such managers take their phone calls, which credibly shows a closer than arms-length relation.

Managers in general have also increased in number and also in pay, relative to median pay. And a key function of managers may be to make firms seem more prestigious, not only to customers and investors, but also to employees. Employees are generally wary of submitting to the dominance of bosses, as such submission violates an ancient forager norm. But as admiring and following prestigious people is okay, prestigious bosses can induce more cooperative employees.

Taken together, these cases suggest that increasing wage inequality may be caused in part by an increased demand for associating with prestigious service faces. As we get rich, we become willing to spend a larger fraction of our income on showing off via medicine and schooling, and we put higher priority on connecting to more prestigious doctors, teachers, lawyers, managers, etc. This increasing demand is what pushes their wages high.

This demand for more prestigious service faces seems to not be driven by a higher productivity that more prestigious workers may be able to provide. Customers seem to pay far less attention to productivity than to prestige; they don’t ask for track records, and they seem to tolerate a great deal of inefficiency. This all suggests that it is prestige more directly that customers seek.

Note that my story is somewhat in conflict with the usual “skill-biased technical change” story, which says that tech changed to make higher-skilled workers more productive relative to lower-skilled workers.

Added 10June: Note that the so-called Baumol “cost disease”, wherein doing some tasks just takes a certain number of hours unaided by tech gains, can only explain spending increases proportional to overall wage increases, and that only if demand is very inelastic. It can’t explain how some wages rise faster than the average, nor big increases in quantity demanded even as prices increase.

Added 12Jun: This post was inspired by reading & discussing Why Are the Prices So Damn High?


Can We Trust Deliberation Priests?

In Science, academic “deliberation” experts offer a fix for our political ills:

Citizens to express their views … overabundance [of] … has been accompanied by marked decline in civility and argumentative complexity. Uncivil behavior by elites and pathological mass communication reinforce each other. How do we break this vicious cycle? …

All survey research … obtains evidence only about the capacity of the individual in isolation to reason about politics. … [But] even if people are bad solitary reasoners, they can be good group problem-solvers … Deliberative experimentation has generated empirical research that refutes many of the more pessimistic claims about the citizenry’s ability to make sound judgments.

Great huh? But there’s a catch:

Especially when deliberative processes are well-arranged: when they include the provision of balanced information, expert testimony, and oversight by a facilitator … These effects are not necessarily easy to achieve; good deliberation takes time and effort. Many positive effects are demonstrated most easily in face-to-face assemblies and gatherings, which can be expensive and logistically challenging at scale. Careful institutional design involv[es] participant diversity, facilitation, and civility norms …

A major improvement … might involve a randomly selected citizens’ panel deliberating a referendum question and then publicizing its assessments for and against a measure … problem is not social media per se but how it is implemented and organized. Algorithms for ranking sources that recognize that social media is a political sphere and not merely a social one could help. …

It is important to remain vigilant against incentives for governments to use them as symbolic cover for business as usual, or for well-financed lobby groups to subvert their operation and sideline their recommendations. These problems are recognized and in many cases overcome by deliberative practitioners and practice. … The prospects for benign deployment are good to the degree that deliberative scholars and practitioners have established relationships with political leaders and publics—as opposed to being turned to in desperation in a crisis.

So ordinary people are capable of fair and thoughtful deliberation, but only via expensive processes carefully managed in detail by, and designed well in advance by, proper deliberation experts with “established relationships with political leaders and publics.” That is, these experts must be free to pick the “balance” of info, experts, and participants included, and even who speaks when and how, and these experts must be treated with proper respect and deference by the public and by political authorities.

No, they aren’t offering a simple well-tested mechanism (e.g., an auction) that we can apply elsewhere with great confidence that the deployed mechanism is the same as the one that they tested. Because what they tested instead was a mechanism with a lot of “knobs” that need context-specific turning; they tested the result of having particular experts use a lot of discretion to make particular political and info choices in particular contexts. They say that went well, and their academic peer reviewers (mostly the same people) agreed. So we shouldn’t worry that such experts would become corrupted if we gave them a lot more power.

This sure sounds like a priesthood to me. If we greatly empower and trust a deliberation priesthood, presumably overseen by these 20 high priest authors and their associates, they promise to create events wherein ordinary people talk much more reasonably, outputting policy recommendations that we could then all defer to with more confidence. At least if we trust them.

In contrast, I’ve been suggesting that we empower and trust prediction markets on key policy outcomes. We’ve tested such mechanisms a lot, including in contexts with strong incentives to corrupt them, and these mechanisms have far fewer knobs that must be set by experts with discretion. Which seems more trustworthy to me.


Replication Markets Team Seeks Journal Partners for Replication Trial

An open letter, from myself and a few colleagues:

Recent attempts to systematically replicate samples of published experiments in the social and behavioral sciences have revealed disappointingly low rates of replication. Many parties are discussing a wide range of options to address this problem.

Surveys and prediction markets have been shown to predict, at rates substantially better than random, which experiments will replicate. This suggests a simple strategy by which academic journals could increase the rate at which their published articles replicate. For each relevant submitted article, create a prediction market estimating its chance of replication, and use that estimate as one factor in deciding whether to publish that article.

We the Replication Markets Team seek academic journals to join us in a test of this strategy. We have been selected for an upcoming DARPA program to create prediction markets for several thousand scientific replication experiments, many of which could be based on articles submitted to your journal. Each market would predict the chance of an experiment replicating. Of the already-published experiments in the pool, approximately one in ten will be sampled randomly for replication. (Whether submitted papers could be included in the replication pool depends on other teams in the program.) Our past markets have averaged 70% accuracy, and the work is listed at the Science Prediction Market Project page, and has been published in Science, PNAS, and Royal Society Open Science.

While details are open to negotiation, our initial concept is that your journal would tell potential authors that you are favorably inclined toward experiment article submissions that are posted at our public archive of submitted articles. By posting their article, authors declare that they have submitted their article to some participating journal, though they need not say which one. You tell us when you get a qualifying submission, we quickly tell you the estimated chance of replication, and later you tell us of your final publication decision.

At this point in time we seek only an expression of substantial interest that we can take to DARPA and other teams. Details that may later be negotiated include what exactly counts as a replication, whether archived papers reveal author names, how fast we respond with our replication estimates, what fraction of your articles we actually attempt to replicate, and whether you privately give us any other quality indicators obtained in your reviews to assist in our statistical analysis.

Please RSVP to: Angela Cochran, PM acochran@replicationmarkets.com 571 225 1450

Sincerely, the Replication Markets Team

Thomas Pfeiffer (Massey University)
Yiling Chen, Yang Liu, and Haifeng Xu (Harvard University)
Anna Dreber Almenberg & Magnus Johannesson (Stockholm School of Economics)
Robin Hanson & Kathryn Laskey (George Mason University)

Added 2p: We plan to forecast ~8,000 replications over 3 years, ~2,000 within the first 15 months.  Of these, ~5-10% will be selected for an actual replication attempt.
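As a rough check on the scale implied by these figures, the arithmetic can be sketched as follows. This is only an illustration using the approximate numbers quoted above; the function name is made up for this sketch, and the actual sampling procedure is not described here.

```python
def replication_range(n_forecasts, low_pct=5, high_pct=10):
    """Return (low, high) counts of actual replication attempts,
    assuming low_pct% to high_pct% of forecast items are selected."""
    return (n_forecasts * low_pct // 100, n_forecasts * high_pct // 100)

# Approximate figures from the note above: ~8,000 forecasts over 3 years,
# ~2,000 of them within the first 15 months.
full_program = replication_range(8000)
first_15_months = replication_range(2000)

print("Replication attempts, full program:", full_program)
print("Replication attempts, first 15 months:", first_15_months)
```

So on these approximate numbers, a few hundred experiments in total would actually be rerun.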


It’s All Data

Bayesian decision theory is often a useful approximation as a theory of decisions, evidence, and learning. And according to it, everything you experience or see or get as an input can be used as data. Some of it may be more informative or useful, but it’s all data; just update via Bayes rule and off you go.
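To make "just update via Bayes rule" concrete, here is a minimal sketch. The hypotheses and all the probabilities are made-up numbers for illustration, and `bayes_update` is a name invented for this sketch. The point is only that weak evidence still moves the estimate, just less than strong evidence does.

```python
def bayes_update(prior, likelihood):
    """Return the posterior over hypotheses.

    prior:      dict mapping hypothesis -> P(hypothesis)
    likelihood: dict mapping hypothesis -> P(observed data | hypothesis)
    """
    unnormalized = {h: prior[h] * likelihood[h] for h in prior}
    total = sum(unnormalized.values())
    if total == 0:
        raise ValueError("data has zero probability under every hypothesis")
    return {h: p / total for h, p in unnormalized.items()}

# Hypothetical numbers: two rival hypotheses, initially equally likely.
prior = {"claim_true": 0.5, "claim_false": 0.5}

# A weakly informative observation (only slightly more likely if the claim is true).
weak_evidence = {"claim_true": 0.55, "claim_false": 0.45}
# A strongly informative observation.
strong_evidence = {"claim_true": 0.9, "claim_false": 0.1}

after_weak = bayes_update(prior, weak_evidence)
after_both = bayes_update(after_weak, strong_evidence)

print("After weak evidence:", after_weak)
print("After weak then strong evidence:", after_both)
```

Neither observation is thrown away for being "the wrong kind" of data; each one simply shifts the estimate in proportion to how informative it is.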

So what then is “scientific” data? Well “science” treated as a social phenomenon is broken into many different disciplines and sub-fields, and each field tends to have its own standards for what kinds of data it will publish. These standards vary across fields, and have varied across time, and I can think of no universals that apply to all fields at all times.

For example, at some times in some fields one might be allowed to report on the content of one’s dreams, while in other fields at times that isn’t okay but it is okay to give statistics summarizing the contents of all the dreams of some set of patients at a hospital, while in other fields at other times they just don’t want to hear anything subjective about dreams.

Most fields’ restrictions probably make a fair bit of sense for them. Journal space is limited, so even if all data can tell you something, they may judge that certain kinds of data rarely say enough, compared to other available kinds. Which is fine. But the not-published kinds of data are not “unscientific”, though they may temporarily be “un-X” for field X. And you should remember that as most academic fields put a higher priority on being impressive than informative, they may thus neglect unimpressive data sources.

For example, chemists may insist that chemistry experiments specify exactly which chemicals are being tested. But geology papers can give data on tests made on samples obtained from particular locations, without knowing the exact chemical composition of those samples. And they don’t need these samples to be uniformly sampled from the volume of the Earth or the universe; it is often enough to specify where samples came from.

Consider agricultural science field experiments, where they grow different types of crops in different kinds of soil and climate. They usually don’t insist on knowing the exact chemical composition of the soil, or the exact DNA of the crops. But they can at least tell you where they got the crops, exactly where the test field is, how the crops were watered, weeded, and fertilized, and some simple stats on the soils. It would be silly to insist that such experiments use a “representative” sample of crops, fields, or growing conditions. Should it be uniformly sampled from actual farming conditions used today, from all possible land on Earth’s surface, or from random mass or volume in the universe across its history?

Lab experiments in the human and social sciences today typically use convenience samples of subjects. They post invitations to their local school or community and then accept most everyone who signs up or shows up. They collect a few stats on subjects, but do not even attempt to create “representative” samples of subjects. Nationally, globally-now, or over-all-history representative samples of lab subjects would just be vastly more expensive. Medical experiments are done similarly. They may shoot for balance along a few particular measured dimensions, but on other parameters they take whoever they can get.

I mention all this because over the last few months I’ve had some fun doing Twitter polls. And I’ve consistently had critics tell me I shouldn’t do this, because Twitter polls are “meaningless” or “worthless” or “unscientific”. They tell me I should only collect the sort of data I could publish in a social science journal today, and if I show people any other kind of data I’m an intellectual fraud. As if some kinds of data were “unscientific”.

Today I have ~24,700 followers, and I can typically get roughly a thousand people to answer each poll question. And as my book Elephant in the Brain suggests, I have many basic questions about human behavior that aren’t very specific to particular groups of people; we have many things to learn that apply to most people everywhere at all times. Whenever a question occurs to me, I can take a minute to post it, and within a few hours get some thought-provoking answers.

Yes, the subset of my Twitter followers who actually respond to my polls is not a representative sample of my nation, world, profession, university, or even of Twitter users. But why exactly is it so important to have a representative sample from such a group?

Well there is a big advantage to having many representative polls from the same group, no matter what that group. Then when comparing such polls you have to wonder less whether sample differences are driving results. But the more questions I ask of my Twitter followers, the more I can usefully compare those different polls. For example, if I ask them at different times, I can see how their attitudes change over time. Or if I make slight changes in wording, I can see what difference wording changes make.
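As a rough sketch of such a comparison, one can ask whether the gap between two polls of the same follower pool is larger than sampling noise alone would produce. The vote counts below are made up, `compare_polls` is a name invented for this sketch, and the standard-error formula assumes each poll's respondents are independent random draws from the same pool, which self-selected poll respondents only approximate.

```python
import math

def compare_polls(yes_a, n_a, yes_b, n_b):
    """Compare the 'yes' share in two polls.

    Returns (difference in share, standard error of the difference, z-score),
    using the usual two-proportion normal approximation.
    """
    if n_a <= 0 or n_b <= 0:
        raise ValueError("each poll needs at least one respondent")
    p_a = yes_a / n_a
    p_b = yes_b / n_b
    diff = p_b - p_a
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    if se == 0:
        raise ValueError("no variation in either poll; z-score undefined")
    return diff, se, diff / se

# Hypothetical counts: the same question asked with two slightly different
# wordings, each answered by about a thousand followers.
diff, se, z = compare_polls(yes_a=520, n_a=1000, yes_b=580, n_b=1000)

print(f"Difference in 'yes' share: {diff:.3f}")
print(f"Standard error of the difference: {se:.3f}")
print(f"z-score: {z:.2f}")
```

With about a thousand answers per poll, a gap of a few percentage points is within noise, while a gap several times the standard error is worth a second look. That is the sense in which repeated polls of the same pool can be compared with each other.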

Of course if I were collecting data to help a political candidate, I’d want data representative of potential voters in that candidate’s district. But if I’m just trying to understand the basics of human behavior, it’s not clear why I need any particular distribution over people polled. Yes, answers to each thing I ask might vary greatly over people, and my sample might have few of the groups who act the most differently. But this can happen for any distribution over the people sampled.

Even though the people who do lab experiments on humans usually use convenience samples that are not representative of a larger world, what they do is still science. We just have to keep in mind that differing results might be explained by different sources of subjects. Similarly, the data I get from my Twitter polls can still be useful to a careful intellectual, even if it isn’t representative of some larger world.

If one suspects that some specific Twitter poll results of mine differ from other results due to my differing sample, or due to my differing wordings, the obvious checks are to ask the same questions of different samples, or using different wordings. Such as having other people on Twitter post a similar poll to their different pool of followers. Alas, people seem to be willing to spend lots of time complaining about my polls, but are almost never willing to take a few seconds to help check on them in this way.
