(Context: Written while reading through Joshua Angrist, Mostly Harmless Econometrics, which I later discussed with a tutor)
A Discussion on The Scientific Method
I already have a problem with the framing of the book, in particular the four questions (relationship of interest, ideal experiment, identification strategy, mode of inference)
It's not a problem with the book per se, but a more general problem with a set of beliefs educated people tend to have
Where truth is discovered by certain people (researchers) in a certain way (identify questions -> design experiments around questions -> draw conclusions)
But there is really no reason to believe that generating "research" is the best way to discover the "truth"
I really liked Adam Mastroianni's blog posts "The Rise and Fall of Peer Review" and "The Dance of the Naked Emperors". The most memorable quote, I think, is his response to someone saying that peer review really improved their work prior to publication and resulted in better research:
When people spend a collective 15,000 years of labour every year commenting on each other's work, I would certainly hope that some of it is helpful. That's about the lowest bar it could clear.
Or that discovering the "truth", as a separate project from actually doing stuff, is the best way to positively impact public policy
What I have in mind is activists picking and choosing their favoured research paper, or even going into "research" to argue their favoured side
I feel like this is only happening because "doing stuff" and "research" have been separated into two spheres, similar to that quote "A nation that makes a great distinction between its scholars and warriors will have its laws made by cowards and its wars fought by fools." (Side note: I just learnt this quote is misattributed to Thucydides, and actually comes from some British dude named William Butler!)
If the world is really, really complicated, such that there are a billion combinations of smaller class sizes and other factors, where under some combinations smaller class sizes help and under some combinations don't, I don't think it makes sense to have multi-year experiments knocking out specific combinations one by one. It makes more sense to me to wrap up the complexity and give a bundle of fixes at a certain cost, and as long as the bundle works, we don't care why exactly it does
I suppose the latter is what policymakers do with research in practice. There's some research finding, activists make hay of it, policymakers trial it. But do we really need research to say it's feasible that smaller class sizes will improve test scores, so let's give it a try? I suppose we do need someone to measure the results once we actually try something, but I don't think observing that test scores have increased requires particular technical skill
"Research" (or at least my impression of it, since I haven't actually read any directly) seems to sit at a weird middle level of abstraction, where it doesn't describe the atoms of interaction, but at the same time it doesn't admit that it's just a decision-making tool for narrowing down uncertainty, and instead tries to make claims about "truth"
If I had to point to a positive example of what I mean by combining "doing stuff" and "research", I would point to Econtalk's interview of Roland Fryer, where he described a project he did to improve the education of the 20 worst elementary, middle, and high schools in Houston. To save you some time (though the whole interview is really a treat; I would be curious to read any papers he wrote), these are the five things he implemented:
The Basic Physics of Education - To learn more, students have to spend more time in school. Fryer lengthened the school year by 2 weeks and the school day by 1 hour
Human Capital - Fryer removed 19 of the 20 principals and 50% of the teachers
Data - Fryer brought in data systems to implement short-cycle assessments every three weeks. If 80% of students understood something, the class would move on and the 20% would be tutored, but if only 30% got it, the class would be retaught
Tutoring - Fryer hired 400 tutors to give small-group, high-dosage tutoring for Grades 4, 6, and 9. He couldn't afford to do it for all grades (this was a $60 million experiment!)
Culture of High Expectations - Fryer said for some of these schools, the goal was that 40%(!) of kids would have basic skills next year. And some of the teachers would say, we don't need five things, we only need one thing: smarter kids(!). Needless to say he fired them, lol
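The short-cycle assessment rule in item 3 is concrete enough to sketch as code. The 80% and 30% thresholds are from the interview as described above; what happens between the two cutoffs isn't specified, so the middle branch is my guess:

```python
def next_step(pct_mastered, move_on=0.80, reteach=0.30):
    """Decision rule after a three-week short-cycle assessment.

    Thresholds follow the description of Fryer's data system;
    the behaviour between the two cutoffs is an assumption.
    """
    if pct_mastered >= move_on:
        return "move on; tutor the students who missed it"
    if pct_mastered <= reteach:
        return "reteach the whole class"
    return "reteach with targeted tutoring"  # assumption: not specified

print(next_step(0.85))  # → move on; tutor the students who missed it
print(next_step(0.25))  # → reteach the whole class
```

The point of the rule is that the data system turns a vague goal ("make sure kids learn") into a mechanical, repeatable decision every three weeks.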
What Fryer did, I think, goes against the common understanding of what "research" is. He was doing a buttload of different stuff based on his prior qualitative research of high-performing charter schools, across a large number of schools, with no real controls. He actively intervened in the research, so there's no "double blind". He's an especially charismatic and credentialed (Harvard professor!) guy who was given an unusual level of control and budget because the schools were going to be taken over by the state, and 2008-2010, when he did the project, was the height of the school reform movement, so it is not easily replicable. Heck, we could easily imagine the things he did backfiring; firing the principal and 50% of the teachers could very easily go very wrong. (The sheer amount of work and the unpopularity of the measures were why what Fryer did didn't become popular.)
But because of that, I think he proved very convincingly that with more resources ($1,863 more per kid) and better management, the worst students could catch up - "In three years in the middle and high schools we closed the racial achievement gap in math, and cut it by a third in reading. In five years, we did the same thing in the elementary schools. For our high schools, every single pupil was admitted to a two- or four-year college. This is 20,000, 30,000 kids. Before that, it would have been half." That beats the endless back and forth over numerical "research" about whether giving schools more resources would improve test scores, based on this or that re-analysis.
How to do "real research" is something that I've been thinking about for a while, though I don't have any good ideas yet. If I'm trying to think about where statistical analysis is useful vs where it is not, I'm thinking of things like German tank serial numbers giving the Allies a clue about how many tanks Germany was producing in World War II, or statistical discrepancies showing that a regional manager might be committing fraud. So purely descriptive statistics, from a distance. In terms of causal statistical analysis, we might have John Snow's cholera investigation or certain other medical experiments, e.g. Semmelweis showing that handwashing works. These are active practitioners actively intervening in a system they control (similar to Fryer).
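The German tank case is a nice example of how far "statistics from a distance" can go: given k captured serial numbers assumed to run 1..N, the standard minimum-variance unbiased estimate of N is max + max/k - 1. A minimal sketch:

```python
import random

def estimate_total(serials):
    """'German tank' estimate: given k serial numbers drawn without
    replacement from 1..N, estimate N as max + max/k - 1."""
    m, k = max(serials), len(serials)
    return m + m / k - 1

# Simulate a factory that really produced 300 tanks, of which we
# capture 8 and read off the serial numbers.
random.seed(0)
captured = random.sample(range(1, 301), 8)
print(round(estimate_total(captured)))  # typically lands near 300
```

The intuition: the largest observed serial understates N, and the average gap between observed serials tells you roughly how much to add back on top.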
I, as a random person, cannot do the latter, and there's no real guide to teach someone outside, e.g., the education system how to become involved in the education system, to even develop some kind of expertise/causal model that can be used to intervene. So everyone just uses papers-at-a-distance, like how middle-class people used to read travel guides when actually travelling overseas was unaffordable.
Post-Discussion Addendum
During the discussion I came around to the idea that econometrics has its place in policy-making, at least in the context of the United States, where a trial study is required before real implementation
The tutor brought up the fact that many people do cherry pick studies, and since I am unbearably contrarian, I then realised that that's an uncharitable view. There are people who sincerely care about figuring out the truth, and I really shouldn't devalue the whole research process just because it's abused by bad actors
I think my feeling is, I've overestimated econometrics. It's not a means for discovering the truth, which must come from observations and conjectures about cause and effect. It's a means for checking the truth after we think we know what it is. Yes, it is important, in the same way accounting is an important part of business, but the most important part is elsewhere
The tutor also brought up the cost of RCTs, and their weakness in not being able to identify how different groups may respond differently (since the whole point of RCTs is to average out responses to determine an "overall effect"). I brought up how there could be a separate study to investigate how groups differ, but that raised the question of whether it was worth the cost
I suddenly had the insight that I was misunderstanding what these tools, e.g. RCTs, quasi-experiments, etc., are meant to do. First, there is no study that will exactly deduce the truth. What studies do is reduce uncertainty. If I have a strong result in one direction, then the space of what is possible or likely shrinks. I suspect (I have no personal experience of academia, so I'm wildly speculating) that most researchers try to pin down the "truth" of some popular research idea, instead of strategically reducing uncertainty. Which is bad, because more studies in the same area increase precision, but there's no guarantee of accuracy in the first place. Second, rather than blindly insisting on a specific "ideal" procedure (and ending up with procedures so strict we conclude we cannot fix cars on the side of the road), we do have a means of choosing which procedure to run: the one that reduces the most uncertainty at the least cost
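"Reduce the most uncertainty at the least cost" can be made concrete with a standard conjugate-prior calculation. The sketch below (the study sizes and costs are made-up numbers) uses the textbook result that for a Beta(a, b) belief about a success probability, the expected posterior variance after n further Bernoulli observations is the prior variance scaled by (a + b) / (a + b + n):

```python
def expected_posterior_var(a, b, n):
    """Expected posterior variance of a Beta(a, b) belief about a
    success probability after n Bernoulli observations (standard
    Beta-Binomial conjugate result)."""
    prior_var = a * b / ((a + b) ** 2 * (a + b + 1))
    return prior_var * (a + b) / (a + b + n)

def uncertainty_per_dollar(a, b, n, cost):
    """Expected reduction in variance, divided by the study's cost."""
    gain = expected_posterior_var(a, b, 0) - expected_posterior_var(a, b, n)
    return gain / cost

# Hypothetical choice: a cheap pilot (n=100, $50k) vs. a big RCT
# (n=1000, $2m), starting from a flat Beta(1, 1) prior.
pilot = uncertainty_per_dollar(1, 1, 100, 50_000)
big_rct = uncertainty_per_dollar(1, 1, 1000, 2_000_000)
print("pilot wins" if pilot > big_rct else "big RCT wins")  # → pilot wins
```

The numbers illustrate the diminishing returns: starting from near-total ignorance, the first hundred observations buy most of the variance reduction, so the expensive study only makes sense once the cheap one has narrowed things down.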