These are some rather introspective worries of mine to the effect that the longtermist community may be missing out on people with a particular knack for finding counterexamples.
My mental model of people, which is probably informed or biased by a few extreme cases, suggests that people vary widely in the following abilities:
Memory and “indexation”1 or connectedness of memory. That is, the degree to which they can immediately think of many relevant experiences and insights to inform a decision. (Maybe they’re also good at inventing great metaphors?) To that end, they need to remember the experiences or insights in the first place, but they also need to recall them in connection with the decision.
Theory of Mind or conceptual perspective-taking. It seems to me that there are several levels of sophistication here: Level 1 is interpreting a situation from the perspective of a person whose emotional reactions are systematically different from one’s own. Level 2 is interpreting a situation from the perspective of a person who knows more or less than oneself. Level 3 is interpreting a situation from the perspective of a person who has a different model of it, where either you haven’t worked together to identify your cruxes or the cruxes are so fundamental that their implications diverge widely along long inductive chains.2
Noticing. Noticing a contradiction is easy if two contradictory statements follow in quick succession. But if a statement contradicts something that is unrelated to anything the statement highlights, something latent, say, because one last thought about it explicitly years ago, then the contradiction is much harder to notice.
Memory and Indexation
When I talk with my friends, who are largely EA or EA-adjacent, they tend to agree with me on the following approach to, let’s say, ontology: (1) You are surprised by something, (2) you update a relevant internal model and propagate the updates to related internal models as well as possible, (3) you gradually forget the particular experience that surprised you.
I suspect that a lot of people follow an approach more like this: (1) You are surprised by something, (2) you distill the essence of the surprising situation, some sort of centroid or archetype, something that implies a set of reference classes, (3) you file the surprising experience away in an indexed structure that makes it easy to retrieve again when you’re in a situation that fits the same archetype.
In the end, most people probably do both, but it seems to me that everyone tends in one direction or the other. I certainly use both, but with the first I feel like I have a routine grasp of what it’s like, while with the second I’m gesturing at something elusive and half-forgotten.
People who excel at the first cognitive style have an easy time generating revolutionary new hypotheses: They take the model they already have and see what it implies. In a sizable minority of the cases, it’ll imply something they haven’t seen yet: They’ll find it obvious to have (cognitive) empathy with beings that they’ve never been, worry about risks of catastrophes that have never happened, and were probably worried about corona early on even without knowing about the Spanish flu.
But if you mix in a lot of openness and a bit of confirmation bias, they may become hedgehogs espousing theories that are simple, elegant, and wrong. (More wrong than would be a good price to pay for the simplicity.) In self-selecting communities, in particular, they may get continual social proof for such theories from people around them who share the same cognitive style. It is hard to distinguish people who are better than you at something from people who are vastly better than you at it, so decision-makers who specialize in the first cognitive style and are only slightly above average at the second style may undervalue people who are exceptional at it unless these people are also exceptional at other things that the decision-makers can recognize.
For example, if I’m asked to learn something by heart, I’d rather try to find some underlying regularity so that I can rederive it on demand instead of laboriously memorizing it. Interestingly, this feels high-status in my circles, but I just do it because it’s easier. It’s probably good when the generating model is fairly regular and general but bad when the information seems incompressible or when I need to be fast and rederiving takes too long. My reaction is usually to focus on areas where my cognitive style works well and to avoid the ones where it is unhelpful. This may be a major part of self-selection into communities.
Quantum mechanics, especially the many-worlds interpretation, and various brands of utilitarianism probably suit this style well.
Meanwhile, people who instead excel at the second cognitive style excel in areas where the information is hard to compress – highly nonlinear or lacking in the sort of structure that lends itself to abstractions vis à vis human cognition (e.g., law, medicine, psychology, sociology, history). Dr. House may be an example. They are great at encountering a situation and immediately thinking of all the relevant associated bits of information – maybe the archetype that certain symptoms boil down to, what diagnoses are plausible given the symptom combination archetype, and how to test for them. Or if the situation is a hypothetical implication of someone else’s model, they are great at finding counterexamples: For every implication of a model, they can distill the archetype of the situation and retrieve countless examples of how such situations have played out in the past. If these experiences tend to contradict the implication, they can warn that something is off about the model. If the situation has never happened before, they may just have to distill a bit harder to find an archetype whose reference class does contain a sufficient number of experiences.
But they have no grand models that would allow them to derive implications that have never happened (though curiously they seem to be good at the deriving itself when you give them a model), and they may be unduly sceptical of models that have unprecedented implications in the rare cases where these are actually about to happen: They’ll increasingly widen the reference class that their archetype implies until it starts to contain instances, but then these are fairly unrepresentative, so the fraction of confirming and disconfirming instances is uninformative. If the implication is unprecedented, maybe wide reference classes that contain it are also still full of rare events, so maybe the disconfirming instances are even systematically overrepresented. (E.g., there has never been a virus like COVID-19, so they expand the reference class to SARS, MERS, Ebola, and the seasonal flu, and get uninformative results.)
The fields that I classified as “highly nonlinear or lacking in the sort of structure that lends itself to abstractions vis à vis human cognition (e.g., law, medicine, psychology, sociology, history)” are known for degrees that involve a lot of memorization. People who excel in these fields probably excel at the second type of cognition, because it can work efficiently with all the chaotic information that is so hard to compress.
The areas that attract them are probably, almost by necessity, named for the problems rather than for the solution concepts, because the solution concepts are so diverse. The fields above seem like good examples to me.
I’m not calling these people “foxes,” because foxes seem to me more like the people who manage to combine the best of both worlds, not extremes in either direction.
Self-selection of people with one or the other cognitive style into communities like effective altruism or exogenous selection of them into EA organizations may be unproblematic if these styles are highly correlated with general competence and general competence is what determines the selection. Then most people should be about equally good at both with a few having a slight preference one way or the other.
But if (1) people really vary widely in this regard, (2) the self-selection is heavily influenced by cognitive styles, or (3) the exogenous selection is based on markers of a particular cognitive style, then one group will end up being vastly overrepresented.
I have the unsubstantiated hypothesis that risk factors 1 and 2 are likely true and that risk factor 3 may be true in a few cases:
Risk factor 1: My own thinking, notably in my childhood, was rather hedgehog-like. Only in adolescence did I start seeing this tendency as a bias and questioning it. Conversely, I’ve known someone who, due to a brain injury, had sustained some disabilities and also scored around IQ 60–70 on some unnamed IQ test. At the time we met, she was doing her PhD in anthropology. What I remember of our extensive conversations is that I usually enjoyed them as a chance to bounce new, weird hypotheses off of her, which she usually promptly destroyed with a wealth of counterexamples that cut to the core of my hypotheses. These counterexamples were what inspired my term “archetype”: they weren’t specific situations that could be dismissed as isolated exceptions to an otherwise sound rule but situations where it was clear from their structure that they contradicted my model. She couldn’t muster any interest in EA.
Risk factor 2: Here I’m drawing on my impressions of my friend circles. I can easily think of friends of mine who are strong modelers. I can only think of a few strong indexers, and they’re also pretty strong modelers. I don’t think I currently know anyone with a really pronounced indexer preference.
Risk factor 3: Here I’m just guessing that if there’s a preponderance of modelers, that may also be true of positions where people influence hiring decisions, so those people may be less able to recognize exceptional indexers than exceptional modelers. But at the one organization where I know people well, at least a few top people are also really good indexers. Maybe indexers are more efficient in operations roles.
Theory of Mind
Perspective-taking is a separate skill. I find that it comes in different difficulty levels. I’ll use the first person here, because I haven’t checked whether others would order these differently.
At level 1, if I already know in what specific ways someone else is different, and the implications of these differences are straightforward, then I can adjust for them easily. Say I’m concerned about what others think of me, and someone else is not concerned about what others think of them. Now the other person is treated disrespectfully in front of a group. I might’ve worried about how that would affect how the group sees me, but I can infer that the other person, being less concerned about that, will more straightforwardly be cross with the perpetrator.
Level 2 resembles the marble test: The subject sees someone put a marble in one place and leave, and then sees someone else move the marble to a different place. The first person returns, and the subject is asked where they will look for the marble.
The more challenging version goes: You’ve studied computer science and worked as a software engineer for a decade; another smart person has studied psychology and worked as a psychology researcher for a decade. They ask you what database migrations are. How much do you have to simplify and analogize your explanation so that they can just understand it but not so much that they feel patronized?
A different version of this is more about expectations than about knowledge. I meet a new person and don’t know whether they like small talk. They start small talk. Should I reciprocate? I want to reciprocate small talk iff they are more likely to enjoy it than not.
My prior is that someone who likes small talk is more likely to open with small talk and someone who doesn’t is less likely to. Also, someone who dislikes small talk may be less likely than someone who likes it to assume that I enjoy small talk, so if their prior about me is that I like small talk, they’re more likely to open with it. And if they realize this and think that I will model them as someone who likes small talk if they open with it, then they likely think that liking small talk is something good, so they probably either like it or want to practice it. So this points toward updating in favor of them liking small talk. This consideration feels like a positive feedback loop that asymptotically approaches some limit. Other people are perhaps much better at intuiting where that limit is.
But it’s only an implication, so they might also start small talk for reasons other than liking it. Maybe they only started small talk because they thought it was polite but don’t actually like it. This complicates the situation and leads to a much slighter update toward them liking small talk. (Am I right to think that the update is still positive?) Putting in an optional small-talk escape hatch (“I’ve been really enjoying reading about what decision theories are implicit in the training of AIs. What have you been up to?”) seems to help generate more evidence, but I suppose the other person might take it as a cue that you want to end the small talk, so it’s hard to make it truly neutral.3
I suppose that indexers, and with a bit of practice just about anyone, no longer think through all of these considerations but simply act according to a stored behavior that fits the situational archetype.
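The parenthetical question can be checked with Bayes’ rule. Here is a minimal sketch in Python; the function name and all the probabilities are hypothetical numbers chosen for illustration, not estimates. It shows that opening with small talk is evidence for liking small talk exactly when people who like it are more likely to open with it than people who don’t, and that politeness, by raising the second likelihood, shrinks the update without reversing it.

```python
def posterior_likes_small_talk(prior: float,
                               p_open_if_likes: float,
                               p_open_if_dislikes: float) -> float:
    """P(likes small talk | opened with small talk), by Bayes' rule.

    All three inputs are hypothetical probabilities for illustration.
    """
    # Total probability that a random person opens with small talk.
    p_open = p_open_if_likes * prior + p_open_if_dislikes * (1 - prior)
    return p_open_if_likes * prior / p_open


prior = 0.5
# Without politeness: people who dislike small talk rarely open with it.
print(posterior_likes_small_talk(prior, 0.8, 0.2))
# With politeness: many people who dislike it open with it anyway.
print(posterior_likes_small_talk(prior, 0.8, 0.6))
```

With these made-up numbers, the posterior is about 0.8 in the first case and about 0.57 in the second. Both exceed the prior of 0.5, so under this simple model the update stays positive as long as people who like small talk remain at least somewhat more likely to open with it; if the two likelihoods were equal, the posterior would equal the prior.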
But it gets more complicated when several people are involved. A simple, linear version is the joke: Three mathematicians enter a bar. The bartender asks, “Would you all like a beer?” The first mathematician: “I don’t know.” The second one: “I don’t know.” The third one: “Yes.” But that only works absent any noise from social customs, politeness, pragmatics, and reasoning styles. The joke would not work with “Three people who don’t know each other enter a bar.”
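The noise-free version of the joke is simple enough to write down as a procedure. The sketch below assumes, as the joke does, that each mathematician knows only whether they themselves want a beer, answers in turn, and answers with perfect literalness; the function name `bar_answers` is hypothetical. Each “I don’t know” publicly reveals “I want one, but I can’t speak for the others,” which is what lets the last one answer for everyone.

```python
def bar_answers(wants: list[bool]) -> list[str]:
    """Answers to "Would you all like a beer?", given in order.

    wants[i] is whether mathematician i wants a beer; each one knows only
    their own entry plus the answers given before their turn.
    """
    answers = []
    everyone_so_far_wants_one = True  # inferred from the earlier answers
    for i, want in enumerate(wants):
        if not everyone_so_far_wants_one or not want:
            # Someone doesn't want a beer, so "all of us" is already false.
            answers.append("No")
            everyone_so_far_wants_one = False
        elif i == len(wants) - 1:
            # Last to answer, and every earlier "I don't know" meant "I want one".
            answers.append("Yes")
        else:
            # Wants one but can't speak for the people still to answer.
            answers.append("I don't know")
    return answers


print(bar_answers([True, True, True]))
# prints: ["I don't know", "I don't know", 'Yes']
```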
I found a more complex and noncooperative demonstration of this reasoning problem in the scene in Dumbledore’s office in chapter 18 of Harry Potter and the Methods of Rationality (though this chapter does not make sense in isolation) and later in the book in the complexities of the three-army game. It’s difficult to imagine that anyone has encountered all the possible permutations of these situations and stored meaningful archetypes for them.
A wholly different solution is the Pavlov strategy, where you just do what you want and watch carefully how the other person reacts. If they seem unhappy with the situation (the altruistic “punishment”), you adjust your behavior until you’re both happy.
Level 3 is similar to level 2 except that the difference is not so much one of knowledge as one of world models.
For example, someone makes a statement, even a really long and comprehensive one, that strikes me as somehow confused but in a confusing way that I can’t make sense of. What I don’t know is that they subscribe to CDT, dualism about consciousness, moral objectivism, and a sort of essentialism about personal identity, none of which I share. But they don’t know these terms and can’t tell me. From within the framework of these concepts, insofar as they form a coherent framework, their statement was straightforward. But to understand that, I first have to recognize all the above cruxes and then take the perspective of someone whose model of the world is so different.
Once we’ve (collaboratively) identified all the cruxes, this should resemble Level 1. But I’ve found that it can be much more difficult than Level 1 if I’m not practiced at using the different model. Over the years of using the model, the other person will have noticed various implications the model has for religion, population ethics, and social customs. But I, being new to it, may be either unaware of these implications or unaware that some of my previous conclusions in these matters don’t hold under the new framework.
Noticing
There is something magical about noticing something without paying attention to it. Maybe some System 1 background process is constantly paying attention to thousands of things and notifies System 2 if something odd transpires? Or maybe there’s rather a hierarchy of such processes, each watching for an increasingly fine-grained type of oddity?
In any case, if you are working on a task, your whole attention is on it, and you become completely immersed in its intricacies, then it may be hard to notice when you have spent longer on the task than the result is worth, unless you explicitly timeboxed it from the start.
Or in an interview, you might concentrate on your interview partner, on asking interesting questions, on avoiding questions that would force them to mention things they can’t talk about, on coming up with the best spontaneous follow-up questions, etc. It may be hard to notice when the interviewee uses a term the audience may not know or when you start going into more depth on a topic than it deserves given your limited airtime. (I think the TV show Monk was largely about noticing completely ordinary things that, in the particular situation and under the generally accepted assumptions, should or should not have happened.)
Tying It All Together
All three of these abilities are crucial for collaborative truth-seeking, and for collaborative mistake-spotting in particular.
Most situations are probably unlikely and uninformative, so it will be hard to derive from a model the truths that it doesn’t imply or even contradicts. But if you have a repository of truths and a method of indexation that allows you to look up relevant ones at will, you become more likely to find the pertinent counterexamples.
If the model is not your own, you first need to understand the other person’s model (e.g., find all the cruxes) and its implications before you can bring your skill at finding pertinent counterexamples to bear.
And most such models are never discussed as explicitly as in a double crux exercise. Rather, they’re latent, and sometimes something as fleeting as the connotation of a single word in a fast-paced conversation on some orthogonal topic can reveal an unknown difference between one’s own model and the other person’s. A difference that may result in a lot of wasted work if it goes unnoticed.
AI safety and priorities research still strike me as so fragmented that indexers may actually do well in them, but I suppose that’s more because of their relative novelty.
University friends of mine (none of whom are associated with EA or Less Wrong today) and I had different experiences reading the Haskell 98 Report (Haskell 2010 wasn’t out yet) compared with some other books on Haskell. I found the Report amazingly pleasant reading because it put the definition first and then added some examples to confirm that it applied as expected in some edge cases. Some of my university friends found a book that contained many examples but almost no definitions much more instructive. It was almost as if they didn’t need the definitions to learn to write their code. When I read the same book, I tried to interpolate the definitions from the examples, which was amazingly cumbersome compared to just reading the Report.
I suppose this blog post is “self-anti-suggesting.” I should downgrade my confidence in it for the very reasons it lists. Confusingly, that means I’m less obliged to take it seriously, so I also don’t need to downgrade my confidence in it as much. But I suppose this feedback cycle converges on some limit between my felt confidence and my confidence after the first update?
There’s a trope of students learning things by heart for exams that they were supposed to derive during the exam. Is there a corresponding meme, in some other fields I’m not familiar with, of students needlessly rederiving things during exams that they could’ve just memorized?
The EA type of thinking seems to be uncommon and highly attuned to modelers. Self-selection effects tend to be surprisingly extreme (see part III). So a small self-selected group within a large population can probably still self-select strongly against the population base rate?
There’s also this funny effect, so maybe the selection effect is relevant even if indexer and modeler abilities are highly correlated, e.g., because the top people in either group will have an outsized impact, will be modelers, and will be (per this effect) highly preferential modelers?
It’s a bit hard to keep track of just how well-informed a model is if information only comes in once a year and you don’t have exceptional memory. So maybe modelers have much higher variance in terms of over- or underconfidence in their models. If it is overconfidence, they may get stuck with a merely locally optimal model. Indexers who also model may be much quicker to discard their models and start from scratch, also because they still have the almost-raw data that informed the model. So they may be more “conservative” in that they are less willing to jump to hypothetical conclusions but less “conservative” in that they are less likely to cling to a bad model.
A teacher of mine once implied that he assumed that everyone categorized everyone else within seconds of meeting them. That had seemed very foreign to me. Interestingly, he was a history and language teacher.
There is the structurally similar distinction between the inside and the outside view, but it seems orthogonal to my distinction here.
Here is Eliezer Yudkowsky working hard to understand someone with a funny world model, which is what I called level 3 of perspective-taking.
Maybe people don’t start out with a strong indexer or modeler disposition, but a feedback loop increasingly turns them into one. They start doing one or the other a bit more; say, they have a model that works fairly well for them, so they use it, so it becomes even more useful, so they use it more. Meanwhile, they’ve always discarded their situational memories rather than memorizing them, so they don’t have much of a useful repository of those, so the ability to memorize these things atrophies, and so on. Well, and conversely for the indexer.
Or maybe there is some innate effect, say, because people with more intelligence than memory (according to some sort of population percentile let’s say) tend to become modelers while people with more memory than intelligence become indexers – well, and the majority of people have both in similar amounts and so don’t develop a particular disposition.
This is a bit of a jargon term here. What I mean is the same principle that an index of a book employs. It tries to list every word anyone might plausibly look for and, for each word, lists the pages where it appears. This way, you can easily find every occurrence of the word. The alternative would be to leaf through and skim every page of the book. ↩
Please ignore the exact classification if it seems unhelpful. There are a few challenges that I can’t clearly classify either, so it’s probably not very good. ↩
I somehow still feel a vague sense of embarrassment in situations that are clearly mutual, so this type of reasoning doesn’t seem to come naturally to my System 1, or I’m making a mistake somewhere. ↩