- Top Priority
- Evidential Cooperation in Large Worlds
- Common Views on the Shape of the Universe
- Considerations for Independent Researchers
- Working Toward the Long Reflection Today
- Deconfusion About Trajectories
- Cluelessness, Robustness, and Ensembles
- Value of Information of Priorities Research
- Medium Priority
- Moral Trade Compared to Regression Analysis
- Effects of Importance-First Selection
- Space Governance
- Speed Reading Comprehension Rates
- Green Houses
- Controlling Evolution
- Importance of the Valley of Death Effect
- Classification Systems for Sandboxing Task AGI
- Importance of Prediction
- Pros and Cons of Funding Mechanisms
- Tit-for-Tat or Pavlovian Cultures for Safe AGI
- Low Priority
Evidential cooperation in large worlds (ECL, because W is annoying to pronounce) is a form of acausal trade between agents who are sufficiently relevantly similar to one another that their own behavior gives them evidence of the behavior of the others. They don’t need to simulate each other and of course they don’t need to communicate.
If they do it right, they can cooperate with one another and realize gains from trade, for example because different regions of space may lend themselves to realizing particular moral goals particularly cheaply. These gains from trade are maximized by a particular universal compromise utility function. So for all cooperators it’s rational to maximize one joint combination of all their moral goals.
One upshot is that ECL may significantly limit what moral goals it is rational for an actor to pursue. If this turns out to be sufficiently knowable and important, it’ll be a significant contribution to moral philosophy and priorities research.
- Understand the existing research by Caspar Oesterheld and Johannes Treutlein.
- Understand the requisite decision theory and bargaining theory.
- Understand the importance of updatelessness for successful cooperation.
- If we can notice comparative advantages for realizing others’ moral preferences, we can also notice comparative advantages for realizing our own moral preferences. That should be fine?
- If EDT agents are updateless by implication (whenever that is what gives them superior news) and UDT/FDT/LDT are updateless by design, that doesn’t leave any major decision theory that can cooperate evidentially and yet might fail to be updateless, right?
- Understand the interplay with infinite ethics. We can’t simultaneously hope/assume that the universe is finite for standard aggregative consequentialism to work and that it’s infinite for us to realize gains from evidential trade.
- Estimate the magnitude of the gains from trade, maybe in the ideal case and in the realistic case where we implement the existing robust cooperation heuristics. (Others are already planning to do some version of this.)
- Publicize the results and recommendations or make sure particular decision-makers take them into account.
- Somewhat improve the allocation of enormous amounts of time and capital that is subject to the decisions of people in or close to the EA community.
Common models in prioritization seem to be based on the assumption either that the universe is finite (standard aggregative consequentialism) or that it is infinite (evidential cooperation in large worlds). It is frequently mentioned that an infinite universe – one with infinite volume, not just circular – is the more common assumption among physical cosmologists. But I haven’t double-checked that or tried to understand their reasons for the assumption.
It is also important to make sure that the universe is not perfectly regular beyond a certain point, as that would interfere with arguments to the effect that every permutation of particles is found somewhere in the universe (infinitely many times). (H/t Robert Harling.)
Further, it is important that not only space is infinite but also the number of atoms or the amount of energy within. (H/t Inga Großmann.)
Friends of mine also sometimes mention that Anthropic Decision Theory bears on this assumption. I want to understand this too.
- Read up on some physical cosmology.
- Refresh myself on Anthropic Decision Theory.
- Refresh myself on what Amanda Askell has already said or written on the topic.
If you inadvertently duplicate work that produces an easily verifiable result, your work was wasted (apart from skill-building etc.). But if you inadvertently duplicate work that is not verifiable, your result informs the aggregate assessment of the issue: If it is similar, the result is more likely to be correct than one would’ve thought based on either data point alone; if it is different, neither data point is quite as likely to be the full answer.
This pushes researchers with a vast network and close collaborators to pursue research that is verifiable, because they can make sure to coordinate with most other people who might duplicate it. It pushes independent researchers without such a network to pursue research that is not easily verifiable.
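The aggregation argument for unverifiable work can be made concrete with a small Bayesian sketch. It rests on simplifying assumptions that are not part of the argument above: the question is binary, the prior is uniform, and each researcher independently reaches the correct answer with the same probability `p`. The function names are hypothetical.

```python
def posterior_given_agreement(p: float) -> float:
    """P(the shared answer is correct | both researchers give the same answer).

    Assumes a binary question, a uniform prior, and two researchers who are
    each independently correct with probability p.
    """
    both_right = p * p
    both_wrong = (1 - p) * (1 - p)
    return both_right / (both_right + both_wrong)


def posterior_given_disagreement(p: float) -> float:
    """P(researcher A is correct | the two researchers give different answers)."""
    a_right_b_wrong = p * (1 - p)
    a_wrong_b_right = (1 - p) * p
    return a_right_b_wrong / (a_right_b_wrong + a_wrong_b_right)


if __name__ == "__main__":
    p = 0.7  # illustrative reliability of a single researcher
    print("single result:      ", p)
    print("after agreement:    ", round(posterior_given_agreement(p), 3))
    print("after disagreement: ", round(posterior_given_disagreement(p), 3))
```

Under these assumptions, agreement lifts the credence above what either result supports alone, while disagreement pulls each result back to a coin flip. Correlated errors between researchers would weaken the first effect.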
There are likely to be more such considerations.
- Collect considerations for independent researchers: articles, social media posts, interviews, or thinking really hard.
- Publish them.
- Reprioritize my work accordingly.
The Long Reflection appears to me to be robustly desirable. It only suffers from being more or less unrealistic depending on how it is construed.
In particular, I feel that two aspects of it are in tension: (1) delaying important, risky, and irreversible decisions until after we’ve arrived at a Long Reflection–facilitated consensus on them, and (2) waiting with the Long Reflection itself until after we’ve achieved existential security.
I would expect, as a prior, that most things happen because of economic or political necessity, which is very hard to influence. Hence the Long Reflection either has to ramp up early enough that we can arrive at consensus conclusions and then engage in the advocacy efforts that’ll be necessary to improve over the default outcomes or else risk that flawed solutions get locked in forever. But the first comes at the risk of diverting resources from existential security. This indicates that there is some optimal trade-off point between existential security and timely conclusions.
I’m unsure whether I’m right to think that this is (fairly) urgent from an impartial perspective because I may well be unusually concerned with even moderate risks of lock-ins of terrible futures. This is something that I’ll seek to determine before I’ll try to grapple with the object-level question.
There’s also another sense in which it may not be urgent as the major lock-in that I can foresee concerns settlements beyond our solar system, which may only be attempted many centuries from now or, if sooner, then only after transitions (ems or AGI) that we should probably rather prepare for explicitly.
- Determine whether I have a personal bias in the matter.
- Consider whether there are strategies to facilitate the Long Reflection that are robust against major transitions like ems or AGI.
- Consider the next steps.
I understand trajectories as projections of world history onto a plane of time and some dimension that we care about, such as total welfare. Whenever there is an indefinite superlinear increase in (say) total welfare, a mere one-time delay or speed-up leads to the actual and the counterfactual world diverging increasingly rather than decreasingly over time. I find this highly unintuitive.
Maybe any development is bottlenecked in so many ways that any speed-up (or analogously any delay) only has a minimal effect because some other series of bottlenecks will be hit earlier.
This may not generally be the case (the fragility of history is likely heterogeneous), and some interventions may affect coefficients other than the constant term, which will lead to even greater speed-ups or retardations if true.
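The claim about diverging trajectories can be checked numerically. The sketch below compares the gap that a one-time delay opens between the actual and the counterfactual trajectory for three growth shapes. The three welfare functions, the delay, and the sample times are illustrative assumptions, not models I am committed to.

```python
import math


def gap_from_delay(welfare, delay, times):
    """Welfare of the on-time trajectory minus that of the delayed one, at each time."""
    return [welfare(t) - welfare(t - delay) for t in times]


def linear(t):
    return 2.0 * t


def quadratic(t):
    # One example of superlinear growth.
    return t ** 2


def saturating(t):
    # Growth that levels off, as it would if other bottlenecks bind.
    return 1.0 - math.exp(-t)


TIMES = [2.0, 4.0, 8.0, 16.0]
DELAY = 1.0

if __name__ == "__main__":
    for name, fn in [("linear", linear), ("quadratic", quadratic), ("saturating", saturating)]:
        print(name, gap_from_delay(fn, DELAY, TIMES))
```

With linear growth the gap stays constant, with superlinear growth it widens, and with saturating growth it shrinks. The bottleneck intuition corresponds roughly to the saturating case.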
- Become less confused about trajectories.
- Find out whether this way of thinking about trajectories maps to well-known considerations that I already take into account.
- Or otherwise determine what consequences this has for prioritization.
The response to complex cluelessness that I’ve observed in the EA community over the past five years seems to me to be epitomized by a concentration on robust strategies, strategies that are particularly unlikely to have major problems, e.g., to backfire badly. The general approach is that researchers think about possible problems very long and hard, and then work out whether there are ways to prevent any problems they find.
Consequently, a well-developed proposal for an intervention looks more robust to me the more numerous and nonobvious the problems are (while also covering all the obvious ones) to which there are convincing preventive measures, and the fewer or less likely-seeming the failure modes are for which there are no preventive measures.
This seems almost satisfactory to me. There is the obvious drawback that of course we will have overlooked some problems with the proposals even after thinking about them long and hard. But then we’ll likely overlook problems with all proposals, including the proposal to think even longer and the proposal to give up. If the goal is to weigh the robustness of proposals, such incompleteness is unproblematic.
But it may also be the case that we’re biased in the way we look for problems, so that we may systematically notice the problems with one proposal and overlook the problems with another even though they are actually equally problematic.
- You put a lot of effort into modeling the mechanics of a toy model of the world.
- You simulate hundreds of ways your proposed intervention might turn out.
- Only then do you assess how likely and preventable the bad outcomes are.
Doing the manual assessment last prevents a lot of unconscious biases from creeping in that would otherwise control which scenarios you’ll think of in the first place.
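A minimal sketch of this order of operations might look as follows. The toy model is entirely hypothetical: `uptake`, `backlash`, their distributions, and the outcome formula are placeholders for whatever mechanics one actually models. The point is only that all scenarios are generated mechanically first and filtered for assessment afterwards.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Scenario:
    uptake: float    # hypothetical: fraction of the target group reached
    backlash: float  # hypothetical: strength of adverse reactions
    outcome: float   # net effect under the toy model


def simulate(n_runs: int, seed: int = 0) -> List[Scenario]:
    """Generate scenarios from the toy model without judging them yet."""
    rng = random.Random(seed)
    runs = []
    for _ in range(n_runs):
        uptake = rng.random()
        backlash = rng.betavariate(1, 4)  # assumed to be small most of the time
        outcome = uptake - 2.0 * backlash * uptake
        runs.append(Scenario(uptake, backlash, outcome))
    return runs


def bad_outcomes(runs: List[Scenario]) -> List[Scenario]:
    """Select the scenarios to assess manually: those with a net-negative outcome."""
    return [r for r in runs if r.outcome < 0]


if __name__ == "__main__":
    runs = simulate(500)
    bad = bad_outcomes(runs)
    print(f"{len(bad)} of {len(runs)} simulated scenarios end badly")
    for scenario in bad[:5]:
        print(scenario)
```

Only the scenarios returned by `bad_outcomes` are then examined by hand for how likely and how preventable they are.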
It seems interesting to apply this to AI policy.
Philosophy has been an incubator of successful fields of science like physics. Maybe priorities research can follow a similar path. If there is any similarity there, a historical comparison may give us some hints of how much we’ll yet update away from our current best guesses as to our impartial priorities.
Our prospective updates may only be of the modest nature that we’ll be able to refute one of the currently plausible worldviews. A large EA-aligned funder like Open Phil is then likely to update on that. This scenario could serve as a lower bound on the value of information from further priorities research.
This can also inform our assessment of the relative importance of the Long Reflection.
- Familiarize oneself with the history of philosophy or scholarship in general, select examples that may be representative of priorities research in some way or another, and make inferences from their development to the development of priorities research.
- Decide on this basis what the probability distribution looks like over the range from no change of our current idea of impartial priorities to a 180° change of priorities.
- Estimate the range of capital that is allocated by funders who are influenced by insights into impartial priorities.
- Derive from that the value of information of further research into impartial priorities.
- Decide whether to invest more into priorities research or to concentrate on the top priority.
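The middle steps of this list can be sketched as a simple expected-value calculation. All numbers are placeholders in arbitrary units rather than estimates, and the three-point distribution and the `gain_per_unit_redirected` parameter are assumptions introduced only for illustration.

```python
from typing import List, Tuple


def expected_value_of_information(
    update_distribution: List[Tuple[float, float]],
    capital: float,
    gain_per_unit_redirected: float,
) -> float:
    """Crude value-of-information estimate.

    update_distribution: (probability, fraction of capital redirected) pairs,
        ranging from no change (0.0) to a complete reversal (1.0).
    capital: capital allocated by funders who respond to priorities research.
    gain_per_unit_redirected: assumed improvement per unit of redirected capital.
    """
    total_probability = sum(p for p, _ in update_distribution)
    if abs(total_probability - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    expected_fraction = sum(p * f for p, f in update_distribution)
    return capital * expected_fraction * gain_per_unit_redirected


if __name__ == "__main__":
    # Placeholder distribution: no change, one worldview refuted, full reversal.
    distribution = [(0.6, 0.0), (0.3, 0.2), (0.1, 1.0)]
    print(expected_value_of_information(distribution, capital=100.0, gain_per_unit_redirected=0.5))
```

The result is only as good as the distribution fed into it, which is what the historical comparison is meant to inform.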
Some clusters of people (epitomized, to me, by Nick Beckstead’s thesis) conceive of ethical frameworks as regression over an immense set of moral intuitions that we have for particular situations.
To me, it is much more intuitive to think in terms of moral markets where people trade to reap moral (and other) gains from trade. (For example, a person may accept a longer commute so as not to work for a company whose business model they view as unethical, or an animal rights activist may support a half-hearted certification scheme if no better certification scheme would stand a chance of being widely adopted.)
It may be interesting in what ways these two approaches differ, and if one of them appears to be clearly more suitable than the other. For example, it may be that there’s some bargaining solution implicit in the first approach, which might be important to realize.
All in all, I don’t consider this a top priority because I’m unsure whether it’ll yield any action-relevant results.
I typically first consider the most important projects that I can find (such as suffering risk, permanent dystopian lock-ins, etc.), and then filter further by such criteria as tractability, neglectedness, option value, personal fit, transferable skills, etc.
This approach can easily miss projects whose slightly lacking importance is outweighed by (say) great tractability. Maybe a search pattern that first looks for the most tractable opportunities and then selects the most important ones from among them would produce new high-priority projects.
Some typical criteria, like neglectedness, are not suitable for this role, but a preselection based on tractability seems feasible and resembles a more general case of the prepared opportunism that I’ve advocated before.
I can’t currently think of a way of approaching this.
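The difference between the two search orders can at least be illustrated on made-up data. In this sketch the projects, their scores, the shortlist size, and the use of importance times tractability as the overall score are all hypothetical.

```python
from typing import Callable, List, NamedTuple


class Project(NamedTuple):
    name: str
    importance: float
    tractability: float


def overall(project: Project) -> float:
    # Assumed overall score; any other combination rule could be used instead.
    return project.importance * project.tractability


def two_stage_search(
    projects: List[Project],
    first: Callable[[Project], float],
    second: Callable[[Project], float],
    shortlist_size: int,
) -> Project:
    """Shortlist by the first criterion, then pick the best by the second."""
    shortlist = sorted(projects, key=first, reverse=True)[:shortlist_size]
    return max(shortlist, key=second)


PROJECTS = [
    Project("A", importance=10.0, tractability=0.1),
    Project("B", importance=9.0, tractability=0.2),
    Project("C", importance=6.0, tractability=0.9),
    Project("D", importance=2.0, tractability=1.0),
    Project("E", importance=5.0, tractability=0.8),
]

if __name__ == "__main__":
    by_importance = lambda p: p.importance
    by_tractability = lambda p: p.tractability
    importance_first = two_stage_search(PROJECTS, by_importance, by_tractability, 2)
    tractability_first = two_stage_search(PROJECTS, by_tractability, by_importance, 2)
    print("importance first:  ", importance_first.name, overall(importance_first))
    print("tractability first:", tractability_first.name, overall(tractability_first))
    print("best overall:      ", max(PROJECTS, key=overall).name)
```

In this contrived example the importance-first search never sees project C, which the tractability-first search finds. Real cases need not be this clear-cut.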
Quoting Tobias Baumann:
I argue that space governance has been overlooked as a potentially promising cause area for longtermist effective altruists. While many uncertainties remain, there is a reasonably strong case that such work is important, time-sensitive, tractable and neglected, and should therefore be part of the longtermist EA portfolio.
My reasons for not prioritizing it (even) more highly are that I see little personal fit for this work and that Tobias’s argument makes it sound urgent only on relatively long timescales:
But I would, again, argue that establishing good space governance is plausibly time-sensitive. It seems at least possible that our civilisation will start settling other planets in the not-too-distant future – say, within the next centuries.
But otherwise I find the topic brilliantly interesting!
I like to listen to texts at ~ 400 words per minute using text-to-speech software. I seem to have no problem processing the meaning at that rate and eagerly recommend it to friends.
But my ability to read visually is much more limited. I can read at ~ 150 words per minute without technical assistance and at 300–400 words per minute using apps that display only one (or a few) words at a time, but it’s very tiring. (I can do it for at most half an hour before I get tired.) If the text is written in a poetic style that makes it enjoyable on other levels than that of its meaning, then slow reading works fine. If it is like most nonfiction texts, though, reading at a rate of 150 words per minute is supremely boring. The result is that my thoughts drift off on tangents all the time and I forget to pay attention to what I’m reading. These thoughts are perhaps sometimes interesting but, in effect, my reading rate across a whole book is lower still.
Naturally, I want to listen to everything now, but I worry that it may have unwanted side effects. I spend less time thinking about the text or thoughts evoked by the text, so:
- I may not attach it to as many associations and thereby might make it harder to remember,
- I may accept it relatively uncritically, and
- I may fail to have interesting new ideas inspired by the text.
But I can easily reread a text slowly if it seems worth the time. So even though all of those seem quite testable, the result may just be that they are slight effects that may or may not be worth the saved time and effort. That trade-off seems less objectively testable.
Traditional agricultural systems continually kill insects with insecticides which are not optimized to be humane. (Brian Tomasik and Wild Animal Initiative on the topic.) Insect populations are usually proportional to the net primary productivity, but I would expect that this case is an exception because the insects are repeatedly killed before they can consume much of the biomass.
Insects, being short-lived, are more likely to have net negative lives than animals with longer lives and similarly painful deaths. I like to define that as a life that a sufficiently idealized version of the individual would not choose to (re-) live if they were given the choice. (Of course there are many further complications.) Therefore, insects in lawns and forests may already have net negative lives. But if these areas are used for fields instead, (1) the animals are killed much more often than otherwise, (2) the animals are not clearly fewer at any moment in time, and (3) their deaths are not clearly less painful.
Conversely, agricultural systems like Square Roots or the systems used in the Netherlands and Japan physically shield the crops from insects that would otherwise try to consume them. (H/t Inga Großmann.) Apart from accidents (gaps in the shielding), they can prevent the crops from having any influence on wild animal populations, thus also rendering their continual killing unnecessary. If these systems are economically viable, they could displace some of the traditional agriculture, increasing total welfare.
The caveat that has led me to deprioritize this topic is that insecticide use has been suggested as a contributor to the long-term decline in insect populations. If work on this topic would lead to an increase or slowed decline in insect populations, the case for this work becomes a lot less robust and hinges much more on difficult questions about the average welfare of insect populations in the different conditions.
Evolution creates a lot of suffering by creating a lot of feeling individuals in a trial-and-error fashion for no particular end. The process is highly immoral and the lack of a goal maybe amoral or also, by omission, immoral.
Slack¹ or the dream time² enable us to sidestep the worst of evolutionary pressures for a while. While we’re in a dream time–like period, we may have a chance to make it last. Or rather, after the dream time we’ll have no chance anymore.
A Leviathan, say in the shape of an AGI, might be the solution, but maybe there are other solutions that we could prioritize for redundancy and to allow more people with different strengths to work on different problems without incurring prohibitive coordination costs.
All sorts of selection mechanisms (intelligent foresight, competitions, etc.) that come without suffering may serve to accelerate progress sufficiently that no group that cooperates internally can be crushed by outside forces until all such groups can become one.
I’m not currently prioritizing this topic because its path to impact seems fuzzy to me and it has some risk of touching on controversial issues.
If you conceive of altruists and interventions as forming clusters according to human temperaments and fit for human temperaments, mismatches emerge.
For example, a lot of people will be happy to support Make-A-Wish-like organizations and a lot of people will be happy to support MIRI-like organizations but some organizations or interventions combine properties in such a way as to be the first, obvious choice for no cluster of altruists at all. (They may well be the second choice for all.) ALLFED has been cited as an example of such an organization.
It would be interesting to determine whether the resulting neglectedness – sometimes called the Valley of Death – may again make these organizations or interventions particularly attractive or whether it’s insufficient.
In a comprehensive AI services (CAIS) context, it may be important to shield task AIs from some others in systematic ways to prevent failure modes where an AI that can coordinate other AIs (or a human) gains too much power.
Intelligence agencies have developed classification systems that limit the risks from individuals exposing whatever information they had been authorized to see and sometimes allow them to trace leaks. It would be interesting to understand such systems and transfer them to the CAIS context.
For example, it may be advisable to (1) never deploy more than one exact copy of a system but always make alterations to the code such that the output is subtly different in ways that are not apparent without comparing two systems, (2) never allow anyone access to two such near-identical systems once they’re running, and (3) only ever give a system access to at most two other systems.
Rule 3 limits the power of any one system. This assumes that each system operates as a black box but that the communication between systems can be monitored. Rules 1 and 2 make sure the overseers can identify the origin of misused information. This should keep the system generally capable while allowing overseers to readily extricate parts of it that are not working in their interest.
This feels vague and inchoate to me, but some literature research may shed light on whether ideas in this direction could work and be useful.
Finally, there should also always be encouraged cooperative routes for all systems that are agent-like to achieve their goals so that underhanded methods are both difficult and unnecessary.
Forecasting – through superforecasters or prediction markets – seems valuable and popular in the EA community. But I don’t know just how valuable it is, so some Fermi estimation may be useful here to trade it off against other things people (like me) could be doing.
Funding mechanisms such as Patreon/Ko-fi, impact purchases, grants, and different sources of salaries probably all come with advantages and disadvantages. Dimensions include the overhead for the fund managers or donors, overhead for the recipients, control over cannibalization of funding, and dependencies. I feel like I lack an overview of these considerations.
Herrmann et al. (2008) have found that in games that resemble collective prisoner’s dilemmas with punishment, cultures worldwide fall into different groups. Those with antisocial punishment fail to realize the gains from cooperation but two other cultures succeed: In the first (cities Boston, Copenhagen, and St. Gallen), participants cooperated at a high level from the start and used occasional punishments to keep it that way. In the second (cities Seoul, Melbourne, and Chengdu), the prior appeared to be low cooperation, but through punishment they achieved after a few rounds the same level of cooperation as the first group.
Sarah Constantin writes:
In Wedekind and Milinski’s 1996 experiment with human subjects, playing an iterated prisoner’s dilemma game, a full 70% of them engaged in Pavlov-like strategies. The human Pavlovians were smarter than a pure Pavlov strategy — they eventually recognized the DefectBots and stopped cooperating with them, while a pure-Pavlov strategy never would — but, just like Pavlov, the humans kept “pushing boundaries” when unopposed.
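For readers unfamiliar with the two pure strategies, here is a minimal sketch of an iterated prisoner’s dilemma with tit for tat, Pavlov (win-stay, lose-shift), and a DefectBot. The payoff values are the conventional 3/0/5/1 and are an assumption, as is the number of rounds. The experiments cited above used their own setups.

```python
C, D = "C", "D"

# Conventional payoffs (assumed): (my move, their move) -> my payoff.
PAYOFF = {(C, C): 3, (C, D): 0, (D, C): 5, (D, D): 1}


def tit_for_tat(my_history, their_history):
    """Cooperate first, then copy the opponent's previous move."""
    return C if not their_history else their_history[-1]


def pavlov(my_history, their_history):
    """Win-stay, lose-shift: repeat the last move after a good payoff (3 or 5),
    switch after a bad one (0 or 1). With these payoffs that is equivalent to
    cooperating exactly when both players made the same move last round."""
    if not my_history:
        return C
    return C if my_history[-1] == their_history[-1] else D


def defect_bot(my_history, their_history):
    """Always defect."""
    return D


def play(strategy_a, strategy_b, rounds):
    """Return both move sequences as strings, e.g. ('CDCD', 'DDDD')."""
    moves_a, moves_b = [], []
    for _ in range(rounds):
        a = strategy_a(moves_a, moves_b)
        b = strategy_b(moves_b, moves_a)
        moves_a.append(a)
        moves_b.append(b)
    return "".join(moves_a), "".join(moves_b)


def score(my_moves, their_moves):
    """Total payoff for the player who made my_moves."""
    return sum(PAYOFF[pair] for pair in zip(my_moves, their_moves))


if __name__ == "__main__":
    for name, strategy in [("tit for tat", tit_for_tat), ("pavlov", pavlov)]:
        mine, theirs = play(strategy, defect_bot, 6)
        print(f"{name:12s} vs DefectBot: {mine}  score {score(mine, theirs)}")
```

Against the DefectBot, tit for tat plays `CDDDDD` while pure Pavlov plays `CDCDCD`: it keeps returning to cooperation and is exploited every other round, which is the weakness the quote says human Pavlovians learned to avoid.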
As mentioned, I think these strategies map somewhat imperfectly to human behavior, but I feel that I can often classify the people around me as tending toward one or the other strategy.
Pavlov behaviors:
- Break rules until the cost to yourself from punishments exceeds the profits from the rule-breaking.
- View rules as rights for other people to request that you stop a behavior if they disapprove of it. Then stop if anyone invokes a rule.
- Push boundaries, the Overton window, or unwritten social rules habitually or for fun, but then take note if someone looks hurt or complains. Someone else merely looking unhappy with the situation is a form of punishment for an empathetic person. (I’m thinking of things like “sharp culture.”)
- Don’t worry about etiquette because you expect others to give you frank feedback if they are annoyed/hurt/threatened by something you do. Don’t see it as a morally relevant mistake so long as you change your behavior in response to the feedback. (This seems to me like it might be associated with low agreeableness.)
Tit for Tat behaviors:
- Try to anticipate the correct behavior in every situation. Feel remorse over any mistakes.
- Attribute rule-breaking, boundary-pushing behaviors to malice.
- Keep to very similar people to be able to anticipate the correct behaviors reliably and to avoid being exploited (if only for a short number of “rounds”).
This way of categorizing behaviors has led me to think that there are forms of both strategies that seem perfectly nice to me. In particular, I’ve met socially astute agents who noticed that I’m a “soft culture” tit-for-tat type of person and adjusted to my interaction style. I don’t think it would make sense for an empathetic tit-for-tat agent to adjust to a Pavlovian agent in such a way, but it’s a straightforward self-modification for an empathetic Pavlovian agent.
Further, Pavlovian agents probably have a much easier time navigating areas like entrepreneurship where you’re always moving in innovative areas that don’t have any hard and fast rules yet that you could anticipate. Rather they need to be renegotiated all the time.
Pavlov also seems more time-consuming and cognitively demanding, so it may be more attractive for socially astute agents and for situations where there are likely gains to be had as compared to a tit for tat approach.
The idea is that one type of culture may be safer than another for AIs to learn from through, e.g., inverse reinforcement learning. My tentative hypothesis is that the Pavlovian culture is safer because punishments are small and routine with little risk of ideological, fanatical retributivism emerging.
North Korea strikes me as a great example of a totalitarian regime straight out of 1984. Its systematic oppression of its citizens is so sophisticated that I could well imagine a world-wide regime of this sort to be stable for a very long time. Even as it exists today, it’s remarkably stable.
The main source of instability is that there’s a world all around North Korea, and especially right to its south, that works so much better in terms of welfare, justice, prosperity, growth, and various moral preferences that are widely shared in the rest of the world.
There may be other sources of instability – for example, I don’t currently understand why North Korea’s currency is inflated to worthlessness – but if not, then we, today, are to a hypothetical future global totalitarian state what the rest of the world is to North Korea.
Just as some organizations are trying to send leaflets with information about the outside world into North Korea, so we may need to try to send messages into the future just in case a totalitarian dystopia takes hold. These messages would need to be hard to censor and should not depend on people acting against their self-interest to distribute. (Information from most normal time capsules could easily be suppressed.) Maybe a satellite can be set on a course that takes it past Earth every century and projects messages against the moon.
This is probably not the most cost-effective method, so I’d first like to think about approaches to this more. I’ll leave this under low priority until I have some more realistic ideas.
Maybe measure, in the many-worlds interpretation sense, should play a role in hiring decisions, e.g., because if we hire people with relatively low measure, our actions as an organization tell us less about our actions in other branches. But then again we can just assume that everyone has rather high measure, and we’ll be able to cooperate just fine in most branches. Plus, everyone is, ipso facto, likely to have quite high measure unless they’re selected in a special way, for example, if you want to hire the top free solo climber in the world. So it probably makes hardly any difference.
I’d like to know how much I should invest to prevent fruit flies from being born into potentially net-negative lives by leaving my bananas lying around. (E.g., by not leaving my bananas lying around.) I eat a lot of bananas.
Tax exemptions usually make a difference of a factor of two at most, but some people are probably convinced by them to donate to organizations that are likely one or several orders of magnitude less effective. Maybe someone should check how often such misallocations happen and whether a bit more awareness of that can make a difference.
Update: This has now been done.
¹ Disclaimer: I don’t generally endorse the works of the author. Alexander originated a wealth of helpful ideas, so that I can’t help but cite him lest it seem that I plagiarize them. Unfortunately, (1) the community around his blog contains some insalubrious factions, and (2) until roughly 2016, he himself still published articles that presented issues in a skewed fashion reminiscent of the very dynamics he warns of in Toxoplasma of Rage. Occasional remarks make me think that he hasn’t so much updated away from his previous opinions as merely stopped addressing them on the blog. I’m adding these disclaimers to avoid the impression that I accept such intellectual wantonness or that it is accepted in my circles.
² Disclaimer: I don’t generally endorse the works of the author. Hanson originated many useful ideas, so that I can’t help but cite him frequently lest it seem that I plagiarize them. Unfortunately, he seems to be attracted to controversial ideas for controversiality’s sake. But controversial ideas are unusually often insensitive, inaccurate, misleading, or all of these. They need to be treated with special care to make them as uncontroversial as possible. I’m adding these disclaimers to avoid the impression that I accept such intellectual wantonness or that it is accepted in my circles.