Introduction

One of my projects in 2020 was a software for doing math with probability distributions – SquigglyPy. Sort of like Guesstimate but with the ability to estimate probabilistic trajectories over time or some other independent variable.

I had seen a number of researchers either write code for Monte Carlo simulations from scratch or create models based on mere point estimates. My software would save the first group some time and help the second group get a better sense of the degree of uncertainty in their model.

My vision for it was of course greater than that. I hoped to inspire people to use the software to model various interesting effects in the world in the various ways that seem most useful or parsimonious to them, to publish these various models online, and then to recombine existing models to create ever more comprehensive models. Thus people would loosely collaborate to create a faster and faster growing ecosystem of Baysian models of the world – along the lines of how open source software allows developers to develop new software faster and faster by recombining increasingly powerful chunks of existing software.

My hope was that this would be done primarily by researchers in global priorities research and would lead, over the course of decades, to an improved allocation of resources.

I’m putting this project on hold for now because (1) I updated downward on the time we have left till transformative AI emerges, which had complex effects on my prioritization and (2) because I didn’t feel like it had as much traction as other projects of mine. That may change of course, and I may pick it up again at a more opportune time.

Meanwhile I’m happy to add other developers to the repository or to accept pull requests. In particular, it would be great to improve:

  1. the tests, to catch mathematical errors,
  2. the sensitivity analysis, which is currently a hacky sketch on Starboard, and
  3. the documentation, later, when the code is better tested.

Warning: When using SquigglyPy, please take great care to sanity-check all your results! Hardly anyone has used the software yet and the tests are currently better at catching broken code than subtly wrong math.

Squiggle

My vision was similar to Ozzie Gooen’s (Quantified Uncertainty Research Institute), which is part of the reason I chose the name SquigglyPy. Ozzie developed a similar software in ReasonML, which also runs in the browser. It’s the ReasonML implementation of his Squiggle language whereas SquigglyPy is the Python implementation.

I makes senses to me to implement Squiggle in a few different languages:

  1. Any language that compiles to JavaScript or WebAssembly has the advantage that it runs in a browser. The right UI and an efficient implementation can give people the ability to experiment and explore rapidly. Models might be shared with others through repositories on the same server to save the time it would take to download and install a software from a popular existing platform like NPM or PyPI.
  2. Languages like R, Julia, and Python have the advantage that the main target demographic (as I currently imagine it) uses them and has built a wealth of statistical tooling for them already.
  3. Finally, Python has the advantage that there is already a project that has made Python run in the browser, combining all advantages. (It also has the advantage that I’m very familiar with Python.)

But there are also drawbacks to using Python in the browser:

  1. Pyodide is a very small, young project and may not survive.
    1. But chances are another project will pick up the slack because Python is popular and WebAssembly seems to me like the obvious next step in the technological development of websites.
  2. Many statistical libraries use some C or Cython extensions, and so can’t just be installed through Micropip. Instead the Pyodide team (or whoever wants to use them) has to create a WebAssembly version of them. The Pyodide team at least is too small to support a lot of WebAssembly builds.
    1. But over the years it may become more common for library authors to produce WebAssembly builds themselves or for companies to publish any such builds that they’re using for their products.
  3. The Python interpreter takes a moment to load, which may degrade the user experience compared to a JavaScript implementation.
    1. But it loads only once per page load and then stays in memory.

I feel like the advantages outweigh the disadvantages here, but I could easily be wrong in any of various way:

  1. The world of stats may transition toward Julia or some other more specialized language than Python,
  2. Python may lose market share to Rust or some other similar language,
  3. More statistical tooling may be built for JavaScript or even WebAssembly directly, or
  4. Any of surely countless unforeseen vicissitudes may eventuate.

Risks

I’ve detailed a lot of risks in “How Might Better Collective Decision-Making Backfire?” They fall into the general buckets of risks that need to be addressed immediately and risks that can be addressed later or reacted to only if they manifest.

Immediate

  1. Programming errors.
    1. This is the primary reason I’ve put off writing this post and why now I only publish it on my blog where few people will see it. It’s very likely that there are mathematical errors in some parts of the code that will skew any results. This might lead to disastrous misprioritization.
    2. The main way to address this issue is to improve the tests and to get people to use the software for many typical but not absolutely crucial purposes. These people need to be closely in touch with the developers and take their roles as beta testers seriously.
  2. Version conflicts and path dependent legacy software.
    1. The Squiggle language will need a clear, versioned language definition so that implementations of models in different languages are interoperable, either to the point where you can copy-paste the code or at least to the point where the real Squiggle code can be turned into a language specific version automatically. (Along the lines of how the 2to3 tool turns Python 2 code into Python 3 code.) Otherwise some collections of models may end up siloed by their dialect or it’ll become hard to innovate for fear of losing backward compatibility.
  3. Failure to catch on among the right people.
    1. Squiggle may fail to get used or it might be used by people with nonaltruistic motives or uncooperative methods.
    2. I’ve changed the name of the library from Swungdash to SquigglyPy to make it easier to set up Google Alerts to track who is using it.
    3. It may also be valuable to cultivate a community around it that has a lot of knowledge and resources so that people who use it have an incentive to join the community, so that the developers can stay in touch with them.
  4. Conflict through coordination failures.
    1. If a community forms around the creation and composition of Squiggle models, then there will be power in the repository of knowledge of that community. It’ll be important to instill the right ethos in the community early on to avoid the sorts of coordination failures that can arise even among well-intentioned actors. It is not clear what ethos that is, so a general readiness to adopt it and an interest in cooperative conflict resolution may be a good start.
  5. Discrimination against illegible values.
    1. There is a risk that Squiggle will differentially enhance the thinking of people with broadly utilitarian values because they’re easier to quantify than other values. Other values may be displaced or abandoned.
    2. I consider it an open question which procedures to idealized preferences are cooperative, but to the extent that the influence of Squiggle will be uncooperative, it’ll be a problem. We may choose to reject such uncooperative behavior terminally or consider that it can backfire against us in an acausal bargain. (See Oesterheld 2017, especially section 3.3.1.)
    3. Either way, the cooperative ethos of the hypothetical Squiggle community should include respect for other value systems and protection of strong interests of moral minorities.
  6. Missing of failure modes when they happen.
    1. On a higher level, we can try to prevent or address all these risks, but we may also fail to notice them when they happen, which seems like a risk in itself.
  7. Benefiting malevolent actors
    1. Finally, there is the risk that we’ll build up a vast body of knowledge that can then be abused by people with malevolent or otherwise uncooperative intentions.
    2. It may pay off to reduce the generality of the knowledge we build by focusing on research questions that are particularly interesting to altruists only.
    3. Making the community inhospitable to such people may not be a good approach, at least not generally, because that may not slow them down but it will deprive us of a chance to talk to them and make them see the future from our perspective.

Delayed

These will either take a while to manifest, will be obvious, will be easy to control, or will be problems no matter what. They are explained in “How Might Better Collective Decision-Making Backfire?” This subsection is mostly notes I wrote to myself a while back.

  1. Decision theoretic catch-22s.
    1. I expect that military intelligence makes a big difference here, but prediction tools may still improve upon this significantly. (I’m mostly thinking of conflicts between countries here since behaviors of other people in interpersonal conflicts can be more easily intuited by many/most people.)
    2. This will be easier to address with better decision theory, and it’s a big problem anyway, so people (e.g., at MIRI) are probably long working on said decision theory.
    3. The example I cite in the blog post has the feature that the payoff chosen by the dumb dictator is not a typical Schelling point. If this is a general feature of such situations, and if I’m not missing more commonplace examples of this exploit, it’ll likely take a long time before we become sophisticated enough to become vulnerable to this. I feel like I’m probably wrong about this, but I can’t tell in what way.
  2. More power to the group.
    1. I think this can be observed and reacted to.
  3. Overconfidence may be necessary for motivation.
    1. There is already a strong focus in the community on individual rationality. It seems like a stretch to think that a collaborative system would influence this potential issue at the margin.
  4. The sunk cost fallacy may be necessary for sustained motivation.
    1. Same as with overconfidence.
  5. Silencing of lone dissenters.
    1. This only works if the collaborative group has a lot of social power, so at this point, it’s probably not something that needs to be addressed.
    2. It also seems like something that can be reacted to, i.e. that doesn’t need to be prevented at all costs.
  6. Loss of valuable ambiguity.
    1. This will probably take a long time to manifest if it does.
  7. Social ramifications.
    1. This can be evaluated at the time.
  8. Legibility and centralized surveillance.
    1. This depends heavily on who will feel threatened by what research. So in the end, not using the tools for potential sensitive research is an option.
  9. Cooperation-enhancing effects.
    1. Probably good on balance,
    2. Unlikely to be affected by the tools,
    3. Will take a long time.

Demo

Christian Tarsney writes in his paper The Epistemic Challenge to Longtermism:

The ideal Bayesian approach would be to treat all the model parameters as random variables rather than point estimates, choose a probability distribution that represents our uncertainty about each parameter, and compute EV(L) on that basis. But for our purposes, this approach has significant drawbacks: EV(L) would be extremely sensitive to the tails of the distributions for parameters like r, s, and vs. And specifying full distributions for these parameters—in particular, specifying the size and shape of the tails—would require a great deal of subjective and questionable guesswork, especially since we have nothing like observed, empirical distributions to rely on. Even if we aim to adopt distributions that are conservative (i.e., unfavorable to longtermism), it would be hard to be confident that the tails of our chosen distributions are genuinely as conservative as we intended.

That seems fair. My software, SquigglyPy, addresses the concern about subjective guesswork insofar as it’s not a printed and published paper but an interactive software tool that anyone can play around with and run with their own subjective guesses for parameter values.

SquigglyPy does not yet address the related concern about sensitivity to tiny differences in the shapes of the tails of distributions. This concern takes two forms: (1) We may have no reason to prefer one tail shape over another and (2) the Monte Carlo–based sampling methods that SquigglyPy uses to approximate the tails may never use enough samples to approximate them sufficiently no matter how many samples we tell SquigglyPy to use.

I consider the first problem to be outside of the scope of SquigglyPy. You can try to address it by experimenting with many very different tail shapes or, if you’re dealing with a univariate distribution, to plot the distribution for a range of tail shapes. But those are, admittedly, not solutions that are universally applicable or sure to cover all possible shapes.

The second problem is one that it would be very interesting to address in a future version of SquigglyPy. The ReasonML implementation already tries to manipulate various well-known distributions analytically for as long as possible before resorting to sampling. That seems like a very promising 80/20 solution to this problem.

Technical note: Ideally, I’d like to implement a class Distribution that always has a representation that is a mapping from x to y coordinates but also, whenever possible, has a representation that is a formula and one that is a set of samples. All sorts of mathematical operations could then be defined on distribution samples (the status quo) and formulas, all fully encapsulated in the Distribution class. Any code that uses the distributions could be completely oblivious to the inner workings of the distributions, just mathing them like any other variable, while the distributions would take care to keep their formula representations for as long as possible. Plots could always use the x/y coordinate representation, again not caring how the distribution generated it.

Starboard

The following is best viewed on Starboard.

Starboard is like a Jupyter Notebook or like Google Colab except that it runs in your browser! The nice folks of Pyodide have compiled the CPython interpreter to WebAssembly so you don’t have to connect to a server to run your Python scripts. Starboard then built a Jupyter Notebook–like environment around it.

To use SquigglyPy in Starboard, you can simply import it with Micropip, because it’s pure Python. Of course you don’t have to use Starboard. You can also install it with pip, e.g., in your Poetry environment.

import micropip
micropip.install('https://packages.claviger.net/public/squigglypy-0.2.2-py3-none-any.whl')

Next, a bunch of imports. From SquigglyPy in particular we import the context, which stores some more or less global state and settings, various distributions, and miscellaneous utility functions.

import math

import scipy
import numpy as np
import matplotlib as mpl
import matplotlib.pyplot as plt
import matplotlib.transforms as transforms
import pandas as pd

from collections.abc import Iterable

from squigglypy.context import Context
from squigglypy.dsl import normal, uniform, mixture, pareto, lognormal
from squigglypy.utils import bfs, as_model

plt.style.use('seaborn')

The plotting function is a mess and quite specific to our case here, so I didn’t want to include it in the library. Ideally, I’d like to have plotting methods with sensible defaults integrated into the model API so that you can produce a plot with one command. But that would require (1) knowing what these sensible defaults are, (2) coming up with an API that allows you to override these defaults at will when they don’t work for you without copy-pasting a big chunk of code, and (3) tweaking the result based on the actual needs of actual users. So I’d be happy to accept pull requests that solve that for me. ^.^

def plot(model_, points, sample_count=100, sample_fmt='b-', quantiles=(0.1, 0.5, 0.9)):
    with Context(sample_count=sample_count):
        samples = np.stack([~model_(point) for point in points])
    p10, p50, p90 = np.quantile(samples, quantiles, axis=1)
    plt.figure(figsize=(8, 6))
    plt.yscale('log')
    plt.plot(points, p50, 'r-', label='mean')
    plt.fill_between(points, p10, p90, color="red", alpha=0.1)
    plt.plot(points, samples, sample_fmt, alpha=0.1)
    plt.xlim(0, None)

Christian Tarsney’s cubic growth model (section 4) describes the scenario that seems most sensible to me. I’ll focus on it for the purposes of this demo. My original plan was to implement it here faithfully, but I don’t have time for that now, so it has more the character of a proof of concept.

I’ll keep the explanations brief here because Tarsney has already done a great job at that.

value_star = pareto(.4) * 3.333e5
value_star.name = 'Value delta per star (v_s)'

value_earth = lognormal(0, 1) * 2e6
value_earth.name = 'Value delta on Earth (v_e)'

rate = uniform(10, 1e8)**-1
rate.name = 'Rate of exogenous nullifying events (r)'

# This doesn’t seem to work for some reason?
speed = 0.1  # lognormal(1, .3) / 10
#speed.name = 'Speed of settlement in c (s)'

year_start = 0                        # = t_l, year of start of settlement
year_end = 1e14                       # = t_f, eschatological bound
probability = 2e-14                   # = p, probability delta
density_galaxy = 2.2e-5               # = d_g, star density within a radius of 130k ly
density_supercluster = 2.9e-9         # = d_s, star density within Virgo Supercluster


def number_stars(radius):  # In light years
    if radius <= 0:
        return 0
    elif radius <= 1.3e5:
        return (4/3) * math.pi * radius**3 * density_galaxy
    return (4/3) * math.pi * (
        radius**3 * density_supercluster +
        1.3e5**3 * (density_galaxy - density_supercluster)
    )


def cubic_growth(year):
    return value_earth + (
        value_star *
        number_stars((year - year_start) * speed) *
        math.e**(rate * year * -1)
    )

years = range(0, int(1e6), int(1e3))

This is the value over time. There is an Integral resolver in SquigglyPy, but I haven’t gotten around to using it, so this is just the part inside the integral. (Also ignoring the probability delta, which is outside the integral.)

You can see that all the semitransparent lines overlap and so naturally indicate where the probability density is highest. In this case, this is similar to the 80% HDI (reddish) and sort of clustered around the mean, but you can just as soon imagine a more multimodal distribution where the HDI would hide the multimodal nature.

plot(cubic_growth, years, sample_count=100)
plt.show()

Consider, for example, the fabulous foobar model. The reddish HDI and the mean are rather uninformative compared to the traces.

def foobar(x):
    weight = mixture([normal(2, .1), normal(0, .1)])
    bias = uniform(100, 200)
    return weight * x**2 + bias / 5

plot(foobar, range(100), sample_count=100)
plt.yscale('linear')
plt.show()

From here it’s also conceptually easy to develop a sensitivity analysis that tells us what influence a given constant (by which I mean a univariate random variable, one that, in our example, is independent of the time) has on the model value for a selection of points in time.

The main difficulties here are:

  1. What methods of visualization are there that are more intuitive and/or informative than the one I’ve chosen here?
  2. What graphing library do we use to make this pretty and interactive?
  3. How do we integrate this into SquigglyPy so that the APIs are intuitive but also powerful enough to allow for a few other comparisons?

The goal would be to come up with an API in SquigglyPy that allows for a range of comparisons of parts of the model to other parts regardless of whether they are dependent or independent of x. (x is the time in our example.) The visualizations would have to be different ones for these combinations.

def sensitivity(model, xs=None, sample_counts=(10, 10)):
    x_sample_count, dist_sample_count = sample_counts
    xs = xs or range(30)
    constants, variables, tracer = bfs(model)
    constants = [constant for constant in constants if getattr(constant, 'name', False)]

    fig, axes = plt.subplots(len(constants), figsize=(10, len(constants) * 5))
    plt.subplots_adjust(hspace=0.5)

    for ci, constant in enumerate(constants):
        axis = axes[ci]

        with Context(sample_count=dist_sample_count):
            constant_samples = ~constant
            variable_samples = pd.DataFrame([~model(x) for x in xs], index=xs)

        if not (
                hasattr(constant_samples, 'shape')
                and constant_samples.shape == (dist_sample_count,)
                and variable_samples.shape == (len(xs), dist_sample_count)
            ):
            continue

        normalize_index = mpl.colors.Normalize(vmin=0.0, vmax=10.0)
        x_sample_indices = np.linspace(0, len(xs) - 1, num=10, dtype=int)

        axis.set_xlabel(str(constant))
        axis.set_ylabel('model value')

      rows = variable_samples.iloc[x_sample_indices].iterrows()

        for i, (x, variable_samples_at_x) in enumerate(rows):
            combined_samples = pd.DataFrame({'model': variable_samples_at_x, 'node': constant_samples})
            combined_samples = combined_samples.sort_values('node')
            cmap = plt.get_cmap('coolwarm', 256)
            color = mpl.colors.to_rgb(cmap(normalize_index(i)))
            axis.plot(
                combined_samples.node,
                combined_samples.model,
                c=color,
                linewidth=1,
            )

        axis.legend([f'x = {x}' for x in np.array(xs)[x_sample_indices]], loc='upper left')

sensitivity(cubic_growth, xs=years)
plt.show()


Comments

comments powered by Disqus