# Blog

- Still need to read Jayadev’s lecture notes on info-theoretic toolbox for stats
- want to go through the construction of conditional probability in lehman & casella
- nonparametrics ?
- randomization inference (thanks frances!) e.g. [Channelling Fisher: Randomization Tests and the Statistical Insignificance of Seemingly Significant Experimental Results*], http://www.mattblackwell.org/files/teaching/s05-fisher.pdf
- submitted a request for The Rule of Logistics (!)
- supplystudies.com/syllabus
- Have I read the MAS CONTEXT: HIDDEN issue yet?
- stuff for process
- layout for syllabus
- this oral history of Gomory, IBM research
- “Oracles and politicians” - this extension to the oracle framework (assuming you have ‘oracle’ access to a function evaluation at a point, even if you don’t have full access to a general gradient, etc)
- causal inference and machine learning: because in many areas of interest we have many covariates / contexts but perhaps not supervised
- “Politicians and Oracles” - Sebastian Bubeck The main idea is you can practically improve oracle models of computation for first-order optimization methods, where you assume you can query the function value (e.g. of the gradient) at the point but don’t have access to the full function, by considering access to the function via “politician”. The politician differs from the oracle because if you ask the politician for the value of f() at some point x, instead it’ll give you the value at some other point y. Where this politician differs from real life is that this politician will conduct line search in some region to query the function at a value that’ll lead to strictly better performance. For example, the politician might keep track of an ellipsoid within which it knows the optimum lies. How would such a politician be computed? You can run QR decomposition on the span of the gradients.
- mezard & montanari - information, physics, and computation
- things on topological data analysis: understanding “the shape of information”
- Overcoming intractability in machine learning http://www.cs.princeton.edu/courses/archive/spring15/cos598D/
- Bubeck’s approach to mathematical optimization; also his mini-course on bandit theory
- the new jim crow which appears to be devastatingly on point
- better facility with convex analysis, bregman divergences, etc.
- mathematical statistics
- primal-dual online algorithms
- contextual bandits
- interior point methods; slater conditions
- topology lol
- algebra lol
- real analysis
- Can solve exactly with interior point methods but these require a lot of memory for large problems.
- Projected subgradient projection - standard [subgradient] descent
- Low rank parametrization [e.g. matrix completion] Constraint is not convex because you end up multiplying matrices (the factors).
- some sort of noodle; ideally long and skinny. I’ve never tried spaghetti, fettucine would work better if you can’t get the chinese kind
- Sauce: emulsify peanut butter with a bit of water, soy sauce, black vinegar (unnecessary) to taste, add hot sauce. Be careful with the water: you need it to cling to the noodles. if you have sesame oil, this would probably be bomb.
- mix together; level up with julienned cucumbers, julienned eggs; basically whatever you would put into bibimbap would work !
- base layer of aromatics: garlic, ginger if you have it, spices, fry in oil. let it get hot and then add onions; wait a few minutes and add bell peppers. season
- add ~two cups of water or whatever and the lentils; cook for ~20/25 minutes. add in curry paste [if using]
- possibilities: if you haven’t used a paste that comes with coconut milk; add coconut milk; or whip up a batch of tzatziki and combine on the plate

# Shower Thoughts

I was thinking to myself, just in case it comes up in conversation (e.g. i’m not so secretly hoping that someone will ask), how do I explain my specific interest in reading – this unholy mix of cultural studies, academic nonfiction, soft infrastructure/soft power, urban studies?

Someone told me about this movie where the narrator is given glasses that allow him to see the ‘truth’ behind advertisements - e.g. he looks at a billboard and immediately sees the ‘message’ that they’re trying to get across to him. I think of cultural studies as a very similar lens (semi-literally): it’s a way of seeing the world in which you try to understand the larger forces at work in explaining the physical realities in front of you, of acknowledging the social construction of most things and how that affects the built environment/cultural forces of today.

There’s this essay that I still think about from time to time - Heidegger’s The Age of the World Picture, which I love so much because it shifts focus from ‘technology’ to the world-picture or world-frames its development changes or enables. It’s this emphasis on the world-picture in particular that I find really compelling, but also rewarding; to be able to turn this interpretative lens on whatever you wish. There’s this Kenya Hara quote about how design is about being able to drill down into the realities of something, just as there are infinites between successive integers on the real line.

In a similar way, I find that these kind of cultural studies recover some richness of everyday experience through its study.

And then there are a host of new questions, I think: what is the cultural study of operations research? How has shopping as a cultural institution changed in response to, e.g. economic pressures from Amazon’s 2-day shipping? What are the new social institutions and landscapes surrounding the rise of autonomous vehicles – or conversely, what social/cultural shifts are the conditions of possibility for these technologies to take root at large? What are the identity politics of modern supply chain logistics – identity politics in the age of Zara, H&M and trends straight off the runway for consumption?

# What Im Reading

## technical-type things

## critical logistics

## to-dos

## ephemera i read

## strange curiosities

# Dumpling Saturday

## homemade pork belly & cabbage dumplings

on a whim I bought a pound and a half of pork belly and it’s been in my freezer - so today I made pork belly & cabbage (+scallion) dumplings! it took a little longer than anticipated because i don’t have a food processor, so i minced the pork belly by hand. it still works; it’s just less incorporated than the typical lumps you see.

also picked up some storebought wrappers from Chinatown – the “northern style” dumpling wrappers are thicker than “shanghai style”

I ended up with a giant tray of ~50? 70? dumplings - so I’ll be eating these for a while ! next up: bulgogi dumplings !

# Thoughts On Icml

It’s the end of day 2 of the main conference – I’m here primarily for the Data-Efficient Machine Learning workshop (though the Personalization and Machine Learning for Social Good workshops look really good, and I hope to be able to sneak into them). It’s really exciting (and legitimately overwhelming!) being here, seeing faces I recognize from their academic websites walking around, etc.. The scope of the conference is rather broad (even considering the magnitude of interest in deep learning / RL / deep RL) but the papers are rather specific: it’s been most rewarding to engage with the state of the art in some fields which I’ve kind-of, sort-of worked in.

Some interesting ideas and things I took away:

Some ideas that I don’t want to forget from ICML talks:

“Doubly Robust Off-Policy Value Evaluation for Reinforcement Learning” How do you evaluate a policy for RL without deploying it in real life and incurring costs? This talk presented work that used importance sampling

“Statistical Limits of Convex Relaxation” This work analyzed the statistical suboptimality of convex relaxations of the SOS hierarchy, in particular for two settings. They note that previous work relies on the planted clique hypothesis; in contrast, their work is constructive.

“Faster Convex Optimization: Simulated ANnealing with an efficient Universal Barrier” Abernethy & Hazan - they present really nice analysis relating simulated annealing and interior point methods.

“Provable Algorithms for Inference in Topic Models” Arora, Rong Ge, Frederic koehler, Tengyu Ma, Ankur Moitra

Following up on their work getting theoretical bounds for NMF, they consider the problem of inference - how to determine the topic distribution of documents? In particular you can’t ask for more “samples” from the document to figure out its composition. They address this by expressing the problem as a linear map, where the problem is that the variance tends to be high: E[y] = Ax. They compute the pseudoinverse of A, the matrix B such that BA = x, and

There were several talks on learning choice models from rank data, or comparison data, which I didn’t really understand because I’m not quite familiar with the literature. E.g. plackett-Luce models, etc.

“Recommendations as Treatments: Debiasing Learning and Evaluation” Riding the causal inference wave, this work considered the inverse propensity score estimator, which reweights samples in th eERM estimator with the inverse of propensity score, to debias the dataset for a standard recommendation sytsem. In particular, the data is “missing not at random”: there are definite tendencies for people to provide data for works that they already like.

“Stability of SGD” Why does SGD generalize? Ben Recht presents joint work with Moritz Hardt and Yoram Singer on the stability of SGD from a dynamical systems point of view, including in the nonconvex case.

# Passing Thoughts From Papers

### Currently Reading, or trying to read, or should read

## concepts which i am woefully unfamiliar which i need to become more familiar with

## Recht - minimum rank approximations fo rmatrix completion

Minimization of l1 norm is well-known heuristic for cardinality minimization [e.g. sparsity] => compressed sensing, whenever the sensing matrix has *basis incoherence* properties; e.g. when it is chosen according to specific ‘ensembles’ (?)

In fact l1 heuristic is special case of nuclear norm heuristic. Parallels between nuclear norm heuristic and l1: restricted isometry property: implies that nuclear norm heuristic guaranteed to provide minimum rank solution

General LMIs can be expressed as rank constriants on a block matrix: rank of block symmetric matrix is equal to the rank of a diagonal block plus the rank of its schur complement

Insofar as sparsity optimization problems in compressed sensing based on cardinality can be expressed via linear programming (l1 norm minimization), convex optimization (specifically SDPs) can be used for matrix completion.

** operator norm and nuclear norm of a matrix can be cast as SDPs

Recall relationships between the Frobenius, operator norms, nuclear norms. Related by the following inequalities:

In particular this chain of inequalities shows that , so the nuclear norm is a lower bound of the rank function on the unit ball in the operator norm.

If the row spaces and column spaces of the two matrices A, B are orthogonal, then [sufficient] that the nuclear norm is additive! The proof follows by partitioning the SVD of A, B …. => infer a valid SVD showing that the singular values are equal to the union (with repetition) of the singular values of A + B.

The main section of interest is “Restricted Isometry and Recovery of Low-Rank Matrices” – Consider parameters determining the behavior of a linear map $$\mathcal{A}$, restricted to matrices of rank r. In particular this generalizes the restricted isometry property from vectors [Candes, Tao] to matrices.

*** Restricted Isometry Property, def 3.1.***
for every integer r (rank < m), define the r-restricted isometry constant to be such that $(1 - \delta_r(\mathcal{A})) \vert\vert X \vert\vert_F \leq \vert\vert \mathcal{A}(X) \vert\vert \leq (1 + \delta_r(\mathcal{A}(X)\vert\vert \leq (1 + \delta_r(\mathcal{A})) \vert\vert X \vert\vert_F
$$
holds for all matrices X of rank at most r.
e.g. that the Frobenius norm of the linear map applied to any matrix of rank r or lower can be bounded between a multiplier interval of the induced 2-norm. (?)

The set of matrices X for which the restricted isometry property must hold is a generalized Stiefel manifold [algebraic variety]. Many ensembles of random matrices have RIP with small w.h.p.

Main recovery theorem: requires a technical lemma; for any two matrices A, B; can decompose B as the sum of two matrices with the rank of not too large; and satisfies additivity of the rank; e.g. can do from SVD.

=== Algorithms for solving

# Broke Capsule Kitchen

# principles towards the broke-a**-b**** capsule kitchen

## Eat in season

Thanks to the miracles of globalization, you can pretty much procure *any* type of produce you desire; which is actually a huge trap because the cheapest (relatively) and *best-tasting* produce is when it’s in season. This is for very seasonal kinds of produce (e.g. squash) as opposed to things you can find all the time like apples.

## Take shortcuts

I really enjoy spicy foods and bold flavors, so I stand by premade flavor bodies like curry paste, ssamjang, gochujang, etc.

Personally, I veer vegetarian because I’m too lazy to marinade things, meat is more expensive, etc. but I also have like a pound of pork belly in the freezer so

## Spice liberally

Invest in dried spices since these will last you a long time; fresh herbs are nice and worth it if you can but it can be hard to stare down the nose of $5 basil that’ll wither in the fridge if you’re just cooking for yourself

## High-impact affordable indulgences

Like avocadoes.

## Don’t deprive yourself!

Eyes on the prize.

## “Recipes”

### Recipe: liang mian (cold sesame noodles)

### lentil whatever curry

# Reading List

# Reading List

## Technical Topics

*decision theory

*real analysis please …

*information theory

*that mezard text on information and computation

*i should check out an actual algorithms text

## Nontechnical texts

*Finish & actually process Foucault’s Order of Things

*Extrastatecraft

*things on my aaaaarg reading list

## Fiction

*DFW - the long one - infinite jest

I would also like to make a zine; about what, I don’t know.

# Future Is Possible

## working towards the future that i know is possible

During the Bacculaureate ceremony, when Randall Kennedy was imploring us to use our expensive educations to work towards the benefit of society, my friend turned to me and asked, “Do you think coupon allocation is going to save the world”? (in reference to my thesis topic.) I was upset most of all because she hit upon a very fundamental paradox I’ve faced being a student of operations research (& ‘financial engineering’), and also a steadfast acolyte of critical and cultural studies – especially the criticism of neoliberal capitalism.

I don’t think that coupon allocation will save the world, but I do think that the advancement of fundamental research on optimization and learning will support initiatives such as healthcare personalization, humanitarian logistics, that are opportunities for advancing intelligent decision-making that supports the public good. Couching such research in the language of computational marketing in order to maintain a veil of respectability/applicability/*potential fundability* is a unconscious (?) choice I’ve made, or a product of my unwillingness to commit to a handwavy contextual, domain-dependent story. I’m not interested in computational marketing *per se* but I do think it’s a useful language to adopt to maintain a baseline level of applicability.

I think further opportunities are possible, which I’ll endeavor to collect in this are.na collection for future reference.

# Michael Jordan

# Theoretical Ecology, Optimization, and Learning

## Theoretical Ecology, Learning, and Optimization.

In lieu of writing my thesis, I’m going to write up a collection of statements and results related to theoretical ecology, learning, and optimization which I’ve found of interest. I’m taking Prof. Levin’s Theoretical Ecology course, an interesting perspective on the process of theory-building in a natural science.

## Evolutionary Game Theory and Learning

In evolutionary game theory, the payoff of a strategy is dependent on the frequency of its prevalence in the population. The solution concept of interest is an evolutionary stable strategy which is resistant to ‘invasion’ by mutant types. A sufficient condition is that a strategy does strictly better than other strategies, but only weakly better against itself. Interestingly enough, removing the strict inequality removes compactness of the set that the optimization is taken over - so Nash’s analysis on the existence of the Nash equilibrium doesn’t apply.

One thing that I’m curious about is the extent to which ecologists might be interested in the computability or complexity of ESS. After all, Rubinstein himself raised a concern on the interpretability of Nash equilibria in ‘vanilla’ game theory since they were generally computationally intractable (specifically, PPAD-Complete - a slightly weaker version of NP-completeness.

Sergiu Hart and Mas-Colell show in an ‘03 paper that no-regret learning algorithms generally converge to correlated equilibria, a more general solution concept than Nash equilibria. Correlated equilibria occur if no player has an incentive to deviate from a given strategy if signalled to take a certain strategy from a central authority before the start of the game. This raises the question of whether or not these no-learning algorithms represent computationally feasible alternatives to equilibrium convergence in games with algorithmic agents: one can imagine how this could lead to the design of incentive-compatible systems where each player has no incentive to deviate, given an algorithm to maximize their payoffs in the system of interacting agents. Katrina Ligett et al. have a neat paper in SIGECOM 2015 that shows that it is at least computationally difficult (NP-Hard) to find a coarse correlated equilibrium with social welfare better than the worst possible. “One cannot provide any nontrivial welfare guarantee for the resulting equilibrium”. This is rather unfortunate; however they do provide an algorithmic framework to identify problem instances where an efficient approximate equilibrium can be computed with near-optimal welfare guarantees.

## Ecology and Optimization

This post on Off The Convex Path outlines a few interesting case studies in the relationship between dynamical systems and nonconvex optimization. (In particular there’s a cute anecdote about slime molds computing gradient descent on a manifold to solve an linear program. I think it demonstrates a flaw in thinking about logical connections between computation and the natural sciences: the fact that discrete models of age-structured populations resemble the power iteration method (for finding eigenvalues) doesn’t mean that nature is computing the largest eigenvalue - it means that in this model, the largest eigenvalue is governing the dynamics of evolution of the system. It’s the structure of the discrete-time updating which resembles the power method, which is more a consequence of the fundamental simplicity of matrix multiplication as projection, and the assumption that the projection matrix stays constant!

Theoretical ecology is interested in stability analysis of dynamical systems because this type of analysis stays robust (for the most part) to small changes in parameters - stability results arise from the main dynamics of the model. The claim is that there is no general theory for finding multiple fixed points of a dynamical system from the point of view of nonconvex optimization seems to overlook the type of stability analysis which exactly studies which equilibria are reached under which initial conditions.

## Evolutionary Game Theory on Graphs

Prof. Tarnita’s thesis addresses the question of evolutionary game theory via structured populations. What if populations are spatially partitioned or their topological interactions are otherwise governed by some network structure? In finite populations, evolutionary game theory is best modeled via Moran’s stochastic process which models the death and reproduction process in a finite population, vs. the infinite well-mixed population. The analysis then (loosely) analyzes propagation of evolutionary game (including mutant strategies) via the dynamics of a finite Markov chain. The quantity of interest is the fixation probability, the ability of a mutant strategy to take over a finite structured population.

The analysis looks interesting and I’d like to go deeper through it: one question brought up at the end of a review raises the question of how temporally changing spatial structures might affect fixation probability and other questions relevant to the evolutionary game.

(I’m reminded of the utility of working with abstractions like graphs, since insights about dynamic topologies in one setting (I’m working with the properties of network flows on dynamic graphs) might add to our ability to make predictions in an entirely different field!)

# Scribe Notes

## Scribe notes from talks, etc:

# learning-and-optimization

I’ve been intrigued by Nathan Kallus’ work with Prof. Bertsimas on predictive to prescriptive learning: I think it’s well worth a read and still need to go through the math thoroughly.