2:00–3:00 pm
GCIS E223, 929 E. 57th Street
To understand a system, in science or more broadly, one often begins by sifting through variation and identifying the part that is most relevant. The information bottleneck formalizes this process, defining a spectrum for the variation contained in a random variable that is ordered by relevance to a second random variable. Functionally, the information bottleneck encapsulates the relevant variation in a compression variable, and optimization requires navigating the space of all possible compression schemes: a task well suited to machine learning. In this talk, we focus on a particular version of the information bottleneck that identifies the most relevant variation in a set of multiple random variables, called the distributed information bottleneck, and show some of the exciting doors it opens for a physicist equipped with data. First, the method is a practically useful source of interpretability, turning a complex high-dimensional relationship into a spectrum of approximations that reveal the most important sources of variation. As a motivating example, we study rearrangement dynamics in a simulated glass and pinpoint the specific variations in local structure that contain the most information about future instability. Second, we show how the characteristic rate of information production in chaotic dynamics -- the Kolmogorov-Sinai (metric) entropy -- can be formulated as a distributed information bottleneck optimization problem. We find optimal measurements for a handful of chaotic systems and discuss the challenges that arise when using machine learning to find such measurement schemes. With a relatively straightforward implementation, the methodology we introduce offers a pragmatic means to harness machine learning for information-theoretic analyses of complex systems through data.