An Unexpected Union - Physics and Fisher Information
BOOK REVIEW
For an unbiased estimate ŷ(y) of a parameter θ, formed from data y governed by the likelihood law p(y|θ),

E[\hat y(y)] = \int \hat y(y)\, p(y|\theta)\, dy = \theta,   (1)

while the mean-square error of the estimate is

e^2 = \int [\hat y(y) - \theta]^2\, p(y|\theta)\, dy.   (2)

The Fisher information is

I = \int [\partial \ln p(y|\theta)/\partial \theta]^2\, p(y|\theta)\, dy.   (3)

I satisfies the consequence of the Cauchy–Schwarz inequality known as the Cramér–Rao inequality: Ie^2 ≥ 1. If N = 1 and p(y|θ) = p(y − θ), (3) reduces to

I = \int [p'(y)]^2 / p(y)\, dy = \int [p'(x)]^2 / p(x)\, dx,   (4)

where x = y − θ denotes the random deviation of the datum from the true parameter value.
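The Cramér–Rao bound Ie² ≥ 1 is easy to watch in action. The following sketch is mine, not the book's; the sample size n, the value of σ, and the choice of the sample mean as estimator are all arbitrary. For n independent draws from a normal law with mean θ and variance σ², the information is I = n/σ², and the sample mean achieves e² = σ²/n, so the product Ie² should come out at the bound.

```python
import numpy as np

rng = np.random.default_rng(0)
theta, sigma, n, trials = 2.0, 1.5, 50, 20000  # all arbitrary choices

# Fisher information carried by n independent N(theta, sigma^2) observations
I = n / sigma**2

# Mean-square error e^2 of the sample-mean estimator, by Monte Carlo
y = rng.normal(theta, sigma, size=(trials, n))
theta_hat = y.mean(axis=1)
e2 = np.mean((theta_hat - theta) ** 2)

print(I * e2)  # close to 1: the sample mean attains the Cramer-Rao bound
```

A biased or noisier estimator would push the product above 1, never below.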
Moreover, the substitution p(x) = [q(x)]^2 yields the even simpler form

I = 4 \int [q'(x)]^2\, dx.   (5)
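Forms (4) and (5) can be checked numerically in a few lines. The sketch below is my own illustration, assuming a Gaussian p(x) with standard deviation σ, for which the Fisher information is known to be 1/σ²; both integrals are evaluated by direct summation on a dense grid.

```python
import numpy as np

sigma = 0.8
x = np.linspace(-10.0, 10.0, 200001)
dx = x[1] - x[0]

# Gaussian density and the corresponding real amplitude q, with p = q^2
p = np.exp(-x**2 / (2 * sigma**2)) / (sigma * np.sqrt(2 * np.pi))
q = np.sqrt(p)

dp = np.gradient(p, x)
dq = np.gradient(q, x)

I_pdf = np.sum(dp**2 / p) * dx   # form (4)
I_amp = 4 * np.sum(dq**2) * dx   # form (5)

print(I_pdf, I_amp, 1 / sigma**2)  # the three values agree
```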
The introduction of real probability amplitudes q(x) in place of probability density functions p(x) simplifies many calculations in modern physics. Moreover, since the integrability of p(x) is equivalent to the square integrability of q(x), the substitution allows the analysis to take place in Hilbert space, a change of venue that von Neumann welcomed. In higher dimensions, the quantity [q'(x)]^2 becomes ∇q·∇q or, after the introduction of complex probability amplitudes, ∇ψ·∇ψ*, where * denotes complex conjugation. Frieden presents an entire table of Lagrangian functions, gleaned from branches of physics as diverse as classical and fluid mechanics, electro- and thermodynamics, quantum theory, and general relativity, in which such forms appear.

This suggests that an approach via Fisher information could furnish at least a partial answer to a long-unanswered question: Where do Lagrangians come from? Frieden quotes a standard text to the effect that Lagrangians are often plucked out of thin air, simply because the associated Euler–Lagrange equations are the expected equations of motion. There is, as yet, no general explanation for the remaining terms present in the Lagrangians of Frieden's table. The success of the EPI technique requires an ability to account for the additional terms that occur in particular branches of physics.

If an experiment is designed to estimate several different parameter vectors θ_n from measurements y_n = θ_n + x_n, the appropriate generalizations of (5) are
I = 4 \sum_n \int \nabla q_n \cdot \nabla q_n\, dx = 4 \sum_n \int \nabla \psi_n \cdot \nabla \psi_n^*\, dx,   (6)
with either real or complex probability amplitudes. As long as N (the largest n) is even, complex amplitudes can be formed from real ones by writing i = (−1)^{1/2} and ψ_n = q_{2n−1} + iq_{2n}, n = 1, . . . , N/2. Alternatively, the q_n's can be recovered from the ψ_n's by separating the real and imaginary parts. The choice between real and complex amplitudes is strictly a matter of convenience. It is also convenient to define q(x) as (q_1(x), . . . , q_N(x)), and ψ(x) as (ψ_1(x), . . . , ψ_{N/2}(x)).

It follows from (4) that if the random deviations x follow a normal or Bernoulli distribution about θ, then I is inversely proportional to their variance. If x is logistically distributed, so that p(x) = 1/(1 + exp(−ax)), then I = a/2. Or, if the deviations are 2-vectors drawn from a bivariate normal distribution in which the components share a common mean θ, a common variance σ², and a correlation coefficient ρ, then I becomes 2/[σ²(1 + ρ)]. The information content of two variables is thus a decreasing function of their degree of correlation, yet another reason why experimenters should, and typically do, make the extra effort to observe independent random variables whenever possible. The form of the foregoing result leads Frieden to speculate that every experiment may constitute a (zero-sum) game against nature, in which the experimenter strives to maximize the Fisher information obtained, while a hostile nature (an "information demon") strives to minimize it by introducing unexpected correlations and the like. It seems an interesting, if as yet undeveloped, idea.

Frieden compares Fisher information with the Shannon, Boltzmann, and Kullback–Leibler definitions of entropy, which likewise represent attempts to identify useful scalar measures of information. The comparison is most easily made with Shannon entropy H = −∫ p(x) ln p(x) dx and the discrete approximation
H \approx -\sum_n p(x_n)\, \ln p(x_n),   (7)

which differs from the corresponding discrete approximation

I \approx \sum_n [p(x_{n+1}) - p(x_n)]^2 / p(x_n)   (8)
of (4) in that the probabilities p(·) can be reassigned among the carriers x_1, . . . , x_n without altering the value of (7). Not so (8). It matters for Fisher information I, but not for Shannon information H, whether the likeliest carriers are clustered in space. It is this feature that uniquely qualifies I to illuminate Lagrangian physics.

I, like H, can be viewed as a measure of physical disorder. When H coincides with traditional Boltzmann entropy H_B, as it often does, the second law of thermodynamics stipulates that dH(t)/dt is never negative. And in a variety of other circumstances, among them the situation in which p(x,t) obeys a Fokker–Planck equation of the form ∂p/∂t = −∂/∂x [D_1(x,t)p] + ∂²/∂x² [D_2(x,t)p], it turns out that dI(t)/dt is never positive. It is also possible to define temperature in terms of Fisher information, and to deduce a version of the perfect gas law incorporating that precise definition of temperature, which suggests that it may eventually prove possible to recover all of thermodynamics from the notion of Fisher information. That, of course, remains to be done.

[P.M. Morse and H. Feshbach, Methods of Theoretical Physics, Part I, McGraw-Hill, New York, 1953.]

According to (4) and (5), Fisher information I is a real-valued functional defined on the space of PDFs p(·), or probability amplitudes q(·). In order to derive the laws of physics from Fisher information, one must find a second such functional J that, when subtracted from I, furnishes the missing terms in the Lagrangians of Frieden's table. In what follows, I will always have the form (6), while J will have a form appropriate to the particular branch of physics under consideration. Moreover, I will always represent the information in the data obtained via experiment, while J will represent the information content of the physical situation being explored. It makes sense, therefore, to suppose that I can never exceed J, to define an efficiency k = I/J, and to observe that k has always turned out to be either 1 or 1/2 in every situation (branch of physics) that Frieden and his co-workers have explored to date.

An experiment, however sensitive, must always perturb the probability amplitudes q(x) of the system under observation by an amount δq. The consequent perturbations δI and δJ represent quantities of information transferred from nature (the system) to the experimenter. Frieden proposes axiom 1: δI = δJ. This is a conservation law concerning the transfers of information possible in certain more or less idealized experiments. Its validity, according to Frieden, is demonstrated many times over in his book and is the key to his EPI approach to the understanding of physical reality. The axiom can actually be confirmed whenever the measurement space has a conjugate space connected to it by a unitary transformation. The author augments the foregoing conceptual axiom with two additional axioms of a more technical nature.
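The contrast between the discrete sums (7) and (8) can be made concrete in a few lines. The sketch below is my own illustration, with a made-up bell-shaped discrete distribution: permuting the probabilities among the carriers leaves the Shannon sum untouched but changes the Fisher sum dramatically, because only the latter is sensitive to spatial clustering.

```python
import numpy as np

rng = np.random.default_rng(1)

# A smooth, clustered discrete law p(x_1), ..., p(x_50) (a made-up example)
p = np.exp(-0.5 * ((np.arange(50) - 25) / 5.0) ** 2)
p /= p.sum()

def shannon(p):
    # discrete Shannon entropy, as in (7)
    return -np.sum(p * np.log(p))

def fisher(p):
    # discrete Fisher information, as in (8)
    return np.sum(np.diff(p) ** 2 / p[:-1])

p_shuffled = rng.permutation(p)

print(shannon(p), shannon(p_shuffled))  # equal: H ignores arrangement
print(fisher(p), fisher(p_shuffled))    # very different: I feels clustering
```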
To consider the effects on the functional K of a small perturbation δq(x) of its argument q(x), we write K = I − J. Then, by axiom 1, we can write

\delta K = \delta I - \delta J = 0,   (9)
and ask for which probability amplitudes q(x) equation (9) holds. The answer is naturally expressed in terms of Euler–Lagrange equations from the calculus of variations, a process best understood by seeing it in action. Consider a (small) particle of mass m moving on a line about its rest position θ. If the particle is subject to a conservative scalar potential field V(x), its total energy W is conserved. In terms of complex wave functions ψ_n(x), the expression (6) for I becomes
I = 4 \sum_n \int |d\psi_n(x)/dx|^2\, dx,   (10)
the summation extending from n = 1 to n = N/2. The particle in question will have not only a position x relative to its rest position θ, but a momentum μ relative to its rest momentum 0. Moreover, in the same way the observations x_n = y_n − θ are drawn from probability distributions p_n(x) = [q_n(x)]^2 corresponding to complex amplitudes ψ_n(x), observed values μ_n would be drawn from distributions p_n(μ) corresponding to complex amplitudes φ_n(μ). Again, we write φ = (φ_1, . . . , φ_{N/2}) and ψ = (ψ_1, . . . , ψ_{N/2}). Frieden concludes on physical grounds, by analysis of an experimental apparatus that could in theory be used to perform the required measurements, that J[φ] = I[ψ]. Moreover, the φ_n and ψ_n are Fourier transforms of one another, so that
\psi_n(x) = (2\pi\hbar)^{-1/2} \int \varphi_n(\mu)\, \exp(i\mu x/\hbar)\, d\mu.   (11)
Substitution of (11) into (10), together with the unitarity of the Fourier transformation, then yields

I = (4N/\hbar^2) \int \mu^2 \sum_n \varphi_n(\mu)\, \varphi_n^*(\mu)\, d\mu = J.   (12)
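The Parseval-type equality asserted in (12) can be verified numerically for a single amplitude. The sketch below is my own check, not the book's, with ℏ = 1 and N = 1; ψ and φ are a standard analytic Fourier-transform pair of unit-norm Gaussian amplitudes of arbitrary width σ.

```python
import numpy as np

sigma = 0.7  # width of the (assumed) Gaussian amplitude; hbar = 1 throughout

x = np.linspace(-12.0, 12.0, 100001)
mu = np.linspace(-12.0, 12.0, 100001)
dx, dmu = x[1] - x[0], mu[1] - mu[0]

# A known Fourier-transform pair of unit-norm amplitudes
psi = (np.pi * sigma**2) ** -0.25 * np.exp(-x**2 / (2 * sigma**2))
phi = (sigma**2 / np.pi) ** 0.25 * np.exp(-sigma**2 * mu**2 / 2)

dpsi = np.gradient(psi, x)
I_val = 4 * np.sum(dpsi**2) * dx          # position-space side, as in (10)
J_val = 4 * np.sum(mu**2 * phi**2) * dmu  # momentum-space side, as in (12)

print(I_val, J_val)  # both equal 2/sigma**2
```

Narrowing σ raises both sides together: sharper position amplitudes carry more Fisher information, at the price of broader momentum amplitudes.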
He then identifies the summation in (12) with the probability distribution p(μ) mentioned earlier, so that
J = (4N/\hbar^2)\, E[\mu^2].   (13)
Finally, he makes the specifically nonrelativistic approximation that the kinetic energy E of the particle is μ²/2m, so that
J = (8Nm/\hbar^2)\, E[W - V(x)] = (8Nm/\hbar^2) \int [W - V(x)]\, p(x)\, dx = (8Nm/\hbar^2) \int [W - V(x)] \sum_n \psi_n(x)\, \psi_n^*(x)\, dx.   (14)
Since I and J have now been expressed in terms of the quantities ψ_n(x), K = I − J can be combined into a single integral of the form K = ∫ L(ψ_n, dψ_n/dx, x) dx, and the associated Euler–Lagrange equations written:
\psi_n''(x) + (2m/\hbar^2)[W - V(x)]\, \psi_n(x) = 0, \quad n = 1, \ldots, N/2,   (15)
the time-independent Schrödinger equation. This is the (one-dimensional, nonrelativistic) approximation of the (covariant, relativistic) Klein–Gordon equation, which can be derived, without approximation, in similar fashion. The unitary nature of the Fourier transformation played an essential role in the foregoing derivation, particularly that of equation (12), where it justified the conclusion that k = I/J = 1 in the present theory. That means that quantum mechanics is efficient in the sense that the underlying experiments are capable of extracting all available information. Many other physical theories, including electromagnetic theory and gravitational theory, yield only k = 1/2.

Frieden derives a version of Heisenberg's uncertainty principle from the Cramér–Rao inequality, and contrasts the result with the standard version obtained via Fourier duality. He remarks that the EPI version is stronger than the standard one in the sense that it implies the other, whereas the converse is untrue. The standard result applies only to uncertainties that exist before any measurements are taken, while the EPI version also applies to uncertainties that remain afterward.

The material described here is contained in the book's first four chapters and in Appendix D. The next six chapters present further applications of the EPI method. Statistical mechanics, like quantum mechanics, is found to be efficient in the sense that k = I/J = 1, while most other theories are only halfway efficient. Frieden finds this unsurprising: quantized versions of most physical theories, including gravitation and electromagnetism, have not been developed, although he expects that they will be found in due course. Indeed, he and others are currently applying the EPI method to a study of quantum gravity, in steps analogous to those that led to the Schrödinger equation presented here and, in higher dimensions, to the Klein–Gordon equation.
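The endpoint of the derivation, equation (15), is easy to put to work. The sketch below is my own illustration, not Frieden's: in units with m = ℏ = ω = 1 and a harmonic potential V(x) = x²/2, replacing ψ″ by a three-point finite difference turns (15) into a matrix eigenproblem whose smallest eigenvalues should land near the familiar energies W = ℏω(k + 1/2).

```python
import numpy as np

m = hbar = omega = 1.0
npts, L = 1200, 16.0
x = np.linspace(-L / 2, L / 2, npts)
dx = x[1] - x[0]

V = 0.5 * m * omega**2 * x**2  # harmonic potential (my choice of example)

# Rearranged, (15) reads -(hbar^2/2m) psi'' + V psi = W psi; replacing psi''
# by the three-point finite difference turns it into a matrix eigenproblem.
diag = hbar**2 / (m * dx**2) + V
off = -hbar**2 / (2 * m * dx**2) * np.ones(npts - 1)
H = np.diag(diag) + np.diag(off, 1) + np.diag(off, -1)

W = np.linalg.eigvalsh(H)
print(W[:3])  # close to hbar*omega*(k + 1/2): 0.5, 1.5, 2.5
```

The same discretization handles any bounded V(x); only the line defining V changes.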
Feynman path integrals seem to emerge naturally in this ongoing investigation, which seems destined to culminate in a vector wave equation of gravity. Also under attack are various problems in turbulence. Probability laws of fluid density and velocity have been found for low-velocity flow in a nearly incompressible fluid. The new laws agree well with published data arising from detailed simulations of such turbulence. Frieden also remarks that the EPI method applies most naturally to the laws of physics that are expressed in the form of (differential) field equations. He sees no reason, however, why the method could not be extended to cover laws concerning the sources of the fields in question. The exposition ends as it began, quoting John Archibald Wheeler's words on observer participation. It seems safe to conclude, all in all, that the unexpected union between physics and Fisher information will prove both lasting and fruitful.
James Case writes from Baltimore, Maryland.