The newest quartet of Fields medalists were just announced last week at the ICM 2014 in Seoul. The Fields medal is widely considered to be the highest prize in mathematics, and the community always seems to be very excited to hear about its new recipients. Although I’m wary of paying too much attention to these sorts of prizes in general, one thing I like very much about the process is that it inspires experts to take cutting-edge mathematical research and try to explain it to non-experts. I think that other graduate students will concur that this is an extremely difficult thing to do, so I’ve been impressed by the quality of some expositions I’ve seen.
While I was browsing Gowers’s blog of the Laudatio for Martin Hairer, one of this year’s Fields medalists, I encountered some elements that I found exciting simply because they connected with ideas that I had actually encountered, and rather enjoyed, in a recent course on PDE. So I thought that I would try and explain these ideas; whether or not they really are related to Hairer’s work in a more than superficial way is something I don’t know, since I don’t understand his work, but at least they will clarify some of the jokes (that’s the most important part, right?).
Since I am extremely far from being an analyst myself, I am sure that everything I have to say will seem very basic to people who work in this area, but it may be useful for people like myself who come from a more algebraic background. As usual, I’ll try to keep things non-technical – almost all of the technical details that I know are written up in these lecture notes (though they are currently optimized for faithfulness to the lectures and not readability, which are quite different in this case).
1. Partial differential equations
Hairer’s work is in a subject called “stochastic partial differential equations.” That’s a rather intimidating extension of the already difficult subject of (deterministic) partial differential equations, so let’s take baby steps first.
A partial differential equation (PDE) is just a functional equation involving partial derivatives: for a given $F$, it asks for functions $u$ satisfying the relation
$$F(x, u, \partial u, \partial^2 u, \ldots) = 0.$$
The main source of PDEs is physics, where they come up in modeling all sorts of phenomena. There $u$ describes the state of the physical system, and $F$ is some kind of local constraint.
Let’s take a look at some of the main examples.
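- The Poisson equation $\Delta u = f$ describes the electric potential $u$ in the presence of a charge density $f$ (up to sign and physical constants).
- The wave equation models the displacement $u$ of a wave in space: $\partial_t^2 u = \Delta u$.
- The heat equation models the diffusion of heat in space: $\partial_t u = \Delta u$.
- The Schrödinger equation, $i\,\partial_t u = -\Delta u$ in its free form, governs the time evolution of the quantum wavefunction $u$ of a system.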
An early important idea in the theory of partial differential equations, which was first understood and clarified by Hadamard, is that of well-posedness. You see, when we ask for solutions to a PDE we implicitly mean solutions in some specific space, which could be for instance the space of differentiable functions, smooth functions, or even analytic functions.
Ideally we want to work in a space where the solution to the PDE exists and is unique. (There is one more condition to well-posedness, which is that the solution should somehow depend “continuously on the initial data” in an appropriate sense.) This obsession with existence and uniqueness has been the subject of several jokes; my favorite is this one:
A mathematician and an engineer are staying in a hotel. The engineer wakes up to find his room on fire. He spots a fire extinguisher, uses it to put out the fire, and goes back to bed.
The mathematician wakes up to find his room on fire. He spots a fire extinguisher, exclaims “a solution exists!” and goes back to bed.
All joking aside, though, existence and uniqueness are quite important if we think of our solution as modeling a physical situation. That is because real physical systems tend to evolve in a specific, determined way from their initial conditions, and so any “real” solution should somehow be unique.
This creates some interesting room for interpretation as far as “solving” the PDE goes. The most natural interpretation is to seek a solution that possesses at least as much regularity as demanded by the equation, and then to impose additional regularity until one obtains uniqueness. For instance, if the equation involves at most second derivatives, as all our examples above do, then the most natural problem would be to ask for a twice-differentiable function that satisfies the equation. This is the notion of “classical solution.” However, this already turns out to be impossible for many interesting cases.
Therefore, one has to expand the solution concept beyond the familiar realm of functions. The broadest notion of solution is that of “weak solutions,” which solve the equation in the sense of “distributions.” Such functions need not be differentiable in the usual sense, but they possess derivatives as distributions, which are not functions but do retain some of the key properties of functions.
It may be useful to motivate the notion of distribution by borrowing some ideas from physics (I should credit this explanation to Strichartz). The classical notion of a function is something that takes in an input and outputs some value. For instance, we could let $f$ be a function associating some physical quantity like temperature or energy to a particular point in space, $x$. We might think that we can measure $f(x)$ by holding up a thermometer to position $x$, but this is really just an idealization. In physical reality, any thermometer we use will have a definite size and hence imprecision, so what we are really measuring is some sort of average temperature in a localized region. That is, instead of measuring $f(x)$ we are measuring something like
$$\int f(y)\,\varphi(y)\,dy,$$
where $\varphi$ can be interpreted as some averaging function.
The idea of a “generalized function,” or distribution, is to entirely drop the notion that a “function” has to have a definite value at each point. A distribution provides a means of “measuring” a function, i.e. to a function it associates a value, but it does not need to have any intrinsic values itself.
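Formally, now: we define the space of test functions $\mathcal{D}$ to be the vector space of smooth, compactly supported functions on $\mathbb{R}^n$. The space of distributions is then the continuous dual space $\mathcal{D}'$ (the dual in the sense of linear algebra, keeping only the continuous functionals). The space of test functions includes into this dual via integration: any test function $f$ realizes a distribution, which measures another test function $\varphi$ as
$$\langle f, \varphi \rangle = \int f(x)\,\varphi(x)\,dx.$$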
Finite-dimensional vector spaces are isomorphic to their duals, but for infinite-dimensional spaces the dual space is bigger. The theory of distributions allows us to make sense of objects like the “Dirac function” $\delta$, which is not really a function at all but is a perfectly good distribution: it measures a test function by evaluating it at a point, $\langle \delta, \varphi \rangle = \varphi(0)$, and this evaluation is continuous on the space of test functions.
Now we can say what it means for a function to have a “distributional derivative”: it should be a distribution that measures test functions in the same way that a classical derivative would. In other words, for a test function $\varphi$, if $f$ were differentiable then by integration by parts we would have
$$\int f'(x)\,\varphi(x)\,dx = -\int f(x)\,\varphi'(x)\,dx.$$
Even if $f'$ does not exist classically, there is a distribution $g$ such that $\langle g, \varphi \rangle = -\langle f, \varphi' \rangle$ for all test functions $\varphi$, so that it measures like $f'$. In this case we would say that $g$ is the distributional derivative of $f$. And if $g$ is actually represented by some function, then we would say that this function is the weak derivative of $f$.
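To make the definition concrete, here is a small numerical sketch (my own illustration, not from the lecture notes) checking that $\mathrm{sign}(x)$ is the weak derivative of $|x|$, which has no classical derivative at the origin. The bump function, the shift `SHIFT`, and the midpoint quadrature are all choices made for the example; the shift is there only so that the two integrals are not both zero by symmetry.

```python
import math

SHIFT = 0.3  # illustrative off-center shift of the test function (an assumption)

def bump(y):
    """Standard smooth bump: exp(-1/(1-y^2)) for |y| < 1, and 0 otherwise."""
    if abs(y) >= 1.0:
        return 0.0
    return math.exp(-1.0 / (1.0 - y * y))

def bump_prime(y):
    """Exact derivative of the bump function."""
    if abs(y) >= 1.0:
        return 0.0
    return bump(y) * (-2.0 * y) / (1.0 - y * y) ** 2

def phi(x):
    """Test function: a bump centered at SHIFT, supported in [-0.7, 1.3]."""
    return bump(x - SHIFT)

def phi_prime(x):
    return bump_prime(x - SHIFT)

def sign(x):
    return (x > 0) - (x < 0)

def integrate(g, a, b, n=20000):
    """Composite midpoint rule on [a, b]."""
    h = (b - a) / n
    return h * sum(g(a + (k + 0.5) * h) for k in range(n))

# Pairing of f(x) = |x| against phi', versus minus the pairing of sign(x) against phi.
lhs = integrate(lambda x: abs(x) * phi_prime(x), -1.0, 1.5)
rhs = -integrate(lambda x: sign(x) * phi(x), -1.0, 1.5)
print(lhs, rhs)
```

The two printed numbers should agree up to quadrature error, which is exactly the statement that $\mathrm{sign}$ measures test functions the way a derivative of $|x|$ would.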
Now you can see how you might cook up a function that is not even itself differentiable (classically), but deserves to be called a solution to some partial differential equation: the derivative exists as a distribution, and so then the equation asks for an equality of distributions. This just means that the two distributions measure up against all test functions in the same way, but there is no point-wise equality in the usual sense. However, this introduces a wrinkle into some partial differential equations, since distributions can be differentiated but not generally multiplied (or operated on in other ways that functions can be). So one winds up with some PDE that simply “do not make sense.”
At last, we can explain one of the great jokes of the ICM laudatios, concerning the progression of PDE theory.
- Newton introduced the differential calculus, and initiated the study of differential equations. For mathematicians, the task became: you are given a differential equation, perhaps modeling some physical system, and you want to find its solution.
- Poincaré appreciated that in many cases of interest, one cannot write down a formula for the solution to a differential equation, and so one has to settle for a less explicit, often qualitative, description. The task instead becomes: you are given a differential equation, and you want to say something about its solution. For example, one might try to prove energy estimates or bounds on the solution, or investigate the behavior as time goes to infinity.
- In the 1960s, Smale and Thom realized that often the physicists did not themselves know the correct equations that they wished to solve (at least from a mathematical perspective). For instance, the equations might not really make sense, for the reasons mentioned above. Therefore, the task then becomes: you are not given a differential equation, and you want to say something about its solution.
Now, Hairer’s work concerns an exotic type of PDE called stochastic partial differential equations. Here the equations are not deterministic, but involve random variables, which introduces all kinds of complications. The particular example that Hairer is famous for solving is called the Kardar-Parisi-Zhang (KPZ) equation:
$$\partial_t h = \partial_x^2 h + (\partial_x h)^2 + \xi.$$
This equation describes the time evolution of a boundary between two substances. Hairer gives the example of the shape of a piece of paper as it burns: it starts out straight, but becomes irregular and bumpy in a way that is sort of random, but also sort of structured.
The $\xi$ is a “white noise” term, and the solution $h$ is expected to look like Brownian motion, so the derivative $\partial_x h$ doesn’t make sense classically, and has to be interpreted as a distribution. Unfortunately, one cannot multiply distributions in general, so the term $(\partial_x h)^2$ doesn’t really make sense.
It is a great credit to physicists that they can manipulate completely nonsensical mathematical objects, like this KPZ equation, and somehow still obtain meaningful answers and predictions. Nonetheless, a mathematician would like to perform mathematics properly, and the great achievement of Hairer was to formulate a theory of “regularity structures” that does make sense out of the KPZ equation and also other kinds of stochastic PDE previously thought to be similarly nonsensical.
2. Elliptic regularity
Perhaps the most famous phenomenon in the basic theory of PDE is elliptic regularity. It describes the “miracle” by which solutions to certain classes of PDE, which are called elliptic, automatically possess more regularity than prescribed by the equation itself.
If you have studied complex analysis, then you have already witnessed this “miracle”: a function that is once complex-differentiable, or “holomorphic,” is automatically infinitely differentiable, and even analytic. This basic fact lends complex analysis its characteristic rigidity.
The underlying reason is that a holomorphic function can be thought of as a solution to a system of PDE, the Cauchy-Riemann equations, which are elliptic. So the collapsing of all the notions of differentiability, smoothness, and analyticity in complex function theory is a special case of elliptic regularity.
The Poisson equation is a prototypical example of an elliptic equation. The special case with vanishing right-hand side, $\Delta u = 0$, which is called Laplace’s equation, defines the famous class of harmonic functions, which have great significance in analysis. As you may know, general harmonic functions satisfy a sequence of properties that are familiar from complex analysis: the mean value property, the maximum principle, etc. These may be generalized, to some extent, to all solutions of elliptic equations.
So what is an elliptic PDE anyway? There is a partial classification of PDE into four broad categories: elliptic, hyperbolic, parabolic, and dispersive, which are modeled on the four main equations from physics above. This classification is neither formal nor comprehensive, but we can formalize it in a few cases of interest. A “second-order semi-linear” PDE takes the form
$$\sum_{i,j} a_{ij}(x)\,\partial_i \partial_j u + \sum_i b_i(x)\,\partial_i u + c(x)\,u = f(x).$$
(The “second-order” refers to the fact that the PDE involves derivatives only up to second order, and the “semi-linear” refers to the fact that the coefficients of the differential operators do not depend on the solution $u$.)
By the equality of mixed partial derivatives for $C^2$ functions, there is no loss of generality in assuming that $a_{ij} = a_{ji}$ for all $i, j$. Now we perform a common mathematical maneuver, which is to throw away the lower-order terms and look only at the expression
$$\sum_{i,j} a_{ij}(x)\,\xi_i \xi_j.$$
This is called the principal symbol of the equation. The idea is that the qualitative nature of the PDE is controlled by the highest-order terms. This is analogous to how the qualitative behavior of a polynomial is controlled by its highest-order terms, i.e. its degree. Indeed, you may be familiar with the fact that the Fourier transform turns derivatives into multiplication, so that in the Fourier world this corresponds precisely to taking highest-order terms in a traditional sense. This reflects the fact that in PDE the behavior of the solution is often tied to the level of oscillation, which is precisely what is measured by the derivatives.
So anyway, we can take this expression and interpret it as a quadratic form on the cotangent bundle. In this case, that’s just a fancy way of saying to look at the symmetric matrix $(a_{ij}(x))$. As a quadratic form, this is classified up to change of basis by its signature, i.e. its number of positive and negative eigenvalues. We say that the PDE is elliptic if it has signature $(n, 0)$, hyperbolic if it has signature $(n-1, 1)$, and parabolic if it has signature $(n-1, 0)$. This comes from the classification of plane conics: we have ellipses (signature $(2, 0)$), hyperbolas (signature $(1, 1)$), and parabolas (signature $(1, 0)$). In other words, an elliptic equation has a positive-definite matrix $(a_{ij}(x))$ at each point $x$.
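As a sanity check on this classification, here is a minimal sketch for the case of two variables, where the principal part is $a\,\partial_1^2 u + 2b\,\partial_1\partial_2 u + c\,\partial_2^2 u$ and the matrix is $\begin{pmatrix} a & b \\ b & c \end{pmatrix}$. The function names and the tolerance are my own choices. Since multiplying an equation by $-1$ swaps positive and negative eigenvalues without changing the equation, the sketch compares signatures up to that overall sign.

```python
import math

def signature_2x2(a, b, c, tol=1e-12):
    """Return (#positive, #negative) eigenvalues of the symmetric matrix [[a, b], [b, c]]."""
    mean = 0.5 * (a + c)
    radius = math.hypot(0.5 * (a - c), b)
    eigenvalues = (mean + radius, mean - radius)
    positive = sum(1 for lam in eigenvalues if lam > tol)
    negative = sum(1 for lam in eigenvalues if lam < -tol)
    return positive, negative

def classify(a, b, c):
    """Classify the principal part a*u_11 + 2b*u_12 + c*u_22 by its signature."""
    positive, negative = signature_2x2(a, b, c)
    # Multiplying the equation by -1 swaps the two counts, so compare unordered.
    unordered = (max(positive, negative), min(positive, negative))
    names = {(2, 0): "elliptic", (1, 1): "hyperbolic", (1, 0): "parabolic"}
    return names.get(unordered, "degenerate")

print(classify(1, 0, 1))    # Laplace: u_xx + u_yy
print(classify(1, 0, -1))   # wave: u_tt - u_xx
print(classify(0, 0, 1))    # heat: only u_xx appears at second order
```

For the heat equation the time derivative is a lower-order term, so only one direction contributes to the principal symbol, which is why it lands in the parabolic case.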
Now here is one possible formulation of elliptic regularity (some technical conditions omitted).
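Theorem 1 (Elliptic Regularity) Let $u \in C^2$ be a function satisfying an elliptic PDE
$$\sum_{i,j} a_{ij}(x)\,\partial_i \partial_j u + \sum_i b_i(x)\,\partial_i u + c(x)\,u = f(x),$$
where $a_{ij}, b_i, c, f$ are smooth. Then also $u \in C^\infty$.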
So why is this true? There is a heuristic that “singularities flow along the characteristics” of the PDE. The characteristics are described by the zeros of the principal symbol, so elliptic equations have no characteristics, hence one expects no propagation of singularities.
Another heuristic to mention, which I already alluded to above, is that lack of regularity is linked to oscillation. Indeed, recall that the regularity of a function is related to the rate of decay of its Fourier transform, which describes its oscillation. To make this slightly more precise, consider the differential operator described by the principal symbol:
$$P = \sum_{i,j} a_{ij}\,\partial_i \partial_j,$$
where we assume for simplicity that the $a_{ij}$ are constant. Then $P e^{i x \cdot \xi} = -p(\xi)\, e^{i x \cdot \xi}$, where $p(\xi) = \sum_{i,j} a_{ij}\,\xi_i \xi_j$. If this has a nontrivial zero $\xi_0$, let’s consider how $P$ operates on the plane wave $e^{i \lambda x \cdot \xi_0}$, as $\lambda$ gets very large. You can just compute that
$$P e^{i \lambda x \cdot \xi_0} = -\lambda^2\, p(\xi_0)\, e^{i \lambda x \cdot \xi_0} = 0.$$
Therefore, we can get solutions of $Pu = 0$ with as large oscillations as we want by letting $\lambda \to \infty$. So the absence of characteristics prohibits this sort of violent oscillation.
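The plane-wave computation can be checked numerically. The sketch below (an illustration under my own choices of step size, frequency, and sample point) applies a constant-coefficient operator $\sum a_{ij}\partial_i\partial_j$ in two variables to $e^{i\lambda x\cdot\xi}$ using central differences. For the wave operator and the characteristic direction $\xi_0 = (1, 1)$ the result vanishes for every $\lambda$, whereas for the Laplacian it is $-\lambda^2 |\xi|^2$ times the wave and can never vanish.

```python
import cmath

def plane_wave(xi, lam):
    """The plane wave (x, y) -> exp(i * lam * (xi . (x, y)))."""
    return lambda x, y: cmath.exp(1j * lam * (xi[0] * x + xi[1] * y))

def symbol(a, xi):
    """p(xi) = sum_ij a_ij xi_i xi_j for a constant 2x2 coefficient matrix a."""
    return sum(a[i][j] * xi[i] * xi[j] for i in range(2) for j in range(2))

def apply_operator(a, u, x, y, h=1e-4):
    """Approximate sum_ij a_ij d_i d_j u at the point (x, y) by central differences."""
    uxx = (u(x + h, y) - 2 * u(x, y) + u(x - h, y)) / h**2
    uyy = (u(x, y + h) - 2 * u(x, y) + u(x, y - h)) / h**2
    uxy = (u(x + h, y + h) - u(x + h, y - h)
           - u(x - h, y + h) + u(x - h, y - h)) / (4 * h**2)
    return a[0][0] * uxx + (a[0][1] + a[1][0]) * uxy + a[1][1] * uyy

wave_operator = [[1.0, 0.0], [0.0, -1.0]]   # d_1^2 - d_2^2, hyperbolic
laplacian = [[1.0, 0.0], [0.0, 1.0]]        # d_1^2 + d_2^2, elliptic
xi0 = (1.0, 1.0)                            # zero of the wave operator's symbol
lam = 8.0

u = plane_wave(xi0, lam)
wave_result = apply_operator(wave_operator, u, 0.3, 0.7)
laplace_result = apply_operator(laplacian, u, 0.3, 0.7)
predicted = -lam**2 * symbol(laplacian, xi0) * u(0.3, 0.7)
print(abs(wave_result), abs(laplace_result), abs(predicted))
```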
Now, how does one prove an elliptic-regularity type theorem? Generally speaking, there are three steps.
- First one introduces function spaces that provide a means to measure regularity by integrals rather than pointwise derivatives. These are a special class of Hilbert spaces, called Sobolev spaces $H^k$. Perhaps the simplest way to define $H^k$ is as the subspace of $L^2$ consisting of functions possessing weak (i.e. distributional) derivatives up to order $k$ in $L^2$.
The power of these Sobolev spaces is in providing a natural framework for understanding regularity. The Sobolev embedding theorems provide a link between the integral regularity measured by the Sobolev spaces and the classical sense of regularity. In addition, it is important to understand the geometric relationships between these spaces; for instance, that the inclusion of a Sobolev space into a less regular one, $H^{k+1} \hookrightarrow H^k$, is compact (on bounded domains). So this step is basically a lot of functional analysis.
- Next, one establishes a priori estimates on the derivatives, assuming their existence. This is usually the trickiest step, requiring a careful analysis of the structure of the particular PDE in question. Consider the prototype, the Poisson equation:
$$\Delta u = f.$$
Squaring and integrating by parts (assuming, say, that $u$ is compactly supported, so that there are no boundary terms), we find that
$$\int f^2 = \int (\Delta u)^2 = \sum_{i,j} \int \partial_i^2 u\,\partial_j^2 u = \sum_{i,j} \int (\partial_i \partial_j u)^2.$$
This shows that we can bound the $L^2$ norm of all the second-order derivatives of $u$ uniformly in terms of the data $f$. We only assume that $u \in C^2$, but just suppose, for the moment, that $u$ were three times differentiable. Then just by differentiating the original Poisson equation we would have the PDE
$$\Delta (\partial_k u) = \partial_k f.$$
(In general, one obtains a different PDE for the derivative, but crucially it is still elliptic.) By the same argument as above, we would obtain control over the third derivatives of $u$ in terms of the first derivatives of $f$. Now, we don’t really have any such bound since we don’t know that the third derivative of $u$ exists. But we have shown that if it did exist, then we would have a certain amount of control over it. And in practice, this usually turns out to be enough: the saying is that “existence follows from the estimate.” Carrying that out rigorously is the content of the third step, which is usually the most technical but also relatively straightforward.
- The final step typically consists of converting the a priori estimates into a rigorous proof. There are a couple of general approaches possible. The first is to “regularize” the functions and equation involved, approximating them by smooth substitutes so that all the estimates become valid. Then one wants to slowly remove the regularization and show that these nice properties are preserved in the limit. Here again, the uniformity estimates prove crucial. A second approach is to approximate the process of differentiation, instead of the functions. For instance, one can introduce a “discrete derivative” or “difference quotient” and use the a priori estimates to show that in the limit where the discrete derivative becomes the usual infinitesimal derivative, all the objects are sufficiently well-behaved to converge to sensible classical objects.
Carrying out this program is actually quite lengthy and technical, as you might imagine. For details, see the lecture notes.
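The a priori estimate in the second step rests on the identity $\int (\Delta u)^2 = \sum_{i,j} \int (\partial_i \partial_j u)^2$, which holds whenever the boundary terms in the integration by parts vanish. Here is a small check on the torus $[0, 2\pi]^2$, where periodicity plays the role of compact support. The particular function $u(x, y) = \sin x \cos 2y + \tfrac{1}{2}\sin(3x + y)$ and the grid size are arbitrary choices of mine for the example.

```python
import math

def second_derivatives(x, y):
    """Exact second derivatives of u(x, y) = sin(x) cos(2y) + 0.5 sin(3x + y)."""
    a = math.sin(x) * math.cos(2 * y)
    b = math.sin(3 * x + y)
    u_xx = -a - 4.5 * b
    u_yy = -4.0 * a - 0.5 * b
    u_xy = -2.0 * math.cos(x) * math.sin(2 * y) - 1.5 * b
    return u_xx, u_xy, u_yy

def torus_integral(g, n=64):
    """Rectangle rule on [0, 2*pi]^2; exact for low-degree trigonometric polynomials."""
    h = 2.0 * math.pi / n
    return h * h * sum(g(i * h, j * h) for i in range(n) for j in range(n))

def laplacian_squared(x, y):
    u_xx, _, u_yy = second_derivatives(x, y)
    return (u_xx + u_yy) ** 2

def hessian_squared(x, y):
    u_xx, u_xy, u_yy = second_derivatives(x, y)
    # The mixed derivative appears twice in the sum over (i, j).
    return u_xx ** 2 + 2.0 * u_xy ** 2 + u_yy ** 2

lhs = torus_integral(laplacian_squared)
rhs = torus_integral(hessian_squared)
print(lhs, rhs)
```

Pointwise the two integrands are different functions; it is only after integrating that they agree, which is the whole content of the integration by parts.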
3. Transport equations and Entropy
Elliptic regularity may appear to be a wonderful thing. Our instinct is to think of regularity as being “nice,” so it may at first seem very fortunate that solutions to PDE are naturally very regular. However, it turns out that this is actually somewhat limiting if we want our solutions to describe interesting physical phenomena, which often result from a lack of regularity.
For example, did you ever wonder why matter can take several different states: solid, liquid, gas? Statistical mechanics answers this question in a rather remarkable way. It assigns to a physical system a single function, called the partition function, which encodes all of its fundamental properties like energy, entropy, heat, etc. Interesting phenomena like phase transitions correspond to discontinuities in the partition function (or its derivatives).
Just as a side remark, it is an amazing testament to the power of concentration of measure that even though describing the motion of as few as three particles directly with Newton’s laws is extremely difficult, when you scale up to a physical system with billions of particles then you suddenly find such an elegant mathematical theory.
Similarly, we might try to write down PDE that model “singular” events, like a traffic jam or sudden eruption. Then we don’t want the solutions to be regular – we expect to find interesting information in the lack of regularity.
One prominent example is Burgers’ equation, which comes from fluid dynamics and also appears in modeling traffic flow:
$$\partial_t u + u\,\partial_x u = 0.$$
This has the interesting property that it does not, in general, admit a classical solution for all time; one can prove that for typical smooth initial data the classical solution breaks down in finite time, because its derivative blows up. Physically, this models the fact that transport equations can develop “shocks,” which are fundamentally singular events. So then we must look in the realm of weak solutions for functions that exhibit shockwave behavior, but it turns out that there are infinitely many different weak solutions (even for the same initial data). So in order to achieve well-posedness, we must work in some class of solutions interpolating between the classical and the weak. How can we find this?
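The finite-time breakdown can be seen from the method of characteristics, a standard argument that I am adding here for illustration. For the equation $\partial_t u + u\,\partial_x u = 0$, a classical solution is constant along the straight lines $x = x_0 + t\,u_0(x_0)$, where $u_0$ is the initial data. These lines first cross at time $T = -1/\min u_0'$ (when $u_0$ is decreasing somewhere), and no classical solution can survive past that. The sketch below uses the hypothetical initial data $u_0 = \sin$, for which $T = 1$.

```python
import math

def breaking_time(u0_prime, xs):
    """First time at which characteristics x0 + t*u0(x0) cross, estimated on the grid xs."""
    steepest = min(u0_prime(x) for x in xs)
    return math.inf if steepest >= 0 else -1.0 / steepest

def characteristics_cross(u0, xs, t):
    """True if the map x0 -> x0 + t*u0(x0) fails to be increasing on the grid xs."""
    positions = [x + t * u0(x) for x in xs]
    return any(right <= left for left, right in zip(positions, positions[1:]))

# Illustrative initial data u0(x) = sin(x) on one period.
grid = [2.0 * math.pi * k / 2000 for k in range(2001)]
T = breaking_time(math.cos, grid)
print(T)
print(characteristics_cross(math.sin, grid, 0.9))
print(characteristics_cross(math.sin, grid, 1.1))
```

Before time $T$ each point is reached by exactly one characteristic; afterwards some points are reached by several, carrying conflicting values of $u$, which is the shock.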
This question is answered by a beautiful perturbative argument. The idea is to introduce a second order term, which is sometimes called a “viscosity term” because it corresponds physically to adding some viscosity to the system. Let us instead consider the family of PDE:
$$\partial_t u^\epsilon + u^\epsilon\,\partial_x u^\epsilon = \epsilon\,\partial_x^2 u^\epsilon.$$
You might at first think that this is a family of elliptic PDE, but it is technically parabolic for $\epsilon > 0$ because there is a time variable as well. Nonetheless, one can use arguments like what I explained for elliptic regularity to argue that certain types of parabolic PDE also have nice regularity properties. (Apologies that this makes the whole previous section seem somewhat misleading, but what goes around comes around.) In particular, that means that there will exist a unique classical solution $u^\epsilon$ to the equations in this family, for $\epsilon > 0$. As $\epsilon \to 0$, the equation “converges” to the one we want to solve, so one might hope that these solutions also converge to a solution to Burgers’ equation. If that is the case, then it makes sense that this limiting solution is what we should consider the unique “physical” solution.
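There is one explicit case where the vanishing-viscosity limit can be watched directly. The stationary profile $u^\epsilon(x) = -\tanh(x / 2\epsilon)$ is an exact smooth solution of the viscous equation $\partial_t u + u\,\partial_x u = \epsilon\,\partial_x^2 u$, and as $\epsilon \to 0$ it converges to the stationary shock that jumps from $1$ to $-1$ at the origin. This example is my own addition rather than something from the lecture notes. The sketch checks the equation with finite differences and measures the $L^1$ distance to the shock on $[-1, 1]$, which should shrink roughly like $4\epsilon \ln 2$.

```python
import math

def viscous_shock(x, eps):
    """Stationary viscous profile u(x) = -tanh(x / (2*eps))."""
    return -math.tanh(x / (2.0 * eps))

def residual(x, eps, h=1e-4):
    """Finite-difference value of u*u_x - eps*u_xx (u_t = 0 for a stationary profile)."""
    u = lambda s: viscous_shock(s, eps)
    u_x = (u(x + h) - u(x - h)) / (2 * h)
    u_xx = (u(x + h) - 2 * u(x) + u(x - h)) / h**2
    return u(x) * u_x - eps * u_xx

def l1_gap(eps, n=20000):
    """Midpoint-rule estimate of the L^1 distance on [-1, 1] to the shock -sign(x)."""
    h = 2.0 / n
    total = 0.0
    for k in range(n):
        x = -1.0 + (k + 0.5) * h
        shock = -1.0 if x > 0 else 1.0
        total += abs(viscous_shock(x, eps) - shock) * h
    return total

print(max(abs(residual(x, 0.2)) for x in (-0.5, 0.1, 0.3)))
for eps in (0.1, 0.01):
    print(eps, l1_gap(eps), 4 * eps * math.log(2))
```

Of course this only illustrates the convergence in one hand-picked case; the difficulty described in the text is proving it for general initial data.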
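In practice, it tends to be extremely difficult to show that the solutions converge. In fact it’s basically impossible to do this in any level of generality, but for specific equations like Burgers’ equation one can make an argument. What is more tractable is to show that if the solutions converge, then the limit has nice properties. For instance, it is almost immediate that the limit will be a weak solution: if $\varphi$ is a test function (supported in $t > 0$), then using integration by parts we have
$$\iint \left( u^\epsilon\,\partial_t \varphi + \tfrac{1}{2} (u^\epsilon)^2\,\partial_x \varphi \right) dx\,dt = -\epsilon \iint u^\epsilon\,\partial_x^2 \varphi\,dx\,dt.$$
One can then take the limit as $\epsilon \to 0$ to conclude that
$$\iint \left( u\,\partial_t \varphi + \tfrac{1}{2} u^2\,\partial_x \varphi \right) dx\,dt = 0,$$
which is the defining condition of a weak solution.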
It takes quite a bit of work to distill this idea into a solution concept that is well-posed for the transport equation, but the answer turns out to be the notion of entropy solution. Roughly speaking, this says that entropy should increase (or at least not decrease) in time; in the mathematical sign convention, where entropy functions $\eta$ are taken to be convex, this becomes the requirement that $\eta(u)$ is dissipated rather than created. A classical solution preserves the entropy perfectly, whereas a general weak solution can increase it or decrease it. The precise formulation is somewhat technical: it says that for all pairs of smooth functions $(\eta, q)$ such that $\eta$ is convex and $q'(u) = u\,\eta'(u)$, we have for all positive test functions $\varphi$ that
$$\iint \left( \eta(u)\,\partial_t \varphi + q(u)\,\partial_x \varphi \right) dx\,dt \geq 0.$$
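To see how an entropy condition singles out one weak solution, consider a single shock for Burgers’ equation, with constant states $u_L$ on the left and $u_R$ on the right. Such a shock is a weak solution exactly when it moves with the Rankine-Hugoniot speed $s = (u_L + u_R)/2$. With the convex entropy $\eta(u) = u^2$ and flux $q(u) = \tfrac{2}{3}u^3$ (one particular pair satisfying $q' = u\,\eta'$, chosen by me for the example), the distribution $\partial_t \eta(u) + \partial_x q(u)$ is concentrated on the shock with density $-s\,[\eta] + [q]$, where $[\cdot]$ denotes the jump from left to right. In the convention where convex entropies must not be created, the shock is admissible when this quantity is $\leq 0$. A short calculation gives $(u_R - u_L)^3 / 6$, so only shocks that jump downward are allowed.

```python
def eta(u):
    """Convex entropy eta(u) = u^2 (an illustrative choice)."""
    return u * u

def q(u):
    """Entropy flux with q'(u) = u * eta'(u) = 2u^2."""
    return 2.0 * u ** 3 / 3.0

def entropy_production(u_left, u_right):
    """Density of d_t eta(u) + d_x q(u) on a shock moving at the Rankine-Hugoniot speed."""
    speed = 0.5 * (u_left + u_right)
    return -speed * (eta(u_right) - eta(u_left)) + (q(u_right) - q(u_left))

def is_admissible_shock(u_left, u_right, tol=1e-12):
    """Entropy condition in the convention d_t eta(u) + d_x q(u) <= 0."""
    return entropy_production(u_left, u_right) <= tol

print(entropy_production(1.0, 0.0), is_admissible_shock(1.0, 0.0))
print(entropy_production(0.0, 1.0), is_admissible_shock(0.0, 1.0))
```

Both jumps define weak solutions, but only the first passes the entropy test; for the second, the entropy solution with the same initial data is a rarefaction fan rather than a shock.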
It is a celebrated theorem of Kruzhkov that entropy solutions exist and are unique. The uniqueness is fairly involved but not so difficult; a proof is outlined in my notes. The existence is somewhat deeper still, requiring semigroup methods. And then after this theoretical preparation, it is quite a bit more work to explain how to actually compute entropy solutions. This material is discussed in a book of Denis Serre, but I should warn that he seems to have inherited some of the terseness of his famous uncle.
4. Hairer and the KPZ equation
Hairer introduced a regularization procedure for the KPZ equation that sounds similar in spirit (to the basic extent that I understand it) to the “viscosity approximation” that I described for transport equations. In his case, the process can be viewed as assuming that the random noise in the equation occurs at a small but not infinitesimal scale, which leads to “smoothed out” equations that make sense. The monumental technical task that Hairer carried out was in showing that the solutions to these regularized equations converge properly, which I imagine was quite difficult, since it was already difficult for the simple example of the Burgers equation.
That is, the original equation was
$$\partial_t h = \partial_x^2 h + (\partial_x h)^2 + \xi,$$
and Hairer considers the family of equations
$$\partial_t h_\epsilon = \partial_x^2 h_\epsilon + (\partial_x h_\epsilon)^2 - C_\epsilon + \xi_\epsilon,$$
where $\xi_\epsilon$ is smooth and converges in an appropriate sense to $\xi$, and $C_\epsilon$ is a well-chosen renormalization constant. By “smoothing out the noise” in this way, he obtains a family of PDE admitting sensible solutions. Since the equations in this family “converge” to the KPZ equation, one hopes the same for the solutions. This is really the technically difficult step, but after a lot of work he is able to show that the family of solutions to these perturbed SPDE does indeed converge, to a function that is then interpreted as the solution to the original KPZ equation, and has the properties predicted by physicists.
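Here is a toy computation, emphatically not Hairer’s construction, that suggests why a divergent constant has to be subtracted. If the solution looks like Brownian motion $B$ in the space variable, then the square of its slope at scale $\epsilon$, namely $((B(x+\epsilon) - B(x))/\epsilon)^2$, has expectation exactly $1/\epsilon$, since the increment is Gaussian with variance $\epsilon$. So the squared derivative blows up as the smoothing scale shrinks, and one can only hope for a limit after removing a constant of that order. The Monte Carlo sketch below uses sample sizes and a seed that are arbitrary choices of mine.

```python
import random

def mean_square_slope(eps, samples=20000, seed=0):
    """Monte Carlo estimate of E[((B(x+eps) - B(x)) / eps)^2] for Brownian motion B."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(samples):
        increment = rng.gauss(0.0, eps ** 0.5)  # B(x+eps) - B(x) ~ N(0, eps)
        total += (increment / eps) ** 2
    return total / samples

for eps in (0.1, 0.01, 0.001):
    print(eps, mean_square_slope(eps), 1.0 / eps)
```

The estimates should track $1/\epsilon$ up to sampling error. The actual renormalization in the KPZ setting involves space-time white noise and the specific way the noise is smoothed, so this is only a heuristic for the order of magnitude of the divergence.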
I’ve seen mentioned a number of key insights of Hairer, but I don’t understand the technicalities well enough to really say what they mean. For instance, proving convergence involves a Taylor-like expansion of the solutions. One of the difficulties with proving convergence to a limit seems to be that if one picks a fixed “universal” basis, such as the basis of polynomials that is used for Taylor expansion, then the roughness of the solutions means that the approximation is not good enough. Instead, Hairer defined a custom basis for the equation that was designed to approximate the solutions well. I think that the objects he defines are really more subtle than a basis in the usual sense, but they have a similar kind of role. In any case, it’s nice to see that his ideas at least sound connected to things that I can actually appreciate.