We may -- paraphrasing the famous sentence of George Orwell -- say that 'all mathematics is beautiful, yet some is more beautiful than the other.' But the most beautiful in all mathematics is the Zeta function. There is no doubt about it.
-- Krzysztof Maslanka

God may not play dice with the universe, but there's something strange going on with the prime numbers!
-- Paul Erdős
David Hilbert, in his famous speech at the International Congress of Mathematicians at Paris in 1900, included this problem as number 8 in his list of 23 challenging problems for mathematicians in the coming century. Today, over 100 years later, it is one of the few on that list that have not been solved. Many contemporary mathematicians consider it the most important unsolved problem in mathematics. In any case, it has been included among the seven most important problems, for the solution of any of which a prize of $1 million has been offered by the Clay Mathematics Institute.
The definition of the function ζ(s) is not especially difficult. The statement of the Riemann hypothesis is certainly clear enough (if you have any acquaintance with complex numbers). So why on Earth is this hypothesis considered so important? One reason is that it has turned out, in spite of its apparent simplicity, to be so difficult to prove. But a more important reason is that the problem is thoroughly entangled with many questions about prime numbers -- which of course are of intense interest to number theorists, yet at first glance have little to do with functions of a complex variable.
We shall soon see that there is a very close and simple relationship between ζ(s) and prime numbers. But hidden beneath this simple relationship seems to be one much deeper and more profound -- one that connects properties of the zeros of ζ(s) to the way in which prime numbers are distributed among the integers. And not only is there reason to suspect such a relationship, but much more general relationships are suspected to exist between various generalizations of ζ(s) and more general types of algebraic objects.
The function ζ(s) is defined by the infinite sum
ζ(s) = ∑_{1≤n<∞} 1/n^{s}

where the sum is taken over integers n such that 1 ≤ n < ∞. (We have to use this somewhat clumsy notation for sums since HTML doesn't let one easily write the more usual notation.)
A great deal of care is required when working with "infinite" sums, because only in certain circumstances do they actually make sense. In this case, it can be shown that the sum actually "converges" only when the real part of s is greater than 1 (i. e. Re(s) > 1). (Recall that any complex number can be written uniquely in the form s = σ + iτ, where σ and τ are both real and i^{2} = -1. We write Re(s) = σ and Im(s) = τ to denote the real and imaginary parts, respectively.)
To say that the infinite sum "converges" for Re(s) > 1 is to say that the complex numbers represented by finite partial sums of the form
S_{N}(s) = ∑_{1≤n<N} 1/n^{s}

have a well-defined limiting value, for each s with Re(s) > 1, as N becomes arbitrarily large.
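Convergence of the partial sums is easy to watch numerically. Here is an illustrative Python sketch (the function name is my own, not anything from the text); at s = 2 the partial sums settle toward Euler's classical value ζ(2) = π²/6:

```python
import math

def zeta_partial_sum(s, N):
    """S_N(s) = sum of 1/n^s for 1 <= n < N; s may be complex."""
    return sum(1 / n**s for n in range(1, N))

# At s = 2 the partial sums approach π²/6 = 1.6449...
print(zeta_partial_sum(2, 100000), math.pi**2 / 6)

# At a complex point with Re(s) > 1 the sums also converge.
print(zeta_partial_sum(2 + 3j, 100000))
```

Doubling N barely changes the answer once Re(s) > 1, which is exactly what "convergence" means in practice.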
Without going into further details, let it suffice to say that there are standard techniques for rigorously talking about infinite sums and proving that they make good sense (i. e. "converge") under appropriate conditions. These rigorous techniques were fully established only in the second half of the 19th century. However, mathematicians had worked with infinite sums long before that -- rearranging terms and performing other kinds of formal operations (e. g. differentiation and integration) -- in order to derive a wide variety of formal identities. Often these formal manipulations could be rigorously justified, at least under suitable conditions. Sometimes, unfortunately, they could not, in which case nonsense results such as 1 = 0 could be derived, just as can happen when improperly allowing division by 0. Indeed, the basic operations of calculus itself (differentiation and integration) themselves involve operations with infinite sums and sequences. It was a most welcome accomplishment of 19th century mathematics to put all such "infinite" operations on a rigorous basis.
With standard rigorous techniques for working with infinite sums (which are also often called "infinite series") in hand, it was clear that the infinite sum used above to define ζ(s) definitely did not converge when Re(s) < 1. Just for example, if s = -1, it would amount to trying to assign a reasonable value to the sum 1 + 2 + 3 + ... . Now, there are roundabout ways for assigning plausible values to such a sum. For instance, the remarkable Indian mathematician Srinivasa Ramanujan (1887-1920) thought that 1 + 2 + 3 + ... should equal -1/12. And soon we shall see a sense in which ζ(-2k), which is formally 1 + 2^{2k} + 3^{2k} + ..., has a simple numerical value for integers k ≥ 1. But all of this is beyond the standard rigorous concept of convergence.
Instead of representing functions as infinite series, mathematicians developed an entirely different way of defining functions of a complex variable. It is called the method of "analytic continuation". It turns out that complex-valued functions of complex variables have remarkable properties. For one thing, if such a function has even a first derivative (at a given point), it must actually have derivatives of all orders (at that point). Functions with that property are called "analytic" or "holomorphic". Although this property is specific to a given point, for all "reasonable" functions it occurs for all but (at most) an isolated set of points in the complex plane. Another surprising property of analytic functions is that if they are constant on any circular disk in the complex plane (that is, a set of the form {z : |z - z_{0}| < r}), then they must be constant everywhere in the complex plane that they are analytic. (The notation |w|, for a complex number w, is called the "norm" or "absolute value" of w and is defined as √(Re(w)^{2} + Im(w)^{2}).) Consequently, any two analytic functions that are equal on some disk -- so that their difference is an analytic function equal to the constant 0 on the disk -- must be equal everywhere. Using this fact, the process of analytic continuation consists of extending analytic functions defined on some disk in the complex plane to the whole plane (perhaps with a few isolated singular points) by defining the function on a sequence of overlapping disks.
Now, the infinite series used to define ζ(s) for Re(s) > 1 defines the function not merely on a disk but on an infinite half of the complex plane. It can be shown that analytic continuation permits ζ(s) to be defined for any complex s except for isolated singularities. (s = 1 is one of these.) The zeta function is therefore well-defined and a generally quite well-behaved function for all but isolated singularities. So it can be meaningfully discussed even when its infinite series doesn't make sense. We shall soon see that there is a "functional equation" which specifies the value of ζ(1-s) in terms of ζ(s) -- in particular, for Re(s) < 0.
Although the function is usually called the "Riemann" zeta function, Riemann, as we noted, was not the first to work with it. Leonhard Euler (1707-83) studied the series 100 years earlier, discovered a number of its properties, and in fact found another extremely important representation of it in terms of an infinite product instead of an infinite series. P. G. L. Dirichlet (1805-59) (whose lectures Riemann attended in Berlin, and who was perhaps Riemann's favorite teacher) studied more general series of the form ∑_{n} a_{n}/n^{s}, for arbitrary complex coefficients a_{n}. Accordingly, series of this form are called Dirichlet series.
Riemann's contribution, however, was to realize that ζ(s) could be analytically continued to the whole complex plane, to derive many important properties of ζ(s), such as its functional equation, and -- most importantly -- to suspect its deeper relationship to the distribution of prime numbers. And along the way, to state his celebrated hypothesis. So there's plenty of justification for calling ζ(s) the Riemann zeta function.
Euler's product formula is

∑_{1≤n<∞} 1/n^{s} = ∏_{p} (1 - 1/p^{s})^{-1}

This formula is valid only for s with Re(s) > 1, so that both sides "converge" to finite values.
If you've read this far, you've probably had enough mathematics to be comfortable with the ∑-notation, but may not be familiar with the product notation on the right side of this equation. It should be intuitively clear, however, that it may be understood in an analogous way as a product of terms of the indicated form. In this case, the product is taken over all positive primes, and only primes. (Note that 1 is not considered a prime, so the product makes sense.) For an infinite product, there are major issues of convergence, just as for infinite sums. Euler worked mostly formally and didn't bother much about such issues. Nevertheless, the issues can be dealt with and a rigorous treatment of infinite products can be given, just as for infinite sums.
To "prove" Euler's equation, in a loose formal sense, we need only recall the so-called "geometric series":
1/(1-x) = ∑_{0≤n<∞} x^{n} = 1 + x + x^{2} + ...

The "proof" of this latter identity is obtained by multiplying both sides by 1 - x and rearranging terms, so that
1 = (1 - x)(1 + x + x^{2} + ...) = (1 + x + x^{2} + ...) - (x + x^{2} + x^{3} + ...)

On the right hand side of this, all terms cancel out except for the 1. These formal operations can be rigorously justified if x is a real (or complex) number of absolute value less than 1 (i. e. |x| < 1). If p is any prime, 1/p < 1, and in fact (less obviously) |1/p^{s}| < 1 if Re(s) > 1. So putting x = 1/p^{s} in the geometric series gives
(1 - 1/p^{s})^{-1} = ∑_{0≤k<∞} (1/p^{s})^{k} = ∑_{0≤k<∞} p^{-ks}

Finally, taking the product of all these terms gives
∏_{p} (1 - 1/p^{s})^{-1} = ∏_{p} ∑_{0≤k<∞} p^{-ks} = (1 + 2^{-s} + 2^{-2s} + ...) × (1 + 3^{-s} + 3^{-2s} + ...) × ...

Now, when you multiply everything out on the right side of that, you find that a term of the form 1/n^{s} occurs exactly once for each n such that 1 ≤ n < ∞, because each such integer is uniquely expressible as a product of powers of primes. (You might have to stare at that and scratch your head for a while to see it.) This proves Euler's product formula.
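The product formula can be checked numerically. A small Python sketch (helper names are mine; primes are found by simple trial division) compares a truncated Euler product at s = 2 with the known value ζ(2) = π²/6:

```python
import math

def primes_up_to(limit):
    """All primes <= limit, by trial division (fine for small limits)."""
    return [n for n in range(2, limit + 1)
            if all(n % d for d in range(2, int(n**0.5) + 1))]

def euler_product(s, limit):
    """Truncation of ∏_p (1 - 1/p^s)^{-1} over primes p <= limit."""
    prod = 1.0
    for p in primes_up_to(limit):
        prod *= 1.0 / (1.0 - p**(-s))
    return prod

# The truncated product over primes up to 1000 already agrees with
# ζ(2) = π²/6 to about four significant digits.
print(euler_product(2, 1000), math.pi**2 / 6)
```

Each additional prime multiplies in a factor very close to 1, which is why the truncated product converges so readily for Re(s) > 1.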
The existence of this product formula of Euler to represent the zeta function is the simplest, most obvious way in which the zeta function is connected with prime numbers. It is certainly not the only way, nor -- by a long shot -- the most profound. To begin to appreciate deeper connections, we have to look into what is known about the distribution of prime numbers.
Interestingly enough, Euler's product formula for ζ(s) gives an independent proof that there are infinitely many primes. It is known that the infinite sum ∑_{n} 1/n diverges, i. e. is not a finite number. By a standard result of calculus, one has the "Taylor series" expansion:
-log(1-x) = ∑_{1≤n<∞} x^{n} / n

(Here, "log" always means the natural logarithm, with base e = 2.718281828... .) But this expansion is valid only for -1 ≤ x < 1, since the left hand side is not finite at x = 1. In fact, the harmonic sum with N terms, ∑_{1≤n≤N} 1/n, is approximately log N, which grows arbitrarily large as N does, albeit very slowly. More precisely, there is a constant γ such that
γ = lim_{N→∞} [(∑_{1≤n≤N} 1/n) - log N]

This too was proven by Euler, and γ is known as "Euler's constant". It has been computed that γ = 0.5772157... . (Incidentally, it's an open question whether γ is a rational number, though it very likely isn't.) Consequently, the sum ∑_{n} 1/n^{s} must become arbitrarily large as real s > 1 approaches 1. But by Euler's formula for ζ(s), that sum must approach the product ∏_{p} (1 - 1/p)^{-1}, which is finite if there are only finitely many primes. The contradiction implies there must be infinitely many primes.
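Euler's constant is easy to estimate straight from its defining limit. An illustrative Python computation (the convergence is slow: the error behaves like 1/(2N)):

```python
import math

def gamma_estimate(N):
    """(sum of 1/n for n <= N) - log N, which tends to Euler's constant."""
    harmonic = sum(1.0 / n for n in range(1, N + 1))
    return harmonic - math.log(N)

for N in (10, 1000, 100000):
    print(N, gamma_estimate(N))
# The estimates drift down toward 0.5772157...
```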
So already the function ζ(s) tells us something about the distribution of primes. Now, given that there are infinitely many primes, we want to ask ourselves how are they distributed. Simply looking at a table of primes suggests that the primes come farther and farther apart the larger they are. We would like to have a better quantification of this observation.
Intuitively, the reason that primes become sparser as they grow larger is that in some sense it becomes more "difficult" for a large number to be prime the larger it is. This is because there are an ever increasing number of smaller numbers (primes among them) which could divide it. The ancient Greeks knew how to construct tables of primes using what is known as the "sieve of Eratosthenes", after its inventor, who lived in the 3rd century BCE. The method of the sieve is to list all numbers up to whatever limit one wishes to deal with. Then, starting with the prime 2, cross out every second number thereafter, every 3rd number after 3, every 5th number after 5, and so on, for each prime up to the square root of the largest number in the original list. The numbers which are left are guaranteed to be prime.
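The sieve just described translates almost line for line into code. A minimal Python version (crossing out multiples of each prime p, starting at p², up to the square root of the limit):

```python
def sieve(limit):
    """Sieve of Eratosthenes: return the list of primes <= limit."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False        # 0 and 1 are not prime
    p = 2
    while p * p <= limit:                    # only need p up to √limit
        if is_prime[p]:
            for multiple in range(p * p, limit + 1, p):
                is_prime[multiple] = False   # cross out every p-th number
        p += 1
    return [n for n in range(2, limit + 1) if is_prime[n]]

print(sieve(30))   # → [2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
```

Starting the crossing-out at p² is safe because any smaller multiple of p has a prime factor less than p and was already crossed out.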
One can ask what is the probability that any given number in the list will be crossed out at some point in the process. Of course, for any particular number, it is either prime or it isn't, so the probability is either 0 or 1. But suppose we don't know in advance which numbers are which and we simply pick one at random. What can we say about the probability it is prime, knowing only its size?
Suppose the number is X. The probability it is divisible by 2 is 1/2. The probability it is divisible by 3 is 1/3, and so the probability that it is not divisible by 3 is 2/3. In general, for any prime p ≤ X, the probability X is divisible by p is 1/p, and the probability it is not divisible by p is 1 - 1/p. Therefore, the probability that X is prime is the probability that it is not divisible by any prime p ≤ X, which is the product of all the separate probabilities, which we define as G(X) = ∏_{p≤X} (1 - 1/p) -- provided the probabilities for divisibility by each prime are independent of each other. A priori we don't know about independence. There's no good reason to think the probabilities are entirely independent. We will see that a slight adjustment is necessary to get the correct probability.
We are interested in the total number of primes ≤ X, and so we give it a name: π(X). A very rough approximation to it is the count of numbers ≤ X times the probability that each is prime. We would like to find a simple formula f(X) such that the ratio of π(X) to f(X) tends to 1 for large X. Mathematicians call this notion "asymptotic equivalence" and even have a notation for it. We write π(X) ∼ f(X) provided π(X)/f(X) → 1 as X → ∞. Note that this is not to say that the difference π(X) - f(X) tends to 0 as X → ∞ -- that would be a much stronger statement. We are willing to settle for something a little weaker, namely that the ratio tends to 1 in the limit.
Does that product expression for the probability G(X) look familiar? It sure as heck does. It is a finite form of the product in Euler's formula for the zeta function (or rather, its reciprocal, to be precise). We also recall that there is a relationship between the finite sum ∑_{1≤n≤X} 1/n and log(X) which involves Euler's constant γ. Is there a similar relationship between the finite product and log(X)? Indeed there is. It was discovered by Franz Mertens in 1874, and so it's called Mertens' theorem:
1/G(X) = ∏_{p≤X} (1 - 1/p)^{-1} ∼ e^{γ} log(X)

(Mertens also made a conjecture, called the Mertens conjecture which, if true, would have implied the Riemann hypothesis. Unfortunately, it was proved false in 1984, as we shall see later.) When a careful analysis of probabilities is done, it turns out that e^{γ} is exactly the factor that must multiply G(X) to give the probability that X is prime. (e^{γ} is about 1.782, so the estimate wasn't wildly off.) In other words, the suspicion is that
π(X) ∼ e^{γ}G(X)X

and so by Mertens' theorem
π(X) ∼ X / log(X)
This formula can be interpreted to say that the probability that a given number less than X is prime is just 1/log(X). Alternatively, it says that the average spacing between primes less than X is about log(X).
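All three quantities can be computed side by side. An illustrative Python sketch (the sieve helper and all names are mine, not from the text):

```python
import math

GAMMA = 0.5772156649            # Euler's constant, to the precision quoted above

def sieve(limit):
    """Primes <= limit via the sieve of Eratosthenes."""
    is_prime = [True] * (limit + 1)
    is_prime[0] = is_prime[1] = False
    for p in range(2, int(limit**0.5) + 1):
        if is_prime[p]:
            for m in range(p * p, limit + 1, p):
                is_prime[m] = False
    return [n for n in range(2, limit + 1) if is_prime[n]]

X = 10**6
primes = sieve(X)
pi_X = len(primes)              # π(X); equals 78498 for X = 10^6
G = 1.0
for p in primes:
    G *= 1.0 - 1.0 / p          # G(X) = ∏_{p≤X} (1 - 1/p)

print(pi_X, math.exp(GAMMA) * G * X, X / math.log(X))
```

By Mertens' theorem the last two numbers nearly agree; at this range both still undershoot π(X) by several percent, a reminder that the asymptotic equivalence sets in only slowly.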
The argument here is by no means a rigorous proof. The result is in fact true, but it is a very difficult theorem, now called the prime number theorem. C. F. Gauss (1777-1855) suspected (perhaps as early as 1792 by his own account) that the probability any integer X is prime is about 1/log(X), and hence that π(X) ∼ X / log(X). But a rigorous proof did not come until 1896, when Jacques Hadamard (1865-1963) and C. J. de la Vallée-Poussin (1866-1962) independently came up with proofs. Although various proofs of the prime number theorem have been given since then, none are especially easy.
Interestingly enough, Riemann himself seems not to have worked much on this problem, even though his famous conjecture turns out to be equivalent to a statement about just how precise the approximation of the prime number theorem actually is. To understand somewhat of the nature of this deep connection, we have to talk more about the zeta function.
To begin with, Riemann showed that ζ(s), as defined by either the Dirichlet series or the Euler product only in case Re(s) > 1, actually had an "analytic continuation" to the entire complex plane except for a single singularity at s=1. (That singularity was to be expected: at s = 1 the Dirichlet series becomes the harmonic series ∑_{n} 1/n, which is the Taylor series of -log(1-x) evaluated at x = 1, and hence diverges.) Riemann did not use an abstract argument based on extending the function by means of defining it piecemeal on overlapping disks. (This procedure originated in the work of Karl Weierstrass (1815-97), which came a little later.) Instead, Riemann came up with an explicit formula for ζ(s) which actually made sense, and defined ζ(s), for all s ≠ 1. That formula can be written as:
ζ(s) = -Γ(1-s)(2πi)^{-1} ∫_{C} (-x)^{s-1}/(e^{x}-1) dx

This is what's known as a "contour integral", which is an integral that is evaluated on a path in the complex plane chosen to avoid the singularity of the integrand. In this formula, Γ(s) is Euler's gamma function, which is defined by
Γ(s) = ∫_{0≤x<∞} e^{-x}x^{s-1} dx

Although Γ(s) has a definition in the form of a definite integral, it's more intuitively thought of as a generalization of the factorial function, since Γ(n+1) = n! for integers n ≥ 0, and in fact Γ(s+1) = sΓ(s) for all s.
Even more importantly, Riemann proved that ζ(s) satisfies a "functional equation" which relates its values at s to its values at 1 - s. The functional equation is this:
ζ(1-s) = (π^{1/2-s} Γ(s/2) / Γ((1-s)/2)) ζ(s)

Since Γ(s) appears in Riemann's integral formula for ζ(s), it's not surprising that it also appears in the functional equation. The equation isn't too hard to prove, given known properties of Γ(s) and standard facts of complex function theory. In fact, Riemann provided two proofs of the equation in his paper. There are many other proofs. E. C. Titchmarsh in his Theory of the Riemann Zeta Function gave seven proofs.
Various important facts about ζ(s) follow immediately from the functional equation. Clearly, Γ(s) plays a significant role. One fact about Γ(s) is that it runs off to infinity when s is any integer ≤ 0. (Such a singularity is said to be a "pole" of the function.) Suppose first that s = 1. ζ(1) and Γ(0) both have poles, but because one is in the numerator and the other is in the denominator on the right hand side of the functional equation, they cancel each other out. Γ(1/2) = √π, so as a result ζ(0) is finite and nonzero.
Now suppose s is an odd positive integer ≥ 3, say s = 2k+1 where k is an integer ≥ 1. Then Γ(s/2) and ζ(s) are both finite, but Γ((1-s)/2) = Γ(-k) in the denominator has a pole, so the functional equation shows that ζ(1-s) = ζ(-2k) is zero. The points -2k are the only real zeros of ζ(s), as well as the only zeros with Re(s) ≤ 0. They are called the "trivial zeros" of the zeta function. (The Euler product representation of ζ(s) for Re(s) > 1 ensures there are no zeros in that part of the complex plane.)
Finally suppose s = 2k is an even positive integer. Then both Γ(s/2) and Γ((1-s)/2) are finite and nonzero, as is ζ(s). So ζ(1-2k) is finite and nonzero for integers k ≥ 1. Interestingly enough, those values are rational numbers, which we can write like so:
ζ(1-2k) = -B_{2k} / 2k

for k ≥ 1. We write the values in that form, because the B_{n} are rather special rational numbers, called Bernoulli numbers, after Jacques Bernoulli (1654-1705), who introduced them. Bernoulli numbers are defined as coefficients in a certain power series (the "generating function"):
xe^{x} / (e^{x} - 1) = ∑_{0≤n<∞} B_{n} x^{n} / n!
(The expression x / (e^{x} - 1) can also be used as a generating function to define Bernoulli numbers, and the only difference is in the sign of B_{1}. This fact isn't immediately obvious, and it implies very interesting recurrence relationships among the Bernoulli numbers.)
The functional equation allows us to give an expression for ζ(2k) as well, although Euler had already discovered the formula. It is:
ζ(2k) = (-1)^{k+1} B_{2k} (2π)^{2k} / (2(2k)!)

Unfortunately, the functional equation makes the values ζ(2k+1) totally obscure. It is only relatively recently (1978) that ζ(3) was proven to be irrational (by R. Apéry) -- in spite of Euler's diligent efforts to compute the ζ(2k+1) numerically for many values of k. We make a big deal about these expressions involving the Bernoulli numbers, since these numbers have connections with diverse other areas of mathematics, including a big role in Fermat's Last Theorem. We'll say a bit more about them later.
However, we haven't yet come to one more important result Riemann stated in his 1859 paper, though he gave no proof of it. (Riemann was the sort of mathematician who "knew" more than he could rigorously prove, as his famous hypothesis shows.) This result is the most important as far as the distribution of prime numbers is concerned. It is an entirely different product formula for ζ(s), which is:
ζ(s) = -e^{bs} [s(1-s)Γ(s/2)]^{-1} ∏_{ρ∈Z} (1-s/ρ)e^{s/ρ}

In this formula b = log(2π) - 1 - γ/2, where γ is Euler's constant, and Z is the set of zeros of ζ(s) in the so-called "critical strip", {s∈C: 0≤Re(s)≤1}, the portion of the complex plane lying between the lines where Re(s)=0 and Re(s)=1. The complex numbers in Z are the "non-trivial" zeros of ζ(s). It follows from the functional equation of ζ(s) that the zeros are symmetrically distributed around the line Re(s) = 1/2. That is, if ρ = 1/2 + σ + iτ ∈ Z, then so is 1/2 - σ + iτ. The Riemann hypothesis is the conjecture that all members of Z are of the form ρ = 1/2 + iτ.
This formula is rather like the expression for a polynomial f(s), which can be written (up to a constant factor) as
f(s) = ∏_{ρ} (1-s/ρ)

where the (finite) product is over the zeros of f(s) (assuming no ρ = 0 -- if s=0 is also a zero of f(s), just throw in a factor of s^{k}, where k is the multiplicity of 0 as a root).
Given that there are in fact two product formulas for ζ(s), that one involves the prime numbers, and that the other involves the zeros of ζ(s), it's hard to avoid the suspicion that there is some relationship between the distribution of the primes and the distribution of the zeros. We'll be more specific about this shortly.
It turns out that many complex analytic functions have a product formula of the above sort, provided they are sufficiently well-behaved. Weierstrass first proved the existence of this kind of representation, and in 1893, just three years before he proved the prime number theorem, Hadamard (motivated by his work on the prime number theorem) proved a more general result, which as a corollary proved the product formula that Riemann conjectured for ζ(s). Such products are now called Hadamard products.
From this representation, together with the functional equation of ζ(s) and some properties of its Dirichlet series, Riemann claimed an asymptotic formula for the number of zeros whose imaginary parts were less than some given value T. More precisely, he claimed
#({ρ∈Z: 0 < Im(ρ) < T}) ∼ (T/2π)log(T/2π) - T/2π

(The notation #(...) means the number of elements in the set.) Zeros in Z are symmetric about the real axis (the line Im(s) = 0), so it isn't necessary to count zeros with negative imaginary part as well.
Riemann didn't give a proof of this formula, and it was not rigorously proven until 1905 (by H. von Mangoldt). Nevertheless, he went further and also claimed
#({ρ∈Z: 0 < Im(ρ) < T and Re(ρ) = 1/2}) ∼ (T/2π)log(T/2π) - T/2π

as well. This isn't quite as strong as saying all zeros are on the line Re(s) = 1/2, just that "most" of them are. Nevertheless, even this weaker result has never yet been proven. As for the strongest possible result, that all zeros lie on the line Re(s) = 1/2, Riemann simply says it is "very likely" true.
Recall that, based on probabilistic ideas as well as extensive calculations, Gauss conjectured that
π(x) ∼ x/log(x)
There's a similar function which gives a slightly better result, both theoretically and computationally. It is the "logarithmic integral":
Li(x) = ∫_{2≤t≤x} log(t)^{-1} dt

Li(x) is actually a pretty good estimate of π(x): when x = 10^{7}, it is off by only 339.
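How much better Li(x) does than x/log(x) is easy to see numerically. A Python sketch (quick-and-dirty midpoint-rule quadrature for the integral, plus a byte-array sieve to count primes; all names are mine) at the more modest value x = 10^{6}:

```python
import math

def Li(x, steps=200000):
    """∫_2^x dt/log(t), approximated by the composite midpoint rule."""
    a, b = 2.0, float(x)
    h = (b - a) / steps
    return h * sum(1.0 / math.log(a + (i + 0.5) * h) for i in range(steps))

def prime_count(limit):
    """π(limit) via a compact sieve of Eratosthenes."""
    is_prime = bytearray([1]) * (limit + 1)
    is_prime[0:2] = b"\x00\x00"
    for p in range(2, math.isqrt(limit) + 1):
        if is_prime[p]:
            is_prime[p * p::p] = bytes((limit - p * p) // p + 1)
    return sum(is_prime)

x = 10**6
print(prime_count(x), Li(x), x / math.log(x))
# π(10^6) = 78498; Li(10^6) ≈ 78627 beats x/log(x) ≈ 72382 by a wide margin.
```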
One of Gauss' near contemporaries, A. M. Legendre (1752-1833) reached similar conclusions. His book Theorie des nombres in about 1800 suggested that
π(x) ∼ x/(A log(x) + B)

for constants A and B. Somewhat later, P. L. Chebyshev actually proved, around 1850 (about 10 years before Riemann's paper), that if the limit implied by Legendre's asymptotic equivalence actually existed, A had to be 1.
Riemann's 1859 paper gave an "explicit" formula for π(x) that shows
π(x) = Li(x) + other stuff

Riemann's formula gave the "other stuff" explicitly. Unfortunately he could not estimate it well enough to yield, immediately, the prime number theorem. But let's take a closer look at it anyhow.
Starting with the Euler product for ζ(s), we've already found, for Re(s) > 1,
log ζ(s) = ∑_{p} ∑_{n} (1/n)p^{-ns}

Riemann defined a function F(x), for real x ≥ 0, that is constant except for jumps at integers which are powers of primes. F(0) = 0. At integers which are primes, it jumps by 1. At the squares of primes, it jumps by 1/2. At prime cubes it jumps by 1/3. And so on. That sounds like a rather peculiar function, but it turns out there is a simple formula for it. Riemann realized that
F(x) = ∑_{1≤n<∞} π(x^{1/n})/n

Here's how to see that. First note that for any x ≥ 0 the sum has only a finite number of terms, because π(x) = 0 for 0 ≤ x < 2. Hence π(x^{1/n}) = 0 for x^{1/n} < 2, or equivalently, for n > log_{2}(x). So the sum is actually the finite sum

F(x) = ∑_{1≤n≤log_{2}(x)} π(x^{1/n})/n

Now, π(x) is a monotonically increasing step function, with steps only where x is a prime. Hence F(x) is also a monotonically increasing step function, with steps where x is a positive integer power of a prime. If x = p^{k}, the step comes from the term n = k: as x crosses p^{k}, the quantity π(x^{1/k}) increases by 1, so F(x) jumps by 1/k, exactly as required.
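The identity can be verified by computing F(x) both ways: from the π(x^{1/n})/n sum, and directly from its definition as a jump of 1/k at each prime power p^{k}. An illustrative Python sketch (helper names are mine; integer n-th roots avoid floating-point pitfalls):

```python
import math

def primes_up_to(limit):
    return [n for n in range(2, limit + 1)
            if all(n % d for d in range(2, math.isqrt(n) + 1))]

def iroot(x, n):
    """Largest integer r with r**n <= x, i.e. the floor of the n-th root."""
    r = int(round(x ** (1.0 / n)))
    while r ** n > x:
        r -= 1
    while (r + 1) ** n <= x:
        r += 1
    return r

def F_via_pi(x):
    """F(x) = ∑_n π(x^{1/n})/n, summed while 2^n <= x."""
    total, n = 0.0, 1
    while 2 ** n <= x:
        total += len(primes_up_to(iroot(x, n))) / n
        n += 1
    return total

def F_direct(x):
    """F(x) built from its definition: a jump of 1/k at each p^k <= x."""
    total = 0.0
    for p in primes_up_to(x):
        k = 1
        while p ** k <= x:
            total += 1.0 / k
            k += 1
    return total

print(F_via_pi(100), F_direct(100))   # both give 28.5333... = 25 + 2 + 2/3 + 1/2 + 1/5 + 1/6
```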
Riemann then showed that, in terms of this peculiar function, the preceding formula for log ζ(s) becomes an integral:
log ζ(s) = s ∫_{0≤x<∞} F(x)x^{-s-1} dx

By a procedure known as "Fourier inversion" this can be used to express F(x) in terms of log ζ(s):
F(x) = (2πi)^{-1} ∫ log ζ(s) (x^{s}/s) ds

At this point, the other product formula for ζ(s) can be used to give a formula for log ζ(s) which can be substituted into the integral. As we've seen, this other formula explicitly involves the zeros of ζ(s). This is the precise point at which the distribution of those zeros becomes important. There is a great deal of hairy calculus involved in working everything out. But the final result is a formula for F(x), which is:
F(x) = Li(x) - ∑_{ρ} Li(x^{ρ}) - log 2 + ∫_{x≤t≤∞} (t(t^{2} - 1)log t)^{-1} dt

There were substantial holes in Riemann's "proof" of this formula. It was rigorously established by H. von Mangoldt only in 1895.
To finish the derivation of a formula for π(x) there is one more "trick", which involves the number-theoretic function μ(n) defined for integers by μ(n) = 0 if n is divisible by the square of a prime, μ(n) = 1 if n is a product of an even number of distinct primes, and μ(n) = -1 if n is a product of an odd number of distinct primes. μ(n) is called the Möbius function. (The same Möbius who was responsible for the Möbius strip.) Using that, we can "invert" the expression for F(x) in terms of π(x) to obtain
π(x) = ∑_{1≤n<∞} μ(n)F(x^{1/n})/n

This is also a finite sum, since for any x and sufficiently large n, F(x^{1/n}) = 0. Finally, substituting the preceding expression for F(x) into this we get
π(x) = Li(x) + ∑_{2≤n<∞} μ(n)Li(x^{1/n})/n + other stuff

which, as claimed, is an "explicit" formula for π(x).
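The Möbius inversion step can be seen in miniature: starting from F computed by counting prime powers, the sum ∑_{n} μ(n)F(x^{1/n})/n really does return π(x). A self-contained Python sketch (all helper names are mine):

```python
import math

def primes_up_to(limit):
    return [n for n in range(2, limit + 1)
            if all(n % d for d in range(2, math.isqrt(n) + 1))]

def iroot(x, n):
    """Floor of the integer n-th root of x."""
    r = int(round(x ** (1.0 / n)))
    while r ** n > x:
        r -= 1
    while (r + 1) ** n <= x:
        r += 1
    return r

def F(x):
    """Riemann's step function: a jump of 1/k at each prime power p^k <= x."""
    total = 0.0
    for p in primes_up_to(x):
        k = 1
        while p ** k <= x:
            total += 1.0 / k
            k += 1
    return total

def mobius(n):
    """The Möbius function μ(n), by trial factorization."""
    result, p = 1, 2
    while p * p <= n:
        if n % p == 0:
            n //= p
            if n % p == 0:
                return 0          # n is divisible by a square
            result = -result
        p += 1
    if n > 1:
        result = -result          # one remaining prime factor
    return result

def pi_from_F(x):
    """π(x) recovered as ∑_n μ(n) F(x^{1/n}) / n."""
    total, n = 0.0, 1
    while 2 ** n <= x:
        total += mobius(n) * F(iroot(x, n)) / n
        n += 1
    return total

print(round(pi_from_F(100)))   # → 25, which is indeed π(100)
```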
Oddly enough, Riemann did not try to use this explicit formula to give an asymptotic formula for π(x). Or perhaps not so oddly, because estimating the size of the "other stuff" is quite difficult. In particular, it entails dealing with sums like
∑_{1≤n<∞} ∑_{ρ} Li(x^{ρ/n})

that involve zeros of ζ(s). Theoretically this is quite significant. It shows that π(x) depends explicitly on the zeros of ζ(s). However, in practice, dealing with these terms requires knowing about the location of those zeros. Our knowledge of this is very limited even today, and Riemann knew even less, in spite of his inspired hunch.
The bottom line is that it is necessary to describe the location of the zeros of ζ(s) at least crudely even to prove the prime number theorem. And in order to get the "best possible" estimate of π(x) it's necessary to prove Riemann's hypothesis.
As intriguing as it is to have an actual "explicit" formula for π(x), making use of the formula is another matter. The first term of the formula is Li(x), which is precisely what we want to have as the asymptotic value of π(x). However, the rest of the formula makes it too cumbersome to use for deriving the prime number theorem. Fortunately, as it turns out, Chebyshev had in 1850 already found an equivalent form of the theorem which was much easier to work with.
The basis of the alternative formulation is the function ψ(x) defined by the finite sum
ψ(x) = ∑_{p^{n}≤x} log p

where the sum is over all prime powers p^{n} ≤ x. ψ(x) can be defined also as a step function, similar to F(x), that jumps by log p at powers of each prime p. Chebyshev could show that π(x) ∼ Li(x) is a consequence of ψ(x) ∼ x -- but of course he couldn't show the latter. The key thing to remember is that ψ(x) grows faster than π(x) (by roughly a factor of log(x)), but is easier to work with. In fact, π(x) ∼ Li(x) and ψ(x) ∼ x are equivalent statements, and since Li(x) ∼ x/log(x), either one gives ψ(x) ∼ π(x) log(x).
Where did this strange definition of ψ(x) come from? It isn't quite as weird as it may seem. For one thing, ψ(x) may be thought of as a function that counts the number of primes, only in a slightly different way than π(x) does. The latter function increases by 1 every time x is a prime. ψ(x), on the other hand, increases by log p every time x is a power p^{k} of a prime p. In exchange for this odd way of counting, the conjectured asymptotic relationship π(x) ∼ Li(x) is equivalent to the relationship ψ(x) ∼ x, which is a much simpler function of x -- almost as simple as possible, in fact. One might therefore expect ψ(x) to be easier to work with.
In any case, we can write ψ(x) = ∑_{1<n≤x} Λ(n) where
Λ(n) = log p if n = p^{k} for some k ≥ 1, otherwise Λ(n) = 0

Λ(n) was introduced by von Mangoldt and is named after him. But where does it come from? It's simply the coefficient of n^{-s} in a certain Dirichlet series, namely that of -ζ′(s)/ζ(s) = -(log ζ(s))′. (This is called the "logarithmic derivative" of ζ(s).) The result can be proven with some standard calculus facts from the Euler product of ζ(s), because
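Λ(n) and ψ(x) are both easy to tabulate, and the ratio ψ(x)/x can be watched creeping toward 1 as the prime number theorem requires. An illustrative Python sketch (trial division; names are mine):

```python
import math

def mangoldt(n):
    """von Mangoldt's Λ(n): log p if n is a power of the prime p, else 0."""
    for p in range(2, n + 1):
        if n % p == 0:              # p is the smallest prime factor of n
            while n % p == 0:
                n //= p
            return math.log(p) if n == 1 else 0.0
    return 0.0                      # handles n = 1

def psi(x):
    """Chebyshev's ψ(x) = sum of Λ(n) for n <= x."""
    return sum(mangoldt(n) for n in range(2, x + 1))

for x in (100, 10000):
    print(x, psi(x) / x)
# The ratios are a bit below 1 and rising as x grows.
```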
-ζ′(s)/ζ(s) = (∑_{p} log (1-p^{-s}))′ = ∑_{p} p^{-s}(log p) / (1 - p^{-s}) = ∑_{p} ∑_{k≥1} (log p)p^{-ks} = ∑_{n≥2} Λ(n)n^{-s}
The expression that relates ζ(s) to ψ(x) is
-ζ′(s)/ζ(s) = s ∫_{0≤x<∞} ψ(x)x^{-s-1} dx

(This can be shown from the fact that ∫_{n≤x<∞} x^{-s-1} dx = n^{-s}/s, and so

-ζ′(s)/ζ(s) = ∑_{n≥2} Λ(n)n^{-s} = ∑_{n≥2} sΛ(n) ∫_{n≤x<∞} x^{-s-1} dx = s ∫_{0≤x<∞} (∑_{1<n≤x} Λ(n)) x^{-s-1} dx = s ∫_{0≤x<∞} ψ(x)x^{-s-1} dx.)
By manipulations similar to those Riemann used to obtain an explicit formula for π(x), von Mangoldt found from this, in 1894, that
ψ(x) = x - ∑_{ρ} x^{ρ}/ρ + ∑_{1≤n<∞} x^{-2n}/2n - ζ′(0)/ζ(0)

where ρ in the sum refers (as before) to the nontrivial zeros of ζ(s). This formula can be further simplified to

ψ(x) = x - ∑_{ρ} x^{ρ}/ρ + log(x^{2}/(x^{2}-1))/2 - log(2π)

This is similar to Riemann's explicit formula for π(x), but considerably easier to work with. The fact that the leading term is x makes it quite plausible that what we want (per Chebyshev) for the prime number theorem, namely ψ(x) ∼ x, is true. An immediate consequence of this formula is that there must be infinitely many zeros in the critical strip, for otherwise the right side would be a continuous function of x, which ψ(x) clearly isn't.
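It is instructive to evaluate a truncation of this formula numerically. The sketch below (our own illustration, not from the text) uses the first five zero ordinates on the critical line, hardcoded to their standard published approximations, and compares the truncated right side with ψ(100) computed directly from the definition; with so few zeros the agreement is only rough, but the leading term x is clearly visible.

```python
import math

# first five ordinates of nontrivial zeros on the critical line
# (standard published approximations)
gammas = [14.134725, 21.022040, 25.010858, 30.424876, 32.935062]

def psi_exact(x):
    """ψ(x) directly from the definition: sum of log p over prime powers p^k <= x."""
    def is_prime(m):
        return m > 1 and all(m % d for d in range(2, int(m**0.5) + 1))
    total = 0.0
    for p in range(2, int(x) + 1):
        if is_prime(p):
            pk = p
            while pk <= x:
                total += math.log(p)
                pk *= p
    return total

def psi_explicit(x, zeros):
    """Truncated von Mangoldt formula, pairing each zero with its conjugate."""
    s = x - math.log(2 * math.pi) + 0.5 * math.log(x * x / (x * x - 1))
    for g in zeros:
        rho = complex(0.5, g)
        s -= 2 * (x ** rho / rho).real   # ρ and its conjugate together
    return s

x = 100.0
print(psi_exact(x), psi_explicit(x, gammas))
```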
We want to evaluate the limit of ψ(x)/x as x → ∞. The ratio of the last two terms to x tends to 0, so in order to have ψ(x) ∼ x as desired, it suffices to show that ∑_{ρ} x^{ρ-1}/ρ → 0 as x → ∞. Supposing the limit of that sum can be evaluated term by term (a step that itself requires justification), it suffices to show that x^{ρ-1} → 0 for each ρ. Then it only remains to show Re(ρ) < 1 for all roots of ζ(s), because then Re(ρ-1) < 0 and so the absolute value
|x^{ρ-1}| = |x|^{Re(ρ-1)} → 0

as x → ∞.
It was these last two steps that both Hadamard and de la Vallée Poussin independently accomplished in 1896, completing the proof of the prime number theorem. In other words, considerably less than the Riemann hypothesis needs to be known about the zeros of ζ(s). All that's required is that none of the nontrivial zeros lie on the boundary of the critical strip, Re(s) = 1 (or equivalently, by the functional equation, Re(s) = 0). The fact that there are no zeros with Re(s) = 1 turns out to be pretty easy to deduce in a variety of ways from the Euler product.
In the time since 1896, many other proofs of the prime number theorem have been found, some of them fairly short. There are even proofs which are "elementary" in the sense that they don't use complex analysis or Fourier transforms. The first such "elementary" proof was devised by Paul Erdős and Atle Selberg in the late 1940s. There are a number of variations of this proof, but none are especially straightforward or "natural", nor do they provide much insight into the theorem. Various other short proofs use more sophisticated techniques involving Fourier transforms and "Tauberian theorems". Proofs of this sort were pioneered by Norbert Wiener and Shikao Ikehara around 1930.
Yet another kind of short proof of the prime number theorem is based on a Tauberian theorem (that is, a theorem about conditions for convergence of a series) due to A. E. Ingham. The reason that Tauberian theorems play a prominent role in this theory is that they relate the convergence of a series (such as a Dirichlet series) to the size of its coefficients. In particular, the prime number theorem is equivalent to ψ(x) ∼ x, which means that ψ(x)/x → 1 as x → ∞. But ψ(N)/N = (1/N)∑_{1<n≤N} Λ(n) is simply the average value (up to the Nth term) of the coefficients of the Dirichlet series for -ζ′(s)/ζ(s). Then the average value of the coefficients of that series being 1 is equivalent to the average value of the coefficients of -ζ′(s)/ζ(s) - ζ(s) being 0. So one can write out what those coefficients are exactly, make other estimates on them, and eventually establish the desired result.
However, there is a shortcoming of most of these shorter proofs, in that they are generally unable to say much about the absolute error in the approximation. That is, π(x) ∼ Li(x) means
π(x) = Li(x) + O(f(x))

where f(x)/x → 0 and the "O" notation means |π(x)-Li(x)| < Cf(x) for some constant C and all large x. We would like to know as much as possible about f(x). It could be as large as x^{1-ε} for some ε, or something much smaller, like log(x). Most of the shorter proofs of the prime number theorem don't give much information about this, because it really depends on the location of the zeros of ζ(s).
So we want to know as much as possible about the location of the zeros of ζ(s), to know the best estimates that can be made for bounds on the approximation of π(x) by Li(x) -- with the best bounds being achieved if the Riemann hypothesis is true. So let's take a closer look at that.
Let's review a bit. The first clue that the Riemann zeta function is related to prime numbers came from the Euler product formula, which exhibits ζ(s) explicitly in terms of an infinite product (that converges for Re(s) > 1) which has factors involving the prime numbers in a simple way. From this formula it is possible to derive fairly straightforwardly many identities that relate ζ(s) to a variety of arithmetic functions. In fact, it can be shown that
log ζ(s) = s ∫_{2≤x<∞} π(x)/(x(x^{s}-1)) dx

which expresses log ζ(s) explicitly in terms of an integral that involves the prime number function π(x). With considerably more effort, using ζ(s) as a key tool, Riemann was able to come up with an "explicit" formula for π(x) itself.
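An identity of this kind is, in essence, the logarithm of the Euler product in disguise. As a numerical sanity check (a sketch; the truncation point 10^5 is an arbitrary choice of ours), one can verify the equivalent prime-sum form log ζ(s) = -∑_{p} log(1 - p^{-s}) at s = 2, where ζ(2) = π²/6:

```python
import math

def primes_upto(n):
    """Simple sieve of Eratosthenes returning all primes <= n."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, int(n**0.5) + 1):
        if sieve[p]:
            sieve[p*p::p] = bytearray(len(range(p*p, n + 1, p)))
    return [p for p in range(2, n + 1) if sieve[p]]

# log ζ(2) two ways: from the known value ζ(2) = π²/6, and from the
# (truncated) logarithm of the Euler product
lhs = math.log(math.pi**2 / 6)
rhs = -sum(math.log(1 - p**-2.0) for p in primes_upto(100000))
print(lhs, rhs)
```

The tail of the prime sum beyond 10^5 is of order 10^{-6}, so the two printed values agree closely.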
When the prime number theorem was eventually proved by Hadamard and de la Vallée Poussin the plot thickened considerably. Not only was ζ(s) related to the prime numbers in a formal way, but its properties turned out to play a governing role in the asymptotic behavior of π(x). More specifically, it was critical for the proof of the prime number theorem that ζ(s) should not have any zeros with Re(s) = 1.
But the connection between the zeros of ζ(s) and the asymptotic behavior of π(x) turned out to be far stronger than that. Using the alternative product formula for ζ(s) that Riemann conjectured and Hadamard proved, it could be shown that the precise distribution of the zeros of ζ(s) affected the goodness of the approximation of π(x) by Li(x). And in the best possible case, if all zeros of ζ(s) lie on the line Re(s) = 1/2 as Riemann conjectured, then the error term, i. e. the difference π(x) - Li(x), is as small as possible. Not only that, but the converse is also true: if the error term has the conjectured smallest possible form, then the Riemann hypothesis is true.
So the main conjecture which is equivalent to the Riemann hypothesis is formulated in terms of the absolute error in the approximation of π(x) by Li(x). Specifically it is the conjecture that we can write
π(x) = Li(x) + E(x), where |E(x)| ≤ Cx^{1/2} log(x)

It is known that this estimate is the best possible, because the difference between π(x) and Li(x) can actually be that large. What this means is that the distribution of prime numbers is so closely related to the distribution of the zeros of ζ(s) that the "best" possible estimate of π(x) can be made if, and only if, all zeros lie on the line Re(s) = 1/2.
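To get a feel for the size of E(x), the following sketch (ours; it assumes nothing beyond the definitions above) computes π(x) by a sieve and Li(x) = ∫_{2}^{x} dt/log t by the trapezoidal rule, then compares the actual error with the conjectured ceiling x^{1/2} log(x).

```python
import math

def prime_sieve(n):
    """Bytearray sieve: sieve[k] == 1 exactly when k is prime."""
    sieve = bytearray([1]) * (n + 1)
    sieve[0] = sieve[1] = 0
    for p in range(2, int(n**0.5) + 1):
        if sieve[p]:
            sieve[p*p::p] = bytearray(len(range(p*p, n + 1, p)))
    return sieve

def Li(x, steps=200000):
    """Li(x) = integral from 2 to x of dt/log(t), trapezoidal rule."""
    h = (x - 2.0) / steps
    total = 0.5 * (1.0 / math.log(2.0) + 1.0 / math.log(x))
    for i in range(1, steps):
        total += 1.0 / math.log(2.0 + i * h)
    return total * h

sieve = prime_sieve(100000)
for x in (1000, 10000, 100000):
    pi_x = sum(sieve[:x + 1])
    ratio = abs(pi_x - Li(float(x))) / (math.sqrt(x) * math.log(x))
    print(x, pi_x, round(Li(float(x)), 1), round(ratio, 4))
```

In this range the ratio stays far below 1, consistent with (though of course not proving) the conjectured bound.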
OK, if we can't yet say for sure that Re(s) = 1/2 for all s such that ζ(s) = 0, what can we say? Progress towards establishing the Riemann hypothesis could be viewed in terms of giving tighter limits on Re(s). As before, we shall let Z be the set of zeros of ζ(s) in the critical strip {s ∈ ℂ: 0 ≤ Re(s) ≤ 1}. ρ will stand for any member of Z. It's fairly easily shown that there is a real constant c > 0 such that if ρ ∈ Z then
Re(ρ) < 1 - c Min(1, (log |Im(ρ)|)^{-1})

This result was proven by Hadamard, and the region in which zeros cannot lie is known as Hadamard's zero-free region. Except for minor variations, this is still the best result which is known. Min(x,y), of course, simply means the smaller (minimum) of x and y. If ρ is close to the real axis, log |Im(ρ)| is negative and the bound is trivially satisfied; the substance in that case is just that there are no zeros with Re(ρ) = 1. The more interesting case is when |Im(ρ)| is large. Then log |Im(ρ)| is also large, so its reciprocal is small but positive. The net result is that, on the basis of this estimate, Re(ρ) could theoretically come ever closer to 1 the larger |Im(ρ)| is. In other words, the possible "scatter" of zeros around the line Re(s) = 1/2 becomes larger as Im(s) does.
In spite of that, most zeros must be located close to the line for large values of Im(s). It has been shown that more than 99% of zeros satisfy
|Re(ρ) - 1/2| ≤ 8 / log|Im(ρ)|
Even from this relatively weak information about ρ it is possible to make an estimate for the error term E(x) in the relation π(x) = Li(x) + E(x). Namely it can be shown that
|E(x)| ≤ C x e^{-c √(log x)}

for some positive constants c and C. Another way to write this, using the O-notation, is

E(x) = O(x e^{-c √(log x)})

Hence E(x)/x = O(e^{-c √(log x)}). That is enough to prove the prime number theorem, since it implies the relative error E(x)/x → 0 as x → ∞. Yet it's a very weak estimate, since it isn't even as strong as E(x)/x = O(e^{-c log x}) = O(x^{-c}).
It is possible to make tests of the Riemann hypothesis by counting the number of zeros whose imaginary part is less than any given T > 0. That is, we look at the number N(T) which Riemann considered (as described earlier) that is defined by
N(T) = #({ρ∈Z: 0 < Im(ρ) < T})

It can be shown, confirming one of Riemann's subsidiary conjectures, that

N(T) = (T/2π)log(T/2π) - T/2π + O(log T)

In fact, there are ways to compute N(T), for large T, with an error of less than 1/2. But since N(T) is an integer, the computation determines N(T) exactly. It is also possible to locate precisely all the zeros on the line Re(ρ) = 1/2 that have Im(ρ) < T. All computations done so far, which extend at least to N(T) = 1.5 × 10^{9}, have verified that every single zero in the range (for T less than about 5 × 10^{8}) lies on the line Re(ρ) = 1/2. As of 2004, 10^{13} zeros in the critical strip had been identified, and all have Re(ρ) = 1/2. This is good evidence in support of the Riemann hypothesis, though it hardly proves it.
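The main terms of this formula are easy to evaluate. The sketch below adds the constant 7/8 (a standard refinement that the O(log T) term absorbs) and compares the result at T = 100 with the known count of 29 zeros up to that height.

```python
import math

def N_main(T):
    """Main terms of the Riemann-von Mangoldt formula for N(T),
    plus the standard constant 7/8 hidden in the O(log T) error term."""
    t = T / (2 * math.pi)
    return t * math.log(t) - t + 7.0 / 8.0

print(N_main(100))   # very close to 29; there are exactly 29 zeros with 0 < Im(ρ) < 100
```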
In a slightly different direction, it is possible to make estimates of how many zeros lie off the line Re(ρ) = 1/2. One defines, for T > 0 and σ > 1/2
N(σ,T) = #({ρ ∈ Z: Re(ρ) > σ and 0 < Im(ρ) < T})

Then a celebrated result discovered by Harald Bohr (brother of physicist Niels Bohr) and Edmund Landau in 1914 says that for any σ > 1/2,

N(σ,T)/N(T) → 0 as T → ∞

In other words, the proportion of zeros not on the line Re(ρ) = 1/2 is arbitrarily small for large enough T. Very little progress in this direction has been made since the 1914 Bohr-Landau theorem.
There have been a few additional results regarding how many zeros are on the line. G. H. Hardy and J. E. Littlewood showed that infinitely many are. Later Atle Selberg showed that a positive proportion are on the line. The best currently proven estimate is that this proportion is more than 40%.
Recall that the estimate

|π(x) - Li(x)| = O(x^{1/2} log(x))

has been proven equivalent to the Riemann hypothesis. Knowing this about the distribution of primes can be used to derive the best possible estimates for many other number theoretic quantities as well.
Because of the importance of the hypothesis, it is often helpful to have other equivalent formulations available. So we'll just mention some of them now. It's fairly easy to guess at some equivalent statements. For example, think of functions that can be defined in terms of ζ(s) which would have singularities if ζ(s) = 0 for some s with Re(s) > 1/2. Thus the Riemann hypothesis is true if and only if 1/ζ(s) is holomorphic for Re(s) > 1/2.
Since ζ(s) has a simple pole at s=1 with residue 1, (s-1)ζ(s) is holomorphic and nonzero in a neighborhood of s=1, so its logarithm, log(s-1) + log(ζ(s)), is holomorphic there as well. So is its derivative, (s-1)^{-1} + ζ′(s)/ζ(s), where ζ′(s)/ζ(s) is the "logarithmic derivative" of ζ(s), i. e. (log ζ(s))′. Thus

ζ′(s)/ζ(s) + (s-1)^{-1}

is holomorphic at s=1, and we conclude it is holomorphic in the region {s ∈ ℂ: Re(s) > 1/2} if and only if the Riemann hypothesis is true. Equivalently, log((s-1)ζ(s)) is holomorphic in the same region. (Remark: the logarithmic derivative ζ′(s)/ζ(s) has at most a simple pole at zeros of ζ(s), even if some of those zeros are multiple – although none have been found that are.)
We worked a bit with ζ′(s)/ζ(s) in connection with the function ψ(x). When all the gory details are processed, it turns out that two things are true:
If for some α < 1 one has

ψ(x) = x + O(x^{α}) as x → ∞

then all zeros ρ of ζ(s) in the critical strip have Re(ρ) ≤ α.

If all zeros of ζ(s) are contained in the strip {s ∈ ℂ: 1-θ ≤ Re(s) ≤ θ} for some θ < 1, then

ψ(x) = x + O(x^{θ} (log(x))^{2}) as x → ∞

Since ζ(s) = 0 for infinitely many s with Re(s) = 1/2, the first statement shows that α = 1/2 is the smallest exponent we can have such that ψ(x) = x + O(x^{α}). In other words, no matter what, we can never get any substantially better estimate of ψ(x) than ψ(x) = x + O(x^{1/2}). If that is true for ψ(x), then we have also π(x) = Li(x) + O(x^{1/2}).
On the other hand, if the Riemann hypothesis is correct, we can apply the second statement with θ = 1/2 to conclude that in fact ψ(x) = x + O(x^{1/2} (log(x))^{2}) and hence π(x) = Li(x) + O(x^{1/2} log(x)). Conversely, by the first statement, this estimate implies the Riemann hypothesis.
In other words, the Riemann hypothesis is equivalent to the estimate π(x) = Li(x) + O(x^{1/2} log(x)), and no better estimate of this sort is possible.
Unfortunately, we must point out that no results have actually been proven of the form that all zeros of ζ(s) must have Re(ρ) ≤ α for some α < 1. Even now, the best result we have for a zero-free region is Hadamard's, which illustrates how little actual progress on the Riemann hypothesis has been made in almost 100 years.
Weaker statements along the same lines have also been studied. One of these was proposed by E. Lindelöf in 1908 and is accordingly known as the Lindelöf hypothesis. Interestingly enough, it is concerned with how large |ζ(s)| can be on the line Re(s) = 1/2. There are several equivalent ways to state this hypothesis. Considering ζ(1/2+iτ) as a function of a real variable τ, one formulation is the statement that
|ζ(1/2 + iτ)| = O(τ^{ε})

for any ε > 0 as τ → ∞. In fact, this is equivalent to the requirement

|ζ(σ + iτ)| = O(τ^{ε})

for all σ > 1/2 and ε > 0. Intuitively, both of these state that ζ(s) grows rather slowly in the critical strip as Im(s) → ∞.
The relation of the hypotheses of Riemann and Lindelöf is made clearer given the provable fact that the Lindelöf hypothesis is equivalent to the requirement that the number of zeros of ζ(s) in the region 1/2 < Re(s) < 1 and T < Im(s) < T+1, grows more slowly than log(T). Using
N(σ,T) = #({ρ ∈ Z: Re(ρ) > σ and 0 < Im(ρ) < T})

as before, the precise statement is that

(N(σ,T+1) - N(σ,T))/log(T) → 0 as T → ∞

for every σ > 1/2. This is implied by the Riemann hypothesis, which requires that N(σ,T) = 0 for all T and σ > 1/2, yet it doesn't imply the Riemann hypothesis, so it is a strictly weaker condition.
Nevertheless, even the Lindelöf hypothesis remains an open problem, demonstrating just how difficult the Riemann hypothesis must be. The best result currently known along these lines, due to Hermann Weyl, is that
|ζ(1/2 + iτ)| = O(τ^{1/6+ε})

for any ε > 0 as τ → ∞.
It is thought that the Lindelöf hypothesis is the strongest result of this type that can be achieved by classical methods, if indeed it can be so achieved.
There are a number of other results which could be proven if the Riemann hypothesis is assumed. Apart from better estimates on the distribution of prime numbers, most such results, like the Lindelöf hypothesis, concern either the distribution of the imaginary parts of zeros of ζ(s), or the absolute values of ζ(s) or log(ζ(s)) as Im(s) becomes large. Most such results, likewise, remain unproven without assuming the Riemann hypothesis. Of course, if any were ever disproven, the Riemann hypothesis would be also – and that hasn't happened.
Akin to strengthening a statement by weakening its assumptions is making the result more general, i. e. making the assumptions on the objects that the statement applies to less restrictive. For instance, statements about integers can frequently be strengthened by applying them to more general objects, such as groups or rings, which do not have the full range of properties that the integers do. As we shall see, there are conjectures very similar to the Riemann hypothesis that apply to a wide variety of functions other than the zeta function, but analogous to it in some way.
What are some other interesting number theoretic functions? Well, there's the Möbius function μ(n) which has been mentioned already and is defined for integers by μ(n) = 0 if n is divisible by the square of a prime, μ(n) = 1 if n is a product of an even number of distinct primes, and μ(n) = -1 if n is a product of an odd number of distinct primes. Now, whenever a number theorist deals with functions defined on (positive) integers, one of the first impulses is to think about the Dirichlet series, which is one type of "generating function" for the given sequence of numbers. That is, a function defined formally by ∑_{n} a_{n} n^{-s} that corresponds to the sequence {a_{n}}.
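A quick computational sketch of μ(n) (the helper name `mobius_table` is ours): the linear sieve below builds a table of values and then checks the characteristic identity ∑_{d|n} μ(d) = 1 if n = 1 and 0 otherwise.

```python
def mobius_table(limit):
    """Return mu with mu[n] = μ(n) for 0 <= n <= limit, via a linear sieve."""
    mu = [1] * (limit + 1)
    is_comp = [False] * (limit + 1)
    primes = []
    for n in range(2, limit + 1):
        if not is_comp[n]:
            primes.append(n)
            mu[n] = -1
        for p in primes:
            if n * p > limit:
                break
            is_comp[n * p] = True
            if n % p == 0:
                mu[n * p] = 0          # n*p is divisible by p^2
                break
            mu[n * p] = -mu[n]         # one more distinct prime factor
    return mu

mu = mobius_table(1000)
# characteristic identity: sum of μ(d) over divisors d of n
for n in range(1, 1001):
    s = sum(mu[d] for d in range(1, n + 1) if n % d == 0)
    assert s == (1 if n == 1 else 0)
print(mu[1], mu[2], mu[4], mu[6])   # 1 -1 0 1
```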
So we might well wonder about the function defined by ∑_{n} μ(n) n^{-s}. Would you be shocked to learn that this function is precisely the reciprocal of ζ(s)?
1/ζ(s) = ∑_{1≤n<∞} μ(n)/n^{s}

Just like ζ(s), this series converges and gives a holomorphic function for Re(s) > 1. If ζ(s) had any zeros for 1/2 < Re(s) ≤ 1, 1/ζ(s) wouldn't be a holomorphic function in that region. The Riemann hypothesis says there are no such zeros. In fact, assuming the Riemann hypothesis, it can be shown that not only is 1/ζ(s) holomorphic for Re(s) > 1/2, but the Dirichlet series converges in that region.
We recall the very similar series
ζ′(s)/ζ(s) = -∑_{1≤n<∞} Λ(n)/n^{s}

which played such a crucial role in the prime number theorem. Just as we had ψ(x) = ∑_{0<n≤x} Λ(n), we cannot help but wonder whether the function defined by

M(x) = ∑_{0<n≤x} μ(n)

might be very interesting.
It is, and just as with ψ(x), one of the most interesting things about it is its asymptotic size. It can be shown that the estimate
M(x) = O(x^{1/2 + ε}) for any ε > 0

is equivalent to the Riemann hypothesis.
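The sketch below tabulates μ(n) (the same sieve idea as before, repeated so the block stands alone), accumulates M(x), and prints the ratio M(x)/√x at a few checkpoints; the table is cross-checked with the classical identity ∑_{n≤N} μ(n)⌊N/n⌋ = 1.

```python
import math

def mobius_table(limit):
    """μ(n) for 0 <= n <= limit via a linear sieve."""
    mu = [1] * (limit + 1)
    is_comp = [False] * (limit + 1)
    primes = []
    for n in range(2, limit + 1):
        if not is_comp[n]:
            primes.append(n)
            mu[n] = -1
        for p in primes:
            if n * p > limit:
                break
            is_comp[n * p] = True
            if n % p == 0:
                mu[n * p] = 0
                break
            mu[n * p] = -mu[n]
    return mu

N = 100000
mu = mobius_table(N)
# classical identity checking the table: sum_{n<=N} μ(n)*floor(N/n) = 1
assert sum(mu[n] * (N // n) for n in range(1, N + 1)) == 1

M = 0
checkpoints = {100, 1000, 10000, 100000}
for x in range(1, N + 1):
    M += mu[x]
    if x in checkpoints:
        print(x, M, round(M / math.sqrt(x), 3))
```

The printed ratios stay small in this range, in line with (but far short of proving) the conjectured bound.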
Even more could conceivably be true. On the basis of numerical evidence at the time, Franz Mertens (whose theorem concerning Euler's constant γ and the Γ function has already been mentioned) conjectured that in fact |M(x)| < √x for all x. This is known as the Mertens conjecture. It would imply the Riemann hypothesis, if true. However, it's actually a strictly stronger statement than the Riemann hypothesis.
But in this case, this is a plausible conjecture that has been disproven. In 1984 A. M. Odlyzko and H. J. J. te Riele proved that for some values of x, M(x)/√x > 1.06, and for others M(x)/√x < -1.09. The proof was indirect and didn't actually exhibit any x satisfying those inequalities. It is still unknown whether M(x) = O(√x), though that seems doubtful. The negative conclusion to Mertens' conjecture shows one can't always depend on extensive numerical evidence in formulating number theoretic conjectures. This rather reduces one's enthusiasm for believing the Riemann hypothesis based solely on numerical evidence. (A similar failure occurred with the conjecture that π(x) < Li(x), which was known to be true for x up to 3 million. But it was later proven that the quantity Li(x) - π(x) changes sign infinitely often.)
Returning to the many functions analogous to ζ(s): most of these more general functions can be expressed as Dirichlet series or something similar. They usually can also be expressed in terms of a product like Euler's and satisfy a functional equation like that of ζ(s). The numerical values of such functions at special points (and their "residues" at poles) have particular significance in terms of algebraic and arithmetic objects that the functions are associated with. And they are also expected to satisfy a condition similar to Riemann's hypothesis regarding the location of their zeros.
With number theory in mind, there is a generalization of a quite important kind we might consider. In "ordinary" number theory the numbers that are of principal concern are rational numbers (the set of which is denoted by ℚ) and the integers (ℤ). The theory of rational numbers and integers is quite rich and interesting. A great deal of it is concerned with properties of prime numbers. Many natural questions in number theory can be expressed in terms of integers and rationals.
Not all questions, however. Any time one wants to deal with Diophantine equations that are not "linear", even simple ones such as x^{2} - d = 0, it is necessary to go beyond the rationals and integers to consider "algebraic" numbers, which are, roughly speaking, any solution of a polynomial equation having rational numbers as coefficients. This involves the vast mathematical subject of "algebraic number theory", of which "ordinary" number theory -- the theory of rational numbers and integers -- is only a small part.
Algebraic number theory is a huge and deep subject, with many difficult open questions of its own. We go into more detail on this subject elsewhere, but there are straightforward analogues of ζ(s) that play a large role in the theory and are relevant to mention here.
Many of the important questions of algebraic number theory revolve around the properties of suitable analogues of prime numbers. It is not hard to define the meaning of "integers" within the class of all algebraic numbers. However, in general algebraic number theory one does not usually have a set of numbers (even algebraic numbers) in terms of which any arbitrary algebraic integer can be expressed uniquely as a product of "prime" numbers of some sort. One must instead work with certain sets of algebraic numbers called "ideals". This makes the theory much more complicated. It is therefore striking that a fairly simple analogue of the Riemann zeta function does exist. The analogue is the Dedekind zeta function, named after Richard Dedekind (1831-1916), who did much to develop algebraic number theory and was a close friend of Riemann as well as his first biographer.
To talk about Dedekind zeta functions we need a little terminology. First, a "field" is a collection of mathematical objects (such as numbers or functions) for which there are concepts analogous to the addition and multiplication of ordinary numbers. There are certain axiomatic rules which specify how addition and multiplication should behave. For instance, there is the "distributive" rule of multiplication with respect to addition: a(b + c) = ab + ac for any a, b, and c belonging to the field. The rational numbers ℚ form a field, as do the real numbers ℝ and the complex numbers ℂ.
Fields can contain, and be contained in, other fields. For present purposes we are concerned with fields K that contain the rationals ℚ. Symbolically, ℚ ⊆ K. In this case, one says that ℚ is a subfield of K, and K is an extension of ℚ. An "algebraic number" is simply any root of any polynomial f(x) whose coefficients are rational numbers in ℚ, where polynomials are expressions of the form
f(x) = a_{n}x^{n} + a_{n-1}x^{n-1} + ... + a_{1}x + a_{0} with a_{i} ∈ ℚ for 0 ≤ i ≤ n

The numbers a_{i} are the coefficients, and x is referred to as an "indeterminate". The set of all such polynomials is denoted by ℚ[x].
To say that the algebraic number α is a "root" of f(x) means just that f(α) = 0. If n is the largest integer such that a_{n} ≠ 0, n is said to be the "degree" of f(x). An algebraic number α is an "algebraic integer" if it is a root of some polynomial f(x) of degree n whose leading coefficient a_{n} = 1 and whose other coefficients are all ordinary integers (elements of ℤ). It isn't hard to show that the subset of an extension K ⊇ ℚ consisting only of algebraic integers is what is called a "ring", which is very much like a field, except that its members do not necessarily have multiplicative inverses (i. e. reciprocals) that are also algebraic integers (although the inverses are still algebraic numbers in K). This ring is called the ring of algebraic integers of K/ℚ. (Which is read "K over ℚ".) It is this ring which generalizes the ring ℤ of ordinary integers.
In general, an algebraic number α can be a root of infinitely many polynomials, but there is always a polynomial of smallest degree n whose leading coefficient a_{n} = 1. This is called the "minimal" polynomial of α. There is a function called the "norm" that maps all algebraic numbers of an extension field K ⊇ ℚ to ℚ. If K is the smallest extension field of ℚ that contains α and all other complex roots of the minimal polynomial f(x) of α (known as the "splitting field" of f(x)), then the norm of α is, up to sign, simply the constant term of f(x), i. e. ±f(0). It is denoted by N_{K/ℚ}(α) and it is a member of ℚ by definition. If α is an algebraic integer, its norm is an ordinary integer.
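As a tiny numerical illustration (our own example, not part of the text's development): the golden ratio φ = (1+√5)/2 is a root of the monic integer polynomial x² - x - 1, hence an algebraic integer, and its norm is the product of the two roots, which up to sign is the constant term of that polynomial.

```python
import math

phi = (1 + math.sqrt(5)) / 2           # root of x^2 - x - 1
conj = (1 - math.sqrt(5)) / 2          # the other root of the same polynomial
# monic with integer coefficients, so phi is an algebraic integer
assert abs(phi**2 - phi - 1) < 1e-12
# the norm is the product of the roots; for monic f of degree n it is (-1)^n * f(0)
norm = phi * conj
print(norm)   # approximately -1, matching (-1)^2 * f(0) = -1
```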
Now, as mentioned, the ring of integers R of some K ⊇ ℚ doesn't necessarily have the property that all its members are products of "primes" in a unique way. This fact makes arithmetic in R much messier than arithmetic in ℤ. Fortunately, algebraists in the 19th century discovered that it is possible to define certain subsets of R called "ideals" that have a property analogous to unique factorization. An ideal I ⊆ R is itself a ring with the further property that all products of elements of I by elements of R are members of I. (Symbolically, IR ⊆ I.) An important type of ideal is a "principal" ideal, which consists of all products of an algebraic integer and other elements of the ring, and is usually written as αR or (α). Although ideals are not numbers, they play an analogous role in the theory. For instance, it is possible to define the norm of an ideal, and the value of the norm is an ordinary integer.
Happily, unique factorization can be recovered for rings of integers as far as ideals are concerned. That is, there are ideals, called prime ideals, such that any ideal in the ring can be expressed as a "product" of prime ideals in a unique way. The way that an algebraic integer α factors in the ring can be expressed in terms of the prime ideals that are factors of the principal ideal (α). Most of algebraic number theory is devoted to describing the prime ideals of a given ring of algebraic integers in greater detail.
One of the key tools used in algebraic number theory is the Dedekind zeta function, which we now have enough terminology to define. The zeta function of the algebraic number field K ⊇ ℚ is given by
ζ_{K}(s) = ∑_{A} (N_{K/ℚ}A)^{-s}

where the sum runs over all the nonzero ideals A of the ring of integers of K, and N_{K/ℚ}A is the norm of A. Note that this is in fact a Dirichlet series, because the norm is defined so that it is an integer. If K = ℚ, ζ_{K}(s) is simply the familiar Riemann zeta function.
Many of the properties of ζ(s) carry over to ζ_{K}(s). In particular, precisely because one has unique factorization of ideals, there is a product formula:
ζ_{K}(s) = ∏_{P} (1 - 1/(N_{K/ℚ}P)^{s})^{-1}

where the product runs over all prime ideals P.
ζ_{K}(s) can also be written as an ordinary Dirichlet series:
ζ_{K}(s) = ∑_{1≤n<∞} f(n)/n^{s}

where f(n) is the number of ideals of K that have norm equal to n. Clearly, f(n) is a key number theoretic function in the theory, and ζ_{K}(s) encodes information about it.
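For a concrete case (a sketch assuming standard facts about K = ℚ(i) that the text does not derive): in the Gaussian integers ℤ[i], the number of ideals of norm n is known to equal ∑_{d|n} χ(d), where χ is the nontrivial character modulo 4, and this in turn is one quarter of the number of ways of writing n = a² + b² with (a,b) ∈ ℤ². The code checks the two counts against each other.

```python
def chi4(n):
    """Nontrivial character modulo 4."""
    return 0 if n % 2 == 0 else (1 if n % 4 == 1 else -1)

def ideals_of_norm(n):
    """Number of ideals of Z[i] with norm n, via the divisor sum of chi4."""
    return sum(chi4(d) for d in range(1, n + 1) if n % d == 0)

def r2(n):
    """Number of pairs (a, b) in Z^2 with a^2 + b^2 = n."""
    count = 0
    r = int(n**0.5)
    for a in range(-r, r + 1):
        b2 = n - a * a
        if b2 >= 0:
            b = int(round(b2**0.5))
            if b * b == b2:
                count += 2 if b > 0 else 1   # b and -b, or b = 0
    return count

for n in range(1, 201):
    assert 4 * ideals_of_norm(n) == r2(n)
print(ideals_of_norm(5), ideals_of_norm(25))   # 2 and 3
```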
The similarities to ζ(s) don't stop there; there are many others.
Naturally, the precise characteristics of ζ_{K}(s) depend on the specific field K. Indeed, important quantities related to K can be expressed in terms of ζ_{K}(s). For instance, the very important "ideal class number" of K, which roughly expresses how badly the ring of integers of K fails to have all of its ideals be principal ideals, enters into the formula for the "residue" of ζ_{K}(s) at its pole at s = 1.
The class number is the number of elements of the "ideal class group" of the field. This group consists of classes of ideals within the set of all ideals of the ring of integers of K. We won't attempt to explain that idea more here – it's part of the very deep theory called "class field theory" – except to note that Dedekind zeta functions can also be defined for each of these ideal classes, by letting the sum which defines the function run only over those ideals which are members of a given class:
ζ_{K}(s; H) = ∑_{A∈H} (N_{K/ℚ}A)^{-s}

for an ideal class H. There are h such classes, where h is the ideal class number, and

ζ_{K}(s) = ∑_{1≤i≤h} ζ_{K}(s; H_{i})

It is also possible to express ζ_{K}(s) in terms of a product of ζ(s) and functions called "L-functions", the simplest case of which comprises Dirichlet L-functions, which we'll look at next.
Dirichlet was interested in certain simple arithmetic functions called "characters". A character (specifically, a "Dirichlet" character) is a complex-valued function defined for integers and having the multiplicative property: if χ is a character, then χ(ab) = χ(a)χ(b). For any character, there is an integer m such that χ is "periodic modulo m", i. e. χ(a) = χ(a + km) for any integer k. In modern terminology that we won't attempt to explain in detail here, a Dirichlet character corresponds to a "homomorphism" from the multiplicative group of integers modulo m, denoted by (Z/mZ)*, to a group of roots of unity -- complex numbers some power of which is 1. Given such a homomorphism, χ(n) can easily be defined for all integers n that are "relatively prime" to m (i. e., have no prime factors in common with m, which one expresses symbolically in terms of the greatest common divisor: (m,n) = 1), while for integers n that aren't relatively prime to m, one sets χ(n) = 0. (The group (Z/mZ)* is a group of φ(m) elements, where φ(m) is Euler's φ function: the number of positive integers less than m and relatively prime to m. Therefore, the values of χ(n) are either φ(m)-th roots of unity or zero.) Dirichlet characters are useful tools for studying multiplicative properties of the integers.
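A minimal concrete example: the nontrivial character modulo 4 (the helper name `chi4` is ours), with sanity checks of complete multiplicativity and periodicity.

```python
def chi4(n):
    """The nontrivial Dirichlet character modulo 4."""
    if n % 2 == 0:
        return 0
    return 1 if n % 4 == 1 else -1

# complete multiplicativity: chi(ab) = chi(a)chi(b)
for a in range(1, 50):
    for b in range(1, 50):
        assert chi4(a * b) == chi4(a) * chi4(b)

# periodicity modulo 4: chi(n) = chi(n + 4)
for n in range(1, 50):
    assert chi4(n) == chi4(n + 4)

print([chi4(n) for n in range(1, 9)])   # [1, 0, -1, 0, 1, 0, -1, 0]
```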
Given any Dirichlet character, the obvious thing to do is to define a series, called an L-series or L-function:
L(s, χ) = ∑_{1≤n<∞} χ(n)/n^{s}

The terms "L-series" and "L-function" are often used interchangeably, but the latter is to be preferred, since some examples that are defined by other means do not have Dirichlet series expansions.
If we take χ(n) to be the "trivial" character, corresponding to m=1 such that χ(n) = 1 for all n, we get the Riemann zeta function. Not too surprisingly, then, L-functions also have many properties in common with ζ(s).
First of all, because of the multiplicative property of χ(n), there is an Euler product formula:
L(s, χ) = ∏_{p} (1 - χ(p)p^{-s})^{-1}

The product here is over all primes p. Both the series and the product representations of L(s, χ) converge to holomorphic functions in the half plane Re(s) > 1, just as for ζ(s). L(s, χ) can also be analytically continued to the whole complex plane. Unlike ζ(s), L(s, χ) does not have a pole at s=1, unless χ is a "trivial" character (modulo m) with χ(n) = 1 for all n with (n,m)=1. And L(s, χ) also has a functional equation, involving analogues of the Γ function, which relates L(s, χ) to L(1-s, χ̄), where χ̄ is the complex conjugate character.
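For the nontrivial character modulo 4 (redefined here so the sketch is self-contained), the value L(1, χ) is the classical Leibniz series 1 - 1/3 + 1/5 - 1/7 + ... = π/4; since χ is nontrivial, the series converges (conditionally) even at s = 1, outside the half plane of absolute convergence. A long partial sum illustrates the slow convergence.

```python
import math

def chi4(n):
    """Nontrivial Dirichlet character modulo 4."""
    return 0 if n % 2 == 0 else (1 if n % 4 == 1 else -1)

N = 200001
partial = sum(chi4(n) / n for n in range(1, N + 1))
print(partial, math.pi / 4)   # the alternating-series error here is below 1e-5
```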
Another way that L(s, χ) is analogous to ζ(s) is the existence of a simple formula for values of the function at negative integers. Specifically, one has
L(1-k, χ) = - B_{k,χ} / k for k ≥ 1

Here B_{k,χ} is a generalized Bernoulli number defined by the generating function
∑_{1≤a≤m} χ(a) x e^{ax} / (e^{mx} - 1) = ∑_{0≤n<∞} B_{n,χ} x^{n} / n!Like the ordinary Bernoulli numbers, the generalized Bernoulli numbers play an interesting and somewhat mysterious role in number theory. We'll say a little about that in the next section. This has to do with the fact that there is also a formula for L(k, χ) for positive k which also involves the B_{k,χ}. It is a consequence of the functional equation of L(s, χ).
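As a sketch (using the standard equivalent formula B_{n,χ} = m^{n-1} ∑_{a=1}^{m} χ(a)B_{n}(a/m), with Bernoulli polynomial B₁(x) = x - 1/2, rather than expanding the generating function directly): for the nontrivial character modulo 4 one gets B_{1,χ} = -1/2, consistent with the known value L(0, χ) = -B_{1,χ}/1 = 1/2.

```python
from fractions import Fraction

def chi4(n):
    """Nontrivial Dirichlet character modulo 4."""
    return 0 if n % 2 == 0 else (1 if n % 4 == 1 else -1)

m = 4
# B_{1,chi} = m^0 * sum over a of chi(a) * B_1(a/m), with B_1(x) = x - 1/2
B1_chi = sum(Fraction(chi4(a)) * (Fraction(a, m) - Fraction(1, 2))
             for a in range(1, m + 1))
print(B1_chi)   # -1/2
```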
Why pay so much attention to these seemingly arcane L-functions? The answer is severalfold. First, historically, they played a key role in Dirichlet's noteworthy theorem that there are actually an infinite number of primes in every arithmetic progression of the form {a + km: k ∈ Z} where a and m are relatively prime. This is obviously a nice improvement on Euclid's old result on the infinity of primes.
But far more significant than that, L-functions have proven, through a series of generalizations, to have a profound importance for expressing very deep facts about algebraic number fields. This comes about (in part) because the Dedekind zeta functions can be expressed as a finite product of suitable L-functions. The model for such results is the formula
ζ_{K}(s) = Z(s) ∏_{χ} L(s, χ)

In this formula, the field K is the smallest field that contains the rational numbers Q and all m-th roots of unity, which is written K = Q(μ_{m}). The (finite) product is over all Dirichlet characters modulo m, and Z(s) is a product of terms of the form (1 - (N_{K/Q}P)^{-s})^{-1} where P is a prime ideal of the integers of K which divides the principal ideal (m). (Such factors can be thought of as "local" zeta functions associated with a given prime ideal P.)
Now, K is a relatively simple sort of extension of Q, called a "cyclotomic" field. It is very important in algebraic number theory, but it is also rather a special case. Most extensions of Q have a more complicated structure, which is reflected in the so-called Galois group G(K/Q). Galois groups, named after Évariste Galois (1811-32), play a fundamental role in the theory of polynomial equations. Characters can be defined for any group, and indeed they are a fundamental tool in the theory of "group representations", which is a way of studying abstract groups in terms of groups of matrices.
It is possible to generalize the Dirichlet L-functions in various ways, which involve using characters of certain groups that can be defined for different types of field extensions. One such kind of generalization was discovered by Erich Hecke, and hence the corresponding characters are called Hecke characters, and the L-functions are called Hecke L-functions. Another type of generalization was discovered by Emil Artin, yielding Artin L-functions.
Many of the mysteries of advanced algebraic number theory are bound up in the properties of Artin's L-functions. They are defined in terms of certain products, they are meromorphic functions, and they satisfy a functional equation. Their zeros are easy to describe, and Dedekind zeta functions can be expressed as a product of ζ(s) and Artin L-functions. For certain types of field extensions (abelian extensions, which have abelian Galois groups), the Artin L-functions can be related to Hecke L-functions. Artin L-functions, however, do not in general have an infinite series expansion.
But there remain important open questions about the Artin L-functions. In particular, there is Artin's conjecture: that the functions are in fact holomorphic everywhere. This is known to be true for abelian field extensions, because this part of the theory (known as "abelian class field theory") is fairly well understood. However, the Artin L-functions encode facts about nonabelian field extensions as well, and nonabelian class field theory is an active research area.
Recall that the ordinary Bernoulli numbers B_{n} are defined by the generating function

xe^{x} / (e^{x} - 1) = ∑_{0≤n<∞} B_{n} x^{n} / n!

Let χ(n) be a Dirichlet character modulo m (and, technically, it needs to be what is called a "primitive" character). Then one can define generalized Bernoulli numbers with this generating function:
∑_{1≤a≤m} χ(a) xe^{ax} / (e^{mx} - 1) = ∑_{0≤n<∞} B_{n,χ} x^{n} / n!

In proving that ζ(s) was a holomorphic function in the whole complex plane except for s=1, Riemann used various representations for ζ(s) in terms of integrals, such as the formula quoted earlier:
ζ(s) = -Γ(1-s)(2πi)^{-1} ∫_{C} (-x)^{s-1}/(e^{x}-1) dx

Since the integrand in this formula is almost the generating function for Bernoulli numbers, it falls out that
ζ(1-2k) = -B_{2k} / 2k for k ≥ 1

and also
ζ(2k) = (-1)^{k+1} B_{2k} (2π)^{2k} / (2(2k)!) for k ≥ 1

The net result is that we have relatively simple expressions for the value of ζ(s) in terms of Bernoulli numbers at all even integers (and negative odd integers). Given the way the generalized Bernoulli numbers are defined, it comes about that they occur in simple expressions for special values of Dirichlet L-functions, namely
L(1-k, χ) = - B_{k,χ} / k for k ≥ 1

There are also slightly more complicated formulas for L(s, χ) at positive integral values of s.
But why is any of this interesting or important? To answer that, we need to say a little more about Bernoulli numbers.
For starters, how do we know that Bernoulli numbers are rational? The answer is that they satisfy a simple "recurrence relation" as a direct result of their generating function. Let C(n,k) denote the number of combinations of k objects that can be selected from a set of n ≥ k objects. We have C(n,k) = n! / (k!(n-k)!). The C(n,k) are integers and are also known as binomial coefficients, because they occur in the expansion of binomials:
(1 + x)^{n} = ∑_{0≤k≤n} C(n,k)x^{k}

Given that, it can be shown from the generating function that
∑_{0≤k≤n} C(n+1,k)B_{k} = 0 for n ≥ 1

Thus B_{n} is expressible in terms of B_{k} for k < n using only rational numbers. Since B_{0} = 1 is rational, it follows (by induction) that all B_{n} are.
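The recurrence is easy to carry out in exact rational arithmetic, which makes the rationality of the B_{n} concrete. Here is a minimal Python sketch (function names ours; note that this recurrence yields B_{1} = -1/2, the convention in which only the sign of B_{1} differs from the generating function given earlier):

```python
from fractions import Fraction
from math import comb, factorial, pi

def bernoulli(n_max):
    """B_0..B_{n_max} from sum_{0<=k<=n} C(n+1, k) B_k = 0 (n >= 1), B_0 = 1.
    Every step uses only rational arithmetic, so each B_n is rational."""
    B = [Fraction(1)]
    for n in range(1, n_max + 1):
        B.append(-sum(comb(n + 1, k) * B[k] for k in range(n)) / (n + 1))
    return B

B = bernoulli(8)
assert B[1] == Fraction(-1, 2)                     # sign depends on convention
assert B[2] == Fraction(1, 6) and B[4] == Fraction(-1, 30)
assert B[3] == B[5] == B[7] == 0                   # odd B_n vanish past B_1

# Cross-check against zeta(2k) = (-1)^(k+1) B_{2k} (2 pi)^(2k) / (2 (2k)!):
zeta_2 = float(B[2]) * (2 * pi) ** 2 / (2 * factorial(2))  # the case k = 1
assert abs(zeta_2 - pi ** 2 / 6) < 1e-12
```

The final check recovers Euler's ζ(2) = π²/6 from the formula for ζ(2k) quoted above.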
There is a more general generating function which can be used to define what are called Bernoulli polynomials:
xe^{tx} / (e^{x} - 1) = ∑_{0≤n<∞} B_{n}(t)x^{n}/n!

Then the B_{n}(t) are polynomials in the indeterminate t whose coefficients involve Bernoulli numbers:
B_{n}(t) = ∑_{0≤k≤n} C(n,k)B_{k}t^{n-k}

There are also analogous generalized Bernoulli polynomials B_{n,χ}(t) associated with Dirichlet characters χ, defined by generating functions:
∑_{1≤a≤m} χ(a)xe^{(a+t)x} / (e^{mx} - 1) = ∑_{0≤n<∞} B_{n,χ}(t)x^{n}/n!

Formal manipulation of the generating functions allows us to deduce an expression for B_{n,χ}(t) in terms of B_{n}(t):
B_{n,χ}(t) = m^{n-1}∑_{1≤a≤m} χ(a)B_{n}((t+a)/m)

Since the constant terms of the polynomials B_{n}(t) and B_{n,χ}(t) are just B_{n} and B_{n,χ} (respectively), setting t=0 in the above relation gives an expression for B_{n,χ} in terms of B_{n}:
B_{n,χ} = m^{n-1}∑_{1≤a≤m} χ(a)B_{n}(a/m)

Since B_{n}(a/m) is a rational number and χ(a) is algebraic (a root of unity, in fact), it follows that the B_{n,χ} are algebraic numbers lying in a "cyclotomic" extension of Q, i.e. an extension field that is generated over Q by roots of unity. And hence the same is true for values of L(s, χ) when s is a negative integer.
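This last formula gives a practical way to compute generalized Bernoulli numbers exactly. The sketch below (names ours, using the B_{1} = -1/2 convention implicit in B_{1}(t) = t - 1/2) computes B_{n,χ} for the nontrivial character modulo 4 and recovers the known value L(0, χ) = 1/2 via L(1-k, χ) = -B_{k,χ}/k:

```python
from fractions import Fraction
from math import comb

def bernoulli(n_max):
    """Ordinary Bernoulli numbers via the standard recurrence (B_1 = -1/2)."""
    B = [Fraction(1)]
    for n in range(1, n_max + 1):
        B.append(-sum(comb(n + 1, k) * B[k] for k in range(n)) / (n + 1))
    return B

def bernoulli_poly(n, t, B):
    """B_n(t) = sum_{0<=k<=n} C(n, k) B_k t^(n-k), with t an exact Fraction."""
    return sum(comb(n, k) * B[k] * t ** (n - k) for k in range(n + 1))

def gen_bernoulli(n, chi, m, B):
    """B_{n,chi} = m^(n-1) sum_{1<=a<=m} chi(a) B_n(a/m)."""
    return Fraction(m) ** (n - 1) * sum(
        chi(a) * bernoulli_poly(n, Fraction(a, m), B) for a in range(1, m + 1))

def chi4(n):
    """The nontrivial (and primitive) character modulo 4."""
    return 0 if n % 2 == 0 else (1 if n % 4 == 1 else -1)

B = bernoulli(4)
# L(1-k, chi) = -B_{k,chi} / k with k = 1 gives L(0, chi) = 1/2,
# the known value for this character.
assert gen_bernoulli(1, chi4, 4, B) == Fraction(-1, 2)
```

Note that the result is an exact rational number, consistent with the remark above that these values lie in a cyclotomic extension of Q (here the character values are ±1, so the values are in fact rational).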
Just one more new definition. Let
ζ(s,x) = ∑_{0≤n<∞} (n+x)^{-s} for real x such that 0 < x ≤ 1

This is called a "partial" zeta function or a Hurwitz zeta function (after Adolf Hurwitz). For x = 1, it is the Riemann zeta function: ζ(s, 1) = ζ(s). It is not quite a Dirichlet series in general, but if x = p/q is rational, then
ζ(s, p/q) = q^{s} ∑_{0≤n<∞} (nq + p)^{-s}

is an exponential function times a Dirichlet series. The values of ζ(s,x) when s is a negative integer can be expressed in terms of Bernoulli polynomials:
ζ(-k, x) = -B_{k+1}(x) / (k + 1)

And since (from the generating function) one has B_{n}(1) = B_{n}, setting x = 1 in this last formula gives the previously obtained value of ζ(-k).
Finally, one can write L(s, χ) as a combination of certain partial zeta functions:
L(s, χ) = m^{-s} ∑_{0<a<m} χ(a)ζ(s, a/m)

What all this demonstrates is how closely tied the Bernoulli numbers are to the simpler zeta and L-functions. What about some of the more general L-functions, such as Artin's?
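Before turning to that, the decomposition of L(s, χ) into partial zeta functions can be checked numerically (an illustration we add here, with names of our own) for the character modulo 4 at s = 2:

```python
def chi4(n):
    """Nontrivial Dirichlet character modulo 4."""
    return 0 if n % 2 == 0 else (1 if n % 4 == 1 else -1)

def hurwitz_zeta(s, x, terms=100_000):
    """Truncated Hurwitz zeta function sum_{n>=0} (n + x)^(-s)."""
    return sum((n + x) ** -s for n in range(terms))

def L_direct(s, terms=100_000):
    """Truncated Dirichlet series for L(s, chi4)."""
    return sum(chi4(n) / n ** s for n in range(1, terms + 1))

m, s = 4, 2.0
# L(s, chi) = m^(-s) * sum_{0<a<m} chi(a) * zeta(s, a/m)
L_decomposed = m ** -s * sum(chi4(a) * hurwitz_zeta(s, a / m)
                             for a in range(1, m))
assert abs(L_decomposed - L_direct(s)) < 1e-4
```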
It was discovered by Armand Borel that the values of Riemann's zeta function at positive odd integers are tied up with an abstruse area of contemporary mathematics called algebraic K-theory and certain real numbers called "regulators". More recently other mathematicians (Spencer Bloch, Kazuya Kato) have found a complete description of values of ζ(2k-1) using a theory of "Tamagawa measure". Generalizations of some of these results for other L-functions are given by some comprehensive conjectures of A. A. Beilinson.
But the curious properties of Bernoulli numbers don't end there. It has long been known that they are also intimately involved with the famous (and now proven) Fermat's Last Theorem. In fact, one of the few strong results known prior to Andrew Wiles' work was that FLT is true for "regular" primes p. More explicitly, if p is a regular prime, then x^{p} + y^{p} = z^{p} has no nontrivial integer solutions.
A regular prime is defined as one which does not divide the class number of the p-th cyclotomic field Q(ζ_{p}). This is the extension of Q generated by p-th roots of unity, which have the general form
ζ_{p} = e^{2kπi/p} = cos(2kπ/p) + i sin(2kπ/p) for 0 ≤ k < p(This use of the letter ζ is conventional but is unrelated to the zeta function.)
The class number of Q(ζ_{p}) is the number of "ideal classes" in the ideal class group of the field. As mentioned before, this is a measure of how far the ideals in the ring of integers of the field are from being principal ideals. The whole theory of ideals in a ring of integers was developed by Ernst Kummer (1810-93) in order to attack the Fermat problem. Kummer didn't solve the problem completely, but it was he who solved it for regular primes. Unfortunately, he couldn't even show that there are infinitely many regular primes -- and this is still an open question. Oddly enough, even though computations show that regular primes seem to be in the majority, it has been possible to show there are infinitely many primes which aren't regular.
Kummer was able to show that a prime p is regular if and only if it does not divide any of the numerators of B_{2}, ..., B_{p-3}. This is not an easy test to apply directly, since the numerators of B_{n} become very large very quickly. However, it was possible to determine that the only primes less than 100 which aren't regular are 37, 59, and 67.
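Kummer's criterion is easy to try by machine for small primes, since exact rational arithmetic sidesteps the huge numerators. The following sketch (names ours) recovers the irregular primes below 100:

```python
from fractions import Fraction
from math import comb

def bernoulli(n_max):
    """Ordinary Bernoulli numbers by the standard recurrence, exactly."""
    B = [Fraction(1)]
    for n in range(1, n_max + 1):
        B.append(-sum(comb(n + 1, k) * B[k] for k in range(n)) / (n + 1))
    return B

def is_regular(p, B):
    """Kummer's criterion: an odd prime p is regular iff it divides
    no numerator among B_2, B_4, ..., B_{p-3}."""
    return all(B[k].numerator % p != 0 for k in range(2, p - 2, 2))

B = bernoulli(94)  # enough for every odd prime p < 100 (p - 3 <= 94)
primes = [p for p in range(3, 100)
          if all(p % d != 0 for d in range(2, int(p ** 0.5) + 1))]
irregular = [p for p in primes if not is_regular(p, B)]
assert irregular == [37, 59, 67]
```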
How do Bernoulli numbers get into the act here? Without getting into the details of Kummer's work, let's just say that the deep reasons involve the fact that the class number of a field is closely connected with special properties of the field's Dedekind zeta function and hence with Dirichlet L-functions.
There's still more to the story of Bernoulli numbers. Much more. They occur, for example, in the theory of modular forms associated with elliptic curves, which we'll touch on a little later. Also, there are hundreds of identities involving the numbers and some close relatives such as "Euler numbers", "Stirling numbers", "symmetric functions", and the binomial coefficients. Most of these properties can be deduced from the generating functions of the various number sequences. Many of these formulas come about because the generating function of the Bernoulli numbers also plays an important role in the result now known as the Atiyah-Singer index formula, which is a generalization of various classical geometric results such as the "Gauss-Bonnet theorem" and the "Riemann-Roch theorem".
There's a lot that could be said about the deep circle of ideas connected with geometry and the Atiyah-Singer formula. It's quite likely there are deeper connections not yet discovered between these ideas and zeta and L-functions. But in terms of present knowledge, we've strayed far enough from the Riemann hypothesis. Nevertheless, there is more to say about it in connection with very important ideas from algebraic geometry.
The basic idea is very simple. Even if we are interested in finding only integral solutions of Diophantine equations, it is often very helpful to consider first the geometric objects defined by those same equations. This is a typical example of attacking a difficult mathematical problem by placing it in a more general context rather than treating it as an isolated special case. Such an approach often succeeds because it helps to look at the forest as a whole instead of the individual trees.
Copyright © 2002 by Charles Daney, All Rights Reserved