Open Questions: Algebraic Number Theory


See also: Number theory -- The Riemann hypothesis -- The Langlands program

Introduction

Groups, rings, and fields

Ring theory

Rings and ideals

Ideal class groups

Class field theory

Zeta functions and L-functions

Reciprocity laws

Kronecker's Jugendtraum

Gauss' quadratic class number problem

Nonabelian class field theory and Artin's conjecture

Recommended references: Web sites

Recommended references: Magazine/journal articles

Recommended references: Books

Introduction

In order to describe what "algebraic number theory" is about, we should start by discussing two things: "algebra" and "numbers".

Algebra

Given that we already have an intuitive concept of what a "number" is, let's take "algebra" first. The word comes from the Arabic al-jebr, because the subject was first studied and written about in something like the modern sense by Arabic-speaking scholars in what is now the Middle East, in the 9th century CE. Although the classical Greeks and various of their predecessors and contemporaries had investigated problems we now call "algebraic", these investigations became known to speakers of European languages not from the classical sources but from Arabic writers. That is why we use a term derived from Arabic.

Muhammad ben Musa al-Khwarizmi seems to have been the first person whose writing uses the term al-jebr. As he used it, the term referred to a technique for solving equations by performing operations, such as addition or multiplication, on both sides of the equation -- just as is taught in first-year high school algebra. al-Khwarizmi, of course, didn't use our modern notation with Roman letters for unknowns and symbols like "+", "×", and "=". Instead, he expressed everything in ordinary words, but in a way equivalent to our modern symbolism.

The word al-jebr itself is a metaphor, as the usual meaning of the word referred to the setting or straightening out of broken bones. The same metaphor exists in Latin and related languages, as in the English words "reduce" and "reduction". Although they now usually refer to making something smaller, the older meaning refers to making something simpler or straighter. The Latin root is the verb ducere, to lead -- hence to re-duce is to lead something back to a simpler from a more convoluted state. In elementary algebra one still talks of "reducing" fractions to lowest terms and simplifying equations.

The essence of the study of algebra, then, is solving or "reducing" equations to the simplest possible form. The emphasis is on finding and describing explicit methods for performing this simplification. Such methods are known as algorithms -- in honor of al-Khwarizmi. Different types of methods can be used. Guessing at solutions, for instance, is a method. One can often, by trying long enough, guess the exact solution of a simple equation. And if one has a guess that is close but not exact, by changing this guess a little one can get a better solution by an iterative process of successive approximation. This is a perfectly acceptable method of "solving" equations for many practical purposes -- so much so that it is the method generally used by computers (where irrational numbers can be specified only approximately anyhow). Some approximation methods are fairly sophisticated, such as "Newton's method" for finding the roots of polynomial equations -- but they're still based essentially on guessing an initial rough answer.

Another method for finding solutions of equations is by means of geometric construction. One can construct geometric figures in which the length of a certain line segment is a solution to some given equation. This works well, for example, when square roots are needed, since the hypotenuse of a right triangle has a length which is the square root of the sum of squares of the other two sides of the triangle. That is, if the lengths of the sides are a, b, and c, then a² + b² = c² and hence c = √(a² + b²). If a and b are whole numbers, so is the sum of their squares. Algorithms for finding the approximate square root of a whole number were known, so c could be computed approximately. However, with a geometric construction, c could be found simply by measuring the length of the right line segment. For future reference, note that an interesting problem is finding two numbers a and b such that for some given number d, d = a² + b². This is because if d is given, finding a and b enables one to find the square root of d by a geometric construction.
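
As a modern aside, the search for such a pair a, b is easy to mechanize. Here is a minimal Python sketch (our own illustration; the function name two_squares is invented here) that hunts for a decomposition d = a² + b² by brute force:

    from math import isqrt

    def two_squares(d):
        """Search for whole numbers a, b with a*a + b*b == d.

        Returns a pair (a, b), or None if d is not a sum of two squares.
        """
        for a in range(isqrt(d) + 1):
            b2 = d - a * a
            b = isqrt(b2)
            if b * b == b2:
                return (a, b)
        return None

    # Example: 13 = 2^2 + 3^2, so sqrt(13) appears as the hypotenuse of
    # a right triangle with legs of lengths 2 and 3.
    print(two_squares(13))   # (2, 3)
    print(two_squares(7))    # None -- not every number is a sum of two squares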

In addition to such approximation and geometric methods, al-Khwarizmi was interested in methods for finding exact solutions by a sequence of steps that could be described in cookbook style -- algorithms. This can be done completely successfully for linear equations of the form ax + b = c (in modern notation) using just the arithmetic operations of addition, subtraction, multiplication, and division. Just as important, the operations can be performed symbolically -- not just on particular numbers, but on symbols that stand for "any number". And so, one seeks to express the solution of a given equation in a symbolic form.

An interesting question is that of when the idea of representing equations in symbolic form arose. It's not easy to answer such a question, in part because symbolic representations were used before their full and considerable utility was recognized. For instance, Greek geometers labeled the lines in their figures with single letters, so it was natural to write what we now recognize as the Pythagorean theorem in the form a² + b² = c². But the importance of this representation was somewhat blurred, since the distinction between a line and the length of a line was not fully appreciated. In fact, although Greeks and other early mathematicians (e. g. in India) used symbolic equations, al-Khwarizmi did not. (Hence it is likely he didn't know of Greek mathematics and much of his work was original, if not always as advanced as that of the Greeks.)

In modern notation, polynomial equations can be classified in terms of the highest power of any variable which occurs in them. We call an equation linear if the highest power is one, because its graph is a straight line. If there is just one variable, such an equation has the most general form ax + b = 0. If the highest power is two, the equation is called "quadratic", and has the form ax² + bx + c = 0. (Why does the Latin prefix quad, usually associated with the number 4, occur here? Simply because the word for "square" is quadra in Latin.) In spite of lacking a symbolic representation of equations, al-Khwarizmi effectively did know the quadratic formula, which says that there are two solutions of the last equation, given by x = (-b ± √(b² - 4ac))/(2a). He also realized that the equation has solutions in terms of "real" numbers at all only if the quantity we now call the discriminant, b² - 4ac, is not negative.
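
In modern terms the recipe is a few lines of code. The Python sketch below (our own illustration; the function name solve_quadratic is invented here) applies the quadratic formula, first checking the discriminant exactly as al-Khwarizmi's criterion requires:

    from math import sqrt

    def solve_quadratic(a, b, c):
        """Solve a*x**2 + b*x + c = 0 over the reals, assuming a != 0.

        Returns a pair of real roots (possibly a repeated root), or None
        when the discriminant is negative and no real solution exists.
        """
        disc = b * b - 4 * a * c          # the discriminant b^2 - 4ac
        if disc < 0:
            return None                   # no "real" solutions
        r = sqrt(disc)
        return ((-b + r) / (2 * a), (-b - r) / (2 * a))

    print(solve_quadratic(1, -3, 2))  # (2.0, 1.0): roots of x^2 - 3x + 2
    print(solve_quadratic(1, 0, 1))   # None: x^2 + 1 = 0 has no real root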

The highest power of an unknown which occurs in a given polynomial equation is known as the degree of the equation. Although al-Khwarizmi doesn't seem to have studied equations of degree 3, called cubic equations, a more famous successor, who lived about 250 years later, did: Omar Khayyam (ca. 1048-1131), a Persian. Like his predecessor, Khayyam did not work with symbolic expressions for the equations. But he was able to produce solutions using geometric constructions (involving conic sections), provided a positive solution exists. He also thought, mistakenly, that such solutions couldn't be found by algebraic (algorithmic) methods of the sort al-Khwarizmi used.

The next substantial advance in solving equations began when scholars in Western Europe began to study and appreciate the work of people like al-Khwarizmi and Khayyam. Most notable among these scholars was Leonardo of Pisa, more commonly known as Fibonacci (ca. 1180-1250). He showed that algorithmic (as opposed to geometric) methods could be used to find solutions of some cubic equations. Fibonacci had a much more obscure contemporary, Jordanus Nemorarius, of whom little is known apart from several books attributed to him on arithmetic, mechanics, geometry, and astronomy. He made a more systematic use of letters to stand for "variable" (not necessarily unknown) quantities in equations, but the importance of this technique was still not widely appreciated.

With the advent of the Renaissance, progress in mathematics began to speed up. One of the first notable names was a German, Regiomontanus (1436-76). Though he produced less original work than others, he was widely read in the classic works of both the Greeks and the Muslim world. In particular, he had studied the Arithmetica of Diophantus of Alexandria (who was active around 250 CE) in the original Greek, and even considered publishing a Latin translation, though he never got around to it. Diophantus was in some respects more advanced than any other mathematician before the Renaissance, and among the problems he considered were what are now called Diophantine equations. The relevance of such problems will be explained shortly.

Somewhat more original than Regiomontanus was a Frenchman, Nicolas Chuquet, who died around 1500. He used expressions involving nested radicals fairly close to the modern style, such as √(14 - √180), to represent solutions of 4th degree equations.

The real breakthrough came in the work of several Italians in the 16th century. In 1545 Girolamo Cardano (1501-76) published explicit algebraic solutions (that is, using arithmetic operations plus extraction of roots) of both cubic and quartic (4th degree) equations. Cardano, however, did not discover the solutions himself. The result for cubics was known before 1541 by Niccolo Tartaglia (ca. 1500-57), though apparently discovered even earlier by Scipione del Ferro (ca. 1465-1526). Cardano admitted he had not discovered the solution, but apparently he did break a promise to Tartaglia to keep the results a secret. (Just as now, precedence in publishing new scientific results was a matter of great prestige.) As for the quartic, Cardano states that the solution was discovered by Ludovico Ferrari (1522-65), though at his [Cardano's] request.

Such rapid progress naturally raised the question of solutions to equations of 5th degree (quintics) and higher, either by algebraic means (using arithmetic operations and radicals) or at least by means of geometric constructions (using only straightedge and compass). Surprisingly, it was proven almost 300 years later that solutions of either sort were not possible in general, i. e. for all cases. This was done independently by two young men, Niels Henrik Abel (1802-29) in 1824 and Évariste Galois (1811-32) in 1832. Galois' result is especially important, as it is based on very novel methods of abstract algebra -- the theory of groups -- and in fact Galois' ideas thoroughly permeate the theory of algebraic numbers we will discuss.

In spite of that astonishing negative result, Carl Friedrich Gauss (1777-1855) had already proven in his doctoral thesis of 1799 that polynomial equations of any degree n must have exactly n solutions in a certain very specific sense. This result was so important it became known as the fundamental theorem of algebra. The exact sense in which that theorem is true is the subject of the other part of this story: "numbers".

Numbers

The most intuitively "natural" sort of numbers are the "counting numbers": 1, 2, 3, etc. The precise mathematical term for them, in fact, is natural numbers. 0 is also considered a natural number, though the concept was invented in India and seems to have been unknown to the ancient Greeks.

The next most "natural" sort of number includes the negatives of all natural numbers. Collectively all natural numbers and their negatives are known as integers. Mathematicians use the symbol Z to denote the set of all integers. The idea of negative numbers seems to have existed in China before 400 CE. The Chinese had specific tools for reckoning with negative quantities (e. g. debts), but they had even less algebra than the Greeks. For their part, the Greeks seem to have had no concept of negative quantities as such. Negative numbers may have made their first appearance in the written record in the work of the Indian mathematician Brahmagupta early in the 7th century CE. He seems to have been the first to use both 0 and negative numbers systematically, and even recognized that a negative number could be the root of a quadratic equation. (For instance, both +2 and -2 are solutions of x² - 4 = 0.) But since it was not easy to see a negative number of tangible things or count with negative numbers on one's fingers, suspicion of them as mere fictions was widespread for centuries in the West. (Just as many today still regard "imaginary" numbers with deep suspicion.)

If the concept of symbolic equations involving unknown quantities had been better understood, negative numbers would have been accepted much more readily. They provide a means, after all, of solving even the simplest equations, such as x+1=0, a first degree equation in which all the coefficients are natural numbers.

The operation of division is the inverse of multiplication, and so the reciprocal of a nonzero number n is 1/n -- 1 divided by n. Negative numbers are merely formed using subtraction (the inverse of addition), since -n = 0-n. So it's curious that the Greeks didn't think of negative numbers, though they, and other ancient people, did embrace fractions readily. One assumes this is because fractions arise naturally in geometry, measurement, commerce, and so forth. Fractions are just ratios of two natural numbers, in the form a/b for positive integers a and b, and so they are called rational numbers. If the numbers involved were allowed to be negative as well, the rational numbers too could be negative. Mathematicians use Q for the set of all such rational numbers. If the Greeks had been more capable of thinking abstractly in terms of solutions to equations, it would have been easy to define rational numbers as possible solutions to any linear equation of the form ax+b=0, where a (≠ 0) and b are integers.

Geometry was the most developed form of mathematics in ancient Greece, so it was natural to think of numbers (apart from simple counting) as the lengths of lines, areas of circles, volumes of solids, etc. In other words, it was easy to perceive that arithmetic rules of working with counting numbers behaved in the same way as rules for adding and subtracting the lengths of lines, or computing areas and volumes by multiplication and division. It looked as though, perhaps, all numbers of any consequence should be rational. It thus came as a shocking revelation to the classical Greeks that there were "numbers" that could occur as lengths of lines in a geometric figure which could not be rational numbers. A proof of this was discovered by followers of Pythagoras, specifically that the length of the diagonal of a square whose sides had 1 unit of length could not be a rational number. In modern notation this length is simply √2.

The proof that √2 is not rational is simple. Suppose it were rational. Then √2 = a/b for natural numbers a and b. Hence a² = 2b². We may suppose that the fraction is in lowest terms, so that a and b have no whole number factors in common. (Otherwise, just divide those out.) a² is clearly an even whole number, so a must be even also. (If 2 divides a², it has to divide a as well, by the rule of unique factorization into prime numbers, also known as the fundamental theorem of arithmetic. As we shall see, this can be proven fairly easily.) So a is divisible by 2; say a = 2A. Then 4A² = 2b², hence 2A² = b². It follows that b² is even, so b must be also. But that is a contradiction, since we could assume a and b had no factor in common. This contradiction means that the original assumption that √2 was rational must be false. QED.
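
For readers who like to experiment, here is a tiny brute-force check (purely our own illustration; it proves nothing that the argument above doesn't already prove) confirming that a² = 2b² has no solution in small positive integers:

    # Exhaustive search for a counterexample to the proof above: a pair of
    # positive integers with a^2 = 2*b^2.  None exists, in this range or any other.
    found = [(a, b)
             for b in range(1, 200)
             for a in range(1, 300)
             if a * a == 2 * b * b]
    print(found)   # [] -- empty, as the proof guarantees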

This was so shocking to its discoverers that everyone who learned of it was pledged to secrecy. After all, "not rational", or "irrational" meant to the Greeks (just as in English) "unreasonable". This linguistic fluke suggested that the whole field of endeavor of Greek mathematicians was deeply flawed, so it would be devastating to their prestige if this notion became widely known.

In truth, there is nothing inherently contradictory or unreasonable about "irrational" numbers. They simply are not ratios of integers, but they can occur as solutions of polynomial equations with rational coefficients: for example, ±√2, which are solutions of x² - 2 = 0. Numbers of this sort are called algebraic numbers, for obvious reasons. This class of algebraic numbers is the principal subject dealt with in algebraic number theory.

Algebraic numbers clearly exist, since the length of the diagonal of a unit square is certainly a meaningful concept. We've just seen the proof that some algebraic numbers are not rational. What are they then? In some sense, answering this question is what the subject of algebraic number theory is largely about. The theory attempts to say what they are in terms of mathematical properties they have. We will be spending most of our time on this issue.

Before we dive into that, let's look at the broader context. Recall the result Gauss proved in his thesis, the fundamental theorem of algebra. This theorem is about the roots of a polynomial equation of the form

aₙxⁿ + aₙ₋₁xⁿ⁻¹ + ... + a₁x + a₀ = 0
where n is a positive integer, x is an "unknown", aₙ ≠ 0, and for 0≤j≤n all aⱼ are rational (symbolically, aⱼ∈Q). Such an equation, as we noted, is said to be of degree n. This can be simplified a little, because if aₙ ≠ 0, then we can divide both sides of the equation by it, without affecting any of the solutions of the equation (known as roots), and therefore assume that the coefficient of xⁿ is 1. The polynomial in such an equation is said to be monic. For simplicity, we often write the polynomial part of an equation as f(x) so that the equation is f(x) = 0.

What Gauss actually proved is that the polynomial in this equation factors completely into polynomials which have degree at most 2 -- even if the coefficients are any real numbers, not necessarily rationals. Intuitively, a real number is one that can be represented in decimal form as a whole number plus a fractional part that is an infinite series of decimal digits. The decimal digits in the fractional part may or may not repeat in groups, from a certain point on. (For instance, .000123123123... is a repeating decimal, while the fractional parts of the numbers √2 and π never repeat.) It is not hard to show that a number is rational if, and only if, its fractional part is a repeating decimal. Thus the rational numbers form a subset of all real numbers. The set of all real numbers is denoted by R.

Irrational numbers such as √2 are examples of real numbers which are not rational. However, √2 is an algebraic number because it is a root of x² - 2 = 0. Yet not all real numbers are algebraic. In fact, there are vastly more real numbers than there are algebraic numbers. In some sense, there are just as many rational numbers (or even integers) as there are algebraic numbers, because the set of all algebraic numbers can be put into a 1-to-1 correspondence with either Z or Q. (Algebraic numbers may actually be "complex", as will be discussed shortly, but for now just think about real algebraic numbers.) All of these sets are subsets of R which are strictly smaller than R, because they cannot be put into 1-to-1 correspondence with R. (The argument is simple. If A is the set of all (real) algebraic numbers, it has a 1:1 correspondence with the positive integers, which means one can, in principle, write down all members of A in some order. Now we can define a new number r as the number whose nth decimal digit is one more than the corresponding digit of the nth member of A in the list (or 0 if that digit is 9). r is therefore a real number which cannot appear anywhere in the list, since it differs from every one of them in at least one place. So the supposed list of all real algebraic numbers cannot contain every real number, and hence R cannot be put into 1-to-1 correspondence with A.)

A real number which is not algebraic is said to be transcendental. Curiously, even though there is a vast quantity of transcendental numbers, it is quite difficult to prove that specific numbers, such as π, are transcendental. In fact, it was not until 1873 that a "familiar" number (e, the base of the natural logarithms in this case) was shown to be transcendental, by Charles Hermite. π was shown to be transcendental in 1882, by C. F. Lindemann.

Let's return to Gauss' fundamental theorem of algebra. It is now possible to prove something more general than what Gauss showed. Namely, we can work in any algebraic structure called a field. We'll explain more carefully in the next section what that structure is, but for now just accept that it is any system of objects (like numbers) that allows the arithmetic operations of addition, subtraction, multiplication, and division (except division by 0) following rules just like those of rational or real numbers. In this case, let F be any field, and f(x) be a monic polynomial with coefficients in F. Then it is possible to construct a slightly larger field E that contains F by adding certain new elements which are defined by simple polynomial relations. For instance, if F=Q, we can add or adjoin another element which we will denote by α and which has the property that α²=2. This new field, which we denote by F(α), consists of all possible sums and differences of α with elements of F, as well as all products and quotients of such expressions. There are standard ways to do this construction rigorously and to prove that the result E=F(α) is a field, called an extension field. F is said to be a subfield of E. (Note that as sets, F⊆E.) You may think of the extension E, if you wish, as a collection of formal expressions of sums, differences, products, and quotients involving α and elements of F, always simplified by using the relationship α²=2 whenever possible. That is, always replace α² by 2 whenever it occurs.

Given all this, it can be shown that there is one root of f(x)=0 in some extension field of the field F that contains the coefficients of f(x). Call this root α, so that f(α)=0. With polynomials, there is a process very much like long division of integers which allows one to compute the quotient of f(x) divided by x-α, yielding another polynomial g(x) = f(x)/(x-α). This algorithm guarantees the coefficients of g(x) are in E=F(α) if the coefficients of f(x) are. (In particular, if the coefficients are actually in F.) Consequently, f(x) = (x-α)g(x), where g(x) is monic and has degree exactly one less than that of f(x). We can repeat this process with g(x), and so after exactly n steps, we will arrive at a complete factorization of f(x) into linear factors with coefficients (the constant terms) that are in some extension field of F. We might have to adjoin n different symbols (the roots of f(x)), but at least it can be done. (In fact, it can be shown there is a single additional element θ, called a primitive element, or a generator of the field, which is the only element that needs to be adjoined to F to produce an extension field E=F(θ) in which f(x) splits into linear factors. In other words, this field E contains all the roots of f(x)=0.)
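
The division step can be sketched in a few lines of Python. This is just our own illustration of the idea (synthetic division of a monic polynomial by x - α), cheating a bit by using a floating-point approximation to the root rather than a formal field element:

    from math import sqrt

    def divide_by_root(coeffs, alpha):
        """Divide a monic polynomial f(x) by (x - alpha) by synthetic division.
        `coeffs` lists the coefficients of f from the highest power down to
        the constant term.  Returns the coefficients of g(x) = f(x)/(x - alpha)
        together with the remainder, which equals f(alpha).
        """
        quotient = [coeffs[0]]
        for c in coeffs[1:]:
            quotient.append(c + alpha * quotient[-1])
        remainder = quotient.pop()      # the last value computed is f(alpha)
        return quotient, remainder

    # f(x) = x^2 - 2 with alpha = sqrt(2): f(x) = (x - sqrt(2)) * (x + sqrt(2))
    g, r = divide_by_root([1, 0, -2], sqrt(2))
    print(g)   # [1, 1.414...], i.e. g(x) = x + sqrt(2)
    print(r)   # 0 (up to rounding), since sqrt(2) is a root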

Note that unlike other sorts of numbers we considered before, the "numbers" in an extension field of Q may be somewhat abstract objects, such as formal expressions. They can't in general be expressed using radicals if the degree of the lowest degree polynomial they satisfy is 5 or more (as Abel and Galois proved). Nevertheless, as long as they are elements of R, i. e. real algebraic numbers, they still "make sense", say, as the length of a geometric object.

The real numbers themselves are rather abstract objects when one tries to construct them rigorously. There are ways to do this other than using decimal expansions. One such method, involving set theory, is called the method of Dedekind cuts, after its inventor Richard Dedekind (1831-1916). (Dedekind's name will come up again, because he was one of the primary contributors to algebraic number theory.) More generally, we can adjoin to Q all possible limits of sequences {aₙ} of rational numbers to form the completion of Q considered as a metric space. We won't attempt to describe these abstract constructions further. The point is that once one goes beyond the field Q of rationals, larger fields consist of objects which are somewhat more abstract -- and to an extent arbitrary, subject only to the rules which define a field.

A perfect example of this is the field of complex numbers, which is obtained by adjoining the element i to the field R of real numbers, subject only to the relation i²=-1. So we can say that i=√(-1). What is i "actually"? It doesn't matter. The only thing one needs to know is i²=-1. This should not be cause for suspicion or skepticism about such imaginary numbers. Their existence is just as secure as any other abstract object of modern mathematics. If we adjoin i to R, the field C = R(i) of complex numbers is what we get.

Another way to describe C is as the set of all "numbers" of the form a+bi with a,b∈R, i. e. C = {a+bi | a,b∈R}. Addition and multiplication are defined on this set by the rules (a+bi)+(c+di) = (a+c)+(b+d)i, and (a+bi)×(c+di) = (ac-bd)+(bc+ad)i. This is very much as if i were an "unknown" symbol like x, except that we always simplify expressions by using the relation i²=-1.

There are other ways to think of this field. For instance, we can take it to be the set of all ordered pairs {(a,b) | a,b∈R} where addition and multiplication are given by (a,b)+(c,d) = (a+c,b+d) and (a,b)×(c,d) = (ac-bd,bc+ad), as suggested by the preceding paragraph. In this notation, it is apparent that C is "nothing but" the Cartesian plane R×R with a peculiar sort of multiplication. (Indeed, topologically, C and R×R are the same.)

One of the requirements of a field is that division by any element of the field except 0 is always possible -- that is, all nonzero elements have a multiplicative inverse. What is 1/(a+bi), the inverse of a+bi? First, we use the notation (a+bi)* = a-bi for the operation of complex conjugation. a-bi is said to be the complex conjugate of a+bi. This is used quite frequently. Next we note that (a+bi)×(a+bi)* = a² + b², a non-negative real number that is 0 if and only if a=b=0. So the square root of this is a real number, and we use the notation |a+bi| = √((a+bi)×(a+bi)*). This is called the norm of the complex number a+bi. It follows that if a+bi≠0, then its inverse is given by 1/(a+bi) = (a+bi)*/|a+bi|².
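
All of these rules are easy to mechanize. Here is a minimal Python sketch (our own; Python's built-in complex type already provides all of this) treating complex numbers as pairs with exactly the operations just described:

    class Cx:
        """Complex numbers built from scratch as pairs (a, b) standing
        for a + b*i, with the rules given in the text above."""
        def __init__(self, a, b):
            self.a, self.b = a, b

        def __add__(self, w):
            return Cx(self.a + w.a, self.b + w.b)

        def __mul__(self, w):
            # (a+bi)(c+di) = (ac - bd) + (bc + ad)i, using i^2 = -1
            return Cx(self.a * w.a - self.b * w.b,
                      self.b * w.a + self.a * w.b)

        def conj(self):
            return Cx(self.a, -self.b)            # (a+bi)* = a - bi

        def inverse(self):
            n2 = self.a ** 2 + self.b ** 2        # |a+bi|^2 = (a+bi)(a+bi)*
            return Cx(self.a / n2, -self.b / n2)  # 1/(a+bi) = (a+bi)*/|a+bi|^2

        def __repr__(self):
            return f"{self.a}+{self.b}i"

    z = Cx(1, 2)
    print(z * z.inverse())   # 1.0+0.0i, confirming the inverse formula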

Just a little more terminology and we can move on. The set of all polynomials in one variable that have coefficients in a field F is denoted by F[x]. A polynomial f(x)∈F[x] is said to be irreducible over F if it has no nonconstant factors of smaller degree belonging to F[x]. An irreducible polynomial is completely analogous to a prime number in the integers. Suppose an element α is a member of some extension E of F. f(x)∈F[x] is said to be a minimal polynomial for α if f(α) = 0 and this is true of no polynomial in F[x] that has degree less than that of f(x). It's easy to show that when f(α)=0, f(x) is a minimal polynomial for α if and only if f(x) is irreducible over F. α is said to have degree n over F if n is the degree of its minimal polynomial. The degree of an extension E⊇F, denoted by [E:F], can be defined in various ways, but what it boils down to is that [E:F] is the degree of a primitive element θ such that E=F(θ). In some ways, a better definition of the degree [E:F] comes about when we regard E as a vector space over F. This is a concept from linear algebra. In these terms, [E:F] is the dimension of E as a vector space over F.
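
These notions are quite computable in practice. As an illustration (a sketch, assuming the sympy library is available), its minimal_polynomial function computes minimal polynomials over Q:

    from sympy import sqrt, minimal_polynomial, Symbol

    x = Symbol('x')

    # sqrt(2) has degree 2 over Q: its minimal polynomial is x^2 - 2.
    print(minimal_polynomial(sqrt(2), x))            # x**2 - 2

    # sqrt(2) + sqrt(3) has degree 4 over Q, so [Q(sqrt(2), sqrt(3)) : Q] = 4.
    print(minimal_polynomial(sqrt(2) + sqrt(3), x))  # x**4 - 10*x**2 + 1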

Given all that, we note that [C:R]=2 and that i is a primitive element for the extension C⊇R. C has the fairly special property of being algebraically closed. This means that any polynomial in C[x] factors completely into linear factors in C[x]. In other words, there are no irreducible polynomials in C[x] having degree more than 1, and all roots of any f(x)∈C[x] actually lie in C itself. These facts follow from Gauss' fundamental theorem of algebra. (C does have extensions of infinite degree, such as the field of rational functions of one variable, but we won't go into that now.)

In order to avoid ambiguity, whenever discussing extension fields of some field F, we always assume the extensions are subfields of some fixed algebraically closed field that contains F. A smallest such field is known as an algebraic closure of F. For instance, C is an algebraic closure of R. Q has an algebraic closure (the field of all algebraic numbers) contained in C that is of infinite degree over Q, but much smaller than C itself. (For instance, the algebraic closure of Q contains no transcendental numbers.)


Groups, rings, and fields

We are now going to return to first principles and state more precisely how the concepts of mathematical objects called groups, rings, and fields are defined. These abstract concepts are absolutely essential to a proper discussion of algebraic number theory.

Groups

We begin with groups. As with most other sorts of algebraic systems, groups are defined abstractly in terms of sets of elements satisfying certain axioms. The axioms for a group are not the simplest that an interesting mathematical system can have -- monoids and semigroups have somewhat weaker axioms. But groups are just about the simplest objects that occur commonly in algebraic number theory.

A group G is a mathematical system consisting of a set of elements and one operation between any two elements of the set. If "∘" denotes the operation, in a group there are three requirements:

  1. the operation should be associative: x∘(y∘z) = (x∘y)∘z for all x, y, z in G;
  2. there should be an identity element "e": e∘x = x∘e = x for all x in G;
  3. every element of G should have an inverse: x∘x⁻¹ = x⁻¹∘x = e.

These axioms can be stated in slightly different ways, but we don't need to get into that.
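
Since all three axioms are finitely checkable for a finite group, they can be verified by brute force. Here is a small Python sketch (our own illustration) checking them for the nonzero residues mod 5 under multiplication:

    from itertools import product

    # The nonzero residues mod 5 under multiplication: a small finite group.
    G = [1, 2, 3, 4]
    op = lambda x, y: (x * y) % 5

    # 1. associativity
    assert all(op(x, op(y, z)) == op(op(x, y), z)
               for x, y, z in product(G, repeat=3))
    # 2. identity element
    e = 1
    assert all(op(e, x) == x == op(x, e) for x in G)
    # 3. inverses
    assert all(any(op(x, y) == e for y in G) for x in G)
    print("the nonzero residues mod 5 form a group under multiplication")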

Note one respect in which these group rules are different from the usual rules of either addition or multiplication in arithmetic: the commutative property x∘y = y∘x is not required for elements of a group, though it might hold for some or even all elements. If it does hold for all group elements, the group is said to be commutative or abelian (after Niels Abel). In the theory of algebraic numbers, whenever groups consist of actual algebraic numbers they will necessarily be commutative, since the rules of arithmetic (both addition and multiplication) still hold. But we will encounter groups that are defined in different ways which definitely won't be commutative. Some of the hardest problems of the theory, in fact, occur in the non-commutative cases.

For a nontrivial example of a commutative group that's important in algebraic number theory, just look at the set of all units, as we discussed in reference to Pell's equation. As you recall, we denoted by Z[√n] the set of numbers of the form a+b√n, where a and b are integers, and n is a positive integer that's not a perfect square. (Z[√n] is in fact a ring, as we'll define the term in a moment.)

Within that set, consider the subset of numbers such that the equation a² - nb² = ±1 holds. In other words, the "norm" of a+b√n, N(a+b√n), as defined by the left hand side of the equation, has the value ±1. We noted that this condition is necessary and sufficient for a number in the subset to have a multiplicative inverse. We called such numbers units of the ring Z[√n]. Note that 0 is not a unit, but 1 is, and that the existence of a multiplicative inverse of any unit makes the set of units into a commutative group under multiplication (with ordinary addition being irrelevant in this group – indeed, the sums and differences of units are not units).
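
As an illustration (our own, and deliberately naive -- serious computations of units use continued fractions), one can search for units of Z[√n] by brute force:

    def units(n, bound):
        """Brute-force search for units a + b*sqrt(n) of Z[sqrt(n)],
        i.e. solutions of a^2 - n*b^2 = +-1 with |a|, |b| <= bound."""
        return [(a, b)
                for a in range(-bound, bound + 1)
                for b in range(-bound, bound + 1)
                if a * a - n * b * b in (1, -1)]

    # For n = 2: the units include 1+sqrt(2) (norm -1) and its powers,
    # e.g. (1+sqrt(2))^2 = 3+2*sqrt(2), which has norm +1.
    print([u for u in units(2, 10) if u[0] > 0 and u[1] >= 0])
    # [(1, 0), (1, 1), (3, 2), (7, 5)]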

Another thing to note is that the requirement for inverse elements is a requirement for a solution to a certain simple equation, and we have seen this in action several times. For instance, the natural numbers N (the nonnegative integers) do not form a group under addition, because there is in general no solution to an equation of the form x+a = 0 with arbitrary a∈N. But if we extend N to the integers Z by "adjoining" all negative numbers, we have in effect simply included all formal solutions of equations x+a=0 for each a∈N and gotten lucky in that the enlarged domain satisfies the group axioms without difficulty.

Very much the same thing happened with respect to the operation of multiplication when we passed from the integers Z to the rationals Q. Again, with respect to multiplication, Z satisfies the group axioms except for the existence of inverses. That is, we are not able to solve the equation xa = 1 for arbitrary a∈Z. In fact, a solution exists only for a = ± 1. But in defining the rational numbers Q in effect we just formally adjoined the inverses (reciprocals) 1/a for each a∈Z (except a=0). The resulting group with the operation of multiplication consists of all nonzero elements of Q. This group is sometimes denoted by Q×.

If we wanted to preserve the additive structure of Z at the same time as providing multiplicative inverses, in order to construct the field Q, we would have needed to be a little subtler. This process is a standard one. It is called constructing a field of fractions, and we will come back to it.

Rings

Let's look at rings next, since they are the next major level up in axiom complexity. A ring is a mathematical system which has two distinct binary operations: "+" and "×", which are intended to be rather like the addition and multiplication of ordinary arithmetic. If R is a ring, then it satisfies the axioms for a commutative group with respect to addition. With respect to multiplication, R must satisfy the associative axiom. Sometimes rings are not required to have a multiplicative identity element, but most in fact do. Inverse elements, however, do not typically exist, even if there is a multiplicative identity element. Addition in a ring is always commutative, but multiplication need not be. If the multiplication is commutative, the ring is a commutative ring. The rings that occur in algebraic number theory are commutative rings if they consist of ordinary algebraic numbers, but a few important cases of rings (matrix rings for example) aren't commutative.

In addition to the requirements on the operations of addition and multiplication separately, they must satisfy a compatibility condition, known as the distributive law of multiplication with respect to addition: a×(b+c) = (a×b)+(a×c) for all a, b, c in R.

If multiplication in R isn't commutative, the same thing must hold for multiplication on the right as well. One consequence of this axiom is that if 0 is the additive identity element, a0=0a=0 for all a∈R.

The integers Z are the most obvious example of a ring. For any field F containing Q, there is also the concept of a ring of integers of the field F. This ring is a direct generalization of Z, and it is one of the central objects of study. One wants to know as much as possible about the structure of such rings, because this knowledge has extensive practical application to the study of Diophantine equations, as we shall see. Elements of such a ring of integers are called, simply, algebraic integers.

In many respects, of groups, rings, and fields, it is rings which are most interesting. They have the complexity due to possessing two operations, but the freedom of a less restrictive set of axioms than fields. This results in many more special situations, though not all of the strong theorems about fields (such as Galois theory) apply to rings.

Fields and Galois theory

Finally we come to fields, about which we already said much in the previous section. A field is simply a ring whose multiplication is commutative, has an identity element, and has multiplicative inverses for all elements except the additive identity element. We've already mentioned several examples of fields, specifically number fields, which are algebraic extensions of finite degree over the rationals Q. (I. e., each element of such a field is an algebraic number in some finite extension of Q.) More exotic examples of fields certainly exist, though, such as finite fields, fields of functions of various kinds, p-adic number fields, and certain other types of local fields. If you go far enough in algebraic number theory, you'll encounter all of these.

The most important set of facts about fields for our purposes lie in what is known as Galois theory. This is the theory developed originally by Évariste Galois to deal (among other things) with the solvability or non-solvability, using radicals, of algebraic equations. It tells us a lot about the structure of field extensions in terms of certain groups – called Galois groups – which are constructed using permutations of roots of a polynomial which determines the extension. (Permutations are 1-to-1 mappings of a set to itself that interchange elements.) A little more precisely, a Galois group consists of automorphisms of a field – i. e. maps (functions) of the field to itself which preserve the field structure. All such automorphisms, it turns out, can be derived from permutations of the roots of a polynomial – under the right conditions.

The importance of Galois theory is that it sketches out some of the "easy" background facts about a given field extension, into which some of the more difficult facts about the algebraic integers of the extension must fit.

Before we proceed, let's review some notations and definitions that will be used frequently. Suppose F is a field. For now, we will assume F is a subset of the complex numbers C, but not necessarily a subset of the real numbers R. If x is an indeterminate (an "unknown"), then F[x] is the set of polynomials in powers of x with coefficients in F. F[x] is obviously a ring. If f(x)∈F[x] is a polynomial, it has degree n if n is the highest power of x in the polynomial. f(x) is monic if the coefficient of its highest power of x is 1. If f(x) has degree n, it is said to be irreducible over F if it is not the product of two (or more) nonconstant polynomials in F[x] having degree less than n.

A complex number α, which is not in F, is algebraic over F if f(α)=0 for some f(x)∈F[x]. f(x) is said to be a minimal polynomial for α over F if f(x) is monic, f(α)=0, and no polynomial g(x) whose degree is less than that of f(x) has g(α)=0. (Note that any polynomial such that f(α)=0 can be made monic without changing its degree.) A minimal polynomial is therefore irreducible over F. F(α) is defined to be the set of all quotients g(α)/h(α) where g(x) and h(x) are in F[x] and h(α)≠0. F(α) is obviously a field, and it is referred to as the field obtained by adjoining α to F.

If E is any field that contains F, such as F(α), the degree of E over F, written [E:F], is the dimension of E as a vector space over F. (Usually this is assumed to be finite, but there are infinite dimensional extensions also.) It is relatively easily proven that if α is algebraic over F and if the minimal polynomial of α has degree n, then [F(α):F]=n. Of course, more than one element can be adjoined to form an extension. For instance, with two elements α and β we write F(α,β), which means (F(α))(β). (Or (F(β))(α) – the order doesn't matter.)

We will frequently need one more important fact. Suppose we have two successive extensions, involving three fields, say D⊇E⊇F. This is called a tower of fields. Then D is a vector space over E, as is E over F. From basic linear algebra, D is also a vector space over F, and vector space dimensions multiply. Consequently, in this situation we have the rule that degrees of field extensions multiply in towers: [D:F]=[D:E][E:F].

Now we're almost ready to define a group, called the Galois group, corresponding to an extension field E⊇F. However, Galois groups can't be properly defined for all field extensions E⊇F. The extension must have a certain property. Here is the problem: The group we want should be a group of permutations on a certain set – the set of all roots of a polynomial equation. But consider this equation: x³ - 2 = 0. One root of this equation is the (real) cube root of 2, 2^(1/3). The other two roots are ω·2^(1/3) and ω²·2^(1/3), where ω = (-1+√(-3))/2. You can check that ω³ = 1 and ω satisfies the second degree equation x² + x + 1 = 0. ω is called a root of unity, a cube root of unity in particular. (Roots of unity, as we'll see, are very important in algebraic number theory.) Now, the extension field E = Q(2^(1/3)) is contained in R, but the other roots of x³ - 2 = 0 are complex, so not in the extension E. This means that it isn't possible to find an automorphism of E which permutes the roots of the equation. Hence we can't have the Galois group we need for an extension like E.

The property of an extension E⊇F that we need to have is that for any polynomial f(x)∈F[x] which is irreducible (has no nontrivial factors) over F, if f(x) has one root in E, then all of its roots are in E, and so f(x) splits completely in E, i. e. f(x) splits into linear (first degree) factors in E. An equivalent condition (as it turns out), though seemingly weaker, is that there be even one irreducible f(x)∈F[x] such that f(x) splits completely in E but in no subfield of E. That is, E must be the smallest field containing F in which the irreducible polynomial f(x)∈F[x] splits completely. E is said to be a splitting field of f(x). The factorization can be written

f(x) = ∏_(1≤i≤n) (x - αᵢ)
with all αᵢ∈E, where n is the degree of f(x). (Remember that we are assuming f(x) is monic.) When this is the case, E is generated over F by adjoining all the roots of f(x) to F. The degree [E:F] need not equal the degree n of f(x), but it always divides n! (for the splitting field of x³ - 2 over Q, as we'll see, it is 6).

An extension that satisfies these conditions is said to be a Galois extension, and it is the kind of extension we need in order to define the Galois group G(E/F). (Sometimes the type of extension just described is called a normal extension, and a further property known as separability is required for a Galois extension. As long as we are dealing with subfields of C, fields are automatically separable, so the concepts of Galois and normal are the same in this case.)

Suppose E⊇F isn't a Galois extension. If E is a proper extension of F (i. e. E≠F), if α∈E but α∉F, and if f(x) is a minimal polynomial for α over F, then the degree [E:F] of the extension is greater than or equal to the degree of f(x). The degrees might not be equal, because all the roots of f(x) must be adjoined to F to obtain a Galois extension, not just a single root. If α is (any) one of the roots, [F(α):F] is equal to the degree of f(x). But this is the degree [E:F] only if α happens to be a primitive element for the extension, so that E=F(α), which needn't be the case.

In the example above with f(x) = x³ - 2, we have E = Q(ω, 2^(1/3)) = Q(ω)(2^(1/3)), [Q(ω):Q] = 2 and [Q(ω, 2^(1/3)):Q(ω)] = 3, so the degree of the splitting field of f(x) over Q is 6, because degrees multiply. Q(2^(1/3))⊇Q is an example of a field extension that is not Galois. But Q(ω, 2^(1/3))⊇Q(ω) is Galois, since f(x) is irreducible over Q(ω) but splits completely in the larger field. Likewise, Q(ω)⊇Q is Galois, and in fact all extensions of degree 2 are Galois. (If f(x)∈Z[x] is a quadratic which is irreducible over Q and has one root in E, then the roots are given by the quadratic formula and involve √d for some d∈Z, so if one is in E, both are.)

We'll come back to this example, but first we'll look at a simpler one to get some idea of how Galois groups work. Consider the two equations x² - 2 = 0 and x² - 3 = 0. The roots of the first are x = ±√2, and the roots of the second are x = ±√3. We will start from the field Q and adjoin one root of each equation. This yields two different fields: E₂ = Q(√2) and E₃ = Q(√3). If we adjoin a root from both equations we get a larger field that contains the others as subfields: E = Q(√2,√3).

Consider the field extension E₂⊇Q first. We use the notation G(E₂/Q) to denote the Galois group of the extension. In this example, call it G₂ for short. We will use Greek letters σ and τ to denote Galois group elements in general. G₂ consists of two elements. One of these is the identity (which we denote by "1") which acts on elements of the field E₂ but (by definition) leaves them unchanged. This can be symbolized as 1(α)=α for all α∈E₂. The action of a Galois group element can be fully determined by how it acts on a generator of the field, meaning √2 in this case. So it is enough to specify that 1(√2) = √2. This Galois group has just one other element σ₂, which is defined by σ₂(√2) = -√2. An important property that a Galois group must satisfy is that the action of all its elements leaves the base field (Q in this case) unchanged. A Galois group is an example of a group that acts on a set – a very important concept in group theory. But there is an additional requirement on Galois groups: each group element must preserve the structure of the field it acts on. In technical terms, it must be a field automorphism. We'll see the importance of this condition very soon.

As you can probably anticipate, the Galois group G₃ = G(E₃/Q) has elements 1 and σ₃ defined by σ₃(√3) = -√3. We can now ask: what is the Galois group of the larger extension E⊇Q? It must contain 1, σ₂ and σ₃. We have to think about how (for instance) σ₂ acts on √3. The clever thing about Galois theory is that it's easy to say what this action should be: σ₂ should leave √3 unchanged: σ₂(√3) = √3. In particular, σ₂(√3) cannot be ±√2. The reason is that σ₂ leaves the coefficients of x² - 3 = 0 unchanged, and because σ₂ is a structure-preserving field automorphism it cannot map something that is a root of that equation (such as √3) to something that is not a root of that equation (±√2).

For any finite group G, the order of the group is the number of distinct elements. We symbolize the order of G by #(G). In Galois theory it is shown that the order of a Galois group is the same as the degree of the corresponding field extension. Symbolically: #(G(E/F))=[E:F]. Basically this is because we can always find a primitive element θ such that E=F(θ), and θ satisfies an equation f(x)=0, where the degree of f(x) is [E:F]. The other n-1 roots of that equation are said to be conjugate roots. We get n automorphisms, the elements of G(E/F), generated from mapping θ to one of its conjugates (or to itself, giving the identity automorphism). Since the degrees of field extensions in towers multiply, so too do the orders of Galois groups in field towers, as long as each extension is Galois. That is, if D⊇E⊇F, where each extension is Galois, then #(G(D/F)) = #(G(D/E))#(G(E/F)). In our example, the degree of the extension is [Q(√2,√3):Q] = [Q(√2,√3):Q(√2)][Q(√2):Q] = 4. So this is also the order of the Galois group G=G(Q(√2,√3)/Q), and therefore we need to find 4 elements.

We've already identified three of the elements (1, σ₂ and σ₃). It's pretty clear that the remaining element must be a product of group elements: τ = σ₂σ₃. The product of Galois group elements is just the composition of the elements, which are field automorphisms (which happen to be derived from permutations on roots of equations), and hence they compose like any other function (or permutation). (Composition is just another term for the function which is the result of applying one function after another.) Because of how σ₂ and σ₃ are defined, it must be the case that τ(√2) = -√2 and τ(√3) = -√3. Since E⊇Q is generated by √2 and √3, and τ is a field automorphism, we can figure out what τ(α) must be for any other α∈E. For instance, τ(√6) = √6, since √6 = √2·√3.

(Remember that we specified σ₂(√3) = √3. You may have been wondering why we didn't just define the action of σ₂ as an element of the full Galois group G = G(E/Q) by σ₂(√3) = -√3. Had we done that, σ₂ would have been what we found as τ, while the τ we got as the product of σ₂ and σ₃ would turn out to be the "old" σ₂, so the only difference would be a relabeling of group elements.)
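
Since each of these four automorphisms is determined by the pair of signs it puts on the generators √2 and √3, the whole group can be tabulated mechanically. In the Python sketch below (the sign-pair encoding is our own illustration), composition is just componentwise multiplication of signs:

    # Encode each automorphism of Q(sqrt(2), sqrt(3)) by the pair of signs
    # it puts on (sqrt(2), sqrt(3)).
    one    = (+1, +1)   # the identity
    sigma2 = (-1, +1)   # sqrt(2) -> -sqrt(2), fixes sqrt(3)
    sigma3 = (+1, -1)   # fixes sqrt(2), sqrt(3) -> -sqrt(3)

    def compose(s, t):
        return (s[0] * t[0], s[1] * t[1])

    tau = compose(sigma2, sigma3)
    print(tau)                       # (-1, -1): tau negates both generators,
                                     # so tau(sqrt(6)) = sqrt(6), as in the text
    print(compose(tau, tau) == one)  # True: every element squares to 1
    print(len({one, sigma2, sigma3, tau}))   # 4 = [Q(sqrt(2), sqrt(3)) : Q]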

For a slightly more complicated example, suppose f(x) = x² + x + 1 and g(x) = x³ - 2, with roots ω and 2^(1/3) respectively, as above. Then in the tower Q(ω, 2^(1/3)) ⊇ Q(ω) ⊇ Q both the extensions are Galois. (We already saw this isn't so with the tower Q(ω, 2^(1/3)) ⊇ Q(2^(1/3)) ⊇ Q – order matters.) So the full extension E = Q(ω, 2^(1/3)) ⊇ Q is Galois. Its Galois group G = G(E/Q) has order 6, because 6 is the degree of the whole extension, since the intermediate extensions are of degree 3 and 2 and the degrees of the extensions multiply.

It turns out to be easy to determine the Galois group of this extension, although there are some tedious calculations needed to verify this. So bear with us a moment here. We can define two automorphisms of E that leave Q fixed, as follows. It suffices to specify them on generators of the field. Let one automorphism σ be defined by σ(ω) = ω² and σ(2^(1/3)) = 2^(1/3). Let the other automorphism τ be defined by τ(2^(1/3)) = ω·2^(1/3) and τ(ω) = ω. σ and τ are defined to leave elements of Q unchanged. For sums and products of elements of E, σ and τ are defined to preserve the field structure, so they really are automorphisms (though, to be rigorous, this should be checked). So σ and τ are elements of the Galois group G = G(E/Q).

We can also see that σ²(ω) = σ(σ(ω)) = σ(ω²) = ω⁴ = ω, because ω³ = 1. So σ² is the identity automorphism. (Note that the exponents on σ and τ refer to repeated composition, not ordinary exponentiation, because composition "is" multiplication in the group G.) If we compute τ² and τ³ in the same way, applied to 2^(1/3), we find that τ²(2^(1/3)) = ω²·2^(1/3), and τ³(2^(1/3)) = 2^(1/3), again because ω³ = 1. Thus τ² isn't the identity automorphism, but τ³ is.

Now let's compute with the composed automorphisms στ and τσ. First, στ(2^(1/3)) = σ(ω·2^(1/3)) = ω²·2^(1/3). However, τσ(2^(1/3)) = τ(2^(1/3)) = ω·2^(1/3). So we have στ ≠ τσ, because ω ≠ ω². Instead, we will find by a similar calculation that στ(2^(1/3)) = ω²·2^(1/3) = τ²σ(2^(1/3)). Hence στ = τ²σ. A little more checking will show that 1 (the identity automorphism), σ, τ, τ², τσ, and στ give a complete list of distinct automorphisms that can be formed from σ and τ. That's just right, because G must be a group of order 6.
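
These composition rules can be checked mechanically. In the Python sketch below (the (j, k) encoding is our own illustration), an automorphism is recorded by its action on the generators: ω ↦ ω^j and 2^(1/3) ↦ ω^k·2^(1/3):

    from itertools import product

    def compose(s, t):
        """Compose two automorphisms (s after t).  An automorphism (j, k)
        sends omega -> omega**j and the cube root r -> omega**k * r."""
        j1, k1 = s
        j2, k2 = t
        return ((j1 * j2) % 3, (j1 * k2 + k1) % 3)

    one   = (1, 0)
    sigma = (2, 0)   # omega -> omega^2, fixes the cube root
    tau   = (1, 1)   # fixes omega, cube root -> omega * cube root

    print(compose(sigma, tau))   # (2, 2)
    print(compose(tau, sigma))   # (2, 1), so sigma.tau != tau.sigma
    tau2 = compose(tau, tau)
    print(compose(sigma, tau) == compose(tau2, sigma))   # True: sigma.tau = tau^2.sigma

    # Generate the group from sigma and tau: exactly 6 elements, as expected.
    G = {one, sigma, tau}
    while True:
        new = {compose(s, t) for s, t in product(G, G)} | G
        if new == G:
            break
        G = new
    print(len(G))   # 6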

In abstract group theory there are only two distinct groups of order 6. (That is, distinct up to an isomorphism, which is a 1-to-1 structure-preserving map between groups that shows they are essentially the "same" group.) One is the cyclic group of order 6, denoted by C₆. This is isomorphic to the direct product of a cyclic group of order 2 and one of order 3, i. e. the group C₂×C₃. However, since στ ≠ τσ, G isn't abelian, so it cannot be C₆, which is abelian. The only other group of order 6 is (up to isomorphism) S₃, the group of permutations of three distinct objects, also known as the symmetric group. (An isomorphic group is the dihedral group D₃, the group of symmetries of an equilateral triangle.) Since this group is the only nonabelian group of order 6, G(E/Q) must be isomorphic to it.

There's a whole lot more that could be said about Galois theory, but that would take up quite a bit of space, and the intention here is only to give a feel for what it is about. The basic idea to take away is this: A great deal is known about abstract groups and their subgroup structure. Galois theory is a way to "map" extensions of fields to groups and their subgroups in such a way that most of the interesting details about the extension are reflected in details about the groups, and vice versa. The group structure is sensitive to relationships among elements in the subextensions of a Galois extension. In Galois theory it is proven that there is a precise correspondence between subextensions and subgroups of the Galois group.

It thus becomes possible to infer facts about field extensions easily from a knowledge of their Galois groups. One example of the power of this method is that it made possible proving facts that had remained mysterious for hundreds of years – for example, the unsolvability by radicals of general polynomial equations of degree 5 or more, and the impossibility of certain geometric constructions by straightedge and compass alone (trisecting angles, for example).

Galois theory is an absolutely indispensable tool in algebraic number theory. It will come up again and again. We will mention other results in the theory when they are needed.

Diophantine equations

As discussed here, Diophantine equations have been a central area of interest in number theory for over two thousand years. The language of groups, rings and fields helps give us some perspective on why Diophantine equations are interesting.

What was pointed out in the discussion of Diophantine equations is that the characteristic feature isn't the form of the equation, but rather the type of solution which is sought. In normal mathematical conversation, a Diophantine equation is, in form, just a polynomial equation f(x,y,z,...)=0 in several variables, usually with integer coefficients. (Equations with rational coefficients can be multiplied by some integer to leave only integer coefficients.)

As the history sketched at the beginning of this page shows, simple equations of this form, with one variable, were studied from the time of al-Khwarizmi in the 9th century onward. The notation and concepts used evolved closer to what we use today, until the 16th century, when the problem was mostly solved for equations of one variable of fourth degree or less, as long as "solutions" were allowed to be what (in modern terminology) are called algebraic numbers that are members of some finite extension of the field Q of rational numbers.

It is a much harder problem if one is required to find solutions to such equations which are actually rational numbers or even integers. Yet, curiously and perversely, this is exactly what Diophantus of Alexandria in the 3rd century CE, and his earlier predecessors and later successors, set out to do in solving a Diophantine equation. In form such equations are polynomials, yet the problem is made harder by adding additional requirements on the solution. Although adding additional conditions to a problem doesn't make the problem harder in all cases (sometimes just the opposite), it certainly does in this case.

The reason for this lies in the nature of the mathematics itself. Mostly this is because it becomes more difficult to determine whether a solution even exists. Note that for solving equations of the fifth or higher degree, it was not even realized until the 19th century that solutions weren't possible if it was required that they consist only of closed expressions involving arithmetic operations and radicals. Adding the further condition that solutions consist only of integers or rationals makes matters even more difficult, since one needs to determine conditions under which solutions even exist, depending on the degree of the equation, number of variables, nature of the coefficients, etc.

Yet this is precisely the swamp into which Diophantus and his predecessors and successors unwittingly stumbled. Of course, it's understandable why this happened. The problems that Diophantus et al. wanted to solve weren't formulated in modern notation. They weren't formulated in "algebraic" notation at all, which evolved only very gradually. Instead, they arose as "word problems". And they called for solutions in integers, or at worst rational fractions, because more sophisticated concepts of number hardly even existed at the time. Indeed, even rational numbers were cumbersome to work with or think about (to say nothing of negative numbers), while "irrational" numbers were still considered to be something of a scandal.

As a result of the work of Abel and Galois and others early in the 19th century, the problems of solving equations which were actually tractable finally became fairly well understood. So, at last, the time was right to attack the older problems, which were in fact much harder -- and therefore more interesting.

The discussion which follows is about the machinery which was invented in order to get a better grip on these harder problems. The problems are by no means fully conquered even today -- there remain many quite difficult open questions which have their roots in these seemingly simple problems about equations.


Ring theory

The key to this whole enterprise is finding the right abstraction to work with. We now know that in some sense Galois theory provides a good description (so far as one can be given) of the nature of solutions of one polynomial equation in one variable with rational coefficients. What we might hope for is finding solutions of such equations in integer or rational numbers. But that's very hard, so the evident strategy to adopt is to seek solutions which are conceptually somewhere between integers and fully general algebraic numbers. If we can deal with some intermediate construct, we can hopefully (with care) manage to bridge the gap.

So let's call the concept we are looking for algebraic integers, and try to figure out how this concept should be defined. Without attempting to psych out what went on in the minds of the mathematicians of the 19th century who pulled this off, let's just try something obvious. An integer may be defined as a solution of a monic polynomial equation of degree one with coefficients in Z, i. e. an equation like x-a=0 for a∈Z.

Yes, this is a circular definition, but it can be generalized to a definition for equations of higher degree that not only isn't circular, but is in fact exactly what we want. So define an algebraic integer as any solution of a monic polynomial equation f(x)=0 whose coefficients are ordinary integers in Z, that is f(x)∈Z[x]. (Clearly, the assumption of a monic polynomial is very important now, since we are no longer at liberty to divide through by the leading coefficient without messing up the assumption on the other coefficients.)

What we have done is simply to limit consideration to polynomials whose coefficients lie in a ring, namely Z, instead of polynomials with coefficients in a field (Q). Moreover, Z and Q are closely related. It turns out that many (but hardly all) other rings like Z can be extended to fields like Q in exactly the same way -- Q is the field of fractions corresponding to the ring Z. Although this construction is familiar and obvious, there are subtleties about it which will motivate us to look at some interesting features of the theory of rings in general.

Let's look at how the construction of fractions actually works. What it really comes down to is finding a solution to the simplest possible Diophantine equation: ax-b=0 for a,b∈Z. Doing this is something like "solving" the equation x^2+1=0. Essentially, one just gives a name to the solution (x=i in the latter case) and shows that the introduction of this symbol with the property that i^2=-1 does not bring along any inconsistency. One way to do this is to construct some sort of mathematical object which actually has the desired property. In the case of x^2+1=0, the construction can be done in a very general way by forming the quotient ring R[x]/(x^2+1), noting that it is a field which contains R, and has an element which satisfies the given relation. (We'll explain this more fully a bit later.)

Now, as regards the equation ax-b=0, we can give the name "b/a" to its solution; more generally, a/b names the solution of bx-a=0. The fraction a/b is constructed as follows: we start from the set of all ordered pairs of integers for which the second element isn't 0: {(a,b)∈Z×Z | b≠0}, where the pair (a,b) is to be thought of as a/b. We want this to be a field, so there must be rules of addition and multiplication for the pairs. Multiplication is easily defined as (a,b)×(c,d)=(ac,bd). Addition is a little trickier: we define (a,b)+(c,d)=(ad+bc,bd). (That being the way one actually adds a/b and c/d.) It's easy to check that the given set with these rules satisfies the axioms for a field. In particular, the multiplicative inverse of (a,b) is (b,a) (assuming a≠0), and the additive inverse of (a,b) is (-a,b). (One subtlety is that we have to regard (a,b) as identical to (a′,b′) in case ab′=a′b -- or, equivalently, to limit the original set of ordered pairs to those where the two elements have no common divisor other than ±1 and the second element of the pair is a positive integer, and to maintain that condition by always removing common factors after addition or multiplication.)
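To make the construction concrete, here is a minimal Python sketch of the pair arithmetic just described (the function names are merely illustrative); pairs are kept normalized, i. e. in lowest terms with positive second element:

    from math import gcd

    def normalize(a, b):
        # put the pair (a, b), with b != 0, into lowest terms with b > 0
        if b < 0:
            a, b = -a, -b
        g = gcd(abs(a), b)
        return (a // g, b // g)

    def add(x, y):
        (a, b), (c, d) = x, y
        return normalize(a*d + b*c, b*d)    # a/b + c/d = (ad + bc)/(bd)

    def mul(x, y):
        (a, b), (c, d) = x, y
        return normalize(a*c, b*d)          # (a/b)(c/d) = (ac)/(bd)

    print(add((1, 2), (1, 3)))   # (5, 6), i. e. 1/2 + 1/3 = 5/6
    print(mul((2, 3), (3, 4)))   # (1, 2), i. e. 2/3 * 3/4 = 1/2

(Python's fractions.Fraction is, of course, a production version of exactly this construction.)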

This construction can actually be done for many, but not all, other commutative rings besides the integers Z. There is a certain property Z has which is not true of all rings. Namely, for a,b∈Z the product ab=0 if and only if at least one of a and b is 0. Although that is clearly true in Z, there are reasonable examples of rings where it fails. We'll give such an example very soon. The property that ab≠0 whenever both factors are not 0 is so important that a commutative ring with a multiplicative identity having this property is given a special name: a domain (or sometimes, integral domain). A ring clearly needs this property to perform the construction explained in the preceding paragraph.

But that property is sufficient. The indicated construction can be done for any integral domain A, and the field that results is called the field of fractions of the domain A. Any field is also a ring, and a domain A is naturally a subring of its field of fractions by means of the correspondence a→(a,1) for all a∈A. (This is why one requires a domain A to contain a multiplicative identity element.) A field is a domain, because if we have ab=0, and one of the factors is not 0, then multiplying by its inverse would imply that the other factor is 0. (Multiplication is always commutative in a field.)

Another standard example of a ring is a ring A[x] of polynomials over an integral domain A: A[x] consists of all finite sums of the form ∑_{0≤k≤n} a_k x^k, where all a_k∈A. The "x" is just a formal symbol. The rules of addition and multiplication in A[x] follow from assuming ax=xa for a∈A and applying the distributive law as necessary. Since A[x] is then itself a domain, a field of fractions can be constructed. It consists of rational functions, which are just quotients of polynomials f(x)/g(x), where g(x)≠0.

There is one more example, which is the main motivation for considering rings at all in this subject. If F is a field which is a finite extension of Q (that is, has finite degree over Q), F is called a number field or algebraic number field, because all of its members are algebraic numbers as defined previously (roots of some polynomial f(x)∈Q[x]). Let O (or OF or O(F/Q) when we want to be explicit) be the set of all elements of F that are algebraic integers as defined at the beginning of this section, namely roots of monic polynomials f(x) with integer coefficients, i. e. monic f(x)∈Z[x]. Then in fact O is a ring, called the ring of integers of the extension F⊇Q. Indeed, O is a domain. Addition and multiplication in O come from the corresponding operations in the field F. The only tricky part is showing that the sum and product of algebraic integers also satisfy monic polynomials in Z[x]. Trivially Z is a subring of any such O. And as you would expect, F turns out to be the field of fractions of the domain O.

For all these reasons, rings of integers are the natural generalization of the rational integers Z, which is a subring of any ring of algebraic integers. It is fair to say that the main concern of algebraic number theory is determining properties of such rings OF for algebraic number fields F. This is important, because there are properties that Z has which general rings of integers do not have.

Ideals, quotient rings

In order to be more explicit about such properties, we need to introduce a few more concepts of ring theory. One concept that just about everything else depends on is that of an ideal. This can be defined for an arbitrary ring R, but it's a little messy. If R isn't commutative, there can be ideals which are "right" ideals but not "left" ideals or vice versa, because the definition of an ideal involves multiplication. So we'll assume R is commutative and has a multiplicative identity element ("1") too. For such a ring R, consider a nonempty subset I⊆R. Then I is an ideal just in case:

  1. a+b∈I for all a,b∈I
  2. ra∈I for all r∈R and a∈I
These axioms imply that I is a subgroup of R under addition. The additive identity 0 is in I by the second axiom (take r=0). The second axiom also means that additive inverses are in I: because R has a multiplicative identity, its additive inverse -1 is in R, so -a∈I for all a∈I. Note that if I≠R, I isn't a full ring (so it isn't a subring of R), because if 1∈I we would have I=R by the second axiom.

One of the motivations for this concept of ideals is that it makes possible the definition of another very important concept: quotient rings. If as above R is a ring and I is an ideal, the quotient ring, denoted by R/I, is defined as the set of distinct cosets of the form r+I for all r∈R, where r+I is defined for any r∈ R as the set {r+a | a∈I}. Not all cosets are distinct as r ranges over R. Two cosets r+I and r′+I are the same exactly when r-r′∈I. This is because we can write r+I = r′+(r-r′)+I = r′+I. (Because r+I = I if r∈I.)

Importantly, we can define a ring structure on the set of all cosets. Addition is simple: (r+I)+(r′+I) = (r+r′)+I. Multiplication is a little trickier: (r+I)(r′+I) = rr′+I, but this only works because it doesn't depend on the choice of representative of each coset. The potential problem is that we can have r+I=r′′+I even though r≠r′′, if r-r′′∈I. But in that case, (r-r′′)r′∈I by the second axiom for ideals, so all is OK. Thus multiplication of cosets is well-defined and unambiguous.

Under these definitions of addition and multiplication R/I is a (commutative) ring with a multiplicative identity. The additive identity element is I, and the multiplicative identity is the coset 1+I.

The ideal structure of the rational integers Z provides some important examples. Let n∈Z. Then obviously the set nZ = {nm | m∈Z} is an ideal, often written simply as (n). It is just all integral multiples of n. The quotient ring Z/nZ=Z/(n) is known as the ring of integers modulo n, and it has n elements. It is familiar from elementary number theory, where one writes "equations" such as a≡b (mod n) just in case a-b is divisible by n, i. e. is a multiple of n, i. e. a-b∈(n). The study of such congruences, which is done all the time in number theory, is really just the study of the ring Z/nZ.

A very important special case is when n is a prime p. Then it is a fact that Z/pZ is a field – a finite field of p elements, sometimes denoted by Fp. However, if n is not prime, then Z/nZ isn't even an integral domain, because it has divisors of zero, i. e. nonzero elements whose product is 0. For instance, if n=st for s,t∈Z with s,t≠±1, then the cosets s+(n) and t+(n) are nonzero elements of Z/nZ, yet their product st+(n) = n+(n) is the zero element of Z/nZ. We can in fact characterize prime numbers p as those elements of Z such that Z/pZ is a field.
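These facts are easy to check by machine for any small n. A brief Python sketch (illustrative names; the modular inverse via pow needs Python 3.8 or later):

    from math import gcd

    def units_and_zero_divisors(n):
        # a nonzero class a is a unit iff gcd(a, n) = 1, and a zero divisor
        # iff a*b = 0 (mod n) for some nonzero class b
        units = [a for a in range(1, n) if gcd(a, n) == 1]
        zdivs = [a for a in range(1, n)
                 if any(a * b % n == 0 for b in range(1, n))]
        return units, zdivs

    print(units_and_zero_divisors(7))   # every nonzero class is a unit: Z/7Z is a field
    print(units_and_zero_divisors(6))   # 2, 3, 4 are zero divisors: Z/6Z isn't a domain
    print(pow(3, -1, 7))                # 5, since 3*5 = 15 = 1 (mod 7)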

Unique factorization

We need a little terminology. In any ring, a unit is a ring element that has a multiplicative inverse which is also in the ring. For instance, in Z, 1 and -1 are units, and they are the only units. Other rings of algebraic integers can have many units, and the set of units of the ring forms an abelian group under multiplication. Determining this group of units, in fact, is one of the interesting computational issues in algebraic number theory.

Another important concept is that of a prime element of a ring. A little bit of care is required to define "prime" in a general ring -- as a first approximation, a prime element is one that has no factors other than itself and units, though we will need to refine this shortly. As far as divisibility and factors are concerned, units are essentially irrelevant, since they are invertible.

One of the most important properties that the integers have as a ring is unique factorization. That is, for any n∈Z other than 0 and ±1, there is a unique way (apart from order and unit factors) to write n as a product of primes.

This fact can be proven using the order properties of Z, i. e. for every pair of positive integers a, b, exactly one of a<b, a=b, or a>b is true. To begin with, this implies that for any pair of positive a,b∈Z, we can write a=qb+r with 0≤q and 0≤r<b. Reason: you can subtract b from a only a nonnegative, finite number of times (q) before the result becomes negative. This is because every number in the sequence a, a-b, a-2b, ... is strictly less than its predecessor, and there are only finitely many distinct nonnegative integers less than a. r is simply the last nonnegative quantity in the sequence, and so 0≤r<b. The numbers q and r are uniquely determined by this procedure, and in fact there is a simple algorithm to find them, as we'll see in a moment.

For any positive integers a,b∈Z, we can define the greatest common divisor of the pair as the largest (positive) integer which divides both, written gcd(a,b), or simply (a,b). It may be, of course, that (a,b)=1, in which case we say a and b are relatively prime. As a matter of notation, if one number m divides another n, so that n=mq for some q∈Z, we write m|n. If this is not the case, then we write m∤n. (a,b) can be defined by the conditions that (a,b)|a, (a,b)|b, and if both c|a and c|b, then c|(a,b).

The Greek mathematician Euclid, known best for his geometry, was interested in number theory also. In addition to proving that there are infinitely many primes, he also gave a simple algorithm for computing the greatest common divisor of two integers without explicitly factoring them – since factoring can be a relatively difficult process for large numbers. The algorithm is called, of course, the Euclidean algorithm.

To apply it, assume (without loss of generality) that a>b and write a=q_1 b+r_1. Here, q_1>0 and 0≤r_1<b. Provided r_1≠0 we can repeat the procedure and write b=q_2 r_1+r_2. We can repeat this procedure as long as the remainder r_k isn't 0. If r_k is the last nonzero remainder, then one notes that (a,b)|r_k, because in fact (a,b) divides every remainder in the process. But we also have r_{k-1}=q_{k+1} r_k, hence r_k|r_{k-1}, and from r_{k-2}=q_k r_{k-1}+r_k we find r_k|r_{k-2} too. If we proceed back all the way we find r_k|b and r_k|a, hence r_k|(a,b). Therefore r_k=(a,b). In other words, (a,b) is the last nonzero remainder in this process.
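In code, the algorithm is just a loop, with the % operator supplying the division-with-remainder step; a minimal Python version:

    def euclid_gcd(a, b):
        # replace (a, b) by (b, a mod b) until the remainder is 0;
        # the last nonzero remainder is gcd(a, b)
        while b != 0:
            a, b = b, a % b
        return a

    print(euclid_gcd(252, 105))   # 21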

But even nicer things are true. Go back to a=q_1 b+r_1, so that r_1=a-q_1 b. Similarly, r_2 = b-q_2 r_1 = b-q_2(a-q_1 b) = Ma+Nb for some integers M and N (not necessarily positive). Proceeding inductively, we have that (a,b)=Ma+Nb for some M,N∈Z. What this says is that a certain Diophantine equation can be solved for unknowns M and N if a and b (and hence (a,b)) are given. Note that if (a,b)>1, the equation d=Ma+Nb cannot be solved if 1≤d<(a,b), because a solution would imply (a,b)|d.
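The back-substitution can be folded into the recursion itself, which gives the "extended" Euclidean algorithm; a short Python sketch (illustrative name):

    def extended_gcd(a, b):
        # returns (g, M, N) with g = gcd(a, b) and g = M*a + N*b
        if b == 0:
            return a, 1, 0
        g, m, n = extended_gcd(b, a % b)
        # g = m*b + n*(a mod b) = n*a + (m - (a//b)*n)*b
        return g, n, m - (a // b) * n

    g, M, N = extended_gcd(252, 105)
    print(g, M * 252 + N * 105)   # both values are 21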

We need one more fact about prime numbers. Suppose p is prime, and p|mn for some m,n∈Z. So by definition, mn=pq for some q∈Z. We claim that p must divide either m or n (perhaps both). For suppose that we don't have p|m, hence (p,m) can't be p. But p is prime, and (p,m)|p, so we must have (p,m)=1. Hence it is possible to write 1=Mp+Nm. Therefore n=n(Mp+Nm)=Mnp+Nmn=Mnp+Npq= p(Mn+Nq). In other words, p|n. This property possessed by primes in Z is not shared by "primes" in other rings of algebraic integers, as we shall soon see.

We now have all the facts we need to prove unique factorization in Z. The proof is done by supposing factorization isn't unique, and showing this leads to a contradiction. So suppose factorization isn't unique, and for some n there are two different factorizations of n (apart from units ±1). There cannot be any prime which occurs in one factorization but not the other, by the result of the preceding paragraph. Hence the same prime factors occur in both, but for at least one prime p we have n=Ap^r=Bp^s with 0<r<s and (A,p)=(B,p)=1. Dividing through by p^r reduces to the case where a prime occurs in one factorization but not the other, which is impossible. The contradiction proves the desired result.

It may seem "obvious" that factorization is unique, because we are so familiar with the fact this is true in Z that it is taken for granted. It may therefore be rather surprising that in many (in fact most) rings of algebraic integers, factorization is not unique. Unique factorization is actually a very special and rare occurrence, and a great deal of algebraic number theory is concerned with either trying to compensate for this "problem", or else trying to describe, in some sense, just how badly factorization fails to be unique.

Some examples

Let's look at an example where unique factorization fails. First, we need to introduce a concept that makes it easy to prove some results about factorization (and has many more applications as well). Suppose we have an algebraic number α∈F and F⊇Q is a Galois extension. (Such an extension always exists: it is a splitting field of an irreducible polynomial f(x) such that f(α)=0, but we don't necessarily assume F is the smallest such extension.) Let G=G(F/Q) be the Galois group.

To review a concept, which has been introduced before, we define the norm of α with respect to the extension F⊇Q to be the product of all numbers σ(α) as σ ranges over elements of G. (More generally, the norm works also for Galois extensions of base fields that contain Q.) Symbolically, the norm is:

N_{F/Q}(α) = ∏_{σ∈G} σ(α)
Note that "the" norm depends on the specific extension F/Q, and so the extension is indicated in the subscript.

For instance, let F=Q(√-5). F/Q is Galois, because it is the splitting field of the irreducible polynomial f(x) = x^2+5. Any α∈F can be written as a+b√-5 for a,b∈Q. An element σ∈G can be specified by how it acts on a typical such element. Of course, since [F:Q]=2, G has only two elements: 1 (the identity) and σ, so σ is determined by how it acts on √-5. σ(√-5) has to be a root of f(x)=0 different from √-5, so it must be -√-5. It follows that σ(a+b√-5)=a-b√-5 for a,b∈Q, because σ is a field automorphism of F that leaves all elements of Q fixed. This should remind you of complex conjugation, because that is in fact the nontrivial element of the Galois group G(Q(i)/Q).

In the simple case at hand, we can give a simple formula for the norm:

N_{F/Q}(a+b√-5) = (a+b√-5)(a-b√-5) = a^2 + 5b^2
(In the field Q(i), the norm of a complex number is the square of the modulus, i. e. |a+bi|^2 = a^2 + b^2, so norm in the sense used here is closely related to the complex "norm".)

The norm symbol has some fairly obvious properties. From the way it is defined as a product, the norm is a group homomorphism from the multiplicative group of F to the multiplicative group of Q. It is also a homomorphism of the multiplicative semigroups of the rings of integers OF and Z, which means that the value of the norm of an algebraic integer is always an integer in the base field (in this case the integers Z of Q). This is because each σ∈G is a field automorphism which satisfies σ(αβ)=σ(α)σ(β) for all α,β∈F. In other words,

N_{F/Q}(αβ) = N_{F/Q}(α) N_{F/Q}(β)
Furthermore, N_{F/Q}(ε)=±1 if and only if ε is an invertible element of OF, i. e. a unit. (Since ±1 are the units of Z.)
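Multiplicativity is easy to spot-check by direct computation in the integers of Q(√-5), representing a+b√-5 as the integer pair (a,b); a small Python sanity check (illustrative names):

    import random

    def mul(x, y):
        # (a + b*v)(c + d*v) = (ac - 5bd) + (ad + bc)v, where v = sqrt(-5), v*v = -5
        (a, b), (c, d) = x, y
        return (a*c - 5*b*d, a*d + b*c)

    def norm(x):
        a, b = x
        return a*a + 5*b*b

    for _ in range(1000):
        x = (random.randint(-50, 50), random.randint(-50, 50))
        y = (random.randint(-50, 50), random.randint(-50, 50))
        assert norm(mul(x, y)) == norm(x) * norm(y)   # N(xy) = N(x)N(y)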

Let's first determine the ring of integers OF. Let α=a+b√-5 be a general element of F, with a,b∈Q. If in fact α is an algebraic integer, then so is its conjugate α*=a-b√-5. Further, the sum α+α*=2a is in Q, and is also an algebraic integer, since the algebraic integers of an extension form a ring. But the only algebraic integers in Q are in fact in Z, so 2a∈Z. Similarly, 2b=(α-α*)/√-5 is an algebraic integer in Q, hence an element of Z, so 2b∈Z. The norm of both α and α* is a^2+5b^2, which is in Z. Multiplying the last expression by 4 shows (2a)^2+5(2b)^2∈4Z. Since 5≡1 (mod 4), this gives (2a)^2+(2b)^2≡0 (mod 4). But any square is congruent to 0 or 1 (mod 4), so this is impossible unless both 2a and 2b are even integers. Hence both a and b are in Z. The conclusion is that OF = {a+b√-5 | a,b∈Z}.

A slight elaboration of this argument shows that in any quadratic field Q(√d) where d∈Z is square-free, algebraic integers have the form OF = {a+b√d | a,b∈Z} unless d≡1 (mod 4), in which case OF = {(a+b√d)/2 | a,b∈Z, a≡b (mod 2)}. In other words, the naive guess that algebraic integers of Q(√d) just have the form a+b√d for a,b∈Z isn't entirely correct, but it is wrong only for d≡1 (mod 4), and then only by a little bit.
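The d≡1 (mod 4) case can be verified directly: ω=(1+√d)/2 satisfies the monic equation x^2 - x - (d-1)/4 = 0, whose coefficients are in Z exactly when d≡1 (mod 4). A quick exact-arithmetic check in Python (pairs (p,q) stand for p+q√d; names illustrative):

    from fractions import Fraction

    def times(x, y, d):
        # (p + q*w)(r + s*w) with w = sqrt(d), so w*w = d
        (p, q), (r, s) = x, y
        return (p*r + d*q*s, p*s + q*r)

    for d in (5, -3, 13, -7):                  # all satisfy d = 1 (mod 4)
        w = (Fraction(1, 2), Fraction(1, 2))   # w = (1 + sqrt(d))/2
        w2 = times(w, w, d)
        c = Fraction(d - 1, 4)                 # an ordinary integer, since d = 1 (mod 4)
        # check that w^2 - w - (d-1)/4 = 0
        assert (w2[0] - w[0] - c, w2[1] - w[1]) == (0, 0)
    print("(1 + sqrt(d))/2 is an algebraic integer in each case")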

At this point, there is one delicate issue of nomenclature we must deal with. You will recall that a prime p∈Z is customarily defined as a (nonzero, nonunit) number which has no divisors other than units (±1) and ±p. We also proved that if p has this property, and if p divides a product mn, then either p divides m or p divides n (or maybe it divides both). In Z we can use this property to define p as a prime, since if the property is true of p then the more familiar condition that p has no nontrivial divisors is also true. This is because if p has this property, then the only divisors of p can be ±1 and ±p. (This follows from order properties of Z, because all divisors of a number n, except for ±n, have absolute values less than |n|.)

So these two properties of a nonzero p∈Z are equivalent. However, as we are about to see, the properties are not equivalent in other rings of integers. Nevertheless, we will find it convenient to use a generalization of the definition that p is prime if and only if p|mn implies p|m or p|n. So we will need a new term for the other property of a nonzero, nonunit p: that it is not divisible by any number except a unit or a unit times p. For this property we will use the term irreducible (like an irreducible polynomial). (When this is the case, we say that p has only "trivial" factors; a nonunit p is irreducible if and only if all its factors are trivial, or equivalently if and only if it has no nontrivial factors.) We will make the same distinction between "prime" and "irreducible" for integers α of other rings of integers.

Finally we can get back to factorization in F=Q(√-5). Observe that 21=3⋅7=(1+2√-5)(1-2√-5). We claim, first, that both 3 and 7 are irreducible in OF. Consider 3 first. If α=a+b√-5 were a nontrivial integral divisor of 3 – i. e. neither α nor 3/α is a unit – then we would have N_{F/Q}(α) = a^2+5b^2 divides N_{F/Q}(3) = 9. (Note, by the way, that for this extension, the norm is always nonnegative.) So N_{F/Q}(α) must be either 3 or 9, since α isn't a unit. Obviously the equation a^2+5b^2=3 has no solutions for a,b∈Z. So N_{F/Q}(α) isn't 3, hence it must be 9. Then N_{F/Q}(3/α)=1, and 3/α is a unit, contrary to assumption. So 3 is irreducible. 7 is also irreducible, by a similar argument. The same kind of argument shows that 1±2√-5 must be irreducible, since both conjugates have norm 21, and any non-unit α that divided either (with non-unit quotient) would have norm 3 or 7, which we just observed is impossible. And we cannot have 1±2√-5 dividing either 3 or 7 (or vice versa), since 21∤9 and 21∤49.
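The whole argument is mechanical enough to check numerically; a brief Python sketch (same pair representation of a+b√-5 as above):

    def norm(a, b):
        # N(a + b*sqrt(-5)) = a^2 + 5*b^2
        return a*a + 5*b*b

    assert 3 * 7 == 21
    assert norm(1, 2) == norm(1, -2) == 21       # (1+2v)(1-2v) = 1 + 20 = 21
    assert norm(3, 0) == 9 and norm(7, 0) == 49

    # no element has norm 3 or 7 (larger a, b only give larger norms),
    # so 3, 7, and 1 +/- 2*sqrt(-5) are all irreducible
    small_norms = {norm(a, b) for a in range(-3, 4) for b in range(-2, 3)}
    assert 3 not in small_norms and 7 not in small_norms
    print("21 has two inequivalent factorizations into irreducibles")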

What we've just shown is that 21 has two factorizations into irreducible numbers of OQ(√-5), and the factorizations are not equivalent, since the irreducible numbers in one factorization aren't unit multiples of either irreducible number in the other factorization. This shows that factorization of elements of the ring OQ(√-5) into irreducible numbers isn't unique.

This example shows that a number which has no non-trivial factors (e. g. 3 or 7) can divide a product (e. g. 21) of two other numbers (e. g. 1±2√-5) without dividing either one of the factors of the product. So an irreducible number is not "prime" in the sense that if it divides a product, it must divide at least one of the factors. This latter property is actually more useful in practice, so we want to use the term "prime" for it. Therefore, a distinction is made in a general ring of integers: a (nonzero, nonunit) number which has no non-trivial factors is said to be irreducible. (Equivalently, if α=βγ then either β or γ must be a unit.) On the other hand, the term prime is reserved for (nonzero, nonunit) numbers α which have the property α|βγ implies α|β or α|γ.

Now, in any ring of integers of an algebraic number field, a prime integer (in the new sense) must also be irreducible. This is because if α is not irreducible, then by definition we can write α=βγ, where neither factor is a unit. But if α is prime it must divide one of its factors. Say α divides β. Then β=αδ, so α=βγ=αδγ, and cancelling α (which is legitimate in a domain) gives 1=δγ. That is, γ is a unit, contrary to assumption. So a prime α has only trivial factors, i. e. it's irreducible. Thus the set of prime integers is a subset of the set of irreducible integers.

However, in the example we just examined where F=Q(√-5), where we do not have unique factorization, then some irreducible numbers are not prime. E. g. 1+2√-5 is irreducible (as we showed), but it is not prime, because it divides 21, but does not divide either 3 or 7. Therefore it's possible for the set of prime integers to be a proper subset of the set of irreducible integers, i. e. a strictly smaller subset.

This raises some interesting questions. Recall that the cases we are interested in are the rings of algebraic integers. By definition, these are the rings of integers A=OF of a finite extension F/Q.

In any such A, it is always true that we can write any element as a product of a finite number of irreducible integers. The reason is that for any (nonzero, nonunit) α∈A, if α isn't irreducible, we can write α=βγ, where neither factor is 0 or a unit. Since F/Q is a finite extension, we can always compute norms, and we have N_{F/Q}(α) = N_{F/Q}(β)N_{F/Q}(γ). In some extensions a norm can be negative, but we can take absolute values throughout, and since neither factor is a unit, each factor on the right hand side is an integer in Z that is strictly smaller in absolute value than |N_{F/Q}(α)|. Since these absolute values strictly decrease, this process of splitting off factors can't continue indefinitely. So not only do we get a finite product of irreducible integers, but we in fact get a finite product of finite powers of distinct irreducible integers. However, as the example above showed, this factorization need not be unique.

On the other hand, we can also say that if some algebraic integer α can be expressed as a product of powers of distinct prime integers, then (up to order and unit factors) the expression is unique as to which primes occur in the factorization and the powers of each that occur. To prove this, note first that any prime π which appears in one factorization into powers of primes must appear in the other. For since π is in one factorization, it divides α, and because it is prime, it must divide at least one factor in any other factorization into powers of distinct primes. That factor must then be a power of π, since π can't divide a power of a different prime (using the fact that all primes are irreducible). Furthermore, if π occurs at all in some factorization of α, it must occur to exactly the same power in each factorization. Otherwise, if the smaller power has exponent n, then we could cancel π^n from both factorizations. That would leave the integer α/π^n with distinct prime power factorizations, one containing π and the other not, which we just ruled out.

The problem here is that we do not know that every integer α∈A actually has even one factorization into powers of distinct primes. Consequently, if there are irreducible integers of A that aren't prime -- so that the set of prime integers is a proper subset of the set of irreducible integers -- we cannot be sure that there is the kind of unique factorization theorem for integers of A that we have for Z, regardless of whether we specify "primes" or "irreducibles". Factorizations into irreducibles can't be guaranteed to be unique, while factorizations into powers of primes might not even exist.

However, if the set of primes is the same as the set of irreducibles, then factorizations of any integer of A into irreducibles, and hence primes, are guaranteed to exist, and furthermore, the factorizations must also be unique.

What about converses? Suppose we can guarantee that factorizations of any α∈A into primes must exist. Does that imply prime = irreducible? Yes, for the following simple reason. We have already shown that if a factorization into primes exists, it must be unique. So suppose α is irreducible. If a factorization into primes must exist, then α=πβ for some prime π. But because α is irreducible, β has to be a unit. Any prime times a unit is still a prime, so α itself is a prime, and the set of irreducible elements contains only primes.

Or suppose we can guarantee that factorizations of any α∈A into irreducibles are unique. Does that imply prime = irreducible? Again the answer is yes. For suppose α is irreducible and that for some β and γ we have α|βγ. Since α|βγ there is a δ such that αδ = βγ. Write δ, β, and γ as products of irreducible numbers, so that (ignoring factors which are units) α δ_1⋅⋅⋅δ_m and the combined factorizations of β and γ give two factorizations of the same number into irreducibles. By the assumed uniqueness, α must be a unit times one of the irreducible factors of β or of γ. In the first case α|β, and in the second α|γ. In any case, this means α is prime.

What we have now proven is this: If A=OF is the ring of integers of a finite extension F/Q, the following conditions are equivalent:

  1. The set of irreducible elements of A is the same as the set of prime elements of A (up to unit factors).
  2. Every element of A has a unique factorization into powers of irreducible elements (up to unit factors).
  3. Every element of A has a unique factorization into powers of prime elements (up to unit factors).
So (as is obvious) if all irreducible elements are prime, the difference in how these are defined is irrelevant. However, if there are irreducible elements that aren't prime, then factorizations of some integers into powers of irreducibles will not be unique, and some integers will not even have a factorization into powers of primes.

It turns out that there are certain types of rings in which all irreducible elements are prime, so the two concepts are equivalent in such rings. Z is one example of such a ring, but it is not the only one. Certainly, if we could guarantee that the integers of some extension of Q always had some factorization into primes (in the special sense used here), then as we showed, the factorizations must be unique. In order to investigate this issue further, we need help from the theory of ideals of rings of integers. By placing certain conditions on the types of ideals that the ring can have, we will be able to guarantee that any irreducible integer is prime, so that factorizations of any integers into irreducibles (which always exist) are also factorizations into primes, and therefore that they are unique.

One important type of ring that has this property is called a principal ideal domain, which means that every ideal consists of all the multiples of some single element of the ring. This is in fact the case with Z, where all ideals are of the form nZ for some n. (The ideal is the full ring Z itself if n=±1.) But there are other rings of integers that are also principal ideal domains, and a large part of algebraic number theory is about identifying which rings have this property. We'll look into this in much more detail, but first we need to explain further why we care about unique factorization.
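For Z this is a consequence of the Bézout identity proved earlier: the ideal generated by any finite set of integers is the principal ideal generated by their gcd. A one-line Python illustration (illustrative name):

    from math import gcd
    from functools import reduce

    def principal_generator(gens):
        # the ideal of Z generated by gens is (g) where g = gcd of the generators:
        # g divides every Z-combination of the gens, and is itself such a combination
        return reduce(gcd, gens)

    print(principal_generator([12, 18, 30]))   # 6, so (12, 18, 30) = (6) in Z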

Why do we care about unique factorization?

One may reasonably ask why this matters so much that we've spent so much time on it. The answer is that even leading mathematicians did not clearly appreciate, until the 1840s, how unique factorization could fail in rings of algebraic integers -- and they sometimes based erroneous or incomplete proofs on the assumption of unique factorization.

The first noteworthy case of this seems to have been Leonhard Euler (1707-83). Number theory was one of a vast number of Euler's interests, and one of the problems he dealt with was what is now known as Fermat's Last Theorem: The equation xn+yn=zn has no nontrivial integer solutions for integers n>2. Pierre de Fermat (1601-65) himself publicly stated this result only in case n=3 or 4. Fermat stated the case for general n only in a private note in his copy of Diophantus' Arithmetica. No written proof of this by Fermat in even the simplest cases is known, but it can easily be proven if n=4 by a technique Fermat invented, called the "method of infinite descent". This method involves assuming a solution exists for some triple (x,y,z) and then showing that there must always be another non-trivial solution (x′,y′,z′) in which at least one of the numbers is strictly smaller than any number of the original solution. Since that's impossible, the contradiction shows there couldn't have been any solution to begin with.

Anyhow, Euler knew the proof when n=4, and set out to give a proof along the same lines when n=3. He had the brilliant and original idea to work with numbers of the form a+b√-3 for a,b∈Z, that is, numbers in Z[√-3]. This was a great idea because it provided an entirely new and powerful set of tools for dealing with questions about ordinary integers. Unfortunately, as prescient as Euler was, there were too many subtleties in this area to use the tools correctly from the start. One step of the argument involved reasoning that if for some c∈Z, c^3 = a^2+3b^2 = (a+b√-3)(a-b√-3), and if the factors a+b√-3 and a-b√-3 are relatively prime, then each of the factors must themselves be cubes of numbers in Z[√-3]. One problem is that Z[√-3] isn't the full ring of integers of Q(√-3), since -3≡1 (mod 4). But even disregarding that, the conclusion depends on unique factorization of the numbers involved. Although it happens to be true that unique factorization holds in the integers of Q(√-3), Euler didn't seem to recognize the need to prove that.

This same lack of clarity about unique factorization in rings of algebraic integers seems to have persisted into the 1840s. In 1847 Gabriel Lamé (1795-1870) thought he had a proof of Fermat's theorem for arbitrary n. Lamé worked with the ring Z[ζ_n], where ζ_n is a root of x^n-1=0 -- which is called an nth root of unity. (Here we assume n is the smallest integer for which the chosen ζ_n satisfies the equation, i. e. ζ_n is a primitive nth root of unity.) Z[ζ_n] is called the ring of cyclotomic integers (for a particular choice of n), and it is in fact the ring of integers of the field Q(ζ_n), the nth cyclotomic field -- of which we shall have much to say later on. By 1847 some astute mathematicians did recognize the need for proof of unique factorization, and they pointed it out to Lamé. He must have quickly appreciated the problem, since he didn't persist in developing his "proof".

The purported proof went something like this: Suppose there were a solution of x^n+y^n=z^n for some n>2 and integers x, y, and z that are relatively prime. The equation could be rewritten as

x^n = z^n - y^n = ∏_{1≤k≤n} (z - ζ_n^k y)
The factors on the right are elements of Z[ζ_n], and Lamé argued that, being essentially relatively prime, each factor would have to be (up to a unit) an nth power in Z[ζ_n] -- a step that tacitly assumes unique factorization in Z[ζ_n] -- from which a contradiction could be derived by descent. Unfortunately, unique factorization in Z[ζ_n] can't be taken for granted, so the proof doesn't work. (It does work for those n where Z[ζ_n] has unique factorization. It wasn't known until 1976 that there are only 29 distinct cyclotomic fields that do have unique factorization.)

Interestingly enough, at almost exactly the same time, Ernst Kummer (1810-93), working independently on questions involving cyclotomic fields, had not only understood the problem of (lack of) unique factorization, but had even started to develop a way around the problem -- what he called the method of "ideal numbers" or "divisors". Kummer had also found examples where unique factorization failed in Z[ζ_37]. He wrote a letter to the mathematicians in Paris who were debating Lamé's work, and pretty much put an end to their deliberations. Although Kummer's work was not solely concerned with Fermat's Last Theorem, he made some of the most significant partial solutions of the problem, and in the process played a huge role in advancing algebraic number theory. His work also led to the theory of ideals as discussed here. Much of what Kummer tried to do was to find "ideal" numbers, of some sort, for which unique factorization could be proven, so that as above a contradiction would arise if Fermat's equation had a solution.


Rings and ideals

It's time to introduce a few more concepts -- at which point we can really begin to talk about algebraic number theory in earnest.

Because of the importance of unique factorization, an integral domain which has unique factorization is called a unique factorization domain, or UFD. A little more precisely, an integral domain R is a UFD if there is a set P of irreducible elements such that every nonzero element α of R can be written in a unique way as a finite product α = u ∏_{1≤k≤n} p_k^{e_k}, where u is a unit, n is a nonnegative integer, the p_k are distinct elements of P, and the exponents e_k are positive integers (all of which depend on α).
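In Z this canonical form is easy to compute by trial division; a small Python sketch returning the unit and the prime-power exponents (illustrative name, and not efficient for large n):

    def canonical_factorization(n):
        # write n = u * product of p**e, with u = +1 or -1 and distinct primes p
        u, n = (1, n) if n > 0 else (-1, -n)
        exps = {}
        p = 2
        while p * p <= n:
            while n % p == 0:
                exps[p] = exps.get(p, 0) + 1
                n //= p
            p += 1
        if n > 1:
            exps[n] = exps.get(n, 0) + 1
        return u, exps

    print(canonical_factorization(-360))   # (-1, {2: 3, 3: 2, 5: 1})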

The defining property of a UFD is pretty clear and understandable, but it is not expressed in the language of ideals. As we shall see, a great deal of what we want to know about algebraic numbers can be expressed in terms of ideals, so we'd like to know how unique factorization fits in. To this end, we have this centrally important concept: An integral domain R is called a principal ideal domain (PID for short) in case every ideal I⊆R is equal to an ideal of the form (α)=αR for some α∈R.

It is a fact, which is not difficult to prove, but not immediately obvious either, that every principal ideal domain is a unique factorization domain. So if we want to show that a given integral domain R has unique factorization, it is sufficient to show that R is a PID. Unfortunately, among all integral domains there are some which are UFDs but not PIDs. So R can be a UFD even if it is not a PID – being a PID is not a necessary condition. The class of UFDs contains the class of PIDs, but is strictly larger. For instance, if F is a field, the ring of polynomials in one variable F[x] is a PID (and a UFD). (The reason this is true will be mentioned in a moment.) However, the ring of polynomials in two variables F[x,y] is a UFD but not a PID.

Obviously, it would be convenient to have a simple criterion to determine whether a domain R is a PID (and hence a UFD). It turns out that the process of division-with-remainder that can be performed in Z and in polynomial rings in one variable fills the bill. All that's needed is an integer-valued function on the domain R with certain properties. For a∈R, let this function be written as |a| (since it is much like an absolute value). This function should have three properties:

  1. |a|≥0 for all a∈R, and |a|=0 if and only if a=0
  2. |ab| = |a|⋅|b| for all a,b∈R
  3. if a,b∈R and b≠0, then there exist q,r∈R such that a=qb+r, with 0≤|r|<|b|
A domain R with such a function is called a Euclidean domain, because one has a Euclidean algorithm that works just as it does in Z. If I⊆R is a nonzero ideal, it has an element b∈I, with b≠0, of smallest nonzero "norm" |b|. If a∈I we can write a=qb+r for q,r∈R, with |r|<|b|. Yet r=a-qb is in I, so by the assumption of minimality of |b| we have |r|=0 and therefore r=0. Hence a=qb for every a∈I, so I⊆(b); and since b∈I, also (b)⊆I. Thus I=(b) is principal. In other words, every Euclidean domain is a PID, and therefore a UFD.

Unfortunately, even among rings of integers of quadratic fields, only finitely many are known to be Euclidean. If F=Q(√d) and OF is the ring of integers, it is known that for d<0 the ring is Euclidean only when d=-1, -2, -3, -7, or -11. If d>0, the number of rings which are Euclidean is larger. What is known is that only a finite number of these are Euclidean using the norm function. At least one other is Euclidean using a function other than the norm function, but so far it's not known whether there are only a finite number like that.
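The case d=-1 is the familiar ring Z[i] of Gaussian integers, where the norm a^2+b^2 works as the Euclidean "absolute value": to divide, round the exact quotient to the nearest Gaussian integer, which makes the remainder's norm at most half the divisor's. A Python sketch (pairs (x,y) stand for x+yi; the floating-point rounding is fine at this scale, though exact integer rounding would be safer for huge inputs):

    def norm(z):
        x, y = z
        return x*x + y*y

    def gdivmod(a, b):
        # exact quotient a/b = a*conj(b)/norm(b); round each coordinate
        ax, ay = a; bx, by = b
        n = norm(b)
        tx, ty = ax*bx + ay*by, ay*bx - ax*by
        q = (round(tx / n), round(ty / n))
        r = (ax - (q[0]*bx - q[1]*by), ay - (q[0]*by + q[1]*bx))
        assert norm(r) < norm(b)   # the Euclidean property
        return q, r

    def ggcd(a, b):
        while b != (0, 0):
            _, r = gdivmod(a, b)
            a, b = b, r
        return a

    print(ggcd((5, 0), (3, 1)))   # a gcd of 5 and 3+i, unique up to the units 1, -1, i, -i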

This may seem rather disappointing, but in fact the quadratic fields where OF is a PID are also quite scarce. If d<0, then in addition to the Euclidean cases, the only other values are d=-19, -43, -67, and -163. The proof that this is a complete list for d<0 is quite difficult and was not satisfactorily done until 1966.

The situation with d>0 is even more difficult. It is not actually known whether there are only finitely many d>0 such that OQ(√d) is a PID. Gauss himself conjectured that there are infinitely many, but this is still an important open question.

Returning to concepts, we recall that among all integral domains, the class of PIDs is strictly smaller than the class of UFDs. It turns out that there is a subclass of all integral domains in which the notions of UFD and PID are equivalent. In fact, this is an important class, because it includes all rings of algebraic integers. This class itself can be defined by a number of equivalent conditions. But to explain this, we need to discuss the group of fractional ideals of an integral domain.

Fractional ideals

If A and B are ideals of any commutative ring R, it's easy to define the product of two ideals as a set of finite sums: A⋅B = {∑_{1≤k≤n} a_k b_k | a_k∈A, b_k∈B, n∈Z, n>0}. By definition of an ideal, A⋅B⊆A and A⋅B⊆B, hence A⋅B⊆A∩B. If R has a unit (as we usually assume), then clearly A⋅R=A. So in the set of ideals of R there is a commutative binary operation of multiplication, and it has an identity. Multiplication of ideals is associative since multiplication in R is associative. (A set with an associative multiplication is called a semigroup, and if an identity exists, it's a monoid. In neither case is multiplication necessarily commutative.)

In a situation like this, it's natural (for a mathematician anyhow) to wonder what conditions on R would make the set of its ideals into a full group -- that is, how the inverse of an ideal might be defined. It turns out that for integral domains the conditions are beautiful and everything one could hope for.

To get an idea of where to start, consider the principal ideal domain Z. For any nonzero n∈Z, the obvious thing to consider is (1/n) = {m/n | m∈Z}. That's sort of like an ideal, since it's a commutative group under addition, and m(1/n)⊆(1/n) for all m∈Z. Also, under the obvious definition, (n)(1/n) = Z (since the product contains 1). So (1/n) surely acts like the "inverse" of the ideal (n) of Z for any n.

Suppose R is any commutative ring with an identity and M is any set at all (not necessarily related directly to R) where one can define an operation of multiplication rm=mr for r∈R and m∈M. Suppose further that:

  1. M is a commutative group under addition.
  2. rm∈M for all r∈R and m∈M.
  3. 1m = m for all m∈M.
  4. r(m1 + m2) = rm1 + rm2 for all r∈R, m1,m2∈M.
Then M is said to be an R-module. (If R isn't commutative, one can define R-modules by being a little more picky about the definition.) So from the example above, (1/n) is a Z-module, and also any ideal of a ring R is an R-module.

Given all that, if R is an integral domain, whatever a fractional ideal of R might be, it certainly should be an R-module. Indeed, we can formally define a fractional ideal of R as an R-module M such that:

  1. M⊆F, if F is the field of quotients of R.
  2. The multiplication rm for r∈R and m∈M is just the normal multiplication in F.
  3. There is some nonzero r∈R such that rM⊆R.
(As a reminder, the field of quotients of an integral domain R is defined as follows. Consider the set of pairs (m,n), with m,n∈R, n≠0. Consider two pairs (m,n) and (m′,n′) to be equivalent just in case mn′=m′n. (Think of this as fractions, where m/n = m′/n′ when mn′=m′n.) The underlying set of the field of quotients is the set of equivalence classes of pairs under this relation. Define addition on this set by (m,n)+(m′,n′) = (mn′+m′n,nn′) and multiplication by (m,n)(m′,n′) = (mm′,nn′). Then it can be shown that addition and multiplication are well-defined, and the set of equivalence classes of pairs is indeed a field.)

Multiplication of fractional ideals is defined just like multiplication of ideals: M⋅M′ = {∑_{1≤k≤n} m_k m′_k | m_k∈M, m′_k∈M′, n∈Z, n>0}. With this definition, the set of fractional ideals of R is a monoid. The question is: what conditions on R will guarantee that fractional ideals form a group? This is not just a matter of idle curiosity, because it turns out that for rings with the right properties, one has unique factorization in the group of fractional ideals, and in the set of integral (i. e. ordinary) ideals as well. For rings of algebraic integers, which just happen to have the right properties, this unique factorization of ideals is almost as good, for many purposes, as having unique factorization of ring elements themselves.

We need just a few more concepts before we can state the necessary conditions. A proper ideal of R is an ideal I that is not equal to R, i. e. a proper subset. One writes I⊂R. A prime ideal P is a proper ideal such that for all a,b∈R, ab∈P only if either a∈P or b∈P (or both). In Z, for example, (6) isn't a prime ideal, since 2⋅3∈(6), but 2∉(6) and 3∉(6). A maximal ideal is a proper ideal P that is not properly contained in some other proper ideal P′. The integral domain R, with field of fractions F, is said to be integrally closed if every α∈F that is integral over R (i. e. satisfies f(α)=0 for some monic polynomial f(x) with coefficients in R) is actually an element of R. For example, if R=Z, then F=Q. If α=a/b for a,b∈Z with the fraction in lowest terms, then when you "clear fractions" in f(a/b)=0, you find that b divides a power of a, which forces b=±1 and hence α∈Z. This argument applies in any UFD, so in fact any UFD is integrally closed.

Lastly, we need a type of finiteness condition. An ideal I is said to be finitely generated if it has the form I = {∑_{x∈S} a_x x | a_x∈R for all x∈S}, where S⊆R is a finite set of generators. It is much like a principal ideal, except for having n=#(S) generators instead of 1. If all ideals of a ring are finitely generated, the ring is said to be Noetherian, after Emmy Noether (1882-1935). There are several equivalent characterizations of Noetherian rings. For instance, R is Noetherian if and only if every nonempty family of ideals of R has a maximal element with respect to inclusion, that is, a member of the family not properly contained in any other member of the family.

Finally we can state the crucial result: If R is an integral domain, then the following are equivalent:

  1. R is Noetherian, integrally closed, and every nonzero prime ideal is maximal.
  2. Every nonzero ideal of R is uniquely expressible as a product of prime ideals.
  3. Every nonzero ideal of R is a product of prime ideals.
  4. The set of nonzero fractional ideals of R forms a group under multiplication.
The first item on this list characterizes R in terms of several ring-theoretic properties. The second item is unique factorization into prime ideals, and it is in fact equivalent to the apparently weaker third item. The fourth item is the key fact, which answers our earlier question about when the fractional ideals of R form a group.

Any integral domain R that has one of these properties has all of them, and is called a Dedekind domain, after Richard Dedekind (1831-1916). It isn't hard to show that if F⊇Q is a finite field extension, then the ring OF of algebraic integers of F has the properties listed in the first item, and so OF is a Dedekind domain and has the other properties also.

As rings, Dedekind domains have some very nice properties. For example, if R is a Dedekind domain:

  1. P is a prime ideal of R if and only if it is indecomposable, i. e. P ≠ I⋅I′ where I and I′ are ideals other than P or R.
  2. If P is a prime ideal and P divides the product I⋅I′ of two ideals (P|I⋅I′), then P|I or P|I′.
  3. Divisibility between fractional ideals is equivalent to inclusion, i. e. if M and M′ are nonzero fractional ideals, then M divides M′ if and only if M⊇M′. (Multiplication of ideals yields a result that is smaller.)
  4. If R is a unique factorization domain, it is a principal ideal domain. (The converse is also true, but more obvious.)
  5. Every fractional ideal M of R can be generated by at most two elements, and one of these elements of M can be chosen arbitrarily.

Dedekind, Emmy Noether, and a few others were the main developers of ideal theory in this form, and Dedekind was a leading figure in the theory of algebraic numbers in general. Ernst Kummer (1810-1893) somewhat earlier had a more primitive theory of "ideal numbers" which provided a kind of unique factorization of algebraic numbers, but Dedekind made the theory much simpler and more general.

We will shortly see some of the fruits of that effort.

Factorization of prime ideals in extension fields

Suppose E⊇F is a field extension and OE, OF are the corresponding rings of integers. We have OF ⊆ OE, and if I is any ideal of OF then of course I ⊆ OE. But I isn't in general an ideal of OE, because it isn't closed under multiplication by elements of OE; however, I⋅OE is an ideal of OE.

If P is a prime ideal of OF, we have no good reason to expect that P⋅OE is prime in OE, and in fact it generally is not. For example, let's look at quadratic extensions of Q. So suppose F=Q, d is a square-free integer, positive or negative, and d ≠ 0 or 1. Let p∈Z be a positive prime. We can ask whether (the ideal corresponding to) p is still a prime ideal of the ring of integers of E=Q(√d). That is, we can ask what happens to the prime, principal ideal (p) in OQ(√d). (With principal ideals, we can usually get away without specifying what ring they are contained in.)

Consider the Diophantine equation ±p=a^2-b^2 d. If we can solve it by finding a,b∈Z that make the equation true, then we can verify that we have a factorization of the ideal (p) in OQ(√d) expressed by the equation of principal ideals (p)=(a+b√d)(a-b√d). (Proof: p∈(a+b√d)(a-b√d) since ±p=a^2-b^2 d, so (p)⊆(a+b√d)(a-b√d). But by the definition of a product of ideals, every member of (a+b√d)(a-b√d) is a multiple in OQ(√d) of the number (a+b√d)(a-b√d)=±p, and so (a+b√d)(a-b√d)⊆(p).)

So solutions of a certain Diophantine equation tell us about how an ideal (p) of Z factors in the integers of a quadratic extension. And in fact, if the equation can be solved, then the prime ideal (p) of Z is not also a prime ideal of OQ(√d). Is there a converse, that is, can we infer solutions of a Diophantine equation from how ideals factor in extensions? Unfortunately, the situation is more complicated. For instance, if OQ(√d) is not a PID, then it is possible for (p) to not be a prime ideal of OQ(√d), yet there may be no solutions of ±p=a2-b2d. One of the reasons that algebraic number theory is useful and interesting is that it helps get a handle on the complications, as we shall see.

For simplicity, suppose d≡3 (mod 4). Then we have remarked that the integers of Q(√d) are given by OQ(√d) = {a+b√d | a,b∈Z}. Suppose we could find some integer α∈Q(√d) such that the norm N_{Q(√d)/Q}(α) = ±p. (For short, write Nα for the norm.) So if α=a+b√d with a,b∈Z, we have ±p=Nα=a^2-b^2 d. When this happens, we can write ±p=(a+b√d)(a-b√d). Since Nα is a prime (in Z), α=a+b√d must be a prime of OQ(√d) – and p is not a prime in that ring. To make things really easy for showing examples, let d=3. Trying a few values for a and b, let α=4+√3, so Nα=13=(4+√3)(4-√3). Or if α=8+5√3, then Nα=-11=(8+5√3)(8-5√3). So both 11 and 13 are not primes in OQ(√3).
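A bounded search for such norm witnesses makes examples like these mechanical; an illustrative Python sketch (failing to find a witness within the bound proves nothing, so this is only a sufficient test for splitting):

    def split_witness(p, d=3, bound=100):
        # look for a, b in Z with a^2 - d*b^2 = p or -p; such a pair gives an
        # element a + b*sqrt(d) of norm +/-p, so p is not prime in Z[sqrt(d)]
        for a in range(bound):
            for b in range(bound):
                if abs(a*a - d*b*b) == p:
                    return a, b
        return None

    print(split_witness(13))   # (4, 1):  4^2 - 3*1^2 = 13
    print(split_witness(11))   # (1, 2):  1^2 - 3*2^2 = -11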

This translates directly to a factorization of the principal ideal (13) in OQ(√3), namely (13)=(4+√3)(4-√3). Clearly (13) isn't a prime ideal, since it has two distinct nontrivial prime factors. (11)=(-11) is likewise not a prime ideal. (OQ(√3) happens to be a PID, but that isn't crucial here.) In general, then, if p∈Z is a prime, the principal ideal (p) is a prime ideal of Z, but for any square-free d∈Z (with d≠0,1), (p) splits into the product of two distinct ideals in the integers of the quadratic extension Q(√d) if we can find a,b∈Z, with a≠0, such that ±p = N(a±b√d) = a^2-b^2 d. In that case, we have a factorization of (p) in OQ(√d) into principal ideals such that (p)=(a+b√d)(a-b√d), and the factors are prime ideals. If a≠0, these will be distinct ideals. (Since p is assumed to be prime in Z, if a=0, we must have b=±1 and p=∓d, and the two ideals will be the same. In this special case, one actually says that p "ramifies" in the extension field.)

Conversely, what if, for the given d, there are no integers a,b∈Z that satisfy the equation ±p = a^2-b^2 d – can we conclude that p doesn't split in OQ(√d)? Actually, no, we can't in general. We could if OQ(√d) happens to be a PID, since, because d≡3 (mod 4), the factor ideals would have to have the form (a±b√d) for some a,b∈Z, which would yield a solution of the equation, contrary to supposition. (And remember, by the general theory of Dedekind domains, the same is true if OQ(√d) has unique factorization, since a ring of integers of a number field is a Dedekind domain, and therefore is a PID if and only if it has unique factorization.)

However, and this is a major "however", if OQ(√d) isn't a PID, or equivalently if it doesn't have unique factorization of elements, then we could have a factorization (p)=AB into prime ideals that are not principal. (The prime ideal factors themselves are uniquely determined, which, do not forget, is always true in Dedekind domains.) In that case, the factorization doesn't necessarily give us a solution of ±p = a^2-b^2 d. Thus it's possible to have primes p where (p) splits in OQ(√d) even though the equation has no integer solutions. We might even have A=B, in which case (p) ramifies.

So one key thing this discussion illustrates is that there is a close, though not entirely simple, connection between solving simple Diophantine equations (like ±p=a^2-b^2 d) and determining how prime ideals of a subfield split into prime ideals in an extension field. Further, these equations arise from asking what numbers can occur as norms of some integer of an extension field. But the situation is more complicated if the ring of integers of the extension field doesn't have unique factorization (or equivalently, if it's not a principal ideal domain). This is another reason uniqueness of factorization (or the lack of it) is rather important.

To give some idea of where this leads, we'll just say that class field theory, which we will be coming to shortly, originated in trying to deal with such questions, in the form of saying something about how prime ideals split in extension fields, and also about when nonprincipal ideals can become principal ideals in extension fields. It helps us deal with the complexities that result from failure of unique factorization of numbers, and from the equivalent untidiness of having ideals that are not principal ideals.

But before proceeding, let's establish some general conventions and terminology. The terminology will be used to keep the upcoming discussion succinct, so unfortunately you'll need to memorize the details or else plan to refer back here if you want to follow beyond this point. As a reward, we'll be able to state some fairly simple rules for when primes of Z do or do not split in extension fields of Q.

Let's set out the terminology to describe the general possibilities for how prime ideals split in arbitrary extension fields. We continue to suppose E⊇F is a finite extension, of degree n=[E:F], not necessarily Galois. Let P be a prime ideal of OF. P⋅OE is an ideal of OE, so it factors in a unique way as a product of powers of distinct ideals that are prime in OE:

P⋅OE = ∏_{1≤j≤g} Q_j^{e_j}
We say that each Q_j is a prime lying above P in E. Since every Q_j divides P⋅OE, each one contains P⋅OE, and hence contains P: Q_j⊇P. In fact, P=Q_j∩OF for each j.

If we start with a prime ideal P of the integers of an extension of Q, then P∩Z is a prime ideal (p) of Z for some rational prime p, and P lies above (p). Given the factorization of P⋅OE shown above, each Q_j also lies above (p). By general commutative ring theory, since Q_j is a prime of OE, the quotient ring OE/Q_j is a finite field Fq where q is some power of p. (In Dedekind domains, any nonzero prime ideal is a maximal ideal, and for any commutative ring R with identity, the quotient of R by a maximal ideal is a field.) Likewise, since P is prime in OF, OF/P is a finite field Fq′ where q′ is also a power of p (and divides q). Each field OE/Q_j is a finite extension of the field OF/P, and we denote its degree by f_j. Finally, it turns out that OE/(P⋅OE) is a vector space of dimension n=[E:F] over the field OF/P.

These facts allow one to make the following definitions. The exponent e_j is called the ramification index of Q_j (relative to the given extension). The degree f_j is the inertial degree of Q_j. And g is called the decomposition number. All of these numbers are specific to how the prime P of OF splits (or decomposes) in OE. These numbers are the key data one wants to have for each prime P for any given extension E⊇F. A prime P is said to be ramified unless all exponents e_j=1, in which case it is unramified. Primes that ramify tend to make life more complicated, but fortunately there are only finitely many of them for any particular extension. All these numbers are related as follows:

n = [E:F] = ∑_{1≤j≤g} e_j f_j
If P is unramified and all inertial degrees are 1, then we must have g=n, and one says that P splits completely. This is a fairly unusual circumstance. Although only finitely many primes are ramified, most have some inertia.

If E⊇F is a Galois extension, things are much simpler. In that case, all e_j have the same value, say e, and all f_j have the same value, say f. (This is because the Galois group G(E/F) acts on the various Q_j and permutes them among themselves. Since each σ∈G(E/F) is an automorphism, these ideals are all essentially alike.) Hence [E:F]=efg. So there are exactly g primes lying above P in E, and each has the same ramification index and inertial degree.

Splitting of prime ideals in quadratic extensions of Q

This all probably sounds rather dry and abstract, but when we look at simple examples, such as quadratic extensions, why it's interesting becomes clearer. Because of how the ramification indexes and inertial degrees are related, for any prime ideal (p) of Z there are only three different possibilities:
  1. (p)=P_1·P_2, so (p) splits completely. (e=f=1, g=2)
  2. (p)=P remains a prime ideal in Q(√d), so p is inert. (e=g=1, f=2)
  3. (p)=P^2 where P is prime, and p is ramified. (f=g=1, e=2)
It turns out that there are simple criteria for each of these cases. But figuring out what the criteria are is tricky.

Recall that in Q(√3) we found (13)=(4+√3)·(4-√3) and (-11)=(8+5√3)·(8-5√3), so both (11) and (13) split completely. Clearly, (3)=(√3)^2, so (3) is an example of a prime ideal of Z that is ramified. How about an example of a prime ideal that is inert in the extension? This is a little harder for a couple of reasons. (p) will be inert just in case it neither splits nor is ramified, but we don't yet have simple criteria to rule out those cases.

So let's back up a little and look at the details. We found examples where (p) splits in the integers of Q(√3) by solving the equation ±p=a^2-3b^2, because that gave elements a±b√3 whose norm was ±p. Being able to find such elements guaranteed that the prime split. But Q(√d) with d=3 is a special case, since here d≡3 (mod 4). In that case, and also if d≡2 (mod 4), the integers of the extension have the form a+b√d with a,b∈Z.

If d≡1 (mod 4), integers can also have the form (a+b√d)/2, with a,b∈Z, and we might have a factorization like (p)=((a+b√d)/2)·((a-b√d)/2), so we would also have to consider solvability of ±4p=a^2-db^2. If we were to look at solvability of the appropriate equation, according as to whether or not d≡1 (mod 4), the solvability would be a sufficient condition for (p) to split (or ramify if a=0). Notice that this sufficient condition for (p) to split holds regardless of whether or not O_{Q(√d)} is a PID.

Now we need to find a convenient necessary condition for (p) to split. Unfortunately, solvability of one simple equation is not a necessary condition in general. It would be, as we'll see in a minute, if the ring of integers of Q(√d) happens to be a PID, as is true when d=3. However, in quadratic extensions where the ring of integers isn't a PID, being unable to solve the applicable equation doesn't guarantee (p) cannot split, because there might be non-principal ideals that are factors of (p).

So let's ignore that problem for a moment and just focus on the case where the ring of integers of Q(√d) is a PID. Can we then find a necessary condition on p for (p) to split or ramify, i. e. for (p) to not be a prime ideal of the integers of Q(√d)? That is, what must be true about p if (p) splits or ramifies?

If (p) splits or ramifies, then (p)=P_1·P_2 for nontrivial ideals P_i. (The ideals are the same or distinct according as (p) ramifies or splits.) Assuming O_{Q(√d)} is a PID, then P_1 is generated by a+b√d where both a,b∈Z, if d≡2 or 3 (mod 4), or else by (a+b√d)/2 with a,b∈Z, if d≡1 (mod 4). Likewise, the conjugate ideal P_2 is generated by a-b√d or (a-b√d)/2. Since p∈P_1·P_2, by definition of a product of ideals, p is of the form p = ε(a+b√d)(a-b√d) = ε(a^2-db^2) or p = ε(a+b√d)(a-b√d)/4 = ε(a^2-db^2)/4 for some integer ε of Q(√d).

Recall that the norm of an element of a Galois extension field is the product of all conjugates of the element. So for an element that is also in the base field, the norm (with respect to a quadratic extension, which is always Galois) is the square of the element. Taking norms of both sides of the possible equations, then either p^2 = N(ε)(a^2-db^2)^2 or 16p^2 = N(ε)(a^2-db^2)^2. For simplicity, consider just the first case. Then N(ε) is a positive integer that has to be 1, p, or p^2. (N(ε)=p is impossible, since then (a^2-db^2)^2 = p, and p isn't a perfect square.) If N(ε)≠1 then N(a±b√d) = a^2-db^2 must be ±1, so a±b√d must be a unit, and both P_i must be non-proper ideals (i. e. equal to the whole ring). Hence N(ε)=1. This will be true also in the other case (when d≡1 (mod 4)), so ±p=a^2-db^2 or ±4p=a^2-db^2. Consequently, solvability of the appropriate equation (depending on d mod 4) is a necessary condition for (p) to split or ramify.

So we have a necessary and sufficient condition for (p) to split or ramify in Q(√d), in terms of solvability of Diophantine equations, provided O_{Q(√d)} is a PID. Since the only other possibility is for (p) to be inert, we also have a necessary and sufficient condition for that.

However, still assuming that O_{Q(√d)} is a PID, we can find a further necessary condition for (p) to split or ramify. Take those equations we just found and reduce them modulo p. Then both equations become a^2≡db^2 (mod p). Since p is prime, Z/pZ is a field. Assume first that b≢0 (mod p). Then b has an inverse in the finite field. So we have d≡(a/b)^2 (mod p). In other words, d is a square mod p. This is the additional necessary condition we were looking for on p in order for (p) to split or ramify. Since it's a necessary condition, if d is not a square mod p, then (p) must not split or ramify, and thus p is inert. And so, for d to be a non-square mod p is a sufficient condition for p to be inert.

(What if b≡0 (mod p)? Then b=b_1 p. So ±p = a^2 - (b_1 p)^2 d or else ±4p = a^2 - (b_1 p)^2 d. Either way, p∣a, hence p^2 divides the right side of either equation, and hence the left side also. But that's not possible unless p=2 -- which is always a special case.)

To summarize, then, let p≠2 be prime and d square-free and not 0 or 1. Then the solvability of ±tp=a^2-db^2 (where t is 4 or 1 according as d≡1 (mod 4) or not) is sufficient for (p) to split or ramify. And if the integers of Q(√d) are a PID, then solvability of the appropriate equation provides a necessary and sufficient condition for (p) to split or ramify. Further, in that case, d being a square mod p is necessary for (p) to split or ramify.

Remarkably, d being a square mod p is a necessary and sufficient condition for (p) to split or ramify, even if the integers of Q(√d) aren't a PID, but that's harder to prove. Since solvability of Diophantine equations is generally not obvious by inspection, it's very convenient to have a necessary and sufficient condition for (p) to split or ramify simply in terms of the properties of d mod p.
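
Since Euler's criterion lets a computer decide whether d is a square mod p (d^((p-1)/2) ≡ ±1 (mod p)), the criterion just stated is easy to put to work. Here is a minimal Python sketch (an illustration added here, not from the original text):

    def classify(d, p):
        """Splitting of an odd prime p in Q(sqrt(d)), assuming p doesn't divide 2d."""
        return "splits" if pow(d, (p - 1) // 2, p) == 1 else "inert"

    for p in [5, 7, 11, 13, 23]:
        print(p, classify(3, p))
    # 5 and 7 are inert; 11, 13, and 23 split -- e.g. (2+3√3)(2-3√3) = -23.
    # The ramified case is p | d, as with (3) = (√3)^2 in Q(√3).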


Ideal class groups

Rings of algebraic integers are what algebraic number theory is all about. Although unique factorization domains such as Z are the easiest examples to work with, they turn out to be quite uncommon among all rings of integers. One thing, therefore, which would be nice to have, is some way to "measure" the extent to which OF for a given extension F⊇Q deviates from being a UFD.

First we note one helpful fact. Earlier we remarked that any PID is a UFD. But if we know that R is a Dedekind domain, then it is also true that whenever R is a UFD, it is actually a PID too. That is, for Dedekind domains, the concepts of UFD and PID are the same. This is a nice simplification. Instead of asking for a measure of the deviation of OF from being a UFD, we can ask instead how much it differs from being a PID, and that is an easier thing to get a handle on.

Let R=O_F be the ring of integers of some number field F. We know that the set of all nonzero fractional ideals of R forms an abelian group under multiplication, which we can denote by FI(R). Let PFI(R) be the set of all principal fractional ideals of R, i. e. fractional ideals of the form (a)=aR for some nonzero a∈F. PFI(R) is clearly a subgroup of FI(R). So we can form a new group, the quotient group, which is denoted by FI(R)/PFI(R). This symbolism does not refer to actual division of some sort. Instead, it is a very standard construction in group theory, very much like the quotient ring R/I when I is any (integral) ideal of R.

If you aren't too familiar with the concept of quotient groups, here's another way to think about it. We say that two fractional ideals I and I′ of R, which belong to FI(R), are equivalent (symbolically I∼I′) just in case I·I′^{-1} is principal, i. e. an element of PFI(R). Then ∼ is a legitimate equivalence relation: as a relation it is reflexive (I∼I), symmetric (I∼I′ if and only if I′∼I), and transitive (I∼I′ and I′∼I′′ implies I∼I′′). So it defines equivalence classes, and there is an obvious group structure on these classes: define the group operation by taking representatives of each class.

This quotient group is called the ideal class group of the number field F, and we'll denote it by Cl(F). It measures the degree to which O_F fails to be a PID, because if Cl(F) is very large with "many" different equivalence classes, it is in some sense "very likely" that for any pair I,I′∈FI(R), I·I′^{-1} is not a principal ideal. Conversely, if Cl(F) is as small as possible, a single element with only one ∼-equivalence class, then every ideal is principal, and R is a PID (and thus a UFD).

It is a very important fact that Cl(F) is finite. For quadratic extensions F this was proven (though not in the modern abstract form) by Gauss, and for cyclotomic extensions by Kummer. The general proof for any number field F was finally done by Hermann Minkowski (1864-1909), who was an important number theorist, though he is better known for giving a more complete mathematical treatment of the 4-dimensional spacetime of Einstein's special theory of relativity.

The order #(Cl(F)) of the ideal class group is called the class number of F, and is always denoted simply by h (or hF if the field is to be indicated). Most of the more advanced part of algebraic number theory, including class field theory, revolves around properties of Cl(F) and the number h.

We've already introduced enough concepts to state concisely some key open questions of algebraic number theory. For example, we noted that it is unknown whether there are infinitely many positive d∈Z such that the ring of integers O_{Q(√d)} has unique factorization. In other words, it isn't known whether there are infinitely many d such that Q(√d) has class number h=1 (although Gauss conjectured that this is so). Indeed, it isn't known whether there are infinitely many finite extensions F⊇Q of any kind (but contained in C) with h=1 and unique factorization. That right there indicates a rather large gap in our knowledge.

Class field theory has a reputation for being a very difficult subject, and it is. There is a fair bit of abstract conceptual machinery involved even to explain many of the results, and (of course) much more to prove the results. However, we've already mentioned many of the necessary concepts for describing some of the results -- such as the notions of Galois groups and fractional ideals. We actually need just a little more preparation before we can start talking about class field theory itself.

Kummer and class numbers of cyclotomic fields

Aside from representing the degree to which the ring of integers of a number field fails to have unique factorization, are the class number and class group of a number field important? Yes, definitely. Class field theory shows that class groups (as above, and in a more general form) not only contain information about arithmetic within a given field, but in fact they also determine all possible finite abelian extensions of a field (i. e. Galois extensions with an abelian Galois group). This is pretty surprising, and wasn't realized for a fairly long time.

But before we come to that, we can give another example, from the work of Ernst Eduard Kummer. As noted above, in 1847 Kummer pointed out that the integers of various cyclotomic fields did not have unique factorization, and that this invalidated simple attempts to prove Fermat's Last Theorem by working in those fields. In order to deal with this, Kummer constructed a theory of "ideal numbers" which was a forerunner of the ring-theoretic methods of Dedekind, including ideal class groups.

Although Kummer is best remembered today for his work related to Fermat's Last Theorem, his interests and the methods he developed covered a larger area, especially the question of "higher reciprocity laws" -- that is, generalizations of the law of quadratic reciprocity to powers greater than two. In other words, what is the relation between primes p and q if one is an nth power modulo the other? This was a harder question than it may have seemed -- class field theory was required in order to find the answer. Nevertheless, this question also required detailed knowledge of cyclotomic fields, just as Kummer's work on Fermat's Last Theorem did. In fact, the natural setting in which to study the problem of higher reciprocity laws is an extension of the form K(a^{1/n})/K for some a∈K, where the base field K is assumed to contain all nth roots of unity. Obviously the polynomial f(x)=x^n-a has all its roots in K(a^{1/n}) if a∈K. An extension like this is now called a Kummer extension.

Although Kummer was too early to solve the higher reciprocity law problem, he did have limited success with Fermat's Last Theorem, and the concept of the ideal class group played a major part. Specifically, Kummer proved that if p is a prime of a certain type, then the equation x^p+y^p=z^p has no non-trivial (i. e. with none of x, y, z being 0) solutions for x,y,z∈Z, or even in Z[ζ], where ζ=ζ_p is a primitive pth root of unity. (Recall that to prove Fermat's theorem it is sufficient to prove it for exponents which are odd primes, the case of exponent 4 having been dealt with by Fermat himself.)

The condition required on p is that it be an odd prime which doesn't divide the class number of K=Q(ζ_p). Kummer in fact proved that this condition is equivalent to certain weaker and more easily verified conditions. Let h^+ be the class number of K^+ = Q(ζ+ζ^{-1}). Kummer proved that h^+ | h, so h^- = h/h^+ is an integer. Kummer next proved that p∤h if and only if p∤h^-. He then showed that this criterion for p to be regular is equivalent to p not being a divisor of the numerator of any of the Bernoulli numbers B_2, B_4, ..., B_{p-3}, rational numbers which are defined as coefficients of the Taylor series expansion

x/(e^x-1) = ∑_{0≤n<∞} B_n x^n/n!
There is a simple recurrence relation between the first n Bernoulli numbers for any n, so it is in principle simple to calculate them. (In practice, the numerators grow large very quickly, so it wasn't actually easy before modern computers.) Bernoulli numbers seem to turn up everywhere in number theory. For instance, Euler discovered that the values of Riemann's famous zeta function
ζ(s) = ∑_{1≤n<∞} n^{-s}
at positive even integers are given by
ζ(2k) = (-1)^{k-1} (2π)^{2k} B_{2k} / (2·(2k)!)
Computations have shown that "most" primes which have been checked are regular. For instance, of odd primes less than 100, only 37, 59, and 67 are not regular. Ironically, however, it has been proven that there are infinitely many primes which are not regular, but it is still a seemingly very difficult open question whether there are infinitely many regular primes.
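
The recurrence comes from multiplying the defining series by e^x-1 and comparing coefficients: ∑_{0≤j≤m} C(m+1,j) B_j = 0 for all m≥1, which determines each B_m from its predecessors. The following Python sketch (an illustration added here) uses it, with exact rational arithmetic, to test Kummer's criterion on all odd primes below 100:

    from fractions import Fraction
    from math import comb

    def bernoulli(n):
        """Return [B_0, ..., B_n] via sum(C(m+1,j)*B_j for j in 0..m) = 0."""
        B = [Fraction(1)]
        for m in range(1, n + 1):
            B.append(-sum(comb(m + 1, j) * B[j] for j in range(m)) / (m + 1))
        return B

    def is_regular(p):
        """Kummer: p is regular iff p divides no numerator of B_2, B_4, ..., B_{p-3}."""
        B = bernoulli(p - 3)
        return all(B[k].numerator % p != 0 for k in range(2, p - 2, 2))

    primes = [p for p in range(3, 100, 2) if all(p % q for q in range(3, p, 2))]
    print([p for p in primes if not is_regular(p)])   # [37, 59, 67]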

Facts about ideal class groups and the class number


Class field theory

Kronecker's Jugendtraum and the Kronecker-Weber theorem

Although Dedekind reformulated Kummer's concepts of ideals into more or less the modern shape around 1880, it was Leopold Kronecker (1823-91) who had the first intuitions of what might rightly be called class field theory when he was a young man. Kronecker was a student of Kummer's at the German equivalent of an American high school and traced his enthusiasm for number theory to that experience. (Kronecker was also mentored by Kummer a little later at the University of Breslau.) Not surprisingly, Kronecker picked up a keen interest in questions of algebraic number theory in general and cyclotomic numbers in particular.

In later years Kronecker wrote (in an 1880 letter to Dedekind) that it was the "dearest dream of his youth" -- his Jugendtraum -- to show (in modern terms) that abelian extensions of number fields could be generated by special values of certain transcendental functions. For example, he conjectured and partially proved in 1853 that any abelian extension of Q is contained in a cyclotomic field Q(ζ) for some nth root of unity ζ. Of course, all such roots are of the form e^{2πik/n}, so in this case special values of the exponential function are involved. This is a simple consequence of class field theory, but hard to prove otherwise. A full proof was not completed until 1886, by Heinrich Weber (1842-1913).

In 1860 Kronecker conjectured that all abelian extensions of an imaginary quadratic field (of the form Q(√-d) for positive d∈Z) are contained in an extension of Q generated by special values of elliptic functions. This was mostly proven by Weber, in 1891. And this result, too, falls naturally out of class field theory.

Kronecker certainly expected further results along the same lines, and this idea so impressed David Hilbert (1862-1943) that he made it problem number 12 in his famous list of 23 Hilbert problems. Hilbert wrote, "the extension of Kronecker's theorem to the case that, in place of the realm of rational numbers or of the imaginary quadratic field, any algebraic field whatever is laid down as realm of rationality, seems to me of the greatest importance. I regard this problem as one of the most profound and far-reaching in the theory of numbers and of functions." Hilbert did not regard Weber's proof of the imaginary quadratic case as complete, "but I believe that it must be obtainable without very great difficulty on the basis of the theory of complex multiplication developed by H. Weber with the help of the purely arithmetical theorems on class fields which I have developed."

Hilbert was right on that last point, but he seems to have missed the mark somewhat on the more general case. To date, no further complete results have emerged along these lines, although Goro Shimura has obtained partial results, using automorphic forms, for abelian extensions of real quadratic fields. In this sense, Hilbert's 12th problem -- also commonly known as Kronecker's Jugendtraum -- remains an unresolved open question.

On the other hand, class field theory provides a complete answer of a different kind which specifies a 1-to-1 correspondence between abelian extensions of a number field and certain generalized ideal class groups which are defined entirely from the arithmetic properties of the base field. Unfortunately, this result isn't "constructive" in being able to specify actual generators for abelian extensions. So in the sense of knowing how to actually construct the required extensions, the problem is still open and unsolved. It's probably very difficult, if solvable at all.

Of course, in mathematics even the unsolved problems can be enormously important (as Fermat's Last Theorem was for many years). The ideas of Kronecker and Weber are the historic origins of the notion of a "class field", as developed by Hilbert himself. More specifically, Kronecker deserves recognition for having conceived the program of characterizing the extensions of a field in terms of sets of prime ideals. We are about to see how this idea plays out.

Hilbert's class field theory

Hilbert received his doctorate in 1885, and already by 1893 he was a full professor (at Königsberg) and a significant figure in the German mathematical community. In that year the German mathematical society asked Hilbert to write a report synthesizing the recent progress in algebraic number theory achieved by Kummer, Kronecker, Dedekind, and others. When the report, known as the Zahlbericht, was completed in 1897, it not only gave a comprehensive survey of the work of his predecessors, but contained substantial new and original work by Hilbert himself. Prominent in that work were some of the key ideas of modern class field theory. While not entirely original to Hilbert, these ideas had never been presented so clearly before.

Let k be any algebraic number field (finite extension of Q), and K⊇k any finite abelian extension of the base field. The central idea which Hilbert supplied is called the class field of k -- hence the term "class field theory". Hilbert realized that one of the main difficulties in proving results about the integers of an algebraic extension is the existence of ramified primes -- prime ideals of O_k whose extensions to the integers O_K of the extension field contain a repeated prime ideal factor. "Ramification" is a term which is also used in the theory of Riemann surfaces to refer to what happens in a neighborhood of a point on the surface where a complex analytic function (defined on C) is multi-valued. This is what happens, for example, with the square root function f(z)=√z at the origin. There are important fundamental similarities between ramification on Riemann surfaces and ramification in number theory. In both cases, ramification makes life more difficult.

So Hilbert had the brilliant idea to consider the largest abelian extension of the base field k which is not ramified. This is what he defined as the class field of k -- more concisely, the maximal unramified abelian extension of k. This was something of a leap, because the most familiar number field, Q, doesn't have any nontrivial unramified extensions. Hence the class field of Q is Q itself. (This fact is easy to see for an extension Q(√d)⊇Q, because as we noted, p ramifies if and only if p|d.)

The reason that Q has no unramified extensions is very interesting, because it lies precisely in the fact that the ring of integers Z of Q is a PID. In other words, although the integers of Q are not complicated by the existence of non-principal ideals and non-unique factorization, the complexity is shifted into extensions of Q, where ramification must occur. Stated yet another way, it is the existence of non-principal ideals in the (ring of integers of the) base field which allows some extension fields to lack ramification.

These facts illustrate one of the most important insights of class field theory -- the fact that purely arithmetic relationships in the base field (involving primes and factorization) affect what kind of extensions the field can have.

A priori the maximal unramified abelian extension of a number field k need not be a finite extension. The maximal abelian extension of a number field is in fact not a finite extension of the field. However, what Hilbert realized from consideration of many special cases is that the class field he defined had a very strong relationship to the ideal class group of the base field -- the strongest possible, in fact. Indeed, if K is the class field of k, then the Galois group G(K/k) of the extension is isomorphic to the ideal class group Cl(k) of k. This is quite astonishing if you think about it, given how different the definitions of these two things are. Consequently, the degree of the extension [K:k] is simply the class number h(k) -- and therefore is finite. This makes very precise the idea that the larger the abelian extension of k you can find which lacks ramification, the larger and more complicated the ideal class group has to be, and conversely. In particular, the class field of a field whose ring of integers is a PID (such as Q) is trivial.

There are several other rather astonishing facts about the ideal class group. For example, if K is the class field of k, then the prime ideals of k which split completely in K are the principal prime ideals. Indeed, this characterizes K -- K⊇k is the class field of k if and only if the prime ideals which split completely are the ones which are principal. Or equivalently, since there is no ramification in K, the prime ideals which don't split completely (hence have inertial degree > 1) are the nonprincipal ideals. (And again, if O_k is a PID so all ideals are principal, the class field is trivial.)

Finally, if K is the Hilbert class field of k, then every ideal of k becomes principal in K. This is known as the Principal Ideal Theorem. It doesn't mean that O_K is a PID and all its ideals are principal, only that the ideals I of k, when extended to ideals I·O_K, are.

Hilbert stated all these results as conjectures, based on the many examples he had considered. But he didn't actually prove any of these results in full generality himself, though he did prove special cases, for example when the class number h(k) of k is 2. First and foremost among facts which needed proof was the very existence of a class field since, as we observed, there is no a priori reason to suppose some finite extension K⊇k exists which contains all the unramified abelian extensions of k. Fortunately, Hilbert's student Philipp Furtwängler (1869-1940) dedicated himself to this task, and soon proved most of the conjectures, except for the Principal Ideal Theorem, which eluded him until 1930. In fact, Furtwängler completed the proof of this important theorem using newer results proven by Emil Artin in 1927, which we will describe below.

Takagi's class field theory

As impressive as it was, Hilbert's class field theory left some obvious questions unanswered. For example, although it could determine all unramified abelian extensions of a field as subfields of the class field, it said nothing about ramified abelian extensions, which are the vast majority. Even Kronecker's theorem that every abelian extension of Q is contained in a cyclotomic extension was much more useful as far as extensions of Q are concerned -- since Hilbert's theory says nothing at all about them.

Evidently the issue of ramification couldn't be so easily swept under the rug if further progress was to be made. One might have supposed, given the tradition of Kummer, Kronecker, Weber, Dedekind, and Hilbert, that another German mathematician, perhaps Furtwängler, would see a way through the problem. Surprisingly, this was not the case, and instead a Japanese mathematician, Teiji Takagi (1875-1960), made the next, decisive step -- at a time when Japan had only one major university (Tokyo) and almost no mathematicians at all working in frontier areas.

Takagi was one of a small handful of top students privileged to study in Europe. In Berlin he read the Zahlbericht shortly after it was published in 1897, was fascinated by the subject, and arranged to study with Hilbert in Göttingen. Unfortunately, Hilbert seems to have lost interest in algebraic number theory after completing the Zahlbericht -- he did after all leave it to Furtwängler to prove the results he had conjectured. Hilbert was gracious and encouraging to Takagi, but not inclined to be much involved personally. Takagi returned to Japan in 1901. He obtained a professorship in Tokyo and seemed content to set aside research in favor of teaching.

According to Takagi himself, he took up research again only during World War I, when research papers and journals from Europe ceased to be available in Japan. If he was to remain stimulated by new developments, he would simply have to do the research himself -- and he did. Algebraic number theory and class field theory were evidently what still appealed to him the most. He understood the problem with Hilbert's approach:

Concerning class field theory, I confess that I was misled by Hilbert. Hilbert considered only unramified class fields. From the standpoint of the theory of algebraic functions which are defined by Riemann surfaces, it is natural to limit consideration to the unramified case ... after the end of scientific exchange between Japan and Europe ... I was freed from that idea and suspected that every abelian extension might be a class field if the latter is not limited to the unramified case.
Takagi perceived a simple, but fiendishly clever, way around the ramification problem. Ironically, it was something that had been right under the noses of the German mathematicians, as the idea (but not the application) was actually due to Weber, who expounded on a theory of "generalized ideal classes" in a widely-read textbook that Takagi had studied even before he went to Germany. Apparently Hilbert had become so uninterested in algebraic number theory while writing the Zahlbericht that he missed something which should have been obvious.

Although one can't ignore fields that have ramification, one can (as it turns out) work only with ideals that aren't divisible by ramified primes. Suppose M is any (integral) ideal at all of O_k. Then M has a unique factorization into a finite number of prime ideals. Let A_M be the set of fractional ideals whose factorizations do not involve any primes which divide M. Let H_M be the subset of A_M consisting of principal fractional ideals whose generators satisfy suitable congruence conditions modulo M (in the simplest case, having a generator ≡1 (mod M)); in particular, ideals of H_M also have no primes dividing M in their factorization. One is thus able to avoid any finite set of prime ideals that one wants to.

We leave out a few technical details, but essentially it can be shown that A_M is a group under multiplication, and H_M is a subgroup of finite index in A_M. That is, the quotient group A_M/H_M is finite. Many ideals M may result in isomorphic quotient groups, so that if M and M′ are two such, then A_M/H_M ≅ A_{M′}/H_{M′}. This can be shown to give an equivalence relation on the set of integral ideals. So there is some quotient group A/H which is the same for all M in any equivalence class -- it doesn't depend on the choice of M in that class. Weber called A/H a congruence divisor class group, since the H_M can be defined in terms of congruences for the elements which generate the principal ideals. Furthermore, for each A/H, the greatest common divisor of all ideals M in the corresponding equivalence class is an ideal called the conductor of A/H, usually denoted by F. (The German word for conductor is Führer.) As the greatest common divisor, F is the smallest ideal which contains all of the ideals M in the equivalence class that defines A/H.

So Takagi just combined the ideas of Weber and those of Hilbert. He defined the class field K of A/H as a finite extension of k such that the prime ideals of k which split completely (with inertial degree 1 and no ramification) are precisely the prime ideals in H -- which are principal. If one takes M to be the trivial ideal (1) (i. e. O_k) then there are no restrictions on what fractional ideals are in A or what principal ideals are in H, and so A/H is just the ordinary ideal class group. As we noted, an equivalent definition of the Hilbert class field is the field K such that the prime ideals which split completely are precisely the principal ones. Thus Takagi's notion of class field is a direct generalization of Hilbert's.

The hard part was proving all of the properties of Takagi's class fields, and the proofs were rather gruesome, but Takagi published them in 1920. Most importantly, he proved that for any generalized class group A/H of k a unique class field K exists and is a finite abelian extension of k, and in fact (as one would expect) G(K/k) is isomorphic to A/H.

Even more importantly, Takagi showed that conversely, any finite abelian extension K⊇k is a class field of some divisor class group A/H. In some sense, this latter result is a partial solution of Kronecker's Jugendtraum, since it describes every finite abelian extension K⊇k. However, it is a fairly weak solution, since no explicit, constructive method is known, given a divisor class group, to find generators for K⊇k. Explicitly constructing class fields is tantamount to realizing the Jugendtraum, and it is still very much an open problem. Conceivably it could turn out to be unsolvable, just as the problem (which was number 10 on Hilbert's list) of giving an explicit general method of solving any Diophantine equation was proven to be impossible.

As far as ramification is concerned, the class field K is the maximal abelian extension of k which is unramified except for primes that divide the conductor F. This is an alternative definition of K as the class field of k, so this too is a direct generalization of Hilbert's results. Moreover, the primes which divide the conductor are precisely the primes which ramify. An "elementary" fact about field extensions is that one can define the discriminant D for any extension K/k as an ideal of O_k. The prime divisors of D are then exactly the primes which ramify, and so D and the conductor F have the same prime divisors.

There is also an ordering relationship. If K′⊇K is another abelian extension of k and H′, H are the corresponding groups of principal ideals, then H′⊆H. Conversely, if we have a subgroup H′⊆H, then K′⊇K for the corresponding class fields. Further, K′=K if and only if H′=H. This makes good sense intuitively. If K′⊇K is a larger field, then the number of primes of k which ramify in it can only increase. So the corresponding conductor F′ will have at least as many factors as F. Consequently, one expects H′⊆H. Conversely, if H′⊆H, then for the corresponding conductors, F′ should have at least as many prime factors as F. Since K and K′ are maximal among abelian extensions which are unramified outside of primes that divide their respective conductors, the conditions on K′ are less restrictive (there are fewer primes where it must be unramified and more where it can be ramified), so K′⊇K.

Artin's class field theory

Takagi's results seemed to be just about as general, of their kind, as one could want, for abelian extensions of fields at any rate. What about arbitrary (non-abelian) field extensions? There was hardly a clue as to what sort of result to expect in that case, given that class fields for K⊇k were defined so that G(K/k) should be isomorphic to a generalized ideal class group, which is necessarily abelian. Consequently, neither Takagi nor the Germans interested in class field theory dared to explore in that direction.

None, that is, until Emil Artin (1898-1962), fresh from completing his PhD at age 23, arrived in Göttingen. There he first read Takagi's papers on class field theory, and was almost alone in not finding them difficult to understand. For mathematicians, being young is frequently a big advantage. Artin soon conceived of a powerful way to extend Takagi's results, though for several years he was unable to prove his conjectures.

It helps to know a bit about what Artin wrote his thesis on, which was the theory of algebraic function fields over a finite field. An algebraic function (of one variable) g(x) is simply a function which satisfies a polynomial equation f(x,g(x))=0, where f(x,y) is a nonzero polynomial in two variables with coefficients in a field F. This is a reasonable thing to study, since algebraic functions, by their very definition, have a lot of similarities to algebraic numbers. You can, for example, start with the ring of polynomials F[x]. Since that is an integral domain, it has a field of fractions, whose elements are called, obviously, rational functions. If this field is denoted by F(x), then it's natural to consider finite extensions of it, i. e. fields of algebraic functions.

If F=C, then the study of algebraic functions was investigated intensively in the 19th century by Niels Abel, G. F. B. Riemann, and many others, as part of the general theory of analytic functions of a complex variable. Indeed, as we noted, the concept of ramification was imported to algebraic number theory from algebraic function theory. And Riemann invented Riemann surfaces in order to handle the problem that algebraic functions are in general not single-valued: the solution of g(z)^2-z=0, where g(z)=√z, for example.

Artin's thesis considered algebraic functions when the field of constants was a finite field F. Any such F must be F_q where q=p^n for some prime p and some integer n≥1. When n=1, F_p is simply Z/pZ. When n>1, F_q is a (unique) extension of degree n over F_p. (It should not be confused with Z/p^nZ, which isn't a field, since p^nZ isn't a maximal ideal if n>1.) Finite fields were also studied in the 19th century, but Artin's idea of studying algebraic function fields over such fields was a novelty. And a great idea, because they turned out to have many properties in common with algebraic number fields. However, many of these properties are considerably easier to prove.
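
A quick illustrative check of that parenthetical remark, in Python: in Z/9Z the class of 3 is a zero divisor, so it can have no multiplicative inverse, and Z/3^2Z cannot be the field F_9.

    # Multiples of 3 in Z/9Z only ever give 0, 3, or 6 -- in particular never 1:
    print(sorted({(3 * x) % 9 for x in range(9)}))    # [0, 3, 6]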

Mathematics often progresses by applying great ideas of the past in new situations. One of the great ideas of number theory in the 19th century was the use of functions called zeta functions and L-functions to investigate questions in arithmetic and algebraic number theory. The Riemann zeta function ζ(x), as a function of a real variable, was first studied by Euler in the 18th century but seriously applied (by Riemann, as a function of a complex variable) to deep results about prime numbers in 1859. L-functions, which are a generalization of ζ(x), were invented by P. G. L. Dirichlet (1805-59), who used them to prove that there are infinitely many primes p such that p≡a (mod m) if (and only if) a and m are relatively prime.

Dedekind generalized the zeta function in another direction when he defined a function ζ_K(s) for any extension field K⊇Q. Riemann's zeta function is just ζ_K(s) when K=Q. Dedekind found a relation between his function and certain Dirichlet L-functions, and also obtained a formula for the class number of K in terms of the behavior of ζ_K(s) near s=1, where the function has a simple pole whose residue involves the class number. Zeta and L-functions actually played an important role in the proofs of the main theorems of class field theory, but we haven't mentioned them before, since they aren't needed in order to describe the theorems themselves. We will say a good deal more about them later, since they play a central role in Artin's work.

What Artin did in his thesis was to define analogs of the zeta and L-functions for algebraic function fields over a finite field. He then proceeded to apply them to proving results about the arithmetic of such function fields. He didn't develop the theory in full generality, but instead considered various special cases, such as quadratic extensions of the rational function field F_p(x). Nevertheless, his thesis laid the groundwork for some of the deepest ideas of 20th century mathematics.

For example, André Weil (1906-1998) picked up the study of the zeta and L-functions Artin considered in his thesis and actually showed that they satisfied the appropriate analog of the Riemann hypothesis -- a result which is still very much unproven for the original zeta and L-functions. Not only that, he generalized these functions further for the case of algebraic varieties over finite fields. (A variety, roughly speaking, is the solution set of several simultaneous algebraic equations, instead of just one.) He conjectured (what was known as the Weil conjectures) that these generalized functions had the properties expected of the classical zeta and L-functions, including a "Riemann hypothesis" of their own. The Weil conjectures were fully proven by Pierre Deligne in 1973.

But let's get back to Artin. He was fresh from working with L-functions in his thesis. He knew that L-functions had been crucial in proofs of Takagi's class field theory. He therefore suspected that there might be a way to define yet another sort of L-functions, more general than those used by Takagi, which would make sense even for non-abelian (but Galois) extensions K⊇k. Indeed, such functions are rather easily defined, as we shall explain later. He first published his ideas in 1923, but wasn't able to prove them until 1927, after certain key results had been proven by a Russian, N. G. Chebotarev (1894-1947). Even then, he could give proofs for his generalized L-functions (now called Artin L-functions) only for the case of abelian extensions. (And no one since has proven the results, now known as the Artin conjecture, in the general non-abelian case either.)

Nevertheless, what Artin did prove finished up the abelian version of class field theory pretty neatly. There was one thing notably lacking in Takagi's (and earlier) class field theory. Although they demonstrated an isomorphism between an ideal class group and the Galois group of the class field, this was not an explicit isomorphism. That is, there was no explicit way to take a class of ideals and map it to an element of the Galois group. Instead, the isomorphism was proven using a combination of abstract group theoretical arguments (which today are a part of what is called the cohomology of groups) and analytic formulas for the ideal class number derived from the theory of L-functions.

What Artin perceived was that there was a way to describe the isomorphism explicitly. This was a great advantage, because it significantly simplified proving important consequences of the general theory. For example, as we shall explain, various reciprocity laws come about simply and naturally. It took Artin four years, with help from Chebotarev's ideas, to actually prove his isomorphism worked -- but that's not a long period of time for such a striking conjecture.

Understanding Artin's idea depends on simple facts about finite fields. Let K⊇k be the abelian extension in question. Suppose P is a prime ideal of O_k which isn't ramified in K and Q is a prime of O_K above P, i. e. Q is a prime factor of P·O_K, and P=Q∩O_k. Since P is a maximal ideal of its ring of integers, the quotient O_k/P is a finite field F_q, called the residue field of P, of q=p^n elements, where (p)=P∩Z is the rational prime below P. The norm of P, denoted by NP, is defined to be q. (Hence the norm of any ideal of O_k can be defined by multiplying the norms of all prime factors of the ideal.) The nonzero elements of any finite field form a finite cyclic group of order q-1. Hence if a∈F_q is nonzero, then a^{q-1}=1, and a^q=a for every a∈F_q. In terms of residue fields, that means for any α∈O_k, α^{NP}≡α (mod P). (Which is trivially true even if α∈P.) Another basic fact is that the residue field O_K/Q is a finite Galois extension of O_k/P, and the Galois group is cyclic. If O_K/Q has q′=p^{n′} elements, then n|n′, and m=n′/n is the degree of O_K/Q over O_k/P, i. e. the order of the Galois group. In this situation, the automorphism defined by σ(x)=x^q is a generator of the Galois group. (Note that it leaves the base field F_q fixed.) σ is known as the Frobenius automorphism, after Georg Frobenius (1849-1917), who worked out a lot of the basic facts about finite fields. (Frobenius studied in Berlin under people like Kummer and Kronecker and contributed to many branches of mathematics, including the theory of group characters and group representations, which will be mentioned later.)
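
To make the Frobenius automorphism concrete, here is a small Python sketch (added for illustration; representing F_9 as F_3[x]/(x^2+1) is a convenient choice, since x^2+1 is irreducible mod 3). Elements are pairs (a,b) standing for a+bi with i^2=-1. The code checks that σ(x)=x^3 is multiplicative, fixes exactly the base field F_3, and satisfies σ∘σ = identity, so σ generates the cyclic Galois group of order m=2.

    # F_9 = F_3[i] with i^2 = -1 (x^2 + 1 is irreducible over F_3)
    def mul(u, v):
        (a, b), (c, d) = u, v
        return ((a * c - b * d) % 3, (a * d + b * c) % 3)

    def frob(u):                     # the Frobenius map sigma(x) = x^q with q = 3
        return mul(u, mul(u, u))

    F9 = [(a, b) for a in range(3) for b in range(3)]
    assert all(frob(mul(u, v)) == mul(frob(u), frob(v)) for u in F9 for v in F9)
    assert [u for u in F9 if frob(u) == u] == [(0, 0), (1, 0), (2, 0)]  # fixed field F_3
    assert all(frob(frob(u)) == u for u in F9)                          # sigma^2 = identity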

Now consider the Galois group G=G(K/k). For a given prime P of O_k, divisible by some prime Q of O_K above P, let G_Q be the subset {τ∈G | τQ=Q}. It's obviously a subgroup of G (called the decomposition group of Q). Since each τ∈G_Q takes elements of Q to other elements of Q, it induces in a natural way an automorphism τ̄ of the residue field O_K/Q. τ leaves elements of P⊆O_k fixed, so τ̄ leaves elements of O_k/P fixed. Hence τ̄ is in the Galois group Ḡ of O_K/Q over O_k/P. Conversely, it can be shown that every element of Ḡ is of the form τ̄ for some τ∈G_Q. There may actually be many elements of G_Q that induce the same automorphism. However, if we define a further subset T_Q = {τ∈G_Q | τ̄ is the identity of Ḡ}, then T_Q is a subgroup of G_Q (called the inertia group of Q). Furthermore, if τ∈G_Q induces a particular τ̄∈Ḡ, then the elements of G_Q that induce the same τ̄ are exactly those of the form ττ′, for some τ′∈T_Q. One writes this set as τT_Q. (In group theory, this is called a coset of T_Q.)

Therefore, there is an automorphism in G_Q, which we denote by the symbol (Q,K/k) and which depends on Q, such that the automorphisms which induce the Frobenius automorphism on the residue field O_K/Q are exactly those in the set (Q,K/k)T_Q. Although in general we have more than one choice for (Q,K/k), when P is unramified as we are assuming, then in fact T_Q consists of just the identity element, and the choice for (Q,K/k) is unique. Suppose next that Q′ is another prime above P. Then Q′=ηQ for some η∈G. From the definition, it follows that (Q′,K/k) = (ηQ,K/k) = η(Q,K/k)η^{-1}. Hence if, as we are also assuming, K/k is abelian, then (Q,K/k) is the same for any Q that divides P·O_K, so it may be denoted by (P,K/k).

If you have found this discussion a bit long-winded, then you may just accept that for any prime P which is unramified in K/k, there is a unique way to define a special element (P,K/k) of the Galois group G(K/k). This definition can be extended multiplicatively to any fractional ideal J of k which has no ramified primes in its factorization, i. e. (P·P′,K/k)=(P,K/k)(P′,K/k), and so forth. The automorphism (P,K/k) is called the Artin symbol, and it is Artin's inspired idea of how to explicitly define a map from fractional ideals which are prime to the conductor of K/k to elements of the Galois group G(K/k). Note that the extension has to be abelian so that, at one key step, the map is well-defined.

Armed with this excellent idea, which surely seemed to Artin to be too elegant to be a useless fluke, it was quite natural to conjecture that this map from fractional ideals to G(K/k) should be surjective (i. e. all elements of G(K/k) are of the form (J,K/k) for some fractional ideal J), and that the kernel of the map (fractional ideals which map to the identity of G(K/k)) should be a subgroup H of fractional ideals, such that A/H is an ideal class group in Takagi's sense and that the Artin symbol gives the isomorphism A/H≅G(K/k) explicitly.

As we said, Artin was able to prove this conjecture in 1927, using some ideas from Chebotarev. The proof still wasn't easy, but it had many advantages over Takagi's nonconstructive proof that the ideal class group of an abelian extension K⊇k is isomorphic to the Galois group G(K/k). A number of other proofs of Artin's result have been formulated since 1927, using a succession of more sophisticated tools. Although the structure of such proofs has become clearer, none can be called easy.

We shall see how Artin's result makes some generalized reciprocity laws fairly transparent. Consequently, Artin's theorem has become known simply as the Artin reciprocity law. It thus constituted the most elegant (if not the first) solution to Hilbert's 9th problem, which called for "a proof of the most general reciprocity law in any number field".

Artin's result also allowed him to define in a new way certain "L-functions" of an abelian extension K/k. Although it enables proving that these functions have certain desired properties only in the abelian case, it suggests that analogous results can be obtained for arbitrary non-abelian extensions -- hence one might have a "non-abelian class field theory". To date this conjecture is still unproven, and is a major open question. We'll consider it in more detail later.

Reciprocity laws

Gauss' quadratic reciprocity law is the fact that if p, q are distinct odd primes then
(p/q)(q/p) = (-1)^{(p-1)(q-1)/4}
where (p/q) is the Legendre symbol defined to be 1 or -1 according as p either is or is not a square modulo q, i. e. whether the congruence p≡a^2 (mod q) has some solution with a∈Z. By itself, it is an interesting result, since a priori there's no reason to expect that p being a square modulo q has much to do with q being a square modulo p. Nevertheless, for all anyone around the time of Euler, Legendre, and Gauss might have known, it could have been merely an interesting curiosity. "Elementary" number theory has a lot of results like that. Still, it was strange that it was so easy to state, yet so difficult to prove that mathematicians as able and creative as Euler and Legendre couldn't prove it.
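
As a quick illustration (added here, not in the original), this Python sketch computes the Legendre symbol by Euler's criterion, (a/p) ≡ a^((p-1)/2) (mod p), and verifies the reciprocity law for every pair of distinct odd primes below 50:

    def legendre(a, p):
        """Legendre symbol (a/p) for an odd prime p, by Euler's criterion."""
        t = pow(a, (p - 1) // 2, p)
        return -1 if t == p - 1 else t

    primes = [p for p in range(3, 50, 2) if all(p % q for q in range(3, p, 2))]
    for p in primes:
        for q in primes:
            if p != q:
                assert legendre(p, q) * legendre(q, p) == (-1) ** ((p - 1) * (q - 1) // 4)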

It is also a useful result, since it can be applied to the analysis of many Diophantine equations, especially (of course) those which involve squares. An important class of such equations are known as quadratic forms, which are sums of squares with arbitrary (integer) coefficients, such as

∑_{1≤k≤n} a_k b_k^2
One can ask questions about what integers are representable in such a form, for some set of b_k∈Z, when the coefficients a_k∈Z are given. Gauss was interested in the theory of quadratic forms, and his research led him to the concept of what we now call the ideal class number, because the theory of quadratic forms is closely related to the theory of quadratic field extensions, which Gauss also investigated. Out of that came his conjecture (now proven) that there are only 9 imaginary quadratic fields with class number one, and also his conjecture (still unproven) that there are infinitely many real quadratic fields with class number 1.
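
Gauss computed class numbers of quadratic fields by counting classes of quadratic forms, and for imaginary quadratic fields that method is simple enough to code directly. The Python sketch below (an illustration added here) counts primitive reduced forms ax^2+bxy+cy^2 of discriminant D<0, whose number is the class number h; searching squarefree d up to 200 suffices because, as just noted, the list of imaginary quadratic fields with h=1 is proven complete, the largest being d=163.

    from math import gcd, isqrt

    def class_number(D):
        """Count primitive reduced forms (a, b, c) with b^2 - 4ac = D < 0,
        i. e. -a < b <= a <= c, b >= 0 if a == c, and gcd(a, b, c) = 1."""
        h = 0
        for a in range(1, isqrt(-D // 3) + 1):
            for b in range(-a + 1, a + 1):
                if (b * b - D) % (4 * a) == 0:
                    c = (b * b - D) // (4 * a)
                    if c >= a and not (a == c and b < 0) and gcd(gcd(a, b), c) == 1:
                        h += 1
        return h

    def discriminant(d):          # fundamental discriminant of Q(sqrt(-d))
        return -d if d % 4 == 3 else -4 * d

    def squarefree(d):
        return all(d % (k * k) for k in range(2, isqrt(d) + 1))

    print([-d for d in range(1, 200)
           if squarefree(d) and class_number(discriminant(d)) == 1])
    # [-1, -2, -3, -7, -11, -19, -43, -67, -163]: the nine fields Q(sqrt(-d)) with h = 1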

Something else which fell easily out of the study of quadratic extensions was the fact that the way an integer prime p factors in a quadratic extension (either remaining prime, splitting, or ramifying) is exactly determined by whether or not the discriminant of the field is a square modulo p, i. e. by the value of the Legendre symbol (d/p). From a modern point of view, this question about how a prime factors in an extension field appears to be more interesting and central, at least from a theoretical perspective, than the original questions about congruences.

But even before we get to that, the original reciprocity law and the Legendre symbol it involves can be generalized in several ways. For one thing, it is possible to define (a/b) for any a,b∈Z which are relatively prime and 2∤b, not just for prime b. As a matter of notation, let v_p(b) denote the exponent of the exact power of a prime p which divides b, so that we have

b = ∏_p p^{v_p(b)}
(The product is over all primes p, but only the factors where p|b, so that v_p(b)≠0, actually matter.) Then we can define
(a/b) = ∏_{p|b} (a/p)^{v_p(b)}
(The product is over p which divide b since none of those divide a, by assumption, while if p|a we would have a factor (a/p)=0.) This symbol is called the Jacobi symbol after C. G. J. Jacobi (1804-51). It has properties analogous to those of the Legendre symbol: Its value depends only on the residue class of a modulo b, and it is multiplicative in both arguments. (a/b)=1 if a is a square modulo b. However, (a/b)=1 doesn't necessarily imply, conversely, that a is a square modulo b. (For instance, (2/9)=(2/3)(2/3)=1, yet 2 isn't a square modulo 9.) Nevertheless, there is still a reciprocity law for the Jacobi symbol:

If a and b are relatively prime odd integers then

(a/b)(b/a) = ε·(-1)^{(a-1)(b-1)/4}    where ε=-1 if both a and b are negative, and ε=1 otherwise
When we come to other generalizations of this kind of symbol, we will find that they, likewise, can be defined for more than just primes.
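
The defining product translates directly into code. This Python sketch (added for illustration) computes the Jacobi symbol by naive factorization of b, exactly as in the definition above, and confirms the example just given: (2/9)=1 although 2 is not a square modulo 9.

    def legendre(a, p):
        """Legendre symbol (a/p) for an odd prime p, by Euler's criterion."""
        t = pow(a, (p - 1) // 2, p)
        return -1 if t == p - 1 else t

    def jacobi(a, b):
        """Jacobi symbol (a/b) for odd b > 1 with gcd(a, b) = 1: the product
        of (a/p)^v_p(b) over the primes p dividing b."""
        result, p = 1, 3
        while b > 1:
            while b % p == 0:        # one factor (a/p) for each power of p in b
                result *= legendre(a, p)
                b //= p
            p += 2
        return result

    print(jacobi(2, 9))                            # 1, since (2/3)(2/3) = (-1)(-1)
    print(any(x * x % 9 == 2 for x in range(9)))   # False: 2 is not a square mod 9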

Another obvious generalization comes from asking about numbers which are nth powers modulo another integer when n>2. Instead of quadratic residues these are called (of course) nth power residues. n need not be a prime. This is a much harder question.

Gauss considered the problem of 4th power (biquadratic) residues and conjectured, but did not prove, a biquadratic reciprocity law. This was proven by Jacobi and Ferdinand Eisenstein (1823-52). They also formulated and proved a cubic reciprocity law. For reasons which we will explain, an nth power residue symbol should have values which are nth roots of unity. So Gauss' biquadratic reciprocity law involved values in the field Q(√-1). It was therefore a reciprocity law question which led Gauss to consider the integers of that field, namely the ring Z[√-1], which became known as the "Gaussian integers". Similarly, a cubic reciprocity law involves the integers of Q(ζ), where ζ=(-1+√-3)/2, i. e. the ring Z[ζ].

Higher reciprocity laws of this sort were what interested Kummer the most. In fact, it was the problem of pth power reciprocity laws, rather than finding a proof for Fermat's Last Theorem, which entangled Kummer in the problems of the ring of integers of Q(ζ_p), where ζ_p is a pth root of unity. This revealed to him the lack of unique factorization of integers in these fields (in the general case), and therefore led him to his theory of ideals, which was put in more modern form by Dedekind and Hilbert.

Our ultimate goal is to define an nth power residue symbol for n∈Z, n≥2. We'll need class field theory to do a complete job. But to give a better idea of what's involved, we'll look at a more special case which can be done with simpler tools, when n is a prime number. To begin with, suppose k is a field that contains a primitive nth root of unity ζ, i. e. ζ^n=1 but ζ^j≠1 if 1≤j<n. For example, k=Q(ζ). Since ζ is primitive, k contains all nth roots of unity, which are just the powers ζ^j for 1≤j≤n. Now x^n-1 = ∏_{0≤i<n} (x-ζ^i), and (x^n-1)/(x-1) = ∑_{0≤i<n} x^i, hence ∑_{0≤i<n} x^i = ∏_{0<i<n} (x-ζ^i). Setting x=1 in this gives n = ∏_{0<i<n} (1-ζ^i). Hence the principal ideal (n) is the product of the n-1 principal ideals of the form (1-ζ^i).

Now suppose that n is a prime number, call it p, and let k=Q(ζ). Consider numbers of the form ε_{ij} = (1-ζ^i)/(1-ζ^j). If 0<j<p, then j has an inverse modulo p, and so we can find j′ such that jj′≡i (mod p). Hence

ε_{ij} = (1-ζ^{jj′})/(1-ζ^j) = ∑_{0≤k<j′} ζ^{jk}
This shows that the numbers ε_{ij} are all integers of k. But 1/ε_{ij} = ε_{ji}, so these numbers are all units. Therefore the ideals (1-ζ^i) are all equal to the principal ideal P=(1-ζ), and (p)=P^{p-1}.
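
The identity n = ∏_{0<i<n} (1-ζ^i) used above is easy to spot-check numerically. A tiny Python sketch (added for illustration) with floating-point complex arithmetic:

    import cmath

    for n in range(2, 10):
        zeta = cmath.exp(2j * cmath.pi / n)       # a primitive nth root of unity
        product = 1
        for i in range(1, n):
            product *= 1 - zeta ** i
        assert abs(product - n) < 1e-9            # the product equals n (up to rounding)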

Suppose a∈k, a≠0 (but not necessarily an integer), i. e. a is a member of k^×, the invertible elements of k. Let Q be a prime ideal of k that is relatively prime to (i. e., has no prime factors in common with) both (a) and P. Let [a] = {b∈k^× | b≡a (mod Q)}. (b≡a (mod Q) means a-b∈Q.) [a] is known as the residue class of a modulo Q (a natural generalization of residue classes in Z). We have [a^i]=[a]^i for any i∈Z. Since Q is prime to P=(1-ζ^i) for all 1≤i<p, it follows that 1≢ζ^i (mod Q), so [ζ^i]=[ζ]^i≠[1].

Let G be the set of (distinct) residue classes [a] where a∈k^× and a is relatively prime to Q. Since Q is prime, G is a finite cyclic group under multiplication. [ζ]^p=[1], so [ζ] generates a subgroup of G of order p, and p divides the order #(G) of G. Let G^p be the set {x^p | x∈G} of pth powers of elements of G. G^p is a subgroup of G. If [a] is a generator of G, [a]^p generates G^p, so the index of G^p in G is p, and the quotient group G/G^p consists of p disjoint cosets.

Since G is cyclic, pth powers are easy to recognize: [a]∈G^p if and only if [a]^{#(G)/p}=[1]. Moreover, for any [a]∈G, the class [a]^{#(G)/p} is a root of x^p=[1] in G, and the p distinct classes [ζ]^i for 0≤i<p are precisely all such roots, since a cyclic group contains at most p of them.

Hence for any a∈k^× relatively prime to Q, there is a unique i (modulo p) such that

[a]^{#(G)/p} = [ζ]^i
Stated differently, since #(G)=NQ-1, this says that a^{(NQ-1)/p} ≡ ζ^i (mod Q) for a unique i (modulo p).

Therefore we can define the pth power residue symbol (a/Q)=(a/Q)_p, whenever Q is a prime ideal of k that is relatively prime to (a) and p, to be ζ^i, where i is unambiguously determined (modulo p, which is enough) by the above condition on [a]. Note that if (a/Q)=1, then [a]^{#(G)/p}=[1], so [a]∈G^p, i. e. a≡b^p (mod Q) for some b∈k^×. Conversely, if a≡b^p (mod Q), then [a]∈G^p, hence [a]^{#(G)/p}=[1] and (a/Q)=1. This shows that our definition of (a/Q) yields (a/Q)=1 if and only if a≡b^p (mod Q) -- which is precisely how we should expect a pth power residue symbol to behave.
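
For a prime Q of inertial degree 1, the residue field is just Z/qZ for a rational prime q≡1 (mod p), so the whole recipe can be carried out with ordinary modular arithmetic. A Python sketch (added for illustration; the choice p=3, q=13 is arbitrary): a^((q-1)/p) mod q always lands on one of the p pth roots of unity mod q, and equals 1 exactly for the pth power residues.

    p, q = 3, 13                  # q is prime and q ≡ 1 (mod p)
    roots = sorted(x for x in range(1, q) if pow(x, p, q) == 1)
    print(roots)                  # [1, 3, 9]: the three cube roots of unity mod 13

    cubes = {pow(x, p, q) for x in range(1, q)}    # G^p, the cube residues mod 13
    for a in range(1, q):
        chi = pow(a, (q - 1) // p, q)              # a^((NQ-1)/p) mod Q
        assert chi in roots                        # always a pth root of unity
        assert (chi == 1) == (a in cubes)          # chi = 1 iff a is a cube mod 13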

Now, up to this point in the current section, we haven't used any class field theory, just relatively simple facts about algebraic number fields, roots of unity, and a bit of group theory. Let's now apply class field theory to a specific abelian extension of k. Given a∈k^×, let α=a^{1/p} be any particular pth root of a. All pth roots of a are solutions of x^p-a=0, and have the form ζ^j α for 0≤j<p. If K=k(α), then K⊇k is an abelian extension, which is called a Kummer extension, as noted earlier.


Zeta functions and L-functions


Open questions

Kronecker's Jugendtraum

Nonabelian class field theory and Artin's conjecture

p-adic L-functions, Iwasawa theory

The Langlands program

Miscellaneous problems


Quadratic fields with h=1
Regular primes
Leopoldt's conjecture
Artin's other conjecture



units
norms, conjugates
multiplication of ideals
fractional ideals
unique factorization, principal ideal domains, Dedekind domains, Euclidean domains, integral domains
Z/nZ as an example
field extensions, rings of integers
field of fractions of an integral domain
hidden properties of Z which don't generalize to other rings, even rings of algebraic integers (e. g. integral domain, euclidean, unique factorization)
relation of splitting of primes and splitting of polynomials over finite fields; A/P ≅ F_p[x]/(f(x)) (BF Wyman)

Although Gauss' class number problem has been solved, it is still not known whether there are infinitely many D such that Q(√D) has unique factorization.


Recommended references: Web sites

Site indexes

Math Forum Internet Mathematics Library: Algebraic Number Theory
Alphabetized list of links with extensive annotations.
Open Directory Project: Algebraic Number Theory
Categorized and annotated links. A version of this list is at Google, with entries sorted in "page rank" order.
Galaxy: Algebraic Number Theory
Categorized site directory. Entries usually include descriptive annotations.

Sites with general resources

Franz Lemmermeyer's Articles and Preprints
Lemmermeyer specializes in algebraic number theory, and his papers deal with topics like class groups, reciprocity laws, and Euclidean number fields. He has a huge Bibliography on Reciprocity Laws and a recent book: Reciprocity Laws (with links to other pages on quadratic reciprocity).
Algebraic Number Theory Archives
Preprint archives for papers in algebraic number theory and arithmetic geometry.
Robin Chapman's Home Page
Contains lecture notes on algebraic number theory, plus preprints, miscellaneous articles, and external links.
Math 254B (Number Theory)
Lecture notes and other material, including some external links, from a course given at U. C. Berkeley in 2002, by Kiran Kedlaya.

Surveys, overviews, tutorials

Algebraic number theory
Article from Wikipedia. Consists mostly of references to more specific topics.
Class field theory
Article from Wikipedia.
Algebraic Number Theory
Course notes for a standard graduate course, available in DVI, PS, and PDF formats. By J. S. Milne.
Class Field Theory
Course notes, available in DVI, PS, and PDF formats. By J. S. Milne.


Recommended references: Magazine/journal articles

Algorithms in Algebraic Number Theory
H. W. Lenstra, Jr.
Bulletin of the AMS, April 1992, pp. 211-244
The study of algorithms in algebraic number theory has both practical and theoretical payoffs. The objects that are of greatest interest for computation are Galois groups, rings of integers and their units, and ideal class groups.


Recommended references: Books



Copyright © 2002 by Charles Daney, All Rights Reserved