The Strange Theory of Light and Matter (III)

This is my third and, most probably, final comments on Feynman’s popular Lectures on Quantum Electrodynamics (QED). The origin of these short lectures is quite touching: it’s the death of Alix G. Mautner, a good friend of Feynman’s who was always curious about physics but never got the math because her career was in English literature. Indeed, Feynman introduces this 1985 publication by writing that “Here are the lectures I really prepared for Alix, but unfortunately I can’t tell them to her directly, now.”

Alix Mautner died from a brain tumor, and her husband, Leonard Maunter, sponsored the QED lectures series at the UCLA, which Ralph Leigton transcribed and published as the booklet we’re talking about here. Feynman himself also died a few years later, at the relatively young age of 69—and from cancer too! All this tragedy makes it rather surprising that Feynman’s QED never quite got the same iconic status of, let’s say, Stephen Hawking’s Brief History of Time. [In fact, I just saw the movie on Hawing (The Theory of Everything), and I note another weird coincidence: Jane Wilde, Hawking’s first wife, also had a PhD in literature. And, just to wrap up this parenthesis: the movie doesn’t show that Hawking divorced his nurse in 2006, to get back to Jane. Isn’t that amazing?]

Now, one reason why Feynman’s Strange Theory of Light and Matter did not sell like Hawking’s Brief History of Time, might well be that, in some places, the text is not entirely convincing. As mentioned before, I suspect Richard Feynman simply didn’t have much time or energy left to correct some of the writing of Ralph Leighton, who transcribed and edited these four short lectures a few years before Feynman’s death. When everything is said and done, Ralph Leighton is not a physicist and, hence, I do think he compromised – just a little bit – on accuracy for the sake of readability. Ralph Leighton’s father, Robert Leighton, an eminent physicist who worked with Feynman, would probably have done a much better job.

Now, because quantum electrodynamics provides the framework for all of quantum mechanics, one should not not compromise on accuracy here, even when trying to write something reader-friendly. So, in short, I wrote this and the other two posts, indeed, because I feel that, while that little book on QED is an excellent introduction to the weird world of quantum mechanics – for people who don’t like math, at least – Ralph Leighton’s story is great but, as said, in some places, it’s not entirely convincing.

So… Well… Yes. I want to do better than Leighton here. :-)

I. Probability amplitudes: what are they?

The greatest achievement of that little QED publication is that it manages to avoid any reference to wave functions and other complicated mathematical constructs: all of the complexity of quantum mechanics is reduced to three basic events (i.e. things that may or may not happen) and, hence, three basic amplitudes which are represented as ‘arrows’, literally.

Now, you know that a (probability) amplitude is a complex number. Simplifying it to some ‘arrow’ is a stroke of genius, I think, because it’s true that the concept of a complex number is quite impenetrable for a newbie. In contrast, everyone easily ‘gets’ the concept of an ‘arrow’.

Whatever we call it, a complex number or an arrow, the point is that a probability amplitude is something with (a) a magnitude (that’s some real positive number, like a length, but so you should not associate it with some spatial dimension: it’s just a number) as well as (b) a phase. We also know that we have to square these lengths to find some probability, i.e. some real number between 0 and 1. Hence, the length of our arrows cannot be larger than one. [Note that this ‘normalization’ condition reinforces the point that these ‘arrows’ do not have any spatial dimension: they represent a function. To be specific, they represent a wavefunction.]

Now, if we’d be talking complex numbers instead of ‘arrows’, we’d say the absolute value of the complex number cannot be larger than one. We’d also say that, to find the probability, we should take the absolute square of the complex number, so that’s the square of the magnitude or absolute value of the complex number indeed. We cannot just square the complex number: it has to be the square of the absolute value. Why? Well… Just write it out. The square of a complex number z = a + bi is equal to z= a+ 2abi – b2, while the square of its absolute value (i.e. the absolute square) is |z|= [√(a+ b2)]2 = a+ b2. Hence, the square and the absolute square are two very different concepts. Indeed, it’s not only the 2abi term: there’s also the minus sign in the first expression. In case of doubt, always remember that the square of a complex number may actually yield a negative number, as evidence by the definition of the imaginary unit: i= –1. So… Well… Just note it’s the absolute square when talking amplitudes and probabilities.

So Feynman and Leighton managed to avoid any reference to complex numbers in this short series of four lectures, and that’s truly a great achievement. The second stroke of genius is the metaphor of the ‘stopwatch’ and the ‘stopwatch hand’ for the variable phase. Indeed, although I think it’s worth explaining why z = a + bi = rcosφ + irsinφ in the illustration below can be written as z = reiθ = |z|eiφ, understanding Euler’s representation of complex number as a complex exponential does require a bit of a feel for math.


The stopwatch is, basically, a periodic function: it represents a sinusoid, i.e. a smooth repetitive oscillation. The stopwatch hand represents the phase, i.e. the φ angle in the illustration above, and that angle is a function of time: the speed with which the stopwatch turns is related to some frequency:

  1. For photons, that frequency is given by Planck’s energy-frequency relation, which relates the energy (E) of a photon (1.5 to 3.5 eV for visible light) to its frequency (ν). It’s a simple proportional relation relation, with Planck’s constant (h) as the proportionality constant: E = hν, or ν = E/h.
  2. For electrons, we have the de Broglie relation, which looks similar to the Planck relation (E = hf, or f = E/h) but, as you know, it’s something different. Indeed, these so-called matter waves are not so easy to interpret because there actually is no precise frequency f. The matter wave representing some particle in space will consist of a potentially infinite number of waves, all superimposed one over another, as illustrated below.


For the sake of accuracy, I should mention that the animation above has its limitations: the wavetrain is complex-valued and, hence, has a real as well as an imaginary part, so it’s something like the blob underneath: two functions in one—literally, because the imaginary part follows the real part with a phase difference of 90 degrees or π/2 radians. Indeed, if the wavefunction is a regular complex exponential reiθ, then rsin(φ–π/2) = rcos(φ), which proves the point.

Photon wave

In any case, the point is that, when it comes to analyzing electrons (or any other matter-particle), we’re dealing with a range of frequencies f really (or, what amounts to the same, a range of wavelengths λ) and, hence, we should write Δf = ΔE/h, which is just one of the many expressions of the Uncertainty Principle in quantum mechanics.

Now that’s just one of the complications. Another difficulty is that matter-particles, such as electrons, have some rest mass, and so that enters the energy equation as well (literally). Last but not least, one should distinguish between the group velocity and the phase velocity of matter waves. That makes for a complicated relationship between the wavelength and the frequency. I’ll come back to that, and so you shouldn’t worry about it here. Just note that the speed of the stopwatch hand is very different for an electron!

So, in his postmortem lectures for Alix Mautner, Feynman avoided all these complications. Frankly, I think that’s a missed opportunity because I do not think it’s all that incomprehensible. In any case, for what follows here, you’ll need to understand at least one mathematical concept, and that’s the concept of (angular) frequency. It’s not difficult. High-school math is enough here. One turn of the stopwatch corresponds to one cycle. One cycle, or 1 Hz (i.e. one oscillation per second) covers 360 degrees or, to use a more natural unit, 2π radians. [Why is radian a more natural unit? Because it’s the distance unit itself. Indeed, remember that the circumference of the unit circle is 2π.] Hence, our frequency ν corresponds to an angular frequency ω = 2πν. We can also link this formula to the period of the oscillation, T, i.e. the duration of one cycle. T = 1/ν and, hence, ω = 2π/T. [As for the animation below, yes, you’ll recognize it from Wikipedia: nice and simple.]


So, the easy math above now allows us to formally write the phase φ as a function of time (t): φ = ωt. Now, the wave travels through space, and the two illustrations above usually analyze the wave shape at some fixed point in time and so the horizontal axis is not t but x, in that case. Hence, we can and should write the phase not only as a function of time but also of space. So how do we do that? Well… If the hypothesis is that the wave travels through space at some fixed speed c, then its frequency ν will also determine its wavelength λ. It’s a simple relationship: c = λν (the number of oscillations per second times the length of one wavelength should give you the distance traveled per second, so that’s, effectively, the wave’s speed). Now, we’ve expressed the frequency in radians per second, so we should express the wavelength in radians per unit distance too. That’s what the wavenumber does: think of it as the spatial frequency of the wave. We denote the wavenumber by k, and write: k = 2π/λ. [Just do a numerical example when you have difficulty following. For example, if you’d assume the wavelength is 5 units distance (i.e. 5 meter) – that’s a typical VHF radio frequency: ν = (3×10m/s)/(5 m) = 0.6×108 Hz = 60 MHz – then that would correspond to (2π radians)/(5 m) ≈ 1.25 radians per second.


Now, from the ω = 2πν, c = λν and k = 2π/λ relations, it’s obvious that k = 2π/λ = 2π/(c/ν) = (2πν)/c = ω/c. To sum it all up, frequencies and wavelengths, in time and in space, are all related through the speed of propagation of the wave c. More specifically, they’re related as follows:

c = λν = ω/k

Now, it’s obvious that the periodicity of the wave implies that we can find the same phase by going back or forward in time, and/or by going back and forward in space. Now, if both time and space variable, and we want to keep the same phase, then you’ll agree we need to either (a) go forward in space and back in time or, alternatively, (b) go back in space and forward in time. Mathematically, that means the argument of the function representing φ will be some function φ = φ[x–ct] or, what amounts to the same, some function φ[ct–x]. The point to note is that we’ve got the propagation speed c again and a minus sign. Now, because c = ω/k, we can rewrite that as φ[ωt/k – x]. Now, that looks a bit ugly, and so we’ll just multiply the argument ωt/k – x with k. We finally get what we need to get:

φ = ωt–kx = –k(x–ct)

I hope you’ll agree that all of this isn’t rocket science: it’s just high-school math. And so – but who am I? – would have put at least one or two footnotes in a text like Feynman’s QED.

Digression 1: on relativity and spacetime

As you can see, any wave equation establishes a deep relation between space and time (and the ‘thing’ we’re describing, of course). In fact, that’s what the whole wave equation is all about! So let me say a few things more about that.

Because you know a thing or two about physics, you may ask: when we’re talking time, whose time are we talking about? Indeed, we’ll be talking about photons going from A to B but if they travel at the speed of light – which is most likely but, as you know, not assured – their stopwatch hand, as seen from our (inertial) frame of reference, doesn’t move. Likewise, according to the photon, our clock seems to be standing still.

Let me put the issue to bed immediately: we’re looking at things from our point of view, obviously, and you should, quite simply, just accept that the analysis is consistent with relativity theory. Why? Because it is. For example, the (probability) amplitude for a photon to travel from point A to B depends on the spacetime interval, which is invariant. Hence, A and B are four-dimensional points in spacetime, involving both spatial as well as time coordinates: A = (xA, yA, zA, tA) and B = (xB, yB, zB, tB).

Now, having said that, we should draw some attention to the intimate relationship between space and time which, let me remind you, results from the absoluteness of the speed of light. Indeed, one will always measure the speed of light c as being equal to 299,792,458 m/s, always and everywhere. It does not depend on your reference frame (inertial or moving). That’s why the constant c anchors all laws in physics, and why we can write what we write above, i.e. include both distance (x) as well as time (t) in the phase function θ = f(x, t) = ωt–kx = –k(x–ct). The k and ω are related through the ω/k = c relationship: the speed of light links the frequency in time (ν = ω/2π = 1/T) with the frequency in space (i.e. the wavenumber or spatial frequency k). Hence, they are not independent. There is only degree of freedom here: the frequency. [As noted above, this relationship is not so obvious for electrons, or for matter waves in general: there we need to distinguish group and phase velocity, and so we don’t have a unique frequency.]

Now, let me make a small digression here. Thinking about travel at the speed of light invariably leads to paradoxes. In previous posts, I explained the mechanism of light emission: a photon is emitted – one photon only – when an electron jumps back to its ground state after being excited. Hence, we may imagine a photon as a transient electromagnetic wave–something like what’s pictured below. Now, the decay time of this transient oscillation (τ) is measured in nanoseconds, i.e. billionths of a second (1 ns = 1×10–9 s): the decay time for sodium light, for example, is some 30 ns only.

decay time

However, because of the tremendous speed of light, that still makes for a wavetrain that’s like ten meter long, at least. Indeed, 30×10–9 s times 3×10m/s is nine meter, but you should note that the decay time measures the time for the oscillation to die out by a factor 1/e), so the oscillation itself lasts longer than that. That nine or ten meter covers like 16 to 17 million oscillations (the wavelength of sodium light is about 600 nm and, hence, 10 meter fits almost 17 million oscillations indeed). Now, how can we reconcile the image of a photon as a ten-meter long wavetrain with the image of a photon as a point particle?

The answer to that question is paradoxical: from our perspective, anything traveling at the speed of light will have zero length because of the relativistic length contraction effect. Now, from the three measurable effects on objects moving at relativistic speeds – i.e. (a) an increase of the mass (the energy needed to further accelerate particles in particle accelerators increases dramatically at speeds nearer to c), (b) time dilation, i.e. a slowing down of the (internal) clock (because of their relativistic speeds when entering the Earth’s atmosphere, the measured half-life of muons is five times that when at rest), and (c) length contraction – length contraction is probably the most paradoxical of all.

Finally, I should also make another note. I said that one will always measure the speed of light c as being equal to 299,792,458 m/s, always and everywhere and, hence, that it does not depend on your reference frame (inertial or moving). I need to nuance that statement in light of what follows: an individual photon does have an amplitude to travel faster or slower than c, and when discussing matter waves (such as the wavefunction that’s associated with an electron), we can have phase velocities that are faster than light! However, when calculating those amplitudes, is a constant. Otherwise it wouldn’t make sense to talk about spacetime intervals and all that. Full stop. End of digression.

Let’s go back to the story.

II. The building blocks: P(A to B), E(A to B) and j

The three basic ‘events’ (and amplitudes) are the following:

1. P(A to B) is the (probability) amplitude for a photon to travel from point A to B. A and B are points in spacetime. Therefore, we associate them not only with some specific (x, y, z) position in space, but also with a some specific time t. Now, quantum-mechanical theory gives us an easy formula for P(A to B): it depends on the so-called (spacetime) interval between the two points A and B, i.e. I = Δr– Δt= (x2–x1)2+(y2–y1)2+(z2–z1)– (t2–t1)2. The point to note is that the spacetime interval takes both the distance in space as well as the ‘distance’ in time into account. [Note that we should measure time and distance in equivalent units when using that Δr– Δtformula for I. So we either measure distance in light-seconds or, else, we measure time in units that correspond to the time that’s needed for light to travel one meter. If no equivalent units are adopted, the formula is I = Δrc·Δt2.]

Now, in quantum theory, anything is possible and, hence, not only do we allow for crooked paths, but we also allow for the difference in time to differ from  the time you’d expect a photon to need to travel along some curve (whose length we’ll denote by l), i.e. l/c. Hence, our photon may actually travel slower or faster than the speed of light c! There is one lucky break, however, that makes all come out alright: the amplitudes associated with the odd paths and strange timings cancel each other out. Hence, what remains, are the paths that are equal or near to the so-called ‘light-like’ intervals in spacetime only. Having said that, the net result is still that light does effectively use a small core of space as it travels. Hence, our photon does use a small core of space, as evidenced by the fact it interferes with itself when traveling through a slit or a small hole!

2. E(A to B) is the (probability) amplitude for an electron to travel from point A to B. The formula for E(A to B) is much more complicated, and it’s the one I want to discuss somewhat more in detail in this post. It depends on some complex number j (see the next remark) and some real number n.

3. Finally, an electron could emit or absorb a photon, and the amplitude associated with this event is denoted by j, for junction number. It’s the same number j as the one mentioned when discussing E(A to B) above.

Now, this junction number is often referred to as the coupling constant or the fine-structure constant. However, the truth is, as I pointed out in my previous post, that these numbers are related, but they are not the same: α is the square of j, so we have α = j2. There is also one more, related, number: the gauge parameter, which is denoted by g (it has nothing to do with gravitation, though!). Again, this number is related, but not the same. I’ll come back to this.

Digression 2: on the fine-structure constant, Planck units and the Bohr radius

We know the value for j: it’s approximately –0.08542454. How do we know that?

The easy answer to that question is: physicists measured it. In fact, what they measure is the square root of the (absolute value) of j, i.e. that fine-structure constant α. Its measured value is published (and updated) by the US National Institute on Standards and Technology. To be precise, the currently accepted value of α is 7.29735257×10−3. You can check:

j = –0.08542454 ≈ –√0.00729735257 = –√α

Just for the record, older and/or more popular books will usually mention 1/α as the ‘magical’ number, so the ‘special’ number you may have seen is the inverse fine-structure constant, which is about 137, but not quite:

1/α = 137.035999074 ± 0.000000044

I am adding the standard uncertainty just to give you an idea of how serious these measurements are. :-) So that‘s the number about which popular writers, including Leighton, get very excited about. Indeed, as Leighton puts it:

“Where does this number come from? Nobody knows. It’s one of the greatest damn mysteries of physics: a magic number that comes to us with no understanding by man. You might say the “hand of God” wrote that number, and “we don’t know how He pushed his pencil.” We know what kind of a dance to do experimentally to measure this number very accurately, but we don’t know what kind of dance to do on the computer to make this number come out, without putting it in secretly!”

Is it Leighton, or did Feynman really say this? Not sure. In any case, the point to note is that the fine-structure constant is a very special number indeed. But you should also that it’s not the only ‘special’ number. In fact, we derive it from other ‘magical’ numbers. To be specific, I’ll show you how we derive it from the fundamental properties of the electron. So, the truth is: we do know how to make this number come out, which makes me doubt whether Feynman really said what Leighton said he said. :-)

So we can derive α from some other numbers. That brings me to the more complicated answer to the question as to the value of j is: j‘s value is the electron charge expressed in Planck units, which we’ll denote by –eP:

j = –eP

Now that is strange. Truly strange. Incomprehensibly weird!

Why? Well… Let me start with the basic facts. If is –√α, and it’s also equal to –eP, then the fine-structure constant must also be equal to the square of the electron charge eP, so we can write:

α = eP2

You’ll say: yes, so what? Well… I am pretty sure that, if you’ve ever seen a formula for α, it’s surely not this simple α = eP2 formula. What you’ve seen, most likely, is one or more of the expressions below:

Fine-structure constant formula

Now that’s a pretty impressive collection of physical constants, isn’t it? :-) They’re all different but, somehow, when we combine them in one or the other ration (we have not less than five different expressions here, and I could give you a few more!), we get the very same number: α. Now that is what I call strange. Truly strange. Incomprehensibly weird!

Let’s look at those constants. You know most of them already. The only constants you may not have seen before are μ0Rand, perhaps, ras well as m. However, these can easily be defined as some easy function of the constants that you did see before, so let me quickly do that:

  1. The μ0 constant is the so-called magnetic constant. It’s something similar as ε0 and it’s referred to as the magnetic permeability of the vacuum. So it’s just like the (electric) permittivity of the vacuum (i.e. the electric constant ε0) and the only reason why this blog hasn’t mentioned this constant before is because I haven’t really discussed magnetic fields so far. I only talked about the electric field vector. In any case, you know that the electric and magnetic force are part and parcel of the same phenomenon (i.e. the electromagnetic interaction between charged particles) and, hence, they are closely related. To be precise, μ0ε0 = 1/c= c–2. So that shows the first and second expression for α are, effectively, fully equivalent. [Just in case you’d doubt that μ0ε0 = 1/c2, let me give you the values: μ0 = 4π·10–7 N/A2, and ε0 = (1/4π·c2)·10C2/N·m2. Just plug them in, and you’ll see it’s bang on. Moreover, note that the ampere (A) unit is equal to the coulomb per second unit (C/s), so even the units come out alright!]
  2. The ke constant is the Coulomb constant and, from its definition ke = 1/4πε0, it’s easy to see how those two expressions are, in turn, equivalent with the third expression for α.
  3. The Rconstant is the so-called von Klitzing constant. Huh? Yeah. I know. I am pretty sure you’ve never ever heard of that one before. It’s something quantum-mechanical. However, don’t worry about it because it’s, quite simply, equal to Rh/e2. Hence, substituting (and don’t forget that h = 2πħ) will demonstrate the equivalence of the fourth expression for α.
  4. Finally, the re factor is the classical electron radius, which is usually written as a function of me, i.e. the electron mass: re = e2/4πε0mec2. Also note that this also implies that reme = e2/4πε0c2.

I am sure you’re under some kind of ‘formula shock’ now. But you should just take a deep breath and and move on. The point to note is: “Yes. It’s all the same really!”

So what is that α really? Well… A strange number indeed. It’s dimensionless (so we don’t measure in kg, m/s, eV·s or whatever) and it pops up everywhere.

You’ll say: what’s everywherehaven’t seen it!

Well… Let me start by explaining the term itself. The fine structure in the name refers to the splitting of the spectral lines of atoms. That’s a fine structure indeed. :-) We also have a so-called hyperfine structure. Both are illustrated below for the hydrogen atom. The numbers n, JI, and are quantum numbers used in the quantum-mechanical explanation of the emission spectrum, which is  also depicted below, but note that the illustration gives you the so-called Balmer series only, i.e. the colors in the visible light spectrum (there are many more ‘colors’ in the high-energy ultraviolet and the low-energy infrared range).



To be precise: (1) n is the principal quantum number: here it takes the values 1 or 2, and we could say these are the principal shells; (2) the S, P, D,… orbitals (which are usually written in lower case: s, p, d, f, g, h and i) correspond to the (orbital) angular momentum quantum number l = 0, 1, 2,…, so we could say it’s the subshell; (3) the J values correspond to the so-called magnetic quantum number m, which goes from –l to +l; (4) the fourth quantum number is the spin angular momentum s. I’ve copied another diagram below so you see how it works, more or less, that is.

hydrogen spectrum

Now, our fine-structure constant is related to all of these quantum numbers. For details, I’ll let you check on the Internet. For example, just check out the various physical interpretations of α in the Wikipedia article on it. Please do it: it’s fascinating. This α is connected to everything!

Really? Well… Almost everything. :-)

So… Well… Yes! It is an ‘amazing’ number, indeed, and, hence, it might well be “one of the greatest damn mysteries of physics”, as Feynman and/or Leighton put it. Having said that, I would not go as far as to write that it’s “a magic number that comes to us with no understanding by man.” In fact, I think Feynman/Leighton could have done a much better job when explaining what it’s all about. So, yes, I hope to do better than Leighton here and, as he’s still alive, I actually hope he reads this. :-)

The point is: α is not the only weird number. What’s particular about it, as a physical constant, is that it’s dimensionless, because it relates a number of other physical constants in such a way that the units fall away. Having said that, the Planck or Boltzmann constant are at least as weird.

So… What is this all about? Well… You’ve probably heard about the so-called fine-tuning problem in physics and, no, the term fine-tuning and fine-structure have nothing in common, except for four letters. :-) The term fine-tuning refers to the fact that all the parameters or constants in the so-called Standard Model of physics are, indeed, all related to each other in the way they are. We can’t sort of just turn the knob of one and change it, because everything falls apart then. So, in essence, the fine-tuning problem in physics is more like a philosophical question: why is the value of all these physical constants and parameters exactly what it is? So it’s like asking: could we change some of the ‘constants’ and still end up with the world we’re living in? What if was some other number? What if ke or ε0 was some other number?

Of course, that’s a question one shouldn’t try to answer before answering some other, more fundamental, question: how many degrees of freedom are there really? Indeed, we just saw that ke and εare intimately related through some equation, and other constants and parameters are related to. So the question is like: what are the ‘dependent’ and the ‘independent’ variables in this so-called Standard Model?

I am not sure there’s an answer to that question. In fact, one of the reasons why I find physics so fascinating is that one cannot easily answer such questions. There are the obvious relationships, of course. For example, the ke = 1/4πεrelationship, and the context in which they are used (Coulomb’s Law) does, indeed, strongly suggest that both constants are actually part and parcel of the same thing. Almost identical, I’d say. Likewise, the μ0ε0 = 1/crelation also suggests there’s only one degree of freedom here, just like there’s only one degree of freedom in that ω/k = relationship (if we set a value for ω, we have k, and vice versa). But… Well… I am not quite sure how to phrase this, but… What physical constants could be ‘variables’ indeed?

It’s pretty obvious that the various formulas for α cannot answer that question: you could stare at them for days and weeks and months and years really, but I’d suggest you use your time to read more of Feynman’s real Lectures instead. :-) One point that may help to come to terms with this question – to some extent, at least – is what I casually mentioned above already: the fine-structure constant is equal to the square of the electron charge expressed in Planck units: α = eP2.

Now, that’s very remarkable because Planck units are some kind of ‘natural units’ indeed (for the detail, see my previous post: among other things, it explains what these Planck units really are) and, therefore, it is quite tempting to think that we’ve actually got only one degree of freedom here: α itself. All the rest should follow from it.


It should… But… Does it?

The answer is: yes and no. To be frank, it’s more no than yes because, as I noted a couple of times already, the fine-structure constant relates a lot of stuff but it’s surely not the only significant number in the Universe. For starters, I said that our E(A to B) formula has two ‘variables':

  1. We have that complex number j, which, as mentioned, is equal to the electron charge expressed in Planck units. [In case you wonder why –eP ≈ –0.08542455 is said to be an amplitude, i.e. a complex number or an ‘arrow’… Well… Complex numbers include the real numbers and, hence, –0.08542455 is both real and complex. When combining ‘arrows’ or, to be precise, when multiplying some complex number with –0.08542455, we will (a) shrink the original arrow to about 8.5% of its original value (8.542455% to be precise) and (b) rotate it over an angle of plus or minus 180 degrees. In other words, we’ll reverse its direction. Hence, using Euler’s notation for complex numbers, we can write: –1 = eiπ eiπ and, hence, –0.085 = 0.085·eiπ = 0.085·eiπ.]
  2. We also have some some real number n in the E(A to B) formula. So what’s the n? Well… Believe it or not, it’s the electron mass! Isn’t that amazing?

You’ll say: Well… Hmm… I suppose so.

You should also wonder: the electron mass? In what units? Planck units again? And are we talking relativistic mass (i.e. its total mass, including the equivalent mass of its kinetic energy) or its rest mass only? And we were talking α here, so can we relate it to α too, just like the electron charge?

These are all very good questions. Let’s start with the second one. We’re talking rather slow-moving electrons here, so the relativistic mass (m) and its rest mass (m0) is more or less the same. Indeed, the Lorentz factor γ in the m = γm0 equation is very close to 1 for electrons moving at their typical speed. So… Well… That question doesn’t matter very much. Hmm… That’s not very convincing as an answer, is it? You’re right. So… OK. Let’s go for it. What’s their ‘typical’ speed?

We know we shouldn’t attach too much importance to the concept of an electron in orbit around some nucleus (we know it’s not like some planet orbiting around some star) and, hence, to the concept of velocity when discussing an electron in an atom. Having said that, however, there’s a very easy mathematical relationship that gives us some clue here: we’ll use the Uncertainty Relationship in a few moments to relate the momentum of an electron (p) to the so-called Bohr radius r (think of it as the size of a hydrogen atom) as follows: p ≈ ħ/r. Now we also know its kinetic energy (K.E.) is mv2/2, which we can write as p2/2m. Substituting our p ≈ ħ/r conjecture, we get K.E. = mv2/2 = ħ2/2mr2. This is equivalent to m2v2 = ħ2/r2, so we have v = ħ/mr. Now, one of the many relation we can derive from the formulas for the fine-structure constant is re = α2r. Hence, r = re2 (ris the classical electron radius). So we can now write v = ħα2/mre. Let’s throw c now: v/c = α2ħ/mcre. But, from that fifth expression for α, we know that ħ/mcre = α, so we get v/c = α. We have another amazing result here: the v/c ratio for an electron (i.e. its speed expressed as a fraction of the speed of light) is equal to that fine-structure constant α. So that’s about 1/137, so that’s less than 1% of the speed of light. Now, I’ll leave it to you to calculate γ but it is very close to 1. :-)

Let’s now look at the question in regard to the Planck units. If you know nothing at all about them, I would advise you to read what I wrote about them in my previous post. Let me just note we get those Planck units by equating five fundamental physical constants to 1, notably (1) the speed of light, (2) Planck’s (reduced) constant, (3) Boltzmann’s constant, (4) Coulomb’s constant and (5) Newton’s constant (i.e. the gravitational constant). Hence, we have a set of five equations here (ħ = kB = ke = G = 1), and so we can solve that to get the five Planck units, i.e. the Planck length, time, mass, energy, charge and temperature units. [Note that mass and energy are directly related because of the mass-energy equivalence relation E = mc2).

Now, you may or may not remember that the Planck time and length units are unimaginably small, but that the Planck mass unit is actually quite sizable, at the atomic scale, at least. Indeed, the Planck mass is something huge, like the mass of an eyebrow hair, or a flea egg. Is that huge? Yes. Because if you’d want to pack it in a Planck-sized particle, it would make for a tiny black hole. So… Well… That’s the physical significance of the Planck mass and the Planck length and, yes, it’s weird. :-)

Let me give you some values. First, the Planck mass itself: it’s about 2.1765×10−8 kg and, if you use the E = mc2 equivalence relationship, you’ll see that’s equivalent to 2 giga-joule, approximately. Just to give an idea, that’s like the monthly electricity consumption of an average American family. So that’s huge indeed!  :-) Let me now give you the electron mass expressed in the Planck mass unit:

  1. Measured in our old-fashioned super-sized SI kilogram unit, the electron mass is me = 9.1×10–31 kg.
  2. The Planck mass is mP = 2.1765×10−8 kg.
  3. Hence, the electron mass expressed in Planck units is meP = me/mP = (9.1×10–31 kg)/(2.1765×10−8 kg) = 4.181×10−23.

We can, once again, write that as some function of the fine-structure constant, so… Yes: another amazing formula involving α. More specifically, we can write:

meP = α/reP = α/α2rP  = 1/αrP

In this formula, we have rand reP, which are the (classical) electron radius and the Bohr radius expressed in Planck (length) units. So you can see what’s going on here: we have all kinds of numbers here expressed in Planck units: a charge, a radius, a mass,… And we can relate all of them to the fine-structure constant

Why? Who knows? I don’t. As Leighton puts it: that’s just seem to be the way “God pushed His pencil.” :-)

Note that the beauty of natural units ensures that we get the same number for the (equivalent) energy of an electron. Indeed, from the E = mc2 relation, we know the mass of an electron can also be written as 0.511 MeV/c2. Hence, the equivalent energy is 0.511 MeV (so that’s the same number but without the 1/cfactor. Now, the Planck energy EP (in eV) is 1.22×1028 eV, so we get EeP = Ee/EP = (0.511×10eV)/(1.22×1028 eV) = 4.181×10−23. Isn’t that nice? :-)

Now, are all these numbers dimensionless, just like α? The answer to that question is complicated. It’s, again, yes, and no:

  1. Yes. They’re dimensionless because they measure something in natural units, i.e. Planck units, and, hence, that’s some kind of relative measure indeed so… Well… Yes, dimensionless.
  2. No. They’re not dimensionless because they do measure something, like a charge, a length, or a mass, and when you chose some kind of relative measure, you need to define some gauge, i.e. some kind of standard measure.

So what’s the final answer. Well… The Planck units are not dimensionless. All we can say is that they are closely related, physically. I should also add that we’ll use the electron charge and mass (expressed in Planck units) in our amplitude calculations as a simple (dimensionless) number between zero and one. So the correct answer to the question as to whether these numbers have any dimension is: expressing some quantities in Planck units sort of normalizes them, so we can use them directly in dimensionless calculations, like when we multiply and add amplitudes.

Hmm… Well… I can imagine you’re not very happy with this answer but it’s the best I can do. Sorry. I’ll let you further ponder that question. I need to move on.  

Note that that 4.181×10−23 is still a very small number (23 zeroes after the decimal point!), even if it’s like 46 million times larger than the electron mass measured in our conventional SI unit (i.e. 9.1×10–31 kg). Does such small number make any sense? The answer is: yes, it does. When we’ll finally start discussing that E(A to B) formula (I’ll give it to you in a moment), you’ll see that a very small number for n makes a lot of sense.

Before diving into it all, let’s first see if that formula for that alpha, that fine-structure constant, still makes sense with me expressed in Planck units. Just to make sure. :-) To do that, we need to use the fifth (last) expression for a, i.e. the one with re in it. Now, in my previous post, I also gave some formula for re: re = e2/4πε0mec2, which we can re-write as reme = e2/4πε0c2. If we substitute that expression for reme  in the formula for α, we can calculate α from the electron charge, which indicates both the electron radius and its mass are not some random God-given variable, or “some magic number that comes to us with no understanding by man“, as Feynman – well… Leighton, I guess – puts it. No. They are magic numbers alright, one related to another through the equally ‘magic’ number α, but so I do feel we actually can create some understanding here.

At this point, I’ll digress once again, and insert some quick back-of-the-envelope argument from Feynman’s very serious Caltech Lectures on Physics, in which, as part of the introduction to quantum mechanics, he calculates the so-called Bohr radius from Planck’s constant h. Let me quickly explain: the Bohr radius is, roughly speaking, the size of the simplest atom, i.e. an atom with one electron (so that’s hydrogen really). So it’s not the classical electron radius re. However, both are also related to that ‘magical number’ α. To be precise, if we write the Bohr radius as r, then re = α2r ≈ 0.000053… times r, which we can re-write as:

α = √(re /r) = (re /r)1/2

So that’s yet another amazing formula involving the fine-structure constant. Just think about it for a while. In case you’d still doubt its magic, let me write what we’ve discovered so far:

(1) α is the square of the electron charge expressed in Planck units: α = eP2.

(2) α is the square root of the ratio of (a) the classical electron radius and (b) the Bohr radius: α = √(re /r). You’ll see this more often written as re = α2r. Also note that this is an equation that does not depend on the units: equations (1), (4) and (5) require you to switch to Planck units, indeed.

(3) α is the (relative) speed of an electron: α = v/c. [The relative speed is the speed as measured against the speed of light. Note that the ‘natural’ unit of speed in the Planck system of units is equal to c. Indeed, if you divide one Planck length by one Planck time unit, you get (1.616×10−35 m)/(5.391×10−44 s) = m/s. However, this is another equation, just like (2), that does not depend on the units: we can express v and c in whatever unit we want, as long we’re consistent and express both in the same units.]

(4) Finally – I’ll show you in a moment – α is also equal to the product of (a) the electron mass (which I’ll simply write as me here) and (b) the classical electron radius re (if both are expressed in Planck units): α = me·re. Now think that’s, perhaps, the most amazing of all of the expressions for α. If you don’t think that’s amazing, I’d suggest you stop reading all of this and just go and play the guitar or something. I’ll do that in a moment myself, anyway. :-)

Note that, from (2) and (4), we find that:

(5) The electron mass (in Planck units) is equal me = α/r= α/α2r = 1/αr. So that gives us an expression, using α once again, for the electron mass as a function of the Bohr radius r expressed in Planck units.

Finally, we can also substitute (1) in (5) to get:

(6) The electron mass (in Planck units) is equal to me = α/r = eP2/re. Using the Bohr radius, we get me = 1/αr = 1/eP2r.

So… As you can see, this fine-structure constant really links ALL of the fundamental properties of the electron: its charge, its radius, its distance to the nucleus (i.e. the Bohr radius), its velocity, its mass (and, hence, its energy),…


Now that should answer the question in regard to the degrees of freedom we have here, doesn’t it? It looks like we’ve got only one degree of freedom here. Indeed, if we’ve got some value for α, then we’ve have the electron charge, and from the electron charge, we can calculate the Bohr radius r (as I will show below), and if we have r, we have mand re. And we also have v, which gives us its momentum (mv) and its kinetic energy (mv2/2).


Isn’t that amazing? Hmm… You’ll probably reserve your judgment as for now – AND RIGHTLY SO! – and say: how do we get the Bohr radius from the charge? Let me show you how, because I will also show you why you should reserve your judgment. In other words, I’ll show you why we do NOT have it all!

Let’s go for it:

1. If we assume that (a) an electron takes some space – which I’ll denote by r :-) – and (b) that it has some momentum p because of its mass m and its velocity v, then the ΔxΔp = ħ relation (i.e. the Uncertainty Principle in its roughest form) suggests that the order of magnitude of r and p should be related in the very same way. Hence, we just boldly write: r ≈ ħ/p. So we equate Δx with r and Δp with p. As Feynman notes, this is really more like a dimensional analysis and so we don’t care about factors like 2 or 1/2. [Indeed, note that the more precise formulation of the Uncertainty Principle is σxσ≥ ħ/2.] In fact, we didn’t even bother to define r very rigorously: is it a volume or a radius? The answer is: we don’t care. We’re just thinking orders or magnitude here. [If you’re appalled by the rather rude approach, I am sorry for that, but just try to go along with it.]

2. From our discussions on energy, we know that the kinetic energy is mv2/2, which we can write as p2/2m so we get rid of that unknown velocity factor. Then, substituting our p ≈ ħ/r conjecture, we get K.E. = ħ2/2mr2.

3. Now, the discussions on potential energy were (much) more difficult. You’ll probably remember that we had an easy and very comprehensible formula for the energy that’s needed (i.e. the work that needs to be done) to bring two charges together from a large distance (i.e. infinity). Indeed, we derived that formula directly from Coulomb’s Law and it’s U = q1q2/4πε0r12. Now, we’re actually talking about the size of an atom here in my previous post, so one charge is the proton (+e) and the other is the electron (–e), so the potential energy is U = P.E. = –e2/4πε0r, with r the ‘distance’ between the proton and the electron, and so that’s the Bohr radius we’re looking for.

[In case you’re struggling a bit with those minus signs when talking potential energy  – I am not ashamed to admit I did! – let me quickly help you here. It has to do with our reference point: the reference point for measuring potential energy is at infinity, and it’s zero there. Now, to separate the proton and the electron, we’d have to do quite a lot of work. To use an analogy: imagine we’re somewhere deep down in a cave, and we have to climb back to the zero level. You’ll agree that’s likely to involve some sweat, don’t you? Hence, the potential energy associated with us being down in the cave is negative. Likewise, if we write the potential energy between the proton and the electron as U(r), and the potential energy at the reference point as U(∞) = 0, then the work to be done to separate the charges, i.e. the potential difference U(∞) – U(r), will be positive. So U(∞) – U(r) = 0 – U(r) > 0 and, hence, U(r) < 0. If you still don’t ‘get’ this, think of the electron being in some (potential) well, i.e. below the zero level, and so it’s potential energy is less than zero.]

4. We can now write the total energy (which I’ll denote by E, but don’t confuse it with the electric field vector!) as

E = K.E. + P.E. =  ħ2/2mr– e2/4πε0r

Now, the electron (whatever it is) is, obviously, in some kind of equilibrium state. Why is that obvious? Well… Otherwise our hydrogen atom wouldn’t be there. :-) Hence, it’s in some kind of ‘well’ indeed, at the bottom. Such equilibrium point ‘at the bottom’ is characterized by its derivative (in regard to whatever variable) being equal to zero. Now, the only ‘variable’ here is r (all the other symbols are physical constants), so we have to solve for dE/dr = 0. Writing it all out yields:

dE/dr = –ħ2/mr+ e2/4πε0r= 0 ⇔ r = 4πε0ħ2/me2

You’ll say: so what? Well… We’ve got a nice formula for the Bohr radius here, and we got it in no time! :-) So… Well… Let’s just check it by putting the values in:

r = 4πε0h2/me2

= [(1/(9×109) C2/N·m2)·(1.055×10–34 J·s)2]/[(9.1×10–31 kg)·(1.6×10–19 C)2]

= 53×10–12 m = 53 pico-meter (pm)

So what? Well… Double-check it on the Internet: the Bohr radius is, effectively, about 53 trillionths of a meter indeed! So we’re right on the spot! [As for the units, note that mass is a measure of inertia: one kg is the mass of an object which, subject to a force of 1 newton, will accelerate at the rate of 1 m/s per second. Hence, we write F = m·a, which is equivalent to m = F/a. Hence, the kg, as a unit, is equivalent to 1 N/(m/s2). If you make this substitution, we get r in the unit we want to see: [(C2/N·m2)·(N2·m2·s2)/[(N·s2/m)·C2] = m.]

Moreover, if we take that value for r and put it in the (total) energy formula above, we’d find that the energy of the electron is –13.6 eV. [Don’t forget to convert from joule to electronvolt when doing the calculation!] Now you can check that on the Internet too: that’s exactly the amount of energy that’s need to ionize a hydrogen atom!

Waw ! Isn’t it great that such simple calculations yield such great results? :-) [Of course, you’ll note that the omission of the 1/2 factor in the Uncertainty Principle was quite strategic. :-)] Using the r = 4πε0ħ2/meformula for the Bohr radius, you can now easily check the re = α2r formula. You should find what we jotted down already: re = e2/4πε0mec2. To be precise, re = (53×10–6)·(53×10–12m) = 2.8×10–15 m. Now that’s again something you should check on the Internet. Guess what? […] That’s right on the spot again. :-)

We can now also check that α = m·re formula: α = m·r= 4.181×10−23 times… Hey! Wait! We have to express re in Planck units as well, of course! Now, (2.81794×10–15 m)/(1.616×10–35 m) ≈ 1.7438 ×1020. So now we get 4.181×10−23 times 1.7438×1020 = 7.29×10–3 = 0.00729 ≈ 1/137. Bingo! We got the magic number once again. :-)

So… That confirms we do have it all with α, don’t we?

The (negative) answer should be obvious from the argument above. First, you should note that I had to use h in that calculation of the Bohr radius. Moreover, the other physical constants (most notably c and the Coulomb constant) were actually there as well, in background I’d say, because one needs them to derive the formulas we used above. And then we have the equations themselves, of course, most notably that Uncertainty Principle… So… Well…

It’s not like God gave us one number only (α) and that all the rest flows out of it. We have a whole bunch of ‘fundamental’ relations and, hence, a whole bunch of ‘fundamental’ constants.

Hmm… Now you’ll wonder: how many? How many constants do we need in all of physics?

Well… I’d say, you should not only ask about the constants: you should also ask about the equations: how many equations do we need in all of physics? [Just for the record, that made me smile when the Hawking of the movie says that he’s actually looking for one formula that sums up all of physics. Frankly, that’s a nonsensical statement, and I am pretty sure Hawking never said that and/or, if he did, that it was one of those statements one needs to interpret very carefully.]

But let’s look at a few constants indeed. For example, if we have c, h and α, then we can calculate the electric charge e and, hence, the electric constant ε= e2/2αhc. From that, we get Coulomb’s constant ke, because ke is defined as 1/4πε0… But… Hey! Wait a minute! How do we know that ke = 1/4πε0? Well… From experiment. But… Yes? That means 1/4π is some fundamental proportionality coefficient too, isn’t it?

Hmm… That’s a good and valid remark. In fact, we use the so-called reduced Planck constant ħ in a number of calculations, and so that involves a 2π factor too (ħ = h/2π) so… Well… Yes, perhaps we should consider 2π as some fundamental constant too! And, then, well… Now that I think of it, there’s a few other mathematical constants out there, like Euler’s number e, for example, which we use in complex exponentials.


I am joking, right? I am not saying that 2π and Euler’s number (it’s a bit of a nuisance we’re also using the symbol here, but so we’re not talking the electron charge here: we’re talking that 2.718… number that’s used in so-called ‘natural’ exponentials and logarithms) are fundamental ‘physical’ constants, am I?

Well… Yes and no. They’re mathematical constants indeed, rather than physical, but… Well… I hope you get my point. What I want to show here, is that it’s quite hard to say what’s fundamental and what isn’t. We can actually pick and choose a bit among all those constants and all those equations. As one physicist puts its: it depends on how we slice it. The one thing we know for sure is that a great many things are related, in a physical way (α connects all of the fundamental properties of the electron, for example) and/or in a mathematical way (2π connects not only the circumference of the unit circle with the radius but quite a few other constants as well!), but… Well… What to say? It’s a tough discussion and I am not smart enough to give you an unambiguous answer. From what I gather on the Internet, when looking at the whole Standard Model (including the strong force, the weak force and the Higgs field), we’ve got a few dozen physical constants, and then a few mathematical ones as well.

Now, that’s a lot, but not an awful lot. Whatever number it is, it does raise a very fundamental question: why are they what they are? That brings us back to that ‘fine-tuning’ problem. Now, I can’t make this post too long (it’s way too long already), so let me just conclude this discussion by copying Wikipedia on that question, because what it has on this topic is not so bad:

“Some physicists have explored the notion that if the physical constants had sufficiently different values, our Universe would be so radically different that intelligent life would probably not have emerged, and that our Universe therefore seems to be fine-tuned for intelligent life. The anthropic principle states a logical truism: the fact of our existence as intelligent beings who can measure physical constants requires those constants to be such that beings like us can exist.

I like this. But the article then adds the following, which I do not like so much, because I think it’s a bit too ‘frivolous’, I’d say:

“There are a variety of interpretations of the constants’ values, including that of a divine creator (the apparent fine-tuning is actual and intentional), or that ours is one universe of many in a multiverse (e.g. the many-worlds interpretation of quantum mechanics), or even that, if information is an innate property of the universe and logically inseparable from consciousness, a universe without the capacity for conscious beings cannot exist.”

Hmm… As said, I am quite happy with the logical truism: we are there because of alpha (and a whole range of other stuff), and we can see alpha (and a whole range of other stuff) because we’re there. Full stop. As for the ‘interpretations’, I’ll let you think about that for yourself. :-)

The E(A to B) formula

I must assume that, with all these digressions, you are truly despairing by now, but… Don’t. We’re here! We’re finally ready for the E(A to B) formula! Let’s go for it.

We’ve now got those two numbers measuring the electron charge and the electron mass in Planck units respectively. They’re fundamental indeed and so let’s loosen up on notation and just write them as e and m respectively:

1. The value of e is approximately –0.08542455, and it corresponds to the so-called junction number j, which is amplitude for an electron-photon coupling. When multiplying it with another amplitude (to find the amplitude for an event consisting of two sub-events, for example), it corresponds to a ‘shrink’ to less than one-tenth (something like 8.5% indeed, corresponding to the magnitude of e) and a ‘rotation’ (or a ‘turn’) over 180 degrees, as mentioned above.

Please note what’s going on here: we have a physical quantity, the electron charge (expressed in Planck units), and we use it in a quantum-mechanical calculation as a dimensionless (complex) number, i.e. as an amplitude. So… Well… That’s what physicists mean when they say that the charge of some particle (usually the electric charge but, in quantum chromodynamics, it will be the ‘color’ charge of a quark) is a ‘coupling constant’.

2. We also have m, the electron mass, and we’ll use in the same way, i.e. as some dimensionless amplitude. As compared to j, it’s is a very tiny number: approximately 4.181×10−23. So if you look at it as an amplitude, indeed, then it corresponds to an enormous ‘shrink’ (but no turn) of the amplitude(s) that we’ll be combining it with.

So… Well… How do we do it?

Well… At this point, Leighton goes a bit off-track. Just a little bit. :-) From what he writes, it’s obvious that he assumes the frequency (or, what amounts to the same, the de Broglie wavelength) of an electron is just like the frequency of a photon. Frankly, I just can’t imagine why and how Feynman let this happen. It’s wrong. Plain wrong. As I mentioned in my introduction already, an electron traveling through space is not like a photon traveling through space.

For starters, an electron is much slower (because it’s a matter-particle: hence, it’s got mass). Secondly, the de Broglie wavelength and/or frequency of an electron is not like that of a photon. For example, if we take an electron and a photon having the same energy, let’s say 1 eV (that corresponds to infrared light), then the de Broglie wavelength of the electron will be 1.23 nano-meter (i.e. 1.23 billionths of a meter). Now that’s about one thousand times smaller than the wavelength of our 1 eV photon, which is about 1240 nm. You’ll say: how is that possible? If they have the same energy, then the f = E/h and ν = E/h should give the same frequency and, hence, the same wavelength, no?

Well… No! Not at all! Because an electron, unlike the photon, has a rest mass indeed – measured as not less than 0.511 MeV/c2, to be precise (note the rather particular MeV/c2 unit: it’s from the E = mc2 formula) – one should use a different energy value! Indeed, we should include the rest mass energy, which is 0.511 MeV. So, almost all of the energy here is rest mass energy! There’s also another complication. For the photon, there is an easy relationship between the wavelength and the frequency: it has no mass and, hence, all its energy is kinetic, or movement so to say, and so we can use that ν = E/h relationship to calculate its frequency ν: it’s equal to ν = E/h = (1 eV)/(4.13567×10–15 eV·s) ≈ 0.242×1015 Hz = 242 tera-hertz (1 THz = 1012 oscillations per second). Now, knowing that light travels at the speed of light, we can check the result by calculating the wavelength using the λ = c/ν relation. Let’s do it: (2.998×10m/s)/(242×1012 Hz) ≈ 1240 nm. So… Yes, done!

[…] But so we’re talking photons here. For the electron, it’s a more complicated story. That wavelength I mentioned was calculated using the other of the two de Broglie relations: λ = h/p. So that uses the momentum of the electron which, as you know, is the product of its mass (m) and its velocity (v): p = mv. You can amuse yourself and check if you find the same wavelength (1.23 nm): you should! However, once you’ve got the wavelength, you’ll want to find the frequency. Now here’s the problem with matter waves: they have a so-called group velocity and a so-called phase velocity. The idea is illustrated below: the green dot travels with the wave packet – and, hence, its velocity corresponds to the group velocity – while the red dot travels with the oscillation itself, and so that’s the phase velocity. [You should also remember, of course, that the matter wave is some complex-valued wavefunction, so we have both a real as well as an imaginary part oscillating and traveling through space.]

Wave_group (1)

So what? Well, to make a long story short, the ‘amplitude framework’ for electrons is somewhat differerent. Hence, the story that I’ll be telling here is different from what you’ll read in Feynman’s QED. I will use his drawings, though, and his concepts. Indeed, despite my misgivings above, the conceptual framework is sound, and so the corrections to be made are relatively minor.

So… We’re looking at E(A to B), i.e. the amplitude for an electron to go from point A to B in spacetime, and I said the conceptual framework is exactly the same as that for a photon. Hence, the electron can follow any path really. It may go in a straight line and travel at a speed that’s consistent with what we know of its momentum (p), but it may also follow other paths. So, just like the photon, we’ll have some so-called propagator function, which gives you amplitudes based on the distance in space as well as in the distance in ‘time’ between two points. Now, Ralph Leighton identifies that propagator function with the propagator function for the photon, i.e. P(A to B), but that’s wrong: it’s not the same.

The propagator function for an electron depends on its mass and its velocity, and/or on the combination of both (like it momentum p = mv and/or its kinetic energy: K.E. = mv2 = p2/2m). So we have a different propagator function here. However, I’ll use the same symbol for it: P(A to B).

So, the bottom line is that, because of the electron’s mass (which, remember, is a measure for inertia), momentum and/or kinetic energy (which, remember, are conserved in physics), the straight line is definitely the most likely path, but (big but!), just like the photon, the electron may follow some other path as well.

So how do we formalize that? Let’s first associate an amplitude P(A to B) with an electron traveling from point A to B in a straight line and in a time that’s consistent with its velocity. Now, as mentioned above, the P here stands for propagator function, not for photon, so we’re talking a different P(A to B) here than that P(A to B) function we used for the photon. Sorry for the confusion. :-) The left-hand diagram below then shows what we’re talking about: it’s the so-called ‘one-hop flight’, and so that’s what the P(A to B) amplitude is associated with.

Diagram 1Now, the electron can follow other paths. For photons, we said the amplitude depended on the spacetime interval I: when negative or positive (i.e. paths that are not associated with the photon traveling in a straight line and/or at the speed of light), the contribution of those paths to the final amplitudes (or ‘final arrow’, as it was called) was smaller.

For an electron, we have something similar, but it’s modeled differently. We say the electron could take a ‘two-hop flight’ (via point C or C’), or a ‘three-hop flight’ (via D and E) from point A to B. Now, it makes sense that these paths should be associated with amplitudes that are much smaller. Now that’s where that n-factor comes in. We just put some real number n in the formula for the amplitude for an electron to go from A to B via C, which we write as:

P(A to C)∗n2∗P(C to B)

Note what’s going on here. We multiply two amplitudes, P(A to C) and P(C to B), which is OK, because that’s what the rules of quantum mechanics tell us: if an ‘event’ consists of two sub-events, we need to multiply the amplitudes (not the probabilities) in order to get the amplitude that’s associated with both sub-events happening. However, we add an extra factor: n2. Note that it must be some very small number because we have lots of alternative paths and, hence, they should not be very likely! So what’s the n? And why n2 instead of just n?

Well… Frankly, I don’t know. Ralph Leighton boldly equates n to the mass of the electron. Now, because he obviously means the mass expressed in Planck units, that’s the same as saying n is the electron’s energy (again, expressed in Planck’s ‘natural’ units), so n should be that number m = meP = EeP = 4.181×10−23. However, I couldn’t find any confirmation on the Internet, or elsewhere, of the suggested n = m identity, so I’ll assume n = m indeed, but… Well… Please check for yourself. It seems the answer is to be found in a mathematical theory that helps physicists to actually calculate j and n from experiment. It’s referred to as perturbation theory, and it’s the next thing on my study list. As for now, however, I can’t help you much. I can only note that the equation makes sense.

Of course, it does: inserting a tiny little number n, close to zero, ensures that those other amplitudes don’t contribute too much to the final ‘arrow’. And it also makes a lot of sense to associate it with the electron’s mass: if mass is a measure of inertia, then it should be some factor reducing the amplitude that’s associated with the electron following such crooked path. So let’s go along with it, and see what comes out of it.

A three-hop flight is even weirder and uses that n2 factor two times:

P(A to E)∗n2∗P(E to D)∗n2∗P(D to B)

So we have an (n2)= nfactor here, which is good, because two hops should be much less likely than one hop. So what do we get? Well… (4.181×10−23)≈ 305×10−92. Pretty tiny, huh? :-) Of course, any point in space is a potential hop for the electron’s flight from point A to B and, hence, there’s a lot of paths and a lot of amplitudes (or ‘arrows’ if you want), which, again, is consistent with a very tiny value for n indeed.

So, to make a long story short, E(A to B) will be a giant sum (i.e. some kind of integral indeed) of a lot of different ways an electron can go from point A to B. It will be a series of terms P(A to E) + P(A to C)∗n2∗P(C to B) + P(A to E)∗n2∗P(E to D)∗n2∗P(D to B) + … for all possible intermediate points C, D, E, and so on.

What about the j? The junction number of coupling constant. How does that show up in the E(A to B) formula? Well… Those alternative paths with hops here and there are actually the easiest bit of the whole calculation. Apart from taking some strange path, electrons can also emit and/or absorb photons during the trip. In fact, it’s assumed they’re doing that actually constantly. Indeed, the image of an electron ‘in orbit’ around the nucleus is that of an electron exchanging so-called ‘virtual’ photons constantly, as illustrated below. So our image of an electron absorbing and then emitting a photon (see the diagram on the right-hand side) is really like the tiny tip of a giant iceberg: most of what’s going on is underneath! So that’s where our junction number j comes in, i.e. the charge (e) of the electron.

So, when you hear that a coupling constant is actually equal to the charge, then this is what it means: you should just note it’s the charge expressed in Planck units. But it’s a deep connection, isn’t? When everything is said and done, a charge is something physical, but so here, in these amplitude calculations, it just shows up as some dimensionless negative number, used in multiplications and additions of amplitudes. Isn’t that remarkable?

d2 d3

The situation becomes even more complicated when more than one electron is involved. For example, two electrons can go in a straight line from point 1 and 2 to point 3 and 4 respectively, but there’s two ways in which this can happen, and they might exchange photons along the way, as shown below. If there’s two alternative ways in which one event can happen, you know we have to add amplitudes, rather than multiply them. Hence, the formula for E(A to B) becomes even more complicated.


Moreover, a single electron may first emit and then absorb a photon itself, so there’s no need for other particles to be there to have lots of j factors in our calculation. In addition, that photon may briefly disintegrate into an electron and a positron, which then annihilate each other to again produce a photon: in case you wondered, that’s what those little loops in those diagrams depicting the exchange of virtual photons is supposed to represent. So, every single junction (i.e. every emission and/or absorption of a photon) involves a multiplication with that junction number j, so if there are two couplings involved, we have a j2 factor, and so that’s 0.085424552 = α ≈ 0.0073. Four couplings implies a factor of 0.085424554 ≈ 0.000053.

Just as an example, I copy two diagrams involving four, five or six couplings indeed. They all have some ‘incoming’ photon, because Feynman uses them to explain something else (the so-called magnetic moment of a photon), but it doesn’t matter: the same illustrations can serve multiple purposes.

d6 d7

Now, it’s obvious that the contributions of the alternatives with many couplings add almost nothing to the final amplitude – just like the ‘many-hop’ flights add almost nothing – but… Well… As tiny as these contributions are, they are all there, and so they all have to be accounted for. So… Yes. You can easily appreciate how messy it all gets, especially in light of the fact that there are so many points that can serve as a ‘hop’ or a ‘coupling’ point!

So… Well… Nothing. That’s it! I am done! I realize this has been another long and difficult story, but I hope you appreciated and that it shed some light on what’s really behind those simplified stories of what quantum mechanics is all about. It’s all weird and, admittedly, not so easy to understand, but I wouldn’t say an understanding is really beyond the reach of us, common mortals. :-)

Post scriptum: When you’ve reached here, you may wonder: so where’s the final formula then for E(A to B)? Well… There’s no final formula. From what I wrote above, it’s obvious there’s actually a whole bunch of them. In case you wonder, it’s some really awful-looking integral and, because it’s so awful, I’ll let you find it yourself. :-)

I should also note another reason why I am reluctant to identify n with m. The formulas in Feynman’s QED are definitely not the standard ones. The more standard formulations will use the gauge coupling parameter about which I talked already. I sort of discussed it, indirectly, in my first comments on Feynman’s QED, when I criticized some other part of the book, notably its explanation of the phenomenon of diffraction of light, which basically boiled down to: “When you try to squeeze light too much [by forcing it to go through a small hole], it refuses to cooperate and begins to spread out”, because “There are not enough arrows representing alternative paths.”

Now that raises a lot of questions, and very sensible ones, because that simplification is nonsensical. Not enough arrows? That statement doesn’t make sense. We can subdivide space in as many paths as we want, and probability amplitudes don’t take up any physical space. What happens is that, when we cut in all up in smaller pieces (so we analyze more paths within the same space), the directions of the arrows won’t change but their length will get ‘smaller’. That’s because of the normalization constraint. However, when adding them all up – a lot of very tiny ones, or a smaller bunch of bigger ones – we’ll still get the same ‘final’ arrow. That’s because the direction of those arrows depends on the length of the path, and that doesn’t change because we suddenly choose some other ‘gauge’.

Indeed, the real question is: what’s a ‘small’ hole? What’s ‘small’ and what’s ‘large’ in quantum electrodynamics? Now, I gave an intuitive answer to that question in that post of mine, but it’s much more accurate than Feynman’s, or Leighton’s. The answer to that question is: there’s some kind of natural ‘gauge’, and it’s related to the wavelength. So the wavelength of a photon, or an electron, in this case, comes with some kind of scale indeed. That’s why the fine-structure constant is often written in yet another form:

α = 2πree = rek

λe and kare the Compton wavelength and wavenumber of the electron (so kis not the Coulomb constant here). The Compton wavelength is the de Broglie wavelength of the electron. [You’ll find that Wikipedia defines it as “the wavelength that’s equivalent to the wavelength of a photon whose energy is the same as the rest-mass energy of the electron”, but that’s a very confusing definition, I think.]

Hence,, the point to note is that the spatial dimension in both the analysis of photons as well as of matter waves, especially in regard to studying diffraction and/or interference phenomena, is related to the frequencies, wavelengths and/or wavenumbers of their wavefunctions. There’s a certain ‘gauge’ involved indeed, i.e. some measure that is relative, like the gauge pressure illustrated below. So that’s where that gauge parameter g comes in. And the fact that it’s yet another number that’s closely related to that fine-structure constant is… Well… Again… Quite amazing indeed. :-)


Fields and charges (II)

My previous posts was, perhaps, too full of formulas, without offering much reflection. Let me try to correct that here by tying up a few loose ends. The first loose end is about units. Indeed, I haven’t been very clear about that and so let me somewhat more precise on that now.

Note: In case you’re not interested in units, you can skip the first part of this post. However, please do look at the section on the electric constant εand, most importantly, the section on natural units—especially Planck units, as I will touch upon the topic of gauge coupling parameters there and, hence, on quantum mechanics. Also, the third and last part, on the theoretical contradictions inherent in the idea of point charges, may be of interest to you.]

The field energy integrals

When we wrote that down that u = ε0E2/2 formula for the energy density of an electric field (see my previous post on fields and charges for more details), we noted that the 1/2 factor was there to avoid double-counting. Indeed, those volume integrals we use to calculate the energy over all space (i.e. U = ∫(u)dV) count the energy that’s associated with a pair of charges (or, to be precise, charge elements) twice and, hence, they have a 1/2 factor in front. Indeed, as Feynman notes, there is no convenient way, unfortunately, of writing an integral that keeps track of the pairs so that each pair is counted just once. In fact, I’ll have to come back to that assumption of there being ‘pairs’ of charges later, as that’s another loose end in the theory.

U 6

U 7

Now, we also said that that εfactor in the second integral (i.e. the one with the vector dot product EE =|E||E|cos(0) = E2) is there to make the units come out alright. Now, when I say that, what does it mean really? I’ll explain. Let me first make a few obvious remarks:

  1. Densities are always measures in terms per unit volume, so that’s the cubic meter (m3). That’s, obviously, an astronomical unit at the atomic or molecular scale.
  2. Because of historical reasons, the conventional unit of charge is not the so-called elementary charge +e (i.e. the charge of a proton), but the coulomb. Hence, the charge density ρ is expressed in Coulomb per cubic meter (C/m3). The coulomb is a rather astronomical unit too—at the atomic or molecular scale at least: 1 e ≈ 1.6022×10−19 C. [I am rounding here to four digits after the decimal point.]
  3. Energy is in joule (J) and that’s, once again, a rather astronomical unit at the lower end of the scales. Indeed, theoretical physicists prefer to use the electronvolt (eV), which is the energy gained (or lost) when an electron (so that’s a charge of –e, i.e. minus e) moves across a potential difference of one volt. But so we’ll stick to the joule as for now, not the eV, because the joule is the SI unit that’s used when defining most electrical units, such as the ampere, the watt and… Yes. The volt. Let’s start with that one.

The volt

The volt unit (V) measures both potential (energy) as well as potential difference (in both cases, we mean electric potential only, of course). Now, from all that you’ve read so far, it should be obvious that potential (energy) can only be measured with respect to some reference point. In physics, the reference point is infinity, which is so far away from all charges that there is no influence there. Hence, any charge we’d bring there (i.e. at infinity) will just stay where it is and not be attracted or repelled by anything. We say the potential there is zero: Φ(∞) = 0. The choice of that reference point allows us, then, to define positive or negative potential: the potential near positive charges will be positive and, vice versa, the potential near negative charges will be negative. Likewise, the potential difference between the positive and negative terminal of a battery will be positive.

So you should just note that we measure both potential as well as potential difference in volt and, hence, let’s now answer the question of what a volt really is. The answer is quite straightforward: the potential at some point r = (x, y, z) measures the work done when bringing one unit charge (i.e. +e) from infinity to that point. Hence, it’s only natural that we define one volt as one joule per unit charge:

1 volt = 1 joule/coulomb (1 V = 1 J/C).

Also note the following:

  1. One joule is the energy energy transferred (or work done) when applying a force of one newton over a distance of one meter, so one volt can also be measured in newton·meter per coulomb: 1 V = 1 J/C = N·m/C.
  2. One joule can also be written as 1 J = 1 V·C.

It’s quite easy to see why that energy = volt-coulomb product makes sense: higher voltage will be associated with higher energy, and the same goes for higher charge. Indeed, the so-called ‘static’ on our body is usually associated with potential differences of thousands of volts (I am not kidding), but the charges involved are extremely small, because the ability of our body to store electric charge is minimal (i.e. the capacitance (aka capacity) of our body). Hence, the shock involved in the discharge is usually quite small: it is measured in milli-joules (mJ), indeed.

The farad

The remark on ‘static’ brings me to another unit which I should mention in passing: the farad. It measures the capacitance (formerly known as the capacity) of a capacitor (formerly known as a condenser). A condenser consists, quite simply, of two separated conductors: it’s usually illustrated as consisting of two plates or of thin foils (e.g. aluminum foil) separated by an insulating film (e.g. waxed paper), but one can also discuss the capacity of a single body, like our human body, or a charged sphere. In both cases, however, the idea is the same: we have a ‘positive’ charge on one side (+q), and a ‘negative’ charge on the other (–q). In case of a single object, we imagine the ‘other’ charge to be some other large object (the Earth, for instance, but it can also be a car or whatever object that could potentially absorb the charge on our body) or, in case of the charged sphere, we could imagine some other sphere of much larger radius. The farad will then measure the capacity of one or both conductors to store charge.

Now, you may think we don’t need another unit here if that’s the definition: we could just express the capacity of a condensor in terms of its maximum ‘load’, couldn’t we? So that’s so many coulomb before the thing breaks down, when the waxed paper fails to separate the two opposite charges on the aluminium foil, for example. No. It’s not like that. It’s true we can not continue to increase the charge without consequences. However, what we want to measure with the farad is another relationship. Because of the opposite charges on both sides, there will be a potential difference, i.e. a voltage difference. Indeed, a capacitor is like a little battery in many ways: it will have two terminals. Now, it is fairly easy to show that the potential difference (i.e. the voltage) between the two plates will be proportional to the charge. Think of it as follows: if we double the charges, we’re doubling the fields, right? So then we need to do twice the amount of work to carry the unit charge (against the field) from one plate to the other. Now, because the distance is the same, that means the potential difference must be twice what it was.

Now, while we have a simple proportionality here between the voltage and the charge, the coefficient of proportionality will depend on the type of conductors, their shape, the distance and the type of insulator (aka dielectric) between them, and so on and so on. Now, what’s being measured in farad is that coefficient of proportionality, which we’ll denote by C(the proportionality coefficient for the charge), CV ((the proportionality coefficient for the voltage) or, because we should make a choice between the two, quite simply, as C. Indeed, we can either write (1) Q = CQV or, alternatively, V = CVQ, with C= 1/CV. As Feynman notes, “someone originally wrote the equation of proportionality as Q = CV”, so that’s what it will be: the capacitance (aka capacity) of a capacitor (aka condenser) is the ratio of the electric charge Q (on each conductor) to the potential difference V between the two conductors. So we know that’s a constant typical of the type of condenser we’re talking about. Indeed, the capacitance is the constant of proportionality defining the linear relationship between the charge and the voltage means doubling the voltage, and so we can write:

C = Q/V

Now, the charge is measured in coulomb, and the voltage is measured in volt, so the unit in which we will measure C is coulomb per volt (C/V), which is also known as the farad (F):

1 farad = 1 coulomb/volt (1 F = 1 C/V)

[Note the confusing use of the same symbol C for both the unit of charge (coulomb) as well as for the proportionality coefficient! I am sorrry about that, but so that’s convention!].

To be precise, I should add that the proportionality is generally there, but there are exceptions. More specifically, the way the charge builds up (and the way the field builds up, at the edges of the capacitor, for instance) may cause the capacitance to vary a little bit as it is being charged (or discharged). In that case, capacitance will be defined in terms of incremental changes: C = dQ/dV.

Let me conclude this section by giving you two formulas, which are also easily proved but so I will just give you the result:

  1. The capacity of a parallel-plate condenser is C = ε0A/d. In this formula, we have, once again, that ubiquitous electric constant ε(think of it as just another coefficient of proportionality), and then A, i.e. the area of the plates, and d, i.e. the separation between the two plates.
  2. The capacity of a charged sphere of radius r (so we’re talking the capacity of a single conductor here) is C = 4πε0r. This may remind you of the formula for the surface of a sphere (A = 4πr2), but note we’re not squaring the radius. It’s just a linear relationship with r.

I am not giving you these two formulas to show off or fill the pages, but because they’re so ubiquitous and you’ll need them. In fact, I’ll need the second formula in this post when talking about the other ‘loose end’ that I want to discuss.

Other electrical units

From your high school physics classes, you know the ampere and the watt, of course:

  1. The ampere is the unit of current, so it measures the quantity of charge moving or circulating per second. Hence, one ampere is one coulomb per second: 1 A = 1 C/s.
  2. The watt measures power. Power is the rate of energy conversion or transfer with respect to time. One watt is one joule per second: 1 W = 1 J/s = 1 N·m/s. Also note that we can write power as the product of current and voltage: 1 W = (1 A)·(1 V) = (1 C/s)·(1 J/C) = 1 J/s.

Now, because electromagnetism is such well-developed theory and, more importantly, because it has so many engineering and household applications, there are many other units out there, such as:

  • The ohm (Ω): that’s the unit of electrical resistance. Let me quickly define it: the ohm is defined as the resistance between two points of a conductor when a (constant) potential difference (V) of one volt, applied to these points, produces a current (I) of one ampere. So resistance (R) is another proportionality coefficient: R = V/I, and 1 ohm (Ω) = 1 volt/ampere (V/A). [Again, note the (potential) confusion caused by the use of the same symbol (V) for voltage (i.e. the difference in potential) as well as its unit (volt).] Now, note that it’s often useful to write the relationship as V = R·I, so that gives the potential difference as the product of the resistance and the current.
  • The weber (Wb) and the tesla (T): that’s the unit of magnetic flux (i.e. the strength of the magnetic field) and magnetic flux density (i.e. one tesla = one weber per square meter) respectively. So these have to do with the field vector B, rather than E. So we won’t talk about it here.
  • The henry (H): that’s the unit of electromagnetic inductance. It’s also linked to the magnetic effect. Indeed, from Maxwell’s equations, we know that a changing electric current will cause the magnetic field to change. Now, a changing magnetic field causes circulation of E. Hence, we can make the unit charge go around in some loop (we’re talking circulation of E indeed, not flux). The related energy, or the work that’s done by a unit of charge as it travels (once) around that loop, is – quite confusingly! – referred to as electromotive force (emf). [The term is quite confusing because we’re not talking force but energy, i.e. work, and, as you know by now, energy is force times distance, so energy and force are related but not the same.] To ensure you know what we’re talking about, let me note that emf is measured in volts, so that’s in joule per coulomb: 1 V = 1 J/C. Back to the henry now. If the rate of change of current in a circuit (e.g. the armature winding of an electric motor) is one ampere per second, and the resulting electromotive force (remember: emf is energy per coulomb) is one volt, then the inductance of the circuit is one henry. Hence, 1 H = 1 V/(1 A/s) = 1 V·s/A.     

The concept of impedance

You’ve probably heard about the so-called impedance of a circuit. That’s a complex concept, literally, because it’s a complex-valued ratio. I should refer you to the Web for more details, but let me try to summarize it because, while it’s complex, that doesn’t mean it’s complicated. :-) In fact, I think it’s rather easy to grasp after all you’ve gone through already. :-) So let’s give it a try.

When we have a simple direct current (DC), then we have a very straightforward definition of resistance (R), as mentioned above: it’s a simple ratio between the voltage (as measured in volt) and the current (as measured in ampere). Now, with alternating current (AC) circuits, it becomes more complicated, and so then it’s the concept of impedance that kicks in. Just like resistance, impedance also sort of measures the ‘opposition’ that a circuit presents to a current when a voltage is applied, but we have a complex ratio—literally: it’s a ratio with a magnitude and a direction, or a phase as it’s usually referred to. Hence, one will often write the impedance (denoted by Z) using Euler’s formula:

Z = |Z|eiθ

Now, if you don’t know anything about complex numbers, you should just skip all of what follows and go straight to the next section. However, if you do know what a complex number is (it’s an ‘arrow’, basically, and if θ is a variable, then it’s a rotating arrow, or a ‘stopwatch hand’, as Feynman calls it in his more popular Lectures on QED), then you may want to carry on reading.

The illustration below (credit goes to Wikipedia, once again) is, probably, the most generic view of an AC circuit that one can jot down. If we apply an alternating current, both the current as well as the voltage will go up and down. However, the current signal will lag the voltage signal, and the phase factor θ tells us by how much. Hence, using complex-number notation, we write:

V = IZ = I∗|Z|eiθ


Now, while that resembles the V = R·I formula I mentioned when discussing resistance, you should note the bold-face type for V and I, and the ∗ symbol I am using here for multiplication. First the ∗ symbol: that’s a convention Feynman adopts in the above-mentioned popular account of quantum mechanics. I like it, because it makes it very clear we’re not talking a vector cross product A×B here, but a product of two complex numbers. Now, that’s also why I write V and I in bold-face: they have a phase too and, hence, we can write them as:

  • = |V|ei(ωt + θV)
  • = |I|ei(ωt + θI)

This works out as follows:

IZ = |I|ei(ωt + θI)∗|Z|eiθ = |I||Z|ei(ωt + θ+ θ) = |V|ei(ωt + θV)

Indeed, because the equation must hold for all t, we can equate the magnitudes and phases and, hence, we get: |V| = |I||Z| and θ= θI + θ. But voltage and current is something real, isn’t it? Not some complex number? You’re right. The complex notation is used mainly to simplify the calculus, but it’s only the real part of those complex-valued functions that count. [In any case, because we limit ourselves to complex exponentials here, the imaginary part (which is the sine, as opposed to the real part, which is the cosine) is the same as the real part, but with a lag of its own (π/2 or 90 degrees, to be precise). Indeed: when writing Euler’s formula out (eiθ = cos(θ) + isin(θ), you should always remember that the sine and cosine function are basically the same function: they differ only in the phase, as is evident from the trigonometric identity sin(θ+π/) = cos(θ).]

Now, that should be more than enough in terms of an introduction to the units used in electromagnetic theory. Hence, let’s move on.

The electric constant ε0

Let’s now look at  that energy density formula once again. When looking at that u = ε0E2/2 formula, you may think that its unit should be the square of the unit in which we measure field strength. How do we measure field strength? It’s defined as the force on a unit charge (E = F/q), so it should be newton per coulomb (N/C). Because the coulomb can also be expressed in newton·meter/volt (1 V = 1 J/C = N·m/C and, hence, 1 C = 1 N·m/V), we can express field strength not only in newton/coulomb but also in volt per meter: 1 N/C = 1 N·V/N·m = 1 V/m. How do we get from N2/C2 and/or V2/mto J/m3?

Well… Let me first note there’s no issue in terms of units with that ρΦ formula in the first integral for U: [ρ]·[Φ] = (C/m3)·V = [(N·m/V)/m3)·V = (N·m)/m3 = J/m3. No problem whatsoever. It’s only that second expression for U, with the u = ε0E2/2 in the integrand, that triggers the question. Here, we just need to accept that we need that εfactor to make the units come out alright. Indeed, just like other physical constants (such as c, G, or h, for example), it has a dimension: its unit is either C2/N·m2 or, what amounts to the same, C/V·m. So the units come out alright indeed if, and only if, we multiply the N2/C2 and/or V2/m2 units with the dimension of ε0:

  1. (N2/C2)·(C2/N·m2) = (N2·m)·(1/m3) = J/m3
  2. (V2/m2)·(C/V·m) = V·C/m3 = (V·N·m/V)/m= N·m/m3 = J/m3


But so that’s the units only. The electric constant also has a numerical value:

ε0 = 8.854187817…×10−12 C/V·m ≈ 8.8542×10−12 C/V·m

This numerical value of εis as important as its unit to ensure both expressions for U yield the same result. Indeed, as you may or may not remember from the second of my two posts on vector calculus, if we have a curl-free field C (that means ×= 0 everywhere, which is the case when talking electrostatics only, as we are doing here), then we can always find some scalar field ψ such that C = ψ. But so here we have E = –ε0Φ, and so it’s not the minus sign that distinguishes the expression from the C = ψ expression, but the εfactor in front.

It’s just like the vector equation for heat flow: h = –κT. Indeed, we also have a constant of proportionality here, which is referred to as the thermal conductivity. Likewise, the electric constant εis also referred to as the permittivity of the vacuum (or of free space), for similar reasons obviously!

Natural units

You may wonder whether we can’t find some better units, so we don’t need the rather horrendous 8.8542×10−12 C/V·m factor (I am rounding to four digits after the decimal point). The answer is: yes, it’s possible. In fact, there are several systems in which the electric constant (and the magnetic constant, which we’ll introduce later) reduce to 1. The best-known are the so-called Gaussian and Lorentz-Heaviside units respectively.

Gauss defined the unit of charge in what is now referred to as the statcoulomb (statC), which is also referred to as the franklin (Fr) and/or the electrostatic unit of charge (esu), but I’ll refer you to the Wikipedia article on it in case you’d want to find out more about it. You should just note the definition of this unit is problematic in other ways. Indeed, it’s not so easy to try to define ‘natural units’ in physics, because there are quite a few ‘fundamental’ relations and/or laws in physics and, hence, equating this or that constant to one usually has implications on other constants. In addition, one should note that many choices that made sense as ‘natural’ units in the 19th century seem to be arbitrary now. For example:

  1. Why would we select the charge of the electron or the proton as the unit charge (+1 or –1) if we now assume that protons (and neutrons) consists of quarks, which have +2/3 or –1/3?
  2. What unit would we choose as the unit for mass, knowing that, despite all of the simplification that took place as a result of the generalized acceptance of the quark model, we’re still stuck with quite a few elementary particles whose mass would be a ‘candidate’ for the unit mass? Do we chose the electron, the u quark, or the d quark?

Therefore, the approach to ‘natural units’ has not been to redefine mass or charge or temperature, but the physical constants themselves. Obvious candidates are, of course, c and ħ, i.e. the speed of light and Planck’s constant. [You may wonder why physicists would select ħ, rather than h, as a ‘natural’ unit, but I’ll let you think about that. The answer is not so difficult.] That can be done without too much difficulty indeed, and so one can equate some more physical constants with one. The next candidate is the so-called Boltzmann constant (kB). While this constant is not so well known, it does pop up in a great many equations, including those that led Planck to propose his quantum of action, i.e.(see my post on Planck’s constant). When we do that – so when we equate c, ħ and kB with one (ħ = kB = 1), we still have a great many choices, so we need to impose further constraints. The next is to equate the gravitational constant with one, so then we have ħ = kB = G = 1.

Now, it turns out that the ‘solution’ of this ‘set’ of four equations (ħ = kB = G = 1) does, effectively, lead to ‘new’ values for most of our SI base units, most notably length, time, mass and temperature. These ‘new’ units are referred to as Planck units. You can look up their values yourself, and I’ll let you appreciate the ‘naturalness’ of the new units yourself. They are rather weird. The Planck length and time are usually referred to as the smallest possible measurable units of length and time and, hence, they are related to the so-called limits of quantum theory. Likewise, the Planck temperature is a related limit in quantum theory: it’s the largest possible measurable unit of temperature. To be frank, it’s hard to imagine what the scale of the Planck length, time and temperature really means. In contrast, the scale of the Planck mass is something we actually can imagine – it is said to correspond to the mass of an eyebrow hair, or a flea egg – but, again, its physical significance is not so obvious: Nature’s maximum allowed mass for point-like particles, or the mass capable of holding a single elementary charge. That triggers the question: do point-like charges really exist? I’ll come back to that question. But first I’ll conclude this little digression on units by introducing the so-called fine-structure constant, of which you’ve surely heard before.

The fine-structure constant

I wrote that the ‘set’ of equations ħ = kB = G = 1 gave us Planck units for most of our SI base units. It turns out that these four equations do not lead to a ‘natural’ unit for electric charge. We need to equate a fifth constant with one to get that. That fifth constant is Coulomb’s constant (often denoted as ke) and, yes, it’s the constant that appears in Coulomb’s Law indeed, as well as in some other pretty fundamental equations in electromagnetics, such as the field caused by a point charge q: E = q/4πε0r2. Hence, ke = 1/4πε0. So if we equate kwith one, then ε0 will, obviously, be equal to ε= 1/4π.

To make a long story short, adding this fifth equation to our set of five also gives us a Planck charge, and I’ll give you its value: it’s about 1.8755×10−18 C. As I mentioned that the elementary charge is 1 e ≈ 1.6022×10−19 C, it’s easy to that the Planck charge corresponds to some 11.7 times the charge of the proton. In fact, let’s be somewhat more precise and round, once again, to four digits after the decimal point: the qP/e ratio is about 11.7062. Conversely, we can also say that the elementary charge as expressed in Planck units, is about 1/11.7062 ≈ 0.08542455. In fact, we’ll use that ratio in a moment in some other calculation, so please jot it down.

0.08542455? That’s a bit of a weird number, isn’t it? You’re right. And trying to write it in terms of the charge of a u or d quark doesn’t make it any better. Also, note that the first four significant digits (8542) correspond to the first four significant digits after the decimal point of our εconstant. So what’s the physical significance here? Some other limit of quantum theory?

Frankly, I did not find anything on that, but the obvious thing to do is to relate is to what is referred to as the fine-structure constant, which is denoted by α. This physical constant is dimensionless, and can be defined in various ways, but all of them are some kind of ratio of a bunch of these physical constants we’ve been talking about:

Fine-structure constant formula

The only constants you have not seen before are μ0Rand, perhaps, ras well as m. However, these can be defined as a function of the constants that you did see before:

  1. The μ0 constant is the so-called magnetic constant. It’s something similar as ε0 and it’s referred to as the magnetic permeability of the vacuum. So it’s just like the (electric) permittivity of the vacuum (i.e. the electric constant ε0) and the only reason why you haven’t heard of this before is because we haven’t discussed magnetic fields so far. In any case, you know that the electric and magnetic force are part and parcel of the same phenomenon (i.e. the electromagnetic interaction between charged particles) and, hence, they are closely related. To be precise, μ= 1/ε0c2. That shows the first and second expression for α are, effectively, fully equivalent.
  2. Now, from the definition of ke = 1/4πε0, it’s easy to see how those two expressions are, in turn, equivalent with the third expression for α.
  3. The Rconstant is the so-called von Klitzing constant, but don’t worry about it: it’s, quite simply, equal to Rh/e2. Hene, substituting (and don’t forget that h = 2πħ) will demonstrate the equivalence of the fourth expression for α.
  4. Finally, the re factor is the classical electron radius, which is usually written as a function of me, i.e. the electron mass: re = e2/4πε0mec2. This very same equation implies that reme = e2/4πε0c2. So… Yes. It’s all the same really.

Let’s calculate its (rounded) value in the old units first, using the third expression:

  • The econstant is (roughly) equal to (1.6022×10–19 C)= 2.5670×10–38 C2. Coulomb’s constant k= 1/4πεis about 8.9876×10N·m2/C2. Hence, the numerator e2k≈ 23.0715×10–29 N·m2.
  • The (rounded) denominator is ħc = (1.05457×10–34 N·m·s)(2.998×108 m/s) = 3.162×10–26 N·m2.
  • Hence, we get α = kee2/ħc ≈ 7.297×10–3 = 0.007297.

Note that this number is, effectively, dimensionless. Now, the interesting thing is that if we calculate α using Planck units, we get an econstant that is (roughly) equal to 0.08542455= … 0.007297! Now, because all of the other constants are equal to 1 in Planck’s system of units, that’s equal to α itself. So… Yes ! The two values for α are one and the same in the two systems of units and, of course, as you might have guessed, the fine-structure constant is effectively dimensionless because it does not depend on our units of measurement. So what does it correspond to?

Now that would take me a very long time to explain, but let me try to summarize what it’s all about. In my post on quantum electrodynamics (QED) – so that’s the theory of light and matter basically and, most importantly, how they interact – I wrote about the three basic events in that theory, and how they are associated with a probability amplitude, so that’s a complex number, or an ‘arrow’, as Feynman puts it: something with (a) a magnitude and (b) a direction. We had to take the absolute square of these amplitudes in order to calculate the probability (i.e. some real number between 0 and 1) of the event actually happening. These three basic events or actions were:

  1. A photon travels from point A to B. To keep things simple and stupid, Feynman denoted this amplitude by P(A to B), and please note that the P stands for photon, not for probability. I should also note that we have an easy formula for P(A to B): it depends on the so-called space-time interval between the two points A and B, i.e. I = Δr– Δt= (x2–x1)2+(y2–y1)2+(z2–z1)– (t2–t1)2. Hence, the space-time interval takes both the distance in space as well as the ‘distance’ in time into account.
  2. An electron travels from point A to B: this was denoted by E(A to B) because… Well… You guessed it: the of electron. The formula for E(A to B) was much more complicated, but the two key elements in the formula was some complex number j (see below), and some other (real) number n.
  3. Finally, an electron could emit or absorb a photon, and the amplitude associated with this event was denoted by j, for junction.

Now, that junction number j is about –0.1. To be somewhat more precise, I should say it’s about –0.08542455.

–0.08542455? That’s a bit of a weird number, isn’t it? Hey ! Didn’t we see this number somewhere else? We did, but before you scroll up, let’s first interpret this number. It looks like an ordinary (real) number, but it’s an amplitude alright, so you should interpret it as an arrow. Hence, it can be ‘combined’ (i.e. ‘added’ or ‘multiplied’) with other arrows. More in particular, when you multiply it with another arrow, it amounts to a shrink to a bit less than one-tenth (because its magnitude is about 0.085 = 8.5%), and half a turn (the minus sign amounts to a rotation of 180°). Now, in that post of mine, I wrote that I wouldn’t entertain you on the difficulties of calculating this number but… Well… We did see this number before indeed. Just scroll up to check it. We’ve got a very remarkable result here:

j ≈ –0.08542455 = –√0.007297 = –√α = –e expressed in Planck units

So we find that our junction number j or – as it’s better known – our coupling constant in quantum electrodynamics (aka as the gauge coupling parameter g) is equal to the (negative) square root of that fine-structure constant which, in turn, is equal to the charge of the electron expressed in the Planck unit for electric charge. Now that is a very deep and fundamental result which no one seems to be able to ‘explain’—in an ‘intuitive’ way, that is.

I should immediately add that, while we can’t explain it, intuitively, it does make sense. A lot of sense actually. Photons carry the electromagnetic force, and the electromagnetic field is caused by stationary and moving electric charges, so one would expect to find some relation between that junction number j, describing the amplitude to emit or absorb a photon, and the electric charge itself, but… An equality? Really?

Well… Yes. That’s what it is, and I look forward to trying to understand all of this better. For now, however, I should proceed with what I set out to do, and that is to tie up a few loose ends. This was one, and so let’s move to the next, which is about the assumption of point charges.

Note: More popular accounts of quantum theory say α itself is ‘the’ coupling constant, rather than its (negative) square –√α = j = –e (expressed in Planck units). That’s correct: g or j are, technically speaking, the (gauge) coupling parameter, not the coupling constant. But that’s a little technical detail which shouldn’t bother you. The result is still what it is: very remarkable! I should also note that it’s often the value of the reciprocal (1/α) that is specified, i.e. 1/0.007297 ≈ 137.036. But so now you know what this number actually stands for. :-)

Do point charges exist?

Feynman’s Lectures on electrostatics are interesting, among other things, because, besides highlighting the precision and successes of the theory, he also doesn’t hesitate to point out the contradictions. He notes, for example, that “the idea of locating energy in the field is inconsistent with the assumption of the existence of point charges.”


Yes. Let’s explore the point. We do assume point charges in classical physics indeed. The electric field caused by a point charge is, quite simply:

E = q/4πε0r2

Hence, the energy density u is ε0E2/2 = q2/32πε0r4. Now, we have that volume integral U = (ε0/2)∫EEdV = ∫(ε0E2/2)dV integral. As Feynman notes, nothing prevents us from taking a spherical shell for the volume element dV, instead of an infinitesimal cube. This spherical shell would have the charge q at its center, an inner radius equal to r, an infinitesimal thickness dr, and, finally, a surface area 4πr(that’s just the general formula for the surface area of a spherical shell, which I also noted above). Hence, its (infinitesimally small) volume is 4πr2dr, and our integral becomes:


To calculate this integral, we need to take the limit of –q2/8πε0r for (a) r tending to zero (r→0) and for (b) r tending to infinity (r→∞). The limit for r = ∞ is zero. That’s OK and consistent with the choice of our reference point for calculating the potential of a field. However, the limit for r = 0 is zero is infinity! Hence, that U = (ε0/2)∫EEdV basically says there’s an infinite amount of energy in the field of a point charge! How is that possible? It cannot be true, obviously.

So… Where did we do wrong?

Your first reaction may well be that this very particular approach (i.e. replacing our infinitesimal cubes by infinitesimal shells) to calculating our integral is fishy and, hence, not allowed. Maybe you’re right. Maybe not. It’s interesting to note that we run into similar problems when calculating the energy of a charged sphere. Indeed, we mentioned the formula for the capacity of a charged sphere: C = 4πε0r. Now, there’s a similarly easy formula for the energy of a charged sphere. Let’s look at how we charge a condenser:

  • We know that the potential difference between two plates of a condenser represents the work we have to do, per unit charge, to transfer a charge (Q) from one plate to the other. Hence, we can write V = ΔU/ΔQ.
  • We will, of course, want to do a differential analysis. Hence, we’ll transfer charges incrementally, one infinitesimal little charge dQ at the time, and re-write V as V = dU/dQ or, what amounts to the same: dU = V·dQ.
  • Now, we’ve defined the capacitance of a condenser as C = Q/V. [Again, don’t be confused: C stands for capacity here, measured in coulomb per volt, not for the coulomb unit.] Hence, we can re-write dU as dU = Q·dQ/C.
  • Now we have to integrate dU going from zero charge to the final charge Q. Just do a little bit of effort here and try it. You should get the following result: U = Q2/2C. [We could re-write this as U = (C2/V2)/2C =  C·V2/2, which is a form that may be more useful in some other context but not here.]
  • Using that C = 4πε0r formula, we get our grand result. The energy of a charged sphere is:

U = Q2/8πε0r

From that formula, it’s obvious that, if the radius of our sphere goes to zero, its energy should also go to infinity! So it seems we can’t really pack a finite charge Q in one single point. Indeed, to do that, our formula says we need an infinite amount of energy. So what’s going on here?

Nothing much. You should, first of all, remember how we got that integral: see my previous post for the full derivation indeed. It’s not that difficult. We first assumed we had pairs of charges qi and qfor which we calculated the total electrostatic energy U as the sum of the energies of all possible pairs of charges:

U 3

And, then, we looked at a continuous distribution of charge. However, in essence, we still did the same: we counted the energy of interaction between infinitesimal charges situated at two different points (referred to as point 1 and 2 respectively), with a 1/2 factor in front so as to ensure we didn’t double-count (there’s no way to write an integral that keeps track of the pairs so that each pair is counted only once):

U 4

Now, we reduced this double integral by a clever substitution to something that looked a bit better:

U 6

Finally, some more mathematical tricks gave us that U = (ε0/2)∫EEdV integral.

In essence, what’s wrong in that integral above is that it actually includes the energy that’s needed to assemble the finite point charge q itself from an infinite number of infinitesimal parts. Now that energy is infinitely large. We just can’t do it: the energy required to construct a point charge is ∞.

Now that explains the physical significance of that Planck mass ! We said Nature has some kind of maximum allowable mass for point-like particles, or the mass capable of holding a single elementary charge. What’s going on is, as we try to pile more charge on top of the charge that’s already there, we add energy. Now, energy has an equivalent mass. Indeed, the Planck charge (q≈ 1.8755×10−18 C), the Planck length (l= 1.616×10−35 m), the Planck energy (1.956×109 J), and the Planck mass (2.1765×10−8 kg) are all related. Now things start making sense. Indeed, we said that the Planck mass is tiny but, still, it’s something we can imagine, like a flea’s egg or the mass of a hair of a eyebrow. The associated energy (E = mc2, so that’s (2.1765×10−8 kg)·(2.998×108 m/s)2 ≈ 19.56×108 kg·m2/s= 1.956×109 joule indeed.

Now, how much energy is that? Well… That’s about 2 giga-joule, obviously, but so what’s that in daily life? It’s about the energy you would get when burning 40 liter of fuel. It’s also likely to amount, more or less, to your home electricity consumption over a month. So it’s sizable, and so we’re packing all that energy into a Planck volume (lP≈ 4×10−105 m3). If we’d manage that, we’d be able to create tiny black holes, because that’s what that little Planck volume would become if we’d pack so much energy in it. So… Well… Here I just have to refer you to more learned writers than I am. As Wikipedia notes dryly: “The physical significance of the Planck length is a topic of theoretical research. Since the Planck length is so many orders of magnitude smaller than any current instrument could possibly measure, there is no way of examining it directly.”

So… Well… That’s it for now. The point to note is that we would not have any theoretical problems if we’d assume our ‘point charge’ is actually not a point charge but some small distribution of charge itself. You’ll say: Great! Problem solved! 

Well… For now, yes. But Feynman rightly notes that assuming that our elementary charges do take up some space results in other difficulties of explanation. As we know, these difficulties are solved in quantum mechanics, but so we’re not supposed to know that when doing these classical analyses. :-)

Fields and charges (I)

My previous posts focused mainly on photons, so this one should be focused more on matter-particles, things that have a mass and a charge. However, I will use it more as an opportunity to talk about fields and present some results from electrostatics using our new vector differential operators (see my posts on vector analysis).

Before I do so, let me note something that is obvious but… Well… Think about it: photons carry the electromagnetic force, but have no electric charge themselves. Likewise, electromagnetic fields have energy and are caused by charges, but so they also carry no charge. So… Fields act on a charge, and photons interact with electrons, but it’s only matter-particles (notably the electron and the proton, which is made of quarks) that actually carry electric charge. Does that make sense? It should. :-)

Another thing I want to remind you of, before jumping into it all head first, are the basic units and relations that are valid always, regardless of what we are talking about. They are represented below:


Let me recapitulate the main points:

  • The speed of light is always the same, regardless of the reference frame (inertial or moving), and nothing can travel faster than light (except mathematical points, such as the phase velocity of a wavefunction).
  • This universal rule is the basis of relativity theory and the mass-energy equivalence relation E = mc2.
  • The constant speed of light also allows us to redefine the units of time and/or distance such that c = 1. For example, if we re-define the unit of distance as the distance traveled by light in one second, or the unit of time as the time light needs to travel one meter, then c = 1.
  • Newton’s laws of motion define a force as the product of a mass and its acceleration: F = m·a. Hence, mass is a measure of inertia, and the unit of force is 1 newton (N) = 1 kg·m/s2.
  • The momentum of an object is the product of its mass and its velocity: p = m·v. Hence, its unit is 1 kg·m/s = 1 N·s. Therefore, the concept of momentum combines force (N) as well as time (s).
  • Energy is defined in terms of work: 1 Joule (J) is the work done when applying a force of one newton over a distance of one meter: 1 J = 1 N·m. Hence, the concept of energy combines force (N) and distance (m).
  • Relativity theory establishes the relativistic energy-momentum relation pc = Ev/c, which can also be written as E2 = p2c+ m02c4, with mthe rest mass of an object. The rest mass of an object is invariant for all frames of references, just like c. These equations reduce to m = E and E = p2 + m0when choosing time and/or distance units such that c = 1. The mass is the total mass of the object, including its inertial mass as well as the equivalent mass of its kinetic energy.
  • The relationships above establish (a) energy and time and (b) momentum and position as complementary variables and, hence, the Uncertainty Principle can be expressed in terms of both. The Uncertainty Principle, as well as the Planck-Einstein relation and the de Broglie relation (not shown on the diagram), establish a quantum of action, h, whose dimension combines force, distance and time (h ≈ 6.626×10−34 N·m·s). This quantum of action (Wirkung) can be defined in various ways, as it pops up in more than one fundamental relation, but one of the more obvious approaches is to define h as the coefficient of proportionality constant between the energy of a photon (i.e. the ‘light particle’) and its frequency: h = E/ν.

Note that we talked about forces and energy above, but we didn’t say anything about the origin of these forces. That’s what we are going to do now, even if we’ll limit ourselves to the electromagnetic force only.


According to Wikipedia, electrostatics deals with the phenomena and properties of stationary or slow-moving electric charges with no acceleration. Feynman usually uses the term when talking about stationary charges only. If a current is involved (i.e. slow-moving charges with no acceleration), the term magnetostatics is preferred. However, the distinction does not matter all that much because  – remarkably! – with stationary charges and steady currents, the electric and magnetic fields (E and B) can be analyzed as separate fields: there is no interconnection whatsoever! That shows, mathematically, as a neat separation between (1) Maxwell’s first and second equation and (2) Maxwell’s third and fourth equation:

  1. Electrostatics: (i) ∇•E = ρ/ε0 and (ii) ×E = 0.
  2. Magnetostatics: (iii) c2∇×B = j0 and (iv) B = 0.

Electrostatics: The ρ in equation (i) is the so-called charge density, which describes the distribution of electric charges in space: ρ = ρ(x, y, z). To put it simply: ρ is the ‘amount of charge’ (which we’ll denote by Δq) per unit volume at a given point. As for ε0, that’s a constant which ensures all units are ‘compatible’. Equation (i) basically says we have some flux of E, the exact amount of which is determined by the charge density ρ or, more in general, by the charge distribution in space. As for equation (ii), i.e. ×E = 0, we can sort of forget about that. It means the curl of E is zero: everywhere, and always. So there’s no circulation of E. Hence, E is a so-called curl-free field, in this case at least, i.e. when only stationary charges and steady currents are involved.

Magnetostatics: The j in (iii) represents a steady current indeed, causing some circulation of B. The cfactor is related to the fact that magnetism is actually only a relativistic effect of electricity, but I can’t dwell on that here. I’ll just refer you to what Feynman writes about this in his Lectures, and warmly recommend to read it. Oh… Equation (iv), B = 0, means that the divergence of B is zero: everywhere, and always. So there’s no flux of B. None. So B is a divergence-free field.

Because of the neat separation, we’ll just forget about B and talk about E only.

The electric potential

OK. Let’s try to go through the motions as quickly as we can. As mentioned in my introduction, energy is defined in terms of work done. So we should just multiply the force and the distance, right? 1 Joule = 1 newton × 1 meter, right? Well… Yes and no. In discussions like this, we talk potential energy, i.e. energy stored in the system, so to say. That means that we’re looking at work done against the force, like when we carry a bucket of water up to the third floor or, to use a somewhat more scientific description of what’s going on, when we are separating two masses. Because we’re doing work against the force, we put a minus sign in front of our integral:

formula 1

Now, the electromagnetic force works pretty much like gravity, except that, when discussing gravity, we only have positive ‘charges’ (the mass of some object is always positive). In electromagnetics, we have positive as well as negative charge, and please note that two like charges repel (that’s not the case with gravity). Hence, doing work against the electromagnetic force may involve bringing like charges together or, alternatively, separating opposite charges. We can’t say. Fortunately, when it comes to the math of it, it doesn’t matter: we will have the same minus sign in front of our integral. The point is: we’re doing work against the force, and so that’s what the minus sign stands for. So it has nothing to do with the specifics of the law of attraction and repulsion in this case (electromagnetism as opposed to gravity) and/or the fact that electrons carry negative charge. No.

Let’s get back to the integral. Just in case you forgot, the integral sign ∫ stands for an S: the S of summa, i.e. sum in Latin, and we’re using these integrals because we’re adding an infinite number of infinitesimally small contributions to the total effort here indeed. You should recognize it, because it’s a general formula for energy or work. It is, once again, a so-called line integral, so it’s a bit different than the ∫f(x)dx stuff you learned from high school. Not very different, but different nevertheless. What’s different is that we have a vector dot product F•ds after the integral sign here, so that’s not like f(x)dx. In case you forgot, that f(x)dx product represents the surface of an infinitesimally rectangle, as shown below: we make the base of the rectangle smaller and smaller, so dx becomes an infinitesimal indeed. And then we add them all up and get the area under the curve. If f(x) is negative, then the contributions will be negative.


But so we don’t have little rectangles here. We have two vectors, F and ds, and their vector dot product, F•ds, which will give you… Well… I am tempted to write: the tangential component of the force along the path, but that’s not quite correct: if ds was a unit vector, it would be true—because then it’s just like that h•n product I introduced in our first vector calculus class. However, ds is not a unit vector: it’s an infinitesimal vector, and, hence, if we write the tangential component of the force along the path as Ft, then F•d= |F||ds|cosθ = F·cosθ·ds = Ft·ds. So this F•ds is a tangential component over an infinitesimally small segment of the curve. In short, it’s an infinitesimally small contribution to the total amount of work done indeed. You can make sense of this by looking at the geometrical representation of the situation below.

illustration 1

I am just saying this so you know what that integral stands for. Note that we’re not adding arrows once again, like we did when calculating amplitudes or so. It’s all much more straightforward really: a vector dot product is a scalar, so it’s just some real number—just like any component of a vector (tangential, normal, in the direction of one of the coordinates axes, or in whatever direction) is not a vector but a real number. Hence, W is also just some real number. It can be positive or negative because… Well… When we’d be going down the stairs with our bucket of water, our minus sign doesn’t disappear. Indeed, our convention to put that minus sign there should obviously not depend on what point a and b we’re talking about, so we may actually be going along the direction of the force when going from a to b.

As a matter of fact, you should note that’s actually the situation which is depicted above. So then we get a negative number for W. Does that make sense? Of course it does: we’re obviously not doing any work here as we’re moving along the direction, so we’re surely not adding any (potential) energy to the system. On the contrary, we’re taking energy out of the system. Hence, we are reducing its (potential) energy and, hence, we should have a negative value for W indeed. So, just think of the minus sign being there to ensure we add potential energy to the system when going against the force, and reducing it when going with the force.

OK. You get this. You probably also know we’ll re-define W as a difference in potential between two points, which we’ll write as Φ(b) – Φ(a). Now that should remind you of your high school integral ∫f(x)dx once again. For a definite integral over a line segment [a, b], you’d have to find the antiderivative of f(x), which you’d write as F(x), and then you’d take the difference F(b) – F(a) too. Now, you may or may not remember that this antiderivative was actually a family of functions F(x) + k, and k could be any constant – 5/9, 6π, 3.6×10124, 0.86, whatever! – because such constant vanishes when taking the derivative.

Here we have the same, we can define an infinite number of functions Φ(r) + k, of which the gradient will yield… Stop! I am going too fast here. First, we need to re-write that W function above in order to ensure we’re calculating stuff in terms of the unit charge, so we write:

unit chage

Huh? Well… Yes. I am using the definition of the field E here really: E is the force (F) when putting a unit charge in the field. Hence, if we want the work done per unit charge, i.e. W(unit), then we have to integrate the vector dot product E·ds over the path from a to b. But so now you see what I want to do. It makes the comparison with our high school integral complete. Instead of taking a derivative in regard to one variable only, i.e. dF(x)/dx) = f(x), we have a function Φ here not in one but in three variables: Φ = Φ(x, y, z) = Φ(r) and, therefore, we have to take the vector derivative (or gradient as it’s called) of Φ to get E:

Φ(x, y, z) = (∂Φ/∂x, ∂Φ/∂y, ∂Φ/∂z) = –E(x, y, z)

But so it’s the same principle as what you learned how to use to solve your high school integral. Now, you’ll usually see the expression above written as:

E = –Φ

Why so short? Well… We all just love these mysterious abbreviations, don’t we? :-) Jokes aside, it’s true some of those vector equations pack an awful lot of information. Just take Feynman’s advice here: “If it helps to write out the components to be sure you understand what’s going on, just do it. There is nothing inelegant about that. In fact, there is often a certain cleverness in doing just that.” So… Let’s move on.

I should mention that we can only apply this more sophisticated version of the ‘high school trick’ because Φ and E are like temperature (T) and heat flow (h): they are fields. T is a scalar field and h is a vector field, and so that’s why we can and should apply our new trick: if we have the scalar field, we can derive the vector field. In case you want more details, I’ll just refer you to our first vector calculus class. Indeed, our so-called First Theorem in vector calculus was just about the more sophisticated version of the ‘high school trick': if we have some scalar field ψ (like temperature or potential, for example: just substitute the ψ in the equation below for T or Φ), then we’ll always find that:

First theorem

The Γ here is the curve between point 1 and 2, so that’s the path along which we’re going, and ψ must represent some vector field.

Let’s go back to our W integral. I should mention that it doesn’t matter what path we take: we’ll always get the same value for W, regardless of what path we take. That’s why the illustration above showed two possible paths: it doesn’t matter which one we take. Again, that’s only because E is a vector field. To be precise, the electrostatic field is a so-called conservative vector field, which means that we can’t get energy out of the field by first carrying some charge along one path, and then carrying it back along another. You’ll probably find that’s obvious,  and it is. Just note it somewhere in the back of your mind.

So we’re done. We should just substitute E for Φ, shouldn’t we? Well… Yes. For minus Φ, that is. Another minus sign. Why? Well… It makes that W(unit) integral come out alright. Indeed, we want a formula like W = Φ(b) – Φ(a), not like Φ(a) – Φ(b). Look at it. We could, indeed, define E as the (positive) gradient of some scalar field ψ = –Φ, and so we could write E = ψ, but then we’d find that W = –[ψ(b) – ψ(a)] = ψ(a) – ψ(b).

You’ll say: so what? Well… Nothing much. It’s just that our field vectors would point from lower to higher values of ψ, so they would be flowing uphill, so to say. Now, we don’t want that in physics. Why? It just doesn’t look good. We want our field vectors to be directed from higher potential to lower potential, always. Just think of it: heat (h) flows from higher temperature (T) to lower, and Newton’s apple falls from greater to lower height. Likewise, when putting a unit charge in the field, we want to see it move from higher to lower electric potential. Now, we can’t change the direction of E, because that’s the direction of the force and Nature doesn’t care about our conventions and so we can’t choose the direction of the force. But we can choose our convention. So that’s why we put a minus sign in front of Φ when writing E = –Φ. It makes everything come out alright. :-) That’s why we also have a minus sign in the differential heat flow equation: h = –κT.

So now we have the easy W(unit) = Φ(b) – Φ(a) formula that we wanted all along. Now, note that, when we say a unit charge, we mean a plus one charge. Yes: +1. So that’s the charge of the proton (it’s denoted by e) so you should stop thinking about moving electrons around! [I am saying this because I used to confuse myself by doing that. You end up with the same formulas for W and Φ but it just takes you longer to get there, so let me save you some time here. :-)]

But… Yes? In reality, it’s electrons going through a wire, isn’t? Not protons. Yes. But it doesn’t matter. Units are units in physics, and they’re always +1, for whatever (time, distance, charge, mass, spin, etcetera). AlwaysFor whatever. Also note that in laboratory experiments, or particle accelerators, we often use protons instead of electrons, so there’s nothing weird about it. Finally, and most fundamentally, if we have a –e charge moving through a neutral wire in one direction, then that’s exactly the same as a +e charge moving in the other way.

Just to make sure you get the point, let’s look at that illustration once again. We already said that we have F and, hence, E pointing from a to b and we’ll be reducing the potential energy of the system when moving our unit charge from a to b, so W was some negative value. Now, taking into account we want field lines to point from higher to lower potential, Φ(a) should be larger than Φ(b), and so… Well.. Yes. It all makes sense: we have a negative difference Φ(b) – Φ(a) = W(unit), which amounts, of course, to the reduction in potential energy.

The last thing we need to take care of now, is the reference point. Indeed, any Φ(r) + k function will do, so which one do we take? The approach here is to take a reference point Pat infinity. What’s infinity? Well… Hard to say. It’s a place that’s very far away from all of the charges we’ve got lying around here. Very far away indeed. So far away we can say there is nothing there really. No charges whatsoever. :-) Something like that. :-) In any case. I need to move on. So Φ(P0) is zero and so we can finally jot down the grand result for the electric potential Φ(P) (aka as the electrostatic or electric field potential):


So now we can calculate all potentials, i.e. when we know where the charges are at least. I’ve shown an example below. As you can see, besides having zero potential at infinity, we will usually also have one or more equipotential surfaces with zero potential. One could say these zero potential lines sort of ‘separate’ the positive and negative space. That’s not a very scientifically accurate description but you know what I mean.


Let me make a few final notes about the units. First, let me, once again, note that our unit charge is plus one, and it will flow from positive to negative potential indeed, as shown below, even if we know that, in an actual electric circuit, and so now I am talking about a copper wire or something similar, that means the (free) electrons will move in the other direction.

1280px-Current_notationIf you’re smart (and you are), you’ll say: what about the right-hand rule for the magnetic force? Well… We’re not discussing the magnetic force here but, because you insist, rest assured it comes out alright. Look at the illustration below of the magnetic force on a wire with a current, which is a pretty standard one.

terminalSo we have a given B, because of the bar magnet, and then v, the velocity vector for the… Electrons? No. You need to be consistent. It’s the velocity vector for the unit charges, which are positive (+e). Now just calculate the force F = qv×B = ev×B using the right-hand rule for the vector cross product, as illustrated below. So v is the thumb and B is the index finger in this case. All you need to do is tilt your hand, and it comes out alright.


But… We know it’s electrons going the other way. Well… If you insist. But then you have to put a minus sign in front of the q, because we’re talking minus e (–e). So now v is in the other direction and so v×B is in the other direction indeed, but our force F = qv×B = –ev×is not. Fortunately not, because physical reality should not depend on our conventions. :-) So… What’s the conclusion. Nothing. You may or may not want to remember that, when we say that our current j current flows in this or that direction, we actually might be talking electrons (with charge minus one) flowing in the opposite direction, but then it doesn’t matter. In addition, as mentioned above, in laboratory experiments or accelerators, we may actually be talking protons instead of electrons, so don’t assume electromagnetism is the business of electrons only.

To conclude this disproportionately long introduction (we’re finally ready to talk more difficult stuff), I should just make a note on the units. Electric potential is measured in volts, as you know. However, it’s obvious from all that I wrote above that it’s the difference in potential that matters really. From the definition above, it should be measured in the same unit as our unit for energy, or for work, so that’s the joule. To be precise, it should be measured in joule per unit charge. But here we have one of the very few inconsistencies in physics when it comes to units. The proton is said to be the unit charge (e), but its actual value is measured in coulomb (C). To be precise: +1 e = 1.602176565(35)×10−19 C. So we do not measure voltage – sorry, potential difference :-) – in joule but in joule per coulomb (J/C).

Now, we usually use another term for the joule/coulomb unit. You guessed it (because I said it): it’s the volt (V). One volt is one joule/coulomb: 1 V = 1 J/C. That’s not fair, you’ll say. You’re right, but so the proton charge e is not a so-called SI unit. Is the Coulomb an SI unit? Yes. It’s derived from the ampere (A) which, believe it or not, is actually an SI base unit. One ampere is 6.241×1018 electrons (i.e. one coulomb) per second. You may wonder how the ampere (or the coulomb) can be a base unit. Can they be expressed in terms of kilogram, meter and second, like all other base units. The answer is yes but, as you can imagine, it’s a bit of a complex description and so I’ll refer you to the Web for that.

The Poisson equation

I started this post by saying that I’d talk about fields and present some results from electrostatics using our ‘new’ vector differential operators, so it’s about time I do that. The first equation is a simple one. Using our E = –Φ formula, we can re-write the ∇•E = ρ/ε0 equation as:

∇•E = ∇•∇Φ = ∇2Φ = –ρ/ε0

This is a so-called Poisson equation. The ∇2 operator is referred to as the Laplacian and is sometimes also written as Δ, but I don’t like that because it’s also the symbol for the total differential, and that’s definitely not the same thing. The formula for the Laplacian is given below. Note that it acts on a scalar field (i.e. the potential function Φ in this case).

LaplacianAs Feynman notes: “The entire subject of electrostatics is merely the study of the solutions of this one equation.” However, I should note that this doesn’t prevent Feynman from devoting at least a dozen of his Lectures on it, and they’re not the easiest ones to read. [In case you’d doubt this statement, just have a look at his lecture on electric dipoles, for example.] In short: don’t think the ‘study of this one equation’ is easy. All I’ll do is just note some of the most fundamental results of this ‘study’.

Also note that ∇•E is one of our ‘new’ vector differential operators indeed: it’s the vector dot product of our del operator () with E. That’s something very different than, let’s say, Φ. A little dot and some bold-face type make an enormous difference here. :-) You may or may remember that we referred to the ∇• operator as the divergence (div) operator (see my post on that).

Gauss’ Law

Gauss’ Law is not to be confused with Gauss’ Theorem, about which I wrote elsewhere. It gives the flux of E through a closed surface S, any closed surface S really, as the sum of all charges inside the surface divided by the electric constant ε(but then you know that constant is just there to make the units come out alright).

Gauss' Law

The derivation of Gauss’ Law is a bit lengthy, which is why I won’t reproduce it here, but you should note its derivation is based, mainly, on the fact that (a) surface areas are proportional to r2 (so if we double the distance from the source, the surface area will quadruple), and (b) the magnitude of E is given by an inverse-square law, so it decreases as 1/r2. That explains why, if the surface S describes a sphere, the number we get from Gauss’ Law is independent of the radius of the sphere. The diagram below (credit goes to Wikipedia) illustrates the idea.


The diagram can be used to show how a field and its flux can be represented. Indeed, the lines represent the flux of E emanating from a charge. Now, the total number of flux lines depends on the charge but is constant with increasing distance because the force is radial and spherically symmetric. A greater density of flux lines (lines per unit area) means a stronger field, with the density of flux lines (i.e. the magnitude of E) following an inverse-square law indeed, because the surface area of a sphere increases with the square of the radius. Hence, in Gauss’ Law, the two effect cancel out: the two factors vary with distance, but their product is a constant.

Now, if we describe the location of charges in terms of charge densities (ρ), then we can write Qint as:

Q int

Now, Gauss’ Law also applies to an infinitesimal cubical surface and, in one of my posts on vector calculus, I showed that the flux of E out of such cube is given by E·dV. At this point, it’s probably a good idea to remind you of what this ‘new’ vector differential operator •, i.e. our ‘divergence’ operator, stands for: the divergence of E (i.e. • applied to E, so that’s E) represents the volume density of the flux of E out of an infinitesimal volume around a given point. Hence, it’s the flux per unit volume, as opposed to the flux out of the infinitesimal cube itself, which is the product of and dV, i.e. E·dV.

So what? Well… Gauss’ Law applied to our infinitesimal volume gives us the following equality:

ES 1

That, in turn, simplifies to:

ES 2

So that’s Maxwell’s first equation once again, which is equivalent to our Poisson equation: E = ∇2Φ = –ρ/ε0. So what are we doing here? Just listing equivalent formulas? Yes. I should also note they can be derived from Coulomb’s law of force, which is probably the one you learned in high school. So… Yes. It’s all consistent. But then that’s what we should expect, of course. :-)

The energy in a field

All these formulas look very abstract. It’s about time we use them for something. A lot of what’s written in Feynman’s Lectures on electrostatics is applied stuff indeed: it focuses, among other things, on calculating the potential in various circumstances and for various distributions of charge. Now, funnily enough, while that E = –ρ/ε0 equation is equivalent to Coulomb’s law and, obviously, much more compact to write down, Coulomb’s law is easier to start with for basic calculations. Let me first write Coulomb’s law. You’ll probably recognize it from your high school days:

Coulomb's law

Fis the force on charge q1, and Fis the force on charge q2. Now, qand q2. may attract or repel each other but, in both cases, the forces will be equal and opposite. [In case you wonder, yes, that’s basically the law of action and reaction.] The e12 vector is the unit vector from qto q1, not from qto q2, as one might expect. That’s because we’re not talking gravity here: like charges do not attract but repel and, hence, we have to switch the order here. Having said that, that’s basically the only peculiar thing about the equation. All the rest is standard:

  1. The force is inversely proportional to the square of the distance and so we have an inverse-square law here indeed.
  2. The force is proportional to the charge(s).
  3. Finally, we have a proportionality constant, 1/4πε0, which makes the units come out alright. You may wonder why it’s written the way it’s written, i.e. with that 4π factor, but that factor (4π or 2π) actually disappears in a number of calculations, so then we will be left with just a 1/ε0 or a 1/2ε0 factor. So don’t worry about it.

We want to calculate potentials and all that, so the first thing we’ll do is calculate the force on a unit charge. So we’ll divide that equation by q1, to calculate E(1) = F1/q1:

E 1

Piece of cake. But… What’s E(1) really? Well… It’s the force on the unit charge (+e), but so it doesn’t matter whether or not that unit charge is actually there, so it’s the field E caused by a charge q2. [If that doesn’t make sense to you, think again.] So we can drop the subscripts and just write:

E 3

What a relief, isn’t it? The simplest formula ever: the (magnitude) of the field as a simple function of the charge q and its distance (r) from the point that we’re looking at, which we’ll write as P = (x, y, z). But what origin are we using to measure x, y and z. Don’t be surprised: the origin is q.

Now that’s a formula we can use in the Φ(P) integral. Indeed, the antiderivative is ∫(q/4πε0r2)dr. Now, we can bring q/4πε0 out and so we’re left with ∫(1/r2)dr. Now ∫(1/r2)dr is equal to –1/r + k, and so the whole antiderivative is –q/4πε0r + k. However, the minus sign cancels out with the minus sign in front of the Φ(P) = Φ(x, y, z)  integral, and so we get:

E 4

You should just do the integral to check this result. It’s the same integral but with P0 (infinity) as point a and P as point b in the integral, so we have ∞ as start value and r as end value. The integral then yields Φ(P) – Φ(P0) = –q/4πε0[1/r – 1/∞). [The k constant falls away when subtracting Φ(P0) from Φ(P).] But 1/∞ = 0, and we had a minus sign in front of the integral, which cancels the sign of –q/4πε0. So, yes, we get the wonderfully simple result above. Also please do quickly check if it makes sense in terms of sign: the unit charge is +e, so that’s a positive charge. Hence, Φ(x, y, z) will be positive if the sign of q is also positive, but negative if q would happen to be negative. So that’s OK.

Also note that the potential – which, remember, represents the amount of work to be done when bringing a unit charge (e) from infinity to some distance r from a charge q – is proportional to the charge of q. We also know that the force and, hence, the work is proportional to the charge that we are bringing in (that’s how we calculated the work per unit in the first place: by dividing the total amount of work by the charge). Hence, if we’d not bring some unit charge but some other charge q2, the work done would also be proportional to q2. Now, we need to make sure we understand what we’re writing and so let’s tidy up and re-label our first charge once again as q1, and the distance r as r12, because that’s what r is: the distance between the two charges. We then have another obvious but nice result: the work done in bringing two charges together from a large distance (infinity) is

U 1Now, one of the many nice properties of fields (scalar or vector fields) and the associated energies (because that’s what we are talking about here) is that we can simply add up contributions. For example, if we’d have many charges and we’d want to calculate the potential Φ at a point which we call 1, we can use the same Φ(r) = q/4πε0r formula which we had derived for one charge only, for all charges, and then we simply add the contributions of each to get the total potential:

P 1

Now that we’re here, I should, of course, also give the continuum version of this formula, i.e. the formula used when we’re talking charge densities rather than individual charges. The sum then becomes an infinite sum (i.e. an integral), and qj (note that j goes from 2 to n) becomes a variable which we write as ρ(2). We get:

U 2

Going back to the discrete situation, we get the same type of sum when bringing multiple pairs of charges qi and qj together. Hence, the total electrostatic energy U is the sum of the energies of all possible pairs of charges:

U 3It’s been a while since you’ve seen any diagram or so, so let me insert one just to reassure you it’s as simple as that indeed:

U system

Now, we have to be aware of the risk of double-counting, of course. We should not be adding qiqj/4πε0rij twice. That’s why we write ‘all pairs’ under the ∑ summation sign, instead of the usual i, j subscripts. The continuum version of this equation below makes that 1/2 factor explicit:

U 4

Hmm… What kind of integral is that? It’s a so-called double integral because we have two variables here. Not easy. However, there’s a lucky break. We can use the continuum version of our formula for Φ(1) to get rid of the ρ(2) and dV2 variables and reduce the whole thing to a more standard ‘single’ integral. Indeed, we can write:

U 5Now, because our point (2) no longer appears, we can actually write that more elegantly as:

U 6That looks nice, doesn’t it? But do we understand it? Just to make sure. Let me explain it. The potential energy of the charge ρdV is the product of this charge and the potential at the same point. The total energy is therefore the integral over ϕρdV, but then we are counting energies twice, so that’s why we need the 1/2 factor. Now, we can write this even more beautifully as:

U 7

Isn’t this wonderful? We have an expression for the energy of a field, not in terms of the charges or the charge distribution, but in terms of the field they produce.

I am pretty sure that, by now, you must be suffering from ‘formula overload’, so you probably are just gazing at this without even bothering to try to understand. Too bad, and you should take a break then or just go do something else, like biking or so. :-)

First, you should note that you know this EE expression already: EE is just the square of the magnitude of the field vector E, so EE = E2. That makes sense because we know, from what we know about waves, that the energy is always proportional to the square of an amplitude, and so we’re just writing the same here but with a little proportionality constant (ε0).

OK, you’ll say. But you probably still wonder what use this formula could possibly have. What is that number we get from some integration over all space? So we associate the Universe with some number and then what? Well… Isn’t that just nice? :-) Jokes aside, we’re actually looking at that EE = Eproduct inside of the integral as representing an energy density (i.e. the energy per unit volume). We’ll denote that with a lower-case symbol and so we write:

D 6

Just to make sure you ‘get’ what we’re talking about here: u is the energy density in the little cube dV in the rather simplistic (and, therefore, extremely useful) illustration below (which, just like most of what I write above, I got from Feynman).


Now that should make sense to you—I hope. :-) In any case, if you’re still with me, and if you’re not all formula-ed out you may wonder how we get that ε0EE = ε0E2 expression from that ρΦ expression. Of course, you know that E = –∇Φ, and we also have the Poisson equation ∇2Φ = –ρ/ε0, but that doesn’t get you very far. It’s one of those examples where an easy-looking formula requires a lot of gymnastics. However, as the objective of this post is to do some of that, let me take you through the derivation.

Let’s do something with that Poisson equation first, so we’ll re-write it as ρ = –ε02Φ, and then we can substitute ρ in the integral with the ρΦ product. So we get:

U 8

Now, you should check out those fancy formulas with our new vector differential operators which we listed in our second class on vector calculus, but, unfortunately, none of them apply. So we have to write it all out and see what we get:

D 1

Now that looks horrendous and so you’ll surely think we won’t get anywhere with that. Well… Physicists don’t despair as easily as we do, it seems, and so they do substitute it in the integral which, of course, becomes an even more monstrous expression, because we now have two volume integrals instead of one! Indeed, we get:

D 2But if Φ is a vector field (it’s minus E, remember!), then ΦΦ is a vector field too, and we can then apply Gauss’ Theorem, which we mentioned in our first class on vector calculus, and which – mind you! – has nothing to do with Gauss’ Law. Indeed, Gauss produced so much it’s difficult to keep track of it all. :-) So let me remind you of this theorem. [I should also show why ΦΦ still yields a field, but I’ll assume you believe me.] Gauss’ Theorem basically shows how we can go from a volume integral to a surface integral:

Gauss Theorem-2If we apply this to the second integral in our U expression, we get:

D 4

So what? Where are we going with this? Relax. Be patient. What volume and surface are we talking about here? To make sure we have all charges and influences, we should integrate over all space and, hence, the surface goes to infinity. So we’re talking a (spherical) surface of enormous radius R whose center is the origin of our coordinate system. I know that sounds ridiculous but, from a math point of view, it is just the same like bringing a charge in from infinity, which is what we did to calculate the potential. So if we don’t difficulty with infinite line integrals, we should not have difficulty with infinite surface and infinite volumes. That’s all I can, so… Well… Let’s do it.

Let’s look at that product ΦΦ•n in the surface integral. Φ is a scalar and Φ is a vector, and so… Well… Φ•is a scalar too: it’s the normal component of Φ = –E. [Just to make sure, you should note that the way we define the normal unit vector n is such that ∇Φ•n is some positive number indeed! So n will point in the same direction, more or less, as ∇Φ = –E. So the θ angle  between ∇Φ = –E and n is surely less than ± 90° and, hence, the cosine factor in the ∇Φ•= |∇Φ||n|cosθ = |∇Φ|cosθ is positive, and so the whole vector dot product is positive.]

So, we have a product of two scalars here.  What happens with them if R goes to infinity? Well… The potential varies as 1/r as we’re going to infinity. That’s obvious from that Φ = (q/4πε0)(1/r) formula: just think of q as some kind of average now, which works because we assume all charges are located within some finite distance, while we’re going to infinity. What about Φ•n? Well… Again assuming that we’re reasonably far away from the charges, we’re talking the density of flux lines here (i.e. the magnitude of E) which, as shown above, follows an inverse-square law, because the surface area of a sphere increases with the square of the radius. So Φ•n varies not as 1/r but as 1/r2. To make a long story short, the whole product ΦΦ•n falls of as 1/r goes to infinity. Now, we shouldn’t forget we’re integrating a surface integral here, with r = R, and so it’s R going to infinity. So that surface integral has to go to zero when we include all space. The volume integral still stands however, so our formula for U now consists of one term only, i.e. the volume integral, and so we now have:

D 5

Done !

What’s left?

In electrostatics? Lots. Electric dipoles (like polar molecules), electrolytes, plasma oscillations, ionic crystals, electricity in the atmosphere (like lightning!), dielectrics and polarization (including condensers), ferroelectricity,… As soon as we try to apply our theory to matter, things become hugely complicated. But the theory works. Fortunately! :-) I have to refer you to textbooks, though, in case you’d want to know more about it. [I am sure you don’t, but then one never knows.]

What I wanted to do is to give you some feel for those vector and field equations in the electrostatic case. We now need to bring magnetic field back into the picture and, most importantly, move to electrodynamics, in which the electric and magnetic field do not appear as completely separate things. No! In electrodynamics, they are fully interconnected through the time derivatives ∂E/∂t and ∂B/∂t. That shows they’re part and parcel of the same thing really: electromagnetism. 

But we’ll try to tackle that in future posts. Goodbye for now!

The wave-particle duality revisited

As an economist, having some knowledge of what’s around in my field (social science), I think I am well-placed to say that physics is not an easy science. Its ‘first principles’ are complicated, and I am not ashamed to say that, after more than a year of study now, I haven’t reached what I would call a ‘true understanding’ of it.

Sometimes, the teachers are to be blamed. For example, I just found out that, in regard to the question of the wave function of a photon, the answer of two nuclear scientists was plain wrong. Photons do have a de Broglie wave, and there is a fair amount of research and actual experimenting going on trying to measure it. One scientific article which I liked in particular, and I hope to fully understand a year from now or so, is on such ‘direct measurement of the (quantum) wavefunction‘. For me, it drove home the message that these idealized ‘thought experiments’ that are supposed to make autodidacts like me understand things better, are surely instructive in regard to the key point, but confusing in other respects.

A typical example of such idealized thought experiment is the double-slit experiment with ‘special detectors’ near the slits, which may or may not detect a photon, depending on whether or not they’re switched on as well as on their accuracy. Depending on whether or not the detectors are switched on, and their accuracy, we get full interference (a), no interference (b), or a mixture of (a) and (b), as shown in (c) and (d).

set-up photons double-slit photons - results

I took the illustrations from Feynman’s lovely little book, QED – The Strange Theory of Light and Matter, and he surely knows what he’s talking about. Having said that, the set-up raises a key question in regard to these detectors: how do they work, exactly? More importantly, how do they disturb the photons?

I googled for actual double-slit experiments with such ‘special detectors’ near the slits, but only found such experiments for electrons. One of these, a 2010 experiment of an Italian team, suggests that it’s the interaction between the detector and the electron wave that may cause the interference pattern to disappear. The idea is shown below. The electron is depicted as an incoming plane wave, which breaks up as it goes through the slits. The slit on the left has no ‘filter’ (which you may think of as a detector) and, hence, the plane wave goes through as a cylindrical wave. The slit on the right-hand side is covered by a ‘filter’ made of several layers of ‘low atomic number material’, so the electron goes through but, at the same time, the barrier creates a spherical wave as it goes through. The researchers note that “the spherical and cylindrical wave do not have any phase correlation, and so even if an electron passed through both slits, the two different waves that come out cannot create an interference pattern on the wall behind them.” [Needless to say, while being represented as ‘real’ waves here, the ‘waves’ are, in fact, complex-valued psi functions.]

double-slit experiment

In fact, to be precise, there actually still was an interference effect if the filter was thin enough. Let me quote the reason for that: “The thicker the filter, the greater the probability for inelastic scattering. When the electron suffers inelastic scattering, it is localized. This means that its wavefunction collapses and, after the measurement act, it propagates roughly as a spherical wave from the region of interaction, with no phase relation at all with other elastically or inelastically scattered electrons. If the filter is made thick enough, the interference effects cancels out almost completely.”

This, of course, doesn’t solve the mystery. The mystery, in such experiments, is that, when we put detectors, it is either the detector at A or the detector at B that goes off. They should never go off together—”at half strength, perhaps?”, as Feynman puts it. That’s why I used italics when writing “even if an electron passed through both slits.” The electron, or the photon in a similar set-up, is not supposed to do that. As mentioned above, the wavefunction collapses or reduces. Now that’s where these so-called ‘weak measurement’ experiments come in: they indicate the interaction doesn’t have to be that way. It’s not all or nothing: our observations should not necessarily destroy the wavefunction. So, who knows, perhaps we will be able, one day, to show that the wavefunction does go through both slits, as it should (otherwise the interference pattern cannot be explained), and then we will have resolved the paradox.

I am pretty sure that, when that’s done, physicists will also be able to relate the image of a photon as a transient electromagnetic wave (first diagram below), being emitted by an atomic oscillator for a few nanoseconds only (we gave the example for sodium light, for which the decay time was 3.2×10–8 seconds) with the image of a photon as a de Broglie wave (second diagram below). I look forward to that day. I think it will come soon.

Photon wavePhoton wave


In the previous posts, I showed how the ‘real-world’ properties of photons and electrons emerge out of very simple mathematical notions and shapes. The basic notions are time and space. The shape is the wavefunction.

Let’s recall the story once again. Space is an infinite number of three-dimensional points (x, y, z), and time is a stopwatch hand going round and round—a cyclical thing. All points in space are connected by an infinite number of paths – straight or crooked, whatever  – of which we measure the length. And then we have ‘photons’ that move from A to B, but so we don’t know what is actually moving in space here. We just associate each and every possible path (in spacetime) between A and B with an amplitude: an ‘arrow‘ whose length and direction depends on (1) the length of the path l (i.e. the ‘distance’ in space measured along the path, be it straight or crooked), and (2) the difference in time between the departure (at point A) and the arrival (at point B) of our photon (i.e. the ‘distance in time’ as measured by that stopwatch hand).

Now, in quantum theory, anything is possible and, hence, not only do we allow for crooked paths, but we also allow for the difference in time to differ from l/c. Hence, our photon may actually travel slower or faster than the speed of light c! There is one lucky break, however, that makes all come out alright: the arrows associated with the odd paths and strange timings cancel each other out. Hence, what remains, are the nearby paths in spacetime only—the ‘light-like’ intervals only: a small core of space which our photon effectively uses as it travels through empty space. And when it encounters an obstacle, like a sheet of glass, it may or may not interact with the other elementary particle–the electron. And then we multiply and add the arrows – or amplitudes as we call them – to arrive at a final arrow, whose square is what physicists want to find, i.e. the likelihood of the event that we are analyzing (such a photon going from point A to B, in empty space, through two slits, or through as sheet of glass, for example) effectively happening.

The combining of arrows leads to diffraction, refraction or – to use the more general description of what’s going on – interference patterns:

  1. Adding two identical arrows that are ‘lined up’ yields a final arrow with twice the length of either arrow alone and, hence, a square (i.e. a probability) that is four times as large. This is referred to as ‘positive’ or ‘constructive’ interference.
  2. Two arrows of the same length but with opposite direction cancel each other out and, hence, yield zero: that’s ‘negative’ or ‘destructive’ interference.

Both photons and electrons are represented by wavefunctions, whose argument is the position in space (x, y, z) and time (t), and whose value is an amplitude or ‘arrow’ indeed, with a specific direction and length. But here we get a bifurcation. When photons interact with other, their wavefunctions interact just like amplitudes: we simply add them. However, when electrons interact with each other, we have to apply a different rule: we’ll take a difference. Indeed, anything is possible in quantum mechanics and so we combine arrows (or amplitudes, or wavefunctions) in two different ways: we can either add them or, as shown below, subtract one from the other.

vector addition

There are actually four distinct logical possibilities, because we may also change the order of A and B in the operation, but when calculating probabilities, all we need is the square of the final arrow, so we’re interested in its final length only, not in its direction (unless we want to use that arrow in yet another calculation). And so… Well… The fundamental duality in Nature between light and matter is based on this dichotomy only: identical (elementary) particles behave in one of two ways: their wavefunctions interfere either constructively or destructively, and that’s what distinguishes bosons (i.e. force-carrying particles, such as photons) from fermions (i.e. matter-particles, such as electrons). The mathematical description is complete and respects Occam’s Razor. There is no redundancy. One cannot further simplify: every logical possibility in the mathematical description reflects a physical possibility in the real world.

Having said that, there is more to an electron than just Fermi-Dirac statistics, of course. What about its charge, and this weird number, its spin?,

Well… That’s what’s this post is about. As Feynman puts it: “So far we have been considering only spin-zero electrons and photons, fake electrons and fake photons.”

I wouldn’t call them ‘fake’, because they do behave like real photons and electrons already but… Yes. We can make them more ‘real’ by including charge and spin in the discussion. Let’s go for it.

Charge and spin

From what I wrote above, it’s clear that the dichotomy between bosons and fermions (i.e. between ‘matter-particles’ and ‘force-carriers’ or, to put it simply, between light and matter) is not based on the (electric) charge. It’s true we cannot pile atoms or molecules on top of each other because of the repulsive forces between the electron clouds—but it’s not impossible, as nuclear fusion proves: nuclear fusion is possible because the electrostatic repulsive force can be overcome, and then the nuclear force is much stronger (and, remember, no quarks are being destroyed or created: all nuclear energy that’s being released or used is nuclear binding energy).

It’s also true that the force-carriers we know best, notably photons and gluons, do not carry any (electric) charge, as shown in the table below. So that’s another reason why we might, mistakenly, think that charge somehow defines matter-particles. However, we can see that matter-particles, first carry very different charges (positive or negative, and with very different values: 1/3, 2/3 or 1), and even be neutral, like the neutrinos. So, if there’s a relation, it’s very complex. In addition, one of the two force-carrier for the weak force, the W boson, can have positive or negative charge too, so that doesn’t make sense, does it? [I admit the weak force is a bit of a ‘special’ case, and so I should leave it out of the analysis.] The point is: the electric charge is what it is, but it’s not what defines matter. It’s just one of the possible charges that matter-particles can carry. [The other charge, as you know, is the color charge but, to confuse the picture once again, that’s a charge that can also be carried by gluons, i.e. the carriers of the strong force.]

Standard_Model_of_Elementary_ParticlesSo what is it, then? Well… From the table above, you can see that the property of ‘spin’ (i.e. the third number in the top left-hand corner) matches the above-mentioned dichotomy in behavior, i.e. the two different types of interference (bosons versus fermions or, to use a heavier term, Bose-Einstein statistics versus Fermi-Dirac statistics): all matter-particles are so-called spin-1/2 particles, while all force-carriers (gauge bosons) all have spin one. [Never mind the Higgs particle: that’s ‘just’ a mechanism to give (most) elementary particles some mass.]

So why is that? Why are matter-particles spin-1/2 particles and force-carries spin-1 particles? To answer that question, we need to answer the question: what’s this spin number? And to answer that question, we first need to answer the question: what’s spin?

Spin in the classical world

In the classical world, it’s, quite simply, the momentum associated with a spinning or rotating object, which is referred to as the angular momentum. We’ve analyzed the math involved in another post, and so I won’t dwell on that here, but you should note that, in classical mechanics, we distinguish two types of angular momentum:

  1. Orbital angular momentum: that’s the angular momentum an object gets from circling in an orbit, like the Earth around the Sun.
  2. Spin angular momentum: that’s the angular momentum an object gets from spinning around its own axis., just like the Earth, in addition to rotating around the Sun, is rotating around its own axis (which is what causes day and night, as you know).

The math involved in both is pretty similar, but it’s still useful to distinguish the two, if only because we’ll distinguish them in quantum mechanics too! Indeed, when I analyzed the math in the above-mentioned post, I showed how we represent angular momentum by a vector that’s perpendicular to the direction of rotation, with its direction given by the ubiquitous right-hand rule—as in the illustration below, which shows both the angular momentum (L) as well as the torque (τ) that’s produced by a rotating mass. The formulas are given too: the angular momentum L is the vector cross product of the position vector r and the linear momentum p, while the magnitude of the torque τ is given by the product of the length of the lever arm and the applied force. An alternative approach is to define the angular velocity ω and the moment of inertia I, and we get the same result: L = Iω. 


Of course, the illustration above shows orbital angular momentum only and, as you know, we no longer have a ‘planetary model’ (aka the Rutherford model) of an atom. So should we be looking at spin angular momentum only?

Well… Yes and no. More yes than no, actually. But it’s ambiguous. In addition, the analogy between the concept of spin in quantum mechanics, and the concept of spin in classical mechanics, is somewhat less than straightforward. Well… It’s not straightforward at all actually. But let’s get on with it and use more precise language. Let’s first explore it for light, not because it’s easier (it isn’t) but… Well… Just because. :-)

The spin of a photon

I talked about the polarization of light in previous posts (see, for example, my post on vector analysis): when we analyze light as a traveling electromagnetic wave (so we’re still in the classical analysis here, not talking about photons as ‘light particles’), we know that the electric field vector oscillates up and down and is, in fact, likely to rotate in the xy-plane (with z being the direction of propagation). The illustration below shows the idealized (aka limiting) case of perfectly circular polarization: if there’s polarization, it is more likely to be elliptical. The other limiting case is plane polarization: in that case, the electric field vector just goes up and down in one direction only. [In case you wonder whether ‘real’ light is polarized, it often is: there’s an easy article on that on the Physics Classroom site.]

spin angular momentumThe illustration above uses Dirac’s bra-ket notation |L〉 and |R〉 to distinguish the two possible ‘states’, which are left- or right-handed polarization respectively. In case you forgot about bra-ket notations, let me quickly remind you: an amplitude is usually denoted by 〈x|s〉, in which 〈x| is the so-called ‘bra’, i.e. the final condition, and |s〉 is the so-called ‘ket’, i.e. the starting condition, so 〈x|s〉 could mean: a photon leaves at s (from source) and arrives at x. It doesn’t matter much here. We could have used any notation, as we’re just describing some state, which is either |L〉 (left-handed polarization) or |R〉 (right-handed polarization). The more intriguing extras in the illustration above, besides the formulas, are the values: ± ħ = ±h/2π. So that’s plus or minus the (reduced) Planck constant which, as you know, is a very tiny constant. I’ll come back to that. So what exactly is being represented here?

At first, you’ll agree it looks very much like the momentum of light (p) which, in a previous post, we calculated from the (average) energy (E) as p = E/c. Now, we know that E is related to the (angular) frequency of the light through the Planck-Einstein relation E = hν = ħω. Now, ω is the speed of light (c) times the wave number (k), so we can write: p = ħω = ħck/c = ħk. The wave number is the ‘spatial frequency’, expressed either in cycles per unit distance (1/λ) or, more usually, in radians per unit distance (k = 2π/λ), so we can also write p = ħk = h/λ. Whatever way we write it, we find that this momentum (p) depends on the energy and/or, what amounts to saying the same, the frequency and/or the wavelength of the light.

So… Well… The momentum of light is not just h or ħ, i.e. what’s written in that illustration above. So it must be something different. In addition, I should remind you this momentum was calculated from the magnetic field vector, as shown below (for more details, see my post on vector calculus once again), so it had nothing to do with polarization really.

radiation pressure

Finally, last but not least, the dimensions of ħ and p = h/λ are also different (when one is confused, it’s always good to do a dimensional analysis in physics):

  1. The dimension of Planck’s constant (both h as well as ħ = h/2π) is energy multiplied by time (J·s or eV·s) or, equivalently, momentum multiplied by distance. It’s referred to as the dimension of action in physics, and h is effectively, the so-called quantum of action.
  2. The dimension of (linear) momentum is… Well… Let me think… Mass times velocity (mv)… But what’s the mass in this case? Light doesn’t have any mass. However, we can use the mass-energy equivalence: 1 eV = 1.7826×10−36 kg. [10−36? Well… Yes. An electronvolt is a very tiny measure of energy.] So we can express p in eV·m/s units.

Hmm… We can check: momentum times distance gives us the dimension of Planck’s constant again – (eV·m/s)·m = eV·s. OK. That’s good… […] But… Well… All of this nonsense doesn’t make us much smarter, does it? :-) Well… It may or may not be more useful to note that the dimension of action is, effectively, the same as the dimension of angular momentum. Huh? Why? Well… From our classical L = r×p formula, we find L should be expressed in m·(eV·m/s) = eV·m2/s  units, so that’s… What? Well… Here we need to use a little trick and re-express energy in mass units. We can then write L in kg·m2/s units and, because 1 Newton (N) is 1 kg⋅m/s2, the kg·m2/s unit is equivalent to the N·m·s = J·s unit. Done!

Having said that, all of this still doesn’t answer the question: are the linear momentum of light, i.e. our p, and those two angular momentum ‘states’, |L〉 and |R〉, related? Can we relate |L〉 and |R〉 to that L = r×p formula?

The answer is simple: no. The |L〉 and |R〉 states represent spin angular momentum indeed, while the angular momentum we would derive from the linear momentum of light using that L = r×p is orbital angular momentum. Let’s introduce the proper symbols: orbital angular momentum is denoted by L, while spin angular momentum is denoted by S. And then the total angular momentum is, quite simply, J = L + S.

L and S can both be calculated using either a vector cross product r × p (but using different values for r and p, of course) or, alternatively, using the moment of inertia tensor I and the angular velocity ω. The illustrations below (which I took from Wikipedia) show how, and also shows how L and S are added to yield J = L + S.



So what? Well… Nothing much. The illustration above show that the analysis – which is entirely classical, so far – is pretty complicated. [You should note, for example, that in the S = Iω and L Iω formulas, we don’t use the simple (scalar) moment of inertia but the moment of inertia tensor (so that’s a matrix denoted by I, instead of the scalar I), because S (or L) and ω are not necessarily pointing in the same direction.

By now, you’re probably very confused and wondering what’s wiggling really. The answer for the orbital angular momentum is: it’s the linear momentum vector p. Now…

Hey! Stop! Why would that vector wiggle?

You’re right. Perhaps it doesn’t. The linear momentum p is supposed to be directed in the direction of travel of the wave, isn’t it? It is. In vector notation, we have p = ħk, and that k vector (i.e. the wavevector) points in the direction of travel of the wave indeed and so… Well… No. It’s not that simple. The wave vector is perpendicular to the surfaces of constant phase, i.e. the so-called wave fronts, as show in the illustration below (see the direction of ek, which is a unit vector in the direction of k).

wave vector

So, yes, if we’re analyzing light moving in a straight one-dimensional line only, or we’re talking a plane wave, as illustrated below, then the orbital angular momentum vanishes.

plane wave

But the orbital angular momentum L does not vanish when we’re looking at a real light beam, like the ones below. Real waves? Well… OK… The ones below are idealized wave shapes as well, but let’s say they are somewhat more real than a plane wave. :-)


So what do we have here? We have wavefronts that are shaped as helices, except for the one in the middle (marked by m = 0) which is, once again, an example of plane wave—so for that one (m = 0), we have zero orbital angular momentum indeed. But look, very carefully, at the m = ± 1 and m = ± 2 situations. For m = ± 1, we have one helical surface with a step length equal to the wavelength λ. For m = ± 2, we have two intertwined helical surfaces with the step length of each helix surface equal to 2λ. [Don’t worry too much about the second and third column: they show a beam cross-section (so that’s not a wave front but a so-called phase front) and the (averaged) light intensity, again of a beam cross-section.] Now, we can further generalize and analyze waves composed of m helices with the step length of each helix surface equal to |m|λ. The Wikipedia article on OAM (orbital angular momentum of light), from which I got this illustration, gives the following formula to calculate the OAM:

Formula OAMThe same article also notes that the quantum-mechanical equivalent of this formula, i.e. the orbital angular momentum of the photons one would associate with the not-cylindrically-symmetric waves above (i.e. all those for which m ≠ 0), is equal to:

Lz = mħ

So what? Well… I guess we should just accept that as a very interesting result. For example, I duly note that Lis along the direction of propagation of the wave (as indicated by the z subscript), and I also note the very interesting fact that, apparently, Lz  can be either positive or negative. Now, I am not quite sure how such result is consistent with the idea of radiation pressure, but I am sure there must be some logical explanation to that. The other point you should note is that, once again, any reference to the energy (or to the frequency or wavelength) of our photon has disappeared. Hmm… I’ll come back to this, as I promised above already.

The thing is that this rather long digression on orbital angular momentum doesn’t help us much in trying to understand what that spin angular momentum (SAM) is all about. So, let me just copy the final conclusion of the Wikipedia article on the orbital angular momentum of light: the OAM is the component of angular momentum of light that is dependent on the field spatial distribution, not on the polarization of light.

So, again, what’s the spin angular momentum? Well… The only guidance we have is that same little drawing again and, perhaps, another illustration that’s supposed to compare SAM with OAM (underneath).

spin angular momentum

800px-Sam-oam-interactionNow, the Wikipedia article on SAM (spin angular momentum), from which I took the illustrations above, gives a similar-looking formula for it:

Formula SAM

When I say ‘similar-looking’, I don’t mean it’s the same. [Of course not! Spin and orbital angular momentum are two different things!]. So what’s different in the two formulas? Well… We don’t have any del operator () in the SAM formula, and we also don’t have any position vector (r) in the integral kernel (or integrand, if you prefer that term). However, we do find both the electric field vector (E) as well as the (magnetic) vector potential (A) in the equation again. Hence, the SAM (also) takes both the electric as well as the magnetic field into account, just like the OAM. [According to the author of the article, the expression also shows that the SAM is nonzero when the light polarization is elliptical or circular, and that it vanishes if the light polarization is linear, but I think that’s much more obvious from the illustration than from the formula… However, I realize I really need to move on here, because this post is, once again, becoming way too long. So…]

OK. What’s the equivalent of that formula in quantum mechanics?

Well… In quantum mechanics, the SAM becomes a ‘quantum observable’, described by a corresponding operator which has only two eigenvalues:

Sz = ± ħ

So that corresponds to the two possible values for Jz, as mentioned in the illustration, and we can understand, intuitively, that these two values correspond to two ‘idealized’ photons which describe a left- and right-handed circularly polarized wave respectively.

So… Well… There we are. That’s basically all there is to say about it. So… OK. So far, so good.

But… Yes? Why do we call a photon a spin-one particle?

That has to do with convention. A so-called spin-zero particle has no degrees of freedom in regard to polarization. The implied ‘geometry’ is that a spin-zero particle is completely symmetric: no matter in what direction you turn it, it will always look the same. In short, it really behaves like a (zero-dimensional) mathematical point. As you can see from the overview of all elementary particles, it is only the Higgs boson which has spin zero. That’s why the Higgs field is referred to as a scalar field: it has no direction. In contrast, spin-one particles, like photons, are also ‘point particles’, but they do come with one or the other kind of polarization, as evident from all that I wrote above. To be specific, they are polarized in the xy-plane, and can have one of two directions. So, when rotating them, you need a full rotation of 360° if you want them to look the same again.

Now that I am here, let me exhaust the topic (to a limited extent only, of course, as I don’t want to write a book here) and mention that, in theory, we could also imagine spin-2 particles, which would look the same after half a rotation (180°). However, as you can see from the overview, none of the elementary particles has spin-2. A spin-2 particle could be like some straight stick, as that looks the same even after it is rotated 180 degrees. I am mentioning the theoretical possibility because the graviton, if it would effectively exist, is expected to be a massless spin-2 boson. [Now why do I mention this? Not sure. I guess I am just noting this to remind you of the fact that the Higgs boson is definitely not the (theoretical) graviton, and/or that we have no quantum theory for gravity.]

Oh… That’s great, you’ll say. But what about all those spin-1/2 particles in the table? You said that all matter-particles are spin 1/2 particles, and that it’s this particular property that actually makes them matter-particles. So what’s the geometry here? What kind of ‘symmetries’ do they respect?

Well… As strange as it sounds, a spin-1/2 particle needs two full rotations (2×360°=720°) until it is again in the same state. Now, in regard to that particularity, you’ll often read something like: “There is nothing in our macroscopic world which has a symmetry like that.” Or, worse, “Common sense tells us that something like that cannot exist, that it simply is impossible.” [I won’t quote the site from which I took this quotes, because it is, in fact, the site of a very respectable  research center!] Bollocks! The Wikipedia article on spin has this wonderful animation: look at how the spirals flip between clockwise and counterclockwise orientations, and note that it’s only after spinning a full 720 degrees that this ‘point’ returns to its original configuration after spinning a full 720 degrees.


So, yes, we can actually imagine spin-1/2 particles, and with not all that much imagination, I’d say. But… OK… This is all great fun, but we have to move on. So what’s the ‘spin’ of these spin-1/2 particles and, more in particular, what’s the concept of ‘spin’ of an electron?

The spin of an electron

When starting to read about it, I thought that the angular momentum of an electron would be easier to analyze than that of a photon. Indeed, while a photon has no mass and no electric charge, that analysis with those E and B vectors is damn complicated, even when sticking to a strictly classical analysis. For an electron, the classical picture seems to be much more straightforward—but only at first indeed. It quickly becomes equally weird, if not more.

We can look at an electron in orbit as a rotating electrically charged ‘cloud’ indeed. Now, from Maxwell’s equations (or from your high school classes even), you know that a rotating electric charged body creates a magnetic dipole. So an electron should behave just like a tiny bar magnet. Of course, we have to make certain assumptions about the distribution of the charge in space but, in general, we can write that the magnetic dipole moment μ is equal to:

formule magnetic dipole moment

In case you want to know, in detail, where this formula comes from, let me refer you to Feynman once again, but trust me – for once :-) – it’s quite straightforward indeed: the L in this formula is the angular momentum, which may be the spin angular momentum, the orbital angular momentum, or the total angular momentum. The e and m are, of course, the charge and mass of the electron respectively.

So that’s a good and nice-looking formula, and it’s actually even correct except for the spin angular momentum as measured in experiments. [You’ll wonder how we can measure orbital and spin angular momentum respectively, but I’ll talk about an 1921 experiment in a few minutes, and so that will give you some clue as to that mystery. :-)] To be precise, it turns out that one has to multiply the above formula for μ with a factor which is referred to as the g-factor. [And, no, it’s got nothing to do with the gravitational constant or… Well… Nothing. :-)] So, for the spin angular momentum, the formula should be:

formula spin angular momentum

Experimental physicists are constantly checking that value and they know measure it to be something like g = is 2.00231930419922 ± 1.5×10−12. So what’s the explanation for that g? Where does it come from? There is, in fact, a classical explanation for it, which I’ll copy hereunder (yes, from Wikipedia). This classical explanation is based on assuming that the distribution of the electric charge of the electron and its mass does not coincide:

classical theory

Why do I mention this classical explanation? Well… Because, in most popular books on quantum mechanics (including Feynman’s delightful QED), you’ll read that (a) the value for g can be derived from a quantum-theoretical equation known as Dirac’s equation (or ‘Dirac theory’, as it’s referred to above) and, more importantly, that (b) physicists call the “accurate prediction of the electron g-factor” from quantum theory (i.e. ‘Dirac’s theory’ in this case) “one of the greatest triumphs” of the theory.

So what about it? Well… Whatever the merits of both explanations (classical or quantum-mechanical), they are surely not the reason why physicists abandoned the classical theory. So what was the reason then? What a stupid question! You know that already! The Rutherford model was, quite simply, not consistent: according to classical theory, electrons should just radiate their energy away and spiral into the nucleus. More in particular, there was yet another experiment that wasn’t consistent with classical theory, and it’s one that’s very relevant for the discussion at hand: it’s the so-called Stern-Gerlach experiment.

It was just as ‘revolutionary’ as the Michelson-Morley experiment (which couldn’t measure the speed of light), or the discovery of the positron in 1932. The Stern-Gerlach experiment was done in 1921, so that’s many years before quantum theory replaced classical theory and, hence, it’s not one of those experiments confirming quantum theory. No. Quite the contrary. It was, in fact, one of the experiments that triggered the so-called quantum revolution. Let me insert the experimental set-up and summarize it (below).


  • The German scientists Otto Stern and Walther Gerlach produced a beam of electrically-neutral silver atoms and let it pass through a (non-uniform) magnetic field. Why silver atoms? Well… Silver atoms are easy to handle (in a lab, that is) and easy to detect with a photoplate.
  • These atoms came out of an oven (literally), in which the silver was being evaporated (yes, one can evaporate silver), so they had no special orientation in space and, so Stern and Gerlach thought, the magnetic moment (or spin) of the outer electrons in these atoms would point into all possible directions in space.
  • As expected, the magnetic field did deflect the silver atoms, just like it would deflect little dipole magnets if you would shoot them through the field. However, the pattern of deflection was not the one which they expected. Instead of hitting the plate all over the place, within some contour, of course, only the contour itself was hit by the atoms. There was nothing in the middle!
  • And… Well… It’s a long story, but I’ll make it short. There was only one possible explanation for that behavior, and that’s that the magnetic moments – and, therefore the spins – had only two orientations in space, and two possible values only which – Surprise, surprise! – are ±ħ/2 (so that’s half the value of the spin angular momentum of photons, which explains the spin-1/2 terminology).

The spin angular momentum of an electron is more popularly known as ‘up’ or ‘down’.

So… What about it? Well… It explains why a atomic orbital can have two electrons, rather than one only and, as such, the behavior of the electron here is the basis of the so-called periodic table, which explains all properties of the chemical elements. So… Yes. Quantum theory is relevant, I’d say. :-)


This has been a terribly long post, and you may no longer remember what I promised to do. What I promised to do, is to write some more about the difference between a photon and an electron and, more in particular, I said I’d write more about their charge, and that “weird number”: their spin. I think I lived up to that promise. The summary is simple:

  1. Photons have no (electric) charge, but they do have spin. Their spin is linked to their polarization in the xy-plane (if z is the direction of propagation) and, because of the strangeness of quantum mechanics (i.e. the quantization of (quantum) observables), the value for this spin is either +ħ orħ, which explains why they are referred to as spin-one particles (because either value is one unit of the Planck constant).
  2. Electrons have both electric charge as well as spin. Their spin is different and is, in fact, related to their electric charge. It can be interpreted as the magnetic dipole moment, which results from the fact we have a spinning charge here. However, again, because of the strangeness of quantum mechanics, their dipole moment is quantized and can take only one of two values: ±ħ/2, which is why they are referred to as spin-1/2 particles.

So now you know everything you need to know about photons and electrons, and then I mean real photons and electrons, including their properties of charge and spin. So they’re no longer ‘fake’ spin-zero photons and electrons now. Isn’t that great? You’ve just discovered the real world! :-)

So… I am done—for the moment, that is… :-) If anything, I hope this post shows that even those ‘weird’ quantum numbers are rooted in ‘physical reality’ (or in physical ‘geometry’ at least), and that quantum theory may be ‘crazy’ indeed, but that it ‘explains’ experimental results. Again, as Feynman says:

“We understand how Nature works, but not why Nature works that way. Nobody understands that. I can’t explain why Nature behave in this particular way. You may not like quantum theory and, hence, you may not accept it. But physicists have learned to realize that whether they like a theory or not is not the essential question. Rather, it is whether or not the theory gives predictions that agree with experiment. The theory of quantum electrodynamics describes Nature as absurd from the point of view of common sense. But it agrees fully with experiment. So I hope you can accept Nature as She is—absurd.”

Frankly speaking, I am not quite prepared to accept Nature as absurd: I hope that some more familiarization with the underlying mathematical forms and shapes will make it look somewhat less absurd. More, I hope that such familiarization will, in the end, make everything look just as ‘logical’, or ‘natural’ as the two ways in which amplitudes can ‘interfere’.

Post scriptum: I said I would come back to the fact that, in the analysis of orbital and spin angular momentum of a photon (OAM and SAM), the frequency or energy variable sort of ‘disappears’. So why’s that? Let’s look at those expressions for |L〉 and |R〉 once again:

Formula L spin

Formula R spin

What’s written here really? If |L〉 and |R〉 are supposed to be equal to either +ħ orħ, then that product of ei(kz–ωt) with the 3×1 matrix (which is a ‘column vector’ in this case) does not seem to make much sense, does it? Indeed, you’ll remember that ei(kz–ωt) just a regular wave function. To be precise, its phase is φ = kz–ωt (with z the direction of propagation of the wave), and its real and imaginary part can be written as eiφ = cos(φ) + isin(φ) = a + bi. Multiplying it with that 3×1 column vector (1, i, 0) or (1, –i, 0) just yields another 3×1 column vector. To be specific, we get:

  1. The 3×1 ‘vector’ (a + bi, –b+ai, 0) for |L〉, and
  2. The 3×1 ‘vector’ (a + bi, b–ai, 0) for |R〉.

So we have two new ‘vectors’ whose components are complex numbers. Furthermore, we can note that their ‘x’-component is the same, their ‘y’-component is each other’s opposite –b+ai = –(b–ai), and their ‘z’-component is 0.

So… Well… In regard to their ‘y’-component, I should note that’s just the result of the multiplication with i and/or –i: multiplying a complex number with i amounts to a 90° degree counterclockwise rotation, while multiplication with –i amounts to the same but clockwise. Hence, we must arrive at two complex numbers that are each other’s opposite. [Indeed, in complex analysis, the value –1 = eiπ = eiπ is a 180° rotation, both clockwise (φ = –π) or counterclockwise (φ = +π), of course!.]

Hmm… Still… What does it all mean really? The truth is that it takes some more advanced math to interpret the result. To be precise, pure quantum states, such |L〉 and |R〉 here, are represented by so-called ‘state vectors’ in a Hilbert space over complex numbers. So that’s what we’ve got here. So… Well… I can’t say much more about this right now: we’ll just need to study some more before we’ll ‘understand’ those expressions for |L〉 and |R〉. So let’s not worry about it right now. We’ll get there.

Just for the record, I should note that, initially, I thought 1/√2 factor in front gave some clue as to what’s going on here: 1/√2 ≈ 0.707 is a factor that’s used to calculate the root mean square (RMS) value for a sine wave. It’s illustrated below. The RMS value is a ‘special average’ one can use to calculate the energy or power (i.e. energy per time unit) of a wave. [Using the term ‘average’ is misleading, because the average of a sine wave is 1/2 over half a cycle, and 0 over a fully cycle, as you can easily see from the shape of the function. But I guess you know what I mean.]

V-rmsIndeed, you’ll remember that the energy (E) of a wave is proportional to the square of its amplitude (A): E ∼ A2. For example, when we have a constant current I, the power P will be proportional to its square: P ∼ I2. With a varying current (I) and voltage (V), the formula is more complicated but we can simply it using the rms values: Pavg = VRMS·IRMS.

So… Looking at that formula, should we think of h and/or ħ as some kind of ‘average’ energy, like the energy of a photon per cycle or per radian? That’s an interesting idea so let’s explore it. If the energy of a photon is equal to E = ν = ω/2π = ħω, then we can also write:

h = E/ν and/or ħ = E/ω

So, yes: is the energy of a photon per cycle obviously and, because the phase covers 2π radians during each cycle, and ħ must be the energy of the photon per radian! That’s a great result, isn’t it? It also gives a wonderfully simple interpretation to Planck’s quantum of action!

Well… No. We made at least two mistakes here. The first mistake is that if we think of a photon as wave train being radiated by an atom – which, as we calculated in another post, lasts about 3.2×10–8 seconds – the graph for its energy is going to resemble the graph of its amplitude, so it’s going to die out and each oscillation will carry less and less energy. Indeed, the decay time given here (τ = 3.2×10–8 seconds) was the time it takes for the radiation (we assumed sodium light with a wavelength of 600 nanometer) to die out by a factor 1/e. To be precise, the shape of the energy curve is E = E0e−t/τ, and so it’s an envelope resembling the A(t) curve below.

decay time

Indeed, remember, the energy of a wave is determined not only by its frequency (or wavelength) but also by its amplitude, and so we cannot assume the amplitude of a ‘photon wave’ is going to be the same everywhere. Just for the record: note that the energy of a wave is proportional to the frequency (doubling the frequency doubles the energy) but, when linking it to the amplitude, we should remember that the energy is proportional to the square of the amplitude, so we write E ∼ A2.

The second mistake is that both ν and ω are the light frequency (expressed in cycles or radians respectively) of the light per second, i.e per time unit. So that’s not the number of cycles or radians that we should associate with the wavetrain! We should use the number of cycles (or radians) packed into one photon. We can calculate that easily from the value for the decay time τ. Indeed, for sodium light, which which has a frequency of 500 THz (500×1012 oscillations per second) and a wavelength of 600 nm (600×10–9 meter), we said the radiation lasts about 3.2×10–8 seconds (that’s actually the time it takes for the radiation’s energy to die out by a factor 1/e, so the wavetrain will actually last (much) longer, but so the amplitude becomes quite small after that time), and so that makes for some 16 million oscillations, and a ‘length’ of the wavetrain of about 9.6 meter! Now, the energy of a sodium light photon is about 2eV (h·ν ≈ 4×10−15 electronvolt·second times 0.5×1015 cycles/sec = 2eV) and so we could say the average energy of each of those 16 million oscillations would be 2/(16×106) eV = 0.125×10–6 eV. But, from all that I wrote above, it’s obvious that this number doesn’t mean all that much, because the wavetrain is not likely to be shaped very regularly.

So, in short, we cannot say that h is the photon energy per cycle or that ħ is the photon energy per radian!  That’s not only simplistic but, worse, false. Planck’s constant is what is is: a factor of proportionality for which there is no simple ‘arithmetic’ and/or ‘geometric’ explanation. It’s just there, and we’ll need to study some more math to truly understand the meaning of those two expressions for |L〉 and |R〉.

Having said that, and having thought about it all some more, I find there’s, perhaps, a more interesting way to re-write E = ν:

h = E/ν = (λ/c)E = T·E

T? Yes. T is the period, so that’s the time needed for one oscillation: T is just the reciprocal of the frequency (T = 1/ν = λ/c). It’s a very tiny number, because we divide (1) a very small number (the wavelength of light measured in meter) by (2) a very large number (the distance (in meter) traveled by light). For sodium light, T is equal to 2×10–15 seconds, so that’s two femtoseconds, i.e. two quadrillionths of a second.

Now, we can think of the period as a fraction of a second, and smaller fractions are, obviously, associated with higher frequencies and, what amounts to the same, shorter wavelengths (and, hence, higher energies). However, when writing T = λ/c, we can also think of T being another kind of fraction: λ/can also be written as the ratio of the wavelength and the distance traveled by light in one second, i.e. a light-second (remember that light-seconds are measures of length, not of distance). The two fractions are the same when we express time and distance in equivalent units indeed (i.e. distance in light-second, or time in sec/units).

So that links h to both time as well as distance and we may look at h as some kind of fraction or energy ‘density’ even (although the term ‘density’ in this context is not quite accurate). In the same vein, I should note that, if there’s anything that should make you think about h, is the fact that its value depends on how we measure time and distance. For example, if w’d measure time in other units (for example, the more ‘natural’ unit defined by the time light needs to travel one meter), then we get a different unit for h. And, of course, you also know we can relate energy to distance (1 J = 1 N·m). But that’s something that’s obvious from h‘s dimension (J·s), and so I shouldn’t dwell on that.

Hmm… Interesting thoughts. I think I’ll develop these things a bit further in one of my next posts. As for now, however, I’ll leave you with your own thoughts on it.

Note 1: As you’re trying to follow what I am writing above, you may have wondered whether or not the duration of the wavetrain that’s emitted by an atom is a constant, or whether or not it packs some constant number of oscillations. I’ve thought about that myself as I wrote down the following formula at some point of time:

h = (the duration of the wave)·(the energy of the photon)/(the number of oscillations in the wave)

As mentioned above, interpreting h as some kind of average energy per oscillation is not a great idea but, having said that, it would be a good exercise for you to try to answer that question in regard to the duration of these wavetrains, and/or the number of oscillations packed into them, yourself. There are various formulas for the Q of an atomic oscillator, but the simplest one is the one expressed in terms of the so-called classical electron radius r0:

Q = 3λ/4πr0

As you can see, the Q depends on λ: higher wavelengths (so lower energy) are associated with higher Q. In fact, the relationship is directly proportional: twice the wavelength will give you twice the Q. Now, the formula for the decay time τ is also dependent on the wavelength. Indeed, τ = 2Q/ω = Qλ/πc. Combining the two formulas yields (if I am not mistaken):

τ = 3λ2/4π2r0c.

Hence, the decay time is proportional to the square of the wavelength. Hmm… That’s an interesting result. But I really need to learn how to be a bit shorter, and so I’ll really let you think now about what all this means or could mean.

Note 2: If that 1/√2 factor has nothing to do with some kind of rms calculation, where does it come from? I am not sure. It’s related to state vector math, it seems, and I haven’t started that as yet. I just copy a formula from Wikipedia here, which shows the same factor in front:

state vector

The formula above is said to represent the “superposition of joint spin states for two particles”. My gut instinct tells me 1/√2 factor has to do with the normalization condition and/or with the fact that we have to take the (absolute) square of the (complex-valued) amplitudes to get the probability.

The Strange Theory of Light and Matter (II)

If we limit our attention to the interaction between light and matter (i.e. the behavior of photons and electrons only—so we we’re not talking quarks and gluons here), then the ‘crazy ideas’ of quantum mechanics can be summarized as follows:

  1. At the atomic or sub-atomic scale, we can no longer look at light as an electromagnetic wave. It consists of photons, and photons come in blobs. Hence, to some extent, photons are ‘particle-like’.
  2. At the atomic or sub-atomic scale, electrons don’t behave like particles. For example, if we send them through a slit that’s small enough, we’ll observe a diffraction pattern. Hence, to some extent, electrons are ‘wave-like’.

In short, photons aren’t waves, but they aren’t particles either. Likewise, electrons aren’t particles, but they aren’t waves either. They are neither. The weirdest thing of all, perhaps, is that, while light and matter are two very different things in our daily experience – light and matter are opposite concepts, I’d say, just like particles and waves are opposite concepts) – they look pretty much the same in quantum physics: they are both represented by a wavefunction.

Let me immediately make a little note on terminology here. The term ‘wavefunction’ is a bit ambiguous, in my view, because it makes one think of a real wave, like a water wave, or an electromagnetic wave. Real waves are described by real-valued wave functions describing, for example, the motion of a ball on a spring, or the displacement of a gas (e.g. air) as a sound wave propagates through it, or – in the case of an electromagnetic wave – the strength of the electric and magnetic field.

You may have questions about the ‘reality’ of fields, but electromagnetic waves – i.e. the classical description of light – are quite ‘real’ too, even if:

  1. Light doesn’t travel in a medium (like water or air: there is no aether), and
  2. The magnitude of the electric and magnetic field (they are usually denoted by E and B) depend on your reference frame: if you calculate the fields using a moving coordinate system, you will get a different mixture of E and B. Therefore, E and B may not feel very ‘real’ when you look at them separately, but they are very real when we think of them as representing one physical phenomenon: the electromagnetic interaction between particles. So the E and B mix is, indeed, a dual representation of one reality. I won’t dwell on that, as I’ve done that in another post of mine.

How ‘real’ is the quantum-mechanical wavefunction?

The quantum-mechanical wavefunction is not like any of these real waves. In fact, I’d rather use the term ‘probability wave’ but, apparently, that’s used only by bloggers like me :-) and so it’s not very scientific. That’s for a good reason, because it’s not quite accurate either: the wavefunction in quantum mechanics represents probability amplitudes, not probabilities. So we should, perhaps, be consistent and term it a ‘probability amplitude wave’ – but then that’s too cumbersome obviously, so the term ‘probability wave’ may be confusing, but it’s not so bad, I think.

Amplitudes and probabilities are related as follows:

  1. Probabilities are real numbers between 0 and 1: they represent the probability of something happening, e.g. a photon moves from point A to B, or a photon is absorbed (and emitted) by an electron (i.e. a ‘junction’ or ‘coupling’, as you know).
  2. Amplitudes are complex numbers, or ‘arrows’ as Feynman calls them: they have a length (or magnitude) and a direction.
  3. We get the probabilities by taking the (absolute) square of the amplitudes.

So photons aren’t waves, but they aren’t particles either. Likewise, electrons aren’t particles, but they aren’t waves either. They are neither. So what are they? We don’t have words to describe what they are. Some use the term ‘wavicle’ but that doesn’t answer the question, because who knows what a ‘wavicle’ is? So we don’t know what they are. But we do know how they behave. As Feynman puts it, when comparing the behavior of light and then of electrons in the double-slit experiment—struggling to find language to describe what’s going on: “There is one lucky break: electrons behave just like light.”

He says so because of that wave function: the mathematical formalism is the same, for photons and for electrons. Exactly the same? […] But that’s such a weird thing to say, isn’t it? We can’t help thinking of light as waves, and of electrons as particles. They can’t be the same. They’re different, aren’t they? They are.

Scales and senses

To some extent, the weirdness can be explained because the scale of our world is not atomic or sub-atomic. Therefore, we ‘see’ things differently. Let me say a few words about the instrument we use to look at the world: our eye.

Our eye is particular. The retina has two types of receptors: the so-called cones are used in bright light, and distinguish color, but when we are in a dark room, the so-called rods become sensitive, and it is believed that they actually can detect a single photon of light. However, neural filters only allow a signal to pass to the brain when at least five photons arrive within less than a tenth of a second. A tenth of a second is, roughly, the averaging time of our eye. So, as Feynman puts it: “If we were evolved a little further so we could see ten times more sensitively, we wouldn’t have this discussion—we would all have seen very dim light of one color as a series of intermittent little flashes of equal intensity.” In other words, the ‘particle-like’ character of light would have been obvious to us.

Let me make a few more remarks here, which you may or may not find useful. The sense of ‘color’ is not something ‘out there':  colors, like red or brown, are experiences in our eye and our brain. There are ‘pigments’ in the cones (cones are the receptors that work only if the intensity of the light is high enough) and these pigments absorb the light spectrum somewhat differently, as a result of which we ‘see’ color. Different animals see different things. For example, a bee can distinguish between white paper using zinc white versus lead white, because they reflect light differently in the ultraviolet spectrum, which the bee can see but we don’t. Bees can also tell the direction of the sun without seeing the sun itself, because they are sensitive to polarized light, and the scattered light of the sky (i.e. the blue sky as we see it) is polarized. The bee can also notice flicker up to 200 oscillations per second, while we see it only up to 20, because our averaging time is like a tenth of a second, which is short for us, but so the averaging time of the bee is much shorter. So we cannot see the quick leg movements and/or wing vibrations of bees, but the bee can!

Sometimes we can’t see any color. For example, we see the night sky in ‘black and white’ because the light intensity is very low, and so it’s our rods, not the cones, that process the signal, and so these rods can’t ‘see’ color. So those beautiful color pictures of nebulae are not artificial (although the pictures are often enhanced). It’s just that the camera that is used to take those pictures (film or, nowadays, digital) is much more sensitive than our eye. 

Regardless, color is a quality which we add to our experience of the outside world ourselves. What’s out there are electromagnetic waves with this or that wavelength (or, what amounts to the same, this or that frequency). So when critics of the exact sciences say so much is lost when looking at (visible) light as an electromagnetic wave in the range of 430 to 790 teraherz, they’re wrong. Those critics will say that physics reduces reality. That is not the case.

What’s going on is that our senses process the signal that they are receiving, especially when it comes to vision. As Feynman puts it: “None of the other senses involves such a large amount of calculation, so to speak, before the signal gets into a nerve that one can make measurements on. The calculations for all the rest of the senses usually happen in the brain itself, where it is very difficult to get at specific places to make measurements, because there are so many interconnections. Here, with the visual sense, we have the light, three layers of cells making calculations, and the results of the calculations being transmitted through the optic nerve.”

Hence, things like color and all of the other sensations that we have are the object of study of other sciences, including biochemistry and neurobiology, or physiology. For all we know, what’s ‘out there’ is, effectively, just ‘boring’ stuff, like electromagnetic radiation, energy and ‘elementary particles’—whatever they are. No colors. Just frequencies. :-)

Light versus matter

If we accept the crazy ideas of quantum mechanics, then the what and the how become one and the same. Hence we can say that photons and electrons are a wavefunction somewhere in space. Photons, of course, are always traveling, because they have energy but no rest mass. Hence, all their energy is in the movement: it’s kinetic, not potential. Electrons, on the other hand, usually stick around some nucleus. And, let’s not forget, they have an electric charge, so their energy is not only kinetic but also potential.

But, otherwise, it’s the same type of ‘thing’ in quantum mechanics: a wavefunction, like those below.


Why diagram A and B? It’s just to emphasize the difference between a real-valued wave function and those ‘probability waves’ we’re looking at here (diagram C to H). A and B represent a mass on a spring, oscillating at more or less the same frequency but a different amplitude. The amplitude here means the displacement of the mass. The function describing the displacement of a mass on a spring (so that’s diagram A and B) is an example of a real-valued wave function: it’s a simple sine or cosine function, as depicted below. [Note that a sine and a cosine are the same function really, except for a phase difference of 90°.]

cos and sine

Let’s now go back to our ‘probability waves’. Photons and electrons, light and matter… The same wavefunction? Really? How can the sunlight that warms us up in the morning and makes trees grow be the same as our body, or the tree? The light-matter duality that we experience must be rooted in very different realities, isn’t it?

Well… Yes and no. If we’re looking at one photon or one electron only, it’s the same type of wavefunction indeed. The same type… OK, you’ll say. So they are the same family or genus perhaps, as they say in biology. Indeed, both of them are, obviously, being referred to as ‘elementary particles’ in the so-called Standard Model of physics. But so what makes an electron and a photon specific as a species? What are the differences?

There’re  quite a few, obviously:

1. First, as mentioned above, a photon is a traveling wave function and, because it has no rest mass, it travels at the ultimate speed, i.e. the speed of light (c). An electron usually sticks around or, if it travels through a wire, it travels at very low speeds. Indeed, you may find it hard to believe, but the drift velocity of the free electrons in a standard copper wire is measured in cm per hour, so that’s very slow indeed—and while the electrons in an electron microscope beam may be accelerated up to 70% of the speed of light, and close to in those huge accelerators, you’re not likely to find an electron microscope or accelerator in Nature. In fact, you may want to remember that a simple thing like electricity going through copper wires in our houses is a relatively modern invention. :-)

So, yes, those oscillating wave functions in those diagrams above are likely to represent some electron, rather than a photon. To be precise, the wave functions above are examples of standing (or stationary) waves, while a photon is a traveling wave: just extend that sine and cosine function in both directions if you’d want to visualize it or, even better, think of a sine and cosine function in an envelope traveling through space, such as the one depicted below.

Photon wave

Indeed, while the wave function of our photon is traveling through space, it is likely to be limited in space because, when everything is said and done, our photon is not everywhere: it must be somewhere. 

At this point, it’s good to pause and think about what is traveling through space. It’s the oscillation. But what’s the oscillation? There is no medium here, and even if there would be some medium (like water or air or something like aether—which, let me remind you, isn’t there!), the medium itself would not be moving, or – I should be precise here – it would only move up and down as the wave propagates through space, as illustrated below. To be fully complete, I should add we also have longitudinal waves, like sound waves (pressure waves): in that case, the particles oscillate back and forth along the direction of wave propagation. But you get the point: the medium does not travel with the wave.


When talking electromagnetic waves, we have no medium. These E and B vectors oscillate but is very wrong to assume they use ‘some core of nearby space’, as Feynman puts it. They don’t. Those field vectors represent a condition at one specific point (admittedly, a point along the direction of travel) in space but, for all we know, an electromagnetic wave travels in a straight line and, hence, we can’t talk about its diameter or so.

Still, as mentioned above, we can imagine, more or less, what E and B stand for (we can use field line to visualize them, for instance), even if we have to take into account their relativity (calculating their values from a moving reference frame results in different mixtures of E and B). But what are those amplitudes? How should we visualize them?

The honest answer is: we can’t. They are what they are: two mathematical quantities which, taken together, form a two-dimensional vector, which we square to find a value for a real-life probability, which is something that – unlike the amplitude concept – does make sense to us. Still, that representation of a photon above (i.e. the traveling envelope with a sine and cosine inside) may help us to ‘understand’ it somehow. Again, you absolute have to get rid of the idea that these ‘oscillations’ would somehow occupy some physical space. They don’t. The wave itself has some definite length, for sure, but that’s a measurement in the direction of travel, which is often denoted as x when discussing uncertainty in its position, for example—as in the famous Uncertainty Principle (ΔxΔp > h).

You’ll say: Oh!—but then, at the very least, we can talk about the ‘length’ of a photon, can’t we? So then a photon is one-dimensional at least, not zero-dimensional! The answer is yes and no. I’ve talked about this before and so I’ll be short(er) on it now. A photon is emitted by an atom when an electron jumps from one energy level to another. It thereby emits a wave train that lasts about 10–8 seconds. That’s not very long but, taking into account the rather spectacular speed of light (3×10m/s), that still makes for a wave train with a length of not less than 3 meter. […] That’s quite a length, you’ll say. You’re right. But you forget that light travels at the speed of light and, hence, we will see this length as zero because of the relativistic length contraction effect. So… Well… Let me get back to the question: if photons and electrons are both represented by a wavefunction, what makes them different?

2. A more fundamental difference between photons and electrons is how they interact with each other.

From what I’ve written above, you understand that probability amplitudes are complex numbers, or ‘arrows’, or ‘two-dimensional vectors’. [Note that all of these terms have precise mathematical definitions and so they’re actually not the same, but the difference is too subtle to matter here.] Now, there are two ways of combining amplitudes, which are referred to as ‘positive’ and ‘negative’ interference respectively. I should immediately note that there’s actually nothing ‘positive’ or ‘negative’ about the interaction: we’re just putting two arrows together, and there are two ways to do that. That’s all.

The diagrams below show you these two ways. You’ll say: there are four! However, remember that we square an arrow to get a probability. Hence, the direction of the final arrow doesn’t matter when we’re taking the square: we get the same probability. It’s the direction of the individual amplitudes that matters when combining them. So the square of A+B is the same as the square of –(A+B) = –A+(–B) = –AB. Likewise, the square of AB is the same as the square of –(AB) = –A+B.

vector addition

These are the only two logical possibilities for combining arrows. I’ve written ad nauseam about this elsewhere: see my post on amplitudes and statistics, and so I won’t go into too much detail here. Or, in case you’d want something less than a full mathematical treatment, I can refer you to my previous post also, where I talked about the ‘stopwatch’ and the ‘phase': the convention for the stopwatch is to have its hand turn clockwise (obviously!) while, in quantum physics, the phase of a wave function will turn counterclockwise. But so that’s just convention and it doesn’t matter, because it’s the phase difference between two amplitudes that counts. To use plain language: it’s the difference in the angles of the arrows, and so that difference is just the same if we reverse the direction of both arrows (which is equivalent to putting a minus sign in front of the final arrow).

OK. Let me get back to the lesson. The point is: this logical or mathematical dichotomy distinguishes bosons (i.e. force-carrying ‘particles’, like photons, which carry the electromagnetic force) from fermions (i.e. ‘matter-particles’, such as electrons and quarks, which make up protons and neutrons). Indeed, the so-called ‘positive’ and ‘negative’ interference leads to two very different behaviors:

  1. The probability of getting a boson where there are already present, is n+1 times stronger than it would be if there were none before.
  2. In contrast, the probability of getting two electrons into exactly the same state is zero. 

The behavior of photons makes lasers possible: we can pile zillions of photon on top of each other, and then release all of them in one powerful burst. [The ‘flickering’ of a laser beam is due to the quick succession of such light bursts. If you want to know how it works in detail, check my post on lasers.]

The behavior of electrons is referred to as Fermi’s exclusion principle: it is only because real-life electrons can have one of two spin polarizations (i.e. two opposite directions of angular momentum, which are referred to as ‘up’ or ‘down’, but they might as well have been referred to as ‘left’ or ‘right’) that we find two electrons (instead of just one) in any atomic or molecular orbital.

So, yes, while both photons and electrons can be described by a similar-looking wave function, their behavior is fundamentally different indeed. How is that possible? Adding and subtracting ‘arrows’ is a very similar operation, isn’it?

It is and it isn’t. From a mathematical point of view, I’d say: yes. From a physics point of view, it’s obviously not very ‘similar’, as it does lead to these two very different behaviors: the behavior of photons allows for laser shows, while the behavior of electrons explain (almost) all the peculiarities of the material world, including us walking into doors. :-) If you want to check it out for yourself, just check Feynman’s Lectures for more details on this or, else, re-read my posts on it indeed.

3. Of course, there are even more differences between photons and electrons than the two key differences I mentioned above. Indeed, I’ve simplified a lot when I wrote what I wrote above. The wavefunctions of electrons in orbit around a nucleus can take very weird shapes, as shown in the illustration below—and please do google a few others if you’re not convinced. As mentioned above, they’re so-called standing waves, because they occupy a well-defined position in space only, but standing waves can look very weird. In contrast, traveling plane waves, or envelope curves like the one above, are much simpler.


In short: yes, the mathematical representation of photons and electrons (i.e. the wavefunction) is very similar, but photons and electrons are very different animals indeed.

Potentiality and interconnectedness

I guess that, by now, you agree that quantum theory is weird but, as you know, quantum theory does explain all of the stuff that couldn’t be explained before: “It works like a charm”, as Feynman puts it. In fact, he’s often quoted as having said the following:

“It is often stated that of all the theories proposed in this century, the silliest is quantum theory. Some say the the only thing that quantum theory has going for it, in fact, is that it is unquestionably correct.”

Silly? Crazy? Uncommon-sensy? Truth be told, you do get used to thinking in terms of amplitudes after a while. And, when you get used to them, those ‘complex’ numbers are no longer complicated. :-) Most importantly, when one thinks long and hard enough about it (as I am trying to do), it somehow all starts making sense.

For example, we’ve done away with dualism by adopting a unified mathematical framework, but the distinction between bosons and fermions still stands: an ‘elementary particle’ is either this or that. There are no ‘split personalities’ here. So the dualism just pops up at a different level of description, I’d say. In fact, I’d go one step further and say it pops up at a deeper level of understanding.

But what about the other assumptions in quantum mechanics. Some of them don’t make sense, do they? Well… I struggle for quite a while with the assumption that, in quantum mechanics, anything is possible really. For example, a photon (or an electron) can take any path in space, and it can travel at any speed (including speeds that are lower or higher than light). The probability may be extremely low, but it’s possible.

Now that is a very weird assumption. Why? Well… Think about it. If you enjoy watching soccer, you’ll agree that flying objects (I am talking about the soccer ball here) can have amazing trajectories. Spin, lift, drag, whatever—the result is a weird trajectory, like the one below:


But, frankly, a photon taking the ‘southern’ route in the illustration below? What are the ‘wheels and gears’ there? There’s nothing sensible about that route, is there?


In fact, there’s at least three issues here:

  1. First, you should note that strange curved paths in the real world (such as the trajectories of billiard or soccer balls) are possible only because there’s friction involved—between the felt of the pool table cloth and the ball, or between the balls, or, in the case of soccer, between the ball and the air. There’s no friction in the vacuum. Hence, in empty space, all things should go in a straight line only.
  2. While it’s quite amazing what’s possible, in the real world that is, in terms of ‘weird trajectories’, even the weirdest trajectories of a billiard or soccer ball can be described by a ‘nice’ mathematical function. We obviously can’t say the same of that ‘southern route’ which a photon could follow, in theory that is. Indeed, you’ll agree the function describing that trajectory cannot be ‘nice’. So even we’d allow all kinds of ‘weird’ trajectories, shouldn’t we limit ourselves to ‘nice’ trajectories only? I mean: it doesn’t make sense to allow the photons traveling from your computer screen to your retina take some trajectory to the Sun and back, does it?
  3. Finally, and most fundamentally perhaps, even when we would assume that there’s some mechanism combining (a) internal ‘wheels and gears’ (such as spin or angular momentum) with (b) felt or air or whatever medium to push against, what would be the mechanism determining the choice of the photon in regard to these various paths? In Feynman’s words: How does the photon ‘make up its mind’?

Feynman answers these questions, fully or partially (I’ll let you judge), when discussing the double-slit experiment with photons:

“Saying that a photon goes this or that way is false. I still catch myself saying, “Well, it goes either this way or that way,” but when I say that, I have to keep in mind that I mean in the sense of adding amplitudes: the photon has an amplitude to go one way, and an amplitude to go the other way. If the amplitudes oppose each other, the light won’t get there—even though both holes are open.”

It’s probably worth re-calling the results of that experiment here—if only to help you judge whether or not Feynman fully answer those questions above!

The set-up is shown below. We have a source S, two slits (A and B), and a detector D. The source sends photons out, one by one. In addition, we have two special detectors near the slits, which may or may not detect a photon, depending on whether or not they’re switched on as well as on their accuracy.

set-up photons

First, we close one of the slits, and we find that 1% of the photons goes through the other (so that’s one photon for every 100 photons that leave S). Now, we open both slits to study interference. You know the results already:

  1. If we switch the detectors off (so we have no way of knowing where the photon went), we get interference. The interference pattern depends on the distance between A and B and varies from 0% to 4%, as shown in diagram (a) below. That’s pretty standard. As you know, classical theory can explain that too assuming light is an electromagnetic wave. But so we have blobs of energy – photons – traveling one by one. So it’s really that double-slit experiment with electrons, or whatever other microscopic particles (as you know, they’ve done these interference electrons with large molecules as well—and they get the same result!). We get the interference pattern by using those quantum-mechanical rules to calculate probabilities: we first add the amplitudes, and it’s only when we’re finished adding those amplitudes, that we square the resulting arrow to the final probability.
  2. If we switch those special detectors on, and if they are 100% reliable (i.e. all photons going through are being detected), then our photon suddenly behaves like a particle, instead of as a wave: they will go through one of the slits only, i.e. either through A, or, alternatively, through B. So the two special detectors never go off together. Hence, as Feynman puts it: we shouldn’t think there is “sneaky way that the photon divides in two and then comes back together again.” It’s one or the other way and, and there’s no interference: the detector at D goes off 2% of the time, which is the simple sum of the probabilities for A and B (i.e. 1% + 1%).
  3. When the special detectors near A and B are not 100% reliable (and, hence, do not detect all photons going through), we have three possible final conditions: (i) A and D go off, (ii) B and D go off, and (iii) D goes off alone (none of the special detectors went off). In that case, we have a final curve that’s a mixture, as shown in diagram (c) and (d) below. We get it using the same quantum-mechanical rules: we add amplitudes first, and then we square to get the probabilities.

double-slit photons - results

Now, I think you’ll agree with me that Feynman doesn’t answer my (our) question in regard to the ‘weird paths’. In fact, all of the diagrams he uses assume straight or nearby paths. Let me re-insert two of those diagrams below, to show you what I mean.

 Many arrowsFew arrows

So where are all the strange non-linear paths here? Let me, in order to make sure you get what I am saying here, insert that illustration with the three crazy routes once again. What we’ve got above (Figure 33 and 34) is not like that. Not at all: we’ve got only straight lines there! Why? The answer to that question is easy: the crazy paths don’t matter because their amplitudes cancel each other out, and so that allows Feynman to simplify the whole situation and show all the relevant paths as straight lines only.


Now, I struggled with that for quite a while. Not because I can’t see the math or the geometry involved. No. Feynman does a great job showing why those amplitudes cancel each other out indeed (if you want a summary, see my previous post once again).  My ‘problem’ is something else. It’s hard to phrase it, but let me try: why would we even allow for the logical or mathematical possibility of ‘weird paths’ (and let me again insert that stupid diagram below) if our ‘set of rules’ ensures that the truly ‘weird’ paths (like that photon traveling from your computer screen to your eye doing a detour taking it to the Sun and back) cancel each other out anyway? Does that respect Occam’s Razor? Can’t we devise some theory including ‘sensible’ paths only?

Of course, I am just an autodidact with limited time, and I know hundreds (if not thousands) of the best scientists have thought long and hard about this question and, hence, I readily accept the answer is quite simply: no. There is no better theory. I accept that answer, ungrudgingly, not only because I think I am not so smart as those scientists but also because, as I pointed out above, one can’t explain any path that deviates from a straight line really, as there is no medium, so there are no ‘wheels and gears’. The only path that makes sense is the straight line, and that’s only because…

Well… Thinking about it… We think the straight path makes sense because we have no good theory for any of the other paths. Hmm… So, from a logical point of view, assuming that the straight line is the only reasonable path is actually pretty random too. When push comes to shove, we have no good theory for the straight line either!

You’ll say I’ve just gone crazy. […] Well… Perhaps you’re right. :-) But… Somehow, it starts to make sense to me. We allow for everything to, then, indeed weed out the crazy paths using our interference theory, and so we do end up with what we’re ending up with: some kind of vague idea of “light not really traveling in a straight line but ‘smelling’ all of the neighboring paths around it and, hence, using a small core of nearby space“—as Feynman puts it.

Hmm… It brings me back to Richard Feynman’s introduction to his wonderful little book, in which he says we should just be happy to know how Nature works and not aspire to know why it works that way. In fact, he’s basically saying that, when it comes to quantum mechanics, the ‘how’ and the ‘why’ are one and the same, so asking ‘why’ doesn’t make sense, because we know ‘how’. He compares quantum theory with the system of calculation used by the Maya priests, which was based on a system of bars and dots, which helped them to do complex multiplications and divisions, for example. He writes the following about it: “The rules were tricky, but they were a much more efficient way of getting an answer to complicated questions (such as when Venus would rise again) than by counting beans.”

When I first read this, I thought the comparison was flawed: if a common Maya Indian did not want to use the ‘tricky’ rules of multiplication and what have you (or, more likely, if he didn’t understand them), he or she could still resort to counting beans. But how do we count beans in quantum mechanics? We have no ‘simpler’ rules than those weird rules about adding amplitudes and taking the (absolute) square of complex numbers so… Well… We actually are counting beans here then:

  1. We allow for any possibility—any path: straight, curved or crooked. Anything is possible.
  2. But all those possibilities are inter-connected. Also note that every path has a mirror image: for every route ‘south’, there is a similar route ‘north’, so to say, except for the straight line, which is a mirror image of itself.
  3. And then we have some clock ticking. Time goes by. It ensures that the paths that are too far removed from the straight line cancel each other. [Of course, you’ll ask: what is too far? But I answered that question –  convincingly, I hope – in my previous post: it’s not about the ‘number of arrows’ (as suggested in the caption under that Figure 34 above), but about the frequency and, hence, the ‘wavelength’ of our photon.]
  4. And so… Finally, what’s left is a limited number of possibilities that interfere with each other, which results in what we ‘see': light seems to use a small core of space indeed–a limited number of nearby paths.

You’ll say… Well… That still doesn’t ‘explain’ why the interference pattern disappears with those special detectors or – what amounts to the same – why the special detectors at the slits never click simultaneously.

You’re right. How do we make sense of that? I don’t know. You should try to imagine what happens for yourself. Everyone has his or her own way of ‘conceptualizing’ stuff, I’d say, and you may well be content and just accept all of the above without trying to ‘imagine’ what’s happening really when a ‘photon’ goes through one or both of those slits. In fact, that’s the most sensible thing to do. You should not try to imagine what happens and just follow the crazy calculus rules.

However, when I think about it, I do have some image in my head. The image is of one of those ‘touch-me-not’ weeds. I quickly googled one of these images, but I couldn’t quite find what I am looking for: it would be more like something that, when you touch it, curls up in a little ball. Any case… You know what I mean, I hope.


You’ll shake your head now and solemnly confirm that I’ve gone mad. Touch-me-not weeds? What’s that got to do with photons? 

Well… It’s obvious you and I cannot really imagine how a photon looks like. But I think of it as a blob of energy indeed, which is inseparable, and which effectively occupies some space (in three dimensions that is). I also think that, whatever it is, it actually does travel through both slits, because, as it interferes with itself, the interference pattern does depend on the space between the two slits as well as the width of those slits. In short, the whole ‘geometry’ of the situation matters, and so the ‘interaction’ is some kind of ‘spatial’ thing. [Sorry for my awfully imprecise language here.]

Having said that, I think it’s being detected by one detector only because only one of them can sort of ‘hook’ it, somehow. Indeed, because it’s interconnected and inseparable, it’s the whole blob that gets hooked, not just one part of it. [You may or may not imagine that the detectors that’s got the best hold of it gets it, but I think that’s pushing the description too much.] In any case, the point is that a photon is surely not like a lizard dropping its tail while trying to escape. Perhaps it’s some kind of unbreakable ‘string’ indeed – and sorry for summarizing string theory so unscientifically here – but then a string oscillating in dimensions we can’t imagine (or in some dimension we can’t observe, like the Kaluza-Klein theory suggests). It’s something, for sure, and something that stores energy in some kind of oscillation, I think.

What it is, exactly, we can’t imagine, and we’ll probably never find out—unless we accept that the how of quantum mechanics is not only the why, but also the what. :-)

Does this make sense? Probably not but, if anything, I hope it fired your imagination at least. :-)

The Strange Theory of Light and Matter (I)

I am of the opinion that Richard Feynman’s wonderful little common-sense introduction to the ‘uncommon-sensy‘ theory of quantum electrodynamics (The Strange Theory of Light and Matter), which were published a few years before his death only, should be mandatory reading for high school students.

I actually mean that: it should just be part of the general education of the first 21st century generation. Either that or, else, the Education Board should include a full-fledged introduction to complex analysis and quantum physics in the curriculum. :-)

Having praised it (just now, as well as in previous posts), I re-read it recently during a trek in Nepal with my kids – I just grabbed the smallest book I could find the morning we left :-) – and, frankly, I now think Ralph Leighton, who transcribed and edited these four short lectures, could have cross-referenced it better. Moreover, there are two or three points where Feynman (or Leighton?) may have sacrificed accuracy for readability. Let me recapitulate the key points and try to improve here and there.

Amplitudes and arrows

The booklet avoids scary mathematical terms and formulas but doesn’t avoid the fundamental concepts behind, and it doesn’t avoid the kind of ‘deep’ analysis one needs to get some kind of ‘feel’ for quantum mechanics either. So what are the simplifications?

A probability amplitude (i.e. a complex number) is, quite simply, an arrow, with a direction and a length. Thus Feynman writes: “Arrows representing probabilities from 0% to 16% [as measured by the surface of the square which has the arrow as its side] have lengths from 0 to 0.4.” That makes sense: such geometrical approach does away, for example, with the need to talk about the absolute square (i.e. the square of the absolute value, or the squared norm) of a complex number – which is what we need to calculate probabilities from probability amplitudes. So, yes, it’s a wonderful metaphor. We have arrows and surfaces now, instead of wave functions and absolute squares of complex numbers.

The way he combines these arrows make sense too. He even notes the difference between photons (bosons) and electrons (fermions): for bosons, we just add arrows; for fermions, we need to subtract them (see my post on amplitudes and statistics in this regard).

There is also the metaphor for the phase of a wave function, which is a stroke of genius really (I mean it): the direction of the ‘arrow’ is determined by a stopwatch hand, which starts turning when a photon leaves the light source, and stops when it arrives, as shown below.

front and back reflection amplitude

OK. Enough praise. What are the drawbacks?

The illustration above accompanies an analysis of how light is either reflected from the front surface of a sheet of a glass or, else, from the back surface. Because it takes more time to bounce off the back surface (the path is associated with a greater distance), the front and back reflection arrows point in different directions indeed (the stopwatch is stopped somewhat later when the photon reflects from the back surface). Hence, the difference in phase (but that’s a term that Feynman also avoids) is determined by the thickness of the glass. Just look at it. In the upper part of the illustration above, the thickness is such that the chance of a photon reflecting off the front or back surface is 5%: we add two arrows, each with a length of 0.2, and then we square the resulting (aka final) arrow. Bingo! We get a surface measuring 0.05, or 5%.

Huh? Yes. Just look at it: if the angle between the two arrows would be 90° exactly, it would be 0.08 or 8%, but the angle is a bit less. In the lower part of the illustration, the thickness of the glass is such that the two arrows ‘line up’ and, hence, they form an arrow that’s twice the length of either arrow alone (0.2 + 0.2 = 0.4), with a square four times as large (0.16 = 16%). So… It all works like a charm, as Feynman puts it.


But… Hey! Look at the stopwatch for the front reflection arrows in the upper and lower diagram: they point in the opposite direction of the stopwatch hand! Well… Hmm… You’re right. At this point, Feynman just notes that we need an extra rule: “When we are considering the path of a photon bouncing off the front surface of the glass, we reverse the direction of the arrow.

He doesn’t say why. He just adds this random rule to the other rules – which most readers who read this book already know. But why this new rule? Frankly, this inconsistency – or lack of clarity – would wake me up at night. This is Feynman: there must be a reason. Why?

Initially, I suspected it had something to do with the two types of ‘statistics’ in quantum mechanics (i.e. those different rules for combining amplitudes of bosons and fermions respectively, which I mentioned above). But… No. Photons are bosons anyway, so we surely need to add, not subtract. So what is it?

[…] Feynman explains it later, much later – in the third of the four chapters of this little book, to be precise. It’s, quite simply, the result of the simplified model he uses in that first chapter. The photon can do anything really, and so there are many more arrows than just two. We actually should look at an infinite number of arrows, representing all possible paths in spacetime, and, hence, the two arrows (i.e. the one for the reflection from the front and back surface respectively) are combinations of many other arrows themselves. So how does that work?

An analysis of partial reflection (I)

The analysis in Chapter 3 of the same phenomenon (i.e. partial reflection by glass) is a simplified analysis too, but it’s much better – because there are no ‘random’ rules here. It is what Leighton promises to the reader in his introduction: “A complete description, accurate in every detail, of a framework onto which more advanced concepts can be attached without modification. Nothing has to be ‘unlearned’ later.

Well… Accurate in every detail? Perhaps not. But it’s good, and I still warmly recommend a reading of this delightful little book to anyone who’d ask me what to read as a non-mathematical introduction to quantum mechanics. I’ll limit myself here to just some annotations.

The first drawing (a) depicts the situation:

  1. A photon from a light source is being reflected by the glass. Note that it may also go straight through, but that’s a possibility we’ll analyze separately. We first assume that the photon is effectively being reflected by the glass, and so we want to calculate the probability of that event using all these ‘arrows’, i.e. the underlying probability amplitudes.
  2. As for the geometry of the situation: while the light source and the detector seem to be positioned at some angle from the normal, that is not the case: the photon travels straight down (and up again when reflected). It’s just a limitation of the drawing. It doesn’t really matter much for the analysis: we could look at a light beam coming in at some angle, but so we’re not doing that. It’s the simplest situation possible, in terms of experimental set-up that is. I just want to be clear on that.

partial reflection

Now, rather than looking at the front and back surface only (as Feynman does in Chapter 1), the glass sheet is now divided into a number of very thin sections: five, in this case, so we have six points from which the photon can be scattered into the detector at A: X1 to X6. So that makes six possible paths. That’s quite a simplification but it’s easy to see it doesn’t matter: adding more sections would result in many more arrows, but these arrows would also be much smaller, and so the final arrow would be the same.

The more significant simplification is that the paths are all straight paths, and that the photon is assumed to travel at the speed of light, always. If you haven’t read the booklet, you’ll say that’s obvious, but it’s not: a photon has an amplitude to go faster or slower than c but, as Feynman points out, these amplitudes cancel out over longer distances. Likewise, a photon can follow any path in space really, including terribly crooked paths, but these paths also cancel out. As Feynman puts it: “Only the paths near the straight-line path have arrows pointing in nearly the same direction, because their timings are nearly the same, and only these arrows are important, because it is from them that we accumulate a large final arrow.” That makes perfect sense, so there’s no problem with the analysis here either.

So let’s have a look at those six arrows in illustration (b). They point in a slightly different direction because the paths are slightly different and, hence, the distances (and, therefore, the timings) are different too. Now, Feynman (but I think it’s Leighton really) loses himself here in a digression on monochromatic light sources. A photon is a photon: it will have some wave function with a phase that varies in time and in space and, hence, illustration (b) makes perfect sense. [I won’t quote what he writes on a ‘monochromatic light source’ because it’s quite confusing and, IMHO, not correct.]

The stopwatch metaphor has only one minor shortcoming: the hand of a stopwatch rotates clockwise (obviously!), while the phase of an actual wave function goes counterclockwise with time. That’s just convention, and I’ll come back to it when I discuss the mathematical representation of the so-called wave function, which gives you these amplitudes. However, it doesn’t change the analysis, because it’s the difference in the phase that matters when combining amplitudes, so the clock can turn in either way indeed, as long as we’re agreed on it.

At this point, I can’t resist: I’ll just throw the math in. If you don’t like it, you can just skip the section that follows.

Feynman’s arrows and the wave function

The mathematical representation of Feynman’s ‘arrows’ is the wave function:

f = f(x–ct)

Is that the wave function? Yes. It is: it’s a function whose argument is x – ct, with x the position in space, and t the time variable. As for c, that’s the speed of light. We throw it in to make the units in which we measure time and position compatible. 

Really? Yes: f is just a regular wave function. To make it look somewhat more impressive, I could use the Greek symbol Φ (phi) or Ψ (psi) for it, but it’s just what it is: a function whose value depends on position and time indeed, so we write f = f(x–ct). Let me explain the minus sign and the c in the argument.

Time and space are interchangeable in the argument, provided we measure time in the ‘right’ units, and so that’s why we multiply the time in seconds with c, so the new unit of time becomes the time that light needs to travel a distance of one meter. That also explains the minus sign in front of ct: if we add one distance unit (i.e. one meter) to the argument, we have to subtract one time unit from it – the new time unit of course, so that’s the time that light needs to travel one meter – in order to get the same value for f. [If you don’t get that x–ct thing, just think a while about this, or make some drawing of a wave function. Also note that the spacetime diagram in illustration (b) above assumes the same: time is measured in an equivalent unit as distance, so the 45% line from the south-west to the north-east, that bounces back to the north-west, represents a photon traveling at speed c in space indeed: one unit of time corresponds to one meter of travel.]

Now I want to be a bit more aggressive. I said is a simple function. That’s true and not true at the same time. It’s a simple function, but it gives you probability amplitudes, which are complex numbers – and you may think that complex numbers are, perhaps, not so simple. However, you shouldn’t be put off. Complex numbers are really like Feynman’s ‘arrows’ and, hence, fairly simple things indeed. They have two dimensions, so to say: an a- and a b-coordinate. [I’d say an x- and y-coordinate, because that’s what you usually see, but then I used the x symbol already for the position variable in the argument of the function, so you have to switch to a and b for a while now.]

This a- and b-coordinate are referred to as the real and imaginary part of a complex number respectively. The terms ‘real’ and ‘imaginary’ are confusing because both parts are ‘real’ – well… As real as numbers can be, I’d say. :-) They’re just two different directions in space: the real axis is the a-axis in coordinate space, and the imaginary axis is the b-axis. So we could write it as an ordered pair of numbers (a, b). However, we usually write it as a number itself, and we distinguish the b-coordinate from the a-coordinate by writing an i in front: (a, b) = a + ib. So our function f = f(x–ct) is a complex-valued function: it will give you two numbers (an a and a b) instead of just one when you ‘feed’ it with specific values for x and t. So we write:

f = f(x–ct) = (a, b) = a + ib

So what’s the shape of this function? Is it linear or irregular or what? We’re talking a very regular wave function here, so it’s shape is ‘regular’ indeed. It’s a periodic function, so it repeats itself again and again. The animations below give you some idea of such ‘regular’ wave functions. Animation A and B shows a real-valued ‘wave': a ball on a string that goes up and down, for ever and ever. Animations C to H are – believe it or not – basically the same thing, but so we have two numbers going up and down. That’s all.


The wave functions above are, obviously, confined in space, and so the horizontal axis represents the position in space. What we see, then, is how the real and imaginary part of these wave functions varies as time goes by. [Think of the blue graph as the real part, and the imaginary part as the pinkish thing – or the other way around. It doesn’t matter.] Now, our wave function – i.e. the one that Feynman uses to calculate all those probabilities – is even more regular than those shown above: its real part is an ordinary cosine function, and it’s imaginary part is a sine. Let me write this in math:

f = f(x–ct) = a + ib = r(cosφ + isinφ)

It’s really the most regular wave function in the world: the very simple illustration below shows how the two components of f vary as a function in space (i.e. the horizontal axis) while we keep the time fixed, or vice versa: it could also show how the function varies in time at one particular point in space, in which case the horizontal axis would represent the time variable. It is what it is: a sine and a cosine function, with the angle φ as its argument.

cos and sine

Note that a sine function is the same as a cosine function, but it just lags a bit. To be precise, the phase difference is 90°, or π/2 in radians (the radian (i.e. the length of the arc on the unit circle) is a much more natural unit to express angles, as it’s fully compatible with our distance unit and, hence, most – if not all – of our other units). Indeed, you may or may not remember the following trigonometric identities: sinφ = cos(π/2–φ) = cos(φ–π/2).

In any case, now we have some r and φ here, instead of a and b. You probably wonder where I am going with all of this. Where are the x and t variables? Be patient! You’re right. We’ll get there. I have to explain that r and φ first. Together, they are the so-called polar coordinates of Feynman’s ‘arrow’ (i.e. the amplitude). Polar coordinates are just as good as coordinates as these Cartesian coordinates we’re used to (i.e. a and b). It’s just a different coordinate system. The illustration below shows how they are related to each other. If you remember anything from your high school trigonometry course, you’ll immediately agree that a is, obviously, equal to rcosφ, and b is rsinφ, which is what I wrote above. Just as good? Well… The polar coordinate system has some disadvantages (all of those expressions and rules we learned in vector analysis assume rectangular coordinates, and so we should watch out!) but, for our purpose here, polar coordinates are actually easier to work with, so they’re better.


Feynman’s wave function is extremely simple because his ‘arrows’ have a fixed length, just like the stopwatch hand. They’re just turning around and around and around as time goes by. In other words, is constant and does not depend on position and time. It’s the angle φ that’s turning and turning and turning as the stopwatch ticks while our photon is covering larger and larger distances. Hence, we need to find a formula for φ that makes it explicit how φ changes as a function in spacetime. That φ variable is referred to as the phase of the wave function. That’s a term you’ll encounter frequently and so I had better mention it. In fact, it’s generally used as a synonym for any angle, as you can see from my remark on the phase difference between a sine and cosine function.

So how do we express φ as a function of x and t? That’s where Euler’s formula comes in. Feynman calls it the most remarkable formula in mathematics – our jewel! And he’s probably right: of all the theorems and formulas, I guess this is the one we can’t do without when studying physics. I’ve written about this in another post, and repeating what I wrote there would eat up too much space, so I won’t do it and just give you that formula. A regular complex-valued wave function can be represented as a complex (natural) exponential function, i.e. an exponential function with Euler’s number e (i.e. 2.728…) as the base, and the complex number iφ as the (variable) exponent. Indeed, according to Euler’s formula, we can write:

f = f(x–ct) = a + ib = r(cosφ + isinφ) = r·eiφ

As I haven’t explained Euler’s formula (you should really have a look at my posts on it), you should just believe me when I say that r·eiφ is an ‘arrow’ indeed, with length r and angle φ (phi), as illustrated above, with a and b coordinates arcosφ and b = rsinφ. What you should be able to do now, is to imagine how that φ angle goes round and round as time goes by, just like Feynman’s ‘arrow’ goes round and round – just like a stopwatch hand indeed, but note our φ angle turns counterclockwise indeed.

Fine, you’ll say – but so we need a mathematical expression, don’t we? Yes,we do. We need to know how that φ angle (i.e. the variable in our r·eiφ function) changes as a function of x and t indeed. It turns out that the φ in r·eiφ can be substituted as follows:

eiφ = r·ei(ωt–kx) = r·eik(x–ct)

Huh? Yes. The phase (φ) of the probability amplitude (i.e. the ‘arrow’) is a simple linear function of x and t indeed: φ = ωt–kx = –k(x–ct). What about all these new symbols, k and ω? The ω and k in this equation are the so-called angular frequency and the wave number of the wave. The angular frequency is just the frequency expressed in radians, and you should think of the wave number as the frequency in space. [I could write some more here, but I can’t make it too long, and you can easily look up stuff like this on the Web.] Now, the propagation speed c of the wave is, quite simply, the ratio of these two numbers: c = ω/k. [Again, it’s easy to show how that works, but I won’t do it here.]

Now you know it all, and so it’s time to get back to the lesson.

An analysis of partial reflection (II)

Why did I digress? Well… I think that what I write above makes much more sense than Leighton’s rather convoluted description of a monochromatic light source as he tries to explain those arrows in diagram (b) above. Whatever it is, a monochromatic light source is surely not “a device that has been carefully arranged so that the amplitude for a photon to be emitted at a certain time can be easily calculated.” That’s plain nonsense. Monochromatic light is light of a specific color, so all photons have the same frequency (or, to be precise, their wave functions have all the same well-defined frequency), but these photons are not in phase. Photons are emitted by atoms, as an electron moves from one energy level to the other. Now, when a photon is emitted, what actually happens is that the atom radiates a train of waves only for about 10–8 sec, so that’s about 10 billionths of a second. After 10–8 sec, some other atom takes over, and then another atom, and so on. Each atom emits one photon, whose energy is the difference between the two energy levels that the electron is jumping to or from. So the phase of the light that is being emitted can really only stay the same for about 10–8 sec. Full stop.

Now, what I write above on how atoms actually emit photons is a paraphrase of Feynman’s own words in his much more serious series of Lectures on Mechanics, Radiation and Heat. Therefore, I am pretty sure it’s Leighton who gets somewhat lost when trying to explain what’s happening. It’s not photons that interfere. It’s the probability amplitudes associated with the various paths that a photon can take. To be fully precise, we’re talking the photon here, i.e. the one that ends up in the detector, and so what’s going on is that the photon is interfering with itself. Indeed, that’s exactly what the ‘craziness’ of quantum mechanics is all about: we sent electrons, one by one, through two slits, and we observe an interference pattern. Likewise, we got one photon here, that can go various ways, and it’s those amplitudes that interfere, so… Yes: the photon interferes with itself.

OK. Let’s get back to the lesson and look at diagram (c) now, in which the six arrows are added. As mentioned above, it would not make any difference if we’d divide the glass in 10 or 20 or 1000 or a zillion ‘very thin’ sections: there would be many more arrows, but they would be much smaller ones, and they would cover the same circular segment: its two endpoints would define the same arc, and the same chord on the circle that we can draw when extending that circular segment. Indeed, the six little arrows define a circle, and that’s the key to understanding what happens in the first chapter of Feynman’s QED, where he adds two arrows only, but with a reversal of the direction of the ‘front reflection’ arrow. Here there’s no confusion – Feynman (or Leighton) eloquently describe what they do:

“There is a mathematical trick we can use to get the same answer [i.e. the same final arrow]: Connecting the arrows in order from 1 to 6, we get something like an arc, or part of a circle. The final arrow forms the chord of this arc. If we draw arrows from the center of the ‘circle’ to the tail of arrow 1 and to the head of arrow 6, we get two radii. If the radius arrow from the center to arrow 1 is turned 180° (“subtracted”), then it can be combined with the other radius arrow to give us the same final arrow! That’s what I was doing in the first lecture: these two radii are the two arrows I said represented the ‘front surface’ and ‘back surface’ reflections. They each have the famous length of 0.2.”

That’s what’s shown in part (d) of the illustration above and, in case you’re still wondering what’s going on, the illustration below should help you to make your own drawings now.

CircularsegmentSo… That explains the phenomenon Feynman wanted to explain, which is a phenomenon that cannot be explained in classical physics. Let me copy the original here:


Partial reflection by glass—a phenomenon that cannot be explained in classical physics? Really?

You’re right to raise an objection: partial reflection by glass can, in fact, be explained by the classical theory of light as an electromagnetic wave. The assumption then is that light is effectively being reflected by both the front and back surface and the reflected waves combine or cancel out (depending on the thickness of the glass and the angle of reflection indeed) to match the observed pattern. In fact, that’s how the phenomenon was explained for hundreds of years! The point to note is that the wave theory of light collapsed as technology advanced, and experiments could be made with very weak light hitting photomultipliers. As Feynman writes: “As the light got dimmer and dimmer, the photomultipliers kept making full-sized clicks—there were just fewer of them. Light behaved as particles!”

The point is that a photon behaves like an electron when going through two slits: it interferes with itself! As Feynman notes, we do not have any ‘common-sense’ theory to explain what’s going on here. We only have quantum mechanics, and quantum mechanics is an “uncommon-sensy” theory: a “strange” or even “absurd” theory, that looks “cockeyed” and incorporates “crazy ideas”. But… It works.

Now that we’re here, I might just as well add a few more paragraphs to fully summarize this lovely publication – if only because summarizing stuff like this helps me to come to terms with understanding things better myself!

Calculating amplitudes: the basic actions

So it all boils down to calculating amplitudes: an event is divided into alternative ways of how the event can happen, and the arrows for each way are ‘added’. Now, every way an event can happen can be further subdivided into successive steps. The amplitudes for these steps are then ‘multiplied’. For example, the amplitude for a photon to go from A to C via B is the ‘product’ of the amplitude to go from A to B and the amplitude to go from B to C.

I marked the terms ‘multiplied’ and ‘product’ with apostrophes, as if to say it’s not a ‘real’ product. But it is an actual multiplication: it’s the product of two complex numbers. Feynman does not explicitly compare this product to other products, such as the dot (•) or cross (×) product of two vectors, but he uses the ∗ symbol for multiplication here, which clearly distinguishes VW from VW or V×W indeed or, more simply, from the product of two ordinary numbers. [Ordinary numbers? Well… With ‘ordinary’ numbers, I mean real numbers, of course, but once you get used to complex numbers, you won’t like that term anymore, because complex numbers start feeling just as ‘real’ as other numbers – especially when you get used to the idea of those complex-valued wave functions underneath reality.]

Now, multiplying complex numbers, or ‘arrows’ using QED’s simpler language, consists of adding their angles and multiplying their lengths. That being said, the arrows here all have a length smaller than one (because their square cannot be larger than one, because that square is a probability, i.e. a (real) number between 0 and 1), Feynman defines successive multiplication as successive ‘shrinks and turns’ of the unit arrow. That all makes sense – very much sense.

But what’s the basic action? As Feynman puts the question: “How far can we push this process of splitting events into simpler and simpler subevents? What are the smallest possible bits and pieces? Is there a limit?” He immediately answers his own question. There are three ‘basic actions':

  1. A photon goes from one point (in spacetime) to another: this amplitude is denoted by P(A to B).
  2. An electron goes from one point to another: E(A to B).
  3. An electron emits and/or absorbs a photon: this is referred to as a ‘junction’ or a ‘coupling’, and the amplitude for this is denoted by the symbol j, i.e. the so-called junction number.

How do we find the amplitudes for these?

The amplitudes for (1) and (2) are given by a so-called propagator functions, which give you the probability amplitude for a particle to travel from one place to another in a given time indeed, or to travel with a certain energy and momentum. Judging from the Wikipedia article on these functions, the subject-matter is horrendously complicated, and the formulas are too, even if Feynman says it’s ‘very simple’ – for a photon, that is. The key point to note is that any path is possible. Moreover, there are also amplitudes for photons to go faster or slower than the speed of light (c)! However, these amplitudes make smaller contributions, and cancel out over longer distances. The same goes for the crooked paths: the amplitudes cancel each other out as well.

What remains are the ‘nearby paths’. In my previous post (check the section on electromagnetic radiation), I noted that, according to classical wave theory, a light wave does not occupy any physical space: we have electric and magnetic field vectors that oscillate in a direction that’s perpendicular to the direction of propagation, but these do not take up any space. In quantum mechanics, the situation is quite different. As Feynman puts it: “When you try to squeeze light too much [by forcing it to go through a small hole, for example, as illustrated below], it refuses to cooperate and begins to spread out.” He explains this in the text below the second drawing: “There are not enough arrows representing the paths to Q to cancel each other out.”

Many arrowsFew arrows

Not enough arrows? We can subdivide space in as many paths as we want, can’t we? Do probability amplitudes take up space? And now that we’re asking the tougher questions, what’s a ‘small’ hole? What’s ‘small’ and what’s ‘large’ in this funny business?

Unfortunately, there’s not much of an attempt in the booklet to try to answer these questions. One can begin to formulate some kind of answer when doing some more thinking about these wave functions. To be precise, we need to start looking at their wavelength. The frequency of a typical photon (and, hence, of the wave function representing that photon) is astronomically high. For visible light, it’s in the range of 430 to 790 teraherz, i.e. 430–790×1012 Hz. We can’t imagine such incredible numbers. Because the frequency is so high, the wavelength is unimaginably small. There’s a very simple and straightforward relation between wavelength (λ) and frequency (ν) indeed: c = λν. In words: the speed of a wave is the wavelength (i.e. the distance (in space) of one cycle) times the frequency (i.e. the number of cycles per second). So visible light has a wavelength in the range of 390 to 700 nanometer, i.e. 390–700 billionths of a meter. A meter is a rather large unit, you’ll say, so let me express it differently: it’s less than one thousandth of a micrometer, and a micrometer itself is one thousandth of a millimeter. So, no, we can’t imagine that distance either.

That being said, that wavelength is there, and it does imply that some kind of scale is involved. A wavelength covers one full cycle of the oscillation: it means that, if we travel one wavelength in space, our ‘arrow’ will point in the same direction again. Both drawings above (Figure 33 and 34) suggest the space between the two blocks is less than one wavelength. It’s a bit hard to make sense of the direction of the arrows but note the following:

  1. The phase difference between (a) the ‘arrow’ associated with the straight route (i.e. the ‘middle’ path) and (b) the ‘arrow’ associated with the ‘northern’ or ‘southern’ route (i.e. the ‘highest’ and ‘lowest’ path) in Figure 33 is like quarter of a full turn, i.e. 90°. [Note that the arrows for the northern and southern route to P point in the same direction, because they are associated with the same timing. The same is true for the two arrows in-between the northern/southern route and the middle path.]
  2. In Figure 34, the phase difference between the longer routes and the straight route is much less, like 10° only.

Now, the calculations involved in these analyses are quite complicated but you can see the explanation makes sense: the gap between the two blocks is much narrower in Figure 34 and, hence, the geometry of the situation does imply that the phase difference between the amplitudes associated with the ‘northern’ and ‘southern’ routes to Q is much smaller than the phase difference between those amplitudes in Figure 33. To be precise,

  1. The phase difference between (a) the ‘arrow’ associated with the ‘northern route’ to Q and (b) the ‘arrow’ associated with the ‘southern’ route to Q (i.e. the ‘highest’ and ‘lowest’ path) in Figure 33 is like three quarters of a full turn, i.e. 270°. Hence, the final arrow is very short indeed, which means that the probability of the photon going to Q is very low indeed. [Note that the arrows for the northern and southern route no longer point in the same direction, because they are associated with very different timings: the ‘southern route’ is shorter and, hence, faster.]
  2. In Figure 34, we have a phase difference between the shortest and longest route that is like 60° only and, hence, the final arrow is very sizable and, hence, the probability of the photon going to Q is, accordingly, quite substantial.

OK… What did I say here about P(A to B)? Nothing much. I basically complained about the way Feynman (or Leighton, more probably) explained the interference or diffraction phenomenon and tried to do a better job before tacking the subject indeed: how do we get that P(A to B)?

A photon can follow any path from A to B, including the craziest ones (as shown below). Any path? Good players give a billiard ball extra spin that may make the ball move in a curved trajectory, and will also affect its its collision with any other ball – but a trajectory like the one below? Why would a photon suddenly take a sharp turn left, or right, or up, or down? What’s the mechanism here? What are the ‘wheels and gears inside’ of the photon that (a) make a photon choose this path in the first place and (b) allow it to whirl, swirl and twirl like that?


We don’t know. In fact, the question may make no sense, because we don’t know what actually happens when a photon travels through space. We know it leaves as a lump of energy, and we know it arrives as a similar lump of energy. When we actually put a detector to check which path is followed – by putting special detectors at the slits in the famous double-slit experiment, for example – the interference pattern disappears. So… Well… We don’t know how to describe what’s going on: a photon is not a billiard ball, and it’s not a classical electromagnetic wave either. It is neither. The only thing that we know is that we get probabilities that match with the results of experiment if we accept this nonsensical assumptions and do all of the crazy arithmetic involved. Let me get back to the lesson.  

Photons can also travel faster or slower than the speed of light (c is some 3×108 meter per second but, in our special time unit, it’s equal to one). Does that violate relativity? It doesn’t, apparently, but for the reasoning behind I must, once again, refer you to more sophisticated writing.

In any case, if the mathematicians and physicists have to take into account both of these assumptions (any path is possible, and speeds higher or lower than c are possible too!), they must be looking at some kind of horrendous integral, don’t they?

They are. When everything is said and done, that propagator function is some monstrous integral indeed, and I can’t explain it to you in a couple of words – if only because I am struggling with it myself. :-) So I will just believe Feynman when he says that, when the mathematicians and physicists are finished with that integral, we do get some simple formula which depends on the value of the so-called spacetime interval between two ‘points’ – let’s just call them 1 and 2 – in space and time. You’ve surely heard about it before: it’s denoted by sor I (or whatever) and it’s zero if an object moves at the speed of light, which is what light is supposed to do – but so we’re dealing with a different situation here. :-) To be precise, I consists of two parts:

  1. The distance d between the two points (1 and 2), i.e. Δr, which is just the square root of d= Δr= (x2–x2)2+(y2–y1)2+(z2–z1)2. [This formula is just a three-dimensional version of the Pythagorean Theorem.]
  2. The ‘distance’ (or difference) in time, which is usually expressed in those ‘equivalent’ time units that we introduced above already, i.e. the time that light – traveling at the speed of light :-) – needs to travel one meter. We will usually see that component of I in a squared version too: Δt= (t2–t1)2, or, if time is expressed in the ‘old’ unit (i.e. seconds), then we write c2Δt2 = c2(t2–t1)2.

Now, the spacetime interval itself is defined as the excess of the squared distance (in space) over the squared time difference:

s= I = Δr– Δt= (x2–x2)2+(y2–y1)2+(z2–z1)– (t2–t1)2

You know we can then define time-like, space-like and light-like intervals, and these, in turn, define the so-called light cone. The spacetime interval can be negative, for example. In that case, Δt2 will be greater than Δr2, so there is no ‘excess’ of distance over time: it means that the time difference is large enough to allow for a cause–effect relation between the two events, and the interval is said to be time-like. In any case, that’s not the topic of this post, and I am sorry I keep digressing.

The point to note is that the formula for the propagator favors light-like intervals: they are associated with large arrows. Space- and time-like intervals, on the other hand, will contribute much smaller arrows. In addition, the arrows for space- and time-like intervals point in opposite directions, so they will cancel each other out. So, when everything is said and done, over longer distances, light does tend to travel in a straight line and at the speed of light. At least, that’s what Feynman tells us, and I tend to believe him. :-)

But so where’s the formula? Feynman doesn’t give it, probably because it would indeed confuse us. Just google ‘propagator for a photon’ and you’ll see what I mean. He does integrate the above conclusions in that illustration (b) though. What illustration? 

Oh… Sorry. You probably forgot what I am trying to do here, but so we’re looking at that analysis of partial reflection of light by glass. Let me insert it once again so you don’t have to scroll all the way up.

partial reflection

You’ll remember that Feynman divided the glass sheet into five sections and, hence, there are six points from which the photon can be scattered into the detector at A: X1 to X6. So that makes six possible paths: these paths are all straight (so Feynman makes abstraction of all of the crooked paths indeed), and the other assumption is that the photon effectively traveled at the speed of light, whatever path it took (so Feynman also assumes the amplitudes for speeds higher or lower than c cancel each other out). So that explains the difference in time at emission from the light source. The longest path is the path to point X6 and then back up to the detector. If the photon would have taken that path, it would have to be emitted earlier in time – earlier as compared to the other possibilities, which take less time. So it would have to be emitted at T = T6. The direction of the ‘arrow’ is like one o’clock. The shorter paths are associated with shorter times (the difference between the time of arrival and departure is shorter) and so T5 is associated with an arrow in the 12 o’clock direction, T5 is 11 o’clock, and so on, till T5, which points at the 9 o’clock direction.

But… What? These arrows also include the reflection, i.e. the interaction between the photon and some electron in the glass, don’t they? […] Right you are. Sorry. So… Yes. The action above involves four ‘basic actions':

  1. A photon is emitted by the source at a time T = T1, T2, T3, T4, T5 or T6: we don’t know. Quantum-mechanical uncertainty. :-)
  2. It goes from the source to one of the points X = X1, X2, X3, X4, X5 or Xin the glass: we don’t know which one, because we don’t have a detector there.
  3. The photon interacts with an electron at that point.
  4. It makes it way back up to the detector at A.

Step 1 does not have any amplitude. It’s just the start of the event. Well… We start with the unit arrow pointing north actually, so its length is one and its direction is 12 o’clock. And so we’ll shrink and turn it, i.e. multiply it with other arrows, in the next steps.

Steps 2 and 4 are straightforward and are associated with arrows of the same length. Their direction depends on the distance traveled and/or the time of emission: it amounts to the same because we assume the speed is constant and exactly the same for the six possibilities (that speed is c = 1 obviously). But what length? Well… Some length according to that formula which Feynman didn’t give us. :-)

So now we need to analyze the third of those three basic actions: a ‘junction’ or ‘coupling’ between an electron and a photon. At this point, Feynman embarks on a delightful story highlighting the difficulties involved in calculating that amplitude. A photon can travel following crooked paths and at devious speeds, but an electron is even worse: it can take what Feynman refers to as ‘one-hop flights’, ‘two-hop flights’, ‘three-hop flights’,… any ‘n-hop flight’ really. Each stop involves an additional amplitude, which is represented by n2, with n some number that has been determined from experiment. The formula for E(A to B) then becomes a series of terms: P(A to B) + (P(A to C)∗n2∗(P(C to B) + (P(A to D)∗n2∗P(D to E)∗n2∗P(E to C)+…

P(A to B) is the ‘one-hop flight’ here, while C, D and E are intermediate points, and (P(A to C)∗n2∗(P(C to B) and (P(A to D)∗n2∗P(D to E)∗n2∗P(E to C) are the ‘two-hop’ and ‘three-hop’ flight respectively. Note that this calculation has to be made for all possible intermediate points C, D, E and so on. To make matters worse, the theory assumes that electrons can emit and absorb photons along the way, and then there’s a host of other problems, which Feynman tries to explain in the last and final chapter of his little book. […]

Hey! Stop it!


You’re talking about E(A to B) here. You’re supposed to be talking about that junction number j.

Oh… Sorry. You’re right. Well… That junction number j is about –0.1. I know that looks like an ordinary number, but it’s an amplitude, so you should interpret it as an arrow. When you multiply it with another arrow, it amounts to a shrink to one-tenth, and half a turn. Feynman entertains us also on the difficulties of calculating this number but, you’re right, I shouldn’t be trying to copy him here – if only because it’s about time I finish this post. :-)

So let me conclude it indeed. We can apply the same transformation (i.e. we multiply with j) to each of the six arrows we’ve got so far, and the result is those six arrows next to the time axis in illustration (b). And then we combine them to get that arc, and then we apply that mathematical trick to show we get the same result as in a classical wave-theoretical analysis of partial reflection.

Done. […] Are you happy now?

[…] You shouldn’t be. There are so many questions that have been left unanswered. For starters, Feynman never gives that formula for the length of P(A to B), so we have no clue about the length of these arrows and, hence, about that arc. If physicists know their length, it seems to have been calculated backwards – from those 0.2 arrows used in the classical wave theory of light. Feynman is actually quite honest about that, and simply writes:

“The radius of the arc [i.e. the arc that determines the final arrow] evidently depends on the length of the arrow for each section, which is ultimately determined by the amplitude S that an electron in an atom of glass scatters a photon. This radius can be calculated using the formulas for the three basic actions. […] It must be said, however, that no direct calculation from first principles for a substance as complex as glass has actually been done. In such cases, the radius is determined by experiment. For glass, it has been determined from experiment that the radius is approximately 0.2 (when the light shines directly onto the glass at right angles).”

Well… OK. I think that says enough. So we have a theory – or first principles at last – but we don’t them to calculate. That actually sounds a bit like metaphysics to me. :-) In any case… Well… Bye for now!

But… Hey! You said you’d analyze how light goes straight through the glass as well?

Yes. I did. But I don’t feel like doing that right now. I think we’ve got enough stuff to think about right now, don’t we? :-)

Applied vector analysis (II)

We’ve covered a lot of ground in the previous post, but we’re not quite there yet. We need to look at a few more things in order to gain some kind of ‘physical’ understanding’ of Maxwell’s equations, as opposed to a merely ‘mathematical’ understanding only. That will probably disappoint you. In fact, you probably wonder why one needs to know about Gauss’ and Stokes’ Theorems if the only objective is to ‘understand’ Maxwell’s equations.

To some extent, your skepticism is justified. It’s already quite something to get some feel for those two new operators we’ve introduced in the previous post, i.e. the divergence (div) and curl operators, denoted by ∇• and × respectively. By now, you understand that these two operators act on a vector field, such as the electric field vector E, or the magnetic field vector B, or, in the example we used, the heat flow h, so we should write •(a vector) and ×(a vector. And, as for that del operator – i.e.  without the dot (•) or the cross (×) – if there’s one diagram you should be able to draw off the top of your head, it’s the one below, which shows:

  1. The heat flow vector h, whose magnitude is the thermal energy that passes, per unit time and per unit area, through an infinitesimally small isothermal surface, so we write: h = |h| = ΔJ/ΔA.
  2. The gradient vector T, whose direction is opposite to that of h, and whose magnitude is proportional to h, so we can write the so-called differential equation of heat flow: h = –κT.
  3. The components of the vector dot product ΔT = T•ΔR = |T|·ΔR·cosθ.

Temperature drop

You should also remember that we can re-write that ΔT = T•ΔR = |T|·ΔR·cosθ equation – which we can also write as ΔT/ΔR = |T|·cosθ – in a more general form:

Δψ/ΔR = |ψ|·cosθ

That equation says that the component of the gradient vector ψ along a small displacement ΔR is equal to the rate of change of ψ in the direction of ΔRAnd then we had three important theorems, but I can imagine you don’t want to hear about them anymore. So what can we do without them? Let’s have a look at Maxwell’s equations again and explore some linkages.

Curl-free and divergence-free fields

From what I wrote in my previous post, you should remember that:

  1. The curl of a vector field (i.e. ×C) represents its circulation, i.e. its (infinitesimal) rotation.
  2. Its divergence (i.e. ∇•C) represents the outward flux out of an (infinitesimal) volume around the point we’re considering.

Back to Maxwell’s equations:

Maxwell's equations-2

Let’s start at the bottom, i.e. with equation (4). It says that a changing electric field (i.e. ∂E/∂t ≠ 0) and/or a (steady) electric current (j0) will cause some circulation of B, i.e. the magnetic field. It’s important to note that (a) the electric field has to change and/or (b) that electric charges (positive or negative) have to move  in order to cause some circulation of B: a steady electric field will not result in any magnetic effects.

This brings us to the first and easiest of all the circumstances we can analyze: the static case. In that case, the time derivatives ∂E/∂t and ∂B/∂t are zero, and Maxwell’s equations reduce to:

  1. ∇•E = ρ/ε0. In this equation, we have ρ, which represents the so-called charge density, which describes the distribution of electric charges in space: ρ = ρ(x, y, z). To put it simply: ρ is the ‘amount of charge’ (which we’ll denote by Δq) per unit volume at a given point. Hence, if we  consider a small volume (ΔV) located at point (x, y, z) in space – an infinitesimally small volume, in fact (as usual) –then we can write: Δq =  ρ(x, y, z)ΔV. [As for ε0, you already know this is a constant which ensures all units are ‘compatible’.] This equation basically says we have some flux of E, the exact amount of which is determined by the charge density ρ or, more in general, by the charge distribution in space.  
  2. ×E = 0. That means that the curl of E is zero: everywhere, and always. So there’s no circulation of E. We call this a curl-free field.
  3. B = 0. That means that the divergence of B is zero: everywhere, and always. So there’s no flux of B. None. We call this a divergence-free field.
  4. c2∇×B = j0. So here we have steady current(s) causing some circulation of B, the exact amount of which is determined by the (total) current j. [What about that cfactor? Well… We talked about that before: magnetism is, basically, a relativistic effect, and so that’s where that factor comes from. I’ll just refer you to what Feynman writes about this in his Lectures, and warmly recommend to read it, because it’s really quite interesting: it gave me at least a much deeper understanding of what it’s all about, and so I hope it will help you as much.]

Now you’ll say: why bother with all these difficult mathematical constructs if we’re going to consider curl-free and divergence-free fields only. Well… B is not curl-free, and E is not divergence-free. To be precise:

  1. E is a field with zero curl and a given divergence, and
  2. B is a field with zero divergence and a given curl.

Yeah, but why can’t we analyze fields that have both curl and divergence? The answer is: we can, and we will, but we have to start somewhere, and so we start with an easier analysis first.

Electrostatics and magnetostatics

The first thing you should note is that, in the static case (i.e. when charges and currents are static), there is no interdependence between E and B. The two fields are not interconnected, so to say. Therefore, we can neatly separate them into two pairs:

  1. Electrostatics: (1) ∇•E = ρ/ε0 and (2) ×E = 0.
  2. Magnetostatics: (1) ∇×B = j/c2ε0 and (2) B = 0.

Now, I won’t go through all of the particularities involved. In fact, I’ll refer you to a real physics textbook on that (like Feynman’s Lectures indeed). My aim here is to use these equations to introduce some more math and to gain a better understanding of vector calculus – an understanding that goes, in fact, beyond the math (i.e. a ‘physical’ understanding, as Feynman terms it).

At this point, I have to introduce two additional theorems. They are nice and easy to understand (although not so easy to prove, and so I won’t):

Theorem 1: If we have a vector field – let’s denote it by C – and we find that its curl is zero everywhere, then C must be the gradient of something. In other words, there must be some scalar field ψ (psi) such that C is equal to the gradient of ψ. It’s easier to write this down as follows:

If ×= 0, there is a ψ such that C = ψ.

Theorem 2: If we have a vector field – let’s denote it by D, just to introduce yet another letter – and we find that its divergence is zero everywhere, then D must be the curl of some vector field A. So we can write:

If D = 0, there is an A such that D = ×A.

We can apply this to the situation at hand:

  1. For E, there is some scalar potential Φ such that E = –Φ. [Note that we could integrate the minus sign in Φ, but we leave it there as a reminder that the situation is similar to that of heat flow. It’s a matter of convention really: E ‘flows’ from higher to lower potential.]
  2. For B, there is a so-called vector potential A such that B = ×A.

The whole game is then to compute Φ and A everywhere. We can then take the gradient of Φ, and the curl of A, to find the electric and magnetic field respectively, at every single point in space. In fact, most of Feynman’s second Volume of his Lectures is devoted to that, so I’ll refer you that if you’d be interested. As said, my goal here is just to introduce the basics of vector calculus, so you gain a better understanding of physics, i.e. an understanding which goes beyond the math.


We’re almost done. Electrodynamics is, of course, much more complicated than the static case, but I don’t have the intention to go too much in detail here. The important thing is to see the linkages in Maxwell’s equations. I’ve highlighted them below:

Maxwell interaction

I know this looks messy, but it’s actually not so complicated. The interactions between the electric and magnetic field are governed by equation (2) and (4), so equation (1) and (3) is just ‘statics’. Something needs to trigger it all, of course. I assume it’s an electric current (that’s the arrow marked by [0]).

Indeed, equation (4), i.e. c2∇×B = ∂E/∂t + j0, implies that a changing electric current – an accelerating electric charge, for instance – will cause the circulation of B to change. More specifically, we can write: ∂[c2∇×B]/∂t = ∂[j0]∂t. However, as the circulation of B changes, the magnetic field B itself must be changing. Hence, we have a non-zero time derivative of B (∂B/∂t ≠ 0). But, then, according to equation (2), i.e. ∇×E = –∂B/∂t, we’ll have some circulation of E. That’s the dynamics marked by the red arrows [1].

Now, assuming that ∂B/∂t is not constant (because that electric charge accelerates and decelerates, for example), the time derivative ∂E/∂t will be non-zero too (∂E/∂t ≠ 0). But so that feeds back into equation (4), according to which a changing electric field will cause the circulation of B to change. That’s the dynamics marked by the yellow arrows [2].

The ‘feedback loop’ is closed now: I’ve just explained how an electromagnetic field (or radiation) actually propagates through space. Below you can see one of the fancier animations you can find on the Web. The blue oscillation is supposed to represent the oscillating magnetic vector, while the red oscillation is supposed to represent the electric field vector. Note how the effect travels through space.


This is, of course, an extremely simplified view. To be precise, it assumes that the light wave (that’s what an electromagnetic wave actually is) is linearly (aka as plane) polarized, as the electric (and magnetic field) oscillate on a straight line. If we choose the direction of propagation as the z-axis of our reference frame, the electric field vector will oscillate in the xy-plane. In other words, the electric field will have an x- and a y-component, which we’ll denote as Ex and Erespectively, as shown in the diagrams below, which give various examples of linear polarization.

linear polarizationLight is, of course, not necessarily plane-polarized. The animation below shows circular polarization, which is a special case of the more general elliptical polarization condition.


The relativity of magnetic and electric fields

Allow me to make a small digression here, which has more to do with physics than with vector analysis. You’ll have noticed that we didn’t talk about the magnetic field vector anymore when discussing the polarization of light. Indeed, when discussing electromagnetic radiation, most – if not all – textbooks start by noting we have E and B vectors, but then proceed to discuss the E vector only. Where’s the magnetic field? We need to note two things here.

1. First, I need to remind you of the force on any electrically charged particle (and note we only have electric charge: there’s no such thing as a magnetic charge according to Maxwell’s third equation) consists of two components. Indeed, the total electromagnetic force (aka Lorentz force) on a charge q is:

F = q(E + v×B) = qE + q(v×B) = FE + FM

The velocity vector v is the velocity of the charge: if the charge is not moving, then there’s no magnetic force. The illustration below shows you the components of the vector cross product that, by now, you’re fully familiar with. Indeed, in my previous post, I gave you the expressions for the x, y and z coordinate of a cross product, but there’s a geometrical definition as well:

v×B = |v||B|sin(θ)n

magnetic force507px-Right_hand_rule_cross_product

The magnetic force FM is q(v×B) = qv×B q|v||B|sin(θ)n. The unit vector n determines the direction of the force, which is determined by that right-hand rule that, by now, you also are fully familiar with: it’s perpendicular to both v and B (cf. the two 90° angles in the illustration). Just to make sure, I’ve also added the right-hand rule illustration above: check it out, as it does involve a bit of arm-twisting in this case. :-)

In any case, the point to note here is that there’s only one electromagnetic force on the particle. While we distinguish between an E and a B vector, the E and B vector depend on our reference frame. Huh? Yes. The velocity v is relative: we specify the magnetic field in a so-called inertial frame of reference here. If we’d be moving with the charge, the magnetic force would, quite simply, disappear, because we’d have a v equal to zero, so we’d have v×B = 0×B= 0. Of course, all other charges (i.e. all ‘stationary’ and ‘moving’ charges that were causing the field in the first place) would have different velocities as well and, hence, our E and B vector would look very different too: they would come in a ‘different mixture’, as Feynman puts it. [If you’d want to know in what mixture exactly, I’ll refer you Feynman: it’s a rather lengthy analysis (five rather dense pages, in fact), but I can warmly recommend it: in fact, you should go through it if only to test your knowledge at this point, I think.]

You’ll say: So what? That doesn’t answer the question above. Why do physicists leave out the magnetic field vector in all those illustrations?

You’re right. I haven’t answered the question. This first remark is more like a warning. Let me quote Feynman on it:

“Since electric and magnetic fields appear in different mixtures if we change our frame of reference, we must be careful about how we look at the fields E and B. […] The fields are our way of describing what goes on at a point in space. In particular, E and B tell us about the forces that will act on a moving particle. The question “What is the force on a charge from a moving magnetic field?” doesn’t mean anything precise. The force is given by the values of E and B at the charge, and the F = q(E + v×B) formula is not to be altered if the source of E or B is moving: it is the values of E and B that will be altered by the motion. Our mathematical description deals only with the fields as a function of xy, z, and t with respect to some inertial frame.”

If you allow me, I’ll take this opportunity to insert another warning, one that’s quite specific to how we should interpret this concept of an electromagnetic wave. When we say that an electromagnetic wave ‘travels’ through space, we often tend to think of a wave traveling on a string: we’re smart enough to understand that what is traveling is not the string itself (or some part of the string) but the amplitude of the oscillation: it’s the vertical displacement (i.e. the movement that’s perpendicular to the direction of ‘travel’) that appears first at one place and then at the next and so on and so on. It’s in that sense, and in that sense only, that the wave ‘travels’. However, the problem with this comparison to a wave traveling on a string is that we tend to think that an electromagnetic wave also occupies some space in the directions that are perpendicular to the direction of travel (i.e. the x and y directions in those illustrations on polarization). Now that’s a huge misconception! The electromagnetic field is something physical, for sure, but the E and B vectors do not occupy any physical space in the x and y direction as they ‘travel’ along the z direction!

Let me conclude this digression with Feynman’s conclusion on all of this:

“If we choose another coordinate system, we find another mixture of E and B fields. However, electric and magnetic forces are part of one physical phenomenon—the electromagnetic interactions of particles. While the separation of this interaction into electric and magnetic parts depends very much on the reference frame chosen for the description, the complete electromagnetic description is invariant: electricity and magnetism taken together are consistent with Einstein’s relativity.”

2. You’ll say: I don’t give a damn about other reference frames. Answer the question. Why are magnetic fields left out of the analysis when discussing electromagnetic radiation?

The answer to that question is very mundane. When we know E (in one or the other reference frame), we also know B, and, while B is as ‘essential’ as E when analyzing how an electromagnetic wave propagates through space, the truth is that the magnitude of B is only a very tiny fraction of that of E.

Huh? Yes. That animation with these oscillating blue and red vectors is very misleading in this regard. Let me be precise here and give you the formulas:

E vector of wave

B vector of a wave

I’ve analyzed these formulas in one of my other posts (see, for example, my first post on light and radiation), and so I won’t repeat myself too much here. However, let me recall the basics of it all. The eR′ vector is a unit vector pointing in the apparent direction of the charge. When I say ‘apparent’, I mean that this unit vector is not pointing towards the present position of the charge, but at where is was a little while ago, because this ‘signal’ can only travel from the charge to where we are now at the same speed of the wave, i.e. at the speed of light c. That’s why we prime the (radial) vector R also (so we write R′ instead of R). So that unit vector wiggles up and down and, as the formula makes clear, it’s the second-order derivative of that movement which determines the electric field. That second-order derivative is the acceleration vector, and it can be substituted for the vertical component of the acceleration of the charge that caused the radiation in the first place but, again, I’ll refer you my post on that, as it’s not the topic we want to cover here.

What we do want to look at here, is that formula for B: it’s the cross product of that eR′ vector (the minus sign just reverses the direction of the whole thing) and E divided by c. We also know that the E and eR′ vectors are at right angles to each, so the sine factor (sinθ) is 1 (or –1) too. In other words, the magnitude of B is |E|/c =  E/c, which is a very tiny fraction of E indeed (remember: c ≈ 3×108).

So… Yes, for all practical purposes, B doesn’t matter all that much when analyzing electromagnetic radiation, and so that’s why physicists will note it but then proceed and look at E only when discussing radiation. Poor BThat being said, the magnetic force may be tiny, but it’s quite interesting. Just look at its direction! Huh? Why? What’s so interesting about it?  I am not talking the direction of B here: I am talking the direction of the force. Oh… OK… Hmm… Well…

Let me spell it out. Take the force formula: F = q(E + v×B) = qE + q(v×B). When our electromagnetic wave hits something real (I mean anything real, like a wall, or some molecule of gas), it is likely to hit some electron, i.e. an actual electric charge. Hence, the electric and magnetic field should have some impact on it. Now, as we pointed here, the magnitude of the electric force will be the most important one – by far – and, hence, it’s the electric field that will ‘drive’ that charge and, in the process, give it some velocity v, as shown below. In what direction? Don’t ask stupid questions: look at the equation. FE = qE, so the electric force will have the same direction as E.

radiation pressure

But we’ve got a moving charge now and, therefore, the magnetic force comes into play as well! That force is FM  = q(v×B) and its direction is given by the right-hand rule: it’s the F above in the direction of the light beam itself. Admittedly, it’s a tiny force, as its magnitude is F = qvE/c only, but it’s there, and it’s what causes the so-called radiation pressure (or light pressure tout court). So, yes, you can start dreaming of fancy solar sailing ships (the illustration below shows one out of of Star Trek) but… Well… Good luck with it! The force is very tiny indeed and, of course, don’t forget there’s light coming from all directions in space!

solar sail

Jokes aside, it’s a real and interesting effect indeed, but I won’t say much more about it. Just note that we are really talking the momentum of light here, and it’s a ‘real’ as any momentum. In an interesting analysis, Feynman calculates this momentum and, rather unsurprisingly (but please do check out how he calculates these things, as it’s quite interesting), the same 1/c factor comes into play once: the momentum (p) that’s being delivered when light hits something real is equal to 1/c of the energy that’s being absorbed. So, if we denote the energy by W (in order to not create confusion with the E symbol we’ve used already), we can write: p = W/c.

Now I can’t resist one more digression. We’re, obviously, fully in classical physics here and, hence, we shouldn’t mention anything quantum-mechanical here. That being said, you already know that, in quantum physics, we’ll look at light as a stream of photons, i.e. ‘light particles’ that also have energy and momentum. The formula for the energy of a photon is given by the Planck relation: E = hf. The h factor is Planck’s constant here – also quite tiny, as you know – and f is the light frequency of course. Oh – and I am switching back to the symbol E to denote energy, as it’s clear from the context I am no longer talking about the electric field here.

Now, you may or may not remember that relativity theory yields the following relations between momentum and energy:  

E2 – p2c2 = m0cand/or pc = Ev/c

In this equations, mstands, obviously, for the rest mass of the particle, i.e. its mass at v = 0. Now, photons have zero rest mass, but their speed is c. Hence, both equations reduce to p = E/c, so that’s the same as what Feynman found out above: p = W/c.

Of course, you’ll say: that’s obvious. Well… No, it’s not obvious at all. We do find the same formula for the momentum of light (p) – which is great, of course –  but so we find the same thing coming from very different necks parts of the woods. The formula for the (relativistic) momentum and energy of particles comes from a very classical analysis of particles – ‘real-life’ objects with mass, a very definite position in space and whatever other properties you’d associate with billiard balls – while that other p = W/c formula comes out of a very long and tedious analysis of light as an electromagnetic wave. The two analytical frameworks couldn’t differ much more, could they? Yet, we come to the same conclusion indeed.

Physics is wonderful. :-)

So what’s left?

Lots, of course! For starters, it would be nice to show how these formulas for E and B with eR′ in them can be derived from Maxwell’s equations. There’s no obvious relation, is there? You’re right. Yet, they do come out of the very same equations. However, for the details, I have to refer you to Feynman’s Lectures once again – to the second Volume to be precise. Indeed, besides calculating scalar and vector potentials in various situations, a lot of what he writes there is about how to calculate these wave equations from Maxwell’s equations. But so that’s not the topic of this post really. It’s, quite simply, impossible to ‘summarize’ all those arguments and derivations in a single post. The objective here was to give you some idea of what vector analysis really is in physics, and I hope you got the gist of it, because that’s what needed to proceed. :-)

The other thing I left out is much more relevant to vector calculus. It’s about that del operator () again: you should note that it can be used in many more combinations. More in particular, it can be used in combinations involving second-order derivatives. Indeed, till now, we’ve limited ourselves to first-order derivatives only. I’ll spare you the details and just copy a table with some key results:

  1. •(T) = div(grad T) = T = ()T = ∇2T = ∂2T/∂x+ ∂2T/∂y+ ∂2T/∂z= a scalar field
  2. ()h = ∇2= a vector field
  3. (h) = grad(div h) = a vector field
  4. ×(×h) = curl(curl h) =(h) – ∇2h
  5. ∇•(×h) = div(curl h) = 0 (always)
  6. ×(T) = curl(grad T) = 0 (always)

So we have yet another set of operators here: not less than six, to be precise. You may think that we can have some more, like (×), for example. But… No. A (×) operator doesn’t make sense. Just write it out and think about it. Perhaps you’ll see why. You can try to invent some more but, if you manage, you’ll see they won’t make sense either. The combinations that do make sense are listed above, all of them.

Now, while of these combinations make (some) sense, it’s obvious that some of these combinations are more useful than others. More in particular, the first operator, ∇2, appears very often in physics and, hence, has a special name: it’s the Laplacian. As you can see, it’s the divergence of the gradient of a function.

Note that the Laplace operator (∇2) can be applied to both scalar as well as vector functions. If we operate with it on a vector, we’ll apply it to each component of the vector function. The Wikipedia article on the Laplace operator shows how and where it’s used in physics, and so I’ll refer to that if you’d want to know more. Below, I’ll just write out the operator itself, as well as how we apply it to a vector:



So that covers (1) and (2) above. What about the other ‘operators’?

Let me start at the bottom. Equations (5) and (6) are just what they are: two results that you can use in some mathematical argument or derivation. Equation (4) is… Well… Similar: it’s an identity that may or may not help one when doing some derivation.


What about (3), i.e. the gradient of the divergence of some vector function? Nothing special. As Feynman puts it: “It is a possible vector field, but there is nothing special to say about it. It’s just some vector field which may occasionally come up.”

So… That should conclude my little introduction to vector analysis, and so I’ll call it a day now. :-) I hope you enjoyed it.

Applied vector analysis (I)

The relationship between math and physics is deep. When studying physics, one sometimes feels physics and math become one and the same. But they are not. In fact, eminent physicists such as Richard Feynman warn against emphasizing the math side of physics too much: “It is not because you understand the Maxwell equations mathematically inside out, that you understand physics inside out.”

We should never lose sight of the fact that all these equations and mathematical constructs represent physical realities. So the math is nothing but the ‘language’ in which we express physical reality and, as Feynman puts it, one (also) needs to develop a ‘physical’ – as opposed to a ‘mathematical’ – understanding of the equations. Now you’ll ask: what’s a ‘physical’ understanding? Well… Let me quote Feynman once again on that: “A physical understanding is a completely unmathematical, imprecise, and inexact thing, but absolutely necessary for a physicist.

It’s rather surprising to hear that from him: Feynman doesn’t like philosophy (see, for example, what he writes on the philosophical implications of the Uncertainty Principle), and he’s surely not the only physicist thinking philosophy is pretty useless. Indeed, while most physicists – or scientists in general, I’d say – will admit there is some value in a philosophy of science (that’s the branch of philosophy concerned with the foundations and methods of science), they will usually smile derisively when hearing someone talk about metaphysics. However, if metaphysics is the branch of philosophy that deals with ‘first principles’, then it’s obvious that the Standard Model (SM) in physics is, in fact, also some kind of ‘metaphysical’ model! Indeed, what everything is said and done, physicists assume those complex-valued wave functions are, somehow, ‘real’, but all they can ‘see’ (i.e. measure or verify by experiment) are (real-valued) probabilities: we can’t ‘see’ the probability amplitudes.

The only reason why we accept the SM theory is because its predictions agree so well with experiment. Very well indeed. The agreement between theory and experiment is most perfect in the so-called electromagnetic sector of the SM, but the results for the weak force (which I referred to as the ‘weird force’ in some of my posts) are very good too. For example, using CERN data, researchers could finally, recently, observe an extremely rare decay mode which, once again, confirms that the Standard Model, as complicated as it is, is the best we’ve got: just click on the link if you want to hear more about it. [And please do: stuff like this is quite readable and, hence, interesting.]

As this blog makes abundantly clear, it’s not easy to ‘summarize’ the Standard Model in a couple of sentences or in one simple diagram. In fact, I’d say that’s impossible. If there’s one or two diagrams sort of ‘covering’ it all, then it’s the two diagrams that you’ve seen ad nauseam already: (a) the overview of the three generations of matter, with the gauge bosons for the electromagnetic, strong and weak force respectively, as well as the Higgs boson, next to it, and (b) the overview of the various interactions between them. [And, yes, these two diagrams come from Wikipedia.]


I’ve said it before: the complexity of the Standard Model (it has not less than 61 ‘elementary’ particles taking into account that quarks and gluons come in various ‘colors’, and also including all antiparticles – which we have to include them in out count because they are just as ‘real’ as the particles), and the ‘weirdness’ of the weak force, plus a astonishing range of other ‘particularities’ (these ‘quantum numbers’ or ‘charges’ are really not easy to ‘understand’), do not make for a aesthetically pleasing theory but, let me repeat it again, it’s the best we’ve got. Hence, we may not ‘like’ it but, as Feynman puts it: “Whether we like or don’t like a theory is not the essential question. It is whether or not the theory gives predictions that agree with experiment.” (Feynman, QED – The Strange Theory of Light and Matter, p. 10)

It would be foolish to try to reduce the complexity of the Standard Model to a couple of sentences. That being said, when digging into the subject-matter of quantum mechanics over the past year, I actually got the feeling that, when everything is said and done, modern physics has quite a lot in common with Pythagoras’ ‘simple’ belief that mathematical concepts – and numbers in particular – might have greater ‘actuality’ than the reality they are supposed to describe. To put it crudely, the only ‘update’ to the Pythagorean model that’s needed is to replace Pythagoras’ numerological ideas by quantum-mechanical wave functions, describing probability amplitudes that are represented by complex numbers. Indeed, complex numbers are numbers too, and Pythagoras would have reveled in their beauty. In fact, I can’t help thinking that, if he could have imagined them, he would surely have created a ‘religion’ around Euler’s formula, rather than around the tetrad. :-)

In any case… Let’s leave the jokes and silly comparisons aside, as that’s not what I want to write about in this post (if you want to read more about this, I’ll refer you another blog of mine). In this post, I want to present the basics of vector calculus, an understanding of which is absolutely essential in order to gain both a mathematical as well as a ‘physical’ understanding of what fields really are. So that’s classical mechanics once again. However, as I found out, one can’t study quantum mechanics without going through the required prerequisites. So let’s go for it.

Vectors in math and physics

What’s a vector? It may surprise you, but the term ‘vector’, in physics and in math, refers to more than a dozen different concepts, and that’s a major source of confusion for people like us–autodidacts. The term ‘vector’ refers to many different things indeed. The most common definitions are:

  1. The term ‘vector’ often refers to a (one-dimensional) array of numbers. In that case, a vector is, quite simply, an element of Rn, while the array will be referred to as an n-tuple. This definition can be generalized to also include arrays of alphanumerical values, or blob files, or any type of object really, but that’s a definition that’s more relevant for other sciences – most notably computer science. In math and physics, we usually limit ourselves to arrays of numbers. However, you should note that a ‘number’ may also be a complex number, and so we have real as well as complex vector spaces. The most straightforward example of a complex vector space is the set of complex numbers itself: C. In that case, the n-tuple is a ‘1-tuple’, aka as a singleton, but the element in it (i.e. a complex number) will have ‘two dimensions’, so to speak. [Just like the term ‘vector’, the term ‘dimension’ has various definitions in math and physics too, and so it may be quite confusing.] However, we can also have 2-tuples, 3-tuples or, more in general, n-tuples of complex numbers. In that case, the vector space is denoted by Cn. I’ve written about vector spaces before and so I won’t say too much about this.
  2. A vector can also be a point vector. In that case, it represents the position of a point in physical space – in one, two or three dimensions – in relation to some arbitrarily chosen origin (i.e. the zero point). As such, we’ll usually write it as x (in one dimension) or, in three dimensions, as (x, y, z). More generally, a point vector is often denoted by the bold-face symbol R. This definition is obviously ‘related’ to the definition above, but it’s not the same: we’re talking physical space here indeed, not some ‘mathematical’ space. Physical space can be curved, as you obviously know when you’re reading this blog, and I also wrote about that in the above-mentioned post, so you can re-visit that topic too if you want. Here, I should just mention one point which may or may not confuse you: while (two-dimensional) point vectors and complex numbers have a lot in common, they are not the same, and it’s important to understand both the similarities as well as the differences between both. For example, multiplying two vectors and multiplying two complex numbers is definitely not the same. I’ll come back to this.
  3. A vector can also be a displacement vector: in that case, it will specify the change in position of a point relative to its previous position. Again, such displacement vectors may be one-, two-, or three-dimensional, depending on the space we’re envisaging, which may be one-dimensional (a simple line), two-dimensional (i.e. the plane), three-dimensional (i.e. three-dimensional space), or four-dimensional (i.e. space-time). A displacement vector is often denoted by s or ΔR, with the delta indicating we’re talking a a distance or a difference indeed: s = ΔR = R2 – R1 = (x2 – x1, y2 – y1, z2 – z1). That’s kids’ stuff, isn’t it?
  4. A vector may also refer to a so-called four-vector: a four-vector obeys very specific transformation rules, referred to as the Lorentz transformation. In this regard, you’ve surely heard of space-time vectors, referred to as events, and noted as X = (ct, r), with r the spatial vector r = (x, y, z) and c the speed of light (which, in this case, is nothing but a proportionality constant ensuring that space and time are measured in compatible units). So we can also write X as X = (ct, x, y, z). However, there is another four-vector which you’ve probably also seen already (see, for example, my post on (special) Relativity Theory): P = (E/c, p), which relates energy and momentum in spacetime. Of course, spacetime can also be curved. In fact, Einstein’s (general) Relativity Theory is about the curvature of spacetime, not of ordinary space. But I should not write more about this here, as it’s about time I get back to the main story line of this post.
  5. Finally, we also have vector operators, like the gradient vector . Now that is what I want to write about in this post. Vector operators are also considered to be ‘vectors’ – to some extent, at least: we use them in a ‘vector products’, for example, as I will show below – but, because they are operators and, as such, “hungry for something to operate on”, they are obviously quite different from any of the ‘vectors’ I defined in point (1) to (4) above. [Feynman attributes this ‘hungry for something to operate on’ expression to the British mathematician Sir James Hopwood Jeans, who’s best known from the infamous Rayleigh-Jeans law, whose inconsistency with observations is known as the ultraviolet catastrophe or ‘black-body radiation problem’. But that’s a fairly useless digression so let me got in with it.]

In a text on physics, the term ‘vector’ may refer to any of the above but it’s often the second and third definition (point and/or displacement vectors) that will be implicit. As mentioned above, I want to write about the fifth ‘type’ of vector: vector operators. Now, the title of this post is ‘vector calculus’, and so you’ll immediately wonder why I say these vector operators may or may not be defined as vectors. Moreover, the fact of the matter is that these operators operate on yet another type of ‘vector’ – so that’s a sixth definition I need to introduce here: field vectors.

Now, funnily enough, the term ‘field vector’, while being the most obvious description of what it is, is actually not widely used: what I call a ‘field vector’ is usually referred to as a gradient, and the vectors and B are usually referred to as the electric or magnetic field. Indeed, if you google the terms ‘electromagnetic vector’ (or electric or magnetic vector tout court), you will usually be redirected. However, when everything is said and done, E and B are vectors: they have a magnitude, and they have a direction.

So, truth be told, vector calculus (aka vector analysis) in physics is about (vector) fields and (vector) operators,. While the ‘scene’ for these fields and operators is, obviously, physical space (or spacetime) and, hence, a vector space, it’s good to be clear on terminology and remind oneself that, in physics, vector calculus is not about mathematical vectors: it’s about real stuff. That’s why Feynman prefers a much longer term than vector calculus or vector analysis: he calls it differential calculus of vector fields which, indeed, is what it is – but I am sure you would not have bothered starting reading this post if I would have used that term too. :-)

Now, this has probably become the longest introduction ever to a blog post, and so I had better get on with it. :-)

Vector fields and scalar fields

Let’s dive straight into it. Vector fields like E and B behave like h, which is the symbol used in a number of textbooks for the heat flow in some body or block of material: E, B and h are all vector fields derived from a scalar field.

Huh? Scalar field? Aren’t we talking vectors? We are. If I say we can derive the vector field h (i.e. the heat flow) from a scalar field, I am basically saying that the relationship between h and the temperature T (i.e. the scalar field) is direct and very straightforward. Likewise, the relationship between E and the scalar field Φ is also direct and very straightforward.

[To be fully precise and complete, I should qualify the latter statement: it’s only true in electrostatics, i.e. when we’re considering charges that don’t move. When we have moving charges, magnetic effects come into play, and then we have a more complicated relationship between (i) two scalar fields, namely A (the magnetic potential – i.e. the ‘magnetic scalar field’) and Φ (the electrostatic potential, or ‘electric scalar field’), and (ii) two vector fields, namely B and E. The relationships between the two are then a bit more complicated than the relationship between T and h. However, the math involved is the same. In fact, the complication arises from the fact that magnetism is actually a relativistic effect. However, at this stage, this statement will only confuse you, and so I will write more about that in my next post.]

Let’s look at h and T. As you know, the temperature is a measure for energy. In a block of material, the temperature T will be a scalar: some real number that we can measure in Kelvin, Fahrenheit or Celsius but – whatever unit we use – any observer using the same unit will measure the same at any given point. That’s what distinguishes a ‘scalar’ quantity from ‘real numbers’ in general: a scalar field is something real. It represents something physical. A real number is just… Well… A real number, i.e. a mathematical concept only.  

The same is true for a vector field: it is something real. As Feynman puts it: “It is not true that any three numbers form a vector [in physics]. It is true only if, when we rotate the coordinate system, the components of the vector transform among themselves in the correct way.” What’s the ‘correct way’? It’s a way that ensures that any observer using the same unit will measure the same at any given point.

How does it work?

In physics, we associate a point in space with physical realities, such as:

  1. Temperature, the ‘height‘ of a body in a gravitational field, or the pressure distribution in a gas or a fluid, are all examples of scalar fields: they are just (real) numbers from a math point of view but, because they do represent a physical reality, these ‘numbers’ respect certain mathematical conditions: in practice, they will be a continuous or continuously differentiable function of position.
  2. Heat flow (h), the velocity (v) of the molecules/atoms in a rotating object, or the electric field (E), are examples of vector fields. As mentioned above, the same condition applies: any observer using the same unit should measure the same at any given point.
  3. Tensors, which represent, for example, stress or strain at some point in space (in various directions), or the curvature of space (or spacetime, to be fully correct) in the general theory of relativity.
  4. Finally, there are also spinors, which are often defined as a “generalization of tensors using complex numbers instead of real numbers.” They are very relevant in quantum mechanics, it is said, but I don’t know enough about them to write about them, and so I won’t.

How do we derive a vector field, like h, from a scalar field (i.e. T in this case)? The two illustrations below (taken from Feynman’s Lectures) illustrate the ‘mechanics’ behind it: heat flows, obviously, from the hotter to the colder places. At this point, we need some definitions. Let’s start with the definition of the heat flow: the (magnitude of the) heat flow (h) is the amount of thermal energy (ΔJ) that passes, per unit time and per unit area, through an infinitesimal surface element at right angles to the direction of flow.

Fig 1 Fig 2

A vector has both a magnitude and a direction, as defined above, and, hence, if we define ef as the unit vector in the direction of flow, we can write:

h = h·ef = (ΔJ/Δa)·ef

ΔJ stands for the thermal energy flowing through an area marked as Δa in the diagram above per unit time. So, if we incorporate the idea that the aspect of time is already taken care of, we can simplify the definition above, and just say that the heat flow is the flow of thermal energy per unit area. Simple trigonometry will then yield an equally simple formula for the heat flow through any surface Δa2 (i.e. any surface that is not at right angles to the heat flow h):

ΔJ/Δa2 = (ΔJ/Δa1)cosθ = h·n


When I say ‘simple’, I must add that all is relative, of course, Frankly, I myself did not immediately understand why the heat flow through the Δa1 and Δa2 areas below must be the same. That’s why I added the blue square in the illustration above (which I took from Feynman’s Lectures): it’s the same area as Δa1, but it shows more clearly – I hope! – why the heat flow through the two areas is the same indeed, especially in light of the fact that we are looking at infinitesimally small areas here (so we’re taking a limit here).

As for the cosine factor in the formula above, you should note that, in that ΔJ/Δa2 = (ΔJ/Δa1)cosθ = h·equation, we have a dot product (aka as a scalar product) of two vectors: (1) h, the heat flow and (2) n, the unit vector that is normal (orthogonal) to the surface Δa2. So let me remind you of the definition of the scalar (or dot) product of two vectors. It yields a (real) number:

A·B = |A||B|cosθ, with θ the angle between A and B

In this case, h·n = |h||n|cosθ = |h|·1·cosθ = |h|cosθ = h·cosθ. What we are saying here is that we get the component of the heat flow that’s perpendicular (or normal, as physicists and mathematicians seem to prefer to say) to the surface Δa2 by taking the dot product of the heat flow h and the unit normal n. We’ll use this formula later, and so it’s good to take note of it here.

OK. Let’s get back to the lesson. The only thing that we need to do to prove that ΔJ/Δa2 = (ΔJ/Δa1)cosθ formula is show that Δa2 = Δa1/cosθ or, what amounts to the same, that Δa1 = Δa2cosθ. Now that is something you should be able to figure out yourself: it’s quite easy to show that the angle between h and n is equal to the angle between the surfaces Δa1 and Δa2. The rest is just plain triangle trigonometry.

For example, when the surfaces coincide, the angle θ will be zero and then h·n is just equal to |h|cosθ = |h| = h·1 = h = ΔJ/Δa1. The other extreme is that orthogonal surfaces: in that case, the angle θ will be 90° and, hence, h·n = |h||n|cos(π/2) = |h|·1·0 = 0: there is no heat flow normal to the direction of heat flow.

OK. That’s clear enough (or should be clear enough). The point to note is that the vectors h and n represent physical entities and, therefore, they do not depend on our reference frame (except for the units we use to measure things). That allows us to define  vector equations.

The ∇ (del) operator and the gradient

Let’s continue our example of temperature and heat flow. In a block of material, the temperature (T) will vary in the x, y and z direction and, hence, the partial derivatives ∂T/∂x, ∂T/∂y and ∂T/∂z make sense: they measure how the temperature varies with respect to position. Now, the remarkable thing is that the 3-tuple (∂T/∂x, ∂T/∂y, ∂T/∂z) is a physical vector itself: it is independent, indeed, of the reference frame (provided we measure stuff in the same unit) – so we can do a translation and/or a rotation of the coordinate axes and we get the same value. This means this set of three numbers is a vector indeed:

(∂T/∂x, ∂T/∂y, ∂T/∂z) = a vector

If you like to see a formal proof of this, I’ll refer you to Feynman once again – but I think the intuitive argument will do: if temperature and space are real, then the derivatives of temperature in regard to the x-, y- and z-directions should be equally real, isn’t it? Let’s go for the more intricate stuff now.

If we go from one point to another, in the x-, y- or z-direction, then we can define some (infinitesimally small) displacement vector ΔR = (Δx, Δy, Δz), and the difference in temperature between two nearby points (ΔT) will tend to the (total) differential of T – which we denote by ΔT – as the two point get closer and closer. Hence, we write:

ΔT = (∂T/∂x)Δx + (∂T/∂y)Δy + (∂T/∂z)Δz

The two equations above combine to yield:

ΔT = (∂T/∂x, ∂T/∂y, ∂T/∂z)(Δx, Δy, Δz) = T·ΔR

In this equation, we used the (del) operator, i.e. the vector differential operator. It’s an operator like the differential operator ∂/∂x (i.e. the derivative) but, unlike the derivative, it returns not one but three values, i.e. a vector, which is usually referred to as the gradient, i.e. T in this case. More in general, we can write f(x, y, z), ψ or followed by whatever symbol for the function we’re looking at.

In other words, the operator acts on a scalar-valued function (T), aka a scalar field, and yields a vector:

T = (∂T/∂x, ∂T/∂y, ∂T/∂z)

That’s why we write  in bold-type too, just like the vector R. Indeed, using bold-type (instead of an arrow or so) is a convenient way to mark a vector, and the difference (in print) between  and ∇ is subtle, but it’s there – and for a good reason as you can see!

If T is a vector, what’s its direction? Think about it. […] The rate of change of T in the x-, y- and z-direction are the x-, y- and z-component of our T vector respectively. In fact, the rate of change of T in any direction will be the component of the T vector in that direction. Now, the magnitude of a vector component will always be smaller than the magnitude of the vector itself, except if it’s the component in the same direction as the vector, in which case the component is the vector. [If you have difficulty understanding this, read what I write once again, but very slowly and attentively.] Therefore, the direction of T will be the direction in which the (instantaneous) rate of change of T is largest. In Feynman’s words: “The gradient of T has the direction of the steepest uphill slope in T.” Now, it should be quite obvious what direction that really is: it is the opposite direction of the heat flow h.

That’s all you need to know to understand our first real vector equation:

h = –κT

Indeed, you don’t need too much math to understand this equation in the way we want to understand it, and that’s in some kind of ‘physical‘ way (as opposed to just the math side of it). Let me spell it out:

  1. The direction of heat flow is opposite to the direction of the gradient vector T. Hence, heat flows from higher to lower temperature (i.e. ‘downhill’), as we would expect, of course!). So that’s the minus sign.
  2. The magnitude of h is proportional to the magnitude of the gradient T, with the constant of proportionality equal to κ (kappa), which is called the thermal conductivity. Now, in case you wonder what this means (again: do go beyond the math, please!), remember that the heat flow is the flow of thermal energy per unit area (and per unit time, of course): |h| = h = ΔJ/Δa.

But… Yes? Why would it be proportional? Why don’t we have some exponential relation or something? Good question, but the answer is simple, and it’s rooted in physical reality – of course! The heat flow between two places – let’s call them 1 and 2 – is proportional to the temperature difference between those two places, so we have: ΔJ ∼  T2 – T1. In fact, that’s where the factor of proportionality comes in. If we imagine a very small slab of material (infinitesimally small, in fact) with two faces, parallel to the isothermals, with a surface area ΔA and a tiny distance Δs between them, we can write:

ΔJ = κ(T2 – T1)ΔA/Δs = ΔJ = κ·ΔT·ΔA/Δs ⇔ ΔJ/ΔA = κΔT/Δs

Now, we defined ΔJ/ΔA as the magnitude of h. As for its direction, it’s obviously perpendicular (not parallel) to the isothermals. Now, as Δs tends to zero, ΔT/Δs is nothing but the rate of change of T with position. We know it’s the maximum rate of change, because the position change is also perpendicular to the isotherms (if the faces are parallel, that tiny distance Δs is perpendicular). Hence, ΔT/Δs must be the magnitude of the gradient vector (T). As its direction is opposite to that of h, we can simply pop in a minus sign and switch from magnitudes to vectors to write what we wrote already: h = –κT.

But let’s get back to the lesson. I think you ‘get’ all of the above. In fact, I should probably not have introduced that extra equation above (the ΔJ expression) and all the extra stuff (i.e. the ‘infinitesimally small slab’ explanation), as it probably only confuses you. So… What’s the point really? Well… Let’s look, once again, at that equation h = –κT and  let us generalize the result:

  1. We have a scalar field here, the temperature T – but it could be any scalar field really!
  2. When we have the ‘formula’ for the scalar field – it’s obviously some function T(x, y, z) – we can derive the heat flow h from it, i.e. a vector quantity, which has a property which we can vaguely refer to as ‘flow’.
  3. We do so using this brand-new operator . That’s a so-called vector differential operator aka the del operator. We just apply it to the scalar field and we’re done! The only thing left is to add some proportionality factor, but so that’s just because of our units. [In case you wonder about the symbol it self, ∇ is the so-called nabla symbol: the name comes from the Hebrew word for a harp, which has a similar shape indeed.] 

This truly is a most remarkable result, and we’ll encounter the same equation elsewhere. For example, if the electric potential is Φ, then we can immediately calculate the electric field using the following formula:

E = –Φ

Indeed, the situation is entirely analogous from a mathematical point of view. For example, we have the same minus sign, so E also ‘flows’ from higher to lower potential. Where’s the factor of proportionality? Well… We don’t have one, as we’ll make sure that the units in which we’ll measure E and Φ are ‘fully compatible’ (but so don’t worry about them now). Of course, as mentioned above, this formula for E is only valid in electrostatics, i.e. when there are no moving charges. When moving charges are involved, we also have the magnetic force coming into play, and then equations become a bit more complicated. However, this extra complication does not fundamentally alter the logic involved, and I’ll come back to this so you see how it all nicely fits together.

Note: In case you feel I’ve skipped some of the ‘explanation’ of that vector equation h = –κT… Well… You may be right. I feel that it’s enough to simply point out that T is a vector with opposite direction to h, so that explains the minus sign in front of the T factor. The only thing left to ‘explain’ then is the magnitude of h, but so that’s why we pop in that kappa factor (κ), and so we’re done, I think, in terms of ‘understanding’ this equation. But so that’s what I think indeed. Feynman offers a much more elaborate ‘explanation‘, and so you can check that out if you think my approach to it is a bit too much of a shortcut.

Interim summary

So far, we have only have shown two things really:

[I] The first thing to always remember is that h·n product: it gives us the component of ‘flow’ (per unit time and per unit area) of perpendicular through any surface element Δa. Of course, this result is valid for any other vector field, or any vector for that matter: the scalar product of a vector and a unit vector will always yield the component of that vector in the direction of that unit vector.

Now, you should note that the term ‘component’ (of a vector) usually refers to a number (not to a vector) – and surely in this case, because we calculate it using a scalar product! I am just highlighting this because it did confuse me for quite a while. Why? Well… The concept of a ‘component’ of a vector is, obviously, intimately associated with the idea of ‘direction': we always talk about the component in some direction, e.g. in the x-, y- or z-direction, or in the direction of any combination of x, y and z. Hence, I think it’s only natural to think of a ‘component’ as a vector in its own right. However, as I note here, we shouldn’t do that: a ‘component’ is just a magnitude, i.e. a number only. If we’d want to include the idea of direction, it’s simple: we can just multiply the component with the normal vector n once again, and then we have a vector quantity once again, instead of just a scalar. So then we just write (h·nn = (h·n)nSimple, isn’t it? :-)

[As I am smiling here, I should quickly say something about this dot (·) symbol: we use the same symbol here for (i) a product between scalars (i.e. real or complex numbers), like 3·4; (ii) a product between a scalar and a vector, like 3·– but then I often omit the dot to simply write 3v; and, finally, (iii) a scalar product of two vectors, like h·indeed. We should, perhaps, introduce some new symbol for multiplying numbers, like ∗ for example, but then I hope you’re smart enough to see from the context what’s going on really.]

Back to the lesson. Let me jot down the formula once again: h·n = |h||n|cosθ = h·cosθ. Hence, the number we get here is (i.e. the amount of heat flow in the direction of flow) multiplied by cosθ, with θ the angle between (i) the surface we’re looking at (which, as mentioned above, is any surface really) and (ii) the surface that’s perpendicular to the direction of flow.

Hmm… […] The direction of flow? Let’s take a moment to think about what we’re saying here. Is there any particular or unique direction really? Heat flows in all directions from warmer to colder areas, and not just in one direction – doesn’t it? It does. Once again, the terminology may confuse you – which is yet another reason why math is so much better as a language to express physical ideas :-) – and so we should be precise: the direction of h is the direction in which the amount of heat flow (i.e. h·cosθ) is largest (hence, the angle θ is zero). As we pointed out above, that’s the direction in which the temperature T changes the fastest. In fact, as Feynman notes: “We can, if we wish, consider that this statement defines h.”

That brings me to the second thing you should – always and immediately – remember from all of that I’ve written above.

[II] If we write the infinitesimal (i.e. the differential) change in temperature (in whatever direction) as ΔT, then we know that

ΔT = (∂T/∂x, ∂T/∂y, ∂T/∂z)(Δx, Δy, Δz) = T·ΔR

Now, what does this say really? Δis an (infinitesimal) displacement vector: ΔR = (Δx, Δy, Δz). Hence, it has some direction. To be clear: that can be any direction in space really. So that’s simple. What about the second factor in this dot product, i.e. that gradient vector T? 

The direction of the gradient (i.e. T) is not just ‘any direction': it’s the direction in which the rate of change of T is largest, and we know what direction that is: it’s the opposite direction of the heat flow h, as evidenced by the minus sign in our vector equations h = –κT or E = –Φ. So, once again, we have a (scalar) product of two vectors here, T·ΔR, which yields…


Good question. That T·Δexpression is very similar to that h·n expression above, but it’s not quite the same. It’s also a vector dot product – or a scalar product, in other words, but, unlike that n vector, the ΔR vector is not a unit vector: it’s an infinitesimally small displacement vector. So we do not get some ‘component’ of T. More in general, you should note that the dot product of two vectors A and B does not, in general, yield the component of A in the direction of B, unless B is a unit vector – which, again, is not the case here. So if we don’t have that here, what do we have?

Let’s look at the (physical) geometry of the situation once again. Heat obviously flows in one direction only: from warmer to colder places – not in the opposite direction. Therefore, the θ in the h·n = h·cosθ expression varies from –90° to +90° only. Hence, the cosine factor (cosθ) is always positive. Always. Indeed, we do not have any right- or left-hand rule here to distinguish between the ‘front’ side and the ‘back’ side of our surface area. So when we’re looking at that h·n product, we should remember that that normal unit vector n is a unit vector that’s normal to the surface but which is oriented, generally, towards the direction of flow. Therefore, that h·n product will always yield some positive value, because θ varies from –90° to +90° only indeed.

When looking at that ΔT = T·ΔR product, the situation is quite different: while T has a very specific direction (I really mean unique)  – which, as mentioned above is opposite to that of h – that ΔR vector can point in any direction – and then I mean literally any direction, including directions ‘uphill’. Likewise, it’s obvious that the temperature difference ΔT can be both positive or negative (or zero, when we’re moving on a isotherm itself). In fact, it’s rather obvious that, if we go in the direction of flow, we go from higher to lower temperatures and, hence, ΔT will, effectively, be negative: ΔT = T2 – T1 < 0, as shown below.

Temperature drop

Now, because |T| and |ΔR| are absolute values (or magnitudes) of vectors, they are always positive (always!). Therefore, if ΔT has a minus sign, it will have to come from the cosine factor in the ΔT = T·ΔR = |T|·|ΔRcosθ expression. [Again, if you wonder where this expression comes from: it’s just the definition of a vector dot product.] Therefore, ΔT and cosθ will have the same sign, always, and θ can have any value between –180° to +180°. In other words, we’re effectively looking at the full circle here. To make a long story short, we can write the following:

ΔT = |T|·|ΔRcosθ = |T|·ΔR·cosθ ⇔ ΔT/ΔR = |T|cosθ

As you can see, θ is the angle between T and ΔR here and, as mentioned above, it can take on any value – well… Any value between –180° to +180°, that is. ΔT/ΔR is, obviously, the rate of change of T in the direction of ΔR and, from the expression above, we can see it is equal to the component of T in the direction of ΔR:

ΔT/ΔR = |T|cosθ

So we have a negative component here? Yes. The rate of change is negative and, therefore, we have a negative component. Indeed, any vector has components in all directions, including directions that point away from it. However, in the directions that point away from it, the component will be negative. More in general, we have the following interesting result: the rate of change of a scalar field ψ in the direction of a (small) displacement ΔR is the component of the gradient ∇ψ along that displacement. We write that result as:

Δψ/ΔR = |T|cosθ

We’ve said a lot of (not so) interesting things here, but we still haven’t answered the original question: what’s T·ΔR? Well… We can’t say more than what we said already: it’s equal to ΔT, which is a differential: ΔT = (∂T/∂x)Δx + (∂T/∂y)Δy + (∂T/∂z)Δz. A differential and a derivative (i.e. a rate of change) are not the same, but they are obviously closely related, as evidenced from the equations above: the rate of change is the change per unit distance.

In any case, this is enough of a recapitulation. In fact, this ‘interim summary’ has become longer than the preceding text! We’re now ready to discuss what I’ll call the First Theorem of vector calculus in physics. Of course, never mind the term: what’s first or second or third doesn’t matter really: you’ll need all of the theorems below to understand vector calculus.

The First Theorem

Let’s assume we have some scalar field ψ in space: ψ might be the temperature, but it could be any scalar field really. Now, if we go from one point (1) to another (2) in space, as shown below, we’ll follow some arbitrary path, which is denoted by the curve Γ in the illustrations below. Each point along the curve can then be associated with a gradient ψ (think of the h = –κT and E = –Φ expressions above if you’d want examples). Its tangential component is obviously equal to (ψ)t·Δs = ψ·Δs. [Please note, once again, the subtle difference between Δs (with the s in bold-face) and Δs: Δs is a vector, and Δs is its magnitude.] 

Line integral-1 Line integral-2

As shown in the illustrations above, we can mark off the curve at a number of points (a, b, c, etcetera) and join these points by straight-line segments Δsi. Now let’s consider the first line segment, i.e. Δs1. It’s obvious that the change in ψ from point 1 to point a is equal to Δψ= ψ(a) – ψ(1). Now, we have that general Δψ = (∂ψ/∂x, ∂ψ/∂y, ∂ψ/∂z)(Δx, Δy, Δz) = ψ·Δs equation. [If you find it difficult to interpret what I am writing here, just substitute ψ for T and Δs for ΔR.] So we can write:

Δψ= ψ(a) – ψ(1) = (ψ)1·Δs1

Likewise, we can write:

ψ(b) – ψ(a) = (ψ)2·Δs1

In these expressions, (ψ)and (ψ)mean the gradient evaluated at segment Δs1 and point Δs2 respectively, not at point 1 and 2 – obviously. Now, if we add the two equations above, we get:

ψ(b) – ψ(1) = (ψ)1·Δs+ (ψ)2·Δs1

To make a long story short, we can keep adding such terms to get:

ψ(2) – ψ(1) = ∑(ψ)i·Δsi

We can add more and more segments and, hence, take a limit: as Δsi tends to zero, ∑ becomes a sum of an infinite number of terms – which we denote using the integral sign ∫ – in which ds is – quite simply – just the infinitesimally small displacement vector. In other words, we get the following line integral along that curve Γ: 

Line integral - expression

This is a gem, and our First Theorem indeed. It’s a remarkable result, especially taking into account the fact that the path doesn’t matter: we could have chosen any curve Γ indeed, and the result would be the same. So we have:

Line integral - expression -2

You’ll say: so what? What do we do with this? Well… Nothing much for the moment, but we’ll need this result later. So I’d say: just hang in there, and note this is the first significant use of our del operator in a mathematical expression that you’ll encounter very often in physics. So just let it sink in, and allow me to proceed with the rest of the story.

Before doing so, however, I should note that even Feynman sins when trying to explain this theorem in a more ‘intuitive’ way. Indeed, in his Lecture on the topic, he writes the following: “Since the gradient represents the rate of change of a field quantity, if we integrate that rate of change, we should get the total change.” Now, from that Δψ/ΔR = |ψ|cosθ formula, it’s obvious that the gradient is the rate of change in a specific direction only. To be precise, in this particular case – with the field quantity ψ equal to the temperature T – it’s the direction in which T changes the fastest.

You should also note that the integral above is not the type of integral you known from high school. Indeed, it’s not of the rather straightforward ∫f(x)dx type, with f(x) the integrand and dx the variable of integration. That type of integral, we knew how to solve. A line integral is quite different. Look at it carefully: we have a vector dot product after the ∫ sign. So, unlike what Feynman suggests, it’s not just a matter of “integrating the rate of change.”

Now, I’ll refer you to Wikipedia for a good discussion of what a line integral really is, but I can’t resist the temptation to copy the animation in that article, because it’s very well made indeed. While it shows that we can think of a line integral as the two- or three-dimensional equivalent of the standard type of integral we learned to solve in high school (you’ll remember the solution was also the area under the graph of the function that had to be integrated), the way to go about it is quite different. Solving them will, in general, involve some so-called parametrization of the curve C.


However, this post is becoming way too long and, hence, I really need to move on now.

Operations with ∇:  divergence and curl

You may think we’ve covered a lot of ground already, and we did. At the same time, everything I wrote above is actually just the start of it. I emphasized the physics of the situation so far. Let me now turn to the math involved. Let’s start by dissociating the del operator from the scalar field, so we just write:

 = (∂/∂x, ∂/∂y, ∂/∂z)

This doesn’t mean anything, you’ll say, because the operator has nothing to operate on. And, yes, you’re right. However, in math, it doesn’t matter: we can combine this ‘meaningless’ operator (which looks like a vector, because it has three components) with something else. For example, we can do a vector dot product:

·(a vector)

As mentioned above, we can ‘do’ this product because has three components, so it’s a ‘vector’ too (although I find such name-giving quite confusing), and so we just need to make sure that the vector we’re operating on has three components too. To continue with our heat flow example, we can write, for example:

·h = (∂/∂x, ∂/∂y, ∂/∂z)·(hxhyhz) = ∂hx/∂x + ∂hy/∂y, ∂hz/∂z

This del operator followed by a dot, and acting on a vector – i.e. ·(vector) – is, in fact, a new operator. Note that we use two existing symbols, the del () and the dot (·), but it’s one operator really. [Inventing a new symbol for it would not be wise, because we’d forget where it comes from and, hence, probably scratch our head when we’d see it.] It’s referred to as a vector operator, just like the del operator, but don’t worry about the terminology here because, once again, the terminology here might confuse you. Indeed, our del operator acted on a scalar to yield a vector, and now it’s the other way around: we have an operator acting on a vector to return a scalar. In a few minutes, we’ll define yet another operator acting on a vector to return a vector. Now, all of these operators are so-called vector operators, not because there’s some vector involved, but because they all involve the del operator. It’s that simple. So there’s no such thing as a scalar operator. :-) But let me get back to the main line of the story. This ·  operator is quite important in physics, and so it has a name (and an abbreviated notation) of its own:

·h = div h = the divergence of h

The physical significance of the divergence is related to the so-called flux of a vector field: it measures the magnitude of a field’s source or sink at a given point. Continuing our example with temperature, consider air as it is heated or cooled. The relevant vector field is now the velocity of the moving air at a point. If air is heated in a particular region, it will expand in all directions such that the velocity field points outward from that region. Therefore the divergence of the velocity field in that region would have a positive value, as the region is a source. If the air cools and contracts, the divergence has a negative value, as the region is a sink.

A less intuitive but more accurate definition is the following: the divergence represents the volume density of the outward flux of a vector field from an infinitesimal volume around a given point.

Phew! That sounds more serious, doesn’t it? We’ll come back to this definition when we’re ready to define the concept of flux somewhat accurately. For now, just note two of Maxwell’s famous equations involve the divergence operator:

·E = ρ/ε0 and ·B = 0

In my previous post, I gave a verbal description of those two equations:

  1. The flux of E through a closed surface = (the net charge inside)/ε0
  2. The flux of B through a closed surface = zero

The first equation basically says that electric charges cause an electric field. The second equation basically says there is no such thing as a magnetic charge: the magnetic force only appears when charges are moving and/or when electric fields are changing. Note that we’re talking closed surface here, so they define a volume indeed. We can also look at the flux through a non-closed surface (and we’ll do that shortly) but, in the context of Maxwell’s equations, we’re talking volumes and, hence, closed surfaces.

Of course, you’ll anticipate the second new operator now, because that’s the one that appears in the other two equations in Maxwell’s set of equations. It’s the cross product:

∇×E = (∂/∂x, ∂/∂y, ∂/∂z)×(Ex, Ey, Ez) = … What?

Well… The cross product is not as straightforward to write down as the dot product. We get a vector indeed, not a scalar, and its three components are:

(∇×E)z = ∇xEyE= ∂Ey/∂x – ∂Ex/∂y

(∇×E)x = ∇yEzE= ∂Ez/∂y – ∂Ey/∂z

(∇×E)y = ∇zExE= ∂Ex/∂z – ∂Ez/∂x

I know this looks pretty monstrous, but so that’s how cross products work. Please do check it out: you have to play with the order of the x, y and z subscripts. I gave the geometric formula for a dot product above, so I should also give you the same for a cross product:

A×B = |A||B|sin(θ)n

In this formula, we once again have θ, the angle between A and B, but note that, this time around, it’s the sine, not the cosine, that pops up when calculating the magnitude of this vector. In addition, we have n at the end: n is a unit vector at right angles to both A and B. It’s what makes the cross product a vector. Indeed, as you can see, multiplying by n will not alter the magnitude (|A||B|sinθ) of this product, but it gives the whole thing a direction, so we get a new vector indeed. Of course, we have two unit vectors at right angles to A and B, and so we need a rule to choose one of these: the direction of the n vector we want is given by that right-hand rule which we encountered a couple of times already.

Again, it’s two symbols but one operator really, and we also have a special name (and notation) for it:

∇×h = curl h = the curl of h

The curl is, just like the divergence, a so-called vector operator but, as mentioned above, that’s just because it involves the del operator. Just note that it acts on a vector and that its result is a vector too. What’s the geometric interpretation of the curl? Well… It’s a bit hard to describe that but let’s try. The curl describes the ‘rotation’ or ‘circulation’ of a vector field:

  1. The direction of the curl is the axis of rotation, as determined by the right-hand rule.
  2. The magnitude of the curl is the magnitude of rotation.

I know. This is pretty abstract, and I’ll probably have to come back to it in another post. As for now, just note we defined three new operators in this ‘introduction’ to vector calculus:

  1. T = grad T = a vector
  2. ∇·h = div h = a scalar
  3. ×h = curl h = a vector

That’s all. It’s all we need to understand Maxwell’s famous equations:

Maxwell's equations-2

Huh? Hmm… You’re right: understanding the symbols, to some extent, doesn’t mean we ‘understand’ these equations. What does it mean to ‘understand’ an equation? Let me quote Feynman on that: “What it means really to understand an equation—that is, in more than a strictly mathematical sense—was described by Dirac. He said: “I understand what an equation means if I have a way of figuring out the characteristics of its solution without actually solving it.” So if we have a way of knowing what should happen in given circumstances without actually solving the equations, then we “understand” the equations, as applied to these circumstances.”

We’re surely not there yet. In fact, I doubt we’ll ever reach Dirac’s understanding of Maxwell’s equations. But let’s do what we can.

In order to ‘understand’ the equations above in a more ‘physical’ way, let’s explore the concepts of divergence and curl somewhat more. We said the divergence was related to the ‘flux’ of a vector field, and the curl was related to its ‘circulation’. In my previous post, I had already illustrated those two concepts copying the following diagrams from Feynman’s Lectures:


flux = (average normal component)·(surface area)

So that’s the flux (through a non-closed surface).

To illustrate the concept of circulation, we have not one but three diagrams, shown below. Diagram (a) gives us the vector field, such as the velocity field in a liquid. In diagram (b), we imagine a tube (of uniform cross section) that follows some arbitrary closed curve. Finally, in diagram (c), we imagine we’d suddenly freeze the liquid everywhere except inside the tube. Then the liquid in the tube would circulate as shown in (c), and so that’s the concept of circulation.


We have a similar formula as for the flux:

circulation = (the average tangential component)·(the distance around)

In both formulas (flux and circulation), we have a product of two scalars: (i) the average normal component and the average tangential component (for the flux and circulation respectively) and (ii) the surface area and the distance around (again, for the flux and circulation respectively). So we get a scalar as a result. Does that make sense? When we related the concept of flux to the divergence of a vector field, we said that the flux would have a positive value if the region is a source, and a negative value if the region is a sink. So we have a number here (otherwise we wouldn’t be talking ‘positive’ or ‘negative’ values). So that’s OK. But are we talking about the same number? Yes. I’ll show they are the same in a few minutes.

But what about circulation? When we related the concept of circulation of the curl of a vector field, we introduced a vector cross product, so that yields a vector, not a scalar. So what’s the relation between that vector and the number we get when multiplying the ‘average tangential component’ and the ‘distance around’. The answer requires some more mathematical analysis, and I’ll give you what you need in a minute. Let me first make a remark about conventions here.

From what I write above, you see that we use a plus or minus sign for the flux to indicate the direction of flow: the flux has a positive value if the region is a source, and a negative value if the region is a sink. Now, why don’t we do the same for circulation? We said the curl is a vector, and its direction is the axis of rotation as determined by the right-hand rule. Why do we need a vector here? Why can’t we have a scalar taking on positive or negative values, just like we do for the flux?

The intuitive answer to this question (i.e. the ‘non-mathematical’ or ‘physical’ explanation, I’d say) is the following. Although we can calculate the flux through a non-closed surface, from a mathematical point of view, flux is effectively being defined by referring to the infinitesimal volume around some point and, therefore, we can easily, and unambiguously, determine whether we’re inside or outside of that volume. Therefore, the concepts of positive and negative values make sense, as we can define them referring to some unique reference point, which is either inside or outside of the region.

When talking circulation, however, we’re talking about some curve in space. Now it’s not so easy to find some unique reference point. We may say that we are looking at some curve from some point ‘in front of’ that curve, but some other person whose position, from our point of view, would be ‘behind’ the curve, would not agree with our definition of ‘in front of': in fact, his definition would be exactly the opposite of ours. In short, because of the geometry of the situation involved, our convention in regard to the ‘sign’ of circulation (positive or negative) becomes somewhat more complicated. It’s no longer a simple matter of ‘inward’ or ‘outward’ flow: we need something like a ‘right-hand rule’ indeed. [We could, of course, also adopt a left-hand rule but, by now, you know that, in physics, there’s not much use for a left hand. :-)]

That also ‘explains’ why the vector cross product is non-commutative: A×BB×A. To be fully precise, A×B and B×have the same magnitude but opposite direction: A×B = |A||B|sin(θ)n = –|A||B|sin(θ)(–n) = –(B×A) = B×A. The dot product, on the other hand, is fully commutative: A·B = B·A.

In fact, the concept of circulation is very much related to the concept of angular momentum which, as you’ll remember from a previous post, also involves a vector cross product.


I’ve confused you too much already. The only way out is the full mathematical treatment. So let’s go for that.


Some of the confusion as to what flux actually means in electromagnetism is probably caused by the fact that the illustration above is not a closed surface and, from my previous post, you should remember that Maxwell’s first and third equation define the flux of E and B through closed surfaces. It’s not that the formula above for the flux through a non-closed surface is wrong: it’s just that, in electromagnetism, we usually talk about the flux through a closed surface.

A closed surface has no boundary. In contrast, the surface area above does have a clear boundary and, hence, it’s not a closed surface. A sphere is an example of a closed surface. A cube is an example as well. In fact, an infinitesimally small cube is what’s used to prove a very convenient theorem, referred to as Gauss’ Theorem. We will not prove it here, but just try to make sure you ‘understand’ what it says.

Suppose we have some vector field C and that we have some closed surface S – a sphere, for example, but it may also be some very irregular volume. Its shape doesn’t matter: the only requirement is that it’s defined by a closed surface. Let’s then denote the volume that’s enclosed by this surface by V. Now, the flux through some (infinitesimal) surface element da will, effectively, be given by that formula above:

flux = (average normal component)·(surface area)

What’s the average normal component in this case? It’s given by that ΔJ/Δa2 = (ΔJ/Δa1)cosθ = h·formula, except that we just need to substitute h for C here, so we have C·n instead of h·n. To get the flux through the closed surface S, we just need to add all the contributions. Adding those contributions amounts to taking the following surface integral:

Surface integral

Now, I talked about Gauss’ Theorem above, and I said I would not prove it, but this is what Gauss’ Theorem says:

Gauss Theorem

Huh? Don’t panic. Just try to ‘read’ what’s written here. From all that I’ve said so far, you should ‘understand’ the surface integral on the left-hand side. So that should be OK. Let’s now look at the right-hand side. The right-hand side uses the divergence operator which I introduced above: ·(vector). In this case, ·C. That’s a scalar, as we know, and it represents the outward flux from an infinitesimally small cube inside the surface indeed. The volume integral on the right-hand side adds all of the fluxes out of each part (think of it as zillions of infinitesimally small cubes) of the volume V that is enclosed by the (closed) surface S. So that’s what Gauss’ Theorem is all about. In words, we can state Gauss’ Theorem as follows:

Gauss’ Theorem: The (surface) integral of the normal component of a vector (field) over a closed surface is the (volume) integral of the divergence of the vector over the volume enclosed by the surface.

Again, I said I would not prove Gauss’ Theorem, but its proof is actually quite intuitive: to calculate the flux out of a large volume, we can ‘cut it up’ in smaller volumes, and then calculate the flux out of these volumes. If we add it up, we’ll get the total flux. In any case, I’ll refer you to Feynman in case you’d want to see how it goes exactly. So far, I did what I promised to do, and that’s to relate the formula for flux (i.e. that (average normal component)·(surface area) formula) to the divergence operator. Let’s now do the same for the curl.


For non-native English speakers (like me), it’s always good to have a look at the common-sense definition of ‘curl': as a verb (to curl), it means ‘to form or cause to form into a curved or spiral shape’. As a noun (e.g. a curl of hair), it means ‘something having a spiral or inwardly curved form’. It’s clear that, while not the same, we can indeed relate this common-sense definition to the concept of circulation that we introduced above:

circulation = (the average tangential component)·(the distance around)

So that’s the (scalar) product we already mentioned above. How do we relate it to that curl operator?

Patience, please ! The illustration below shows what we actually have to do to calculate the circulation around some loop Γ: we take an infinite number of vector dot products C·ds. Take a careful look at the notation here: I use bold-face for s and, hence, ds is some little vector indeed. Going to the limit, ds becomes a differential indeed. The fact that we’re talking a vector dot product here ensures that only the tangential component of C enters the equation’, so to speak. I’ll come back to that in a moment. Just have a good look at the illustration first.


Such infinite sum of vector dot products C·dis, once again, an integral. It’s another ‘special’ integral, in fact. To be precise, it’s a line integral. Moreover, it’s not just any line integral: we have to go all around the (closed) loop to take it. We cannot stop somewhere halfway. That’s why Feynman writes it with a little loop (ο) through the integral sign (∫):

Line integral

Note the subtle difference between the two products in the integrands of the integrals above: Ctds versus C·ds. The first product is just a product of two scalars, while the second is a dot product of two vectors. Just check it out using the definition of a dot product (A·B = |A||B|cosθ) and substitute A and B by C and ds respectively, noting that the tangential component Ct equals C times cosθ indeed.

Now, once again, we want to relate this integral with that dot product inside to one of those vector operators we introduced above. In this case, we’ll relate the circulation with the curl operator. The analysis involves infinitesimal squares (as opposed to those infinitesimal cubes we introduced above), and the result is what is referred to as Stokes’ Theorem. I’ll just write it down:

Stokes Theorem

Again, the integral on the left was explained above: it’s a line integral taking around the full loop Γ. As for the integral on the right-hand side, that’s a surface integral once again but, instead of a div operator, we have the curl operator inside and, moreover, the integrand is the normal component of the curl only. Now, remembering that we can always find the normal component of a vector (i.e. the component that’s normal to the surface) by taking the dot product of that vector and the unit normal vector (n), we can write Stokes’s Theorem also as:

Stokes Theorem-2

That doesn’t look any ‘nicer’, but it’s the form in which you’ll usually see it. Once again, I will not give you any formal proof of this. Indeed, if you’d want to see how it goes, I’ll just refer you to Feynman’s Lectures. However, the philosophy behind is the same. The first step is to prove that we can break up the surface bounded by the loop Γ into a number of smaller areas, and that the circulation around Γ will be equal to the sum of the circulations around the little loops. The idea is illustrated below:

Proof Stokes

Of course, we then go to the limit and cut up the surface into an infinite number of infinitesimally small squares. The next step in the proof then shows that the circulation of around an infinitesimal square is, indeed, (i) the component of the curl of C normal to the surface enclosed by that square multiplied by (ii) the area of that (infinitesimal) square. The diagram and formula below do not give you the proof but just illustrate the idea:

Stokes proof

Stokes proof - 2

OK, you’ll say, so what? Well… Nothing much. I think you have enough to digest as for now. It probably looks very daunting, but so that’s all we need to know – for the moment that is – to arrive at a better ‘physical’ understanding of Maxwell’s famous equations. I’ll come back to them in my next post. Before proceeding to the summary of this whole post, let me just write down Stokes’ Theorem in words:

Stokes’ TheoremThe line integral of the tangential component of a vector (field) around a closed loop is equal to the surface integral of the normal component of the curl of that vector over any surface which is bounded by the loop.


We’ve defined three so-called vector operators, which we’ll use very often in physics:

  1. T = grad T = a vector
  2. ∇·h = div h = a scalar
  3. ×h = curl h = a vector

Moreover, we also explained three important theorems, which we’ll use as least as much:

[1] The First Theorem:

Line integral - expression -2

[2] Gauss Theorem:

Gauss Theorem-2

[3] Stokes Theorem:

Stokes Theorem-2

As said, we’ll come back to them in my next post. As for now, just try to familiarize yourself with these div and curl operators. Try to ‘understand’ them as good as you can. Don’t look at them as just some weird mathematical definition: try to understand them in a ‘physical’ way, i.e. in a ‘completely unmathematical, imprecise, and inexact way’, remembering that’s what it takes to understand to truly understand physics. :-)

Back to tedious stuff: an introduction to electromagnetism

It seems I skipped too many chapters in Feynman’s second volume of Lectures (on electromagnetism) and so I have to return to that before getting back to quantum physics. So let me just do that in the next couple of posts. I’ll have to start with the basics: Maxwell’s equations.

Indeed, electromagnetic phenomena are described by a set of four equations known as Maxwell’s equations. They relate two fields: the electric field (E) and the magnetic field (B). The electric field appears when we have electric charges: positive (e.g. protons or positively charged ions) or negative (e.g. electrons or negatively charged ions). That’s obvious.

In contrast, there is no such thing as ‘magnetic charges’. The magnetic field appears only when the electric field changes, or when charges move. In turn, the change in the magnetic field causes an electric field, and that’s how electromagnetic radiation basically works: a changing electric field causes a magnetic field, and the build-up of that magnetic field (so that’s a changing magnetic field) causes a build-up of an electric field, and so on and so on.

OK. That’s obvious too. But how does it work exactly? Before explaining this, I need to point out some more ‘obvious’ things:

1. From Maxwell’s equations, we can calculate the magnitude of E and B. Indeed, a specific functional form for E and is what we get when we solve Maxwell’s set of equations, and we’ll jot down that solution in a moment–even if I am afraid you will shake your head when you see it. The point to note is that what we get as a solution for E and B is a solution in a particular frame of reference only: if we switch to another reference frame, E and B will look different.

Huh? Yes. According to the principle of relativity, we cannot say which charges are ‘stationary’ and which charges are ‘moving’ in any absolute sense: it all depends on our frame our reference.

But… Yes? Then if we put an electric charge in these fields, the force on it will also be different?

Yes. Forces also look different when moving from one reference to another.

But… Yes? The physical effect surely has to be the same, regardless of the reference frame?

Yes. The point is that, if we look at an electric charge q moving along a current-carrying wire in a coordinate system at rest with respect to the wire, with the same velocity (v0) as the conduction electrons (v), then the whole force on the electric charge will be ‘magnetic': F = qv0×B and E = 0. Now, if we’re looking at the same situation from a frame of reference that is moving with q, then our charge is at rest, and so there can be no magnetic force on it. Hence, the force on it must come from an electric field! But what produces the electric field? Our current-carrying wire is supposed to be neutral!

Well… It turns out that our ‘neutral’ wire appears to be charged when moving. We’ll explain – in very much detail – why this is so later. Now, you should just note that “we should not attach too much reality to E and B, because they appear in different ‘mixtures’ in different coordinate systems”, as Feynman puts it. In fact, you may or may not heard that magnetism is actually nothing but a “relativistic effect” of electricity. Well… That’s true, but we’ll also explain how that works later only. Let’s not jump the gun.

2. The remark above is related to the other ‘obvious’ thing I wanted to say before presenting Maxwell’s equations: fields are very useful to describe what’s going on but, when everything is said and done, what we really want to know is what force will be acting on a charge, because that’s what’s going to tell us how that charge is going to move. In other words, we want to find the equations of motion, and the force determines how the charge’s momentum will change: F = dp/dt = d(mv)/dt (i.e. Newton’s equation of motion).

So how does that work? We’ve given the formula before:

F = q(E + v×B) = qE + q(v×B)

This is a sum of two vectors:

  1. qE is the ‘electric force: that force is in the same direction as the electric field, but with a magnitude equal to q times E. [Note I use a bold letter (E) for a vector (which we may define as some quantity with a direction) and a non-bold letter (E) for its magnitude.]
  2. q(v×B) is the ‘magnetic’ force: that force depends on both v as well as on B. Its direction is given by the so-called right-hand rule for a vector cross-product (as opposed to a dot product, which is denoted by a dot (·) and which yields a scalar instead of a new vector).

That right-hand rule is illustrated below. Note that, if we switch a and b, the b×a vector will point downwards. The magnitude of q(v×B) is given by |v×B| = |v||B|sinθ (with θ the angle between v and B).    


We know the direction of (because we’re talking about some charge that is moving here) but what direction is B? It’s time to be a bit more systematic now.

Flux and circulation

In order to understand Maxwell’s equations, one needs to understand two concepts related to a vector field: flux and circulation. The two concepts are best illustrated referring to a vector field describing the flow of a liquid:

1. If we have a surface, the flux will give us the net amount of fluid going out through the surface per unit time. The illustration below (which I took from Feynman’s Lectures) gives us not only the general idea but a formal definition as well:


2. The concept of circulation is linked to the idea of some net rotational motion around some loop. In fact, that’s exactly what it describes. I’ll again use Feynman’s illustration (and description) because I couldn’t find anything better.

circulation-1 circulation-2 circulation-3

Diagram (a) gives us the velocity field in the liquid. Now, imagine a tube (of uniform cross section) that follows some arbitrary closed curve, like in (b), and then imagine we’d suddenly freeze the liquid everywhere except inside the tube: the liquid in the tube would circulate as shown in (c). Formally, the circulation is defined as:

circulation = (the average tangential component)·(the distance around)

OK. So far, so good. Back to electromagnetism.

E and B

We’re familiar with the electric field E from our high school physics course. Indeed, you’ll probably recognize the two examples below: (a) a (positive) charge near a (neutral) conducting sheet, and (b) two opposite charges next to each other. Note the convention: the field lines emanate from the positive charge. Does that mean that the force is in that direction too? Yes. But remember: if a particle is attracted to another, the latter particle is attracted to the former too! So there’s a force in both directions !


What more can we say about this? Well… It is clear that the field E is directed radially. In terms of our flux and circulation concepts, we say that there’s an outgoing flux from the (positive) point charge. Furthermore, it would seem to be pretty obvious (we’d need to show why, but we won’t do that here: just look at Coulomb’s Law once again) that the flux should be proportional to the charge, and it is: if we double the charge, the flux doubles too. That gives us Maxwell’s first equation:

flux of E through a closed surface = (the net charge inside)/ε0

Note we’re talking a closed surface here, like a sphere for example–but it does not have to be a nice symmetric shape: Maxwell’s first equation is valid for any closed surface. The expression above is Coulomb’s Law, which you’ll also surely remember from your high school physics course: while it looks very different, it’s the same. It’s just because we’re using that flux concept here that we seem to be getting an entirely different expression. But so we’re not: it’s the same as Coulomb’s Law.

As for the ε0 factor, that’s just a constant that depends on the units we’re using to measure what we write above, so don’t worry about it. [I am noting it here because you’ll see it pop up later too.]

For B, we’ve got a similar-looking law:

flux of B through a closed surface = 0 (= zero = nil)

That’s not the same, you’ll say. Well… Yes and no. It’s the same really, but the zero on the right-hand side of the expression above says there’s no such thing as a ‘magnetic’ charge.

Hmm… But… If we can’t create any flux of B, because ‘magnetic charges’ don’t exist, so how do we get magnetic fields then? 

Well… We wrote that above already, and you should remember it from your high school physics course as well: a magnetic field is created by (1) a moving charge (i.e. a flow or flux of electric current) or (2) a changing electric field.

Situation (1) is illustrated below: the current in the wire creates some circulation of B around the wire. How much? Not much: the magnetic effect is very small as compared to the electric effect (that has to do with magnetism being a relativistic effect of electricity but, as mentioned above, I’ll explain that later only). To be precise, the equation is the following:

c2(circulation of B)= (flux of electric current)/ε0

magnetic field wire

That c2 factor on the left-hand side becomes 1/c2 if we move it to the other side and, yes, is the speed of light here – so you can see we’re talking a very small amount of circulation only indeed! [As for the ε0 factor, that’s just the same constant: it’s got to do with the units we’re using to measure stuff.]

One last point perhaps: what’s the direction of the circulation? Well… There’s a so-called right-hand grip rule for that, which is illustrated below.


OK. Enough about this. Let’s go to situation (2): a changing electric field. That effect is usually illustrated with Faraday’s original 1831 experiment, which is shown below with a more modern voltmeter :-) : when the wire on one side of the iron ring is connected to the battery, we’ll see a transient current on the other side. It’s transient only, so the current quickly disappears. That’s why transformers don’t work with DC. In fact, it is said that Faraday was quite disappointed to see that the current didn’t last! Likewise, when the wire is disconnected, we’ll briefly see another transient current.


So this effect is due to the changing electric field, which causes a changing magnetic field. But so where is that magnetic field? We’re talking currents here, aren’t we? Yes, you’re right. To understand why we have a transient current in the voltmeter, you need to understand yet another effect: a changing magnetic field causes an electric field, and so that’s what actually generates the transient current. However, what’s going on in the iron ring is the magnetic effect, and so that’s caused by the changing electric field as we connect/disconnect the battery to the wire. Capito?

I guess so… So what’s the equation that captures this situation, i.e. situation (2)? That equation involves both flux and circulation, so we’ll have a surface (S) as well as a curve (C). The equation is the following one: for any surface S (not closed this time because, if the surface was closed, it wouldn’t have an edge!), we have:

c2(circulation of B around C)= d(flux of E through S)/dt

I mentioned above that the reverse is also true. A changing magnetic field causes an electric field, and the equation for that looks very similar, except that we don’t have the c2 factor:

circulation of around = d(flux of through S)/dt

Let me quickly mention the presence of absence of that c2 or 1/c2 factor in the previous equations once again. It is interesting. It’s got nothing to do with the units. It’s really a proportionality factor: any change in E will only cause a little change in (because of the 1/c2 factor in the first equation), but the reverse is not true: there’s no c2  in the second equation. Again, it’s got to do with magnetism being a relativistic effect of electricity, so the magnetic effect is, in most cases, tiny as compared to the electric effect, except when we’re talking charges that are moving at relativistic speeds (i.e. speeds close to c). As said, we’ll come back to that–later, much later. Let’s get back to Maxwell’s equations first.

Maxwell’s equations

We can now combine all of the equations above in one set, and so these are Maxwell’s four famous equations:

  1. The flux of E through a closed surface = (the net charge inside)/ε0
  2. The circulation of E around = d(flux of through S)/dt (with the curve or edge around S)
  3. The flux of B through a closed surface = 0
  4. c2(circulation of B around C)= d(flux of E through S)/dt + (flux of electric current)/ε0

From a mathematical point of view, this is a set of differential equations, and they are not easy to grasp intuitively. As Feynman puts it: “The laws of Newton were very simple to write down, but they had a lot of complicated consequences and it took us a long time to learn about them all. These laws are not nearly as simple to write down, which means that the consequences are going to be more elaborate and it will take us quite a lot of time to figure them all out.”

Indeed, Feynman needs about twenty (!) Lectures in that second Volume to show what it all implies, as he walks us through electrostatics, magnetostatics and various other ‘special’ cases before giving us the ‘complete’ or ‘general’ solution to the equations. This ‘general’ solution, in mathematical notation, is the following:

Maxwell's equations

Huh? What’s that? Well… The four equations are the equations we explained already, but this time in mathematical notation: flux and circulation can be expressed much more elegantly using the differential operator  indeed. As for the solutions to Maxwell’s set of equations, you can see they are expressed using two other concepts: the scalar potential Φ and the vector potential A.

Now, it is not my intention to summarize two dozen of Feynman’s Lectures in just a few lines, so I’ll have to leave you here for the moment.


Huh? What? What about my promise to show that magnetism is a relativistic effect of electricity indeed?

Well… I wanted to do that just now, but when I look at it, I realize that I’d end up copying most of Feynman’s little exposé on it and, hence, I’ll just refer you to that particular section. It’s really quite exciting but – as you might expect – it does take a bit of time to wrestle through it.

That being said, it really does give you a kind of an Aha-Erlebnis and, therefore, I really warmly recommend it ! Just click on the link ! :-)

Amplitudes and statistics

When re-reading Feynman’s ‘explanation’ of Bose-Einstein versus Fermi-Dirac statistics (Lectures, Vol. III, Chapter 4), and my own March 2014 post summarizing his argument, I suddenly felt his approach raises as many questions as it answers. So I thought it would be good to re-visit it, which is what I’ll do here. Before you continue reading, however, I should warn you: I am not sure I’ll manage to do a better job now, as compared to a few months ago. But let me give it a try.

Setting up the experiment

The (thought) experiment is simple enough: what’s being analyzed is the (theoretical) behavior of two particles, referred to as particle a and particle b respectively that are being scattered into  two detectors, referred to as 1 and 2. That can happen in two ways, as depicted below: situation (a) and situation (b). [And, yes, it’s a bit confusing to use the same letters a and b here, but just note the brackets and you’ll be fine.] It’s an elastic scattering and it’s seen in the center-of-mass reference frame in order to ensure we can analyze it using just one variable, θ, for the angle of incidence. So there is no interaction between those two particles in a quantum-mechanical sense: there is no exchange of spin (spin flipping) nor is there any exchange of energy–like in Compton scattering, in which a photon gives some of its energy to an electron, resulting in a Compton shift (i.e. the wavelength of the scattered photon is different than that of the incoming photon). No, it’s just what it is: two particles deflecting each other. […] Well… Maybe. Let’s fully develop the argument to see what’s going on.

Identical particles-aIdentical particles-b

First, the analysis is done for two non-identical particles, say an alpha particle (i.e. a helium nucleus) and then some other nucleus (e.g. oxygen, carbon, beryllium,…). Because of the elasticity of the ‘collision’, the possible outcomes of the experiment are binary: if particle a gets into detector 1, it means particle b will be picked up by detector 2, and vice versa. The first situation (particle a gets into detector 1 and particle b goes into detector 2) is depicted in (a), i.e. the illustration on the left above, while the opposite situation, exchanging the role of the particles, is depicted in (b), i.e. the illustration on the right-hand side. So these two ‘ways’ are two different possibilities which are distinguishable not only in principle but also in practice, for non-identical particles that is (just imagine a detector which can distinguish helium from oxygen, or whatever other substance the other particle is). Therefore, strictly following the rules of quantum mechanics, we should add the probabilities of both events to arrive at the total probability of some particle (and with ‘some’, I mean particle a or particle b) ending up in some detector (again, with ‘some’ detector, I mean detector 1 or detector 2).

Now, this is where Feynman’s explanation becomes somewhat tricky. The whole event (i.e. some particle ending up in some detector) is being reduced to two mutually exclusive possibilities that are both being described by the same (complex-valued) wave function f, which has that angle of incidence as its argument. To be precise: the angle of incidence is θ for the first possibility and it’s π–θ for the second possibility. That being said, it is obvious, even if Feynman doesn’t mention it, that both possibilities actually represent a combination of two separate things themselves:

  1. For situation (a), we have particle a going to detector 1 and particle b going to detector 2. Using Dirac’s so-called bra-ket notation, we should write 〈1|a〉〈2|b〉 = f(θ), with f(θ) a probability amplitude, which should yield a probability when taking its absolute square: P(θ) = |f(θ)|2.
  2. For situation (b), we have particle b going to detector 1 and particle a going to 2, so we have 〈1|b〉〈2|a〉, which Feynman equates with f(π–θ), so we write 〈1|b〉〈2|a〉 = 〈2|a〉〈1|b〉 = f(π–θ).

Now, Feynman doesn’t dwell on this–not at all, really–but this casual assumption–i.e. the assumption that situation (b) can be represented by using the same wave function f–merits some more reflection. As said, Feynman is very brief on it: he just says situation (b) is the same situation as (a), but then detector 1 and detector 2 being switched (so we exchange the role of the detectors, I’d say). Hence, the relevant angle is π–θ and, of course, it’s a center-of-mass view again so if a goes to 2, then b has to go to 1. There’s no Third Way here. In short, a priori it would seem to be very obvious indeed to associate only one wave function (i.e. that (complex-valued) f(θ) function) with the two possibilities: that wave function f yields a probability amplitude for θ and, hence, it should also yield some (other) probability amplitude for π–θ, i.e. for the ‘other’ angle. So we have two probability amplitudes but one wave function only.

You’ll say: Of course! What’s the problem? Why are you being fussy? Well… I think these assumptions about f(θ) and f(π–θ) representing the underlying probability amplitudes are all nice and fine (and, yes, they are very reasonable indeed), but I also think we should take them for what they are at this moment: assumptions.

Huh? Yes. At this point, I would like to draw your attention to the fact that the only thing we can measure are real-valued possibilities. Indeed, when we do this experiment like a zillion times, it will give us some real number P for the probability that a goes to 1 and b goes to 2 (let me denote this number as P(θ) = Pa→1 and b→2), and then, when we change the angle of incidence by switching detector 1 and 2, it will also give us some (other) real number for the probability that a goes to 2 and b goes to 1 (i.e. a number which we can denote as P(π–θ) = Pa→2 and b→1). Now, while it would seem to be very reasonable that the underlying probability amplitudes are the same, we should be honest with ourselves and admit that the probability amplitudes are something we cannot directly measure.

At this point, let me quickly say something about Dirac’s bra-ket notation, just in case you haven’t heard about it yet. As Feynman notes, we have to get away from thinking too much in terms of wave functions traveling through space because, in quantum mechanics, all sort of stuff can happen (e.g. spin flipping) and not all of it can be analyzed in terms of interfering probability amplitudes. Hence, it’s often more useful to think in terms of a system being in some state and then transitioning to some other state, and that’s why that bra-ket notation is so helpful. We have to read these bra-kets from right to left: the part on the right, e.g. |a〉, is the ket and, in this case, that ket just says that we’re looking at some particle referred to as particle a, while the part on the left, i.e. 〈1|, is the bra, i.e. a shorthand for particle a having arrived at detector 1. If we’d want to be complete, we should write:

〈1|a〉 = 〈particle a arrives at detector 1|particle a leaves its source〉

Note that 〈1|a〉 is some complex-valued number (i.e. a probability amplitude) and so we multiply it here with some other complex number, 〈2|b〉, because it’s two things happening together. As said, don’t worry too much about it. Strictly speaking, we don’t need wave functions and/or probability amplitudes to analyze this situation because there is no interaction in the quantum-mechanical sense: we’ve got a scattering process indeed (implying some randomness in where those particles end up, as opposed to what we’d have in a classical analysis of two billiard balls colliding), but we do not have any interference between wave functions (probability amplitudes) here. We’re just introducing the wave function f because we want to illustrate the difference between this situation (i.e. the scattering of non-identical particles) and what we’d have if we’d be looking at identical particles being scattered.

At this point, I should also note that this bra-ket notation is more in line with Feynman’s own so-called path integral formulation of quantum mechanics, which is actually implicit in his line of argument: rather than thinking about the wave function as representing the (complex) amplitude of some particle to be at point x in space at point t in time, we think about the amplitude as something that’s associated with a path, i.e. one of the possible itineraries from the source (its origin) to the detector (its destination). That explains why this f(θ) function doesn’t mention the position (x) and space (t) variables. What x and t variables would we use anyway? Well… I don’t know. It’s true the position of the detectors is fully determined by θ, so we don’t need to associate any x or t with them. Hence, if we’d be thinking about the space-time variables, then we should be talking the position in space and time of both particle a and particle b. Indeed, it’s easy to see that only a slight change in the horizontal (x) or vertical position (y) of either particle would ensure that both particles do not end up in the detectors. However, as mentioned above, Feynman doesn’t even mention this. Hence, we must assume that any randomness in any x or t variable is captured by that wave function f, which explains why this is actually not a classical analysis: so, in short, we do not have two billiard balls colliding here.

Hmm… You’ll say I am a nitpicker. You’ll say that, of course, any uncertainty is indeed being incorporated in the fact that we represent what’s going on by a wave function f which we cannot observe directly but whose absolute square represents a probability (or, to use precise statistical terminology, a probability density), which we can measure: P = |f(θ)|2 = f(θ)·f*(θ), with f* the complex conjugate of the complex number f. So… […] What? Well… Nothing. You’re right. This thought experiment describes a classical situation (like two billiard balls colliding) and then it doesn’t, because we cannot predict the outcome (i.e. we can’t say where the two billiard balls are going to end up: we can only describe the likely outcome in terms of probabilities Pa→1 and b→2 = |f(θ)|and Pa→2 and b→1 = |f(π–θ)|2. Of course, needless to say, the normalization condition should apply: if we add all probabilities over all angles, then we should get 1, we can write: ∫|f(θ)|2dθ = ∫f(θ)·f*(θ)dθ = 1. So that’s it, then?

No. Let this sink in for a while. I’ll come back to it. Let me first make a bit of a detour to illustrate what this thought experiment is supposed to yield, and that’s a more intuitive explanation of Bose-Einstein statistics and Fermi-Dirac statistics, which we’ll get out of the experiment above if we repeat it using identical particles. So we’ll introduce the terms Bose-Einstein statistics and Fermi-Dirac statistics. Hence, there should also be some term for the reference situation described above, i.e a situation in which we non-identical particles are ‘interacting’, so to say, but then with no interference between their wave functions. So, when everything is said and done, it’s a term we should associate with classical mechanics. It’s called Maxwell-Boltzmann statistics.

Huh? Why would we need ‘statistics’ here? Well… We can imagine many particles engaging like this–just colliding elastically and, thereby, interacting in a classical sense, even if we don’t know where exactly they’re going to end up, because of uncertainties in initial positions and what have you. In fact, you already know what this is about: it’s the behavior of particles as described by the kinetic theory of gases (often referred to as statististical mechanics) which, among other things, yields a very elegant function for the distribution of the velocities of gas molecules, as shown below for various gases (helium, neon, argon and xenon) at one specific temperature (25º C), i.e. the graph on the left-hand side, or for the same gas (oxygen) at different temperatures (–100º C, 20º C and 600º C), i.e. the graph on the right-hand side.

Now, all these density functions and what have you are, indeed, referred to as Maxwell-Boltzmann statistics, by physicists and mathematicians that is (you know they always need some special term in order to make sure other people (i.e. people like you and me, I guess) have trouble understanding them).

700px-MaxwellBoltzmann-en 800px-Maxwell-Boltzmann_distribution_1

In fact, we get the same density function for other properties of the molecules, such as their momentum and their total energy. It’s worth elaborating on this, I think, because I’ll later compare with Bose-Einstein and Fermi-Dirac statistics.

Maxwell-Boltzmann statistics

Kinetic gas theory yields a very simple and beautiful theorem. It’s the following: in a gas that’s in thermal equilibrium (or just in equilibrium, if you want), the probability (P) of finding a molecule with energy E is proportional to e–E/kT, so we have:

P ∝ e–E/kT

Now that’s a simple function, you may think. If we treat E as just a continuous variable, and T as some constant indeed – hence, if we just treat (the probability) P as a function of (the energy) E – then we get a function like the one below (with the blue, red and green using three different values for T).


So how do we relate that to the nice bell-shaped curves above? The very simple graphs above seem to indicate the probability is greatest for E = 0, and then just goes down, instead of going up initially to reach some maximum around some average value and then drop down again. Well… The fallacy here, of course, is that the constant of proportionality is itself dependent on the temperature. To be precise, the probability density function for velocities is given by:

Boltzmann distribution

The function for energy is similar. To be precise, we have the following function:

Boltzmann distribution-energy

This (and the velocity function too) is a so-called chi-squared distribution, and ϵ is the energy per degree of freedom in the system. Now these functions will give you such nice bell-shaped curves, and so all is alright. In any case, don’t worry too much about it. I have to get back to that story of the two particles and the two detectors.

However, before I do so, let me jot down two (or three) more formulas. The first one is the formula for the expected number 〈Ni〉 of particles occupying energy level ε(and the brackets here, 〈Ni〉, have nothing to do with the bra-ket notation mentioned above: it’s just a general notation for some expected value):

Boltzmann distribution-no of particlesThis formula has the same shape as the ones above but we brought the exponential function down, into the denominator, so the minus sign disappears. And then we also simplified it by introducing that gi factor, which I won’t explain here, because the only reason why I wanted to jot this down is to allow you to compare this formula with the equivalent formula when (a) Fermi-Dirac and (b) Bose-Einstein statistics apply:

B-E and F-D distribution-no of particles

Do you see the difference? The only change in the formula is the ±1 term in the denominator: we have a minus one (–1) for Fermi-Dirac statistics and a plus one (+1) for Bose-Einstein statistics indeed. That’s all. That’s the difference with Maxwell-Boltzmann statistics.

Huh? Yes. Think about it, but don’t worry too much. Just make a mental note of it, as it will be handy when you’d be exploring related articles. [And, of course, please don’t think I am bagatellizing the difference between Maxwell-Boltzmann, Bose-Einstein and Fermi-Dirac statistics here: that ±1 term in the denominator is, obviously, a very important difference, as evidenced by the consequences of formulas like the one above: just think about the crowding-in effect in lasers as opposed to the Pauli exclusion principle, for example. :-)]

Setting up the experiment (continued)

Let’s get back to our experiment. As mentioned above, we don’t really need probability amplitudes in the classical world: ordinary probabilities, taking into account uncertainties about initial conditions only, will do. Indeed, there’s a limit to the precision with which we can measure the position in space and time of any particle in the classical world as well and, hence, we’d expect some randomness (as captured in the scattering phenomenon) but, as mentioned above, ordinary probabilities would do to capture that. Nevertheless, we did associate probability amplitudes with the events described above in order to illustrate the difference with the quantum-mechanical world. More specifically, we distinguished:

  1. Situation (a): particle a goes to detector 1 and b goes to 2, versus
  2. Situation (b): particle a goes to 2 and b goes to 1.

In our bra-ket notation:

  1. 〈1|a〉〈2|b〉 = f(θ), and
  2. 〈1|b〉〈2|a〉 = f(π–θ).

The f(θ) function is a quantum-mechanical wave function. As mentioned above, while we’d expect to see some space (x) and time (t) variables in it, these are, apparently, already captured by the θ variable. What about f(π–θ)? Well… As mentioned above also, that’s just the same function as f(θ) but using the angle π–θ as the argument. So, the following remark is probably too trivial to note but let me do it anyway (to make sure you understand what we’re modeling here really): while it’s the same function f, the values f(θ) and f(π–θ) are, of course, not necessarily equal and, hence, the corresponding probabilities are also not necessarily the same. Indeed, some angles of scattering may be more likely than others. However, note that we assume that the function f itself is  exactly the same for the two situations (a) and (b), as evidenced by that normalization condition we assume to be respected: if we add all probabilities over all angles, then we should get 1, so ∫|f(θ)|2dθ = ∫f(θ)·f*(θ)dθ = 1.

So far so good, you’ll say. However, let me ask the same critical question once again: why would we use the same wave function f for the second situation? 

Huh? You’ll say: why wouldn’t we? Well… Think about it. Again, how do we find that f(θ) function? The assumption here is that we just do the experiment a zillion times while varying the angle θ and, hence, that we’ll find some average corresponding to P(θ), i.e. the probability. Now, the next step then is to equate that average value to |f(θ)|obviously, because we have this quantum-mechanical theory saying probabilities are the absolute square of probability amplitudes. And,  so… Well… Yes. We then just take the square root of the P function to find the f(θ) function, isn’t it?

Well… No. That’s where Feynman is not very accurate when it comes to spelling out all of the assumptions underpinning this thought experiment. We should obviously watch out here, as there’s all kinds of complications when you do something like that. To a large extent (perhaps all of it), the complications are mathematical only.

First, note that any number (real or complex, but note that |f(θ)|2 is a real number) has two distinct real square roots: a positive and a negative one: x = ± √x2. Secondly, we should also note that, if f(θ) is a regular complex-valued wave function of x and t and θ (and with ‘regular’, we mean, of course, that’s it’s some solution to a Schrödinger (or Schrödinger-like) equation), then we can multiply it with some random factor shifting its phase Θ (usually written as Θ = kx–ωt+α) and the square of its absolute value (i.e. its squared norm) will still yield the same value. In mathematical terms, such factor is just a complex number with a modulus (or length or norm–whatever terminology you prefer) equal to one, which we can write as a complex exponential: eiα, for example. So we should note that, from a mathematical point of view, any function eiαf(θ) will yield the same probabilities as f(θ). Indeed,

|f(θ)|= |eiαf(θ)|= (|eiα||f(θ)|)= |eiα|2|f(θ)|= 12|f(θ)|2

Likewise, while we assume that this function f(π–θ) is the same function f as that f(θ) function, from a mathematical point of view, the function eiβf(π–θ) would do just as well, because its absolute square yields the very same (real) probability |f(π–θ)|2. So the question as to what wave function we should take for the probability amplitude is not as easy to answer as you may think. Huh? So what function should we take then? Well… We don’t know. Fortunately, it doesn’t matter, for non-identical particles that is. Indeed, when analyzing the scattering of non-identical particles, we’re interested in the probabilities only and we can calculate the total probability of particle a ending up in detector 1 or 2 (and, hence, particle b ending up in detector 2 or 1) as the following sum:

|eiαf(θ)|2 +|eiβf(π–θ)|= |f(θ)|2 +|f(π–θ)|2.

In other words, for non-identical particles, these phase factors (eiα or eiβ) don’t matter and we can just forget about them.

However, and that’s the crux of the matter really, we should mention them, of course, in case we’d have to add the probability amplitudeswhich is exactly what we’ll have to do when we’re looking at identical particles, of course. In fact, in that case (i.e. when these phase factors eiα and eiβ will actually matter), you should note that what matters really is the phase difference, so we could replace α and β with some δ (which is what we’ll do below).

However, let’s not put the cart before the horse and conclude our analysis of what’s going on when we’re considering non-identical parties: in that case, this phase difference doesn’t matter. And the remark about the positive and negative square root doesn’t matter either. In fact, if you want, you can subsume it under the phase difference story by writing eiα as eiα = ± 1. To be more explicit: we could say that –f(θ) is the probability amplitude, as |–f(θ)|is also equal to that very same real number |f(θ)|2. OK. Done.

Bose-Einstein and Fermi-Dirac statistics

As I mentioned above, the story becomes an entirely different one when we’re doing the same experiment with identical particles. At this point, Feynman’s argument becomes rather fuzzy and, in my humble opinion, that’s because he refused to be very explicit about all of those implicit assumptions I mentioned above. What I can make of it, is the following:

1. We know that we’ll have to add probability amplitudes, instead of probabilities, because we’re talking one event that can happen in two indistinguishable ways. Indeed, for non-identical particles, we can, in principle (and in practice) distinguish situation (a) and (b) – and so that’s why we only have to add some real-valued numbers representing probabilities – but so we cannot do do that for identical particles.

2. Situation (a) is still being described by some probability amplitude f(θ). We don’t know what function exactly, but we assume there is some unique wave function f(θ) out there that accurately describes the probability amplitude of particle a going to 1 (and, hence, particle b going to 2), even if we can’t tell which is a and which is b. What about the phase factor? Well… We just assume we’ve chosen our t such that α = 0. In short, the assumption is that situation (a) is represented by some probability amplitude (or wave function, if you prefer that term) f(θ).

3. However, a (or some) particle (i.e. particle a or particle b) ending up in a (some) detector (i.e. detector 1 or detector 2) may come about in two ways that cannot be distinguished one from the other. One is the way described above, by that wave function f(θ). The other way is by exchanging the role of the two particles. Now, it would seem logical to associate the amplitude f(π–θ) with the second way. But we’re in the quantum-mechanical world now. There’s uncertainty, in position, in momentum, in energy, in time, whatever. So we can’t be sure about the phase. That being said, the wave function will still have the same functional form, we must assume, as it should yield the same probability when squaring. To account for that, we will allow for a phase factor, and we know it will be important when adding the amplitudes. So, while the probability for the second way (i.e. the square of its absolute value) should be the same, its probability amplitude does not necessarily have to be the same: we have to allow for positive and negative roots or, more generally, a possible phase shift. Hence, we’ll write the probability amplitude as eiδf(π–θ) for the second way. [Why do I use δ instead of β? Well… Again: note that it’s the phase difference that matters. From a mathematical point of view, it’s the same as inserting an eiβ factor: δ can take on any value.]

4. Now it’s time for the Big Trick. Nature doesn’t matter about our labeling of particles. If we have to multiply the wave function (i.e. f(π–θ), or f(θ)–it’s the same: we’re talking a complex-valued function of some variable (i.e. the angle θ) here) with a phase factor eiδ when exchanging the roles of the particles (or, what amounts to the same, exchanging the role of the detectors), we should get back to our point of departure (i.e. no exchange of particles, or detectors) when doing that two times in a row, isn’t it? So we exchange the role of particle a and b in this analysis (or the role of the detectors), and then we’d exchange their roles once again, then there’s no exchange of roles really and we’re back at the original situation. So we must have eiδeiδf(θ) = f(θ) (and eiδeiδf(π–θ) = f(π–θ) of course, which is exactly the same statement from a mathematical point of view).

5. However, that means (eiδ)= +1, which, in turn, implies that eiδ is plus or minus one: eiδ = ± 1. So that means the phase difference δ must be equal to 0 or π (or –π, which is the same as +π).

In practical terms, that means we have two ways of combining probability amplitudes for identical particles: we either add them or, else, we subtract them. Both cases exist in reality, and lead to the dichotomy between Bose and Fermi particles:

  1. For Bose particles, we find the total probability amplitude for this scattering event by adding the two individual amplitudes: f(θ) + f(π–θ).
  2. For Fermi particles, we find the total probability amplitude for this scattering event by subtracting the two individual amplitudes: f(θ) – f(π–θ).

As compared to the probability for non-identical particles which, you’ll remember, was equal to |f(θ)|2 +|f(π–θ)|2, we have the following Bose-Einstein and Fermi-Dirac statistics:

  1. For Bose particles: the combined probability is equal to |f(θ) + f(π–θ)|2. For example, if θ is 90°, then we have a scattering probability that is exactly twice the probability for non-identical particles. Indeed, if θ is 90°, then f(θ) = f(π–θ), and then we have |f(π/2) + f(π/2)|2 = |2f(π/2)|2 = 4|f(π/2)|2. Now, that’s two times |f(π/2)|2 +|f(π/2)|2 = 2|f(π/2)|2 indeed.
  2. For Fermi particles (e.g. electrons), we have a combined probability equal to |f(θ) – f(π–θ)|2. Again, if θ is 90°, f(θ) = f(π–θ), and so it would mean that we have a combined probability which is equal to zero ! Now, that‘s a strange result, isn’t it? It is. Fortunately, the strange result has to be modified because electrons will also have spin and, hence, in half of the cases, the two electrons will actually not be identical but have opposite spin. That changes the analysis substantially (see Feynman’s Lectures, III-3-12). To be precise, if we take the spin factor into, we’ll find a total probability (for θ = 90°) equal to |f(π/2)|2, so that’s half of the probability for non-identical particles.

Hmm… You’ll say: Now that was a complicated story! I fully agree. Frankly, I must admit I feel like I still don’t quite ‘get‘ the story with that phase shift eiδ, in an intuitive way that is (and so that’s the reason for going through the trouble of writing out this post). While I think it makes somewhat more sense now (I mean, more than when I wrote a post on this in March), I still feel I’ve only brought some of the implicit assumptions to the fore. In essence, what we’ve got here is a mathematical dichotomy (or a mathematical possibility if you want) corresponding to what turns out to be an actual dichotomy in Nature: in quantum-mechanics, particles are either bosons or fermions. There is no Third Way, in quantum-mechanics that is (there is a Third Way in reality, of course: that’s the classical world!).

I guess it will become more obvious as I’ll get somewhat more acquainted with the real arithmetic involved in quantum-mechanical calculations over the coming weeks. In short, I’ve analyzed this thing over and over again, but it’s still not quite clear me. I guess I should just move on and accept that:

  1. This explanation ‘explains’ the experimental evidence, and that’s different probabilities for identical particles as compared to non-identical particles.
  2. This explanation ‘complements’ analyses such as that 1916 analysis of blackbody radiation by Einstein (see my post on that), which approaches interference from an angle that’s somewhat more intuitive.

A numerical example

I’ve learned that, when some theoretical piece feels hard to read, an old-fashioned numerical example often helps. So let’s try one here. We can experiment with many functional forms but let’s keep things simple. From the illustration (which I copy below for your convenience), that angle θ can take any value between −π and +π, so you shouldn’t think detector 1 can only be ‘north’ of the collision spot: it can be anywhere.

Identical particles-a

Now, it may or may not make sense (and please work out other examples than this one here), but let’s assume particle a and b are more likely to go in a line that’s more or less straight. In other words, the assumption is that both particles deflect each other only slightly, or even not at all. After all, we’re talking ‘point-like’ particles here and so, even when we try hard, it’s hard to make them collide really.

That would amount to a typical bell-shaped curve for that probability density curve P(θ): one like the blue curve below. That one shows that the probability of particle a and b just bouncing back (i.e. θ ≈ ±π) is (close to) zero, while it’s highest for θ ≈ 0, and some intermediate value for anything angle in-between. The red curve shows P(π–θ), which can be found by mirroring the P(θ) around the vertical axis, which yields the same function because the function is symmetrical: P(θ) = P(–θ), and then shifting it by adding the vertical distance π. It should: it’s the second possibility, remember? Particle a ending up in detector 2. But detector 2 is positioned at the angle π–θ and, hence, if π–θ is close to ±π (so if θ ≈ 0), that means particle 1 is basically bouncing back also, which we said is unlikely. On the other hand, if detector 2 is positioned at an angle π–θ ≈ 0, then we have the highest probability of particle a going right to it. In short, the red curve makes sense too, I would think. [But do think about yourself: you’re the ultimate judge!]

Example - graph

The harder question, of course, concerns the choice of some wave function f(θ) to match those P curves above. Remember that these probability densities P are real numbers and any real number is the absolute square (aka the squared norm) of an infinite number of complex numbers! So we’ve got l’embarras du choix, as they say in French. So… What do to? Well… Let’s keep things simple and stupid and choose a real-valued wave function f(θ), such as the blue function below. Huh? You’ll wonder if that’s legitimate. Frankly, I am not 100% sure, but why not? The blue f(θ) function will give you the blue P(θ) above, so why not go along with it? It’s based on a cosine function but it’s only half of a full cycle. Why? Not sure. I am just trying to match some sinusoidal function with the probability density function here, so… Well… Let’s take the next step.

Example 2

The red graph above is the associated f(π–θ) function. Could we choose another one? No. There’s no freedom of choice here, I am afraid: if we choose a functional form for f(θ), then our f(π–θ) function is fixed too. So it is what it is: negative between –π and 0, and positive between 0 and +π and 0. Now that is definitely not good, because f(π–θ) for θ = –π is not equal to f(π–θ) for θ = +π: they’re opposite values. That’s nonsensical, isn’t it? Both the f(θ) and the f(π–θ) should be something cyclical… But, again, let’s go along with it as for now: note that the green horizontal line is the sum of the squared (absolute) values of f(θ) and f(π–θ), and note that it’s some constant.

Now, that’s a funny result, because I assumed both particles were more likely to go in some straight line, rather than recoil with some sharp angle θ. It again indicates I must be doing something wrong here. However, the important thing for me here is to compare with the Bose-Einstein and Fermi-Dirac statistics. What’s the total probability there if we take that blue f(θ) function? Well… That’s what’s shown below. The horizontal blue line is the same as the green line in the graph above: a constant probability for some particle (a or b) ending up in some detector (1 or 2). Note that the surface, when added, of the two rectangles above the x-axis (i.e. the θ-axis) should add up to 1. The red graph gives the probability when the experiment is carried out for (identical) bosons (or Bose particles as I like to call them). It’s weird: it makes sense from a mathematical point of view (the surface under the curve adds up to the same surface under the blue line, so it adds up to 1) but, from a physics point of view, what does this mean? A maximum at θ = π/2 and a minimum at θ = –π/2? Likewise, how to interpret the result for fermions?


Is this OK? Well… To some extent, I guess. It surely matches the theoretical results I mentioned above: we have twice the probability for bosons for θ = 90° (red curve), and a probability equal to zero for the same angle when we’re talking fermions (green curve). Still, this numerical example triggers more questions than it answers. Indeed, my starting hypothesis was very symmetrical: both particle a and b are likely to go in a straight line, rather than being deflected in some sharp(er) angle. Now, while that hypothesis gave a somewhat unusual but still understandable probability density function in the classical world (for non-identical particles, we got a constant for P(θ) + P(π–θ)), we get this weird asymmetry in the quantum-mechanical world: we’re much more likely to catch boson in a detector ‘north’ of the line of firing than ‘south’ of it, and vice versa for fermions.

That’s weird, to say the least. So let’s go back to the drawing board and take another function for f(θ) and, hence, for f(π–θ). This time, the two graphs below assume that (i) f(θ) and f(π–θ) have a real as well as an imaginary part and (ii) that they go through a full cycle, instead of a half-cycle only. This is done by equating the real part of the two functions with cos(θ) and cos(π–θ) respectively, and their imaginary part with sin(θ) and sin(π–θ) respectively. [Note that we conveniently forget about the normalization condition here.]


What do we see? Well… The imaginary part of f(θ) and f(π–θ) is the same, because sin(π–θ) = sin(θ). We also see that the real part of f(θ) and f(π–θ) are the same except for a phase difference equal to π: cos(π–θ) = cos[–(θ–π)] = cos(θ–π). More importantly, we see that the absolute square of both f(θ) and f(π–θ) yields the same constant, and so their sum P = |f(θ)|2 +|f(π–θ)|= 2|f(θ)|2 = 2|f(π–θ)|= 2P(θ) = 2P(π–θ). So that’s another constant. That’s actually OK because, this time, I did not favor one angle over the other (so I did not assume both particles were more likely to go in some straight line rather than recoil).

Now, how does this compare to Bose-Einstein and Fermi-Dirac statistics? That’s shown below. For Bose-Einstein (left-hand side), the sum of the real parts of f(θ) and f(π–θ) yields zero (blue line), while the sum of their imaginary parts (i.e. the red graph) yields a sine-like function but it has double the amplitude of sin(θ). That’s logical: sin(θ) + sin(π–θ) = 2sin(θ). The green curve is the more interesting one, because that’s the total probability we’re looking for. It has two maxima now, at +π/2 and at –π/2. That’s good, as it does away with that ‘weird asymmetry’ we got when we used a ‘half-cycle’ f(θ) function.

B-E and F-D

Likewise, the Fermi-Dirac probability density function looks good as well (right-hand side). We have the imaginary parts of f(θ) and f(π–θ) that ‘add’ to zero: sin(θ) – sin(π–θ) = 0 (I put ‘add’ between brackets because, with Fermi-Dirac, we’re subtracting of course), while the real parts ‘add’ up to a double cosine function: cos(θ) – cos(π–θ) = cos(θ) – [–cos(θ)] = 2cos(θ). We now get a minimum at +π/2 and at –π/2, which is also in line with the general result we’d expect. The (final) graph below summarizes our findings. It gives the three ‘types’ of probabilities, i.e. the probability of finding some particle in some detector as a function of the angle –π < θ < +π using:

  1. Maxwell-Boltzmann statistics: that’s the green constant (non-identical particles, and probability does not vary with the angle θ).
  2. Bose-Einstein: that’s the blue graph below. It has two maxima, at +π/2 and at –π/2, and two minima, at 0 and at ±π (+π and –π are the same angle obviously), with the maxima equal to twice the value we get under Maxwell-Boltzmann statistics.
  3. Finally, the red graph gives the Fermi-Dirac probabilities. Also two maxima and minima, but at different places: the maxima are at θ = 0 and  θ = ±π, while the minima are at at +π/2 and at –π/2.


Funny, isn’t it? These probability density functions are all well-behaved, in the sense that they add up to the same total (which should be 1 when applying the normalization condition). Indeed, the surfaces under the green, blue and red lines are obviously the same. But so we get these weird fluctuations for Bose-Einstein and Fermi-Dirac statistics, favoring two specific angles over all others, while there’s no such favoritism when the experiment involves non-identical particles. This, of course, just follows from our assumption concerning f(θ). What if we double the frequency of f(θ), i.e. from one cycle to two cycles between –π and +π? Well… Just try it: take f(θ) = cos(2·θ) + isin(2·θ) and do the calculations. You should get the following probability graphs: we have the same green line for non-identical particles, but interference with four maxima (and four minima) for the Bose-Einstein and Fermi-Dirac probabilities.

summary 2

Again… Funny, isn’t it? So… What to make of this? Frankly, I don’t know. But one last graph makes for an interesting observation: if the angular frequency of f(θ) takes on larger and larger values, the Bose-Einstein and Fermi-Dirac probability density functions also start oscillating wildly. For example, the graphs below are based on a f(θ) function equal to f(θ) = cos(25·θ) + isin(25·θ). The explosion of color hurts the eye, doesn’t it? :-) But, apart from that, do you now see why physicists say that, at high frequencies, the interference pattern gets smeared out? Indeed, if we move the detector just a little bit (i.e. we change the angle θ just a little bit) in the example below, we hit a maximum instead of a minimum, and vice versa. In short, the granularity may be such that we can only measure that green line, in which case we’d think we’re dealing with Maxwell-Boltzmann statistics, while the underlying reality may be different.

summary 4

That explains another quote in Feynman’s famous introduction to quantum mechanics (Lectures, Vol. III, Chapter 1): “If the motion of all matter—as well as electrons—must be described in terms of waves, what about the bullets in our first experiment? Why didn’t we see an interference pattern there? It turns out that for the bullets the wavelengths were so tiny that the interference patterns became very fine. So fine, in fact, that with any detector of finite size one could not distinguish the separate maxima and minima. What we saw was only a kind of average, which is the classical curve. In the Figure below, we have tried to indicate schematically what happens with large-scale objects. Part (a) of the figure shows the probability distribution one might predict for bullets, using quantum mechanics. The rapid wiggles are supposed to represent the interference pattern one gets for waves of very short wavelength. Any physical detector, however, straddles several wiggles of the probability curve, so that the measurements show the smooth curve drawn in part (b) of the figure.”

Interference with bullets

But that should really conclude this post. It has become way too long already. One final remark, though: the ‘smearing out’ effect also explains why those three equations for 〈Ni〉 sometimes do amount to more or less the same thing: the Bose-Einstein and Fermi-Dirac formulas may approximate the Maxwell-Boltzmann equation. In that case, the ±1 term in the denominator does not make much of a difference. As we said a couple of times already, it all depends on scale. :-)

Concluding remarks

1. The best I can do in terms of interpreting the above, is to tell myself that we cannot fully ‘fix’ the functional form of the wave function for the second or ‘other’ way the event can happen if we’re ‘fixing’ the functional form for the first of the two possibilities. We have to allow for a phase shift eiδ indeed, which incorporates all kinds of considerations of uncertainty in regard to both time and position and, hence, in regard to energy and momentum also (using both the ΔEΔt = ħ/2 and ΔxΔp = ħ/2 expressions)–I assume (but that’s just a gut instinct). And then the symmetry of the situation then implies eiδ can only take on one of two possible values: –1 or +1 which, in turn, implies that δ is equal to 0 or π.

2. For those who’d think I am basically doing nothing but re-write a chapter out of Feynman’s Lectures, I’d refute that. One point to note is that Feynman doesn’t seem to accept that we should introduce a phase factor in the analysis for non-identical particles as well. To be specific: just switching the detectors (instead of the particles) also implies that one should allow for the mathematical possibility of the phase of that f function being shifted by some random factor δ. The only difference with the quantum-mechanical analysis (i.e. the analysis for identical particles) is that the phase factor doesn’t make a difference as to the final result, because we’re not adding amplitudes but their absolute squares and, hence, a phase shift doesn’t matter.

3. I think all of the reasoning above makes not only for a very fine but also a very beautiful theoretical argument, even I feel like I don’t fully ‘understand’ it, in an intuitive way that is. I hope this post has made you think. Isn’t it wonderful to see that the theoretical or mathematical possibilities of the model actually correspond to realities, both in the classical as well as in the quantum-mechanical world? In fact, I can imagine that most physicists and mathematicians would shrug this whole reflection off like… Well… Like: “Of course! It’s obvious, isn’t it?” I don’t think it’s obvious. I think it’s deep. I would even qualify it as mysterious, and surely as beautiful. :-)

Relativity paradoxes

My son, who’s fifteen, said he liked my post on lasers. That’s good, because I effectively wrote it thinking of him as part of the audience. He also said it stimulated him to considering taking on studies in engineering later. That’s great. I hope he does, so he doesn’t have to go through what I am going through right now. Indeed, when everything is said and done, you do want your kids to take on as much math and science they can handle when they’re young because, afterwards, it’s tough to catch up.

Now, I struggled quite a bit with bringing relativity into the picture while pondering the ‘essence’ of a photon in my previous post. Hence, I’d thought it would be good to return to the topic of (special) relativity and write another post to (1) refresh my knowledge on the topic and (2) try to stimulate him even more. Indeed, regardless of whether one does or doesn’t understand any of what I write below here, relativity theory sounds fascinating, doesn’t it? :-) So, this post intends to present, in a nutshell, what (special) relativity theory is all about.

What relativity does

The thing that’s best known about Einstein’s (special) theory of relativity is the following: the mass of an object, as measured by the (inertial) observer, increases with its speed. The formula for this is m = γm0, and the γ factor here is the so-called Lorentz factor: γ = (1–u2/c2)–1/2. Let me give you that diagram of the Lorentz factor once again, which shows that very considerable speeds are required before relativity effects kick in. However, when they do, they kick in with a vengeance, it seems, which makes c the limit !


Now, you may or may not be familiar with two other things that come out of relativity theory as well:

  1. The first is length contraction: objects are measured to be shortened in the direction of motion with respect to the (inertial) observer. The formula to be used incorporates the reciprocal of the Lorentz factor: L = (1/γ)L0. For example, a stick of one meter in a space ship moving at a velocity v = 0.6c will appear to be only 80 cm to the external/inertial observer seeing it whizz past… That is if he can see anything at all of course: he’d have to take like a photo-finish picture as it zooms past ! :-)
  2. The second is time dilation, which is also rather well known – just like the mass increase effect – because of the so-called twin paradox: time will appear to be slower in that space ship and, hence, if you send one of two twins away on a space journey, traveling at relativistic speeds (i.e. a velocity sufficiently close to to make the relativistic effect significant), he will come back younger than his brother. The formula here is equally simple: t = γt0. Hence, one second in the space ship will be measured as 1.25 seconds by the external observer. Hence, the moving clock will appear to run slower – again: to the external (inertial) observer that is.

These simple rules, which comes out of Einsteins’ special relativity theory, give rise to all kinds of paradoxes. You know what a paradox is: a paradox (in physics) is something that, at first sight, does not make sense but that, when the issue is examined more in detail, does get resolved and actually helps us to better understand what’s going on.

You know the twin paradox already: only of the two twins can be the younger (or the older) when they meet again. However, because one can also say it’s the guy staying on Earth that’s moving (and, hence, is ‘traveling’ at relativistic speed) – so then the reference frame of the guy in the spaceship is the so-called inertial frame, one can say the guy who stayed behind (on Earth) should be the youngest when they meet after the journey. I am not ashamed to say that this actually is a paradox that is difficult to understand. So let me first start with another.

The ladder paradox

While the twin paradox examines the time dilation effect, the ladder paradox examines the length contraction effect. The situation is similar as the one for the twin paradox. However, because we don’t have accelerating and decelerating rockets and all that (cf. the twin paradox), I find this paradox not only more straightforward but also more amusing. Look at the left-hand side first. We have a garage which has both a front and back door. A ladder passes through it, and it seems to fit in the garage as you can see. Now, that may or may not be because of the length contraction effect, of course. Whatever. In any case, it seems we can (very) quickly close both doors of the garage to prove that it fits. Now look at the right-hand side. Here we are moving the garage over the ladder (I know, not very convenient, but just go along with the story). So now the ladder frame is the inertial reference frame and the garage is the moving frame. So, according to that length contraction ‘law’, it’s the garage that gets shorter and it turns out the ladder doesn’t fit any more. Hence, the paradox: does the ladder fit or not? The answer must be unambiguous, no? Yes or no. So what is it?

Ladder paradox 1Ladder paradox 2

The paradox pushes us to consider all kinds of important questions which are usually just glossed over. How does we decide if the ladder fits? Well… By closing both the front and back door of course, you’ll say. But then you mean closing them simultaneously, and absolute simultaneity does not exist: two events that appear to happen at the same time in one reference frame may not happen at the same time in another. Only the space-time interval between two events is absolute, in the sense that it’s the same in whatever reference frame we’re measuring it, not the individual space and individual time intervals. Hence, if you’re in the garage shutting those doors at the same time, then that’s your time, but if I am moving with the ladder, I will not see those two doors shutting as something that’s simultaneous. More formally, and using the definition of space-time intervals (and assuming only one space dimension x), we have:

cΔt– Δx= cΔt’– Δx’2.

In this equation, we’ll take the x and t coordinates to be those of the inertial frame (so that’s the garage on the left-hand side), while the the primed coordinates (x’ and t’) are the coordinates as measured in the other reference frame, i.e. the reference frame that moves from the perspective of the inertial frame. Indeed, note that we cannot say that one reference frame moves while the other stands still as we we’re talking relative speeds here: one reference frame moves in respect to the other, and vice versa. In any case, the equation with the space-time intervals above implies that:

c(ΔtΔt’2) – (Δx– Δx’2) = 0

However, that does not imply that the two terms on the left-hand side of the above equation are zero individually. In fact, they aren’t. Hence, while it must be true that c(ΔtΔt’2) = Δx– Δx’2, we have:

ΔtΔt’≠ 0 and Δx– Δx’2 ≠ 0 or ΔtΔt’and Δx≠ Δx’2

To put it simply, if you’re in the garage, and I am moving with the ladder (we’re talking the left-hand side situation) now, you’ll claim that you were able to shut both doors momentarily, so that Δt= 0. I’ll say: bollocks! Which is rude. I should say: my Δt’is not equal to zero. Hence, from my point of view, I always saw one of the two doors open and, hence, I don’t think the ladder fits. Hence, what I am seeing, effectively, is the situation on the right-hand side: your garage looks too short for my ladder.

You’ll say: what is this? The ladder fits or it doesn’t, does it? The answer is: no. It is ambiguous. It does depend on your reference frame. It fits in your reference frame but it does not fit in mine. In order to get a non-ambiguous answer you have to stop moving, or I have to stop moving– whatever: the point is that we need to merge our reference frames.

Hence, paradox solved. In fact, now that I think of it, it’s kinda funny that we don’t have such paradoxes for the relativistic mass formula. No one seems to wonder about the apparent contradiction that, if you’re moving away from me, you look heavier than me but that, vice versa, I also look heavier to you. So we both look heavier as seen from our own respective reference frames. So who’s heavier then? Perhaps no one developed a paradox because it is kinda impolite to compare personal weights? :-)

Of course, I am joking, but think of it: it has to do with our preconceived notions of time and space. Things like inertia (mass is a measure for inertia) don’t grab our attention as much. In any case, now it’s time to discuss time dilation.

Oh ! And do think about that photo-finish picture ! It’s related to the problem of defining what constitutes a length really. :-)

The twin paradox

I find the twin paradox much more difficult to analyze, and I guess many people do because it’s the one that usually receives all of the attention. [Frankly, I hadn’t heard of this ladder paradox before I started studying physics.] Feynman hardly takes the time to look at it. He basically notes that the situation is not unlike an unstable particle traveling at relativistic speeds: when it does, it lasts (much) longer that its lifetime (measured in the inertial reference frame) suggests. Let me actually just quote Feynman’s account of it:

“Peter and Paul are supposed to be twins, born at the same time. When they are old enough to drive a space ship, Paul flies away at very high speed. Because Peter, who is left on the ground, sees Paul going so fast, all of Paul’s clocks appear to go slower, his heart beats go slower, his thoughts go slower, everything goes slower, from Peter’s point of view. Of course, Paul notices nothing unusual, but if he travels around and about for a while and then comes back, he will be younger than Peter, the man on the ground! That is actually right; it is one of the consequences of the theory of relativity which has been clearly demonstrated. Just as the mu-mesons last longer when they are moving, so also will Paul last longer when he is moving. This is called a “paradox” only by the people who believe that the principle of relativity means that all motion is relative; they say, “Heh, heh, heh, from the point of view of Paul, can’t we say that Peter was moving and should therefore appear to age more slowly? By symmetry, the only possible result is that both should be the same age when they meet.” But in order for them to come back together and make the comparison, Paul must either stop at the end of the trip and make a comparison of clocks or, more simply, he has to come back, and the one who comes back must be the man who was moving, and he knows this, because he had to turn around. When he turned around, all kinds of unusual things happened in his space ship—the rockets went off, things jammed up against one wall, and so on—while Peter felt nothing.

So the way to state the rule is to say that the man who has felt the accelerations, who has seen things fall against the walls, and so on, is the one who would be the younger; that is the difference between them in an “absolute” sense, and it is certainly correct. When we discussed the fact that moving mu-mesons live longer, we used as an example their straight-line motion in the atmosphere. But we can also make mu-mesons in a laboratory and cause them to go in a curve with a magnet, and even under this accelerated motion, they last exactly as much longer as they do when they are moving in a straight line. Although no one has arranged an experiment explicitly so that we can get rid of the paradox, one could compare a mu-meson which is left standing with one that had gone around a complete circle, and it would surely be found that the one that went around the circle lasted longer. Although we have not actually carried out an experiment using a complete circle, it is really not necessary, of course, because everything fits together all right. This may not satisfy those who insist that every single fact be demonstrated directly, but we confidently predict the result of the experiment in which Paul goes in a complete circle.”

[…] Well… I am not sure I am “among those who insist that every single fact be demonstrated directly”, but you’ll admit that Feynman is quite terse here (or more terse than usual, I should say). That being said, I understand why: the calculations involved in demonstrating that the paradox is what it is, i.e. an apparent contradiction only, are not straightforward. I’ve googled a bit but it’s all quite confusing. Good explanations usually involve the so-called Minkowski diagram, also known as the spacetime diagram. You’ve surely seen it before–when the light cone was being discussed and what it implies for the concepts of past, present and future. It’s a way to represent those spacetime intervals. The Minkowski diagram–from the perspective of the twin brother on Earth (hence, we only have unprimed coordinates x and (c)t)– is shown below. Don’t worry about those simultaneity planes as for now. Just try to understand the diagram. The twin brother that stays just moves along the vertical axis: x = 0. His space-traveling brother travels out to some point and then turns back, so he first travels northeast on this diagram and then takes a turn northwest, to meet up again with his brother on Earth.

485px-Twin_Paradox_Minkowski_DiagramThe point to note is that the twin brother is not traveling along one straight line, but along two. Hence, the argument that we can just as well say his frame of reference is inertial and that of his brother is the moving one is not correct. As Wikipedia notes (from which I got this diagram): “The trajectory of the ship is equally divided between two different inertial frames, while the Earth-based twin stays in the same inertial frame.”

Still, the situation is essentially symmetric and so we could draw a similar-looking spacetime diagram for the primed coordinates, i.e. x’ and ct’, and wonder what’s the difference. That’s where these planes of simultaneity come in. Look at the wonderful animation below: A, B, C are simultaneous events when I am standing still (v = 0). However, when I move at considerable speed (v = 0.3c), that’s no longer the case: it takes more time for news to reach me from ‘point’ A and, hence, assuming news travels at the speed of light, event A appears to happen later. Conversely, event C (in spacetime) appears to have happened before event B. Now that explains these blue so-called simultaneity planes on the diagram above: they’re the white lines traveling from the past to the future on the animation below, but for the trip out only (> 0). For the trip back, we have the red lines, which correspond to the v = –0.5c situation below. So that’s the return trip (< 0).

Relativity_of_Simultaneity_AnimationWhat you see is that, “during the U-turn, the plane of simultaneity jumps from blue to red and very quickly sweeps over a large segment of the world line of the Earth-based twin.” Hence, “when one transfers from the outgoing frame to the incoming frame there is a jump discontinuity in the age of the Earth-based twin.” [I took the quotes taken from Wikipedia here, where you can find the original references.] Now, you will say, that is also symmetric if we switch the reference frames. Yes… Except for the sign. So, yes, it is the traveling brother who effectively skips some time. Paradox solved.

Now… For some real fun…

Now, for some real fun, I’d like to ask you how the world would look like when you were traveling through it riding a photon. So… Think about it. Think hard. I didn’t google at first and I must admit the question really started wracking my brain. There are some many effects to take into account. One basic property, of course, must be that time stands still around you. You see the world as it was when you reached v = c. Well… Yes and no. The fact of the matter is that, because of all the relativistic effects (e.g. aberration, Doppler shift, intensity shifts,…), you actually don’t see a whole lot. One visualization of it (visual effects of relativistic speeds) seems to indicate that (most) science fiction movies actually present the correct picture (if the animation shows the correct visualization, that is): we’re staring into one bright flash of light ahead of us as we’re getting close to v = c. Interesting…

Finally, you should also try to find out what actually happens to the clocks during the deceleration and acceleration as the space ship of that twin brother turns. You’re going to find it fascinating. At the same time, the math behind is, quite simply, daunting and, hence, I won’t even try go into the math of this thing. :-)


So… Well… That’s it really. I now realize why I never quite got this as a kid. These paradoxes do require some deep thinking and imagination and, most of all, some tools that one just couldn’t find as easily as today.

The Web definitely does make it easier to study without the guidance of professors and the material environment of a university, although I don’t think it can be a substitute for discipline. When everything is said and done, it’s still hard work. Very hard work. But I hope you get there, Vincent ! :-) And please do look at that Youtube video by clicking the link above. :-)

Post scriptum: Because the resolution of the video above is quite low, I looked for others, for example one that describes the journey from the Sun to the Earth, which–as expected–takes about 8 minutes. While it has higher resolution, it is far less informative. I’ll let you google some more. Please tell me if you found something nice. :-)

The Complementarity Principle

Unlike what you might think when seeing the title of this post, it is not my intention to enter into philosophical discussions here: many authors have been writing about this ‘principle’, most of which–according to eminent physicists–don’t know what they are talking about. So I have no intention to make a fool of myself here too. However, what I do want to do here is explore, in an intuitive way, how the classical and quantum-mechanical explanations of the phenomenon of the diffraction of light are different from each other–and fundamentally so–while, necessarily, having to yield the same predictions. It is in that sense that the two explanations should be ‘complementary’.

The classical explanation

I’ve done a fairly complete analysis of the classical explanation in my posts on Diffraction and the Uncertainty Principle (20 and 21 September), so I won’t dwell on that here. Let me just repeat the basics. The model is based on the so-called Huygens-Fresnel Principle, according to which each point in the slit becomes a source of a secondary spherical wave. These waves then interfere, constructively or destructively, and, hence, by adding them, we get the form of the wave at each point of time and at each point in space behind the slit. The animation below illustrates the idea. However, note that the mathematical analysis does not assume that the point sources are neatly separated from each other: instead of only six point sources, we have an infinite number of them and, hence, adding up the waves amounts to solving some integral (which, as you know, is an infinite sum).


We know what we are supposed to get: a diffraction pattern. The intensity of the light on the screen at the other side depends on (1) the slit width (d), (2) the frequency of the light (λ), and (3) the angle of incidence (θ), as shown below.


One point to note is that we have smaller bumps left and right. We don’t get that if we’d treat the slit as a single point source only, like Feynman does when he discusses the double-slit experiment for (physical) waves. Indeed, look at the image below: each of the slits acts as one point source only and, hence, the intensity curves I1 and I2 do not show a diffraction pattern. They are just nice Gaussian “bell” curves, albeit somewhat adjusted because of the angle of incidence (we have two slits above and below the center, instead of just one on the normal itself). So we have an interference pattern on the screen and, now that we’re here, let me be clear on terminology: I am going along with the widespread definition of diffraction being a pattern created by one slit, and the definition of interference as a pattern created by two or more slits. I am noting this just to make sure there’s no confusion.

Water waves

That should be clear enough. Let’s move on the quantum-mechanical explanation.

The quantum-mechanical explanation

There are several formulations of quantum mechanics: you’ve heard about matrix mechanics and wave mechanics. Roughly speaking, in matrix mechanics “we interpret the physical properties of particles as matrices that evolve in time”, while the wave mechanics approach is primarily based on these complex-valued wave functions–one for each physical property (e.g. position, momentum, energy). Both approaches are mathematically equivalent.

There is also a third approach, which is referred to as the path integral formulation, which  “replaces the classical notion of a single, unique trajectory for a system with a sum, or functional integral, over an infinity of possible trajectories to compute an amplitude” (all definitions here were taken from Wikipedia). This approach is associated with Richard Feynman but can also be traced back to Paul Dirac, like most of the math involved in quantum mechanics, it seems. It’s this approach which I’ll try to explain–again, in an intuitive way only–in order to show the two explanations should effectively lead to the same predictions.

The key to understanding the path integral formulation is the assumption that a particle–and a ‘particle’ may refer to both bosons (e.g. photons) or fermions (e.g. electrons)–can follow any path from point A to B, as illustrated below. Each of these paths is associated with a (complex-valued) probability amplitude, and we have to add all these probability amplitudes to arrive at the probability amplitude for the particle to move from A to B.


You can find great animations illustrating what it’s all about in the relevant Wikipedia article but, because I can’t upload video here, I’ll just insert two illustrations from Feynman’s 1985 QED, in which he does what I try to do, and that is to approach the topic intuitively, i.e. without too much mathematical formalism. So probability amplitudes are just ‘arrows’ (with a length and a direction, just like a complex number or a vector), and finding the resultant or final arrow is a matter of just adding all the little arrows to arrive at one big arrow, which is the probability amplitude, which he denotes as P(A, B), as shown below.


This intuitive approach is great and actually goes a very long way in explaining complicated phenomena, such as iridescence for example (the wonderful patterns of color on an oil film!), or the partial reflection of light by glass (anything between 0 and 16%!). All his tricks make sense. For example, different frequencies are interpreted as slower or faster ‘stopwatches’ and, as such, they determine the final direction of the arrows which, in turn, explains why blue and red light are reflected differently. And so on and son. It all works. […] Up to a point.

Indeed, Feynman does get in trouble when trying to explain diffraction. I’ve reproduced his explanation below. The key to the argument is the following:

  1. If we have a slit that’s very wide, there are a lot of possible paths for the photon to take. However, most of these paths cancel each other out, and so that’s why the photon is likely to travel in a straight line. Let me quote Feynman: “When the gap between the blocks is wide enough to allow many neighboring paths to P and Q, the arrows for the paths to P add up (because all the paths to P take nearly the same time), while the paths to Q cancel out (because those paths have a sizable difference in time). So the photomultiplier at Q doesn’t click.” (QED, p.54)
  2. However, “when the gap is nearly closed and there are only a few neighboring paths, the arrows to Q also add up, because there is hardly any difference in time between them, either (see Fig. 34). Of course, both final arrows are small, so there’s not much light either way through such a small hole, but the detector at Q clicks almost as much as the one at P! So when you try to squeeze light too much to make sure it’s going only in a straight line, it refuses to cooperate and begins to spread out.” (QED, p. 55)

Many arrowsFew arrows

This explanation is as simple and intuitive as Feynman’s ‘explanation’ of diffraction using the Uncertainty Principle in his introductory chapter on quantum mechanics (Lectures, I-38-2), which is illustrated below. I won’t go into the detail (I’ve done that before) but you should note that, just like the explanation above, such explanations do not explain the secondary, tertiary etc bumps in the diffraction pattern.

Diffraction of electrons

So what’s wrong with these explanations? Nothing much. They’re simple and intuitive, but essentially incomplete, because they do not incorporate all of the math involved in interference. Incorporating the math means doing these integrals for

  1. Electromagnetic waves in classical mechanics: here we are talking ‘wave functions’ with some real-valued amplitude representing the strength of the electric and magnetic field; and
  2. Probability waves: these are complex-valued functions, with the complex-valued amplitude representing probability amplitudes.

The two should, obviously, yield the same result, but a detailed comparison between the approaches is quite complicated, it seems. Now, I’ve googled a lot of stuff, and I duly note that diffraction of electromagnetic waves (i.e. light) is conveniently analyzed by summing up complex-valued waves too, and, moreover, they’re of the same familiar type: ψ = Aei(kx–ωt). However, these analyses also duly note that it’s only the real part of the wave that has an actual physical interpretation, and that it’s only because working with natural exponentials (addition, multiplication, integration, derivation, etc) is much easier than working with sine and cosine waves that such complex-valued wave functions are used (also) in classical mechanics. In fact, note the fine print in Feynman’s illustration of interference of physical waves (Fig. 37-2): he calculates the intensities I1 and I2 by taking the square of the absolute amplitudes ĥ1 and ĥ2, and the hat indicates that we’re also talking some complex-valued wave function here.

Hence, we must be talking the same mathematical waves in both explanations, aren’t we? In other words, we should get the same psi functions ψ = Aei(kx–ωt) in both explanations, don’t we? Well… Maybe. But… Probably not. As far as I know–but I must be wrong–we cannot just re-normalize the E and B vectors in these electromagnetic waves in order to establish an equivalence with probability waves. I haven’t seen that being done (but I readily admit I still have a lot of reading to do) and so I must assume it’s not very clear-cut at all.

So what? Well… I don’t know. So far, I did not find a ‘nice’ or ‘intuitive’ explanation of a quantum-mechanical approach to the phenomenon of diffraction yielding the same grand diffraction equation, referred to as the Fresnel-Kirchoff diffraction formula (see below), or one of its more comprehensible (because simplified) representations, such as the Fraunhofer diffraction formula, or the even easier formula which I used in my own post (you can google them: they’re somewhat less monstrous and–importantly–they work with real numbers only, which makes them easier to understand).

Kirchoff formula[…] That looks pretty daunting, isn’t it? You may start to understand it a bit better by noting that (n, r) and (n, s) are angles, so that’s OK in a cosine function. The other variables also have fairly standard interpretations, as shown below, but… Admit it: ‘easy’ is something else, isn’t it?


So… Where are we here? Well… As said, I trust that both explanations are mathematically equivalent – just like matrix and wave mechanics :-) –and, hence, that a quantum-mechanical analysis will indeed yield the same formula. However, I think I’ll only understand physics truly if I’ve gone through all of the motions here.

Well then… I guess that should be some kind of personal benchmark that should guide me on this journey, isn’t it? :-) I’ll keep you posted.

Post scriptum: To be fair to Feynman, and demonstrating his talent as a teacher once again, he actually acknowledges that the double-slit thought experiment uses simplified assumptions that do not include diffraction effects when the electrons go through the slit(s). He does so, however, only in one of the first chapters of Vol. III of the Lectures, where he comes back to the experiment to further discuss the first principles of quantum mechanics. I’ll just quote him: “Incidentally, we are going to suppose that the holes 1 and 2 are small enough that when we say an electron goes through the hole, we don’t have to discuss which part of the hole. We could, of course, split each hole into pieces with a certain amplitude that the electron goes to the top of the hole and the bottom of the hole and so on. We will suppose that the hole is small enough so that we don’t have to worry about this detail. That is part of the roughness involved; the matter can be made more precise, but we don’t want to do so at this stage.” So here he acknowledges that he omitted the intricacies of diffraction. I noted this only later. Sorry.

A Royal Road to quantum physics?

It is said that, when Ptolemy asked Euclid to quickly explain him geometry, Euclid told the King that there was no ‘Royal Road’ to it, by which he meant it’s just difficult and takes a lot of time to understand.

Physicists will tell you the same about quantum physics. So, I know that, at this point, I should just study Feynman’s third Lectures Volume and shut up for a while. However, before I get lost while playing with state vectors, S-matrices, eigenfunctions, eigenvalues and what have you, I’ll try that Royal Road anyway, building on my previous digression on Hamiltonian mechanics.

So… What was that about? Well… If you understood anything from my previous post, it should be that both the Lagrangian and Hamiltonian function use the equations for kinetic and potential energy to derive the equations of motion for a system. The key difference between the Lagrangian and Hamiltonian approach was that the Lagrangian approach yields one differential equation–which had to be solved to yield a functional form for x as a function of time, while the Hamiltonian approach yielded two differential equations–which had to be solved to yield a functional form for both position (x) and momentum (p). In other words, Lagrangian mechanics is a model that focuses on the position variable(s) only, while, in Hamiltonian mechanics, we also keep track of the momentum variable(s). Let me briefly explain the procedure again, so we’re clear on it:

1. We write down a function referred to as the Lagrangian function. The function is L = T – V with T and V the kinetic and potential energy respectively. T has to be expressed as a function of velocity (v) and V has to be expressed as a function of position (x). You’ll say: of course! However, it is an important point to note, otherwise the following step doesn’t make sense. So we take the equations for kinetic and potential energy and combine them to form a function L = L(x, v).

2. We then calculate the so-called Lagrangian equation, in which we use that function L. To be precise: what we have to do is calculate its partial derivatives and insert these in the following equation:


It should be obvious now why I stressed we should write L as a function of velocity and position, i.e. as L = L(x, v). Otherwise those partial derivatives don’t make sense. As to where this equation comes from, don’t worry about it: I did not explain why this works. I didn’t do that here, and I also didn’t do it in my previous post. What we’re doing here is just explaining how it goes, not why.

3. If we’ve done everything right, we should get a second-order differential equation which, as mentioned above, we should then solve for x(t). That’s what ‘solving’ a differential equation is about: find a functional form that satisfies the equation.

Let’s now look at the Hamiltonian approach.

1. We write down a function referred to as the Hamiltonian function. It looks similar to the Lagrangian, except that we sum kinetic and potential energy, and that T has to be expressed as a function of the momentum p. So we have a function H = T + V = H(x, p).

2. We then calculate the so-called Hamiltonian equations, which is a set of two equations, rather than just one equation. [We have two for the one-dimensional situation that we are modeling here: it’s a different story (i.e. we will have more equations) if we’d have more degrees of freedom of course.] It’s the same as in the Lagrangian approach: it’s just a matter of calculating partial derivatives, and insert them in the equations below. Again, note that I am not explaining why this Hamiltonian hocus-pocus actually works. I am just saying how it works.

Hamiltonian equations

3. If we’ve done everything right, we should get two first-order differential equations which we should then solve for x(t) and p(t). Now, solving a set of equations may or may not be easy, depending on your point of view. If you wonder how it’s done, there’s excellent stuff on the Web that will show you how (such as, for instance, Paul’s Online Math Notes).

Now, I mentioned in my previous post that the Hamiltonian approach to modeling mechanics is very similar to the approach that’s used in quantum mechanics and that it’s therefore the preferred approach in physics. I also mentioned that, in classical physics, position and momentum are also conjugate variables, and I also showed how we can calculate the momentum as a conjugate variable from the Lagrangian: p = ∂L/∂v. However, I did not dwell on what conjugate variables actually are in classical mechanics. I won’t do that here either. Just accept that conjugate variables, in classical mechanics, are also defined as pairs of variables. They’re not related through some uncertainty relation, like in quantum physics, but they’re related because they can both be obtained as the derivatives of a function which I haven’t introduced as yet. That function is referred to as the action, but… Well… Let’s resist the temptation to digress any further here. If you really want to know what action is–in physics, that is… :-) Well… Google it, I’d say. What you should take home from this digression is that position and momentum are also conjugate variables in classical mechanics.

Let’s now move on to quantum mechanics. You’ll see that the ‘similarity’ in approach is… Well… Quite relative, I’d say. :-)

Position and momentum in quantum mechanics

As you know by now (I wrote at least a dozen posts on this), the concept of position and momentum in quantum mechanics is very different from that in classical physics: we do not have x(t) and p(t) functions which give a unique, precise and unambiguous value for x and p when we assign a value to the time variable and plug it in. No. What we have in quantum physics is some weird wave function, denoted by the Greek letters φ (phi) or ψ (psi) or, using Greek capitals, Φ and Ψ. To be more specific, the psi usually denotes the wave function in the so-called position space (so we write ψ = ψ(x)), and the phi will usually denote the wave function in the so-called momentum space (so we write φ = φ(p)). That sounds more complicated than it is, obviously, but I just wanted to respect terminology here. Finally, note that the ψ(x) and φ(p) wave functions are related through the Uncertainty Principle: they’re conjugate variables, and we have this ΔxΔp = ħ/2 equation, in which the Δ is some standard deviation from some mean value. I should not go into more detail here: you know that by now, don’t you?

While the argument of these functions is some real number, the wave functions themselves are complex-valued, so they have a real and complex amplitude. I’ve also illustrated that a couple of times already but, just to make sure, take a look at the animation below, so you know what we are sort of talking about:

  1. The A and B situations represent a classical oscillator: we know exactly where the red ball is at any point in time.
  2. The C to H situations give us a complex-valued amplitude, with the blue oscillation as the real part, and the pink oscillation as the imaginary part.

QuantumHarmonicOscillatorAnimationSo we have such wave function both for x and p. Note that the animation above suggests we’re only looking at the wave function for x but–trust me–we have a similar one for p, and they’re related indeed. [To see how exactly, I’d advise you to go through the proof of the so-called Kennard inequality.] So… What do we do with that?

The position and momentum operators

When we want to know where a particle actually is, or what its momentum is, we need to do something with this wave function ψ or φ. Let’s focus on the position variable first. While the wave function itself is said to have ‘no physical interpretation’ (frankly, I don’t know what that means: I’d think everything has some kind of interpretation (and what’s physical and non-physical?), but let’s not get lost in philosophy here), we know that the square of the absolute value of the probability amplitude yields a probability density. So |ψ(x)|gives us a probability density function or, to put it simply, the probability to find our ‘particle’ (or ‘wavicle’ if you want) at point x. Let’s now do something more sophisticated and write down the expected value of x, which is usually denoted by 〈x〉 (although that invites confusion with Dirac’s bra-ket notation, but don’t worry about it):

expected value of x

Don’t panic. It’s just an integral. Look at it. ψ* is just the complex conjugate (i.e. a – ib if ψ = a + ib) and you will (or should) remember that the product of a complex number with its (complex) conjugate gives us the square of its absolute value: ψ*ψ = |ψ(x)|2. What about that x? Can we just insert that there, in-between ψ* and ψ ? Good question. The answer is: yes, of course! That x is just some real number and we can put it anywhere. However, it’s still a good question because, while multiplication of complex numbers is commutative (hence,  z1z2 = z2z1), the order of our operators – which we will introduce soon – can often not be changed without consequences, so it is something to note.

For the rest, that integral above is quite obvious and it should really not puzzle you: we just multiply a value with its probability of occurring and integrate over the whole domain to get an expected value 〈x〉. Nothing wrong here. Note that we get some real number. [You’ll say: of course! However, I always find it useful to check that when looking at those things mixing complex-valued functions with real-valued variables or arguments. A quick check on the dimensions of what we’re dealing helps greatly in understanding what we’re doing.]

So… You’ve surely heard about the position and momentum operators already. Is that, then, what it is? Doing some integral on some function to get an expected value? Well… No. But there’s a relation. However, let me first make a remark on notation, because that can be quite confusing. The position operator is usually written with a hat on top of the variable – like ẑ – but so I don’t find a hat with every letter with the editor tool for this blog and, hence, I’ll use a bold letter x and p to denote the operator. Don’t confuse it with me using a bold letter for vectors though ! Now, back to the story.

Let’s first give an example of an operator you’re already familiar with in order to understand what an operator actually is. To put it simply: an operator is an instruction to do something with a function. For example: ∂/∂t is an instruction to differentiate some function with regard to the variable t (which usually stands for time). The ∂/∂t operator is obviously referred to as a differentiation operator. When we put a function behind, e.g. f(x, t), we get ∂f(x, t)/∂t, which is just another function in x and t.

So we have the same here: x in itself is just an instruction: you need to put a function behind in order to get some result. So you’ll see it as xψ. In fact, it would be useful to use brackets probably, like x[ψ], especially because I can’t put those hats on the letters here, but I’ll stick to the usual notation, which does not use brackets.

Likewise, we have a momentum operator: p = –iħ∂/∂x. […] Let it sink in. [..]

What’s this? Don’t worry about it. I know: that looks like a very different animal than that x operator. I’ll explain later. Just note, for the moment, that the momentum operator (also) involves a (partial) derivative and, hence, we refer to it as a differential operator (as opposed to differentiation operator). The instruction p = –iħ∂/∂x basically means: differentiate the function with regard to x and multiply with iħ (i.e. the product of Planck’s constant and the imaginary unit i). Nothing wrong with that. Just calculate a derivative and multiply with a tiny imaginary (complex) number.

Now, back to the position operator x. As you can see, that’s a very simple operator–much simpler than the momentum operator in any case. The position operator applied to ψ yields, quite simply, the xψ(x) factor in the integrand above. So we just get a new function xψ(x) when we apply x to ψ, of which the values are simply the product of x and ψ(x). Hence, we write xψ = xψ.

Really? Is it that simple? Yes. For now at least. :-)

Back to the momentum operator. Where does that come from? That story is not so simple. [Of course not. It can’t be. Just look at it.] Because we have to avoid talking about eigenvalues and all that, my approach to the explanation will be quite intuitive. [As for ‘my’ approach, let me note that it’s basically the approach as used in the Wikipedia article on it. :-)] Just stay with me for a while here.

Let’s assume ψ is given by ψ = ei(kx–ωt). So that’s a nice periodic function, albeit complex-valued. Now, we know that functional form doesn’t make all that much sense because it corresponds to the particle being everywhere, because the square of its absolute value is some constant. In fact, we know it doesn’t even respect the normalization condition: all probabilities have to add up to 1. However, that being said, we also know that we can superimpose an infinite number of such waves (all with different k and ω) to get a more localized wave train, and then re-normalize the result to make sure the normalization condition is met. Hence, let’s just go along with this idealized example and see where it leads.

We know the wave number k (i.e. its ‘frequency in space’, as it’s often described) is related to the momentum p through the de Broglie relation: p = ħk. [Again, you should think about a whole bunch of these waves and, hence, some spread in k corresponding to some spread in p, but just go along with the story for now and don’t try to make it even more complicated.] Now, if we differentiate with regard to x, and then substitute, we get ∂ψ/∂x = ∂ei(kx–ωt)/∂x = ikei(kx–ωt) = ikψ, or


So what is this? Well… On the left-hand side, we have the (partial) derivative of a complex-valued function (ψ) with regard to x. Now, that derivative is, more likely than not, also some complex-valued function. And if you don’t believe me, just look at the right-hand side of the equation, where we have that i and ψ. In fact, the equation just shows that, when we take that derivative, we get our original function ψ but multiplied by ip/ħ. Hey! We’ve got a differential equation here, don’t we? Yes. And the solution for it is… Well… The natural exponential. Of course! That should be no surprise because we started out with a natural exponential as functional form! So that’s not the point. What is the point, then? Well… If we bring that i/ħ factor to the other side, we get:

(–i/ħ)(∂ψ/∂x) = pψ

[If you’re confused about the –i, remember that i–1 = 1/i = –i.] So… We’ve got pψ on the right-hand side now. So… Well… That’s like xψ, isn’t it? Yes. :-) If we define the momentum operator as p = (–i/ħ)(∂/∂x), then we get pψ = pψ. So that’s the same thing as for the position operator. It’s just that p is… Well… A more complex operator, as it has that –i/ħ factor in it. And, yes, of course it also involves an instruction to differentiate, which also sets it apart from the position operator, which is just an instruction to multiply the function with its argument.

I am sure you’ll find this funny–perhaps even fishy–business. And, yes, I have the same questions: what does it all mean? I can’t answer that here. As for now, just accept that this position and momentum operator are what they are, and that I can’t do anything about that. But… I hear you sputter: what about their interpretation? Well… Sorry… I could say that the functions xψ and pψ are so-called linear maps but that is not likely to help you much in understanding what these operators really do. You – and I for sure :-) – will indeed have to go through that story of eigenvalues to a somewhat deeper understanding of what these operators actually are. That’s just how it is. As for now, I just have to move on. Sorry for letting you down here. :-)

Energy operators

Now that we sort of ‘understand’ those position and momentum operators (or their mathematical form at least), it’s time to introduce the energy operators. Indeed, in quantum mechanics, we’ve also got an operator for (a) kinetic energy, and for (b) potential energy. These operators are also denoted with a hat above the T and V symbol. All quantum-mechanical operators are like that, it seems. However, because of the limitations of the editor tool here, I’ll also use a bold T and V respectively. Now, I am sure you’ve had enough of this operators, so let me just jot them down:

  1. V = V, so that’s just an instruction to multiply a function with V = V(x, t). That’s easy enough because that’s just like the position vector.
  2. As for T, that’s more complicated. It involves that momentum operator p, which was also more complicated, remember? Let me just give you the formula:

T = p/2m = p2/2m.

So we multiply the operator p with itself here. What does that mean? Well… Because the operator involves a derivative, it means we have to take the derivative twice and… No ! Well… Let me correct myself: yes and no. :-) That p·p product is, strictly speaking, a dot product between two vectors, and so it’s not just a matter of differentiating twice. Now that we are here, we may just as well extend the analysis a bit and assume that we also have a y and z coordinate, so we’ll have a position vector r = (x, y, z). [Note that r is a vector here, not an operator. !?! Oh… Well…] Extending the analysis to three (or more) dimensions means that we should replace the differentiation operator by the so-called gradient or del operator: ∇ = (∂/∂x, ∂/∂y, ∂/∂z). And now that dot product p will, among other things, yield another operator which you’re surely familiar with: the Laplacian. Let me remind you of it:


Hence, we can write the kinetic energy operator T as:

Kinetic energy operator

I quickly copied this formula from Wikipedia, which doesn’t have the limitation of the WordPress editor tool, and so you see it now the way you should see it, i.e. with the hat notation. :-)


In case you’re despairing, hang on ! We’re almost there. :-) We can, indeed, now define the Hamiltonian operator that’s used in quantum mechanics. While the Hamiltonian function was the sum of the potential and kinetic energy functions in classical physics, in quantum mechanics we add the two energy operators. You’ll grumble and say: that’s not the same as adding energies. And you’re right: adding operators is not the same as adding energy functions. Of course it isn’t. :-) But just stick to the story, please, and stop criticizing. [Oh – just in case you wonder where that minus sign comes from: i2 = –1, of course.]

Adding the two operators together yields the following:

Hamiltonian operator

So. Yes. That’s the famous Hamiltonian operator.

OK. So what?

Yes…. Hmm… What do we do with that operator? Well… We apply it to the function and so we write Hψ = … Hmm…

Well… What? 

Well… I am not writing this post just to give some definitions of the type of operators that are used in quantum mechanics and then just do obvious stuff by writing it all out. No. I am writing this post to illustrate how things work.

OK. So how does it work then? 

Well… It turns out that, in quantum mechanics, we have similar equations as in classical mechanics. Remember that I just wrote down the set of (two) differential equations when discussing Hamiltonian mechanics? Here I’ll do the same. The Hamiltonian operator appears in an equation of which you’ve surely heard of and which, just like me, you’d love to understand–and then I mean: understand it fully, completely, and intuitively. […] Yes. It’s the Schrödinger equation:

schrodinger 1

Note, once again, I am not saying anything about where this equation comes from. It’s like jotting down that Lagrange equation, or the set of Hamiltonian equations: I am not saying anything about the why of all this hocus pocus. I am just saying how it goes. So we’ve got another differential equation here, and we have to solve it. If we all write it out using the above definition of the Hamiltonian operator, we get:

Schrodinger 2

If you’re still with me, you’ll immediately wonder about that μ. Well… Don’t. It’s the mass really, but the so-called reduced mass. Don’t worry about it. Just google it if you want to know more about this concept of a ‘reduced’ mass: it’s a fine point which doesn’t matter here really. The point is the grand result.

But… So… What is the grand result? What are we looking at here? Well… Just as I said above: that Schrödinger equation is a differential equation, just like those equations we got when applying the Lagrangian and Hamiltonian approach to modeling a dynamic system in classical mechanics, and, hence, just like what we (were supposed to) do there, we have to solve it. :-) Of course, it looks much more daunting than our Lagrangian or Hamiltonian differential equations, because we’ve got complex-valued functions here, and you’re probably scared of that iħ factor too. But you shouldn’t be. When everything is said and done, we’ve got a differential equation here that we need to solve for ψ. In other words, we need to find functional forms for ψ that satisfy the above equation. That’s it. Period.

So how do these solutions look like? Well, they look like those complex-valued oscillating things in the very first animation above. Let me copy them again:


So… That’s it then? Yes. I won’t say anything more about it here, because (1) this post has become way too long already, and so I won’t dwell on the solutions of that Schrödinger equation, and because (2) I do feel it’s about time I really start doing what it takes, and that’s to work on all of the math that’s necessary to actually do all that hocus-pocus. :-)

P.S: As for understanding the Schrödinger equation “fully, completely, and intuitively”, I am not sure that’s actually possible. But I am trying hard and so let’s see. :-) I’ll tell you after I mastered the math. But something inside of me tells me there’s indeed no Royal Road to it. :-)

Newtonian, Lagrangian and Hamiltonian mechanics

This is just another loose end I wanted to tie up. As an economist, I thought I knew a thing or two about optimization. Indeed, when everything is said and done, optimization is supposed to an economist’s forte, isn’t it? :-) Hence, I thought I sort of understood what a Lagrangian would represent in physics, and I also thought I sort of intuitively understood why and how it could be used it to model the behavior of a dynamic system. In short, I thought that Lagrangian mechanics would be all about optimizing something subject to some constraints. Just like in economics, right?

[…] Well… When checking it out, I found that the answer is: yes, and no. And, frankly, the honest answer is more no than yes. :-) Economists (like me), and all social scientists (I’d think), learn only about one particular type of Lagrangian equations: the so-called Lagrange equations of the first kind. This approach models constraints as equations that are to be incorporated in an objective function (which is also referred to as a Lagrangian–and that’s where the confusion starts because it’s different from the Lagrangian that’s used in physics, which I’ll introduce below) using so-called Lagrange multipliers. If you’re an economist, you’ll surely remember it: it’s a problem written as “maximize f(x, y) subject to g(x, y) = c”, and we solve it by finding the so-called stationary points (i.e. the points for which the derivative is zero) of the (Lagrangian) objective function f(x, y) + λ[g(x, y) – c].

Now, it turns out that, in physics, they use so-called Lagrange equations of the second kind, which incorporate the constraints directly by what Wikipedia refers to as a “judicious choice of generalized coordinates.”

Generalized coordinates? Don’t worry about it: while generalized coordinates are defined formally as “parameters that describe the configuration of the system relative to some reference configuration”, they are, in practice, those coordinates that make the problem easy to solve. For example, for a particle (or point) that moves on a circle, we’d not use the Cartesian coordinates x and y but just the angle that locates the particles (or point). That simplifies matters because then we have only parameter to track. In practice, the number of parameters (i.e. the number of generalized coordinates) will be defined by the number of degrees of freedom of the system, and we know what that means: it’s the number of independent directions in which the particle (or point) can move, and that usually includes not only the x, y and z directions but also rotational and/or vibratory movements. We went over that when discussing kinetic gas theory, so I won’t say more about that here.

So… OK… That was my first surprise: the physicist’s Lagrangian is different from the social scientist’s Lagrangian. 

The second surprise was that all physics textbooks seem to dislike the Lagrangian approach. Indeed, they opt for a related but different function when developing a model of a dynamic system: it’s a function referred to as the Hamiltonian. And, no, the preference for the Hamiltonian approach has nothing to do with the fact that William Rowan Hamilton was Anglo-Irish, while Joseph-Louis Lagrange (born as Giuseppe Lodovico Lagrangia) was Italian-French. :-)

The modeling approach which uses the Hamiltonian instead of the Lagrangian is, of course, referred to as Hamiltonian mechanics.

And then we have good old Newtonian mechanics as well, obviously. In case you wonder what that is: it’s the modeling approach that we’ve been using all along. :-) But I’ll remind you of what it is in a moment: it amounts to making sense of some situation by using Newton’s laws of motion only, rather than any sophisticated mathematical system of equations.

Introducing Lagrangian and Hamiltonian mechanics is quite confusing because the functions that are involved (i.e. the so-called Lagrangian and Hamiltonian functions) look very similar: we write the Lagrangian as the difference between the kinetic and potential energy of a system (L = T – V), while the Hamiltonian is the sum of both (H = T + V). Now, I could make this post very simple and just ask you to note that both approaches are basically ‘equivalent’ (in the sense that they lead to the same solutions, i.e. the same equations of motion expressed as a function of time) and that a choice between them is just a matter of preference–like choosing between an English versus a continental breakfast. :-) [I note the English breakfast has usually some extra bacon, or a sausage, so you get more but not necessarily better.] So that would be the end of this digression then, and I should be done. However, I must assume you’re a curious person, just like me, and, hence, you’ll say that, while being ‘equivalent’, they’re obviously not the same. So how do the two approaches differ exactly?

Let’s try to get a somewhat intuitive understanding of it all by taking, once again, the example of a simple harmonic oscillator, as depicted below. It could be a mass on a spring. In fact, our example will, in fact, be that of an oscillating mass on a spring. Let’s also assume there’s no damping, because that makes the analysis soooooooo much easier: we can then just express everything as a function of one variable only, time or position, instead of having to keep track of both.


Of course, we already know all of the relevant equations for this system just from applying Newton’s laws (so that’s Newtonian mechanics). We did that in a previous post. [I can’t remember which one, but I am sure I’ve done this already.] Hence, we don’t really need the Lagrangian or Hamiltonian. But, of course, that’s the point of this post: I want to illustrate how these other approaches to modeling a dynamic system actually work, and so it’s good we have the correct answer already so we can make sure we’re not going off track here. So… Let’s go… :-)

I. Newtonian mechanics

Let me recapitulate the basics of a mass on a spring which, in jargon, is called a harmonic oscillator. Hooke’s law is there: the force on the mass is proportional to its distance from the zero point (i.e. the displacement), and the direction of the force is towards the zero point–not away from it, and so we have a minus sign. In short, we can write:

F = –kx (i.e. Hooke’s law)

Now, Newton‘s Law (Newton’s second law to be precise) says that F is equal to the mass times the acceleration: F = ma. So we write:

F = ma = m(d2x/dt2) = –kx

So that’s just Newton’s law combined with Hooke’s law. We know this is a differential equation for which there’s a general solution with the following form:

x(t) = Acos(ωt + α)

If you wonder why… Well… I can’t digress on that here again: just note, from that differential equation, that we apparently need a function x(t) that yields itself when differentiated twice. So that must be some sinusoidal function, like sine or cosine, because these do that. […] OK… Sorry, but I must move on.

As for the new ‘variables’ (A, ω and α), A depends on the initial condition and is the (maximum) amplitude of the motion. We also already know from previous posts (or, more likely, because you do knew something about physics before reading this) that A is related to the energy of the system. To be precise: the energy of the system is proportional to the square of the amplitude: E ∝ A2. As for ω, the angular frequency, that’s determined by the spring itself and the oscillating mass on it: ω = (k/m)1/2 = 2π/T = 2πf (with T the period, and f the frequency expressed in oscillations per second, as opposed to the angular frequency, which is the frequency expressed in radians per second). Finally, I should note that α is just a phase shift which depends on how we define our t = 0 point: if x(t) is zero at t = 0, then that cosine function should be zero and then α will be equal to ±π/2.

OK. That’s clear enough. What about the ‘operational currency of the universe’, i.e. the energy of the oscillator? Well… I told you already, and we don’t need the energy concept here to find the equation of motion. In fact, that’s what distinguishes this ‘Newtonian’ approach from the Lagrangian and Hamiltonian approach. But… Now that we’re at it, and we have to move to a discussion of these two animals (I mean the Lagrangian and Hamiltonian), let’s go for it.

We have kinetic versus potential energy. Kinetic energy (T) is what it always is. It depends on the velocity and the mass: K.E. = T = mv2/2 = m(dx/dt)2/2 = p2/2m. Huh? What’s this expression with p in it? […] It’s momentum: p = mv. Just check it: it’s an alternative formula for T really. Nothing more, nothing less. I am just noting it here because it will pop up again in our discussion of the Hamiltonian modeling approach. But that’s for later. Onwards!

What about potential energy (V)? We know that’s equal to V = kx2/2. And because energy is conserved, potential energy (V) and kinetic energy (T) should add up to some constant. Let’s check it: dx/dt = d[Acos(ωt + α)]/dt = –Aωsin(ωt + α). [Please do the derivation: don’t accept things at face value. :-)] Hence, T = mA2ω2sin2(ωt + α)/2 = mA2(k/m)sin2(ωt + α)/2 = kA2sin2(ωt + α)/2. Now, V is equal to V = kx2/2 = k[Acos(ωt + α)]2/2 = k[Acos(ωt + α)]2/2 = kA2cos2(ωt + α)/2. Adding both yields:

T + V = kA2sin2(ωt + α)/2 + kA2cos2(ωt + α)/2

= (1/2)kA2[sin2(ωt + α) + cos2(ωt + α)] = kA2/2.

Uff! Glad that seems to work out: the total energy is, indeed, proportional to the square of the amplitude and the constant of proportionality is equal to k/2. [You should now wonder why we do not have m in this formula but, if you’d think about it, you can answer your own question: the amplitude will depend on the mass (bigger mass, smaller amplitude, and vice versa), so it’s actually in the formula already.]

The point to note is that this Hamiltonian function H = T + V is just a constant, not only for this particular case (an oscillation without damping), but in all cases where H represents the total energy of a (closed) system: H = T + V = kA2/2.

OK. That’s clear enough. How does our Lagrangian look like? That’s not a constant obviously. Just so you can visualize things, I’ve drawn the graph below:

  1. The red curve represents kinetic energy (T) as a function of the displacement x: T is zero at the turning points, and reaches a maximum at the x = 0 point.
  2. The blue curve is potential energy (V): unlike T, V reaches a maximum at the turning points, and is zero at the x = 0 point. In short, it’s the mirror image of the red curve.
  3. The Lagrangian is the green graph: L = T – V. Hence, L reaches a minimum at the turning points, and a maximum at the x = 0 point.


While that green function would make an economist think of some Lagrangian optimization problem, it’s worth noting we’re doing any such thing here: we’re not interested in stationary points. We just want the equation(s) of motion. [I just thought that would be worth stating, in light of my own background and confusion in regard to it all. :-)]

OK. Now that we have an idea of what the Lagrangian and Hamiltonian functions are (it’s probably worth noting also that we do not have a ‘Newtonian function’ of some sort), let us now show how these ‘functions’ are used to solve the problem. What problem? Well… We need to find some equation for the motion, remember? [I find that, in physics, I often have to remind myself of what the problem actually is. Do you feel the same? :-) ] So let’s go for it.

II. Lagrangian mechanics

As this post should not turn into a chapter of some math book, I’ll just describe the how, i.e. I’ll just list the steps one should take to model and then solve the problem, and illustrate how it goes for the oscillator above. Hence, I will not try to explain why this approach gives the correct answer (i.e. the equation(s) of motion). So if you want to know why rather than how, then just check it out on the Web: there’s plenty of nice stuff on math out there.

The steps that are involved in the Lagrangian approach are the following:

  1. Compute (i.e. write down) the Lagrangian function L = T – V. Hmm? How do we do that? There’s more than one way to express T and V, isn’t it? Right you are! So let me clarify: in the Lagrangian approach, we should express T as a function of velocity (v) and V as a function of position (x), so your Lagrangian should be L = L(x, v). Indeed, if you don’t pick the right variables, you’ll get nowhere. So, in our example, we have L = mv2/2 – kx2/2.
  2. Compute the partial derivatives ∂L/∂x and ∂L/∂v. So… Well… OK. Got it. Now that we’ve written L using the right variables, that’s a piece of cake. In our example, we have: ∂L/∂x = – kx and ∂L/∂v = mv. Please note how we treat x and v as independent variables here. It’s obvious from the use of the symbol for partial derivatives: ∂. So we’re not taking any total differential here or so. [This is an important point, so I’d rather mention it.]
  3. Write down (‘compute’ sounds awkward, doesn’t it?) Lagrange’s equation: d(∂L/∂v)/dt = ∂L/∂x. […] Yep. That’s it. Why? Well… I told you I wouldn’t tell you why. I am just showing the how here. This is Lagrange’s equation and so you should take it for granted and get on with it. :-) In our example: d(∂L/∂v)/dt = d(mv)/dt = –k(dx/dt) = ∂L/∂x = – kx. We can also write this as m(dv/dt) = m(d2x/dt2) = –kx.     
  4. Finally, solve the resulting differential equation. […] ?! Well… Yes. […] Of course, we’ve done that already. It’s the same differential equation as the one we found in our ‘Newtonian approach’, i.e. the equation we found by combining Hooke’s and Newton’s laws. So the general solution is x(t) = Acos(ωt + α), as we already noted above.

So, yes, we’re solving the same differential equation here. So you’ll wonder what’s the difference then between Newtonian and Lagrangian mechanics? Yes, you’re right: we’re indeed solving the same second-order differential equation here. Exactly. Fortunately, I’d say, because we don’t want any other equation(s) of motion because we’re talking the same system. The point is: we got that differential equation using an entirely different procedure, which I actually didn’t explain at all: I just said to compute this and then that and… – Surprise, surprise! – we got the same differential equation in the end. :-) So, yes, the Newtonian and Lagrangian approach to modeling a dynamic system yield the same equations, but the Lagrangian method is much more (very much more, I should say) convenient when we’re dealing with lots of moving bits and if there’s more directions (i.e. degrees of freedom) in which they can move.

In short, Lagrange could solve a problem more rapidly than Newton with his modeling approach and so that’s why his approach won out. :-) In fact, you’ll usually see the spatial variables noted as qj. In this notation, j = 1, 2,… n, and n is the number of degrees of freedom, i.e. the directions in which the various particles can move. And then, of course, you’ll usually see a second subscript i = 1, 2,… m to keep track of every qfor each and every particle in the system, so we’ll have n×m qij‘s in our model and so, yes, good to stick to Lagrange in that case.

OK. You get that, I assume. Let’s move on to Hamiltonian mechanics now.

III. Hamiltonian mechanics

The steps here are the following. [Again, I am just explaining the how, not the why. You can find mathematical proofs of why this works in handbooks or, better still, on the Web.]

  1. The first step is very similar as the one above. In fact, it’s exactly the same: write T and V as a function of velocity (v) and position (x) respectively and construct the Lagrangian. So, once again, we have L = L(x, v). In our example: L(x, v) = mv2/2 – kx2/2.
  2. The second step, however, is different. Here, the theory becomes more abstract, as the Hamiltonian approach does not only keep track of the position but also of the momentum of the particles in a system. Position (x) and momentum (p) are so-called canonical variables in Hamiltonian mechanics, and the relation with Lagrangian mechanics is the following: p = ∂L/∂v. Huh? Yeah. Again, don’t worry about the why. Just check it for our example: ∂(mv2/2 – kx2/2)/∂v = 2mv/2 = mv. So, yes, it seems to work. Please note, once again, how we treat x and v as independent variables here, as is evident from the use of the symbol for partial derivatives. Let me get back to the lesson, however. The second step is: calculate the conjugate variables. In more familiar wording: compute the momenta.
  3. The third step is: write down (or ‘build’ as you’ll see it, but I find that wording strange too) the Hamiltonian function H = T + V. We’ve got the same problem here as the one I mentioned with the Lagrangian: there’s more than one way to express T and V. Hence, we need some more guidance. Right you are! When writing your Hamiltonian, you need to make sure you express the kinetic energy as a function of the conjugate variable, i.e. as a function of momentum, rather than velocity. So we have H = H(x, p), not H = H(x, v)! In our example, we have H = T + V = p2/2m + kx2/2.
  4. Finally, write and solve the following set of equations: (I) ∂H/∂p = dx/dt and (II) –∂H/∂x = dp/dt. [Note the minus sign in the second equation.] In our example: (I) p/m = dx/dt and (II) –kx = dp/dt. The first equation is actually nothing but the definition of p: p = mv, and the second equation is just Hooke’s law: F = –kx. However, from a formal-mathematical point of view, we have two first-order differential equations here (as opposed to one second-order equation when using the Lagrangian approach), which should be solved simultaneously in order to find position and momentum as a function of time, i.e. x(t) and p(t). The end result should be the same: x(t) = Acos(ωt + α) and p(t) = … Well… I’ll let you solve this: time to brush up your knowledge about differential equations. :-)

You’ll say: what the heck? Why are you making things so complicated? Indeed, what am I doing here? Am I making things needlessly complicated?

The answer is the usual one: yes, and no. Yes. If we’d want to do stuff in the classical world only, the answer seems to be: yes! In that case, the Lagrangian approach will do and may actually seem much easier, because we don’t have a set of equations to solve. And why would we need to keep track of p(t)? We’re only interested in the equation(s) of motion, aren’t we? Well… That’s why the answer to your question is also: no! In classical mechanics, we’re usually only interested in position, but in quantum mechanics that concept of conjugate variables (like x and p indeed) becomes much more important, and we will want to find the equations for both. So… Yes. That means a set of differential equations (one for each variable (x and p) in the example above) rather than just one. In short, the real answer to your question in regard to the complexity of the Hamiltonian modeling approach is the following: because the more abstract Hamiltonian approach to mechanics is very similar to the mathematics used in quantum mechanics, we will want to study it, because a good understanding of Hamiltonian mechanics will help us to understand the math involved in quantum mechanics. And so that’s the reason why physicists prefer it to the Lagrangian approach.

[…] Really? […] Well… At least that’s what I know about it from googling stuff here and there. Of course, another reason for physicists to prefer the Hamiltonian approach may well that they think social science (like economics) isn’t real science. Hence, we – social scientists – would surely expect them to develop approaches that are much more intricate and abstract than the ones that are being used by us, wouldn’t we?

[…] And then I am sure some of it is also related to the Anglo-French thing. :-)

Complex Fourier analysis: an introduction

One of the most confusing sentences you’ll read in an introduction to quantum mechanics – not only in those simple (math-free) popular books but also in Feynman’s Lecture introducing the topic – is that we cannot define a unique wavelength for a short wave train. In Feynman’s words: “Such a wave train does not have a definite wavelength; there is an indefiniteness in the wave number that is related to the finite length of the train, and thus there is an indefiniteness in the momentum.” (Feynman’s Lectures, Vol. I, Ch. 38, section 1).

That is not only confusing but, in some way, actually wrong. In fact, this is an oft-occurring statement which has effectively hampered my own understanding of quantum mechanics for decades, and it was only when I had a closer look at what a Fourier analysis really is that I understood what Feynman, and others, wanted to say. In short, it’s a classic example of where a ‘simple’ account of things can lead you astray.

Indeed, we can all imagine a short wave train with a very definite frequency, and I mean a precise, unambiguous and single-valued frequency here. Just take any sinusoidal function and multiply it with a so-called envelope function in order to shape it into a short pulse. Transients have that shape, and I gave an example in previous posts. Below, I’ve just copied the example given in the Wikipedia article on Fourier analysis: f(t) is a product of two factors:

  1. The first factor in the product is a cosine function: cos[2π(3t)] to be precise.
  2. The second factor is an exponential function: exp(–πt2).

The frequency of this ‘product function’ is quite precise: cos[2π(3t)] = cos[6πt] = cos[6π(t + 3)] for all values t, and so its period is equal to 3. The only thing the second factor, i.e. exp(–πt2), does is to shape this cosine function into a nice wave train, as it quickly tends to zero on both sides of the t = 0 point. So that second function is a nice simple bell curve (just plot the graph with a graph plotter) and doesn’t change the period (or frequency) of the product. In short, the oscillation below–which we should imagine as the representation of ‘something’ traveling through space–has a very definite frequency. So what’s Feynman saying above? There’s no Δf or Δλ here, is there?


The point to note is that these Δ concepts – Δf, Δλ, and so on – actually have very precise mathematical definitions, as one would expect in physics: they usually refer to the standard deviation of the distribution of a variable around the mean.

[…] OK, you’ll say. So what?

Well… That f(t) function above can – and, more importantly, should – be written as the sum of a potentially infinite number of waves in order to make sense of the Δf and Δλ factors in those uncertainty relations. Each of these component waves has a very specific frequency indeed, and each one of them makes its own contribution to the resultant wave. Hence, there is a distribution function for these frequencies, and so that is what Δf refers to. In other words, unlike what you’d think when taking a quick look at that graph above, Δf is not zero. So what is it then?

Well… It’s tempting to get lost in the math of it all now but I don’t want this blog to be technical. The basic ideas, however, are the following. We have a real-valued function here, f(t), which is defined from –∞ to +∞, i.e. over its so-called time domain. Hence, t ranges from –∞ to +∞ (the definition of the zero point is a matter of convention only, and we can easily change the origin by adding or subtracting some constant). [Of course, we could – and, in fact, we should – also define it over a spatial domain, but we’ll keep the analysis simple by leaving out the spatial variable (x).]

Now, the so-called Fourier transform of this function will map it to its so-called frequency domain. The animation below (for which the credit must, once again, go to Wikipedia, from which I borrow most of the material here) clearly illustrates the idea. I’ll just copy the description from the same article: “In the first frames of the animation, a function f is resolved into Fourier series: a linear combination of sines and cosines (in blue). The component frequencies of these sines and cosines spread across the frequency spectrum, are represented as peaks in the frequency domain, as shown shown in the last frames of the animation). The frequency domain representation of the function, \hat{f}, is the collection of these peaks at the frequencies that appear in this resolution of the function.”


[…] OK. You sort of get this (I hope). Now we should go a couple of steps further. In quantum mechanics, we’re talking not real-valued waves but complex-valued waves adding up to give us the resultant wave. Also, unlike what’s shown above, we’ll have a continuous distribution of frequencies. Hence, we’ll not have just six discrete values for the frequencies (and, hence, just six component waves), but an infinite number of them. So how does that work? Well… To do the Fourier analysis, we need to calculate the value of the following integral for each possible frequency, which I’ll denote with the Greek letter nu (ν), as we’ve used the f symbol already–not for the frequency but to denote the function itself! Let me just jot down that integral:

Fourier transform function

Huh? Don’t be scared now. Just try to understand what it actually represents:Relax. Just take a long hard look at it. Note, first, that the integrand (i.e. the function that is to be integrated, between the integral sign and the dt, so that’s f(t)ei2πtν) is a complex-valued function (that should be very obvious from the in the exponent of e). Secondly, note that we need to do such integral for each value of ν. So, for each possible value of ν, we have t ranging from –∞ to +∞ in that integral. Hmm… OK. So… How does that work? Well… The illustration below shows the real and imaginary part respectively of the integrand for ν = 3. [Just in case you still don’t get it: we fix ν here (ν = 3), and calculate the value of the real and imaginary part of the integrand for each possible value of t, so t ranges from –∞ to +∞ indeed.]


So what do we see here? The first thing you should note is that the value of both the real and imaginary part of the integrand quickly tends to zero on both sides of the t = 0 point. That’s because of the shape of f(t), which does exactly the same. However, in-between those ‘zero or close-to-zero values’, the integrand does take on very specific non-zero values. As for the real part of the integrand, which is denoted by Re[e−2πi(3t)f(t)], we see that’s always positive, with a peak value equal to one at t = 0. Indeed, the real part of the integrand is always positive because f(t) and the real part of e−2πi(3toscillate at the same rate. Hence, when f(t) is positive, so is the real part of e−2πi(3t), and when f(t) is negative, so is the real part of e−2πi(3t). However, the story is obviously different for the imaginary part of the integrand, denoted by Im[e−2πi(3t)f(t)]. That’s because, in general, eiθ = cosθ + isinθ and the sine and cosine function are essentially the same functions except for a phase difference of π/2 (remember: sin(θ+π/2) = cosθ).

Capito? No? Hmm… Well… Try to read what I am writing above once again. Else, just give up. :-)

I know this is getting complicated but let me try to summarize what’s going on here. The bottom line is that the integral above will yield a positive real number, 0.5 to be precise (as noted in the margin of the illustration), for the real part of the integrand, but it will give you a zero value for its imaginary part (also as noted in the margin of the illustration). [As for the math involved in calculating an integral of a complex-valued function (with a real-valued argument), just note that we should indeed just separate the real and imaginary parts and integrate separately. However, I don’t want you to get lost in the math so don’t worry about it too much. Just try to stick to the main story line here.]

In short, what we have here is a very significant contribution (the associated density is 0.5) of the frequency ν = 3. 

Indeed, let’s compare it to the contribution of the wave with frequency ν = 5. For ν = 5, we get, once again, a value of zero when integrating the imaginary part of the integral above, because the positive and negative values cancel out. As for the real part, we’d think they would do the same if we look at the graph below, but they don’t: the integral does yield, in fact, a very tiny positive value: 1.7×10–6 (so we’re talking 1.7 millionths here). That means that the contribution of the component wave with frequency ν = 5 is close to nil but… Well… It’s not nil: we have some contribution here (i.e. some density in other words).


You get the idea (I hope). We can, and actually should, calculate the value of that integral for each possible value of ν. In other words, we should calculate the integral over the entire frequency domain, so that’s for ν ranging from –∞ to +∞. However, I won’t do that. :-) What I will do is just show you the grand general result (below), with the particular results (i.e. the values of 0.5 and 1.7×10–6 for ν = 3 and ν = 5) as a green and red dot respectively. [Note that the graph below uses the ξ symbol instead of ν: I used ν because that’s a more familiar symbol, but so it doesn’t change the analysis.]


Now, if you’re still with me – probably not :-) – you’ll immediately wonder why there are two big bumps instead of just one, i.e. two peaks in the density function instead of just one. [You’re used to these Gauss curves, aren’t you?] And you’ll also wonder what negative frequencies actually are: the first bump is a density function for negative frequencies indeed, and… Well… Now that you think of it: why the hell would we do such integral for negative values of ν? I won’t say too much about that: it’s a particularity which results from the fact that eiθ and e−2πiθ both complete a cycle per second (if θ is measured in seconds, that is) so… Well… Hmm… […] Yes. The fact of the matter is that we do have a mathematical equivalent of the bump for positive frequencies on the negative side of the frequency domain, so… Well… […] Don’t worry about it, I’d say. As mentioned above, we shouldn’t get lost in the math here. For our purpose here, which is just to illustrate what a complex Fourier transform actually is (rather than present all of the mathematical intricacies of it), we should just focus on the second bump of that density function, i.e. the density function for positive frequencies only. :-)

So what? You’re probably tired by now, and wondering what I want to get at. Well… Nothing much. I’ve done what I wanted to do. I started with a real-valued wave train (think of a transient electric field working its way through space, for example), and I then showed how such wave train can (and should) be analyzed as consisting of an infinite number of complex-valued component waves, which each make their own contribution to the combined wave (which consists of the sum of all component waves) and, hence, can be represented by a graph like the one above, i.e. a real-valued density function around some mean, usually denoted by μ, and with some standard deviation, usually denoted by σ. So now I hope that, when you think of Δf or Δλ in the context of a so-called ‘probability wave’ (i.e. a de Broglie wave), then you’ll think of all this machinery behind.

In other words, it is not just a matter of drawing a simple figure like the one below and saying: “You see: those oscillations represent three photons being emitted one after the other by an atomic oscillator. You can see that’s quite obvious, can’t you?”


No. It is not obvious. Why not? Because anyone that’s somewhat critical will immediately say: “But how does it work really? Those wave trains seem to have a pretty definite frequency (or wavelength), even if their amplitude dies out, and, hence, the Δf factor (or Δλ factor) in that uncertainty relation must be close or, more probably, must be equal to zero. So that means we cannot say these particles are actually somewhere, because Δx must be close or equal to infinity.”

Now you know that’s a very valid remark, and then it isn’t it. Because now you understand that one actually has to go through the tedious exercise of doing that Fourier transform, and so now you understand what those Δ symbols actually represent. I hope you do because of this post, and despite the fact my approach has been very superficial and intuitive. In other words, I didn’t say what physicists would probably say, and that is: “Take a good math course before you study physics!” :-)

The Uncertainty Principle for energy and time

In all of my posts on the Uncertainty Principle, I left a few points open or rather vague, and that was usually because I didn’t have a clear understanding of them. As I’ve read some more in the meanwhile, I think I sort of ‘get’ these points somewhat better now. Let me share them with you in this and my next posts. This post will focus on the Uncertainty Principle for time and energy.

Indeed, most (if not all) experiments illustrating the Uncertainty Principle (such as the double-slit experiment with electrons for example) focus on the position (x) and momentum (p) variables: Δx·Δp = h. But there is also a similar relationship between time and energy:

ΔE·Δt = h

These pairs of variables (position and momentum, and energy and time) are so-called conjugate variables. I think I said enough about the Δx·Δp = h equation, but what about the ΔE·Δt = h equation?

We can sort of imagine what ΔE stands for, but what about Δt? It must also be some uncertainty: about time obviously–but what time are we talking about? I found one particularly easy explanation in a small booklet that I bought–long time ago– in Berlin: the dtv-Atlas zur Atomphysik. It makes things look easy by noting that everything is, obviously, related to everything. Indeed, we know that the uncertainty about the position (Δx) is to be related to the length of the (complex-valued) wave-train that represents the ‘particle’ (or ‘wavicle’ if you prefer that term) in space (and in time). In turn, the length of that wave-train is determined by the spread in the frequencies of the component waves that make up that wave-train, as illustrated below. [However, note that the illustration assumes the amplitudes are real-valued only, so there’s no imaginary part. I’ll come back to this point in my next post.]


More in particular, we can use the de Broglie relation for matter-particles (λ = h/p) and Planck’s relation for photons (E = hν = hc/λ ⇔ E/c = p = h/λ) to relate the uncertainty about the position to the spread in the wavelengths (and, hence, the frequencies) of the component waves:

p = h/λ and, hence, Δp = Δ(h/λ) = hΔ(1/λ)

Now, Δx equals h/Δp according to the Uncertainty Principle. Therefore, Δx must equal:

 Δx = h/[hΔ(1/λ)] = 1/[Δ(1/λ)]

But, again, what about that energy-time relationship? To measure all of the above variables–which are all related one to another indeed–we need some time. More in particular, to measure the frequency of a wave, we’ll need to look at that wave and register or measure at least a few oscillations, as shown below.

time and energy

I took the image taken from the above-mentioned German booklet and, hence, the illustration incorporates some German. However, that should not deter you from following the remarkably simple argument, which is the following:

  1. The error in our measurement of the frequency (i.e. the Meβfehler, denoted by Δν) is related to the measurement time (i.e. the Meβzeit, denoted by Δt). Indeed, if τ represents the actual period of the oscillation (which is obviously unknown to us: otherwise we wouldn’t be trying to measure the frequency, which is the reciprocal of the period: ν = 1/τ), then we can write Δt as some multiple of τ. [More specifically, in the example above we assume that Δt = 4τ = 4/ν. In fact, because we don’t know τ (or ν), we should write that Δt ≈ 4τ (= 4/ν).
  2. During that time, we will measure four oscillations and, hence, we are tempted to write that ν = 4/Δt. However, because of the measurement error, we should interpret the value for our measurement not as 4 exactly but as 4 plus or minus one: 4 ± 1. Indeed, it’s like measuring the length of something: if our yardstick has only centimeter marks, then we’ll measure someone’s length as some number plus or minus 1 cm. However, if we have some other instrument that is accurate up to one millimeter, then we’ll measure his or her length as some number plus or minus 1 mm. Hence, the result of our measurement should be written as ν ± Δν = (4 ± 1)/Δt = 4/Δt ± 1/Δt. Just put in some numbers in order to gain a better understanding. For example, imagine an oscillation of 100 Hz, and a measurement time of four hundredths of a second. Then we have Δt = 4×10–2 s and, during that time, we’ll measure 4 ± 1 oscillations. Hence, we’ll write that the frequency of this wave must be equal to ν ± Δν = (4 ± 1)/Δt = 4/(4×10–2 s) ± 1/(4×10–2 s) = 100 ± 25 Hz. So we’ll accept that, because measurement time was relatively short, we have a measurement error of Δν/ν = 25/100 = 25%. [Note that ‘relatively short’ means ‘short as compared to the actual period of the oscillation’. Indeed, 4×10–2 s is obviously not short in any absolute sense: in fact, it is like an eternity when we’re talking light waves, which have frequencies measured in terahertz.]
  3. The example makes it clear that Δν, i.e. the error in our measurement of the frequency, is related to the measurement time as follows: Δν = 1/Δt. Hence, if we double the measurement time, we halve the error in the measurement of the frequency. The relationship is quite straightforward indeed: let’s assume, for example, that our measurement time Δt is equal to Δt = 10τ. In that case, we get Δν = 1/(10×10–2 s) = 10 Hz, so the measurement error is Δν/ν = 10/100 = 10%. How long should the measurement time be in order to get a 1% error only? Let’s be more general and develop a solution for Δt as a function of the error expressed as a percentage. We write: Δν/ν = x % = x/100. But Δν = 1/Δt. Hence, we have Δν/ν = (1/Δt)/ν = 1/(Δt·ν) = x/100 or Δt = 100/(x·ν). So, for x = 1 (i.e. an error of 1%), we get Δt = 100/(1·100) = 1 second; for x = 5 (i.e. an error of 5%), we get Δt = 100/(5·100) = 0.2 seconds; and–just to check if our formula is correct–for x = 25 (i.e. an error of 25%), we get Δt = 100/(25·100) = 0.04 seconds, or 4×10–2 s, which is what this example started out with.

You’ll say: so what? Well, here we are! We can now immediately derive the Uncertainty Principle for time and energy because we know the energy–of a photon at least–is directly related to the frequency and, hence, the uncertainty about the energy must be related to the measurement time as follows:

E = hν ⇒ ΔE = Δ(hν) = hΔν = h(1/Δt) = h/Δt ⇔ ΔE·Δt = h

What if we’d not be talking photons but some particle here, such as an electron? Well… Our analysis is quite general. It’s the same really, but instead of Planck’s relation, we use the de Broglie relation, which looks exactly the same: f = E/h (hence, E = hf).

The same? Really?

Well… Yes and no. You should, of course, remember that a de Broglie wave is a complex-valued wave, and so that’s why I prefer to use two different symbols here: f (the frequency of the de Broglie wave) versus ν (the frequency of the photon). That brings me to the next point which I should clarify, and that’s how the Fourier analysis of a wave-train really works. However, I’ll do that in my next post, not here. To wrap up, I’ll just repeat the point I made when I started writing this: everything is effectively related to everything in quantum mechanics. Energy, momentum, position, frequency, time,… etcetera: all of these concepts are related to one another.

That being said, position and time are (and will always remain) the more ‘fundamental’ variables obviously. We measure position in space and in time. Likewise, energy and momentum are related to frequency and wavelength respectively (through the de Broglie relations E = hf and p = h/λ), and these concepts do not make sense without a reference to time (frequency) and/or space (wavelength), and both are related through the velocity of the wave: v = fλ. [For photons, this equation becomes v = c = νλ, or λ = c/ν. I know the notation is somewhat confusing – v for velocity and ν (i.e. the Greek letter nu) for frequency  – but so be it.]

A final note needs to be made on the value of h: it’s very tiny. Indeed, a value of (about) 6.6×10−34 J·s or, using the smaller eV unit for energy, some 4.1×10−15 eV·s is unimaginably small, especially because we need to take into account that the energy concept as used in the de Broglie equation includes the rest mass of a particle. Now, anything that has any rest mass has enormous energy according to Einstein’s mass-energy equivalence relationship: E = mc2. Let’s consider, for example, a hydrogen atom. Its atomic mass can be expressed in eV/c2, using the same E = mcbut written as m = E/c2, although you will usually find it expressed in so-called unified atomic mass units (u). The mass of our hydrogen atom is approximately 1 u ≈ 931.5×106 eV/c2. That means its energy is about 931.5×106 eV.

Just from the units used (106 eV means we’re measuring stuff here in million eV, so the uncertainty is also likely to be expressed in plus or minus one million eV), it’s obvious that even very small values for Δt will still respect the Uncertainty Principle. [I am obviously exaggerating here: it is likely that we’ll want to reduce the measurement error to much less than plus or minus 1×106 eV, so that means that our measurement time Δt will have to go up. That being said, the point is quite clear: we won’t need all that much time to measure its mass (or its energy) very accurately.] At the same time, the de Broglie frequency f = E/h will be very high. To be precise, the frequency will be in the order of (931.5×106 eV)/(4.1×10−15 eV·s) = 0.2×1024 Hz. In practice, this means that the wavelength is so tiny that there’s no detector which will actually measure the ‘oscillation': any physical detector will straddle most – in fact, I should say: all – of the wiggles of the probability curve. All these facts basically state the same: a hydrogen atom occupies a very precisely determined position in time and space. Hence, we will see it as a ‘hard’ particle–i.e. not as a ‘wavicle’. That’s why the interference experiment mentions electrons, although I should immediately add that interference patterns have been observed using much larger particles as well. In any case, I wrote about that before, so I won’t repeat myself here. The point was to make that energy-time relationship somewhat more explicit, and I hope I’ve been successful at that at least. You can play with some more numbers yourself now. :-)

Post scriptum: The Breit-Wigner distribution

The Uncertainty Principle applied to time and energy has an interesting application: it’s used to measure the lifetime of very short-lived particles. In essence, the uncertainty about their energy (ΔE) is measured by equating it to the standard deviation of the measurements. The equation ΔEΔt = ħ/2 is then used to determine Δt, which is equated to the lifetime of the particle that’s being studied. Georgia University’s Hyperphysics website gives an excellent quick explanation of this, and so I just copied that below.

Hyperphysics Breit-Wigner


A post for my kids: About Einstein’s laws of radiation, and lasers

I wrapped up my previous post, which gave Planck’s solution for the blackbody radiation problem, wondering whether or not one could find the same equation using some other model, not involving the assumption that atomic oscillators have discrete energy levels.

I still don’t have an answer to that question but, sure enough, Feynman introduces another model a few pages further in his Lectures. It’s a model developed by Einstein, in 1916, and it’s much ‘richer’ in the sense that it takes into account what we know to be true: unlike matter-particles (fermions), photons like to crowd together. In more advanced quantum-mechanical parlance, their wave functions obey Bose-Einstein statistics. Now, Bose-Einstein statistics are what allows a laser to focus so much energy in one beam, and so I am writing this post for two reasons–one serious and the other not-so-serious:

  1. To present Einstein’s 1916 model for blackbody radiation.
  2. For my kids, so they understand how a laser works.

Let’s start with Einstein’s model first because, if I’d start with the laser, my kids would only read about that and nothing else. [That being said, I am sure my kids will go straight to the second part and, hence, skip Einstein anyway. :-)]

Einstein’s model of blackbody radiation

Einstein’s model is based on Planck’s and, hence, also assumes that the energy of atomic oscillators can also only take on one value of a set of permitted energy levels. However, unlike Planck, he assumes two types of emission. The first is spontaneous, and that’s basically just Planck’s model. The second is induced emission: that’s emission when light is alrady present, and Einstein’s hypothesis was that an atomic oscillator is more likely to emit a photon when there’s light of the same frequency is shining on it.


The basics of the model are shown above, and the two new variables are the following:

  • Amn is the probability for the oscillator to have its energy drop from energy level m to energy level n, independent of whether light is shining on the atom or not. So that’s the probability of spontaneous emission and it only depends on m and n.
  • Bmn is not a probability but a proportionality constant that, together with the intensity of the light shining on the oscillator–denoted by I(ω), co-determines the probability of of induced emission.

Now, as mentioned above, in this post, I basically want to explain how a laser works, and so let me be as brief as possibly by just copying Feynman here, who says it all:

Feynman on Einstein

Of course, this result must match Planck’s equation for blackbody radiation, because Planck’s equation matched experiment:

formula blackbody

To get the eħω/kT –1, Bmn must be equal to Bnm, and you should not think that’s an obvious result, because it isn’t: this equality says that the induced emission probability and the absorption probability must be equal. Good to know: this keeps the numbers of atoms in the various levels constant through what is referred to as detailed balancing: in thermal equilibrium, every process is balanced by its exact opposite. While that’s nice, and the way it actually works, it’s not obvious. It shows that the process is fully time-reversible. That’s not obvious in a situation involving statistical mechanics, which is what we’re talking about there. In any case, that’s a different topic.

As for Amn, taking into account that Bmn = Bnm, we find that Amn/Bmn =ħω3/π2c2. So we have a ratio here. What about calculating the individual values for Amn and Bmn? Can we calculate the absolute spontaneous and induced emission rates? Feynman says: No. Not with what Einstein had at the time. That was possible only a decade or so later, it seems, when Werner Heisenberg, Max Born, Pascual Jordan, Erwin Schrödinger, Paul Dirac and John von Neumann developed a fully complete theory, in the space of just five years (1925-1930), but that’s the subject of the history of science.

The point is: we have got everything here now to sort of understand how lasers work, so let’s try that to do that now.


Laser is an acronym which stands for Light Amplification by Stimulated Emission of Radiation. It’s based on the mechanism described above, which I am sure you’ve studied in very much detail. :-)

The trick is to find a method to get a gas in a state in which the number of atomic oscillators with energy level m is much and much greater than the number with energy level n. So we’re talking a situation that is not in equilibrium. On the contrary: it’s far out of equilibrium. And then, suddenly, we induce emission from this upper state, which creates a sort of chain reaction that makes “the whole lot of them dump down together”, as Feynman puts it.

The diagram below is taken from the Wikipedia article on lasers. It shows a so-called Nd:YAG laser. Huh? Yes. Nd:YAG stands for neodymium-doped yttrium aluminium garnet, and an Nd:YAG laser is a pretty common type of laser. A garnet is a precious stone: a crystal composed of a silicate mineral. And that’s what it is here, and why the laser is so-called solid-state laser, because the so-called laser medium (see the diagram) may also be a gas or even a liquid, as in dye lasers). I could also take a ruby laser, which uses ruby as the laser medium. But let’s go along with this one as for now.


In the set-up as shown above, a simple xenon flash lamp (yes, that’s a ‘neon’ lamp) provides the energy exciting the atomic oscillators in the crystal. It’s important that the so-called pumping source emits light of a higher frequency than the laser light, as shown below. In fact, the light from xenon gas, or any source, will be a spectrum but so it should (also) have light in the blue or violet range (as shown below). The important thing is that it should not have the red laser frequency, because that’s what would trigger the laser, of course.


The diagram above shows how it actually works.The trick is to get the atoms to a higher state (that’s h in the diagram above, but it’s got nothing to do with the Planck constant) from where they trickle down (and, yes, they do emit other photons while doing that), until they all get stuck in the state m, which is referred to as metastable but which is, in effect, unstable. And so then they are all dumped down together by induced emissions. So the source ‘pumps’ the crystal indeed, leading to that ‘metastable’ state which is referred to as population inversion in statistical mechanics: a lot of atoms (i.e. the members of the ‘population’) are in an excited state, rather than in a lower energy state.

And then we have a so-called optical resonator (aka as a cavity) which, in its simplest form, consists of just two mirrors around the gain medium (i.e. the crystal): these mirrors reflect the light so, once the dump starts, the induced effect is enhanced: the light which is emitted gets a chance to induce more emission, and then another chance, and another, and so on. However, although the mirrors are almost one hundred percent reflecting, light does get out because one of the mirrors is only a partial reflector, which is referred to as the output coupler, and which produces the laser’s output beam.

So… That’s all there is to it.

Really? Is it that simple? Yep. I googled a few questions to increase my understanding but so that’s basically it. Perhaps they’ll help you too and so I copied them hereunder. Before you go through that, however, have a look at how they really look like. The image below (from Wikipedia again) shows a disassembled (and assembled) ruby laser head. You can clearly see the crystal rod in the middle, and the two flashlamps that are used for pumping. I am just inserting it here because, in engineering, I found that a diagram of something and the actual thing often have not all that much in common. :-) As you can see, it’s not the case here: it looks amazingly simple, doesn’t it?


Q: We have crystal here. What’s the atomic oscillator in the crystal? A: It is the neodymium ion which provides the lasing activity in the crystal, in the same fashion as red chromium ion in ruby lasers.

Q: But how does it work exactly? A: Well… The diagram is a bit misleading. The distance between h and m should not be too big of course, because otherwise half of the energy goes into these photons that are being emitted as the oscillators ‘trickle down’. Also, if these ‘in-between’ emissions would have the same frequency as the laser light, they would induce the emission, which is not what we want. So the actual distances should look more like this:


For an actual Nd:YAG laser, we have absorption mostly in the bands between 730–760 nm and 790–820 nm, and emitted light with a wavelength with a wavelength of 1064 nm. Huh? Yes. Remember: shorter wavelength (λ) is higher frequency (ν = c/λ) and, hence, higher energy (E =  hν = hc/λ). So that’s what’s shown below.


Q: But… You’re talking bullsh**. Wavelengths in the 700–800 nm range are infrared (IR) and, hence, not even visible. And light of 1064 nm even less. A: Now you are a smart-ass! You’re right. What actually happens is a bit more complicated, as you might expect. There’s something else going on as well, a process referred to as frequency doubling or second harmonic generation (SHG). It’s a process in which photons with the same frequency (1064 nm) interact with some material to effectively ‘combine’ into new photons with twice the energy, twice the frequency and, therefore, half the wavelength of the initial photons. And so that’s light with a wavelength of 532 nm. We actuall also have so-called higher harmonics, with wavelengths at 355 and 266 nm.

Q: But… That’s green? A: Sure. A Nd:YAG laser produces a green laser beam, as shown below. If you want the red color, buy a ruby laser, which produces pulses of light with a wavelength of 694.3 nm: that’s the deep red color you’d associate with lasers. In fact, the first operational laser, produced by Hughes Research Laboratories back in 1960 (the research arm of Hughes Aircraft, now part of the Raytheon), was a ruby laser.

Powerlite_NdYAGQ: Pulses? That reminds me of something: lasers pulsate indeed, don’t they? How does that work? A: They do. Lasers have a so-called continuous wave output mode. However, there’s a technique called Q-switching. Here, an optical switch is added to the system. It’s inserted into laser cavity, and it waits for a maximum population inversion before it opens. Then the light wave runs through the cavity, depopulating the excited laser medium at maximum population inversion. It allows to produce light pulses with extremely high peak power, much higher than would be produced by the same laser if it were operating in constant output mode.

Q: What’s the use of lasers? A: Because of their ability to focus, they’re used a surgical knives, in eye surgery, or to remove tumors in the brain and treat skin cancer. Lasers are also widely used for engraving, etching, and marking of metals and plastics. When they pack more power, they can also be used to cut or weld steel. Their ability to focus is why these tiny pocket lasers can damage your eye: it’s not like a flashlight. It’s a really focused beam and so it can really blind you–not for a while but permanently.

Q: Lasers can also be used as weapons, can’t they? A: Yes. As mentioned above, techniques like Q-switching allow to produce pulses packing enormous amounts of energy into one single pulse, and you hear a lot about lasers being used as directed-energy weapons (DEWs). However, they won’t replace explosives anytime soon. Lasers were already widely used for sighting, ranging and targeting for guns, but so they’re not the source of the weapon’s firepower. That being said, the pulse of a megajoule laser would deliver the same energy as 200 grams of high explosive, but all focused on a tiny little spot. Now that’s firepower obviously, and such lasers are now possible. However, their power is more likely to be used for more benign purposes, notably igniting a nuclear fusion reaction. There’s nice stuff out there if you’d want to read more.

Q: No. I think I’ve had it. But what are those pocket lasers? A: They are what they are: handheld lasers. It just shows how technology keeps evolving. The Nano costs a hundred dollars only. I wonder if Einstein would ever have imagined that what he wrote back in 1916 would, ultimately, lead to us manipulating light with little handheld tools. We live in amazing times. :-)

Planck’s constant (II)

My previous post was tough. Tough for you–if you’ve read it. But tough for me too. :-)

The blackbody radiation problem is complicated but, when everything is said and done, what the analysis says is that the the ‘equipartition theorem’ in the kinetic theory of gases ‘theorem (or the ‘theorem concerning the average energy of the center-of-mass motion’, as Feynman terms it), is not correct. That equipartition theorem basically states that, in thermal equilibrium, energy is shared equally among all of its various forms. For example, the average kinetic energy per degree of freedom in the translation motion of a molecule should equal that of its rotational motions. That equipartition theorem is also quite precise: it also states that the mean energy, for each atom or molecule, for each degree of freedom, is kT/2. Hence, that’s the (average) energy the 19th century scientists also assigned to the atomic oscillators in a gas.

However, the discrepancy between the theoretical and empirical result of their work shows that adding atomic oscillators–as radiators and absorbers of light–to the system (a box of gas that’s being heated) is not just a matter of adding additional ‘degree of freedom’ to the system. It can’t be analyzed in ‘classical’ terms: the actual spectrum of blackbody radiation shows that these atomic oscillators do not absorb, on average, an amount of energy equal to kT/2. Hence, they are not just another ‘independent direction of motion’.

So what are they then? Well… Who knows? I don’t. But, as I didn’t quite go through the full story in my previous post, the least I can do is to try to do that here. It should be worth the effort. In Feynman’s words: “This was the first quantum-mechanical formula ever known, or discussed, and it was the beautiful culmination of decades of puzzlement.” And then it does not involve complex numbers or wave functions, so that’s another reason why looking at the detail is kind of nice. :-)

Discrete energy levels and the nature of h

To solve the blackbody radiation problem, Planck assumed that the permitted energy levels of the atomic harmonic oscillator were equally spaced, at ‘distances’ ħωapart from each other. That’s what’s illustrated below.

Equally space energy levels

Now, I don’t want to make too many digressions from the main story, but this En = nħω0 formula obviously deserves some attention. First note it immediately shows why the dimension of ħ is expressed in joule-seconds (J·s), or electronvolt-seconds (J·s): we’re multiplying it with a frequency indeed, so that’s something expressed per second (hence, its dimension is s–1) in order to get a measure of energy: joules or, because of the atomic scale, electronvolts. [The eV is just a (much) smaller measure than the joule, but it amounts to the same: 1 eV ≈ 1.6×10−19 J.]

One thing to note is that the equal spacing consists of distances equal to ħω0, not of ħ. Hence, while h, or ħ (ħ is the constant to be used when the frequency is expressed in radians per second, rather than oscillations per second, so ħ = h/2π) is now being referred to as the quantum of action (das elementare Wirkungsquantum in German), Planck referred to it as as a Hilfsgrösse only (that’s why he chose the h as a symbol, it seems), so that’s an auxiliary constant only: the actual quantum of action is, of course, ΔE, i.e. the difference between the various energy levels, which is the product of ħ and ω(or of h and ν0 if we express frequency in oscillations per second, rather than in angular frequency). Hence, Planck (and later Einstein) did not assume that an atomic oscillator emits or absorbs packets of energy as tiny as ħ or h, but packets of energy as big as ħωor, what amounts to the same (ħω = (h/2π)(2πν) = hν), hν0. Just to give an example, the frequency of sodium light (ν) is 500×1012 Hz, and so its energy is E = hν. That’s not a lot–about 2 eV only– but it still packs 500×1012 ‘quanta of action’ !

Another thing is that ω (or ν) is a continuous variable: hence, the assumption of equally spaced energy levels does not imply that energy itself is a discrete variable: light can have any frequency and, hence, we can also imagine photons with any energy level: the only thing we’re saying is that the energy of a photon of a specific color (i.e. a specific frequency ν) will be a multiple of hν.

Probability assumptions

The second key assumption of Planck as he worked towards a solution of the blackbody radiation problem was that the probability (P) of occupying a level of energy E is P(EαeE/kT. OK… Why not? But what is this assumption really? You’ll think of some ‘bell curve’, of course. But… No. That wouldn’t make sense. Remember that the energy has to be positive. The general shape of this P(E) curve is shown below.


The highest probability density is near E = 0, and then it goes down as E gets larger, with kT determining the slope of the curve (just take the derivative). In short, this assumption basically states that higher energy levels are not so likely, and that very high energy levels are very unlikely. Indeed, this formula implies that the relative chance, i.e. the probability of being in state E1 relative to the chance of being in state E0, is P1/Pe−(E1–E0)k= e−ΔE/kT. Now, Pis n1/N and Pis n0/N and, hence, we find that nmust be equal to n0e−ΔE/kT. What this means is that the atomic oscillator is less likely to be in a higher energy state than in a lower one.

That makes sense, doesn’t it? I mean… I don’t want to criticize those 19th century scientists but… What were they thinking? Did they really imagine that infinite energy levels were as likely as… Well… More down-to-earth energy levels? I mean… A mechanical spring will break when you overload it. Hence, I’d think it’s pretty obvious those atomic oscillators cannot be loaded with just about anything, can they? Garbage in, garbage out:  of course, that theoretical spectrum of blackbody radiation didn’t make sense!

Let me copy Feynman now, as the rest of the story is pretty straightforward:

Now, we have a lot of oscillators here, and each is a vibrator of frequency w0. Some of these vibrators will be in the bottom quantum state, some will be in the next one, and so forth. What we would like to know is the average energy of all these oscillators. To find out, let us calculate the total energy of all the oscillators and divide by the number of oscillators. That will be the average energy per oscillator in thermal equilibrium, and will also be the energy that is in equilibrium with the blackbody radiation and that should go in the equation for the intensity of the radiation as a function of the frequency, instead of kT. [See my previous post: that equation is I(ω) = (ω2kt)/(π2c2).]

Thus we let N0 be the number of oscillators that are in the ground state (the lowest energy state); N1 the number of oscillators in the state E1; N2 the number that are in state E2; and so on. According to the hypothesis (which we have not proved) that in quantum mechanics the law that replaced the probability eP.E./kT or eK.E./kT in classical mechanics is that the probability goes down as eΔE/kT, where ΔE is the excess energy, we shall assume that the number N1 that are in the first state will be the number N0 that are in the ground state, times e−ħω/kT. Similarly, N2, the number of oscillators in the second state, is N=N0e−2ħω/kT. To simplify the algebra, let us call e−ħω/k= x. Then we simply have N1 = N0x, N2 = N0x2, …, N= N0xn.

The total energy of all the oscillators must first be worked out. If an oscillator is in the ground state, there is no energy. If it is in the first state, the energy is ħω, and there are N1 of them. So N1ħω, or ħωN0x is how much energy we get from those. Those that are in the second state have 2ħω, and there are N2 of them, so N22ħω=2ħωN0x2 is how much energy we get, and so on. Then we add it all together to get Etot = N0ħω(0+x+2x2+3x3+…).

And now, how many oscillators are there? Of course, N0 is the number that are in the ground state, N1 in the first state, and so on, and we add them together: Ntot = N0(1+x+x2+x3+…). Thus the average energy is


Now the two sums which appear here we shall leave for the reader to play with and have some fun with. When we are all finished summing and substituting for x in the sum, we should get—if we make no mistakes in the sum—

Feynman concludes as follows: “This, then, was the first quantum-mechanical formula ever known, or ever discussed, and it was the beautiful culmination of decades of puzzlement. Maxwell knew that there was something wrong, and the problem was, what was right? Here is the quantitative answer of what is right instead of kT. This expression should, of course, approach kT as ω → 0 or as → .”

It does, of course. And so Planck’s analysis does result in a theoretical I(ω) curve that matches the observed I(ω) curve as a function of both temperature (T) and frequency (ω). But so what it is, then? What’s the equation describing the dotted curves? It’s given below:

formula blackbody

I’ll just quote Feynman once again to explain the shape of those dotted curves: “We see that for a large ω, even though we have ωin the numerator, there is an e raised to a tremendous power in the denominator, so the curve comes down again and does not “blow up”—we do not get ultraviolet light and x-rays where we do not expect them!”

Is the analysis necessarily discrete?

One question I can’t answer, because I just am not strong enough in math, is the question or whether or not there would be any other way to derive the actual blackbody spectrum. I mean… This analysis obviously makes sense and, hence, provides a theory that’s consistent and in accordance with experiment. However, the question whether or not it would be possible to develop another theory, without having recourse to the assumption that energy levels in atomic oscillators are discrete and equally spaced with the ‘distance’ between equal to hν0, is not easy to answer. I surely can’t, as I am just a novice, but I can imagine smarter people than me have thought about this question. The answer must be negative, because I don’t know of any other theory: quantum mechanics obviously prevailed. Still… I’d be interested to see the alternatives that must have been considered.

Post scriptum: The “playing with the sums” is a bit confusing. The key to the formula above is the substitution of (0+x+2x2+3x3+…)/(1+x+x2+x3+…) by 1/[(1/x)–1)] = 1/[eħω/kT–1]. Now, the denominator 1+x+x2+x3+… is the Maclaurin series for 1/(1–x). So we have:

(0+x+2x2+3x3+…)/(1+x+x2+x3+…) = (0+x+2x2+3x3+…)(1–x)

x+2x2+3x3… –x22x3–3x4… = x+x2+x3+x4

= –1+(1+x+x2+x3…) = –1 + 1/(1–x) = –(1–x)+1/(1–x) = x/(1–x).

Note the tricky bit: if x = e−ħω/kT, then eħω/kis x−1 = 1/x, and so we have (1/x)–1 in the denominator of that (mean) energy formula, not 1/(x–1). Now 1/[(1/x)–1)] = 1/[(1–x)/x] = x/(1–x), indeed, and so the formula comes out alright.

Planck’s constant (I)

If you made it here, it means you’re totally fed up with all of the easy stories on quantum mechanics: diffraction, double-slit experiments, imaginary gamma-ray microscopes,… You’ve had it! You now know what quantum mechanics is all about, and you’ve realized all these thought experiments never answer the tough question: where did Planck find that constant (h) which pops up everywhere? And how did he find that Planck relation which seems to underpin all and everything in quantum mechanics?

If you don’t know, that’s because you’ve skipped the blackbody radiation story. So let me give it to you here. What’s blackbody radiation?

Thermal equilibrium of radiation

That’s what the blackbody radiation problem is about: thermal equilibrium of radiation.


Yes. Imagine a box with gas inside. You’ll often see it’s described as a furnace, because we heat the box. Hence, the box, and everything inside, acquires a certain temperature, which we then assume to be constant. The gas inside will absorb energy and start emitting radiation, because the gas atoms or molecules are atomic oscillators. Hence, we have electrons getting excited and then jumping up and down from higher to lower energy levels, and then again and again and again, thereby emitting photons with a certain energy and, hence, light of a certain frequency. To put it simply: we’ll find light with various frequencies in the box and, in thermal equilibrium, we should have some distribution of the intensity of the light according to the frequency: what kind of radiation do we find in the furnace? Well… Let’s find out.

The assumption is that the box walls send light back, or that the box has mirror walls. So we assume that all the radiation keeps running around in the box. Now that implies that the atomic oscillators not only radiate energy, but also receive energy, because they’re constantly being illuminated by radiation that comes straight back at them. If the temperature of the box is kept constant, we arrive at a situation which is referred to as thermal equilibrium. In Feynman’s words: “After a while there is a great deal of light rushing around in the box, and although the oscillator is radiating some, the light comes back and returns some of the energy that was radiated.”

OK. That’s easy enough to understand. However, the actual analysis of this equilibrium situation is what gave rise to the ‘problem’ of blackbody radiation in the 19th century which, as you know, led Planck and Einstein to develop a quantum-mechanical view of things. It turned out that the classical analysis predicted a distribution of the intensity of light that didn’t make sense, and no matter how you looked at it, it just didn’t come out right. Theory and experiment did not agree. Now, that is something very serious in science, as you know, because it means your theory isn’t right. In this case, it was disastrous, because it meant the whole of classical theory wasn’t right.

To be frank, the analysis is not all that easy. It involves all that I’ve learned so far: the math behind oscillators and interference, statistics, the so-called kinetic theory of gases and what have you. I’ll try to summarize the story but you’ll see it requires quite an introduction.

Kinetic energy and temperature

The kinetic theory of gases is part of what’s referred to as statistical mechanics: we look at a gas as a large number of inter-colliding atoms and we describe what happens in terms of the collisions between them. As Feynman puts it: “Fundamentally, we assert that the gross properties of matter should be explainable in terms of the motion of its parts.” Now, we can do a lot of intellectual gymnastics, analyzing one gas in one box, two gases in one box, two gases in one box with a piston between them, two gases in two boxes with a hole in the wall between them, and so on and so on, but that would only distract us here. The rather remarkable conclusion of such exercises, which you’ll surely remember from your high school days, is that:

  1. Equal volumes of different gases, at the same pressure and temperature, will have the same number of molecules.
  2. In such view of things, temperature is actually nothing but the mean kinetic energy of those molecules (or atoms if it’s a monatomic gas).

So we can actually measure temperature in terms of the kinetic energy of the molecules of the gas, which, as you know, equals mv2/2, with m the mass and v the velocity of the gas molecules. Hence, we’re tempted to define some absolute measure of temperature T and simply write:

T = 〈mv2/2〉

The 〈 and 〉 brackets denote the mean here. To be precise, we’re talking the root mean square here, aka as the quadratic mean, because we want to average some magnitude of a varying quantity. Of course, the mass of different gases will be different – and so we have 〈m1v12/2〉 for gas 1 and 〈m2v22/2〉 for gas 2 – but that doesn’t matter: we can, actually, imagine measuring temperature in joule, the unit of energy, including kinetic energy. Indeed, the units come out alright: 1 joule = 1 kg·(m2/s2). For historical reasons, however, T is measured in different units: degrees Kelvin, centigrades (i.e. degrees Celsius) or, in the US, in Fahrenheit. Now, we can easily go from one measure to the other as you know and, hence, here I should probably just jot down the so-called ideal gas law–because we need that law for the subsequent analysis of blackbody radiation–and get on with it:

PV = NkT

However, now that we’re here, let me give you an inkling of how we derive that law. A classical (Newtonian) analysis of the collisions (you can find the detail in Feynman’s Lectures, I-39-2) will yield the following equation: P = (2/3)n〈mv2/2〉, with n the number of atoms or molecules per unit volume. So the pressure of a gas (which, as you know, is the force (of a gas on a piston, for example) per unit area: P = F/A) is also equal to the mean kinetic energy of the gas molecules multiplied by (2/3)n. If we multiply that equation by V, we get PV = N(2/3)〈mv2/2〉. However, we know that equal volumes of volumes of different gases, at the same pressure and temperature, will have the same number of molecules, so we have PV = N(2/3)〈m1v12/2〉 = N(2/3)〈m2v22/2〉, which we write as PV = NkT with kT = (2/3)〈m1v12/2〉 = (2/3)〈m2v22/2〉.

In other words, that factor of proportionality k is the one we have to use to convert the temperature as measured by 〈mv2/2〉 (i.e. the mean kinetic energy expressed in joules) to T (i.e. the temperature expressed in the measure we’re used to, and that’s degrees Kelvin–or Celsius or Fahrenheit, but let’s stick to Kelvin, because that’s what’s used in physics). Vice versa, we have 〈mv2/2〉 = (3/2)kT. Now, that constant of proportionality k is equal to k 1.38×10–23 joule per Kelvin (J/K). So if T is (absolute) temperature, expressed in Kelvin (K), our definition says that the mean molecular kinetic energy is (3/2)kT.

That k factor is a physical constant referred to as the Boltzmann constant. If it’s one of these constants, you may wonder why we don’t integrate that 3/2 factor in it? Well… That’s just how it is, I guess. In any case, it’s rather convenient because we’ll have 2/3 factors in other equations and so these will cancel out with that 3/2 term. However, I am digressing way too much here. I should get back to the main story line. However, before I do that, I need to expand on one more thing, and that’s a small lecture on how things look like when we also allow for internal motion, i.e. the rotational and vibratory motions of the atoms within the gas molecule. Let me first re-write that PV = NkT equation as

PV = NkT = N(2/3)〈m1v12/2〉 = (2/3)U = 2U/3

For monatomic gas, that U would only be the kinetic energy of the atoms, and so we can write it as U = (2/3)NkT. Hence, we have the grand result that the kinetic energy, for each atom, is equal to (3/2)kT, on average that is.

What about non-monatomic gas? Well… For complex molecules, we’d also have energy going into the rotational and vibratory motion of the atoms within the molecule, separate from what is usually referred to as the center-of-mass (CM) motion of the molecules themselves. Now, I’ll again refer you to Feynman for the detail of the analysis, but it turns out that, if we’d have, for example, a diatomic molecule, consisting of an A and B atom, the internal rotational and vibratory motion would, indeed, also absorb energy, and we’d have a total energy equal to (3/2)kT + (3/2)kT = 2×(3/2)kT = 3kT. Now, that amount (3kT) can be split over (i) the energy related to the CM motion, which must still be equal to (3/2)kT, and (ii) the average kinetic energy of the internal motions of the diatomic molecule excluding the bodily motion of the CM. Hence, the latter part must be equal to 3kT – (3/2)kT = (3/2)kT. So, for the diatomic molecule, the total energy happens to consist of two equal parts.

Now, there is a more general theorem here, for which I have to introduce the notion of the degrees of freedom of a system. Each atom can rotate or vibrate or oscillate or whatever in three independent directions–namely the three spatial coordinates x, y and z. These spatial dimensions are referred to as the degrees of freedom of the atom (in the kinetic theory of gases, that is), and if we have two atoms, we have 2×3 = 6 degrees of freedom. More in general, the number of degrees of freedom of a molecule composed of r atoms is equal to 3rNow, it can be shown that the total energy of an r-atom molecule, including all internal energy as well as the CM motion, will be 3r×kT/2 = 3rkT/2 joules. Hence, for every independent direction of motion that there is, the average kinetic energy for that direction will be kT/2. [Note that ‘independent direction of motion’ is used, somewhat confusingly, as a synonym for degree of freedom, so we don’t have three but six ‘independent directions of motion’ for the diatomic molecule. I just wanted to note that because I do think it causes confusion when reading a textbook like Feynman’s.] Now, that total amount of energy, i.e.  3r(kT/2), will be split as follows according to the “theorem concerning the average energy of the CM motion”, as Feynman terms it:

  1. The kinetic energy for the CM motion of each molecule is, and will always be, (3/2)kT.
  2. The remainder, i.e. r(3/2)kT – (3/2)kT = (3/2)(r–1)kt, is internal vibrational and rotational kinetic energy, i.e. the sum of all vibratory and rotational kinetic energy but excluding the energy of the CM motion of the molecule.

Phew! That’s quite something. And we’re not quite there yet.

The analysis for photon gas

Photon gas? What’s that? Well… Imagine our box is the gas in a very hot star, hotter than the sun. As Feynman writes it: “The sun is not hot enough; there are still too many atoms, but at still higher temperatures in certain very hot stars, we may neglect the atoms and suppose that the only objects that we have in the box are photons.” Well… Let’s just go along with it. We know that photons have no mass but they do have some very tiny momentum, which we related to the magnetic field vector, as opposed to the electric field. It’s tiny indeed. Most of the energy of light goes into the electric field. However, we noted that we can write p as p = E/c, with c the speed of light (3×108). Now, we had that = (2/3)n〈mv2/2〉 formula for gas, and we know that the momentum p is defined as p = mv. So we can substitute mvby (mv)v = pv. So we get = (2/3)n〈pv/2〉 = (1/3)n〈pv〉.

Now, the energy of photons is not quite the same as the kinetic energy of an atom or an molecule, i.e. mv2/2. In fact, we know that, for photons, the speed v is equal to c, and pc = E. Hence, multiplying by the volume V, we get

PV = U/3

So that’s a formula that’s very similar to the one we had for gas, for which we wrote: PV = NkT = 2U/3. The only thing is that we don’t have a factor 2 in the equation but so that’s because of the different energy concepts involved. Indeed, the concept of the energy of a photon (E = pc) is different than the concept of kinetic energy. But so the result is very nice: we have a similar formula for the compressibility of gas and radiation. In fact, both PV = 2U/3 and PV = U/3 will usually be written, more generally, as:

PV = (γ – 1)U 

Hence, this γ would be γ = 5/3 ≈ 1.667 for gas and 4/3 ≈ 1.333 for photon gas. Now, I’ll skip the detail (it involves a differential analysis) but it can be shown that this general formula, PV = (γ – 1)U, implies that PVγ (i.e. the pressure times the volume raised to the power γ) must equal some constant, so we write:

PVγ = C

So far so good. Back to our problem: blackbody radiation. What you should take away from this introduction is the following:

  1. Temperature is a measure of the average kinetic energy of the atoms or molecules in a gas. More specifically, it’s related to the mean kinetic energy of the CM motion of the atoms or molecules, which is equal to (3/2)kT, with k the Boltzmann constant and T the temperature expressed in Kelvin (i.e. the absolute temperature).
  2. If gas atoms or molecules have additional ‘degrees of freedom’, aka ‘independent directions of motion’, then each of these will absorb additional energy, namely kT/2.

Energy and radiation

The atoms in the box are atomic oscillators, and we’ve analyzed them before. What the analysis above added was that average kinetic energy of the atoms going around is (3/2)kT and that, if we’re talking molecules consisting of r atoms, we have a formula for their internal kinetic energy as well. However, as an oscillator, they also have energy separate from that kinetic energy we’ve been talking about alrady. How much? That’s a tricky analysis. Let me first remind you of the following:

  1. Oscillators have a natural frequency, usually denoted by the (angular) frequency ω0.
  2. The sum of the potential and kinetic energy stored in an oscillator is a constant, unless there’s some damping constant. In that case, the oscillation dies out. Here, you’ll remember the concept of the Q of an oscillator. If there’s some damping constant, the oscillation will die out and the relevant formula is 1/Q = (dW/dt)/(ω0W) = γ0, with γ the damping constant (not to be confused with the γ we used in that PVγ = C formula).

Now, for gases, we said that, for every independent direction of motion there is, the average kinetic energy for that direction will be kT/2. I admit it’s a bit of a stretch of the imagination but so that’s how the blackbody radiation analysis starts really: our atomic oscillators will have an average kinetic energy equal to kT/2 and, hence, their total energy (kinetic and potential) should be twice that amount, according to the second remark I made above. So that’s kT. We’ll note the total energy as W below, so we can write:

W = kT

Just to make sure we know what we’re talking about (one would forget, wouldn’t one?), kT is the product of the Boltzmann constant (1.38×10–23 J/K) and the temperature of the gas (so note that the product is expressed in joule indeed). Hence, that product is the average energy of our atomic oscillators in the gas in our furnace.

Now, I am not going to repeat all of the detail we presented on atomic oscillators (I’ll refer you, once again, to Feynman) but you may or may not remember that atomic oscillators do have a Q indeed and, hence, some damping constant γ. So we can use and re-write that formula above as

dW/dt = (1/Q)(ω0W) = (ω0W)(γ/ω0) = γW, which implies γ = (dW/dt)/W

What’s γ? Well, we’ve calculated the Q of an atomic oscillator already: Q = 3λ/4πr0. Now, λ = 2πc/ω(we just convert the wavelength into (angular) frequency using λν = c) and γ = ω0/Q, so we get γ = 4πr0ω0/[3(2πc/ω0)] = (2/3)r0ω02/c. Now, plugging that result back into the equation above, we get

dW/dt = γW = (2/3)(r0ω02kT)/c

Just in case you’d have difficulty following – I admit I did :-) – dW/dt is the average rate of radiation of light of (or near) frequency ω02. I’ll let Feynman take over here:

Next we ask how much light must be shining on the oscillator. It must be enough that the energy absorbed from the light (and thereupon scattered) is just exactly this much. In other words, the emitted light is accounted for as scattered light from the light that is shining on the oscillator in the cavity. So we must now calculate how much light is scattered from the oscillator if there is a certain amount—unknown—of radiation incident on it. Let I(ω)dω be the amount of light energy there is at the frequency ω, within a certain range dω (because there is no light at exactly a certain frequency; it is spread all over the spectrum). So I(ω) is a certain spectral distribution which we are now going to find—it is the color of a furnace at temperature T that we see when we open the door and look in the hole. Now how much light is absorbed? We worked out the amount of radiation absorbed from a given incident light beam, and we calculated it in terms of a cross section. It is just as though we said that all of the light that falls on a certain cross section is absorbed. So the total amount that is re-radiated (scattered) is the incident intensity I(ω)dω multiplied by the cross section σ.

OK. That makes sense. I’ll not copy the rest of his story though, because this is a post in a blog, not a textbook. What we need to find is that I(ω). So I’ll refer you to Feynman for the details (these ‘details’ involve fairly complicated calculations, which are less important than the basic assumptions behind the model, which I presented above) and just write down the result:

blackbody radiation formula

This formula is Rayleigh’s law. [And, yes, it’s the same Rayleigh – Lord Rayleigh, I should say respectfully – as the one who invented that criterion I introduced in my previous post, but so this law and that criterion have nothing to do with each other.] This ‘law’ gives the intensity, or the distribution, of light in a furnace. Feynman says it’s referred to as blackbody radiation because “the hole in the furnace that we look at is black when the temperature is zero.” […] OK. Whatever. What we call it doesn’t matter. The point is that this function tells us that the intensity goes as the square of the frequency, which means that if we have a box at any temperature at all, and if we look at it, the X- and gamma rays will be burning out eyes out ! The graph below shows both the theoretical curve for two temperatures (Tand 2T0), as derived above (see the solid lines), and then the actual curves for those two temperatures (see the dotted lines).

Blackbody radation graph

This is the so-called UV catastrophe: according to classical physics, an ideal black body at thermal equilibrium should emit radiation with infinite power. In reality, of course, it doesn’t: Rayleigh’s law is false. Utterly false. And so that’s where Planck came to the rescue, and he did so by assuming radiation is being emitted and/or absorbed in finite quanta: multiples of h, in fact.

Indeed, Planck studied the actual curve and fitted it with another function. That function assumed the average energy of a harmonic oscillator was not just proportional with the temperature (T), but that it was also a function of the (natural) frequency of the oscillators. By fiddling around, he found a simple derivation for it which involved a very peculiar assumption. That assumption was that the harmonic oscillator can take up energies only ħω at the time, as shown below.

Equally space energy levels

Hence, the assumption is that the harmonic oscillators cannot take whatever (continous) energy level. No. The allowable energy levels of the harmonic oscillators are equally spaced: E= nħω. Now, the actual derivation is at least as complex as the derivation of Rayleigh’s law, so I won’t do it here. Let me just give you the key assumptions:

  1. The gas consists of a large number of atomic oscillators, each with their own natural frequency ω0.
  2. The permitted energy levels of these harmonic oscillator are equally spaced and ħωapart.
  3. The probability of occupying a level of energy E is P(Eαe–E/kT.

All the rest is tedious calculation, including the calculation of the parameters of the model, which include ħ (and, hence, h, because h = 2πħ) and are found by matching the theoretical curves to the actual curves as measured in experiments. I’ll just mention one result, and that’s the average energy of these oscillators:


As you can see, the average energy does not only depend on the temperature T, but also on their (natural) frequency. So… Now you know where h comes from. As I relied so heavily on Feynman’s presentation here, I’ll include the link. As Feynman puts it: “This, then, was the first quantum-mechanical formula ever known, or ever discussed, and it was the beautiful culmination of decades of puzzlement. Maxwell knew that there was something wrong, and the problem was, what was right? Here is the quantitative answer of what is right instead of kT.”

So there you go. Now you know. :-) Oh… And in case you’d wonder: why the h? Well… Not sure. It’s said the h stands for Hilfsgrösse, so that’s some constant which was just supposed to help him out with the calculation. At that time, Planck did not suspect it would turn out to be one of the most fundamental physical constants. :-)

Post scriptum: I went quite far in my presentation of the basics of the kinetic theory of gases. You may wonder now. I didn’t use that theoretical PVγ = C relation, did I? And why all the fuss about photon gas? Well… That was just to introduce that PVγ = C relation, so I could note, here, in this post scriptum, that it has a similar problem. The γ exponent is referred to as the specific heat ratio of a gas, and it can be calculated theoretically as well, as we did–well… Sort of, because we skipped the actual derivation. However, their theoretical value also differs substantially from actually measured values, and the problem is the same: one should not assume that a continuous value for 〈E〉. Agreement between theory and experiment can only be reached when the same assumptions as those of Planck are used: discrete energy levels, multiples of ħ and ω: E= nħω. Also, the specific functional form which Planck used to resolve the blackbody radiation problem is also to be used here. For more details, I’ll refer to Feynman too. I can’t say this is easy to digest, but then who said it would be easy? :-)

The point to note is that the blackbody radiation problem wasn’t the only problem in the 19th century. As Feynman puts it: “One often hears it said that physicists at the latter part of the nineteenth century thought they knew all the significant physical laws and that all they had to do was to calculate more decimal places. Someone may have said that once, and others copied it. But a thorough reading of the literature of the time shows they were all worrying about something.” They were, and so Planck came up with something new. And then Einstein took it to the next level and then… Well… The rest is history. :-)