Vector analysis

The relationship between math and physics is deep. When studying physics, one sometimes feels physics and math become one and the same. However, they are not the same. In fact, physicists warn against emphasizing the math side of physics too much. To paraphrase one of the greatest post–World War II scientists, Richard Feynman: “It is not because you understand the Maxwell equations mathematically inside out that you understand physics inside out.”

Indeed, while vector equations and fields and all those other mathematical constructs do represent physical realities, one needs to develop a ‘physical’ – as opposed to a ‘mathematical’ – understanding of the equations. Now you’ll ask: what’s a ‘physical’ understanding? Well… Let me quote Feynman once again on that: “A physical understanding is a completely unmathematical, imprecise, and inexact thing, but absolutely necessary for a physicist.”

That’s surprising, isn’t it? Especially taking into account Feynman’s eminence. Judging from what he writes in his Lectures, Feynman doesn’t like philosophers, and he’d surely say that there’s no need for metaphysics (i.e. the branch of philosophy that deals with first principles). That being said, I think that’s a matter of definition and interpretation. If metaphysics deals with first principles, I’d say it’s rather obvious that physics is also based on some kind of ‘metaphysical’ model, and I wouldn’t hesitate to equate that model with the Standard Model! Indeed, from what I’ve learned so far, quantum mechanics has a lot in common with Pythagoras’ belief that mathematical concepts – and numbers in particular – might have greater ‘actuality’ than the reality they are supposed to describe.

But that’s not what I want to write about in this post (if you want to read more about this, I’ll refer you to another blog of mine). In this post, I want to get back to basics.

Vectors in math and physics

It may surprise you, but the term ‘vector’, in physics and in math, refers to more than a dozen different concepts. Just check it out on Wikipedia, if you don’t believe me. In fact, as an autodidact, I’d say this is probably one of the key sources of confusion for people like us. A vector means many things indeed. The most common definitions are:

  1. Some one-dimensional array of numbers (i.e. an element of Rn), or of anything really: numerical or alphanumerical values, blob files,… Whatever!
  2. A vector can also be a point vector. In that case, it represents the position of a point in space in two, three or even four dimensions (if we include time), in relation to some arbitrarily chosen origin (i.e. the zero point).
  3. A vector can also be a displacement vector: in that case, it will specify the change in position of a point relative to its previous position. Again, such displacement vectors may be two-, three- or four-dimensional.
  4. A vector may also refer to a so-called four-vector: a four-vector obeys very specific transformation rules, referred to as the Lorentz transformation. In this regard, you’ve surely heard of space-time vectors, referred to as events, and noted as X = (ct, r), with r the spatial vector r = (x, y, z) and c the speed of light (which, in this case, is nothing but a proportionality constant ensuring that space and time are measured in compatible units). But there are many more four-vectors, one of them being P = (E/c, p), which relates energy and momentum in spacetime.
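To make that transformation requirement concrete, here’s a quick numerical sketch (in Python, with made-up numbers and units in which c = 1): boosting an event X = (ct, x) to another reference frame changes ct and x, but the interval (ct)² – x² comes out the same.

```python
import math

# Hypothetical event: t = 3, x = 2, in units where c = 1.
c = 1.0
t, x = 3.0, 2.0

def boost(t, x, v):
    """Lorentz-boost the event (ct, x) to a frame moving at speed v (|v| < c)."""
    gamma = 1.0 / math.sqrt(1.0 - v**2 / c**2)
    t_p = gamma * (t - v * x / c**2)
    x_p = gamma * (x - v * t)
    return t_p, x_p

# The spacetime interval (ct)^2 - x^2 is the same in every frame.
interval = (c * t)**2 - x**2
t_p, x_p = boost(t, x, 0.6)
interval_p = (c * t_p)**2 - x_p**2
```

Any speed v with |v| < c gives the same interval: that invariance is precisely what makes X a genuine four-vector rather than just a list of four numbers.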
  5. We also have vector operators, like the gradient vector ∇T, and that’s what I want to write about in this post.

Vectors in physics

In physics, the term ‘vector’ may refer to any of the above but, usually, it will mean yet another thing: a field vector. Now, funnily enough, the term ‘field vector’, while being the most obvious description of what it is, is not widely used: what I call a ‘field vector’ is usually referred to as a gradient, and the vectors E and B are usually referred to as the electric or magnetic field. Period.

E and B behave like h, which is the symbol used for the heat flow in some body or block of material: they are vector fields derived from a scalar field.

Huh? Scalar field? I thought we were talking vectors. We are. And I will also need to qualify the statement above: the relation between B and E and their potentials – A (the magnetic or vector potential) and Φ (the electrostatic potential) – is a bit more complicated than the relationship between temperature (T) – i.e. the scalar field determining the heat flow – and h, but the math that is involved is the same. But I am getting ahead of myself. Let’s look at h and T only for the moment.

As you know, the temperature is a measure for energy. In a block of material, the temperature T will be a scalar: some real number that we can measure in Kelvin, Fahrenheit or Celsius but – whatever unit we use – any observer using the same unit will measure the same at any given point. That’s what distinguishes a ‘scalar’ from a ‘real number’: a scalar field is something real.

The same is true for a vector field: it is something real. As Feynman puts it: “It is not true that any three numbers form a vector [in physics, that is]. It is true only if, when we rotate the coordinate system, the components of the vector transform among themselves in the correct way.” What’s the ‘correct way’? It’s a way that ensures that any observer using the same unit will measure the same at any given point.


In physics, we can associate a point in space with physical realities, such as:

  1. Temperature, the ‘height’ of a body in a gravitational field, or the pressure distribution in a gas or a fluid, are all examples of scalar fields: they are just (real) numbers from a math point of view but, because they do represent a physical reality, these ‘numbers’ respect certain mathematical conditions: in practice, they will be a continuous or continuously differentiable function of position.
  2. Heat flow (h), the velocity (v) of the molecules/atoms in a rotating object, or the electric field (E), are examples of vector fields. As mentioned above, the same condition applies: any observer using the same unit should measure the same at any given point.
  3. Tensors, which represent, for example, stress or strain at some point in space (in various directions), or the curvature of space in the general theory of relativity.
  4. Finally, there are also spinors, which are often defined as a “generalization of tensors using complex numbers instead of real numbers.” They are very relevant in quantum mechanics, it is said, but I don’t know enough about them to say anything about them, and so I won’t.

Back to basics indeed. How do we derive a vector field from a scalar field? Let’s study temperature and heat flow. The two illustrations below (taken from Feynman’s Lectures) illustrate the ‘mechanics’ behind it: the (magnitude of the) heat flow (h) is the amount of thermal energy (ΔJ) that passes, per unit time and per unit area, through an infinitesimal surface element at right angles to the direction of flow (which is, obviously, from the hotter to the colder places).

[Figures 1 and 2: heat flow through an infinitesimal surface element at right angles to the flow (from Feynman’s Lectures)]

A vector has both a magnitude and a direction, as defined above, and, hence, if we define ef as the unit vector in the direction of flow, we can write:

h = h·ef = (ΔJ/Δa)·ef

ΔJ stands for the thermal energy flowing through an area marked as Δa in the diagram above. It is measured per unit time, obviously. Hence, the heat flow is the flow of thermal energy per unit area.

Using simple trigonometry yields an equally simple formula for the heat flow through any surface Δa2 (i.e. any surface that is not at right angles to the heat flow h). [All is relative, though: it took me a while to figure out that the heat flows through the Δa1 and Δa2 areas below are, in effect, the same :-( and then I also needed some time to figure out the cosine factor in the formula below.]

ΔJ/Δa2 = (ΔJ/Δa1)·cosθ = h·n

[Figure 3: heat flow through a surface Δa2 that is not at right angles to h (from Feynman’s Lectures)]

In this equation, we have the scalar product of two vectors: (1) h, the heat flow and (2) n, the unit vector that is normal (orthogonal) to the surface Δa2. At this point, I need to remind you of the definition of the scalar product of two vectors. It yields a (real) number:

A·B = |A||B|cos(θ), with θ the angle between A and B

In this case, h·n = |h||n|cosθ = |h|·1·cosθ = |h|cosθ. For example, when the surfaces coincide, the angle θ will be zero and then h·n is just equal to |h|cosθ = |h| = h·1 = h = ΔJ/Δa1. The other extreme is that of orthogonal surfaces: in that case, the angle θ will be 90° and, hence, h·n = |h||n|cos(π/2) = |h|·1·0 = 0: there is no heat flow normal to the direction of heat flow.
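Here’s that cosine factor at work numerically – a minimal sketch with made-up numbers (a heat flow of magnitude 10 along the x-axis, and a surface whose normal is tilted 60° away from the flow):

```python
import math

# Hypothetical heat flow of magnitude 10 (energy per unit area per unit time),
# flowing along the x-axis; n is the unit normal of a tilted surface.
h = (10.0, 0.0, 0.0)
theta = math.radians(60)
n = (math.cos(theta), math.sin(theta), 0.0)   # unit normal, 60° from h

dot = sum(hi * ni for hi, ni in zip(h, n))    # h·n = |h||n|cosθ ≈ 10·cos60° = 5

# Orthogonal case: the normal is at right angles to h, so no flow through it.
n_orth = (0.0, 1.0, 0.0)
dot_orth = sum(hi * ni for hi, ni in zip(h, n_orth))   # ≈ 0
```

So the flow per unit area through the tilted surface is indeed just |h|cosθ, and it vanishes when the surface is parallel to the flow.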

OK. That’s clear enough. The point to note is that the vectors h and n represent physical entities and, therefore, they do not depend on our reference frame (except for the units we use to measure things). That allows us to define vector equations.

The ∇ (del) operator and the gradient

Let’s continue our example of temperature and heat flow. In a block of material, the temperature (T) will vary in the x, y and z direction and, hence, the partial derivatives ∂T/∂x, ∂T/∂y and ∂T/∂z make sense: they measure how the temperature varies with respect to position. Now, the remarkable thing is that the 3-tuple (∂T/∂x, ∂T/∂y, ∂T/∂z) is a physical vector itself: it is independent, indeed, of the reference frame (provided we measure stuff in the same unit) – so we can do a translation and/or a rotation of the coordinate axes and we get the same value. This means this set of three numbers is a vector indeed:

(∂T/∂x, ∂T/∂y, ∂T/∂z) = a vector (1)

If you’d like to see a formal proof of this, I’ll refer you to Feynman once again – but I think the intuitive argument will do: if temperature and space are real, then the derivatives of temperature with respect to the x-, y- and z-directions should be equally real, shouldn’t they? Let’s go for the more intricate stuff now.

If we go from one point to another, in the x-, y- or z-direction, then we can define some displacement vector ΔR = (Δx, Δy, Δz), and the difference in temperature between two nearby points (ΔT) will tend to the total differential:

ΔT = (∂T/∂x)Δx + (∂T/∂y)Δy + (∂T/∂z)Δz (2)

Equations (1) and (2) combine to:

ΔT = (∂T/∂x, ∂T/∂y, ∂T/∂z)·(Δx, Δy, Δz) = ∇T·ΔR

In this equation, we used the ∇ (del) operator, i.e. the vector differential operator. It’s an operator like the differential operator ∂/∂x but, unlike the derivative, it returns a vector, referred to as the gradient: ∇T, in this case. In other words, the ∇ operator acts on a scalar (T) and yields a vector:

∇T = (∂T/∂x, ∂T/∂y, ∂T/∂z)

That’s why we write ∇T in bold-type too, just like the vector R. [As you know, using bold-type (instead of an arrow or so) is a convenient way to mark a vector.]
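The total differential relation ΔT ≈ ∇T·ΔR is easy to check numerically. A minimal sketch, with a made-up temperature field T = x² + 2y + 3z (chosen only because its gradient can be written down analytically):

```python
# Made-up temperature field T(x, y, z) = x^2 + 2y + 3z, so that
# grad T = (2x, 2, 3) analytically.
def T(x, y, z):
    return x**2 + 2*y + 3*z

p = (1.0, 1.0, 1.0)
grad_T = (2 * p[0], 2.0, 3.0)          # grad T evaluated at p

dR = (0.001, -0.002, 0.0005)           # a small displacement Delta-R
dT_exact = T(p[0] + dR[0], p[1] + dR[1], p[2] + dR[2]) - T(*p)
dT_approx = sum(g * d for g, d in zip(grad_T, dR))   # grad T dotted with Delta-R
# the two agree up to second-order terms in the displacement
```

The smaller the displacement ΔR, the better the dot product ∇T·ΔR approximates the actual temperature difference – which is exactly what the total differential says.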

If ∇T is a vector, what’s its direction? Think about it. […] The rates of change of T in the x-, y- and z-direction are the x-, y- and z-components of our ∇T vector respectively. In fact, the rate of change of T in any direction will be the component of the ∇T vector in that direction. Now, the magnitude of a vector component will always be smaller than the magnitude of the vector itself, except if it’s the component in the same direction as the vector, in which case the component is the vector. Therefore, the direction of ∇T will be the direction in which the rate of change of T is largest. In Feynman’s words: “The gradient of T has the direction of the steepest uphill slope in T.” But it is obvious what direction that is: it is the opposite direction of the heat flow h.

Now we’re ready to write our first vector equation:

h = –κ∇T

This is simple enough to understand: the direction of heat flow is opposite to the direction of ∇T (so it flows from higher to lower temperature, as we would expect, of course!), and its magnitude is proportional, with the constant of proportionality equal to κ (kappa), which is called the thermal conductivity. In case you wonder what this means (don’t get lost in the math, indeed!), remember that the heat flow is the flow of thermal energy per unit area (and per unit time, of course): |h| = h = ΔJ/Δa.
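As a minimal sketch of that vector equation, with made-up numbers (κ = 0.5 and a temperature that just drops linearly along x):

```python
# Hypothetical conductivity and temperature field: T(x, y, z) = 100 - 4x,
# so the temperature drops in the +x direction.
kappa = 0.5

def grad_T(x, y, z):
    # grad T for T = 100 - 4x: constant, pointing in -x (uphill in T is -x)
    return (-4.0, 0.0, 0.0)

gx, gy, gz = grad_T(0.0, 0.0, 0.0)
h = (-kappa * gx, -kappa * gy, -kappa * gz)   # h = -kappa * grad T
# h points in +x: heat flows from the hot side (small x) to the cold side (large x)
```

The minus sign does exactly what the prose says: ∇T points uphill (towards higher temperature), so –κ∇T points downhill, from hot to cold.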

So what’s the point? Well… We have a scalar field here, the temperature T, from which we can derive the heat flow h, i.e. a vector quantity, using this new operator ∇. This is a most remarkable result, and we’ll encounter the same equation elsewhere. For example, if the electric potential is Φ, then we can immediately calculate the electric field using the following formula:

E = –∇Φ

The situation is entirely analogous from a mathematical point of view. For example, we have the same minus sign, so E also ‘flows’ from higher to lower potential.

Note: The formula for E above is only valid in electrostatics, i.e. when there are no moving charges. When moving charges are involved, we also have the magnetic force coming into play, and then equations become more complicated.

Operations with ∇:  divergence and curl

You may think we’ve covered a lot of ground already but, in fact, we only just got started. In what I wrote above, I emphasized the physics. Let me now turn to the math involved. Let’s start by dissociating the ∇ operator from the scalar field, so we just write:

∇ = (∂/∂x, ∂/∂y, ∂/∂z)

This doesn’t mean anything, you’ll say, because the operator has nothing to operate on. And, yes, you’re right. However, in math, it doesn’t matter: we can combine this ‘meaningless’ operator with something else. We can take a scalar (dot) product, for example:

∇·(a vector)

We can do this product because ∇ has three components, so it’s a ‘vector’ too, and then we need to make sure that the vector we’re operating on also has three components. To continue with our examples, we can write

∇·h = (∂/∂x, ∂/∂y, ∂/∂z)·(hx, hy, hz) = ∂hx/∂x + ∂hy/∂y + ∂hz/∂z

What we have here, in fact, is a new operator. It’s a vector operator, because it acts on a vector, but note the operator yields a scalar as a result. [Remember that our del operator acted on a scalar to yield a vector, so it was the other way around.] It’s an important operator in physics, and so it has a name and a symbol of its own:

∇·h = div h = the divergence of h

The physical significance of the divergence is related to the so-called flux of a vector field: it measures the magnitude of a field’s source or sink at a given point. Continuing our example with temperature, consider air as it is heated or cooled. The relevant vector field is now the velocity of the moving air at a point. If air is heated in a particular region, it will expand in all directions such that the velocity field points outward from that region. Therefore the divergence of the velocity field in that region would have a positive value, as the region is a source. If the air cools and contracts, the divergence has a negative value, as the region is a sink.
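That heated-air picture is easy to check numerically. A minimal sketch, assuming a made-up, uniformly expanding flow v(r) = r (chosen because it points outward everywhere, like air expanding from a heated region): its divergence should come out positive and constant.

```python
# A uniformly expanding flow: the velocity at each point equals the position.
def v(x, y, z):
    return (x, y, z)

def divergence(f, x, y, z, eps=1e-6):
    """Central-difference estimate of div f = dfx/dx + dfy/dy + dfz/dz."""
    dfx = (f(x + eps, y, z)[0] - f(x - eps, y, z)[0]) / (2 * eps)
    dfy = (f(x, y + eps, z)[1] - f(x, y - eps, z)[1]) / (2 * eps)
    dfz = (f(x, y, z + eps)[2] - f(x, y, z - eps)[2]) / (2 * eps)
    return dfx + dfy + dfz

div_v = divergence(v, 1.0, 2.0, 3.0)   # should be about 3, at any point: a source
```

A positive value everywhere means every region is a source; flipping the field to v(r) = –r (contracting air) would flip the sign and make every region a sink.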

A less intuitive definition is the following: the divergence represents the volume density of the outward flux of a vector field from an infinitesimal volume around a given point. We’ll come back to this definition when we’re ready to define the concept of flux somewhat accurately. For now, just note two of Maxwell’s famous equations involve the divergence operator:

∇·E = ρ/ε0 and ∇·B = 0

In my previous post, I gave a verbal description of those two equations:

  1. The flux of E (through a closed surface) = (the net charge inside)/ε0
  2. The flux of B (through a closed surface) = zero

The first equation basically says that electric charges cause an electric field. The second equation basically says there is no such thing as a magnetic charge: the magnetic force only appears when charges are moving and/or when electric fields are changing.

Of course, you’ll anticipate the second new operator now, because that’s the one that appears in the other two equations in Maxwell’s set of equations. It’s the cross product:

∇×E = (∂/∂x, ∂/∂y, ∂/∂z)×(Ex, Ey, Ez) = … What?

Well… The cross product is not as straightforward to write down as the dot product. We get a vector indeed, not a scalar, and its three components are:

(∇×E)z = ∇xEy – ∇yEx = ∂Ey/∂x – ∂Ex/∂y

(∇×E)x = ∇yEz – ∇zEy = ∂Ez/∂y – ∂Ey/∂z

(∇×E)y = ∇zEx – ∇xEz = ∂Ex/∂z – ∂Ez/∂x

I know this looks pretty monstrous, but so that’s how cross products work. I gave the geometric formula for a dot product above, so I should also give you the same for a cross product:

A×B = |A||B|sin(θ)n

In this formula, we once again have θ, the angle between A and B, but note we take its sine this time. In addition, we have n: a unit vector at right angles to both A and B. It’s what makes the cross product a vector, and its direction is given by that right-hand rule which we encountered a couple of times already.
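A quick numerical check of both properties – the |A||B|sinθ magnitude and the orthogonality of the result – using the component formulas above and made-up vectors:

```python
import math

def cross(a, b):
    """Component formula for a x b, same pattern as the curl components above."""
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

a = (2.0, 0.0, 0.0)
b = (1.0, 1.0, 0.0)           # 45 degrees from a, magnitude sqrt(2)
c = cross(a, b)               # points along +z, as the right-hand rule says

# |a x b| should equal |a||b|sin(theta) = 2 * sqrt(2) * sin(45 deg) = 2
mag = math.sqrt(dot(c, c))
# and c should be orthogonal to both a and b
```

So the cross product does yield a new vector, at right angles to the plane of a and b, whose length encodes the sine of the angle between them.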

Just like we did when using the del operator in a dot product, we also have a special name and symbol for using the del operator in a cross product:

∇×h = curl h = the curl of h

The curl is, just like the divergence, a vector operator, because it acts on a vector, but its result is a vector too, not a scalar. What’s the geometric interpretation of the curl? Well… It’s a bit hard to describe, but let’s try. The curl describes the rotation of a vector field, so the length and direction of the curl vector characterize the rotation at that point:

  1. The direction of the curl is the axis of rotation, as determined by the right-hand rule. [By now, you know that, in physics, there’s not much use for a left hand. :-)]
  2. The magnitude of the curl is the magnitude of rotation.
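Here’s that interpretation in numbers: a made-up, rigidly rotating flow v = (–ωy, ωx, 0) spins about the z-axis, and its curl should come out along +z, with the same magnitude 2ω at every point.

```python
# Rigid rotation about the z-axis, with a made-up angular velocity.
omega = 1.5

def v(x, y, z):
    return (-omega * y, omega * x, 0.0)

def curl(f, x, y, z, eps=1e-6):
    """Central-difference estimate of curl f, component by component."""
    def d(i, j):
        # partial of component i with respect to coordinate j
        p = [x, y, z]; m = [x, y, z]
        p[j] += eps; m[j] -= eps
        return (f(*p)[i] - f(*m)[i]) / (2 * eps)
    return (d(2, 1) - d(1, 2),   # dfz/dy - dfy/dz
            d(0, 2) - d(2, 0),   # dfx/dz - dfz/dx
            d(1, 0) - d(0, 1))   # dfy/dx - dfx/dy

c = curl(v, 0.3, -0.7, 2.0)      # about (0, 0, 2*omega) = (0, 0, 3), everywhere
```

The direction of the curl (here +z) is indeed the axis of rotation by the right-hand rule, and its magnitude (2ω) measures how fast the field swirls.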

I know. This is pretty abstract, and I’ll probably have to come back to it in another post. As for now, just note we defined three new operators in this ‘introduction’ to vector analysis:

  1. ∇T = grad T = a vector
  2. ∇·h = div h = a scalar
  3. ∇×h = curl h = a vector

That’s all. It’s all we need to understand Maxwell’s famous equations:

[Image: Maxwell’s four equations in mathematical notation]

Huh? Hmm… You’re right: understanding the symbols, to some extent, doesn’t mean we ‘understand’ these equations. What does it mean to ‘understand’ an equation? Let me quote Feynman on that: “What it means really to understand an equation—that is, in more than a strictly mathematical sense—was described by Dirac. He said: “I understand what an equation means if I have a way of figuring out the characteristics of its solution without actually solving it.” So if we have a way of knowing what should happen in given circumstances without actually solving the equations, then we “understand” the equations, as applied to these circumstances.”

Well… It’s obvious this short post isn’t enough to reach such understanding. For that, we’ll need much more. But, for the moment, I’ll leave it at this. :-)

Back to tedious stuff: an introduction to electromagnetism

It seems I skipped too many chapters in Feynman’s second volume of Lectures (on electromagnetism) and so I have to return to that before getting back to quantum physics. So let me just do that in the next couple of posts. I’ll have to start with the basics: Maxwell’s equations.

Indeed, electromagnetic phenomena are described by a set of four equations known as Maxwell’s equations. They relate two fields: the electric field (E) and the magnetic field (B). The electric field appears when we have electric charges: positive (e.g. protons or positively charged ions) or negative (e.g. electrons or negatively charged ions). That’s obvious.

In contrast, there is no such thing as ‘magnetic charges’. The magnetic field appears only when the electric field changes, or when charges move. In turn, the change in the magnetic field causes an electric field, and that’s how electromagnetic radiation basically works: a changing electric field causes a magnetic field, and the build-up of that magnetic field (so that’s a changing magnetic field) causes a build-up of an electric field, and so on and so on.

OK. That’s obvious too. But how does it work exactly? Before explaining this, I need to point out some more ‘obvious’ things:

1. From Maxwell’s equations, we can calculate the magnitude of E and B. Indeed, a specific functional form for E and B is what we get when we solve Maxwell’s set of equations, and we’ll jot down that solution in a moment–even if I am afraid you will shake your head when you see it. The point to note is that what we get as a solution for E and B is a solution in a particular frame of reference only: if we switch to another reference frame, E and B will look different.

Huh? Yes. According to the principle of relativity, we cannot say which charges are ‘stationary’ and which charges are ‘moving’ in any absolute sense: it all depends on our frame of reference.

But… Yes? Then if we put an electric charge in these fields, the force on it will also be different?

Yes. Forces also look different when moving from one reference to another.

But… Yes? The physical effect surely has to be the same, regardless of the reference frame?

Yes. The point is that, if we look at an electric charge q moving along a current-carrying wire, in a coordinate system at rest with respect to the wire, with the same velocity (v0) as the conduction electrons (v), then the whole force on the electric charge will be ‘magnetic’: F = qv0×B, with E = 0. Now, if we’re looking at the same situation from a frame of reference that is moving with q, then our charge is at rest, and so there can be no magnetic force on it. Hence, the force on it must come from an electric field! But what produces the electric field? Our current-carrying wire is supposed to be neutral!

Well… It turns out that our ‘neutral’ wire appears to be charged when moving. We’ll explain – in very much detail – why this is so later. Now, you should just note that “we should not attach too much reality to E and B, because they appear in different ‘mixtures’ in different coordinate systems”, as Feynman puts it. In fact, you may or may not have heard that magnetism is actually nothing but a “relativistic effect” of electricity. Well… That’s true, but we’ll also explain how that works later only. Let’s not jump the gun.

2. The remark above is related to the other ‘obvious’ thing I wanted to say before presenting Maxwell’s equations: fields are very useful to describe what’s going on but, when everything is said and done, what we really want to know is what force will be acting on a charge, because that’s what’s going to tell us how that charge is going to move. In other words, we want to find the equations of motion, and the force determines how the charge’s momentum will change: F = dp/dt = d(mv)/dt (i.e. Newton’s equation of motion).

So how does that work? We’ve given the formula before:

F = q(E + v×B) = qE + q(v×B)

This is a sum of two vectors:

  1. qE is the ‘electric’ force: that force is in the same direction as the electric field, but with a magnitude equal to q times E. [Note I use a bold letter (E) for a vector (which we may define as some quantity with a direction) and a non-bold letter (E) for its magnitude.]
  2. q(v×B) is the ‘magnetic’ force: that force depends on both v as well as on B. Its direction is given by the so-called right-hand rule for a vector cross-product (as opposed to a dot product, which is denoted by a dot (·) and which yields a scalar instead of a new vector).

That right-hand rule is illustrated below. Note that, if we switch a and b, the b×a vector will point downwards. The magnitude of q(v×B) is given by |v×B| = |v||B|sinθ (with θ the angle between v and B).    
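A minimal numeric sketch of F = q(E + v×B), with made-up values (q = 2, E along x, v along x, B along z): the magnetic part should come out along –y, exactly as the right-hand rule says for x×z.

```python
# Hypothetical values, in arbitrary consistent units.
q = 2.0
E = (3.0, 0.0, 0.0)   # electric field along x
v = (1.0, 0.0, 0.0)   # charge moving along x
B = (0.0, 0.0, 4.0)   # magnetic field along z

def cross(a, b):
    return (a[1]*b[2] - a[2]*b[1],
            a[2]*b[0] - a[0]*b[2],
            a[0]*b[1] - a[1]*b[0])

vxB = cross(v, B)                                 # x cross z points along -y
F = tuple(q * (e + c) for e, c in zip(E, vxB))    # F = q(E + v x B)
```

So the electric part of the force follows E directly, while the magnetic part is at right angles to both the velocity and the field – which is why a magnetic field deflects a moving charge without doing any work on it.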


We know the direction of v (because we’re talking about some charge that is moving here) but what direction is B? It’s time to be a bit more systematic now.

Flux and circulation

In order to understand Maxwell’s equations, one needs to understand two concepts related to a vector field: flux and circulation. The two concepts are best illustrated referring to a vector field describing the flow of a liquid:

1. If we have a surface, the flux will give us the net amount of fluid going out through the surface per unit time. The illustration below (which I took from Feynman’s Lectures) gives us not only the general idea but a formal definition as well:


2. The concept of circulation is linked to the idea of some net rotational motion around some loop. In fact, that’s exactly what it describes. I’ll again use Feynman’s illustration (and description) because I couldn’t find anything better.

[Figures (a), (b) and (c): the velocity field in a liquid, a tube following an arbitrary closed curve, and the resulting circulation in the tube (from Feynman’s Lectures)]

Diagram (a) gives us the velocity field in the liquid. Now, imagine a tube (of uniform cross section) that follows some arbitrary closed curve, like in (b), and then imagine we’d suddenly freeze the liquid everywhere except inside the tube: the liquid in the tube would circulate as shown in (c). Formally, the circulation is defined as:

circulation = (the average tangential component)·(the distance around)
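Here’s that definition checked numerically for a made-up rotating flow v = (–y, x): on the unit circle, the flow is everywhere tangential with speed 1, so the circulation should be (average tangential component)·(distance around) = 1·2π.

```python
import math

# A rotating flow in the plane (made-up, for illustration).
def v(x, y):
    return (-y, x)

# Approximate the line integral of v around the unit circle with small steps.
n = 10000
circ = 0.0
for k in range(n):
    t = 2 * math.pi * k / n
    x, y = math.cos(t), math.sin(t)
    # small step along the circle (tangent direction times step length 2*pi/n)
    dx = -math.sin(t) * 2 * math.pi / n
    dy = math.cos(t) * 2 * math.pi / n
    vx, vy = v(x, y)
    circ += vx * dx + vy * dy

# circ should be close to 2*pi
```

Note that this is the same field whose curl we computed earlier (with ω = 1): the circulation around a loop and the curl inside it are two views of the same rotational motion, which is exactly the connection Maxwell’s equations exploit.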

OK. So far, so good. Back to electromagnetism.

E and B

We’re familiar with the electric field E from our high school physics course. Indeed, you’ll probably recognize the two examples below: (a) a (positive) charge near a (neutral) conducting sheet, and (b) two opposite charges next to each other. Note the convention: the field lines emanate from the positive charge. Does that mean that the force is in that direction too? Yes. But remember: if a particle is attracted to another, the latter particle is attracted to the former too! So there’s a force in both directions!


What more can we say about this? Well… It is clear that the field E is directed radially. In terms of our flux and circulation concepts, we say that there’s an outgoing flux from the (positive) point charge. Furthermore, it would seem to be pretty obvious (we’d need to show why, but we won’t do that here: just look at Coulomb’s Law once again) that the flux should be proportional to the charge, and it is: if we double the charge, the flux doubles too. That gives us Maxwell’s first equation:

flux of E through a closed surface = (the net charge inside)/ε0

Note we’re talking a closed surface here, like a sphere for example–but it does not have to be a nice symmetric shape: Maxwell’s first equation is valid for any closed surface. The expression above is Coulomb’s Law, which you’ll also surely remember from your high school physics course: while it looks very different, it’s the same. It’s just because we’re using that flux concept here that we seem to be getting an entirely different expression. But so we’re not: it’s the same as Coulomb’s Law.

As for the ε0 factor, that’s just a constant that depends on the units we’re using to measure what we write above, so don’t worry about it. [I am noting it here because you’ll see it pop up later too.]
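As a quick numerical sketch of that first equation, with made-up values (q = 5 and units in which ε0 = 1): for a point charge, the Coulomb field is radial, so E·n = |E| everywhere on a sphere around the charge, and the flux is just |E| times the sphere’s area – which works out to q/ε0, whatever radius we choose.

```python
import math

# Hypothetical point charge q at the origin, sphere of radius R around it,
# in units where eps0 = 1.
q, eps0, R = 5.0, 1.0, 2.0

# Coulomb field magnitude at distance R
E = q / (4 * math.pi * eps0 * R**2)

# E is radial, so E . n = |E| on the whole sphere; flux = |E| * area
flux = E * 4 * math.pi * R**2

# flux should equal q / eps0, independently of R
```

Changing R changes |E| by 1/R² and the area by R², so the two effects cancel: that cancellation is why the flux depends only on the charge inside, which is the whole content of the first equation.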

For B, we’ve got a similar-looking law:

flux of B through a closed surface = 0 (= zero = nil)

That’s not the same, you’ll say. Well… Yes and no. It’s the same really, but the zero on the right-hand side of the expression above says there’s no such thing as a ‘magnetic’ charge.

Hmm… But… If we can’t create any flux of B, because ‘magnetic charges’ don’t exist, how do we get magnetic fields then?

Well… We wrote that above already, and you should remember it from your high school physics course as well: a magnetic field is created by (1) a moving charge (i.e. a flow or flux of electric current) or (2) a changing electric field.

Situation (1) is illustrated below: the current in the wire creates some circulation of B around the wire. How much? Not much: the magnetic effect is very small as compared to the electric effect (that has to do with magnetism being a relativistic effect of electricity but, as mentioned above, I’ll explain that later only). To be precise, the equation is the following:

c2(circulation of B) = (flux of electric current)/ε0

[Image: the magnetic field circulating around a current-carrying wire]

That c2 factor on the left-hand side becomes 1/c2 if we move it to the other side and, yes, c is the speed of light here – so you can see we’re talking a very small amount of circulation only indeed! [As for the ε0 factor, that’s just the same constant: it’s got to do with the units we’re using to measure stuff.]

One last point perhaps: what’s the direction of the circulation? Well… There’s a so-called right-hand grip rule for that, which is illustrated below.


OK. Enough about this. Let’s go to situation (2): a changing electric field. That effect is usually illustrated with Faraday’s original 1831 experiment, which is shown below with a more modern voltmeter :-) : when the wire on one side of the iron ring is connected to the battery, we’ll see a transient current on the other side. It’s transient only, so the current quickly disappears. That’s why transformers don’t work with DC. In fact, it is said that Faraday was quite disappointed to see that the current didn’t last! Likewise, when the wire is disconnected, we’ll briefly see another transient current.


So this effect is due to the changing electric field, which causes a changing magnetic field. But so where is that magnetic field? We’re talking currents here, aren’t we? Yes, you’re right. To understand why we have a transient current in the voltmeter, you need to understand yet another effect: a changing magnetic field causes an electric field, and so that’s what actually generates the transient current. However, what’s going on in the iron ring is the magnetic effect, and so that’s caused by the changing electric field as we connect/disconnect the battery to the wire. Capito?

I guess so… So what’s the equation that captures this situation, i.e. situation (2)? That equation involves both flux and circulation, so we’ll have a surface (S) as well as a curve (C). The equation is the following one: for any surface S (not closed this time because, if the surface was closed, it wouldn’t have an edge!), we have:

c2(circulation of B around C) = d(flux of E through S)/dt

I mentioned above that the reverse is also true. A changing magnetic field causes an electric field, and the equation for that looks very similar, except that we don’t have the c2 factor:

circulation of E around C = –d(flux of B through S)/dt

Let me quickly mention the presence or absence of that c2 or 1/c2 factor in the previous equations once again. It is interesting. It’s got nothing to do with the units. It’s really a proportionality factor: any change in E will only cause a little change in B (because of the 1/c2 factor in the first equation), but the reverse is not true: there’s no such factor in the second equation. Again, it’s got to do with magnetism being a relativistic effect of electricity, so the magnetic effect is, in most cases, tiny as compared to the electric effect, except when we’re talking charges that are moving at relativistic speeds (i.e. speeds close to c). As said, we’ll come back to that–later, much later. Let’s get back to Maxwell’s equations first.

Maxwell’s equations

We can now combine all of the equations above in one set, and so these are Maxwell’s four famous equations:

  1. The flux of E through a closed surface = (the net charge inside)/ε0
  2. The circulation of E around C = –d(flux of B through S)/dt (with C the curve or edge around S)
  3. The flux of B through a closed surface = 0
  4. c2(circulation of B around C) = d(flux of E through S)/dt + (flux of electric current through S)/ε0

From a mathematical point of view, this is a set of differential equations, and they are not easy to grasp intuitively. As Feynman puts it: “The laws of Newton were very simple to write down, but they had a lot of complicated consequences and it took us a long time to learn about them all. These laws are not nearly as simple to write down, which means that the consequences are going to be more elaborate and it will take us quite a lot of time to figure them all out.”

Indeed, Feynman needs about twenty (!) Lectures in that second Volume to show what it all implies, as he walks us through electrostatics, magnetostatics and various other ‘special’ cases before giving us the ‘complete’ or ‘general’ solution to the equations. This ‘general’ solution, in mathematical notation, is the following:

[Illustration: Maxwell’s four equations in differential notation, together with their general solution in terms of the scalar potential Φ and the vector potential A]

Huh? What’s that? Well… The four equations are the equations we explained already, but this time in mathematical notation: flux and circulation can be expressed much more elegantly using the differential operator ∇ (del or nabla) indeed. As for the solutions to Maxwell’s set of equations, you can see they are expressed using two other concepts: the scalar potential Φ and the vector potential A.
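Word-equations like the four above are hard to check mentally, so here’s a small numerical sanity check – not from the Lectures, just a sketch of my own, in units where c = 1 (so ω = k) and with an assumed plane wave traveling along x – that the two curl equations hold for light in free space:

```python
import math

# Assumed plane wave in vacuum, units where c = 1 (so the dispersion relation is ω = k):
# E points along y, B along z, and both depend on x and t only.
E0, k = 1.0, 2.0
omega = k

def Ey(x, t): return E0 * math.cos(k * x - omega * t)
def Bz(x, t): return E0 * math.cos(k * x - omega * t)   # |B| = |E|/c = |E| here

def d_dx(f, x, t, h=1e-6): return (f(x + h, t) - f(x - h, t)) / (2 * h)
def d_dt(f, x, t, h=1e-6): return (f(x, t + h) - f(x, t - h)) / (2 * h)

x, t = 0.3, 0.1
# Equation 2 (Faraday): the only non-zero component of curl E is dEy/dx,
# and it must equal -dBz/dt.
assert abs(d_dx(Ey, x, t) + d_dt(Bz, x, t)) < 1e-6
# Equation 4 in vacuum (no currents): c²(curl B)_y = -dBz/dx must equal dEy/dt.
assert abs(-d_dx(Bz, x, t) - d_dt(Ey, x, t)) < 1e-6
print("the plane wave satisfies both curl equations")
```

The two asserts pass only because ω = k (i.e. ω = ck with c = 1): the plane wave solves Maxwell’s equations precisely when it travels at the speed of light.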

Now, it is not my intention to summarize two dozen of Feynman’s Lectures in just a few lines, so I’ll have to leave you here for the moment.


Huh? What? What about my promise to show that magnetism is a relativistic effect of electricity indeed?

Well… I wanted to do that just now, but when I look at it, I realize that I’d end up copying most of Feynman’s little exposé on it and, hence, I’ll just refer you to that particular section. It’s really quite exciting but – as you might expect – it does take a bit of time to wrestle through it.

That being said, it really does give you a kind of an Aha-Erlebnis and, therefore, I really warmly recommend it ! Just click on the link ! :-)

Amplitudes and statistics

When re-reading Feynman’s ‘explanation’ of Bose-Einstein versus Fermi-Dirac statistics (Lectures, Vol. III, Chapter 4), and my own March 2014 post summarizing his argument, I suddenly felt his approach raises as many questions as it answers. So I thought it would be good to re-visit it, which is what I’ll do here. Before you continue reading, however, I should warn you: I am not sure I’ll manage to do a better job now, as compared to a few months ago. But let me give it a try.

Setting up the experiment

The (thought) experiment is simple enough: what’s being analyzed is the (theoretical) behavior of two particles, referred to as particle a and particle b respectively, that are being scattered into two detectors, referred to as 1 and 2. That can happen in two ways, as depicted below: situation (a) and situation (b). [And, yes, it’s a bit confusing to use the same letters a and b here, but just note the brackets and you’ll be fine.] It’s an elastic scattering and it’s seen in the center-of-mass reference frame in order to ensure we can analyze it using just one variable, θ, for the angle of incidence. So there is no interaction between those two particles in a quantum-mechanical sense: there is no exchange of spin (spin flipping) nor is there any exchange of energy, like in Compton scattering, in which a photon gives some of its energy to an electron, resulting in a Compton shift (i.e. the wavelength of the scattered photon is different from that of the incoming photon). No, it’s just what it is: two particles deflecting each other. […] Well… Maybe. Let’s fully develop the argument to see what’s going on.

[Illustrations: situation (a) and situation (b) of the scattering experiment]

First, the analysis is done for two non-identical particles, say an alpha particle (i.e. a helium nucleus) and then some other nucleus (e.g. oxygen, carbon, beryllium,…). Because of the elasticity of the ‘collision’, the possible outcomes of the experiment are binary: if particle a gets into detector 1, it means particle b will be picked up by detector 2, and vice versa. The first situation (particle a gets into detector 1 and particle b goes into detector 2) is depicted in (a), i.e. the illustration on the left above, while the opposite situation, exchanging the role of the particles, is depicted in (b), i.e. the illustration on the right-hand side. So these two ‘ways’ are two different possibilities which are distinguishable not only in principle but also in practice, for non-identical particles that is (just imagine a detector which can distinguish helium from oxygen, or whatever other substance the other particle is). Therefore, strictly following the rules of quantum mechanics, we should add the probabilities of both events to arrive at the total probability of some particle (and with ‘some’, I mean particle a or particle b) ending up in some detector (again, with ‘some’ detector, I mean detector 1 or detector 2).

Now, this is where Feynman’s explanation becomes somewhat tricky. The whole event (i.e. some particle ending up in some detector) is being reduced to two mutually exclusive possibilities that are both being described by the same (complex-valued) wave function f, which has that angle of incidence as its argument. To be precise: the angle of incidence is θ for the first possibility and it’s π–θ for the second possibility. That being said, it is obvious, even if Feynman doesn’t mention it, that both possibilities actually represent a combination of two separate things themselves:

  1. For situation (a), we have particle a going to detector 1 and particle b going to detector 2. Using Dirac’s so-called bra-ket notation, we should write 〈1|a〉〈2|b〉 = f(θ), with f(θ) a probability amplitude, which should yield a probability when taking its absolute square: P(θ) = |f(θ)|².
  2. For situation (b), we have particle b going to detector 1 and particle a going to 2, so we have 〈1|b〉〈2|a〉, which Feynman equates with f(π–θ), so we write 〈1|b〉〈2|a〉 = 〈2|a〉〈1|b〉 = f(π–θ).

Now, Feynman doesn’t dwell on this–not at all, really–but this casual assumption–i.e. the assumption that situation (b) can be represented by using the same wave function f–merits some more reflection. As said, Feynman is very brief on it: he just says situation (b) is the same situation as (a), but then detector 1 and detector 2 being switched (so we exchange the role of the detectors, I’d say). Hence, the relevant angle is π–θ and, of course, it’s a center-of-mass view again so if a goes to 2, then b has to go to 1. There’s no Third Way here. In short, a priori it would seem to be very obvious indeed to associate only one wave function (i.e. that (complex-valued) f(θ) function) with the two possibilities: that wave function f yields a probability amplitude for θ and, hence, it should also yield some (other) probability amplitude for π–θ, i.e. for the ‘other’ angle. So we have two probability amplitudes but one wave function only.

You’ll say: Of course! What’s the problem? Why are you being fussy? Well… I think these assumptions about f(θ) and f(π–θ) representing the underlying probability amplitudes are all nice and fine (and, yes, they are very reasonable indeed), but I also think we should take them for what they are at this moment: assumptions.

Huh? Yes. At this point, I would like to draw your attention to the fact that the only thing we can measure are real-valued probabilities. Indeed, when we do this experiment like a zillion times, it will give us some real number P for the probability that a goes to 1 and b goes to 2 (let me denote this number as P(θ) = Pa→1 and b→2), and then, when we change the angle of incidence by switching detector 1 and 2, it will also give us some (other) real number for the probability that a goes to 2 and b goes to 1 (i.e. a number which we can denote as P(π–θ) = Pa→2 and b→1). Now, while it would seem to be very reasonable that the underlying probability amplitudes are the same, we should be honest with ourselves and admit that the probability amplitudes are something we cannot directly measure.

At this point, let me quickly say something about Dirac’s bra-ket notation, just in case you haven’t heard about it yet. As Feynman notes, we have to get away from thinking too much in terms of wave functions traveling through space because, in quantum mechanics, all sorts of stuff can happen (e.g. spin flipping) and not all of it can be analyzed in terms of interfering probability amplitudes. Hence, it’s often more useful to think in terms of a system being in some state and then transitioning to some other state, and that’s why that bra-ket notation is so helpful. We have to read these bra-kets from right to left: the part on the right, e.g. |a〉, is the ket and, in this case, that ket just says that we’re looking at some particle referred to as particle a, while the part on the left, i.e. 〈1|, is the bra, i.e. a shorthand for particle a having arrived at detector 1. If we’d want to be complete, we should write:

〈1|a〉 = 〈particle a arrives at detector 1|particle a leaves its source〉

Note that 〈1|a〉 is some complex-valued number (i.e. a probability amplitude) and so we multiply it here with some other complex number, 〈2|b〉, because it’s two things happening together. As said, don’t worry too much about it. Strictly speaking, we don’t need wave functions and/or probability amplitudes to analyze this situation because there is no interaction in the quantum-mechanical sense: we’ve got a scattering process indeed (implying some randomness in where those particles end up, as opposed to what we’d have in a classical analysis of two billiard balls colliding), but we do not have any interference between wave functions (probability amplitudes) here. We’re just introducing the wave function f because we want to illustrate the difference between this situation (i.e. the scattering of non-identical particles) and what we’d have if we’d be looking at identical particles being scattered.

At this point, I should also note that this bra-ket notation is more in line with Feynman’s own so-called path integral formulation of quantum mechanics, which is actually implicit in his line of argument: rather than thinking about the wave function as representing the (complex) amplitude of some particle to be at point x in space at point t in time, we think about the amplitude as something that’s associated with a path, i.e. one of the possible itineraries from the source (its origin) to the detector (its destination). That explains why this f(θ) function doesn’t mention the position (x) and time (t) variables. What x and t variables would we use anyway? Well… I don’t know. It’s true the position of the detectors is fully determined by θ, so we don’t need to associate any x or t with them. Hence, if we’d be thinking about the space-time variables, then we should be talking about the position in space and time of both particle a and particle b. Indeed, it’s easy to see that only a slight change in the horizontal (x) or vertical (y) position of either particle would ensure that both particles do not end up in the detectors. However, as mentioned above, Feynman doesn’t even mention this. Hence, we must assume that any randomness in any x or t variable is captured by that wave function f, which explains why this is actually not a classical analysis: so, in short, we do not have two billiard balls colliding here.

Hmm… You’ll say I am a nitpicker. You’ll say that, of course, any uncertainty is indeed being incorporated in the fact that we represent what’s going on by a wave function f which we cannot observe directly but whose absolute square represents a probability (or, to use precise statistical terminology, a probability density), which we can measure: P = |f(θ)|² = f(θ)·f*(θ), with f* the complex conjugate of the complex number f. So… […] What? Well… Nothing. You’re right. This thought experiment describes a classical situation (like two billiard balls colliding) and then it doesn’t, because we cannot predict the outcome (i.e. we can’t say where the two billiard balls are going to end up: we can only describe the likely outcome in terms of probabilities Pa→1 and b→2 = |f(θ)|² and Pa→2 and b→1 = |f(π–θ)|²). Of course, needless to say, the normalization condition should apply: if we add all probabilities over all angles, then we should get 1, so we can write: ∫|f(θ)|²dθ = ∫f(θ)·f*(θ)dθ = 1. So that’s it, then?

No. Let this sink in for a while. I’ll come back to it. Let me first make a bit of a detour to illustrate what this thought experiment is supposed to yield, and that’s a more intuitive explanation of Bose-Einstein statistics and Fermi-Dirac statistics, which we’ll get out of the experiment above if we repeat it using identical particles. So we’ll introduce the terms Bose-Einstein statistics and Fermi-Dirac statistics. Hence, there should also be some term for the reference situation described above, i.e. a situation in which non-identical particles are ‘interacting’, so to say, but then with no interference between their wave functions. So, when everything is said and done, it’s a term we should associate with classical mechanics. It’s called Maxwell-Boltzmann statistics.

Huh? Why would we need ‘statistics’ here? Well… We can imagine many particles engaging like this – just colliding elastically and, thereby, interacting in a classical sense, even if we don’t know where exactly they’re going to end up, because of uncertainties in initial positions and what have you. In fact, you already know what this is about: it’s the behavior of particles as described by the kinetic theory of gases (often referred to as statistical mechanics) which, among other things, yields a very elegant function for the distribution of the velocities of gas molecules, as shown below for various gases (helium, neon, argon and xenon) at one specific temperature (25°C), i.e. the graph on the left-hand side, or for the same gas (oxygen) at different temperatures (–100°C, 20°C and 600°C), i.e. the graph on the right-hand side.

Now, all these density functions and what have you are, indeed, referred to as Maxwell-Boltzmann statistics, by physicists and mathematicians that is (you know they always need some special term in order to make sure other people (i.e. people like you and me, I guess) have trouble understanding them).

[Graphs: Maxwell-Boltzmann speed distributions for several gases at 25°C (left) and for oxygen at three temperatures (right)]

In fact, we get the same density function for other properties of the molecules, such as their momentum and their total energy. It’s worth elaborating on this, I think, because I’ll later compare with Bose-Einstein and Fermi-Dirac statistics.

Maxwell-Boltzmann statistics

Kinetic gas theory yields a very simple and beautiful theorem. It’s the following: in a gas that’s in thermal equilibrium (or just in equilibrium, if you want), the probability (P) of finding a molecule with energy E is proportional to e^(–E/kT), so we have:

P ∝ e^(–E/kT)

Now that’s a simple function, you may think. If we treat E as just a continuous variable, and T as some constant indeed – hence, if we just treat (the probability) P as a function of (the energy) E – then we get a function like the one below (with the blue, red and green curves using three different values for T).


So how do we relate that to the nice bell-shaped curves above? The very simple graphs above seem to indicate the probability is greatest for E = 0, and then just goes down, instead of going up initially to reach some maximum around some average value and then drop down again. Well… The fallacy here, of course, is that the constant of proportionality is itself dependent on the temperature. To be precise, the probability density function for velocities is given by:

[Formula: the Maxwell-Boltzmann probability density function for velocities]

The function for energy is similar. To be precise, we have the following function:

[Formula: the Maxwell-Boltzmann probability density function for energy]

This (and the velocity function too) is a so-called chi-squared distribution, and ϵ is the energy per degree of freedom in the system. Now these functions will give you such nice bell-shaped curves, and so all is alright. In any case, don’t worry too much about it. I have to get back to that story of the two particles and the two detectors.
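To make that point about the temperature-dependent proportionality factor concrete, here’s a small sketch – not from the Lectures; the O2 molecular mass and the integration grid are my own assumptions – that evaluates the Maxwell-Boltzmann speed density at the three temperatures from the oxygen graph and checks that it integrates to 1:

```python
import math

k_B = 1.380649e-23   # Boltzmann constant, J/K

def mb_speed_pdf(v, m, T):
    """Maxwell-Boltzmann speed density: a T-dependent normalization factor
    sits in front of the e^(-E/kT) factor (with E = m*v**2/2)."""
    a = m / (2 * k_B * T)
    return 4 * math.pi * (a / math.pi) ** 1.5 * v * v * math.exp(-a * v * v)

m_O2 = 5.31e-26      # mass of an O2 molecule in kg (approximate, assumed)
for T in (173.0, 293.0, 873.0):           # roughly -100°C, 20°C and 600°C
    v_p = math.sqrt(2 * k_B * T / m_O2)   # most probable speed: the peak of the curve
    dv = 0.5
    total = sum(mb_speed_pdf(i * dv, m_O2, T) * dv for i in range(1, 20001))
    print(f"T = {T:4.0f} K: peak at about {v_p:4.0f} m/s, integral of P(v)dv = {total:.4f}")
```

The integral stays (very close to) 1 at every temperature, while the peak moves to higher speeds as T goes up: that’s exactly why the bare e^(–E/kT) graphs and the bell-shaped curves can both be right.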

However, before I do so, let me jot down two (or three) more formulas. The first one is the formula for the expected number 〈Ni〉 of particles occupying energy level εi (and the brackets here, 〈Ni〉, have nothing to do with the bra-ket notation mentioned above: it’s just a general notation for some expected value):

[Formula: the expected number of particles 〈Ni〉 per energy level under Maxwell-Boltzmann statistics]

This formula has the same shape as the ones above but we brought the exponential function down, into the denominator, so the minus sign disappears. And then we also simplified it by introducing that gi factor, which I won’t explain here, because the only reason why I wanted to jot this down is to allow you to compare this formula with the equivalent formula when (a) Fermi-Dirac and (b) Bose-Einstein statistics apply:

[Formulas: the expected number of particles per energy level under Bose-Einstein and Fermi-Dirac statistics]

Do you see the difference? The only change in the formula is the ±1 term in the denominator: we have a plus one (+1) for Fermi-Dirac statistics and a minus one (–1) for Bose-Einstein statistics indeed. That’s all. That’s the difference with Maxwell-Boltzmann statistics.

Huh? Yes. Think about it, but don’t worry too much. Just make a mental note of it, as it will be handy when you’d be exploring related articles. [And, of course, please don’t think I am trivializing the difference between Maxwell-Boltzmann, Bose-Einstein and Fermi-Dirac statistics here: that ±1 term in the denominator is, obviously, a very important difference, as evidenced by the consequences of formulas like the one above: just think about the crowding-in effect in lasers as opposed to the Pauli exclusion principle, for example. :-)]
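Not in the original, but the ±1 difference is easy to play with numerically. A sketch – with the gi degeneracy factor set to 1 and energies measured as a reduced variable x = (ε – μ)/kT, my own simplification – comparing the three occupancy formulas:

```python
import math

def mean_occupancy(x, stats):
    """Expected number of particles per state at reduced energy x = (eps - mu)/kT."""
    if stats == 'MB':   # Maxwell-Boltzmann: plain exponential
        return math.exp(-x)
    if stats == 'BE':   # Bose-Einstein: -1 in the denominator
        return 1.0 / (math.exp(x) - 1.0)
    if stats == 'FD':   # Fermi-Dirac: +1 in the denominator
        return 1.0 / (math.exp(x) + 1.0)
    raise ValueError(stats)

# At high reduced energy the ±1 hardly matters: all three statistics agree
# (that's the classical limit)...
for x in (5.0, 10.0):
    mb, be, fd = (mean_occupancy(x, s) for s in ('MB', 'BE', 'FD'))
    print(f"x = {x:4.1f}: MB = {mb:.3e}, BE = {be:.3e}, FD = {fd:.3e}")

# ...but at low reduced energy bosons crowd in (BE > MB) while fermions
# stay out of each other's way (FD is always below 1, cf. Pauli exclusion).
print(mean_occupancy(0.1, 'BE'), mean_occupancy(0.1, 'MB'), mean_occupancy(0.1, 'FD'))
```

So the lasers-versus-Pauli dichotomy is already visible in that one little sign: the Bose-Einstein occupancy blows up as x → 0, while the Fermi-Dirac occupancy can never exceed one particle per state.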

Setting up the experiment (continued)

Let’s get back to our experiment. As mentioned above, we don’t really need probability amplitudes in the classical world: ordinary probabilities, taking into account uncertainties about initial conditions only, will do. Indeed, there’s a limit to the precision with which we can measure the position in space and time of any particle in the classical world as well and, hence, we’d expect some randomness (as captured in the scattering phenomenon) but, as mentioned above, ordinary probabilities would do to capture that. Nevertheless, we did associate probability amplitudes with the events described above in order to illustrate the difference with the quantum-mechanical world. More specifically, we distinguished:

  1. Situation (a): particle a goes to detector 1 and b goes to 2, versus
  2. Situation (b): particle a goes to 2 and b goes to 1.

In our bra-ket notation:

  1. 〈1|a〉〈2|b〉 = f(θ), and
  2. 〈1|b〉〈2|a〉 = f(π–θ).

The f(θ) function is a quantum-mechanical wave function. As mentioned above, while we’d expect to see some space (x) and time (t) variables in it, these are, apparently, already captured by the θ variable. What about f(π–θ)? Well… As mentioned above also, that’s just the same function as f(θ) but using the angle π–θ as the argument. So, the following remark is probably too trivial to note but let me do it anyway (to make sure you understand what we’re modeling here really): while it’s the same function f, the values f(θ) and f(π–θ) are, of course, not necessarily equal and, hence, the corresponding probabilities are also not necessarily the same. Indeed, some angles of scattering may be more likely than others. However, note that we assume that the function f itself is exactly the same for the two situations (a) and (b), as evidenced by that normalization condition we assume to be respected: if we add all probabilities over all angles, then we should get 1, so ∫|f(θ)|²dθ = ∫f(θ)·f*(θ)dθ = 1.
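Since we can’t observe f directly, a tiny numerical check may help make the P = f·f* business concrete. The functional form below is made up purely for illustration (we haven’t chosen an actual f at this point); the point is only that the complex phase drops out of the probability and that the probabilities integrate to 1:

```python
import cmath, math

def f(theta):
    # A made-up complex amplitude, normalized over (-pi, pi]; its phase factor
    # e^(i*theta/3) is arbitrary and drops out of the probability below.
    return cmath.exp(1j * theta / 3) * math.cos(theta / 2) / math.sqrt(math.pi)

def P(theta):
    return (f(theta) * f(theta).conjugate()).real   # = |f(theta)|**2

# Midpoint-rule check of the normalization condition: integral of |f|^2 dtheta = 1
n = 100_000
dtheta = 2 * math.pi / n
total = sum(P(-math.pi + (i + 0.5) * dtheta) for i in range(n)) * dtheta
print(total)   # ≈ 1.0
```

Whatever phase we smuggle into f, P comes out real and non-negative, and the integral over all angles stays 1.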

So far so good, you’ll say. However, let me ask the same critical question once again: why would we use the same wave function f for the second situation? 

Huh? You’ll say: why wouldn’t we? Well… Think about it. Again, how do we find that f(θ) function? The assumption here is that we just do the experiment a zillion times while varying the angle θ and, hence, that we’ll find some average corresponding to P(θ), i.e. the probability. Now, the next step then is to equate that average value to |f(θ)|², obviously, because we have this quantum-mechanical theory saying probabilities are the absolute square of probability amplitudes. And so… Well… Yes. We then just take the square root of the P function to find the f(θ) function, isn’t it?

Well… No. That’s where Feynman is not very accurate when it comes to spelling out all of the assumptions underpinning this thought experiment. We should obviously watch out here, as there’s all kinds of complications when you do something like that. To a large extent (perhaps all of it), the complications are mathematical only.

First, note that any positive real number (and |f(θ)|² is a positive real number) has two distinct real square roots: a positive and a negative one, ±√x. Secondly, we should also note that, if f(θ) is a regular complex-valued wave function (and with ‘regular’, we mean, of course, that it’s some solution to a Schrödinger (or Schrödinger-like) equation), then we can multiply it with some factor shifting its phase (think of a phase like Θ = kx–ωt+α), and the square of its absolute value (i.e. its squared norm) will still yield the same value. In mathematical terms, such a factor is just a complex number with a modulus (or length or norm – whatever terminology you prefer) equal to one, which we can write as a complex exponential: e^iα, for example. So we should note that, from a mathematical point of view, any function e^iα·f(θ) will yield the same probabilities as f(θ). Indeed,

|f(θ)|² = |e^iα·f(θ)|² = (|e^iα|·|f(θ)|)² = |e^iα|²·|f(θ)|² = 1²·|f(θ)|² = |f(θ)|²

Likewise, while we assume that this function f(π–θ) is the same function f as that f(θ) function, from a mathematical point of view, the function e^iβ·f(π–θ) would do just as well, because its absolute square yields the very same (real) probability |f(π–θ)|². So the question as to what wave function we should take for the probability amplitude is not as easy to answer as you may think. Huh? So what function should we take then? Well… We don’t know. Fortunately, it doesn’t matter, for non-identical particles that is. Indeed, when analyzing the scattering of non-identical particles, we’re interested in the probabilities only and we can calculate the total probability of particle a ending up in detector 1 or 2 (and, hence, particle b ending up in detector 2 or 1) as the following sum:

|e^iα·f(θ)|² + |e^iβ·f(π–θ)|² = |f(θ)|² + |f(π–θ)|²

In other words, for non-identical particles, these phase factors (e^iα or e^iβ) don’t matter and we can just forget about them.
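A two-minute check of that claim (the amplitude below is made up just for illustration; only its unit-modulus phase factor matters):

```python
import cmath, random

def f(theta):
    # A made-up complex amplitude, just to have something to test with
    return cmath.exp(1j * theta) * cmath.cos(theta / 2)

random.seed(1)
for _ in range(5):
    theta = random.uniform(-cmath.pi, cmath.pi)
    alpha = random.uniform(0.0, 2 * cmath.pi)
    p_plain = abs(f(theta)) ** 2
    p_shifted = abs(cmath.exp(1j * alpha) * f(theta)) ** 2
    assert abs(p_plain - p_shifted) < 1e-12   # the phase factor drops out of |.|^2
print("a global phase factor never changes the probability")
```

So as long as we only ever add probabilities, the phase factors are invisible; they can only show up once amplitudes get added before squaring.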

However, and that’s the crux of the matter really, we should mention them, of course, in case we’d have to add the probability amplitudes, which is exactly what we’ll have to do when we’re looking at identical particles. In fact, in that case (i.e. when these phase factors e^iα and e^iβ will actually matter), you should note that what matters really is the phase difference, so we could replace α and β with some δ (which is what we’ll do below).

However, let’s not put the cart before the horse and conclude our analysis of what’s going on when we’re considering non-identical particles: in that case, this phase difference doesn’t matter. And the remark about the positive and negative square root doesn’t matter either. In fact, if you want, you can subsume it under the phase difference story by writing e^iα = ±1. To be more explicit: we could say that –f(θ) is the probability amplitude, as |–f(θ)|² is also equal to that very same real number |f(θ)|². OK. Done.

Bose-Einstein and Fermi-Dirac statistics

As I mentioned above, the story becomes an entirely different one when we’re doing the same experiment with identical particles. At this point, Feynman’s argument becomes rather fuzzy and, in my humble opinion, that’s because he refused to be very explicit about all of those implicit assumptions I mentioned above. What I can make of it, is the following:

1. We know that we’ll have to add probability amplitudes, instead of probabilities, because we’re talking one event that can happen in two indistinguishable ways. Indeed, for non-identical particles, we can, in principle (and in practice), distinguish situation (a) and (b) – and so that’s why we only have to add some real-valued numbers representing probabilities – but we cannot do that for identical particles.

2. Situation (a) is still being described by some probability amplitude f(θ). We don’t know what function exactly, but we assume there is some unique wave function f(θ) out there that accurately describes the probability amplitude of particle a going to 1 (and, hence, particle b going to 2), even if we can’t tell which is a and which is b. What about the phase factor? Well… We just assume we’ve chosen our t such that α = 0. In short, the assumption is that situation (a) is represented by some probability amplitude (or wave function, if you prefer that term) f(θ).

3. However, a (or some) particle (i.e. particle a or particle b) ending up in a (some) detector (i.e. detector 1 or detector 2) may come about in two ways that cannot be distinguished one from the other. One is the way described above, by that wave function f(θ). The other way is by exchanging the role of the two particles. Now, it would seem logical to associate the amplitude f(π–θ) with the second way. But we’re in the quantum-mechanical world now. There’s uncertainty, in position, in momentum, in energy, in time, whatever. So we can’t be sure about the phase. That being said, the wave function will still have the same functional form, we must assume, as it should yield the same probability when squaring. To account for that, we will allow for a phase factor, and we know it will be important when adding the amplitudes. So, while the probability for the second way (i.e. the square of its absolute value) should be the same, its probability amplitude does not necessarily have to be the same: we have to allow for positive and negative roots or, more generally, a possible phase shift. Hence, we’ll write the probability amplitude as e^iδ·f(π–θ) for the second way. [Why do I use δ instead of β? Well… Again: note that it’s the phase difference that matters. From a mathematical point of view, it’s the same as inserting an e^iβ factor: δ can take on any value.]

4. Now it’s time for the Big Trick. Nature doesn’t care about our labeling of particles. If we have to multiply the wave function (i.e. f(π–θ), or f(θ) – it’s the same: we’re talking a complex-valued function of some variable (i.e. the angle θ) here) with a phase factor e^iδ when exchanging the roles of the particles (or, what amounts to the same, exchanging the role of the detectors), we should get back to our point of departure (i.e. no exchange of particles, or detectors) when doing that two times in a row, isn’t it? So if we exchange the role of particle a and b in this analysis (or the role of the detectors), and then exchange their roles once again, there’s no exchange of roles really and we’re back at the original situation. So we must have e^iδ·e^iδ·f(θ) = f(θ) (and e^iδ·e^iδ·f(π–θ) = f(π–θ) of course, which is exactly the same statement from a mathematical point of view).

5. However, that means (e^iδ)² = +1, which, in turn, implies that e^iδ is plus or minus one: e^iδ = ±1. So that means the phase difference δ must be equal to 0 or π (or –π, which is the same as +π).
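The double-exchange argument in steps 4 and 5 can be checked in a few lines (a sketch, sampling a handful of candidate phases of my own choosing):

```python
import cmath

# Exchanging the particles twice must restore the original amplitude,
# so the phase factor has to satisfy (e^(i*delta))**2 = 1.
candidates = [k * cmath.pi / 4 for k in range(-4, 5)]   # sample phases in [-pi, pi]
allowed = [d for d in candidates
           if abs(cmath.exp(1j * d) ** 2 - 1) < 1e-12]
# Only delta = 0 and delta = +/-pi survive, i.e. e^(i*delta) = +1 (bosons) or -1 (fermions).
print([round(d / cmath.pi, 2) for d in allowed])   # → [-1.0, 0.0, 1.0]
```

Out of the nine sampled phases, only the ones giving e^iδ = ±1 pass the double-exchange test: the boson/fermion dichotomy in miniature.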

In practical terms, that means we have two ways of combining probability amplitudes for identical particles: we either add them or, else, we subtract them. Both cases exist in reality, and lead to the dichotomy between Bose and Fermi particles:

  1. For Bose particles, we find the total probability amplitude for this scattering event by adding the two individual amplitudes: f(θ) + f(π–θ).
  2. For Fermi particles, we find the total probability amplitude for this scattering event by subtracting the two individual amplitudes: f(θ) – f(π–θ).

As compared to the probability for non-identical particles which, you’ll remember, was equal to |f(θ)|² + |f(π–θ)|², we have the following Bose-Einstein and Fermi-Dirac statistics:

  1. For Bose particles: the combined probability is equal to |f(θ) + f(π–θ)|². For example, if θ is 90°, then we have a scattering probability that is exactly twice the probability for non-identical particles. Indeed, if θ is 90°, then f(θ) = f(π–θ), and then we have |f(π/2) + f(π/2)|² = |2f(π/2)|² = 4|f(π/2)|². Now, that’s two times |f(π/2)|² + |f(π/2)|² = 2|f(π/2)|², indeed.
  2. For Fermi particles (e.g. electrons), we have a combined probability equal to |f(θ) – f(π–θ)|². Again, if θ is 90°, f(θ) = f(π–θ), and so it would mean that we have a combined probability which is equal to zero ! Now, that’s a strange result, isn’t it? It is. Fortunately, the strange result has to be modified because electrons will also have spin and, hence, in half of the cases, the two electrons will actually not be identical but have opposite spin. That changes the analysis substantially (see Feynman’s Lectures, III-3-12). To be precise, if we take the spin factor into account, we’ll find a total probability (for θ = 90°) equal to |f(π/2)|², so that’s half of the probability for non-identical particles.
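The factor of two and the zero can be verified directly. The amplitude below is an assumed toy function (real-valued, normalized over (–π, π]); only the θ = 90° behavior matters here:

```python
import math

def f(theta):
    # Assumed real-valued toy amplitude, normalized so the probability
    # integrates to 1 over (-pi, pi]
    return math.cos(theta / 2) / math.sqrt(math.pi)

theta = math.pi / 2                     # detectors at 90 degrees
a, b = f(theta), f(math.pi - theta)     # amplitudes for the two indistinguishable ways
p_distinct = a**2 + b**2                # non-identical particles: add probabilities
p_bose = (a + b)**2                     # bosons: add amplitudes first, then square
p_fermi = (a - b)**2                    # fermions: subtract amplitudes, then square

print(p_bose / p_distinct)   # → 2.0: twice the classical probability
print(p_fermi)               # → 0.0: complete suppression (before spin enters)
```

At θ = 90° the two amplitudes are equal, so adding them before squaring doubles the probability, while subtracting them kills it entirely.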

Hmm… You’ll say: Now that was a complicated story! I fully agree. Frankly, I must admit I feel like I still don’t quite ‘get’ the story with that phase shift e^iδ, in an intuitive way that is (and so that’s the reason for going through the trouble of writing out this post). While I think it makes somewhat more sense now (I mean, more than when I wrote a post on this in March), I still feel I’ve only brought some of the implicit assumptions to the fore. In essence, what we’ve got here is a mathematical dichotomy (or a mathematical possibility if you want) corresponding to what turns out to be an actual dichotomy in Nature: in quantum mechanics, particles are either bosons or fermions. There is no Third Way, in quantum mechanics that is (there is a Third Way in reality, of course: that’s the classical world!).

I guess it will become more obvious as I get somewhat more acquainted with the real arithmetic involved in quantum-mechanical calculations over the coming weeks. In short, I’ve analyzed this thing over and over again, but it’s still not quite clear to me. I guess I should just move on and accept that:

  1. This explanation ‘explains’ the experimental evidence, and that’s different probabilities for identical particles as compared to non-identical particles.
  2. This explanation ‘complements’ analyses such as that 1916 analysis of blackbody radiation by Einstein (see my post on that), which approaches interference from an angle that’s somewhat more intuitive.

A numerical example

I’ve learned that, when some theoretical piece feels hard to read, an old-fashioned numerical example often helps. So let’s try one here. We can experiment with many functional forms but let’s keep things simple. From the illustration (which I copy below for your convenience), that angle θ can take any value between −π and +π, so you shouldn’t think detector 1 can only be ‘north’ of the collision spot: it can be anywhere.

[Illustration: situation (a) of the scattering experiment]

Now, it may or may not make sense (and please work out other examples than this one here), but let’s assume particle a and b are more likely to go in a line that’s more or less straight. In other words, the assumption is that both particles deflect each other only slightly, or even not at all. After all, we’re talking ‘point-like’ particles here and so, even when we try hard, it’s hard to make them collide really.

That would amount to a typical bell-shaped curve for that probability density curve P(θ): one like the blue curve below. That one shows that the probability of particle a and b just bouncing back (i.e. θ ≈ ±π) is (close to) zero, while it’s highest for θ ≈ 0, with some intermediate value for any angle in-between. The red curve shows P(π–θ), which can be found by mirroring P(θ) around the vertical axis (which yields the same function, because the function is symmetrical: P(θ) = P(–θ)) and then shifting it horizontally by π. It should: it’s the second possibility, remember? Particle a ending up in detector 2. But detector 2 is positioned at the angle π–θ and, hence, if π–θ is close to ±π (so if θ ≈ 0), that means particle a is basically bouncing back also, which we said is unlikely. On the other hand, if detector 2 is positioned at an angle π–θ ≈ 0, then we have the highest probability of particle a going right to it. In short, the red curve makes sense too, I would think. [But do think about it yourself: you’re the ultimate judge!]

Example - graph

The harder question, of course, concerns the choice of some wave function f(θ) to match those P curves above. Remember that these probability densities P are real numbers, and any real number is the absolute square (aka the squared norm) of an infinite number of complex numbers! So we’ve got l’embarras du choix, as they say in French. So… What to do? Well… Let’s keep things simple and stupid and choose a real-valued wave function f(θ), such as the blue function below. Huh? You’ll wonder if that’s legitimate. Frankly, I am not 100% sure, but why not? The blue f(θ) function will give you the blue P(θ) above, so why not go along with it? It’s based on a cosine function, but only half of a full cycle. Why? Not sure. I am just trying to match some sinusoidal function with the probability density function here, so… Well… Let’s take the next step.

Example 2

The red graph above is the associated f(π–θ) function. Could we choose another one? No. There’s no freedom of choice here, I am afraid: once we choose a functional form for f(θ), our f(π–θ) function is fixed too. So it is what it is: negative between –π and 0, and positive between 0 and +π. Now that is definitely not good, because f(π–θ) for θ = –π is not equal to f(π–θ) for θ = +π: they’re opposite values. That’s nonsensical, isn’t it? Both f(θ) and f(π–θ) should be something cyclical… But, again, let’s go along with it for now: note that the green horizontal line is the sum of the squared (absolute) values of f(θ) and f(π–θ), and note that it’s a constant.
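A quick numerical check may help here. The sketch below (in Python) assumes the half-cycle form f(θ) = cos(θ/2) for the blue curve – that functional form is my guess, not something fixed by the argument – and confirms that |f(θ)|² + |f(π–θ)|² is indeed constant, while the boson and fermion combinations come out asymmetric:

```python
import numpy as np

theta = np.linspace(-np.pi, np.pi, 1001)

# Assumed half-cycle, real-valued wave function: f(θ) = cos(θ/2)
def f(t):
    return np.cos(t / 2)

# Non-identical particles: add the absolute squares — the green line
total = f(theta)**2 + f(np.pi - theta)**2

# Identical particles: add (or subtract) amplitudes first, then square
bose  = (f(theta) + f(np.pi - theta))**2   # = 1 + sin θ: maximum at +π/2
fermi = (f(theta) - f(np.pi - theta))**2   # = 1 − sin θ: zero at +π/2

print(total.min(), total.max())   # both ≈ 1.0: a constant
```

Note that f(π–θ) = cos((π–θ)/2) = sin(θ/2), so the constant sum is just cos²(θ/2) + sin²(θ/2) = 1.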

Now, that’s a funny result, because I assumed both particles were more likely to go in some straight line, rather than recoil with some sharp angle θ. It again indicates I must be doing something wrong here. However, the important thing for me here is to compare with the Bose-Einstein and Fermi-Dirac statistics. What’s the total probability there if we take that blue f(θ) function? Well… That’s what’s shown below. The horizontal blue line is the same as the green line in the graph above: a constant probability for some particle (a or b) ending up in some detector (1 or 2). Note that the combined area of the two rectangles above the x-axis (i.e. the θ-axis) should add up to 1. The red graph gives the probability when the experiment is carried out for (identical) bosons (or Bose particles, as I like to call them). It’s weird: it makes sense from a mathematical point of view (the area under the curve is the same as the area under the blue line, so it adds up to 1 as well) but, from a physics point of view, what does it mean? A maximum at θ = π/2 and a minimum at θ = –π/2? Likewise, how should we interpret the result for fermions?


Is this OK? Well… To some extent, I guess. It surely matches the theoretical results I mentioned above: we have twice the probability for bosons at θ = 90° (red curve), and a probability equal to zero for the same angle when we’re talking fermions (green curve). Still, this numerical example triggers more questions than it answers. Indeed, my starting hypothesis was very symmetrical: both particles a and b are likely to go in a straight line, rather than being deflected at some sharp(er) angle. Now, while that hypothesis gave a somewhat unusual but still understandable probability density function in the classical world (for non-identical particles, we got a constant for P(θ) + P(π–θ)), we get this weird asymmetry in the quantum-mechanical world: we’re much more likely to catch a boson in a detector ‘north’ of the line of firing than ‘south’ of it, and vice versa for fermions.

That’s weird, to say the least. So let’s go back to the drawing board and take another function for f(θ) and, hence, for f(π–θ). This time, the two graphs below assume that (i) f(θ) and f(π–θ) have a real as well as an imaginary part and (ii) that they go through a full cycle, instead of a half-cycle only. This is done by equating the real part of the two functions with cos(θ) and cos(π–θ) respectively, and their imaginary part with sin(θ) and sin(π–θ) respectively. [Note that we conveniently forget about the normalization condition here.]


What do we see? Well… The imaginary parts of f(θ) and f(π–θ) are the same, because sin(π–θ) = sin(θ). We also see that the real parts of f(θ) and f(π–θ) are the same except for a phase difference equal to π: cos(π–θ) = cos[–(θ–π)] = cos(θ–π). More importantly, we see that the absolute square of both f(θ) and f(π–θ) yields the same constant, and so their sum P = |f(θ)|² + |f(π–θ)|² = 2|f(θ)|² = 2|f(π–θ)|² = 2P(θ) = 2P(π–θ). So that’s another constant. That’s actually OK because, this time, I did not favor one angle over the other (so I did not assume both particles were more likely to go in some straight line rather than recoil).

Now, how does this compare to Bose-Einstein and Fermi-Dirac statistics? That’s shown below. For Bose-Einstein (left-hand side), the sum of the real parts of f(θ) and f(π–θ) yields zero (blue line), while the sum of their imaginary parts (i.e. the red graph) yields a sine-like function but it has double the amplitude of sin(θ). That’s logical: sin(θ) + sin(π–θ) = 2sin(θ). The green curve is the more interesting one, because that’s the total probability we’re looking for. It has two maxima now, at +π/2 and at –π/2. That’s good, as it does away with that ‘weird asymmetry’ we got when we used a ‘half-cycle’ f(θ) function.
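The same numerical check works for the full-cycle choice f(θ) = cos θ + i·sin θ, which is just e^iθ. A sketch (NumPy handles the complex arithmetic):

```python
import numpy as np

theta = np.linspace(-np.pi, np.pi, 1001)

def f(t):
    return np.exp(1j * t)   # cos θ + i·sin θ

# Non-identical particles: add the absolute squares
mb = abs(f(theta))**2 + abs(f(np.pi - theta))**2

# Identical particles: add (or subtract) amplitudes first
be = abs(f(theta) + f(np.pi - theta))**2   # bosons
fd = abs(f(theta) - f(np.pi - theta))**2   # fermions

# f(π−θ) = −e^(−iθ), so be = |2i·sin θ|² = 4·sin²θ (maxima at ±π/2)
# and fd = |2·cos θ|² = 4·cos²θ (minima at ±π/2)
print(mb[0], be[750], fd[750])   # ≈ 2, ≈ 4 at θ = π/2, ≈ 0 at θ = π/2
```

So the two maxima at ±π/2 for bosons, and the two minima there for fermions, drop straight out of the algebra.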

B-E and F-D

Likewise, the Fermi-Dirac probability density function looks good as well (right-hand side). We have the imaginary parts of f(θ) and f(π–θ) that ‘add’ to zero: sin(θ) – sin(π–θ) = 0 (I put ‘add’ between brackets because, with Fermi-Dirac, we’re subtracting of course), while the real parts ‘add’ up to a double cosine function: cos(θ) – cos(π–θ) = cos(θ) – [–cos(θ)] = 2cos(θ). We now get a minimum at +π/2 and at –π/2, which is also in line with the general result we’d expect. The (final) graph below summarizes our findings. It gives the three ‘types’ of probabilities, i.e. the probability of finding some particle in some detector as a function of the angle –π < θ < +π using:

  1. Maxwell-Boltzmann statistics: that’s the green constant (non-identical particles, and probability does not vary with the angle θ).
  2. Bose-Einstein: that’s the blue graph below. It has two maxima, at +π/2 and at –π/2, and two minima, at 0 and at ±π (+π and –π are the same angle obviously), with the maxima equal to twice the value we get under Maxwell-Boltzmann statistics.
  3. Finally, the red graph gives the Fermi-Dirac probabilities. Also two maxima and two minima, but at different places: the maxima are at θ = 0 and θ = ±π, while the minima are at +π/2 and at –π/2.


Funny, isn’t it? These probability density functions are all well-behaved, in the sense that they add up to the same total (which should be 1 when applying the normalization condition). Indeed, the surfaces under the green, blue and red lines are obviously the same. But so we get these weird fluctuations for Bose-Einstein and Fermi-Dirac statistics, favoring two specific angles over all others, while there’s no such favoritism when the experiment involves non-identical particles. This, of course, just follows from our assumption concerning f(θ). What if we double the frequency of f(θ), i.e. go from one cycle to two cycles between –π and +π? Well… Just try it: take f(θ) = cos(2·θ) + i·sin(2·θ) and do the calculations. You should get the following probability graphs: we have the same green line for non-identical particles, but interference with four maxima (and four minima) for the Bose-Einstein and Fermi-Dirac probabilities.

summary 2

Again… Funny, isn’t it? So… What to make of this? Frankly, I don’t know. But one last graph makes for an interesting observation: as the angular frequency of f(θ) takes on larger and larger values, the Bose-Einstein and Fermi-Dirac probability density functions start oscillating wildly. For example, the graphs below are based on f(θ) = cos(25·θ) + i·sin(25·θ). The explosion of color hurts the eye, doesn’t it? :-) But, apart from that, do you now see why physicists say that, at high frequencies, the interference pattern gets smeared out? Indeed, if we move the detector just a little bit (i.e. if we change the angle θ just a little bit) in the example below, we hit a maximum instead of a minimum, and vice versa. In short, the granularity may be such that we can only measure that green line, in which case we’d think we’re dealing with Maxwell-Boltzmann statistics, while the underlying reality may be different.
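The ‘smearing out’ is easy to mimic numerically: average the wildly oscillating Bose-Einstein curve over a detector of finite angular width. A sketch (the 0.5-radian detector window is an assumption of mine, just to illustrate the effect):

```python
import numpy as np

theta = np.linspace(-np.pi, np.pi, 20001)
n = 25                      # number of cycles: f(θ) = cos(25·θ) + i·sin(25·θ)
f = np.exp(1j * n * theta)
be = abs(f + np.exp(1j * n * (np.pi - theta)))**2   # = 4·sin²(25·θ) for odd n

# A detector of finite size straddles many wiggles: take a moving average
width = 0.5                 # assumed detector width in radians
k = round(width / (theta[1] - theta[0]))
smeared = np.convolve(be, np.ones(k) / k, mode='valid')

print(be.min(), be.max())   # ≈ 0 … ≈ 4: the wild oscillation
print(smeared.mean())       # ≈ 2: just the Maxwell-Boltzmann constant
```

The averaged curve is flat to within a percent or so: the detector only ‘sees’ the green line.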

summary 4

That explains another quote in Feynman’s famous introduction to quantum mechanics (Lectures, Vol. III, Chapter 1): “If the motion of all matter—as well as electrons—must be described in terms of waves, what about the bullets in our first experiment? Why didn’t we see an interference pattern there? It turns out that for the bullets the wavelengths were so tiny that the interference patterns became very fine. So fine, in fact, that with any detector of finite size one could not distinguish the separate maxima and minima. What we saw was only a kind of average, which is the classical curve. In the Figure below, we have tried to indicate schematically what happens with large-scale objects. Part (a) of the figure shows the probability distribution one might predict for bullets, using quantum mechanics. The rapid wiggles are supposed to represent the interference pattern one gets for waves of very short wavelength. Any physical detector, however, straddles several wiggles of the probability curve, so that the measurements show the smooth curve drawn in part (b) of the figure.”

Interference with bullets

But that should really conclude this post. It has become way too long already. One final remark, though: the ‘smearing out’ effect also explains why those three equations for 〈Ni〉 sometimes do amount to more or less the same thing: the Bose-Einstein and Fermi-Dirac formulas may approximate the Maxwell-Boltzmann equation. In that case, the ±1 term in the denominator does not make much of a difference. As we said a couple of times already, it all depends on scale. :-)
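That last point – the ±1 in the denominator hardly mattering in some regimes – is easy to check on the occupancy formulas themselves. A sketch, writing x for (E − µ)/kT and assuming the standard forms e^(−x), 1/(e^x − 1) and 1/(e^x + 1) for the three statistics:

```python
import math

def maxwell_boltzmann(x):
    return math.exp(-x)

def bose_einstein(x):
    return 1.0 / (math.exp(x) - 1.0)

def fermi_dirac(x):
    return 1.0 / (math.exp(x) + 1.0)

for x in (0.5, 2.0, 10.0):
    print(x, maxwell_boltzmann(x), bose_einstein(x), fermi_dirac(x))
# For x = 10, e^x ≈ 22026, so the ±1 in the denominator is negligible:
# all three values agree to about four significant figures.
# For x = 0.5, they differ wildly.
```

So, as said: it all depends on scale.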

Concluding remarks

1. The best I can do in terms of interpreting the above is to tell myself that we cannot fully ‘fix’ the functional form of the wave function for the second or ‘other’ way the event can happen if we’re ‘fixing’ the functional form for the first of the two possibilities. We have to allow for a phase shift e^iδ indeed, which – I assume (but that’s just a gut instinct) – incorporates all kinds of considerations of uncertainty in regard to both time and position and, hence, in regard to energy and momentum also (using both the ΔEΔt = ħ/2 and ΔxΔp = ħ/2 expressions). The symmetry of the situation then implies that e^iδ can only take on one of two possible values: –1 or +1, which, in turn, implies that δ is equal to 0 or π.

2. For those who’d think I am basically doing nothing but rewrite a chapter out of Feynman’s Lectures, let me push back a bit. One point to note is that Feynman doesn’t seem to accept that we should introduce a phase factor in the analysis for non-identical particles as well. To be specific: just switching the detectors (instead of the particles) also implies that one should allow for the mathematical possibility of the phase of that f function being shifted by some random factor δ. The only difference with the quantum-mechanical analysis (i.e. the analysis for identical particles) is that, for non-identical particles, the phase factor makes no difference to the final result, because we’re not adding amplitudes but their absolute squares and, hence, a phase shift doesn’t matter.

3. I think all of the reasoning above makes not only for a very fine but also a very beautiful theoretical argument, even if I feel like I don’t fully ‘understand’ it – in an intuitive way, that is. I hope this post has made you think. Isn’t it wonderful to see that the theoretical or mathematical possibilities of the model actually correspond to realities, both in the classical as well as in the quantum-mechanical world? In fact, I can imagine that most physicists and mathematicians would shrug this whole reflection off like… Well… Like: “Of course! It’s obvious, isn’t it?” I don’t think it’s obvious. I think it’s deep. I would even qualify it as mysterious, and surely as beautiful. :-)

Relativity paradoxes

My son, who’s fifteen, said he liked my post on lasers. That’s good, because I effectively wrote it with him in mind as part of the audience. He also said it stimulated him to consider taking up engineering studies later. That’s great. I hope he does, so he doesn’t have to go through what I am going through right now. Indeed, when everything is said and done, you do want your kids to take on as much math and science as they can handle when they’re young because, afterwards, it’s tough to catch up.

Now, I struggled quite a bit with bringing relativity into the picture while pondering the ‘essence’ of a photon in my previous post. Hence, I thought it would be good to return to the topic of (special) relativity and write another post to (1) refresh my knowledge of the topic and (2) try to stimulate him even more. Indeed, regardless of whether one does or doesn’t understand any of what I write below, relativity theory sounds fascinating, doesn’t it? :-) So, this post intends to present, in a nutshell, what (special) relativity theory is all about.

What relativity does

The thing that’s best known about Einstein’s (special) theory of relativity is the following: the mass of an object, as measured by the (inertial) observer, increases with its speed. The formula for this is m = γm₀, and the γ factor here is the so-called Lorentz factor: γ = (1 – u²/c²)^(–1/2). Let me give you that diagram of the Lorentz factor once again, which shows that very considerable speeds are required before relativity effects kick in. However, when they do, they kick in with a vengeance, it seems, which makes c the limit !


Now, you may or may not be familiar with two other things that come out of relativity theory as well:

  1. The first is length contraction: objects are measured to be shortened in the direction of motion with respect to the (inertial) observer. The formula to be used incorporates the reciprocal of the Lorentz factor: L = (1/γ)L₀. For example, a stick of one meter in a space ship moving at a velocity v = 0.6c will appear to be only 80 cm to the external/inertial observer seeing it whizz past… That is if he can see anything at all of course: he’d have to take like a photo-finish picture as it zooms past ! :-)
  2. The second is time dilation, which is also rather well known – just like the mass increase effect – because of the so-called twin paradox: time will appear to be slower in that space ship and, hence, if you send one of two twins away on a space journey, traveling at relativistic speeds (i.e. a velocity sufficiently close to c to make the relativistic effect significant), he will come back younger than his brother. The formula here is equally simple: t = γt₀. Hence, at v = 0.6c, one second in the space ship will be measured as 1.25 seconds by the external observer. Hence, the moving clock will appear to run slower – again: to the external (inertial) observer that is.
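Both effects are one-liners in code. A sketch, using units where c = 1 and the v = 0.6c example from the text:

```python
import math

def gamma(u):
    """Lorentz factor γ = (1 − u²/c²)^(−1/2), with c = 1."""
    return 1.0 / math.sqrt(1.0 - u**2)

u = 0.6                     # v = 0.6c
g = gamma(u)

print(g)                    # 1.25
print(100 / g)              # length contraction: a 100 cm stick measures 80 cm
print(1.0 * g)              # time dilation: 1 s on board reads as 1.25 s
```

Note how nicely the numbers fit together: 1/γ = 0.8 gives the 80 cm stick, and γ = 1.25 gives the 1.25 seconds.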

These simple rules, which come out of Einstein’s special relativity theory, give rise to all kinds of paradoxes. You know what a paradox is: a paradox (in physics) is something that, at first sight, does not make sense but that, when the issue is examined in more detail, does get resolved and actually helps us to better understand what’s going on.

You know the twin paradox already: only one of the two twins can be the younger (or the older) when they meet again. However, one can also say it’s the guy staying on Earth that’s moving (and, hence, ‘traveling’ at relativistic speed) – so then the reference frame of the guy in the spaceship is the so-called inertial frame – and, hence, one could argue the guy who stayed behind (on Earth) should be the younger one when they meet after the journey. I am not ashamed to say that this actually is a paradox I find difficult to understand. So let me first start with another one.

The ladder paradox

While the twin paradox examines the time dilation effect, the ladder paradox examines the length contraction effect. The situation is similar to the one for the twin paradox. However, because we don’t have accelerating and decelerating rockets and all that (cf. the twin paradox), I find this paradox not only more straightforward but also more amusing. Look at the left-hand side first. We have a garage with both a front and a back door. A ladder passes through it and, as you can see, it seems to fit in the garage. Now, that may or may not be because of the length contraction effect, of course. Whatever. In any case, it seems we can (very) quickly close both doors of the garage to prove that it fits. Now look at the right-hand side. Here we are moving the garage over the ladder (I know, not very convenient, but just go along with the story). So now the ladder frame is the inertial reference frame and the garage is the moving frame. So, according to that length contraction ‘law’, it’s the garage that gets shorter, and it turns out the ladder doesn’t fit any more. Hence, the paradox: does the ladder fit or not? The answer should be unambiguous, no? Yes or no. So what is it?

Ladder paradox 1Ladder paradox 2

The paradox pushes us to consider all kinds of important questions which are usually just glossed over. How do we decide whether the ladder fits? Well… By closing both the front and back door of course, you’ll say. But then you mean closing them simultaneously, and absolute simultaneity does not exist: two events that appear to happen at the same time in one reference frame may not happen at the same time in another. Only the space-time interval between two events is absolute, in the sense that it’s the same in whatever reference frame we’re measuring it – not the individual space and time intervals themselves. Hence, if you’re in the garage shutting those doors at the same time, then that’s your time, but if I am moving with the ladder, I will not see those two doors shutting as something that’s simultaneous. More formally, and using the definition of space-time intervals (and assuming only one space dimension x), we have:

c²Δt² – Δx² = c²Δt’² – Δx’².

In this equation, we take the x and t coordinates to be those of the inertial frame (so that’s the garage in the left-hand-side situation), while the primed coordinates (x’ and t’) are those measured in the other reference frame, i.e. the reference frame that moves from the perspective of the inertial frame. Indeed, note that we cannot say that one reference frame moves while the other stands still, as we’re talking relative speeds here: one reference frame moves in respect to the other, and vice versa. In any case, the equation with the space-time intervals above implies that:

c²(Δt² – Δt’²) – (Δx² – Δx’²) = 0

However, that does not imply that the two terms on the left-hand side of the above equation are zero individually. In fact, they aren’t. Hence, while it must be true that c²(Δt² – Δt’²) = Δx² – Δx’², we have:

Δt² – Δt’² ≠ 0 and Δx² – Δx’² ≠ 0, or Δt² ≠ Δt’² and Δx² ≠ Δx’²

To put it simply: if you’re in the garage, and I am moving with the ladder (we’re talking the left-hand-side situation now), you’ll claim that you were able to shut both doors simultaneously, so that Δt = 0. I’ll say: bollocks! Which is rude. I should say: my Δt’ is not equal to zero. Hence, from my point of view, I always saw one of the two doors open and, hence, I don’t think the ladder fits. Hence, what I am seeing, effectively, is the situation on the right-hand side: your garage looks too short for my ladder.

You’ll say: what is this? The ladder fits or it doesn’t, doesn’t it? The answer is: no. It’s ambiguous. It depends on your reference frame. It fits in your reference frame, but it does not fit in mine. In order to get an unambiguous answer, you have to stop moving, or I have to stop moving – whatever: the point is that we need to merge our reference frames.
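This can be made concrete with the Lorentz transformation itself. A sketch with assumed numbers (doors 10 length units apart in the garage frame, relative speed 0.6c, units where c = 1): the two door-closing events are simultaneous in the garage frame but not in the ladder frame, while the spacetime interval between them comes out the same in both frames:

```python
import math

def lorentz(t, x, v):
    """Transform event (t, x) to a frame moving at speed v (c = 1)."""
    g = 1.0 / math.sqrt(1.0 - v**2)
    return g * (t - v * x), g * (x - v * t)

v = 0.6                         # assumed relative speed (c = 1)
front = (0.0, 0.0)              # front door closes at t = 0, x = 0
back  = (0.0, 10.0)             # back door closes at t = 0, x = 10 (assumed)

t1, x1 = lorentz(*front, v)
t2, x2 = lorentz(*back, v)

print(t2 - t1)                  # −7.5: NOT simultaneous in the ladder frame
# The spacetime interval c²Δt² − Δx² is frame-independent:
print((0.0 - 0.0)**2 - (10.0 - 0.0)**2)   # −100 in the garage frame
print((t2 - t1)**2 - (x2 - x1)**2)        # −100 in the ladder frame too
```

So Δt = 0 in one frame, Δt’ ≠ 0 in the other, yet the interval is invariant: exactly the resolution of the paradox.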

Hence, paradox solved. In fact, now that I think of it, it’s kinda funny that we don’t have such paradoxes for the relativistic mass formula. No one seems to wonder about the apparent contradiction that, if you’re moving away from me, you look heavier to me while, vice versa, I also look heavier to you. So each of us looks heavier as seen from the other’s reference frame. So who’s heavier then? Perhaps no one developed a paradox because it is kinda impolite to compare personal weights? :-)

Of course, I am joking, but think of it: it has to do with our preconceived notions of time and space. Things like inertia (mass is a measure of inertia) don’t grab our attention as much. In any case, now it’s time to discuss time dilation.

Oh ! And do think about that photo-finish picture ! It’s related to the problem of defining what constitutes a length really. :-)

The twin paradox

I find the twin paradox much more difficult to analyze, and I guess many people do, because it’s the one that usually receives all the attention. [Frankly, I hadn’t heard of this ladder paradox before I started studying physics.] Feynman hardly takes the time to look at it. He basically notes that the situation is not unlike that of an unstable particle traveling at relativistic speeds: when it does, it lasts (much) longer than its lifetime (measured in the inertial reference frame) suggests. Let me actually just quote Feynman’s account of it:

“Peter and Paul are supposed to be twins, born at the same time. When they are old enough to drive a space ship, Paul flies away at very high speed. Because Peter, who is left on the ground, sees Paul going so fast, all of Paul’s clocks appear to go slower, his heart beats go slower, his thoughts go slower, everything goes slower, from Peter’s point of view. Of course, Paul notices nothing unusual, but if he travels around and about for a while and then comes back, he will be younger than Peter, the man on the ground! That is actually right; it is one of the consequences of the theory of relativity which has been clearly demonstrated. Just as the mu-mesons last longer when they are moving, so also will Paul last longer when he is moving. This is called a “paradox” only by the people who believe that the principle of relativity means that all motion is relative; they say, “Heh, heh, heh, from the point of view of Paul, can’t we say that Peter was moving and should therefore appear to age more slowly? By symmetry, the only possible result is that both should be the same age when they meet.” But in order for them to come back together and make the comparison, Paul must either stop at the end of the trip and make a comparison of clocks or, more simply, he has to come back, and the one who comes back must be the man who was moving, and he knows this, because he had to turn around. When he turned around, all kinds of unusual things happened in his space ship—the rockets went off, things jammed up against one wall, and so on—while Peter felt nothing.

So the way to state the rule is to say that the man who has felt the accelerations, who has seen things fall against the walls, and so on, is the one who would be the younger; that is the difference between them in an “absolute” sense, and it is certainly correct. When we discussed the fact that moving mu-mesons live longer, we used as an example their straight-line motion in the atmosphere. But we can also make mu-mesons in a laboratory and cause them to go in a curve with a magnet, and even under this accelerated motion, they last exactly as much longer as they do when they are moving in a straight line. Although no one has arranged an experiment explicitly so that we can get rid of the paradox, one could compare a mu-meson which is left standing with one that had gone around a complete circle, and it would surely be found that the one that went around the circle lasted longer. Although we have not actually carried out an experiment using a complete circle, it is really not necessary, of course, because everything fits together all right. This may not satisfy those who insist that every single fact be demonstrated directly, but we confidently predict the result of the experiment in which Paul goes in a complete circle.”

[…] Well… I am not sure I am “among those who insist that every single fact be demonstrated directly”, but you’ll admit that Feynman is quite terse here (or more terse than usual, I should say). That being said, I understand why: the calculations involved in demonstrating that the paradox is just that – an apparent contradiction – are not straightforward. I’ve googled a bit, but it’s all quite confusing. Good explanations usually involve the so-called Minkowski diagram, also known as the spacetime diagram. You’ve surely seen it before – when the light cone was being discussed, and what it implies for the concepts of past, present and future. It’s a way to represent those spacetime intervals. The Minkowski diagram – from the perspective of the twin brother on Earth (hence, we only have unprimed coordinates x and (c)t) – is shown below. Don’t worry about those simultaneity planes for now. Just try to understand the diagram. The twin brother who stays home just moves along the vertical axis: x = 0. His space-traveling brother travels out to some point and then turns back, so he first travels northeast on this diagram and then takes a turn northwest, to meet up again with his brother on Earth.

485px-Twin_Paradox_Minkowski_Diagram

The point to note is that the traveling twin brother is not traveling along one straight line, but along two. Hence, the argument that we can just as well say his frame of reference is inertial, and that of his brother is the moving one, is not correct. As Wikipedia notes (from which I got this diagram): “The trajectory of the ship is equally divided between two different inertial frames, while the Earth-based twin stays in the same inertial frame.”

Still, the situation is essentially symmetric, and so we could draw a similar-looking spacetime diagram for the primed coordinates, i.e. x’ and ct’, and wonder what the difference is. That’s where these planes of simultaneity come in. Look at the wonderful animation below: A, B and C are simultaneous events when I am standing still (v = 0). However, when I move at considerable speed (v = 0.3c), that’s no longer the case: it takes more time for news to reach me from ‘point’ A and, hence, assuming news travels at the speed of light, event A appears to happen later. Conversely, event C (in spacetime) appears to have happened before event B. Now that explains these blue so-called simultaneity planes on the diagram above: they’re the white lines traveling from the past to the future on the animation below, but for the trip out only (v > 0). For the trip back, we have the red lines, which correspond to the v = –0.5c situation below. So that’s the return trip (v < 0).

Relativity_of_Simultaneity_Animation

What you see is that, “during the U-turn, the plane of simultaneity jumps from blue to red and very quickly sweeps over a large segment of the world line of the Earth-based twin.” Hence, “when one transfers from the outgoing frame to the incoming frame there is a jump discontinuity in the age of the Earth-based twin.” [I took the quotes from Wikipedia, where you can find the original references.] Now, you will say, that is also symmetric if we switch the reference frames. Yes… Except for the sign. So, yes, it is the traveling brother who effectively skips some time. Paradox solved.

Now… For some real fun…

Now, for some real fun, I’d like to ask you what the world would look like if you were traveling through it riding a photon. So… Think about it. Think hard. I didn’t google at first, and I must admit the question really started racking my brain. There are so many effects to take into account. One basic property, of course, must be that time stands still around you. You see the world as it was when you reached v = c. Well… Yes and no. The fact of the matter is that, because of all the relativistic effects (e.g. aberration, Doppler shift, intensity shifts,…), you actually don’t see a whole lot. One visualization of it (visual effects of relativistic speeds) seems to indicate that (most) science fiction movies actually present the correct picture (if the animation shows the correct visualization, that is): we’re staring into one bright flash of light ahead of us as we’re getting close to v = c. Interesting…

Finally, you should also try to find out what actually happens to the clocks during the deceleration and acceleration as the space ship of that twin brother turns. You’re going to find it fascinating. At the same time, the math behind it is, quite simply, daunting and, hence, I won’t even try to go into it. :-)


So… Well… That’s it really. I now realize why I never quite got this as a kid. These paradoxes do require some deep thinking and imagination and, most of all, some tools that one just couldn’t find as easily then as one can today.

The Web definitely does make it easier to study without the guidance of professors and the material environment of a university, although I don’t think it can be a substitute for discipline. When everything is said and done, it’s still hard work. Very hard work. But I hope you get there, Vincent ! :-) And please do look at that Youtube video by clicking the link above. :-)

Post scriptum: Because the resolution of the video above is quite low, I looked for others, for example one that describes the journey from the Sun to the Earth, which–as expected–takes about 8 minutes. While it has higher resolution, it is far less informative. I’ll let you google some more. Please tell me if you found something nice. :-)

The Nature of Photons

As you can see from previous posts, I am quite intrigued by light, or electromagnetic radiation in general, because photons behave both like waves (interference and diffraction) as well as like particles (they arrive in a lump, and our eye can distinguish colors, even if only a few photons are coming in). Hence, I have a gut feeling that I’d understand everything if I could just understand light.

From a classical point of view, i.e. a wave point of view, photons are to be analyzed as an electromagnetic disturbance, or a transient (see my post Photons as Strings). From a quantum-mechanical perspective, they are to be treated as particles with some spread in the position and/or the momentum described by their (complex-valued) psi and phi wave functions, which are related to the Uncertainty Principle: σxσp = ħ/2. How do we reconcile both views?

I’ve introduced the path integral formulation of quantum mechanics already a couple of times (notably in my previous post). It assumes photons do occupy some ‘space’–and then I mean physical space. What? I can see you shaking your head. Well… Sorry. OK… Let me explain what I mean. When everything is said and done, what physicists are interested in is probability amplitudes (to be precise: they’re interested in the function that gives those amplitudes as a function of time and space). Now, in the path integral formulation, these probability amplitudes are written as P(A, B), i.e. the probability amplitude to go from point A to point B in space or, to be precise (and, hence, taking into account relativity), points A and B in space-time.


So what’s done is summing individual probability amplitudes over all possible paths, and light only goes in a straight line if there’s enough space to ensure that probability amplitudes of neighboring paths cancel each other out (see my previous post). The bottom line is: the wave function is a mathematical thing, but its shape is determined by the physical space. This is quite obvious from Feynman’s discussion of diffraction (see the previous post).

This mental image of a photon is not consistent with the view of a photon as a transient electromagnetic wave, such as the one I depict below. That mental image is based on the fact that atomic oscillators have a Q and that, based on this Q, we know the atom will radiate for about 10⁻⁸ seconds (technically, that’s the time it takes for the radiation to die out by a factor 1/e), during which it will emit one photon. That’s not very long but, taking into account the rather spectacular speed of light (3×10⁸ m/s), that still makes for a wave train with a length of some 3 meters.

Photon wave

Three meters! That does not match the picture of a photon as a ‘fundamental’ particle which cannot be broken up, and it’s also not consistent with the scale that we’re talking about when discussing diffraction or interference of light, which is micrometers, millimeters, or… OK… Let’s say centimeters even. But surely not meters! Well… Yes. You’re right. But you can check the math. It’s incontournable, as they say in French: for sodium light (Feynman’s example), which has a frequency of 500 THz (500×10¹² oscillations per second) and a wavelength of 600 nm (600×10⁻⁹ meter), that length corresponds to some five million oscillations. All packed into one photon with an energy E = hν = 4.1×10⁻¹⁵ eV·s × 500×10¹² Hz ≈ 2 eV.
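Those numbers are easy to verify with a quick back-of-the-envelope calculation (the constants are rounded, as in the text):

```python
# Back-of-the-envelope check of the numbers quoted above (rounded constants)
c = 3.0e8          # speed of light, m/s
t_decay = 1.0e-8   # time for the radiation to die out by a factor 1/e, s
nu = 500e12        # sodium light frequency, Hz
lam = c / nu       # wavelength: 600 nm

length = c * t_decay   # length of the wave train: 3 m
n_osc = length / lam   # oscillations packed into that train: 5 million

h_eV = 4.14e-15        # Planck's constant, eV.s (rounded)
E = h_eV * nu          # photon energy E = h*nu, in eV

print(length, n_osc, E)   # ~3.0 m, ~5e6 oscillations, ~2.07 eV
```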

I see you shaking your head again. OK. This does not make sense. But – Hey! – What about bringing relativity into the picture? Now, that’s an idea! We have both length contraction as well as time dilation then. To be precise, if we’d sit on top of the photon, everything would be as described above, but… Well… We’re inertial observers and so we are not sitting on top of a photon. Hence, that length of three meters appears to be only… What’s the formula again? Lengths appear to be shortened by a factor (1 – u²/c²)^(1/2), with u the velocity of the other reference frame as compared to ours. So… Well… That factor becomes zero and, hence, the photon appears to have… No length whatsoever! From our point of view, that is.

As for the time dilation effect, the factor to be used here is 1/(1 – u²/c²)^(1/2), so that means the ‘clock’ of the photon is basically standing still–again, from our point of view, that is. So there we are: from our point of view, the photon is a point particle, even if it was emitted by an atomic oscillator generating a wave train with a length of 10⁻⁸ light-seconds (so that’s three meters indeed). Also, from our point of view, the photon lives forever, because its ‘clock’ is standing still.
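For whatever it’s worth, here’s a little numerical illustration of the two factors at work (the speeds are arbitrary choices of mine, just to show the trend as u approaches c):

```python
import math

def contraction_factor(u, c=1.0):
    """The (1 - u^2/c^2)^(1/2) factor by which lengths appear shortened."""
    return math.sqrt(1.0 - (u / c) ** 2)

proper_length = 3.0   # the 3 m wave train from the text
for u in (0.5, 0.9, 0.999, 0.999999):
    f = contraction_factor(u)
    # time dilation is the reciprocal factor 1/(1 - u^2/c^2)^(1/2)
    print(f"u = {u}c: apparent length {proper_length * f:.6f} m, clock slowed by x{1 / f:.0f}")
```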

Hmm… This explains quite a few things. Among others, it explains how a photon can have a wavelength (or a frequency if you want), even if it has zero length–from our point of view that is.

OK. That’s one important clarification. What else can we say?

Well… Not much more for the moment, I am afraid. The reflection above still doesn’t answer my question in regard to the assumed complementarity between the quantum-mechanical and classical explanations of diffraction. I feel there must be some relation between the (real-valued) E and B wave functions describing the electromagnetic wave and the (complex-valued) ψ and φ wave functions of the photons that make up that wave. It must have something to do with the fact that energy is (always) proportional to the square of the amplitude of a wave, and that higher-energy ‘particles’ are more likely to be detected (hence, probability amplitudes, energy and amplitudes must be related to one another).

I hope that all the math I’ll be digesting in the coming weeks, as I plan to grind through Feynman’s third Lectures Volume (i.e. the one on quantum physics), will help me to answer this question. In the meanwhile, any ‘intuitive’ explanation from your side (based on your own gut feeling or on what you’ve read) is more than welcome, of course! :-)

Post scriptum: There are, of course, relativistic transformation equations for energy and momentum as well. In fact, for energy and momentum, we have a quantity similar to the invariant (ct)² – x² – y² – z² = (ct′)² – x′² – y′² – z′² quantity in space-time. More in particular, we have that E² – px² – py² – pz² = E′² – px′² – py′² – pz′². If we look at the photon as a transient, but travel along with it, i.e. at the speed of light, all we see is a static field spread out in space. There’s no oscillation. The oscillation is only an oscillation in the inertial reference frame and, hence, the energy calculated above (2 eV) is the energy as seen from that inertial reference frame. So what’s the energy as seen from the reference frame moving with the photon? I don’t have an answer to that. While it’s obvious that the E² – px² – py² – pz² = E′² – px′² – py′² – pz′² equality must hold, the question is how to define E and p from the ‘photon reference frame.’ The transformation equations I am referring to assume we’re looking at some particle with a non-zero rest mass and, hence, are not relevant here. Also, while I wrote something on relativistic effects of radiation, that is also not relevant here (it had to do with a moving source: things like the Doppler effect and all that). I know there are formulas to calculate the energy (and momentum) contained in a field, and so it’s likely those equations are relevant, but I am not sure these will help me to understand what I want to understand here. Perhaps my gut feeling is just plain wrong: perhaps there is no relation between the (real-valued) E and B wave functions describing the electromagnetic wave and the (complex-valued) ψ and φ wave functions of the photons that make up that wave. As said, I’ll continue digging and I’ll keep you posted.
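One small check that is easy to do: for a photon, the energy-momentum relation is E = pc, so the invariant above indeed vanishes, consistent with zero rest mass (the 2 eV value is the one calculated earlier):

```python
# For a photon E = p*c, so the invariant E^2 - (p*c)^2 is zero: zero rest mass
c = 2.998e8                # speed of light, m/s
E = 2.07 * 1.602e-19       # the ~2 eV photon from the text, in joules
p = E / c                  # photon momentum, kg.m/s

invariant = E**2 - (p * c)**2
print(invariant)           # ~0, up to floating-point noise
```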

Post scriptum 2: Thinking about the length contraction of an electromagnetic disturbance quickly gives rise to quite a few contradictions. For starters, it’s strange that the length of the pulse doesn’t seem to matter: whatever its ‘proper’ length, from the point of view of the inertial observer, a pulse traveling at the speed of light will always have zero length. Isn’t that strange? And what if we have no damping (i.e. a continuous wave)? These contradictions are likely to be paradoxes only, just like the ladder paradox, which results from the mistaken assumption of absolute simultaneity (simultaneity is relative). That being said, I should explore these paradoxes in more detail.

The Complementarity Principle

Unlike what you might think when seeing the title of this post, it is not my intention to enter into philosophical discussions here: many authors have written about this ‘principle’, most of whom–according to eminent physicists–don’t know what they are talking about. So I have no intention of making a fool of myself here too. However, what I do want to do here is explore, in an intuitive way, how the classical and quantum-mechanical explanations of the phenomenon of the diffraction of light differ from each other–and fundamentally so–while, necessarily, having to yield the same predictions. It is in that sense that the two explanations should be ‘complementary’.

The classical explanation

I’ve done a fairly complete analysis of the classical explanation in my posts on Diffraction and the Uncertainty Principle (20 and 21 September), so I won’t dwell on that here. Let me just repeat the basics. The model is based on the so-called Huygens-Fresnel Principle, according to which each point in the slit becomes a source of a secondary spherical wave. These waves then interfere, constructively or destructively, and, hence, by adding them, we get the form of the wave at each point in time and at each point in space behind the slit. The animation below illustrates the idea. However, note that the mathematical analysis does not assume that the point sources are neatly separated from each other: instead of only six point sources, we have an infinite number of them and, hence, adding up the waves amounts to solving some integral (which, as you know, is an infinite sum).
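The adding-up of secondary waves is easy to mimic numerically. A minimal sketch (the slit width of four wavelengths and the 500 point sources are my own arbitrary choices; in the limit, the sum becomes the integral mentioned above):

```python
import cmath, math

lam = 1.0                 # wavelength (arbitrary units)
d = 4.0 * lam             # slit width: a hypothetical four wavelengths
k = 2 * math.pi / lam     # wave number
n_sources = 500           # the 'infinite number' of point sources, approximated

def amplitude(theta):
    """Add the secondary spherical waves from every point in the slit (far-field)."""
    total = 0j
    for i in range(n_sources):
        y = -d / 2 + d * i / (n_sources - 1)   # source position within the slit
        phase = k * y * math.sin(theta)        # path difference toward angle theta
        total += cmath.exp(1j * phase)
    return total / n_sources

# Intensity is the squared magnitude; minima appear where d*sin(theta) = m*lam
for deg in (0, 7, 14.5, 22, 30):
    theta = math.radians(deg)
    print(f"theta = {deg:5}deg  intensity = {abs(amplitude(theta)) ** 2:.4f}")
```

The central maximum sits at θ = 0, the intensity drops to (almost) zero where d·sin θ equals a whole number of wavelengths, and the smaller secondary bumps appear in between.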


We know what we are supposed to get: a diffraction pattern. The intensity of the light on the screen at the other side depends on (1) the slit width (d), (2) the wavelength of the light (λ), and (3) the angle of incidence (θ), as shown below.


One point to note is that we have smaller bumps left and right. We don’t get those if we treat the slit as a single point source only, like Feynman does when he discusses the double-slit experiment for (physical) waves. Indeed, look at the image below: each of the slits acts as one point source only and, hence, the intensity curves I1 and I2 do not show a diffraction pattern. They are just nice bell-shaped curves, albeit somewhat adjusted because of the angle of incidence (we have two slits above and below the center, instead of just one on the normal itself). So we have an interference pattern on the screen and, now that we’re here, let me be clear on terminology: I am going along with the widespread definition of diffraction as a pattern created by one slit, and of interference as a pattern created by two or more slits. I am noting this just to make sure there’s no confusion.

Water waves
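To see the difference, treat each slit as a single point source, as Feynman does there: the combined pattern is then pure two-source interference–a cos² fringe pattern with no secondary diffraction bumps. A quick sketch (the slit separation of 10 wavelengths is a hypothetical choice of mine):

```python
import cmath, math

lam = 1.0                 # wavelength (arbitrary units)
k = 2 * math.pi / lam     # wave number
s = 10.0 * lam            # hypothetical separation between the two slits

def two_source_intensity(theta):
    """Each slit as a single point source: I = |e^(i*phi) + e^(-i*phi)|^2 / 4."""
    phase = k * s * math.sin(theta) / 2
    a = cmath.exp(1j * phase) + cmath.exp(-1j * phase)
    return abs(a) ** 2 / 4    # equals cos^2(k*s*sin(theta)/2): pure fringes

print(two_source_intensity(0.0))              # central maximum: 1.0
print(two_source_intensity(math.asin(0.05)))  # first minimum, where s*sin(theta) = lam/2
```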

That should be clear enough. Let’s move on to the quantum-mechanical explanation.

The quantum-mechanical explanation

There are several formulations of quantum mechanics: you’ve heard about matrix mechanics and wave mechanics. Roughly speaking, in matrix mechanics “we interpret the physical properties of particles as matrices that evolve in time”, while the wave mechanics approach is primarily based on these complex-valued wave functions–one for each physical property (e.g. position, momentum, energy). Both approaches are mathematically equivalent.

There is also a third approach, referred to as the path integral formulation, which “replaces the classical notion of a single, unique trajectory for a system with a sum, or functional integral, over an infinity of possible trajectories to compute an amplitude” (all definitions here were taken from Wikipedia). This approach is associated with Richard Feynman but can also be traced back to Paul Dirac, like most of the math involved in quantum mechanics, it seems. It’s this approach which I’ll try to explain–again, in an intuitive way only–in order to show that the two explanations should effectively lead to the same predictions.

The key to understanding the path integral formulation is the assumption that a particle–and a ‘particle’ may refer to bosons (e.g. photons) as well as fermions (e.g. electrons)–can follow any path from point A to point B, as illustrated below. Each of these paths is associated with a (complex-valued) probability amplitude, and we have to add all these probability amplitudes to arrive at the probability amplitude for the particle to move from A to B.


You can find great animations illustrating what it’s all about in the relevant Wikipedia article but, because I can’t upload video here, I’ll just insert two illustrations from Feynman’s 1985 QED, in which he does what I try to do, and that is to approach the topic intuitively, i.e. without too much mathematical formalism. So probability amplitudes are just ‘arrows’ (with a length and a direction, just like a complex number or a vector), and finding the resultant or final arrow is a matter of just adding all the little arrows to arrive at one big arrow: the total probability amplitude, which he denotes as P(A, B), as shown below.


This intuitive approach is great and actually goes a very long way in explaining complicated phenomena, such as iridescence for example (the wonderful patterns of color on an oil film!), or the partial reflection of light by glass (anything between 0 and 16%!). All his tricks make sense. For example, different frequencies are interpreted as slower or faster ‘stopwatches’ and, as such, they determine the final direction of the arrows which, in turn, explains why blue and red light are reflected differently. And so on and so on. It all works. […] Up to a point.

Indeed, Feynman does get in trouble when trying to explain diffraction. I’ve reproduced his explanation below. The key to the argument is the following:

  1. If we have a slit that’s very wide, there are a lot of possible paths for the photon to take. However, most of these paths cancel each other out, and so that’s why the photon is likely to travel in a straight line. Let me quote Feynman: “When the gap between the blocks is wide enough to allow many neighboring paths to P and Q, the arrows for the paths to P add up (because all the paths to P take nearly the same time), while the paths to Q cancel out (because those paths have a sizable difference in time). So the photomultiplier at Q doesn’t click.” (QED, p.54)
  2. However, “when the gap is nearly closed and there are only a few neighboring paths, the arrows to Q also add up, because there is hardly any difference in time between them, either (see Fig. 34). Of course, both final arrows are small, so there’s not much light either way through such a small hole, but the detector at Q clicks almost as much as the one at P! So when you try to squeeze light too much to make sure it’s going only in a straight line, it refuses to cooperate and begins to spread out.” (QED, p. 55)

Many arrows / Few arrows
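Feynman’s arrow-adding argument is easy to mimic numerically as well. In this sketch (the gap sizes, the off-axis angle and the 200 paths are my own choices), the ratio of the intensity at Q to that at P climbs toward 1 as the gap narrows:

```python
import cmath, math

lam = 1.0                 # wavelength (arbitrary units)
k = 2 * math.pi / lam     # wave number

def final_arrow(gap, sin_theta, n_paths=200):
    """Add the little arrows (unit phasors) for paths through the gap toward angle theta."""
    total = 0j
    for i in range(n_paths):
        y = -gap / 2 + gap * i / (n_paths - 1)   # where the path crosses the gap
        total += cmath.exp(1j * k * y * sin_theta)
    return total / n_paths

for gap in (20 * lam, 1 * lam, 0.2 * lam):
    p_click = abs(final_arrow(gap, 0.0)) ** 2    # detector P, straight ahead
    q_click = abs(final_arrow(gap, 0.3)) ** 2    # detector Q, off to the side
    print(f"gap = {gap:4.1f} wavelengths: Q/P = {q_click / p_click:.3f}")
```

With a wide gap, the arrows toward Q point every which way and cancel out; with a gap of a fraction of a wavelength, they all point in nearly the same direction, so Q clicks almost as much as P, just as Feynman describes.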

This explanation is as simple and intuitive as Feynman’s ‘explanation’ of diffraction using the Uncertainty Principle in his introductory chapter on quantum mechanics (Lectures, I-38-2), which is illustrated below. I won’t go into the detail (I’ve done that before) but you should note that, just like the explanation above, such explanations do not explain the secondary, tertiary, etc. bumps in the diffraction pattern.

Diffraction of electrons

So what’s wrong with these explanations? Nothing much. They’re simple and intuitive, but essentially incomplete, because they do not incorporate all of the math involved in interference. Incorporating the math means doing these integrals for

  1. Electromagnetic waves in classical mechanics: here we are talking ‘wave functions’ with some real-valued amplitude representing the strength of the electric and magnetic field; and
  2. Probability waves: these are complex-valued functions, with the complex-valued amplitude representing probability amplitudes.

The two should, obviously, yield the same result, but a detailed comparison between the approaches is quite complicated, it seems. Now, I’ve googled a lot of stuff, and I duly note that diffraction of electromagnetic waves (i.e. light) is conveniently analyzed by summing up complex-valued waves too and, moreover, they’re of the same familiar type: ψ = Ae^(i(kx–ωt)). However, these analyses also duly note that it’s only the real part of the wave that has an actual physical interpretation, and that such complex-valued wave functions are used (also) in classical mechanics only because working with natural exponentials (addition, multiplication, integration, differentiation, etc.) is much easier than working with sine and cosine waves. In fact, note the fine print in Feynman’s illustration of interference of physical waves (Fig. 37-2): he calculates the intensities I1 and I2 by taking the square of the absolute values of the amplitudes ĥ1 and ĥ2, and the hat indicates that we’re also talking some complex-valued wave function here.
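That ‘only the real part counts’ statement is easy to verify: adding complex exponentials and then taking the real part gives exactly the same result as adding the cosines directly (the amplitudes and phases below are arbitrary choices of mine):

```python
import cmath, math

# Two classical waves written as complex exponentials (arbitrary amplitudes/phases)
A1, phi1 = 1.0, 0.0
A2, phi2 = 0.7, 2.1

for x in (0.0, 0.5, 1.3):
    z = A1 * cmath.exp(1j * (x + phi1)) + A2 * cmath.exp(1j * (x + phi2))
    direct = A1 * math.cos(x + phi1) + A2 * math.cos(x + phi2)
    assert abs(z.real - direct) < 1e-12   # real part of the sum = sum of the cosines
print("taking the real part after adding exponentials = adding cosines")
```

That is precisely why the exponential notation is a harmless convenience in the classical analysis, while in quantum mechanics the full complex amplitude matters.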

Hence, we must be talking the same mathematical waves in both explanations, mustn’t we? In other words, we should get the same psi functions ψ = Ae^(i(kx–ωt)) in both explanations, shouldn’t we? Well… Maybe. But… Probably not. As far as I know–but I may be wrong–we cannot just re-normalize the E and B vectors in these electromagnetic waves in order to establish an equivalence with probability waves. I haven’t seen that being done (but I readily admit I still have a lot of reading to do) and so I must assume it’s not very clear-cut at all.

So what? Well… I don’t know. So far, I did not find a ‘nice’ or ‘intuitive’ quantum-mechanical explanation of the phenomenon of diffraction yielding the same grand diffraction equation, referred to as the Fresnel-Kirchhoff diffraction formula (see below), or one of its more comprehensible (because simplified) representations, such as the Fraunhofer diffraction formula, or the even easier formula which I used in my own post (you can google them: they’re somewhat less monstrous and–importantly–they work with real numbers only, which makes them easier to understand).

Kirchhoff formula

[…] That looks pretty daunting, doesn’t it? You may start to understand it a bit better by noting that (n, r) and (n, s) are angles, so they are fine inside a cosine function. The other variables also have fairly standard interpretations, as shown below, but… Admit it: ‘easy’ is something else, isn’t it?


So… Where are we here? Well… As said, I trust that both explanations are mathematically equivalent – just like matrix and wave mechanics :-) –and, hence, that a quantum-mechanical analysis will indeed yield the same formula. However, I think I’ll only understand physics truly if I’ve gone through all of the motions here.

Well then… I guess that should be some kind of personal benchmark to guide me on this journey, shouldn’t it? :-) I’ll keep you posted.

Post scriptum: To be fair to Feynman, and demonstrating his talent as a teacher once again, he actually acknowledges that the double-slit thought experiment uses simplified assumptions that do not include diffraction effects when the electrons go through the slit(s). He does so, however, only in one of the first chapters of Vol. III of the Lectures, where he comes back to the experiment to further discuss the first principles of quantum mechanics. I’ll just quote him: “Incidentally, we are going to suppose that the holes 1 and 2 are small enough that when we say an electron goes through the hole, we don’t have to discuss which part of the hole. We could, of course, split each hole into pieces with a certain amplitude that the electron goes to the top of the hole and the bottom of the hole and so on. We will suppose that the hole is small enough so that we don’t have to worry about this detail. That is part of the roughness involved; the matter can be made more precise, but we don’t want to do so at this stage.” So here he acknowledges that he omitted the intricacies of diffraction. I noted this only later. Sorry.

A Royal Road to quantum physics?

It is said that, when Ptolemy asked Euclid for a quick way to learn geometry, Euclid told the King that there was no ‘Royal Road’ to it, by which he meant that it’s just difficult and takes a lot of time to understand.

Physicists will tell you the same about quantum physics. So, I know that, at this point, I should just study Feynman’s third Lectures Volume and shut up for a while. However, before I get lost while playing with state vectors, S-matrices, eigenfunctions, eigenvalues and what have you, I’ll try that Royal Road anyway, building on my previous digression on Hamiltonian mechanics.

So… What was that about? Well… If you understood anything from my previous post, it should be that both the Lagrangian and Hamiltonian functions use the equations for kinetic and potential energy to derive the equations of motion for a system. The key difference between the Lagrangian and Hamiltonian approach is that the Lagrangian approach yields one (second-order) differential equation, which has to be solved to yield a functional form for x as a function of time, while the Hamiltonian approach yields two (first-order) differential equations, which have to be solved to yield a functional form for both position (x) and momentum (p). In other words, Lagrangian mechanics is a model that focuses on the position variable(s) only, while, in Hamiltonian mechanics, we also keep track of the momentum variable(s). Let me briefly explain the procedure again, so we’re clear on it:

1. We write down a function referred to as the Lagrangian function. The function is L = T – V with T and V the kinetic and potential energy respectively. T has to be expressed as a function of velocity (v) and V has to be expressed as a function of position (x). You’ll say: of course! However, it is an important point to note, otherwise the following step doesn’t make sense. So we take the equations for kinetic and potential energy and combine them to form a function L = L(x, v).

2. We then calculate the so-called Lagrangian equation, in which we use that function L. To be precise: what we have to do is calculate its partial derivatives and insert these in the following equation:

d/dt(∂L/∂v) – ∂L/∂x = 0
It should be obvious now why I stressed we should write L as a function of velocity and position, i.e. as L = L(x, v). Otherwise those partial derivatives don’t make sense. As to where this equation comes from, don’t worry about it: I did not explain why this works. I didn’t do that here, and I also didn’t do it in my previous post. What we’re doing here is just explaining how it goes, not why.

3. If we’ve done everything right, we should get a second-order differential equation which, as mentioned above, we should then solve for x(t). That’s what ‘solving’ a differential equation is about: find a functional form that satisfies the equation.
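To make the three steps concrete, take the standard textbook example of a mass on a spring (my example, not one discussed above): L = mv²/2 – kx²/2, the Lagrangian equation gives m·x″ + k·x = 0, and x(t) = A·cos(ωt) with ω = (k/m)^(1/2) solves it. A quick numerical check:

```python
import math

# Hypothetical mass on a spring: L = T - V = m*v^2/2 - k*x^2/2
m, k_spring, A = 2.0, 8.0, 0.5
omega = math.sqrt(k_spring / m)   # the Lagrangian equation gives m*x'' = -k*x

def x(t):
    return A * math.cos(omega * t)   # candidate solution of the equation of motion

# Check it against d/dt(dL/dv) - dL/dx = m*x'' + k*x = 0, using finite differences
h = 1e-5
for t in (0.1, 0.7, 2.3):
    x_dd = (x(t + h) - 2 * x(t) + x(t - h)) / h**2   # numerical second derivative
    residual = m * x_dd + k_spring * x(t)
    assert abs(residual) < 1e-4
print("x(t) = A*cos(omega*t) satisfies the Lagrangian equation of motion")
```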

Let’s now look at the Hamiltonian approach.

1. We write down a function referred to as the Hamiltonian function. It looks similar to the Lagrangian, except that we sum kinetic and potential energy, and that T has to be expressed as a function of the momentum p. So we have a function H = T + V = H(x, p).

2. We then calculate the so-called Hamiltonian equations, which is a set of two equations, rather than just one equation. [We have two for the one-dimensional situation that we are modeling here: it’s a different story (i.e. we will have more equations) if we’d have more degrees of freedom of course.] It’s the same as in the Lagrangian approach: it’s just a matter of calculating partial derivatives, and insert them in the equations below. Again, note that I am not explaining why this Hamiltonian hocus-pocus actually works. I am just saying how it works.

Hamiltonian equations: dx/dt = ∂H/∂p and dp/dt = –∂H/∂x

3. If we’ve done everything right, we should get two first-order differential equations which we should then solve for x(t) and p(t). Now, solving a set of equations may or may not be easy, depending on your point of view. If you wonder how it’s done, there’s excellent stuff on the Web that will show you how (such as, for instance, Paul’s Online Math Notes).
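For the same mass-on-a-spring example (again mine, not the author’s), Hamilton’s two equations are dx/dt = ∂H/∂p = p/m and dp/dt = –∂H/∂x = –kx, and they can be solved numerically step by step:

```python
import math

# H = T + V = p^2/(2m) + k*x^2/2 for a hypothetical mass on a spring
m, k_spring = 2.0, 8.0
x, p = 1.0, 0.0               # initial position and momentum
dt, steps = 1e-4, 20000       # integrate up to t = 2.0

# Hamilton's equations: dx/dt = dH/dp = p/m and dp/dt = -dH/dx = -k*x
for _ in range(steps):
    x += dt * p / m           # semi-implicit (symplectic) Euler step
    p -= dt * k_spring * x

t = dt * steps
omega = math.sqrt(k_spring / m)
print(x, math.cos(omega * t))   # numeric x(t) vs analytic cos(omega*t): both ~cos(2t)
```

Note that we now track x(t) and p(t) together, which is exactly the point of the Hamiltonian formulation.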

Now, I mentioned in my previous post that the Hamiltonian approach to modeling mechanics is very similar to the approach that’s used in quantum mechanics and that it’s therefore the preferred approach in physics. I also mentioned that, in classical physics, position and momentum are also conjugate variables, and I also showed how we can calculate the momentum as a conjugate variable from the Lagrangian: p = ∂L/∂v. However, I did not dwell on what conjugate variables actually are in classical mechanics. I won’t do that here either. Just accept that conjugate variables, in classical mechanics, are also defined as pairs of variables. They’re not related through some uncertainty relation, like in quantum physics, but they’re related because they can both be obtained as the derivatives of a function which I haven’t introduced as yet. That function is referred to as the action, but… Well… Let’s resist the temptation to digress any further here. If you really want to know what action is–in physics, that is… :-) Well… Google it, I’d say. What you should take home from this digression is that position and momentum are also conjugate variables in classical mechanics.

Let’s now move on to quantum mechanics. You’ll see that the ‘similarity’ in approach is… Well… Quite relative, I’d say. :-)

Position and momentum in quantum mechanics

As you know by now (I wrote at least a dozen posts on this), the concept of position and momentum in quantum mechanics is very different from that in classical physics: we do not have x(t) and p(t) functions which give a unique, precise and unambiguous value for x and p when we assign a value to the time variable and plug it in. No. What we have in quantum physics is some weird wave function, denoted by the Greek letters φ (phi) or ψ (psi) or, using Greek capitals, Φ and Ψ. To be more specific, the psi usually denotes the wave function in the so-called position space (so we write ψ = ψ(x)), and the phi will usually denote the wave function in the so-called momentum space (so we write φ = φ(p)). That sounds more complicated than it is, obviously, but I just wanted to respect terminology here. Finally, note that the ψ(x) and φ(p) wave functions are related through the Uncertainty Principle: x and p are conjugate variables, and we have this ΔxΔp ≥ ħ/2 inequality, in which the Δ is some standard deviation around some mean value. I should not go into more detail here: you know that by now, don’t you?

While the argument of these functions is some real number, the wave functions themselves are complex-valued, so they have a real and an imaginary part. I’ve also illustrated that a couple of times already but, just to make sure, take a look at the animation below, so you know what we are sort of talking about:

  1. The A and B situations represent a classical oscillator: we know exactly where the red ball is at any point in time.
  2. The C to H situations give us a complex-valued amplitude, with the blue oscillation as the real part, and the pink oscillation as the imaginary part.

QuantumHarmonicOscillatorAnimation

So we have such a wave function both for x and p. Note that the animation above suggests we’re only looking at the wave function for x but–trust me–we have a similar one for p, and they’re related indeed. [To see how exactly, I’d advise you to go through the proof of the so-called Kennard inequality.] So… What do we do with that?

The position and momentum operators

When we want to know where a particle actually is, or what its momentum is, we need to do something with this wave function ψ or φ. Let’s focus on the position variable first. While the wave function itself is said to have ‘no physical interpretation’ (frankly, I don’t know what that means: I’d think everything has some kind of interpretation (and what’s physical and non-physical?), but let’s not get lost in philosophy here), we know that the square of the absolute value of the probability amplitude yields a probability density. So |ψ(x)|² gives us a probability density function or, to put it simply, the probability to find our ‘particle’ (or ‘wavicle’ if you want) at point x. Let’s now do something more sophisticated and write down the expected value of x, which is usually denoted by 〈x〉 (although that invites confusion with Dirac’s bra-ket notation, but don’t worry about it):

expected value of x

Don’t panic. It’s just an integral. Look at it. ψ* is just the complex conjugate (i.e. a – ib if ψ = a + ib) and you will (or should) remember that the product of a complex number with its (complex) conjugate gives us the square of its absolute value: ψ*ψ = |ψ(x)|². What about that x? Can we just insert that there, in-between ψ* and ψ ? Good question. The answer is: yes, of course! That x is just some real number and we can put it anywhere. However, it’s still a good question because, while multiplication of complex numbers is commutative (hence, z₁z₂ = z₂z₁), the order of our operators – which we will introduce soon – can often not be changed without consequences, so it is something to note.

For the rest, that integral above is quite obvious and it should really not puzzle you: we just multiply a value with its probability of occurring and integrate over the whole domain to get an expected value 〈x〉. Nothing wrong here. Note that we get some real number. [You’ll say: of course! However, I always find it useful to check that when looking at those things mixing complex-valued functions with real-valued variables or arguments. A quick check on the dimensions of what we’re dealing helps greatly in understanding what we’re doing.]
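Here’s what that integral looks like numerically for a simple example: a (real-valued, for simplicity) normalized Gaussian wave function centered at x₀ = 1.5 (my own choice). The expected value comes out at the center, as it should:

```python
import math

# Hypothetical normalized Gaussian: psi(x) = (pi*s^2)^(-1/4) * exp(-(x-x0)^2/(2*s^2))
x0, s = 1.5, 0.7

def psi(x):
    return (math.pi * s**2) ** -0.25 * math.exp(-(x - x0) ** 2 / (2 * s**2))

# <x> = integral of psi* x psi dx, done here with a simple Riemann sum
dx = 1e-3
total_prob, x_expect = 0.0, 0.0
x = -10.0
while x < 10.0:
    density = psi(x) ** 2        # psi is real here, so psi* psi = psi^2
    total_prob += density * dx
    x_expect += x * density * dx
    x += dx

print(total_prob)   # ~1.0: the probabilities add up to one
print(x_expect)     # ~1.5: the expected value sits at the center x0
```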

So… You’ve surely heard about the position and momentum operators already. Is that, then, what it is? Doing some integral on some function to get an expected value? Well… No. But there’s a relation. However, let me first make a remark on notation, because that can be quite confusing. The position operator is usually written with a hat on top of the variable – like ẑ – but the editor tool for this blog doesn’t offer a hat for every letter and, hence, I’ll use the bold letters x and p to denote the operators. Don’t confuse them with my use of bold letters for vectors though! Now, back to the story.

Let’s first give an example of an operator you’re already familiar with in order to understand what an operator actually is. To put it simply: an operator is an instruction to do something with a function. For example: ∂/∂t is an instruction to differentiate some function with regard to the variable t (which usually stands for time). The ∂/∂t operator is obviously referred to as a differentiation operator. When we put a function behind, e.g. f(x, t), we get ∂f(x, t)/∂t, which is just another function in x and t.

So we have the same here: x in itself is just an instruction: you need to put a function behind in order to get some result. So you’ll see it as xψ. In fact, it would be useful to use brackets probably, like x[ψ], especially because I can’t put those hats on the letters here, but I’ll stick to the usual notation, which does not use brackets.

Likewise, we have a momentum operator: p = –iħ∂/∂x. […] Let it sink in. [..]

What’s this? Don’t worry about it. I know: that looks like a very different animal than that x operator. I’ll explain later. Just note, for the moment, that the momentum operator (also) involves a (partial) derivative and, hence, we refer to it as a differential operator (as opposed to differentiation operator). The instruction p = –iħ∂/∂x basically means: differentiate the function with regard to x and multiply the result with –iħ (i.e. minus the product of Planck’s reduced constant and the imaginary unit i). Nothing wrong with that. Just calculate a derivative and multiply with a tiny imaginary (complex) number.

Now, back to the position operator x. As you can see, that’s a very simple operator–much simpler than the momentum operator in any case. The position operator applied to ψ yields, quite simply, the xψ(x) factor in the integrand above. So we just get a new function xψ(x) when we apply x to ψ, of which the values are simply the product of x and ψ(x). Hence, we write xψ = xψ.

Really? Is it that simple? Yes. For now at least. :-)

Back to the momentum operator. Where does that come from? That story is not so simple. [Of course not. It can’t be. Just look at it.] Because we have to avoid talking about eigenvalues and all that, my approach to the explanation will be quite intuitive. [As for ‘my’ approach, let me note that it’s basically the approach as used in the Wikipedia article on it. :-)] Just stay with me for a while here.

Let’s assume ψ is given by ψ = e^(i(kx–ωt)). So that’s a nice periodic function, albeit complex-valued. Now, we know that functional form doesn’t make all that much sense because it corresponds to the particle being everywhere, because the square of its absolute value is some constant. In fact, we know it doesn’t even respect the normalization condition: all probabilities have to add up to 1. However, that being said, we also know that we can superimpose an infinite number of such waves (all with different k and ω) to get a more localized wave train, and then re-normalize the result to make sure the normalization condition is met. Hence, let’s just go along with this idealized example and see where it leads.

We know the wave number k (i.e. its ‘frequency in space’, as it’s often described) is related to the momentum p through the de Broglie relation: p = ħk. [Again, you should think about a whole bunch of these waves and, hence, some spread in k corresponding to some spread in p, but just go along with the story for now and don’t try to make it even more complicated.] Now, if we differentiate with regard to x, and then substitute, we get ∂ψ/∂x = ∂e^(i(kx–ωt))/∂x = ik·e^(i(kx–ωt)) = ikψ, or

∂ψ/∂x = (ip/ħ)ψ
So what is this? Well… On the left-hand side, we have the (partial) derivative of a complex-valued function (ψ) with regard to x. Now, that derivative is, more likely than not, also some complex-valued function. And if you don’t believe me, just look at the right-hand side of the equation, where we have that i and ψ. In fact, the equation just shows that, when we take that derivative, we get our original function ψ but multiplied by ip/ħ. Hey! We’ve got a differential equation here, don’t we? Yes. And the solution for it is… Well… The natural exponential. Of course! That should be no surprise because we started out with a natural exponential as functional form! So that’s not the point. What is the point, then? Well… If we multiply both sides by ħ/i = –iħ, we get:

–iħ(∂ψ/∂x) = pψ

[If you’re confused about the –i, remember that 1/i = i⁻¹ = –i.] So… We’ve got pψ on the right-hand side now. So… Well… That’s like xψ, isn’t it? Yes. :-) If we define the momentum operator as p = –iħ(∂/∂x), then we get pψ = pψ. So that’s the same thing as for the position operator. It’s just that p is… Well… A more complex operator, as it has that –iħ factor in it. And, yes, of course it also involves an instruction to differentiate, which also sets it apart from the position operator, which is just an instruction to multiply the function with its argument.
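For what it’s worth, we can let a computer algebra system check this little derivation. The snippet below is just a sketch of mine in Python’s sympy (my choice of tool, not anything from the physics itself): it applies p = –iħ(∂/∂x) to our wave function ψ = e^(i(kx–ωt)) and confirms that we get back ħk·ψ, i.e. p·ψ, exactly as the de Broglie relation p = ħk requires.

```python
import sympy as sp

x, t, k, w, hbar = sp.symbols('x t k omega hbar', positive=True)
psi = sp.exp(sp.I * (k * x - w * t))  # our idealized wave function

# The momentum operator: take the derivative w.r.t. x, multiply by -i*hbar
p_psi = -sp.I * hbar * sp.diff(psi, x)

# The factor that pops out should be hbar*k, i.e. p (de Broglie relation)
assert sp.simplify(p_psi / psi) == hbar * k
```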

I am sure you’ll find this a funny – perhaps even fishy – business. And, yes, I have the same questions: what does it all mean? I can’t answer that here. As for now, just accept that this position and momentum operator are what they are, and that I can’t do anything about that. But… I hear you sputter: what about their interpretation? Well… Sorry… I could say that the functions xψ and pψ are so-called linear maps but that is not likely to help you much in understanding what these operators really do. You – and I for sure :-) – will indeed have to go through that story of eigenvalues to get to a somewhat deeper understanding of what these operators actually are. That’s just how it is. As for now, I just have to move on. Sorry for letting you down here. :-)

Energy operators

Now that we sort of ‘understand’ those position and momentum operators (or their mathematical form at least), it’s time to introduce the energy operators. Indeed, in quantum mechanics, we’ve also got an operator for (a) kinetic energy, and for (b) potential energy. These operators are also denoted with a hat above the T and V symbol. All quantum-mechanical operators are like that, it seems. However, because of the limitations of the editor tool here, I’ll also use a bold T and V respectively. Now, I am sure you’ve had enough of these operators, so let me just jot them down:

  1. V = V, so that’s just an instruction to multiply a function with V = V(x, t). That’s easy enough because that’s just like the position operator.
  2. As for T, that’s more complicated. It involves that momentum operator p, which was also more complicated, remember? Let me just give you the formula:

T = p·p/2m = p²/2m.

So we multiply the operator p with itself here. What does that mean? Well… Because the operator involves a derivative, it means we have to take the derivative twice and… No ! Well… Let me correct myself: yes and no. :-) That p·p product is, strictly speaking, a dot product between two vectors, and so it’s not just a matter of differentiating twice. Now that we are here, we may just as well extend the analysis a bit and assume that we also have a y and z coordinate, so we’ll have a position vector r = (x, y, z). [Note that r is a vector here, not an operator. !?! Oh… Well…] Extending the analysis to three (or more) dimensions means that we should replace the differentiation operator by the so-called gradient or del operator: ∇ = (∂/∂x, ∂/∂y, ∂/∂z). And now that dot product p·p will, among other things, yield another operator which you’re surely familiar with: the Laplacian. Let me remind you of it:

∇² = ∇·∇ = ∂²/∂x² + ∂²/∂y² + ∂²/∂z²
Hence, we can write the kinetic energy operator T as:

Kinetic energy operator: T = –(ħ²/2m)∇²

I quickly copied this formula from Wikipedia, which doesn’t have the limitation of the WordPress editor tool, and so you see it now the way you should see it, i.e. with the hat notation. :-)
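Again, just as a sanity check of mine and not as part of the ‘official’ story: applying T = –(ħ²/2m)∇² to our one-dimensional plane wave should give (ħk)²/2m times ψ, i.e. (p²/2m)ψ. A quick sympy sketch confirms it:

```python
import sympy as sp

x, t, k, w, hbar, m = sp.symbols('x t k omega hbar m', positive=True)
psi = sp.exp(sp.I * (k * x - w * t))

# Kinetic energy operator in one dimension: -(hbar^2/2m) * d^2/dx^2
T_psi = -hbar**2 / (2 * m) * sp.diff(psi, x, 2)

# The factor that comes out is (hbar*k)^2/2m = p^2/2m, as expected
assert sp.simplify(T_psi / psi) == hbar**2 * k**2 / (2 * m)
```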


In case you’re despairing, hang on! We’re almost there. :-) We can, indeed, now define the Hamiltonian operator that’s used in quantum mechanics. While the Hamiltonian function was the sum of the potential and kinetic energy functions in classical physics, in quantum mechanics we add the two energy operators. You’ll grumble and say: that’s not the same as adding energies. And you’re right: adding operators is not the same as adding energy functions. Of course it isn’t. :-) But just stick to the story, please, and stop criticizing. [Oh – just in case you wonder where that minus sign comes from: i² = –1, of course.]

Adding the two operators together yields the following:

Hamiltonian operator: H = T + V = –(ħ²/2m)∇² + V

So. Yes. That’s the famous Hamiltonian operator.

OK. So what?

Yes…. Hmm… What do we do with that operator? Well… We apply it to the function and so we write Hψ = … Hmm…

Well… What? 

Well… I am not writing this post just to give some definitions of the type of operators that are used in quantum mechanics and then just do obvious stuff by writing it all out. No. I am writing this post to illustrate how things work.

OK. So how does it work then? 

Well… It turns out that, in quantum mechanics, we have similar equations as in classical mechanics. Remember that I just wrote down the set of (two) differential equations when discussing Hamiltonian mechanics? Here I’ll do the same. The Hamiltonian operator appears in an equation you’ve surely heard of and which, just like me, you’d love to understand–and then I mean: understand it fully, completely, and intuitively. […] Yes. It’s the Schrödinger equation:

iħ(∂ψ/∂t) = Hψ

Note, once again, I am not saying anything about where this equation comes from. It’s like jotting down that Lagrange equation, or the set of Hamiltonian equations: I am not saying anything about the why of all this hocus pocus. I am just saying how it goes. So we’ve got another differential equation here, and we have to solve it. If we write it all out using the above definition of the Hamiltonian operator, we get:

iħ(∂ψ/∂t) = –(ħ²/2μ)∇²ψ + Vψ

If you’re still with me, you’ll immediately wonder about that μ. Well… Don’t. It’s the mass really, but the so-called reduced mass. Don’t worry about it. Just google it if you want to know more about this concept of a ‘reduced’ mass: it’s a fine point which doesn’t matter here really. The point is the grand result.

But… So… What is the grand result? What are we looking at here? Well… Just as I said above: that Schrödinger equation is a differential equation, just like those equations we got when applying the Lagrangian and Hamiltonian approach to modeling a dynamic system in classical mechanics, and, hence, just like what we (were supposed to) do there, we have to solve it. :-) Of course, it looks much more daunting than our Lagrangian or Hamiltonian differential equations, because we’ve got complex-valued functions here, and you’re probably scared of that iħ factor too. But you shouldn’t be. When everything is said and done, we’ve got a differential equation here that we need to solve for ψ. In other words, we need to find functional forms for ψ that satisfy the above equation. That’s it. Period.
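To make that a bit more tangible, here’s a small sympy sketch of mine (with V = 0, i.e. a free particle, purely for illustration): plugging our plane wave ψ = e^(i(kx–ωt)) into the Schrödinger equation shows that it is a solution precisely when ħω = ħ²k²/2m, i.e. when E = p²/2m.

```python
import sympy as sp

x, t = sp.symbols('x t', real=True)
k, w, hbar, m = sp.symbols('k omega hbar m', positive=True)
psi = sp.exp(sp.I * (k * x - w * t))

# Schrödinger equation with V = 0: i*hbar dpsi/dt = -(hbar^2/2m) d^2psi/dx^2
lhs = sp.I * hbar * sp.diff(psi, t)            # yields hbar*omega * psi
rhs = -hbar**2 / (2 * m) * sp.diff(psi, x, 2)  # yields (hbar^2 k^2/2m) * psi

# The residual vanishes exactly when hbar*omega = hbar^2*k^2/(2m)
residual = sp.simplify((lhs - rhs) / psi)
assert sp.simplify(residual - (hbar * w - hbar**2 * k**2 / (2 * m))) == 0
```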

So what do these solutions look like? Well, they look like those complex-valued oscillating things in the very first animation above. Let me copy them again:


So… That’s it then? Yes. I won’t say anything more about it here, because (1) this post has become way too long already, and so I won’t dwell on the solutions of that Schrödinger equation, and because (2) I do feel it’s about time I really start doing what it takes, and that’s to work on all of the math that’s necessary to actually do all that hocus-pocus. :-)

P.S.: As for understanding the Schrödinger equation “fully, completely, and intuitively”, I am not sure that’s actually possible. But I am trying hard and so let’s see. :-) I’ll tell you after I’ve mastered the math. But something inside of me tells me there’s indeed no Royal Road to it. :-)

Newtonian, Lagrangian and Hamiltonian mechanics

This is just another loose end I wanted to tie up. As an economist, I thought I knew a thing or two about optimization. Indeed, when everything is said and done, optimization is supposed to be an economist’s forte, isn’t it? :-) Hence, I thought I sort of understood what a Lagrangian would represent in physics, and I also thought I sort of intuitively understood why and how it could be used to model the behavior of a dynamic system. In short, I thought that Lagrangian mechanics would be all about optimizing something subject to some constraints. Just like in economics, right?

[…] Well… When checking it out, I found that the answer is: yes, and no. And, frankly, the honest answer is more no than yes. :-) Economists (like me), and all social scientists (I’d think), learn only about one particular type of Lagrangian equations: the so-called Lagrange equations of the first kind. This approach models constraints as equations that are to be incorporated in an objective function (which is also referred to as a Lagrangian–and that’s where the confusion starts because it’s different from the Lagrangian that’s used in physics, which I’ll introduce below) using so-called Lagrange multipliers. If you’re an economist, you’ll surely remember it: it’s a problem written as “maximize f(x, y) subject to g(x, y) = c”, and we solve it by finding the so-called stationary points (i.e. the points for which the derivative is zero) of the (Lagrangian) objective function f(x, y) + λ[g(x, y) – c].

Now, it turns out that, in physics, they use so-called Lagrange equations of the second kind, which incorporate the constraints directly by what Wikipedia refers to as a “judicious choice of generalized coordinates.”

Generalized coordinates? Don’t worry about it: while generalized coordinates are defined formally as “parameters that describe the configuration of the system relative to some reference configuration”, they are, in practice, those coordinates that make the problem easy to solve. For example, for a particle (or point) that moves on a circle, we’d not use the Cartesian coordinates x and y but just the angle that locates the particle (or point). That simplifies matters because then we have only one parameter to track. In practice, the number of parameters (i.e. the number of generalized coordinates) will be defined by the number of degrees of freedom of the system, and we know what that means: it’s the number of independent directions in which the particle (or point) can move, and that usually includes not only the x, y and z directions but also rotational and/or vibratory movements. We went over that when discussing kinetic gas theory, so I won’t say more about that here.

So… OK… That was my first surprise: the physicist’s Lagrangian is different from the social scientist’s Lagrangian. 

The second surprise was that all physics textbooks seem to dislike the Lagrangian approach. Indeed, they opt for a related but different function when developing a model of a dynamic system: it’s a function referred to as the Hamiltonian. And, no, the preference for the Hamiltonian approach has nothing to do with the fact that William Rowan Hamilton was Anglo-Irish, while Joseph-Louis Lagrange (born as Giuseppe Lodovico Lagrangia) was Italian-French. :-)

The modeling approach which uses the Hamiltonian instead of the Lagrangian is, of course, referred to as Hamiltonian mechanics.

And then we have good old Newtonian mechanics as well, obviously. In case you wonder what that is: it’s the modeling approach that we’ve been using all along. :-) But I’ll remind you of what it is in a moment: it amounts to making sense of some situation by using Newton’s laws of motion only, rather than any sophisticated mathematical system of equations.

Introducing Lagrangian and Hamiltonian mechanics is quite confusing because the functions that are involved (i.e. the so-called Lagrangian and Hamiltonian functions) look very similar: we write the Lagrangian as the difference between the kinetic and potential energy of a system (L = T – V), while the Hamiltonian is the sum of both (H = T + V). Now, I could make this post very simple and just ask you to note that both approaches are basically ‘equivalent’ (in the sense that they lead to the same solutions, i.e. the same equations of motion expressed as a function of time) and that a choice between them is just a matter of preference–like choosing between an English versus a continental breakfast. :-) [I note the English breakfast has usually some extra bacon, or a sausage, so you get more but not necessarily better.] So that would be the end of this digression then, and I should be done. However, I must assume you’re a curious person, just like me, and, hence, you’ll say that, while being ‘equivalent’, they’re obviously not the same. So how do the two approaches differ exactly?

Let’s try to get a somewhat intuitive understanding of it all by taking, once again, the example of a simple harmonic oscillator, as depicted below. Our example will, in fact, be that of an oscillating mass on a spring. Let’s also assume there’s no damping, because that makes the analysis soooooooo much easier: we can then just express everything as a function of one variable only, time or position, instead of having to keep track of both.


Of course, we already know all of the relevant equations for this system just from applying Newton’s laws (so that’s Newtonian mechanics). We did that in a previous post. [I can’t remember which one, but I am sure I’ve done this already.] Hence, we don’t really need the Lagrangian or Hamiltonian. But, of course, that’s the point of this post: I want to illustrate how these other approaches to modeling a dynamic system actually work, and so it’s good we have the correct answer already so we can make sure we’re not going off track here. So… Let’s go… :-)

I. Newtonian mechanics

Let me recapitulate the basics of a mass on a spring which, in jargon, is called a harmonic oscillator. Hooke’s law is there: the force on the mass is proportional to its distance from the zero point (i.e. the displacement), and the direction of the force is towards the zero point–not away from it, and so we have a minus sign. In short, we can write:

F = –kx (i.e. Hooke’s law)

Now, Newton‘s Law (Newton’s second law to be precise) says that F is equal to the mass times the acceleration: F = ma. So we write:

F = ma = m(d²x/dt²) = –kx

So that’s just Newton’s law combined with Hooke’s law. We know this is a differential equation for which there’s a general solution with the following form:

x(t) = Acos(ωt + α)

If you wonder why… Well… I can’t digress on that here again: just note, from that differential equation, that we apparently need a function x(t) that yields itself, times a negative constant, when differentiated twice. So that must be some sinusoidal function, like sine or cosine, because these do that. […] OK… Sorry, but I must move on.

As for the new ‘variables’ (A, ω and α), A depends on the initial condition and is the (maximum) amplitude of the motion. We also already know from previous posts (or, more likely, because you did know something about physics before reading this) that A is related to the energy of the system. To be precise: the energy of the system is proportional to the square of the amplitude: E ∝ A². As for ω, the angular frequency, that’s determined by the spring itself and the oscillating mass on it: ω = (k/m)^(1/2) = 2π/T = 2πf (with T the period, and f the frequency expressed in oscillations per second, as opposed to the angular frequency, which is the frequency expressed in radians per second). Finally, I should note that α is just a phase shift which depends on how we define our t = 0 point: if x(t) is zero at t = 0, then that cosine function should be zero and then α will be equal to ±π/2.
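Don’t take my word for it: the snippet below (sympy again, purely as an illustration of mine) verifies that x(t) = Acos(ωt + α), with ω = √(k/m), does indeed satisfy m(d²x/dt²) = –kx.

```python
import sympy as sp

t, A, alpha, k, m = sp.symbols('t A alpha k m', positive=True)
omega = sp.sqrt(k / m)
x = A * sp.cos(omega * t + alpha)  # the proposed general solution

# Newton + Hooke: m*x'' + k*x should vanish identically
assert sp.simplify(m * sp.diff(x, t, 2) + k * x) == 0
```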

OK. That’s clear enough. What about the ‘operational currency of the universe’, i.e. the energy of the oscillator? Well… I told you already, and we don’t need the energy concept here to find the equation of motion. In fact, that’s what distinguishes this ‘Newtonian’ approach from the Lagrangian and Hamiltonian approach. But… Now that we’re at it, and we have to move to a discussion of these two animals (I mean the Lagrangian and Hamiltonian), let’s go for it.

We have kinetic versus potential energy. Kinetic energy (T) is what it always is. It depends on the velocity and the mass: K.E. = T = mv²/2 = m(dx/dt)²/2 = p²/2m. Huh? What’s this expression with p in it? […] It’s momentum: p = mv. Just check it: it’s an alternative formula for T really. Nothing more, nothing less. I am just noting it here because it will pop up again in our discussion of the Hamiltonian modeling approach. But that’s for later. Onwards!

What about potential energy (V)? We know that’s equal to V = kx²/2. And because energy is conserved, potential energy (V) and kinetic energy (T) should add up to some constant. Let’s check it: dx/dt = d[Acos(ωt + α)]/dt = –Aωsin(ωt + α). [Please do the derivation: don’t accept things at face value. :-)] Hence, T = mA²ω²sin²(ωt + α)/2 = mA²(k/m)sin²(ωt + α)/2 = kA²sin²(ωt + α)/2. Now, V is equal to V = kx²/2 = k[Acos(ωt + α)]²/2 = kA²cos²(ωt + α)/2. Adding both yields:

T + V = kA²sin²(ωt + α)/2 + kA²cos²(ωt + α)/2

= (1/2)kA²[sin²(ωt + α) + cos²(ωt + α)] = kA²/2.

Uff! Glad that seems to work out: the total energy is, indeed, proportional to the square of the amplitude and the constant of proportionality is equal to k/2. [You should now wonder why we do not have m in this formula but, if you’d think about it, you can answer your own question: the amplitude will depend on the mass (bigger mass, smaller amplitude, and vice versa), so it’s actually in the formula already.]
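The algebra above is easy enough, but here’s a sympy check of the grand result anyway (an illustration of mine, not part of the text): T + V simplifies to kA²/2, whatever the value of t.

```python
import sympy as sp

t, A, alpha, k, m = sp.symbols('t A alpha k m', positive=True)
omega = sp.sqrt(k / m)
x = A * sp.cos(omega * t + alpha)
v = sp.diff(x, t)  # velocity: -A*omega*sin(omega*t + alpha)

T = m * v**2 / 2   # kinetic energy
V = k * x**2 / 2   # potential energy

# Total energy is constant: T + V = k*A^2/2, independent of t
assert sp.simplify(T + V - k * A**2 / 2) == 0
```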

The point to note is that this Hamiltonian function H = T + V is just a constant, not only for this particular case (an oscillation without damping), but in all cases where H represents the total energy of a (closed) system: H = T + V = kA²/2.

OK. That’s clear enough. What does our Lagrangian look like? That’s not a constant obviously. Just so you can visualize things, I’ve drawn the graph below:

  1. The red curve represents kinetic energy (T) as a function of the displacement x: T is zero at the turning points, and reaches a maximum at the x = 0 point.
  2. The blue curve is potential energy (V): unlike T, V reaches a maximum at the turning points, and is zero at the x = 0 point. In short, it’s the mirror image of the red curve.
  3. The Lagrangian is the green graph: L = T – V. Hence, L reaches a minimum at the turning points, and a maximum at the x = 0 point.


While that green function would make an economist think of some Lagrangian optimization problem, it’s worth noting we’re not doing any such thing here: we’re not interested in stationary points. We just want the equation(s) of motion. [I just thought that would be worth stating, in light of my own background and confusion in regard to it all. :-)]

OK. Now that we have an idea of what the Lagrangian and Hamiltonian functions are (it’s probably worth noting also that we do not have a ‘Newtonian function’ of some sort), let us now show how these ‘functions’ are used to solve the problem. What problem? Well… We need to find some equation for the motion, remember? [I find that, in physics, I often have to remind myself of what the problem actually is. Do you feel the same? :-) ] So let’s go for it.

II. Lagrangian mechanics

As this post should not turn into a chapter of some math book, I’ll just describe the how, i.e. I’ll just list the steps one should take to model and then solve the problem, and illustrate how it goes for the oscillator above. Hence, I will not try to explain why this approach gives the correct answer (i.e. the equation(s) of motion). So if you want to know why rather than how, then just check it out on the Web: there’s plenty of nice stuff on math out there.

The steps that are involved in the Lagrangian approach are the following:

  1. Compute (i.e. write down) the Lagrangian function L = T – V. Hmm? How do we do that? There’s more than one way to express T and V, isn’t there? Right you are! So let me clarify: in the Lagrangian approach, we should express T as a function of velocity (v) and V as a function of position (x), so your Lagrangian should be L = L(x, v). Indeed, if you don’t pick the right variables, you’ll get nowhere. So, in our example, we have L = mv²/2 – kx²/2.
  2. Compute the partial derivatives ∂L/∂x and ∂L/∂v. So… Well… OK. Got it. Now that we’ve written L using the right variables, that’s a piece of cake. In our example, we have: ∂L/∂x = – kx and ∂L/∂v = mv. Please note how we treat x and v as independent variables here. It’s obvious from the use of the symbol for partial derivatives: ∂. So we’re not taking any total differential here or so. [This is an important point, so I’d rather mention it.]
  3. Write down (‘compute’ sounds awkward, doesn’t it?) Lagrange’s equation: d(∂L/∂v)/dt = ∂L/∂x. […] Yep. That’s it. Why? Well… I told you I wouldn’t tell you why. I am just showing the how here. This is Lagrange’s equation and so you should take it for granted and get on with it. :-) In our example: d(∂L/∂v)/dt = d(mv)/dt = m(dv/dt), and ∂L/∂x = –kx, so Lagrange’s equation becomes m(dv/dt) = m(d²x/dt²) = –kx.
  4. Finally, solve the resulting differential equation. […] ?! Well… Yes. […] Of course, we’ve done that already. It’s the same differential equation as the one we found in our ‘Newtonian approach’, i.e. the equation we found by combining Hooke’s and Newton’s laws. So the general solution is x(t) = Acos(ωt + α), as we already noted above.
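The four steps above are mechanical enough that sympy can do them for us. The sketch below (my own illustration, not from any textbook) builds L(x, v), takes the partial derivatives, and writes down Lagrange’s equation, which indeed comes out as m(d²x/dt²) + kx = 0:

```python
import sympy as sp

t, k, m = sp.symbols('t k m', positive=True)
x = sp.Function('x')
v = sp.diff(x(t), t)  # velocity as a symbol we can differentiate against

# Step 1: the Lagrangian L = T - V, with T and V in the 'right' variables
L = m * v**2 / 2 - k * x(t)**2 / 2

# Steps 2 and 3: Lagrange's equation d(dL/dv)/dt - dL/dx = 0
eom = sp.diff(sp.diff(L, v), t) - sp.diff(L, x(t))

# Step 4: the resulting equation is just m*x'' + k*x = 0 (Newton's result)
assert sp.simplify(eom - (m * sp.diff(x(t), t, 2) + k * x(t))) == 0
```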

So, yes, we’re solving the same differential equation here. So you’ll wonder what’s the difference then between Newtonian and Lagrangian mechanics? Yes, you’re right: we’re indeed solving the same second-order differential equation here. Exactly. Fortunately, I’d say, because we don’t want any other equation(s) of motion because we’re talking the same system. The point is: we got that differential equation using an entirely different procedure, which I actually didn’t explain at all: I just said to compute this and then that and… – Surprise, surprise! – we got the same differential equation in the end. :-) So, yes, the Newtonian and Lagrangian approach to modeling a dynamic system yield the same equations, but the Lagrangian method is much more (very much more, I should say) convenient when we’re dealing with lots of moving bits and if there are more directions (i.e. degrees of freedom) in which they can move.

In short, Lagrange could solve a problem more rapidly than Newton with his modeling approach and so that’s why his approach won out. :-) In fact, you’ll usually see the spatial variables noted as qj. In this notation, j = 1, 2,… n, and n is the number of degrees of freedom, i.e. the directions in which the various particles can move. And then, of course, you’ll usually see a second subscript i = 1, 2,… m to keep track of every qj for each and every particle in the system, so we’ll have n×m qij‘s in our model and so, yes, good to stick to Lagrange in that case.

OK. You get that, I assume. Let’s move on to Hamiltonian mechanics now.

III. Hamiltonian mechanics

The steps here are the following. [Again, I am just explaining the how, not the why. You can find mathematical proofs of why this works in handbooks or, better still, on the Web.]

  1. The first step is very similar to the one above. In fact, it’s exactly the same: write T and V as a function of velocity (v) and position (x) respectively and construct the Lagrangian. So, once again, we have L = L(x, v). In our example: L(x, v) = mv²/2 – kx²/2.
  2. The second step, however, is different. Here, the theory becomes more abstract, as the Hamiltonian approach does not only keep track of the position but also of the momentum of the particles in a system. Position (x) and momentum (p) are so-called canonical variables in Hamiltonian mechanics, and the relation with Lagrangian mechanics is the following: p = ∂L/∂v. Huh? Yeah. Again, don’t worry about the why. Just check it for our example: ∂(mv²/2 – kx²/2)/∂v = 2mv/2 = mv. So, yes, it seems to work. Please note, once again, how we treat x and v as independent variables here, as is evident from the use of the symbol for partial derivatives. Let me get back to the lesson, however. The second step is: calculate the conjugate variables. In more familiar wording: compute the momenta.
  3. The third step is: write down (or ‘build’ as you’ll see it, but I find that wording strange too) the Hamiltonian function H = T + V. We’ve got the same problem here as the one I mentioned with the Lagrangian: there’s more than one way to express T and V. Hence, we need some more guidance. Right you are! When writing your Hamiltonian, you need to make sure you express the kinetic energy as a function of the conjugate variable, i.e. as a function of momentum, rather than velocity. So we have H = H(x, p), not H = H(x, v)! In our example, we have H = T + V = p²/2m + kx²/2.
  4. Finally, write and solve the following set of equations: (I) ∂H/∂p = dx/dt and (II) –∂H/∂x = dp/dt. [Note the minus sign in the second equation.] In our example: (I) p/m = dx/dt and (II) –kx = dp/dt. The first equation is actually nothing but the definition of p: p = mv, and the second equation is just Hooke’s law: F = –kx. However, from a formal-mathematical point of view, we have two first-order differential equations here (as opposed to one second-order equation when using the Lagrangian approach), which should be solved simultaneously in order to find position and momentum as a function of time, i.e. x(t) and p(t). The end result should be the same: x(t) = Acos(ωt + α) and p(t) = … Well… I’ll let you solve this: time to brush up your knowledge about differential equations. :-)
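To show what ‘solving the two equations simultaneously’ amounts to in practice, here’s a little numeric sketch of mine (plain Python, with m = k = 1 so that ω = 1, and initial conditions x(0) = 1, p(0) = 0, i.e. A = 1 and α = 0): stepping Hamilton’s two first-order equations forward in time reproduces x(t) = cos(t) and p(t) = –sin(t).

```python
import math

m, k = 1.0, 1.0
x, p = 1.0, 0.0           # start at maximum displacement, zero momentum
dt, steps = 1e-4, 20_000  # integrate up to t = 2

for _ in range(steps):
    p += -k * x * dt   # (II) dp/dt = -dH/dx = -k*x
    x += (p / m) * dt  # (I)  dx/dt =  dH/dp = p/m
    # (updating p first and then x is the 'semi-implicit' Euler scheme,
    # which keeps the total energy H from drifting)

t = steps * dt
assert abs(x - math.cos(t)) < 1e-3  # matches x(t) = A*cos(omega*t + alpha)
assert abs(p + math.sin(t)) < 1e-3  # and p(t) = m*dx/dt = -sin(t)
```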

You’ll say: what the heck? Why are you making things so complicated? Indeed, what am I doing here? Am I making things needlessly complicated?

The answer is the usual one: yes, and no. Yes. If we’d want to do stuff in the classical world only, the answer seems to be: yes! In that case, the Lagrangian approach will do and may actually seem much easier, because we don’t have a set of equations to solve. And why would we need to keep track of p(t)? We’re only interested in the equation(s) of motion, aren’t we? Well… That’s why the answer to your question is also: no! In classical mechanics, we’re usually only interested in position, but in quantum mechanics that concept of conjugate variables (like x and p indeed) becomes much more important, and we will want to find the equations for both. So… Yes. That means a set of differential equations (one for each variable (x and p) in the example above) rather than just one. In short, the real answer to your question in regard to the complexity of the Hamiltonian modeling approach is the following: because the more abstract Hamiltonian approach to mechanics is very similar to the mathematics used in quantum mechanics, we will want to study it, because a good understanding of Hamiltonian mechanics will help us to understand the math involved in quantum mechanics. And so that’s the reason why physicists prefer it to the Lagrangian approach.

[…] Really? […] Well… At least that’s what I know about it from googling stuff here and there. Of course, another reason for physicists to prefer the Hamiltonian approach may well be that they think social science (like economics) isn’t real science. Hence, we – social scientists – would surely expect them to develop approaches that are much more intricate and abstract than the ones that are being used by us, wouldn’t we?

[…] And then I am sure some of it is also related to the Anglo-French thing. :-)

Complex Fourier analysis: an introduction

One of the most confusing sentences you’ll read in an introduction to quantum mechanics – not only in those simple (math-free) popular books but also in Feynman’s Lecture introducing the topic – is that we cannot define a unique wavelength for a short wave train. In Feynman’s words: “Such a wave train does not have a definite wavelength; there is an indefiniteness in the wave number that is related to the finite length of the train, and thus there is an indefiniteness in the momentum.” (Feynman’s Lectures, Vol. I, Ch. 38, section 1).

That is not only confusing but, in some way, actually wrong. In fact, this is an oft-occurring statement which has effectively hampered my own understanding of quantum mechanics for decades, and it was only when I had a closer look at what a Fourier analysis really is that I understood what Feynman, and others, wanted to say. In short, it’s a classic example of where a ‘simple’ account of things can lead you astray.

Indeed, we can all imagine a short wave train with a very definite frequency, and I mean a precise, unambiguous and single-valued frequency here. Just take any sinusoidal function and multiply it with a so-called envelope function in order to shape it into a short pulse. Transients have that shape, and I gave an example in previous posts. Below, I’ve just copied the example given in the Wikipedia article on Fourier analysis: f(t) is a product of two factors:

  1. The first factor in the product is a cosine function: cos[2π(3t)] to be precise.
  2. The second factor is an exponential function: exp(–πt²).

The frequency of this ‘product function’ is quite precise: cos[2π(3t)] = cos[6πt] = cos[6π(t + 1/3)] for all values of t, and so its frequency is equal to 3 (and its period to 1/3). The only thing the second factor, i.e. exp(–πt²), does is to shape this cosine function into a nice wave train, as it quickly tends to zero on both sides of the t = 0 point. So that second function is a nice simple bell curve (just plot the graph with a graph plotter) and doesn’t change the period (or frequency) of the product. In short, the oscillation below–which we should imagine as the representation of ‘something’ traveling through space–has a very definite frequency. So what’s Feynman saying above? There’s no Δf or Δλ here, is there?
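Well… To see that the spread is not zero despite the single-valued frequency of the cosine factor, we can just compute the (real part of the) Fourier transform of f(t) numerically. The sketch below (plain Python, my own illustration) shows the transform peaks at ν = 3 but is clearly non-zero half a unit away: the peak has a finite width.

```python
import math

def f(t):
    # The wave train: cos(2*pi*3t) shaped by the bell curve exp(-pi*t^2)
    return math.cos(2 * math.pi * 3 * t) * math.exp(-math.pi * t * t)

def ft_real(nu, dt=1e-3, T=6.0):
    # Real part of the Fourier integral of f at frequency nu,
    # approximated by a Riemann sum over [-T, T] (f is ~0 outside)
    n = int(2 * T / dt)
    return dt * sum(f(-T + i * dt) * math.cos(2 * math.pi * nu * (-T + i * dt))
                    for i in range(n))

peak = ft_real(3.0)  # at the 'definite' frequency of the cosine factor
side = ft_real(3.5)  # half a unit away: smaller, but clearly non-zero
assert peak > side > 0.01
```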


The point to note is that these Δ concepts – Δf, Δλ, and so on – actually have very precise mathematical definitions, as one would expect in physics: they usually refer to the standard deviation of the distribution of a variable around the mean.

[…] OK, you’ll say. So what?

Well… That f(t) function above can – and, more importantly, should – be written as the sum of a potentially infinite number of waves in order to make sense of the Δf and Δλ factors in those uncertainty relations. Each of these component waves has a very specific frequency indeed, and each one of them makes its own contribution to the resultant wave. Hence, there is a distribution function for these frequencies, and so that is what Δf refers to. In other words, unlike what you’d think when taking a quick look at that graph above, Δf is not zero. So what is it then?

Well… It’s tempting to get lost in the math of it all now but I don’t want this blog to be technical. The basic ideas, however, are the following. We have a real-valued function here, f(t), which is defined from –∞ to +∞, i.e. over its so-called time domain. Hence, t ranges from –∞ to +∞ (the definition of the zero point is a matter of convention only, and we can easily change the origin by adding or subtracting some constant). [Of course, we could – and, in fact, we should – also define it over a spatial domain, but we’ll keep the analysis simple by leaving out the spatial variable (x).]

Now, the so-called Fourier transform of this function will map it to its so-called frequency domain. The animation below (for which the credit must, once again, go to Wikipedia, from which I borrow most of the material here) clearly illustrates the idea. I’ll just copy the description from the same article: “In the first frames of the animation, a function f is resolved into Fourier series: a linear combination of sines and cosines (in blue). The component frequencies of these sines and cosines, spread across the frequency spectrum, are represented as peaks in the frequency domain, as shown in the last frames of the animation. The frequency domain representation of the function, f̂, is the collection of these peaks at the frequencies that appear in this resolution of the function.”


[…] OK. You sort of get this (I hope). Now we should go a couple of steps further. In quantum mechanics, we’re talking not real-valued waves but complex-valued waves adding up to give us the resultant wave. Also, unlike what’s shown above, we’ll have a continuous distribution of frequencies. Hence, we’ll not have just six discrete values for the frequencies (and, hence, just six component waves), but an infinite number of them. So how does that work? Well… To do the Fourier analysis, we need to calculate the value of the following integral for each possible frequency, which I’ll denote with the Greek letter nu (ν), as we’ve used the f symbol already–not for the frequency but to denote the function itself! Let me just jot down that integral:

f̂(ν) = ∫ f(t)·e−2πiνt dt (with the integral taken over the entire time domain, i.e. t going from –∞ to +∞)

Huh? Don’t be scared now. Just try to understand what it actually represents. Relax, and take a long hard look at it. Note, first, that the integrand (i.e. the function that is to be integrated, between the integral sign and the dt, so that’s f(t)e−2πiνt) is a complex-valued function (that should be very obvious from the i in the exponent of e). Secondly, note that we need to calculate such an integral for each value of ν. So, for each possible value of ν, we have t ranging from –∞ to +∞ in that integral. Hmm… OK. So… How does that work? Well… The illustration below shows the real and imaginary part respectively of the integrand for ν = 3. [Just in case you still don’t get it: we fix ν here (ν = 3), and calculate the value of the real and imaginary part of the integrand for each possible value of t, so t ranges from –∞ to +∞ indeed.]


So what do we see here? The first thing you should note is that the value of both the real and imaginary part of the integrand quickly tends to zero on both sides of the t = 0 point. That’s because of the shape of f(t), which does exactly the same. However, in-between those ‘zero or close-to-zero values’, the integrand does take on very specific non-zero values. As for the real part of the integrand, which is denoted by Re[e−2πi(3t)f(t)], we see that it’s always positive, with a peak value equal to one at t = 0. Indeed, the real part of the integrand is always positive because f(t) and the real part of e−2πi(3t) oscillate at the same rate. Hence, when f(t) is positive, so is the real part of e−2πi(3t), and when f(t) is negative, so is the real part of e−2πi(3t). However, the story is obviously different for the imaginary part of the integrand, denoted by Im[e−2πi(3t)f(t)]. That’s because, in general, eiθ = cosθ + i·sinθ and the sine and cosine function are essentially the same functions except for a phase difference of π/2 (remember: sin(θ+π/2) = cosθ).

Capito? No? Hmm… Well… Try to read what I am writing above once again. Else, just give up. :-)

I know this is getting complicated but let me try to summarize what’s going on here. The bottom line is that the integral above will yield a positive real number, 0.5 to be precise (as noted in the margin of the illustration), for the real part of the integrand, but it will give you a zero value for its imaginary part (also as noted in the margin of the illustration). [As for the math involved in calculating an integral of a complex-valued function (with a real-valued argument), just note that we should indeed just separate the real and imaginary parts and integrate separately. However, I don’t want you to get lost in the math so don’t worry about it too much. Just try to stick to the main story line here.]

In short, what we have here is a very significant contribution (the associated density is 0.5) of the frequency ν = 3. 

Indeed, let’s compare it to the contribution of the wave with frequency ν = 5. For ν = 5, we get, once again, a value of zero when integrating the imaginary part of the integrand above, because the positive and negative values cancel out. As for the real part, we’d think the positive and negative values would cancel out too if we look at the graph below, but they don’t: the integral does yield, in fact, a very tiny positive value: 1.7×10–6 (so we’re talking 1.7 millionths here). That means that the contribution of the component wave with frequency ν = 5 is close to nil but… Well… It’s not nil: we have some contribution here (i.e. some density, in other words).
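You can actually check these two numbers yourself. A minimal numerical sketch is given below, assuming (as I read off the Wikipedia illustration) that the wave train is f(t) = cos(6πt)·e−πt², i.e. a 3 Hz cosine under a Gaussian envelope; that assumption is mine, but it reproduces the 0.5 and 1.7×10–6 values exactly:

```python
import numpy as np

# The wave train, assumed from the Wikipedia illustration:
# a 3 Hz cosine under a Gaussian envelope, f(t) = cos(6πt)·e^(−πt²).
def f(t):
    return np.cos(6 * np.pi * t) * np.exp(-np.pi * t**2)

def fourier_transform(nu, t_max=10.0, dt=1e-4):
    """Approximate ∫ f(t)·e^(−2πiνt) dt with a Riemann sum over [−t_max, t_max]."""
    t = np.arange(-t_max, t_max, dt)
    integrand = f(t) * np.exp(-2j * np.pi * nu * t)
    return np.sum(integrand) * dt

for nu in (3, 5):
    F = fourier_transform(nu)
    # Re ≈ 0.5 for ν = 3; Re ≈ 1.7×10⁻⁶ for ν = 5; Im ≈ 0 in both cases
    print(f"nu = {nu}: Re = {F.real:.2e}, Im = {F.imag:.2e}")
```

Truncating the integral at ±10 is harmless here because the Gaussian envelope has died out completely by then.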


You get the idea (I hope). We can, and actually should, calculate the value of that integral for each possible value of ν, i.e. for ν ranging over the entire frequency domain, from –∞ to +∞. However, I won’t do that. :-) What I will do is just show you the grand general result (below), with the particular results (i.e. the values of 0.5 and 1.7×10–6 for ν = 3 and ν = 5) as a green and red dot respectively. [Note that the graph below uses the ξ symbol instead of ν: I used ν because that’s a more familiar symbol, but that doesn’t change the analysis.]


Now, if you’re still with me – probably not :-) – you’ll immediately wonder why there are two big bumps instead of just one, i.e. two peaks in the density function instead of just one. [You’re used to these Gauss curves, aren’t you?] And you’ll also wonder what negative frequencies actually are: the first bump is a density function for negative frequencies indeed, and… Well… Now that you think of it: why the hell would we do such an integral for negative values of ν? I won’t say too much about that: it’s a particularity which results from the fact that f(t) is real-valued, so the value of the integral for –ν is just the complex conjugate of its value for +ν. The fact of the matter is that we do have a mathematical mirror image of the bump for positive frequencies on the negative side of the frequency domain, so… Well… […] Don’t worry about it, I’d say. As mentioned above, we shouldn’t get lost in the math here. For our purpose here, which is just to illustrate what a complex Fourier transform actually is (rather than present all of the mathematical intricacies of it), we should just focus on the second bump of that density function, i.e. the density function for positive frequencies only. :-)

So what? You’re probably tired by now, and wondering what I want to get at. Well… Nothing much. I’ve done what I wanted to do. I started with a real-valued wave train (think of a transient electric field working its way through space, for example), and I then showed how such a wave train can (and should) be analyzed as consisting of an infinite number of complex-valued component waves, each of which makes its own contribution to the combined wave (which consists of the sum of all component waves) and which, together, can be represented by a graph like the one above, i.e. a real-valued density function around some mean, usually denoted by μ, and with some standard deviation, usually denoted by σ. So now I hope that, when you think of Δf or Δλ in the context of a so-called ‘probability wave’ (i.e. a de Broglie wave), you’ll think of all this machinery behind it.

In other words, it is not just a matter of drawing a simple figure like the one below and saying: “You see: those oscillations represent three photons being emitted one after the other by an atomic oscillator. You can see that’s quite obvious, can’t you?”


No. It is not obvious. Why not? Because anyone that’s somewhat critical will immediately say: “But how does it work really? Those wave trains seem to have a pretty definite frequency (or wavelength), even if their amplitude dies out, and, hence, the Δf factor (or Δλ factor) in that uncertainty relation must be close or, more probably, must be equal to zero. So that means we cannot say these particles are actually somewhere, because Δx must be close or equal to infinity.”

Now you know that’s a very valid remark, and yet it isn’t. Because now you understand that one actually has to go through the tedious exercise of doing that Fourier transform, and so now you understand what those Δ symbols actually represent. I hope you do because of this post, despite the fact that my approach has been very superficial and intuitive. In other words, I didn’t say what physicists would probably say, and that is: “Take a good math course before you study physics!” :-)

The Uncertainty Principle for energy and time

In all of my posts on the Uncertainty Principle, I left a few points open or rather vague, and that was usually because I didn’t have a clear understanding of them. As I’ve read some more in the meanwhile, I think I sort of ‘get’ these points somewhat better now. Let me share them with you in this and my next posts. This post will focus on the Uncertainty Principle for time and energy.

Indeed, most (if not all) experiments illustrating the Uncertainty Principle (such as the double-slit experiment with electrons for example) focus on the position (x) and momentum (p) variables: Δx·Δp = h. But there is also a similar relationship between time and energy:

ΔE·Δt = h

These pairs of variables (position and momentum, and energy and time) are so-called conjugate variables. I think I said enough about the Δx·Δp = h equation, but what about the ΔE·Δt = h equation?

We can sort of imagine what ΔE stands for, but what about Δt? It must also be some uncertainty: about time, obviously–but what time are we talking about? I found one particularly easy explanation in a small booklet that I bought–a long time ago–in Berlin: the dtv-Atlas zur Atomphysik. It makes things look easy by noting that everything is, obviously, related to everything. Indeed, we know that the uncertainty about the position (Δx) is to be related to the length of the (complex-valued) wave-train that represents the ‘particle’ (or ‘wavicle’ if you prefer that term) in space (and in time). In turn, the length of that wave-train is determined by the spread in the frequencies of the component waves that make up that wave-train, as illustrated below. [However, note that the illustration assumes the amplitudes are real-valued only, so there’s no imaginary part. I’ll come back to this point in my next post.]


More in particular, we can use the de Broglie relation for matter-particles (λ = h/p) and Planck’s relation for photons (E = hν = hc/λ ⇔ E/c = p = h/λ) to relate the uncertainty about the position to the spread in the wavelengths (and, hence, the frequencies) of the component waves:

p = h/λ and, hence, Δp = Δ(h/λ) = hΔ(1/λ)

Now, Δx equals h/Δp according to the Uncertainty Principle. Therefore, Δx must equal:

 Δx = h/[hΔ(1/λ)] = 1/[Δ(1/λ)]

But, again, what about that energy-time relationship? To measure all of the above variables–which are all related one to another indeed–we need some time. More in particular, to measure the frequency of a wave, we’ll need to look at that wave and register or measure at least a few oscillations, as shown below.

time and energy

I took the image from the above-mentioned German booklet and, hence, the illustration incorporates some German. However, that should not deter you from following the remarkably simple argument, which is the following:

  1. The error in our measurement of the frequency (i.e. the Meßfehler, denoted by Δν) is related to the measurement time (i.e. the Meßzeit, denoted by Δt). Indeed, if τ represents the actual period of the oscillation (which is obviously unknown to us: otherwise we wouldn’t be trying to measure the frequency, which is the reciprocal of the period: ν = 1/τ), then we can write Δt as some multiple of τ. [More specifically, in the example above we assume that Δt = 4τ = 4/ν. In fact, because we don’t know τ (or ν), we should write that Δt ≈ 4τ (= 4/ν).]
  2. During that time, we will measure four oscillations and, hence, we are tempted to write that ν = 4/Δt. However, because of the measurement error, we should interpret the value for our measurement not as 4 exactly but as 4 plus or minus one: 4 ± 1. Indeed, it’s like measuring the length of something: if our yardstick has only centimeter marks, then we’ll measure someone’s length as some number plus or minus 1 cm. However, if we have some other instrument that is accurate up to one millimeter, then we’ll measure his or her length as some number plus or minus 1 mm. Hence, the result of our measurement should be written as ν ± Δν = (4 ± 1)/Δt = 4/Δt ± 1/Δt. Just put in some numbers in order to gain a better understanding. For example, imagine an oscillation of 100 Hz, and a measurement time of four hundredths of a second. Then we have Δt = 4×10–2 s and, during that time, we’ll measure 4 ± 1 oscillations. Hence, we’ll write that the frequency of this wave must be equal to ν ± Δν = (4 ± 1)/Δt = 4/(4×10–2 s) ± 1/(4×10–2 s) = 100 ± 25 Hz. So we’ll accept that, because measurement time was relatively short, we have a measurement error of Δν/ν = 25/100 = 25%. [Note that ‘relatively short’ means ‘short as compared to the actual period of the oscillation’. Indeed, 4×10–2 s is obviously not short in any absolute sense: in fact, it is like an eternity when we’re talking light waves, which have frequencies measured in terahertz.]
  3. The example makes it clear that Δν, i.e. the error in our measurement of the frequency, is related to the measurement time as follows: Δν = 1/Δt. Hence, if we double the measurement time, we halve the error in the measurement of the frequency. The relationship is quite straightforward indeed: let’s assume, for example, that our measurement time Δt is equal to Δt = 10τ. In that case, we get Δν = 1/(10×10–2 s) = 10 Hz, so the measurement error is Δν/ν = 10/100 = 10%. How long should the measurement time be in order to get a 1% error only? Let’s be more general and develop a solution for Δt as a function of the error expressed as a percentage. We write: Δν/ν = x % = x/100. But Δν = 1/Δt. Hence, we have Δν/ν = (1/Δt)/ν = 1/(Δt·ν) = x/100 or Δt = 100/(x·ν). So, for x = 1 (i.e. an error of 1%), we get Δt = 100/(1·100) = 1 second; for x = 5 (i.e. an error of 5%), we get Δt = 100/(5·100) = 0.2 seconds; and–just to check if our formula is correct–for x = 25 (i.e. an error of 25%), we get Δt = 100/(25·100) = 0.04 seconds, or 4×10–2 s, which is what this example started out with.
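The arithmetic in the three steps above is simple enough to put into a few lines of code. A minimal sketch (the function names are mine):

```python
# The booklet's rule of thumb: count N ± 1 oscillations during Δt, so Δν = 1/Δt.

def measurement_error(nu, delta_t):
    """Relative error (in %) on a frequency nu measured during delta_t seconds."""
    delta_nu = 1.0 / delta_t      # Δν = 1/Δt
    return 100.0 * delta_nu / nu  # Δν/ν, as a percentage

def required_time(nu, error_pct):
    """Measurement time needed for a given error percentage: Δt = 100/(x·ν)."""
    return 100.0 / (error_pct * nu)

nu = 100.0  # the 100 Hz oscillation from the example
print(measurement_error(nu, 4e-2))  # → 25.0 (% error for Δt = 4×10⁻² s)
print(required_time(nu, 1))         # → 1.0 (second, for a 1% error)
print(required_time(nu, 25))        # → 0.04 (seconds, closing the loop)
```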

You’ll say: so what? Well, here we are! We can now immediately derive the Uncertainty Principle for time and energy because we know the energy–of a photon at least–is directly related to the frequency and, hence, the uncertainty about the energy must be related to the measurement time as follows:

E = hν ⇒ ΔE = Δ(hν) = hΔν = h(1/Δt) = h/Δt ⇔ ΔE·Δt = h

What if we’d not be talking photons but some particle here, such as an electron? Well… Our analysis is quite general. It’s the same really, but instead of Planck’s relation, we use the de Broglie relation, which looks exactly the same: f = E/h (hence, E = hf).

The same? Really?

Well… Yes and no. You should, of course, remember that a de Broglie wave is a complex-valued wave, and so that’s why I prefer to use two different symbols here: f (the frequency of the de Broglie wave) versus ν (the frequency of the photon). That brings me to the next point which I should clarify, and that’s how the Fourier analysis of a wave-train really works. However, I’ll do that in my next post, not here. To wrap up, I’ll just repeat the point I made when I started writing this: everything is effectively related to everything in quantum mechanics. Energy, momentum, position, frequency, time,… etcetera: all of these concepts are related to one another.

That being said, position and time are (and will always remain) the more ‘fundamental’ variables obviously. We measure position in space and in time. Likewise, energy and momentum are related to frequency and wavelength respectively (through the de Broglie relations E = hf and p = h/λ), and these concepts do not make sense without a reference to time (frequency) and/or space (wavelength), and both are related through the velocity of the wave: v = fλ. [For photons, this equation becomes v = c = νλ, or λ = c/ν. I know the notation is somewhat confusing – v for velocity and ν (i.e. the Greek letter nu) for frequency  – but so be it.]

A final note needs to be made on the value of h: it’s very tiny. Indeed, a value of (about) 6.6×10−34 J·s or, using the smaller eV unit for energy, some 4.1×10−15 eV·s is unimaginably small, especially because we need to take into account that the energy concept as used in the de Broglie equation includes the rest mass of a particle. Now, anything that has any rest mass has enormous energy according to Einstein’s mass-energy equivalence relationship: E = mc2. Let’s consider, for example, a hydrogen atom. Its atomic mass can be expressed in eV/c2, using the same E = mc2 formula but written as m = E/c2, although you will usually find it expressed in so-called unified atomic mass units (u). The mass of our hydrogen atom is approximately 1 u ≈ 931.5×106 eV/c2. That means its energy is about 931.5×106 eV.

Just from the units used (106 eV means we’re measuring stuff here in million eV, so the uncertainty is also likely to be expressed in plus or minus one million eV), it’s obvious that even very small values for Δt will still respect the Uncertainty Principle. [I am obviously exaggerating here: it is likely that we’ll want to reduce the measurement error to much less than plus or minus 1×106 eV, so that means that our measurement time Δt will have to go up. That being said, the point is quite clear: we won’t need all that much time to measure its mass (or its energy) very accurately.] At the same time, the de Broglie frequency f = E/h will be very high. To be precise, the frequency will be in the order of (931.5×106 eV)/(4.1×10−15 eV·s) = 0.2×1024 Hz. In practice, this means that the wavelength is so tiny that there’s no detector which will actually measure the ‘oscillation’: any physical detector will straddle most – in fact, I should say: all – of the wiggles of the probability curve. All these facts basically state the same: a hydrogen atom occupies a very precisely determined position in time and space. Hence, we will see it as a ‘hard’ particle–i.e. not as a ‘wavicle’. That’s why the interference experiment mentions electrons, although I should immediately add that interference patterns have been observed using much larger particles as well. In any case, I wrote about that before, so I won’t repeat myself here. The point was to make that energy-time relationship somewhat more explicit, and I hope I’ve been successful at that at least. You can play with some more numbers yourself now. :-)
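To make the hydrogen numbers concrete, here’s a quick back-of-the-envelope calculation (the constants are standard values; the ±1 MeV energy uncertainty is just the illustrative figure used above):

```python
H_EV = 4.135667696e-15   # Planck's constant h, in eV·s
E_HYDROGEN = 931.5e6     # rest energy of ~1 u, in eV (E = mc²)

# de Broglie frequency f = E/h
f = E_HYDROGEN / H_EV
print(f"f = {f:.2e} Hz")        # ≈ 2.25e+23 Hz, i.e. roughly 0.2×10²⁴ Hz

# Measurement time that still respects ΔE·Δt = h for ΔE = ±1 MeV
delta_t = H_EV / 1e6
print(f"dt = {delta_t:.1e} s")  # ≈ 4.1e-21 s: a very short time indeed
```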

Post scriptum: The Breit-Wigner distribution

The Uncertainty Principle applied to time and energy has an interesting application: it’s used to measure the lifetime of very short-lived particles. In essence, the uncertainty about their energy (ΔE) is measured by equating it to the standard deviation of the energy measurements. The equation ΔE·Δt = ħ/2 is then used to determine Δt, which is equated to the lifetime of the particle that’s being studied. Georgia State University’s HyperPhysics website gives an excellent quick explanation of this, and so I just copied that below.

Hyperphysics Breit-Wigner
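As a sketch of that rule: with the reduced Planck constant in eV·s, the lifetime follows immediately from the measured energy spread. [The 1 keV width below is a made-up illustrative number, not data for any actual particle.]

```python
HBAR_EV = 6.582119569e-16  # reduced Planck constant ħ, in eV·s

def lifetime(delta_E_eV):
    """Lifetime estimate from the measured energy spread, via ΔE·Δt = ħ/2."""
    return HBAR_EV / (2.0 * delta_E_eV)

print(lifetime(1e3))  # a (hypothetical) 1 keV spread → ≈ 3.3e-19 s
```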


A post for my kids: About Einstein’s laws of radiation, and lasers

I wrapped up my previous post, which gave Planck’s solution for the blackbody radiation problem, wondering whether or not one could find the same equation using some other model, not involving the assumption that atomic oscillators have discrete energy levels.

I still don’t have an answer to that question but, sure enough, Feynman introduces another model a few pages further in his Lectures. It’s a model developed by Einstein, in 1916, and it’s much ‘richer’ in the sense that it takes into account what we know to be true: unlike matter-particles (fermions), photons like to crowd together. In more advanced quantum-mechanical parlance, their wave functions obey Bose-Einstein statistics. Now, Bose-Einstein statistics are what allows a laser to focus so much energy in one beam, and so I am writing this post for two reasons–one serious and the other not-so-serious:

  1. To present Einstein’s 1916 model for blackbody radiation.
  2. For my kids, so they understand how a laser works.

Let’s start with Einstein’s model first because, if I’d start with the laser, my kids would only read about that and nothing else. [That being said, I am sure my kids will go straight to the second part and, hence, skip Einstein anyway. :-)]

Einstein’s model of blackbody radiation

Einstein’s model is based on Planck’s and, hence, also assumes that the energy of an atomic oscillator can only take on one value out of a set of permitted energy levels. However, unlike Planck, he assumes two types of emission. The first is spontaneous, and that’s basically just Planck’s model. The second is induced emission: that’s emission when light is already present, and Einstein’s hypothesis was that an atomic oscillator is more likely to emit a photon when light of the same frequency is shining on it.


The basics of the model are shown above, and the two new variables are the following:

  • Amn is the probability for the oscillator to have its energy drop from energy level m to energy level n, independent of whether light is shining on the atom or not. So that’s the probability of spontaneous emission and it only depends on m and n.
  • Bmn is not a probability but a proportionality constant that, together with the intensity of the light shining on the oscillator–denoted by I(ω)–co-determines the probability of induced emission.

Now, as mentioned above, in this post, I basically want to explain how a laser works, and so let me be as brief as possible by just copying Feynman here, who says it all:

Feynman on Einstein

Of course, this result must match Planck’s equation for blackbody radiation, because Planck’s equation matched experiment:

I(ω) = ħω³/[π²c²·(eħω/kT − 1)]

To get the eħω/kT − 1 factor in that denominator, Bmn must be equal to Bnm, and you should not think that’s an obvious result, because it isn’t: this equality says that the induced emission probability and the absorption probability must be equal. Good to know: this keeps the numbers of atoms in the various levels constant through what is referred to as detailed balancing: in thermal equilibrium, every process is balanced by its exact opposite. While that’s nice, and the way it actually works, it’s not obvious. It shows that the process is fully time-reversible, which is not obvious at all in a situation involving statistical mechanics, which is what we’re talking about here. In any case, that’s a different topic.

As for Amn, taking into account that Bmn = Bnm, we find that Amn/Bmn = ħω³/π²c². So we have a ratio here. What about calculating the individual values for Amn and Bmn? Can we calculate the absolute spontaneous and induced emission rates? Feynman says: No. Not with what Einstein had at the time. That became possible only a decade or so later, when Werner Heisenberg, Max Born, Pascual Jordan, Erwin Schrödinger, Paul Dirac and John von Neumann developed a complete theory, in the space of just five years (1925–1930). But that’s the subject of the history of science.
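That detailed balancing can be checked numerically: plug Planck’s intensity into the rate equations, and the (spontaneous plus induced) emission from level m exactly matches the absorption from level n. A minimal sketch, with an arbitrary value for Bmn (which cancels out) and an arbitrary frequency and temperature:

```python
import math

hbar, k, c = 1.054571817e-34, 1.380649e-23, 2.99792458e8  # SI units

def planck_I(omega, T):
    # Planck's blackbody formula: I(ω) = (ħω³/π²c²)/(e^(ħω/kT) − 1)
    return (hbar * omega**3 / (math.pi**2 * c**2)) / math.expm1(hbar * omega / (k * T))

omega, T = 3e15, 6000.0  # visible light, roughly the Sun's surface temperature
I = planck_I(omega, T)

B = 1.0                                        # arbitrary B_mn = B_nm (it cancels out)
A = B * hbar * omega**3 / (math.pi**2 * c**2)  # Einstein's ratio A_mn/B_mn = ħω³/π²c²
ratio = math.exp(-hbar * omega / (k * T))      # N_m/N_n in thermal equilibrium

emission = ratio * (A + B * I)  # spontaneous + induced emission from level m
absorption = B * I              # induced absorption from level n
print(abs(emission - absorption) / absorption)  # ≈ 0: the rates balance exactly
```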

The point is: we have got everything here now to sort of understand how lasers work, so let’s try to do that now.


Laser is an acronym which stands for Light Amplification by Stimulated Emission of Radiation. It’s based on the mechanism described above, which I am sure you’ve studied in very much detail. :-)

The trick is to find a method to get a gas in a state in which the number of atomic oscillators with energy level m is much, much greater than the number with energy level n. So we’re talking a situation that is not in equilibrium. On the contrary: it’s far out of equilibrium. And then, suddenly, we induce emission from this upper state, which creates a sort of chain reaction that makes “the whole lot of them dump down together”, as Feynman puts it.

The diagram below is taken from the Wikipedia article on lasers. It shows a so-called Nd:YAG laser. Huh? Yes. Nd:YAG stands for neodymium-doped yttrium aluminium garnet, and an Nd:YAG laser is a pretty common type of laser. A garnet is a precious stone: a crystal composed of a silicate mineral. And that’s what we have here, which is why this laser is a so-called solid-state laser. [The so-called laser medium (see the diagram) may also be a gas or even a liquid, as in dye lasers.] I could also have taken a ruby laser, which uses ruby as the laser medium. But let’s go along with this one for now.


In the set-up as shown above, a simple xenon flash lamp (a gas-discharge lamp, much like a neon lamp) provides the energy exciting the atomic oscillators in the crystal. It’s important that the so-called pumping source emits light of a higher frequency than the laser light, as shown below. In fact, the light from xenon gas, or any source, will be a spectrum, but it should (also) have light in the blue or violet range (as shown below). The important thing is that it should not have the red laser frequency, because that’s what would trigger the laser, of course.


The diagram above shows how it actually works. The trick is to get the atoms to a higher state (that’s h in the diagram above, but it’s got nothing to do with the Planck constant) from where they trickle down (and, yes, they do emit other photons while doing that), until they all get stuck in the state m, which is referred to as metastable but which is, in effect, unstable. And so then they are all dumped down together by induced emissions. So the source ‘pumps’ the crystal indeed, leading to that ‘metastable’ state which is referred to as population inversion in statistical mechanics: a lot of atoms (i.e. the members of the ‘population’) are in an excited state, rather than in a lower energy state.

And then we have a so-called optical resonator (aka a cavity) which, in its simplest form, consists of just two mirrors around the gain medium (i.e. the crystal): these mirrors reflect the light so, once the dump starts, the induced effect is enhanced: the light which is emitted gets a chance to induce more emission, and then another chance, and another, and so on. However, although the mirrors are almost one hundred percent reflecting, light does get out because one of the mirrors is only a partial reflector, which is referred to as the output coupler, and which produces the laser’s output beam.

So… That’s all there is to it.

Really? Is it that simple? Yep. I googled a few questions to increase my understanding but that’s basically it. Perhaps they’ll help you too and so I copied them hereunder. Before you go through that, however, have a look at what these things really look like. The image below (from Wikipedia again) shows a disassembled (and assembled) ruby laser head. You can clearly see the crystal rod in the middle, and the two flashlamps that are used for pumping. I am just inserting it here because, in engineering, I found that a diagram of something and the actual thing often have not all that much in common. :-) As you can see, that’s not the case here: it looks amazingly simple, doesn’t it?


Q: We have a crystal here. What’s the atomic oscillator in the crystal? A: It is the neodymium ion which provides the lasing activity in the crystal, in the same fashion as the red chromium ion in ruby lasers.

Q: But how does it work exactly? A: Well… The diagram is a bit misleading. The distance between h and m should not be too big of course, because otherwise half of the energy goes into these photons that are being emitted as the oscillators ‘trickle down’. Also, if these ‘in-between’ emissions would have the same frequency as the laser light, they would induce the emission, which is not what we want. So the actual distances should look more like this:


For an actual Nd:YAG laser, we have absorption mostly in the bands between 730–760 nm and 790–820 nm, and emitted light with a wavelength of 1064 nm. Huh? Yes. Remember: shorter wavelength (λ) means higher frequency (ν = c/λ) and, hence, higher energy (E = hν = hc/λ). So that’s what’s shown below.


Q: But… You’re talking bullsh**. Wavelengths in the 700–800 nm range are infrared (IR) and, hence, not even visible. And light of 1064 nm even less. A: Now you are a smart-ass! You’re right. What actually happens is a bit more complicated, as you might expect. There’s something else going on as well, a process referred to as frequency doubling or second harmonic generation (SHG). It’s a process in which photons with the same wavelength (1064 nm) interact with some material to effectively ‘combine’ into new photons with twice the energy, twice the frequency and, therefore, half the wavelength of the initial photons. And so that’s light with a wavelength of 532 nm. We actually also have so-called higher harmonics, with wavelengths at 355 and 266 nm.

Q: But… That’s green? A: Sure. An Nd:YAG laser produces a green laser beam, as shown below. If you want the red color, buy a ruby laser, which produces pulses of light with a wavelength of 694.3 nm: that’s the deep red color you’d associate with lasers. In fact, the first operational laser, produced by Hughes Research Laboratories back in 1960 (the research arm of Hughes Aircraft, now part of Raytheon), was a ruby laser.

Q: Pulses? That reminds me of something: lasers pulsate indeed, don’t they? How does that work? A: They do. Lasers have a so-called continuous wave output mode. However, there’s a technique called Q-switching. Here, an optical switch is added to the system. It’s inserted into the laser cavity, and it waits for a maximum population inversion before it opens. Then the light wave runs through the cavity, depopulating the excited laser medium at maximum population inversion. This makes it possible to produce light pulses with extremely high peak power, much higher than would be produced by the same laser if it were operating in constant output mode.

Q: What’s the use of lasers? A: Because of their ability to focus, they’re used as surgical knives, in eye surgery, or to remove tumors in the brain and treat skin cancer. Lasers are also widely used for engraving, etching, and marking of metals and plastics. When they pack more power, they can also be used to cut or weld steel. Their ability to focus is also why these tiny pocket lasers can damage your eye: it’s not like a flashlight. It’s a really focused beam and so it can really blind you–not for a while but permanently.

Q: Lasers can also be used as weapons, can’t they? A: Yes. As mentioned above, techniques like Q-switching make it possible to produce pulses packing enormous amounts of energy into one single pulse, and you hear a lot about lasers being used as directed-energy weapons (DEWs). However, they won’t replace explosives anytime soon. Lasers are already widely used for sighting, ranging and targeting for guns, but they’re not the source of the weapon’s firepower. That being said, the pulse of a megajoule laser would deliver the same energy as 200 grams of high explosive, but all focused on a tiny little spot. Now that’s firepower obviously, and such lasers are now possible. However, their power is more likely to be used for more benign purposes, notably igniting a nuclear fusion reaction. There’s nice stuff out there if you’d want to read more.

Q: No. I think I’ve had it. But what are those pocket lasers? A: They are what they are: handheld lasers. It just shows how technology keeps evolving. The Nano costs a hundred dollars only. I wonder if Einstein would ever have imagined that what he wrote back in 1916 would, ultimately, lead to us manipulating light with little handheld tools. We live in amazing times. :-)

Planck’s constant (II)

My previous post was tough. Tough for you–if you’ve read it. But tough for me too. :-)

The blackbody radiation problem is complicated but, when everything is said and done, what the analysis says is that the ‘equipartition theorem’ in the kinetic theory of gases (or the ‘theorem concerning the average energy of the center-of-mass motion’, as Feynman terms it) is not correct. That equipartition theorem basically states that, in thermal equilibrium, energy is shared equally among all of its various forms. For example, the average kinetic energy per degree of freedom in the translational motion of a molecule should equal that of its rotational motions. That equipartition theorem is also quite precise: it states that the mean energy, for each atom or molecule, for each degree of freedom, is kT/2. Hence, that’s the (average) energy the 19th century scientists also assigned to the atomic oscillators in a gas.

However, the discrepancy between the theoretical and empirical results of their work shows that adding atomic oscillators–as radiators and absorbers of light–to the system (a box of gas that’s being heated) is not just a matter of adding additional ‘degrees of freedom’ to the system. It can’t be analyzed in ‘classical’ terms: the actual spectrum of blackbody radiation shows that these atomic oscillators do not absorb, on average, an amount of energy equal to kT/2. Hence, they are not just another ‘independent direction of motion’.

So what are they then? Well… Who knows? I don’t. But, as I didn’t quite go through the full story in my previous post, the least I can do is to try to do that here. It should be worth the effort. In Feynman’s words: “This was the first quantum-mechanical formula ever known, or discussed, and it was the beautiful culmination of decades of puzzlement.” Moreover, the derivation does not involve complex numbers or wave functions, so that’s another reason why looking at the detail is kind of nice. :-)

Discrete energy levels and the nature of h

To solve the blackbody radiation problem, Planck assumed that the permitted energy levels of the atomic harmonic oscillator were equally spaced, at ‘distances’ ħω₀ apart from each other. That’s what’s illustrated below.

Equally spaced energy levels

Now, I don’t want to make too many digressions from the main story, but this Eₙ = nħω₀ formula obviously deserves some attention. First note that it immediately shows why the dimension of ħ is expressed in joule-seconds (J·s), or electronvolt-seconds (eV·s): we’re multiplying it with a frequency, i.e. something expressed per second (hence, its dimension is s⁻¹), in order to get a measure of energy: joules or, because of the atomic scale, electronvolts. [The eV is just a (much) smaller unit than the joule, but it amounts to the same: 1 eV ≈ 1.6×10⁻¹⁹ J.]

One thing to note is that the equal spacing consists of distances equal to ħω₀, not of ħ. Hence, while h, or ħ (ħ is the constant to be used when the frequency is expressed in radians per second, rather than oscillations per second, so ħ = h/2π), is now being referred to as the quantum of action (das elementare Wirkungsquantum in German), Planck referred to it as a Hilfsgrösse only (that’s why he chose the h as a symbol, it seems), i.e. an auxiliary constant only: the actual quantum of action is, of course, ΔE, i.e. the difference between the various energy levels, which is the product of ħ and ω₀ (or of h and ν₀ if we express frequency in oscillations per second, rather than as an angular frequency). Hence, Planck (and later Einstein) did not assume that an atomic oscillator emits or absorbs packets of energy as tiny as ħ or h, but packets of energy as big as ħω₀ or, what amounts to the same (ħω₀ = (h/2π)(2πν₀) = hν₀), hν₀. Just to give an example, the frequency of sodium light (ν) is 500×10¹² Hz, and so its energy is E = hν. That’s not a lot–about 2 eV only–but it still packs 500×10¹² ‘quanta of action’ !
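Just to check the arithmetic on that sodium-light example, here’s a quick calculation (the constants are the usual textbook values, so treat this as an illustration, nothing more):

```python
# Energy of one photon of sodium light, E = h*nu, in joules and in eV.
h = 6.626e-34      # Planck's constant, J·s
eV = 1.602e-19     # one electronvolt, expressed in joules
nu = 500e12        # frequency of sodium light, Hz (as quoted above)

E_joule = h * nu         # E = h·nu: the energy of a single photon
E_eV = E_joule / eV      # the same energy, expressed in electronvolts

print(E_eV)  # roughly 2 eV, as stated above
```

So the numbers do come out as promised: a couple of eV per photon.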

Another thing is that ω (or ν) is a continuous variable: hence, the assumption of equally spaced energy levels does not imply that energy itself is a discrete variable. Light can have any frequency and, hence, we can also imagine photons with any energy level: the only thing we’re saying is that the energy which light of a specific color (i.e. a specific frequency ν) can deliver will always be a multiple of hν.

Probability assumptions

The second key assumption of Planck as he worked towards a solution of the blackbody radiation problem was that the probability (P) of occupying a level of energy E is P(E) ∝ e−E/kT. OK… Why not? But what does this assumption really amount to? You’ll think of some ‘bell curve’, of course. But… No. That wouldn’t make sense. Remember that the energy has to be positive. The general shape of this P(E) curve is shown below.


The highest probability density is near E = 0, and then it goes down as E gets larger, with kT determining the slope of the curve (just take the derivative). In short, this assumption basically states that higher energy levels are not so likely, and that very high energy levels are very unlikely. Indeed, this formula implies that the relative chance, i.e. the probability of being in state E₁ relative to the chance of being in state E₀, is P₁/P₀ = e−(E₁–E₀)/kT = e−ΔE/kT. Now, P₁ is n₁/N and P₀ is n₀/N and, hence, we find that n₁ must be equal to n₀e−ΔE/kT. What this means is that the atomic oscillator is less likely to be in a higher energy state than in a lower one.

That makes sense, doesn’t it? I mean… I don’t want to criticize those 19th century scientists but… What were they thinking? Did they really imagine that infinite energy levels were as likely as… Well… More down-to-earth energy levels? I mean… A mechanical spring will break when you overload it. Hence, I’d think it’s pretty obvious those atomic oscillators cannot be loaded with just about anything, can they? Garbage in, garbage out:  of course, that theoretical spectrum of blackbody radiation didn’t make sense!
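Just to put some numbers on that e−ΔE/kT factor, here’s a quick sketch (the 2 eV gap and the two temperatures are my own illustrative picks):

```python
import math

k = 1.381e-23    # Boltzmann constant, J/K
eV = 1.602e-19   # one electronvolt, in joules

def occupation_ratio(delta_E, T):
    """Relative occupation n1/n0 = exp(-dE/kT) of two levels dE apart."""
    return math.exp(-delta_E / (k * T))

# A 2 eV gap at room temperature: the upper level is essentially empty...
r_cold = occupation_ratio(2 * eV, 300)
# ...while at 6000 K (roughly the sun's surface) it's small but not negligible.
r_hot = occupation_ratio(2 * eV, 6000)
```

So, at room temperature, an oscillator with a 2 eV spacing basically never sits in its excited state: the exponential is merciless.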

Let me copy Feynman now, as the rest of the story is pretty straightforward:

Now, we have a lot of oscillators here, and each is a vibrator of frequency ω0. Some of these vibrators will be in the bottom quantum state, some will be in the next one, and so forth. What we would like to know is the average energy of all these oscillators. To find out, let us calculate the total energy of all the oscillators and divide by the number of oscillators. That will be the average energy per oscillator in thermal equilibrium, and will also be the energy that is in equilibrium with the blackbody radiation and that should go in the equation for the intensity of the radiation as a function of the frequency, instead of kT. [See my previous post: that equation is I(ω) = (ω²kT)/(π²c²).]

Thus we let N₀ be the number of oscillators that are in the ground state (the lowest energy state); N₁ the number of oscillators in the state E₁; N₂ the number that are in state E₂; and so on. According to the hypothesis (which we have not proved) that in quantum mechanics the law that replaced the probability e−P.E./kT or e−K.E./kT in classical mechanics is that the probability goes down as e−ΔE/kT, where ΔE is the excess energy, we shall assume that the number N₁ that are in the first state will be the number N₀ that are in the ground state, times e−ħω/kT. Similarly, N₂, the number of oscillators in the second state, is N₂ = N₀e−2ħω/kT. To simplify the algebra, let us call e−ħω/kT = x. Then we simply have N₁ = N₀x, N₂ = N₀x², …, Nₙ = N₀xⁿ.

The total energy of all the oscillators must first be worked out. If an oscillator is in the ground state, there is no energy. If it is in the first state, the energy is ħω, and there are N₁ of them. So N₁ħω, or ħωN₀x, is how much energy we get from those. Those that are in the second state have 2ħω, and there are N₂ of them, so N₂·2ħω = 2ħωN₀x² is how much energy we get, and so on. Then we add it all together to get Etot = N₀ħω(0 + x + 2x² + 3x³ + …).

And now, how many oscillators are there? Of course, N₀ is the number that are in the ground state, N₁ in the first state, and so on, and we add them together: Ntot = N₀(1 + x + x² + x³ + …). Thus the average energy is

〈E〉 = Etot/Ntot = ħω(0 + x + 2x² + 3x³ + …)/(1 + x + x² + x³ + …)

Now the two sums which appear here we shall leave for the reader to play with and have some fun with. When we are all finished summing and substituting for x in the sum, we should get—if we make no mistakes in the sum—

〈E〉 = ħω/(eħω/kT – 1)

Feynman concludes as follows: “This, then, was the first quantum-mechanical formula ever known, or ever discussed, and it was the beautiful culmination of decades of puzzlement. Maxwell knew that there was something wrong, and the problem was, what was right? Here is the quantitative answer of what is right instead of kT. This expression should, of course, approach kT as ω → 0 or as T → ∞.”
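We can actually check that classical limit numerically. A quick sketch (standard constant values; the two frequencies are my own picks):

```python
import math

hbar = 1.055e-34   # reduced Planck constant, J·s
k = 1.381e-23      # Boltzmann constant, J/K

def avg_energy(omega, T):
    """Planck's mean oscillator energy: h_bar*omega / (exp(h_bar*omega/kT) - 1)."""
    x = hbar * omega / (k * T)
    return hbar * omega / math.expm1(x)   # expm1 avoids round-off for tiny x

T = 300.0
# At low frequency, the quantum result approaches the classical kT...
low = avg_energy(1e9, T) / (k * T)
# ...but at optical frequencies it is vastly smaller than kT.
high = avg_energy(3e15, T) / (k * T)
```

At ω = 10⁹ rad/s the ratio to kT is essentially 1, while at optical frequencies the average energy collapses to practically nothing: that’s the whole point of the formula.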

It does, of course. And so Planck’s analysis does result in a theoretical I(ω) curve that matches the observed I(ω) curve as a function of both temperature (T) and frequency (ω). But what is it, then? What’s the equation describing the dotted curves? It’s given below:

formula blackbody

I’ll just quote Feynman once again to explain the shape of those dotted curves: “We see that for a large ω, even though we have ω³ in the numerator, there is an e raised to a tremendous power in the denominator, so the curve comes down again and does not “blow up”—we do not get ultraviolet light and x-rays where we do not expect them!”

Is the analysis necessarily discrete?

One question I can’t answer, because I’m just not strong enough in math, is whether or not there would be any other way to derive the actual blackbody spectrum. I mean… This analysis obviously makes sense and, hence, provides a theory that’s consistent and in accordance with experiment. However, the question whether or not it would be possible to develop another theory, without having recourse to the assumption that energy levels in atomic oscillators are discrete and equally spaced, with the ‘distance’ between them equal to hν₀, is not easy to answer. I surely can’t, as I am just a novice, but I can imagine smarter people than me have thought about this question. The answer must be negative, because I don’t know of any other theory: quantum mechanics obviously prevailed. Still… I’d be interested to see the alternatives that must have been considered.

Post scriptum: The “playing with the sums” is a bit confusing. The key to the formula above is the substitution of (0 + x + 2x² + 3x³ + …)/(1 + x + x² + x³ + …) by 1/[(1/x) – 1] = 1/[eħω/kT – 1]. Now, the denominator 1 + x + x² + x³ + … is the Maclaurin series for 1/(1–x). So we have:

(0 + x + 2x² + 3x³ + …)/(1 + x + x² + x³ + …) = (0 + x + 2x² + 3x³ + …)(1–x)

= (x + 2x² + 3x³ + …) – (x² + 2x³ + 3x⁴ + …) = x + x² + x³ + x⁴ + …

= –1 + (1 + x + x² + x³ + …) = –1 + 1/(1–x) = [1 – (1–x)]/(1–x) = x/(1–x).

Note the tricky bit: if x = e−ħω/kT, then eħω/kT is x⁻¹ = 1/x, and so we have (1/x) – 1 in the denominator of that (mean) energy formula, not x – 1. Now 1/[(1/x) – 1] = 1/[(1–x)/x] = x/(1–x), indeed, and so the formula comes out alright.
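If you don’t quite trust the algebra, here’s a quick numerical check of that sum manipulation (x = 0.3 is just an arbitrary pick of mine, and I truncate the infinite series after 200 terms, which is plenty for convergence):

```python
# Check that (0 + x + 2x^2 + 3x^3 + ...) / (1 + x + x^2 + ...) = x/(1 - x).
x = 0.3
N = 200  # number of terms; x**200 is negligible for 0 < x < 1

numerator = sum(n * x**n for n in range(N))    # 0 + x + 2x^2 + 3x^3 + ...
denominator = sum(x**n for n in range(N))      # 1 + x + x^2 + x^3 + ...

ratio = numerator / denominator
closed_form = x / (1 - x)
```

The two agree to machine precision, so the “playing with the sums” does check out.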

Planck’s constant (I)

If you made it here, it means you’re totally fed up with all of the easy stories on quantum mechanics: diffraction, double-slit experiments, imaginary gamma-ray microscopes,… You’ve had it! You now know what quantum mechanics is all about, and you’ve realized all these thought experiments never answer the tough question: where did Planck find that constant (h) which pops up everywhere? And how did he find that Planck relation which seems to underpin all and everything in quantum mechanics?

If you don’t know, that’s because you’ve skipped the blackbody radiation story. So let me give it to you here. What’s blackbody radiation?

Thermal equilibrium of radiation

That’s what the blackbody radiation problem is about: thermal equilibrium of radiation.


Yes. Imagine a box with gas inside. You’ll often see it’s described as a furnace, because we heat the box. Hence, the box, and everything inside, acquires a certain temperature, which we then assume to be constant. The gas inside will absorb energy and start emitting radiation, because the gas atoms or molecules are atomic oscillators. Hence, we have electrons getting excited and then jumping up and down from higher to lower energy levels, and then again and again and again, thereby emitting photons with a certain energy and, hence, light of a certain frequency. To put it simply: we’ll find light with various frequencies in the box and, in thermal equilibrium, we should have some distribution of the intensity of the light according to the frequency: what kind of radiation do we find in the furnace? Well… Let’s find out.

The assumption is that the box walls send light back, or that the box has mirror walls. So we assume that all the radiation keeps running around in the box. Now that implies that the atomic oscillators not only radiate energy, but also receive energy, because they’re constantly being illuminated by radiation that comes straight back at them. If the temperature of the box is kept constant, we arrive at a situation which is referred to as thermal equilibrium. In Feynman’s words: “After a while there is a great deal of light rushing around in the box, and although the oscillator is radiating some, the light comes back and returns some of the energy that was radiated.”

OK. That’s easy enough to understand. However, the actual analysis of this equilibrium situation is what gave rise to the ‘problem’ of blackbody radiation in the 19th century which, as you know, led Planck and Einstein to develop a quantum-mechanical view of things. It turned out that the classical analysis predicted a distribution of the intensity of light that didn’t make sense, and no matter how you looked at it, it just didn’t come out right. Theory and experiment did not agree. Now, that is something very serious in science, as you know, because it means your theory isn’t right. In this case, it was disastrous, because it meant the whole of classical theory wasn’t right.

To be frank, the analysis is not all that easy. It involves all that I’ve learned so far: the math behind oscillators and interference, statistics, the so-called kinetic theory of gases and what have you. I’ll try to summarize the story but you’ll see it requires quite an introduction.

Kinetic energy and temperature

The kinetic theory of gases is part of what’s referred to as statistical mechanics: we look at a gas as a large number of inter-colliding atoms and we describe what happens in terms of the collisions between them. As Feynman puts it: “Fundamentally, we assert that the gross properties of matter should be explainable in terms of the motion of its parts.” Now, we can do a lot of intellectual gymnastics, analyzing one gas in one box, two gases in one box, two gases in one box with a piston between them, two gases in two boxes with a hole in the wall between them, and so on and so on, but that would only distract us here. The rather remarkable conclusion of such exercises, which you’ll surely remember from your high school days, is that:

  1. Equal volumes of different gases, at the same pressure and temperature, will have the same number of molecules.
  2. In this view of things, temperature is actually nothing but the mean kinetic energy of those molecules (or atoms, if it’s a monatomic gas).

So we can actually measure temperature in terms of the kinetic energy of the molecules of the gas, which, as you know, equals mv²/2, with m the mass and v the velocity of the gas molecules. Hence, we’re tempted to define some absolute measure of temperature T and simply write:

T = 〈mv²/2〉

The 〈 and 〉 brackets denote the mean here. To be precise, we’re taking the mean of the square of the velocity here, so the relevant average speed is the root mean square, a.k.a. the quadratic mean, because we want to average the magnitude of a varying quantity. Of course, the mass of different gases will be different – and so we have 〈m₁v₁²/2〉 for gas 1 and 〈m₂v₂²/2〉 for gas 2 – but that doesn’t matter: we can, actually, imagine measuring temperature in joule, the unit of energy, including kinetic energy. Indeed, the units come out alright: 1 joule = 1 kg·(m²/s²). For historical reasons, however, T is measured in different units: degrees Kelvin, centigrade (i.e. degrees Celsius) or, in the US, Fahrenheit. Now, we can easily go from one measure to the other, as you know, and, hence, here I should probably just jot down the so-called ideal gas law–because we need that law for the subsequent analysis of blackbody radiation–and get on with it:

PV = NkT

However, now that we’re here, let me give you an inkling of how we derive that law. A classical (Newtonian) analysis of the collisions (you can find the detail in Feynman’s Lectures, I-39-2) will yield the following equation: P = (2/3)n〈mv²/2〉, with n the number of atoms or molecules per unit volume. So the pressure of a gas (which, as you know, is the force (of a gas on a piston, for example) per unit area: P = F/A) is also equal to the mean kinetic energy of the gas molecules multiplied by (2/3)n. If we multiply that equation by V, we get PV = N(2/3)〈mv²/2〉. However, we know that equal volumes of different gases, at the same pressure and temperature, will have the same number of molecules, so we have PV = N(2/3)〈m₁v₁²/2〉 = N(2/3)〈m₂v₂²/2〉, which we write as PV = NkT, with kT = (2/3)〈m₁v₁²/2〉 = (2/3)〈m₂v₂²/2〉.

In other words, that factor of proportionality k is the one we have to use to convert the temperature as measured by 〈mv²/2〉 (i.e. the mean kinetic energy expressed in joules) to T (i.e. the temperature expressed in the measure we’re used to, and that’s degrees Kelvin–or Celsius or Fahrenheit, but let’s stick to Kelvin, because that’s what’s used in physics). Vice versa, we have 〈mv²/2〉 = (3/2)kT. Now, that constant of proportionality k is equal to k = 1.38×10⁻²³ joule per Kelvin (J/K). So if T is (absolute) temperature, expressed in Kelvin (K), our definition says that the mean molecular kinetic energy is (3/2)kT.
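To make that 〈mv²/2〉 = (3/2)kT relation concrete, here’s a quick sketch computing the typical molecular speed it implies (the molecular mass is my own example – roughly that of a nitrogen molecule – not a number from the text):

```python
import math

k = 1.381e-23    # Boltzmann constant, J/K

def v_rms(m, T):
    """Root-mean-square speed implied by <mv^2/2> = (3/2)kT."""
    return math.sqrt(3 * k * T / m)

# A molecule of mass ~4.65e-26 kg (about that of N2) at room temperature:
v = v_rms(4.65e-26, 300.0)
```

That comes out around 500 m/s: air molecules at room temperature really do move at roughly the speed of a rifle bullet.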

That k factor is a physical constant referred to as the Boltzmann constant. If it’s one of these fundamental constants, you may wonder why we don’t just absorb that 3/2 factor into it. Well… That’s just how it is, I guess. In any case, it’s rather convenient, because we’ll have 2/3 factors in other equations and so these will cancel out against that 3/2 term. However, I am digressing way too much here. I should get back to the main story line but, before I do that, I need to expand on one more thing, and that’s a small lecture on what things look like when we also allow for internal motion, i.e. the rotational and vibratory motions of the atoms within the gas molecule. Let me first re-write that PV = NkT equation as

PV = NkT = N(2/3)〈m₁v₁²/2〉 = (2/3)U = 2U/3, with U = N〈m₁v₁²/2〉

For a monatomic gas, that U would only be the kinetic energy of the atoms, and so we can write it as U = (3/2)NkT. Hence, we have the grand result that the kinetic energy, for each atom, is equal to (3/2)kT, on average that is.

What about non-monatomic gas? Well… For complex molecules, we’d also have energy going into the rotational and vibratory motion of the atoms within the molecule, separate from what is usually referred to as the center-of-mass (CM) motion of the molecules themselves. Now, I’ll again refer you to Feynman for the detail of the analysis, but it turns out that, if we’d have, for example, a diatomic molecule, consisting of an A and B atom, the internal rotational and vibratory motion would, indeed, also absorb energy, and we’d have a total energy equal to (3/2)kT + (3/2)kT = 2×(3/2)kT = 3kT. Now, that amount (3kT) can be split over (i) the energy related to the CM motion, which must still be equal to (3/2)kT, and (ii) the average kinetic energy of the internal motions of the diatomic molecule excluding the bodily motion of the CM. Hence, the latter part must be equal to 3kT – (3/2)kT = (3/2)kT. So, for the diatomic molecule, the total energy happens to consist of two equal parts.

Now, there is a more general theorem here, for which I have to introduce the notion of the degrees of freedom of a system. Each atom can rotate or vibrate or oscillate or whatever in three independent directions–namely the three spatial coordinates x, y and z. These spatial dimensions are referred to as the degrees of freedom of the atom (in the kinetic theory of gases, that is), and if we have two atoms, we have 2×3 = 6 degrees of freedom. More in general, the number of degrees of freedom of a molecule composed of r atoms is equal to 3r. Now, it can be shown that the total energy of an r-atom molecule, including all internal energy as well as the CM motion, will be 3r×kT/2 = 3rkT/2 joules. Hence, for every independent direction of motion that there is, the average kinetic energy for that direction will be kT/2. [Note that ‘independent direction of motion’ is used, somewhat confusingly, as a synonym for degree of freedom, so we don’t have three but six ‘independent directions of motion’ for the diatomic molecule. I just wanted to note that because I do think it causes confusion when reading a textbook like Feynman’s.] Now, that total amount of energy, i.e. 3r(kT/2), will be split as follows according to the “theorem concerning the average energy of the CM motion”, as Feynman terms it:

  1. The kinetic energy for the CM motion of each molecule is, and will always be, (3/2)kT.
  2. The remainder, i.e. r(3/2)kT – (3/2)kT = (3/2)(r–1)kT, is internal vibrational and rotational kinetic energy, i.e. the sum of all vibratory and rotational kinetic energy but excluding the energy of the CM motion of the molecule.
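The counting above is simple enough to put into a few lines of code (a sketch of the equipartition bookkeeping, nothing more):

```python
k = 1.381e-23   # Boltzmann constant, J/K

def energy_split(r, T):
    """Mean kinetic energy of an r-atom molecule: 3r degrees of freedom,
    kT/2 each, split into CM motion and internal motion."""
    total = 3 * r * k * T / 2       # 3r x kT/2
    cm = 3 * k * T / 2              # CM motion: always (3/2)kT
    internal = total - cm           # = (3/2)(r-1)kT
    return total, cm, internal

# The diatomic case (r = 2) discussed above:
total, cm, internal = energy_split(2, 300.0)
```

For r = 2, the CM part and the internal part come out equal, i.e. the “two equal parts” of the diatomic molecule mentioned above.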

Phew! That’s quite something. And we’re not quite there yet.

The analysis for photon gas

Photon gas? What’s that? Well… Imagine our box is the gas in a very hot star, hotter than the sun. As Feynman writes it: “The sun is not hot enough; there are still too many atoms, but at still higher temperatures in certain very hot stars, we may neglect the atoms and suppose that the only objects that we have in the box are photons.” Well… Let’s just go along with it. We know that photons have no mass but they do have some very tiny momentum, which we related to the magnetic field vector, as opposed to the electric field. It’s tiny indeed: most of the energy of light goes into the electric field. However, we noted that we can write p as p = E/c, with c the speed of light (3×10⁸ m/s). Now, we had that P = (2/3)n〈mv²/2〉 formula for gas, and we know that the momentum p is defined as p = mv. So we can substitute mv² by (mv)v = pv. So we get P = (2/3)n〈pv/2〉 = (1/3)n〈pv〉.

Now, the energy of photons is not quite the same as the kinetic energy of an atom or a molecule, i.e. mv²/2. In fact, we know that, for photons, the speed v is equal to c, and pc = E, so 〈pv〉 = 〈pc〉 = 〈E〉. Hence, multiplying by the volume V, we get

PV = U/3

So that’s a formula that’s very similar to the one we had for gas, for which we wrote: PV = NkT = 2U/3. The only thing is that we don’t have a factor 2 in the equation, but that’s because of the different energy concepts involved. Indeed, the concept of the energy of a photon (E = pc) is different from the concept of kinetic energy. But so the result is very nice: we have a similar formula for the compressibility of gas and of radiation. In fact, both PV = 2U/3 and PV = U/3 will usually be written, more generally, as:

PV = (γ – 1)U 

Hence, this γ would be γ = 5/3 ≈ 1.667 for gas and 4/3 ≈ 1.333 for photon gas. Now, I’ll skip the detail (it involves a differential analysis) but it can be shown that this general formula, PV = (γ – 1)U, implies that PVγ (i.e. the pressure times the volume raised to the power γ) must equal some constant, so we write:

PVγ = C
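Just to see what that PVγ = C relation does in practice, here’s a toy calculation (arbitrary units and my own numbers; the relation holds for adiabatic changes, i.e. no heat flowing in or out): we halve the volume and watch the pressure.

```python
# Adiabatic compression: P1 * V1**gamma = P2 * V2**gamma.
gamma_gas = 5.0 / 3.0       # monatomic gas
gamma_photon = 4.0 / 3.0    # photon gas

P1, V1 = 1.0, 1.0           # arbitrary units
V2 = 0.5                    # compress to half the volume

P2_gas = P1 * (V1 / V2) ** gamma_gas          # ~ 2**(5/3)
P2_photon = P1 * (V1 / V2) ** gamma_photon    # ~ 2**(4/3)
```

So the monatomic gas pushes back harder than the photon gas under the same compression: the larger γ, the stiffer the response.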

So far so good. Back to our problem: blackbody radiation. What you should take away from this introduction is the following:

  1. Temperature is a measure of the average kinetic energy of the atoms or molecules in a gas. More specifically, it’s related to the mean kinetic energy of the CM motion of the atoms or molecules, which is equal to (3/2)kT, with k the Boltzmann constant and T the temperature expressed in Kelvin (i.e. the absolute temperature).
  2. If gas atoms or molecules have additional ‘degrees of freedom’, aka ‘independent directions of motion’, then each of these will absorb additional energy, namely kT/2.

Energy and radiation

The atoms in the box are atomic oscillators, and we’ve analyzed them before. What the analysis above added was that the average kinetic energy of the atoms going around is (3/2)kT and that, if we’re talking molecules consisting of r atoms, we have a formula for their internal kinetic energy as well. However, as oscillators, they also have energy separate from that kinetic energy we’ve been talking about already. How much? That’s a tricky analysis. Let me first remind you of the following:

  1. Oscillators have a natural frequency, usually denoted by the (angular) frequency ω₀.
  2. The sum of the potential and kinetic energy stored in an oscillator is a constant, unless there’s some damping constant. In that case, the oscillation dies out. Here, you’ll remember the concept of the Q of an oscillator. If there’s some damping constant, the oscillation will die out and the relevant formula is 1/Q = (dW/dt)/(ω₀W) = γ/ω₀, with γ the damping constant (not to be confused with the γ we used in that PVγ = C formula).

Now, for gases, we said that, for every independent direction of motion there is, the average kinetic energy for that direction will be kT/2. I admit it’s a bit of a stretch of the imagination, but that’s really how the blackbody radiation analysis starts: our atomic oscillators will have an average kinetic energy equal to kT/2 and, hence, their total energy (kinetic and potential) should be twice that amount, according to the second remark I made above. So that’s kT. We’ll denote the total energy by W below, so we can write:

W = kT

Just to make sure we know what we’re talking about (one would forget, wouldn’t one?): kT is the product of the Boltzmann constant (1.38×10⁻²³ J/K) and the temperature of the gas (so note that the product is expressed in joule indeed). Hence, that product is the average energy of our atomic oscillators in the gas in our furnace.

Now, I am not going to repeat all of the detail we presented on atomic oscillators (I’ll refer you, once again, to Feynman) but you may or may not remember that atomic oscillators do have a Q indeed and, hence, some damping constant γ. So we can use and re-write that formula above as

dW/dt = (1/Q)(ω₀W) = (ω₀W)(γ/ω₀) = γW, which implies γ = (dW/dt)/W

What’s γ? Well, we’ve calculated the Q of an atomic oscillator already: Q = 3λ/(4πr₀), with r₀ the classical electron radius. Now, λ = 2πc/ω₀ (we just convert the wavelength into an (angular) frequency using λν = c) and γ = ω₀/Q, so we get γ = 4πr₀ω₀/[3(2πc/ω₀)] = (2/3)r₀ω₀²/c. Now, plugging that result back into the equation above, we get

dW/dt = γW = (2/3)(r₀ω₀²kT)/c
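Just to get a feel for the orders of magnitude in that γ = (2/3)r₀ω₀²/c expression, we can plug in numbers (I’m assuming r₀ is the classical electron radius and taking sodium light for ω₀ – my own choices, so treat this as a sketch only):

```python
import math

r0 = 2.82e-15                    # classical electron radius, m
c = 3.0e8                        # speed of light, m/s
omega0 = 2 * math.pi * 509e12    # angular frequency of sodium light, rad/s

gamma = (2.0 / 3.0) * r0 * omega0**2 / c   # radiation damping rate, 1/s
Q = omega0 / gamma                         # the Q of the atomic oscillator
```

The Q comes out around 5×10⁷, i.e. these atomic oscillators are extremely good resonators: they ring for tens of millions of cycles before the radiation damping kills the oscillation.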

Just in case you’d have difficulty following – I admit I did :-) – dW/dt is the average rate of radiation of light of (or near) frequency ω₀. I’ll let Feynman take over here:

Next we ask how much light must be shining on the oscillator. It must be enough that the energy absorbed from the light (and thereupon scattered) is just exactly this much. In other words, the emitted light is accounted for as scattered light from the light that is shining on the oscillator in the cavity. So we must now calculate how much light is scattered from the oscillator if there is a certain amount—unknown—of radiation incident on it. Let I(ω)dω be the amount of light energy there is at the frequency ω, within a certain range dω (because there is no light at exactly a certain frequency; it is spread all over the spectrum). So I(ω) is a certain spectral distribution which we are now going to find—it is the color of a furnace at temperature T that we see when we open the door and look in the hole. Now how much light is absorbed? We worked out the amount of radiation absorbed from a given incident light beam, and we calculated it in terms of a cross section. It is just as though we said that all of the light that falls on a certain cross section is absorbed. So the total amount that is re-radiated (scattered) is the incident intensity I(ω)dω multiplied by the cross section σ.

OK. That makes sense. I’ll not copy the rest of his story though, because this is a post in a blog, not a textbook. What we need to find is that I(ω). So I’ll refer you to Feynman for the details (these ‘details’ involve fairly complicated calculations, which are less important than the basic assumptions behind the model, which I presented above) and just write down the result:

I(ω) = ω²kT/(π²c²)

This formula is Rayleigh’s law. [And, yes, it’s the same Rayleigh – Lord Rayleigh, I should say respectfully – as the one who invented that criterion I introduced in my previous post, but so this law and that criterion have nothing to do with each other.] This ‘law’ gives the intensity, or the distribution, of light in a furnace. Feynman says it’s referred to as blackbody radiation because “the hole in the furnace that we look at is black when the temperature is zero.” […] OK. Whatever. What we call it doesn’t matter. The point is that this function tells us that the intensity goes as the square of the frequency, which means that if we have a box at any temperature at all, and if we look at it, the X- and gamma rays will be burning our eyes out ! The graph below shows both the theoretical curve for two temperatures (T₀ and 2T₀), as derived above (see the solid lines), and then the actual curves for those two temperatures (see the dotted lines).

Blackbody radiation graph

This is the so-called UV catastrophe: according to classical physics, an ideal black body at thermal equilibrium should emit radiation with infinite power. In reality, of course, it doesn’t: Rayleigh’s law is false. Utterly false. And so that’s where Planck came to the rescue, and he did so by assuming radiation is being emitted and/or absorbed in finite quanta: multiples of hν, in fact.
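To see the catastrophe in actual numbers, we can compare Rayleigh’s law with the Planck version in which kT is replaced by ħω/(eħω/kT – 1). The constants are the usual textbook values, and the two sample frequencies are my own picks:

```python
import math

hbar = 1.055e-34   # reduced Planck constant, J·s
k = 1.381e-23      # Boltzmann constant, J/K
c = 3.0e8          # speed of light, m/s

def I_rayleigh(omega, T):
    """Rayleigh's law: grows as omega^2 without bound -- the UV catastrophe."""
    return omega**2 * k * T / (math.pi**2 * c**2)

def I_planck(omega, T):
    """Planck's law: kT replaced by h_bar*omega/(exp(h_bar*omega/kT) - 1)."""
    x = hbar * omega / (k * T)
    return (omega**2 / (math.pi**2 * c**2)) * hbar * omega / math.expm1(x)

T = 300.0
# At low frequency the two laws agree...
lo = I_planck(1e12, T) / I_rayleigh(1e12, T)
# ...but in the ultraviolet Planck's curve "comes down again".
hi = I_planck(1e16, T) / I_rayleigh(1e16, T)
```

At ω = 10¹² rad/s the Planck and Rayleigh intensities are practically identical; at ω = 10¹⁶ rad/s the Planck intensity is a vanishingly small fraction of the (absurd) Rayleigh value. No X-rays burning our eyes out.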

Indeed, Planck studied the actual curve and fitted it with another function. That function assumed that the average energy of a harmonic oscillator was not just proportional to the temperature (T), but that it was also a function of the (natural) frequency of the oscillators. By fiddling around, he found a simple derivation for it which involved a very peculiar assumption. That assumption was that the harmonic oscillator can take up energies only ħω at a time, as shown below.

Equally spaced energy levels

Hence, the assumption is that the harmonic oscillators cannot take on just any (continuous) energy level. No. The allowable energy levels of the harmonic oscillators are equally spaced: Eₙ = nħω. Now, the actual derivation is at least as complex as the derivation of Rayleigh’s law, so I won’t do it here. Let me just give you the key assumptions:

  1. The gas consists of a large number of atomic oscillators, each with their own natural frequency ω0.
  2. The permitted energy levels of these harmonic oscillators are equally spaced, ħω₀ apart.
  3. The probability of occupying a level of energy E is P(E) ∝ e−E/kT.

All the rest is tedious calculation, including the calculation of the parameters of the model, which include ħ (and, hence, h, because h = 2πħ) and are found by matching the theoretical curves to the actual curves as measured in experiments. I’ll just mention one result, and that’s the average energy of these oscillators:

〈E〉 = ħω/(eħω/kT – 1)

As you can see, the average energy depends not only on the temperature T, but also on the (natural) frequency of the oscillators. So… Now you know where h comes from. As I relied so heavily on Feynman’s presentation here, I’ll include the link. As Feynman puts it: “This, then, was the first quantum-mechanical formula ever known, or ever discussed, and it was the beautiful culmination of decades of puzzlement. Maxwell knew that there was something wrong, and the problem was, what was right? Here is the quantitative answer of what is right instead of kT.”

So there you go. Now you know. :-) Oh… And in case you’d wonder: why the h? Well… Not sure. It’s said the h stands for Hilfsgrösse, so that’s some constant which was just supposed to help him out with the calculation. At that time, Planck did not suspect it would turn out to be one of the most fundamental physical constants. :-)

Post scriptum: I went quite far in my presentation of the basics of the kinetic theory of gases. You may wonder now: I didn’t use that theoretical PVγ = C relation, did I? And why all the fuss about photon gas? Well… That was just to introduce that PVγ = C relation, so I could note, here, in this post scriptum, that it has a similar problem. The γ exponent is referred to as the specific heat ratio of a gas, and it can be calculated theoretically as well, as we did–well… Sort of, because we skipped the actual derivation. However, the theoretical values also differ substantially from actually measured values, and the problem is the same: one should not assume a continuous value for 〈E〉. Agreement between theory and experiment can only be reached when the same assumptions as those of Planck are used: discrete, equally spaced energy levels, i.e. Eₙ = nħω. Also, the specific functional form which Planck used to resolve the blackbody radiation problem is to be used here as well. For more details, I’ll refer you to Feynman too. I can’t say this is easy to digest, but then who said it would be easy? :-)

The point to note is that the blackbody radiation problem wasn’t the only problem in the 19th century. As Feynman puts it: “One often hears it said that physicists at the latter part of the nineteenth century thought they knew all the significant physical laws and that all they had to do was to calculate more decimal places. Someone may have said that once, and others copied it. But a thorough reading of the literature of the time shows they were all worrying about something.” They were, and so Planck came up with something new. And then Einstein took it to the next level and then… Well… The rest is history. :-)

Diffraction and the Uncertainty Principle (II)

In my previous post, I derived and explained the general formula for the pattern generated by a light beam going through a slit or a circular aperture: the diffraction pattern. For light going through an aperture, this generates the so-called Airy pattern. In practice, diffraction causes a blurring of the image, and may make it difficult to distinguish two separate points, as shown below (credit for the image must go to Wikipedia again, I am afraid).


What’s actually going on is that the lens acts as a slit or, if it’s circular (which is usually the case), as an aperture indeed: the wavefront of the transmitted light is taken to be spherical or plane when it exits the lens and interferes with itself, thereby creating the ring-shaped diffraction pattern that we explained in the previous post.

The spatial resolution is also known as the angular resolution, which is quite appropriate, because it refers to an angle indeed: we know the first minimum (i.e. the first black ring) occurs at an angle θ such that sinθ = λ/L, with λ the wavelength of the light and L the lens diameter. It’s good to remind ourselves of the geometry of the situation: below we picture the array of oscillators, and so we know that the first minimum occurs at an angle such that Δ = λ. The second, third, fourth, etc. minima occur at angles such that Δ = 2λ, 3λ, 4λ, etc. However, these secondary minima do not play any role in determining the resolving power of a lens, or a telescope, or an electron microscope, etc., and so you can just forget about them for the time being.


For small angles (expressed in radians), we can use the so-called small-angle approximation and equate sinθ with θ: the error of this approximation is less than one percent for angles smaller than 0.244 radians (14°), so we have the amazingly simple result that the first minimum occurs at an angle θ such that:

θ = λ/L
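That one-percent claim for the small-angle approximation is easy to verify; a quick sketch (plain Python):

```python
import math

def relative_error(theta):
    """Relative error of approximating sin(theta) by theta (theta in radians)."""
    return (theta - math.sin(theta)) / math.sin(theta)

# At the quoted limit of 0.244 rad (14°), the error is just under one percent,
# and it shrinks rapidly for smaller angles:
print(relative_error(0.244), relative_error(0.1))
```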

Spatial resolution of a microscope: the Rayleigh criterion versus Dawes’ limit 

If we have two point sources right next to each other, they will create two Airy disks, as shown above, which may overlap. That may make it difficult to see them, in a telescope, a microscope, or whatever device. Hence, telescopes, microscopes (using light or electron beams or whatever) have a limited resolving power. How do we measure that?

The so-called Rayleigh criterion regards two point sources as just resolved when the principal diffraction maximum of one image coincides with the first minimum of the other, as shown below. If the distance is greater, the two points are (very) well resolved, and if it is smaller, they are regarded as not resolved. This angle is obviously related to the θ = λ/L angle, but it’s not the same: in fact, it’s a slightly wider angle. The analysis involved in calculating this angular resolution (for which we use the same symbol θ) is quite complicated, so I’ll skip it and just give you the result:

θ = 1.22λ/L

[Illustrations: two point sources; the Rayleigh criterion]

Note that, in this equation, θ stands for the angular resolution, λ for the wavelength of the light being used, and L is the diameter of the (aperture of) the lens. In the first of the three images above, the two points are well separated and, hence, the angle between them is well above the angular resolution. In the second, the angle between them just meets the Rayleigh criterion, and in the third the angle between them is smaller than the angular resolution and, hence, the two points are not resolved.

Of course, the Rayleigh criterion is, to some extent, a matter of judgment. In fact, an English 19th century astronomer, named William Rutter Dawes, actually tested human observers on close binary stars of equal brightness, and found they could make out the two stars within an angle that was slightly narrower than the one given by the Rayleigh criterion. Hence, for an optical telescope, you’ll also find the simple θ = λ/L formula, so that’s the formula without the 1.22 factor (of course, λ here is, once again, the wavelength of the observed light or radiation, and L is the diameter of the telescope’s primary lens). This very simple formula allows us, for example, to calculate the diameter of the telescope lens we’d need to build to separate (see) objects in space with a resolution of, for example, 1 arcsec (i.e. 1/3600 of a degree or π/648,000 of a radian). Indeed, if we filter for yellow light only, which has a wavelength of 580 nm, we find L = 580×10−9 m/(π/648,000) ≈ 0.12 m ≈ 12 cm. [Just so you know: that’s about the size of the lens aperture of a good telescope (4 or 6 inches) for amateur astronomers–just in case you’d want one. :-)]

This simplified formula is called Dawes’ limit, and you’ll often see it used instead of Rayleigh’s criterion. However, the fact that it’s exactly the same formula as our formula for the first minimum of the Airy pattern should not confuse you: angular resolution is something different.
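That back-of-the-envelope aperture calculation is easy to reproduce; a minimal sketch (plain Python, using the simple θ = λ/L formula, i.e. without the 1.22 Rayleigh factor):

```python
import math

def required_aperture(wavelength_m, resolution_rad):
    """Lens diameter needed to resolve a given angle, using the simple
    theta = lambda/L formula (no 1.22 Rayleigh factor)."""
    return wavelength_m / resolution_rad

arcsec = math.pi / 648_000  # 1 arcsec in radians (1/3600 of a degree)
L = required_aperture(580e-9, arcsec)  # yellow light, 580 nm
print(L)  # ≈ 0.12 m, i.e. about 12 cm
```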

Now, after this introduction, let me get to the real topic of this post: Heisenberg’s Uncertainty Principle according to Heisenberg.

Heisenberg’s Uncertainty Principle according to Heisenberg

I don’t know about you but, as a kid, I didn’t know much about waves and fields and all that, and so I had difficulty understanding why the resolving power of a microscope or any other magnifying device depended on the frequency or wavelength. I now know my understanding was limited because I thought the concept of the amplitude of an electromagnetic wave had some spatial meaning, like the amplitude of a water or a sound wave. You know what I mean: this false idea that an electromagnetic wave is something that sort of wriggles through space, just like a water or sound wave wriggles through its medium (water and air respectively). Now I know better: the amplitude of an electromagnetic wave measures field strength and there’s no medium (no aether). So it’s not like a wave going around some object, or making some medium oscillate. I am not ashamed to acknowledge my stupidity at the time: I am just happy I finally got it, because it helps to really understand Heisenberg’s own illustration of his Uncertainty Principle, which I’ll present now.

Heisenberg imagined a gamma-ray microscope, as shown below (I copied this from the website of the American Institute of Physics). Gamma-ray microscopes don’t exist – they’re hard to produce: you need a nuclear reactor or so :-) – but, as Heisenberg saw the development of new microscopes using higher and higher energy beams (as opposed to the 1.5-3 eV light in the visible spectrum) so as to increase the angular resolution and, hence, be able to see smaller things, he imagined one could use, perhaps, gamma rays for imaging. Gamma rays are the hardest radiation, with frequencies of 10 exahertz and more (i.e. >10^19 Hz) and, hence, energies above 100 keV (i.e. 100,000 times more than photons in the visible light spectrum, and 1000 times more than the electrons used in an average electron microscope). Gamma rays are not the result of some electron jumping from a higher to a lower energy level: they are emitted in decay processes of atomic nuclei (gamma decay). But I am digressing. Back to the main story line. So Heisenberg imagined we could ‘shine’ gamma rays on an electron and that we could then ‘see’ that electron in the microscope because some of the gamma photons would indeed end up in the microscope after their ‘collision’ with the electron, as shown below.


The experiment is described in many places elsewhere but I found these accounts often confusing, and so I present my own here. :-)

What Heisenberg basically meant to show is that this set-up would allow us to gather precise information on the position of the electron–because we would know where it was–but that, as a result, we’d lose information in regard to its momentum. Why? To put it simply: because the electron recoils as a result of the interaction. The point, of course, is to calculate the exact relationship between the two (position and momentum). In other words: what we want to do is to state the Uncertainty Principle quantitatively, not qualitatively.

Now, the animation above uses the symbol L for the γ-ray wavelength λ, which is confusing because I used L for the diameter of the aperture in my explanation of diffraction above. The animation above also uses a different symbol for the angular resolution: A instead of θ. So let me borrow the diagram used in the Wikipedia article and rephrase the whole situation.


From the diagram above, it’s obvious that, to be scattered into the microscope, the γ-ray photon must be scattered into a cone with angle ε. That angle is obviously related to the angular resolution of the microscope, which is θ = ε/2 = λ/D, with D the diameter of the aperture (i.e. the primary lens). Now, the electron could actually be anywhere, and the scattering angle could be much larger than ε, and, hence, relating D to the uncertainty in position (Δx) is not as obvious as most accounts of this thought experiment make it out to be. The thing is: if the scattering angle is larger than ε, the photon won’t reach the light detector at the end of the microscope (so that’s the flat top in the diagram above). So that’s why we can equate D with Δx: the position is known to within ±D/2, so we write Δx = D. To put it differently: the assumption here is basically that this imaginary microscope ‘sees’ an area that is approximately as large as the lens. Using the small-angle approximation (so we write sin(ε/2) ≈ ε/2 and, hence, ε/2 = λ/D), we can write:

Δx = 2λ/ε

Now, because of the recoil effect, the electron receives some momentum from the γ-ray photon. How much? Well… The situation is somewhat complicated (much more complicated than the Wikipedia article on this very same topic suggests), because the photon keeps some but also gives some of its original momentum. In fact, what’s happening really is Compton scattering: the electron first absorbs the photon, and then emits another with a different energy and, hence, also with a different frequency and wavelength. However, what we do know is that the photon’s original momentum was equal to E/c = p = h/λ. That’s just the Planck relation or, if you’d want to look at the photon as a particle, the de Broglie equation.

Now, because we’re doing an analysis in one dimension only (x), we’re only going to look at the momentum in that direction, i.e. px, and we’ll assume that all of the momentum of the photon before the interaction (or ‘collision’ if you want) was horizontal. Hence, we can write p = h/λ. After the collision, however, this momentum is spread over the electron and the scattered or emitted photon that’s going into the microscope. Let’s now imagine the two extremes:

  1. The scattered photon goes to the left edge of the lens. Hence, its horizontal momentum is negative (because it moves to the left) and the momentum p will be distributed over the electron and the photon such that p = p’ – h(ε/2)/λ’. Why the ε/2 factor? Well… That’s just trigonometry: the horizontal momentum of the scattered photon is obviously only a tiny fraction of its total momentum h/λ’, and that fraction is given by the angle ε/2.
  2. The scattered photon goes to the right edge of the lens. In that case, we write p = p” + h(ε/2)/λ”.

Now, the spread in the momentum of the electron, which we’ll simply write as Δp, is obviously equal to:

Δp = p” – p’ = [p + h(ε/2)/λ”] – [p – h(ε/2)/λ’] = h(ε/2)/λ” + h(ε/2)/λ’

That’s a nice formula, but what can we do with it? What we want is a relationship between Δx and Δp, i.e. the position and the momentum of the electron, and of the electron only. That involves another simplification, which is also dealt with very summarily – too summarily in my view – in most accounts of this experiment. So let me spell it out. The angle ε is obviously very small and, hence, we may equate λ’ and λ”. In addition, while these two wavelengths differ from the wavelength of the incoming photon, the scattered photon is, obviously, still a gamma ray and, therefore, we are probably not too far off when substituting λ for both λ’ and λ”, with λ the wavelength of the incoming γ-ray. Now, we can re-write Δx = 2λ/ε as 1/Δx = ε/(2λ). We then get:

Δp = p” – p’ = hε/(2λ”) + hε/(2λ’) ≈ hε/λ = 2h·ε/(2λ) = 2h/Δx

Now that yields ΔpΔx = 2h, which is an approximate expression of Heisenberg’s Uncertainty Principle indeed (don’t worry about the factor 2, as that’s something that comes with all of the approximations).
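The chain of substitutions can be checked numerically; a minimal sketch (plain Python; the γ-ray wavelength and cone angle are my own illustrative choices, not values from the original thought experiment):

```python
H = 6.62607015e-34  # Planck constant h (J·s)

wavelength = 1.0e-12  # illustrative gamma-ray wavelength (m)
epsilon = 0.1         # illustrative cone angle (rad)

delta_x = 2 * wavelength / epsilon   # position uncertainty: Δx = 2λ/ε
delta_p = H * epsilon / wavelength   # momentum spread: Δp = hε/λ

print(delta_x * delta_p / H)  # → 2.0, i.e. ΔxΔp = 2h, whatever λ and ε we pick
```

Note that λ and ε cancel out entirely, which is the whole point: the product of the two uncertainties is fixed at (about) h, no matter how we tune the microscope.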

A final moot point perhaps: it is obviously a thought experiment. Not only because we don’t have gamma-ray microscopes (that’s not relevant because we can effectively imagine constructing one) but because the experiment involves only one photon. A real microscope would organize a proper beam, but that would obviously complicate the analysis. In fact, it would defeat the purpose, because the whole point is to analyze one single interaction here.

The interpretation

Now how should we interpret all of this? Is this Heisenberg’s ‘proof’ of his own Principle? Yes and no, I’d say. It’s part illustration, and part ‘proof’, I would say. The crucial assumptions here are:

  1. We can analyze γ-ray photons, or any photon for that matter, as particles having some momentum, and when ‘colliding’, or interacting, with an electron, the photon will impart some momentum to that electron.
  2. Momentum is being conserved and, hence, the total (linear) momentum before and after the collision, considering both particles–i.e. (1) the incoming ray and the electron before the interaction and (2) the emitted photon and the electron that’s getting the kick after the interaction–must be the same.
  3. For the γ-ray photon, we can relate (or associate, if you prefer that term) its wavelength λ with its momentum p through the Planck relation or, what amounts to the same for photons (because they have no mass), the de Broglie relation.

Now, these assumptions are then applied to an analysis of what we know to be true from experiment, and that’s the phenomenon of diffraction, part of which is the observation that the resolving power of a microscope is limited, and that its resolution is given by the θ = λ/D equation.

Bringing it all together, then, gives us a theory which is consistent with experiment and, hence, we then assume the theory is true. Why? Well… I could start a long discourse here on the philosophy of science but, when everything is said and done, we should admit we don’t have any ‘better’ theory.

But, you’ll say: what’s a ‘better’ theory? Well… Again, the answer to that question is the subject-matter of philosophers. As for me, I’d just refer to what’s known as Occam’s razor: among competing hypotheses, we should select the one with the fewest assumptions. Hence, while more complicated solutions may ultimately prove correct, the fewer assumptions that are made, the better. Now, when I was a kid, I thought quantum mechanics was very complicated and, hence, describing it here as a ‘simple’ theory sounds strange. But that’s what it is in the end: there’s no better (read: simpler) way to describe, for example, why electrons interfere with each other, and with themselves, when sending them through one or two slits, and so that’s what all these ‘illustrations’ want to show in the end, even if you think there must be a simpler way to describe reality. As said, as a kid, I thought so too. :-)

Diffraction and the Uncertainty Principle (I)

In his Lectures, Feynman advances the double-slit experiment with electrons as the textbook example explaining the “mystery” of quantum mechanics. It shows interference–a property of waves–of ‘particles’, electrons: they no longer behave as particles in this experiment. While it obviously illustrates “the basic peculiarities of quantum mechanics” very well, I think the dual behavior of light – as a wave and as a stream of photons – is at least as good an illustration. And he could also have elaborated on the phenomenon of electron diffraction.

Indeed, the phenomenon of diffraction–light, or an electron beam, interfering with itself as it goes through one slit only–is equally fascinating. Frankly, I think it does not get enough attention in textbooks, including Feynman’s, so that’s why I am devoting a rather long post to it here.

To be fair, Feynman does use the phenomenon of diffraction to illustrate the Uncertainty Principle, both in his Lectures as well as in that little marvel, QED: The Strange Theory of Light and Matter–a must-read for anyone who wants to understand the (probability) wave function concept without any reference to complex numbers or what have you. Let’s have a look at it: light going through a slit or circular aperture, illustrated in the left-hand image below, creates a diffraction pattern, which resembles the interference pattern created by an array of oscillators, as shown in the right-hand image.

[Illustrations: diffraction of a particle wave (left); a line of oscillators (right)]

Let’s start with the right-hand illustration, which illustrates interference, not diffraction. We have eight point sources of electromagnetic radiation here (e.g. radio waves, but it can also be higher-energy light) in an array of length L. λ is the wavelength of the radiation that is being emitted, and α is the so-called intrinsic relative phase–or, to put it simply, the phase difference. We assume α is zero here, so the array produces a maximum in the direction θout = 0, i.e. perpendicular to the array. There are also weaker side lobes. That’s because the distance between the array and the point where we are measuring the intensity of the emitted radiation does result in a phase difference, even if the oscillators themselves have no intrinsic phase difference.

Interference patterns can be complicated. In the set-up below, for example, we have an array of oscillators producing not just one but many maxima. In fact, the array consists of just two sources of radiation, separated by 10 wavelengths.

[Illustration: interference pattern of two dipole radiators]

The explanation is fairly simple. Once again, the waves emitted by the two point sources will be in phase in the east-west (E-W) direction, and so we get a strong intensity there: four times more, in fact, than what we would get if we’d just have one point source. Indeed, the waves are perfectly in sync and, hence, add up, and the factor four is explained by the fact that the intensity, or the energy of the wave, is proportional to the square of the amplitude: 2² = 4. We get the first minimum at a small angle away (the angle from the normal is denoted by ϕ in the illustration), where the phases differ by 180°, and so there is destructive interference and the intensity is zero. To be precise, if we draw a line from each oscillator to a distant point and the difference Δ in the two distances is λ/2, half an oscillation, then they will be out of phase. So this first null occurs when that happens. If we move a bit further, to the point where the difference Δ is equal to λ, then the two waves will be a whole cycle out of phase, i.e. 360°, which is the same as being exactly in phase again! And so we get many maxima (and minima) indeed.

But this post should not turn into a lesson on how to construct a radio broadcasting array. The point to note is that diffraction is usually explained using this rather simple theory on interference of waves, assuming that the slit itself is an array of point sources, as illustrated below (while the illustrations above were copied from Feynman’s Lectures, the ones below were taken from the Wikipedia article on diffraction). This is referred to as the Huygens-Fresnel Principle, and the math behind it is summarized in Kirchhoff’s diffraction formula.

[Illustrations: the Huygens-Fresnel principle (Wikipedia)]

Now, that all looks easy enough, but the illustration above triggers an obvious question: what about the spacing between those imaginary point sources? Why do we have six in the illustration above? The relation between the length of the array and the wavelength is obviously important: we get the interference pattern that we get with those two point sources above because the distance between them is 10λ. If that distance would be different, we would get a different interference pattern. But so how does it work exactly? If we’d keep the length of the array the same (L = 10λ) but we would add more point sources, would we get the same pattern?

The easy answer is yes, and Kirchhoff’s formula actually assumes we have an infinite number of point sources between the two edges of the slit: every point becomes the source of a spherical wave, and the sum of these secondary waves then yields the interference pattern. The animation below shows the diffraction pattern from a slit with a width equal to five times the wavelength of the incident wave. The diffraction pattern is the same as above: one strong central beam with weaker lobes on the sides.


However, the truth is somewhat more complicated. The illustration below shows the interference pattern for an array of length L = 10λ–so that’s like the situation with two point sources above–but with four point sources added to the two we had already. The intensity in the E–W direction is much higher, as we would expect. Adding six waves in phase yields a field strength that is six times as great and, hence, the intensity (which is proportional to the square of the field) is thirty-six times as great as compared to the intensity of one individual oscillator. Also, when we look at neighboring points, we find a minimum and then some more ‘bumps’, as before, but then, at an angle of 30°, we get a second beam with the same intensity as the central beam. Now, that’s something we do not see in the diffraction patterns above. So what’s going on here?

Six-dipole antenna

Before I answer that question, I’d like to compare with the quantum-mechanical explanation. It turns out that this question in regard to the relevance of the number of point sources also pops up in Feynman’s quantum-mechanical explanation of diffraction.

The quantum-mechanical explanation of diffraction

The illustration below (taken from Feynman’s QED, p. 55-56) presents the quantum-mechanical point of view. It is assumed that light consists of photons, and these photons can follow any path. Each of these paths is associated with what Feynman simply refers to as an arrow, but so it’s a vector with a magnitude and a direction: in other words, it’s a complex number representing a probability amplitude.

[Illustrations: many arrows (left); few arrows (right)]

In order to get the probability for a photon to travel from the source (S) to a point (P or Q), we have to add up all the ‘arrows’ to arrive at a final ‘arrow’, and then we take its (absolute) square to get the probability. The text under each of the two illustrations above speaks for itself: when we have ‘enough’ arrows (i.e. when we allow for many neighboring paths, as in the illustration on the left), then the arrows for all of the paths from S to P will add up to one big arrow, because there is hardly any difference in time between them, while the arrows associated with the paths to Q will cancel out, because the difference in time between them is fairly large. Hence, the light will not go to Q but travel to P, i.e. in a straight line.

However, when the gap is nearly closed (so we have a slit or a small aperture), then we have only a few neighboring paths, and then the arrows to Q also add up, because there is hardly any difference in time between them too. As I am quoting from Feynman’s QED here, let me quote all of the relevant paragraph: “Of course, both final arrows are small, so there’s not much light either way through such a small hole, but the detector at Q will click almost as much as the one at P ! So when you try to squeeze light too much to make sure it’s going only in a straight line, it refuses to cooperate and begins to spread out. This is an example of the Uncertainty Principle: there is a kind of complementarity between knowledge of where the light goes between the blocks and where it goes afterwards. Precise knowledge of both is impossible.” (Feynman, QED, p. 55-56).

Feynman’s quantum-mechanical explanation is obviously more ‘true’ than the classical explanation, in the sense that it corresponds to what we know is true from all of the 20th century experiments confirming the quantum-mechanical view of reality: photons are weird ‘wavicles’ and, hence, we should indeed analyze diffraction in terms of probability amplitudes, rather than in terms of interference between waves. That being said, Feynman’s presentation is obviously somewhat more difficult to understand and, hence, the classical explanation remains appealing. In addition, Feynman’s explanation triggers a similar question as the one I had on the number of point sources. Not enough arrows!? What do we mean with that? Why can’t we have more of them? What determines their number?

Let’s first look at their direction. Where does that come from? Feynman is a wonderful teacher here. He uses an imaginary stopwatch to determine their direction: the stopwatch starts timing at the source and stops at the destination. But all depends on the speed of the stopwatch hand of course. So how fast does it turn? Feynman is a bit vague about that but notes that “the stopwatch hand turns around faster when it times a blue photon than when it times a red photon.” In other words, the speed of the stopwatch hand depends on the frequency of the light: blue light has a higher frequency (645 THz) and, hence, a shorter wavelength (465 nm) than red light, for which f = 455 THz and λ = 660 nm. Feynman uses this to explain the typical patterns of red, blue, and violet (separated by borders of black) when one shines red and blue light on a film of oil or, more generally, the phenomenon of iridescence, as shown below.


As for the size of the arrows, their length is obviously subject to a normalization condition, because all probabilities have to add up to 1. But what about their number? We didn’t answer that question–yet.

The answer, of course, is that the number of arrows and their size are obviously related: we associate a probability amplitude with every way an event can happen, and the (absolute) squares of all these probability amplitudes have to add up to 1. Therefore, if we would identify more paths, we would have more arrows, but they would have to be smaller. The end result would be the same though: when the slit is ‘small enough’, the arrows representing the paths to Q would not cancel each other out and, hence, we’d have diffraction.
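That insensitivity to the number of arrows can be illustrated numerically: if each of n paths across the slit contributes an arrow of length 1/n, the final arrow barely changes as n grows. A minimal sketch (plain Python; the geometry – a slit a few wavelengths wide, a far-field detection angle – is my own illustrative choice, not Feynman’s):

```python
import cmath
import math

def final_arrow_sq(n_paths, slit_width_wl, theta):
    """Square of the final 'arrow' obtained by summing n_paths equal arrows
    of length 1/n_paths, one per path across a slit of the given width
    (in wavelengths), observed at angle theta from the normal."""
    total = 0.0 + 0.0j
    for k in range(n_paths):
        y = (k / (n_paths - 1) - 0.5) * slit_width_wl  # path position across the slit
        phase = 2 * math.pi * y * math.sin(theta)      # 'stopwatch' angle for this path
        total += cmath.exp(1j * phase) / n_paths
    return abs(total) ** 2

# More (smaller) arrows, essentially the same final arrow:
print(final_arrow_sq(100, 5.0, 0.1), final_arrow_sq(1000, 5.0, 0.1))
```

So the discretization is just a computational device: the physics sits in the phases, not in how finely we chop up the slit.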

You’ll say: Hmm… OK. I sort of see the idea, but how do you explain that pattern–the central beam and the smaller side lobes, and perhaps a second beam as well? Well… You’re right to be skeptical. In order to explain the exact pattern, we need to analyze the wave functions, and that requires a mathematical approach rather than the type of intuitive approach which Feynman uses in his little QED booklet. Before we get started on that, however, let me give another example of such intuitive approach.

Diffraction and the Uncertainty Principle

Let’s look at that very first illustration again, which I’ve copied, for your convenience, again below. Feynman uses it (III-2-2) to (a) explain the diffraction pattern which we observe when we send electrons through a slit and (b) to illustrate the Uncertainty Principle. What’s the story?

Well… Before the electrons hit the wall or enter the slit, we have more or less complete information about their momentum, but nothing on their position: we don’t know where they are exactly, and we also don’t know if they are going to hit the wall or go through the slit. So they can be anywhere. However, we do know their energy and momentum. That momentum is horizontal only, as the electron beam is normal to the wall and the slit. Hence, their vertical momentum is zero–before they hit the wall or enter the slit that is. We’ll denote their (horizontal) momentum, i.e. the momentum before they enter the slit, as p0.

[Illustration: diffraction of a particle wave (Feynman, III-2-2)]

Now, if an electron happens to go through the slit, and we know it did because we detected it on the other side, then we know its vertical position (y) at the slit itself with considerable accuracy: that position will be the center of the slit ±B/2. So the uncertainty in position (Δy) is of the order B, so we can write: Δy = B. However, according to the Uncertainty Principle, we cannot have precise knowledge about both its position and its momentum. In addition, from the diffraction pattern itself, we know that the electron acquires some vertical momentum. Indeed, some electrons just go straight, but many stray a bit away from the normal. From the interference pattern, we know that the vast majority stays within an angle Δθ, as shown in the plot. Hence, plain trigonometry allows us to write the spread in the vertical momentum py as p0Δθ, with p0 the horizontal momentum. So we have Δpy = p0Δθ.

Now, what is Δθ? Well… Feynman refers to the classical analysis of the phenomenon of diffraction (which I’ll reproduce in the next section) and notes, correctly, that the first minimum occurs at an angle such that the waves from one edge of the slit have to travel one wavelength farther than the waves from the other side. The geometric analysis (which, as noted, I’ll reproduce in the next section) shows that that angle is equal to the wavelength divided by the width of the slit, so we have Δθ = λ/B. So now we can write:

Δpy = p0Δθ = p0λ/B

That shows that the uncertainty in regard to the vertical momentum is, indeed, inversely proportional to the uncertainty in regard to its position (Δy), which is the slit width B. But we can go one step further. The de Broglie relation relates wavelength to momentum: λ = h/p. What momentum? Well… Feynman is a bit vague on that: he equates it with the electron’s horizontal momentum, so he writes λ = h/p0. Is this correct? Well… Yes and no. The de Broglie relation associates a wavelength with the total momentum, but then it’s obvious that most of the momentum is still horizontal, so let’s go along with this. What about the wavelength? What wavelength are we talking about here? It’s obviously the wavelength of the complex-valued wave function–the ‘probability wave’ so to say.
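To get a feel for the numbers, here is a minimal sketch computing the de Broglie wavelength of a (non-relativistic) electron and the resulting spread Δθ = λ/B; the electron energy and slit width are my own illustrative choices:

```python
import math

H = 6.62607015e-34      # Planck constant (J·s)
M_E = 9.1093837015e-31  # electron mass (kg)
EV = 1.602176634e-19    # 1 eV in joules

def de_broglie_wavelength(kinetic_energy_ev):
    """Non-relativistic de Broglie wavelength: lambda = h/p with p = sqrt(2mE)."""
    p = math.sqrt(2 * M_E * kinetic_energy_ev * EV)
    return H / p

lam = de_broglie_wavelength(100.0)  # a 100 eV electron
B = 10e-9                           # illustrative slit width: 10 nm
print(lam, lam / B)  # λ ≈ 1.2×10−10 m, so Δθ = λ/B ≈ 0.012 rad
```

Note how short the wavelength is (about an ångström), which is exactly why electron beams resolve so much more detail than visible light.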

OK. So, what’s next? Well… Now we can write that Δpy = p0Δθ = p0λ/B = p0(h/p0)/B. Of course, the p0 factor vanishes and, hence, bringing B to the other side and substituting for Δy = B yields the following grand result:

Δy·Δpy = h

Wow ! Did Feynman ‘prove’ Heisenberg’s Uncertainty Principle here?

Well… No. Not really. First, the ‘proof’ above actually assumes there’s fundamental uncertainty as to the position and momentum of a particle (so it actually assumes some uncertainty principle from the start), and then it derives it from another fundamental assumption, i.e. the de Broglie relation, which is obviously related to the Uncertainty Principle. Hence, all of the above is only an illustration of the Uncertainty Principle. It’s no proof. As far as I know, one can’t really ‘prove’ the Uncertainty Principle: it’s a fundamental assumption which, if accepted, makes our observations consistent with the theory that is based on it, i.e. quantum or wave mechanics.

Finally, note that everything that I wrote above also takes the diffraction pattern as a given and, hence, while all of the above indeed illustrates the Uncertainty Principle, it’s not an explanation of the phenomenon of diffraction as such. For such explanation, we need a rigorous mathematical analysis, and that’s a classical analysis. Let’s go for it!

Going from six to n oscillators

The mathematics involved in analyzing diffraction and/or interference is actually quite tricky. If you’re alert, you will have noticed that I used two illustrations that both have six oscillators but whose interference patterns don’t match. I’ve reproduced them below. The illustration on the right-hand side has six oscillators and shows a second beam besides the central one–and, of course, there’s such a beam 30° higher as well, so we have (at least) three beams with the same intensity here–while the animation on the left-hand side shows only one central beam. So what’s going on here?

Six-dipole antenna Huygens_Fresnel_Principle

The answer is that, in the particular example on the left-hand side, the successive dipole radiators (i.e. the point sources) are separated by a distance of two wavelengths (2λ). In that case, it is actually possible to find an angle where the path difference δ between successive dipoles is exactly one wavelength (note the little δ in the illustration, as measured from the second point source), so that the effects from all of them are in phase again. Each one is then delayed relative to the next one by 360 degrees, so they all come back in phase, and we get another strong beam in that direction! In this case, the other strong beam makes an angle of 30 degrees with the E-W line. If we were to put in some more oscillators, so as to ensure that they are all closer than one wavelength apart, then this cannot happen. And so it’s not happening with light. :-) But now that we’re here, I’ll just quickly note that this is an interesting and useful phenomenon, exploited in diffraction gratings, but I’ll refer you to the literature on that, as I shouldn’t be bothering you with all these digressions. So let’s get back to it.

In fact, let me skip the nitty-gritty of the detailed analysis (I’ll refer you to Feynman’s Lectures for that) and just present the grand result for n oscillators, as depicted below:

n oscillators

This, indeed, shows the diffraction pattern we are familiar with: one strong maximum separated from successive smaller ones (note that the dotted curve magnifies the actual curve by a factor of 10). The vertical axis shows the intensity, expressed as a fraction of the maximum intensity, which is n²I0 (I0 being the intensity we would observe if there were only one oscillator). As for the horizontal axis, the variable there is really ϕ, although we re-scale it so as to get 1, 2, 3, etcetera for the first, second, third, etcetera minimum. This ϕ is the phase difference. It consists of two parts:

  1. The intrinsic relative phase α, i.e. the difference in phase between one oscillator and the next. This is assumed to be zero in all of the examples of diffraction patterns above, but the mathematical analysis here is somewhat more general.
  2. The phase difference which results from the fact that we are observing the array in a given direction θ from the normal. Now that‘s the interesting bit, and it’s not so difficult to show that this additional phase is equal to 2πd·sinθ/λ, with d the distance between two oscillators, λ the wavelength of the radiation, and θ the angle from the normal.

In short, we write:

ϕ = α + 2πd·sinθ/λ

Now, because I’ll have to use the variables below in the analysis that follows, I’ll quickly also reproduce the geometry of the set-up (all illustrations here taken from Feynman’s Lectures): 


Before I proceed, please note that we assume that d is less than λ, so we only have one great maximum: the so-called zero-order beam, centered at θ = 0. In order to get subsidiary great maxima (referred to as first-order, second-order, etcetera beams in the context of diffraction gratings), the spacing d of the array must be greater than one wavelength, but that’s not relevant for what we’re doing here, which is to move from a discrete analysis to a continuous one.

Before we do that, let’s look at that curve again and analyze where the first minimum occurs. If we assume that α = 0 (no intrinsic relative phase), then the first minimum occurs when ϕ = 2π/n. Using the ϕ = α + 2πd·sinθ/λ formula, we get 2πd·sinθ/λ = 2π/n, or nd·sinθ = λ. What does that mean? Well, nd is the total length L of the array, so we have nd·sinθ = L·sinθ = Δ = λ. What that means is that we get the first minimum when Δ is equal to one wavelength.

Now why do we get a minimum when Δ = λ? Because the contributions of the various oscillators are then uniformly distributed in phase from 0° to 360°. What we’re doing, once again, is adding arrows in order to get a resultant arrow AR, as shown below for n = 6. At the first minimum, the arrows go around a whole circle: we are adding equal vectors in all directions, and such a sum is zero. So when we have an angle θ such that Δ = λ, we get the first minimum. [Note that simple trigonometry implies that sinθ must be equal to λ/L, a fact which we used in the quantum-mechanical analysis of electron diffraction above.]

Adding waves
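In fact, the arrow addition is easy to check numerically. Here’s a quick Python sketch: at ϕ = 2π/n, the sum of the n unit phasors vanishes, while at any other phase difference the squared length of the resultant matches the closed-form factor sin²(nϕ/2)/sin²(ϕ/2) that appears in Feynman’s n-oscillator result:

```python
import cmath
import math

n = 6
phi = 2 * math.pi / n   # phase difference at the first minimum

# Add the n arrows (unit phasors) tip to tail:
A_R = sum(cmath.exp(1j * k * phi) for k in range(n))
print(abs(A_R))         # ~0: the six arrows close a full circle

# At some other phase difference, compare with the closed-form factor:
phi2 = 0.3
A = sum(cmath.exp(1j * k * phi2) for k in range(n))
factor = (math.sin(n * phi2 / 2) / math.sin(phi2 / 2)) ** 2
print(abs(A) ** 2, factor)   # the two numbers agree
```

So the ‘arrows going around a whole circle’ picture and the algebraic formula are really one and the same thing.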

What about the second minimum? Well… That occurs when ϕ = 4π/n. Using the ϕ = 2πd·sinθ/λ formula again, we get 2πd·sinθ/λ = 4π/n, or nd·sinθ = 2λ. So we get nd·sinθ = L·sinθ = Δ = 2λ: the second minimum occurs at an angle θ such that Δ = 2λ. For the third minimum, we have ϕ = 6π/n, so 2πd·sinθ/λ = 6π/n, or nd·sinθ = 3λ: the third minimum occurs at an angle θ such that Δ = 3λ. And so on and so on.

The point to note is that the diffraction pattern depends only on the wavelength λ and the total length L of the array, which is, of course, the width of the slit. Hence, we can actually extend the analysis by letting n go from some fixed value to infinity, and we’ll find that we still have only one great maximum, with a lot of side lobes that are much, much smaller, the minima occurring at angles such that Δ = λ, 2λ, 3λ, etcetera.
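To make that tangible, here’s a quick Python sketch that computes the angles of the first few minima; the wavelength and slit width below are arbitrary illustration values, not taken from any particular experiment:

```python
import math

lam = 600e-9   # wavelength (m), an illustration value
L = 6e-6       # total length of the array, i.e. the slit width (m)

# Minima occur at angles where the path difference Δ = L·sinθ = kλ:
for k in (1, 2, 3):
    theta = math.asin(k * lam / L)
    print(k, round(math.degrees(theta), 2))   # 5.74°, 11.54°, 17.46°
```

Halve L and those angles roughly double: a narrower slit spreads the pattern out, which is the inverse proportionality we exploited in the uncertainty argument above.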

OK. What’s next? Well… Nothing. That’s it. I wanted to do a post on diffraction, and so that’s what I did. However, to wrap it all up, I’ll just include two more images from Wikipedia. The one on the left shows the diffraction pattern a red laser beam makes on a plate after passing through a small circular hole in another plate. The pattern is quite clear. On the right-hand side, we have the diffraction pattern generated by ordinary white light going through a hole. In fact, it’s a computer-generated image, and the gray-scale intensities have been adjusted to enhance the brightness of the outer rings, because we would not be able to see them otherwise.

283px-Airy-pattern 600px-Laser_Interference

But… Didn’t I say I would write about diffraction and the Uncertainty Principle? Yes. And I admit I did not write all that much about the Uncertainty Principle above. But so I’ll do that in my next post, in which I intend to look at Heisenberg’s own illustration of the Uncertainty Principle. That example involves a good understanding of the resolving power of a lens or a microscope, and such understanding also involves some serious mathematical analysis. However, as this post has become way too long already, I’ll leave that to the next post indeed. I’ll use the left-hand image above for that, so have a good look at it. In fact, let me quickly quote Wikipedia as an introduction to my next post:

The diffraction pattern resulting from a uniformly illuminated circular aperture has a bright region in the center, known as the Airy disk, which, together with the series of concentric bright rings around it, is called the Airy pattern.

We’ll need it in order to define the resolving power of a microscope, which is essential to understanding Heisenberg’s illustration of the Principle he himself advanced. But let me stop here, as it’s the topic of my next write-up indeed. This post has become way too long already. :-)

Photons as strings

In my previous post, I explored, somewhat jokingly, the grey area between classical physics and quantum mechanics: light as a wave versus light as a particle. I did so by trying to picture a photon as an electromagnetic transient traveling through space, as illustrated below. While actual physicists would probably deride my attempt to think of a photon as an electromagnetic transient traveling through space, the idea illustrates the wave-particle duality quite well, I feel.

Photon wave

Understanding light is the key to understanding physics. Light is a wave, as Thomas Young proved to the Royal Society of London in 1803, thereby demolishing Newton’s corpuscular theory. But its constituents, photons, behave like particles. According to modern-day physics, both were right. Just to put things in perspective, the thickness of the note card which Young used to split the light – ordinary sunlight entering his room through a pinhole in a window shutter – was 1/30 of an inch, or approximately 0.85 mm. Hence, in essence, this is a double-slit experiment with the two slits being separated by a distance of almost 1 millimeter. That’s enormous as compared to modern-day engineering tolerance standards: what was thin then, is obviously not considered to be thin now. Scale matters. I’ll come back to this.

Young’s experiment (from

Young experiment

The table below shows that the ‘particle character’ of electromagnetic radiation becomes apparent when its frequency reaches a few hundred terahertz, like the sodium light example I used in my previous post: sodium light, as emitted by sodium lamps, has a frequency of 500×10¹² oscillations per second and, therefore (the relation between frequency and wavelength is very straightforward: their product is the velocity of the wave, so for light we have the simple λf = c equation), a wavelength of 600 nanometer (600×10⁻⁹ meter).

Electromagnetic spectrum
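As an aside, the λf = c arithmetic from the sodium example is easily checked:

```python
c = 299792458.0   # speed of light (m/s)
f = 500e12        # sodium light frequency (Hz), as quoted above

lam = c / f       # λ = c/f
print(lam)        # ≈ 6.0e-7 m, i.e. the 600 nanometer mentioned above
```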

However, whether something behaves like a particle or a wave also depends on our measurement scale: 0.85 mm was thin in Young’s time, and so it was a delicate experiment then but now, it’s a standard classroom experiment indeed. The theory of light as a wave would hold until more delicate equipment refuted it. Such equipment came with another sense of scale. It’s good to remind oneself that Einstein’s “discovery of the law of the photoelectric effect”, which explained the photoelectric effect as the result of light energy being carried in discrete quantized packets of energy, now referred to as photons, goes back to 1905 only, and that the experimental apparatus which could measure it was not much older. So waves behave like particles if we look at them close enough. Conversely, particles behave like waves if we look at them close enough. So there is this zone where they are neither, the zone for which we invoke the mathematical formalism of quantum mechanics or, to put it more precisely, the formalism of quantum electrodynamics: that “strange theory of light and Matter”, as Feynman calls it.

Let’s have a look at how particles became waves. It should not surprise us that the experimental apparatus needed to confirm that electrons–or matter in general–can actually behave like a wave is more recent than the 19th-century apparatuses which led Einstein to develop his ‘corpuscular’ theory of light (i.e. the theory of light as photons). The engineering tolerances involved are daunting. Let me be precise here. To be sure, the phenomenon of electron diffraction (i.e. electrons going through one slit and producing a diffraction pattern on the other side) had been confirmed experimentally already in 1927, in the famous Davisson-Germer experiment. I mention this because it’s rather famous indeed. First, because electron diffraction was a weird thing to contemplate at the time. Second, because it confirmed the de Broglie hypothesis only three years after Louis de Broglie had advanced it. And, third, because Davisson and Germer had never intended to set it up to detect diffraction: it was pure coincidence. In fact, the observed diffraction pattern was the result of a laboratory accident, and Davisson and Germer weren’t aware of other, deliberate, attempts to prove the de Broglie hypothesis. :-) […] OK. I am digressing. Sorry. Back to the lesson.

The nanotechnology that was needed to confirm Feynman’s 1965 thought experiment on electron interference – i.e. electrons going through two slits and interfering with each other (rather than producing some diffraction pattern as they go through one slit only) and, equally significant as an experimental result, with themselves as they go through the slit(s) one by one! – was only developed over the past decades. In fact, it was only in 2008 (and again in 2012) that the experiment was carried out exactly the way Feynman describes it in his Lectures.

It is useful to think of what such experiments entail from a technical point of view. Have a look at the illustration below, which shows the set-up. The insert in the upper-left corner shows the two slits which were used in the 2012 experiment: they are each 62 nanometer wide – that’s 62×10⁻⁹ m! – and the distance between them is 272 nanometer, or 0.272 micrometer. [Just to be complete: they are 4 micrometer tall (4×10⁻⁶ m), and the thing in the middle of the slits is just a little support (150 nm) to make sure the slit width doesn’t vary.]

The second inset (in the upper-right corner) shows the mask that can be moved to close one or both slits partially or completely. The mask is 4.5 µm wide and 20 µm tall. Please do take a few seconds to contemplate the technology behind this feat: a nanometer is a millionth of a millimeter, so that’s a billionth of a meter, and a micrometer is a millionth of a meter. To imagine how small a nanometer is, you should imagine dividing one millimeter by ten, and then one of those tenths by ten again, and again, and again, and again, and once more: six divisions in all. In fact, you actually cannot imagine that, because we live in the world we live in and, hence, our mind is used only to addition (and subtraction) when it comes to comparing sizes and – to a much more limited extent – to multiplication (and division): our brain is, quite simply, not wired to deal with exponentials and, hence, it can’t really ‘imagine’ these incredible (negative) powers. So don’t think you can imagine it really, because one can’t: in our mind, these scales exist only as mathematical constructs. They don’t correspond to anything we can actually make a mental picture of.

Electron double-slit set-up

The electron beam consisted of electrons with an (average) energy of 600 eV. That’s not an awful lot: 8.5 times more than the energy of an electron in orbit in an atom, whose energy would be some 70 eV, so the acceleration before they went through the slits was relatively modest. I’ve calculated the corresponding de Broglie wavelength of these electrons in another post (Re-Visiting the Matter-Wave, April 2014), using the de Broglie equations: λ = h/p and f = E/h. And, of course, you could just google the article on the experiment and read about it, but it’s a good exercise, and actually quite simple: just note that you’ll need to express the energy in joule (not in eV) to get it right. Also note that you need to include the rest mass of the electron in the energy. I’ll let you try it (or else just go to that post of mine). You should find a de Broglie wavelength of 50 picometer for these electrons, so that’s 50×10⁻¹² m. While that wavelength is less than a thousandth of the slit width (62 nm), and about 5,500 times smaller than the space between the two slits (272 nm), the interference effect was unambiguous in the experiment. I advise you to google the results yourself (or read that April 2014 post of mine if you want a summary): the experiment was done at the University of Nebraska-Lincoln in 2012.
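If you don’t want to dig up that post, here’s the calculation as a quick Python sketch. It uses the relativistic energy-momentum relation E² = (pc)² + (mc²)², so the rest mass is included, as noted above:

```python
import math

h = 6.62607015e-34     # Planck's constant (J·s)
c = 299792458.0        # speed of light (m/s)
eV = 1.602176634e-19   # one electronvolt in joules
me = 9.1093837015e-31  # electron rest mass (kg)

K = 600 * eV                 # kinetic energy of the beam electrons
E = K + me * c**2            # total energy, rest mass included
p = math.sqrt(E**2 - (me * c**2)**2) / c   # relativistic momentum
lam = h / p                  # de Broglie wavelength: λ = h/p

print(lam * 1e12)            # ≈ 50 picometer
```

At 600 eV, the relativistic correction is tiny: the non-relativistic p = √(2mK) gives virtually the same 50 pm.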

Electrons and X-rays

To put everything in perspective: 50 picometer is like the wavelength of X-rays, and you can google similar double-slit experiments for X-rays: they also lose their ‘particle behavior’ when we look at them at this tiny scale. In short, scale matters, and the boundary between ‘classical physics’ (electromagnetics) and quantum physics (wave mechanics) is not clear-cut. If anything, it depends on our perspective, i.e. what we can measure, and we seem to be shifting that boundary constantly. In what direction?

Downwards obviously: we’re devising instruments that measure stuff at smaller and smaller scales, and what’s happening is that we can ‘see’ typical ‘particles’, including hard radiation such as gamma rays, as local wave trains. Indeed, the next step is clear-cut evidence for interference between gamma rays.

Energy levels of photons

We would not associate low-frequency electromagnetic waves, such as radio or radar waves, with photons. But light in the visible spectrum, yes. Obviously. […]

Isn’t that an odd dichotomy? If we see that, on a smaller scale, particles start to look like waves, why would the reverse not be true? Why wouldn’t we analyze radio or radar waves, on a much larger scale, as a stream of very (I must say extremely) low-energy photons? I know the idea sounds ridiculous, because the energies involved would be ridiculously low indeed. Think about it. The energy of a photon is given by the Planck relation: E = hf = hc/λ. For visible light, with wavelengths ranging from 800 nm (red) to 400 nm (violet or indigo), the photon energies range between 1.5 and 3 eV. Now, the shortest wavelengths for radar waves are in the so-called millimeter band, i.e. they range from 1 mm to 1 cm. A wavelength of 1 mm corresponds to a photon energy of 0.00124 eV. That’s close to nothing, of course, and surely not the kind of energy level that we can currently detect.
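The arithmetic is easy enough to check. A quick Python sketch, using E = hc/λ:

```python
h = 6.62607015e-34     # Planck's constant (J·s)
c = 299792458.0        # speed of light (m/s)
eV = 1.602176634e-19   # one electronvolt in joules

def photon_energy_eV(lam):
    """Photon energy E = hf = hc/λ, expressed in electronvolt."""
    return h * c / (lam * eV)

print(photon_energy_eV(800e-9))   # red light:  ~1.55 eV
print(photon_energy_eV(400e-9))   # violet:     ~3.1 eV
print(photon_energy_eV(1e-3))     # 1 mm 'radar photon': ~0.00124 eV
```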

But you get the idea: there is a grey area between classical physics and quantum mechanics, and it’s our equipment–notably the scale of our measurements–that determines where that grey area begins and where it ends, and the grey area seems to become larger and larger as the sensitivity of our equipment improves.

What do I want to get at? Nothing much. Just some awareness of scale, as an introduction to the actual topic of this post, and that’s some thoughts on a rather primitive string theory of photons. What !? 

Yes. Purely speculative, of course. :-)

Photons as strings

I think my calculations in the previous post, as primitive as they were, actually provide quite some food for thought. If we treat a photon in the sodium light band (i.e. the light emitted by sodium, from a sodium lamp for instance) just like any other electromagnetic pulse, we find it’s a pulse no less than 3 meters long. While I know we have to treat such a photon as an elementary particle, I find it very tempting to think of it as a vibrating string.


Yes. Let me copy that graph again. The assumption I started with is a standard one in physics, and not something you’d want to argue with: photons are emitted when an electron jumps from a higher to a lower energy level and, for all practical purposes, this emission can be analyzed as the emission of an electromagnetic pulse by an atomic oscillator. I’ll refer you to my previous post – as silly as it is – for the details on these basics: the atomic oscillator has a Q, so there’s damping involved and, hence, the assumption that the electromagnetic pulse resembles a transient should not sound ridiculous. Because the electric field as a function in space is the ‘reversed’ image of the oscillation in time, there’s nothing blasphemous about the suggested shape.

Photon wave

Just go along with it for a while. First, we need to remind ourselves that what’s vibrating here is nothing physical: it’s an oscillating electromagnetic field. That being said, in my previous post, I toyed with the idea that the oscillation could actually also represent the photon’s wave function, provided we use a unit for the electric field that ensures that the area under the squared curve adds up to one, so as to normalize the probability amplitudes. Hence, I suggested that the field strength over the length of this string could actually represent the probability amplitudes, provided we choose an appropriate unit to measure the electric field.

But then I was joking, right? Well… No. Why not consider it? An electromagnetic oscillation packs energy, and the energy is proportional to the square of the amplitude of the oscillation. Now, the probability of detecting a particle is related to its energy, and such probability is calculated from taking the (absolute) square of probability amplitudes. Hence, mathematically, this makes perfect sense.
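The normalization itself is a trivial operation. Here’s a quick Python sketch using a toy damped sinusoid as a stand-in for the field strength; the wavelength and decay length are arbitrary illustration values, not a claim about any actual photon:

```python
import math

# A toy transient standing in for the field strength E(x) along the pulse:
def field(x, lam=1.0, decay=3.0):
    return math.exp(-x / decay) * math.sin(2 * math.pi * x / lam)

# Numerically integrate the squared field over the length of the pulse...
dx = 1e-3
xs = [k * dx for k in range(int(30 / dx))]
norm = sum(field(x) ** 2 * dx for x in xs)

# ...and rescale so that the squared amplitude integrates to one,
# as a probability density must:
psi = [field(x) / math.sqrt(norm) for x in xs]
total = sum(p ** 2 * dx for p in psi)
print(total)   # 1.0, up to floating-point error
```

So choosing the ‘appropriate unit’ for the electric field really just means dividing by the square root of that integral.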

It’s quite interesting to think through the consequences, and I’ll continue to do so in the coming weeks. One interesting thing is that the field strength (i.e. the magnitude of the electric field vector) is a real number. Hence, if we equate these magnitudes with probability amplitudes, we’d have real probability amplitudes, instead of complex-valued ones. That’s not an issue: we could look at them as complex numbers of which the imaginary part is zero. In fact, an obvious advantage of equating the imaginary part of the probability amplitude with zero is that we free up a dimension which we can then use to analyze the oscillation of the electric field in the other direction that’s normal to the direction of propagation, i.e. the z-coordinate, so we could take the polarization of the light into account. The figure below–which I took from Wikipedia again (by far the most convenient place to shop for images and animations: what would I do without it?)–does not show the y- and z-coordinates of circularly or elliptically polarized light. It actually shows both the electric and the magnetic field vector associated with linearly polarized light (note that my y-direction is the x-direction here), but you get the idea: a probability wave has two spatial dimensions normal to its direction of propagation, and so we’d really have two complex-valued functions. However, if we can equate the electric field with the probability amplitude, we’d have a real-valued wave function. :-)


Another interesting thing to think about is how the collapse of the wave function would come about. If we think of a photon as a string, we can think of its energy as some kind of hook which could cause it to collapse into another lump of energy. What kind of hook? What force would come into play? Well… Perhaps all of them, as we know that even the weakest of the fundamental forces – gravity – becomes much stronger at smaller distance scales, as it’s also subject to the inverse-square law: its strength increases as the square of the distance decreases. In fact, I don’t know much–nothing at all, actually–about grand unification theories, but I understand that the prediction is that, at the Planck scale, all forces – gravity, the electromagnetic force, the strong nuclear force, and even the weak nuclear force – unify. So, why wouldn’t we think of some part of the string getting near enough to ‘something else’ (e.g. an electron) to get hooked and then collapse into the particle it is interacting with?

You must be laughing aloud now. A new string theory–really?

I know… I know… I haven’t reached sophomore level and I am already wildly speculating… Well… Yes. What I am talking about here probably has nothing to do with current string theories, although my proposed string would also replace the point-like photon by a one-dimensional ‘string’. However, ‘my’ string is, quite simply, an electromagnetic pulse (a transient actually, for reasons I explained in my previous post). Naive? Perhaps. However, I note that the earliest version of string theory is referred to as bosonic string theory, because it only incorporated bosons, which is what photons are.

So what? Well… Nothing… I am sure others have thought of this too, and I’ll look into it. It’s surely an idea which I’ll keep in the back of my head as I continue to explore physics. The idea is just too simple and beautiful to disregard, even if I am sure it must be pretty naive indeed. Photons as three-meter long strings? Let’s just forget about it. :-) Onwards !!!  :-)

The shape and size of a photon

Photons are weird. In fact, I already wrote some fairly confused posts on them. This post is probably even more confusing. If anything, it shows how easy it is to get lost when thinking things through. In any case, it did help me to make sense of it all and, hence, perhaps it will help you too.

Electrons and photons: similarities and differences

All elementary particles are weird. As Feynman puts it, in the very first paragraph of his Lectures on Quantum Mechanics: “Historically, the electron, for example, was thought to behave like a particle, and then it was found that in many respects it behaved like a wave. So it really behaves like neither. Now we have given up. We say: ‘It is like neither.’ There is one lucky break, however—electrons behave just like light. The quantum behavior of atomic objects (electrons, protons, neutrons, photons, and so on) is the same for all, they are all ‘particle waves,’ or whatever you want to call them. So what we learn about the properties of electrons will apply also to all ‘particles,’ including photons of light.” (Feynman’s Lectures, Vol. III, Chapter 1, Section 1)

I wouldn’t dare to argue with Feynman, of course… But… Photons are like electrons, and then they are not. For starters, photons do not have mass, and they are bosons, force-carrying ‘particles’ obeying very different quantum-mechanical rules, referred to as Bose-Einstein statistics. I’ve written about that in the past, so I won’t do that again here. It’s probably sufficient to remind the reader that these rules imply that the so-called Pauli exclusion principle does not apply to them: bosons like to crowd together, thereby occupying the same quantum state–unlike their counterparts, the so-called fermions or matter-particles: quarks (which make up protons and neutrons) and leptons (including electrons and neutrinos), which can’t do that: two electrons can only sit on top of each other if their spins are opposite (so that makes their quantum state different), and there’s no place whatsoever to add a third one–because there are only two possible values for the spin: up or down.

From all that I’ve been writing so far, I am sure you have some kind of picture of matter-particles now, and notably of the electron: when everything is said and done, it’s a point-like particle defined by some weird wave function–the so-called ‘probability wave’. But what about photons? They are point-like particles too, aren’t they? Hence, why wouldn’t we associate them with a probability wave too? Do they have a de Broglie wavelength?

Before answering that question, let me present that ‘picture’ of the electron once again.

The wave function for electrons

The electron ‘picture’ can be represented in a number of ways but one of the more scientifically correct ones – whatever that means – is that of a spatially confined wave function representing a complex quantity referred to as the probability amplitude. The animation below (which I took from Wikipedia) visualizes such wave functions. As you know by now, the wave function is usually represented by the Greek letter psi (ψ), and it is often referred to, as mentioned above, as a ‘probability wave’, although that term is quite misleading. Why? You surely know that by now: the wave function represents a probability amplitude, not a probability.

That being said, probability amplitude and probability are obviously related: if we square the psi function (so we square all these amplitudes), then we get the actual probability of finding that electron at point x. That’s the so-called probability density function on the right of each function. [I should be fully correct now and note that we are talking the absolute square here, or the squared norm: remember that the square of a complex number can be negative, as evidenced by the definition of i: i² = –1. In fact, if there’s only an imaginary part, then its square is always negative.]
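A quick Python sketch makes the distinction between the square and the absolute square concrete:

```python
# A probability is the absolute square ψψ* of the amplitude, not ψ²:
psi = 0.6 + 0.8j
prob = abs(psi) ** 2   # ψψ* = 0.36 + 0.64 = 1.0: a genuine probability
naive = psi ** 2       # a complex number, not a probability
print(prob, naive)     # ~1.0 and (-0.28+0.96j)

# A purely imaginary amplitude indeed squares to a negative number:
print((0.5j) ** 2)     # (-0.25+0j)
```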


Below, I’ve inserted another image, which gives a static picture (i.e. one that does not vary in time) of the wave function of an electron. To be precise: it’s the wave function for an electron in the 5d orbital of a hydrogen atom. I also took it from Wikipedia, so I’ll just copy the explanation here: “The solid body shows the places where the electron’s probability density is above a certain value (0.02), as calculated from the probability amplitude.” As for the colors, this image uses the so-called HSL color system to represent complex numbers: each complex number is represented by a unique color, with a different hue (H), saturation (S) and lightness (L). [Just google if you want to know how that works exactly.]


The Uncertainty Principle revisited

The wave function is usually given as a function in space and time: ψ = ψ(x, t). However, I should also remind you that we have a similar function in the ‘momentum space’.  Indeed, the position-space and momentum-space wave functions are related through the Uncertainty Principle. To be precise: they are Fourier transforms of each other – but don’t be put off by that statement. I’ll just quickly jot down the Uncertainty Principle once again:

σx·σp ≥ ħ/2

This is the so-called Kennard formulation of the Principle: it expresses the uncertainty (usually written as Δ) of both the position (Δx) and the momentum (Δp) in terms of the standard deviation–that’s the σ (sigma) symbol–around the mean. Hence, the assumption is that both x and p follow some kind of distribution–usually a nice “bell curve” in the textbooks. Finally, let me also remind you how tiny that physical constant ħ actually is: about 6.58×10⁻¹⁶ eV·s.
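That eV·s value is simply the SI value of ħ divided by one electronvolt. A quick check:

```python
hbar_J = 1.054571817e-34   # ħ in J·s (SI units)
eV = 1.602176634e-19       # one electronvolt in joules

hbar_eV = hbar_J / eV      # the same constant, now in eV·s
print(hbar_eV)             # ≈ 6.58e-16 eV·s, the value quoted above
```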

At this point, you may wonder about the units. A position is expressed in distance units, and momentum… Euh… […] Momentum is mass times velocity, so it’s kg·m/s. Hence, the dimension of the product on the left-hand side of the inequality is m·kg·m/s = kg·m²/s. So what about this eV·s dimension on the right-hand side? Well… The electronvolt is a unit of energy, and so we can convert it to joules. Now, a joule is a newton-meter (N·m), which is the unit for both energy and work. So how do we relate the two sides? Simple: a newton can also be expressed in SI units: 1 N = 1 kg·m/s², i.e. one newton is the force needed to give a mass of 1 kg an acceleration of 1 m/s per second. So just substitute and you’ll see the dimension on the right-hand side is kg·(m/s²)·m·s = kg·m²/s, so it comes out alright. Why this digression? Not sure. Perhaps just to remind you that the Uncertainty Principle can also be expressed in terms of energy and time:

ΔE·Δt ≥ ħ/2

That expression makes it clear the units on both sides of the inequality are, indeed, the same, but it’s not so obvious to relate the two expressions of the same Uncertainty Principle. I’ll just note that energy and time, just like position and momentum, are also so-called complementary variables in quantum mechanics. We have the same duality for the de Broglie relation, which I’ll also jot down here:

λ = h/p and f = E/h

In these two complementary equations, λ is the wavelength of the (complex-valued) de Broglie wave, and f is its frequency. A stupid question perhaps: what’s the velocity of the de Broglie wave? Well… As you should know from previous posts, the mathematically correct answer involves distinguishing the group and phase velocity of a wave, but the easy answer is: the de Broglie wave of a particle moves with the particle :-) and, hence, its velocity is, obviously, the speed of the particle, which, for electrons, is usually non-relativistic (i.e. rather slow as compared to the speed of light).

Before proceeding, I need to make some more introductory remarks. The first is that the Uncertainty Principle implies that we cannot assign a precise wavelength (or an equally precise frequency) to a de Broglie wave: if there is a spread in p (and, hence, in E), then there will be a spread in λ (and in f). That’s good, because a regular wave with an exact frequency would not give us any information about the location. Frankly, I always had a lot of trouble understanding this, so I’ll just quote the expert teacher (Feynman) on this:

“The amplitude to find a particle at a place can, in some circumstances, vary in space and time, let us say in one dimension, in this manner: ψ = A·e^i(ωt−kx), where ω is the frequency, which is related to the classical idea of the energy through E = ħω, and k is the wave number, which is related to the momentum through p = ħk. [These are equivalent formulations of the de Broglie relations using the angular frequency and the wave number instead of wavelength and frequency.] We would say the particle had a definite momentum p if the wave number were exactly k, that is, a perfect wave which goes on with the same amplitude everywhere. The ψ = A·e^i(ωt−kx) equation [then] gives the [complex-valued probability] amplitude, and if we take the absolute square, we get the relative probability for finding the particle as a function of position and time. This is a constant, which means that the probability to find a [this] particle is the same anywhere.” (Feynman’s Lectures, I-48-5)

Of course, that’s a problem: if the probability to find a particle is the same anywhere, then the particle can be anywhere, and, hence, that means it’s actually nowhere. Hence, that wave function doesn’t serve the purpose. In short, that nice ψ = A·e^i(ωt−kx) function is completely useless in terms of representing an electron, or any other actual particle moving through space. So what to do?

The Wikipedia article on the Uncertainty Principle has this wonderful animation that shows how we can superimpose several waves to form a wave packet. And so that’s what we want: a wave packet traveling through (and limited in) space. I’ve copied the animation below. You should note that it shows only one part of the complex probability amplitude: just visualize the other part (imaginary if the wave below is the real part, and vice versa if the wave below would happen to represent the imaginary part of the probability amplitude). The illustration basically depicts a mathematical operation (a Fourier analysis–and that’s not the same as that Fourier transform I mentioned above, although these two mathematical concepts obviously have a few things in common) that separates a wave packet into a finite or (potentially) infinite number of component waves. Indeed, note how, in the illustration below, the frequency increases gradually (or, what amounts to the same, the wavelength gets smaller) and, with every wave we add to the packet, it becomes increasingly localized. So it shows how the ‘uncertainty’ about (or the spread in) the frequency is inversely related to the uncertainty in position:

Δx = 1/[Δ(1/λ)] = h/Δp.
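If you want to play with that Fourier idea yourself, the superposition is easy to mimic numerically: add up plane-wave components with wave numbers spread around some central value and watch the sum localize around one point. The central wave number, the spread and the number of components below are arbitrary choices:

```python
import numpy as np

# A rough sketch of the Fourier idea: superimpose plane waves with
# wave numbers spread around k0 and watch the packet localize.
x = np.linspace(-50.0, 50.0, 2001)
k0, dk, N = 2.0, 0.5, 40                      # centre, spread, component count
ks = np.linspace(k0 - dk, k0 + dk, N)

packet = sum(np.cos(k * x) for k in ks) / N   # real part of the superposition

# The components interfere constructively near x = 0 and cancel elsewhere:
print(abs(packet[len(x) // 2]))               # = 1 at the centre
print(abs(packet[0]))                         # ≪ 1 far from the centre
```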


Frankly, I must admit both Feynman’s explanation and the animation above don’t quite convince me, because I can perfectly imagine a localized wave train with a fixed frequency and wavelength, like the one below, which I’ll re-use later. I’ve made this wave train myself: it’s a standard sine or cosine function multiplied with another easy function generating the envelope. You can easily make one like this yourself. This thing is localized in space and, as mentioned above, it has a fixed frequency and wavelength.

graph (1)
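If you’d like to reproduce that graph, here is a minimal sketch of the recipe I described: a cosine carrier with a fixed wavelength, multiplied by an easy envelope function. I’m using a Gaussian envelope here, but that’s an arbitrary choice — any bump-shaped function will do:

```python
import numpy as np

# A localized wave train: fixed-wavelength carrier times a smooth envelope.
x = np.linspace(-10.0, 10.0, 2001)
carrier = np.cos(2 * np.pi * x)          # fixed wavelength of 1 unit
envelope = np.exp(-(x / 3.0) ** 2)       # localizes the train in space

train = envelope * carrier
print(train[len(x) // 2])                # = 1 at the centre …
print(abs(train[0]))                     # … and essentially zero at the edges
```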

In any case, I need to move on. If you, my reader, would have any suggestion here (I obviously don’t quite get it here), please let me know. As for now, however, I’ll just continue wandering. Let me proceed with two more remarks:

  1. What about the amplitude of a de Broglie wave? Well… It’s just like this perceived problem with fixed wavelengths: frankly, I couldn’t find much about that in textbooks either. However, there’s an obvious constraint: when everything is said and done, all probabilities must take a value between 0 and 1, and they must also all add up to exactly 1. So that’s a so-called normalization condition that obviously imposes some constraints on the (complex-valued) probability amplitudes of our wave function.
  2. The complex waves above are so-called standing waves: their frequencies only take discrete values. You can, in fact, see that from the first animation. Likewise, there is no continuous distribution of energies. Well… Let me be precise and qualify that statement: in fact, when the electron is free, it can have any energy. It’s only when it’s bound that it must take one or another out of a set of allowed values, as illustrated below.

Energy Level Diagrams

Now, you will also remember that an electron absorbs or emits a photon when it goes from one energy level to the other, so it absorbs or emits radiation. And, of course, you will also remember that the frequency of the absorbed or emitted light is related to those energy levels. More specifically, the frequency of the light emitted in a transition from, let’s say, energy level E₃ to E₁ will be written as ν₃₁ = (E₃ − E₁)/h. This frequency will be one of the so-called characteristic frequencies of the atom and will define a specific so-called spectral emission line.

Now that’s where the confusion starts, because that ν₃₁ = (E₃ − E₁)/h equation is – from a mathematical point of view – identical to the de Broglie equation, which assigns a de Broglie wave to a particle: f = E/h. While mathematically similar, the formulas represent very different things. The most obvious remark to make is that a de Broglie wave is a matter-wave and, as such, quite obviously, that it has nothing to do with an electromagnetic wave. Let me be even more explicit. A de Broglie wave is not a ‘real’ wave, in a sense (but, of course, that’s a very unscientific statement to make); it’s a psi function, so it represents these weird mathematical quantities–complex probability amplitudes–which allow us to calculate the probability of finding the particle at position x or, if it’s a wave function for momentum-space, the probability of finding a value p for its momentum. In contrast, a photon that’s emitted or absorbed represents a ‘real’ disturbance of the electromagnetic field propagating through space. Hence, that frequency ν is something very different than f, which is why we use another symbol for it (ν is the Greek letter nu, not to be confused with the v we use for velocity). [Of course, let’s not get into a philosophical discussion on how ‘real’ an electromagnetic field is here.]

That being said, we also know light is emitted in discrete energy packets: in fact, that’s how photons were defined originally, first by Planck and then by Einstein. Now, when an electron falls from one energy level in an atom to another (lower) energy level, it emits one – and only one – photon with that particular wavelength and energy. The question then is: how should we picture that photon? Does it also have some more or less defined position and momentum? Can we also associate some wave function – i.e. a de Broglie wave – with it?

When you google for an answer to this simple question, you will get very complicated and often contradictory answers. The very short answer I got from a nuclear scientist – you can imagine that these guys don’t have the time to give a more nuanced answer to idiots like me – was simple: No. One does not associate photons with a de Broglie wave.

When he gave me that answer, I first thought: of course not. A de Broglie wave is a ‘matter wave’, and photons aren’t matter. Period. That being said, the answer is not so simple. Photons do behave like electrons, don’t they? There’s diffraction (when you send a photon through one slit) and interference (when photons go through two slits, and there’s interference even when they go through one by one, just like electrons), and so on and so on. Most importantly, the Uncertainty Principle obviously also applies to them and, hence, one would expect to be able to associate some kind of wave function with them, wouldn’t one?

Who knows? That’s why I wrote this post. I don’t know, so I thought it would be nice to just freewheel a bit on this question. So be warned: nothing of what I write below has been researched really, so critical comments and corrections from actual specialists are more than welcome.

The shape of a photon wave

The answer in regard to a photon’s position and momentum is, obviously, unambiguous: photons are, by definition, little lumps of energy indeed, and so we detect them as such and, hence, they do occupy some physical space: we detect them somewhere, and we usually do so at a rather precise point in time.

They obviously also have some momentum. In fact, I calculated that momentum in one of my previous posts (Light: Relating Waves to Photons). It was related to the magnetic field vector, which we usually never mention when discussing light – because it’s so tiny as compared to the electric field vector – but so it’s there, and a photon’s momentum (in the direction of propagation) is as tiny as the magnetic field vector for an electromagnetic wave traveling through space: p = E/c, with E = νh and c = 3×10⁸ m/s. Hence, the numerical value of a photon’s momentum is only a tiny fraction of the value of its energy. In fact, because it’s so tiny (remember that the energy of a photon is only one or two eV, and so that’s a very tiny unit, and so we have to divide that by c, which is a huge number obviously…), I’ll just forget about it here for a while and focus on that electric field vector only.
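For the record, the p = E/c arithmetic for a typical visible-light photon of 2 eV goes like this (a rough numerical sketch):

```python
# Order-of-magnitude check of p = E/c for a visible-light photon
# (2 eV is the ballpark energy used in the text).
eV = 1.602176634e-19      # joules per electronvolt
c = 2.99792458e8          # speed of light, m/s

E = 2.0 * eV              # photon energy, J
p = E / c                 # photon momentum, kg·m/s
print(p)                  # ≈ 1.07e-27 kg·m/s — tiny indeed
```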

So… A photon is, in essence, an electromagnetic disturbance and so, when trying to picture a photon, we can think of some oscillating electric field vector traveling through–and also limited in–space. In short, in the classical world – and in the classical world only of course – a photon must be some electromagnetic wave train, like the one below–perhaps.

Photon - E

Hmm… Why would it have that shape? I don’t know. Your guess is as good as mine. But you’re right to be skeptical. In fact, the wave train above has the same shape as Feynman’s representation of a particle (see below) as a ‘probability wave’ traveling through–and limited in–space. Wave train

So, what about it? Let me first remind you once again (I just can’t stress this point enough it seems) that Feynman’s representation – and most are based on his, it seems – is, once again, misleading because it suggests that ψ(x) is some real number. It’s not. In the image above, the vertical axis should not represent some real number (and it surely should not represent a probability, i.e. some real positive number between 0 and 1) but a probability amplitude, i.e. a complex number in which both the real and imaginary part are important. As mentioned above, this wave function will, of course, give you all the probabilities you need when you take its (absolute) square, but so it’s not the same: the image above gives you only one part of the complex-valued wave function (it could be either the real or the imaginary part, in fact), which is why I find it misleading.

But let me go back to the first illustration: the vertical axis of the first illustration is not ψ but E – the electric field vector. So there’s no imaginary part here: just a real number, representing the strength–or magnitude I should say– of the electric field E as a function of the space coordinate x. [Can magnitudes be negative? Of course ! In that case, the field vector points the other way !] Does this suggestion for what a photon could look like make sense? In quantum mechanics, the answer is obviously: no. But in the classical worldview? Well… Maybe. […] You should be skeptical, however. Even in the classical world, the answer is: most probably not. I know you won’t like it (because the formula doesn’t look easy), but let me remind you of the formula for the (vertical) component of E as a function of the acceleration of some charge q:

EMR law

The charge q (i.e. the source of the radiation) is, of course, our electron that’s emitting the photon as it jumps from a higher to a lower energy level (or, vice versa, absorbing it). This formula basically states that the magnitude of the electric field (E) is proportional to the acceleration (a) of the charge (with t − r/c the retarded argument). Hence, the suggested shape of E as a function of x implies that the acceleration of the electron is initially quite small, that it swings between positive and negative, that these swings then become larger and larger to reach some maximum, and then they become smaller and smaller again to then die down completely. In short, it’s like a transient wave: “a short-lived burst of energy in a system caused by a sudden change of state”, as Wikipedia defines it.

[…] Well… No. An actual transient looks more like what’s depicted below: no gradual increase in amplitude but big swings initially which then dampen to zero. In other words, if our photon is a transient electromagnetic disturbance caused by a ‘sudden burst of energy’ (which is what that electron jump is, I would think), then its representation will, much more likely, resemble a damped wave, like the one below, rather than Feynman’s picture of a moving matter-particle.

graph (1)
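A transient like the one depicted above is easy to generate yourself: an exponentially decaying factor multiplying a cosine. The damping rate γ and the frequency below are arbitrary illustration values:

```python
import numpy as np

# A minimal damped oscillation: big initial swings that die out
# exponentially (gamma is an arbitrary damping rate for the sketch).
t = np.linspace(0.0, 20.0, 2001)
gamma, omega = 0.3, 2 * np.pi            # damping rate and angular frequency

transient = np.exp(-gamma * t) * np.cos(omega * t)
print(transient[0])                      # starts at 1 …
print(abs(transient[-1]))                # … and has nearly died out at t = 20
```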

In fact, we’d have to flip the image, both vertically and horizontally, because the acceleration of the source and the field are related as shown below. The vertical flip is because of the minus sign in the formula for E(t). The horizontal flip is because of the minus sign in the (t − r/c) term, the retarded argument: if we add a little time (Δt), we get the same value for a(t − r/c) as we would have if we had subtracted a little distance: Δr = c·Δt. So that’s why E as a function of r (or of x), i.e. as a function in space, is a ‘reversed’ plot of the acceleration as a function of time.

wave in space

So we’d have something like below.

Photon wave

What does this resemble? It’s not a vibrating string (although I do start to understand the attractiveness of string theory now: vibrating strings are great as energy storage systems, so the idea of a photon being some kind of vibrating string sounds great, doesn’t it?). Nor does it resemble a bullwhip effect, because the oscillation of a whip is confined by a different envelope (see below). And, no, it’s also definitely not a trumpet.


It’s just what it is: an electromagnetic transient traveling through space. Would this be realistic as a ‘picture’ of a photon? Frankly, I don’t know. I’ve looked at a lot of stuff but didn’t find anything on this really. The easy answer, of course, is quite straightforward: we’re not interested in the shape of a photon because we know it is not an electromagnetic wave. It’s a ‘wavicle’, just like an electron.


Sure. I know that too. Feynman told me. :-) But then why wouldn’t we associate some wave function with it? Please tell me, because I really can’t find much of an answer to that question in the literature, and so that’s why I am freewheeling here. So just go along with me for a while, and come up with another suggestion. As I said above, your guess is as good as mine. All that I know is that there’s one thing we need to explain when considering the various possibilities: a photon has a very well-defined frequency (which defines its color in the visible light spectrum) and so our wave train should – in my humble opinion – also have that frequency. At least for quite a while–and then I mean most of the time, or on average at least. Otherwise the concept of a frequency – or a wavelength – doesn’t make sense. Indeed, if the photon has no defined wavelength or frequency, then it has no color, and a photon should have a color: that’s what the Planck relation is all about.

What would be your alternative? I mean… Doesn’t it make sense to think that, when jumping from one energy level to the other, the electron would initially sort of overshoot its new equilibrium position, to then overshoot it again on the other side, and so on and so on, but with an amplitude that becomes smaller and smaller as the oscillation dies out? In short, if we look at radiation as being caused by atomic oscillators, why would we not go all the way and think of them as oscillators subject to some damping force? Just think about it. :-)

The size of a photon wave

Let’s forget about the shape for a while and think about size. We’ve got an electromagnetic wave train here. So how long would it be? Well… Feynman calculated the Q of these atomic oscillators: it’s of the order of 10⁸ (see his Lectures, I-34-3: it’s a wonderfully simple exercise, and one that really shows his greatness as a physics teacher) and, hence, this wave train will last about 10⁻⁸ seconds (that’s the time it takes for the radiation to die out by a factor 1/e). That’s not very long but, taking into account the rather spectacular speed of light (3×10⁸ m/s), that still makes for a wave train with a length of 3 meters. Three meters!? Holy sh**! That’s like infinity on an atomic scale! Such a length surely doesn’t match the picture of a photon as a fundamental particle which cannot be broken up, does it? This surely cannot be right and, if it is, then there surely must be some way to break this thing up. It can’t be ‘elementary’, can it?

Well… You’re right, of course. I shouldn’t be doing these classical analyses of a photon, but then I think it actually is kind of instructive. So please do double-check, but that’s what it is, it seems. For sodium light (I am just continuing Feynman’s example), which has a frequency of 500 THz (500×10¹² oscillations per second) and a wavelength of 600 nm (600×10⁻⁹ meter), that length corresponds to some five million oscillations. All packed into one photon? One photon with a length of three meters? You must be joking, right?
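The arithmetic behind those two numbers is simple enough to check in a few lines, using the rounded values from the text:

```python
# Back-of-the-envelope numbers: a 1e-8 s burst of sodium light
# (500 THz, 600 nm) travelling at the speed of light.
c = 3.0e8                 # m/s
t_decay = 1.0e-8          # s: the 1/e decay time of the radiation
wavelength = 600e-9       # m

length = c * t_decay                    # length of the wave train
n_osc = length / wavelength             # number of oscillations it packs
print(length)                           # 3.0 m
print(n_osc)                            # ≈ 5e6 — five million oscillations
```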

Sure. I am joking here–but, as far as jokes go, this one is fairly robust from a scientific point of view, isn’t it? :-) Again, please do double-check and correct me, but all what I’ve written so far is not all that speculative. It corresponds to all what I’ve read about it: only one photon is produced per electron in any de-excitation, and its energy is determined by the number of energy levels it drops, as illustrated (for a simple hydrogen atom) below. For those who continue to be skeptical about my sanity here, I’ll quote Feynman once again:

“What happens in a light source is that first one atom radiates, then another atom radiates, and so forth, and we have just seen that atoms radiate a train of waves only for about 10⁻⁸ sec; after 10⁻⁸ sec, some atom has probably taken over, then another atom takes over, and so on. So the phases can really only stay the same for about 10⁻⁸ sec. Therefore, if we average for very much more than 10⁻⁸ sec, we do not see an interference from two different sources, because they cannot hold their phases steady for longer than 10⁻⁸ sec. With photocells, very high-speed detection is possible, and one can show that there is an interference which varies with time, up and down, in about 10⁻⁸ sec.” (Feynman’s Lectures, I-34-4)


So… Well… Now it’s up to you. I am going along here with the assumption that a photon, from a classical world perspective, should indeed be something that’s several meters long and something that packs like five million oscillations. So, while we usually measure stuff in seconds, or hours, or years, and, hence, while we would think that 10⁻⁸ seconds is short, a photon would actually be a very stretched-out transient that occupies quite a lot of space. I should also add that, in light of that scale (3 meters), the damping seems to happen rather slowly!

I can see you shaking your head, for various reasons. First because this type of analysis is not appropriate. [Yes. I know. A photon should not be viewed as an electromagnetic wave. It’s a discrete packet of energy. Period.] Second, I guess you may find the math involved in this post not to your liking, even if it’s quite simple and I am not doing anything spectacular here. […] Whatever. I don’t care. I’ll just bulldozer on.

What about the ‘vertical’ dimension, the y and the z coordinates in space? We’ve got this long snaky thing: how thick-bodied is it?

Here, we need to watch our language. It’s not very obvious to associate a photon with some kind of cross-section normal to its direction of propagation. Not at all actually. Indeed, as mentioned above, the vertical axis of that graph showing the wave train does not indicate some spatial position: it’s not a y- (or z-)coordinate but the magnitude of an electric field vector. [Just to underline the fact that this magnitude has nothing to do with spatial coordinates: note that the value of that magnitude depends on our unit, so it’s really got nothing to do with an actual position in space-time.]

However, that being said, perhaps we can do something with that idea of a cross-section. In nuclear physics, the term ‘cross-section’ would usually refer to the so-called Thomson scattering cross-section, which can be defined rather loosely as the target area for the incident wave (i.e. the photon): it is, in fact, a surface which can be calculated from what is referred to as the classical electron radius, which is about 2.82×10⁻¹⁵ m. Just to compare: you may or may not remember the so-called Bohr radius of an atom, which is about 5.29×10⁻¹¹ m, so that’s a length that’s about 20,000 times longer. To be fully complete, let me give you the exact value for the Thomson scattering cross-section: 6.65×10⁻²⁹ m² (note that this is a surface indeed, so the unit is meter squared, not meter).
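For completeness: that cross-section value follows from the classical electron radius via the standard formula σ = (8π/3)·r_e², as a quick check shows:

```python
import math

# The Thomson cross-section from the classical electron radius:
# sigma = (8*pi/3) * r_e**2, which reproduces the figure quoted above.
r_e = 2.8179403262e-15            # classical electron radius, m

sigma = (8 * math.pi / 3) * r_e ** 2
print(sigma)                      # ≈ 6.65e-29 m²
```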

Now, let me remind you – once again – that we should not associate the oscillation of the electric field vector with something actually happening in space: an electromagnetic field does not move in a medium and, hence, it’s not like a water or sound wave, which makes molecules go up and down as it propagates through its medium. To put it simply: there’s nothing that’s wriggling in space as that photon is flashing through space. However, when it does hit an electron, that electron will effectively ‘move’ (or vibrate or wriggle or whatever you can imagine) as a result of the incident electromagnetic field.

That’s what’s depicted and labeled below: there is a so-called ‘radial component’ of the electric field, and I would say: that’s our photon! [What else would it be?] The illustration below shows that this ‘radial’ component is just E for the incident beam and that, for the scattered beam, it is, in fact, determined by the electron motion caused by the incident beam through that relation described above, in which a is the normal component (i.e. normal to the direction of propagation of the outgoing beam) of the electron’s acceleration.


Now, before I proceed, let me remind you once again that the above illustration is, once again, one of those illustrations that only wants to convey an idea, and so we should not attach too much importance to it: the world at the smallest scale is best not represented by a billiard ball model. In addition, I should also note that the illustration above was taken from the Wikipedia article on elastic scattering (i.e. Thomson scattering), which is only a special case of the more general Compton scattering that actually takes place. It is, in fact, the low-energy limit. Photons with higher energy will usually be absorbed, and then there will be a re-emission, but, in the process, there will be a loss of energy in this ‘collision’ and, hence, the scattered light will have lower energy (and, hence, lower frequency and longer wavelength). But – Hey! – now that I think of it: that’s quite compatible with my idea of damping, isn’t it? :-) [If you think I’ve gone crazy, I am really joking here: when it’s Compton scattering, there’s no ‘lost’ energy: the electron will recoil and, hence, its momentum will increase. That’s what’s shown below (credit goes to the HyperPhysics site).]


In any case, I don’t want to make this post too long. I do think we’re getting something here in terms of our objective of picturing a photon–using classical concepts that is! A photon should be a long wave train – a very long wave train actually – but its effective ‘radius’ should be of the same order as the classical electron radius, one would think. Or, much more likely, much smaller. If it’s more or less the same radius, then it would be of the order of femtometers (1 fm = 1 fermi = 1×10⁻¹⁵ m). That’s good because that’s a typical length-scale in nuclear physics. For example, it would be comparable with the radius of a proton. So we look at a photon here as something very different – because it’s so incredibly long (as mentioned above, three meters is not an atomic scale at all!) – but as something which does have some kind of ‘radius’ that is normal to its direction of propagation and equal to or, more likely, much smaller than the classical electron radius. [Why smaller? First, an electron is obviously fairly massive as compared to a photon (if only because an electron has a rest mass and a photon hasn’t). Second, it’s the electron that absorbs a photon–not the other way around.]

Now, that radius determines the area in which it may produce some effect, like hitting an electron, for example, or like being detected in a photon detector, which is just what this so-called radius of an atom or an electron is all about: the area which is susceptible of being hit by some particle (including a photon), or which is likely to emit some particle (including a photon). What it is exactly, we don’t know: it’s still as spooky as an electron and, therefore, it also does not make all that much sense to talk about its exact position in space. However, if we’d talk about its position, then we should obviously also invoke the Uncertainty Principle, which will give us some upper and lower bounds for its actual position, just like it does for any other particle: the uncertainty about its position will be related to the uncertainty about its momentum, and more knowledge about the former implies less knowledge about the latter, and vice versa. Therefore, we can also associate some complex wave function with this photon which is – for all practical purposes – a de Broglie wave. Now how should we visualize that wave?

The shape and size of a photon’s probability wave

I am actually not going to offer anything specific here. First, it’s all speculation. Second, I think I’ve written too much rubbish already. However, if you’re still reading, and you like this kind of unorthodox application of electromagnetics, then the following remarks may stimulate your imagination.

First, we should note that if we’re going to have a wave function for the photon in position-space (as opposed to momentum-space), its argument will not only be x and t, but also y and z. In fact, when trying to visualize this wave function, we should probably first think of keeping x and t constant and then imagine what a little complex-valued wave train normal to the direction of propagation would look like.

What about its frequency? You may think that, if we know the frequency of this photon, and its energy, and its momentum (we know all about this photon, don’t we?), then we can also associate some de Broglie frequency with this photon. Well… Yes and no. The simplicity of these de Broglie relations (λ = h/p and f = E/h) suggests we can, indeed, assign some frequency (f) or wavelength (λ) to it, all within the limits imposed by the Uncertainty Principle. But we know that we should not end up with a wave function that, when squared, gives us probabilities for each and every point in space. No. The wave function needs to be confined in space and, hence, we’re also talking about a wave train here, and a very short one in this case. Indeed, while, in our reasoning here, we look at the photon as being somewhere, we know it should be somewhere within one or two femtometer of our line of sight.

Now, what’s the typical energy of a photon again? Well… Let’s calculate it for that sodium light. E = hν, so we have to multiply Planck’s constant (h = 4.135×10⁻¹⁵ eV·s) with the photon frequency (ν = 500×10¹² oscillations/s), so that’s about 2 eV. I haven’t checked this but it should be about right: photons in the visible light spectrum have energies ranging from 1.5 to 3.5 eV. Not a lot but something. Now, what’s the de Broglie frequency and wavelength associated with that energy level? Hmm… Well… It’s the same formula, so we actually get the same frequency and wavelength: 500×10¹² Hz and 600 nm (nanometer) for the wavelength. So how do we pack that into our one or two femtometer space? Hmm… Let’s think: one nanometer is a million femtometer, isn’t it? And so we’ve got a de Broglie wavelength of 600 nanometer?
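The E = hν multiplication above, spelled out (using a more precise value of h in eV·s):

```python
# The arithmetic from the paragraph above: E = h·nu for 500 THz sodium light.
h = 4.135667696e-15       # Planck constant in eV·s
nu = 500e12               # frequency in Hz

E = h * nu                # photon energy in eV
print(E)                  # ≈ 2.07 eV — inside the 1.5–3.5 eV visible range
```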

Oh-oh! We must be doing something wrong here, aren’t we?

Yeah. I guess so. Here I’ll quote Feynman again: “We cannot define a unique wavelength for a short wave train. Such a wave train does not have a definite wavelength; there is an indefiniteness in the wave number that is related to the finite length of the train.”

I had just as much trouble with this as with that other statement of Feynman–and then I mean on the ‘impossibility’ of a wave train with a fixed frequency. But now I think it’s very simple actually: a very, very short burst is just not long enough to define a wavelength or a frequency: there are a few ups and downs, which are more likely than not to be very irregular, and that’s it. No nice sinusoidal shape. It’s as simple as that… I think. :-)

In fact–now that I am here–there’s something else I didn’t quite understand when reading physics: everyone who writes about light or matter waves seems to be focused on the frequency of these waves only. There’s little or nothing on the amplitude. Now, the energy of a physical wave, and of a light wave, does not only depend on its frequency, but also on the amplitude. In fact, we all know that doubling, tripling or quadrupling the frequency of a wave will double, triple or quadruple its energy (that’s obvious from the E = hν relation), but we tend to forget that the energy of a wave is also proportional to the square of its amplitude, for which I’ll use the symbol A, so we can write: E ∝ A². Hence, if we double, triple or quadruple the amplitude of a wave, its energy will be multiplied by four, nine and sixteen respectively!

The same relationship obviously holds between probability amplitudes and probability densities: if we double, triple or quadruple the probability amplitudes, then the associated probabilities obviously also get multiplied by four, nine and sixteen respectively! This obviously establishes some kind of relation between the shape of the electromagnetic wave train and the probability wave: if the electromagnetic wave train (i.e. the photon itself) packs a lot of energy upfront (cf. the initial overshooting and the gradual dying out), then we should expect the probability amplitudes to be ‘bigger’ there as well. [Note that we can’t directly compare two complex numbers in terms of one being ‘bigger’ or ‘smaller’ than the other, but you know what I mean: their absolute square will be bigger.]
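The quadratic relation between amplitudes and probabilities is easy to see with any complex number (the amplitude below is made up, of course):

```python
# Probabilities go as the absolute square of the (complex) amplitude,
# so doubling an amplitude quadruples the associated probability density.
z = 0.3 + 0.4j            # some probability amplitude (made-up numbers)

p1 = abs(z) ** 2          # 0.25
p2 = abs(2 * z) ** 2      # 1.00 — four times as large
print(p1, p2)
```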

So what?

Well… Nothing. I can’t say anything more about this. However, to compensate for the fact that I didn’t get anywhere with my concept of a de Broglie wave for a photon – and, hence, I let you down, I guess – I’ll explore the relationship between amplitude, frequency and size of a wave train somewhat more in detail. It may inspire you when thinking yourself about a ‘probability wave’ for a photon. And then… Well… I will write some kind of conclusion, which may or may not give the answer(s) that you are looking for.

The relation between amplitude, frequency and energy

From what I wrote above, it’s obvious that there are two ways of packing more energy in a (real) wave, or a (sufficiently long) wave train:

  1. We can increase the frequency, and so that results in a linear increase in energy (twice the frequency is twice the energy).
  2. We increase the amplitude, and that results in a quadratic increase in energy (double all amplitudes, and you pack four times more energy in that wave).

With a ‘real’ wave, I obviously mean either a wave that’s traveling in a medium or, in this case, an electromagnetic wave. OK. So what? Well… It’s probably quite reasonable to assume that both factors come into play when an electron emits a photon. Indeed, if the difference between the two energy levels is larger, then the photon will not only have a higher frequency (i.e. we’re talking light (or electromagnetic radiation) in the upper ranges of the spectrum then) but one should also expect that the initial overshooting – and, hence, the initial oscillation – will also be larger. In short, we’ll have larger amplitudes. Hence, higher-energy photons will pack even more energy upfront. They will also have higher frequency, because of the Planck relation. So, yes, both factors would come into play.

What about the length of these wave trains? Would it make them shorter? Yes. I’ll refer you to Feynman’s Lectures to verify that the wavelength appears in the numerator of the formula for Q. Hence, higher frequency means shorter wavelength and, hence, lower Q. Now, I am not quite sure (I am not sure about anything I am writing here it seems) but this may or may not be the reason for yet another statement I never quite understood: photons with higher and higher energy are said to become smaller and smaller, and when they reach the Planck scale, they are said to become black holes.


What’s the conclusion? Well… I’ll leave it to you to think about this. Let me make a bold statement here: that transient above actually is the wave function.

You’ll say: What !? What about normalization? All probabilities have to add up to one and, surely, those magnitudes of the electric field vector wouldn’t add up to one, would they?

My answer to that is simple: that’s just a question of units, i.e. of normalization indeed. So just measure the field strength in some other unit and it will come all right.

[…] But… Yes? What? Well… Those magnitudes are real numbers, not complex numbers.

I am not sure how to answer that one, but there are two things I could say:

  1. Real numbers are complex numbers too: it’s just that their imaginary part is zero.
  2. When working with waves, and especially with transients, we’ve always represented them using the complex exponential function. For example, we would write a wave function whose amplitude varies sinusoidally in space and time as A·e^i(ωt − kr), with ω the (angular) frequency and k the wave number (i.e. 2π divided by the wavelength, so that’s radians per unit distance).
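
To make that representation a bit more tangible, here’s a small Python sketch (all numbers arbitrary): the real part of A·e^i(ωt − kr) is the familiar cosine wave, while its absolute square is just A², whatever the time or position:

```python
import cmath

# The complex-exponential representation of a traveling wave,
# A*exp(i(omega*t - k*r)). All numbers here are arbitrary.
A, omega, k = 1.5, 2.0, 0.5

def psi(r, t):
    return A * cmath.exp(1j * (omega * t - k * r))

z = psi(r=1.0, t=0.3)
print(z.real)       # the 'physical' (real) part: A*cos(omega*t - k*r)
print(abs(z) ** 2)  # the absolute square: always A^2 = 2.25
```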

Frankly, think about it: where is the photon? It’s that three-meter long transient, isn’t it? And the probability to find it somewhere is the (absolute) square of some complex number, right? And then we have a wave function already, representing an electromagnetic wave, for which we know that the energy which it packs is the square of its amplitude, as well as being proportional to its frequency. We also know we’re more likely to detect something with high energy than something with low energy, don’t we? So why would we hesitate?

But then what about these probability amplitudes being a function of the y and z coordinates?

Well… Frankly, I’ve started to wonder if a photon actually has a radius. If it doesn’t have a mass, it’s probably the only real point-like particle (i.e. a particle not occupying any space) – as opposed to all other matter-particles, which do have mass.


I don’t know. Your guess is as good as mine. Maybe our concepts of amplitude and frequency of a photon are not very relevant. Perhaps it’s only energy that counts. We know that a photon has a more or less well-defined energy level (within the limits of the Uncertainty Principle) and, hence, our ideas about how that energy actually gets distributed over the frequency, the amplitude and the length of that ‘transient’ have no relation with reality. Perhaps we like to think of a photon as a transient electromagnetic wave, because we’re used to thinking in terms of waves and fields, but perhaps a photon is just a point-like thing indeed, with a wave function that’s got the same shape as that transient. :-)

Post scriptum: I should apologize to you, my dear reader. It’s obvious that, in quantum mechanics, we don’t think of a photon as having some frequency and some wavelength and some dimension in space: it’s just an elementary particle with energy interacting with other elementary particles with energy, and we use these coupling constants and what have you to work with them. We don’t think of photons as a three-meter long snake swirling through space. So, when I write that “our concepts of amplitude and frequency of a photon are maybe not very relevant” when trying to picture a photon, and that “perhaps, it’s only energy that counts”, I actually don’t mean “maybe” or “perhaps”. I mean: Of course !!! In the quantum-mechanical world view, that is.

So I apologize for posting nonsense. However, as all of this nonsense helps me to make sense of these things myself, I’ll just continue. :-) I seem to move very slowly on this Road to Reality, but the good thing about moving slowly, is that it will – hopefully – give me the kind of ‘deeper’ understanding I want, i.e. an understanding beyond the formulas and mathematical and physical models. In the end, that’s all that I am striving for when pursuing this ‘hobby’ of mine. Nothing more, nothing less. :-)

Babushka thinking

What is that we are trying to understand? As a kid, when I first heard about atoms consisting of a nucleus with electrons orbiting around it, I had this vision of worlds inside worlds, like a set of babushka dolls, one inside the other. Now I know that this model – which is nothing but the 1911 Rutherford model basically – is plain wrong, even if it continues to be used in the logo of the International Atomic Energy Agency, or the US Atomic Energy Commission. 

[IAEA logo and US Atomic Energy Commission logo]

Electrons are not planet-like things orbiting around some center. If one wants to understand something about the reality of electrons, one needs to familiarize oneself with complex-valued wave functions whose values represent a weird quantity referred to as a probability amplitude and, contrary to what you may think (unless you read my blog, or if you just happen to know a thing or two about quantum mechanics), the relation between that amplitude and the concept of probability tout court is not very straightforward.

Familiarizing oneself with the math involved in quantum mechanics is not an easy task, as evidenced by all those convoluted posts I’ve been writing. In fact, I’ve been struggling with these things for almost a year now and I’ve started to realize that Roger Penrose’s Road to Reality (or should I say Feynman’s Lectures?) may lead nowhere – in terms of that rather spiritual journey of trying to understand what it’s all about. If anything, they made me realize that the worlds inside worlds are not the same. They are different – very different.

When everything is said and done, I think that’s what’s nagging us as common mortals. What we are all looking for is some kind of ‘Easy Principle’ that explains All and Everything, and we just can’t find it. The point is: scale matters. At the macro-scale, we usually analyze things using some kind of ‘billiard-ball model’. At a smaller scale, let’s say the so-called wave zone, our ‘law’ of radiation holds, and we can analyze things in terms of electromagnetic or gravitational fields. But then, when we further reduce the scale, by another order of magnitude really – when trying to get very close to the source of radiation, or if we try to analyze what is oscillating really – we get in deep trouble: our easy laws no longer hold, and the equally easy math – easy is relative of course :-) – that we use to analyze fields or interference phenomena becomes totally useless.

Religiously inclined people would say that God does not want us to understand all or, taking a somewhat less selfish picture of God, they would say that Reality (with a capital R to underline its transcendental aspects) just can’t be understood. Indeed, it is rather surprising – in my humble view at least – that things do seem to get more difficult as we drill down: in physics, it’s not the bigger things – like understanding thermonuclear fusion in the Sun, for example – but the smallest things which are difficult to understand. Of course, that’s partly because physics leaves some of the bigger things which are actually very difficult to understand – like how a living cell works, for example, or how our eye or our brain works – to other sciences to study (biology and biochemistry for cells, or for vision or brain functionality). In that respect, physics may actually be described as the science of the smallest things. The surprising thing, then, is that the smallest things are not necessarily the simplest things – on the contrary.

Still, that being said, I can’t help feeling some sympathy for the simpler souls who think that, if God exists, he seems to throw up barriers as mankind tries to advance its knowledge. Isn’t it strange, indeed, that the math describing the ‘reality’ of electrons and photons (i.e. quantum mechanics and quantum electrodynamics), as complicated as it is, becomes even more complicated – and, important to note, also much less accurate – when it’s used to try to describe the behavior of  quarks and gluons? Additional ‘variables’ are needed (physicists call these ‘variables’ quantum numbers; however, when everything is said and done, that’s what quantum numbers actually are: variables in a theory), and the agreement between experimental results and predictions in QCD is not as obvious as it is in QED.

Frankly, I don’t know much about quantum chromodynamics – nothing at all to be honest – but when I read statements such as “analytic or perturbative solutions in low-energy QCD are hard or impossible due to the highly nonlinear nature of the strong force” (I just took this one line from the Wikipedia article on QCD), I instinctively feel that QCD is, in fact, a different world as well – and then I mean different from QED, in which analytic or perturbative solutions are the norm. Hence, I already know that, once I’ll have mastered Feynman’s Volume III, it won’t help me all that much to get to the next level of understanding: understanding quantum chromodynamics will be yet another long grind. In short, understanding quantum mechanics is only a first step.

Of course, that should not surprise us, because we’re talking very different orders of magnitude here: femtometers (10^–15 m), in the case of electrons, as opposed to attometers (10^–18 m) or even zeptometers (10^–21 m) when we’re talking quarks. Hence, if past experience (I mean the evolution of scientific thought) is any guidance, we actually should expect an entirely different world. Babushka thinking is not the way forward.

Babushka thinking

What’s babushka thinking? You know what babushkas are, don’t you? These dolls inside dolls. [The term ‘babushka’ is actually Russian for an old woman or grandmother, which is what these dolls usually depict.] Babushka thinking is the fallacy of thinking that worlds inside worlds are the same. It’s what I did as a kid. It’s what many of us still do. It’s thinking that, when everything is said and done, it’s just a matter of not being able to ‘see’ small things and that, if we’d have the appropriate equipment, we actually would find the same doll within the larger doll – the same but smaller – and then again the same doll within that smaller doll. In Asia, they have this funny expression: “Same-same but different.” Well… That’s what babushka thinking is all about: thinking that you can apply the same concepts, tools and techniques to what is, in fact, an entirely different ballgame.


Let me illustrate it. We discussed interference. We could assume that the laws of interference, as described by superimposing various waves, always hold, at every scale, and that it’s just the crudeness of our detection apparatus that prevents us from seeing what’s going on. Take two light sources, for example, and let’s say they are a billion wavelengths apart – so that’s anything between 400 and 700 meters for visible light (because the wavelength of visible light is 400 to 700 billionths of a meter). So then we won’t see any interference indeed, because we can’t register it. In fact, none of the standard equipment can. The interference term oscillates wildly up and down, from positive to negative and back again, if we move the detector just a tiny bit left or right – not more than the thickness of a hair (i.e. 0.07 mm or so). Hence, the range of angles θ (remember that angle θ was the key variable when calculating solutions for the resultant wave in previous posts) that is covered by our eye – or by any standard sensor really – is so wide that the positive and negative interference averages out: all that we ‘see’ is the sum of the intensities of the two lights. The terms in the interference term cancel each other out. However, we are still essentially correct in assuming there actually is interference: we just cannot see it – but it’s there.
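
In case you’d like to see that averaging at work, here’s a quick numerical simulation (the intensities are arbitrary): for two equal sources, the intensity is I1 + I2 + 2·√(I1·I2)·cos δ, and when the phase difference δ sweeps through many full cycles across the detector, the interference term averages out to zero, leaving just I1 + I2:

```python
import math
import random

# Average the two-source intensity I1 + I2 + 2*sqrt(I1*I2)*cos(delta)
# over a wildly varying phase difference delta: the cosine term washes out.
random.seed(1)
I1 = I2 = 1.0
N = 100_000
readings = [I1 + I2 + 2 * math.sqrt(I1 * I2) * math.cos(random.uniform(0, 2 * math.pi))
            for _ in range(N)]

avg = sum(readings) / N
print(avg)  # close to I1 + I2 = 2: all interference detail is gone
```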

Reinforcing the point, I should also note that, apart from this issue of ‘distance scale’, there is also the scale of time. Our eye has a tenth-of-a-second averaging time. That’s a huge amount of time when talking fundamental physics: remember that an atomic oscillator – despite its incredibly high Q – emits radiation for like 10^–8 seconds only, so that’s one hundred-millionth of a second. Then another atom takes over, and another – and so that’s why we get unpolarized light: it’s all the same frequencies (because the electron oscillators radiate at their resonant frequencies), but so there is no fixed phase difference between all of these pulses: the interference between all of these pulses should result in ‘beats’ – as they interfere positively or negatively – but it all cancels out for us, because it’s too fast.

Indeed, while the ‘sensors’ in the retina of the human eye (there are actually four kinds of cells there, but the principal ones are referred to as ‘rod’ and ‘cone’ cells respectively) are, apparently, sensitive enough to register individual photons, the “tenth-of-a-second averaging” time means that the cells – which are interconnected and ‘pre-process’ light really – will just amalgamate all those individual pulses into one signal of a certain color (frequency) and a certain intensity (energy). As one scientist puts it: “The neural filters only allow a signal to pass to the brain when at least about five to nine photons arrive within less than 100 ms.” Hence, that signal will not keep track of the spacing between those photons.

In short, information gets lost. But so that, in itself, does not invalidate babushka thinking. Let me visualize it with a not-very-mathematically-rigorous illustration. Suppose that we have some very regular wave train coming in, like the one below: one wave train consisting of three ‘groups’ separated by ‘nodes’.


All will depend on the period of the wave as compared to that one-tenth-of-a-second averaging time. In fact, we have two ‘periods’: the periodicity of the group – which is related to the concept of group velocity – and, hence, I’ll associate a ‘group wavelength’ and a ‘group period’ with that. [In case you haven’t heard of these terms before, don’t worry: I haven’t either. :-)] Now, if one tenth of a second covers like two or all three of the groups between the nodes (so that means that one tenth of a second is a multiple of the group period Tg), then even the envelope of the wave does not matter much in terms of ‘signal’: our brain will just get one pulse that averages it all out. We will see none of the detail of this wave train. Our eye will just get light in (remember that the intensity of the light is the square of the amplitude, so the negative amplitudes make contributions too) but we cannot distinguish any particular pulse: it’s just one signal. This is the most common situation when we are talking about electromagnetic radiation: many photons arrive but our eye just sends one signal to the brain: “Hey Boss! Light of color X and intensity Y coming from direction Z.”

In fact, it’s quite remarkable that our eye can distinguish colors in light of the fact that the wavelengths of the various colors (violet, blue, green, yellow, orange and red) differ by 30 to 40 billionths of a meter only! Better still: if the signal lasts long enough, we can distinguish shades whose wavelengths differ by 10 or 15 nm only, so that’s a difference of 1% or 2% only. In case you wonder how it works: Feynman devotes no less than two chapters in his Lectures to the physiology of the eye: not something you’ll find in other physics handbooks! There are apparently three pigments in the cells of our eyes, each sensitive to color in a different way, and it is “the spectral absorption in those three pigments that produces the color sense.” So it’s a bit like the RGB system in a television – but more complicated, of course!

But let’s go back to our wave there and analyze the second possibility. If a tenth of a second covers less than that ‘group wavelength’, then it’s different: we will actually see the individual groups as two or three separate pulses. Hence, in that case, our eye – or whatever detector (another detector will just have another averaging time) – will average over a group, but not over the whole wave train. [Just in case you wonder how we humans compare with other living beings: from what I wrote above, it’s obvious we can see ‘flicker’ only if the oscillation is in the range of 10 or 20 Hz. The eye of a bee is made to see the vibrations of feet and wings of other bees and, hence, its averaging time is much shorter, like a hundredth of a second and, hence, it can see flicker up to 200 oscillations per second! In addition, the eye of a bee is sensitive over a much wider range of ‘color’ – it sees UV light down to a wavelength of 300 nm (whereas we don’t see light with a wavelength below 400 nm) – and, to top it all off, it has got a special sensitivity for polarized light, so light that gets reflected or diffracted looks different to the bee.]

Let’s go to the third and final case. If a tenth of a second would cover less than the wavelength of the so-called carrier wave, i.e. the actual oscillation, then we will be able to distinguish the individual peaks and troughs of the carrier wave!
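
We can make these cases concrete with a toy amplitude-modulated wave train in Python – the carrier and group frequencies below are arbitrary, and the ‘detector’ is modeled as nothing more than the mean of the squared signal over its averaging window:

```python
import math

# A toy amplitude-modulated wave train: a carrier at f_c modulated by an
# envelope at the 'group' frequency f_g. The detector is modeled as the
# mean of the squared signal over an averaging window of length T that
# starts at time t0. All frequencies and windows are arbitrary numbers.
f_c, f_g = 100.0, 5.0  # carrier period 0.01 s, group period 0.2 s

def window_average(t0, T, n=20_000):
    dt = T / n
    return sum((math.cos(2 * math.pi * f_g * (t0 + i * dt)) *
                math.cos(2 * math.pi * f_c * (t0 + i * dt))) ** 2
               for i in range(n)) / n

# Window much longer than the group period: one flat reading (~0.25),
# no matter when we start looking -- all detail is averaged away.
print(window_average(0.0, 2.0), window_average(0.013, 2.0))

# Window shorter than the group period: the reading now depends on whether
# a 'group' happens to be passing, so we do see the separate pulses.
print(window_average(0.0, 0.01), window_average(0.05, 0.01))
```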

Of course, this discussion is not limited to our eye as a sensor: any instrument will be able to measure individual phenomena only within a certain range, with an upper and a lower limit, i.e. the ‘biggest’ thing it can see, and the ‘smallest’. So that explains the so-called resolution of an optical or an electron microscope: whatever the instrument, it cannot really ‘see’ stuff that’s smaller than the wavelength of the ‘light’ (real light or – in the case of an electron microscope – electron beams) it uses to ‘illuminate’ the object it is looking at. [The actual formula for the resolution of a microscope is obviously a bit more complicated, but this statement does reflect the gist of it.]

However, all that I am writing above suggests that we can think of what’s going on here as ‘waves within waves’, with the wave between nodes not being any different – in substance that is – from the wave as a whole: we’ve got something that’s oscillating, and within each individual oscillation, we find another oscillation. From a math point of view, babushka thinking is thinking we can analyze the world using Fourier’s machinery to decompose some function (see my posts on Fourier analysis). Indeed, in the example above, we have a modulated carrier wave (it is an example of amplitude modulation – the old-fashioned way of transmitting radio signals), and we see a wave within a wave and, hence, just like in the Rutherford model of an atom, you may think there will always be ‘a wave within a wave’.

In this regard, you may think of fractals too: fractals are repeating or self-similar patterns that are always there, at every scale. However, the point to note is that fractals do not represent an accurate picture of how reality is actually structured: worlds within worlds are not the same.

Reality is no onion

Reality is not some kind of onion, from which you peel off a layer and then you find some other layer, similar to the first: “same-same but different”, as they’d say in Asia. The Coast of Britain is, in fact, finite, and the grain of sand you’ll pick up at one of its beaches will not look like the coastline when you put it under a microscope. In case you don’t believe me: I’ve inserted a real-life photo below. The magnification factor is a rather modest 300 times. Isn’t this amazing? [The credit for this nice picture goes to a certain Dr. Gary Greenberg. Please do google his stuff. It’s really nice.]


In short, fractals are wonderful mathematical structures but – in reality – there are limits to how small things get: we cannot carve a babushka doll out of the cellulose and lignin molecules that make up most of what we call wood. Likewise, the atoms that make up the D-glucose chains in the cellulose will never resemble the D-glucose chains themselves. Hence, the babushka doll, the D-glucose chains that make up wood, and the atoms that make up the molecules within those macro-molecules are three different worlds. They’re not like layers of the same onion. Scale matters. The worlds inside worlds are different, and fundamentally so: not “same-same but different” but just plain different. Electrons are no longer point-like negative charges when we look at them at close range.

In fact, that’s the whole point: we can’t look at them at close range because we can’t ‘locate’ them. They aren’t particles. They are these strange ‘wavicles’ which we described, physically and mathematically, with a complex wave function relating their position (or their momentum) with some probability amplitude, and we also need to remember these funny rules for adding these amplitudes, depending on whether or not the ‘wavicle’ obeys Fermi or Bose statistics.

Weird, but – come to think of it – not more weird, in terms of mathematical description, than these electromagnetic waves. Indeed, when jotting down all these equations and developing all those mathematical arguments, one often tends to forget that we are not talking about some physical wave here. The field vector E (or B) is a mathematical construct: it tells us what force a charge will feel when we put it here or there. It’s not like a water or sound wave that makes some medium (water or air) actually move. The field is an influence that travels through empty space. But how can something actually travel through empty space? When it’s truly empty, you can’t travel through it, can you?

Oh – you’ll say – but we’ve got these photons, don’t we? Waves are not actually waves: they come in little packets of energy – photons. Yes. You’re right. But, as mentioned above, these photons aren’t little bullets – or particles, if you want. They’re as weird as the wave and, in any case, even a billiard-ball view of the world is not very satisfying: what happens exactly when two billiard balls collide in a so-called elastic collision? What are the springs on the surface of those balls – in light of the quick reaction, they must be more like little explosive charges that detonate on impact, mustn’t they? – that make the two balls recoil from each other?

So any mathematical description of reality becomes ‘weird’ when you keep asking questions, like that little child I was – and I still am, in a way, I guess. Otherwise I would not be reading physics at the age of 45, would I? :-)


Let me wrap up here. All of what I’ve been blogging about over the past few months concerns the classical world of physics. It consists of waves and fields on the one hand, and solid particles on the other – electrons and nucleons. But so we know it’s not like that when we have more sensitive apparatuses, like the apparatus used in that 2012 double-slit electron interference experiment at the University of Nebraska–Lincoln, which I described at length in one of my earlier posts. That apparatus allowed control of two slits – both not more than 62 nanometers wide (so that’s the difference between the wavelength of dark-blue and light-blue light!) – and the monitoring of single-electron detection events. Back in 1963, Feynman already knew what this experiment would yield as a result. He was sure about it, even if he thought such an instrument could never be built. [To be fully correct, he did have some vague idea about a new science, for which he himself coined the term ‘nanotechnology’, but what we can do today surpasses, most probably, all his expectations at the time. Too bad he died too young to see his dreams come true.]

The point to note is that this apparatus does not show us another layer of the same onion: it shows an entirely different world. While it’s part of reality, it’s not ‘our’ reality, nor is it the ‘reality’ of what’s being described by classical electromagnetic field theory. It’s different – and fundamentally so, as evidenced by those weird mathematical concepts one needs to introduce to sort of start to ‘understand’ it.

So… What do I want to say here? Nothing much. I just had to remind myself where I am right now. I myself often still fall prey to babushka thinking. We shouldn’t. We should wonder about the wood these dolls are made of. In physics, the wood seems to be math. The models I’ve presented in this blog are weird: what are those fields? And just how do they exert a force on some charge? What’s the mechanics behind it all? To these questions, classical physics does not really have an answer.

But, of course, quantum mechanics does not have a very satisfactory answer either: what does it mean when we say that the wave function collapses? Out of all of the possibilities in that wonderful indeterminate world ‘inside’ the quantum-mechanical universe, one was ‘chosen’ as something that actually happened: a photon imparts momentum to an electron, for example. We can describe it, mathematically, but – somehow – we still don’t really understand what’s going on.

So what’s going on? We open a doll, and we do not find another doll that is smaller but similar. No. What we find is a completely different toy. However – Surprise ! Surprise ! – it’s something that can be ‘opened’ as well, to reveal even weirder stuff, for which we need even weirder ‘tools’ to somehow understand how it works (like lattice QCD, if you’d want an example: just google it if you want to get an inkling of what that’s about). Where is this going to end? Did it end with the ‘discovery’ of the Higgs particle? I don’t think so.

However, with the ‘discovery’ (or, to be generous, let’s call it an experimental confirmation) of the Higgs particle, we may have hit a wall in terms of verifying our theories. At the center of a set of babushka dolls, you’ll usually have a little baby: a solid little thing that is not like the babushkas surrounding it: it’s young, male and solid, as opposed to the babushkas. Well… It seems that, in physics, we’ve got several of these little babies inside: electrons, photons, quarks, gluons, Higgs particles, etcetera. And we don’t know what’s ‘inside’ of them. Just that they’re different. Not “same-same but different”. No. Fundamentally different. So we’ve got a lot of ‘babies’ inside of reality, very different from the ‘layers’ around them, which make up ‘our’ reality. Hence, ‘Reality’ is not a fractal structure. What is it? Well… I’ve started to think we’ll never know. For all of the math and wonderful intellectualism involved, do we really get closer to an ‘understanding’ of what it’s all about?

I am not sure. The more I ‘understand’, the less I ‘know’ it seems. But then that’s probably why many physicists still nurture an acute sense of mystery, and why I am determined to keep reading. :-)

Post scriptum: On the issue of the ‘mechanistic universe’ and the (related) issue of determinability and indeterminability: that’s not what I wanted to write about above, because I consider that solved. This post is meant to convey some wonder – about the different models of understanding that we need to apply at different scales. It’s got little to do with determinability or not. I think that issue got solved a long time ago, and I’ll let Feynman summarize that discussion:

“The indeterminacy of quantum mechanics has given rise to all kinds of nonsense and questions on the meaning of freedom of will, and of the idea that the world is uncertain. […] Classical physics is also indeterminate. It is true, classically, that if we knew the position and the velocity of every particle in the world, or in a box of gas, we could predict exactly what would happen. And therefore the classical world is deterministic. Suppose, however, we have a finite accuracy and do not know exactly where just one atom is, say to one part in a billion. Then as it goes along it hits another atom, and because we did not know the position better than one part in a billion, we find an even larger error in the position after the collision. And that is amplified, of course, in the next collision, so that if we start with only a tiny error it rapidly magnifies to a very great uncertainty. […] Speaking more precisely, given an arbitrary accuracy, no matter how precise, one can find a time long enough that we cannot make predictions valid for that long a time. That length of time is not very large. It is not that the time is millions of years if the accuracy is one part in a billion. The time goes only logarithmically with the error. In only a very, very tiny time – less than the time it took to state the accuracy – we lose all our information. It is therefore not fair to say that from the apparent freedom and indeterminacy of the human mind, we should have realized that classical ‘deterministic’ physics could not ever hope to understand, and to welcome quantum mechanics as a release from a completely ‘mechanistic’ universe. For already in classical mechanics, there was indeterminability from a practical point of view.” (Feynman, Lectures, 1963, p. 38-10)
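
Feynman’s point about the time going ‘only logarithmically with the error’ is easy to check numerically. Assuming – purely for illustration – that each collision multiplies the position error by a factor of ten, the number of collisions needed to lose all information grows only logarithmically with the initial accuracy:

```python
import math

# If every collision multiplies the position error by a factor g (g = 10
# is just an assumption here), an initial error eps reaches order 1 after
# n = log(1/eps)/log(g) collisions: n grows only logarithmically as the
# initial accuracy improves.
g = 10.0
for eps in (1e-9, 1e-15, 1e-21):
    n = math.log(1 / eps) / math.log(g)
    print(f"initial error {eps:g} -> all information lost after ~{n:.0f} collisions")
```

A billion times better accuracy buys only nine more collisions’ worth of predictability – which is exactly why classical physics is indeterminate from a practical point of view.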

That really says it all, I think. I’ll just continue to keep my head down – i.e. stay away from philosophy as for now – and try to find a way to open the toy inside the toy. :-)

Light: relating waves to photons

This is a concluding note on my ‘series’ on light. The ‘series’ gave you an overview of the ‘classical’ theory: light as an electromagnetic wave. It was very complete, including relativistic effects (see my previous post). I could have added more – there’s an equivalent for four-vectors, for example, when we’re dealing with frequencies and wave numbers: quantities that transform like space and time under the Lorentz transformations – but you got the essence.

One point we never touched upon, though, was that magnetic field vector. It is there. It is tiny because of that 1/c factor, but it’s there. We wrote it as

B = –er′×E/c

All symbols in bold are vectors, of course. The force is another vector cross-product: F = qv×B, and you need to apply the usual right-hand screw rule to find the direction of the force. As it turns out, that force – as tiny as it is – is actually oriented in the direction of propagation, and it is what is responsible for the so-called radiation pressure.

So, yes, there is a ‘pushing momentum’. How strong is it? What power can it deliver? Can it indeed make space ships sail? Well… The magnitude of the unit vector er’ is obviously one, so it’s the values of the other vectors that we need to consider. If we substitute and average F, the thing we need to find is:

〈F〉 = q〈vE〉/c

But the charge q times the field is the electric force, and the force on the charge times the velocity is the work dW/dt being done on the charge. So that should equal the energy that is being absorbed from the light per second. Now, I didn’t look at that much. It’s actually one of the very few things I left aside – but I’ll refer you to Feynman’s Lectures if you want to find out more: there’s a fine section on light scattering, introducing the notion of the Thomson scattering cross section, but – as said – I think you’ve had enough for now. Just note that 〈F〉 = [dW/dt]/c and, hence, that the momentum that light delivers is equal to the energy that is absorbed (dW/dt) divided by c.
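
To get a feel for the orders of magnitude involved: using 〈F〉 = [dW/dt]/c with the solar constant near Earth, the force on a (hypothetical) 100 m × 100 m perfectly absorbing sail works out to a few hundredths of a newton:

```python
# Radiation-pressure force on a perfectly absorbing surface: absorbed
# power divided by c. The sail size is a made-up example; the solar
# constant near Earth is a real measured value.
c = 299_792_458.0          # speed of light (m/s)
solar_intensity = 1361.0   # W/m^2 near Earth (the solar constant)
sail_area = 100.0 * 100.0  # a hypothetical 100 m x 100 m absorbing sail

force = solar_intensity * sail_area / c
print(force)  # ~0.045 N -- tiny, but continuous (a perfect mirror would get twice this)
```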

So the momentum carried is 1/c times the energy. Now, you may remember that Planck solved the ‘problem’ of black-body radiation – an anomaly that physicists couldn’t explain at the end of the 19th century – by re-introducing a corpuscular theory of light: light comes in discrete packets of energy – photons. Of course, photons are not the kind of ‘particles’ that the Greek and medieval corpuscular theories of light envisaged: they have a particle-like character just as much as they have a wave-like character. They are actually neither, and they are physically and mathematically described by these wave functions – which, in turn, are functions describing probability amplitudes. But I won’t entertain you with that here, because I’ve written about that in other posts. Let’s just go along with the ‘corpuscular’ theory of photons for a while.

Photons also have energy (which we’ll write as W instead of E, just to be consistent with the symbols above) and momentum (p), and Planck’s Law says how much:

W = hf and p = W/c

So that’s good: we find the same multiplier 1/c here for the momentum of a photon. In fact, this is more than just a coincidence of course: the “wave theory” of light and Planck’s “corpuscular theory” must of course link up, because they are both supposed to help us understand real-life phenomena.
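To get a feel for the magnitudes involved, here is a quick numerical sketch (my own, not from the text): the energy and momentum of a single photon, using Planck’s Law W = hf and p = W/c. The frequency used for ‘green’ light is an assumed illustrative value.

```python
# A numerical sketch (mine, not from the post): energy and momentum of one
# photon, using Planck's Law W = h*f and p = W/c.
h = 6.62607015e-34   # Planck's constant, J*s
c = 299792458.0      # speed of light, m/s

f = 5.6e14           # frequency of 'green' light, Hz (assumed value)
W = h * f            # photon energy in joules
p = W / c            # photon momentum in kg*m/s

print(W)             # ~3.7e-19 J: a tiny amount of energy...
print(p)             # ~1.2e-27 kg*m/s: ...and an even tinier momentum
```

The numbers make clear why radiation pressure is so hard to measure: the momentum of a single photon is the (already tiny) energy divided by the enormous factor c.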

There are even more nice surprises. We spoke about polarized light, and we showed how the end of the electric field vector describes a circular or elliptical motion as the wave travels through space. It turns out that we can actually relate that to some kind of angular momentum of the wave (I won’t go into the details though – because I really think the previous posts have been too heavy on equations and complicated mathematical arguments already) and that we could also relate it to a model of photons carrying angular momentum, “like spinning rifle bullets” – as Feynman puts it.

However, he also adds: “But this ‘bullet’ picture is as incomplete as the ‘wave’ picture.” And so that’s true and that should be it. And it will be it. I will really end this ‘series’ now. It was quite a journey for me, as I am making my way through all of these complicated models and explanations of how things are supposed to work. But a fascinating one. And it sure gives me a much better feel for the ‘concepts’ that are hastily explained in all of these ‘popular’ books dealing with science and physics, hopefully preparing me better for what I should be doing, and that’s to read Penrose’s advanced mathematical theories.

Radiation and relativity

Now we are really going to do some very serious analysis: relativistic effects in radiation. Fasten your seat belts please.

The Doppler effect for physical waves traveling through a medium

In one of my posts – I don’t remember which one – I wrote about the Doppler effect for sound waves, or any physical wave traveling through a medium. I also said it had nothing to do with relativity. What really happens is that the observer sort of catches up with the wave or – the other way around – falls back and, because the velocity of a wave in a medium is always the same (that also has nothing to do with relativity: it’s just a general principle that holds always), the frequency of the physical wave will appear to be different. [So that’s why the siren of an ambulance sounds different as it moves past you.] Wikipedia has an excellent article on that – so I’ll refer you to it – but that article does not say anything – or not much – about the Doppler effect when electromagnetic radiation is involved. So that’s what we’ll talk about here. Before we do that, though, let me quickly jot down the formula for the Doppler effect for a physical wave traveling through a medium, so we are clear about the differences between the two ‘Doppler effects’:

f = f0·(vp − vr)/(vp − vs)
In this formula, vp is the propagation speed of the wave, which – as mentioned above – depends on the medium. Now, the source and the receiver will have a velocity with respect to the medium as well (positive or negative – depending on whether they’re moving in the same direction as the wave or not), and that’s vr (r for receiver) and vs (s for source) respectively. So we’re combining speeds here to calculate some relative speed, and then we take a ratio. Some people think that’s what relativity theory is all about. It’s not. Everything I’ve written so far is just Galilean relativity – stuff that’s been known for centuries already.
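As a sanity check on the classical formula, here is a minimal sketch (my own; the function and variable names are mine, and the sign convention – one of several in use – counts all velocities relative to the medium, positive in the wave’s direction of travel):

```python
# A sketch of the classical Doppler formula for a wave in a medium (assumed
# sign convention: velocities relative to the medium, positive in the wave's
# direction of travel).
def doppler(f0, vp, vr=0.0, vs=0.0):
    """Observed frequency for receiver speed vr and source speed vs."""
    return f0 * (vp - vr) / (vp - vs)

# An ambulance siren at 700 Hz approaching a stationary listener at 30 m/s,
# with the speed of sound at 343 m/s: the pitch goes up...
print(round(doppler(700.0, 343.0, vs=30.0)))   # ~767 Hz
# ...and goes down once the ambulance has passed and is receding:
print(round(doppler(700.0, 343.0, vs=-30.0)))  # ~644 Hz
```

Note that only ratios of speeds to the propagation speed matter, which is why the effect is so audible for sound (30/343 is substantial) and so hard to notice for light at everyday speeds.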

Relativity is weird: one aspect of it is length contraction – not often discussed – but the other is better known: there is no such thing as absolute time, so when we talk about velocities, we need to specify: according to whose time?

The Doppler effect for electromagnetic radiation

One thing that’s not relative – and which makes things look somewhat similar to what I wrote above – is that the speed of light is always equal to c. That was the startling fact that came out of Maxwell’s equations (startling because it is not consistent with Galilean relativity: it says that we cannot ‘catch up’ with a light wave!) and around which Einstein built all of the ‘new physics’, and so it’s something we’ll use rather matter-of-factly in all that we write below.

In all of the preceding posts about light, I wrote – more than once actually – that the movement of the oscillating charge (i.e. the source of the radiation) along the line of sight did not matter: the only thing that mattered was its acceleration in the xy-plane, which is perpendicular to our line of sight, which we’ll call the z-axis. Indeed, let me remind you of the two equations defining electromagnetic radiation (see my post on light and radiation about a week ago):

E = –(q/4πε0c²)·d²er′/dt²

B = –er′×E/c

The first formula gives the electromagnetic effect that dominates in the so-called wave zone, i.e. a few wavelengths away from the source – because the Coulomb force varies as the inverse of the square of the distance r, unlike this ‘radiation’ effect, which varies inversely as the distance only (E ∝ 1/r), so it falls off much less rapidly.

Now, that general observation still holds when we’re considering an oscillating charge that is moving towards or away from us with some relativistic speed (i.e. a speed getting close enough to c to produce relativistic length contraction and time dilation effects) but, because of the fact that we need to consider local times, our formula for the retarded time is no longer correct.

Huh? Yes. The matter is quite complicated, and Feynman starts by jotting down the derivatives for the displacement in the x- and y-directions, but I think I’ll skip that. I’ll go for the geometrical picture straight away, which is given below. As said, it’s going to be difficult, but try to hang in there, because it’s necessary to understand the Doppler effect in a correct way (I myself have been fooled by quite a few nonsensical explanations of it in popular books) and, as a bonus, you also get to understand synchrotron radiation and other exciting stuff.

[Illustration: Doppler effect]

So what’s going on here? Well… Don’t look at the illustration above just yet. We’ll come back to it. Let’s first build up the logic. We’ve got a charge moving vertically – from our point of view, that is (we are the observer) – but also moving in and out at us, i.e. in the z-direction. Indeed, note the arrow pointing to the observer: that’s us! So it could indeed be us looking at electrons going round and round and round – at phenomenal speeds – in a synchrotron indeed, but then one that’s been turned 90 degrees (probably easier for us to just lie down on the ground and look sideways). In any case, I hope you can imagine the situation. [If not, try again.] Now, if that charge were not moving at relativistic speeds (e.g. 0.94c, which is actually the number that Feynman uses for the graph above), then we would not have to worry about ‘our’ time t and the ‘local’ time τ. Huh? Local time? Yes.

We denote the time as measured in the reference frame of the moving charge as τ. Hence, as we are counting t = 1, 2, 3 etcetera, the electron is counting τ = 1, 2, 3 etcetera as it goes round and round and round. If the charge were not moving at relativistic speeds, we’d do the standard thing, and that’s to calculate the retarded acceleration of the charge: a'(t) = a(t − r/c). [Remember that we used a prime to mark functions for which we should use a retarded argument and, yes, I know that the term ‘retarded’ sounds a bit funny, but that’s how it is. In any case, we have a'(t) = a(t − r/c) – so the prime vanishes as we put in the retarded argument.] Indeed, from the ‘law’ of radiation, we know that the field here and now is given by the acceleration of the charge at the retarded time, i.e. t − r/c. To sum it all up, we would, quite simply, relate t and τ as follows:

τ = t – r/c or, what amounts to the same, t = τ + r/c

So the effect that we see now, at time t, was produced at a distance r at time τ = t − r/c. That should make sense. [Again, if it doesn’t: read again. I can’t explain it in any other way.]

The crucial link in this rather complicated chain of reasoning comes now. You’ll remember that one of the assumptions we used to derive our ‘law’ of radiation was the assumption that “r is practically constant.” That no longer holds. Indeed, that electron moving around in the synchrotron comes in and out at us at crazy speeds (0.94c is a bit more than 280,000 km per second), so r goes up and down too, and the relevant distance is not r but r + z(τ). This means the retardation effect is actually a bit larger: it’s not r/c but [r + z(τ)]/c = r/c + z(τ)/c. So we write:

τ = t – r/c – z(τ)/c or, what amounts to the same, t = τ + r/c + z(τ)/c

Hmm… You’ll say this is rather fishy business. Why would we use the actual distance and not the distance a while ago? Well… I’ll let you mull over that. We’ve got two points in space-time here and so they are separated both by distance as well as time. It makes sense to use the actual distance to calculate the actual separation in time, I’d say. If you’re not convinced, I can only refer you to those complicated derivations that Feynman briefly does before introducing this ‘easier’ geometric explanation. This brings us to the next point. We can measure time in seconds, but also in equivalent distance, i.e. light-seconds: the distance that light (remember: always at absolute speed c) travels in one second, i.e. approximately 299,792,458 meter. It’s just another ‘time unit’ to get used to.
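If you want to convince yourself that the implicit equation t = τ + r/c + z(τ)/c really pins down a unique τ, you can solve it numerically. Here is a small sketch (entirely my own construction, with an assumed displacement function z(τ) and assumed values for r and t), using simple fixed-point iteration:

```python
# A sketch (my own, not from the post): solve the implicit retarded-time
# equation t = tau + r/c + z(tau)/c by fixed-point iteration.
import math

c = 1.0                     # work in units where c = 1
r = 10.0                    # fixed part of the distance (assumed value)

def z(tau):
    """Assumed displacement of the charge along the line of sight."""
    return 0.5 * math.sin(tau)

def retarded_time(t, iterations=50):
    tau = t - r / c         # naive first guess: ignore the z(tau)/c term
    for _ in range(iterations):
        tau = t - r / c - z(tau) / c   # refine with the current estimate
    return tau

tau = retarded_time(15.0)
residual = 15.0 - (tau + r / c + z(tau) / c)
print(abs(residual))        # ~0: the implicit equation is satisfied
```

The iteration converges because the charge moves slower than light, so the correction term z(τ)/c changes more slowly than τ itself.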

Now that’s what’s being done above. To be sure, we get rid of the term r/c, which is a constant indeed: dropping it amounts to a shift of the origin by some constant (so we start counting earlier or later). In short, we have a new variable ‘t’ really that’s equal to t = τ + z(τ)/c. But we’re going to count time in meters – i.e. in the distance light travels in that time – so we just multiply both sides by c and get:

ct = cτ + z(τ)

Why the different unit? Well… We’re talking relativistic speeds here, aren’t we? And so the second is just an inappropriate unit. When we’re finished with this example, I’ll give you an even simpler example: a source just moving in on us at an equally astronomical speed, so the functional shape of z(τ) will be some fraction k of c times τ, i.e. z(τ) = kcτ. So, to simplify things, just think of it as re-scaling the time axis in units that make sense compared to the speeds we are talking about.

Now we can finally analyze that graph on the right-hand side. If we kept r fixed – so if we didn’t care about the charge moving in and out at us – the plot of x'(t), i.e. the retarded position indeed, against ct would yield the sinusoidal graph plotted by the red numbers 1 to 13 here. In fact, instead of a sinusoidal graph, it resembles a bell curve, but that’s just because we’re looking at one revolution only. In any case, the point to note is that – when everything is said and done – we need to calculate the retarded acceleration, so that’s the second derivative of x'(t). The animated illustration below shows how that works: the second derivative (not the first) turns from positive to negative – and vice versa – at inflection points, when the curve goes from convex to concave, and vice versa. So, on the segment of that sinusoidal function marked by the red numbers, it’s positive at first (the slope of the tangent becomes steeper and steeper), then negative (cf. that tangent line turning from blue to green in the illustration below), and then it becomes positive again, as the negative slope becomes less negative at the second inflection point.

[Animated illustration of an inflection point]

That should be straightforward. However, the actual x'(t) curve is the black curve with the cusp. A curve like that is called a hypocycloid. Let me reproduce it once again for ease of reference.

[Illustration: Doppler effect]

We relate x'(t) to ct (this is nothing but our new unit for t) by noting that ct = cτ + z(τ). Capito? It’s not easy to grasp: the instinct is to just equate t and τ and write x'(t) = x'(τ), but that would not be correct. No. We must measure in ‘our’ time and get a functional form for x' as a function of t, not of τ. In fact, x'(t) – i.e. the retarded vertical position at time t – is not equal to x'(τ) but to x(τ), i.e. the actual (instead of retarded) position at (local) time τ, and so that’s what the black graph above shows.

I admit it’s not easy. I myself am not getting it in an ‘intuitive’ way. But the logic is solid and leads us where it leads us. Perhaps it helps to think in terms of the curvature of this graph. In fact, we have to think in terms of the curvature of this graph in order to understand what’s happening in terms of radiation. When the charge is moving away from us, i.e. during the first three ‘seconds’ (so that’s the 1-2-3 count), we see that the curvature is less than what it would be – and also doesn’t change very much – if the displacement were given by the sinusoidal function, which means there’s very little radiation (because there’s little acceleration – negative or positive). However, as the charge moves towards us, we get that sharp cusp and, hence, sharp curvature, which results in a sharp pulse of the electric field, rather than the regular – equally sinusoidal – amplitude we’d have if that electron were not moving in and out at us at relativistic speeds. In fact, that’s what synchrotron radiation is: we get these sharp pulses indeed. Feynman shows how they are measured – in very much detail – using a diffraction grating, but that would just be another diversion and so I’ll spare you that.
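The geometric construction can also be checked numerically. The sketch below (my own; the orbit radius R and the sampling are assumptions) computes the (ct, x'(t)) points for one revolution at v = 0.94c and verifies that the points bunch up sharply near the cusp – which is exactly where the sharp radiation pulse comes from:

```python
# A numerical check of the construction above (my own sketch): a charge on an
# assumed circular orbit of radius R in the xz-plane, speed 0.94c. We compute
# the (ct, x'(t)) points, with ct = c*tau + z(tau).
import math

c = 1.0
beta = 0.94                  # v/c, the value Feynman uses
R = 1.0                      # orbit radius (assumed)
omega = beta * c / R         # angular frequency so that the speed is 0.94c

points = []
for i in range(1001):
    tau = (2 * math.pi / omega) * i / 1000   # one revolution in local time
    x = R * math.cos(omega * tau)            # transverse position (what we see)
    z = R * math.sin(omega * tau)            # line-of-sight position
    points.append((c * tau + z, x))          # (ct, x'(t)) pairs

# d(ct)/dtau = c*(1 + beta*cos(omega*tau)), so the horizontal spacing of the
# points varies by a factor of roughly (1 + beta)/(1 - beta), i.e. about 32.
cts = [ct for ct, _ in points]
gaps = [b - a for a, b in zip(cts, cts[1:])]
print(round(max(gaps) / min(gaps), 1))
```

At the cusp, many ‘local’ instants τ get squeezed into a very short interval of ‘our’ time t, so the curve there is traced out almost instantaneously: hence the sharp pulse.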

Hmm… This has nothing to do with the Doppler effect, you’ll say. Well… Yes and no. The discussion above basically set the stage for that discussion. So let’s turn to that now. However, before I do that, I want to insert another graph for an oscillating charge moving in and out at us in some irregular way – rather than the nice circular route described above.

[Illustration: Doppler diagram for an irregularly moving charge]

The Doppler effect

The illustration below is a similar diagram to the ones above – but it looks much simpler. It shows what happens when an oscillating charge (which we assume to oscillate at its natural or resonant frequency ω0) moves towards us at some relativistic speed v (whatever speed – fairly close to c, so the ratio v/c is substantial). Note that the movement is from A to B – and that the observer (us!) is, once again, at the left – and, hence, the distance traveled is AB = vτ. So what’s the impact on the frequency? That’s shown on the x'(t) graph on the right: the curvature of the sinusoidal motion is much sharper, which means that its angular frequency as we see or measure it (we’ll denote that by ω1) will be higher: if it’s a larger object emitting ‘white’ light (i.e. a mix of everything), then the light will no longer be ‘white’ but will have shifted towards the violet end of the spectrum. If it moves away from us, it will appear ‘more red’.

[Illustration: Doppler effect for a source moving towards the observer]

What’s the frequency change? Well… The z(τ) function is rather simple here: z(τ) = vτ. Let’s use f and f0 for a moment, instead of the angular frequencies ω and ω0, as we know they only differ by the factor 2π (ω = 2π/T = 2π·f, with f = 1/T, i.e. the reciprocal of the period). Hence, in a given time Δτ, the number of oscillations will be f0Δτ. In that same time, the source moves a distance vΔτ towards us, while the first oscillation it emitted travels a distance cΔτ, so, for the observer, the same number of oscillations gets compressed into a distance (c − v)Δτ. The time the wave needs to travel that distance corresponds to a time interval Δt = (c − v)Δτ/c = (1 − v/c)Δτ. Hence, the frequency f will be equal to f0Δτ (the number of oscillations) divided by Δt = (1 − v/c)Δτ. So we get this relatively simple equation:

f = f0/(1 − v/c) and ω = ω0/(1 − v/c)

 Is that it? It’s not quite the same as the formula we had for the Doppler effect of physical waves traveling through a medium, but it’s simple enough indeed. And it also seems to use relative speed. Where’s the Lorentz factor? Why did we need all that complicated machinery?

You are a smart-ass! You’re right. In fact, this is exactly the same formula: if we equate the speed of propagation with c, set the velocity of the receiver to zero, and fill in the speed of the source (with the sign that corresponds to a source moving towards us), then we get what we got above: f = f0·c/(c − v) = f0/(1 − v/c).

The thing we need to add is that the natural frequency of an atomic oscillator that is moving with respect to us is not the same as that measured when it is standing still: the time dilation effect kicks in. If ω0 is the ‘true’ natural frequency (measured locally, so to speak), then the modified natural frequency – as corrected for the time dilation effect – will be ω1 = ω0(1 − v²/c²)^1/2. Therefore, the grand final relativistic formula for the Doppler effect for electromagnetic radiation is:

ω = ω0·(1 − v²/c²)^1/2/(1 − v/c)
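As a final numerical sketch (my own; function names are assumptions), here is the relativistic formula side by side with the naive one – i.e. the one without the time-dilation correction – for a source approaching at half the speed of light:

```python
# A sketch of the relativistic Doppler shift for an approaching source:
# omega = omega0 * sqrt(1 - beta^2) / (1 - beta), with beta = v/c.
import math

def relativistic_doppler(omega0, beta):
    """Observed frequency, including the time-dilation factor."""
    return omega0 * math.sqrt(1.0 - beta**2) / (1.0 - beta)

def naive_doppler(omega0, beta):
    """The same, but without the time-dilation correction."""
    return omega0 / (1.0 - beta)

# A source approaching at half the speed of light:
print(relativistic_doppler(1.0, 0.5))   # ~1.732, i.e. sqrt(3): a blue-shift
print(naive_doppler(1.0, 0.5))          # 2.0: overestimates the shift
```

Note that √(1 − β²)/(1 − β) = √((1 + β)/(1 − β)), which is the form you will often find in textbooks; for β → 0 both formulas reduce to no shift at all.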

You may feel cheated now: did you really have to suffer going through that story on the synchrotron radiation to get the formula above? I’d say: yes and no. No, because you could be happy with the Doppler formula alone. But, yes, because you don’t get the story about those sharp pulses just from the relativistic Doppler formula alone. So the final answer is: yes. I hope you felt it was worth the suffering :-)