# Photons as strings

In my previous post, I explored, somewhat jokingly, the grey area between classical physics and quantum mechanics: light as a wave versus light as a particle. I did so by trying to picture a photon as an electromagnetic transient traveling through space, as illustrated below. While actual physicists would probably deride the idea, I feel it illustrates the wave-particle duality quite well.

Understanding light is the key to understanding physics. Light is a wave, as Thomas Young proved to the Royal Society of London in 1803, thereby demolishing Newton’s corpuscular theory. But its constituents, photons, behave like particles. According to modern-day physics, both were right. Just to put things in perspective: the note card which Young used to split the light (ordinary sunlight entering his room through a pinhole in a window shutter) was 1/30 of an inch thick, or approximately 0.85 mm. That’s enormous as compared to modern-day engineering tolerances: what was thin then is obviously not considered thin now. Scale matters. I’ll come back to this.

Young’s experiment (from www.physicsclassroom.com)

The table below shows that the ‘particle character’ of electromagnetic radiation becomes apparent at frequencies of a few hundred terahertz, like the sodium light example I used in my previous post. Sodium light, as emitted by sodium lamps, has a frequency of 500×10¹² oscillations per second and, therefore, a wavelength of 600 nanometer (600×10⁻⁹ meter). The relation between frequency and wavelength is very straightforward: their product is the velocity of the wave, so for light we have the simple λf = c equation.
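The λf = c relation is easy to check numerically. A minimal sketch, using the rounded 500 THz figure from the text (the sodium D lines themselves sit near 589 nm, so 600 nm is a round number):

```python
# Relation between frequency and wavelength for light: lambda * f = c
C = 299_792_458.0  # speed of light in m/s (exact SI value)

def wavelength_from_frequency(f_hz: float) -> float:
    """Return the wavelength in meters of a wave moving at c with frequency f_hz."""
    return C / f_hz

f_sodium = 500e12  # the rounded 500 THz figure used in the text
lam = wavelength_from_frequency(f_sodium)
print(f"{lam * 1e9:.0f} nm")  # prints "600 nm"
```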

However, whether something behaves like a particle or a wave also depends on our measurement scale: 0.85 mm was thin in Young’s time, and so it was a delicate experiment then but, now, it’s a standard classroom experiment indeed. The theory of light as a wave would hold until more delicate equipment refuted it. Such equipment came with another sense of scale. It’s good to remind oneself that Einstein’s “discovery of the law of the photoelectric effect”, which explained the photoelectric effect as the result of light energy being carried in discrete quantized packets of energy, now referred to as photons, goes back to 1905 only, and that the experimental apparatus which could measure it was not much older. So waves behave like particles if we look at them closely enough. Conversely, particles behave like waves if we look at them closely enough. So there is this zone where they are neither, the zone for which we invoke the mathematical formalism of quantum mechanics or, to put it more precisely, the formalism of quantum electrodynamics: that “strange theory of light and matter”, as Feynman calls it.

Let’s have a look at how particles became waves. It should not surprise us that the experimental apparatus needed to confirm that electrons (or matter in general) can actually behave like a wave is more recent than the 19th-century apparatus which led Einstein to develop his ‘corpuscular’ theory of light (i.e. the theory of light as photons). The engineering tolerances involved are daunting. Let me be precise here. To be sure, the phenomenon of electron diffraction (i.e. electrons producing a diffraction pattern; in this case, by scattering off a nickel crystal) had been confirmed experimentally already in 1927, in the famous Davisson-Germer experiment. I call it famous because it is, indeed, rather famous. First, because electron diffraction was a weird thing to contemplate at the time. Second, because it confirmed the de Broglie hypothesis only a few years after Louis de Broglie had advanced it. And, third, because Davisson and Germer had never set up their experiment to detect diffraction: it was pure coincidence. In fact, the observed diffraction pattern was the result of a laboratory accident, and Davisson and Germer weren’t aware of other, conscious, attempts to prove the de Broglie hypothesis. :-) [...] OK. I am digressing. Sorry. Back to the lesson.

The nanotechnology that was needed to confirm Feynman’s 1965 thought experiment on electron interference (i.e. electrons going through two slits and interfering with each other, rather than producing some diffraction pattern as they go through one slit only, and, equally significant as an experimental result, interfering with themselves as they go through the slit(s) one by one!) was only developed over the past decades. In fact, it was only in 2008 (and again in 2012) that the experiment was carried out exactly the way Feynman describes it in his Lectures.

It is useful to think of what such experiments entail from a technical point of view. Have a look at the illustration below, which shows the set-up. The inset in the upper-left corner shows the two slits which were used in the 2012 experiment: they are each 62 nanometer wide (that’s 62×10⁻⁹ m!) and the distance between them is 272 nanometer, or 0.272 micrometer. [Just to be complete: they are 4 micrometer tall (4×10⁻⁶ m), and the thing in the middle of the slits is just a little support (150 nm) to make sure the slit width doesn't vary.]

The second inset (in the upper-right corner) shows the mask that can be moved to close one or both slits partially or completely. The mask is 4.5 µm wide × 20 µm tall. Please do take a few seconds to contemplate the technology behind this feat: a nanometer is a millionth of a millimeter, so that’s a billionth of a meter, and a micrometer is a millionth of a meter. To imagine how small a nanometer is, you would have to divide one millimeter in ten, then one of these tenths in ten again, and then repeat that four more times. In fact, you actually cannot imagine that: we live in the world we live in and, hence, our mind is used to addition (and subtraction) when it comes to comparing sizes and, to a much more limited extent, to multiplication (and division). Our brain is, quite simply, not wired to deal with exponentials and, hence, it can’t really ‘imagine’ these incredible (negative) powers. So don’t think you can really imagine it, because one can’t: in our mind, these scales exist only as mathematical constructs. They don’t correspond to anything we can actually make a mental picture of.

The electron beam consisted of electrons with an (average) energy of 600 eV. That’s not an awful lot: about 8.5 times the energy of an electron in orbit in an atom, which would be some 70 eV, so the acceleration before they went through the slits was relatively modest. I’ve calculated the corresponding de Broglie wavelength of these electrons in another post (Re-Visiting the Matter-Wave, April 2014), using the de Broglie equations: f = E/h or λ = h/p. Of course, you could just google the article on the experiment and read about it, but it’s a good exercise, and actually quite simple: just note that you’ll need to express the energy in joule (not in eV) to get it right. Also note that you need to include the rest mass of the electron in the energy. I’ll let you try it (or else just go to that post of mine). You should find a de Broglie wavelength of 50 picometer for these electrons, so that’s 50×10⁻¹² m. While that wavelength is less than a thousandth of the slit width (62 nm), and about 5,500 times smaller than the space between the two slits (272 nm), the interference effect was unambiguous in the experiment. I advise you to google the results yourself (or read that April 2014 post of mine if you want a summary): the experiment was done at the University of Nebraska-Lincoln in 2012.
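Here is one way to do that exercise, as a sketch rather than the method of the April 2014 post: it uses λ = h/p directly and keeps energies in eV, with hc ≈ 1239.84 eV·nm, so no conversion to joules is needed. The relativistic branch includes the electron's rest energy; for 600 eV electrons the correction turns out to be tiny.

```python
import math

H_C_EV_NM = 1239.841984   # h*c in eV·nm
M_E_C2_EV = 510_998.95    # electron rest energy m*c^2, in eV

def de_broglie_wavelength_nm(kinetic_ev: float, relativistic: bool = True) -> float:
    """de Broglie wavelength lambda = h/p (in nm) of an electron with the given kinetic energy."""
    if relativistic:
        # E_total = K + m c^2, so (pc)^2 = E_total^2 - (m c^2)^2 = K^2 + 2 K m c^2
        pc = math.sqrt(kinetic_ev**2 + 2 * kinetic_ev * M_E_C2_EV)
    else:
        # non-relativistic: p = sqrt(2 m K), so pc = sqrt(2 K m c^2)
        pc = math.sqrt(2 * kinetic_ev * M_E_C2_EV)
    return H_C_EV_NM / pc  # lambda = h/p = hc/(pc)

lam_nm = de_broglie_wavelength_nm(600.0)
print(f"lambda ≈ {lam_nm * 1000:.1f} pm")               # about 50 pm
print(f"slit width / lambda ≈ {62 / lam_nm:.0f}")       # 62 nm slits
print(f"slit spacing / lambda ≈ {272 / lam_nm:.0f}")    # 272 nm between the slits
```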

Electrons and X-rays

To put everything in perspective: 50 picometer is like the wavelength of X-rays, and you can google similar double-slit experiments for X-rays: they also lose their ‘particle behavior’ when we look at them at this tiny scale. In short, scale matters, and the boundary between ‘classical physics’ (electromagnetics) and quantum physics (wave mechanics) is not clear-cut. If anything, it depends on our perspective, i.e. what we can measure, and we seem to be shifting that boundary constantly. In what direction?

Downwards, obviously: we’re devising instruments that measure stuff at smaller and smaller scales, and what’s happening is that we can ‘see’ typical ‘particles’, including hard radiation such as gamma rays, as local wave trains. Indeed, the next step would be clear-cut evidence for interference between gamma rays.

Energy levels of photons

We would not associate low-frequency electromagnetic waves, such as radio or radar waves, with photons. But light in the visible spectrum, yes. Obviously. [...]

Isn’t that an odd dichotomy? If we see that, on a smaller scale, particles start to look like waves, why would the reverse not be true? Why wouldn’t we analyze radio or radar waves, on a much larger scale, as a stream of very (I must say extremely) low-energy photons? I know the idea sounds ridiculous, because the energies involved would be ridiculously low indeed. Think about it. The energy of a photon is given by the Planck relation: $E = h\nu = hc/\lambda$. For visible light, with wavelengths ranging from 800 nm (red) to 400 nm (violet or indigo), the photon energies range between 1.5 and 3 eV. Now, the shortest wavelengths for radar waves are in the so-called millimeter band, i.e. they range from 1 mm to 1 cm. A wavelength of 1 mm corresponds to a photon energy of 0.00124 eV. That’s close to nothing, of course, and surely not the kind of energy levels that we can currently detect.
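The photon energies quoted above follow directly from E = hc/λ. A short sketch, using hc ≈ 1239.84 eV·nm so that the wavelength can be entered in nanometers and the energy comes out in eV:

```python
H_C_EV_NM = 1239.841984  # h*c in eV·nm, so E[eV] = 1239.84 / lambda[nm]

def photon_energy_ev(wavelength_nm: float) -> float:
    """Planck relation E = h*nu = h*c/lambda, with lambda in nanometers and E in eV."""
    return H_C_EV_NM / wavelength_nm

for label, lam_nm in [("red, 800 nm", 800), ("violet, 400 nm", 400), ("radar, 1 mm", 1e6)]:
    print(f"{label}: {photon_energy_ev(lam_nm):.5f} eV")
```

This gives about 1.55 eV, 3.10 eV and 0.00124 eV respectively.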

But you get the idea: there is a grey area between classical physics and quantum mechanics, and it’s our equipment (notably the scale of our measurements) that determines where that grey area begins and where it ends, and it seems to become larger and larger as the sensitivity of our equipment improves.

What do I want to get at? Nothing much. Just some awareness of scale, as an introduction to the actual topic of this post: some thoughts on a rather primitive string theory of photons. What?!

Yes. Purely speculative, of course. :-)

Photons as strings

I think my calculations in the previous post, as primitive as they were, actually provide quite some food for thought. If we treat a photon in the sodium light band (i.e. the light emitted by sodium, from a sodium lamp for instance) just like any other electromagnetic pulse, we find it’s a pulse no less than 3 meters long. While I know we have to treat such a photon as an elementary particle, I find it very tempting to think of it as a vibrating string.

Huh?

Yes. Let me copy that graph again. The assumption I started with is a standard one in physics, and not something that you’d want to argue with: photons are emitted when an electron jumps from a higher to a lower energy level and, for all practical purposes, this emission can be analyzed as the emission of an electromagnetic pulse by an atomic oscillator. I’ll refer you to my previous post (as silly as it is) for details on these basics: the atomic oscillator has a Q, and so there’s damping involved and, hence, the assumption that the electromagnetic pulse resembles a transient should not sound ridiculous. Because the electric field as a function in space is the ‘reversed’ image of the oscillation in time, there is nothing blasphemous about the suggested shape.
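The length of such a pulse is simply the distance light travels while the atom radiates. A minimal sketch, assuming a radiation time of about 10⁻⁸ s for the atomic oscillator (an assumed order of magnitude typical for atomic transitions, not necessarily the exact value used in the previous post):

```python
C = 299_792_458.0  # speed of light, m/s

tau = 1e-8         # ASSUMED radiation (decay) time of the atomic oscillator, in seconds
f = 500e12         # sodium-light frequency used in the text, in Hz

length_m = C * tau          # distance the front of the pulse travels while the atom radiates
n_oscillations = f * tau    # number of field oscillations packed into that length

print(f"pulse length ≈ {length_m:.1f} m")
print(f"oscillations ≈ {n_oscillations:.0e}")
```

So a 3-meter 'string' would carry some five million oscillations of the field.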

Just go along with it for a while. First, we need to remind ourselves that what’s vibrating here is nothing physical: it’s an oscillating electromagnetic field. That being said, in my previous post, I toyed with the idea that this oscillation could actually also represent the photon’s wave function: the field strength over the length of this string would then represent the probability amplitudes, provided we measure the electric field in a unit that ensures the area under the squared curve adds up to one, so as to normalize the probabilities.

But then I was joking, right? Well… No. Why not consider it? An electromagnetic oscillation packs energy, and the energy is proportional to the square of the amplitude of the oscillation. Now, the probability of detecting a particle is related to its energy, and such probability is calculated from taking the (absolute) square of probability amplitudes. Hence, mathematically, this makes perfect sense.
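Choosing 'an appropriate unit' for the field amounts to a normalization step. A minimal sketch with NumPy, assuming a damped sine as the pulse shape, with x measured backwards from the front of the pulse (so the largest amplitude sits at the front, which is the 'reversed' image of the decay in time). The decay length of 5 µm is an assumption chosen to keep the grid small; the procedure is the same for a 3-meter pulse.

```python
import numpy as np

# Hypothetical photon "string": a damped sine, x measured backwards from the pulse front.
wavelength = 600e-9      # m, sodium-light wavelength from the text
decay_length = 5e-6      # m, ASSUMED decay length (the pulse in the text would be meters long)

x = np.linspace(0.0, 10 * decay_length, 200_001)
dx = x[1] - x[0]
field = np.exp(-x / decay_length) * np.sin(2 * np.pi * x / wavelength)

# Pick the unit of the field so that the area under the squared curve equals one.
area = np.sum(field**2) * dx
amplitude = field / np.sqrt(area)

print(np.sum(amplitude**2) * dx)  # 1.0, up to floating-point rounding
```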

It’s quite interesting to think through the consequences, and I’ll continue to do so in the coming weeks. One interesting thing is that the field strength (i.e. the magnitude of the electric field vector) is a real number. Hence, if we equate these magnitudes with probability amplitudes, we’d have real probability amplitudes, instead of complex-valued ones. That’s not an issue: we could look at them as complex numbers of which the imaginary part is zero. In fact, an obvious advantage of setting the imaginary part of the probability amplitude to zero is that we free up a dimension which we can then use to analyze the oscillation of the electric field in the other direction that’s normal to the direction of propagation, i.e. the z-coordinate, so we could take the polarization of the light into account. The figure below (which I took from Wikipedia again: it’s by far the most convenient place to shop for images and animations, and what would I do without it?) does not show the y- and z-coordinate of circularly or elliptically polarized light: it actually shows both the electric and the magnetic field vector associated with linearly polarized light (also note that my y-direction is the x-direction here). But you get the idea: a probability wave has two spatial dimensions normal to its direction of propagation, and so we’d have two complex-valued functions really. However, if we can equate the electric field with the probability amplitude, we’d have a real-valued wave function. :-)
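One way to picture that freed-up dimension, as a hypothetical encoding in the spirit of the speculation above: store the y-component of the (real) electric field in the real part of a complex number and the z-component in its imaginary part. Linear polarization then gives a magnitude that swings up and down, while circular polarization gives a constant magnitude whose direction rotates.

```python
import numpy as np

# Hypothetical encoding: real part = y-component of the electric field,
# imaginary part = z-component.
phase = np.linspace(0.0, 2 * np.pi, 9)           # one full cycle, in steps of 45 degrees

linear = np.cos(phase) + 0j                      # field oscillates along y only
circular = np.cos(phase) + 1j * np.sin(phase)    # y and z components a quarter cycle apart

print(np.round(np.abs(linear), 3))    # magnitude swings between 0 and 1
print(np.round(np.abs(circular), 3))  # magnitude stays 1: only the direction rotates
```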

Another interesting thing to think about is how the collapse of the wave function would come about. If we think of a photon as a string, we can think of its energy as some kind of hook which could cause it to collapse into another lump of energy. What kind of hook? What force would come into play? Well… Perhaps all of them, as we know that even the weakest of fundamental forces, gravity, becomes much stronger at smaller distances, as it’s also subject to the inverse square law: its strength varies inversely with the square of the distance. In fact, I don’t know much (nothing at all, actually) about grand unification theories, but I understand that the prediction is that, at the Planck scale, all forces unify and become one: gravity, the electromagnetic force, the strong nuclear force, and even the weak nuclear force. So, why wouldn’t we think of some part of the string getting near enough to ‘something else’ (e.g. an electron) to get hooked and then collapse into the particle it is interacting with?

You must be laughing aloud now. A new string theory–really?

I know… I know… I haven’t reached sophomore level and I am already wildly speculating… Well… Yes. What I am talking about here has probably nothing to do with current string theories, although my proposed string would also replace the point-like photon by a one-dimensional ‘string’. However, ‘my’ string is, quite simply, an electromagnetic pulse (a transient actually, for reasons I explained in my previous post). Naive? Perhaps. However, I note that the earliest version of string theory is referred to as bosonic string theory, because it only incorporated bosons, which is what photons are.

So what? Well… Nothing… I am sure others have thought of this too, and I’ll look into it. It’s surely an idea which I’ll keep in the back of my head as I continue to explore physics. The idea is just too simple and beautiful to disregard, even if I am sure it must be pretty naive indeed. Photons as three-meter long strings? Let’s just forget about it. :-) Onwards !!!  :-)

# The shape and size of a photon

Photons are weird. In fact, I already wrote some fairly confused posts on them. This post is probably even more confusing. If anything, it shows how easy it is to get lost when thinking things through. In any case, it did help me to make sense of it all and, hence, perhaps it will help you too.

Electrons and photons: similarities and differences

All elementary particles are weird. As Feynman puts it, in the very first paragraph of his Lectures on Quantum Mechanics: “Historically, the electron, for example, was thought to behave like a particle, and then it was found that in many respects it behaved like a wave. So it really behaves like neither. Now we have given up. We say: ‘It is like neither.’ There is one lucky break, however—electrons behave just like light. The quantum behavior of atomic objects (electrons, protons, neutrons, photons, and so on) is the same for all, they are all ‘particle waves,’ or whatever you want to call them. So what we learn about the properties of electrons will apply also to all ‘particles,’ including photons of light.” (Feynman’s Lectures, Vol. III, Chapter 1, Section 1)

I wouldn’t dare to argue with Feynman, of course… But… Photons are like electrons, and then they are not. For starters, photons do not have mass, and they are bosons: force-carrying ‘particles’ obeying very different quantum-mechanical rules, referred to as Bose-Einstein statistics. I’ve written about that in the past, so I won’t do that again here. It’s probably sufficient to remind the reader that these rules imply that the so-called Pauli exclusion principle does not apply to them: bosons like to crowd together, thereby occupying the same quantum state. Their counterparts, the so-called fermions or matter-particles, can’t do that. Fermions include quarks (which make up protons and neutrons) and leptons (including electrons and neutrinos): two electrons can only sit on top of each other if their spins are opposite (so that makes their quantum state different), and there’s no place whatsoever to add a third one, because there are only two possible values for the spin: up or down.

From all that I’ve been writing so far, I am sure you have some kind of picture of matter-particles now, and notably of the electron: when everything is said and done, it’s a point-like particle defined by some weird wave function: the so-called ‘probability wave’. But what about the photon? Is it a point-like particle too? Hence, why wouldn’t we associate it with a probability wave too? Does it have a de Broglie wavelength?

Before answering that question, let me present that ‘picture’ of the electron once again.

The wave function for electrons

The electron ‘picture’ can be represented in a number of ways but one of the more scientifically correct ones – whatever that means – is that of a spatially confined wave function representing a complex quantity referred to as the probability amplitude. The animation below (which I took from Wikipedia) visualizes such wave functions. As you know by now, the wave function is usually represented by the Greek letter psi (ψ), and it is often referred to, as mentioned above, as a ‘probability wave’, although that term is quite misleading. Why? You surely know that by now: the wave function represents a probability amplitude, not a probability.

That being said, probability amplitude and probability are obviously related: if we square the psi function (so we square all these amplitudes), then we get the actual probability of finding that electron at point x. That’s the so-called probability density function on the right of each function. [I should be fully correct now and note that we are talking about the absolute square here, or the squared norm: remember that the square of a complex number can be negative, as evidenced by the definition of i: i² = –1. In fact, if there's only an imaginary part, then its square is always negative.]
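The difference between the ordinary square and the absolute square is easy to see with Python's built-in complex numbers. A small sketch, using an arbitrary purely imaginary amplitude:

```python
psi = 0.0 + 0.5j          # a purely imaginary amplitude, chosen for illustration

print(psi * psi)                     # (-0.25+0j): the ordinary square is negative
print(abs(psi) ** 2)                 # 0.25: the absolute square is a proper probability density
print((psi * psi.conjugate()).real)  # 0.25 again: |psi|^2 equals psi times its complex conjugate
```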

Below, I’ve inserted another image, which gives a static picture (i.e. one that is not varying in time) of the wave function of an electron. To be precise: it’s the wave function for an electron on the 5d orbital of a hydrogen atom. I also took it from Wikipedia and so I’ll just copy the explanation here: “The solid body shows the places where the electron’s probability density is above a certain value (0.02), as calculated from the probability amplitude.” As for the colors, this image uses the so-called HSL color system to represent complex numbers: each complex number is represented by a unique color, with a different hue (H), saturation (S) and lightness (L). [Just google if you want to know how that works exactly.]
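To give an idea of how such a color mapping can work, here is a sketch of one common convention (an assumption: not necessarily the exact mapping used for the Wikipedia image). The hue encodes the phase of the complex number, the lightness encodes its magnitude, and Python's standard `colorsys` module converts HLS values to RGB.

```python
import cmath
import colorsys

def complex_to_rgb(z: complex) -> tuple:
    """Map a complex number to an RGB color via HSL (one common convention, assumed here):
    hue encodes the phase (argument), lightness encodes the magnitude."""
    hue = (cmath.phase(z) / (2 * cmath.pi)) % 1.0   # phase in (-pi, pi] mapped onto [0, 1)
    lightness = abs(z) / (1.0 + abs(z))             # 0 at z = 0, 0.5 at |z| = 1, toward 1 as |z| grows
    saturation = 1.0
    return colorsys.hls_to_rgb(hue, lightness, saturation)  # colorsys takes (H, L, S) in that order

for z in (1, 1j, -1, -1j, 0):
    print(z, tuple(round(c, 2) for c in complex_to_rgb(z)))
```

With this choice, positive real numbers come out red, negative real numbers cyan, and zero black.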

The Uncertainty Principle revisited

The wave function is usually given as a function in space and time: ψ = ψ(x, t). However, I should also remind you that we have a similar function in the ‘momentum space’.  Indeed, the position-space and momentum-space wave functions are related through the Uncertainty Principle. To be precise: they are Fourier transforms of each other – but don’t be put off by that statement. I’ll just quickly jot down the Uncertainty Principle once again:

$$\sigma_x \cdot \sigma_p \geq \hbar/2$$

This is the so-called Kennard formulation of the Principle: it measures the uncertainty (usually written as Δ) of both the position (Δx) and the momentum (Δp) in terms of the standard deviation (that’s the σ, or sigma, symbol) around the mean. Hence, the assumption is that both x and p follow some kind of distribution, and that’s usually a nice “bell curve” in the textbooks. Finally, let me also remind you how tiny that physical constant ħ actually is: about 6.58×10⁻¹⁶ eV·s.
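To get a feel for what the Kennard inequality implies, here is a small worked example. It assumes an electron confined to about 0.1 nm (roughly the size of an atom, a number chosen only for illustration) and computes the smallest momentum spread, and the corresponding non-relativistic velocity spread, that the inequality allows. It also converts ħ to eV·s.

```python
HBAR = 1.054_571_817e-34        # reduced Planck constant ħ, in J·s
E_CHARGE = 1.602_176_634e-19    # joules per electronvolt
M_E = 9.109_383_7015e-31        # electron mass, in kg

def min_momentum_spread(sigma_x_m: float) -> float:
    """Smallest sigma_p (kg·m/s) allowed by the Kennard inequality sigma_x * sigma_p >= hbar/2."""
    return HBAR / (2 * sigma_x_m)

hbar_ev_s = HBAR / E_CHARGE            # ħ expressed in eV·s, about 6.58e-16
sigma_x = 0.1e-9                       # ASSUMED: electron confined to about 0.1 nm
sigma_p = min_momentum_spread(sigma_x)
sigma_v = sigma_p / M_E                # non-relativistic velocity spread, in m/s

print(f"hbar = {hbar_ev_s:.3e} eV·s")
print(f"sigma_p >= {sigma_p:.2e} kg·m/s, sigma_v >= {sigma_v:.2e} m/s")
```

The velocity spread comes out at several hundred kilometers per second.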

At this point, you may wonder about the units. A position is expressed in distance units, and momentum… Euh… [...] Momentum is mass times velocity, so it’s kg·m/s. Hence, the dimension of the product on the left-hand side of the inequality is m·kg·m/s = kg·m²/s. So what about this eV·s dimension on the right-hand side? Well… The electronvolt is a unit of energy, and so we can convert it to joules. Now, a joule is a newton-meter (N·m), which is the unit for both energy and work. So how do we relate the two sides? Simple: a newton can also be expressed in SI base units: 1 N = 1 kg·m/s², i.e. one newton is the force needed to give a mass of 1 kg an acceleration of 1 m/s per second. Just substitute and you’ll see the dimension on the right-hand side is kg·(m/s²)·m·s = kg·m²/s, so it comes out alright. Why this digression? Not sure. Perhaps just to remind you that the Uncertainty Principle can also be expressed in terms of energy and time:

ΔE·Δt ≥ ħ/2

That expression makes it clear the units on both sides of the inequality are, indeed, the same, but it’s not so obvious to relate the two expressions of the same Uncertainty Principle. I’ll just note that energy and time, just like position and momentum, are also so-called complementary variables in quantum mechanics. We have the same duality for the de Broglie relation, which I’ll also jot down here:

λ = h/p and f = E/h

In these two complementary equations, λ is the wavelength of the (complex-valued) de Broglie wave, and f is its frequency. A stupid question perhaps: what’s the velocity of the de Broglie wave? Well… As you should know from previous posts, the mathematically correct answer involves distinguishing the group and phase velocity of a wave, but the easy answer is: the de Broglie wave of a particle moves with the particle :-) and, hence, its velocity is, obviously, the speed of the particle which, for electrons, is usually non-relativistic (i.e. rather slow as compared to the speed of light).
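The group and phase velocities can be computed explicitly. A sketch, assuming a free electron with the relativistic energy E = √((pc)² + (mc²)²) and the relations E = ħω, p = ħk: the phase velocity ω/k = E/p then equals c²/v (larger than c, but it carries no signal), while the group velocity dω/dk = dE/dp, computed here by a finite difference, equals the particle speed v. With the non-relativistic E = p²/2m instead, the phase velocity would come out as v/2: the phase velocity depends on what one counts as the energy, the group velocity does not.

```python
import math

C = 299_792_458.0        # speed of light, m/s
M_E_C2_EV = 510_998.95   # electron rest energy m*c^2, in eV

def energy_ev(pc_ev: float) -> float:
    """Relativistic total energy E = sqrt((pc)^2 + (mc^2)^2), everything in eV."""
    return math.sqrt(pc_ev**2 + M_E_C2_EV**2)

def velocities(kinetic_ev: float) -> tuple[float, float, float]:
    """Return (particle speed, phase velocity, group velocity) in m/s for a free electron.

    With E = hbar*omega and p = hbar*k: phase velocity = omega/k = E/p,
    group velocity = d(omega)/dk = dE/dp.
    """
    e_total = kinetic_ev + M_E_C2_EV                  # include the rest energy
    pc = math.sqrt(e_total**2 - M_E_C2_EV**2)         # momentum times c, in eV
    v_particle = C * pc / e_total                     # v = p c^2 / E
    v_phase = C * e_total / pc                        # E/p = c * E/(pc)
    step = 1e-3 * pc                                  # central finite difference for dE/dp
    v_group = C * (energy_ev(pc + step) - energy_ev(pc - step)) / (2 * step)
    return v_particle, v_phase, v_group

v, v_ph, v_gr = velocities(600.0)   # the 600 eV electrons from the double-slit experiment
print(f"particle speed: {v:.4e} m/s")
print(f"phase velocity: {v_ph / C:.1f} c")
print(f"group velocity: {v_gr:.4e} m/s")
```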

Before proceeding, I need to make some more introductory remarks. The first is that the Uncertainty Principle implies that we cannot assign a precise wavelength (or an equally precise frequency) to a de Broglie wave: if there is a spread in p (and, hence, in E), then there will be a spread in λ (and in f). That’s good, because a regular wave with an exact frequency would not give us any information about the location. Frankly, I always had a lot of trouble understanding this, so I’ll just quote the expert teacher (Feynman) on this:

“The amplitude to find a particle at a place can, in some circumstances, vary in space and time, let us say in one dimension, in this manner: $\psi = Ae^{i(\omega t - kx)}$, where ω is the frequency, which is related to the classical idea of the energy through $E = \hbar\omega$, and k is the wave number, which is related to the momentum through $p = \hbar k$. [These are equivalent formulations of the de Broglie relations using the angular frequency and the wave number instead of wavelength and frequency.] We would say the particle had a definite momentum p if the wave number were exactly k, that is, a perfect wave which goes on with the same amplitude everywhere. The $\psi = Ae^{i(\omega t - kx)}$ equation [then] gives the [complex-valued probability] amplitude, and if we take the absolute square, we get the relative probability for finding the particle as a function of position and time. This is a constant, which means that the probability to find a [this] particle is the same anywhere.” (Feynman’s Lectures, I-48-5)

Of course, that’s a problem: if the probability to find a particle is the same anywhere, then the particle can be anywhere, and, hence, that means it’s actually nowhere. Hence, that wave function doesn’t serve the purpose. In short, that nice $\psi = Ae^{i(\omega t - kx)}$ function is completely useless in terms of representing an electron, or any other actual particle moving through space. So what to do?

The Wikipedia article on the Uncertainty Principle has this wonderful animation that shows how we can superimpose several waves to form a wave packet. And so that’s what we want: a wave packet traveling through (and limited in) space. I’ve copied the animation below. You should note that it shows only one part of the complex probability amplitude: just visualize the other part (imaginary if the wave below is the real part, and vice versa if the wave below happens to represent the imaginary part of the probability amplitude). The animation basically illustrates a mathematical operation (a Fourier analysis, which is not the same as the Fourier transform I mentioned above, although these two mathematical concepts obviously have a few things in common) that separates a wave packet into a finite or (potentially) infinite number of component waves. Indeed, note how, in the illustration below, the frequency increases gradually (or, what amounts to the same, the wavelength gets smaller) and, with every wave we add to the packet, it becomes increasingly localized. So that illustrates how the ‘uncertainty’ about (or the spread in) the frequency is inversely related to the uncertainty in position:

Δx = 1/[Δ(1/λ)] = h/Δp.
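The superposition in the animation can be mimicked with a few lines of NumPy. A minimal sketch in arbitrary units, with a hypothetical helper `packet_width`: add up plane waves e^(ikx) whose wave numbers are spread evenly over a range Δk around an assumed central value k₀, and measure how wide the resulting peak of |ψ|² is. Since k = 2π/λ, we have Δk = 2π·Δ(1/λ), so the relation above predicts that the width times Δk stays roughly constant (the exact constant depends on how one defines the widths).

```python
import numpy as np

x = np.linspace(-50.0, 50.0, 4001)   # positions, arbitrary units (step 0.025)
K0 = 5.0                              # ASSUMED central wave number, arbitrary units

def packet_width(delta_k: float, n_waves: int = 41) -> float:
    """Hypothetical helper: superpose n_waves plane waves exp(i k x) with k spread evenly
    over K0 +/- delta_k/2, and return the full width at half maximum of |psi|^2 at x = 0."""
    ks = np.linspace(K0 - delta_k / 2, K0 + delta_k / 2, n_waves)
    psi = np.exp(1j * np.outer(ks, x)).sum(axis=0) / n_waves   # all waves in phase at x = 0
    density = np.abs(psi) ** 2
    center = len(x) // 2                 # index of x = 0
    half = density[center] / 2
    right = center
    # walk outwards until the density drops to half of its central maximum
    while right < len(x) - 1 and density[right] > half:
        right += 1
    return 2 * (x[right] - x[center])

for delta_k in (1.0, 2.0, 4.0):
    width = packet_width(delta_k)
    print(f"delta_k = {delta_k}: width ≈ {width:.2f}, width * delta_k ≈ {width * delta_k:.2f}")
```

The product comes out roughly the same for each Δk: the wider the spread in wave numbers, the narrower the packet.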

Frankly, I must admit that neither Feynman’s explanation nor the animation above quite convinces me, because I can perfectly imagine a localized wave train with a fixed frequency and wavelength, like the one below, which I’ll re-use later. I’ve made this wave train myself: it’s a standard sine or cosine function multiplied with another easy function generating the envelope. You can easily make one like this yourself. This thing is localized in space and, as mentioned above, it has a fixed frequency and wavelength.
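A numerical check one could try here, assuming a Gaussian envelope (the envelope function above is not specified): take the power spectrum of a cosine multiplied by an envelope of width w. The spectrum is not a single line at k₀ but a peak whose width shrinks as 1/w, so a wave train that is localized in space does carry a spread of wave numbers; only an infinitely long train has a single one. For the envelope exp(−x²/2w²), the half-maximum width of the power spectrum is 2√(ln 2)/w ≈ 1.665/w.

```python
import numpy as np

def spectral_fwhm(w: float, k0: float = 10.0, n: int = 2**16, length: float = 4000.0) -> float:
    """FWHM (in angular wave number) of the power spectrum of the wave train
    cos(k0 x) * exp(-x^2 / (2 w^2)): a cosine with an ASSUMED Gaussian envelope of width w."""
    x = np.linspace(-length / 2, length / 2, n, endpoint=False)
    train = np.cos(k0 * x) * np.exp(-x**2 / (2 * w**2))
    power = np.abs(np.fft.rfft(train)) ** 2
    k = 2 * np.pi * np.fft.rfftfreq(n, d=x[1] - x[0])   # convert cycles per unit to radians per unit
    above = k[power >= power.max() / 2]                  # all wave numbers above half the peak power
    return above.max() - above.min()

for w in (5.0, 10.0, 20.0):
    fwhm = spectral_fwhm(w)
    print(f"envelope width {w}: spectral width ≈ {fwhm:.4f}, product ≈ {w * fwhm:.3f}")
```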

In any case, I need to move on. If you, my reader, have any suggestion here (I obviously don’t quite get it), please let me know. For now, however, I’ll just continue wandering. Let me proceed with two more remarks:

1. What about the amplitude of a de Broglie wave? Well… It’s just like this perceived problem with fixed wavelengths: frankly, I couldn’t find much about that in textbooks either. However, there’s an obvious constraint: when everything is said and done, all probabilities must take a value between 0 and 1, and they must also all add up to exactly 1. So that’s a so-called normalization condition that obviously imposes some constraints on the (complex-valued) probability amplitudes of our wave function.
2. The complex waves above are so-called standing waves: their frequencies only take discrete values. You can, in fact, see that from the first animation. Likewise, there is no continuous distribution of energies. Well… Let me be precise and qualify that statement: in fact, when the electron is free, it can have any energy. It’s only when it’s bound that it must take one or another out of a set of allowed values, as illustrated below.

Now, you will also remember that an electron absorbs or emits a photon when it goes from one energy level to the other, so it absorbs or emits radiation. And, of course, you will also remember that the frequency of the absorbed or emitted light is related to those energy levels. More specifically, the frequency of the light emitted in a transition from, let’s say, energy level $E_3$ to $E_1$ will be written as $\nu_{31} = (E_3 - E_1)/h$. This frequency will be one of the so-called characteristic frequencies of the atom and will define a specific so-called spectral emission line.
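A worked example of such a characteristic frequency, assuming (for illustration only) the Bohr-model energy levels of hydrogen, E_n = −13.6 eV/n². The 3 → 1 transition then gives a photon of about 12.09 eV, in the ultraviolet:

```python
H_EV_S = 4.135_667_696e-15   # Planck constant h, in eV·s
C = 299_792_458.0            # speed of light, m/s

def hydrogen_level_ev(n: int) -> float:
    """ASSUMED Bohr-model energy of hydrogen level n, in eV (used only for illustration)."""
    return -13.6 / n**2

e3, e1 = hydrogen_level_ev(3), hydrogen_level_ev(1)
nu_31 = (e3 - e1) / H_EV_S            # nu_31 = (E_3 - E_1)/h, in Hz
print(f"E3 - E1 = {e3 - e1:.2f} eV")   # 12.09 eV
print(f"nu_31 = {nu_31:.3e} Hz, wavelength = {C / nu_31 * 1e9:.1f} nm")
```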

Now that’s where the confusion starts, because that $\nu_{31} = (E_3 - E_1)/h$ equation is, from a mathematical point of view, identical to the de Broglie equation, which assigns a de Broglie wave to a particle: f = E/h. While mathematically similar, the formulas represent very different things. The most obvious remark to make is that a de Broglie wave is a matter-wave and, as such, quite obviously, that it has nothing to do with an electromagnetic wave. Let me be even more explicit. A de Broglie wave is not a ‘real’ wave, in a sense (but, of course, that’s a very unscientific statement to make); it’s a psi function, so it represents these weird mathematical quantities (complex probability amplitudes) which allow us to calculate the probability of finding the particle at position x or, if it’s a wave function for the momentum-space, to find a value p for its momentum. In contrast, a photon that’s emitted or absorbed represents a ‘real’ disturbance of the electromagnetic field propagating through space. Hence, that frequency ν is something very different from f, which is why we use another symbol for it (ν is the Greek letter nu, not to be confused with the v we use for velocity). [Of course, let's not get into a philosophical discussion on how 'real' an electromagnetic field is here.]

That being said, we also know light is emitted in discrete energy packets: in fact, that’s how photons were defined originally, first by Planck and then by Einstein. Now, when an electron falls from one energy level in an atom to another (lower) energy level, it emits one – and only one – photon with that particular wavelength and energy. The question then is: how should we picture that photon? Does it also have some more or less defined position and momentum? Can we also associate some wave function – i.e. a de Broglie wave – with it?

When you google for an answer to this simple question, you will get very complicated and often contradictory answers. The very short answer I got from a nuclear scientist – you can imagine that these guys don’t have the time to give a more nuanced answer to idiots like me – was simple: No. One does not associate photons with a de Broglie wave.

When he gave me that answer, I first thought: of course not. A de Broglie wave is a ‘matter wave’, and photons aren’t matter. Period. That being said, the answer is not so simple. Photons do behave like electrons, don’t they? There’s diffraction (when you send a photon through one slit) and interference (when photons go through two slits, and there’s interference even when they go through one by one, just like electrons), and so on and so on. Most importantly, the Uncertainty Principle obviously also applies to them and, hence, one would expect to be able to associate some kind of wave function with them, wouldn’t one?

Who knows? That’s why I wrote this post. I don’t know, so I thought it would be nice to just freewheel a bit on this question. So be warned: nothing of what I write below has been researched really, so critical comments and corrections from actual specialists are more than welcome.

The shape of a photon wave

The answer in regard to the definition of a photon’s position and momentum is, obviously, unambiguous: photons are, by definition, little lumps of energy indeed, and so we detect them as such and, hence, they do occupy some physical space: we detect them somewhere, and we usually do so at a rather precise point in time.

They obviously also have some momentum. In fact, I calculated that momentum in one of my previous posts (Light: Relating Waves to Photons). It was related to the magnetic field vector, which we usually never mention when discussing light – because it’s so tiny as compared to the electric field vector – but so it’s there and a photon’s momentum (in the direction of propagation) is as tiny as the magnetic field vector for an electromagnetic wave traveling through space: p = E/c, with E = νh and c = 3×10⁸ m/s. Hence, the momentum of a photon is only a tiny fraction of the photon’s energy. In fact, because it’s so tiny (remember that the energy of a photon is only one or two eV, and so that’s a very tiny unit, and so we have to divide that by c, which is a huge number obviously…), I’ll just forget about it here for a while and focus on that electric field vector only.
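
To put a number on ‘tiny’, here is a minimal sketch that evaluates E = hν and p = E/c for the sodium-light example. It assumes the rounded frequency of 500 THz used throughout this post; the constants are CODATA values, and the variable names are just illustrative.

```python
# Energy and momentum of one "sodium-light" photon, using the post's rounded frequency.
H_EV = 4.135667696e-15   # Planck's constant, eV*s
H_SI = 6.62607015e-34    # Planck's constant, J*s
C = 2.99792458e8         # speed of light, m/s

nu = 500e12              # photon frequency, Hz (500 THz, as in the text)

energy_ev = H_EV * nu    # E = h*nu, in electronvolts
energy_j = H_SI * nu     # the same energy, in joules
momentum = energy_j / C  # p = E/c, in kg*m/s

print(f"E = {energy_ev:.2f} eV = {energy_j:.2e} J")
print(f"p = E/c = {momentum:.2e} kg*m/s")
```

This gives roughly 2.07 eV (about 3.3×10⁻¹⁹ J) and a momentum of about 1.1×10⁻²⁷ kg·m/s.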

So… A photon is, in essence, an electromagnetic disturbance and so, when trying to picture a photon, we can think of some oscillating electric field vector traveling through–and also limited in–space. In short, in the classical world – and in the classical world only of course – a photon must be some electromagnetic wave train, like the one below–perhaps.

Hmm… Why would it have that shape? I don’t know. Your guess is as good as mine. But you’re right to be skeptical. In fact, the wave train above has the same shape as Feynman’s representation of a particle (see below) as a ‘probability wave’ traveling through–and limited in–space.

So, what about it? Let me first remind you once again (I just can’t stress this point enough it seems) that Feynman’s representation – and most are based on his, it seems – is misleading because it suggests that ψ(x) is some real number. It’s not. In the image above, the vertical axis should not represent some real number (and it surely should not represent a probability, i.e. some real positive number between 0 and 1) but a probability amplitude, i.e. a complex number in which both the real and imaginary part are important. As mentioned above, this wave function will, of course, give you all the probabilities you need when you take its (absolute) square, but so it’s not the same: the image above gives you only one part of the complex-valued wave function (it could be either the real or the imaginary part, in fact), which is why I find it misleading.

But let me go back to the first illustration: the vertical axis of the first illustration is not ψ but E – the electric field vector. So there’s no imaginary part here: just a real number, representing the (signed) strength of the electric field E as a function of the space coordinate x. [Can that number be negative? Of course ! In that case, the field vector points the other way !] Does this suggestion for what a photon could look like make sense? In quantum mechanics, the answer is obviously: no. But in the classical worldview? Well… Maybe. [...] You should be skeptical, however. Even in the classical world, the answer is: most probably not. I know you won’t like it (because the formula doesn’t look easy), but let me remind you of the formula for the (vertical) component of E as a function of the acceleration of some charge q:

$$E_y(t) = \frac{-q\,a_y(t - r/c)}{4\pi\varepsilon_0 c^2 r}$$

The charge q (i.e. the source of the radiation) is, of course, our electron that’s emitting the photon as it jumps from a higher to a lower energy level (or, vice versa, absorbing it). This formula basically states that the magnitude of the electric field (E) is proportional to the acceleration (a) of the charge (with t–r/c the retarded argument). Hence, the suggested shape of E as a function of x implies that the acceleration of the electron is initially quite small, that it swings between positive and negative, that these swings then become larger and larger to reach some maximum, and then they become smaller and smaller again to then die down completely. In short, it’s like a transient wave: “a short-lived burst of energy in a system caused by a sudden change of state”, as Wikipedia defines it.

[...] Well… No. An actual transient looks more like what’s depicted below: no gradual increase in amplitude but big swings initially which then dampen to zero. In other words, if our photon is a transient electromagnetic disturbance caused by a ‘sudden burst of energy’ (which is what that electron jump is, I would think), then its representation will, much more likely, resemble a damped wave, like the one below, rather than Feynman’s picture of a moving matter-particle.

In fact, we’d have to flip the image, both vertically and horizontally, because the acceleration of the source and the field are related as shown below. The vertical flip is because of the minus sign in the formula for E(t). The horizontal flip is because of the minus sign in the (t – r/c) term, the retarded argument: if we add a little time (Δt), we get the same value for a(t – r/c) as we would have if we had subtracted a little distance: Δr = cΔt. So that’s why E as a function of r (or of x), i.e. as a function in space, is a ‘reversed’ plot of the acceleration as a function of time.
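
The two flips can be checked with a toy model. This is a minimal sketch, assuming the field is simply proportional to minus the retarded acceleration, E ∝ −a(t − r/c), with the 1/r fall-off left out. The damped acceleration `acceleration()` is a hypothetical shape chosen for illustration, and c is set to 1 to keep the numbers readable.

```python
import math

C = 1.0  # speed of light in arbitrary units (c = 1), purely to keep the numbers readable


def acceleration(t, omega=2 * math.pi, decay=3.0):
    """Hypothetical source acceleration: zero before t = 0, then a cosine that
    starts at its largest swing and dies out with time constant `decay`."""
    if t < 0:
        return 0.0
    return math.exp(-t / decay) * math.cos(omega * t)


def field(r, t):
    """Radiated field at distance r and time t, up to a positive constant:
    E is proportional to -a(t - r/c). The 1/r fall-off is ignored to isolate the shape."""
    return -acceleration(t - r / C)


t_now = 10.0  # a snapshot of the whole wave train at one instant
for r in (0.0, 2.5, 5.0, 7.5, 10.0):
    print(f"r = {r:4.1f}  E = {field(r, t_now):+.3f}")
```

Reading the printout from r = 0 outward, the field is weakest near the source and strongest, with its sign flipped, at the leading edge r = ct. That edge carries the earliest, largest swing of the acceleration, and ahead of it the field is zero. Adding Δt at fixed r gives the same value as subtracting Δr = cΔt at fixed t.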

So we’d have something like below.

What does this resemble? It’s not a vibrating string (although I do start to understand the attractiveness of string theory now: vibrating strings are great as energy storage systems, so the idea of a photon being some kind of vibrating string sounds great, doesn’t it?). It doesn’t resemble a bullwhip effect either, because the oscillation of a whip is confined by a different envelope (see below). And, no, it’s also definitely not a trumpet.

It’s just what it is: an electromagnetic transient traveling through space. Would this be realistic as a ‘picture’ of a photon? Frankly, I don’t know. I’ve looked at a lot of stuff but didn’t find anything on this really. The easy answer, of course, is quite straightforward: we’re not interested in the shape of a photon because we know it is not an electromagnetic wave. It’s a ‘wavicle’, just like an electron.

[...]

Sure. I know that too. Feynman told me. :-) But then why wouldn’t we associate some wave function with it? Please tell me, because I really can’t find much of an answer to that question in the literature, and so that’s why I am freewheeling here. So just go along with me for a while, and come up with another suggestion. As I said above, your guess is as good as mine. All that I know is that there’s one thing we need to explain when considering the various possibilities: a photon has a very well-defined frequency (which defines its color in the visible light spectrum) and so our wave train should – in my humble opinion – also have that frequency. At least for quite a while–and then I mean most of the time, or on average at least. Otherwise the concept of a frequency – or a wavelength – doesn’t make sense. Indeed, if the photon has no defined wavelength or frequency, then it has no color, and a photon should have a color: that’s what the Planck relation is all about.

What would be your alternative? I mean… Doesn’t it make sense to think that, when jumping from one energy level to the other, the electron would initially sort of overshoot its new equilibrium position, to then overshoot it again on the other side, and so on and so on, but with an amplitude that becomes smaller and smaller as the oscillation dies out? In short, if we look at radiation as being caused by atomic oscillators, why would we not go all the way and think of them as oscillators subject to some damping force? Just think about it. :-)
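
One reassuring property of the damped-oscillator picture is that a dying oscillation does not lose its frequency. The zero crossings stay evenly spaced even as the amplitude shrinks. This is a minimal sketch, assuming a prescribed damped cosine exp(−t/τ)·cos(ωt) with arbitrary, hypothetical values for ω and τ. For a true damped harmonic oscillator the damping also lowers the frequency very slightly, which is negligible when the damping is as weak as it is for atoms.

```python
import math

OMEGA = 2 * math.pi  # angular frequency: one oscillation per time unit (assumed)
TAU = 20.0           # hypothetical amplitude decay time, same time units


def damped(t):
    """Prescribed damped oscillation x(t) = exp(-t/TAU) * cos(OMEGA * t)."""
    return math.exp(-t / TAU) * math.cos(OMEGA * t)


dt = 1e-3
ts = [i * dt for i in range(60_000)]  # 60 time units, i.e. 60 oscillations

# Zero crossings, located by linear interpolation between neighbouring samples.
crossings = []
for t0, t1 in zip(ts, ts[1:]):
    x0, x1 = damped(t0), damped(t1)
    if x0 * x1 < 0:
        crossings.append(t0 - x0 * (t1 - t0) / (x1 - x0))

gaps = [b - a for a, b in zip(crossings, crossings[1:])]
print(f"{len(crossings)} zero crossings, spacing {min(gaps):.4f} to {max(gaps):.4f}")
print(f"amplitude after 59 periods: {damped(59.0):.3f} (started at {damped(0.0):.3f})")
```

The spacing stays at half a period throughout, while the amplitude drops to about a twentieth of its initial value.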

The size of a photon wave

Let’s forget about the shape for a while and think about size. We’ve got an electromagnetic wave train here. So how long would it be? Well… Feynman calculated the Q of these atomic oscillators: it’s of the order of 10⁸ (see his Lectures, I-34-3: it’s a wonderfully simple exercise, and one that really shows his greatness as a physics teacher) and, hence, this wave train will last about 10⁻⁸ seconds (that’s the time it takes for the radiation to die out by a factor 1/e). That’s not very long but, taking into account the rather spectacular speed of light (3×10⁸ m/s), that still makes for a wave train with a length of 3 meters. Three meters !? Holy sh**! That’s like infinity on an atomic scale! Such a length surely doesn’t match the picture of a photon as a fundamental particle which cannot be broken up, does it? This surely cannot be right and, if it is, then there surely must be some way to break this thing up. It can’t be ‘elementary’, can it?

Well… You’re right, of course. I shouldn’t be doing these classical analyses of a photon, but then I think it actually is kind of instructive. So please do double-check but that’s what it is, it seems. For sodium light (I am just continuing Feynman’s example), which has a frequency of 500 THz (500×10¹² oscillations per second) and a wavelength of 600 nm (600×10⁻⁹ meter), that length corresponds to some five million oscillations. All packed into one photon? One photon with a length of three meters? You must be joking, right?

Sure. I am joking here–but, as far as jokes go, this one is fairly robust from a scientific point of view, isn’t it? :-) Again, please do double-check and correct me, but all that I’ve written so far is not all that speculative. It corresponds to everything I’ve read about it: only one photon is produced per electron in any de-excitation, and its energy is determined by the energy difference between the two levels involved, as illustrated (for a simple hydrogen atom) below. For those who continue to be skeptical about my sanity here, I’ll quote Feynman once again:

“What happens in a light source is that first one atom radiates, then another atom radiates, and so forth, and we have just seen that atoms radiate a train of waves only for about 10⁻⁸ sec; after 10⁻⁸ sec, some atom has probably taken over, then another atom takes over, and so on. So the phases can really only stay the same for about 10⁻⁸ sec. Therefore, if we average for very much more than 10⁻⁸ sec, we do not see an interference from two different sources, because they cannot hold their phases steady for longer than 10⁻⁸ sec. With photocells, very high-speed detection is possible, and one can show that there is an interference which varies with time, up and down, in about 10⁻⁸ sec.” (Feynman’s Lectures, I-34-4)

So… Well… Now it’s up to you. I am going along here with the assumption that a photon, from a classical world perspective, should indeed be something that’s several meters long and something that packs some five million oscillations. So, while we usually measure stuff in seconds, or hours, or years, and, hence, while we would think that 10⁻⁸ seconds is short, a photon would actually be a very stretched-out transient that occupies quite a lot of space. I should also add that, in light of that scale (3 meters), the damping seems to happen rather slowly!
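
The length and oscillation count used above follow from two multiplications. This is a minimal sketch that takes the 10⁻⁸ s decay time and the rounded sodium-light numbers from the text as given inputs. It also shows that counting oscillations in time and counting wavelengths in space give the same number.

```python
C = 3e8          # speed of light, m/s (rounded, as in the text)
TAU = 1e-8       # 1/e decay time of the atomic oscillator, s (order of magnitude from the text)
NU = 500e12      # sodium-light frequency used in the text, Hz

wavelength = C / NU                        # lambda = c/f, i.e. 600 nm
train_length = C * TAU                     # distance light covers while the train dies out
n_oscillations = NU * TAU                  # oscillations emitted in that time
n_wavelengths = train_length / wavelength  # the same count, measured in space instead of time

print(f"wavelength   = {wavelength * 1e9:.0f} nm")
print(f"train length = {train_length:.1f} m")
print(f"oscillations = {n_oscillations:.1e}")
print(f"wavelengths  = {n_wavelengths:.1e}")
```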

I can see you shaking your head, for various reasons. First because this type of analysis is not appropriate. [Yes. I know. A photon should not be viewed as an electromagnetic wave. It's a discrete packet of energy. Period.] Second, I guess you may find the math involved in this post not to your liking, even if it’s quite simple and I am not doing anything spectacular here. [...] Whatever. I don’t care. I’ll just bulldozer on.

What about the ‘vertical’ dimension, the y and the z coordinates in space? We’ve got this long snaky thing: how thick-bodied is it?

Here, we need to watch our language. It’s not very obvious to associate a photon with some kind of cross-section normal to its direction of propagation. Not at all actually. Indeed, as mentioned above, the vertical axis of that graph showing the wave train does not indicate some spatial position: it’s not a y- (or z-)coordinate but the magnitude of an electric field vector. [Just to underline the fact that this magnitude has nothing to do with spatial coordinates: note that the value of that magnitude depends on our unit, so it's really got nothing to do with an actual position in space-time.]

However, that being said, perhaps we can do something with that idea of a cross-section. In nuclear physics, the term ‘cross-section’ would usually refer to the so-called Thomson scattering cross-section, which can be defined rather loosely as the target area for the incident wave (i.e. the photon): it is, in fact, a surface which can be calculated from what is referred to as the classical electron radius, which is about 2.82×10⁻¹⁵ m. Just to compare: you may or may not remember the so-called Bohr radius of an atom, which is about 5.29×10⁻¹¹ m, so that’s a length that’s almost 19,000 times longer. To be fully complete, let me give you the value for the Thomson scattering cross-section: about 6.65×10⁻²⁹ m² (note that this is a surface indeed, so we have m squared as a unit, not m).
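
The cross-section follows from the classical electron radius through the standard relation σ_T = (8π/3)r_e². This is a minimal check using CODATA values for r_e and the Bohr radius a₀. It gives about 6.65×10⁻²⁹ m² and a radius ratio of about 1.88×10⁴.

```python
import math

R_E = 2.8179403262e-15   # classical electron radius, m (CODATA 2018)
A_0 = 5.29177210903e-11  # Bohr radius, m (CODATA 2018)

sigma_t = 8 * math.pi / 3 * R_E ** 2   # Thomson cross-section, sigma_T = (8*pi/3) * r_e^2, in m^2
ratio = A_0 / R_E                      # Bohr radius measured in classical electron radii

print(f"sigma_T = {sigma_t:.3e} m^2")
print(f"a_0 / r_e = {ratio:.0f}")
```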

Now, let me remind you – once again – that we should not associate the oscillation of the electric field vector with something actually happening in space: an electromagnetic field does not move in a medium and, hence, it’s not like a water or sound wave, which makes molecules go up and down as it propagates through its medium. To put it simply: there’s nothing that’s wriggling in space as that photon is flashing through space. However, when it does hit an electron, that electron will effectively ‘move’ (or vibrate or wriggle or whatever you can imagine) as a result of the incident electromagnetic field.

That’s what’s depicted and labeled below: there is a so-called ‘radial component’ of the electric field, and I would say: that’s our photon! [What else would it be?] The illustration below shows that this ‘radial’ component is just E for the incident beam and that, for the scattered beam, it is, in fact, determined by the electron motion caused by the incident beam through that relation described above, in which a is the normal component (i.e. normal to the direction of propagation of the outgoing beam) of the electron’s acceleration.

Now, before I proceed, let me remind you once again that the above illustration is, once again, one of those illustrations that only wants to convey an idea, and so we should not attach too much importance to it: the world at the smallest scale is best not represented by a billiard ball model. In addition, I should also note that the illustration above was taken from the Wikipedia article on elastic scattering (i.e. Thomson scattering), which is only a special case of the more general Compton scattering that actually takes place. It is, in fact, the low-energy limit. Photons with higher energy will usually be absorbed, and then there will be a re-emission, but, in the process, there will be a loss of energy in this ‘collision’ and, hence, the scattered light will have lower energy (and, hence, lower frequency and longer wavelength). But – Hey! – now that I think of it: that’s quite compatible with my idea of damping, isn’t it? :-) [If you think I've gone crazy, I am really joking here: when it's Compton scattering, there's no 'lost' energy: the electron will recoil and, hence, its momentum will increase. That's what's shown below (credit goes to the HyperPhysics site).]
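
To see why Thomson scattering is the low-energy limit, one can compare the Compton shift Δλ = (h/m_ec)(1 − cos θ) with the wavelength itself. This is a minimal sketch. The 600 nm value is the post's sodium light, and the 0.02 nm X-ray wavelength is an arbitrary illustrative choice that does not come from the text.

```python
import math

COMPTON_WAVELENGTH = 2.42631023867e-12  # h / (m_e * c) for the electron, m (CODATA 2018)


def compton_shift(theta):
    """Increase in wavelength of a photon scattered by a free electron at rest, at angle theta (radians)."""
    return COMPTON_WAVELENGTH * (1 - math.cos(theta))


# Illustrative wavelengths: the post's sodium light and an assumed hard X-ray (0.02 nm).
examples = {"sodium light": 600e-9, "hard X-ray": 0.02e-9}
relative_shift = {name: compton_shift(math.pi / 2) / lam for name, lam in examples.items()}

for name, shift in relative_shift.items():
    print(f"{name}: relative wavelength shift at 90 degrees = {shift:.1e}")
```

For visible light the relative shift is a few parts per million, so treating the scattering as elastic is an excellent approximation. For hard X-rays it is of the order of ten percent.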

In any case, I don’t want to make this post too long. I do think we’re getting something here in terms of our objective of picturing a photon–using classical concepts that is! A photon should be a long wave train – a very long wave train actually – but its effective ‘radius’ should be of the same order as the classical electron radius, one would think. Or, much more likely, much smaller. If it’s more or less the same radius, then it would be of the order of femtometers (1 fm = 1 fermi = 1×10⁻¹⁵ m). That’s good because that’s a typical length-scale in nuclear physics. For example, it would be comparable with the radius of a proton. So we look at a photon here as something very different – because it’s so incredibly long (as mentioned above, three meters is not an atomic scale at all!) – but as something which does have some kind of ‘radius’ that is normal to its direction of propagation and equal to or, more likely, much smaller than the classical electron radius. [Why smaller? First, an electron is obviously fairly massive as compared to a photon (if only because an electron has a rest mass and a photon hasn't). Second, it's the electron that absorbs a photon–not the other way around.]

Now, that radius determines the area in which it may produce some effect, like hitting an electron, for example, or like being detected in a photon detector, which is just what this so-called radius of an atom or an electron is all about: the area which is susceptible to being hit by some particle (including a photon), or which is likely to emit some particle (including a photon). What it is exactly, we don’t know: it’s still as spooky as an electron and, therefore, it also does not make all that much sense to talk about its exact position in space. However, if we do talk about its position, then we should obviously also invoke the Uncertainty Principle, which will give us some upper and lower bounds for its actual position, just like it does for any other particle: the uncertainty about its position will be related to the uncertainty about its momentum, and more knowledge about the former implies less knowledge about the latter, and vice versa. Therefore, we can also associate some complex wave function with this photon which is – for all practical purposes – a de Broglie wave. Now how should we visualize that wave?
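
The Uncertainty Principle can be made quantitative here. This is a minimal sketch. It assumes that the femtometer-scale ‘radius’ speculated above acts as the position uncertainty normal to the direction of propagation, so the bound Δx·Δp ≥ ħ/2 applies to the transverse momentum component. The resulting minimum momentum spread is roughly 5×10⁷ times the photon’s own momentum, a tension worth keeping in mind.

```python
HBAR = 1.054571817e-34  # reduced Planck constant, J*s
H = 6.62607015e-34      # Planck constant, J*s

wavelength = 600e-9     # sodium light, m
delta_x = 1e-15         # assumed transverse confinement of ~1 femtometer (the speculation above)

p_photon = H / wavelength             # photon momentum, p = h / lambda
delta_p_min = HBAR / (2 * delta_x)    # Uncertainty Principle: delta_x * delta_p >= hbar / 2

print(f"photon momentum         : {p_photon:.2e} kg*m/s")
print(f"minimum momentum spread : {delta_p_min:.2e} kg*m/s")
print(f"ratio                   : {delta_p_min / p_photon:.1e}")
```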

The shape and size of a photon’s probability wave

I am actually not going to offer anything specific here. First, it’s all speculation. Second, I think I’ve written too much rubbish already. However, if you’re still reading, and you like this kind of unorthodox application of electromagnetics, then the following remarks may stimulate your imagination.

First we should note that if we’re going to have a wave function for the photon in position-space (as opposed to momentum-space), its arguments will not only be x and t, but also y and z. In fact, when trying to visualize this wave function, we should probably first keep x and t constant and think about what a little complex-valued wave train normal to the direction of propagation would look like.

What about its frequency? You may think that, if we know the frequency of this photon, and its energy, and its momentum (we know all about this photon, don’t we?), then we can also associate some de Broglie frequency with this photon. Well… Yes and no. The simplicity of these de Broglie relations (λ = h/p and f = E/h) suggests we can, indeed, assign some frequency (f) or wavelength (λ) to it, all within the limits imposed by the Uncertainty Principle. But we know that we should not end up with a wave function that, when squared, gives us probabilities for each and every point in space. No. The wave function needs to be confined in space and, hence, we’re also talking about a wave train here, and a very short one in this case. Indeed, while, in our reasoning here, we look at the photon as being somewhere, we know it should be somewhere within one or two femtometers of our line of sight.

Now, what’s the typical energy of a photon again? Well… Let’s calculate it for that sodium light. E = hν, so we have to multiply Planck’s constant (h = 4.135×10⁻¹⁵ eV·s) by the photon frequency (ν = 500×10¹² oscillations/s), so that’s about 2 eV. I haven’t checked this but it should be about right: photons in the visible light spectrum have energies ranging from roughly 1.5 to 3.5 eV. Not a lot but something. Now, what’s the de Broglie frequency and wavelength associated with that energy level? Hmm… Well… It’s the same formula, so we actually get the same frequency and wavelength: 500×10¹² Hz and 600 nm (nanometer) for the wavelength. So how do we pack that into our one or two femtometer space? Hmm… Let’s think: one nanometer is a million femtometers, isn’t it? And so we’ve got a de Broglie wavelength of 600 nanometers?
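
The arithmetic in that paragraph can be checked in a few lines. This is a minimal sketch with CODATA constants. Because it uses c = 2.998×10⁸ m/s rather than the rounded 3×10⁸, the wavelength comes out at 599.6 nm rather than 600 nm. Chaining E = hν, p = E/c and λ = h/p simply returns λ = c/ν.

```python
H_EV = 4.135667696e-15  # Planck's constant, eV*s
C = 2.99792458e8        # speed of light, m/s
FM = 1e-15              # one femtometer, m

nu = 500e12                  # sodium-light frequency, Hz
energy = H_EV * nu           # E = h*nu, in eV
momentum = energy / C        # p = E/c, in eV*s/m
lambda_db = H_EV / momentum  # de Broglie wavelength lambda = h/p, which reduces to c/nu

print(f"E = {energy:.2f} eV")
print(f"de Broglie wavelength = {lambda_db * 1e9:.1f} nm = {lambda_db / FM:.1e} fm")
```

That is some six hundred million femtometers, far too long to fit in a one- or two-femtometer space.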

Oh-oh. We must be doing something wrong here, mustn’t we?

Yeah. I guess so. Here I’ll quote Feynman again: “We cannot define a unique wavelength for a short wave train. Such a wave train does not have a definite wavelength; there is an indefiniteness in the wave number that is related to the finite length of the train.”

I had as much trouble with this as with that other statement of Feynman’s, the one on the ‘impossibility’ of a wave train with a fixed frequency. But now I think it’s very simple actually: a very, very short burst is just not long enough to define a wavelength or a frequency: there are a few ups and downs, which are more likely than not to be very irregular, and that’s it. No nice sinusoidal shape. It’s as simple as that… I think. :-)
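
Feynman’s point can be checked numerically: the shorter the train, the wider the spread of wave numbers it contains. This is a minimal sketch, assuming a cosine chopped off with a rectangular envelope. A damped envelope would give a different spectral shape but the same inverse scaling. The half-width Δk comes out at roughly 4 divided by the train length, whatever that length is.

```python
import cmath
import math

K0 = 2 * math.pi  # wave number of the carrier: wavelength 1 (arbitrary units)


def spectrum(n_cycles, k, samples_per_cycle=32):
    """|Fourier transform| at wave number k of cos(K0 x) cut off after n_cycles wavelengths
    (a rectangular envelope), approximated by a Riemann sum."""
    n = n_cycles * samples_per_cycle
    dx = n_cycles / n
    total = 0j
    for i in range(n):
        x = i * dx
        total += math.cos(K0 * x) * cmath.exp(-1j * k * x) * dx
    return abs(total)


def half_width(n_cycles):
    """Distance in k from K0 to where the spectrum first falls below half its peak value."""
    peak = spectrum(n_cycles, K0)
    step = 0.005 * K0 / n_cycles  # step scaled to the expected width
    k = K0
    while spectrum(n_cycles, k) > peak / 2:
        k += step
    return k - K0


widths = {n: half_width(n) for n in (5, 20)}
for n_cycles, w in widths.items():
    print(f"{n_cycles:2d} cycles: half-width in k = {w:.3f}, times train length = {w * n_cycles:.2f}")
```

Applied to the three-meter train of five million wavelengths, the relative spread Δk/k would be of order 10⁻⁷, so its frequency is very well defined. A train shorter than one wavelength has no meaningful wavelength at all.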

In fact–now that I am here–there’s something else I didn’t quite understand when reading physics: everyone who writes about light or matter waves seems to be focused on the frequency of these waves only. There’s little or nothing on the amplitude. Now, the energy of a physical wave, and of a light wave, does not only depend on its frequency, but also on the amplitude. In fact, we all know that doubling, tripling or quadrupling the frequency of a wave will double, triple or quadruple its energy (that’s obvious from the E = hν relation), but we tend to forget that the energy of a wave is also proportional to the square of its amplitude, for which I’ll use the symbol A, so we can write: E ∝ A². Hence, if we double, triple or quadruple the amplitude of a wave, its energy will be multiplied by four, nine and sixteen respectively!

The same relationship obviously holds between probability amplitudes and probability densities: if we double, triple or quadruple the probability amplitudes, then the associated probabilities obviously also get multiplied by four, nine and sixteen respectively! This obviously establishes some kind of relation between the shape of the electromagnetic wave train and the probability wave: if the electromagnetic wave train (i.e. the photon itself) packs a lot of energy upfront (cf. the initial overshooting and the gradual dying out), then we should expect the probability amplitudes to be ‘bigger’ there as well. [Note that we can't directly compare two complex numbers in terms of one being 'bigger' or 'smaller' than the other, but you know what I mean: their absolute square will be bigger.]
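
Here is the same scaling for complex amplitudes, along with the caveat in brackets. This is a minimal sketch using an arbitrary, illustrative complex value for ψ. Multiplying ψ by a real factor c multiplies |ψ|² by c², a pure phase factor leaves |ψ|² unchanged, and complex numbers themselves cannot be ordered, only their absolute squares.

```python
import cmath

psi = 0.3 * cmath.exp(0.7j)   # an arbitrary complex probability amplitude (illustrative value)

ratios = [abs(c * psi) ** 2 / abs(psi) ** 2 for c in (2, 3, 4)]
print(ratios)  # scaling the amplitude by c scales |psi|^2 by c^2

# A pure phase factor changes the amplitude but not the probability.
print(abs(psi * 1j) ** 2, abs(psi) ** 2)

# Python, like mathematics, refuses to order complex numbers directly...
try:
    _ = psi < 2 * psi
except TypeError as err:
    print("cannot compare:", err)

# ...but their absolute squares can be compared.
print(abs(2 * psi) ** 2 > abs(psi) ** 2)
```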

So what?

Well… Nothing. I can’t say anything more about this. However, to compensate for the fact that I didn’t get anywhere with my concept of a de Broglie wave for a photon – and, hence, I let you down, I guess – I’ll explore the relationship between amplitude, frequency and size of a wave train in somewhat more detail. It may inspire you when thinking yourself about a ‘probability wave’ for a photon. And then… Well… I will write some kind of conclusion, which may or may not give the answer(s) that you are looking for.

The relation between amplitude, frequency and energy

From what I wrote above, it’s obvious that there are two ways of packing more energy in a (real) wave, or a (sufficiently long) wave train:

1. We can increase the frequency, and that results in a linear increase in energy (twice the frequency is twice the energy, as per the Planck relation E = hν).
2. We can increase the amplitude, and that results in a quadratic increase in energy (double all amplitudes, and you pack four times more energy in that wave).

With a ‘real’ wave, I obviously mean either a wave that’s traveling in a medium or, in this case, an electromagnetic wave. OK. So what? Well… It’s probably quite reasonable to assume that both factors come into play when an electron emits a photon. Indeed, if the difference between the two energy levels is larger, then the photon will not only have a higher frequency (i.e. we’re talking light (or electromagnetic radiation) in the upper ranges of the spectrum then) but one should also expect that the initial overshooting – and, hence, the initial oscillation – will also be larger. In short, we’ll have larger amplitudes. Hence, higher-energy photons will pack even more energy upfront. They will also have higher frequency, because of the Planck relation. So, yes, both factors would come into play.

What about the length of these wave trains? Would it make them shorter? Yes. I’ll refer you to Feynman’s Lectures to verify that the wavelength appears in the numerator of the formula for Q. Hence, higher frequency means shorter wavelength and, hence, lower Q. Now, I am not quite sure (I am not sure about anything I am writing here it seems) but this may or may not be the reason for yet another statement I never quite understood: photons with higher and higher energy are said to become smaller and smaller, and when they reach the Planck scale, they are said to become black holes.
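
The λ-dependence can be made explicit with the classical radiation-damping result 1/Q = 4πr₀/(3λ), where r₀ is the classical electron radius. I take this formula as given: it is the standard classical result, and I believe it is the one in Feynman’s chapter on radiation damping. Q grows like λ and the period also grows like λ, so the decay time τ = Q/ω and hence the train length cτ grow like λ². Halving the wavelength therefore shortens the train fourfold. For 600 nm the sketch gives Q ≈ 5×10⁷, τ ≈ 1.6×10⁻⁸ s and a train of about 5 m, the same orders of magnitude as the rounder figures above.

```python
import math

R_0 = 2.8179403262e-15  # classical electron radius, m (CODATA 2018)
C = 2.99792458e8        # speed of light, m/s


def wave_train(wavelength):
    """Classical atomic oscillator with radiation damping, 1/Q = 4*pi*r0 / (3*lambda).
    Returns Q, the 1/e energy decay time, and the length of the emitted wave train."""
    q = 3 * wavelength / (4 * math.pi * R_0)
    omega = 2 * math.pi * C / wavelength
    tau = q / omega          # stored energy falls off as exp(-omega * t / Q)
    return q, tau, C * tau


for lam in (600e-9, 300e-9):
    q, tau, length = wave_train(lam)
    print(f"lambda = {lam * 1e9:.0f} nm: Q = {q:.1e}, tau = {tau:.1e} s, length = {length:.2f} m")
```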

Conclusion

What’s the conclusion? Well… I’ll leave it to you to think about this. Let me make a bold statement here: that transient above actually is the wave function.

You’ll say: What !? What about normalization? All probabilities have to add up to one and, surely, the squares of those electric field magnitudes wouldn’t add up to one, would they?

My answer to that is simple: that’s just a question of units, i.e. of normalization indeed. So just measure the field strength in some other unit and it will come all right.
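
Mechanically, the rescaling proposed here is a single constant factor. It is chosen so that the integral of the squared field, not of the field itself, equals one. This is a minimal sketch, assuming a hypothetical damped-cosine transient sampled on a grid. It shows only the normalization step, not that the field actually is a wave function.

```python
import math


def transient(x, k=2 * math.pi, decay=10.0):
    """Hypothetical real transient: a cosine under a decaying envelope (arbitrary units)."""
    return math.exp(-x / decay) * math.cos(k * x) if x >= 0 else 0.0


dx = 0.01
xs = [i * dx for i in range(10_000)]          # x from 0 to 100, i.e. ten decay lengths
field = [transient(x) for x in xs]

norm = math.sqrt(sum(e * e for e in field) * dx)   # square root of the integral of E^2 dx
psi = [e / norm for e in field]                    # one constant rescaling: the "change of units"

total = sum(p * p for p in psi) * dx
first_decay_length = sum(p * p for x, p in zip(xs, psi) if x < 10.0) * dx
print(f"total probability: {total:.6f}")
print(f"probability within the first decay length: {first_decay_length:.2f}")
```

About 86 percent of the probability then sits in the first decay length, where the train packs most of its energy.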

[...] But… Yes? What? Well… Those magnitudes are real numbers, not complex numbers.

I am not sure how to answer that one but there are two things I could say:

1. Real numbers are complex numbers too: it’s just that their imaginary part is zero.
2. When working with waves, and especially with transients, we’ve always represented them using the complex exponential function. For example, we would write a wave function whose amplitude varies sinusoidally in space and time as $A e^{i(\omega t - kr)}$, with ω the (angular) frequency and k the wave number, i.e. k = 2π/λ, the phase change in radians per unit distance.
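
This is a minimal sketch of point 2, assuming unit amplitude, wavelength and period (all arbitrary). The physical wave is the real part of the complex exponential. That real part swings through zero twice per wavelength, while the absolute square of the complex form is the same everywhere.

```python
import cmath
import math

A = 1.0              # amplitude (arbitrary units)
OMEGA = 2 * math.pi  # angular frequency, assumed: one cycle per unit time
K = 2 * math.pi      # wave number k = 2*pi/lambda, assumed wavelength of 1 length unit


def complex_wave(r, t):
    """Complex-exponential form A * exp(i(omega*t - k*r)) of a sinusoidal wave."""
    return A * cmath.exp(1j * (OMEGA * t - K * r))


def real_wave(r, t):
    """The physical wave is the real part: A * cos(omega*t - k*r)."""
    return complex_wave(r, t).real


for r in (0.0, 0.125, 0.25, 0.375, 0.5):
    z = complex_wave(r, 0.0)
    print(f"r = {r:.3f}: real part = {z.real:+.3f}, |z|^2 = {abs(z) ** 2:.3f}")
```

So squaring a real-valued field gives zeros at every node, whereas the absolute square of its complex counterpart does not.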

Frankly, think about it: where is the photon? It’s that three-meter long transient, isn’t it? And the probability to find it somewhere is the (absolute) square of some complex number, right? And then we have a wave function already, representing an electromagnetic wave, for which we know that the energy which it packs is proportional to the square of its amplitude, as well as being proportional to its frequency. We also know we’re more likely to detect something with high energy than something with low energy, don’t we? So why would we hesitate?

But then what about these probability amplitudes being a function of the y and z coordinates?

Well… Frankly, I’ve started to wonder if a photon actually has a radius. If it doesn’t have a mass, it’s probably the only real point-like particle (i.e. a particle not occupying any space) – as opposed to all other matter-particles, which do have mass.

Why?

I don’t know. Your guess is as good as mine. Maybe our concepts of amplitude and frequency of a photon are not very relevant. Perhaps it’s only energy that counts. We know that a photon has a more or less well-defined energy level (within the limits of the Uncertainty Principle) and, hence, our ideas about how that energy actually gets distributed over the frequency, the amplitude and the length of that ‘transient’ have no relation with reality. Perhaps we like to think of a photon as a transient electromagnetic wave, because we’re used to thinking in terms of waves and fields, but perhaps a photon is just a point-like thing indeed, with a wave function that’s got the same shape as that transient. :-)

Post scriptum: I should apologize to you, my dear reader. It’s obvious that, in quantum mechanics, we don’t think of a photon as having some frequency and some wavelength and some dimension in space: it’s just an elementary particle with energy interacting with other elementary particles with energy, and we use these coupling constants and what have you to work with them. We don’t think of photons as a three-meter long snake swirling through space. So, when I write that “our concepts of amplitude and frequency of a photon are maybe not very relevant” when trying to picture a photon, and that “perhaps, it’s only energy that counts”, I actually don’t mean “maybe” or “perhaps”. I mean: Of course !!! In the quantum-mechanical world view, that is.

So I apologize for posting nonsense. However, as all of this nonsense helps me to make sense of these things myself, I’ll just continue. :-) I seem to move very slowly on this Road to Reality, but the good thing about moving slowly is that it will – hopefully – give me the kind of ‘deeper’ understanding I want, i.e. an understanding beyond the formulas and mathematical and physical models. In the end, that’s all that I am striving for when pursuing this ‘hobby’ of mine. Nothing more, nothing less. :-)

# Babushka thinking

What is it that we are trying to understand? As a kid, when I first heard about atoms consisting of a nucleus with electrons orbiting around it, I had this vision of worlds inside worlds, like a set of babushka dolls, one inside the other. Now I know that this model – which is nothing but the 1911 Rutherford model basically – is plain wrong, even if it continues to be used in the logo of the International Atomic Energy Agency, or the US Atomic Energy Commission.

Electrons are not planet-like things orbiting around some center. If one wants to understand something about the reality of electrons, one needs to familiarize oneself with complex-valued wave functions whose values are weird quantities referred to as probability amplitudes and, contrary to what you may think (unless you read my blog, or happen to know a thing or two about quantum mechanics), the relation between those amplitudes and the concept of probability tout court is not very straightforward.

Familiarizing oneself with the math involved in quantum mechanics is not an easy task, as evidenced by all those convoluted posts I’ve been writing. In fact, I’ve been struggling with these things for almost a year now and I’ve started to realize that Roger Penrose’s Road to Reality (or should I say Feynman’s Lectures?) may lead nowhere – in terms of that rather spiritual journey of trying to understand what it’s all about. If anything, they made me realize that the worlds inside worlds are not the same. They are different – very different.

When everything is said and done, I think that’s what’s nagging us as common mortals. What we are all looking for is some kind of ‘Easy Principle’ that explains All and Everything, and we just can’t find it. The point is: scale matters. At the macro-scale, we usually analyze things using some kind of ‘billiard-ball model’. At a smaller scale, let’s say the so-called wave zone, our ‘law’ of radiation holds, and we can analyze things in terms of electromagnetic or gravitational fields. But then, when we further reduce scale, by another order of magnitude really – when trying to get very close to the source of radiation, or if we try to analyze what is oscillating really – we get into deep trouble: our easy laws no longer hold, and the equally easy math – easy is relative of course :-) – we use to analyze fields or interference phenomena becomes totally useless.

Religiously inclined people would say that God does not want us to understand all or, taking a somewhat less selfish picture of God, they would say that Reality (with a capital R to underline its transcendental aspects) just can’t be understood. Indeed, it is rather surprising – in my humble view at least – that things do seem to get more difficult as we drill down: in physics, it’s not the bigger things – like understanding thermonuclear fusion in the Sun, for example – but the smallest things which are difficult to understand. Of course, that’s partly because physics leaves some of the bigger things which are actually very difficult to understand – like how a living cell works, for example, or how our eye or our brain works – to other sciences to study (biology and biochemistry for cells, and other disciplines for vision or brain functionality). In that respect, physics may actually be described as the science of the smallest things. The surprising thing, then, is that the smallest things are not necessarily the simplest things – on the contrary.

Still, that being said, I can’t help feeling some sympathy for the simpler souls who think that, if God exists, he seems to throw up barriers as mankind tries to advance its knowledge. Isn’t it strange, indeed, that the math describing the ‘reality’ of electrons and photons (i.e. quantum mechanics and quantum electrodynamics), as complicated as it is, becomes even more complicated – and, important to note, also much less accurate – when it’s used to try to describe the behavior of quarks and gluons? Additional ‘variables’ are needed (physicists call these ‘variables’ quantum numbers; however, when everything is said and done, that’s what quantum numbers actually are: variables in a theory), and the agreement between experimental results and predictions in QCD is not as obvious as it is in QED.

Frankly, I don’t know much about quantum chromodynamics – nothing at all to be honest – but when I read statements such as “analytic or perturbative solutions in low-energy QCD are hard or impossible due to the highly nonlinear nature of the strong force” (I just took this one line from the Wikipedia article on QCD), I instinctively feel that QCD is, in fact, a different world as well – and I mean different from QED, in which analytic or perturbative solutions are the norm. Hence, I already know that, once I have mastered Feynman’s Volume III, it won’t help me all that much to get to the next level of understanding: understanding quantum chromodynamics will be yet another long grind. In short, understanding quantum mechanics is only a first step.

Of course, that should not surprise us, because we’re talking very different orders of magnitude here: femtometers (10⁻¹⁵ m) in the case of electrons, as opposed to attometers (10⁻¹⁸ m) or even zeptometers (10⁻²¹ m) when we’re talking quarks. Hence, if past experience (I mean the evolution of scientific thought) is any guidance, we actually should expect an entirely different world. Babushka thinking is not the way forward.

Babushka thinking

What’s babushka thinking? You know what babushkas are, don’t you? These dolls inside dolls. [The term 'babushka' is actually Russian for an old woman or grandmother, which is what these dolls usually depict.] Babushka thinking is the fallacy of thinking that worlds inside worlds are the same. It’s what I did as a kid. It’s what many of us still do. It’s thinking that, when everything is said and done, it’s just a matter of not being able to ‘see’ small things and that, if we had the appropriate equipment, we would actually find the same doll within the larger doll – the same but smaller – and then again the same doll within that smaller doll. In Asia, they have this funny expression: “Same-same but different.” Well… That’s what babushka thinking is all about: thinking that you can apply the same concepts, tools and techniques to what is, in fact, an entirely different ballgame.

Let me illustrate it. We discussed interference. We could assume that the laws of interference, as described by superimposing various waves, always hold, at every scale, and that it’s just the crudeness of our detection apparatus that prevents us from seeing what’s going on. Take two light sources, for example, and let’s say they are a billion wavelengths apart – so that’s anything between 400 and 700 meters for visible light (because the wavelength of visible light is 400 to 700 billionths of a meter). Then we won’t see any interference indeed, because we can’t register it. In fact, none of the standard equipment can. The interference term oscillates wildly up and down, from positive to negative and back again, if we move the detector just a tiny bit left or right – not more than the thickness of a hair (i.e. 0.07 mm or so). Hence, the range of angles θ (remember that angle θ was the key variable when calculating solutions for the resultant wave in previous posts) that is covered by our eye – or by any standard sensor really – is so wide that the positive and negative interference averages out: all that we ‘see’ is the sum of the intensities of the two lights. The positive and negative contributions of the interference term cancel each other out. However, we are still essentially correct in assuming there actually is interference: we just cannot see it – but it’s there.
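To make the averaging argument concrete, here is a minimal Python sketch, assuming two equal, mutually coherent point sources and a detector that subtends a hypothetical 10⁻⁶ radian (the helper `averaged_intensity` and all numbers are illustrative choices, not Feynman’s). Each source alone contributes unit intensity, so the sketch averages I(θ) = 2[1 + cos δ], with δ = 2π(d/λ) sin θ, over the detector’s angular width.

```python
import math

def averaged_intensity(d_over_lambda, theta_center, dtheta, samples=100_000):
    """Mean two-source intensity over a detector spanning dtheta radians.

    Each source alone gives unit intensity, so purely incoherent addition
    gives 2 and fully constructive interference gives 4.
    """
    total = 0.0
    for i in range(samples):
        # midpoint rule across the detector's angular width
        theta = theta_center - dtheta / 2 + (i + 0.5) * dtheta / samples
        # phase difference between the two sources as seen at angle theta
        phase = 2 * math.pi * d_over_lambda * math.sin(theta)
        total += 2 * (1 + math.cos(phase))   # |1 + e^(i*phase)|^2
    return total / samples

# sources a billion wavelengths apart, detector subtending a microradian
print(round(averaged_intensity(1e9, 0.0, 1e-6), 3))   # ~2: the intensities just add
# sources one wavelength apart, same detector
print(round(averaged_intensity(1.0, 0.0, 1e-6), 3))   # ~4: interference is visible
```

With the sources a billion wavelengths apart, the fringe spacing is about 10⁻⁹ radian, so roughly a thousand fringes fall on the detector and the cosine term averages to essentially zero.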

Reinforcing the point, I should also note that, apart from this issue of ‘distance scale’, there is also the scale of time. Our eye has a tenth-of-a-second averaging time. That’s a huge amount of time when talking fundamental physics: remember that an atomic oscillator – despite its incredibly high Q – emits radiation for only about 10⁻⁸ seconds, so that’s one hundred-millionth of a second. Then another atom takes over, and another – and that’s why we get unpolarized light: it’s all the same frequencies (because the electron oscillators radiate at their resonant frequencies), but there is no fixed phase difference between all of these pulses: the interference between all of these pulses should result in ‘beats’ – as they interfere positively or negatively – but it all cancels out for us, because it’s too fast.
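The claim that pulses with no fixed phase relation simply add their intensities can be checked with a small Monte Carlo sketch. It assumes, purely for illustration, ten unit-amplitude wave trains of the same frequency whose phases are drawn uniformly at random; `mean_intensity` is a hypothetical helper.

```python
import cmath
import math
import random

def mean_intensity(n_pulses, trials=20_000, seed=1):
    """Average |sum of unit amplitudes|^2 over many random-phase realizations."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        # every atom starts its wave train with an unrelated phase
        amplitude = sum(cmath.exp(1j * rng.uniform(0.0, 2 * math.pi))
                        for _ in range(n_pulses))
        total += abs(amplitude) ** 2
    return total / trials

incoherent = mean_intensity(10)                               # close to 10
coherent = abs(sum(cmath.exp(0j) for _ in range(10))) ** 2    # all in phase: 100
print(incoherent, coherent)
```

The cross terms, which would produce the ‘beats’, average out; what survives is the sum of the ten individual intensities, a tenth of what ten perfectly in-phase pulses would give.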

Indeed, while the ‘sensors’ in the retina of the human eye (there are actually four kinds of cells there, but the principal ones are referred to as ‘rod’ and ‘cone’ cells respectively) are, apparently, sensitive enough to register individual photons, the “tenth-of-a-second averaging” time means that the cells – which are interconnected and really ‘pre-process’ light – will just amalgamate all those individual pulses into one signal of a certain color (frequency) and a certain intensity (energy). As one scientist puts it: “The neural filters only allow a signal to pass to the brain when at least about five to nine photons arrive within less than 100 ms.” Hence, that signal will not keep track of the spacing between those photons.
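As a rough illustration of such a threshold, one can model photon arrivals within a 100 ms window as Poisson-distributed (an assumption of this sketch, not something the quote claims) and ask how often at least five photons arrive, five being the lower end of the quoted range. The helper name `prob_at_least` and the mean counts are hypothetical.

```python
import math

def prob_at_least(k, mean_count):
    """P(N >= k) when the photon count N is Poisson with the given mean."""
    below = sum(math.exp(-mean_count) * mean_count ** n / math.factorial(n)
                for n in range(k))
    return 1.0 - below

# hypothetical faint flashes delivering on average 2, 5 or 10 photons per 100 ms
for mean in (2, 5, 10):
    print(mean, round(prob_at_least(5, mean), 3))
```

Only the count inside the window enters this model; when, exactly, the photons arrived within those 100 ms plays no role, which is the loss of information described above.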

In short, information gets lost. But that, in itself, does not invalidate babushka thinking. Let me visualize it with a not-very-mathematically-rigorous illustration. Suppose that we have some very regular wave train coming in, like the one below: one wave train consisting of three ‘groups’ separated by ‘nodes’.

All will depend on the period of the wave as compared to that one-tenth-of-a-second averaging time. In fact, we have two ‘periods’: the period of the carrier wave itself, and the periodicity of the group – which is related to the concept of group velocity – and, hence, I’ll associate a ‘group wavelength’ and a ‘group period’ with the latter. [In case you haven’t heard of these terms before, don’t worry: I haven’t either. :-)] Now, if one tenth of a second covers two or all three of the groups between the nodes (so that means that one tenth of a second is a multiple of the group period Tg), then even the envelope of the wave does not matter much in terms of ‘signal’: our brain will just get one pulse that averages it all out. We will see none of the detail of this wave train. Our eye will just get light in (remember that the intensity of the light is the square of the amplitude, so the negative amplitudes make contributions too) but we cannot distinguish any particular pulse: it’s just one signal. This is the most common situation when we are talking about electromagnetic radiation: many photons arrive but our eye just sends one signal to the brain: “Hey Boss! Light of color X and intensity Y coming from direction Z.”

In fact, it’s quite remarkable that our eye can distinguish colors, given that the wavelengths of the various colors (violet, blue, green, yellow, orange and red) differ by only 30 to 40 billionths of a meter! Better still: if the signal lasts long enough, we can distinguish shades whose wavelengths differ by only 10 or 15 nm, so that’s a difference of only 1% or 2%. In case you wonder how it works: Feynman devotes no fewer than two chapters in his Lectures to the physiology of the eye: not something you’ll find in other physics handbooks! There are apparently three pigments in the cells in our eyes, each sensitive to color in a different way, and it is “the spectral absorption in those three pigments that produces the color sense.” So it’s a bit like the RGB system in a television – but then more complicated, of course!

But let’s go back to our wave there and analyze the second possibility. If a tenth of a second is shorter than that ‘group period’, then it’s different: we will actually see the individual groups as two or three separate pulses. Hence, in that case, our eye – or whatever detector (another detector will just have another averaging time) – will average over a group, but not over the whole wave train. [Just in case you wonder how we humans compare with other living beings: from what I wrote above, it’s obvious we can see ‘flicker’ only if the oscillation is no faster than 10 or 20 Hz or so. The eye of a bee is made to see the vibrations of feet and wings of other bees and, hence, its averaging time is much shorter, like a hundredth of a second, so it can see flicker up to 200 oscillations per second! In addition, the eye of a bee is sensitive over a much wider range of ‘color’ – it sees UV light down to a wavelength of 300 nm (whereas we don’t see light with a wavelength below 400 nm) – and, to top it all off, it has a special sensitivity for polarized light, so light that gets reflected or diffracted looks different to the bee.]

Let’s go to the third and final case. If a tenth of a second were shorter than the period of the so-called carrier wave, i.e. the actual oscillation, then we would be able to distinguish the individual peaks and troughs of the carrier wave!
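The three cases can be mimicked with a toy model: a carrier of (dimensionless) frequency 100 under an envelope with nodes at t = 0, 1, 2, 3, squared to get intensity and then boxcar-averaged over a detector window. Everything here (the frequencies, the window lengths, the half-maximum rule for counting ‘pulses’, the name `pulses_seen`) is a hypothetical choice for illustration only.

```python
import math

def pulses_seen(window, f_carrier=100.0, f_group=1.0, duration=3.0, dt=1e-4):
    """Count the distinct 'pulses' left after boxcar-averaging the intensity."""
    n = int(round(duration / dt))
    intensity = []
    for i in range(n):
        t = i * dt
        # envelope with nodes at t = 0, 1, 2, 3: three groups in three time units
        envelope = (1 - math.cos(2 * math.pi * f_group * t)) / 2
        field = envelope * math.cos(2 * math.pi * f_carrier * t)
        intensity.append(field * field)   # intensity goes as amplitude squared
    # running average over `window` time units, via prefix sums
    w = max(1, int(round(window / dt)))
    prefix = [0.0]
    for x in intensity:
        prefix.append(prefix[-1] + x)
    averaged = [(prefix[i + w] - prefix[i]) / w for i in range(n - w + 1)]
    # a 'pulse' is a maximal run of samples above half the peak value
    threshold = max(averaged) / 2
    pulses, above = 0, False
    for value in averaged:
        if value > threshold and not above:
            pulses += 1
        above = value > threshold
    return pulses

# window longer than the group period, between the two periods, shorter than the carrier period
for window in (2.0, 0.05, 0.001):
    print(window, pulses_seen(window))
```

Under these assumptions the three windows should give one steady signal, then the three groups, and then well over a hundred separate peaks of the carrier’s intensity.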

Of course, this discussion is not limited to our eye as a sensor: any instrument will be able to measure individual phenomena only within a certain range, with an upper and a lower limit, i.e. the ‘biggest’ thing it can see, and the ‘smallest’. So that explains the so-called resolution of an optical or an electron microscope: whatever the instrument, it cannot really ‘see’ stuff that’s smaller than the wavelength of the ‘light’ (real light or – in the case of an electron microscope – electron beams) it uses to ‘illuminate’ the object it is looking at. [The actual formula for the resolution of a microscope is obviously a bit more complicated, but this statement does reflect the gist of it.]
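For the record, the usual textbook estimate is Abbe’s diffraction limit, d ≈ λ/(2 NA), where NA is the numerical aperture of the objective. The sketch below (function names are illustrative) pairs it with the relativistic de Broglie wavelength of electrons accelerated through a voltage V, λ = h/√(2meV(1 + eV/(2mc²))), which shows why electron beams can in principle resolve far finer detail. Real electron microscopes are limited by lens aberrations long before they reach that wavelength, so treat the numbers as idealized.

```python
import math

H = 6.62607015e-34        # Planck constant, J*s
M_E = 9.1093837015e-31    # electron mass, kg
Q_E = 1.602176634e-19     # elementary charge, C
C = 299_792_458.0         # speed of light, m/s

def abbe_limit(wavelength, numerical_aperture):
    """Smallest resolvable separation, d = lambda / (2 NA), in meters."""
    return wavelength / (2 * numerical_aperture)

def electron_wavelength(volts):
    """Relativistic de Broglie wavelength of an electron accelerated through `volts`."""
    kinetic = Q_E * volts
    # p^2 c^2 = K^2 + 2 K m c^2  =>  p = sqrt(2 m K (1 + K / (2 m c^2)))
    momentum = math.sqrt(2 * M_E * kinetic * (1 + kinetic / (2 * M_E * C**2)))
    return H / momentum

print(abbe_limit(500e-9, 1.0))      # green light with NA = 1.0
print(electron_wavelength(100e3))   # roughly 3.7e-12 m at 100 kV
```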

However, all that I am writing above suggests that we can think of what’s going on here as ‘waves within waves’, with the wave between nodes not being any different – in substance, that is – from the wave as a whole: we’ve got something that’s oscillating, and within each individual oscillation, we find another oscillation. From a math point of view, babushka thinking is thinking we can analyze the world using Fourier’s machinery to decompose some function (see my posts on Fourier analysis). Indeed, in the example above, we have a modulated carrier wave (it is an example of amplitude modulation – the old-fashioned way of transmitting radio signals), and we see a wave within a wave and, hence, just like with the Rutherford model of an atom, you may think there will always be ‘a wave within a wave’.
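The ‘wave within a wave’ of the amplitude-modulation example is precisely the kind of structure Fourier analysis handles well: by the identity cos A cos B = ½[cos(A + B) + cos(A − B)], a carrier under the envelope (1 + m cos ωₘt) is just the sum of three pure tones, the carrier plus two sidebands. A quick numerical check, with arbitrary illustrative frequencies and modulation depth:

```python
import math

def am_signal(t, fc, fm, m):
    """Carrier cos(2*pi*fc*t) under the envelope 1 + m*cos(2*pi*fm*t)."""
    return (1 + m * math.cos(2 * math.pi * fm * t)) * math.cos(2 * math.pi * fc * t)

def three_tones(t, fc, fm, m):
    """The same signal as a Fourier sum: carrier plus two sidebands at fc +/- fm."""
    return (math.cos(2 * math.pi * fc * t)
            + (m / 2) * math.cos(2 * math.pi * (fc + fm) * t)
            + (m / 2) * math.cos(2 * math.pi * (fc - fm) * t))

fc, fm, m = 1000.0, 10.0, 0.8   # illustrative values: Hz, Hz, modulation depth
worst = max(abs(am_signal(k * 1e-5, fc, fm, m) - three_tones(k * 1e-5, fc, fm, m))
            for k in range(10_000))
print(worst)   # only floating-point rounding remains
```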

In this regard, you may think of fractals too: fractals are repeating or self-similar patterns that are always there, at every scale. However, the point to note is that fractals do not represent an accurate picture of how reality is actually structured: worlds within worlds are not the same.

Reality is no onion

Reality is not some kind of onion, from which you peel off a layer and then you find some other layer, similar to the first: “same-same but different”, as they’d say in Asia. The Coast of Britain is, in fact, finite, and the grain of sand you’ll pick up at one of its beaches will not look like the coastline when you put it under a microscope. In case you don’t believe me: I’ve inserted a real-life photo below. The magnification factor is a rather modest 300 times. Isn’t this amazing? [The credit for this nice picture goes to a certain Dr. Gary Greenberg. Please do google his stuff. It's really nice.]

In short, fractals are wonderful mathematical structures but – in reality – there are limits to how small things get: we cannot carve a babushka doll out of the cellulose and lignin molecules that make up most of what we call wood. Likewise, the atoms that make up the D-glucose chains in the cellulose will never resemble the D-glucose chains. Hence, the babushka doll, the D-glucose chains that make up wood, and the atoms that make up the molecules within those macro-molecules are three different worlds. They’re not like layers of the same onion. Scale matters. The worlds inside worlds are different, and fundamentally so: not “same-same but different” but just plain different. Electrons are no longer point-like negative charges when we look at them at close range.

In fact, that’s the whole point: we can’t look at them at close range because we can’t ‘locate’ them. They aren’t particles. They are these strange ‘wavicles’ which we describe, physically and mathematically, with a complex wave function relating their position (or their momentum) to some probability amplitude, and we also need to remember these funny rules for adding these amplitudes, depending on whether the ‘wavicle’ obeys Fermi or Bose statistics.

Weird, but – come to think of it – not more weird, in terms of mathematical description, than these electromagnetic waves. Indeed, when jotting down all these equations and developing all those mathematical arguments, one often tends to forget that we are not talking about some physical wave here. The field vector E (or B) is a mathematical construct: it tells us what force a charge will feel when we put it here or there. It’s not like a water or sound wave that makes some medium (water or air) actually move. The field is an influence that travels through empty space. But how can something actually travel through empty space? When it’s truly empty, you can’t travel through it, can you?

Oh – you’ll say – but we’ve got these photons, don’t we? Waves are not actually waves: they come in little packets of energy – photons. Yes. You’re right. But, as mentioned above, these photons aren’t little bullets – or particles if you want. They’re as weird as the wave and, in any case, even a billiard-ball view of the world is not very satisfying: what happens exactly when two billiard balls collide in a so-called elastic collision? What are the springs on the surface of those balls – in light of the quick reaction, they must be more like little explosive charges that detonate on impact, mustn’t they? – that make the two balls recoil from each other?

So any mathematical description of reality becomes ‘weird’ when you keep asking questions, like that little child I was – and I still am, in a way, I guess. Otherwise I would not be reading physics at the age of 45, would I? :-)

Conclusion

Let me wrap up here. All of what I’ve been blogging about over the past few months concerns the classical world of physics. It consists of waves and fields on the one hand, and solid particles on the other – electrons and nucleons. But we know it’s not like that when we use more sensitive apparatuses, like the apparatus used in that 2012 double-slit electron interference experiment at the University of Nebraska–Lincoln, which I described at length in one of my earlier posts. That apparatus allowed control of two slits – both not more than 62 nanometers wide (so that’s the difference between the wavelength of dark-blue and light-blue light!) – and the monitoring of single-electron detection events. Back in 1963, Feynman already knew what this experiment would yield as a result. He was sure about it, even if he thought such an instrument could never be built. [To be fully correct, he did have some vague idea about a new science, which we now call ‘nanotechnology’, but what we can do today most probably surpasses all his expectations at the time. Too bad he died too young to see his dreams come true.]

The point to note is that this apparatus does not show us another layer of the same onion: it shows an entirely different world. While it’s part of reality, it’s not ‘our’ reality, nor is it the ‘reality’ of what’s being described by classical electromagnetic field theory. It’s different – and fundamentally so, as evidenced by those weird mathematical concepts one needs to introduce to sort of start to ‘understand’ it.

So… What do I want to say here? Nothing much. I just had to remind myself where I am right now. I myself often still fall prey to babushka thinking. We shouldn’t. We should wonder about the wood these dolls are made of. In physics, the wood seems to be math. The models I’ve presented in this blog are weird: what are those fields? And just how do they exert a force on some charge? What’s the mechanism behind them? To these questions, classical physics does not really have an answer.

But, of course, quantum mechanics does not have a very satisfactory answer either: what does it mean when we say that the wave function collapses? Out of all of the possibilities in that wonderful indeterminate world ‘inside’ the quantum-mechanical universe, one was ‘chosen’ as something that actually happened: a photon imparts momentum to an electron, for example. We can describe it, mathematically, but – somehow – we still don’t really understand what’s going on.

So what’s going on? We open a doll, and we do not find another doll that is smaller but similar. No. What we find is a completely different toy. However – Surprise! Surprise! – it’s something that can be ‘opened’ as well, to reveal even weirder stuff, for which we need even weirder ‘tools’ to somehow understand how it works (like lattice QCD, if you want an example: just google it to get an inkling of what that’s about). Where is this going to end? Did it end with the ‘discovery’ of the Higgs particle? I don’t think so.

However, with the ‘discovery’ (or, to be generous, let’s call it an experimental confirmation) of the Higgs particle, we may have hit a wall in terms of verifying our theories. At the center of a set of babushka dolls, you’ll usually have a little baby: a solid little thing that is not like the babushkas surrounding it: it’s young, male and solid, as opposed to the babushkas. Well… It seems that, in physics, we’ve got several of these little babies inside: electrons, photons, quarks, gluons, Higgs particles, etcetera. And we don’t know what’s ‘inside’ of them. Just that they’re different. Not “same-same but different”. No. Fundamentally different. So we’ve got a lot of ‘babies’ inside of reality, very different from the ‘layers’ around them, which make up ‘our’ reality. Hence, ‘Reality’ is not a fractal structure. What is it? Well… I’ve started to think we’ll never know. For all of the math and wonderful intellectualism involved, do we really get closer to an ‘understanding’ of what it’s all about?

I am not sure. The more I ‘understand’, the less I ‘know’ it seems. But then that’s probably why many physicists still nurture an acute sense of mystery, and why I am determined to keep reading. :-)

Post scriptum: On the issue of the ‘mechanistic universe’ and the (related) issue of determinability and indeterminability: that’s not what I wanted to write about above, because I consider that solved. This post is meant to convey some wonder – on the different models of understanding that we need to apply to different scales. It’s got little to do with determinability or not. I think that issue got solved a long time ago, and I’ll let Feynman summarize that discussion:

“The indeterminacy of quantum mechanics has given rise to all kinds of nonsense and questions on the meaning of freedom of will, and of the idea that the world is uncertain. […] Classical physics is also indeterminate. It is true, classically, that if we knew the position and the velocity of every particle in the world, or in a box of gas, we could predict exactly what would happen. And therefore the classical world is deterministic. Suppose, however, we have a finite accuracy and do not know exactly where just one atom is, say to one part in a billion. Then as it goes along it hits another atom, and because we did not know the position better than one part in a billion, we find an even larger error in the position after the collision. And that is amplified, of course, in the next collision, so that if we start with only a tiny error it rapidly magnifies to a very great uncertainty. […] Speaking more precisely, given an arbitrary accuracy, no matter how precise, one can find a time long enough that we cannot make predictions valid for that long a time. That length of time is not very large. It is not that the time is millions of years if the accuracy is one part in a billion. The time goes only logarithmically with the error. In only a very, very tiny time – less than the time it took to state the accuracy – we lose all our information. It is therefore not fair to say that from the apparent freedom and indeterminacy of the human mind, we should have realized that classical ‘deterministic’ physics could not ever hope to understand, and to welcome quantum mechanics as a release from a completely ‘mechanistic’ universe. For already in classical mechanics, there was indeterminability from a practical point of view.” (Feynman, Lectures, 1963, p. 38-10)
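Feynman’s point that the prediction horizon ‘goes only logarithmically with the error’ can be illustrated with a deliberately crude model, assumed here for illustration only: every collision doubles the position uncertainty, and the prediction is lost once the error reaches order one. Powers of two keep the floating-point arithmetic exact; `collisions_until_lost` is a hypothetical helper.

```python
def collisions_until_lost(initial_error, growth=2.0, lost_at=1.0):
    """Count collisions until the position error grows to `lost_at`.

    Toy assumption: every collision multiplies the error by `growth`.
    """
    error, collisions = initial_error, 0
    while error < lost_at:
        error *= growth
        collisions += 1
    return collisions

print(collisions_until_lost(2.0**-30))   # about one part in a billion: 30 collisions
print(collisions_until_lost(2.0**-60))   # a billion times more accurate: only 60
```

Improving the initial accuracy by another factor of about a billion (2³⁰) merely doubles the number of collisions we can predict.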

That really says it all, I think. I’ll just continue to keep my head down – i.e. stay away from philosophy for now – and try to find a way to open the toy inside the toy. :-)

# Light: relating waves to photons

This is a concluding note on my ‘series’ on light. The ‘series’ gave you an overview of the ‘classical’ theory: light as an electromagnetic wave. It was very complete, including relativistic effects (see my previous post). I could have added more – there’s an equivalent for four-vectors, for example, when we’re dealing with frequencies and wave numbers: quantities that transform like space and time under the Lorentz transformations – but you got the essence.

One point we never ever touched upon, though, was that magnetic field vector. It is there. It is tiny because of that 1/c factor, but it’s there. We wrote it as

B = −er′ × E/c

All symbols in bold are vectors, of course. The force is another vector cross-product: F = qv×B, and you need to apply the usual right-hand screw rule to find the direction of the force. As it turns out, that force – as tiny as it is – is actually oriented in the direction of propagation, and it is what is responsible for the so-called radiation pressure.

So, yes, there is a ‘pushing momentum’. How strong is it? What power can it deliver? Can it indeed make space ships sail? Well… The magnitude of the unit vector er’ is obviously one, so it’s the values of the other vectors that we need to consider. If we substitute and average F, the thing we need to find is:

〈F〉 = q〈vE〉/c

But the charge q times the field is the electric force, and the force on the charge times the velocity is the rate of work dW/dt being done on the charge. So that should equal the energy that is being absorbed from the light per second. Now, I didn’t look at that much. It’s actually one of the very few things I left out – but I’ll refer you to Feynman’s Lectures if you want to find out more: there’s a fine section on light scattering, introducing the notion of the Thomson scattering cross section, but – as said – I think you’ve had enough for now. Just note that 〈F〉 = [dW/dt]/c and, hence, that the momentum that light delivers per second is equal to the energy that is absorbed per second (dW/dt) divided by c.
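Plugging numbers into 〈F〉 = [dW/dt]/c shows just how tiny the push is. A minimal sketch, assuming a perfectly absorbing surface and a solar intensity near Earth of roughly 1361 W/m² (an approximate value, not taken from the text); the 100 m × 100 m sail is hypothetical.

```python
C = 299_792_458.0   # speed of light, m/s

def radiation_force(absorbed_power):
    """Average force on a perfect absorber, <F> = (dW/dt) / c, in newtons."""
    return absorbed_power / C

SOLAR_INTENSITY = 1361.0      # W/m^2 near Earth (approximate)
sail_area = 100.0 * 100.0     # hypothetical 100 m x 100 m sail, in m^2

print(radiation_force(1000.0))                       # one absorbed kilowatt
print(radiation_force(SOLAR_INTENSITY * sail_area))  # the whole sail
```

One absorbed kilowatt yields only a few micronewtons, and the hypothetical sail gets about 0.045 N (a perfectly reflecting sail would get twice that), which is why a light sail needs a huge, ultra-light surface.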

So the momentum carried is 1/c times the energy. Now, you may remember that Planck solved the ‘problem’ of black-body radiation – an anomaly that physicists couldn’t explain at the end of the 19th century – by re-introducing a corpuscular theory of light: he said light consisted of photons. We all know that photons are not the kind of ‘particles’ that the Greek and medieval corpuscular theories of light envisaged but, well… They have a particle-like character – just as much as they have a wave-like character. They are actually neither, and they are physically and mathematically described by wave functions – which, in turn, are functions describing probability amplitudes. But I won’t entertain you with that here, because I’ve written about that in other posts. Let’s just go along with the ‘corpuscular’ theory of photons for a while.

Photons also have energy (which we’ll write as W instead of E, just to be consistent with the symbols above) and momentum (p), and Planck’s Law says how much:

W = hf and p = W/c

So that’s good: we find the same multiplier 1/c here for the momentum of a photon. In fact, this is more than just a coincidence: the “wave theory” of light and Planck’s “corpuscular theory” must link up, because they are both supposed to help us understand real-life phenomena.
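The link-up can be checked numerically: a beam of power P carries P/(hf) photons per second and, if each photon carries p = hf/c, the momentum delivered per second is P/c, which is exactly the wave-theory result. A sketch using the 500 THz sodium-light figure mentioned earlier and a hypothetical 1 W beam:

```python
H = 6.62607015e-34   # Planck constant, J*s
C = 299_792_458.0    # speed of light, m/s

def photon_energy(f):
    """W = hf, in joules."""
    return H * f

def photon_momentum(f):
    """p = W/c, in kg*m/s."""
    return photon_energy(f) / C

f_sodium = 500e12   # Hz, the sodium-light figure used earlier
power = 1.0         # W, hypothetical fully absorbed beam

photons_per_second = power / photon_energy(f_sodium)
print(photon_energy(f_sodium))                        # about 3.3e-19 J per photon
print(photons_per_second * photon_momentum(f_sodium)) # momentum per second, photon picture
print(power / C)                                      # force from the wave picture
```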

There are even more nice surprises. We spoke about polarized light, and we showed how the end of the electric field vector describes a circular or elliptical motion as the wave travels through space. It turns out that we can actually relate that to some kind of angular momentum of the wave (I won’t go into the details though – because I really think the previous posts have been too heavy on equations and complicated mathematical arguments) and that we could also relate it to a model of photons carrying angular momentum, “like spinning rifle bullets” – as Feynman puts it.

However, he also adds: “But this ‘bullet’ picture is as incomplete as the ‘wave’ picture.” And so that’s true and that should be it. And it will be it. I will really end this ‘series’ now. It was quite a journey for me, as I am making my way through all of these complicated models and explanations of how things are supposed to work. But a fascinating one. And it sure gives me a much better feel for the ‘concepts’ that are hastily explained in all of these ‘popular’ books dealing with science and physics, hopefully preparing me better for what I should be doing, and that’s to read Penrose’s advanced mathematical theories.

Now we are really going to do some very serious analysis: relativistic effects in radiation. Fasten your seat belts please.

The Doppler effect for physical waves traveling through a medium

In one of my posts – I don’t remember which one – I wrote about the Doppler effect for sound waves, or any physical wave traveling through a medium. I also said it had nothing to do with relativity. What happens really is that the observer sort of catches up with the wave or – the other way around – falls back and, because the velocity of a wave relative to its medium is always the same (that also has nothing to do with relativity: it’s just a general principle that’s true – always), the frequency of the physical wave will appear to be different. [So that's why the siren of an ambulance sounds different as it moves past you.] Wikipedia has an excellent article on that – so I’ll refer you to that – but that article does not say anything – or nothing much – about the Doppler effect when electromagnetic radiation is involved. So that’s what we’ll talk about here. Before we do that, though, let me quickly jot down the formula for the Doppler effect for a physical wave traveling through a medium, so we are clear about the differences between the two ‘Doppler effects’:

fr = fs·(vp − vr)/(vp − vs)

In this formula, vp is the propagation speed of the wave in the medium which – as mentioned above – depends on the medium. Now the source and the receiver will have a velocity with respect to the medium as well (positive if they move in the same direction as the wave, negative if they move against it), and so that’s vr (r for receiver) and vs (s for source) respectively. So we’re adding speeds here to calculate some relative speed and then we take the ratio. Some people think that’s what relativity theory is all about. It’s not. Everything I’ve written so far is just Galilean relativity – so stuff that’s been known for centuries already.
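Written out, the medium formula is fr = fs·(vp − vr)/(vp − vs), with all velocities measured relative to the medium and counted positive in the direction the wave travels (from source to receiver). A minimal sketch with hypothetical siren numbers (1000 Hz, sound at 343 m/s, an ambulance at 34.3 m/s); the function name `doppler_medium` is illustrative.

```python
def doppler_medium(f_source, v_p, v_r=0.0, v_s=0.0):
    """Frequency registered by the receiver for a wave traveling through a medium.

    Velocities are relative to the medium and positive in the direction
    the wave travels from source to receiver.
    """
    return f_source * (v_p - v_r) / (v_p - v_s)

# ambulance driving towards the listener, i.e. along with the wave it emits
print(round(doppler_medium(1000.0, 343.0, v_s=34.3), 1))    # 1111.1 Hz
# ...and driving away from the listener at the same speed
print(round(doppler_medium(1000.0, 343.0, v_s=-34.3), 1))   # 909.1 Hz
```

Note that only Galilean velocity addition is involved: a receiver running along with the wave at vp would register no oscillation at all, which is exactly what fails for light.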

Relativity is weird: one aspect of it is length contraction – not often discussed – but the other is better known: there is no such thing as absolute time, so when we talk velocities, we need to specify: according to whose time?

The Doppler effect for electromagnetic radiation

One thing that’s not relative – and which makes things look somewhat similar to what I wrote above – is that the speed of light is always equal to c. That was the startling fact that came out of Maxwell’s equations (startling because it is not consistent with Galilean relativity: it says that we cannot ‘catch up’ with a light wave!) around which Einstein built all of his “new physics”, and so it’s something we use rather matter-of-factly in all that we’ll write below.

In all of the preceding posts about light, I wrote – more than once actually – that the movement of the oscillating charge (i.e. the source of the radiation) along the line of sight did not matter: the only thing that mattered was its acceleration in the xy-plane, which is perpendicular to our line of sight, which we’ll call the z-axis. Indeed, let me remind you of the two equations defining electromagnetic radiation (see my post on light and radiation about a week ago):

The first formula gives the electromagnetic effect that dominates in the so-called wave zone, i.e. a few wavelengths away from the source – because the Coulomb force varies as the inverse of the square of the distance r, unlike this ‘radiation’ effect, which varies inversely as the distance only (E ∝ 1/r), so it falls off much less rapidly.

Now, that general observation still holds when we’re considering an oscillating charge that is moving towards or away from us with some relativistic speed (i.e. a speed getting close enough to c to produce relativistic length contraction and time dilation effects) but, because of the fact that we need to consider local times, our formula for the retarded time is no longer correct.

Huh? Yes. The matter is quite complicated, and Feynman starts by jotting down the derivatives for the displacement in the x- and y-directions, but I think I’ll skip that. I’ll go for the geometrical picture straight away, which is given below. As said, it’s going to be difficult, but try to hang in there, because it’s necessary to understand the Doppler effect in a correct way (I myself have been fooled by quite a few nonsensical explanations of it in popular books) and, as a bonus, you also get to understand synchrotron radiation and other exciting stuff.

So what’s going on here? Well… Don’t look at the illustration above. We’ll come back to it. Let’s first build up the logic. We’ve got a charge moving vertically – from our point of view, that is (the observer) – but also moving towards and away from us, i.e. in the z-direction. Indeed, note the arrow pointing to the observer: that’s us! So it could indeed be us looking at electrons going round and round and round – at phenomenal speeds – in a synchrotron, but then one that’s been turned 90 degrees (probably easier for us to just lie down on the ground and look sideways). In any case, I hope you can imagine the situation. [If not, try again.] Now, if that charge were not moving at relativistic speeds (e.g. 0.94c, which is actually the number that Feynman uses for the graph above), then we would not have to worry about ‘our’ time t and the ‘local’ time τ. Huh? Local time? Yes.

We denote the time as measured in the reference frame of the moving charge as τ. Hence, as we are counting t = 1, 2, 3 etcetera, the electron is counting τ = 1, 2, 3 etcetera as it goes round and round and round. If the charge were not moving at relativistic speeds, we’d do the standard thing, and that’s to calculate the retarded acceleration of the charge a'(t) = a(t − r/c). [Remember that we used a prime to mark functions for which we should use a retarded argument and, yes, I know that the term 'retarded' sounds a bit funny, but that's how it is. In any case, we'd have a'(t) = a(t − r/c) – so the prime vanishes as we put in the retarded argument.] Indeed, from the ‘law’ of radiation, we know that the field here and now is given by the acceleration of the charge at the retarded time, i.e. t – r/c. To sum it all up, we would, quite simply, relate t and τ as follows:

τ = t – r/c or, what amounts to the same, t = τ + r/c

So the effect that we see now, at time t, was produced at a distance r at time τ = t − r/c. That should make sense. [Again, if it doesn't: read again. I can't explain it in any other way.]

The crucial link in this rather complicated chain of reasoning comes now. You’ll remember that one of the assumptions we used to derive our ‘law’ of radiation was that “r is practically constant.” That no longer holds. Indeed, that electron moving around in the synchrotron moves towards and away from us at crazy speeds (0.94c is a bit more than 280,000 km per second), so r goes up and down too, and the relevant distance is not r but r + z(τ). This changes the retardation: it’s not r/c but [r + z(τ)]/c = r/c + z(τ)/c. So we write:

τ = t – r/c – z(τ)/c or, what amounts to the same, t = τ + r/c + z(τ)/c

Hmm… You’ll say this is rather fishy business. Why would we use the actual distance and not the distance a while ago? Well… I’ll let you mull over that. We’ve got two points in space-time here and so they are separated both by distance as well as time. It makes sense to use the actual distance to calculate the actual separation in time, I’d say. If you’re not convinced, I can only refer you to those complicated derivations that Feynman briefly does before introducing this ‘easier’ geometric explanation. This brings us to the next point. We can measure time in seconds, but also in equivalent distance: just like a light-second is the distance that light (remember: always at absolute speed c) travels in one second, i.e. 299,792,458 meter, we can express any time interval t as the distance ct that light travels in it. It’s just another ‘time unit’ to get used to.

Now that’s what’s being done above. To be sure, we get rid of the constant r/c: that amounts to a shift of the origin by some constant (so we start counting earlier or later). In short, we have a new variable ‘t’ really that’s equal to t = τ + z(τ)/c. But we’re going to measure time in meter, i.e. as the distance ct that light travels in a time t, so we multiply both sides by c and we get:

ct = cτ + z(τ)

Why the different unit? Well… We’re talking relativistic speeds here, aren’t we? And so the second is just an inappropriate unit. When we’re finished with this example, we’ll give you an even simpler example: a source just moving in on us, with an equally astronomical speed, so the functional shape of z(τ) will be some fraction of c (let’s say kc) times τ, so kcτ. So, to simplify things, just think of it as re-scaling the time axis in units that make sense as compared to the speeds we are talking about.

Now we can finally analyze that graph on the right-hand side. If we kept r fixed (so if we did not care about the charge moving towards and away from us), the plot of x'(t) (i.e. the retarded position indeed) against ct would yield the sinusoidal graph plotted by the red numbers 1 to 13 here. In fact, instead of a sinusoidal graph, it resembles a normal distribution, but that’s just because we’re looking at one revolution only. In any case, the point to note is that – when everything is said and done – we need to calculate the retarded acceleration, so that’s the second derivative of x'(t). The animated illustration shows how that works: the second derivative (not the first) turns from positive to negative – and vice versa – at inflection points, where the curve goes from convex to concave, and vice versa. So, on the segment of that sinusoidal function marked by the red numbers, it’s positive at first (the slope of the tangent becomes steeper and steeper), then negative (cf. that tangent line turning from blue to green in the illustration below), and then it becomes positive again, as the negative slope becomes less negative after the second inflection point.

That should be straightforward. However, the actual x'(t) curve is the black curve with the cusp. A curve like that is called a hypocycloid. Let me reproduce it once again for ease of reference.

We relate x'(t) to ct (this is nothing but our new unit for t) by noting that ct = cτ + z(τ). Got it? It’s not easy to grasp: the instinct is to just equate t and τ and write x'(t) = x'(τ), but that would not be correct. No. We must measure in ‘our’ time and get a functional form for x’ as a function of t, not of τ. In fact, x'(t) – i.e. the retarded vertical position at time t – is not equal to x'(τ) but to x(τ), i.e. the actual (instead of retarded) position at (local) time τ, and so that’s what the black graph above shows.

I admit it’s not easy. I myself am not getting it in an ‘intuitive’ way. But the logic is solid and leads us where it leads us. Perhaps it helps to think in terms of the curvature of this graph. In fact, we have to think in terms of the curvature of this graph in order to understand what’s happening in terms of radiation. When the charge is moving away from us, i.e. during the first three ‘seconds’ (so that’s the 1-2-3 count), we see that the curvature is less than what it would be – and also doesn’t change very much – if the displacement was given by the sinusoidal function, which means there’s very little radiation (because there’s little acceleration, negative or positive). However, as the charge moves towards us, we get that sharp cusp and, hence, we also get sharp curvature, which results in a sharp pulse of the electric field, rather than the regular – equally sinusoidal – amplitude we’d have if that electron was not moving towards and away from us at relativistic speeds. In fact, that’s what synchrotron radiation is: we get these sharp pulses indeed. Feynman shows how they are measured – in great detail – using a diffraction grating, but that would just be another diversion and so I’ll spare you that.
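To see where the sharp pulse comes from in numbers, here is a minimal Python sketch. It assumes a charge on a circle of radius a, with x(τ) = a·sin(ωτ) and z(τ) = a·cos(ωτ), works in units where a = c = 1 (so ω = v/c = β), drops the constant r/c, and uses ct = cτ + z(τ) as above. The function name apparent_acceleration is just an illustrative name: it evaluates the second derivative of the apparent curve x'(t) with respect to ct, via the chain rule in the phase θ = ωτ, at the point where the charge comes straight at us and at the point where it moves straight away.

```python
import math

def apparent_acceleration(theta, beta):
    """d^2 X / d(cT)^2 along the apparent curve, as a function of the phase theta.

    Units: a = c = 1, so omega = beta and c*tau = theta / beta.
    Apparent curve: X = sin(theta), cT = c*tau + z(tau) = theta / beta + cos(theta).
    """
    g = 1.0 / beta - math.sin(theta)            # d(cT)/d(theta); stays > 0 while beta < 1
    # chain rule: d2X/dT2 = [X''(theta) * g - X'(theta) * g'(theta)] / g**3
    return (1.0 - math.sin(theta) / beta) / g**3

beta = 0.94                                      # v/c, as in Feynman's graph
toward = apparent_acceleration(math.pi / 2, beta)    # z falling fastest: charge approaching
away = apparent_acceleration(-math.pi / 2, beta)     # z rising fastest: charge receding
print(f"approaching: {toward:.1f}, receding: {away:.3f}, ratio: {abs(toward / away):.0f}")
```

Working the chain rule out by hand, the approaching value is –β²/(1 – β)² ≈ –245 and the receding value is β²/(1 + β)² ≈ 0.235, a ratio of ((1 + β)/(1 – β))² ≈ 1045. If we ignored z(τ), the apparent acceleration would be –β²·sinθ and never exceed β² ≈ 0.88 in magnitude: that contrast is the sharp pulse.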

Hmm… This has nothing to do with the Doppler effect, you’ll say. Well… Yes and no. The discussion above basically set the stage for that discussion. So let’s turn to that now. However, before I do that, I want to insert another graph for an oscillating charge moving towards and away from us in some irregular way – rather than along the nice circular route described above.

The Doppler effect

The illustration below is similar to the ones above – but looks much simpler. It shows what happens when an oscillating charge (which we assume to oscillate at its natural or resonant frequency ω0) moves towards us at some relativistic speed v (whatever speed – fairly close to c, so the ratio v/c is substantial). Note that the movement is from A to B – and that the observer (us!) is, once again, on the left – and, hence, the distance traveled is AB = vτ. So what’s the impact on the frequency? That’s shown on the x'(t) graph on the right: the curvature of the sinusoidal motion is much sharper, which means that its angular frequency as we see or measure it (and we’ll denote that by ω) will be higher: if it’s a larger object emitting ‘white’ light (i.e. a mix of everything), then the light will no longer be ‘white’ but will have shifted towards the violet end of the spectrum. If it moves away from us, it will appear ‘more red’.

What’s the frequency change? Well… The z(τ) function is rather simple here: the source closes in on us at a constant speed v, so z(τ) = –vτ (minus, because the distance to us decreases) and, hence, ct = cτ + z(τ) = (c – v)τ. Let’s use f and f0 for a moment, instead of the angular frequencies ω and ω0, as we know they only differ by the factor 2π (ω = 2π/T = 2π·f, with f = 1/T, i.e. the reciprocal of the period). In a given time Δτ, the source emits f0Δτ oscillations. The first of them leaves from A and the last one from B, which is a distance vΔτ closer to us. By the time the last one is emitted, the first one has traveled a distance cΔτ, so the whole wave train is compressed into a length (c – v)Δτ. It therefore takes a time Δt = (c − v)Δτ/c = (1 − v/c)Δτ to pass us. Hence, the frequency f that we measure is equal to f0Δτ (the number of oscillations) divided by Δt = (1 − v/c)Δτ, and we get this relatively simple equation:

f = f0/(1 − v/c) and ω = ω0/(1 − v/c)
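The oscillation-counting argument can be mirrored in a few lines of Python. This is a sketch only: it ignores time dilation (that comes next), uses the 500 THz sodium-light figure from the previous post as an example, and the function name doppler_by_counting is hypothetical; a negative beta stands for a receding source.

```python
def doppler_by_counting(f0, beta, dtau=1e-9):
    """Observed frequency for a source approaching at v = beta*c, by counting oscillations.

    In a source-time interval dtau the source emits f0*dtau oscillations; the wave
    train they form is (c - v)*dtau long, so it passes us in dt = (1 - beta)*dtau.
    Time dilation is deliberately left out here.
    """
    n_osc = f0 * dtau
    dt = (1.0 - beta) * dtau
    return n_osc / dt

f_sodium = 500e12                               # Hz, the sodium-light example
print(doppler_by_counting(f_sodium, 0.5))       # approaching at c/2
print(doppler_by_counting(f_sodium, -0.5))      # receding at c/2
```

At half the speed of light, the 500 THz light would be received at 1,000 THz from an approaching source and at about 333 THz from a receding one, before any time-dilation correction. Note that the result does not depend on the counting interval dtau, as it should.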

Is that it? It’s not quite the same as the formula we had for the Doppler effect of physical waves traveling through a medium, but it’s simple enough indeed. And it also seems to use relative speed. Where’s the Lorentz factor? Why did we need all that complicated machinery?

You are a smart ass! You’re right. In fact, this is exactly the same formula: if we set the speed of propagation equal to c, set the velocity of the receiver to zero, and substitute v (with a minus sign, obviously) for the speed of the source, then we get the f = f0/(1 − v/c) formula above.

What we still need to add is that, as seen by us, the natural frequency of a moving atomic oscillator is not the same as the one we would measure with the oscillator standing still: the time dilation effect kicks in. If ω0 is the ‘true’ natural frequency (so measured locally, so to say), then the modified natural frequency – as corrected for the time dilation effect – will be ω1 = ω0√(1 – v²/c²). Therefore, the grand final relativistic formula for the Doppler effect for electromagnetic radiation is:

ω = ω1/(1 − v/c) = ω0√(1 – v²/c²)/(1 − v/c) = ω0√[(1 + v/c)/(1 − v/c)]
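A minimal Python sketch of the relativistic Doppler formula, with beta = v/c positive for an approaching source and negative for a receding one (the function name is just illustrative). It applies the time-dilation factor √(1 – v²/c²) to the rest frequency and then divides by (1 – v/c), and it compares the result with the more symmetric form ω0√[(1 + v/c)/(1 – v/c)].

```python
import math

def doppler_relativistic(f0, beta):
    """Observed frequency of a source moving along the line of sight at v = beta*c.

    f0 is the rest-frame (local) frequency; time dilation lowers it to
    f1 = f0*sqrt(1 - beta**2), and the compression of the wave train divides by (1 - beta).
    """
    f1 = f0 * math.sqrt(1.0 - beta**2)
    return f1 / (1.0 - beta)

for beta in (0.6, -0.6):
    symmetric = math.sqrt((1 + beta) / (1 - beta))
    print(beta, doppler_relativistic(1.0, beta), symmetric)
```

For β = 0.6 both forms give a factor of 2 for an approaching source and 0.5 for a receding one; the two factors are each other’s reciprocal, which the first-order formula alone would not give.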

You may feel cheated now: did you really have to suffer going through that story on the synchrotron radiation to get the formula above? I’d say: yes and no. No, because you could be happy with the Doppler formula alone. But, yes, because the relativistic Doppler formula alone doesn’t tell you the story about those sharp pulses. So the final answer is: yes. I hope you felt it was worth the suffering :-)

# Loose ends: on energy of radiation and polarized light

I said I would move on to another topic, but let me wrap up some loose ends in this post. I will say a few things about the energy of a field; then I will analyze these electron oscillators in some more detail; and, finally, I’ll say a few words about polarized light.

The energy of a field

You may or may not remember, from our discussions on oscillators and energy, that the total energy in a linear oscillator is a constant sum of two variables: the kinetic energy mv²/2 and the potential energy (i.e. the energy stored in the spring as it expands and contracts) kx²/2 (remember that the force is –kx). So the kinetic energy is proportional to the square of the velocity, and the potential energy to the square of the displacement. Now, from the general solution that we had obtained for a linear oscillator – damped or not – we know that the displacement x, its velocity dx/dt, and even its acceleration are all proportional to the magnitude of the field – with different factors of proportionality of course. Indeed, we have x = qeE0e^(iωt)/m(ω0² – ω²), and so every time we take a derivative, we bring down a factor iω (and so we get another factor of proportionality), but the E0 factor is still the same, and a factor of proportionality multiplied by some constant is still a factor of proportionality. Hence, the energy should be proportional to the square of the field amplitude E0. What more can we say about it?
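To check that the energy indeed goes as the square of E0, here is a small Python sketch. It assumes an undamped oscillator driven off resonance, with the steady-state amplitude x0 = qeE0/m(ω0² – ω²) from above, a hypothetical resonance at 1500 THz and driving light at 500 THz, and it averages the kinetic energy mω²x0²/4 and the potential energy kx0²/4 (with k = mω0²) over a cycle. Off resonance these two averages are not equal, but both scale with x0² and, hence, with E0².

```python
import math

Q_E = 1.602176634e-19     # electron charge (C)
M_E = 9.1093837015e-31    # electron mass (kg)

def mean_energy(E0, omega0, omega, q=Q_E, m=M_E):
    """Cycle-averaged energy of an undamped oscillator driven at omega (off resonance).

    Steady-state amplitude x0 = q*E0 / (m*(omega0**2 - omega**2)); with x = x0*cos(omega*t)
    the mean kinetic energy is m*omega**2*x0**2/4 and the mean potential energy
    (spring constant k = m*omega0**2) is m*omega0**2*x0**2/4.
    """
    x0 = q * E0 / (m * (omega0**2 - omega**2))
    return m * x0**2 * (omega**2 + omega0**2) / 4.0

omega0 = 2 * math.pi * 1500e12    # assumed UV resonance (rad/s)
omega = 2 * math.pi * 500e12      # assumed visible driving frequency (rad/s)
print(mean_energy(2.0, omega0, omega) / mean_energy(1.0, omega0, omega))   # doubling E0
```

The ratio comes out as 4, up to rounding: doubling the field quadruples the energy.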

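The first thing to note is that, for a field emanating from a point source, the magnitude of the field vector E will vary inversely with r. That’s clear from our formula for radiation:

|E| = q·a'·sinθ/(4πε0c²r)

with a' = a(t – r/c) the retarded acceleration and θ the angle between the acceleration and the line of sight.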

Hence, the energy per unit area that the source can deliver will vary inversely as the square of the distance. That implies that the energy we can take out of a wave, within a given conical angle, will always be the same, no matter how far away we are. What we have is an energy flux spreading over a greater and greater effective area. That’s what’s illustrated below: the energy flowing within the cone OABCD is independent of the distance r at which it is measured.
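A quick numerical illustration of the cone argument, as a Python sketch: take a field E = k/r (with k = 1 in arbitrary units), an intensity proportional to E² (proportionality constant set to 1), and a cone whose cross-section at distance r has area Ω·r². The solid angle of 0.01 steradian and the function name are just example choices.

```python
def power_through_cone(r, solid_angle, k=1.0):
    """Power through a cone of the given solid angle (sr) at distance r.

    Assumes a field E = k / r and an intensity equal to E**2 (arbitrary units);
    the cone's cross-section at distance r has area solid_angle * r**2.
    """
    E = k / r
    intensity = E**2
    area = solid_angle * r**2
    return intensity * area

for r in (1.0, 10.0, 1000.0):
    print(f"r = {r:7.1f}: power in cone = {power_through_cone(r, solid_angle=0.01):.6f}")
```

All three distances give the same power, 0.01 in these arbitrary units: the 1/r² fall of the intensity exactly compensates the r² growth of the area.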

However, these considerations do not answer the question: what is that factor of proportionality? What’s its value? What does it depend on?

We know that our formula for radiation is an approximate formula, but it’s accurate for what is called the “wave zone”, i.e. for all of space as soon as we are more than a few wavelengths away from the source. Likewise, Feynman derives an approximate formula only for the energy carried by a wave using the same framework that was used to derive the dispersion relation. It’s a bit boring – and you may just want to go to the final result – but, well… It’s kind of illustrative of how physics analyzes physical situations and derives approximate formulas to explain them.

Let’s look at that framework again: we had a wave coming in, and then a wave being transmitted. In-between, the plate absorbed some of the energy, i.e. there was some damping. The situation is shown below, and the exact formulas were derived in the previous post.

Now, we can write the following energy equation for a unit area:

Energy in per second = energy out per second + work done per second

That’s simple, you’ll say. Yes, but let’s see where we get with this. For the energy that’s going in (per second), we can write that as α〈Es²〉, so that’s the averaged square of the electric field emanating from the source multiplied by a factor α. What factor α? Well… That’s exactly what we’re trying to find out: be patient.

For the energy that’s going out per second, we have α〈(Es + Ea)²〉. Why the same α? Well… The transmitted wave is traveling through the same medium as the incoming wave (air, most likely), so it should be the same factor of proportionality. Now, α〈(Es + Ea)²〉 = α[〈Es²〉 + 2〈EsEa〉 + 〈Ea²〉]. However, we know that we’re looking at a very thin plate only, and so the amplitude of Ea must be small as compared to that of Es. So we can leave its averaged square 〈Ea²〉 out. Indeed, as mentioned above, we’re looking at an approximation here: any term that’s proportional to NΔz, we’ll leave in (and so we’ll leave 〈EsEa〉 in), but terms that are proportional to (NΔz)² or a higher power can be left out. [That’s, in fact, also the reason why we don’t bother to analyze the reflected wave.]

So we now have the last term: the work done per second in the plate. Work done is force times distance, and so the work done per second (i.e. the power being delivered) is the force times the velocity. [In fact, we should do a dot product but the force and the velocity point along the same direction – except for a possible minus sign – and so that’s alright.] So, for each electron oscillator, the work done per second will be 〈qeEsv〉 and, hence, for a unit area, we’ll have NΔzqe〈Esv〉. So our energy equation becomes:

α〈Es²〉 = α〈Es²〉 + 2α〈EsEa〉 + NΔzqe〈Esv〉

⇔ –2α〈EsEa〉 = NΔzqe〈Esv〉

Now, we had a formula for Ea (we didn’t do the derivation of this one though: just accept it):

Ea(at z) = –[NΔzqe/(2ε0c)]·v(retarded by z/c)

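We can substitute this in the energy equation, noting that the average of the product of two oscillations with the same frequency does not depend on time. So the left-hand side of our energy equation becomes:

–2α〈EsEa〉 = 2α·[NΔzqe/(2ε0c)]·〈Es(at z)·v(retarded by z/c)〉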

However, Es(at z) is Es(at atoms) retarded by z/c, so we can insert the same argument. But then, now that we’ve made sure that we have the same argument for Es and v, we know that such an average is independent of time and, hence, it will be equal to the 〈Esv〉 factor on the right-hand side of our energy equation, which means this factor can be scrapped. The NΔzqe (and that 2 in the numerator and denominator) can be scrapped as well, of course. We then get the remarkably simple result that

α = ε0c

Hence, the energy carried in an electric wave per unit area and per unit time, which is also referred to as the intensity of the wave, equals:

〈S〉 = ε0c〈E²〉
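A short Python sketch, assuming a sinusoidal field E(t) = E0·cos(ωt) in vacuum: averaging E² numerically over one cycle gives E0²/2, so the intensity is ε0cE0²/2. The constants are CODATA SI values and the function name is illustrative.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity (F/m)
C = 299_792_458.0         # speed of light (m/s)

def mean_intensity(E0, samples=10_000):
    """<S> = eps0*c*<E^2> for E(t) = E0*cos(omega*t), averaged numerically over one cycle."""
    mean_sq = sum((E0 * math.cos(2 * math.pi * k / samples)) ** 2
                  for k in range(samples)) / samples
    return EPS0 * C * mean_sq

print(mean_intensity(1.0))    # W/m^2 for a 1 V/m amplitude
print(EPS0 * C / 2)           # closed form eps0*c*E0**2/2 for E0 = 1 V/m
```

Both lines give about 1.33 milliwatt per square meter for a field amplitude of 1 V/m, and the intensity scales with the square of the amplitude.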

The rate of radiation of energy

Plugging our formula for radiation above into this formula, we get an expression for the power per square meter radiated in the direction θ:

S = q²a'²sin²θ/(16π²ε0r²c³)

In this formula, a’ is, of course, the retarded acceleration, i.e. the value of a at time t – r/c. The formula makes it clear that the power varies inversely as the square of the distance, as it should, from what we wrote above. I’ll spare you the derivation (you’ve had enough of these derivations, I am sure), but we can use this formula to calculate the total energy radiated in all directions, by integrating the formula over all directions. We get the following general formula:

P = q²a²/(6πε0c³)

This formula is no longer dependent on the distance r – which is also in line with what we said above: in a given cone, the energy flux is the same. In this case, the ‘cone’ is actually a sphere around the oscillating charge, as illustrated below.

Now, we usually assume we have a nice sinusoidal function for the displacement of the charge and, hence, for the acceleration, so we’ll often assume that the acceleration a equals a = –ω²x0e^(iωt). In that case, we can average over a cycle (note that the average of a squared cosine is one-half) and we get:

〈P〉 = q²ω⁴x0²/(12πε0c³)

Now, historically, physicists used a value written as e², not to be confused with the transcendental number e, equal to e² = qe²/4πε0, which – when inserted above – yields the older form of the formula above:

P = 2e²a²/3c³
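These radiation formulas can be checked against each other numerically. This Python sketch assumes the standard total-power (Larmor) formula P = q²a²/(6πε0c³) in SI units and a hypothetical electron oscillating at 500 THz with an amplitude of 0.1 nm. It averages P over one cycle by sampling, compares the result with q²ω⁴x0²/(12πε0c³), and evaluates the ‘historical’ form 2e²a²/3c³ with e² = qe²/4πε0. The function names are illustrative.

```python
import math

EPS0 = 8.8541878128e-12     # vacuum permittivity (F/m)
C = 299_792_458.0           # speed of light (m/s)
Q_E = 1.602176634e-19       # electron charge (C)

def larmor_power(q, a):
    """Total power radiated in all directions by a charge q with acceleration a (SI units)."""
    return q**2 * a**2 / (6 * math.pi * EPS0 * C**3)

def mean_power_sinusoidal(q, omega, x0, samples=10_000):
    """Average of the radiated power over one cycle for x = x0*cos(omega*t)."""
    total = 0.0
    for k in range(samples):
        a = -omega**2 * x0 * math.cos(2 * math.pi * k / samples)   # a = -omega^2 * x
        total += larmor_power(q, a)
    return total / samples

omega, x0 = 2 * math.pi * 500e12, 1e-10      # hypothetical: 500 THz, 0.1 nm amplitude
numeric = mean_power_sinusoidal(Q_E, omega, x0)
closed = Q_E**2 * omega**4 * x0**2 / (12 * math.pi * EPS0 * C**3)

e2 = Q_E**2 / (4 * math.pi * EPS0)           # the 'historical' e^2
a_peak = omega**2 * x0
print(f"<P> numeric = {numeric:.4e} W, closed form = {closed:.4e} W")
print(f"P at peak: {larmor_power(Q_E, a_peak):.4e} W = {2 * e2 * a_peak**2 / (3 * C**3):.4e} W")
```

The cycle average is exactly half of the peak value, which is where the 6π in the denominator becomes 12π.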

In fact, we actually worked with that e² factor already, when we were talking about potential energy and calculated the potential energy between a proton and an electron at distance r: that potential energy was equal to –e²/r, but that was a while ago indeed – and so you’ll probably not remember.

Atomic oscillators

Now, I can imagine you’ve had enough of all these formulas. So let me conclude by giving some actual numbers and values for things. Let’s look at these atomic oscillators and put some values in indeed. Let’s start with calculating the Q of an atomic oscillator.

You’ll remember what the Q of an oscillator is: it is a measure of the ‘quality’ (that’s what the Q stands for really) of a particular oscillator. A high Q implies that, if we ‘hit’ the oscillator, it will ‘ring’ for many cycles, so its decay time will be quite long. It also means that the peak of its ‘frequency response’ will be quite tall and narrow. Huh? The illustrations below will refresh your memory.

The first one (below) gives a very general form for a typical resonance: we have a fixed frequency f0 (which defines the period T, and vice versa), and so this oscillator ‘rings’ indeed, and slowly dies out. An associated concept is the decay time τ of an oscillation: that’s the time it takes for the amplitude of the oscillation to fall to 1/e = 1/2.7182… ≈ 36.8% of its original value.

The second illustration (below) gives the frequency response curve. That assumes there is a continuous driving force, and we know that the oscillator will react to that driving force by oscillating – after an initial transient – at the frequency of the driving force, but its amplitude will be determined by (i) the difference between the frequency of the driving force and the oscillator’s natural frequency (f0) as well as (ii) the damping factor. We will not prove it here, but the ‘peak height’ is equal to the low-frequency response (C) multiplied by the Q of the system, and the peak width is f0 divided by Q.
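We won’t prove those two rules, but they can be checked numerically. This Python sketch assumes the standard amplitude response of a damped driven oscillator, A = C·ω0²/√[(ω0² – ω²)² + (ω0ω/Q)²], with C the low-frequency response, and measures the peak width at half power (where A² drops to half its maximum), which is the usual convention for the f0/Q rule. The value Q = 50 and the function names are arbitrary choices.

```python
import math

def response(f, f0, Q, c_low=1.0):
    """Steady-state amplitude of a damped, driven oscillator at driving frequency f.

    A = c_low * f0**2 / sqrt((f0**2 - f**2)**2 + (f0 * f / Q)**2), where c_low is the
    low-frequency (f -> 0) response; the 2*pi factors cancel, so plain f can be used.
    """
    return c_low * f0**2 / math.sqrt((f0**2 - f**2) ** 2 + (f0 * f / Q) ** 2)

def half_power_width(f0, Q, steps=100_000):
    """Width of the band where A**2 stays above half its maximum, by a simple scan."""
    freqs = [f0 * (0.5 + k / steps) for k in range(steps + 1)]   # 0.5*f0 ... 1.5*f0
    powers = [response(f, f0, Q) ** 2 for f in freqs]
    half = max(powers) / 2
    inside = [f for f, p in zip(freqs, powers) if p >= half]
    return max(inside) - min(inside)

f0, Q = 1.0, 50.0
print("peak height:", response(f0, f0, Q), "vs C*Q =", Q)
print("half-power width:", half_power_width(f0, Q), "vs f0/Q =", f0 / Q)
```

With Q = 50, the response at f0 is exactly 50 times the low-frequency response, and the half-power width comes out at about 0.02·f0, i.e. f0/Q; the approximation only gets better for higher Q.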

But what is the Q for an atomic oscillator? Well… The Q of any system is the ratio of the total energy content of the oscillator to the work done (or the energy lost) per radian. [If we define it per cycle, then we need to throw an additional 2π factor in – that’s just how the Q has been defined!] So we write:

Q = W/(dW/dΦ)

Now, dW/dΦ = (dW/dt)/(dΦ/dt) = (dW/dt)/ω, so Q = ωW/(dW/dt), where dW/dt stands for the rate at which energy is lost. As the energy is decreasing, this can be re-written as the first-order differential equation dW/dt = –(ω/Q)W. Now, that equation has the general solution

W = W0e^(–ωt/Q), with W0 the initial energy.
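As a sanity check on that solution, here is a Python sketch that integrates dW/dt = –(ω/Q)W with the classical fourth-order Runge–Kutta method and compares the result with W0e^(–ωt/Q) at the moment ωt/Q = 1. The values ω = 2π, Q = 100 and W0 = 1, and the function name, are arbitrary illustrative choices.

```python
import math

def decay_rk4(W0, omega, Q, t_end, steps=1_000):
    """Integrate dW/dt = -(omega/Q) * W from 0 to t_end with classical 4th-order Runge-Kutta."""
    rate = omega / Q
    dt = t_end / steps
    W = W0
    for _ in range(steps):
        k1 = -rate * W
        k2 = -rate * (W + 0.5 * dt * k1)
        k3 = -rate * (W + 0.5 * dt * k2)
        k4 = -rate * (W + dt * k3)
        W += dt * (k1 + 2 * k2 + 2 * k3 + k4) / 6
    return W

omega, Q, W0 = 2 * math.pi, 100.0, 1.0       # arbitrary illustrative values
t_end = Q / omega                            # the time at which omega*t/Q = 1
print(decay_rk4(W0, omega, Q, t_end), W0 * math.exp(-omega * t_end / Q))
```

Both numbers come out at 1/e ≈ 0.368: after a time Q/ω, the oscillator has lost all but 36.8% of its energy.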

Using our formula for the average radiated power – and assuming that our atomic oscillators are radiating at some natural (angular) frequency ω0, which we’ll relate to the wavelength λ = 2πc/ω0 – we can calculate the Q. But what do we use for W? Well… The kinetic energy of the oscillator is mv²/2. Assuming the displacement x has that nice sinusoidal shape, we get mω²x0²/4 for the mean kinetic energy, which we have to double to get the total energy (remember that, on average, the total energy of an oscillator is half kinetic and half potential), so then we get W = mω²x0²/2. Using me (the electron mass) for m, we can then plug it all in, divide and cancel what we need to divide and cancel, and we get the grand result:

Q = ωW/(dW/dt) = 3λmec²/4πe² or 1/Q = 4πe²/3λmec²

The second form is preferred because it allows us to replace e²/mec² by yet another ‘historical’ constant, referred to as the classical electron radius r0 = e²/mec² = 2.82×10⁻¹⁵ m. However, that’s yet another diversion, and I’ll try to spare you here. Indeed, we’re almost done so let’s sprint to the finish.

So all we need now is a value for λ. Well… Let’s just take one: a sodium atom emits light with a wavelength of approximately 600 nanometer. Yes, that’s the yellow-orange light emitted by low-pressure sodium-vapor lamps used for street lighting. So that’s a typical wavelength and we get a Q equal to

Q = 3λ/4πr0 ≈ 5×10⁷.

So what? Well… This is great! We can finally calculate things like the decay time now – for our atomic oscillators! From the general solution above, we see that the energy falls to 1/e of its initial value after a time Q/ω ≈ 1.6×10⁻⁸ seconds (the amplitude, which goes as the square root of the energy, takes twice as long to fall by that same factor). That seems to correspond to experimental fact: light, as emitted by all these atomic oscillators, basically consists of very sharp pulses: one atom emits a pulse, and then another one takes over, etcetera. That’s why light is usually unpolarized – I’ll talk about that in a minute.

In addition, we can calculate the peak width Δf = f0/Q. In fact, we’ll not use frequency but wavelength: Δλ = λ/Q ≈ 1.2×10⁻¹⁴ m. This also seems to correspond with the width of the so-called spectral lines of light-emitting sodium atoms.
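The numbers above can be reproduced with a few lines of Python. This sketch takes λ = 600 nm and r0 = 2.82×10⁻¹⁵ m from the text. As a cross-check, it also computes Q = ωW/〈P〉 directly from SI constants, assuming W = mω²x0²/2 and the cycle-averaged radiated power 〈P〉 = qe²ω⁴x0²/(12πε0c³), so that the (arbitrary) amplitude x0 cancels.

```python
import math

C = 299_792_458.0            # speed of light (m/s)
EPS0 = 8.8541878128e-12      # vacuum permittivity (F/m)
Q_E = 1.602176634e-19        # electron charge (C)
M_E = 9.1093837015e-31       # electron mass (kg)
R0 = 2.82e-15                # classical electron radius as quoted in the text (m)

lam = 600e-9                 # approximate sodium wavelength (m)
omega = 2 * math.pi * C / lam

# Q from the closed form Q = 3*lambda / (4*pi*r0)
q_closed = 3 * lam / (4 * math.pi * R0)

# Q = omega * W / <P>, with an arbitrary amplitude x0 (it cancels)
x0 = 1e-10
W = M_E * omega**2 * x0**2 / 2
P_mean = Q_E**2 * omega**4 * x0**2 / (12 * math.pi * EPS0 * C**3)
q_direct = omega * W / P_mean

tau = q_closed / omega       # time for the energy to fall to 1/e
dlam = lam / q_closed        # line width (m)

print(f"Q = {q_closed:.3e} (direct: {q_direct:.3e})")
print(f"energy decay time = {tau:.2e} s, line width = {dlam:.2e} m")
```

This gives Q ≈ 5.1×10⁷ (the direct route agrees to within about 0.1%, the difference coming from rounding r0 to 2.82×10⁻¹⁵ m), an energy decay time of about 1.6×10⁻⁸ s and a line width of about 1.2×10⁻¹⁴ m.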

Isn’t this great? With a few simple formulas, we’ve illustrated the strange world of atomic oscillators and electromagnetic radiation. I’ve covered an awful lot of ground here, I feel.

There is one more “loose end” which I’ll quickly throw in here. It’s the topic of polarization – as promised – and then we’re done really. I promise. :-)

Polarization

One of the properties of the ‘law’ of radiation as derived by Feynman is that the direction of the electric field is perpendicular to the line of sight. That’s – quite simply – because it’s only the component ax perpendicular to the line of sight that’s important. So if we have a source – i.e. an accelerating electric charge – moving in and out straight at us, we will not get a signal.

That being said, while the field is perpendicular to the line of sight – which we identify with the z-axis – the field still can have two components and, in fact, it is likely to have two components: an x- and a y-component. We show a beam with such x- and y-components below (so that beam ‘vibrates’ not only up and down but also sideways), and we assume it hits an atom – i.e. an electron oscillator – which, in turn, emits another beam. As you can see from the illustration, the light scattered at right angles to the incident beam will only ‘vibrate’ up and down: not sideways. We call such light ‘polarized’. The physical explanation is quite obvious from the illustration below: the electron oscillator only moves in the plane perpendicular to the z-direction, and only the part of its motion that is perpendicular to our line of sight radiates towards us; hence, any radiation measured from a direction that’s perpendicular to that z-axis must be ‘plane polarized’ indeed.

Light can be polarized in various ways. In fact, if we have a ‘regular’ wave, it will always be polarized. By ‘regular’, we mean that both the vibration in the x-direction and the one in the y-direction are sinusoidal, with the same frequency: the phase may or may not be the same, that doesn’t matter. In that case, there are two broad possibilities: either the oscillations are ‘in phase’, or they are not. When the x- and y-vibrations are in phase, then the superposition of their amplitudes will look like the examples below. You should imagine here that you are looking at the end of the electric field vector, and so the electric field oscillates on a straight line.

Being in phase requires, first of all, that both vibrations have the same frequency. But even with the same frequency, their phases may differ, as shown in the examples below. Such ‘out of phase’ x- and y-vibrations still produce a nice elliptical motion and, hence, such beams are referred to as being ‘elliptically polarized’.
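To make the ‘linear versus elliptical’ distinction concrete, here is a small Python sketch. It assumes the tip of the field vector traces (Ax·cos ωt, Ay·cos(ωt + φ)), samples one cycle, and computes the enclosed signed area with the shoelace formula: the area vanishes when the two vibrations are in phase (a straight line) and approaches –π·Ax·Ay·sinφ otherwise (the sign just tells you the sense of rotation). The function name signed_area is illustrative.

```python
import math

def signed_area(ax, ay, phi, n=1000):
    """Signed area traced by the tip of E = (ax*cos(wt), ay*cos(wt + phi)) over one cycle.

    Uses the shoelace formula on n equally spaced samples. A (near-)zero area means the
    tip moves on a straight line (linear polarization); otherwise it traces an ellipse.
    """
    pts = [(ax * math.cos(2 * math.pi * k / n), ay * math.cos(2 * math.pi * k / n + phi))
           for k in range(n)]
    area = 0.0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        area += x1 * y2 - x2 * y1
    return area / 2

for phi in (0.0, math.pi / 4, math.pi / 2):
    print(f"phase {phi:.3f} rad: area {signed_area(1.0, 1.0, phi):+.4f}, "
          f"ellipse formula {-math.pi * math.sin(phi):+.4f}")
```

For equal amplitudes and a phase difference of π/2, the area is close to –π: the tip of the field vector then runs around a circle, which is the special case of elliptical polarization we call circular polarization.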

So what’s unpolarized light then? Well… That’s light that’s – quite simply – not polarized. So it’s irregular. Most light is unpolarized because it is emitted by a great many independent electron oscillators. From what I explained above, you now know that such electron oscillators emit light during a fraction of a second only – the window is of the order of 10⁻⁸ seconds only actually – so that’s very short indeed (a hundred millionth of a second!). It’s a sharp little pulse basically, quickly followed by another pulse as another atom takes over, and then another and so on. So the light that’s being emitted cannot have a steady phase for more than 10⁻⁸ seconds or so. In that sense, such light will be ‘out of phase’.

In fact, that’s why two light sources don’t interfere. Indeed, we’ve been talking about interference effects all of the time but you may have noticed :-) that – in daily life – the combined intensity of light from two sources is just the sum of the intensities of the two lights: we don’t see interference. So there you are. [Now you will, of course, wonder why physics studies phenomena we don't observe in daily life - but that's an entirely different matter, and you would actually not be reading this post if you thought that.]

Now, with polarization, we can explain a number of things that we couldn’t explain before. One of them is birefringence: a material may have a different index of refraction depending on whether the light is linearly polarized in one direction rather than another, which explains the amusing property of Iceland spar, a crystal that doubles the image of anything seen through it. But we won’t play with that here. You can look that up yourself.

# Refraction and Dispersion of Light

In this post, we go right at the heart of classical physics. It’s going to be a very long post – and a very difficult one – but it will really give you a good ‘feel’ of what classical physics is all about. To understand classical physics – in order to compare it, later, with quantum mechanics – it’s essential, indeed, to try to follow the math in order to get a good feel for what ‘fields’ and ‘charges’ and ‘atomic oscillators’ actually represent.

As for the topic of this post itself, we’re going to look at refraction again: light gets dispersed as it travels from one medium to another, as illustrated below.

Dispersion literally means “distribution over a wide area”, and so that’s what happens as the light travels through the prism: the various frequencies (i.e. the various colors that make up natural ‘white’ light) are being separated out over slightly different angles. In physics jargon, we say that the index of refraction depends on the frequency of the wave – but so we could also say that the breaking angle depends on the color. But that sounds less scientific, of course. In any case, it’s good to get the terminology right. Generally speaking, the term refraction (as opposed to dispersion) is used to refer to the bending (or ‘breaking’) of light of a specific frequency only, i.e. monochromatic light, as shown in the photograph below. [...] OK. We’re all set now.

It is interesting to note that the photograph above shows how the monochromatic light is actually being obtained: if you look carefully, you’ll see two secondary beams on the left-hand side (with an intensity that is much less than the central beam – barely visible in fact). That suggests that the original light source was sent through a diffraction grating designed to filter only one frequency out of the original light beam. That beam is then sent through a block of transparent material (plastic in this case) and comes out again, but displaced parallel to itself. So the block of plastic ‘offsets’ the beam. So how do we explain that in classical physics?

The index of refraction and the dispersion equation

As I mentioned in my previous post, the Greeks had already found out, experimentally, what the index of refraction was. To be more precise, they had measured the θ1 and θ2 – depicted below – for light going from air to water. For example, if the angle in air (θ1) is 20°, then the angle in the water (θ2) will be 15°. If the angle in air is 70°, then the angle in the water will be 45°.

Of course, it should be noted that a lot of the light will also be reflected from the water surface (yes, imagine the romance of the image of the moon reflected on the surface of a glacial lake while you’re feeling damn cold) – but so that’s a phenomenon which is better explained by introducing probability amplitudes, and looking at light as a bundle of photons, which we will not do here. I did that in previous posts, and so here, we will just acknowledge that there is a reflected beam but not say anything about it.

In any case, we should go step by step, and I am not doing that right now. Let’s first define the index of refraction. It is a number n which relates the angles above through the following relationship, which is referred to as Snell’s Law:

sinθ1 = n sinθ2

Using the numbers given above, we get: sin(20°) = n sin(15°), and sin(70°) = n sin(45°), so n must be equal to n = sin(20°)/sin(15°) ≈ sin(70°)/sin(45°) ≈ 1.33. Just for the record, Willebrord Snell was a 17th-century Dutch astronomer but, according to Wikipedia, some smart Persian, Ibn Sahl, had already jotted this down in a treatise – “On Burning Mirrors and Lenses” – while he was serving the Abbasid court of Baghdad, back in 984, i.e. more than a thousand years ago! What to say? It was obviously a time when the Sunni-Shia divide did not matter, and Arabs and ‘Persians’ were leading civilization. I guess I should just salute the Islamic Golden Age here, regret the time lost during Europe’s Dark Ages and, most importantly, regret where Baghdad is right now! And, as for the ‘burning’ adjective, it just refers to the fact that large convex lenses can concentrate the sun’s rays to a very small area indeed, thereby causing ignition. [It seems that the story about Archimedes burning Roman ships with a 'death ray' using mirrors – in all likelihood, something that did not happen – fascinated them as well.]
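The two Greek data points can be run through Snell’s law in a few lines of Python. This is a small sketch with illustrative function names: it computes n from each pair of angles and then goes the other way, predicting the angle in the water for n = 1.33.

```python
import math

def refraction_index(theta1_deg, theta2_deg):
    """n from Snell's law, sin(theta1) = n * sin(theta2), for light going from air into water."""
    return math.sin(math.radians(theta1_deg)) / math.sin(math.radians(theta2_deg))

def refracted_angle(theta1_deg, n):
    """Angle in the denser medium predicted by Snell's law, in degrees."""
    return math.degrees(math.asin(math.sin(math.radians(theta1_deg)) / n))

for t1, t2 in ((20, 15), (70, 45)):
    print(t1, t2, round(refraction_index(t1, t2), 3))
print(round(refracted_angle(20, 1.33), 1), round(refracted_angle(70, 1.33), 1))
```

The two ratios come out at about 1.32 and 1.33, and with n = 1.33 Snell’s law gives back roughly 15° and 45°: the old measurements are consistent with a single index of refraction to within about 1%.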

But let’s get back to it. Where were we? Oh – yes – the refraction index. It’s (usually) a positive number written as n = 1 + some other number which may be positive or negative, and which depends on the properties of the material. To be more specific, it depends on the resonant frequencies of the atoms (or, to be precise, I should say: the resonant frequencies of the electrons bound by the atom, because it’s the charges that generate the radiation). Plus a whole bunch of natural constants that we have encountered already, most of which are related to electrons. Let me jot down the formula – and please don’t be scared away now (you can stop a bit later, but not now :-) please):

n = 1 + Nqe²/[2ε0m(ω0² – ω²)]

N is just the number of charges (electrons) per unit volume of the material (e.g. the water, or that block of plastic), and qe and m are just the charge and mass of the electron. And then you have that electric constant once again, ε0, and… Well, that’s it ! That’s not too terrible, is it? So the only variables on the right-hand side are ω0 and ω, so that’s (i) the resonant frequency of the material (or the atoms – well, the electrons bound to the nucleus, to be precise, but then you know what I mean and so I hope you’ll allow me to use somewhat less precise language from time to time) and (ii) the frequency of the incoming light.

The equation above is referred to as the dispersion relation. It’s easy to see why: it relates the frequency of the incoming light to the index of refraction which, in turn, determines that angle θ. So the formula does indeed determine how light gets dispersed, as a function of the frequencies in it, by some medium indeed (glass, air, water,…).

So the objective of this post is to show how we can derive that dispersion relation using classical physics only. As usual, I’ll follow Feynman – arguably the best physics teacher ever. :-) Let me warn you though: it is not a simple thing to do. However, as mentioned above, it goes to the heart of the “classical world view” in physics and so I do think it’s worth the trouble. Before we get going, however, let’s look at the properties of that formula above, and relate it to some experimental facts, in order to make sure we more or less understand what it is that we are trying to understand. :-)

First, we should note that the index of refraction has nothing to do with transparency. In fact, throughout this post, we’ll assume that we’re looking at very transparent materials only, i.e. materials that do not absorb the electromagnetic radiation that tries to go through them, or only absorb it a tiny little bit. In reality, we will have, of course, some – or, in the case of opaque (i.e. non-transparent) materials, a lot – of absorption going on, but we will deal with that later. So, let me repeat: the index of refraction has nothing to do with transparency. A material can have a (very) high index of refraction but be fully transparent. In fact, diamond is a case in point: it has one of the highest indices of refraction (2.42) of any material that’s naturally available, but it’s – obviously – perfectly transparent. [In case you're interested in jewellery, the refraction index of its most popular substitute, cubic zirconia, comes very close (2.15-2.18) and, moreover, zirconia actually works better as a prism, so it disperses light better than diamond, which is why it reflects more colors. Hence, real diamond actually sparkles less than zirconia! So don't be fooled! :-)]

Second, it’s obvious that the index of refraction depends on two variables indeed: the natural, or resonant frequency, $\omega_0$, and the frequency $\omega$, which is the frequency of the incoming light. For most of the ordinary gases, including those that make up air (i.e. nitrogen (78%) and oxygen (21%), plus some water vapor (averaging 1%) and the so-called noble gas argon (0.93%) – noble because, just like helium and neon, it’s colorless, odorless and doesn’t react easily), the natural frequencies of the electron oscillators are close to the frequency of ultraviolet light. [The greenhouse gases are a different story - which is why we're in trouble on this planet. Anyway...] So that’s why air absorbs most of the UV, especially the cancer-causing ultraviolet-C light (UVC), which is formally classified as a carcinogen by the World Health Organization. The wavelength of UVC light is 100 to 300 nanometer – as opposed to visible light, which has a wavelength ranging from 400 to 700 nm – and, hence, the frequency of UVC light is in the 1000 to 3000 terahertz range (1 THz = $10^{12}$ oscillations per second) – as opposed to visible light, which has a frequency in the range of 400 to 800 THz. So, because we’re squaring those frequencies in the formula, $\omega^2$ can then be disregarded in comparison with $\omega_0^2$: for example, $1500^2 = 2{,}250{,}000$ and that’s not very different from $1500^2 - 500^2 = 2{,}000{,}000$. Hence, even if we leave the $\omega^2$ out, we are still dividing by a very large number. That’s why n is very close to one for visible light entering the atmosphere from space (i.e. the vacuum). Its value is, in fact, around 1.000292 for incoming light with a wavelength of 589.3 nm (the odd value is the mean of so-called sodium D light, a pretty common yellow-orange light (street lights!), so that’s why it’s used as a reference value – however, don’t worry about it).
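
The arithmetic in the paragraph above can be checked with a few lines of Python. This sketch converts the quoted wavelength ranges into frequencies with f = c/λ and reproduces the 1500 THz versus 500 THz comparison. The numbers are the ones from the text, nothing more.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def thz(wavelength_nm: float) -> float:
    """Frequency (THz) of light with the given vacuum wavelength (nm): f = c / lambda."""
    return C / (wavelength_nm * 1e-9) / 1e12


uvc_range = (thz(300), thz(100))        # roughly 1000 to 3000 THz
visible_range = (thz(700), thz(400))    # roughly 430 to 750 THz

# The example from the text: a resonance at 1500 THz and light at 500 THz.
# Ratios of omega^2 equal ratios of f^2, because omega = 2*pi*f.
f0, f = 1500.0, 500.0
with_omega = f0**2 - f**2        # 2,000,000
without_omega = f0**2            # 2,250,000
relative_change = without_omega / with_omega - 1   # 0.125, i.e. 12.5%
print(uvc_range, visible_range, relative_change)
```

In this example, dropping ω² changes the denominator by 12.5%. That affects the exact value of n − 1, but not the conclusion that n stays very close to one.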

That being said, while the n of air is close to one for all visible light, the index is still slightly higher for blue light as compared to red light, and that’s why the sky is blue, except in the morning and evening, when it’s reddish. Indeed, the illustration below is a bit silly, but it gives you the idea. [I took this from http://mathdept.ucr.edu/ so I’ll refer you to that for the full narrative on that. :-)]

Where are we in this story? Oh… Yes. Two frequencies. So we should also note that – because we have two frequency variables – it also makes sense to talk about, for instance, the index of refraction of graphite (i.e. carbon in its most natural occurrence, like in coal) for x-rays. Indeed, coal is definitely not transparent to visible light (that has to do with the absorption phenomenon, which we’ll discuss later) but it is very ‘transparent’ to x-rays. Hence, we can talk about how graphite bends x-rays, for example. In fact, the frequency of x-rays is much higher than the natural frequency of the carbon atoms and, hence, in this case we can neglect the $\omega_0^2$ term. We then get a denominator that is negative (because only the $-\omega^2$ remains relevant), and so we get a refraction index that is (a bit) smaller than 1. [Of course, our body is transparent to x-rays too – to a large extent – but in different degrees, and that’s why we can take x-ray photographs of, for example, a broken rib or leg.]
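
The same formula also accounts for the blue-versus-red and x-ray remarks. The sketch below is a toy model, and its assumptions should be stated plainly. Air is treated as having a single resonance, which I've placed (arbitrarily) at 120 nm in the far ultraviolet. The lumped constant K = Nq_e²/(2ε0m) is then fitted so that n = 1.000292 for 589.3 nm light. Real air has several resonances, so only the signs and the ordering of the results mean anything here.

```python
import math

C = 299_792_458.0  # m/s


def omega_of(wavelength_m: float) -> float:
    """Angular frequency (rad/s) of light with the given vacuum wavelength."""
    return 2.0 * math.pi * C / wavelength_m


# Hypothetical single-resonance model of air; both choices below are assumptions.
OMEGA0 = omega_of(120e-9)        # resonance placed in the far ultraviolet
OMEGA_D = omega_of(589.3e-9)     # sodium D light
# K lumps N q_e^2 / (2 eps0 m); fitted so that n(589.3 nm) = 1.000292
K = 0.000292 * (OMEGA0**2 - OMEGA_D**2)


def n_air(wavelength_m: float) -> float:
    w = omega_of(wavelength_m)
    return 1.0 + K / (OMEGA0**2 - w**2)


n_blue, n_red, n_xray = n_air(450e-9), n_air(700e-9), n_air(0.1e-9)
print(n_blue, n_red, n_xray)   # blue > red > 1, and the x-ray value is just below 1
```

For a real x-ray experiment, we would of course use the resonances of the material concerned (e.g. carbon) rather than this toy model of air.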

OK. [...] So that’s just to note that we can have a refraction index that is smaller than one and that’s not ‘anomalous’ – even if that’s a historical term that has survived.

Finally, last but not least as they say, you may have heard that scientists and engineers have managed to construct so-called negative index metamaterials. That matter is (much) more complicated than you might think, however, and so I’ll refer you to the Web if you want to find out more about that.

Light going through a glass plate: the classical idea

OK. We’re now ready to crack the nut. We’ll closely follow my ‘Great Teacher’ Feynman (Lectures, Vol. I-31) as he derives that formula above. Let me warn you again: the narrative below is quite complicated, but really worth the trouble – I think. The key to it all is the illustration below. The idea is that we have some electromagnetic radiation emanating from a far-away source hitting a glass plate – or whatever other transparent material. [Of course, nothing is to scale here: it's just to make sure you get the theoretical set-up.]

So, as I explained in my previous post, the source creates an oscillating electromagnetic field which will shake the electrons up and down in the glass plate, and then these shaking electrons will generate their own waves. So we look at the glass as an assembly of little “optical-frequency radio stations” indeed, that are all driven with a given phase. This creates two new waves: one reflecting back, and one modifying the original field.

Let’s be more precise. What do we have here? First, we have the field that’s generated by the source, which is denoted by $E_s$ above. Then we have the “reflected” wave (or field – not much difference in practice), so that’s $E_b$. As mentioned above, this is the classical theory, not the quantum-electrodynamical one, so we won’t say anything about this reflection really: just note that the classical theory acknowledges that some of the light is effectively being reflected.

OK. Now we go to the other side of the glass. What do we expect to see there? If we did not have the glass plate in-between, we’d have the same $E_s$ field obviously, but we don’t: there is a glass plate. :-) Hence, the “transmitted” wave, or the field that’s arriving at point P let’s say, will be different from $E_s$. Feynman writes it as $E_s + E_a$.

Hmm… OK. So what can we say about that? Not easy…

The index of refraction and the apparent speed of light in a medium

Snell’s Law – or Ibn Sahl’s Law – was re-formulated, by a 17th century French lawyer with an interest in math and physics, Pierre de Fermat, as the Principle of Least Time. It is a way of looking at things really – but it’s very confusing actually. Fermat assumed that light traveling through a medium (water or glass, for instance) would travel slower, by a certain factor n, which – indeed – turns out to be the index of refraction. But let’s not run before we can walk. The Principle is illustrated below. If light has to travel from point S (the source) to point D (the detector), then the fastest way is not the straight line from S to D, but the broken S-L-D line. Now, I won’t go into the geometry of this but, with a bit of trial and error, you can verify for yourself that the factor n will indeed be the same factor n as the one which was ‘discovered’ by Ibn Sahl: $\sin\theta_1 = n\sin\theta_2$.
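
Instead of doing the trial and error with pencil and paper, we can let a computer do it. This minimal sketch is my own illustration, not Feynman’s. It puts S at height h1 above the surface and D at depth h2 below it, and uses a golden-section search to find the crossing point L that minimizes the travel time. It then checks that the angles satisfy sinθ1 = n sinθ2. The index n = 1.5 and the geometry are assumed values, and units are chosen so that c = 1.

```python
import math


def travel_time(x: float, h1: float, h2: float, width: float, n: float) -> float:
    """Time (with c = 1) for the path S=(0, h1) -> L=(x, 0) -> D=(width, -h2),
    moving at speed 1 above the surface and at speed 1/n below it."""
    return math.hypot(x, h1) + n * math.hypot(width - x, h2)


def least_time_x(h1: float, h2: float, width: float, n: float, tol: float = 1e-10) -> float:
    """Golden-section search for the crossing point L that minimises the travel time.
    travel_time is convex in x, so there is a single minimum in [0, width]."""
    g = (math.sqrt(5.0) - 1.0) / 2.0
    lo, hi = 0.0, width
    a, b = hi - g * (hi - lo), lo + g * (hi - lo)
    while hi - lo > tol:
        if travel_time(a, h1, h2, width, n) < travel_time(b, h1, h2, width, n):
            hi, b = b, a                 # minimum lies in [lo, b]
            a = hi - g * (hi - lo)
        else:
            lo, a = a, b                 # minimum lies in [a, hi]
            b = lo + g * (hi - lo)
    return 0.5 * (lo + hi)


n = 1.5                        # assumed index of the glass
h1, h2, width = 1.0, 1.0, 2.0  # assumed geometry
x = least_time_x(h1, h2, width, n)
sin1 = x / math.hypot(x, h1)                    # sine of the angle of incidence
sin2 = (width - x) / math.hypot(width - x, h2)  # sine of the angle of refraction
print(x, sin1, n * sin2)       # sin1 and n*sin2 agree closely
```

The least-time path crosses the surface closer to D than the straight line does. That is the path that spends less time in the slow medium.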

What we have then, is that the apparent speed of the wave in the glass plate that we’re considering here will be equal to v = c/n. The apparent speed? So does that mean it is not the real speed? Hmm… That’s actually the crux of the matter. The answer is: yes and no. What? An ambiguous answer in physics? Yes. It’s ambiguous indeed. What’s the speed of a wave? We mentioned above that n could be smaller than one. Hence, in that case, we’d have a wave traveling faster than the speed of light. How can we make sense of that?

We can make sense of that by noting that the wave crests or nodes may be traveling faster than c, but that the wave itself – as a signal – cannot travel faster than light. It’s related to what we said about the difference between the group and phase velocity of a wave. The phase velocity – i.e. the nodes, which are mathematical points only – can travel faster than light, but the signal as such, i.e. the wave envelope in the illustration below, cannot.

What is happening really is the following. A wave will hit one of these electron oscillators and start a so-called transient, i.e. a temporary response preceding the ‘steady state’ solution (which is not steady but dynamic – confusing language once again – so sorry!). So the transient settles down after a while and then we have an equilibrium (or steady state) oscillation which is likely to be out of phase with the driving field. That’s because there is damping: the electron oscillators resist before they go along with the driving force (and they continue to put up resistance, so the oscillation will die out when the driving force stops!). The illustration below shows how it works for the various cases:

In case (b), the phase of the transmitted wave will appear to be delayed, which results in the wave appearing to travel slower, because the distance between the wave crests, i.e. the wavelength λ, is being shortened. In case (c), it’s the other way around: the phase appears to be advanced, which translates into a bigger distance between wave crests, or a lengthening of the wavelength, which translates into an apparent higher speed of the transmitted wave.

So here we just have a mathematical relationship between the (apparent) speed of a wave and its wavelength. The wavelength is the (apparent) speed of the wave (that’s the speed with which the nodes of the wave travel through space, or the phase velocity) divided by the frequency: $\lambda = v_p/f$. However, from the illustration above, it is obvious that the signal, i.e. the start of the wave, is not earlier – or later – for either wave (b) or (c). In fact, the start of the wave, in time, is exactly the same for all three cases. Hence, the electromagnetic signal travels at the same speed c, always.

While this may seem obvious, it’s quite confusing, and therefore I’ll insert one more illustration below. What happens when the various wave fronts of the traveling field hit the glass plate (coming from the top-left hand corner), let’s say at time t = t0, as shown below, is that the wave crests will have the same spacing along the surface. That’s obvious because we have a regular wave with a fixed frequency and, hence, a fixed wavelength λ0, here. Now, these wave crests must also travel together as the wave continues its journey through the glass, which is what is shown by the red and green arrows below: they indicate where the wave crest is after one and two periods (T and 2T) respectively.

To understand what’s going on, you should note that the frequency f of the wave that is going through the glass sheet and, hence, its period T, has not changed. Indeed, the driven oscillation, which was illustrated for the two possible cases above (n > 1 and n < 1), after the transient has settled down, has the same frequency (f) as the driving source. It must. Always. That being said, the driven oscillation does have that phase delay (remember: we’re in the (b) case here, but we can make a similar analysis for the (c) case). In practice, that means that the (shortest) distance between the crests of the wave fronts at time t = t0 and the crests at time t0 + T will be smaller. Now, the (shortest) distance between the crests of a wave is, obviously, the wavelength, and the wavelength is equal to the phase velocity divided by the frequency: $\lambda = v_p/f$, with $v_p$ the speed of propagation, i.e. the phase velocity, of the wave, and f = 1/T. [The frequency f is the reciprocal of the period T - always. When studying physics, I found out it's useful to keep track of a few relationships that hold always, and so this is one of them. :-)]

Now, the frequency is the same, but the wavelength is shortened as the wave travels through the various layers of electron oscillators, each causing a delay of phase – and, hence, a shortening of the wavelength, as shown above. But, if f is the same, and the wavelength is shorter, then $v_p$ cannot be equal to the speed of the incoming light, so $v_p \neq c$. The apparent speed of the wave traveling through the glass, and the associated shortening of the wavelength, can be calculated using the index n from Snell’s Law. Taking n ≈ 1.33 (strictly speaking, that is the value for water; for ordinary glass, n is closer to 1.5), we can calculate the apparent speed of light in the medium as $v = c/n \approx 0.75c$ and, therefore, the wavelength of the wave in the medium as $\lambda = \lambda_0/n \approx 0.75\lambda_0$.
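
The bookkeeping of frequency, phase velocity and wavelength is easy to put in code. This sketch just applies f = c/λ0, v_p = c/n and λ = v_p/f for the 600 nm light and n = 1.33 used in the text. It shows that the frequency is untouched while the wavelength shrinks by the factor n.

```python
C = 299_792_458.0  # speed of light in vacuum, m/s


def in_medium(wavelength0_m: float, n: float):
    """Frequency, phase velocity and wavelength of light inside a medium of index n.
    The frequency is fixed by the source; only the phase velocity and wavelength change."""
    f = C / wavelength0_m        # frequency (Hz), the same inside and outside
    v_p = C / n                  # apparent (phase) velocity
    wavelength = v_p / f         # lambda = v_p / f, which works out to lambda0 / n
    return f, v_p, wavelength


f, v_p, lam = in_medium(600e-9, 1.33)   # the 600 nm light and n = 1.33 from the text
print(f / 1e12, v_p / C, lam * 1e9)     # about 500 THz, about 0.75, about 451 nm
```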

OK. I’ve been way too lengthy here. Let’s sum it all up:

• The field in the glass sheet must have the shape that’s depicted above: there is no other way. So that means the direction of ‘propagation’ has been changed. As mentioned above, however, the direction of propagation is a ‘mathematical’ property of the field (it describes how the phase travels): it tells us nothing about the speed of the ‘signal’.
• Because the direction of propagation is normal to the wave front, it implies that the bending of light rays comes about because the effective speed of the waves is different in the various materials or, to be even more precise, because the electron oscillators cause a delay of phase.
• While the speed and direction of propagation of the wave, i.e. the phase velocity, accurately describe the behavior of the field, they do not give the speed with which the signal is traveling (see above). That is why the phase velocity can be larger or smaller than c, and so it should not raise any eyebrows. For x-rays in particular, we have a refractive index smaller than one. [It's only slightly less than one, though, and, hence, x-ray images still have a very good resolution. So don't worry about your doctor getting a bad image of your broken leg. :-) In case you want to know more about this: just Google x-ray optics, and you’ll find loads of information. :-)]

Calculating the field

Are you still there? Probably not. If you are, I am afraid you won’t be there ten or twenty minutes from now. Indeed, you ain’t done nothing yet. All of the above was just setting the stage: we’re now ready for the pièce de résistance, as they say in French. We’re back at that illustration of the glass plate and the various fields in front and behind the plate. So we have electron oscillators in the glass plate. Indeed, as Feynman notes: “As far as problems involving light are concerned, the electrons behave as though they were held by springs. So we shall suppose that the electrons have a linear restoring force which, together with their mass m, makes them behave like little oscillators, with a resonant frequency ω0.”

So here we go:

1. From everything I wrote about oscillators in previous posts, you should remember that the equation for this motion can be written as $m(d^2x/dt^2 + \omega_0^2 x) = F$. That’s just Newton’s Law. Now, the driving force F comes from the electric field and will be equal to $F = q_e E_s$.

Now, we assume that we can choose the origin of time (i.e. the moment from which we start counting) such that the field $E_s = E_0\cos(\omega t)$. To make calculations easier, we look at this as the real part of a complex function $E_s = E_0 e^{i\omega t}$. So we get:

$$m\left(\frac{d^2x}{dt^2} + \omega_0^2 x\right) = q_e E_0 e^{i\omega t}$$

We’ve solved this before: its solution is $x = x_0 e^{i\omega t}$. We can just substitute this in the equation above to find $x_0$: taking the second-order derivative gives $d^2x/dt^2 = -\omega^2 x$, so the equation becomes $m(\omega_0^2 - \omega^2)x_0 = q_e E_0$ and, hence, $x_0 = q_e E_0/[m(\omega_0^2 - \omega^2)]$. That, then, gives us the first piece in this lengthy derivation:

$$x = \frac{q_e E_0 e^{i\omega t}}{m(\omega_0^2 - \omega^2)}$$

Just to make sure you understand what we’re doing: this piece gives us the motion of the electrons in the plate. That’s all.
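
We can also check this solution numerically, and at the same time watch the transient settle into the steady state described earlier. The sketch below integrates the oscillator equation with a hand-written fourth-order Runge-Kutta step. It rests on two assumptions. First, units are chosen so that m = q_e = E0 = 1. Second, I've added a small damping term γ, which the text only introduces later (in the section on absorption), so that the transient actually dies out. The amplitude after the transient should then be close to x0 = 1/(ω0² − ω²).

```python
import math


def driven_oscillator(omega0, omega, gamma, t_end=300.0, dt=0.01):
    """RK4 integration of x'' + gamma*x' + omega0**2 * x = cos(omega*t), starting at rest.
    Units are chosen so that m = q_e = E0 = 1 (an assumption made for illustration)."""
    def accel(t, x, v):
        return math.cos(omega * t) - gamma * v - omega0**2 * x

    t, x, v = 0.0, 0.0, 0.0
    history = []
    for _ in range(int(round(t_end / dt))):
        k1x, k1v = v, accel(t, x, v)
        k2x, k2v = v + 0.5 * dt * k1v, accel(t + 0.5 * dt, x + 0.5 * dt * k1x, v + 0.5 * dt * k1v)
        k3x, k3v = v + 0.5 * dt * k2v, accel(t + 0.5 * dt, x + 0.5 * dt * k2x, v + 0.5 * dt * k2v)
        k4x, k4v = v + dt * k3v, accel(t + dt, x + dt * k3x, v + dt * k3v)
        x += dt * (k1x + 2 * k2x + 2 * k3x + k4x) / 6
        v += dt * (k1v + 2 * k2v + 2 * k3v + k4v) / 6
        t += dt
        history.append((t, x))
    return history


omega0, omega, gamma = 2.0, 1.0, 0.1
history = driven_oscillator(omega0, omega, gamma)
# Look only at the last two driving periods, long after the transient has died out.
last_two_periods = [x for t, x in history if t > history[-1][0] - 2 * (2 * math.pi / omega)]
amplitude = max(abs(x) for x in last_two_periods)

predicted = 1.0 / (omega0**2 - omega**2)                          # x0 from the text (no damping)
predicted_damped = 1.0 / math.hypot(omega0**2 - omega**2, gamma * omega)
print(amplitude, predicted, predicted_damped)
```

With weak damping, the two predictions barely differ, and the simulated amplitude sits on top of them.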

2. Now, we need an equation for the field produced by a plane of oscillating charges, because that’s what we’ve got here: a plate or a plane of oscillating charges. That’s a complicated derivation in its own right, which I won’t do here. I’ll just refer to another chapter of Feynman’s Lectures (Vol. I-30-7) and give you the solution for it (if I didn’t do that, this post would be even longer than it already is):
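
$$E = -\frac{\eta\, q_e}{2\epsilon_0 c}\,\Big[\text{velocity of the charges at time } t - z/c\Big]$$
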
This formula introduces just one new variable, η, which is the number of charges per unit area of the plate (as opposed to N, which was the number of charges per unit volume in the plate), so that’s quite straightforward. Less straightforward is the formula itself: this formula says that the magnitude of the field is proportional to the velocity of the charges at time t – z/c, with z the shortest distance from P to the plane of charges. That’s a bit odd, actually, but so that’s the way it comes out: “a rather simple formula”, as Feynman puts it.

In any case, let’s use it. Differentiating x to get the velocity of the charges, and plugging it into the formula above yields:
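
$$E_a = -\frac{\eta\, q_e}{2\epsilon_0 c}\cdot\frac{i\omega\, q_e E_0}{m(\omega_0^2 - \omega^2)}\,e^{i\omega(t - z/c)}$$
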
Note that this is only $E_a$, the additional field generated by the oscillating charges in the glass plate. To get the total electric field at P, we still have to add $E_s$, i.e. the field generated by the source itself. This may seem odd, because you may think that the glass plate sort of ‘shields’ the original field but, no, as Feynman puts it: “The total electric field in any physical circumstance is the sum of the fields from all the charges in the universe.”

3. As mentioned above, z is the distance from P to the plate. Let’s look at the set-up here once again. The transmitted wave, or $E_{\text{after plate}}$ as we shall note it, consists of two components: $E_s$ and $E_a$. $E_s$ here will be equal to (the real part of) $E_s = E_0 e^{i\omega(t - z/c)}$. Why t – z/c instead of just t? Well… We’re looking at $E_s$ here as measured in P, not at $E_s$ at the glass plate itself: the field at P at time t is the field that passed the plate a time z/c earlier.

Now, we know that the wave ‘travels slower’ through the glass plate (in the sense that its phase velocity is less, as should be clear from the rather lengthy explanation on phase delay above; if n were smaller than one, we would have a phase advance instead). So if the glass plate is of thickness Δz, and the phase velocity is $v = c/n$, then the time it will take to travel through the glass plate will be Δz/(c/n) instead of Δz/c (speed is distance divided by time and, hence, time = distance divided by speed). So the additional time that is needed is $\Delta t = \Delta z/(c/n) - \Delta z/c = n\Delta z/c - \Delta z/c = (n-1)\Delta z/c$. That, then, implies that $E_{\text{after plate}}$ is equal to a rather monstrous-looking expression:

$$E_{\text{after plate}} = E_0 e^{i\omega[t - (n-1)\Delta z/c - z/c]} = e^{-i\omega(n-1)\Delta z/c}\,E_0 e^{i\omega(t - z/c)}$$

We get this by just replacing t by t – Δt in the expression for $E_s$.

So what? Well… We have a product of two complex numbers here and so we know that this involves adding angles – or subtracting angles in this case, rather, because we’ve got a minus sign in the exponent of the first factor. So, all that we are saying here is that the insertion of the glass plate retards the phase of the field by an amount equal to ω(n-1)Δz/c. What about that sum $E_{\text{after plate}} = E_s + E_a$ that we were supposed to get?

Well… We’ll use the formula for a first-order (linear) approximation of an exponential once again: $e^x \approx 1 + x$. Yes. We can do that because Δz is assumed to be very small, infinitesimally small in fact. [If it is not, then we’ll just have to assume that the plate consists of a lot of very thin plates.] So we can write that $e^{-i\omega(n-1)\Delta z/c} \approx 1 - i\omega(n-1)\Delta z/c$, and then we, finally, get that sum we wanted:

$$E_{\text{after plate}} = E_0 e^{i\omega(t - z/c)} - \frac{i\omega(n-1)\Delta z}{c}\,E_0 e^{i\omega(t - z/c)}$$

The first term is the original $E_s$ field, and the second term is the $E_a$ field. Geometrically, they can be represented as follows:

Why is $E_a$ perpendicular to $E_s$? Well… Look at the $-i = 1/i$ factor. Multiplication by $-i$ amounts to a clockwise rotation by 90°, and then just note that the magnitude of the vector must be small because of the ω(n-1)Δz/c factor.
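
Here is a quick numerical check of the last two steps: the exact phase factor versus its linear approximation, and the claim that $E_a$ is $E_s$ turned by −90°. The frequency, index and slice thickness below are assumed, illustrative values.

```python
import cmath
import math

C = 299_792_458.0  # m/s

# Illustrative (assumed) numbers: light near 600 nm, a glass slice 1 nm thick.
omega = 2 * math.pi * 5.0e14   # rad/s
n = 1.5                        # assumed index of refraction
dz = 1e-9                      # slice thickness, m

E_s = 1.0 + 0.0j                # E0 * exp(i*omega*(t - z/c)) at some instant, phase set to zero
phi = omega * (n - 1) * dz / C  # phase retardation omega*(n-1)*dz/c

exact = cmath.exp(-1j * phi) * E_s   # the plate simply delays the phase
E_a = -1j * phi * E_s                # first-order term: the field added by the plate
linear = E_s + E_a                   # E_s + E_a, using e^x ~ 1 + x

print(abs(exact - linear))             # second-order small, about phi**2 / 2
print(math.degrees(cmath.phase(E_a)))  # -90 degrees: E_a is E_s turned clockwise
```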

4. By now, you’ve either stopped reading (most probably) or, else, you wonder what I am getting at. Well… We have two formulas for $E_a$ now:
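
$$E_a = -\frac{\eta\, q_e}{2\epsilon_0 c}\cdot\frac{i\omega\, q_e E_0}{m(\omega_0^2 - \omega^2)}\,e^{i\omega(t - z/c)}$$

and

$$E_a = -\frac{i\omega(n-1)\Delta z}{c}\,E_0 e^{i\omega(t - z/c)}$$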

Equating both yields:

$$(n-1)\,\Delta z = \frac{\eta\, q_e^2}{2\epsilon_0 m(\omega_0^2 - \omega^2)}$$

But η, the number of charges per unit area, must be equal to NΔz, with N the number of charges per unit volume. Substituting and then cancelling the Δz finally gives us the formula we wanted, and that’s the classical dispersion relation whose properties we explored above:

$$n = 1 + \frac{N q_e^2}{2\epsilon_0 m(\omega_0^2 - \omega^2)}$$

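As a sanity check on the algebra, we can compute $E_a$ both ways for one thin slice and confirm that the two results agree when n is taken from the dispersion relation. All material parameters below are hypothetical. N was simply chosen so that n comes out somewhere around 1.34. The common factor $e^{i\omega(t - z/c)}$ is left out of both expressions.

```python
import math

Q_E = 1.602176634e-19    # C
M_E = 9.1093837015e-31   # kg
EPS0 = 8.8541878128e-12  # F/m
C = 299_792_458.0        # m/s

# Hypothetical material and light, for illustration only (all assumed values).
N = 5.0e28                      # oscillating electrons per m^3
omega0 = 2 * math.pi * 2.5e15   # resonant angular frequency, rad/s
omega = 2 * math.pi * 5.0e14    # angular frequency of the incoming light, rad/s
dz = 1e-10                      # thickness of one thin slice, m
E0 = 1.0
eta = N * dz                    # charges per unit area of the slice

# Route 1: field of the plane of charges, -(eta q_e / 2 eps0 c) * velocity,
# where the velocity of the charges is i*omega*x0.
x0 = Q_E * E0 / (M_E * (omega0**2 - omega**2))
E_a_plane = -(eta * Q_E / (2 * EPS0 * C)) * (1j * omega * x0)

# Route 2: the phase-delay picture, with n taken from the dispersion relation.
n = 1 + N * Q_E**2 / (2 * EPS0 * M_E * (omega0**2 - omega**2))
E_a_delay = -1j * omega * (n - 1) * dz * E0 / C

print(n, E_a_plane, E_a_delay)   # the two routes give the same E_a
```
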
Absorption and the absorption index

The model we used to explain the index of refraction had electron oscillators at its center. In the analysis we did, we did not introduce any damping factor. That’s obviously not correct: it means that a glass plate, once it had been illuminated, would continue to emit radiation, because the electrons would oscillate forever. When introducing damping, the denominator in our dispersion relation becomes $m(\omega_0^2 - \omega^2 + i\gamma\omega)$, instead of $m(\omega_0^2 - \omega^2)$. We derived this in our posts on oscillators. What it means is that the oscillator continues to oscillate with the same frequency as the driving force (i.e. not its natural frequency) – so that doesn’t change – but that there is an envelope curve, ensuring the oscillation dies out when the driving force is no longer being applied. The γ factor is the damping factor and, hence, determines how fast the damping happens.

We can see what it means by writing the complex index of refraction as $n = n' - in''$, with n’ and n’’ real numbers: n’ is the real part of n, and –n’’ is its imaginary part. Putting that complex n in the equation for the electric field behind the plate yields:

$$E_{\text{after plate}} = e^{-\omega n''\Delta z/c}\,e^{-i\omega(n'-1)\Delta z/c}\,E_0 e^{i\omega(t - z/c)}$$

This is the same formula that we had derived already, but now we have an extra exponential factor: $e^{-\omega n''\Delta z/c}$. It’s an exponential factor with a real exponent, because the two i‘s multiply to give $i^2 = -1$. The $e^{-x}$ function has a familiar shape (see below): $e^{-x}$ is 1 for x = 0, and between 0 and 1 for any positive x. That value will depend on the thickness of the glass sheet. Hence, it is obvious that the glass sheet weakens the wave as it travels through it. Hence, the wave must also come out with less energy (the energy being proportional to the square of the amplitude). That’s no surprise: the damping we put in for the electron oscillators is a friction force and, hence, must cause a loss of energy.

Note that it is the n’’ term – i.e. the imaginary part of the refractive index n – that determines the degree of absorption (or attenuation, if you want). Hence, n’’ is usually referred to as the “absorption index”.
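
To get a feel for the size of the effect, here is a small sketch of the attenuation factor. It uses ω/c = 2π/λ0, so the amplitude factor becomes exp(−2πn''Δz/λ0), and the energy fraction is its square. The absorption index n'' = 10⁻⁷ and the 1 cm thickness are assumed values, not data for real glass.

```python
import math


def transmission(n_imag: float, thickness_m: float, wavelength0_m: float):
    """Amplitude factor exp(-omega n'' dz / c) and energy fraction (its square)
    for light of vacuum wavelength wavelength0_m crossing a sheet of given thickness.
    Uses omega / c = 2 pi / lambda0."""
    amplitude = math.exp(-2 * math.pi * n_imag * thickness_m / wavelength0_m)
    return amplitude, amplitude**2


# Assumed absorption index and geometry, for illustration only.
amp, energy = transmission(n_imag=1e-7, thickness_m=0.01, wavelength0_m=600e-9)
print(amp, energy)
```

Because the factor is exponential in the thickness, two sheets stacked together multiply their amplitude factors.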

The complete dispersion relation

We need to add one more thing in order to get a fully complete dispersion relation. It’s the last thing: then we have a formula which can really be used to describe real-life phenomena. The one thing we need to add is that atoms have several resonant frequencies – even an atom with only one electron, like hydrogen! In addition, we’ll usually want to take into account the fact that a ‘material’ actually consists of various chemical substances, so that’s another reason to consider more than one resonant frequency. The formula is easily derived from our first formula (see the previous post), when we assumed there was only one resonant frequency. Indeed, when we have $N_k$ electrons per unit of volume, whose natural frequency is $\omega_k$ and whose damping factor is $\gamma_k$, then we can just add the contributions of all oscillators and write:

$$n = 1 + \frac{q_e^2}{2\epsilon_0 m}\sum_k \frac{N_k}{\omega_k^2 - \omega^2 + i\gamma_k\omega}$$

The index described by this formula yields the following curve:

So we have a curve with a positive slope, and a value n > 1, for most frequencies, except for a very small range of ω’s for which the slope is negative, and for which the index of refraction has a value n < 1. As Feynman notes, this range of ω’s, with its negative slope, is sometimes referred to as ‘anomalous’ dispersion but, in fact, there’s nothing ‘abnormal’ about it.

The interesting thing is the $i\gamma_k\omega$ term in the denominator, i.e. the imaginary component of the index, and how that compares with the (real) “resonance term” $\omega_k^2 - \omega^2$. If the resonance term becomes very small compared to $i\gamma_k\omega$, i.e. when ω gets close to a resonant frequency $\omega_k$, then the contribution of that resonance to the index becomes almost completely imaginary, which means that the absorption effect becomes dominant. We can see that effect in the spectrum of light that we receive from the Sun: there are ‘dark lines’, i.e. frequencies that have been strongly absorbed at the resonant frequencies of the atoms in the Sun and its ‘atmosphere’, and that allows us to actually tell what the Sun’s ‘atmosphere’ (or that of other stars) actually consists of.
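
The complete formula is straightforward to evaluate once we lump the constants together. In this sketch, K_k stands for N_k q_e²/(2ε0m), and everything is in arbitrary units. The two resonances and their damping factors are invented for illustration. The function returns n' and n'' with the convention n = n' − in'' used above. Absorption then shows up as a positive n'' that peaks near each resonance, and n' dips below one just above a resonance.

```python
def complex_index(omega, resonances):
    """n = 1 + sum_k K_k / (omega_k^2 - omega^2 + i*gamma_k*omega), where K_k stands in
    for N_k q_e^2 / (2 eps0 m). Returns (n', n'') with the convention n = n' - i n''."""
    n = 1 + sum(K / (wk**2 - omega**2 + 1j * g * omega) for K, wk, g in resonances)
    return n.real, -n.imag


# Two hypothetical resonances in arbitrary units: (K_k, omega_k, gamma_k).
RESONANCES = [(0.05, 1.0, 0.05), (0.08, 3.0, 0.1)]

omegas = [0.01 * i for i in range(1, 500)]
curve = [(w, *complex_index(w, RESONANCES)) for w in omegas]   # rows of (omega, n', n'')
strongest = max(curve, key=lambda row: row[2])                 # largest absorption index
print(strongest)
print(complex_index(0.5, RESONANCES), complex_index(1.05, RESONANCES))
```

Plotting n' and n'' from `curve` against ω gives the familiar shape. n' rises slowly, with a steep negative swing through each resonance, while n'' shows a narrow peak at each resonance.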

So… There we are. I am aware of the fact that this has been the longest post of all I’ve written. I apologize. But so it’s quite complete now. The only piece that’s missing is something on energy and, perhaps, some more detail on these electron oscillators. But I don’t think that’s so essential. It’s time to move on to another topic, I think.

# Euler’s spiral

When talking about diffraction, one of the more amusing curves is the curve showing the intensity of light near the edge of a shadow. It is shown below.

Light becomes more intense as we move away from the edge, then it overshoots (so it is brighter than further away), then the intensity wobbles and oscillates, to finally ‘settle’ at the intensity of the light elsewhere.

How do we get a curve like that? We get it through another amusing curve: the Cornu spiral (which was re-named as the Euler spiral for some reason I don’t understand), which we’ve encountered also when adding probability amplitudes. Let me first depict the ‘real’ situation below: we have an opaque object AB, so no light goes through AB itself. However, light does go past it, and so the object casts a shadow on a screen, which is denoted as QPR here. And so the curve above shows the intensity of the light near the edge of that shadow.

The first weird thing to note is what I said about diffraction of light through a slit (or a hole – in somewhat less respectful language) in my previous post: the diffraction patterns can be explained if we assume that there are sources distributed, with uniform density, across the open holes. This is a deep mystery, which I’ll attempt to explain later. As for now, I can only state what Feynman has to say about it: “Of course, actually there are no sources at the holes. In fact, that is the only place that there are certainly no sources. Nevertheless, we get the correct diffraction pattern by considering the holes to be the only places where there are sources.”

So we do the same here. We assume that we have a series of closely spaced ‘antennas’, or sources, starting from B, up to D, E, C and all the way up to infinity, and so we need to add the contributions – or the waves – from these sources to calculate the intensity at all of the points on the screen. Let’s start with the (random) point P. P defines the inflection point D: we’ll say the phase there is zero (because we can, of course, choose our point in time so as to make it zero). So we’ll associate the contribution from D with a tiny vector (an infinitesimal vector) with angle zero. That is shown below: it’s the ‘flat’ (horizontal) vector pointing straight east at the very center of this so-called Cornu spiral.

Now, in the neighborhood of D, i.e. just below or above point D, the phase difference will be very small, because the distance from those points near D to P will not differ much from the distance between D and P (i.e. the distance DP). However, as h increases, the phase difference will become larger and larger. It will not increase linearly with h: because of the geometry involved, the path difference – and, hence, the phase difference (remember – from the previous post – that the phase difference was the product of the wave number and the difference in distance) – will increase proportionally with the square of h. In fact, using similar triangles once again, we can easily show that this path difference EF can be approximated by $EF \approx h^2/2s$, with s the distance DP. However, don’t lose sleep if you don’t manage to figure that out. :-)
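
Here is a quick numerical check that the extra path grows with the square of h. For a source at height h above D, with DP = s, the exact extra distance is √(s² + h²) − s, which is close to h²/(2s) as long as h is much smaller than s. The value s = 1 is an arbitrary choice.

```python
import math


def path_difference(h: float, s: float) -> float:
    """Exact extra distance EF from a source a height h above D to P, with DP = s."""
    return math.hypot(s, h) - s


s = 1.0   # distance DP, arbitrary units
for h in (0.01, 0.05, 0.1, 0.2):
    exact = path_difference(h, s)
    approx = h**2 / (2 * s)
    print(f"h={h:<5} exact={exact:.6e} h^2/2s={approx:.6e}")
```

Doubling h roughly quadruples the path difference, and hence the phase difference. That is why the arrows in the spiral turn faster and faster.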

The point to note is that, when you look at that spiral above, the angle of each vector that we’re adding increases more and more, so that’s why we get a spiral, and not a polygon in a circle, such as the one we encountered in our previous post: the phase differences there were linearly proportional and, hence, each vector added a constant angle to the previous one. Likewise, if we go down from D to the edge B, the angles will decrease. Of course, if we’re adding contributions to get the amplitude or intensity for point P, we will not get any contributions from points below B. The last (or, I should say, the first) contribution that we get is denoted by the vector BP on that spiral curve, so if we want to get the total contribution, then we have to start adding vectors from there. [Don’t worry: you’ll understand why the other vectors, ‘down south’, are there in a few minutes.]

So we start from BP and go all the way… Well… You see that, once we’re ‘up north’, in the center of the upper-most spiral, we’re not adding much anymore, because the additional vectors are just sharply changing direction and going round and round and round. In short, most of the contribution to the amplitude of the resultant vector BP∞ is given by points near D. Now, we have chosen point P randomly, and you can easily see from that Cornu spiral that the amplitude, or the intensity rather (which is the square of the amplitude) of that vector BP∞, increases initially, to reach some maximum, depending upon where P is located above B, but then it falls and oscillates indeed, producing the curve with which we started this post.
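
In the limit of infinitely many infinitesimal arrows, the Cornu spiral is traced out by the Fresnel integrals C(v) and S(v), and the knife-edge curve at the start of this post follows from them. The sketch below is my own illustration, assuming plane-wave illumination. Here v is the distance from the geometric shadow edge on the screen, scaled by √(2/(λs)). The resultant amplitude is the chord from the spiral point for the edge B to the upper eye at (½, ½). Because C and S are odd functions, its components are (C(v) + ½, S(v) + ½). The integrals are computed with Simpson's rule rather than a library routine.

```python
import math


def fresnel(v: float, steps: int = 1000):
    """C(v) and S(v): integrals of cos and sin of (pi t^2 / 2) from 0 to v,
    computed with composite Simpson's rule (steps must be even).
    The points (C(v), S(v)) trace out the Cornu (Euler) spiral."""
    if v == 0:
        return 0.0, 0.0
    h = v / steps
    c_sum = s_sum = 0.0
    for i in range(steps + 1):
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        phase = math.pi * (i * h) ** 2 / 2
        c_sum += weight * math.cos(phase)
        s_sum += weight * math.sin(phase)
    return c_sum * h / 3, s_sum * h / 3


def edge_intensity(v: float) -> float:
    """Intensity near a straight edge, relative to the unobstructed intensity.
    v > 0 is on the lit side, v < 0 in the geometric shadow, v = 0 at its edge.
    The amplitude is the chord (C(v) + 1/2, S(v) + 1/2); the full spiral's chord
    has squared length 2, hence the factor 0.5."""
    c, s = fresnel(v)
    return 0.5 * ((c + 0.5) ** 2 + (s + 0.5) ** 2)


vs = [0.01 * i for i in range(401)]
intensities = [edge_intensity(v) for v in vs]
peak_index = max(range(len(vs)), key=intensities.__getitem__)
print(vs[peak_index], intensities[peak_index])    # first maximum, about 37% above the far-field value
print(edge_intensity(0.0), edge_intensity(-2.0))  # 1/4 at the shadow edge, small deep in the shadow
```

The overshoot, the wobbles that slowly die out on the lit side, and the smooth fade into the shadow are exactly the features of the intensity curve shown at the start of this post.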

OK. [...] So what else do we have here? Well… That Cornu spiral also shows how we should add arrows to get the intensity at point Q. We’d be adding arrows in the upper-most spiral only and, hence, we would not get much of a total contribution as a result. That’s what is marked by vector BQ. On the other hand, if we’d be adding contributions to calculate the intensity at a point much higher than P, i.e. R, then we’d be using pretty much all of the arrows, down from the spiral ‘south’ all the way up to the spiral ‘north’. So that’s BR obviously and, as you can see, most of the contribution comes, once again, from points near D, so that’s the points near the edge. [So now you know why we have an infinite number of arrows in both directions: we need to be able to calculate the intensity from any point on the screen really, below or above P.]

OK. What else? Well… Nothing. This is it really − for the moment that is. Just note that we’re not adding probability amplitudes here (unlike what we did a couple of months ago). We’re adding vectors representing something real here: electric field vectors. [As for how 'real' they are: I'll entertain you about that later. :-)]

This was rather short, wasn’t it? I hope you liked it because… Well… What will follow is actually much more boring, because it involves a lot more formulas. However, these formulas will help us get where we want to get, and that is to understand – somehow, if only from a classical perspective – why that empty space acts like an array of electromagnetic radiation sources.

Indeed, when everything is said and done, that’s the deep mystery of light really. Really really deep.

# Diffraction gratings

Diffraction gratings are fascinating. The iridescent reflections from the grooves of a compact disc (CD), or from oil films, soap bubbles: it is all the same principle (or closely related – to be precise). In my April 2014 posts, I introduced Feynman’s ‘arrows’ to explain it. Those posts talked about probability amplitudes, light as a bundle of photons, quantum electrodynamics. They were not wrong. In fact, the quantum-electrodynamical explanation is actually the only one that’s 100% correct (as far as we ‘know’, of course). But it is also more complicated than the classical explanation, which just explains light as waves.

To understand the classical explanation, one first needs to understand how electromagnetic waves interfere. That’s easy, you’ll say. It’s all about adding waves, isn’t it? And we have done that before, haven’t we? Yes. We’ve done it for sinusoidal waves. We also noted that, from a math point of view, the easiest way to go about it was to use vectors or complex numbers, and equate the real parts of the complex numbers with the actual physical quantities, i.e. the electric field in this case.

You’re right. Let’s continue to work with sinusoidal waves, but instead of having just two waves, we’ll consider a whole array of sources, because that’s what we’ll need to analyze when analyzing a diffraction grating.

First the simple case: two sources

Let’s first re-analyze the simple situation: two sources – or two dipole radiators as I called them in my previous post. The illustration below gives a top view of two such oscillators. They are separated, in the north-south direction, by a distance d.

Is that realistic? It is for radio waves: the wavelength of a 1 megahertz radio wave is 300 m (remember: λ = c/f). So, yes, we can separate two sources by a distance in the same order of magnitude as the wavelength of the radiation, but, as Feynman writes: “We cannot make little optical-frequency radio stations and hook them up with infinitesimal wires and drive them all with a given phase.”

For light, it will work differently – and we’ll describe how, but not now. As for now, we should continue with our radio waves.

The illustration above assumes that the radiation from the two sources is sinusoidal and has the same (maximum) amplitude A, but that the two sources might be out of phase: we’ll denote the difference by α. Hence, we can represent the radiation emitted by the two sources by the real part of the complex numbers $Ae^{i\omega t}$ and $Ae^{i(\omega t + \alpha)}$ respectively. Now, we can move our detector around to measure the intensity of the radiation from these two antennas. If we place our detector at some point P, sufficiently far away from the sources, then the angle θ will result in another phase difference, due to the difference in distance from point P to the two oscillators. From simple geometry, we know that this difference will be equal to d·sinθ. The phase difference due to the distance difference will then be equal to the product of the wave number k (i.e. the rate of change of the phase, expressed in radians, with distance, i.e. per meter) and that distance d·sinθ. So the phase difference at arrival (i.e. at point P) would be

$$\Phi_2 - \Phi_1 = \alpha + k\,d\sin\theta = \alpha + \frac{2\pi}{\lambda}\,d\sin\theta$$

That’s pretty obvious, but let’s play a bit with this, in order to make sure we understand what’s going on. The illustration below gives two examples: α = 0 and α = π.

How do we get these numbers 0, 2 and 4, which indicate the intensity, i.e. the amount of energy that the field carries past per second, which is proportional to the square of the field, averaged in time? [If it were (visible) light, instead of radio waves, the intensity would be the brightness of the light.]

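Well… In the first case, we have α = 0 and d = λ/2 and, hence, at an angle of 30 degrees, we have d·sin(30°) = (λ/2)(1/2) = λ/4. Therefore, $\Phi_2 - \Phi_1 = \alpha + (2\pi/\lambda)\,d\sin\theta = 0 + (2\pi/\lambda)(\lambda/4) = \pi/2$. So what? Well… Let’s add the waves. We will have some combined wave with amplitude $A_R$ and phase $\Phi_R$:

$$A_R e^{i(\omega t + \Phi_R)} = A_1 e^{i(\omega t + \Phi_1)} + A_2 e^{i(\omega t + \Phi_2)}$$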

Now, to calculate the length of this ‘vector’, i.e. the amplitude AR, we take the product of this complex number and its complex conjugate, and that will give us the length squared, and then we multiply it all out and so on and so on. To make a long story short, we’ll find that

$$A_R^2 = A_1^2 + A_2^2 + 2A_1A_2\cos(\Phi_2 - \Phi_1)$$

The last term in this sum is the interference effect. In the case we’ve been studying above (α = 0, d = λ/2 and θ = 30°), it is equal to zero because cos(π/2) = 0, so we get twice the intensity of one oscillator only. The other cases can be worked out in the same way.
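As a quick numerical check of the formula above, here is a minimal Python sketch. The function name `two_source_intensity` is my own (hypothetical), both sources are assumed to have the same amplitude, and the intensity of a single source is taken as the unit. With d = λ/2 it should reproduce the numbers 0, 2 and 4 for both α = 0 and α = π.

```python
import math

def two_source_intensity(theta_deg, d_over_lambda, alpha=0.0, A=1.0):
    """A_R^2 = A1^2 + A2^2 + 2*A1*A2*cos(Phi2 - Phi1), with A1 = A2 = A and
    Phi2 - Phi1 = alpha + (2*pi/lambda)*d*sin(theta)."""
    theta = math.radians(theta_deg)
    phase_diff = alpha + 2 * math.pi * d_over_lambda * math.sin(theta)
    return A**2 + A**2 + 2 * A * A * math.cos(phase_diff)

# d = lambda/2: compare the in-phase case (alpha = 0) with the opposite-phase case (alpha = pi)
for theta in (0, 30, 90):
    in_phase = two_source_intensity(theta, 0.5)
    opposite = two_source_intensity(theta, 0.5, alpha=math.pi)
    print(f"theta = {theta:2d} deg: alpha = 0 -> {in_phase:.3f}, alpha = pi -> {opposite:.3f}")
```

The in-phase pair is brightest straight ahead (θ = 0) and dark at θ = 90°, while the opposite-phase pair does exactly the reverse.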

Now, you should not think that the pattern is always symmetric, or simple, as the two illustrations below make clear.

With more oscillators, the patterns become even more interesting. The illustration below shows part of the intensity pattern of a six-dipole antenna array:

Let’s look at that now indeed: arrays with n oscillators.

Arrays with n oscillators

If we have six oscillators, like in the illustration above, we have to add something like this:

R = A[cos(ωt) + cos(ωt + Φ) + cos(ωt + 2Φ) + … + cos(ωt + 5Φ)]

From what we wrote above, it is obvious that the phase difference Φ can have two causes: the oscillators may be driven differently in phase, or we may be looking at them at an angle so that there is a difference in time delay. Hence, we have the same formula as the one above:

Φ = α + (2π/λ)·d·sinθ

Now, we have an interesting geometrical approach to finding the net amplitude AR. We can, once again, consider the various waves as vectors and add them, as shown below.

The length of all vectors is the same (A), and then we have the phase difference, i.e. the different angles: zero for $A_1$, Φ for $A_2$, 2Φ for $A_3$, etcetera. So as we’re adding these vectors, we’re going around and forming an equiangular polygon with n sides, with the vertices (corner points) lying on a circle with radius r. It requires just a bit of trigonometry to establish that the following equality must hold: A = 2r·sin(Φ/2). So that fixes r. We also have that the large angle OQT equals nΦ and, hence, $A_R = 2r\sin(n\Phi/2)$. We can now combine the results to find the following amplitude and intensity formula:

$$A_R = A\,\frac{\sin(n\Phi/2)}{\sin(\Phi/2)}, \qquad I = I_0\,\frac{\sin^2(n\Phi/2)}{\sin^2(\Phi/2)}$$
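The polygon construction can be checked numerically: simply add n complex arrows of length A, each turned by Φ relative to the previous one, and compare the length of the sum with the closed form $A\sin(n\Phi/2)/\sin(\Phi/2)$. This is a minimal sketch; the function names are hypothetical and n = 6 just mirrors the six-dipole example.

```python
import cmath
import math

def phasor_sum(n, phi, A=1.0):
    """Add n arrows of length A, the k-th one rotated by k*phi (complex-number addition)."""
    return sum(A * cmath.exp(1j * k * phi) for k in range(n))

def closed_form_amplitude(n, phi, A=1.0):
    """A_R = A*sin(n*phi/2)/sin(phi/2), from the polygon construction (phi not a multiple of 2*pi).
    The sign can be negative; only its absolute value is the length of the resultant."""
    return A * math.sin(n * phi / 2) / math.sin(phi / 2)

n = 6
for phi in (0.1, 0.4, 1.0, 2.5):
    print(f"phi = {phi}: |sum| = {abs(phasor_sum(n, phi)):.6f}, "
          f"closed form = {abs(closed_form_amplitude(n, phi)):.6f}")
```

Squaring either number gives the intensity in units of $I_0$, which is how the formula for I follows from the one for $A_R$.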

This formula is trivial for n = 1 and, for n = 2, it gives us the results which were shown above already. But here we want to know how this thing behaves for large n. Both the numerator, $\sin^2(n\Phi/2)$, and the denominator, $\sin^2(\Phi/2)$, are – obviously – smaller than or equal to 1, but the numerator oscillates n times faster than the denominator. It can be demonstrated that this function of the angle Φ reaches its maximum value for Φ = 0. Indeed, taking the limit gives us $I = n^2 I_0$. [We can intuitively see this because, for small Φ expressed in radians, we can substitute Φ/2 and nΦ/2 for sin(Φ/2) and sin(nΦ/2), and then the $(\Phi/2)^2$ factor cancels out to leave $n^2$.]

It’s a bit more difficult to understand what happens next. If Φ becomes a bit larger, the ratio of the two sines begins to fall off (so it becomes smaller than $n^2$). Note that the numerator, i.e. $\sin^2(n\Phi/2)$, will be equal to one if nΦ/2 = π/2, i.e. if Φ = π/n, and the ratio $\sin^2(n\Phi/2)/\sin^2(\Phi/2)$ then becomes $\sin^2(\pi/2)/\sin^2(\pi/2n) = 1/\sin^2(\pi/2n)$. Again, if we assume that n is (very) large, we can approximate and write that this ratio is more or less equal to $1/(\pi^2/4n^2) = 4n^2/\pi^2$. That means that the intensity there will be $4/\pi^2$ times the intensity of the beam at the maximum, i.e. 40.53% of it. That’s the point at nΦ/2π = 0.5 on the graph below.

The graph above has a re-scaled vertical as well as a re-scaled horizontal axis. Indeed, instead of I, the vertical axis shows $I/n^2I_0$, so the maximum value is 1. And the horizontal axis does not show Φ but nΦ/2π, so if Φ = π/n, then nΦ/2π = 0.5 indeed. [Don’t worry about the dotted curve: that’s the solid-line curve multiplied by 10: it’s there to make sure you see what’s going on, as this ratio of those sines becomes very small very rapidly indeed.]

So, once we’re past that 40.53% point, we get to our first minimum, which is reached at nΦ/2π = 1 or Φ = 2π/n. The numerator $\sin^2(n\Phi/2)$ equals $\sin^2(\pi) = 0$ there indeed, so the whole ratio becomes zero. Then it goes up again, to our second maximum, which we get when our numerator comes close to one again, i.e. when $\sin^2(n\Phi/2) \approx 1$. That happens when nΦ/2 = 3π/2, or Φ = 3π/n. Again, when n is (very) large, Φ will be very small, and so we can substitute $\Phi^2/4$ for the denominator $\sin^2(\Phi/2)$. We then get a ratio equal to $1/(9\pi^2/4n^2) = 4n^2/9\pi^2$, or an intensity equal to $4n^2I_0/9\pi^2$, i.e. only 4.5% of the intensity at the (first) maximum. So that’s tiny. [Well... All is relative, of course. :-)] We can go on and on like that but that’s not the point here: the point is that we have a very sharp central maximum with very weak subsidiary maxima on the sides.
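The percentages above can be reproduced by evaluating the rescaled intensity $I/n^2I_0 = \sin^2(n\Phi/2)/(n^2\sin^2(\Phi/2))$ directly. This is a minimal sketch: n = 1000 is an arbitrary stand-in for ‘large n’, and the function name is hypothetical.

```python
import math

def relative_intensity(n, phi):
    """I/(n^2 I0) = sin^2(n*phi/2) / (n^2 * sin^2(phi/2)): the rescaled vertical axis."""
    if phi == 0:
        return 1.0  # limit at the central maximum
    return (math.sin(n * phi / 2) / math.sin(phi / 2)) ** 2 / n**2

n = 1000  # arbitrary 'large n'
for label, phi in (("pi/n", math.pi / n), ("2*pi/n", 2 * math.pi / n), ("3*pi/n", 3 * math.pi / n)):
    # horizontal axis of the graph: n*phi/(2*pi) = 0.5, 1.0 and 1.5
    print(f"phi = {label}: n*phi/2pi = {n * phi / (2 * math.pi):.1f}, "
          f"I/(n^2 I0) = {relative_intensity(n, phi):.4f}")

print(f"4/pi^2 = {4 / math.pi**2:.4f}, 4/(9 pi^2) = {4 / (9 * math.pi**2):.4f}")
```

For large n the exact values land very close to the small-angle estimates $4/\pi^2$ and $4/9\pi^2$, which is what justifies the approximation.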

But what about that big lobe at 30 degrees on that graph with the six-dipole antenna? Relax. We’re not done yet with this ‘quick’ analysis. Let’s look at the general case from yet another angle, so to say. :-)

The general case

To focus our minds, we’ve depicted that array with n oscillators below. Once again, we note that the phase difference between two sources, one to the next, will depend on (1) the intrinsic phase difference between them, which we denote by α, and (2) the time delay because we’re observing the system in a given direction θ from the normal, which effect we calculated as equal to (2π/λ)·d·sinθ. So the whole effect is Φ = α + (2π/λ)·d·sinθ = α + k·d·sinθ, with k the wave number.

To make things simple, let’s first assume that α = 0. We’re then in the case that we described above: we’ll have a sharp maximum at Φ = 0, so that means θ = 0. It’s easy to see why: all oscillators are in phase and so we have maximum positive (or constructive) interference.

Let’s now examine the first minimum, at Φ = 2π/n. When looking back at that geometrical interpretation, with the polygon, all the arrows come back to the starting point: we’ve completed a full circle. Indeed, n times Φ gives nΦ = n·2π/n = 2π. So what’s going on here? Well… If we put that value in our formula Φ = α + (2π/λ)·d·sinθ, we get 2π/n = 0 + (2π/λ)·d·sinθ or, getting rid of the 2π factor, n·d·sinθ = λ.

Now, n·d is the total length of the array, i.e. L, and, from the illustration above, we see that n·d·sinθ = L·sinθ = Δ. So we have that n·d·sinθ = λ = Δ. Hence, Δ is equal to one wavelength. That means that the total phase difference between the first and the last oscillator is equal to 2π, and the contributions of all the oscillators in-between are uniformly distributed in phase between 0° and 360°. The net result is a vector with amplitude $A_R = 0$ and, hence, the intensity is zero as well.
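Here is a small numerical illustration of the first-minimum condition n·d·sinθ = λ. It assumes a hypothetical array of six in-phase sources (α = 0) spaced half a wavelength apart, so n·d = 3λ. It computes the angle of the first minimum and then adds the six phasors at that angle to confirm that they cancel. Note that such a minimum only exists if n·d ≥ λ.

```python
import cmath
import math

def array_amplitude(n, d_over_lambda, theta, alpha=0.0):
    """|A_R|/A for n equally spaced sources, with Phi = alpha + 2*pi*(d/lambda)*sin(theta)."""
    phi = alpha + 2 * math.pi * d_over_lambda * math.sin(theta)
    return abs(sum(cmath.exp(1j * k * phi) for k in range(n)))

n, d_over_lambda = 6, 0.5                        # hypothetical array: six sources, d = lambda/2
theta_min = math.asin(1 / (n * d_over_lambda))   # first minimum: n*d*sin(theta) = lambda
print(f"first minimum at {math.degrees(theta_min):.2f} degrees")
print("amplitude there:", array_amplitude(n, d_over_lambda, theta_min))
print("amplitude at theta = 0:", array_amplitude(n, d_over_lambda, 0.0))
```

At θ = 0 the six arrows line up (amplitude 6, intensity 36 = n²). At the first minimum they close the hexagon and the sum is zero up to rounding error.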

OK, you’ll say, you’re just repeating yourself here. What about the other lobe or lobes? Well… Let’s go back to that maximum. We had it at Φ = 0, but we will also have it at Φ = 2π, and at Φ = 4π, and at Φ = 6π etcetera, etcetera. We’ll have such a sharp maximum – the maximum, in fact – at any Φ = m⋅2π, where m is any integer. Now, plugging that into the Φ = α + (2π/λ)·d·sinθ formula (again, assuming that α = 0), we get m⋅2π = (2π/λ)·d·sinθ or d·sinθ = mλ.

While that looks very similar to our n·d·sinθ = λ = Δ condition for the (first) minimum, we’re not looking at that Δ here but at the path difference δ between two successive sources, and so we have δ = Δ/n = mλ. What’s being said here is that each successive source is out of phase by m·360° and, because being out of phase by a multiple of 360° obviously means that you’re in phase once again, all sources are, once again, contributing in phase and produce a maximum that is just as good as the one we had for m = 0. Now, these maxima will also have a (first) minimum described by that other formula above, and so that’s how we get that pattern of lobes with weak ‘side lobes’.

Conditions

Now, the conditions presented above for maxima and minima obviously all depend on the distance d, i.e. the spacing of the array, and the wavelength λ. That brings us to an interesting point: if d is smaller than λ (so if the spacing is smaller than one wavelength), we have |m| = (d/λ)·|sinθ| < 1, so the only integer solution is m = 0. So we only have one beam in that case, the so-called zero-order beam centered at θ = 0. [Note that we also have a beam in the opposite direction.]

The point to note is that we can only have subsidiary great maxima if the spacing d of the array is greater than the wavelength λ. If we have such subsidiary great maxima, we’ll call them first-order, second-order etcetera beams, according to the value m.
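The condition d·sinθ = mλ (with α = 0) can be turned into a small function that lists which orders m exist for a given spacing-to-wavelength ratio d/λ, and at what angles. This is a sketch only. The function name is hypothetical, and the backward beams mentioned above are not listed.

```python
import math

def great_maxima(d_over_lambda):
    """Orders m and angles (degrees) satisfying d*sin(theta) = m*lambda, for alpha = 0."""
    m_max = math.floor(d_over_lambda)   # |sin(theta)| <= 1 means |m| <= d/lambda
    return [(m, math.degrees(math.asin(m / d_over_lambda)))
            for m in range(-m_max, m_max + 1)]

print(great_maxima(0.8))   # spacing smaller than a wavelength: zero-order beam only
print(great_maxima(2.5))   # spacing of 2.5 wavelengths: orders m = -2 ... +2
```

With d < λ only the zero-order beam survives. With d = 2.5λ we get first-order beams at about ±23.6° and second-order beams at about ±53.1°.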

Diffraction gratings

We are now, finally, ready to discuss diffraction gratings. A diffraction grating, in its simplest form, is a plane glass sheet with scratches on it: several hundred grooves, or even several thousand, to the millimeter. That is because the spacing has to be of the same order of magnitude as the wavelength of light, so that’s 400 to 700 nanometers (nm) indeed – with the 400-500 nm range corresponding to violet-blue light, and the (longer) 700+ nm range corresponding to red light. Remember, a nanometer is a billionth of a meter ($1\times10^{-9}$ m), so even one thousandth of a millimeter is 1000 nanometers, i.e. longer than the wavelength of red light. Of course, from what we wrote above, it is obvious that the spacing d must be wider than the wavelength of interest to produce first-, second- and higher-order beams and, therefore, diffraction but, still, the order of magnitude must be the same to produce anything of interest. Isn’t it amazing that scientists were able to do such diffraction experiments toward the end of the 18th century already? One of the earliest apparatuses, made in 1785 by the first director of the United States Mint, used hair strung between two finely threaded screws. In any case, let’s go back to the physics of it.
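To get a feel for the numbers, here is a minimal sketch applying the same maximum condition, d·sinθ = mλ, to a grating specified by its groove density. The 600 grooves per millimeter is an assumed, hypothetical value within the ‘several hundred’ range mentioned above, and the function name is also mine. The result is the angle of each colour’s first-order beam.

```python
import math

def diffraction_angle_deg(grooves_per_mm, wavelength_nm, m=1):
    """Angle of the order-m beam from d*sin(theta) = m*lambda, or None if |sin(theta)| > 1."""
    d_nm = 1e6 / grooves_per_mm          # groove spacing: 1 mm = 10^6 nm
    s = m * wavelength_nm / d_nm
    if abs(s) > 1:
        return None                      # that order does not exist for this wavelength
    return math.degrees(math.asin(s))

for wavelength in (400, 550, 700):       # violet, green, red (nm)
    print(wavelength, "nm:", round(diffraction_angle_deg(600, wavelength), 1), "degrees")
```

Longer wavelengths come out at larger angles, which is why the first-order beams fan out from violet-blue to red. With 1000 grooves per millimeter (d = 1000 nm), red light at 700 nm has no second-order beam at all.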

In my previous post, I already noted Feynman’s observation that “we cannot literally make little optical-frequency radio stations and hook them up with infinitesimal wires and drive them all with a given phase.” What happens is something similar to the following set-up, and I’ll quote Feynman again (Vol. I, p. 30-3), just because it’s easier to quote than to paraphrase: “Suppose that we had a lot of parallel wires, equally spaced at a spacing d, and a radio-frequency source very far away, practically at infinity, which is generating an electric field which arrives at each one of the wires at the same phase. Then the external electric field will drive the electrons up and down in each wire. That is, the field which is coming from the original source will shake the electrons up and down, and in moving, these represent new generators. This phenomenon is called scattering: a light wave from some source can induce a motion of the electrons in a piece of material, and these motions generate their own waves.”

When Feynman says “light” here, he means electromagnetic radiation in general. But so what’s happening with visible light? Well… All of the glass in that piece that makes up our diffraction grating scatters light, but the notches in it scatter differently than the rest of the glass. The light going through the ‘rest of the glass’ goes straight through (a phenomenon which should be explained in itself, but we don’t do that here), but the notches act as sources and produce secondary or even tertiary beams, as illustrated by the picture below, which shows a flash of light seen through such a grating, showing three diffracted orders: the order m = 0 corresponds to a direct transmission of light through the grating, while the first-order beams (m = +1 and m = -1) show colors with increasing wavelengths (from violet-blue to red), being diffracted at increasing angles.

The ‘mechanics’ are very complicated, and the correct explanation in physics involves a good understanding of quantum electrodynamics, which we touched upon in our April, 2014 posts. I won’t do that here, because here we are introducing the so-called classical theory only. This classical theory does away with all of the complexity of a quantum-electrodynamical explanation and replaces it by what is now known as the Huygens-Fresnel Principle, which was first formulated in 1678 (!), and which basically states that “every point which a luminous disturbance reaches becomes a source of a spherical wave, and the sum of these secondary waves determines the form of the wave at any subsequent time.”

This comes from Wikipedia, as do the illustrations below. It does not only ‘explain’ diffraction gratings, but it also ‘explains’ what happens when light goes through a slit, cf. the second (animated) illustration.

Now that, light being diffracted as it is going through a slit, is obviously much more mysterious than a diffraction grating – and, you’ll admit, a diffraction grating is already mysterious enough, because it’s rather strange that only certain points in the grating (i.e. the notches) would act as sources, isn’t it? Now, if that’s difficult to understand, it’s even more difficult to understand why an empty space, i.e. a slit, would act as a diffraction grating! However, because this post has become way too long already, we’ll leave this discussion for later.

Introduction: Scale Matters

One of the points which Richard Feynman, as a great physics teacher, does admirably well is to point out why scale matters. In fact, ‘old’ physics is not incorrect per se. It’s just that ‘new’ physics analyzes stuff at a much smaller scale.

For example, Snell’s Law, or Fermat’s Principle of Least Time, which were ‘discovered’ nearly 400 years ago – and they are actually older, because they formalize something that the Greeks had already found out: refraction of light, as it travels from one medium (air, for example) into another (water, for example) – are still fine when studying focusing lenses and mirrors, i.e. geometrical optics. The dimensions of the analysis, or the equipment involved (i.e. the lenses or the mirrors), are huge as compared to the wavelength of the light and, hence, we can effectively look at light as a beam that travels from one point to another in a straight line, that bounces off a surface, or as a beam that gets refracted when it passes from one medium to another.

However, when we let the light pass through very narrow slits, it starts behaving like a wave. Geometrical optics does not help us, then, to understand its behavior: we will, effectively, analyze light as a wave-like thing at that scale, and analyze wave-like phenomena, such as interference, the Doppler effect and what have you. That level of analysis is referred to as the classical theory of electromagnetic radiation, and it’s what we’ll be introducing in this post.

The analysis of light as photons, i.e. as a bunch of ‘particles’ described by some kind of ‘wave function’ (which does not describe any real wave, but only some ‘probability amplitude’), is the third and final level of analysis, referred to as quantum mechanics or, to be more precise, as quantum electrodynamics (QED). [Note the terminology: quantum mechanics describes the behavior of matter particles, such as protons and electrons, while quantum electrodynamics (QED) describes the nature of photons, a force-carrying particle, and their interaction with matter particles.]

But so we’ll focus on the second level of analysis in this post.

Different mathematical approaches

One other thing which Feynman points out in his Lectures is that, even within a well-agreed level of analysis, there are different mathematical approaches to a problem. In fact, while, at any level of analysis, there’s (probably) only one fully mathematically correct analysis, approximate approaches may actually be easier to work with, not only because they actually allow us to solve a practical problem, but also because they help us to understand what’s going on.

Feynman’s treatment of electromagnetic radiation (Volume I, Chapters 28 to 34) is a case in point. While he notes that Maxwell’s field equations are actually the ones to be used, he writes them in a mathematical form that we can understand more easily, and then simplifies that mathematical form even further, in order to derive all that a sophomore student is supposed to know about electromagnetic radiation (EMR), which, of course, not only includes what we call light but also radio waves, radar waves, infrared waves and, on the other side of the spectrum, x-rays and gamma rays.

But let’s get down to business now.

The oscillating charge

Radiation is caused by some far-away electric charge (q) that’s moving in various directions in a non-uniform way, i.e. it is accelerating or decelerating, and perhaps reversing direction in the process. From our point of view (P), we draw a unit vector $\mathbf{e}_{r'}$ in the direction of the charge. [If you want a drawing, there's one further down.]

We write r’ (r prime), not r, because it is the retarded distance: when we look at the charge, we see where it was r’/c seconds ago: r’/c is indeed the time that’s needed for some influence to travel from the charge to the here and now, i.e. to P. So now we can write Coulomb’s Law:

$$\mathbf{E}_1 = -\frac{q\,\mathbf{e}_{r'}}{4\pi\epsilon_0 r'^2}$$

This formula can quickly be explained as follows:

1. The minus sign makes the direction of the force come out alright: like charges do not attract but repel, unlike gravitation. [Indeed, for gravitation, there’s only one ‘charge’, a mass, and masses always attract. Hence, for gravitation, the force law is that like charges attract, but so that's not the case here.]
2. $\mathbf{E}$ and $\mathbf{e}_{r'}$ and, hence, the electric force, are all directed along the line of sight.
3. The Coulomb field is proportional to the amount of charge, and the factor of proportionality is $1/4\pi\epsilon_0 r'^2$.
4. Finally, and most importantly in this context (study of EMR), the influence quickly diminishes with the distance: it varies inversely as the square of the distance (i.e. it varies as the inverse square).
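Points 1 and 4 can be checked with a few lines of Python. This is a minimal sketch with plain tuples as 3-vectors. The function name and the 1 nC test charge are my own illustrative choices, and ε₀ is the CODATA 2018 value. Note that $\mathbf{e}_{r'}$ points from P toward the charge, which is exactly why the minus sign is needed.

```python
import math

EPS0 = 8.8541878128e-12  # vacuum permittivity in F/m (CODATA 2018)

def coulomb_field(q, charge_pos, P):
    """E1 = -q e_r' / (4 pi eps0 r'^2), with e_r' the unit vector from P toward the charge."""
    rx, ry, rz = (c - p for c, p in zip(charge_pos, P))   # vector from P to the charge
    r = math.sqrt(rx * rx + ry * ry + rz * rz)
    k = -q / (4 * math.pi * EPS0 * r**2)                  # note the minus sign
    return (k * rx / r, k * ry / r, k * rz / r)

q = 1e-9  # a positive charge of 1 nC at the origin (illustrative value)
print(coulomb_field(q, (0, 0, 0), (1, 0, 0)))   # x-component > 0: the field points away from q
print(coulomb_field(q, (0, 0, 0), (2, 0, 0)))   # twice as far: a quarter of the field
```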

Coulomb’s Law is not all that comes out of Maxwell’s field equations: they also cover electrodynamics. Fortunately so, because we are, indeed, talking about moving charges here, so electrostatics is only part of the picture and, in fact, the least important one in this case. :-) That’s why I wrote $\mathbf{E}_1$, with 1 as a subscript, above, not E.

So we have a second term, and I’ll actually be introducing a third term in a minute or so. But let’s first look at the second term. I am not sure how Feynman derives it from Maxwell’s equations – I am sure I’ll see the light :-) when reading Volume II – but, from Maxwell’s equations, he does, somehow, derive the following, secondary, effect:

$$\mathbf{E}_2 = -\frac{q}{4\pi\epsilon_0}\,\frac{r'}{c}\,\frac{d}{dt}\left(\frac{\mathbf{e}_{r'}}{r'^2}\right)$$

This is a term I struggled with on a first read, and I still do. As mentioned above, I need to read Feynman’s Volume II, I guess. But, while I still don’t understand the why, I now understand what this expression catches. The term between brackets is the Coulomb effect, which we mentioned above already, and the time derivative is the rate of change. We multiply that with the time delay (i.e. r’/c). So what’s going on? As Feynman writes it: “Nature seems to be attempting to guess what the field at the present time is going to be, by taking the rate of change and multiplying by the time that is delayed.”

OK. As said, I don’t really understand where this formula comes from but it makes sense, somehow. As for now, we just need to answer another question in order to understand what’s going on: in what direction is the Coulomb field changing?

It could be either: if the charge is moving along the line of sight, $\mathbf{e}_{r'}$ won’t change but r’ will. However, if r’ does not change, then it’s $\mathbf{e}_{r'}$ that changes direction, and that change will be perpendicular to the line of sight, or transverse (as opposed to radial), as Feynman puts it. Or, of course, it could be a combination of both. [Don't worry too much if you're not getting this: we will need this again in just a minute or so, and then I will also give you a drawing so you'll see what I mean.]

The point is, these first two terms are actually not important because electromagnetic radiation is given by the third effect, which is written as:

$$\mathbf{E}_3 = -\frac{q}{4\pi\epsilon_0 c^2}\,\frac{d^2\mathbf{e}_{r'}}{dt^2}$$

Wow! This looks even more complicated, doesn’t it? Let’s analyze it. The first thing to note is that there is no r’ or $r'^2$ in this equation. However, that’s an optical illusion of sorts, because r’ does matter when looking at that second-order derivative. How? Well… Let’s go step by step and first look at that second-order derivative. It’s the acceleration (or deceleration) of $\mathbf{e}_{r'}$. Indeed, visualize $\mathbf{e}_{r'}$ wiggling about, trying to follow the charge by pointing at where the charge was r’/c seconds ago. Let me help you here by, finally, inserting that drawing I promised you.

This acceleration will have a transverse as well as a radial component: we can imagine the end of $\mathbf{e}_{r'}$ (i.e. the point of the arrow) being on the surface of a unit sphere indeed. So as it wiggles about, the tip of the arrow moves back a bit from the tangential line. That’s the radial component of the acceleration. It’s easy to see that it’s quite small as compared to the transverse component, which is the component along the line that’s tangent to the surface (i.e. perpendicular to $\mathbf{e}_{r'}$).

Now, we need to watch out: we are not talking displacement or velocity here but acceleration. Hence, even if the displacement of the charge is very small, and even if velocities would not be phenomenal either (i.e. non-relativistic), the acceleration involved can take on any value really. Hence, even with small displacements, we can have large accelerations, so the radial component is small relative to the transverse component only, not in an absolute sense.

That being said, it’s easy to see that both the transverse as well as the radial component depend on the distance r’ but in a different way. I won’t bother you with the geometrical proof (it’s not that obvious). Just accept that the radial component varies, more or less, as the inverse square of the distance. Hence, we will simplify and say that we’re considering large distances r’ only – i.e. large in comparison to the length of the unit vector, which just means large in comparison to one (1) – and then it’s only the transverse component of the acceleration that matters, which we’ll denote by $a_x$.

However, if we drop that radial component, then we should drop $\mathbf{E}_1$ as well, because the Coulomb effect will be very small as compared to the radiation effect (i.e. $\mathbf{E}_3$). And, then, if we drop $\mathbf{E}_1$, we can drop the ‘correction’ $\mathbf{E}_2$ as well, of course. Indeed, that’s what Feynman does. He ends up with this third term only, which he terms the law of radiation:

$$E_x(t) = -\frac{q\,a_x(t - r/c)}{4\pi\epsilon_0 c^2 r}$$

So there we are. That’s all I wanted to introduce here. But let’s analyze it a bit more. Just to make sure we’re all getting it here.

All that simplification business above is tricky, you’ll say. First, why do we write t – r/c for the retarded time (t’)? It should be t – r’/c, no? You’re right. There’s another simplification here: we fix the delay time, assuming that the charge only moves very small distances at an effectively constant distance r. Think of some far-away antenna indeed.

Hmm… But then we have that $1/c^2$ factor, so that should reduce the effect to zilch, shouldn’t it? And then… Hey! Wait a minute! Where does that r suddenly come from? Well, we’ve replaced $d^2\mathbf{e}_{r'}/dt^2$ by the lateral acceleration of the charge itself (i.e. its component perpendicular to the line of sight, denoted by $a_x$) divided by r. That’s just similar triangles.
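The ‘similar triangles’ step can be checked numerically. The following is a minimal sketch under assumed, hypothetical numbers: P sits at the origin, and a charge oscillates sideways with a tiny amplitude at a large fixed distance R. We take the second time derivative of the transverse component of $\mathbf{e}_{r'}$ by finite differences and compare it with $a_x/R$. As in the text, the retardation is treated as a fixed delay, so it is left out.

```python
import math

R = 1000.0     # hypothetical distance from P to the oscillating charge (m)
X0 = 0.01      # hypothetical amplitude of the charge's sideways motion (m)
OMEGA = 1.0    # angular frequency (rad/s), arbitrary

def x(t):
    """Transverse position of the charge, which sits at (x(t), R) with P at the origin."""
    return X0 * math.cos(OMEGA * t)

def e_x(t):
    """Transverse component of the unit vector e_r' pointing from P to the charge."""
    return x(t) / math.hypot(x(t), R)

def second_derivative(f, t, h=1e-2):
    """Central finite-difference estimate of f''(t)."""
    return (f(t + h) - 2 * f(t) + f(t - h)) / h**2

t = 0.3
a_x = -OMEGA**2 * x(t)                    # exact acceleration of the charge
print(second_derivative(e_x, t), a_x / R)  # the two values nearly coincide
```

The agreement is close because, for small sideways displacements, $\mathbf{e}_{r'}$ turns through an angle of about x/R, so its acceleration is about $a_x/R$.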

Phew! That’s a lot of simplifications and/or approximations indeed. How do we know this law really works? And, if it does, for what distance? When is that 1/r part (i.e. $\mathbf{E}_3$) so large as compared to the other two terms ($\mathbf{E}_1$ and $\mathbf{E}_2$) that the latter two don’t matter anymore? Well… That seems to depend on the wavelength of the radiation, but we haven’t introduced that concept yet. Let me conclude this first introduction by just noting that this ‘law’ can easily be confirmed by experiment.

A so-called dipole oscillator or radiator can be constructed, as shown below: a generator drives electrons up and down in two wires (A and B). Why do we put the generator in the middle? That’s because we want a net effect: the radiation effect of the electrons in the wires connecting the generator with A and B will be neutral, because the electrons move right next to each other in opposite directions. With the generator in the middle, A and B form one antenna, which we’ll denote by G (for generator).

Now, another antenna can act as a receiver, and we can amplify the signal to hear it. That’s the D (for detector) shown below. Now, one of the consequences of the above ‘law’ for electromagnetic radiation is, obviously, that the strength of the received signal should become weaker as we turn the detector. The strongest signal should be when D is parallel to G. At point 2, there is a projection effect and, hence, the strength of the field should be less. Indeed, remember that the strength of the field is proportional to the acceleration of the charge projected perpendicular to the line of sight. Hence, at point 3, it should be zero, because the projection is zero.

Now, that’s what an experiment like this would indeed confirm. [I am tempted now to explain how a radio receiver works, but I will resist the temptation.]

I just need to make a last point here in order to make sure that we understand the formula above and – more importantly – that we can use it in subsequent chapters without having to wonder where it comes from. The formula above implies that the direction of the field is at right angles to the line of sight. Now, if a charge is just accelerating up and down, in a motion of very small amplitude, i.e. like the motion in that antenna, then the magnitude (or strength let’s say) of the field will be given by the following formula:

$$E(t) = -\frac{q\,a(t - r/c)\sin\theta}{4\pi\epsilon_0 c^2 r}$$

θ, in this formula, is the angle between the axis of motion and the line of sight, as illustrated below:
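As a sketch of what this formula predicts for the dipole experiment, the snippet below evaluates $E = -q\,a(t - r/c)\sin\theta/(4\pi\epsilon_0 c^2 r)$, which is the law of radiation as Feynman writes it for a charge accelerating along one axis. It does this for a few angles and for two distances. The charge, acceleration and distances are arbitrary illustrative values, and `a_retarded` simply stands for the acceleration already evaluated at the retarded time.

```python
import math

EPS0 = 8.8541878128e-12   # vacuum permittivity (F/m)
C = 299_792_458.0         # speed of light (m/s)

def radiation_field(q, a_retarded, theta, r):
    """E = -q a(t - r/c) sin(theta) / (4 pi eps0 c^2 r).

    a_retarded: acceleration of the charge at the retarded time t - r/c.
    theta: angle between the axis of motion and the line of sight."""
    return -q * a_retarded * math.sin(theta) / (4 * math.pi * EPS0 * C**2 * r)

broadside = radiation_field(1.0, 1.0, math.pi / 2, 1.0)
for deg in (90, 45, 0):
    ratio = radiation_field(1.0, 1.0, math.radians(deg), 1.0) / broadside
    print(f"theta = {deg:2d} degrees: field relative to broadside = {ratio:.3f}")
print("doubling r:", radiation_field(1.0, 1.0, math.pi / 2, 2.0) / broadside)
```

The field is strongest at right angles to the motion, drops with the projection factor sinθ, vanishes along the axis of motion, and falls off as 1/r rather than 1/r².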

So… That’s all we need to know for now. We’re done. As for now that is. This was quite technical, I guess, but I am afraid the next post will be even more technical. Sorry for that. I guess this is just a piece we need to get through.

Post scriptum:

You’ll remember that, with moving and accelerating charges, we should also have a magnetic field, usually denoted by B. That’s correct. If we have a changing electric field, then we will also have a magnetic field. There’s a formula for B:

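$$\mathbf{B} = -\frac{\mathbf{e}_{r'}\times\mathbf{E}}{c} = -\frac{|\mathbf{e}_{r'}|\,|\mathbf{E}|\,\sin(\mathbf{e}_{r'},\mathbf{E})}{c}\,\mathbf{n} = -\frac{E}{c}\,\mathbf{n}$$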

This is a vector cross-product. The angle between the unit vector $\mathbf{e}_{r'}$ and $\mathbf{E}$ is π/2, so the sine is one. The vector n is the vector normal to both vectors as defined by the right-hand screw rule. [As for the minus sign, note that $-\mathbf{a}\times\mathbf{b} = \mathbf{b}\times\mathbf{a}$, so we could have reversed the vectors: the minus sign just reverses the direction of the normal vector.] In short, the magnetic field vector B is perpendicular to E, but its magnitude is tiny: E/c. That’s why Feynman neglects it, but we will come back to that in later posts.
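To see the two properties just mentioned (B perpendicular to E, and |B| = E/c), here is a minimal sketch that computes $\mathbf{B} = -\mathbf{e}_{r'}\times\mathbf{E}/c$ with a hand-written cross product. The geometry is a hypothetical choice: line of sight along z, and a transverse field of 3 V/m along x.

```python
import math

C = 299_792_458.0  # speed of light (m/s)

def cross(a, b):
    """Cross product of two 3-vectors given as tuples."""
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])

def magnetic_field(e_r, E):
    """B = -(e_r' x E)/c."""
    return tuple(-component / C for component in cross(e_r, E))

e_r = (0.0, 0.0, 1.0)   # unit vector along the line of sight (hypothetical geometry)
E = (3.0, 0.0, 0.0)     # transverse radiation field in V/m (hypothetical value)
B = magnetic_field(e_r, E)
print("B =", B)
print("E . B =", sum(b * e for b, e in zip(B, E)))            # zero: B is perpendicular to E
print("|B| =", math.sqrt(sum(b * b for b in B)), "vs E/c =", 3.0 / C)
```

In SI units, a field of a few volts per meter comes with a magnetic field of only about 10⁻⁸ tesla, which illustrates why it is so easy to neglect at this stage.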