Saturday 9 December 2017

The problem of universals

This apple is red and so is that stop sign. Bob has two arms and two legs.

The apple, the stop sign and Bob are all individuals or particular things. When we say that the apple is red or that Bob has two arms, we are predicating something of those individuals. And when the same thing can be predicated of more than one individual, then that thing (say, redness or the number two) is termed a universal. A universal is just a general characteristic that those individuals have in common.

Now the status of individuals seems straightforward. We say that there is an apple there - it exists and we can see it and also eat it. But the status of universals seems less obvious. Is redness something real? If so, how does it happen to be in many places at once, in both the apple and the stop sign? And also, the number two seems very abstract, not a physical kind of thing that can exist like the apple and Bob do.

In philosophy, this is called the problem of universals. Plato was the first to discuss the issue in depth and proposed that universals do exist, but they exist in a higher realm of the Forms. So there is an ideal or perfect form for redness, a form for the number two, and a form for every other universal. The (imperfect) individual things that we can see around us are said to participate in those perfect forms. This view is called Platonic realism.

Aristotle (who was Plato's student) strongly disagreed with Plato on this issue. In his view, universals were indeed real, but they were not separable from the individuals that they were found in. That is, there is no redness independent of particular things that are red. This view is called moderate or immanent realism - the universals are immanent within the individuals.

A third view, nominalism, denies that universals are real at all. There are only the names we use and there is no universal quality that they have in common.

My own view is that Aristotle's immanent realism best captures what we implicitly mean when we assert things about individuals. The characteristics that the individuals have in common are not merely naming conventions or the creations of humans. Instead they are necessary for the individuals to be observable in any form at all. Granting that, there also seems to be no good reason to posit a non-natural realm for universals to reside in. They are simply features or aspects of the natural world that we experience, albeit at a more abstract level that requires human intelligence to be able to identify and reflect on them.

The image above shows part of the School of Athens fresco by the Italian Renaissance artist Raphael. It depicts Plato pointing towards the heavens where he supposed the ideal forms to be. It also depicts Aristotle gesturing downwards to the here-and-now world that he held universals were a part of.

Friday 29 September 2017

The Sleeping Beauty problem

Continuing the theme of probability and self-locating uncertainty from my last post, there is a famous puzzle in philosophy called the Sleeping Beauty problem. It involves the following experiment that Sleeping Beauty volunteers to take part in.

On Sunday, Sleeping Beauty is put to sleep and then a fair coin is tossed. On Monday, Sleeping Beauty is awakened and interviewed. She is then put to sleep again with an amnesia-inducing drug that makes her forget that awakening. On Tuesday, if the coin landed tails she is awakened, interviewed and put to sleep once more. Otherwise, if the coin landed heads, she is left asleep. On Wednesday, she is awakened and the experiment ends.

When she is awakened and interviewed she does not know which day it is or whether she has been awakened before. During the interview, Beauty is asked, "What is the probability that the coin landed heads?" The experiment is illustrated below.

Sleeping Beauty problem (illustration by Stuart Armstrong)

There are two popular but opposing solutions that are commonly given by philosophers. The solution known as the thirder position is that Sleeping Beauty should report that the probability of heads is 1/3. [1] The intuition is that Sleeping Beauty cannot distinguish between the three awake states, so she should assign the same probability to each state. Also, if the experiment is run many times, Sleeping Beauty is awakened on average in a state where the coin has landed heads 1/3 of the time and in a state where the coin has landed tails 2/3 of the time.
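The long-run frequency claim in the thirder argument can be checked with a small simulation. This is only a sketch: the function name `simulate_awakenings`, the trial count and the seed are my own choices, and only the interviewed awakenings (Monday, plus Tuesday on tails) are counted. It illustrates the thirder way of counting and does not by itself settle the dispute with the halfers.

```python
import random

def simulate_awakenings(trials, seed=0):
    """Return the fraction of interviewed awakenings at which the coin shows heads."""
    rng = random.Random(seed)          # fixed seed so the run is repeatable
    heads_awakenings = 0
    total_awakenings = 0
    for _ in range(trials):
        heads = rng.random() < 0.5     # fair coin toss
        if heads:
            heads_awakenings += 1      # heads: Monday awakening only
            total_awakenings += 1
        else:
            total_awakenings += 2      # tails: Monday and Tuesday awakenings
    return heads_awakenings / total_awakenings

fraction_heads = simulate_awakenings(100_000)
print(round(fraction_heads, 2))
```

With enough trials the printed fraction settles close to 1/3, because each tails run contributes two awakenings while each heads run contributes one.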

The halfer position is that Sleeping Beauty should report that the probability of heads is 1/2. [2] The intuition here is that Sleeping Beauty seems to learn no new information when she is awakened. So she should not update on the probability that she held on the Sunday before the experiment began. That is, it is the coin toss that is the relevant event, not how many times she is awakened. [3]

So which position is correct?

Before the experiment starts, Sleeping Beauty knows that during the experiment she can be in one of the four following states with equal probability.

         Monday   Tuesday
Heads    1/4      1/4
Tails    1/4      1/4

She also knows that she will not be awakened on Tuesday if the coin lands heads, so the probability for that state is 0 when she is awakened. The question then is how the other probabilities should be updated.

The halfers distribute the probability to the remaining heads state, as follows:

         Monday   Tuesday
Heads    1/2      0
Tails    1/4      1/4

From the table:
P(Heads and Monday) = 1/2
P(Tails and Monday) = 1/4
P(Monday) = P(Heads and Monday) + P(Tails and Monday)
          = 1/2 + 1/4 = 3/4
Using the formula for conditional probability: [4][5]
P(Heads|Monday) = P(Heads and Monday) / P(Monday)
                = (1/2) / (3/4) = 2/3
This, I think, reveals the fatal flaw with the halfer position. If Sleeping Beauty is told it is Monday, then she will think the probability for the coin landing heads is 2/3!

The thirders distribute the probability evenly to all awake states, as follows:

         Monday   Tuesday
Heads    1/3      0
Tails    1/3      1/3

From the table:
P(Heads and Monday) = 1/3
P(Tails and Monday) = 1/3
P(Monday) = P(Heads and Monday) + P(Tails and Monday)
          = 1/3 + 1/3 = 2/3
Using the formula for conditional probability: [6]
P(Heads|Monday) = P(Heads and Monday) / P(Monday)
                = (1/3) / (2/3) = 1/2
So, if Sleeping Beauty is told it is Monday, then she will think the probability for the coin landing heads is 1/2, which intuitively seems correct.
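The two conditional probability calculations can be reproduced with exact fractions. This is a minimal sketch; the dictionary layout and the function name `p_heads_given_monday` are illustrative choices of mine, with the probabilities copied from the halfer and thirder tables.

```python
from fractions import Fraction

def p_heads_given_monday(table):
    """table maps (coin, day) to the probability of that awake state."""
    p_monday = table[("Heads", "Monday")] + table[("Tails", "Monday")]
    return table[("Heads", "Monday")] / p_monday

halfer = {
    ("Heads", "Monday"): Fraction(1, 2), ("Heads", "Tuesday"): Fraction(0),
    ("Tails", "Monday"): Fraction(1, 4), ("Tails", "Tuesday"): Fraction(1, 4),
}
thirder = {
    ("Heads", "Monday"): Fraction(1, 3), ("Heads", "Tuesday"): Fraction(0),
    ("Tails", "Monday"): Fraction(1, 3), ("Tails", "Tuesday"): Fraction(1, 3),
}

print(p_heads_given_monday(halfer))   # 2/3
print(p_heads_given_monday(thirder))  # 1/2
```

Both tables sum to 1, so the only difference between the two positions is how the probability removed from the Heads/Tuesday state is redistributed.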

But the thirder position seems to imply that Beauty has learnt new information since Sunday (when the probability for heads was 1/2) which causes her to update her probability for the coin landing heads to 1/3. What could that new information be?

That it is both Tuesday and that the coin landed heads is a possible state for Beauty to be in when she is asleep. So she learns that she is not in that state when she wakes up. As a consequence, she updates by excluding that state and normalizing the probabilities for the remaining awake states.

One further test for the thirder position is to consider the probabilities when Sleeping Beauty is awakened on, say, 1000 consecutive days when the coin lands tails (instead of two consecutive days). [7] The probability of the coin landing heads when Beauty is awakened is then 1/1001. This gives:
P(Heads) = 1/1001
P(Tails) = 1000/1001
P(Heads and Monday) = 1/1001
P(Tails and Monday) = 1/1001
P(Monday) = P(Heads and Monday) + P(Tails and Monday)
          = 1/1001 + 1/1001 = 2/1001
P(Heads|Monday) = P(Heads and Monday) / P(Monday)
                = (1/1001) / (2/1001) = 1/2
As with the original experiment, if Sleeping Beauty is told that it is Monday, then she will think the probability for the coin landing heads is 1/2 which (as before) intuitively seems correct. Note that there are still 1000 states that Beauty can be in as a result of the coin landing heads. But she will just not be awake when she is in 999 of those states so they are excluded from her awake state calculations.
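The same thirder calculation can be written for any number of tails awakenings. The sketch below assumes, as in the text, that every awake state gets equal probability; the function name `thirder_probabilities` and its parameter are hypothetical.

```python
from fractions import Fraction

def thirder_probabilities(n_tails_days):
    """Return (P(Heads), P(Heads|Monday)) when tails means n consecutive awakenings."""
    awake_states = 1 + n_tails_days          # one heads state plus n tails states
    p_each = Fraction(1, awake_states)       # equal probability for each awake state
    p_heads = p_each                         # heads has a single awake state (Monday)
    p_monday = 2 * p_each                    # Heads/Monday plus Tails/Monday
    return p_heads, p_each / p_monday

print(thirder_probabilities(2))     # the original experiment
print(thirder_probabilities(1000))  # the 1000-day variant
```

For the original experiment this gives 1/3 and 1/2, and for the 1000-day variant it gives 1/1001 and 1/2. The conditional probability given Monday stays at 1/2 however many tails awakenings there are.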

--

[1] The thirder position was initially introduced by Adam Elga in this paper.

[2] The halfer position was defended by David Lewis in his reply to Elga.

[3] The thirders interpret the problem as an experiment about awakening events (i.e., which branch Beauty will find herself on) whereas the halfers interpret the problem as an experiment about a coin toss event (i.e., which branch Beauty will be placed on). But both sides agree about what odds should be accepted if Beauty is offered a bet that the coin landed heads. If she is offered a bet whenever she is awakened, she should accept 1/3 odds. If she is only offered a bet once, she should accept 1/2 odds.

[4] The formula for calculating the conditional probability of A given B is:
P(A|B) = P(A and B) / P(B)
[5] The probability calculations for Sleeping Beauty when awake according to halfers:
P(Heads) = 1/2
P(Tails) = 1/2
P(Heads and Monday) = 1/2
P(Heads and Tuesday) = 0
P(Tails and Monday) = 1/4
P(Tails and Tuesday) = 1/4
P(Monday|Heads) = P(Heads and Monday) / P(Heads) = (1/2) / (1/2) = 1
P(Tuesday|Heads) = P(Heads and Tuesday) / P(Heads) = 0 / (1/2) = 0
P(Monday|Tails) = P(Tails and Monday) / P(Tails) = (1/4) / (1/2) = 1/2
P(Tuesday|Tails) = P(Tails and Tuesday) / P(Tails) = (1/4) / (1/2) = 1/2
P(Monday) = P(Heads and Monday) + P(Tails and Monday) = 1/2 + 1/4 = 3/4
P(Tuesday) = P(Heads and Tuesday) + P(Tails and Tuesday) = 0 + 1/4 = 1/4
P(Heads|Monday) = P(Heads and Monday) / P(Monday) = (1/2) / (3/4) = 2/3
P(Heads|Tuesday) = P(Heads and Tuesday) / P(Tuesday) = 0 / (1/4) = 0
P(Tails|Monday) = P(Tails and Monday) / P(Monday) = (1/4) / (3/4) = 1/3
P(Tails|Tuesday) = P(Tails and Tuesday) / P(Tuesday) = (1/4) / (1/4) = 1
[6] The probability calculations for Sleeping Beauty when awake according to thirders:
P(Heads) = 1/3
P(Tails) = 2/3
P(Heads and Monday) = 1/3
P(Heads and Tuesday) = 0
P(Tails and Monday) = 1/3
P(Tails and Tuesday) = 1/3
P(Monday|Heads) = P(Heads and Monday) / P(Heads) = (1/3) / (1/3) = 1
P(Tuesday|Heads) = P(Heads and Tuesday) / P(Heads) = 0 / (1/3) = 0
P(Monday|Tails) = P(Tails and Monday) / P(Tails) = (1/3) / (2/3) = 1/2
P(Tuesday|Tails) = P(Tails and Tuesday) / P(Tails) = (1/3) / (2/3) = 1/2
P(Monday) = P(Heads and Monday) + P(Tails and Monday) = 1/3 + 1/3 = 2/3
P(Tuesday) = P(Heads and Tuesday) + P(Tails and Tuesday) = 0 + 1/3 = 1/3
P(Heads|Monday) = P(Heads and Monday) / P(Monday) = (1/3) / (2/3) = 1/2
P(Heads|Tuesday) = P(Heads and Tuesday) / P(Tuesday) = 0 / (1/3) = 0
P(Tails|Monday) = P(Tails and Monday) / P(Monday) = (1/3) / (2/3) = 1/2
P(Tails|Tuesday) = P(Tails and Tuesday) / P(Tuesday) = (1/3) / (1/3) = 1
[7] A similar example is used by Nick Bostrom in this paper intending to show that the thirder position is counter-intuitive. I disagree, for the reasons I give in my conclusion. Bostrom also makes several other arguments against both the halfer and thirder positions and defends a hybrid position.

Thursday 7 September 2017

Probability in a causal universe

Does God play dice with the universe?
Probability is a familiar idea. When we roll a dice, we know there is a 1/6 chance of seeing a particular number between 1 and 6.

While we often say that the number we see came up by chance, we also know that that description really just reflects our lack of knowledge about the physical forces that acted on the dice. If we knew precisely its initial orientation, how fast it was rolled, what the surface characteristics were and so on, we could accurately predict which number would come up.

However the situation is more complex with quantum mechanics. An observer could have complete knowledge about a particular system, but not know which state they will measure it in. For example, suppose a photon is sent through a balanced beam splitter. The system evolves into a superposition of two relative states. In one state, the photon is on the reflection path, in the second state the photon is on the transmission path.

The Born rule states that the probability of measuring a system in a particular state is given by squaring the magnitude of the amplitude for the state. In the beam splitter example above, the amplitude for each relative state is √(1/2).[1] So the probability for measuring each state is 1/2.
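As a rough illustration of the rule, the sketch below squares the magnitude of each amplitude and divides by the total, so it also accepts amplitudes that have not been normalized. The function name `born_probabilities` is my own and not a standard API.

```python
import math

def born_probabilities(amplitudes):
    """Squared magnitude of each amplitude, normalized so the results sum to 1."""
    weights = [abs(a) ** 2 for a in amplitudes]   # abs() also handles complex amplitudes
    total = sum(weights)
    return [w / total for w in weights]

# Balanced beam splitter: both relative states have amplitude sqrt(1/2).
balanced = born_probabilities([math.sqrt(0.5), math.sqrt(0.5)])
print([round(p, 3) for p in balanced])  # [0.5, 0.5]
```

A sign or phase on an amplitude, such as -√(1/2), does not change the result because only the magnitude is squared.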

If the observer has complete knowledge about the system, it would appear that the probabilities are intrinsic to nature. It was this idea that Einstein disputed when he famously proclaimed that God does not play dice with the universe. As he put it in a letter to Max Born, "Quantum theory yields much, but it hardly brings us close to the Old One's secrets. I, in any case, am convinced He does not play dice with the universe."

In many textbooks the Born rule is stated as a fundamental postulate of quantum mechanics and not as something that needs to be explained. But it need not be this way. The Everett (Many Worlds) interpretation of quantum mechanics takes Einstein's side on the issue and requires that the Born rule be derivable from quantum mechanics, not merely postulated. If this succeeds, it returns us to the original understanding where probability is a result of a lack of knowledge. The twist, though, is that the observer could still have complete knowledge of the system under observation, but just not of their own location with respect to that system. I'm going to outline a derivation below that draws on the derivation given by Carroll and Sebens.

1. On the Everett interpretation, measurement leads to initial self-locating uncertainty. An observer can have complete knowledge about the relative states of the system, but not which particular state they have just measured (since there is a version of them that has measured each state). This raises the question of how to quantify their uncertainty in terms of probabilities.

2. If the state amplitudes are equal, the observer should initially be indifferent about which state they have measured. So the states can simply be counted to calculate the probability that a particular state has been measured.

3. If the state amplitudes are not equal, they can be mathematically factored into states that do have equal amplitudes. And again the states can be counted to calculate the probability. The number of factored states exactly tracks the square of the initial amplitude, so it is equivalent to applying the Born rule.

This last point is interesting. Suppose that the wave function for a particular beam splitter gives a (non-normalized)[2] amplitude of 1 for the reflection path state and an amplitude of 2 for the transmission path state. The Born rule says that the probabilities of observing the two states are in the ratio 1:4. That is, it is not correct to just count the states, or even just apportion the amplitude. Instead the amplitudes must be squared. But why should this be the rule?

This can be demonstrated by transforming the initial setup into a new setup with equal amplitude states as shown in Diagram 1. To do this, add a second beam splitter (with 1/2 probability of reflection and transmission) to the transmission path of the first beam splitter. Note: the amplitude for the reflection and transmission relative states is √(1/2) each.

Diagram 1: Factoring beam splitter states (amplitudes shown)
When a photon is sent through this setup, there are two new relative states on the transmission path with an amplitude of √2 each (i.e., 2 * √(1/2) = √2). Now add two more balanced beam splitters on each of those two paths. When a photon is sent through this setup, there are four new relative states on the initial transmission path with an amplitude of 1 each (i.e., √2 * √(1/2) = 1). Now the amplitudes for all five final relative states are equal to 1 and the ratio of the initial reflection path states to the initial transmission path states is 1:4 as required by the Born rule.
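The factoring steps above can be sketched in a few lines. This assumes, as in the text, that each balanced beam splitter multiplies the incoming amplitude by √(1/2) on both output paths; the helper name `split_balanced` is hypothetical.

```python
import math

def split_balanced(amplitude):
    """Balanced beam splitter: each output path gets amplitude * sqrt(1/2)."""
    factor = math.sqrt(0.5)
    return [amplitude * factor, amplitude * factor]

reflection = [1.0]     # reflection path state of the first beam splitter
transmission = [2.0]   # transmission path state of the first beam splitter

# Two rounds of balanced splitting on the transmission side: 2 -> sqrt(2) -> 1.
for _ in range(2):
    transmission = [out for amp in transmission for out in split_balanced(amp)]

print(len(reflection), len(transmission))  # 1 4
```

After the two rounds, every branch has an amplitude of 1 (up to floating-point rounding), so counting branches gives the same 1:4 ratio as squaring the original amplitudes 1 and 2.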

This can also be understood geometrically as an application of the Pythagorean Theorem as shown in Diagram 2.

Diagram 2: Factoring beam splitter states
(squared amplitudes shown)
The lengths of the sides of the green triangle at the far left represent the amplitudes for the first beam splitter states. The reflection path state has an amplitude of 1 and the transmission path state has an amplitude of 2. Since the relative states are orthogonal, the triangle is right-angled. The hypotenuse represents the superposition state which has an amplitude of √5. The red numbers represent the squares of the sides (which are the non-normalized probabilities of measuring the states).

The second green triangle represents the second beam splitter states with a hypotenuse of length 2. Since this beam splitter is balanced, the shorter sides have the same length. Therefore, by the Pythagorean Theorem, 2² = a² + a²; 4 = 2a²; 2 = a²; a = √2. Each of these shorter sides now becomes the hypotenuse for two further right-angled triangles with equal length short sides. Again, by the Pythagorean Theorem, (√2)² = a² + a²; 2 = 2a²; 1 = a²; a = 1. Thus the amplitudes are all equal to 1 with a reflection/transmission ratio of 1:4 as required.

--

[1] If the photon entered the front of the beam splitter then, due to a phase change, the amplitude for the reflection path would be -√(1/2). For the purpose of this post, assume that the photon enters the rear of the beam splitters and so both the transmission and reflection amplitudes will be positive.

[2] Amplitudes are usually normalized such that the probabilities (the squares of the amplitudes) of the relative states sum to 1.

Friday 18 August 2017

Modeling quantum interference (Part 2)

Diagram 1: Mach-Zehnder interferometer
In my previous post I described how to model a Mach-Zehnder interferometer by taking into account all paths that a photon can take and calculating the amplitudes. In this post, I will elaborate on the mathematical model further to create a visual sense of what is going on.

The interferometer and the photon travelling within it can be considered together as a quantum system. At the beginning of the experiment the system has a single quantum state that contains all the information about the system. This quantum state has a complex value associated with it called an amplitude which can be visualized as an arrow that can rotate around a center point (like a clock hand or compass arrow).[1]

In our model, the amplitude for the initial quantum state is 1. As the photon travels toward the first beam splitter, the quantum state continually changes and this change is reflected in the amplitude. Using the clock analogy, the clock hand is continually rotating as the photon travels. When the photon arrives at the beam splitter, the quantum state is transformed into two distinct quantum states which are in superposition - one representing a reflected photon and one representing a transmitted photon. Each quantum state has its own amplitude as determined by the photon interaction with the beam splitter. There are now three distinct quantum states: one represents a photon travelling on the upper (green) path, one represents a photon travelling on the lower (red) path, and one is the global state that now represents those two component states in superposition.

To help conceptualize this, we can think of each quantum state as representing a real system. So we can refer to the photon on the green path, the photon on the red path, and the global photon that is in superposition, with each referent indexed to a distinct quantum system.[2]

Mathematically, the complex amplitude that represents a quantum state is a vector and it allows us to model the superposition of the two quantum states. Consider an arrow pointing north-east. This can be thought of as a vector that is the combination of two basis vectors - one pointing north and one pointing east. Suppose the arrow is 1 unit long. The arrow therefore has a length of 1/√2 in the north direction and a length of 1/√2 in the east direction (per the Pythagorean theorem). So we can represent this as a linear equation using bra-ket notation:

  |north-east> = 1/√2|north> + 1/√2|east>

In the same notation, when the photon passes through the first beam splitter the new quantum state is:

  |blue_aftersplitter> = -1/√2|green> + 1/√2|red>

This captures all the information about the system at this point. |blue_aftersplitter> is the main quantum state that is in superposition with an amplitude of 1 (implied). The state for the green (reflected) path has an amplitude of -1/√2 and the state for the red (transmission) path has an amplitude of 1/√2. See diagram 2 below for the path amplitudes.

Diagram 2: Path amplitudes
Note that the squares of the two component amplitudes sum to 1. If a measurement were performed at this point (i.e., the detectors were placed on those two path segments), there would be an equal probability of finding the photon in either state.[3]

Evolving from the initial (blue) quantum state to the superposition state after the beam splitter requires a transformation operation which is represented by a matrix. The matrix required for the beam splitter is:[4]

  bs = [1/√2  1/√2] = 1/√2[1  1]
       [1/√2 -1/√2]       [1 -1]

The beam splitter has two input ports - one at the front (indicated by the dot) and one at the rear. The matrix columns describe the rear port and front port behavior respectively. The matrix rows describe the transformations on the transmitted beam and reflected beam respectively. (For example, the bottom-right cell describes the transformation on a photon that enters the beam splitter through the front port and is reflected.)

Since the photon initially passes through the front port, the initial (blue) state is described by the following vector which specifies an amplitude of 0 for the photon passing through the rear port and an amplitude of 1 for the photon passing through the front port:[5]

  |blue> = [0]
           [1]

We can now see what happens when the initial (blue) state is transformed by the beam splitter matrix (i.e., when the vector is multiplied by the matrix).

  bs|blue> = 1/√2[1  1][0] = 1/√2[0*1 +  1*1] = 1/√2[ 1] = [ 1/√2]
                 [1 -1][1]       [0*1 + 1*-1]       [-1]   [-1/√2]

The top cell of the resulting vector describes the transmitted (red) state amplitude and the bottom cell describes the reflected (green) state amplitude. That is, the amplitude for the global (blue) state is now distributed between two component states (red and green). This can be equivalently expressed as:

  bs|blue> = 1/√2(-|green> + |red>)
           = -1/√2|green> + 1/√2|red>

To explain the above equation, we know the front port behavior is described by the second column of the bs matrix. Then the bottom cell applies to the reflected beam state (which, in this case, is green) so it is multiplied by -1. The top cell applies to the transmitted beam state (which, in this case, is red) so it remains unchanged. Finally, the 1/√2 coefficient applies to both states so they are both multiplied by it.
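The multiplication for the first beam splitter can be checked with plain Python lists. This is a sketch only: `BS`, `blue` and the helper `apply` are illustrative names, and the vector convention follows the text (top entry for the transmitted red state, bottom entry for the reflected green state).

```python
import math

S = 1 / math.sqrt(2)

# Beam splitter matrix from the text.
# Columns: rear port, front port. Rows: transmitted beam, reflected beam.
BS = [[S,  S],
      [S, -S]]

def apply(matrix, vector):
    """Multiply a 2x2 matrix by a 2-element column vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]

blue = [0, 1]                        # photon enters through the front port
red_amp, green_amp = apply(BS, blue)
print(round(red_amp, 3), round(green_amp, 3))  # 0.707 -0.707
```

The result picks out the second column of the matrix, which is the front port behavior described above.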

The transformation matrix required for the two mirrors is:

  mi = [-1  0]
       [0  -1]

which inverts the phase of both states (i.e., multiplies each state by -1). The second beam splitter has the same matrix as the first. Note that, in this case, one beam enters through the rear of the beam splitter and the other enters through the front.

The entire evolution from the initial quantum state is:

  bs mi bs|blue>
    = bs mi (-|green> + |red>)/√2
    = bs (-(-|green>) + -(|red>))/√2
    = bs (|green> - |red>)/√2
    = ((|detector1> + |detector2>) - (-|detector1> + |detector2>))/2
    = (|detector1> + |detector2> + |detector1> - |detector2>)/2
    = (2|detector1> + 0|detector2>)/2
    = |detector1>

  P(|detector1>) = |1|² = 1 = 100%

Thus the photon always ends up at detector 1. To explain the mathematics, the photon in the initial (blue) state passes through the front of the beam splitter so its reflected beam (green) is inverted. Both beams are then inverted by the mirrors. Finally, the green state itself becomes a superposition of two states representing the photon heading towards each detector. Similarly for the red state. But note that the photon in the red path passes through the front of the beam splitter, so its reflected beam to detector 1 is inverted. The detector 2 states destructively interfere (since they are indistinguishable) while the detector 1 states constructively interfere (since they are also indistinguishable) which finally results in a single quantum state at detector 1 with an amplitude of 1.
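The same evolution can be followed numerically by tracking an amplitude for each labelled path. The sketch below mirrors the substitution steps in the derivation rather than multiplying the matrices directly, and the function names are my own.

```python
import math

S = 1 / math.sqrt(2)

def first_splitter():
    # Photon enters the front port, so the reflected (green) state is inverted.
    return {"green": -S, "red": S}

def mirrors(state):
    # Each mirror inverts the phase of the state on its path.
    return {path: -amp for path, amp in state.items()}

def second_splitter(state):
    g, r = state["green"], state["red"]
    # Green enters the rear port: no inversion on either output.
    # Red enters the front port: its reflection (towards detector 1) is inverted.
    return {"detector1": S * g - S * r,
            "detector2": S * g + S * r}

final = second_splitter(mirrors(first_splitter()))
probabilities = {d: round(abs(a) ** 2, 3) for d, a in final.items()}
print(probabilities)
```

The two contributions to detector 2 cancel and the two contributions to detector 1 add, giving a probability of 1 for detector 1.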

Note that it is also possible to insert a phase shifter into one of the paths (emulating a sample or change in path length) which will change the probability of the photon arriving at each detector. The transformation matrix for a phase shifter in the lower (red) path is:

  ph(φ) = [e^(iφ)  0]
          [0       1]

where φ is the phase angle. A phase shift of 180° means the photon will always arrive at detector 2 while a phase shift of 90° means the photon will be found at either detector with equal probability. To illustrate this, adding a phase shift of 90° gives:

  ph(90°) = [e^(iπ/2)  0] = [i  0]
            [0         1]   [0  1]

The state evolution is now:

  bs mi ph(90°) bs|blue>
    = bs mi ph(90°) (-|green> + |red>)/√2
    = bs mi (-|green> + i|red>)/√2
    = bs (-(-|green>) + -(i|red>))/√2
    = bs (|green> - i|red>)/√2
    = ((|detector1> + |detector2>) - i(-|detector1> + |detector2>))/2
    = (|detector1> + |detector2> + i|detector1> - i|detector2>)/2
    = ((1+i)|detector1> + (1-i)|detector2>)/2
    = (0.5+0.5i)|detector1> + (0.5-0.5i)|detector2>

  P(|detector1>) = |0.5+0.5i|² = 0.5 = 50%
  P(|detector2>) = |0.5-0.5i|² = 0.5 = 50%
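To try other phase angles, the path-by-path calculation can be wrapped in a function. This is a sketch under the same sign conventions as the derivation above, with the phase shifter applied to the lower (red) path; `detector_probabilities` is a hypothetical name.

```python
import cmath
import math

S = 1 / math.sqrt(2)

def detector_probabilities(phi_degrees):
    """Return (P(detector 1), P(detector 2)) for a phase shift on the red path."""
    phase = cmath.exp(1j * math.radians(phi_degrees))
    green, red = -S, S * phase     # first beam splitter, then phase shifter on red
    green, red = -green, -red      # mirrors invert both states
    d1 = S * green - S * red       # red reflects at the front port (inverted)
    d2 = S * green + S * red
    return abs(d1) ** 2, abs(d2) ** 2

for phi in (0, 90, 180):
    p1, p2 = detector_probabilities(phi)
    print(phi, round(p1, 3), round(p2, 3))
```

With these conventions the detector 1 probability works out to cos²(φ/2), so 0° sends every photon to detector 1, 180° sends every photon to detector 2, and 90° splits them evenly.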

One final interesting effect occurs when two separate (but otherwise indistinguishable) photons are directed into each port of the first beam splitter simultaneously. There are four possible combinations: (1) the front port photon is reflected and the rear port photon is transmitted, (2) both photons are reflected, (3) both photons are transmitted and (4) the front port photon is transmitted and the rear port photon is reflected. This results in a superposition of four states as represented below:

  1/√2(-|greengreen> - |greenred> + |redgreen> + |redred>)

However the two states with a photon on each path are actually the same state since the photons are indistinguishable. And, since their amplitudes sum to 0, they destructively interfere. Thus, if a measurement were performed at this point by adding detectors, there would be an equal probability of finding either two photons on the green path or two photons on the red path but never one photon on each path.[6]

--

[1] The complex plane is analogous to a clock face or compass face. The real number line is horizontal, with 3 o'clock or East representing the number 1 and 9 o'clock or West representing the number -1. The imaginary number line is vertical, with 12 o'clock or North representing the imaginary number i and 6 o'clock or South representing the imaginary number -i. The origin is the center point which is 0.

[2] The global photon in superposition could be considered an abstraction similar to a university that is an abstraction over its distinct buildings or campuses.

[3] This is in accordance with the Born rule. The probability that the photon will be observed on a particular path is given by the square of the magnitude of the amplitude.

[4] This is known as the Hadamard matrix. It also represents the Hadamard gate in quantum computing which can be used to transform a qubit into a superposition state.

[5] The vector can be regarded as a qubit that has been prepared in state |1> (i.e., by directing the photon towards the front port of the beam splitter). The beam splitter transforms the qubit in state |1> into the superposition 1/√2(|0> - |1>) which later results in the selection of both ports of the second beam splitter.

[6] This is the Hong-Ou-Mandel effect.

Tuesday 15 August 2017

Modeling quantum interference

Diagram 1: Mach-Zehnder interferometer
In this post, I'm going to model a device that exhibits quantum behavior in a simple but striking way.

The device pictured at the left is called a Mach-Zehnder interferometer. The beam splitter splits a beam of light into two paths. 50% of the beam is reflected towards mirror 1 and 50% of the beam is transmitted towards mirror 2. When the beams reach the second beam splitter, each beam is split again and is reflected or transmitted towards the detectors.

Intuitively, it would seem that half the light should end up at detector 1 and half at detector 2.[1] However, assuming the two paths are the same length, all the light actually ends up at detector 1 at the right and none at the top detector.

This result is due to quantum interference at the second beam splitter where light heading towards the top detector destructively interferes and light heading towards the right detector constructively interferes. In true quantum style, this result always occurs even if only a single photon of light is emitted towards the first beam splitter.

In quantum mechanical terms, the photon is in a superposition of travelling along both paths simultaneously. At the second beam splitter, each path forms a further superposition (again with one path reflecting and one transmitting - see the four arrows heading towards the detectors in the diagram). The two paths heading towards detector 2 destructively interfere (i.e., they are 180° out of phase) and thus cancel each other out, whereas the two paths heading toward detector 1 constructively interfere and so the photon is always detected there.

So how does the device actually work? The mathematics is fairly straightforward. The basic strategy is to model each path that the photon can take and combine identical paths at the end. Each path segment has a complex value associated with it called an amplitude which can be visualized as an arrow that can rotate around a center point (like a clock hand).[2]

The initial (blue) path amplitude is 1 (see Diagram 2 below which specifies the calculated amplitudes for each path segment). The basic rule at the beam splitter is that the path splits into two paths and each path takes the amplitude of the source path value and multiplies it by 1/√2 (this is the normalization condition - the squares of the amplitudes in a superposition of paths must always sum to 1, i.e., 1/2 + 1/2 = 1).[3] Also, the path of the reflected beam additionally multiplies the amplitude by -1 which represents a phase change of 180°. So the upper (green) path has an amplitude of -1/√2 (-0.707) and the lower (red) path has an amplitude of 1/√2 (0.707).

(Note: If the photon passes through the rear of the beam splitter, the result is the same except that the phase change rule does not apply.[4] This is the case for the upper beam path when it reaches the second beam splitter. The front of each beam splitter is indicated by the dot.)

At each mirror, the amplitude for each path is multiplied by -1 (i.e., a phase change of 180°). So the upper path now has a value of 1/√2. The lower path now has a value of -1/√2. At the second beam splitter, the upper (reflection) path itself splits into reflection and transmission paths toward the two detectors. The upper beam reflection path value is 0.5 (1/√2 * 1/√2) and the upper beam transmission path value is also 0.5 (1/√2 * 1/√2). The lower beam reflection path value is 0.5 (-1/√2 * 1/√2 * -1) and the lower beam transmission path value is -0.5 (-1/√2 * 1/√2).

Diagram 2: Path amplitudes
This is where the quantum magic happens. The upper beam reflection path and the lower beam transmission path coincide. They are both directed towards detector 2. So the paths merge and the amplitudes are added to give a value of 0 (0.5 + -0.5) which is destructive interference. Similarly, the upper beam transmission path and the lower beam reflection path also coincide. They are both directed towards detector 1. So the paths merge and the amplitudes are added to give a value of 1 (0.5 + 0.5) which is constructive interference.

The probability of finding the photon at a particular detector is given by the amplitude squared, which is 100% for detector 1. Thus the photon always ends up at detector 1.
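The bookkeeping above can be replayed in a few lines of Python. This is only a sketch of the rules as stated in this post (split by 1/√2, multiply by -1 for a front-side reflection and at each mirror). The variable names are my own and are purely illustrative.

```python
import math

S = 1 / math.sqrt(2)   # beam splitter factor applied to each outgoing path
FLIP = -1              # a 180° phase change

source = 1.0           # initial (blue) path amplitude

# First beam splitter: the photon arrives at the front.
upper = source * S * FLIP   # reflected path: -1/√2
lower = source * S          # transmitted path: 1/√2

# One mirror on each path.
upper *= FLIP               # now 1/√2
lower *= FLIP               # now -1/√2

# Second beam splitter: the upper beam arrives at the rear (no phase change
# on reflection), the lower beam arrives at the front.
upper_reflect = upper * S
upper_transmit = upper * S
lower_reflect = lower * S * FLIP
lower_transmit = lower * S

# Coinciding paths merge and their amplitudes add.
detector_1 = upper_transmit + lower_reflect
detector_2 = upper_reflect + lower_transmit

p1 = abs(detector_1) ** 2   # probability at detector 1
p2 = abs(detector_2) ** 2   # probability at detector 2
print(round(p1, 6), round(p2, 6))
```

Up to floating-point rounding, the probabilities come out as 1 for detector 1 and 0 for detector 2.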

Note that this result depends on the physical configuration of the interferometer. In this case, the two paths between the beam splitters are the same length. However if the length of one of the paths is changed, the results also can change such that the photon is instead always found at detector 2 (i.e., change a path phase by 180° by multiplying by -1 and recalculate the subsequent path values), or found at either detector with equal probability (i.e., change a path phase by 90° by multiplying by the imaginary number i and recalculate), or any other probabilistic combination.
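To see how the detector probabilities respond to a path change, here is a small sketch that multiplies one path by a phase factor before the second beam splitter. Applying the extra phase to the lower path is my own assumption (the post only says "one of the paths"), and the function name is hypothetical.

```python
import cmath
import math

def detector_probabilities(extra_phase):
    """Return (P(detector 1), P(detector 2)) for an extra phase, in radians,
    applied to the lower path (a stand-in for changing that path's length)."""
    s = 1 / math.sqrt(2)
    upper = 1 * s * -1          # first splitter: reflected, 180° phase change
    lower = 1 * s               # first splitter: transmitted
    upper *= -1                 # mirror
    lower *= -1                 # mirror
    lower *= cmath.exp(1j * extra_phase)   # the changed path

    # Second splitter: the upper beam meets the rear (no phase change on
    # reflection), the lower beam meets the front.
    detector_1 = upper * s + lower * s * -1   # upper transmitted + lower reflected
    detector_2 = upper * s + lower * s        # upper reflected + lower transmitted
    return abs(detector_1) ** 2, abs(detector_2) ** 2

for phase in (0, math.pi / 2, math.pi):
    print(phase, detector_probabilities(phase))
```

A phase of 0 reproduces the original result, a phase of π (multiplying by -1) sends the photon to detector 2, and a phase of π/2 (multiplying by i) splits the probability evenly.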

For further interferometer fun, see Part 2.

--

[1] If the second beam splitter were removed, the light would be distributed between both detectors. In the case of one emitted photon, the photon would be observed at one detector or the other with 50% probability. No interference between the photon paths would occur since the paths are different (one is directed towards detector 1 and one is directed towards detector 2 when they cross).

[2] The amplitude actually continually changes as the photon travels (i.e., the arrow rotates). To simplify the example, the path segments are of lengths that are multiples of the wavelength. So a photon that leaves the beam splitter with a particular phase angle will have the same phase angle when it arrives at the mirror. Also, the top path and lower path are the same length.

[3] This is in accordance with the Born rule. The probability that the photon will be observed on a particular path is given by the square of the amplitude.

[4] There is a phase change for a reflection at a surface with a higher refractive index which is true at the front of the beam splitter (the glass refracts more than the air the photon is travelling in) but not at the rear (where the photon is already travelling in the glass before it reflects).

Sunday 11 June 2017

Visualizing the Schrodinger equation

The Schrodinger equation describes how a physical system changes over time. But what does it mean intuitively?[1]

Imagine a particle moving freely through space. There are no forces acting on the particle so it travels in a straight line along the x-axis.

In Classical Mechanics, if the current state of the particle is known (such as its position and momentum) then its future state can be predicted according to classical laws.

In Quantum Mechanics, the state of the particle is represented by a wave function[2] that has a complex value and is denoted by the Greek letter Ψ (psi). The state of the particle can be prepared so that its wave function is initially known and it will then evolve in time according to the Schrodinger equation. Further mathematical operations can be performed on the wave function to determine the position, momentum or energy of the particle.

The time evolution can be elegantly expressed as Ψ(t) = U(t)Ψ(t=0) where U(t) is a unitary operator that propagates the wave function from its initial configuration at time t=0 to its final configuration at time t. U(t) = e^(-iEt/ħ) which is an exponential formula that represents a rotation (or phase change) on the complex plane of Et/ħ radians, where E is the total energy of the system and ħ is the reduced Planck constant. The wave function (at time t=0) can be visualized as a clock hand that rotates to a new position when the operator U(t) is applied to it. The greater the energy, the more rotations per second.
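The clock-hand picture can be tried out directly with Python's complex numbers. This is a minimal sketch that assumes units in which ħ = 1, and the function name `evolve` is my own.

```python
import cmath
import math

HBAR = 1.0  # assumption: units chosen so that ħ = 1

def evolve(psi0, energy, t, hbar=HBAR):
    """Apply U(t) = e^(-iEt/ħ) to an initial complex amplitude psi0."""
    return cmath.exp(-1j * energy * t / hbar) * psi0

psi0 = 1 + 0j                                        # clock hand pointing along the real axis
quarter = evolve(psi0, energy=1.0, t=math.pi / 2)    # rotated by Et/ħ = π/2 radians
slow = cmath.phase(evolve(psi0, energy=1.0, t=0.5))  # angle after t = 0.5
fast = cmath.phase(evolve(psi0, energy=2.0, t=0.5))  # twice the energy, twice the angle
```

The length of the hand never changes (that is what unitary means here), only its angle does, and doubling the energy doubles the angle swept in the same time.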

So we know U(t) and how to calculate the future wave function Ψ(t). All we need is the wave function at time t=0 to plug into the equation. The simplest wave function is a plane wave that curls in a uniform spiral around the x-axis (see Figure 1 below).

Figure 1 - Complex plane wave

A plane wave has the general formula Ae^(ipx/ħ) where A is the wave amplitude, p is the momentum and x is the position on the x-axis. Note that, like the time evolution operator, it also has an exponential representing rotation on the complex plane. However, in this case, the rotation is across space rather than over time. The greater the momentum, the more rotations per meter.

So our initial wave function is Ψ(x,t=0) = Ae^(ipx/ħ). Therefore our wave function at time t is Ψ(x,t) = e^(-iEt/ħ)Ae^(ipx/ħ), which can be expressed more simply as:

Ψ(x,t) = Ae^(i(px - Et)/ħ)

where the energy is proportional to the square of the momentum (E = p²/2m) and position and momentum are related via the canonical commutation relation xp - px = iħ.[3]
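As a quick sanity check, the sketch below builds the plane wave both ways: by applying the time evolution factor to the initial wave, and directly from the combined formula. It assumes ħ = 1 and uses arbitrary sample values for p, m, x and t. The function name `plane_wave` is hypothetical.

```python
import cmath

HBAR = 1.0  # assumption: units chosen so that ħ = 1

def plane_wave(x, t, p, m, amplitude=1.0, hbar=HBAR):
    """Ψ(x,t) = A e^(i(px - Et)/ħ) for a free particle with E = p²/2m."""
    energy = p ** 2 / (2 * m)
    return amplitude * cmath.exp(1j * (p * x - energy * t) / hbar)

p, m = 2.0, 1.0            # arbitrary sample momentum and mass
energy = p ** 2 / (2 * m)
x, t = 0.3, 1.7            # arbitrary sample position and time

via_operator = cmath.exp(-1j * energy * t / HBAR) * plane_wave(x, 0.0, p, m)
direct = plane_wave(x, t, p, m)
```

The two values agree to rounding error, and the magnitude of the wave equals A at every x and t, which is why the spiral in Figure 1 has a constant radius.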

A way to visualize this equation is to imagine the entire wave in Figure 1 to be dynamically rotating as time progresses. It will appear to be travelling along the x-axis in a periodic manner. The greater the momentum (and therefore energy), the tighter the spiral and the faster it will be spinning.

Let's suppose that we've prepared our particle to have a precise momentum (for example, 40 kgm/s where our particle has a mass of 5 kg and an energy of 160 joules, since E = p²/2m). So our wave function Ψ is Ae^(i(40x - 160t)/ħ). We can now use our operators to measure those observable quantities in our wave function.

Let's start with momentum. The operator for measuring the particle's momentum is -iħ ∂/∂x. So -iħ ∂Ψ/∂x = -iħ(i40/ħ)Ψ = 40Ψ. Since the result is a constant times Ψ, Ψ is an eigenfunction of the momentum operator with eigenvalue 40. So if we make a measurement, we will measure the momentum of the particle to be 40 kgm/s with certainty. In this case the shape of the momentum function is the same as Ψ, but its amplitude is everywhere scaled by 40.

Let's try energy. The energy operator is iħ ∂/∂t. So iħ ∂Ψ/∂t = iħ(-i160/ħ)Ψ = 160Ψ. So Ψ is an eigenfunction of the energy operator with eigenvalue 160. If we make a measurement, we will measure the energy of the particle to be 160 joules with certainty. As with momentum, the shape of the energy function is the same as Ψ, but its amplitude is scaled by 160.
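The two eigenvalue calculations can also be checked numerically by replacing the derivatives with small finite differences. This is a rough sketch, not a proper solver. It assumes ħ = 1 so that the numbers stay manageable, and the helper names are my own.

```python
import cmath

HBAR = 1.0         # assumption: units chosen so that ħ = 1
P, E = 40.0, 160.0 # momentum and energy from the example

def psi(x, t):
    """Plane wave with A = 1."""
    return cmath.exp(1j * (P * x - E * t) / HBAR)

def momentum_op(f, x, t, h=1e-6):
    """Approximate -iħ ∂f/∂x with a central difference."""
    return -1j * HBAR * (f(x + h, t) - f(x - h, t)) / (2 * h)

def energy_op(f, x, t, h=1e-6):
    """Approximate iħ ∂f/∂t with a central difference."""
    return 1j * HBAR * (f(x, t + h) - f(x, t - h)) / (2 * h)

x, t = 0.25, 0.1   # arbitrary sample point
momentum_ratio = momentum_op(psi, x, t) / psi(x, t)   # close to 40
energy_ratio = energy_op(psi, x, t) / psi(x, t)       # close to 160
```

Whichever sample point is chosen, the ratio of the operated function to Ψ stays at the same constant, which is exactly what it means for Ψ to be an eigenfunction.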

Now we'll try the position. The position operator is simply x. But xΨ is not a constant times Ψ, so Ψ is not an eigenfunction of position. This means there is uncertainty about the position of the particle. The position function scales the amplitude of Ψ at each x-position by the value of x at that position (i.e., the spirals increase in amplitude along the x-axis like a cone). (Note that the eigenfunctions of the position operator are actually Dirac delta functions which spike at their respective x-positions and have zero amplitude everywhere else.)

We can actually predict these results by looking at the plane wave in Figure 1. It has a single wavelength which translates to a single momentum and energy (per the de Broglie relations). But it has the same amplitude everywhere so the particle's position is spread through spacetime. A wave function that provides a more localized position is a wave packet that combines a range of plane waves of different frequencies in superposition.[4] The wave packet also curls in a spiral as it propagates along the x-axis over time, but it has a localized distribution of non-zero amplitude as illustrated in Figure 2 below.[5]

Figure 2 - Wave packet
This diagram represents a wave function at a single instant in time. Each white ball represents the complex amplitude of the wave at a particular x-position. This amplitude can be used to calculate the probability that a measurement will find the particle at that x-position.[6] Note that x-positions beyond the two ends of the wave packet have zero amplitude which means that the particle cannot be located at those x-positions.
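A wave packet like this can be approximated by adding up plane waves with Gaussian weights. The sketch below does so at t = 0, working with wave numbers k = p/ħ. All the parameter values (k0, sigma, dk, half_width) are arbitrary choices of mine, and a finite sum is only an approximation to the true packet (see footnote [4]).

```python
import cmath
import math

def wave_packet(x, k0=5.0, sigma=1.0, dk=0.1, half_width=40):
    """Sum plane waves e^(ikx) with Gaussian weights centred on k0."""
    total = 0j
    for j in range(-half_width, half_width + 1):
        k = k0 + j * dk
        weight = math.exp(-((k - k0) ** 2) / (2 * sigma ** 2))
        total += weight * cmath.exp(1j * k * x)
    return total

centre = abs(wave_packet(0.0))   # magnitude in the middle of the packet
far = abs(wave_packet(5.0))      # magnitude a few widths away
```

At the centre the plane waves are all in phase and add up. A few widths away they point in different directions and nearly cancel, so `far` is a tiny fraction of `centre`.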

Suppose that the particle were prepared in a superposition of Ψ = Ψ1 + Ψ2 = A1e^(i(p1x - E1t)/ħ) + A2e^(i(p2x - E2t)/ħ).[7] For example, the amplitudes are A1=6 and A2=8 and the momenta are p1=5 kgm/s and p2=3 kgm/s.

If we apply the (linear) momentum operator, we get -iħ ∂/∂x(Ψ1 + Ψ2) = p1Ψ1 + p2Ψ2 (i.e., 5Ψ1 + 3Ψ2). So Ψ1 and Ψ2 are both eigenfunctions of the momentum operator with different momentum eigenvalues.

Now suppose a momentum of 3 kgm/s is measured in an actual experiment, locating the experimenter with Ψ2. If measurement is a linear process, the particle's momentum would also be measured to be 5 kgm/s, locating the experimenter with Ψ1.

The fact that the experimenter reports only one measurement outcome (with a probability ratio of A1²:A2² which is 36:64 in our example) is what gives rise to the measurement problem. The Copenhagen Interpretation postulates that the wave function collapses to Ψ2 (Ψ = Ψ2) and Ψ1 disappears. By contrast, the Many Worlds Interpretation assumes that the wave function Ψ continues to evolve unitarily, with the experimenter and measuring apparatus now entangled with the particle in superposition.
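The 36:64 ratio follows from squaring the amplitudes and dividing by their total. Here is a small sketch of that arithmetic. The function name is hypothetical, and the amplitudes are taken to be real as in the example.

```python
def born_probabilities(amplitudes):
    """Turn (possibly unnormalized) amplitudes into outcome probabilities."""
    weights = [abs(a) ** 2 for a in amplitudes]
    total = sum(weights)
    return [w / total for w in weights]

prob_p1, prob_p2 = born_probabilities([6, 8])   # A1 = 6, A2 = 8
```

So a momentum of 5 kgm/s would be found 36% of the time and 3 kgm/s would be found 64% of the time.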

--

[1] This post builds on the concept of exponential growth in the complex number plane that was explored in Visualizing Euler's Identity.

[2] Familiar examples of wave behavior are vibrating guitar strings and ocean waves. Quantum Mechanics applies this idea to all matter via the de Broglie hypothesis and so the Schrodinger equation is a wave equation that describes how matter waves evolve. For an excellent tutorial on wave equations, see here.

Note that it is important to distinguish between the quantum state and the wave function. A wave function is a representation of a quantum state in a particular basis, such as position or momentum. This post presents the wave function in the position basis (as plane wave states of definite momentum). It can be alternatively represented in the momentum basis via a Fourier transform.

[3] Given the plane wave solution Ψ = Ae^(i(px - Et)/ħ), the Schrodinger equation can be derived. The time derivative ∂Ψ/∂t is how the wave function changes over time and is (-iE/ħ)Ψ. The first spatial derivative ∂Ψ/∂x is how the wave function slopes in space and is (ip/ħ)Ψ. The second spatial derivative ∂²Ψ/∂x² is how the wave function curves in space and is (i²p²/ħ²)Ψ which reduces to (-p²/ħ²)Ψ.

Multiplying the time derivative by iħ gives iħ ∂Ψ/∂t = iħ(-iE/ħ)Ψ = EΨ. Multiplying the second spatial derivative by -ħ²/2m gives (-ħ²/2m)∂²Ψ/∂x² = (-ħ²/2m)(-p²/ħ²)Ψ = (p²/2m)Ψ = EΨ (E = p²/2m relates kinetic energy to momentum).

Therefore (-ħ²/2m) ∂²Ψ/∂x² = iħ ∂Ψ/∂t which is the time-dependent Schrodinger equation for a free particle in one dimension. The equation for any non-relativistic particle is described in Figure 3 below. Note that ∇² represents the second derivative over all space (x,y,z) and that V represents the potential energy which, for free particles, is zero.

Figure 3 - Time-dependent Schrodinger equation for a single non-relativistic particle
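The free-particle version of the equation can be checked numerically for our example plane wave (momentum 40, mass 5, energy 160). This sketch assumes ħ = 1 and approximates the derivatives with finite differences, so the two sides agree only up to a small numerical error. The helper names are mine.

```python
import cmath

HBAR = 1.0                 # assumption: units chosen so that ħ = 1
M = 5.0                    # mass
P = 40.0                   # momentum
E = P ** 2 / (2 * M)       # kinetic energy, 160

def psi(x, t):
    """Plane wave with A = 1."""
    return cmath.exp(1j * (P * x - E * t) / HBAR)

def lhs(x, t, h=1e-6):
    """iħ ∂Ψ/∂t via a central difference in time."""
    return 1j * HBAR * (psi(x, t + h) - psi(x, t - h)) / (2 * h)

def rhs(x, t, h=1e-4):
    """(-ħ²/2m) ∂²Ψ/∂x² via a second central difference in space."""
    second = (psi(x + h, t) - 2 * psi(x, t) + psi(x - h, t)) / h ** 2
    return -(HBAR ** 2 / (2 * M)) * second

x, t = 0.25, 0.1           # arbitrary sample point
residual = abs(lhs(x, t) - rhs(x, t))
```

Both sides come out very close to 160Ψ, so `residual` is tiny compared with 160.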

The Schrodinger equation expresses the principle of the conservation of energy consistent with the de Broglie relations. That is, the kinetic energy of the particle (which is proportional to the curvature of Ψ over space) plus the potential energy of the particle equals the total energy (which is proportional to the slope of Ψ over time).

The time-dependent Schrodinger equation can be more generally expressed as iħ ∂Ψ/∂t = ĤΨ where Ĥ is called the Hamiltonian operator (representing the total energy of the system) and differs with the situation or number of particles. In our free particle scenario where the potential energy is zero, Ĥ = (-ħ²/2m) ∂²/∂x².

The simpler time-independent Schrodinger equation applies to stationary states and is ĤΨ = EΨ where E is the total energy of the system. This is an eigenvalue equation which means that the Hamiltonian operates on the function Ψ and produces a definite (and real) energy value E multiplied by the same function Ψ. If Ψ describes the physical system and satisfies the eigenvalue equation (meaning it is an eigenfunction), then that energy eigenvalue would be measured with 100% certainty. In general, Ψ will not be an eigenfunction of the Hamiltonian but, instead, will be a linear superposition of energy eigenfunctions (with the probability of measuring a particular energy eigenvalue being the squared magnitude of the amplitude of that eigenfunction, per the Born rule).

[4] Combining a finite number of plane waves also fails to avoid the problem of the particle being delocalized since the large wave packets will still be periodic through spacetime (with other smaller periodic wave packets in between). It is only in the limit that there is a single wave packet as other wave packets would, in effect, be infinitely far away. That is, an integral over a continuous range of wave numbers (or momenta, since p=kħ) produces a single localized wave packet.
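The periodicity is easy to see in a sketch. With plane waves spaced dk apart in wave number, the summed packet repeats every 2π/dk along the x-axis. The parameter values below are arbitrary choices of mine, made coarse on purpose so that the repeat distance is short.

```python
import cmath
import math

def finite_packet(x, k0=5.0, sigma=1.0, dk=0.5, half_width=8):
    """Sum a finite set of Gaussian-weighted plane waves e^(ikx) at t = 0."""
    total = 0j
    for j in range(-half_width, half_width + 1):
        k = k0 + j * dk
        weight = math.exp(-((k - k0) ** 2) / (2 * sigma ** 2))
        total += weight * cmath.exp(1j * k * x)
    return total

period = 2 * math.pi / 0.5                      # repeat distance for dk = 0.5
at_origin = abs(finite_packet(0.0))             # main packet
at_repeat = abs(finite_packet(period))          # copy of the packet, one period on
in_between = abs(finite_packet(period / 2))     # nearly cancelled, midway
```

Shrinking dk pushes the copies further apart, which is the sense in which they end up infinitely far away in the continuous limit.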

[5] Figure 2 (enlarged below) is a snapshot of a localized particle's wave function at an instant in time. Per Euler's formula, e^(i(px - Et)/ħ) = cos((px - Et)/ħ) + i sin((px - Et)/ħ). So the complex spiral is the sum of the real cosine wave (at the back) and the imaginary sine wave (at the bottom) propagating along the x-axis. Each white ball represents the amplitude of the complex wave at that particular x-position (imagine the clock hand pointing laterally from the x-position on the x-axis to the white ball). Instead of visualizing a ball moving with the wave packet along the x-axis as time progresses, imagine that it remains at the same x-position, but simply spins around the x-axis in the complex plane, shrinking or expanding in magnitude as time progresses (i.e., as the wave packet propagates through that x-position).

Now imagine that the white ball is actually a linear combination of colored balls at that x-position, one for each plane wave in the superposition (and each with a different magnitude and phase). Each colored ball simply spins around the x-axis with a fixed magnitude but the colored balls taken together constructively and destructively interfere to produce the white ball that is seen in the image as the wave packet propagates through that x-position. That is, the entire propagating wave packet can be explained as a combination of fixed length clock hands spinning at different rates.

Figure 2 (enlarged) - Wave packet

[6] The probability that a measurement will find the particle at a particular x-position is calculated by squaring the magnitude of the wave function's amplitude at that particular x-position and time per the Born Rule. The amplitude is a complex number that, when multiplied by its complex conjugate, produces a real number that is the square of the magnitude. The squared magnitude is also the intensity of the wave function at that position and time.
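For a concrete case, take a made-up amplitude of 3 + 4i at some x-position. The sketch below computes its squared magnitude both ways. The value is illustrative only and is not normalized, so the result is a relative weight rather than an actual probability.

```python
amplitude = 3 + 4j                                   # hypothetical complex amplitude

by_conjugate = (amplitude * amplitude.conjugate()).real   # (3 + 4i)(3 - 4i)
by_magnitude = abs(amplitude) ** 2                        # length of the clock hand, squared
```

Both give 25: the imaginary parts cancel when the amplitude is multiplied by its conjugate, leaving a real number.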

[7] Ψ is a combination of different momentum basis states e^(i(p_n x - E_n t)/ħ), each with its own coefficient (amplitude) A_n where n is the index for the basis state.