Saturday, 11 February 2023

Locality and quantum mechanics

Until now, I have concentrated on trying to free quantum mechanics, as far as possible, from reference to measurement, but quantum mechanics also has a problem with locality. It is worth remembering first that classical mechanics also had a locality problem. This is exemplified by Newton's theory of gravity and later by Coulomb's law of electrostatic attraction and repulsion. In both cases any local change in mass or charge, whether in magnitude or position, had an instantaneous effect everywhere. There was no mechanism in the physics for the propagation of the effect. The solution was found first for electricity in combination with magnetism. Faraday proposed the existence of a field, and the mathematical formulation of this concept by Maxwell led to classical electromagnetic theory and provided a propagation mechanism. 

The success of electromagnetic theory brought to the fore two problems with classical dynamics. Newtonian dynamics did not follow the same space and time transformation rules as electromagnetic theory, and there was still no mechanism for the propagation of gravitational effects. As is well known, Einstein resolved both anomalies, first with his special and then with his general theory of relativity. 

By the time the general theory of relativity was formulated it was evident that classical theory had a further deep problem: it could not explain atomic and other microscopic phenomena. Solutions were at first found for specific situations. Max Planck introduced his constant \(h\) to resolve the ultraviolet catastrophe in the black-body radiation spectrum through energy quantisation. The same constant came to be fundamental in explaining atomic energy levels, the photoelectric effect and, more generally, the quantisation of action.

Quantum theory took shape in the 1920s with the rival formulations of Heisenberg and Schrödinger (with much help from others), both agreeing with experiment and later shown to be formally equivalent. The space-time symmetry of special relativity was also built into an equation for the electron proposed by Dirac, which in turn implied the existence of antimatter. But a fully relativistic quantum mechanics remains a research topic. 

To combine particle theory with electromagnetism, quantum electrodynamics was developed. The theory was remarkably successful in its empirical confirmation but relied on some dubious mathematical manipulations. To deal with this, the mathematical foundations of quantum field theory were examined. It is at this point that the first type of locality we are going to consider appears in quantum theory in mathematically precise form.

Causal locality

A basic characteristic of physics in the context of special and general relativity is that causal influences on a Lorentzian spacetime manifold propagate in time-like or light-like directions but not space-like ones. Space-like separated points in space-time lie outside each other's light cones, which means that no influence can pass from one to the other. 

A further way of considering causality is that influences only propagate into the future along time-like and light-like directions, but this is not simple to deal with in either classical special relativity or standard quantum mechanics because of their time-reversal symmetry. One approach would be to treat irreversible processes through coarse-grained entropy, as in statistical physics. But this seems more like a mathematical trick: it treats irreversibility as a consequence of our lack of access to the detailed microscopic reversible dynamics. That is, as an illusion. A more fundamental approach is to develop new physics, as is being attempted by Fröhlich [1] and, hopefully, in this blog.

To return to Einstein causality: any two space-like separated regions of spacetime should behave like independent subsystems. This causal locality is, with a slightly stronger technical definition, Einstein causality. When adopted in relativistic quantum theory (the algebraic theory), this concept of locality implies that space-like separated local self-adjoint operators commute. This is sometimes known as microcausality: causal locality at the atomic level and below.

In quantum theory, where operators represent physical quantities, the microcausality condition requires that operators pertaining to two points of space-time commute if those points cannot be linked by light propagation. This commutation means, as in standard quantum mechanics, that the physical quantities to which these operators correspond can be precisely determined locally, independently, and simultaneously. However, standard quantum theory and the non-relativistic alternatives discussed so far in this blog have no natural definition of an operator that is local in space-time. For example, the position operator is not located at any point in space; the points in space are held as potential values in the quantum state, which is represented mathematically by the density matrix. How these potential values become actual is dealt with in standard quantum mechanics by the Born criterion, which is, however, tied to measurement situations. Removing this dependence on measurement situations is a major aim of this blog, and we will see that measurement need only be invoked when discussing how the various forms of locality and non-locality come to be known about.

Just as the introduction of classical fields cured Newtonian dynamics of action at a distance, and eventually modified and then replaced it with general relativity, the development of quantum field theory could cure standard quantum mechanics of its causal locality problem. Local quantum theory, as set out in the book by Haag [2], tackles this challenge. The technical details involved are too advanced to deal with here.

Although dealing with these questions coherently within non-relativistic quantum theory is not strictly valid, it is possible to explore specific examples. Following Fröhlich [1], it is natural to consider the spin of a particle to be local to that particle. Therefore the spin operators, whether represented by Pauli matrices or by projection operators onto states associated with subsets of the spin spectrum, can be assigned unambiguously to one particle or the other. 

In a situation where two particles are prepared so that they propagate in opposite directions, their local interactions with other entities will eventually be space-like separated. The spin operators of one particle commute with the spin operators of the other, so the local interaction of one cannot be influenced by the local interaction of the other. This is a specific example of microcausality.
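This commutation property is easy to check concretely. In the sketch below (Python with numpy, which is my own illustrative addition, not part of Fröhlich's discussion), a spin operator of particle 1 acts on the two-particle space as \(\sigma \otimes I\) and one of particle 2 as \(I \otimes \sigma\); any such pair commutes, while different components on the same particle do not:

```python
import numpy as np

# Pauli matrices for a single spin-1/2 particle
sx = np.array([[0, 1], [1, 0]], dtype=complex)
sz = np.array([[1, 0], [0, -1]], dtype=complex)
I2 = np.eye(2, dtype=complex)

# Spin operators assigned unambiguously to particle 1 or particle 2
# act on the two-particle Hilbert space as tensor products with the identity.
S1z = np.kron(sz, I2)   # sigma_z of particle 1
S2x = np.kron(I2, sx)   # sigma_x of particle 2
S1x = np.kron(sx, I2)   # sigma_x of particle 1

# Operators local to different particles commute, whichever components we pick.
assert np.allclose(S1z @ S2x - S2x @ S1z, 0)

# Local operators on the SAME particle generally do not commute.
assert not np.allclose(S1z @ S1x - S1x @ S1z, 0)
```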

But what if the preparation of the two particles entangles their quantum states? This entanglement may persist over any subsequent separation, provided the particles do not first interact with other particles or fields. 

We note that entanglement is a state property whereas microcausality is an operator property and proceed to a discussion of entanglement and its consequences in a developed version of the two-particle example.

Entanglement and non-locality

The two-particle example we have been discussing only needs the introduction of a local spin measurement mechanism for each particle for it to become the version of the Einstein, Podolsky, Rosen thought experiment formulated by David Bohm [3]. This post will follow Bohm's mathematical treatment closely but will avoid, as far as possible, invoking the results of measurements. Bohm's discussion follows the Copenhagen interpretation but also uses the concept of potentiality as developed by Heisenberg [4].

The system in this example consists of experimental setups (described below) for two atoms (\(1\) and \(2\)) with spin \(1/2\) (up/down, or \(\pm\hbar/2\)). The \(z\)-direction spin aspect of the state of the total system is spanned by four basic wavefunctions
$$ | a> = |+,z,1>|+,z,2> $$
$$ | b> = |-,z,1>|-,z,2> $$
$$ | c> = |+,z,1>|-,z,2> $$
$$ | d> = |-,z,1>|+,z,2>. $$
It will be shown below that, although the choice of the \(z\) direction is convenient, the results of the analysis do not depend on it.
  
If the total system is prepared in a zero-spin state, then it is represented by the linear combination
$$ \tag{1} | 0> = |c> - |d>.$$    
This correlation of the spin states of the particles is an example of quantum entanglement.                                 
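One can verify numerically that \(|c> - |d>\) is indeed a total-spin-zero state: every component of the total spin annihilates it. A minimal sketch, assuming Python with numpy (the normalisation factor \(1/\sqrt{2}\) is added here for convenience and units with \(\hbar = 1\) are used):

```python
import numpy as np

up = np.array([1, 0], dtype=complex)    # |+,z>
dn = np.array([0, 1], dtype=complex)    # |-,z>

# Basis states |c> = |+,z,1>|-,z,2> and |d> = |-,z,1>|+,z,2>
c = np.kron(up, dn)
d = np.kron(dn, up)

# Entangled zero-spin state |0> = |c> - |d>, normalised here for convenience
singlet = (c - d) / np.sqrt(2)

# Total spin components S = (1/2)(sigma_1 + sigma_2) in units with hbar = 1
paulis = [np.array(m, dtype=complex) for m in
          ([[0, 1], [1, 0]], [[0, -1j], [1j, 0]], [[1, 0], [0, -1]])]
I2 = np.eye(2, dtype=complex)

for s in paulis:
    total = 0.5 * (np.kron(s, I2) + np.kron(I2, s))
    # The singlet is annihilated by every component of the total spin.
    assert np.allclose(total @ singlet, 0)
```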
Each particle also has, associated with its spin state, a wavefunction that describes its motion and position. These spatial wavefunctions will not be shown explicitly here, but they are important conceptually because the particles always move away from each other. The description of the thought experiment is completed by each particle undergoing a Stern-Gerlach experimental interaction in space-like separated regions of space-time, as shown below.
Two space-like separated Stern-Gerlach interaction situations.


The detecting screen is part of an experimental setup that is needed for confirming the predictions of the theory but not for the physics of the effects. Here we are primarily concerned with the interaction of the particles with the magnetic field \(\mathfrak{H}\). The component of the system Hamiltonian for the interaction of the particle spins with the magnetic field is, from Bohm [3],
$$ \mathcal{H}= \mu (\mathfrak{H}_0 + z_1 \mathfrak{H}'_0 )\sigma_{1,z} +\mu (\mathfrak{H}_0 + z_2 \mathfrak{H}'_0 )\sigma_{2,z} $$
where \(\mu = \frac{e \hbar}{2mc} \), \(\mathfrak{H}_0 = (\mathfrak{H}_z)_{z=0}\) and \(\mathfrak{H}'_0 =(\frac{\partial \mathfrak{H}_z}{\partial z})_{z=0}\).  \(m\) and \(e\) are the electron mass and charge, and \(c\) is the speed of light in vacuum. We also assume the magnetic fields have the same strength and spatial form in both regions, but this is not essential. It is also assumed that each particle interacts with its own local magnetic field at the same time. This is not a limiting assumption, but it is essential to assume that the time of the interaction is short enough for the local space-time regions to remain space-like separated.

The Schrödinger equation can now be solved for a wavefunction of the form
$$ |\psi> = f_c |c> + f_d |d>$$
with initial conditions given by equation (1). Once the particles have passed through the region of non-zero magnetic field strength, the result is
$$f_c = \frac{1}{\sqrt{2}}e^{-i \frac{\mu\mathfrak{H}'_0}{\hbar}(z_1-z_2) \Delta t} $$
and
$$f_d = - \frac{1}{\sqrt{2}}e^{i \frac{\mu\mathfrak{H}'_0}{\hbar}(z_1-z_2) \Delta t}. $$
where \(\Delta t\) is the time it takes for the particles to pass through the magnetic field.
Inserting the above results into the equation for \(|\psi>\) gives the post interaction wavefunction
$$ |\psi>=\frac{1}{\sqrt{2}}e^{-i \frac{\mu\mathfrak{H}'_0}{\hbar}(z_1-z_2) \Delta t} |c> - \frac{1}{\sqrt{2}}e^{i \frac{\mu\mathfrak{H}'_0}{\hbar}(z_1-z_2) \Delta t}|d>.$$
Therefore, for a system prepared with total spin zero undergoing local interactions in space-like separated regions, as shown in the figure, there is equal probability for each particle to be deflected either up or down. However, because of the correlation, when one is deflected up the other is deflected down. 
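The probability structure can be checked directly from the post-interaction wavefunction. In the sketch below (Python with numpy; the numerical value chosen for the accumulated phase is arbitrary and purely illustrative) the joint outcome probabilities show the equal up/down chances for each particle and the perfect anti-correlation:

```python
import numpy as np

up, dn = np.array([1, 0], complex), np.array([0, 1], complex)
c, d = np.kron(up, dn), np.kron(dn, up)

# phi stands for (mu * H'_0 / hbar) * (z1 - z2) * Delta_t; its value is arbitrary.
phi = 1.7
psi = (np.exp(-1j * phi) * c - np.exp(1j * phi) * d) / np.sqrt(2)

# Probability of each joint outcome in the z basis.
basis = {('+', '+'): np.kron(up, up), ('+', '-'): np.kron(up, dn),
         ('-', '+'): np.kron(dn, up), ('-', '-'): np.kron(dn, dn)}
probs = {k: abs(np.vdot(v, psi))**2 for k, v in basis.items()}

# Equal probability 1/2 for (up, down) and (down, up); aligned outcomes never occur.
assert np.isclose(probs[('+', '-')], 0.5)
assert np.isclose(probs[('-', '+')], 0.5)
assert np.isclose(probs[('+', '+')], 0.0)
assert np.isclose(probs[('-', '-')], 0.0)
```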

This may seem unsurprising because the total spin is prepared to be zero. No more surprising than taking a green card and a blue card, putting them in identical envelopes, shuffling them and then giving one to a friend to take far away. Opening the envelope you kept and seeing a green card means that the distant envelope contains a blue card. This, clearly, is not a non-local influence.

However, this is not the end of the story. As mentioned above, there is nothing special about the \(z\) direction of spin. The same analysis can be carried out with \(x\)-direction states, as follows 
$$ | a'> = |+,x,1>|+,x,2> $$
$$ | b'> = |-,x,1>|-,x,2> $$
$$ | c'> = |+,x,1>|-,x,2> $$
$$ | d'> = |-,x,1>|+,x,2> $$ 
and again, the zero total spin state is
$$\tag{2} | 0'> = |c'> - |d'>.$$
Using the standard spin-state relations (valid for both particles 1 and 2, by introducing the appropriate tag; see Bohm [3])

\( |+,x> = \frac{1}{\sqrt{2}}(|+,z> + |-,z>)\) and \( |-,x> = \frac{1}{\sqrt{2}}(|+,z> - |-,z>)\)

Inserting into equation (2), with some algebra, it can be shown that 
$$ |0'> = |0>. $$
Therefore, if the Stern-Gerlach setup is rotated to measure the \(x\) component of spin, exactly the same analysis can be carried out as for the \(z\) component giving the same anti-correlation effect. It must be stressed that we are discussing physical effects and not the results of experiments or the experimenter's knowledge of events at this point.
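The algebra behind \(|0'> = |0>\) can also be checked numerically. With the sign conventions of the spin-state relations quoted above, the two combinations agree up to an overall phase, which represents the same physical state. A sketch assuming Python with numpy:

```python
import numpy as np

upz, dnz = np.array([1, 0], complex), np.array([0, 1], complex)

# x-basis states written in terms of z-basis states (Bohm [3])
upx = (upz + dnz) / np.sqrt(2)
dnx = (upz - dnz) / np.sqrt(2)

# Zero-spin combination |c> - |d> in the z basis and |c'> - |d'> in the x basis
zero_z = np.kron(upz, dnz) - np.kron(dnz, upz)
zero_x = np.kron(upx, dnx) - np.kron(dnx, upx)

# The two vectors are proportional with a unimodular factor (here a sign,
# with these conventions), so they describe the same physical state.
overlap = np.vdot(zero_x, zero_z)
assert np.isclose(abs(overlap), np.linalg.norm(zero_x) * np.linalg.norm(zero_z))
```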

In general, there is no reason for the two space-like separated setups to be chosen in the same direction. If the choice is effectively random, then when the directions of interaction do not coincide there will be no correlation between the outcomes, but if they happen to be in the same direction there will be the \(\pm\) anti-correlation. Locally, the spin operators for the \(x\), \(y\) and \(z\) directions do not commute. Their values are potential rather than actual and remain non-actual after the interactions. The situation is not like the classical coloured cards in envelopes example: there is no direction of spin fixed by the initial state preparation. Indeed, that would be inconsistent with a total-spin-zero state preparation. What the interaction does is choose a \(\sigma\)-algebra from the local \(\sigma\)-complex, but the spin state of the system remains entangled.
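The dependence of the correlation on the relative orientation of the two setups can be made quantitative: for the singlet state the expectation value of the product of spin components is \(-1\) for parallel directions and \(0\) for orthogonal ones. A sketch assuming Python with numpy:

```python
import numpy as np

sx = np.array([[0, 1], [1, 0]], complex)
sz = np.array([[1, 0], [0, -1]], complex)

up, dn = np.array([1, 0], complex), np.array([0, 1], complex)
singlet = (np.kron(up, dn) - np.kron(dn, up)) / np.sqrt(2)

def corr(op1, op2):
    """Expectation <0| op1 (x) op2 |0> of a joint spin component in the singlet."""
    return np.vdot(singlet, np.kron(op1, op2) @ singlet).real

# Same direction at both setups: perfect anti-correlation.
assert np.isclose(corr(sz, sz), -1.0)
# Orthogonal directions: no correlation between the outcomes.
assert np.isclose(corr(sz, sx), 0.0)
```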

As far as local effects are concerned, each particle behaves as expected for a spin \(1/2\) particle. This is causal locality. It is only if someone gets access to a sequence of measurements from both regions (here is the only place where detection enters this description of the physics of this situation) that the anti-correlation effect can be confirmed. 

The effect depends on the preparation of the initial total system state. There is persistent correlation across any distance, just as in the green and blue card example, but it is mysterious because the initial state does not hold an actual value of each spin component for each particle, unlike the card example in which the colours are actual from the start. There is no way for one particle to be influenced by the choice of direction of measurement in the region where the other particle is, but a correlation of potentiality persists that depends on the details of the total quantum state.

It is perhaps too early to simply accept that there are non-causal, non-classical correlations of potentialities between two space-like separated regions.  That would be a quantum generalisation of the blue and green card example. What the theory does predict is that the effect due to entanglement is not just epistemic but physical once potentiality is accepted as an aspect of the ontology.

References

\(\mbox{[1] }\) Fröhlich, J. (2021). Relativistic Quantum Theory. In: Allori, V., Bassi, A., Dürr, D., Zanghi, N. (eds) Do Wave Functions Jump? Fundamental Theories of Physics, vol 198. Springer, Cham. https://doi.org/10.1007/978-3-030-46777-7_19
\(\mbox{[2] }\) Haag, R. (1996). Local Quantum Physics: Fields, Particles, Algebras, 2nd revised edition, Springer Verlag
\(\mbox{[3] }\) Bohm, D. (1951). Quantum Theory, Prentice Hall
\(\mbox{[4] }\) Heisenberg, W (1958). Physics and Philosophy: The Revolution in Modern Science. New York: Harper.

Tuesday, 24 January 2023

Prediction, indeterminacy, and randomness

The previous post introduced the ETH (Events, Trees and Histories) approach to the foundations and completion of quantum theory. This post addresses why it is impossible to use a physical theory to predict the future and why quantum mechanics is probabilistic although the Schrödinger equation is deterministic.

Prediction uses available information to gain knowledge about what will happen. It is an epistemic concept. The impossibility of prediction does not mean that the world is not deterministic, nor that it is not governed by probabilistic laws, nor even that it is not governed by any laws at all. However, if we could predict the future reliably, this would be evidence that the world is deterministic. We will examine Fröhlich's [1] arguments.

Impossibility of prediction

It is always possible for someone to make a series of successful guesses about the future, but what we are considering here is prediction based on available information and physical theory. To predict an event we must know everything that can affect it: what the event is and when and where it takes place. Fröhlich's [1] argument is simple and is illustrated by the diagram below.

The diagram uses the standard light-cone representation of space-time. The Future is at the top and the Past at the bottom. The predictor sits at the Present and has, in principle, access to all information in their Past light-cone. That is, they can access information on all past events but no information on what happens outside their Past light-cone. There is causal structure within the Past light-cone. Events 1 and 2 are space-like separated; they cannot influence each other. Event 3 is in the future of 2, so event 2 can influence 3. Of significance for the argument are the events outside the predictor's light-cones (Future and Past) that are in the past light-cone of the predicted event. These events can influence what happens at the space-time point of the predicted event, but the predictor can have no knowledge of them. Therefore, the predictor cannot, in principle, predict what will happen at a future space-time point. In practice it is common knowledge that prediction works in many situations. Those situations are controlled so as to isolate the predicted phenomenon from influences outside the predictor's control.

Indeterminacy of quantum mechanics

The impossibility of reliable prediction does not imply indeterminacy.

Consider an isolated system \(S\); that is, one whose evolution over a period of time is independent of the rest of the universe. It is only for isolated physical systems that we know how to describe the time evolution of operators representing physical quantities in the Heisenberg picture (in terms of the unitary propagation of the system). 

In the Heisenberg picture, states of \(S\) are given by density matrices, \(\rho\), acting on a separable Hilbert space, \(\mathcal{H}\), of pure state vectors of \(S\), as in the mathematical formulation presented previously. Let \(\hat{X}\) be a physical property of \(S\), and let \(X(t) = X(t)^∗\) be the self-adjoint linear operator on \(\mathcal{H}\) representing \(\hat{X}\) at time \(t\). Then the operators \(X(t)\) and \(X(t')\), representing \(\hat{X}\) at two different times \(t\) and \(t'\) respectively, are related by a unitary transformation:
$$ X(t) = U(t', t) X(t') U(t,t') $$
where, for each pair of times \(t, t'\), \(U(t, t')\) is the propagator (from \(t'\) to \(t\)) of the system \(S\), which is a unitary operator acting on \(\mathcal{H}\), and the family \(\{U(t,t')\}_{t,t' \in \mathbb{R}}\) satisfies
$$ U(t, t') \, U(t', t'') = U(t, t''), \ \forall \ t, t', t'', \qquad U(t, t) = 1, \ \forall \ t. $$
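These propagator properties are straightforward to illustrate for a toy system. In the sketch below (Python with numpy; the Hamiltonian and its eigenvalues are hypothetical, chosen only so that the matrix exponential is trivial) the composition law holds and the Heisenberg evolution of an observable preserves its spectrum of possible values:

```python
import numpy as np

# Time-independent diagonal Hamiltonian, so the propagator
# U(t, t') = exp(-i H (t - t') / hbar) is easy to write down (hbar = 1 here).
E = np.array([0.5, -0.5])             # hypothetical energy eigenvalues
def U(t, tp):
    return np.diag(np.exp(-1j * E * (t - tp)))

t, tp, tpp = 0.3, 1.1, 2.4
# Composition law U(t,t') U(t',t'') = U(t,t'') and U(t,t) = 1.
assert np.allclose(U(t, tp) @ U(tp, tpp), U(t, tpp))
assert np.allclose(U(t, t), np.eye(2))

X = np.array([[0, 1], [1, 0]], complex)   # sigma_x as a sample observable
Xt = U(tp, t) @ X @ U(t, tp)              # X(t) = U(t',t) X(t') U(t,t')
# Unitary evolution preserves the spectrum (the possible values) of X.
assert np.allclose(sorted(np.linalg.eigvalsh(Xt)), [-1, 1])
```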
However, in the Copenhagen interpretation, whenever a measurement is made at some time \(t\), say, the deterministic unitary evolution of the state of \(S\) in the Schrödinger picture is interrupted, and the state collapses into an eigenspace of the self-adjoint operator representing the physical quantity that is measured, with the probabilities over those eigenspaces given by Born’s Rule. This is what we have previously called the selection of a \(\sigma\)-algebra from the \(\sigma\)-complex of \(S\).

I am working towards a formulation of quantum mechanics that does not give a special status to measurement or observers. In the post Modal categories and quantum chance, Born's rule is invoked as follows:
The probability measure describes a contingent mode of being for the quantum system with a spectrum of values that are possible and become actual. What is missing is an understanding of the timing of the actualisation. In all the versions of quantum theory considered so far, time behaves the same way as in classical physics.

While the question of timing is still to be resolved, quantum mechanics should be a theory that incorporates random events that are not derived from the deterministic evolution of the state. However, that evolution does govern the probabilities with which the property values associated with the spectra of the self-adjoint operators representing the physical properties of \(S\) become actual. 

It should be noted that Bohmian mechanics provides an alternative model in which the uncertainty of the outcome is due to uncertainty in the initial conditions for the dynamical evolution. In the Bohmian theory the uncertainty is epistemic, and the dynamics is deterministic. The Bohmian theory needs to introduce this uncertainty into an otherwise deterministic theory to obtain empirical equivalence to standard quantum mechanics.

The formulation of the quantum mechanics presented in this blog is, so far, consistent with the ETH approach [1]. Some aspects of special relativity have now been introduced even though a relativistic formulation of quantum mechanics has not been presented yet. This will be part of the formulation of a mathematical description of the quantum "event".


References

\(\mbox{[1] }\) Fröhlich, J. (2021). Relativistic Quantum Theory. In: Allori, V., Bassi, A., Dürr, D., Zanghi, N. (eds) Do Wave Functions Jump? Fundamental Theories of Physics, vol 198. Springer, Cham. https://doi.org/10.1007/978-3-030-46777-7_19

Wednesday, 11 January 2023

The Quantum Mechanics of Events, Histories and Trees

It is time to return to quantum mechanics. The approach I have been developing is a generalised probability theory where the quantum state sits on a complex of probability spaces. In my review post of October 2022 I referred to the work of Fröhlich and colleagues and their search for a fundamental theory of quantum mechanics. They call it ETH (Events, Trees, and Histories). Theirs is also an approach that proposes that quantum mechanics is fundamentally probabilistic and that it describes events and not just measurements. So, over several posts, I will work through their theory to learn how some of the gaps in my own approach may be addressed. The picture below gives an early indication of how the concept of possibilities fits into the ETH scheme. An event is identified with the realisation of a possibility.


Illustration of ETH - Events, trees, and histories

It has been a theme of my posts to try and clarify the philosophical fundamentals, especially ontology, associated with a physical theory. So, I will start the review of ETH with the introduction to a paper in which Fröhlich sets out his "credo" for his endeavour [1]. His credo is:

  1. Talking of the “interpretation” of a physical theory presupposes implicitly that the theory has reached its final form, but that it is not completely clear, yet, what it tells us about natural phenomena. Otherwise, we had better speak of the “foundations” of the theory. Quantum Mechanics has apparently not reached its final form, yet. Thus, it is not really just a matter of interpreting it, but of completing its foundations.
  2. The only form of “interpretation” of a physical theory that I find legitimate and useful is to delineate approximately the ensemble of natural phenomena the theory is supposed to describe and to construct something resembling a “structure-preserving map” from a subset of mathematical symbols used in the theory that are supposed to represent physical quantities to concrete physical objects and phenomena (or events) to be described by the theory. Once these items are clarified the theory is supposed to provide its own “interpretation”. (A good example is Maxwell’s electrodynamics, augmented by the special theory of relativity.)
  3. The ontology a physical theory is supposed to capture lies in sequences of events, sometimes called “histories”, which form the objects of series of observations extending over possibly long stretches of time and which the theory is supposed to describe.
  4. In discussing a physical theory and mathematical challenges it raises it is useful to introduce clear concepts and basic principles to start from and then use precise and – if necessary – quite sophisticated mathematical tools to formulate the theory and to cope with those challenges.
  5. To emphasize this last point very explicitly, I am against denigrating mathematical precision and ignoring or neglecting precise mathematical tools in the search for physical theories and in attempts to understand them, derive consequences from them and apply them to solve concrete problems.
 where I have added the numbering for easy reference. Let's take them one by one.

  1. I agree completely with this comment, although I may have lapsed occasionally into using the term "interpretation" loosely. So, a possibility to be investigated is that, in addition to the standard formulation of quantum mechanics, there may be an additional stochastic process that describes event histories.
  2. The use of "interpreted" in the second paragraph has to do with the scope and meaning of the theory. We have the physical quantities that are to be described or explained, their mathematical representation, and then the various theoretical structures that can make use of these quantities in their mathematical representation. In this way it should be clear from the outset what the intended theory is about: things and their mathematical representation.
  3. This statement poses more of a problem. For me, the ontology has more to do with the concerns in the previous paragraph. While I agree that there are events, there must be physical quantities that participate in these events. These quantities must also form part of the ontology. For example, atoms may be made up of more fundamental particles. The atoms and the more fundamental particles are part of the ontology, and it is part of the structure of the ontology that atoms are made up of electrons, protons, and neutrons. Neutrons and protons are made up of still more fundamental particles. I would also include fields and the possible states of the physical objects in the ontology.
  4. Again, I agree. Feynman is known to have said that doing science is to stop us fooling ourselves. He was thinking primarily of comparing predictions with the outcomes of experiments. However, mathematics also plays this role. By formulating the mathematics of a theory rigorously and strictly following its consequences, we can avoid introducing implicit assumptions that make things work out "all right" when they should not. When we do get disagreement with experiment, we can then be sure that it is the initial assumptions about objects or their mathematical representation that are at fault.
  5. Fröhlich's reputation is as an especially rigorous mathematical physicist, and not only philosophers but also many physicists take such a rigorous approach to the mathematics to be rigour for rigour's sake. While I do not claim his skills, I am more than happy to try and learn from an approach that emphasises precise mathematics.

Within this "credo" Fröhlich and collaborators address:

  1. Why it is fundamentally impossible to use a physical theory to predict the future.
  2. Why quantum mechanics is probabilistic.
  3. The clarification of "locality" and "causality" in quantum mechanics.
  4. The nature of events.
  5. The evolution of states in space-time.
  6. The nature of space-time in quantum mechanics.
We will work our way through these topics in upcoming posts.

Reference
  1. Fröhlich, J. (2021). Relativistic Quantum Theory. In: Allori, V., Bassi, A., Dürr, D., Zanghi, N. (eds) Do Wave Functions Jump? Fundamental Theories of Physics, vol 198. Springer, Cham. https://doi.org/10.1007/978-3-030-46777-7_19

Monday, 9 January 2023

Conditional probability: Renyi versus Kolmogorov

Four years ago, I wrote about Renyi's axiomatisation of probability which, in contrast to that of Kolmogorov, takes conditional probability as the fundamental concept. It is timely to revisit the topic given my last post on Kolmogorov's axioms. In addition, Suarez (whose latest book was also discussed in my last post) appears to endorse the Renyi axioms over those of Kolmogorov, although only in a footnote. Stephen Mumford, Rani Lill Anjum and Johan Arnt Myrstad, in Chapter 6 of the book What Tends to Be, also follow their analysis of conditional probability in the Kolmogorov axiomatisation by taking the view that conditional probabilities should not be reducible to absolute probabilities.  

In his Foundations of Probability (1969) Renyi provided an alternative axiomisation to that of Kolmogorov that takes conditional probability as the fundamental notion, otherwise he stays as close as possible to Kolmogorov. As with the Kolmogorov axioms, I shall replace reference to events with possibilities. 

Renyi's conditional probability space \((\Omega, \mathfrak{F}, \mathfrak{G}, P(F | G))\) is defined as follows. 

The set \(\Omega\) is the space of elementary possibilities and \(\mathfrak{F}\) is a \(\sigma\)-field of subsets of \(\Omega\) and \(\mathfrak{G}\), a subset of \(\mathfrak{F}\) (called the set of admissible conditions) having the properties:
(a) \( G_1, G_2 \in \mathfrak{G} \Rightarrow G_1 \cup G_2 \in \mathfrak{G}\),
(b) \(\exists \{G_n\}\),  a sequence in \(\mathfrak{G}\), such that \(\cup_{n=1}^{\infty} G_n = \Omega,\)
 (c) \(\emptyset \notin \mathfrak{G}\),
\(P\) is the conditional probability function satisfying the following four axioms.
R0. \( P : \mathfrak{F} \times \mathfrak{G} \rightarrow [0, 1]\),
R1. \( (\forall G \in \mathfrak{G} ) , P(G | G) = 1.\)
R2. \((\forall G \in \mathfrak{G}) , P(\centerdot | G)\) , is a countably additive measure on \(\mathfrak{F}\).
R3. For all \(G_1, G_2 \in \mathfrak{G}\) with \(G_2 \subseteq G_1\) and \(P(G_2 | G_1) > 0\), 
$$(\forall F \in \mathfrak{F}) \quad P(F|G_2 ) = { \frac{P(F \cap G_2 | G_1)}{P(G_2 | G_1)}}.$$
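A finite toy model makes it easy to see the axioms in action. The sketch below (Python standard library only; the counting-measure construction is my own illustrative choice, not Renyi's) builds a conditional probability space on a four-element \(\Omega\) and checks R0-R3:

```python
from fractions import Fraction
from itertools import combinations

# A small conditional probability space in Renyi's sense, built from counting
# measure on Omega = {0, 1, 2, 3}. F is the full power set of Omega and the
# admissible conditions G are the non-empty subsets.
Omega = frozenset({0, 1, 2, 3})

def powerset(s):
    return [frozenset(c) for r in range(len(s) + 1) for c in combinations(s, r)]

F = powerset(Omega)
G = [g for g in F if g]               # property (c): the empty set is excluded

def P(f, g):                          # R0: P maps F x G into [0, 1]
    return Fraction(len(f & g), len(g))

# R1: P(G|G) = 1 for every admissible condition.
assert all(P(g, g) == 1 for g in G)

# R2: (finite) additivity of P(.|G) in its first argument, for disjoint sets.
cond = frozenset({0, 1, 2})
assert P(frozenset({0, 1}), cond) == P(frozenset({0}), cond) + P(frozenset({1}), cond)

# R3: for G2 contained in G1 with P(G2|G1) > 0,
#     P(F|G2) = P(F intersect G2 | G1) / P(G2|G1).
G1, G2, f = frozenset({0, 1, 2}), frozenset({0, 1}), frozenset({1, 3})
assert P(f, G2) == P(f & G2, G1) / P(G2, G1)
```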
Several problems have been examined by Stephen Mumford, Rani Lill Anjum and Johan Arnt Myrstad in What Tends to Be, Chapter 6, as part of a critique of the applicability of Kolmogorov's definition of conditional probability to the ontology of dispositions that tend to cause or hinder events. These have been analysed by them using Kolmogorov's absolute probabilities, but without a careful construction of the probability space appropriate for the application. These same examples will be analysed here using both Kolmogorov's and Renyi's formulation. 

The first example indicating a problem with absolute probability is the following (absolute probability will be denoted by \(\mu\) below to avoid confusion with Renyi's function \(P\); the \(\sigma\)-field \(\mathfrak{F}\) is the same for both).

P1. For \(A, B \in \mathfrak{F}\), if \(\mu(A) = 1\) then \(\mu(A | B) = 1\), where \(\mu\) is Kolmogorov's absolute probability.

Strictly this holds only for sets \(B\) with \(\mu(B) \gt 0\). We can calculate this result from Kolmogorov's conditional probability postulate as follows: since 
$$\mu(A \cap B) = \mu(B),$$ 
$$\mu(A|B) = \mu(A \cap B)/\mu(B) = \mu(B)/\mu(B)=1.$$ 
This is not problematic within the mathematics, but Mumford et al consider it to be problematic if \(\mu(A|B)\) is to be understood as a degree of implication. They claim that there must exist a condition under which the probability of \(A\) decreases. They justify this through an example:
Say we strike a match and it lights. The match is lit and thus the (unconditional) probability of it being lit is one. Still, this does not entail that the match lit given that there was a strong wind. A number of conditions could counteract the match lighting, thus lowering the probability of this outcome. The match might be struck in water, the flammable tip could have been knocked off, the match could have been sucked into a black hole, and so on. 
Let us analyse this more closely. Let \(A =\) "the match lights". Then, "The match is lit and thus the (unconditional) probability of it being lit is one." is equivalent to \(\mu(A|A) = 1\). This is not controversial. They go on to bring other considerations into play and, intuitively, it seems evident that whether a match lights or not will depend on the existing situation: for example, on whether it is wet or dry, windy or not, and whether the match is struck effectively. But this enlarges the space of elementary possibilities. In this enlarged probability space, the set \(A\) labelled "the match lights" is

$$ A=\{(\textsf{"the match is lit", "it is windy", "it is dry", "match is struck"})\} \cup \\ \{(\textsf{"the match is lit", "it is not windy", "it is dry", "match is struck"})\} \cup  \\ \{(\textsf{"the match is lit", "it is windy", "it is not dry", "match is struck"})\} ......$$
where \(......\) indicates all the other subsets (elementary possibilities) that make up \(A\).

Each elementary possibility is now a 4-tuple and \(A\) is the union of all sets consisting of a single
4-tuple in which the first item is "the match is lit". Similarly, a set can be constructed to represent \(B=\) "match struck". The probability function over the probability space is constructed from the probabilities assigned to each elementary possibility. An assignment can be made such that
$$ \mu(A|B) =1 \textsf{ and } \mu(A| B^C) =0 $$
where \(B^C\) ("match not struck") is the complement of \(B\). It would not be a physically or causally feasible allocation of probabilities to have \( \mu(A| B^C) =1 \), whereas \( \mu(A| B^C) =0 \) is feasible. Indeed, a physically valid allocation of probabilities should give \(\mu(A \cap B^C) = 0\): every elementary possibility whose 4-tuple contains both "the match is lit" and "match not struck" should be assigned probability zero. The Kolmogorov ratio formula for the conditional probability applies when all the relevant conditions are accommodated in the set of elementary possibilities. Therefore, P1 is not a problem for the Kolmogorov axioms if the probability space is appropriately modelled.
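The enlarged space can be made concrete in a small computation. This is a sketch only: the tuple labels and the weights assigned below are illustrative assumptions, not taken from Mumford et al; the point is that a physically sensible assignment forces \(\mu(A | B^C) = 0\).

```python
from itertools import product
from fractions import Fraction

# Enlarged space of elementary possibilities: 4-tuples of
# (lit?, windy?, dry?, struck?).  Labels and weights are illustrative.
outcomes = list(product(["lit", "not lit"],
                        ["windy", "not windy"],
                        ["dry", "not dry"],
                        ["struck", "not struck"]))

def weight(o):
    """A physically motivated assignment: the match can only be lit if
    it was struck in favourable conditions (dry, not windy)."""
    lit, windy, dry, struck = o
    if lit == "lit":
        if struck == "not struck" or dry == "not dry" or windy == "windy":
            return Fraction(0)          # physically impossible tuples
        return Fraction(1, 4)           # lit, calm, dry, struck
    return Fraction(3, 4) / 8           # remaining mass spread uniformly

mu = {o: weight(o) for o in outcomes}
assert sum(mu.values()) == 1            # the weights form a probability

def prob(pred):
    return sum(p for o, p in mu.items() if pred(o))

def cond(pred_a, pred_b):
    # Kolmogorov ratio formula, valid since prob(pred_b) > 0 here.
    return prob(lambda o: pred_a(o) and pred_b(o)) / prob(pred_b)

A = lambda o: o[0] == "lit"             # "the match lights"
notB = lambda o: o[3] == "not struck"   # complement of "match struck"

print(cond(A, notB))   # 0: no weight on "lit" without "struck"
```

With this modelling the problematic case simply never receives probability, so the ratio formula gives the causally sensible answer directly.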

Appropriate modelling is just as relevant when using the Renyi axioms. In addition, as we are working in the context of conditions influencing outcomes, we will not allow outcomes, which cannot themselves be influences, to be in the set of admissible conditions \(\mathfrak{G}\). This has no effect on the analysis of P1 but, as will be discussed below, is important for modelling causal conditions.

A further problematic consequence of Kolmogorov's conditional probability, according to Mumford et al, arises when \(A\) and \(B\) are probabilistically independent:
P2. \(\mu (A\cap B)=\mu(A )\mu(B)\) implies \(\mu(A|B)=\mu(A).\)
This is indeed a consequence of Kolmogorov's definition. Renyi's formulation does not allow this analysis to be carried out unless \(\mathfrak{G} = \mathfrak{F}\). Mumford et al illustrate their concern through an example:
The dispositional properties of solubility and sweetness are not generally thought to be probabilistically dependent.
Whatever is generally thought, the mathematical analysis will depend on the probability model. If two properties are probabilistically independent, then that should be captured in the model. However, the objections of Mumford et al are combined with a criticism of Adams's Thesis
Assert (if B then A) = \(P\)(if B then A) =\(P\)(A given B) = \(P(A|B)\)

where \(P(A|B)\) is given by the Kolmogorov ratio formula. However, it should be remembered that the Kolmogorov ratio formula may simply be registering correlation, not that \(B\) causes \(A\) or that \(B\) implies \(A\) to any degree. I do not want to get into defending or challenging this thesis here, but within the Renyi axiomatisation the Kolmogorov conditional probability formula only holds under special conditions; see R3. Independence, in Renyi's scheme, is only defined with reference to some conditioning set, \(C\) say, in which case probabilistic independence is described by the condition

$$ P(A \cap B |C) = P(A|C)P(B|C)$$
and as a consequence, it is only if \(B \subseteq C\) that
$$P(A|B) =\frac{P(A \cap B | C)}{P(B | C)} = P(A|C)$$
This means that if a set \(D\) in \(\mathfrak{G}\) that is not a subset of \(C\) is used to condition \(A \cap B\) then, in general,
$$P(A \cap B |D) \neq P(A|D)P(B|D)$$
even if
$$P(A \cap B |C) = P(A|C)P(B|C).$$
This shows that in the Renyi axiomatisation statistical independence is not an absolute property of two sets.
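The relativity of independence to the conditioning set can be checked on a small finite model. This is an illustrative sketch: the elementary possibilities, the uniform weighting, and the particular sets \(A, B, C, D\) are my assumptions, and the Renyi conditional probability is modelled here simply as the ratio of weights restricted to the conditioning set.

```python
from fractions import Fraction

# Eight elementary possibilities with uniform weight; the sets are
# chosen so that A and B are independent given C but not given D.
w = {i: Fraction(1, 8) for i in range(8)}

A, B = {0, 1, 4}, {0, 2, 5}
C, D = {0, 1, 2, 3}, {0, 1, 2}

def P(F, G):
    """Conditional probability P(F|G) as a ratio of weights on G."""
    return sum(w[i] for i in F & G) / sum(w[i] for i in G)

# Given C: A and B are independent.
print(P(A & B, C) == P(A, C) * P(B, C))   # True: 1/4 == 1/2 * 1/2

# Given D: the very same sets fail to be independent.
print(P(A & B, D) == P(A, D) * P(B, D))   # False: 1/3 != 2/3 * 2/3
```

Nothing about \(A\) and \(B\) themselves changes between the two checks; only the conditioning set does, which is exactly the point made above.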

The third objection is that, regardless of the probabilistic relation between \(A\) and \(B\), whenever the probability of \(A \cap B\) is high, \(\mu(A|B)\) is high and so is \(\mu(B|A)\):
P3. \((\mu(A \cap B) \sim 1) \Rightarrow ((\mu(A|B) \sim 1) \land (\mu(B|A) \sim 1)).\)
If \(\mu(A \cap B) \sim 1\) then \(A\) and \(B\) coincide except on a set of small measure. Then \(\mu(A) \sim 1\) and \(\mu(B) \sim 1\), which implies the statement in P3. Mumford et al object:
The probability of the conjunction of ‘the sun will rise tomorrow’ and ‘snow is white’ is high. But this doesn’t necessarily imply that the sun rising tomorrow is conditional upon snow being white, or vice versa.
That may be the case, but the correlation between situations where both hold is high. Once again, the problem is the identification of conditional probability with a degree of implication in Adams's Thesis. But it is well known that conditional probability may simply capture correlation. If we want to separate conditioning sets from other sets that are consequences in the \(\sigma\)-algebra generated by all elementary possibilities, then Renyi's axioms allow this.

The Renyi equivalent of P3 is
$$ P(A \cap B|C) \sim 1 \Rightarrow (P(A|B) \sim 1, \; B \subseteq C) \land (P(B|A) \sim 1, \; A \subseteq C).$$
 
It does hold when both \(A\) and \(B\) are subsets of \(C\), but that is then a reasonable conclusion for that case. However, if one of the sets is not a subset of \(C\) then it will not hold in general.

When \(\mathfrak{G}\) is a smaller set than \(\mathfrak{F}\) it becomes useful for causal conditioning. We can exclude sets from \(\mathfrak{G}\) that are outcomes and include sets that are causes. If we are interested in the causes of snow being white, we will condition on facts of crystallography and local conditions that may turn snow yellow, as pointed out by Frank Zappa.

For the earlier example above, the set \(A\), "the match lights", would not be included in \(\mathfrak{G}\). So for \(C \in \mathfrak{G}\), \(P(A|C)\) is a defined probability measure but \(P(C|A)\) is not.

The Kolmogorov axioms are good for modelling situations where measurable sets represent events of the same status. If there are reasons to believe that some sets have the status of causal conditions for other sets, then they should be modelled, with Renyi's axiomatisation (or some similar axiomatisation), as subsets of the set of admissible conditions.

The next question is whether adopting, and modelling fully, with the Renyi scheme allows a counter to objections such as those of Humphreys (Humphreys, P. (1985) ‘Why Propensities Cannot Be Probabilities’, Philosophical Review, 94: 557–70.) to using conditional probabilities to represent dispositional probabilities. 

Friday, 23 December 2022

The Kolmogorov probability axioms and objective chance

Philosophy of Probability and Statistical Modelling by Mauricio Suárez [1] provides a historical overview of the philosophy of probability and argues for the significance of this philosophy for well-founded statistical modelling. Within this wider scope, the book discusses the themes of objective probability, propensities, and measurement. So, it has much in common with the themes of this blog. Suárez defends a model of objective probability that disentangles propensity from single-case chance and from the observed sequences of outcomes that result in relative frequencies. This is close to what I argued for in Potentiality and probability, but there are differences that I will return to in a future post.

Suarez's main theme is objective probability, but he expends a lot of effort on examining subjective probabilities. I have no doubt that subjective probability has its place alongside objective probability, but an ad hoc mixture of the two is to be avoided. Subjective probability has been zealously defended by de Finetti and Jaynes, to the point of attempting to eliminate objective probability altogether. I believe, however, that the posts in this blog make a case for objective probability as an aspect of ontology. In this post I will focus on the Kolmogorov axioms of probability. Curiously, it is only in examining subjective probability that Suarez discusses the axioms of probability.

The axioms that Suarez states are as follows.

Let \(\{E_1 , E_2, …, E_n\}\) be the set of events over which an agent's degrees of belief range; and let \(\Omega\) be an event which occurs necessarily. The axioms of probability may be expressed as follows:

Axiom 1: \(0 \le P (E) \le 1\), for any \(P (E)\): In other words, all probabilities lie in the real unit number interval.

Axiom 2: \(P (\Omega) =1\): The tautologous proposition, or the necessary event has probability one.

Axiom 3: If \(\{E_1, E_2, …, E_n\}\) are exhaustive and exclusive events, then \(P (E_1) + P(E_2) + … + P(E_n) = P (\Omega) = 1\): This is known as the addition law and is sometimes expressed equivalently as follows: If \(\{E_1, E_2, …, E_n \}\) is a set of exclusive (but not necessarily exhaustive) events then: \(P (E_1 \vee E_2 \vee … E_n) = P (E_1) + P(E_2) + … + P(E_n)\).

Axiom 4: \(P (E_1 \& E_2) = P (E_1 | E_2) P (E_2)\). This is sometimes known as the multiplication axiom, the axiom of conditional probability, or the ratio analysis of conditional probability since it expresses the conditional probability of \(E_1\) given \(E_2\).

According to Suarez, the Kolmogorov axioms are essentially equivalent to those above. The axioms that Kolmogorov published in 1933 have become the standard formulation. The axioms themselves form a short passage near the start of the book.

 Nathan Morrison has translated the Kolmogorov axioms [2] as:

Let \(E\) be a collection of elements ξ, η, ζ, ..., which we shall call elementary events, and \(\mathfrak{F}\) a set of subsets of \(E\); the elements of the set \(\mathfrak{F}\) will be called random events.

I. \(\mathfrak{F}\) is a field of sets. 

II. \(\mathfrak{F}\) contains the set \(E\).

III. To each set \(A\) in \(\mathfrak{F}\) is assigned a non-negative real number \(P(A)\). This number \(P(A)\) is called the probability of the event \(A\).

IV. \(P(E)\) equals \(1\).  

V. If \(A\) and \(B\) have no element in common, then 

$$P(A+B) = P(A) + P(B)$$

A system of sets \(\mathfrak{F}\), together with a definite assignment of numbers \(P(A)\), satisfying Axioms I-V, is called a field of probability.

Terminology has moved on and it would now be usual to identify the triple \((E, \mathfrak{F}, P)\) as the probability space, with the term field reserved for the Borel field of sets \(\mathfrak{F}\). In addition, it is preferable not to use the same symbol for the addition of numbers and the union of sets. It is also important to note that ξ, η, ζ, ..., indicating the elements of \(E\), does not constrain the set of elementary events to be countable. This is important for applications to statistical and quantum physics where, for example, particle position and momentum can take a continuum of values.

It is a shorthand when Kolmogorov writes in Axiom III that the number \(P(A)\) is the probability of the event \(A\). The full interpretation is that \(P(A)\) is the probability that the outcome will be an element of \(A\). This does seem to be the standard, if often implicit, understanding of the situation. In the same Axiom III, the use of the term "assigned" is also misleading. The probabilities are more properly assigned to the singleton sets, each with an element of \(E\) as its sole member, and from these the probability of each set in \(\mathfrak{F}\) is constructed.

The salient differences between the axioms provided by Suarez and those of Kolmogorov are:
  • Suarez provides axioms only for a discrete finite set of events
    • What he calls events are, within the restriction in the bullet above, Kolmogorov's elementary events.
    • There is therefore no need to introduce the field of sets
  • Apart from event, Suarez uses the language and symbols of propositional logic  
    • As he is dealing with probabilities as credences (degrees of belief) in this section of his book it would have been better to consistently employ the language and symbols of propositional logic.
    • This would give the mapping:
        elementary propositions ↔ elementary events
        logical 'and', \(\land\) or \(\&\) ↔ set intersection \(\cap\)
        logical 'or', \(\vee\) ↔ set union \(\cup\)
        tautology \(\Omega\) ↔ set of elementary events \(E\)
  • Axiom 1 should read: \(0 \le P (E) \le 1\), for any \(E\). This axiom is not one of Kolmogorov's but can be derived from them.
    • Caution: in the Suarez version \(E\) is an arbitrary 'event' whereas in Kolmogorov this symbol is used for the set of elementary events.
  • Kolmogorov has no equivalent to Axiom 4 in his list but introduces the equivalent formula for conditional probability later, as a definition. I think it is better to list it as one of the axioms.
As a formal structure, it would have been even better if Kolmogorov had rigorously used the language of sets and not used a term like 'event'. A set theoretic formulation with potential possibilities and numerical probability assignment would then be:

Let \(E\) be a collection of elements ξ, η, ζ, ..., which we shall call elementary possibilities and \(\mathfrak{F}\) a set of subsets of \(E\). 

I. \(\mathfrak{F}\) is a field of sets. 

II. \(\mathfrak{F}\) contains the set \(E\).

III. To each set \(A\) in \(\mathfrak{F}\) is assigned a non-negative real number \(P(A)\). This number \(P(A)\) is called the probability of \(A\), an element of \(\mathfrak{F}\).

IV. \(P(E)\) equals \(1\).  

V. If \(A\) and \(B\) in \(\mathfrak{F}\) have no element in common, then 

$$P(A \cup B) = P(A) + P(B)$$

VI. For any \(A\) and \(B\) in \(\mathfrak{F}\) with \(P(B) > 0\),

$$P (A | B) = \frac{P (A \cap B)}{P (B)}$$

is the conditional probability of \(A\) given \(B\).

Such a system of sets \(E\), \(\mathfrak{F}\), together with a definite assignment of numbers \(P(A)\) for all \(A \in \mathfrak{F}\), satisfying Axioms I-VI, is called a probability space.

I propose that this is a neutral set of axioms for a probability theory. One advantage is that it can be applied to eventless situations, such as the quantum description of a free particle. The formulation is based on the mathematics of measure theory plus a numerical probability assignment and the identification of a set of possibilities. I have added, as is often done, the definition of conditional probability as an axiom. It is possible to map this formulation to one in terms of propositions and logical operators. This move would not restrict application and interpretation to subjective probability; that would be governed by the meaning given to the numerical probability assignment. The axiom system considered here takes the numerical probability assignment as fundamental, and the conditional probability is then added. It is also possible to take conditional probability as fundamental, as done by Renyi. I will discuss this formulation in a future post.
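For a finite set of elementary possibilities the axioms above can be modelled directly. The following is a minimal sketch, not a general measure-theoretic construction: the class name and the restriction to finite \(E\) with the full power set as \(\mathfrak{F}\) are my assumptions.

```python
from fractions import Fraction

class FiniteProbabilitySpace:
    """A finite model of Axioms I-VI: E is the set of elementary
    possibilities, the field is implicitly the power set of E, and P
    is built up from non-negative weights on the singletons."""

    def __init__(self, weights):
        # Axiom III: each probability is a non-negative real number.
        if any(p < 0 for p in weights.values()):
            raise ValueError("probabilities must be non-negative")
        # Axiom IV: P(E) = 1.
        if sum(weights.values()) != 1:
            raise ValueError("P(E) must equal 1")
        self.E = frozenset(weights)
        self.w = dict(weights)

    def P(self, A):
        # Axiom V (additivity): P of a subset is the sum of the
        # weights of its elements.
        return sum(self.w[x] for x in A)

    def cond(self, A, B):
        # Axiom VI: the ratio definition, defined only when P(B) > 0.
        pb = self.P(B)
        if pb == 0:
            raise ValueError("P(A|B) undefined when P(B) = 0")
        return self.P(set(A) & set(B)) / pb

# A fair six-sided die as the set of elementary possibilities.
die = FiniteProbabilitySpace({k: Fraction(1, 6) for k in range(1, 7)})
odd, high = {1, 3, 5}, {4, 5, 6}
print(die.P(odd))           # 1/2
print(die.cond(odd, high))  # 1/3: only face 5 is both odd and high
```

Note that the conditional probability is derived from the absolute assignment, in line with taking the numerical probability assignment as fundamental.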

In adopting these axioms for objective chance, the collection of elements can again be called elementary events or possible events, and \(P\) is a numerical assignment furnished by a theory or estimated through experiment. The elements of \(\mathfrak{F}\) are not, in general, outcomes in the way that the elements of \(E\) are. An element of \(\mathfrak{F}\) is a set of some elements of \(E\). That this can be useful can be made clear through two examples.

A simple physical example is provided by a die with six sides: in normal game-playing circumstances there are six possibilities for which face will face upwards when the die is thrown. These six possibilities are the elementary events \(E\) in the probability space for a dice-throwing game with one die. I will also call these elementary events outcomes. These outcomes are not in \(\mathfrak{F}\), the set of subsets of \(E\) that Kolmogorov calls events. For example, the outcome "face 5 faces upwards" is not in the set of random events but the subset {"face 5 faces upwards"} is. On one throw of the die we can only obtain an element of \(E\), so what are the random events \(\mathfrak{F}\)? Consider \(P(E)\), which is equal to one. It is usual to interpret this as saying that \(E\) is the sure event. But in one throw of the die we will never get \(E\), only one element of it. What is therefore sure is that any outcome will be in \(E\).

So far, I have said little about \(P\) itself. In the die example, \(P(\){"face 5 faces upwards"}\()\) can be estimated as the proportion of times it occurs in a long run of repetitions. For anyone familiar with the relative frequency interpretation of probability, I emphasise that this is not such an interpretation. Here the relative frequencies are an estimate of the numerical probability assignment. \(P(\){"face 5 faces upwards"}\()\) itself is the relative strength of the tendency for "face 5 faces upwards" to occur, otherwise known as the single-case chance for that event. If we consider an element of \(\mathfrak{F}\) such as \(A = \{\)"face 1 faces upwards", "face 3 faces upwards", "face 5 faces upwards"\(\}\), then \(A\) can be interpreted as "an odd-valued face faces upwards". This illustrates that, even in the application to objective chance, it is difficult to avoid the use of propositions to give meaning to useful subsets in \(\mathfrak{F}\).
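The distinction between the assignment and its estimate can be illustrated with a short simulation. This is a sketch under the assumption of a fair die: the observed proportion approximates the assigned single-case chance of 1/6, it does not define it.

```python
import random

# Simulate throws of a fair die: the relative frequency of
# "face 5 faces upwards" is an *estimate* of the assigned
# single-case chance 1/6, not the definition of it.
random.seed(0)                        # fixed seed for reproducibility
throws = [random.randint(1, 6) for _ in range(100_000)]
estimate = throws.count(5) / len(throws)
print(estimate)                       # close to 1/6 ≈ 0.1667
```

On this view the simulation plays the role of the long run of repetitions in the text: it furnishes evidence about \(P\), while \(P\) itself remains the strength of the tendency.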

A strength of the Kolmogorov axioms and my reformulation is the application to continuous infinite sets. An example is the case of an observation of an electron governed by the Schrödinger equation. According to quantum mechanics the probability of it being observed at any pre-designated spot is zero. In this example, all the elements of the set of elementary events have numerical probability assignment zero. This is where the field of events \(\mathfrak{F}\) is useful in providing sets with non-zero probability in which the position of the electron may be observed.
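The point that single positions get probability zero while sets of positions do not can be made concrete with a standard textbook wavefunction. This is an illustrative sketch: the particle-in-a-box ground state on \([0,1]\), with \(|\psi(x)|^2 = 2\sin^2(\pi x)\), stands in for the electron in the text, and the probability of an interval is obtained by integration in closed form.

```python
import math

# Particle in a box on [0, 1], ground state density 2 sin^2(pi x).
# Any single point has probability zero; only sets of positive
# measure, here intervals, receive positive probability.
def prob_interval(a, b):
    """P([a, b]) = integral of 2 sin^2(pi x) over [a, b], closed form:
    the antiderivative of 2 sin^2(pi x) is x - sin(2 pi x)/(2 pi)."""
    antider = lambda x: x - math.sin(2 * math.pi * x) / (2 * math.pi)
    return antider(b) - antider(a)

print(prob_interval(0.0, 1.0))   # 1.0: the whole box is the sure set
print(prob_interval(0.5, 0.5))   # 0.0: a single point has probability 0
```

This is exactly the role of the field \(\mathfrak{F}\) in the continuous case: the intervals (and their Borel combinations) carry the non-zero probabilities even though every elementary possibility has probability zero.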

As discussed above, it can be convenient to use propositions to give meaning to relevant elements of \(\mathfrak{F}\) in a specific application. However, this need not introduce any subjectivity. Subjectivity enters through interpreting \(P\) as credence or degree of belief. In applications with objective probability, \(P(A)\) is a numerical assignment of the strength of the single-case tendency for an elementary event to appear in set \(A\). In objective probability \(P\) is ontological; in subjective probability it is epistemic.
  1. Mauricio Suárez, Philosophy of Probability and Statistical Modelling, Elements in the Philosophy of Science, Cambridge University Press, 2021
  2. Kolmogorov, AN. (2018) Foundations of the Theory of Probability. [edition unavailable]. Dover Publications. Available at: https://www.perlego.com/book/823610/foundations-of-the-theory-of-probability-second-english-edition-pdf (Accessed: 18 December 2022).
