Analytical conduction adjoint

May 13, 2015

The cost function is described as:

$$ J(x) = \frac{(M(x)-y)^2}{\sigma_y^2} + \frac{(x-x_0)^2}{\sigma_x^2} $$

where,

  • $x$ are the unknowns, $k$, $H$, and $Q_0$ in our case
  • $x_0$ are prior estimates of unknowns
  • $\sigma_x$ are the uncertainty of these prior estimates
  • $M$ is the model we are solving, in our case the heat equation
  • $y$ are observables on the solution, surface heat flow in our case
  • $\sigma_y$ is the uncertainty of our observables

We are solving the steady-state heat equation, which requires the input variables $k$, $H$, and $Q_0$. The second term of the cost function can be expanded as:

$$ \sum_i \frac{(x_i - x_{0,i})^2}{\sigma_{x_i}^2} = \frac{(k-k_0)^2}{\sigma_k^2} + \frac{(H-H_0)^2}{\sigma_H^2} + \dots $$

where all of our prior estimates and uncertainties are added to the cost function.

The temperature, $T$, as a function of depth, $z$, is:

$$ T(z) = T_0 + \frac{Q_0 z}{k} + \frac{H dz^2}{k} \left( 1-e^{-z/dz} \right) $$

We can formulate this as a forward model that returns the cost function:

import numpy as np

def f(x):
    ''' forward model: returns the cost function J(x) '''
    k, H, Q0 = tuple(x)
    # analytical temperature profile (z is a NumPy array of depths)
    T = T0 + Q0*z/k + H*dz**2/k * (1 - np.exp(-z/dz))
    # modelled surface heat flow from the near-surface temperature gradient
    q = k * (T[1]-T[0]) / (z[1]-z[0])
    return ((q - q_obs)**2/sigma_q**2 + (k - k_prior)**2/sigma_k**2
            + (H - H_prior)**2/sigma_H**2 + (Q0 - Q0_prior)**2/sigma_Q0**2)

q_obs and sigma_q are the surface heat flow observation and its uncertainty, respectively. The remaining terms are our prior estimates of $k$, $H$, and $Q_0$, all summed into the cost function. Note that k, H, and Q0 are the only parameters that change between evaluations; T0, z, dz, and the observations are held fixed.

Now the tangent linear model differentiates each line of the forward model with respect to k, H, and Q0, and propagates a perturbation dx = (dk, dH, dQ0) through it.

def f_tl(x, dx):
    ''' tangent linear model: propagates a perturbation dx through f '''
    k, H, Q0 = tuple(x)
    dk, dH, dQ0 = tuple(dx)

    # T = f(k, H, Q0)
    T = T0 + Q0*z/k + H*dz**2/k * (1 - np.exp(-z/dz))
    dTdk = -Q0*z/k**2 - H*dz**2/k**2 * (1 - np.exp(-z/dz))
    dTdH = dz**2/k * (1 - np.exp(-z/dz))
    dTdQ0 = z/k
    dT = dTdk*dk + dTdH*dH + dTdQ0*dQ0 # directional derivative of T

    # q_model = f(T0, T1, k)
    q_model = k * (T[1]-T[0]) / (z[1]-z[0])
    dqdT1 = k / (z[1]-z[0])
    dqdT0 = -k / (z[1]-z[0])
    dqdk = (T[1]-T[0]) / (z[1]-z[0])
    dq_model = np.dot([dqdT0, dqdT1, dqdk], [dT[0], dT[1], dk])

    # J = f(q_model, k, H, Q0)
    d2q = 2*(q_model - q_obs)/sigma_q**2
    d2k = 2*(k - k_prior)/sigma_k**2
    d2H = 2*(H - H_prior)/sigma_H**2
    d2Q0 = 2*(Q0 - Q0_prior)/sigma_Q0**2
    d2x = np.dot([d2q, d2k, d2H, d2Q0], [dq_model, dk, dH, dQ0])

    return d2x

This returns a reasonable value of $df$ provided the changes in our variables ($k, H, Q_0$) are relatively small. Now to move onto the adjoint model…
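Before writing the adjoint, it is worth sanity-checking the TLM against a finite-difference approximation of the forward model. The sketch below is self-contained, so every constant (depths, priors, uncertainties) is an illustrative, nondimensional stand-in rather than a value from any real dataset:

```python
import numpy as np

# illustrative, nondimensional values (assumptions, not from a real dataset)
z = np.array([0.0, 1.0])                    # depth nodes
T0, dz = 0.0, 2.0                           # surface temperature, e-folding depth
k_prior, H_prior, Q0_prior = 3.0, 1.0, 0.5  # prior estimates
sigma_k, sigma_H, sigma_Q0 = 0.5, 0.5, 0.2  # prior uncertainties
q_obs, sigma_q = 0.8, 0.1                   # observed heat flow and its uncertainty

def f(x):
    ''' forward model: returns the cost function J(x) '''
    k, H, Q0 = x
    T = T0 + Q0*z/k + H*dz**2/k * (1 - np.exp(-z/dz))
    q = k * (T[1]-T[0]) / (z[1]-z[0])
    return ((q - q_obs)**2/sigma_q**2 + (k - k_prior)**2/sigma_k**2
            + (H - H_prior)**2/sigma_H**2 + (Q0 - Q0_prior)**2/sigma_Q0**2)

def f_tl(x, dx):
    ''' tangent linear model: directional derivative of f along dx '''
    k, H, Q0 = x
    dk, dH, dQ0 = dx
    T = T0 + Q0*z/k + H*dz**2/k * (1 - np.exp(-z/dz))
    dT = (-Q0*z/k**2 - H*dz**2/k**2 * (1 - np.exp(-z/dz)))*dk \
         + (dz**2/k * (1 - np.exp(-z/dz)))*dH + (z/k)*dQ0
    q = k * (T[1]-T[0]) / (z[1]-z[0])
    dq = k*(dT[1]-dT[0])/(z[1]-z[0]) + (T[1]-T[0])/(z[1]-z[0])*dk
    return (2*(q - q_obs)/sigma_q**2 * dq + 2*(k - k_prior)/sigma_k**2 * dk
            + 2*(H - H_prior)/sigma_H**2 * dH + 2*(Q0 - Q0_prior)/sigma_Q0**2 * dQ0)

x  = np.array([3.2, 1.1, 0.55])    # current parameter values
dx = np.array([0.1, -0.05, 0.02])  # small perturbation
eps = 1e-6
fd = (f(x + eps*dx) - f(x)) / eps  # finite-difference approximation
print(fd, f_tl(x, dx))             # the two numbers should agree closely
```

As eps shrinks, the finite-difference estimate converges to the TLM value (until round-off takes over); this is the standard first test of a hand-written tangent linear model.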

We backwards-propagate a small change in $f$ through the function, starting from the last line of the tangent linear model and working upwards. For every variable ($k, H, Q_0$) that appears in more than one line, we need to accumulate its adjoint contributions and be careful not to overwrite them.

def f_ad(x, df):
    ''' adjoint model: back-propagates df to give (dk, dH, dQ0) '''
    k, H, Q0 = tuple(x)
    # partial forward run to recover the variables the adjoint needs
    T = T0 + Q0*z/k + H*dz**2/k * (1 - np.exp(-z/dz))
    q_model = k * (T[1]-T[0]) / (z[1]-z[0])

    # adjoint of the cost function line
    d2q = 2*(q_model - q_obs)/sigma_q**2
    d2k = 2*(k - k_prior)/sigma_k**2
    d2H = 2*(H - H_prior)/sigma_H**2
    d2Q0 = 2*(Q0 - Q0_prior)/sigma_Q0**2

    dq_ad = d2q*df
    dk_ad = d2k*df
    dH_ad = d2H*df
    dQ0_ad = d2Q0*df

    # adjoint of the heat flow line
    dqdT0 = -k / (z[1]-z[0])
    dqdT1 = k / (z[1]-z[0])
    dqdk = (T[1]-T[0]) / (z[1]-z[0])

    dT0_ad = dqdT0*dq_ad
    dT1_ad = dqdT1*dq_ad
    dk_ad += dqdk*dq_ad # accumulate: k also appears in this line

    # adjoint of the temperature line
    dTdk = -Q0*z/k**2 - H*dz**2/k**2 * (1 - np.exp(-z/dz))
    dTdH = dz**2/k * (1 - np.exp(-z/dz))
    dTdQ0 = z/k

    dk_ad += dTdk[0]*dT0_ad + dTdk[1]*dT1_ad
    dH_ad += dTdH[0]*dT0_ad + dTdH[1]*dT1_ad
    dQ0_ad += dTdQ0[0]*dT0_ad + dTdQ0[1]*dT1_ad

    return dk_ad, dH_ad, dQ0_ad

The adjoint model returns $dk, dH, dQ_0$.
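The standard way to verify a hand-written adjoint against its TLM is the dot-product (adjoint) test: for any $x$, $dx$, and $df$, the identity $\langle f_{tl}(x, dx), df \rangle = \langle dx, f_{ad}(x, df) \rangle$ must hold to machine precision. A self-contained sketch, again using made-up, nondimensional constants:

```python
import numpy as np

# illustrative, nondimensional values (assumptions, not from a real dataset)
z = np.array([0.0, 1.0])
T0, dz = 0.0, 2.0
k_prior, H_prior, Q0_prior = 3.0, 1.0, 0.5
sigma_k, sigma_H, sigma_Q0 = 0.5, 0.5, 0.2
q_obs, sigma_q = 0.8, 0.1

def f_tl(x, dx):
    ''' tangent linear model '''
    k, H, Q0 = x
    dk, dH, dQ0 = dx
    T = T0 + Q0*z/k + H*dz**2/k * (1 - np.exp(-z/dz))
    dT = (-Q0*z/k**2 - H*dz**2/k**2 * (1 - np.exp(-z/dz)))*dk \
         + (dz**2/k * (1 - np.exp(-z/dz)))*dH + (z/k)*dQ0
    q = k * (T[1]-T[0]) / (z[1]-z[0])
    dq = k*(dT[1]-dT[0])/(z[1]-z[0]) + (T[1]-T[0])/(z[1]-z[0])*dk
    return (2*(q - q_obs)/sigma_q**2 * dq + 2*(k - k_prior)/sigma_k**2 * dk
            + 2*(H - H_prior)/sigma_H**2 * dH + 2*(Q0 - Q0_prior)/sigma_Q0**2 * dQ0)

def f_ad(x, df):
    ''' adjoint model '''
    k, H, Q0 = x
    T = T0 + Q0*z/k + H*dz**2/k * (1 - np.exp(-z/dz))
    q = k * (T[1]-T[0]) / (z[1]-z[0])
    dq_ad = 2*(q - q_obs)/sigma_q**2 * df
    dk_ad = 2*(k - k_prior)/sigma_k**2 * df
    dH_ad = 2*(H - H_prior)/sigma_H**2 * df
    dQ0_ad = 2*(Q0 - Q0_prior)/sigma_Q0**2 * df
    dT0_ad = -k/(z[1]-z[0]) * dq_ad
    dT1_ad =  k/(z[1]-z[0]) * dq_ad
    dk_ad += (T[1]-T[0])/(z[1]-z[0]) * dq_ad
    dTdk = -Q0*z/k**2 - H*dz**2/k**2 * (1 - np.exp(-z/dz))
    dTdH = dz**2/k * (1 - np.exp(-z/dz))
    dTdQ0 = z/k
    dk_ad += dTdk[0]*dT0_ad + dTdk[1]*dT1_ad
    dH_ad += dTdH[0]*dT0_ad + dTdH[1]*dT1_ad
    dQ0_ad += dTdQ0[0]*dT0_ad + dTdQ0[1]*dT1_ad
    return np.array([dk_ad, dH_ad, dQ0_ad])

x, dx, df = np.array([3.2, 1.1, 0.55]), np.array([0.1, -0.05, 0.02]), 1.7
lhs = f_tl(x, dx) * df         # <F dx, df>  (f is scalar-valued here)
rhs = np.dot(dx, f_ad(x, df))  # <dx, F* df>
print(lhs, rhs)                # identical up to round-off
```

Unlike the finite-difference check, this identity holds exactly (not just approximately), so any disagreement beyond round-off points to a bug in the adjoint.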

Apart from being able to calculate $dx$ from a change in $f(x)$, the adjoint model is much more computationally friendly than the TLM. For example, the TLM computed T[0], T[1], …, T[n], but the adjoint only requires computation on the elements you actually need, in this case only T[0] and T[1].

When writing the adjoint, only enough of the forward model needs to be recomputed to recover the variables the adjoint statements use. In most cases this means a complete forward run.
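With $df = 1$, the adjoint returns the gradient of $J$ with respect to $(k, H, Q_0)$, which is exactly what a gradient-based optimiser wants. Here is a hypothetical sketch of wiring this into scipy.optimize.minimize; the constants are illustrative, nondimensional stand-ins, not values from any real inversion:

```python
import numpy as np
from scipy.optimize import minimize

# illustrative, nondimensional values (assumptions, not from a real dataset)
z = np.array([0.0, 1.0])
T0, dz = 0.0, 2.0
k_prior, H_prior, Q0_prior = 3.0, 1.0, 0.5
sigma_k, sigma_H, sigma_Q0 = 0.5, 0.5, 0.2
q_obs, sigma_q = 0.8, 0.1

def f(x):
    ''' forward model: the cost function J(x) '''
    k, H, Q0 = x
    T = T0 + Q0*z/k + H*dz**2/k * (1 - np.exp(-z/dz))
    q = k * (T[1]-T[0]) / (z[1]-z[0])
    return ((q - q_obs)**2/sigma_q**2 + (k - k_prior)**2/sigma_k**2
            + (H - H_prior)**2/sigma_H**2 + (Q0 - Q0_prior)**2/sigma_Q0**2)

def grad(x):
    ''' adjoint model with df = 1: the gradient of J(x) '''
    k, H, Q0 = x
    T = T0 + Q0*z/k + H*dz**2/k * (1 - np.exp(-z/dz))
    q = k * (T[1]-T[0]) / (z[1]-z[0])
    dq_ad = 2*(q - q_obs)/sigma_q**2
    dk_ad = 2*(k - k_prior)/sigma_k**2 + (T[1]-T[0])/(z[1]-z[0])*dq_ad
    dH_ad = 2*(H - H_prior)/sigma_H**2
    dQ0_ad = 2*(Q0 - Q0_prior)/sigma_Q0**2
    dT0_ad = -k/(z[1]-z[0]) * dq_ad
    dT1_ad =  k/(z[1]-z[0]) * dq_ad
    dTdk = -Q0*z/k**2 - H*dz**2/k**2 * (1 - np.exp(-z/dz))
    dTdH = dz**2/k * (1 - np.exp(-z/dz))
    dTdQ0 = z/k
    dk_ad += dTdk[0]*dT0_ad + dTdk[1]*dT1_ad
    dH_ad += dTdH[0]*dT0_ad + dTdH[1]*dT1_ad
    dQ0_ad += dTdQ0[0]*dT0_ad + dTdQ0[1]*dT1_ad
    return np.array([dk_ad, dH_ad, dQ0_ad])

x0 = np.array([3.5, 0.8, 0.3])                      # starting guess
res = minimize(f, x0, jac=grad, method="L-BFGS-B")
print(res.x, res.fun)                               # MAP estimate of (k, H, Q0)
```

Supplying the adjoint-derived gradient through `jac` means the optimiser never has to fall back on finite differences, one extra forward run per parameter per iteration for this three-parameter problem, and many more for larger ones.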

Dr. Ben Mather