fMRI analysis: Seasonality analysis with Lomb-Scargle algorithm and Dynamic Causal Modeling

fMRI time-series analysis and Primary Motor Area detection

April 2020

1 Preface: Project outline

1.1 fMRI time-series analysis and Primary Motor Area (PMA) detection

Functional magnetic resonance imaging (fMRI) is one of the most widely used tools to study the neural underpinnings of human cognition. One of the most important application for fMRI time-series data is to detect the Primary Motor Area (PMA) of the brain. My goal in this report is to develop several simple and general methods to detect PMA based on a given brain scan experimental design (like the following finger tapping experiment in this report) and the corresponding fMRI data. A typical fMRI data is a 4 dimension time-series data consists of 3 spatial dimensions (X, Y, and Z axes) representing the brain area. Each X, Y, and Z coordinate represents a cuboid element (a brain area) of the fMRI scan result, we call each of them as a “voxel”. The last dimenson is a temporal dimension representing the time when the MRI machine conducts a brain scan on every voxel, which makes each voxel contain a time-series data which mainly captures the changes in blood flow and oxygenation associated with neural activity (the hemodynamic response), and on the differing magnetic properties of oxygenated and deoxygenated blood of that certain voxel. In particular, we call the signal captured by the time-series data in each voxel as the blood oxygen level-dependent (BOLD) signal.

For any certain given biain scan experimental design, we would ask our subjects to conduct that certain experiment with a certain time, we call all MRI scans conducted in those periods as “ON epochs”. By contrast, there are also certain time that our subjects are asked to rest without any certain movement. Those periods would act as the control group with the “ON epochs”, which we call as “OFF epochs”. Moreover, for a given experiment, we would expect that only a few certain brain areas would be activated, especially for a given experiment related to the movement of a certain human body, we would expect activation of one or several certain areas in the Primary Motor Area (PMA) of the brain.

Primary Motor Area (PMA) of the brain

Primary Motor Area (PMA) of the brain

Accurately detecting the Primary Motor Area of a certain experiment would be helpful for neural science researchers to research on the functionality of a human brain and neural activities. This project is a follow up project based on my midterm report fMRI time-series analysis and Primary Motor Area detection. In the midterm project, I provided several different PMA detective methods based on ARMA model with trends. IN this follow-up report, I would try to apply the Lomb-Scargle algorithm on the seasonal analysis of a certain voxel. And then apply the Dynamic Causal Modeling on the analysis of the neuronal states of a single voxel.

1.2 Data description

In this report, the data I use comes from a sensitive source. It is stored in the file “MOD1.RData”. It contains the fMRI data which comes from the following finger tapping experiment:

Finger-Tapping Experiment

Finger-Tapping Experiment

My data is a 4D fMRI data with \(64*64*40\) voxel locations (i.e. Brain scan area: X axis: 64, Y axis: 64, Z axis: 40). Each voxel location has 160 scans arranged by time. There are 3 seconds pause between each scan. Thus each experiment would last for 8 minutes. Just like what is illustrated in the graph above, during those 160 scans there are alternant ON and OFF epochs, each ON and OFF epochs last for 30 seconds, thus give us 8 ON and OFF alternant periods for the whole experiment.

The following graph shows the expected BOLD response of this finger tapping experiment, along with the linear signal. Here we keep the linear trend as analysis above shows that we might face a significant liear trend in each voxel. These two trends would act as the external regressors for the time series models

2 P-value of seasonality based on Lomb-Scargle algorithm

2.1 Advantages for Lomb-Scargle algorithm

Lomb-Scargle periodogram is a efficient and widely used algorithm to detect seasonality and examine the significance. This algorithm can tolerate the uneven sampling with least-squares fitting of sinusoids. The best part for this algorithm in the domain of fMRI data primary motor area detection is that this algorithm provides us with a p-value based detecting method. And that is eactly what we need. As a numerical result is always more convincing than a “subjective judgement” based result.

2.2 Examples for Lomb-Scargle algorithm

Example 1: voxel X=10, Y=10, Z=10

Voxel X=10, Y=10, Z=10 is clearly not an activated area of a brain. In fact, in our data, this sample is even not inside the main brain area.

Based on the Lomb-Scargle algorithm, the p-value of that voxel is 0.584, clearly not an activated area.

Example 2: voxel X=43, Y=42, Z=33

Voxel X=43, Y=42, Z=33 is a voxel which is anticipated to be inside the activated area.

Based on the Lomb-Scargle algorithm, the p-value of that voxel is 0, clearly not an activated area.

3 P-value activation area Mapping based on Lomb-Scargle algorithm

Similiar to the Posterior Predictive Mapping (PPM), if our algorithm is based on compute the p-value of each voxel to judge whether a certain area is activated area, we could draw the p-value maps to visualize a certain area.

Here in order to get better visualization result, 1-p value is used to replace p-value. And high value in 1-p represents that a certain voxel could be activated.

Notice that for the Lomb-Scargle algorithm applied in this dataset, the period that we wish to detect is whether our data has a period around 20.

3.1 P-value mapping example: Z=33

To simplify the calculation process, a 2D slice on the Z-axis (top view) Z=33 is given as an example. Below is the visuailzation results. Also, the repeats applied by the Lomb-Scargle algorithm is also reduced to 300 in order to simplify the computation process. Moreover, according to the experimental design, we are actually only interested in the period of 20 (as 20 is the cycle for this experiment). Thus we could control the period scanned by the Lomb-Scargle algorithm to just around 20.

Below is the visualization result for laye Z=33

We could see that clearly based on the Lomb-Scargle algorithm, we could get some reasonable Primary Motor Area. However, There are still many small fraction of activated area that need to be taken care of. Thus we might still need to apply FDR correction and probably spatial clustering to get better results.

4 Dynamic Causal Modeling by “pomp” package

As the fMRI response to the neural states could always be delayed for a few seconds, only by applying the ON-OFF states of the experiment cannot capture all the characteristics of the fMRI signal. Thus by applying the Dynamic Caucal Model, we could mix the lag signals with the ON-OFF signals, and create the Hemodynamic Response Function (HRF) result for the fMRI signal.

Dynamic causal modelling describes the change in neuronal states (x) in response of external stimulation (u). Together with haemodynamic response function which describes relationship between neuronal state (x) and observed fMRI (y), we can settle the following model:

Model for neural state: \[\dot{x} = Ax + \sum^{m}_{j=1}u_jB^{(j)}x+Cu\]

Model for signal \[\dot{s} = x - \kappa_s s - \kappa_f (e^f - 1)\]

Model for blood flow \[\dot{f} = s e^{-f}\]

Model for volume \[\dot{v} = \frac{1}{\tau_0}(e^f - e^{v/\alpha})e^{-v}\]

Model for deoxyhemoglobin \[\dot{q} = \frac{1}{\tau_0}(\frac{e^{f-q}(1-(1-E_0)^{e^{-f}})}{E_0}-e^{(1-\alpha)v/\alpha})\]

Model for the observed BOLD signal \[y^* = V_0(4.3\nu_0E_0TE(1-e^q) + \epsilon_0r_0E_0TE(1-e^{q-v}) + (1-\epsilon_0)(1-e^v))\]

After carefully construct the Dynamic Causal Model in pomp, we could apply the model to some certain voxels that we wish to experiment on. Notice that the time we put into the model is based on the real time. Thus the first scan happen in 3s and between each two scans there is a 3s time interval.

The pomp dynamic causal model is developed based on the model shown above, and it could be constructed in the following code. One example I applied here is voxel X=43, Y=42, and Z=33, as it is a voxel inside the Primary Motor Area. We anticipate we could get some interesting result based on that.

The choice of initial parameters should be based on the prior knowledge of a certain voxel. Here the choice of initial parameters are given based on the suggestions of the paper.

I’ve set run level to 1 for the local search. Several choices has been made below for the search.

One simulaton result based on the given parameters are shown here.

Particle filtering could be done but is not showing any interesting results, thus it is aborted.

The local search is conducted below:

## [1] "mifs_local" "t_local"

The biggest disadvantage of this kind of method is that we need to get a very good prior information about the brain area we are working on. And clearly this information is not always given for all fMRI data. And in this case, I’ve tried numerous different initial value for the parameters, and sometimes I cannot find all matches, and some of them would become NAN after several turn of iterations. This kind of instability is affected by the seeds I choose and the initial value I use. One good news is that I could actually conduct the computation based on the initial value given by the paper.

Also, the computation always take a long time to run. Thus it would be hard to apply this method for all voxels. The code below is useful to conduct the global search and show the convergence of parameters. Theoretically it can work if we could successfully conduct the local search. Thus, this method is highly useful if we have enough knowledge about the prior and we only wish to conduct our analysis on a few voxels.

Visualization result for global search

Visualization result for global search

Again, the visualization result is not that ideal due to the instability of the voxel choice and initial value choice. And we might need to have better prior knowledge to getter a better Dynamic Causal Model.

There is another important issue that might affet the final result is that due to the limitation of computational ability and time for this project. Iterations for this model migh not be very sufficient. We might get better result if we could increase the iterations we conduct.

Moreover, this method doesn’t take account for the correlations between different voxels, thus the spatial information is not applied. To fix this problem, one could apply the Bayesian method to build the spatial model or advanced spatial temporal model to detect the Primary Motor Area.

5 Bayesian fmri analysis: Spatial Model

5.1 Two-Stage Single-subject Modeling and Spatial Modeling

Ignoring the correlations between different times, a easy way to conduct Bayesian analysis on fmri data is only to consider the correlations between neighboring pixels (spatial information of pixels). However, the simulation requires an initial value of all coefficients. Thus, to fully conduct this model, we need initial value of coefficients from different pixels. Basically it is computed by a GLM model using all time-series data from each pixel. Typicall, we could choose from 3 different methods.

  1. Regression models with baseline, linear signals, and BOLD signals.
  2. ARIMA time series model with trends based on linear signals and BOLD signals.
  3. Method discribed in the Single-subject Model part.

A good reason for us to apply the third method is that, the first two methods could only give us a single initial value for all coefficients. By the Single-subject Model, all coefficients are computed by simulation, and we could try multiple initial values on the Spatial model discribed here.

Thus in order to apply the spatial information of different pixels, here we developed a two stage modeling process, with the first stage simulating the coefficients for each single voxel, and the second stage apply the simulated single voxel coefficients data for spatial modeling.

To conduct the Bayesian analysis for fmri data, we need to apply the STAN statistical software. In r, we need to use package “rstan”.

5.2 Single-subject Modeling

5.2.1 Observation Model

For each pixel \(i\), \(i = 1,... ,I\), the time series \(\{Y_{it}, t = 1,... ,T\}\) of magnetic resonance signals is related to the stimulus \(\{X_{it}, t = 1, .. ., T\}\) through a pixelwise linear model: \(y_{it}={\alpha}_i+z_tb_i+{\epsilon}_{it}\), \(I=1,2,..,I, t=1,2,…,T\). Here, we already transformed our original stimulus \(x_{it}\) to BOLD signal \(z_{it}\) by applying hemodynamic response function (HRF) h, such as \(z_t=\int_0^t x(s)h(t-s)ds\) (…) In this model, we choose Poisson function as our HRF. Adding the HRF with the ON-OFF signals for this experiment, we could get the following BOLD signals.

For parameter \(b_i\), we consider it as the effect of activation at pixel i. For error term \({\epsilon}_{it}\), it captures random noise and various nuisance effects due to the machine as well as subject-related physiological noise.
According to our knowledge, a Gaussian assumption for the observations, conditional upon parameters, is suitable for this model. That is, the observation model for one pixel, i, i=1,2,…,I, is

\[ y_{it}|{\alpha_i},z_t,b_i,{\sigma_i^2} \sim N({\alpha}_i+z_tb_i, {\sigma_i^2}), \; t= 1,2,…,T\]

5.2.2 Prior Distribution

For the parameters \(b_i\) and \(a_i\), we choose a highly dispersed diffuse Gaussian distribution to get a conjugate posterior distribution. For \(a_i\), we have

\[a_i \sim N({\mu_a},1/{\lambda_a})\] Where \({mu_a}\) and \({\lambda_a}\) are hyperparameters for \(a_i\). The diffusion of this prior is controlled by \(\lambda\), and when \(\lambda \rightarrow 0\), the prior is diffuse and the Bayes estimator is in line with with the least squares estimator.

For \(b_i\), we have

\[b_i \sim N({\mu_b},1/{\lambda_b})\]

The form for prior of \(b_i\) is the same as \(a_i\), we only need to specify different hyperparameters.

For the parameter \({\sigma_i^2}\), we choose an Inverse-Gamma distribution.

\[{\sigma_i^2}\sim Inv-Gamma(\gamma_a,\gamma_b)\]

Result for the Single-Subject Model

We apply the single-subject model to the dataset described in the Introduction section. After using ‘fmri’ in R, we convert original stimulus to BOLD signal. Then we fit the linear model with data in STAN. Our results contain 2000 iterations, with the first 1000 being discarded as burn-in. Hence, inference is performed with 1000 samples for each parameter. Parameters of the gamma hyperpriors were set to \(\gamma_a\) = \(\gamma_b\) = 1. Hyperparameters for \(a_i\)’s prior were set to \(\mu_a\)=0, \(\lambda_a\)=1. Hyperparameters for \(b_i\)’s prior were set to \(\mu_b\)=0, \(\lambda_b\)=\(\frac{1}{10}\).

STAN output for a non-active pixel (1,5) for b, a, sigma^2

STAN output for a non-active pixel (1,5) for b, a, \(sigma^2\)

STAN output for an active pixel (33,33) for b, a, sigma^2

STAN output for an active pixel (33,33) for b, a, \(sigma^2\)

Two plots above show posterior sampling results for b, a, \(sigma^2\) in a non-active pixel(1,5,33) and an active pixel(33,33,33). We observe how 4 chains forget the initial value after some iterations and that the four traces are mixing well. We also can check the effective sample sizes and R hat. For non-active pixel (1,5,33), the effective sample sizes of b, a and \(sigma^2\) are 760, 4502 and 4017 respectively. All R hat equal to 1. For active pixel (33,33,33), the effective sample sizes are 4149,4377,3817,2122 respectively. All R hat also equal to 1.

Posterior predictive density estimates for a pixel

Posterior predictive density estimates for a pixel

In the plot above, the dark line is the distribution of the observed outcomes y and each of the 1000 lighter lines is the kernel density estimate of one of the replications of y from the posterior predictive distribution. This plot makes it easy to see that this model succeed to account for the bell-shaped of y.

Finally, we can save our results for parameters \(b_i\) to use in the next stage —- spatial model inference.

5.3 Spatial Model

Traditional generalized linear model(GLM) and time-series model focus on the analysis of the time series data based on each voxel (in our case, each pixel). However, it is reasonable to expect that there exist spatial correlation between different pixels because the response at a particular pixel is likely to be similar to the responses of neighboring pixels. Referring to the description of Zhang’s paper, spatial dependence between brain pixels could be captured by imposing spatial priors on the model parameters. And a very commonly used approach is based on applying Gaussian Markov random field (GMRF) priors on the coefficients vector. Here, our original GLM model contains three signals, the baseline (intercept), linear signals, and BOLD signals. Thus GMRF priors are applied on this coefficients vector. And in it, we are mainly interested in the simulation for the coefficient of the BOLD signal as it represents if the signal in that certain pixel could follow the experimental design.

Structure for the GMRF priors

\[ p(\beta_{(j)}|\lambda)\propto exp(-\frac{1}{2}\lambda \beta^T_{(j)}Q\beta_{(j)}) \]

The precision matrix Q is defined as

\[ Q_{v,k}= \begin{cases} n_v,\; v=k\\ -1,\; v\sim k\\ 0,\; otherwise \end{cases} \]

In the above form, the symbol \(v\sim k\) represents that pixel v and k are neighbors. The simplest and less computational cost assumption applied here indicate that v and k are defined as neighbors when they have only one pixel difference on the x axis or y axis.

Simulation iteration of \(\beta\) based on the GMRF priors

Based on the prior information and the penalty parameter \(\lambda\), we could get the conditional distribution of beta based on the prior information. Thus our simulation iteration is based on the following distribution.

\[ \beta_{v,j}|\beta_{-v,j},\lambda\sim N(\frac{1}{n_v}\sum_{k\sim v}\beta_{k,j},\frac{1}{n_v\lambda}) \]

where \(\beta_{-v,j}=\{\beta_{l,j};l\neq v\}\) and \(k\sim v\) represents point k and v are neighboring voxels.

Model discription and explaination

Essentially, the Spatial model is a smooting method. If there are few scattered pixels with very significant coefficients on the BOLD signal, by applying the neighboring information of other pixels, their coefficients significance woould be “smoothed” and become less significant. Thus this method is useful to rule out some single pixels with unusually high significance level. What we should expect is that after applying the spatial model, it would be easier for us to observe several areas with high significance levels, but no single pixels with high significance level.

6 Other Spatial Model Variation

6.1 Spatial Model with Weighted Neighbors

The definition of neighbors in the previous model is only taken account of pixels with one distance from the original one. However, since each activated area could be an area with multiple pixels, we could think about expanding the neighboring area and put different weight to different neighbors based on some criteria. A very straightforward thought is to put weight based on the distance with the original pixel. However, correlation among voxels does not necessarily decay with distance. Thus weights for different pixels might also need to be determined by prior knowledge.

Below is the conditional distribution for a model like this \[ \beta_{v,j}|\beta_{-v,j},\lambda\sim N(w_v\sum_{k\sim v}\beta_{k,j},\frac{1}{n_v\lambda}) \] in which \(w_v\) represents the weight and we have \(\sum w_v=1\).

In our report, one weighted neighboring model is given taking 12 neighboring pixels with the weight determined by distance. Detailed result is shown in the conclusion.

6.2 Multiple Smooting Based on the Spatial Model

According to the conditional distribution of the spatial model, we could find that the essence for this model is taking the neighboring information and smooth a certain pixel. Thus a reasonable procedure to control the smoothness of our result is to conduct the same(or different weighted) spatial model for multiple times. One example of applying the same spatial model two times is shown in the conclusion part of this report. Notice that applying the spatial model too many times would only drag all coeffifients to the average level.

7 Posterior Probability Maps

The major goal for posterior inference is to create a spatial mapping of the activated brain regions. Based on the estimated regression parameters \(\beta\), We construct posterior probability maps (PPMs). PPMs represent a complementary alternative to statistical parametric maps that are used to make classical inferences.The main idea of PPMs is to detect activations by mapping the estimates of \(\beta\) at each voxel of a single slice and then thresholding the conditional posterior probabilities at a specific confidence level.

At each voxel, the conditional posterior probability that a particular effect exceeds some threshold \(\kappa\) is calculted as \[p = 1 - \Phi\biggl[\frac{\kappa-w^TM_{\beta_v|Y}}{(w^TC_{\beta_v|Y}w)^{1/2}}\biggr],\] where \(M_{\beta_v|Y}\) is the posterior mean, \(C_{\beta_v|Y}\) is the posterior covariance of the parameter \(\beta_v\), \(w\) is a contrast weight vector, \(\Phi(.)\) is the cumulative density function of the standard normal distribution. As we have generated the postrior samples, we can easily obtain the posterior means and covariances, and the optimal threshold is formulated by minimizing a loss function. Specifically, we established a function to calculate the p value at each voxel and organized them into a matrix, then we drew the contour maps.

7.1 Spatial Model Posterior Probability Maps(PPM) Visualization

With a masking tools(BrainSuite19), we are able to narrow all p-values inside a reasonable brain area. Then based on the formula of PPM, we could construct a plot based on p-values.

7.1.1 Visualiztaion: Single First Stage Observation model with out spatial correlation

From this graph we could see that based on the formula of PPM, though we are able to detect several areas with high p-values, all those are are pretty scattered, and it is hard for us to determine which area is the Primary Motor Area that we are looking for. Thus, without taking the spatial correlation into consideration, it is hard to find one or several particular area that could be detected as the Primary Motor Area.

7.1.2 Visualization: Two Stage Spatial Model

Based on this graph after applying the two stage spatial model, we could see that many scattered pixels with high p-value in the previous plot is now ruled out. And the scattered area with high p-values in the upper right of the brain is now connected into a clear area with high p-values. Thus we courd see, after applyig two stage spatial model, it is easier for us to determine which area is the Primary Motor Area.

However, we still see several small scattered area with high p-values. This could be reasonable, or this is caused by the limitation of the Two stage Spatial method. We might think about weighted spatial model or the multiple smoothing model discribed before. Or better, try more advanced bayesian model like the Nonadditive Spatial-temporal model.

7.1.3 Visualization: Double Smoothing Model

This graph comes from the double smoothing model. the result comes from applying the same spatial model on the result given by the previous spatial model. In this way, the data would be smoothed twice. Based on the result, we see that comparing to the two stage Spatial model, p-value given by the double smoothing model is clearly more smoothed. But there is not a huge difference than the previous one. Thus we should favor the simplier spatial smoothing model given before. However, we might able to get a better model if we combine more spatial model together.

7.1.4 Visualization: Weighted Spatial Model

This graph comes from tne weighted spatial model, in which we applied not the nearest 4 pixels, but the nearest 12 pixels, and give them weight based on the distance of those pixels with the pixel we are computing. We can clearly see that, in our case, this model put more pixels into significant pixels under the same criteria of PPM. This suggests that it might not be very reasonable to assign weight only based on the distance of pixels. More detailed prior knowledge might need to be applied to get a reasonable weighted spatial model. Also, this might suggest that we should think about applying some FDR correction method on the p-value to get more reasonable conclusion. Like the benjamini hochberg correction.