# Last edited on 2023-12-07 06:28:38 by stolfi

THE EXPERIMENT

  This folder has some analysis and visualizations of the data
  collected by Fernando Najman and used in his paper 
  "The brain uses renewal points to model random sequences of stimuli"
  by Fernando A. Najman, Antonio Galves, Marcela Svarc, Claudia D. Vargas.

  According to the paper, 
  
    An analog first order Butterworth band pass filter
    (0.3-50 Hz) was applied to the signal and the Cz
    electrode was used as the reference during data
    acquisition. The signal was acquired with recording
    frequency of 500 Hz. In offline processing the data was
    re-referenced to the average using the EEGLAB package
    for MATLAB and a fourth order Butterworth band pass
    filter (1-30 Hz) was applied to the signal. For each
    electrode $e$, each participant $v$, and each stimulus
    sequence index $n$ in $0\dd N$, we will denote by
    $Y_n^{e,v}$ the segment from that EEG signal starting 50
    ms before the onset of that auditory stimulus and ending
    at 0.4 seconds after that onset. The baseline of each
    segment was shifted by subtracting from it the average
    of the signal in the first 50 ms.
  
  Each visrtual electrode is the pointwise average of three 
  actual electrodes: RPF = {9, 10, 11}, LPF = {11, 18, 22},
  OCC = {74, 75, 82}, in the Electrical Geodesics Inc.
  numbering system.

PREPROCESSED DATA FILE FOR ONE SUBJECT AND SEGMENT

  In 2023 received by email from Fernando Najman
  the file "Tres_segmentos.mat" (Matlab binary matrix format) 
  with one EEG segment (n = 001) of his experiment for 
  one subject (v = 01), AFTER his pre-processing 
  (software filtering, re-referencing, and virtual
  electrode combination).  
  
  The file contains 113 samples of the three virtual electrodes 
  (RPF, LPF, and OCC) corresponding to one stimulus.  
  
  Converted the file from Matlab to plain text with the script 
  "convert_electrodes_to_text.m" in directory 
  "programs/matlab/fernando_renpts/2023-10-20-electrodes/".
  The file was placed in "pre/runs/v01n001/v01n001.txt".
  
  Added by hand a header like that accepted by my EEG analysis 
  programs.  The sampling frequency was said to be 500 Hz, however 
  113 samples in 450 ms is ~250 Hz, not 500.  Assumed 250 Hz.

  Plotting the power spectrum of that run:
  
    runs=( "v01n001" ) 
    plot_some_runs_and_spectra.sh \
      SHOW \
      pre/runs/v01n001 0 2 60 36 500 \
      "${runs[@]}"

  Initial PCA on the thre electrode signals.

    runs=( v01n001 )
    do_basic_pca.sh pre/runs/v01n001 ${runs[@]}

FULL "RAW" FILE FOR ONE SUBJECT

  On 2023-10-27 received from Fernando the full "raw" recording of one
  subject (v = 01) with all segments (n = 001 to ???). Saved to
  "raw/v01/eeg.mat".
  
  The stimulus timing and type data were damaged. A new version was
  provided on 2023-10-30. Saved as "raw/v01/sti.mat".
  
  Converted these files to plain text "raw/v01/sti.txt",
  "raw/v01/V_eeg.txt", "raw/v01/S_eeg.txt", and "raw/v01/R_eeg.txt" with
  
    convert_raw_v01_mat.py
    
  Basic statistcs, histograms:
  
    ( plot_session_channels_histograms.sh raw 650 100 )
    
  The value histograms are crazy: most have several widely spaced humps,
  jagged and very far from bell-shaped.  The channel plots show that 
  there is substantial low-frequency non-Gaussian drift.
  The spectral plots show line noise at 60 Hz and 120 Hz.

SIGNAL CLEANUP

  Will try a median filter to remove the low-frequency drift,
  and a Fourier filter to remove the high-frequency noise.
  
    filter_eeg.sh raw flt v01
 
    ( plot_session_channels_histograms.sh flt 100 650 )
 
  Packing the channels used in the paper:
  
    sess=S_0000
    for ch in C009 C010 C011 C018 C022 C074 C075 C082; do
      r_pref="raw/v01/${sess}/eeg"
      f_pref="flt/v01/${sess}/eeg"
      l_pref="flt/v01/${sess}/eeg_lo"
      afile="flt/v01/${sess}/eeg_${ch}_all.png"
      convert \
        -append \
        {${r_pref},${f_pref},${l_pref}}_${ch}{,_pwr,_hist}.png \
        ${afile}
      display ${afile}
    done


==> STOPPED HERE 


PCA    
    convert \
        SAVE/2021-08-30-1/s012/pca.png \
        SAVE/2021-08-31-1/s012/pca.png \
        SAVE/2021-08-31-2/s012/pca.png \
        SAVE/2021-09-01-1/s012/pca.png \
        SAVE/2021-09-02-1/s012/pca.png \
        flt-runs-C/s012/A1/pca.png \
      -append s012_pca.png
    display s012_pca.png

    # subjects=( s010 s011 s012 s013 s014 s015 s016 s017 s018 s019 s020 s021 )
    subjects=( s012 )
    make_pca_composite_image.sh ${subjects{@]}
    
PLOTTING THE COMPONENTS MAGNITUDES 

REMOVING THE MAIN COMPONENTS

  Decomposing the filtered runs into a combination of the first two components (P000,P001)
  and a residue (RES) that is orthogonal to both patterns:

    ( cd ~/programs/c/NEUROMAT/nmeeg_spectrum && make uninstall build install )
    ( cd ~/programs/c/NEUROMAT/nmeeg_comp_analysis && make uninstall build install )
    ( cd ~/programs/c/NEUROMAT/nmeeg_animate && make uninstall build install )
    subjects=( s012 )
    for subj in ${subjects[@]}; do 
      runs=( r00624 )
      for run in ${runs[@]} ; do
        analyze_components.sh SHOW flt-runs-C ${subj} ${run} 151 51  P000 P003 P004
        nmeeg_plot_channel_scatter.sh SHOW flt-runs-C/${subj}/A1/${run}_AMP.txt 1 P000 150 P004 150
        nmeeg_plot_channel_scatter.sh SHOW flt-runs-C/${subj}/A1/${run}_AMP.txt 1 P003 150 P004 150
      done
    done
 
    for subj in s014; do 
      runs=( r00106 )
      for run in ${runs[@]} ; do
        analyze_components.sh SHOW flt-runs-C ${subj} ${run} 201 51 P000 P001
      done
    done

  Plotting all the runs:

    logdate="`yyyy-mm-dd-hhmmss`" 
    echo "${logdate}"
    plot_all_runs.sh NOSHOW "${logdate}" raw-runs 0 130 260 500
    echo 'done plotting.'

IDENTIFYING THE RUNS

  Extracting the run types:
  
    (cd flt-runs-B && egrep -e '^type *[=]' s???/r?????.txt ) \
      | sed -e 's/[.]txt:type *= */ /g' \
      > run-types.txt

LISTING GARBAGE RUNS

  By visual inspection of flt-runs-B/s???/r?????.png, created files
  "flt-runs-C/s0${subj}/garbage.dir" with a list of the run file names ("r${runid}.txt")
  that look like garbage, with many electrodes dominated by noise.
  
  Moving to new filtered runs directory:

  for subj in `( cd flt-runs-B && ls -d s??? )` ; do 
    mv -vi flt-runs-{B,C}/${subj}/garbage.dir 
    mv -vi flt-runs-{B,C}/${subj}/blink-free.dir 
  done


############################################################################################
### BLINKS #################################################################################
  
LISTING RUNS WITHOUT BLINKS

  By visual inspection of flt-runs-B/s???/r?????.png, created files
  "flt-runs-B/s0${subj}/blink-free.dir" with a list of the run file names ("r${runid}.txt")
  of the runs without blinks in them.

LOCATING BLINKS

  Creating a file blinks/blinks-by-hand.txt with the time ranges (seconds) of blinks.
  
    ( cd flt-runs-B && ls s???/r?????.txt | sed -e 's:[.]txt::g' | sort ) > blinks/blinks-by-hand.txt
    
  Editing manually the file, extracting visually the intervals with 0.1 precision.
  Considered only blinks that are completely inside the run's time window 
  (fixation and stimulus phases, plus ~0.5sec buffer on each end).
  Typically each blink lasts 0.45 seconds, almost never more than 0.6 sec.
  
    cleanup_blinks.gawk \
        -v trun=7.0 \
        blinks/blinks-by-hand.txt \
      > blinks/blinks-clean.txt
  
  Plotting an histogram of the blink locations:
  
    bfile="blinks/blinks-clean.txt"
    for sid in 013 014 ; do
      plot_blink_locations.sh SHOW ${sid} blinks/blinks-clean.txt
    done
    
  Locating runs without weirdness flags and without 
  blinks in the stimulus phase or in the end of the fixation phase:
    
    list_blink_free_runs.gawk \
      -v stini=2.5 \
      -v stfin=6.5 \
      blinks/blinks-by-hand.txt \
    > blinks/blink-safe-runs.txt
      
      107 safe runs in 512 runs (20.90%)
      subject 013: 91 safe runs in 256 runs (35.55%)
      subject 014: 16 safe runs in 256 runs (6.25%)

EXTRACTING ALL BLINKS OF EACH SUBJECT

  To model the blinks, we extract all blinks from all good runs of each subject and
  concatenate them into a single file "flt-runs-B/s${sid}_blinks.txt":
  
    vmax=160
    bfile="blinks/blinks-clean.txt"
    for sid in 013 014 ; do
      xfile="flt-runs-B/s${sid}_blinks.txt"
      cat ${bfile} \
        | egrep -e '^[ ]*'"${sid}" \
        | gawk \
            -v trun=7.0 \
            ' // { 
                sid = $1; rid = $2; nfl = $3; 
                tini = 0+$5; tfin = 0+$6;
                if ((nfl == 0) && (tini >= 0) && (tfin <= trun)) { print sid, rid, tini, tfin; }
              }
            ' \
        | extract_run_segments.gawk -v dir="flt-runs-B" \
        > .xfr
      nt=`cat .xfr | egrep -v -e '[=]' | wc -l`
      printf "nt = %d\n" "${nt}" > ${xfile} 
      cat .xfr >> ${xfile} 
    done
    
    for sid in 013 014 ; do
      xfile="flt-runs-B/s${sid}_blinks.txt"
      nmeeg_plot_channels.sh SHOW ${xfile%%.*} 0 ${vmax} 0 9999  0 0 1400 500
    done
    
EXTRACTING ALL NON-BLINK IMPORTANT SEGMENTS

  To analyze the non-blink sections, we extract all the blink-free important segments
  and concatenate them into a single file "flt-runs-B/s${sid}_nonblinks.txt":
  
    bfile="blinks/blink-safe-runs.txt"
    for sid in 013 014 ; do
      xfile="flt-runs-B/s${sid}_nonblinks.txt"
      cat ${bfile} \
        | egrep -e '^[ ]*'"${sid}" \
        | gawk \
            ' // { 
                sid = $1; rid = $2; 
                tini = 2.5; tfin = 6.5;
                print sid, rid, tini, tfin;
              }
            ' \
        | extract_run_segments.gawk -v dir="flt-runs-B" \
        > .xfr
      nt=`cat .xfr | egrep -v -e '[=]' | wc -l`
      printf "nt = %d\n" "${nt}" > ${xfile} 
      cat .xfr >> ${xfile} 
    done
    
    vmax=160
    for sid in 013 014 ; do
      xfile="flt-runs-B/s${sid}_nonblinks.txt"
      nmeeg_plot_channels.sh SHOW ${xfile%%.*} 0 ${vmax} 0 9999  0 0 1400 500
    done
    
IDENTIFYING PRINCIPAL COMPONENTS OF BLINKS
    
    for sid in 013 014 ; do
      xfile="flt-runs-B/s${sid}_blinks.txt"
      opref="flt-runs-B/s${sid}_blinks"
      echo "computing components"
      nmeeg_correl \
          -outDirix ${opref} \
          -zeroMean T \
          -maxComps 5 \
        < ${xfile}
    done

    btype=2
    for sid in 013 014 ; do
      opref="flt-runs-B/s${sid}_blinks"
      echo "plotting components"
      for cfile in ${opref}_P???_eig.txt ; do
        cpref="${cfile%.*}_b${btype}"
        nmeeg_animate ${cpref} 0 1 1 ${btype} 560 640 < ${cfile}
      done
    done

IDENTIFYING PRINCIPAL COMPONENTS OF BLINK-FREE SEGMENTS
    
    for sid in 013 014 ; do
      xfile="flt-runs-B/s${sid}_nonblinks.txt"
      opref="flt-runs-B/s${sid}_nonblinks"
      echo "computing components"
      nmeeg_correl \
          -outDirix ${opref} \
          -zeroMean F \
          -maxComps 5 \
        < ${xfile}
    done

    btype=2
    for sid in 013 014 ; do
      opref="flt-runs-B/s${sid}_nonblinks"
      echo "plotting components"
      for cfile in ${opref}_P???_eig.txt ; do
        cpref="${cfile%.*}_b${btype}"
        nmeeg_animate ${cpref} 0 1 1 ${btype} 560 640 < ${cfile}
      done
    done

SEPARATING THE BLINKS AND THE 10 HZ OSCILLATION

  Decomposing the filtered runs into a combination of the blink principal component (BL0),
  the general 10 Hz oscillation component (H10), and a residue (RES) that is orthogonal
  to both patterns:

    vmax=160
    subruns=( 013:00229 014:00425)
    for sr in ${subruns[@]} ; do
      subjid="${sr%%:*}"
      runid="${sr##*:}"
      ppref=flt-runs-B/s${subjid}
      opref=flt-runs-B/s${subjid}/r${runid}
      nmeeg_comp_analysis \
        -pattern BL0 ${ppref}_blinks_P000_eig.txt \
        -pattern H10 ${ppref}_nonblinks_P000_eig.txt \
        -normalize \
        -writeComp BL0 ${opref}_BL0.txt = BL0 \
        -writeComp H10 ${opref}_H10.txt = H10 \
        -writeComp BLH ${opref}_BLH.txt = BL0 H10 \
        -delete BL0 H10 \
        < ${opref}.txt \
        > ${opref}_RES.txt
      nmeeg_plot_channels.sh SHOW ${opref}_BL0 0 ${vmax} 0 9999  0 0 1400 500 
      nmeeg_plot_channels.sh SHOW ${opref}_H10 0 ${vmax} 0 9999  0 0 1400 500 
      nmeeg_plot_channels.sh SHOW ${opref}_BLH 0 ${vmax} 0 9999  0 0 1400 500 
      nmeeg_plot_channels.sh SHOW ${opref}_RES 0 ${vmax} 0 9999  0 0 1400 500 
    done

EXPORTING SOME TO DROPBOX

  Moving selected files to Dropbox for easy access:
  
    projdir="."
    dropdir="${HOME}/Dropbox/eeg_sinal_ghislain/2013-11-15-stolfi"
    echo "${runs[@]}"
    for run in ${runs[@]} ; do 
      cp -av ${projdir}/raw-runs/${run}{.txt,.png,_pwr.png} ${dropdir}/raw-runs/
      cp -av ${projdir}/flt-runs-B/${run}{.txt,.png,_pwr.png} ${dropdir}/flt-runs-B/
      cp -av ${projdir}/nse-runs-B/${run}{.txt,.png,_pwr.png} ${dropdir}/nse-runs-B/
    done
  
>>>> TO FIX AND DO >>>>

  Tabulating the range of electrodes Fp1 and Fp2 between trigger pulses 
  1 and 4, and flagging those that vary too much:
  
    vfile="flt-runs-B/Fp-variation.txt"
    rm -f ${vfile}
    for f in flt-runs-B/s???/r?????.txt ; do 
      printf "%s " "${f}" >> ${vfile}
      cat ${f} \
        | get_electrode_ranges.gawk -v ix1=4 -v ix2=12 -v ixt=21 -v dvmax=150 \
        >> ${vfile}
    done
  
  
SAMPLE RUNS FOR TESTING

  Identified two runs from each subject, one "Bio" and one "nonBio", both apparently
  free from blinks in the inter-pulse region.

    datadir=${HOME}/programs/c/NEUROMAT/neuromat_show_eeg/tests/data
    cp -av raw-runs/s001/r005.txt ${datadir} #  nonBio 
    cp -av raw-runs/s001/r039.txt ${datadir} #  Bio    
    cp -av raw-runs/s002/r016.txt ${datadir} #  Bio    
    cp -av raw-runs/s002/r043.txt ${datadir} #  nonBio 
    cp -av raw-runs/s003/r009.txt ${datadir} #  Bio    
    cp -av raw-runs/s003/r030.txt ${datadir} #  nonBio 
    cp -av raw-runs/s004/r030.txt ${datadir} #  nonBio 
    cp -av raw-runs/s004/r036.txt ${datadir} #  Bio    
    cp -av raw-runs/s005/r014.txt ${datadir} #  Bio    
    cp -av raw-runs/s005/r044.txt ${datadir} #  nonBio 
    cp -av raw-runs/s006/r024.txt ${datadir} #  Bio    
    cp -av raw-runs/s006/r042.txt ${datadir} #  nonBio 
    cp -av raw-runs/s007/r028.txt ${datadir} #  nonBio 
    cp -av raw-runs/s007/r046.txt ${datadir} #  Bio    
    cp -av raw-runs/s008/r015.txt ${datadir} #  Bio    
    cp -av raw-runs/s008/r049.txt ${datadir} #  nonBio