# Last edited on 2023-12-07 06:28:38 by stolfi THE EXPERIMENT This folder has some analysis and visualizations of the data collected by Fernando Najman and used in his paper "The brain uses renewal points to model random sequences of stimuli" by Fernando A. Najman, Antonio Galves, Marcela Svarc, Claudia D. Vargas. According to the paper, An analog first order Butterworth band pass filter (0.3-50 Hz) was applied to the signal and the Cz electrode was used as the reference during data acquisition. The signal was acquired with recording frequency of 500 Hz. In offline processing the data was re-referenced to the average using the EEGLAB package for MATLAB and a fourth order Butterworth band pass filter (1-30 Hz) was applied to the signal. For each electrode $e$, each participant $v$, and each stimulus sequence index $n$ in $0\dd N$, we will denote by $Y_n^{e,v}$ the segment from that EEG signal starting 50 ms before the onset of that auditory stimulus and ending at 0.4 seconds after that onset. The baseline of each segment was shifted by subtracting from it the average of the signal in the first 50 ms. Each visrtual electrode is the pointwise average of three actual electrodes: RPF = {9, 10, 11}, LPF = {11, 18, 22}, OCC = {74, 75, 82}, in the Electrical Geodesics Inc. numbering system. PREPROCESSED DATA FILE FOR ONE SUBJECT AND SEGMENT In 2023 received by email from Fernando Najman the file "Tres_segmentos.mat" (Matlab binary matrix format) with one EEG segment (n = 001) of his experiment for one subject (v = 01), AFTER his pre-processing (software filtering, re-referencing, and virtual electrode combination). The file contains 113 samples of the three virtual electrodes (RPF, LPF, and OCC) corresponding to one stimulus. Converted the file from Matlab to plain text with the script "convert_electrodes_to_text.m" in directory "programs/matlab/fernando_renpts/2023-10-20-electrodes/". The file was placed in "pre/runs/v01n001/v01n001.txt". Added by hand a header like that accepted by my EEG analysis programs. The sampling frequency was said to be 500 Hz, however 113 samples in 450 ms is ~250 Hz, not 500. Assumed 250 Hz. Plotting the power spectrum of that run: runs=( "v01n001" ) plot_some_runs_and_spectra.sh \ SHOW \ pre/runs/v01n001 0 2 60 36 500 \ "${runs[@]}" Initial PCA on the thre electrode signals. runs=( v01n001 ) do_basic_pca.sh pre/runs/v01n001 ${runs[@]} FULL "RAW" FILE FOR ONE SUBJECT On 2023-10-27 received from Fernando the full "raw" recording of one subject (v = 01) with all segments (n = 001 to ???). Saved to "raw/v01/eeg.mat". The stimulus timing and type data were damaged. A new version was provided on 2023-10-30. Saved as "raw/v01/sti.mat". Converted these files to plain text "raw/v01/sti.txt", "raw/v01/V_eeg.txt", "raw/v01/S_eeg.txt", and "raw/v01/R_eeg.txt" with convert_raw_v01_mat.py Basic statistcs, histograms: ( plot_session_channels_histograms.sh raw 650 100 ) The value histograms are crazy: most have several widely spaced humps, jagged and very far from bell-shaped. The channel plots show that there is substantial low-frequency non-Gaussian drift. The spectral plots show line noise at 60 Hz and 120 Hz. SIGNAL CLEANUP Will try a median filter to remove the low-frequency drift, and a Fourier filter to remove the high-frequency noise. filter_eeg.sh raw flt v01 ( plot_session_channels_histograms.sh flt 100 650 ) Packing the channels used in the paper: sess=S_0000 for ch in C009 C010 C011 C018 C022 C074 C075 C082; do r_pref="raw/v01/${sess}/eeg" f_pref="flt/v01/${sess}/eeg" l_pref="flt/v01/${sess}/eeg_lo" afile="flt/v01/${sess}/eeg_${ch}_all.png" convert \ -append \ {${r_pref},${f_pref},${l_pref}}_${ch}{,_pwr,_hist}.png \ ${afile} display ${afile} done ==> STOPPED HERE PCA convert \ SAVE/2021-08-30-1/s012/pca.png \ SAVE/2021-08-31-1/s012/pca.png \ SAVE/2021-08-31-2/s012/pca.png \ SAVE/2021-09-01-1/s012/pca.png \ SAVE/2021-09-02-1/s012/pca.png \ flt-runs-C/s012/A1/pca.png \ -append s012_pca.png display s012_pca.png # subjects=( s010 s011 s012 s013 s014 s015 s016 s017 s018 s019 s020 s021 ) subjects=( s012 ) make_pca_composite_image.sh ${subjects{@]} PLOTTING THE COMPONENTS MAGNITUDES REMOVING THE MAIN COMPONENTS Decomposing the filtered runs into a combination of the first two components (P000,P001) and a residue (RES) that is orthogonal to both patterns: ( cd ~/programs/c/NEUROMAT/nmeeg_spectrum && make uninstall build install ) ( cd ~/programs/c/NEUROMAT/nmeeg_comp_analysis && make uninstall build install ) ( cd ~/programs/c/NEUROMAT/nmeeg_animate && make uninstall build install ) subjects=( s012 ) for subj in ${subjects[@]}; do runs=( r00624 ) for run in ${runs[@]} ; do analyze_components.sh SHOW flt-runs-C ${subj} ${run} 151 51 P000 P003 P004 nmeeg_plot_channel_scatter.sh SHOW flt-runs-C/${subj}/A1/${run}_AMP.txt 1 P000 150 P004 150 nmeeg_plot_channel_scatter.sh SHOW flt-runs-C/${subj}/A1/${run}_AMP.txt 1 P003 150 P004 150 done done for subj in s014; do runs=( r00106 ) for run in ${runs[@]} ; do analyze_components.sh SHOW flt-runs-C ${subj} ${run} 201 51 P000 P001 done done Plotting all the runs: logdate="`yyyy-mm-dd-hhmmss`" echo "${logdate}" plot_all_runs.sh NOSHOW "${logdate}" raw-runs 0 130 260 500 echo 'done plotting.' IDENTIFYING THE RUNS Extracting the run types: (cd flt-runs-B && egrep -e '^type *[=]' s???/r?????.txt ) \ | sed -e 's/[.]txt:type *= */ /g' \ > run-types.txt LISTING GARBAGE RUNS By visual inspection of flt-runs-B/s???/r?????.png, created files "flt-runs-C/s0${subj}/garbage.dir" with a list of the run file names ("r${runid}.txt") that look like garbage, with many electrodes dominated by noise. Moving to new filtered runs directory: for subj in `( cd flt-runs-B && ls -d s??? )` ; do mv -vi flt-runs-{B,C}/${subj}/garbage.dir mv -vi flt-runs-{B,C}/${subj}/blink-free.dir done ############################################################################################ ### BLINKS ################################################################################# LISTING RUNS WITHOUT BLINKS By visual inspection of flt-runs-B/s???/r?????.png, created files "flt-runs-B/s0${subj}/blink-free.dir" with a list of the run file names ("r${runid}.txt") of the runs without blinks in them. LOCATING BLINKS Creating a file blinks/blinks-by-hand.txt with the time ranges (seconds) of blinks. ( cd flt-runs-B && ls s???/r?????.txt | sed -e 's:[.]txt::g' | sort ) > blinks/blinks-by-hand.txt Editing manually the file, extracting visually the intervals with 0.1 precision. Considered only blinks that are completely inside the run's time window (fixation and stimulus phases, plus ~0.5sec buffer on each end). Typically each blink lasts 0.45 seconds, almost never more than 0.6 sec. cleanup_blinks.gawk \ -v trun=7.0 \ blinks/blinks-by-hand.txt \ > blinks/blinks-clean.txt Plotting an histogram of the blink locations: bfile="blinks/blinks-clean.txt" for sid in 013 014 ; do plot_blink_locations.sh SHOW ${sid} blinks/blinks-clean.txt done Locating runs without weirdness flags and without blinks in the stimulus phase or in the end of the fixation phase: list_blink_free_runs.gawk \ -v stini=2.5 \ -v stfin=6.5 \ blinks/blinks-by-hand.txt \ > blinks/blink-safe-runs.txt 107 safe runs in 512 runs (20.90%) subject 013: 91 safe runs in 256 runs (35.55%) subject 014: 16 safe runs in 256 runs (6.25%) EXTRACTING ALL BLINKS OF EACH SUBJECT To model the blinks, we extract all blinks from all good runs of each subject and concatenate them into a single file "flt-runs-B/s${sid}_blinks.txt": vmax=160 bfile="blinks/blinks-clean.txt" for sid in 013 014 ; do xfile="flt-runs-B/s${sid}_blinks.txt" cat ${bfile} \ | egrep -e '^[ ]*'"${sid}" \ | gawk \ -v trun=7.0 \ ' // { sid = $1; rid = $2; nfl = $3; tini = 0+$5; tfin = 0+$6; if ((nfl == 0) && (tini >= 0) && (tfin <= trun)) { print sid, rid, tini, tfin; } } ' \ | extract_run_segments.gawk -v dir="flt-runs-B" \ > .xfr nt=`cat .xfr | egrep -v -e '[=]' | wc -l` printf "nt = %d\n" "${nt}" > ${xfile} cat .xfr >> ${xfile} done for sid in 013 014 ; do xfile="flt-runs-B/s${sid}_blinks.txt" nmeeg_plot_channels.sh SHOW ${xfile%%.*} 0 ${vmax} 0 9999 0 0 1400 500 done EXTRACTING ALL NON-BLINK IMPORTANT SEGMENTS To analyze the non-blink sections, we extract all the blink-free important segments and concatenate them into a single file "flt-runs-B/s${sid}_nonblinks.txt": bfile="blinks/blink-safe-runs.txt" for sid in 013 014 ; do xfile="flt-runs-B/s${sid}_nonblinks.txt" cat ${bfile} \ | egrep -e '^[ ]*'"${sid}" \ | gawk \ ' // { sid = $1; rid = $2; tini = 2.5; tfin = 6.5; print sid, rid, tini, tfin; } ' \ | extract_run_segments.gawk -v dir="flt-runs-B" \ > .xfr nt=`cat .xfr | egrep -v -e '[=]' | wc -l` printf "nt = %d\n" "${nt}" > ${xfile} cat .xfr >> ${xfile} done vmax=160 for sid in 013 014 ; do xfile="flt-runs-B/s${sid}_nonblinks.txt" nmeeg_plot_channels.sh SHOW ${xfile%%.*} 0 ${vmax} 0 9999 0 0 1400 500 done IDENTIFYING PRINCIPAL COMPONENTS OF BLINKS for sid in 013 014 ; do xfile="flt-runs-B/s${sid}_blinks.txt" opref="flt-runs-B/s${sid}_blinks" echo "computing components" nmeeg_correl \ -outDirix ${opref} \ -zeroMean T \ -maxComps 5 \ < ${xfile} done btype=2 for sid in 013 014 ; do opref="flt-runs-B/s${sid}_blinks" echo "plotting components" for cfile in ${opref}_P???_eig.txt ; do cpref="${cfile%.*}_b${btype}" nmeeg_animate ${cpref} 0 1 1 ${btype} 560 640 < ${cfile} done done IDENTIFYING PRINCIPAL COMPONENTS OF BLINK-FREE SEGMENTS for sid in 013 014 ; do xfile="flt-runs-B/s${sid}_nonblinks.txt" opref="flt-runs-B/s${sid}_nonblinks" echo "computing components" nmeeg_correl \ -outDirix ${opref} \ -zeroMean F \ -maxComps 5 \ < ${xfile} done btype=2 for sid in 013 014 ; do opref="flt-runs-B/s${sid}_nonblinks" echo "plotting components" for cfile in ${opref}_P???_eig.txt ; do cpref="${cfile%.*}_b${btype}" nmeeg_animate ${cpref} 0 1 1 ${btype} 560 640 < ${cfile} done done SEPARATING THE BLINKS AND THE 10 HZ OSCILLATION Decomposing the filtered runs into a combination of the blink principal component (BL0), the general 10 Hz oscillation component (H10), and a residue (RES) that is orthogonal to both patterns: vmax=160 subruns=( 013:00229 014:00425) for sr in ${subruns[@]} ; do subjid="${sr%%:*}" runid="${sr##*:}" ppref=flt-runs-B/s${subjid} opref=flt-runs-B/s${subjid}/r${runid} nmeeg_comp_analysis \ -pattern BL0 ${ppref}_blinks_P000_eig.txt \ -pattern H10 ${ppref}_nonblinks_P000_eig.txt \ -normalize \ -writeComp BL0 ${opref}_BL0.txt = BL0 \ -writeComp H10 ${opref}_H10.txt = H10 \ -writeComp BLH ${opref}_BLH.txt = BL0 H10 \ -delete BL0 H10 \ < ${opref}.txt \ > ${opref}_RES.txt nmeeg_plot_channels.sh SHOW ${opref}_BL0 0 ${vmax} 0 9999 0 0 1400 500 nmeeg_plot_channels.sh SHOW ${opref}_H10 0 ${vmax} 0 9999 0 0 1400 500 nmeeg_plot_channels.sh SHOW ${opref}_BLH 0 ${vmax} 0 9999 0 0 1400 500 nmeeg_plot_channels.sh SHOW ${opref}_RES 0 ${vmax} 0 9999 0 0 1400 500 done EXPORTING SOME TO DROPBOX Moving selected files to Dropbox for easy access: projdir="." dropdir="${HOME}/Dropbox/eeg_sinal_ghislain/2013-11-15-stolfi" echo "${runs[@]}" for run in ${runs[@]} ; do cp -av ${projdir}/raw-runs/${run}{.txt,.png,_pwr.png} ${dropdir}/raw-runs/ cp -av ${projdir}/flt-runs-B/${run}{.txt,.png,_pwr.png} ${dropdir}/flt-runs-B/ cp -av ${projdir}/nse-runs-B/${run}{.txt,.png,_pwr.png} ${dropdir}/nse-runs-B/ done >>>> TO FIX AND DO >>>> Tabulating the range of electrodes Fp1 and Fp2 between trigger pulses 1 and 4, and flagging those that vary too much: vfile="flt-runs-B/Fp-variation.txt" rm -f ${vfile} for f in flt-runs-B/s???/r?????.txt ; do printf "%s " "${f}" >> ${vfile} cat ${f} \ | get_electrode_ranges.gawk -v ix1=4 -v ix2=12 -v ixt=21 -v dvmax=150 \ >> ${vfile} done SAMPLE RUNS FOR TESTING Identified two runs from each subject, one "Bio" and one "nonBio", both apparently free from blinks in the inter-pulse region. datadir=${HOME}/programs/c/NEUROMAT/neuromat_show_eeg/tests/data cp -av raw-runs/s001/r005.txt ${datadir} # nonBio cp -av raw-runs/s001/r039.txt ${datadir} # Bio cp -av raw-runs/s002/r016.txt ${datadir} # Bio cp -av raw-runs/s002/r043.txt ${datadir} # nonBio cp -av raw-runs/s003/r009.txt ${datadir} # Bio cp -av raw-runs/s003/r030.txt ${datadir} # nonBio cp -av raw-runs/s004/r030.txt ${datadir} # nonBio cp -av raw-runs/s004/r036.txt ${datadir} # Bio cp -av raw-runs/s005/r014.txt ${datadir} # Bio cp -av raw-runs/s005/r044.txt ${datadir} # nonBio cp -av raw-runs/s006/r024.txt ${datadir} # Bio cp -av raw-runs/s006/r042.txt ${datadir} # nonBio cp -av raw-runs/s007/r028.txt ${datadir} # nonBio cp -av raw-runs/s007/r046.txt ${datadir} # Bio cp -av raw-runs/s008/r015.txt ${datadir} # Bio cp -av raw-runs/s008/r049.txt ${datadir} # nonBio