# Last edited on 2019-10-28 13:34:22 by jstolfi

EXTRACTING TRADE SUMMARY DATA

  Collected transaction data from some Bitcoin exchanges from the 
  site "http://bitcoincharts.com/charts/", summarized with various time steps.
  The files were cleaned with manual editing,
  then {cleanup_bitcoincharts_data.sh}, and then
  with {check_and_fix_price_files.sh} (see below).
  Files are in the "fix" subdirectory.
  
  The typical file name is
  "{TDLO}--{TDHI}-{EXCHANGE}-{CURRENCY}-{TIMESTEP}.txt" where
  {TDLO} and {TDHI} are date-times in the format "%Y-%m-%d-%H%M%S",
  {EXCHANGE} is the exchange's 4-character tag, {CURRENCY} is the
  conversion currency (USD, EUR, etc.) and {TIMESTEP} is "01m", "05m",
  "01h", "01d", etc.
  
  The "%H%M%S" can be omitted in both dates, and the
  "%Y-" and possibly "%m-" can be omitted from {TDHI} if it is the
  same as in {TDLO}. All dates and times, in the file names and
  contents, are UTC.
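
  As an illustration of the naming convention above, the following
  sketch splits a file name into its five fields using only shell
  parameter expansion. The function name {parse_series_name} is
  hypothetical (it is not one of the scripts in this directory), and
  it assumes that {TDHI} is not abbreviated. A name with a single date
  yields that date for both {TDLO} and {TDHI}.

```shell
# Hypothetical helper, not part of the original scripts.
# Prints "{TDLO} {TDHI} {EXCHANGE} {CURRENCY} {TIMESTEP}" for one file name.
# The body runs in a subshell, so its variables do not leak to the caller.
parse_series_name() (
  base="${1##*/}"        # drop any directory prefix such as "fix/"
  base="${base%.txt}"    # drop the extension
  step="${base##*-}"     # last field, e.g. "01h"
  rest="${base%-*}"
  cr="${rest##*-}"       # currency, e.g. "USD"
  rest="${rest%-*}"
  ex="${rest##*-}"       # exchange tag, e.g. "MGOX"
  rest="${rest%-*}"      # what remains is "{TDLO}--{TDHI}" or a single date
  tdlo="${rest%%--*}"
  tdhi="${rest#*--}"
  echo "${tdlo} ${tdhi} ${ex} ${cr} ${step}"
)
```

  For example, "parse_series_name fix/2013-11-28--2013-12-19-MGOX-USD-01h.txt"
  prints "2013-11-28 2013-12-19 MGOX USD 01h".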
  
  The data file includes all synchronized intervals of length {TIMESTEP} that
  intersect the range {TDLO} to {TDHI} inclusive.
  The file may include one additional interval at each end,
  for context. 
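
  One plausible reading of "synchronized" is that interval boundaries
  fall on whole multiples of {TIMESTEP} counted from the Unix epoch, so
  that "01d" intervals start at 00:00:00 UTC. These notes do not state
  that explicitly, so treat it as an assumption. Under it, the interval
  containing a given time, and the number of intervals that intersect a
  range, could be computed as in this sketch (function names are
  hypothetical; times are non-negative epoch seconds):

```shell
# Start of the synchronized interval that contains epoch time $1,
# for a step of $2 seconds (assumes boundaries at multiples of the step).
interval_start() {
  echo $(( $1 - $1 % $2 ))
}

# Number of synchronized intervals of $3 seconds that intersect
# the inclusive range from epoch time $1 to epoch time $2.
count_intervals() {
  echo $(( $2 / $3 - $1 / $3 + 1 ))
}
```

  This count excludes the optional extra interval at each end.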
  
COLLECTING DATA FROM BITCOINCHARTS

  To collect daily summary data from bitcoincharts.com, I copy the 
  "download data" of all the relevant exchange/currency pairs ${ex}/${cr}
  (see volumes/relevant-ex-cr-pairs.dir). Some manual editing is
  necessary to remove or comment out non-data lines, and to replace each em-dash by 0.
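
  The em-dash replacement can also be done mechanically. The sketch
  below is not part of the original workflow; the function name is
  hypothetical, and it assumes the copied text is UTF-8, where the
  em-dash is the byte sequence E2 80 94.

```shell
# Replace every em-dash (UTF-8 bytes E2 80 94, octal 342 200 224)
# by "0" in the input stream.
emdash_to_zero() {
  emdash="$(printf '\342\200\224')"
  sed -e "s/${emdash}/0/g"
}
```

  It reads standard input and writes standard output, so it can be
  run over the pasted text before the remaining manual edits.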
  
  For manual editing, I found it convenient to put everything in a
  single file called ".new", with the lines of each pair preceded by
  70 "~"s and "#FILE ${date1}--${date2}-${ex}-${cr}-01d-raw.txt". Then
  I run
  
    splitsep < .new
      
  That puts each pair in a separate file. 
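
  {splitsep} is a local tool whose source is not shown here. The awk
  sketch below is a hypothetical stand-in with the behavior described
  above: it skips the rows of "~", starts a new output file at each
  "#FILE" line, and copies the following lines into it. The function
  name is made up, and the real tool may differ, for example in how it
  treats text before the first "#FILE" line (this sketch discards it).

```shell
# Hypothetical stand-in for {splitsep}; writes files in the current directory.
split_on_file_markers() {
  awk '
    /^~+$/    { next }                      # skip the rows of "~" separators
    /^#FILE / { if (out != "") close(out)   # finish the previous output file
                out = $2                    # the file name follows "#FILE"
                printf "" > out             # create or truncate it
                next }
    out != "" { print > out }               # data line: write to the current file
  '
}
```

  It would be used the same way, as "split_on_file_markers < .new".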
      
REFORMATTING AND CLEANING "raw" FILES:

  After the individual series data files are in "raw/", I do one
  of the following.
  
  If all date intervals are the same:
    
    cleanup_all_bitcoinchart_raw_files.sh 2014-09-26 2014-11-27 01d

  If the date intervals are not the same:
  
    fnames=( `ls raw/*--*-[A-Z]???-[A-Z]??-[0-9]?[hdmw].txt | sed -e 's:^raw/::g' -e 's:[.]txt::g'` )
    for fname in "${fnames[@]}" ; do
      excr="${fname%%-[0-9][0-9][mhdw]}"
      excr="${excr##*[0-9][0-9]-}"
      ex="${excr%%-*}"
      cr="${excr##*-}"
      rawfile="raw/${fname}.txt"
      orgfile="org/${fname}.txt"
      cleanup_bitcoincharts_data.sh "${ex}" "${cr}" "${rawfile}" \
        > ${orgfile}
      echo "=== ${fname} ex = ${ex} cr = ${cr} ==="
      # cat ${orgfile}
    done

COMBINING SOME "org" FILES

  Created some special "01h" files for tests:

    for excur in MGOX:USD BSTP:USD BTCC:CNY BTCE:USD ; do 
      ex="${excur%%:*}"
      curr="${excur##*:}"
      ( cd org && \
        join_recent_hourly_files.sh \
          ${ex} ${curr} \
          2013-11-28 2014-01-17 \
          "Two-month section with nearly zero mean price increase" \
          2013-11-28--2013-12-19-${ex}-${curr}-01h.txt \
          2013-12-20--2014-01-17-${ex}-${curr}-01h.txt \
      )
    done

    for excur in MGOX:USD BSTP:USD BTCC:CNY BTCE:USD ; do 
      ex="${excur%%:*}"
      curr="${excur##*:}"
      ( cd org && \
        join_recent_hourly_files.sh \
          ${ex} ${curr} \
          2013-11-01 2014-01-17 \
          "Spans the Nov/2013 rally and crash" \
          2013-11-01--2013-11-27-${ex}-${curr}-01h.txt \
          2013-11-28--2013-12-19-${ex}-${curr}-01h.txt \
          2013-12-20--2014-01-17-${ex}-${curr}-01h.txt \
      )
    done

    for excur in MGOX:USD BSTP:USD BTCC:CNY BTCE:USD ; do 
      ex="${excur%%:*}"
      curr="${excur##*:}"
      ( cd org && \
        join_recent_hourly_files.sh \
          ${ex} ${curr} \
          2013-09-01 2014-01-17 \
          "Spans the Sep/2013 plateau and the Nov/2013 rally and crash" \
          2013-09-01--2013-10-31-${ex}-${curr}-01h.txt \
          2013-11-01--2013-11-27-${ex}-${curr}-01h.txt \
          2013-11-28--2013-12-19-${ex}-${curr}-01h.txt \
          2013-12-20--2014-01-17-${ex}-${curr}-01h.txt \
      )
    done

CHECKING VALUES AND WRITING "fix" FILES

  Checking the consistency of prices and volumes:

    for sp in 43200:12h 3600:01h 900:15m 300:05m 86400:01d 60:01m ; do 
      secs="${sp%%:*}"
      per="${sp##*:}"
      ( cd org && ( ls 20*-${per}.txt || echo NONE) 2> /dev/null ) | egrep -v -e 'NONE|[*]' | sort > .orgnew-${per}.dir
      if [[ -s .orgnew-${per}.dir ]]; then
        check_and_fix_price_files.sh ${secs} `cat .orgnew-${per}.dir`
      else
        echo '!!'" no '${per}' files found" 1>&2
      fi
    done   
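
  The "{seconds}:{tag}" pairs in the loop above are written out by
  hand. A small helper could derive the seconds from the tag instead.
  The function below is a hypothetical sketch, not one of the original
  scripts, and it assumes tags made of two digits followed by one of
  "m", "h", "d", "w".

```shell
# Hypothetical helper: convert a time step tag such as "15m" to seconds.
# The body runs in a subshell, so "exit" only ends the function.
step_to_seconds() (
  tag="$1"
  unit="${tag#??}"       # last character: m, h, d or w
  num="${tag%?}"         # two-digit count, e.g. "05"
  num="${num#0}"         # strip a leading zero so "08" is not read as octal
  case "${unit}" in
    m) mult=60 ;;
    h) mult=3600 ;;
    d) mult=86400 ;;
    w) mult=604800 ;;
    *) echo "bad time step '${tag}'" 1>&2 ; exit 1 ;;
  esac
  echo $(( num * mult ))
)
```

  With it, the loop could iterate over the tags alone and call
  "step_to_seconds ${per}" to obtain the first argument of
  {check_and_fix_price_files.sh}.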

CHECK DATE/TIME RANGES

  # Checking the date ranges in the "fix" files.
  # Every file should be renamed to match the actual date range.

  rm -f .checks
  for ff in `( cd fix && ls 20*.txt )` ; do
    echo ${ff} >> .checks
    cat fix/${ff} \
      | check_date_range.gawk -v fname="${ff}" \
      >> .checks 2>&1
  done
  
PLOTTING INTERVALS SPANNED

  Testing the plot of time intervals spanned by various files:
  
    show_date_intervals.sh \
        fix "BSTP" "USD" "01h" \
        "Series files for BSTP USD:BTC - 01h intervals" \
        2010-06-27 2019-11-30

    show_all_fix_intervals.sh
    
  This created files 
  
    fix/.tags.txt
    fix/.single-files.dir
    fix/.merge-ops-${tag}.dir
    plots/date-intervals-${tag}.png
    
MERGING INTERVALS:

  Assumes that there are the files 
  
    fix/.tags.txt
    fix/.merge-ops-${tag}.dir
    
  for each tag that has two or more price data files.
  
  Remove some that can't be merged because of gaps or 
  because they are synthetic files:
  
    rm -fv fix/.merge-ops-BRWN-*.dir
    rm -fv fix/.merge-ops-PREF-*.dir
    rm -fv fix/.merge-ops-TEST-*.dir
    rm -fv fix/.merge-ops-BSTP-USD-01m.dir
    rm -fv fix/.merge-ops-BTCC-CNY-05m.dir
    rm -fv fix/.merge-ops-BTCE-USD-01m.dir
    rm -fv fix/.merge-ops-MCBT-BRL-01d.dir
    rm -fv fix/.merge-ops-MGOX-USD-01m.dir
    
  Create files "fix/.master-${tag}.txt" with the date interval
  and tag of merged files:
  
    create_master_merge_files.sh
   
MERGING FILES:

  There were half a dozen inconsistencies between files,
  usually at the last line where the data was still
  incomplete at the source.  Fixed by hand.

  Now we check which combinations "{EX}-{CUR}-{TINT}" 
  don't need to be merged because one of the files is
  already the merged file:

    merge_price_files.sh "fix" BFNX-USD-01d
    merge_price_files.sh "fix" BSTP-USD-01d
    merge_price_files.sh "fix" OKCO-CNY-01d
    merge_price_files.sh "fix" ANEX-HKD-01d

    merge_all_fix_price_files.sh

CHECKING WHICH RAW FILES CAN BE JUNKED:

  Finding files in "raw/" that have a version in "org/":

    ( cd raw && ls 20*-[0-9][0-9][a-z].txt ) | sort > .rawfiles
    ( cd org && ls 20*-[0-9][0-9][a-z].txt ) | sort > .orgfiles
    bool 1.2 .rawfiles .orgfiles > .rawjunk
    bool 1-2 .rawfiles .orgfiles > .rawnew

  Files in ".rawjunk" can be moved to JUNK/raw/:
  
    for f in `cat .rawjunk` ; do mv -vi raw/$f JUNK/raw/$f ; done
    
  Files in ".rawnew" should be pushed through {cleanup_bitcoincharts_data.sh}
  as per above, and then this step should be repeated.
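
  {bool} is a local tool. Judging from how its two outputs are used,
  "1.2" appears to select the lines present in both files and "1-2" the
  lines present only in the first; that reading is an assumption. Under
  it, the standard {comm} utility gives the same lists from sorted
  inputs. The function names below are hypothetical.

```shell
# Lines common to two sorted files (assumed meaning of "bool 1.2").
in_both() {
  comm -12 "$1" "$2"
}

# Lines found only in the first of two sorted files
# (assumed meaning of "bool 1-2").
only_in_first() {
  comm -23 "$1" "$2"
}
```

  For example, "in_both .rawfiles .orgfiles > .rawjunk" and
  "only_in_first .rawfiles .orgfiles > .rawnew". Both inputs must be
  sorted, which the "| sort" in the listing commands ensures.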

CHECKING "org" FILES THAT CAN BE JUNKED

  Finding files in "org/" that have a version in "fix/":

    ( cd org && ls 20*-[0-9][0-9][a-z].txt ) | sort > .orgfiles
    ( cd fix && ls 20*-[0-9][0-9][a-z].txt ) | sort > .fixfiles
    bool 1.2 .orgfiles .fixfiles > .orgjunk
    bool 1-2 .orgfiles .fixfiles > .orgnew

  Files in ".orgjunk" can be moved to JUNK/org/:
  
    for f in `cat .orgjunk` ; do mv -vi org/$f JUNK/org/$f ; done
    
  Files in ".orgnew" should be pushed through {check_and_fix_price_files.sh}
  as per above, and then this step should be repeated.

PLOTTING PRICES

  # Plotting some:
  
    plot_prices_old.sh \
      "2013-12-08 (1 min)" \
      '(1/60.0)' 0.0  6 \
      650 850 \
      fix/2013-12-08-MGOX-USD-01m.txt "MtGOX" 1.000 \
      fix/2013-12-08-BTCE-USD-01m.txt "BTC-e" 1.000 \
      > plots/foo.png
      
    plot_prices_old.sh \
      "2013-12-17 (5 min)" \
      '(5/60.0)' 0.0  6 \
      610 850 \
      fix/2013-12-17-BTCC-CNY-05m.txt "BTC-China" 5.300 \
      fix/2013-12-17-BSTP-USD-05m.txt "Bitstamp"  0.950 \
      fix/2013-12-17-MGOX-USD-05m.txt "MtGOX"     1.000 \
      > plots/foo.png
      
    plot_prices_old.sh \
      "2013-11-01 to 2013-11-30 (12 hours)" \
      0.5 0.0  4 \
      205 1350 \
      fix/2013-11-01--2013-11-30-BTCC-CNY-12h.txt "BTC-China" 5.850 \
      fix/2013-11-01--2013-11-30-BSTP-USD-12h.txt "Bitstamp"  0.926 \
      fix/2013-11-01--2013-11-30-MGOX-USD-12h.txt "MtGOX"     1.000 \
      > plots/foo.png
      
    plot_prices_old.sh \
      "2013-11-28 to 2013-12-19 (hourly)" \
      '(1/24.0)' '(-2.0)'  4 \
      360 1350 \
      fix/2013-11-28--2013-12-19-MGOX-USD-01h.txt "MtGOX"     1.000 \
      fix/2013-11-28--2013-12-19-BTCC-CNY-01h.txt "BTC-China" 5.850 \
      > plots/foo.png
      
    plot_prices_old.sh \
      "2013-12-16 to 2013-12-20 (15 min)" \
      '(0.25/24.0)' '(+16.0)'  4 \
      340 1100 \
      fix/2013-12-16--2013-12-20-BTCC-CNY-15m.txt "BTC-China" 5.600 \
      fix/2013-12-16--2013-12-20-BSTP-USD-15m.txt "Bitstamp"  0.926 \
      fix/2013-12-16--2013-12-20-MGOX-USD-15m.txt "MtGOX"     1.000 \
      > plots/2013-12-16--12-20-BTCC5600-BSTP-MGOX-15m.png
      
    plot_prices_old.sh \
      "2013-12-16 to 2013-12-20 (15 min)" \
      '(0.25/24.0)' '(+16.0)'  4 \
      340 1100 \
      fix/2013-12-16--2013-12-20-BTCC-CNY-15m.txt "BTC-China" 5.200 \
      fix/2013-12-16--2013-12-20-BSTP-USD-15m.txt "Bitstamp"  0.926 \
      fix/2013-12-16--2013-12-20-MGOX-USD-15m.txt "MtGOX"     1.000 \
      > plots/2013-12-16--12-20-BTCC5200-BSTP-MGOX-15m.png
      
    plot_prices_old.sh \
      "2013-12-16 to 2013-12-20 (15 min)" \
      '(0.25/24.0)' '(+16.0)'  4 \
      340 1100 \
      fix/2013-12-16--2013-12-20-BTCC-CNY-15m.txt "BTC-China" 5.000 \
      fix/2013-12-16--2013-12-20-BSTP-USD-15m.txt "Bitstamp"  0.926 \
      fix/2013-12-16--2013-12-20-MGOX-USD-15m.txt "MtGOX"     1.000 \
      > plots/2013-12-16--2013-12-20-BTCC5000-BSTP-MGOX-15m.png
    
    plot_prices_old.sh \
      "2014-01-05 to 2014-01-13 (hourly)" \
      '(1.00/24.0)' '(+5.0)'  4 \
      720 1020 \
      fix/2014-01-05--2014-01-13-BTCC-CNY-01h.txt "BTC-China" 6.100 \
      fix/2014-01-05--2014-01-13-BSTP-USD-01h.txt "Bitstamp"  1.000 \
      fix/2014-01-05--2014-01-13-MGOX-USD-01h.txt "MtGOX"     1.120 \
      > plots/2014-01-05--2014-01-13-BTCC-BSTP-MGOX-01h.png
    
    plot_prices_old.sh \
      "2014-01-18 to 2014-01-30 (hourly)" \
      '(1/24.0)' '(+18.0)'  4 \
      360 1350 \
      fix/2014-01-18--2014-01-30-MGOX-USD-01h.txt "MtGOX"     1.000 \
      fix/2014-01-18--2014-01-30-BSTP-USD-01h.txt "Bitstamp"  1.000 \
      > plots/foo.png
      
    plot_prices_old.sh \
      "2013-09-01 to 2014-01-30 (hourly)" \
      '(1/24.0)' '(+9.0)'  7.5 \
      40 1550 \
      fix/2013-09-01--2014-01-30-MGOX-USD-01h.txt "MtGOX"     1.000 \
      fix/2013-09-01--2014-01-30-BSTP-USD-01h.txt "Bitstamp"  1.000 \
      > plots/foo.png
      
    plot_prices.sh \
      "Mean daily prices 2010-07-17 to 2015-02-20" \
      '180' 6  \
      2010-06-20 2015-03-10 \
      0.04 1600 \
      "YES" \
      fix/2010-07-17--2014-02-25-MGOX-USD-01d.txt 16 "MtGOX"     1.000 0022ff \
      fix/2011-09-13--2015-02-20-BSTP-USD-01d.txt 16 "Bitstamp"  1.000 0066ff \
      fix/2011-08-14--2015-02-20-BTCE-USD-01d.txt 16 "BTC-e"     1.000 008800 \
      fix/2013-03-31--2015-02-20-BFNX-USD-01d.txt 16 "Bitfinex"  1.000 8800dd \
      fix/2011-06-13--2015-02-20-BTCC-CNY-01d.txt 16 "BTC-China" 6.100 ff0000 \
      fix/2013-06-12--2015-01-23-OKCO-CNY-01d.txt 16 "OKCoin.cn" 6.100 dd4400 \
      > plots/all-data.png
      
  # The official USD to CNY factor as of 2013-12-17 was 6.073.  The factors above were used to get the 
  # plots as close as possible.
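
  The number after each label appears to act as a divisor that brings
  a series to a common USD scale: CNY prices are divided by a factor
  near the exchange rate, and the USD series by factors near 1. That
  reading is inferred from the factors themselves, not from the source
  of the plotting scripts, so treat it as an assumption. The arithmetic
  would then be as in this sketch (hypothetical function name):

```shell
# Divide price $1 by scale factor $2 and print the result with two decimals
# (assumed meaning of the per-file factor in the plot commands).
scale_price() {
  awk -v p="$1" -v f="$2" 'BEGIN { printf "%.2f\n", p / f }'
}
```

  For example, "scale_price 6073 6.073" prints "1000.00": at the
  official rate, a price of 6073 CNY corresponds to 1000 USD.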

PLOTTING PRICE RATIOS

    plot_two_prices_ratio.sh \
      "2013-09-01 to 2014-01-30 (hourly)" \
      '(1/24.0/(365.25/12))' '(+9.0)'  7.5 \
      0.90 1.333 \
      fix/2013-09-01--2014-01-30-MGOX-USD-01h.txt "MtGOX"     1.000 \
      fix/2013-09-01--2014-01-30-BSTP-USD-01h.txt "Bitstamp"  1.000 \
      > plots/2013-09-01--2014-01-30-MGOX-BSTP-ratio.png

REFERENCE PRICE

  See directory "ref-price" and update it as needed.
  
    rundate=2015-02-10
    lodate=2010-07-17
    hidate=2015-02-20
    ifile=../../ref-price/out/${rundate}-refprice-01d.txt
    ofile=${lodate}--${hidate}-PREF-USD-01d.txt
    ( cd fix && rm -f ${ofile} && cp -av ../${ifile} ${ofile} )
    chmod a-w fix/${ofile}

CREATING A TEST FILE

  Creating a daily series with exponential growth and gaps:
  
    make_test_series_file.gawk \
        fix/2010-07-17--2014-02-25-MGOX-USD-01d.txt \
      > fix/2010-07-17--2014-02-25-TEST-USD-01d.txt