# Last edited on 2019-10-28 03:13:25 by jstolfi

EXTRACTING TRADE SUMMARY DATA

  Collected transaction data from some Bitcoin exchanges from the site
  "http://bitcoincharts.com/charts/", summarized with various time
  steps. The files were cleaned with manual editing, then with
  {cleanup_bitcoincharts_data.sh}, and then with
  {check_and_fix_price_files.sh} (see below).

  Files are in the "fix" subdirectory. The typical file name is
  "{TDLO}--{TDHI}-{EXCHANGE}-{CURRENCY}-{TIMESTEP}.txt" where {TDLO}
  and {TDHI} are date-times in the format "%Y-%m-%d-%H%M%S",
  {EXCHANGE} is the exchange's 4-character tag, {CURRENCY} is the
  conversion currency (USD, EUR, etc.) and {TIMESTEP} is "01m", "05m",
  "01h", "01d", etc. The "%H%M%S" can be omitted in both dates, and
  the "%Y-" and possibly "%m-" can be omitted from {TDHI} if it is the
  same as in {TDLO}.

  All dates and times, in the file names and contents, are UTC.

  The data file includes all synchronized intervals of length
  {TIMESTEP} that intersect the range {TDLO} to {TDHI} inclusive. The
  file may include one additional interval at each end, for context.

COLLECTING DATA FROM BITCOINCHARTS

  To collect daily summary data from bitcoincharts.com, I copy the
  "download data" of all the relevant exchange/currency pairs
  ${ex}/${cr} (see volumes/relevant-ex-cr-pairs.dir). Some manual
  editing is necessary to remove or comment out non-data lines, and to
  replace each em-dash by 0.

  For manual editing, I found it convenient to put everything in a
  single file called ".new", with the lines of each pair preceded by
  70 "~"s and "#FILE ${date1}--${date2}-${ex}-${cr}-01d-raw.txt".
  Then I run

    splitsep < .new

  That puts each pair in a separate file.

REFORMATTING AND CLEANING "raw" FILES:

  After the individual series data files are in "raw/", I do one of
  the following.

  If all date intervals are the same:

    cleanup_all_bitcoinchart_raw_files.sh 2014-09-26 2014-11-27 01d

  If the date intervals are not the same:

    fnames=( `ls raw/*--*-[A-Z]???-[A-Z]??-[0-9]?[hdmw].txt | sed -e 's:^raw/::g' -e 's:[.]txt::g'` )
    for fname in "${fnames[@]}" ; do
      excr="${fname%%-[0-9][0-9][mhdw]}"
      excr="${excr##*[0-9][0-9]-}"
      ex="${excr%%-*}"
      cr="${excr##*-}"
      rawfile="raw/${fname}.txt"
      orgfile="org/${fname}.txt"
      cleanup_bitcoincharts_data.sh "${ex}" "${cr}" "${rawfile}" \
        > ${orgfile}
      echo "=== ${fname} ex = ${ex} cr = ${cr} ==="
      # cat ${orgfile}
    done

COMBINING SOME "org" FILES

  Created some special "01h" files for tests:

    for excur in MGOX:USD BSTP:USD BTCC:CNY BTCE:USD ; do
      ex="${excur%%:*}"
      curr="${excur##*:}"
      ( cd org && \
        join_recent_hourly_files.sh \
          ${ex} ${curr} \
          2013-11-28 2014-01-17 \
          "Two-month section with nearly zero mean price increase" \
          2013-11-28--2013-12-19-${ex}-${curr}-01h.txt \
          2013-12-20--2014-01-17-${ex}-${curr}-01h.txt \
      )
    done

    for excur in MGOX:USD BSTP:USD BTCC:CNY BTCE:USD ; do
      ex="${excur%%:*}"
      curr="${excur##*:}"
      ( cd org && \
        join_recent_hourly_files.sh \
          ${ex} ${curr} \
          2013-11-01 2014-01-17 \
          "Spans the Nov/2013 rally and crash" \
          2013-11-01--2013-11-27-${ex}-${curr}-01h.txt \
          2013-11-28--2013-12-19-${ex}-${curr}-01h.txt \
          2013-12-20--2014-01-17-${ex}-${curr}-01h.txt \
      )
    done

    for excur in MGOX:USD BSTP:USD BTCC:CNY BTCE:USD ; do
      ex="${excur%%:*}"
      curr="${excur##*:}"
      ( cd org && \
        join_recent_hourly_files.sh \
          ${ex} ${curr} \
          2013-09-01 2014-01-17 \
          "Spans the Sep/2013 plateau and the Nov/2013 rally and crash" \
          2013-09-01--2013-10-31-${ex}-${curr}-01h.txt \
          2013-11-01--2013-11-27-${ex}-${curr}-01h.txt \
          2013-11-28--2013-12-19-${ex}-${curr}-01h.txt \
          2013-12-20--2014-01-17-${ex}-${curr}-01h.txt \
      )
    done

CHECKING VALUES AND WRITING "fix" FILES

  Checking the consistency of prices and volumes:

    for sp in 43200:12h 3600:01h 900:15m 300:05m 86400:01d 60:01m ; do
      secs="${sp%%:*}"
      per="${sp##*:}"
      ( cd org && ( ls 20*-${per}.txt || echo NONE ) 2> /dev/null ) |
        egrep -v -e 'NONE|[*]' |
        sort > .orgnew-${per}.dir
      if [[ -s .orgnew-${per}.dir ]]; then
        check_and_fix_price_files.sh ${secs} `cat .orgnew-${per}.dir`
      else
        echo '!!'" no '${per}' files found" 1>&2
      fi
    done

CHECKING DATE/TIME RANGES

  # Checking the date ranges in the "fix" files.
  # Every file should be renamed to match the actual date range.

    rm -f .checks
    for ff in `( cd fix && ls 20*.txt )` ; do
      echo ${ff} >> .checks
      cat fix/${ff} \
        | check_date_range.gawk -v fname="${ff}" \
        >> .checks 2>&1
    done

PLOTTING INTERVALS SPANNED

  Testing the plot of time intervals spanned by various files:

    show_date_intervals.sh \
      fix "BSTP" "USD" "01h" \
      "Series files for BSTP USD:BTC - 01h intervals" \
      2010-06-27 2019-11-31

    show_all_fix_intervals.sh

MERGING FILES:

  There were half a dozen inconsistencies between files, usually at
  the last line, where the data was still incomplete at the source.
  There were a few file sets that could not be merged because they had
  coverage gaps.

    merge_price_files.sh "fix" `cat fix/.merge-BFNX-USD-01d.dir`
    merge_price_files.sh "fix" `cat fix/.merge-BSTP-USD-01d.dir`
    merge_price_files.sh "fix" `cat fix/.merge-OKCO-CNY-01d.dir`
    merge_price_files.sh "fix" `cat fix/.merge-ANEX-HKD-01d.dir`

    merge_all_fix_price_files.sh

CHECKING WHICH RAW FILES CAN BE JUNKED:

  Finding files in "raw/" that have a version in "org/":

    ( cd raw && ls 20*-[0-9][0-9][a-z].txt ) | sort > .rawfiles
    ( cd org && ls 20*-[0-9][0-9][a-z].txt ) | sort > .orgfiles
    bool 1.2 .rawfiles .orgfiles > .rawjunk
    bool 1-2 .rawfiles .orgfiles > .rawnew

  Files in ".rawjunk" can be moved to JUNK/raw/:

    for f in `cat .rawjunk` ; do mv -vi raw/$f JUNK/raw/$f ; done

  Files in ".rawnew" should be pushed through
  {cleanup_bitcoincharts_data.sh} as per above, and then this step
  should be repeated.
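
  The {bool} command above is a local tool that is not described in
  these notes. Assuming that "bool 1.2" yields the lines common to
  both sorted lists and "bool 1-2" yields the lines present only in
  the first list, the same split can be sketched with the standard
  {comm} utility. The file names and the temporary directory below are
  made up for illustration only.

```shell
# Sketch only: assumes "bool 1.2" = intersection and "bool 1-2" = difference.
# {comm} is a stand-in here, not the tool actually used in these notes.
tmp="$(mktemp -d)"

# Hypothetical sorted listings of "raw/" and "org/" (comm needs sorted input).
printf '%s\n' \
  2014-09-26--2014-11-27-BSTP-USD-01d.txt \
  2014-09-26--2014-11-27-BTCE-USD-01d.txt \
  2014-09-26--2014-11-27-MGOX-USD-01d.txt \
  | LC_ALL=C sort > "${tmp}/rawfiles"
printf '%s\n' \
  2014-09-26--2014-11-27-BSTP-USD-01d.txt \
  2014-09-26--2014-11-27-MGOX-USD-01d.txt \
  | LC_ALL=C sort > "${tmp}/orgfiles"

# Names present in both lists: raw files that already have an "org" version.
rawjunk="$(LC_ALL=C comm -12 "${tmp}/rawfiles" "${tmp}/orgfiles")"

# Names present only in the first list: raw files still to be cleaned.
rawnew="$(LC_ALL=C comm -23 "${tmp}/rawfiles" "${tmp}/orgfiles")"

printf 'junk:\n%s\n' "${rawjunk}"
printf 'new:\n%s\n' "${rawnew}"

rm -rf "${tmp}"
```

  Both lists must be sorted with the same collation that {comm} uses,
  hence the explicit LC_ALL=C on every call. The same pattern would
  apply to the comparison of "org/" against "fix/".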

CHECKING "org" FILES THAT CAN BE JUNKED

  Finding files in "org/" that have a version in "fix/":

    ( cd org && ls 20*-[0-9][0-9][a-z].txt ) | sort > .orgfiles
    ( cd fix && ls 20*-[0-9][0-9][a-z].txt ) | sort > .fixfiles
    bool 1.2 .orgfiles .fixfiles > .orgjunk
    bool 1-2 .orgfiles .fixfiles > .orgnew

  Files in ".orgjunk" can be moved to JUNK/org/:

    for f in `cat .orgjunk` ; do mv -vi org/$f JUNK/org/$f ; done

  Files in ".orgnew" should be pushed through
  {check_and_fix_price_files.sh} as per above, and then this step
  should be repeated.

PLOTTING PRICES

  # Plotting some:

    plot_prices_old.sh \
      "2013-12-08 (1 min)" \
      '(1/60.0)' 0.0 6 \
      650 850 \
      fix/2013-12-08-MGOX-USD-01m.txt "MtGOX" 1.000 \
      fix/2013-12-08-BTCE-USD-01m.txt "BTC-e" 1.000 \
      > plots/foo.png

    plot_prices_old.sh \
      "2013-12-17 (5 min)" \
      '(5/60.0)' 0.0 6 \
      610 850 \
      fix/2013-12-17-BTCC-CNY-05m.txt "BTC-China" 5.300 \
      fix/2013-12-17-BSTP-USD-05m.txt "Bitstamp" 0.950 \
      fix/2013-12-17-MGOX-USD-05m.txt "MtGOX" 1.000 \
      > plots/foo.png

    plot_prices_old.sh \
      "2013-11-01 to 2013-11-30 (12 hours)" \
      0.5 0.0 4 \
      205 1350 \
      fix/2013-11-01--2013-11-30-BTCC-CNY-12h.txt "BTC-China" 5.850 \
      fix/2013-11-01--2013-11-30-BSTP-USD-12h.txt "Bitstamp" 0.926 \
      fix/2013-11-01--2013-11-30-MGOX-USD-12h.txt "MtGOX" 1.000 \
      > plots/foo.png

    plot_prices_old.sh \
      "2013-11-28 to 2013-12-19 (hourly)" \
      '(1/24.0)' '(-2.0)' 4 \
      360 1350 \
      fix/2013-11-28--2013-12-19-MGOX-USD-01h.txt "MtGOX" 1.000 \
      fix/2013-11-28--2013-12-19-BTCC-CNY-01h.txt "BTC-China" 5.850 \
      > plots/foo.png

    plot_prices_old.sh \
      "2013-12-16 to 2013-12-20 (15 min)" \
      '(0.25/24.0)' '(+16.0)' 4 \
      340 1100 \
      fix/2013-12-16--2013-12-20-BTCC-CNY-15m.txt "BTC-China" 5.600 \
      fix/2013-12-16--2013-12-20-BSTP-USD-15m.txt "Bitstamp" 0.926 \
      fix/2013-12-16--2013-12-20-MGOX-USD-15m.txt "MtGOX" 1.000 \
      > plots/2013-12-16--12-20-BTCC5600-BSTP-MGOX-15m.png

    plot_prices_old.sh \
      "2013-12-16 to 2013-12-20 (15 min)" \
      '(0.25/24.0)' '(+16.0)' 4 \
      340 1100 \
      fix/2013-12-16--2013-12-20-BTCC-CNY-15m.txt "BTC-China" 5.200 \
      fix/2013-12-16--2013-12-20-BSTP-USD-15m.txt "Bitstamp" 0.926 \
      fix/2013-12-16--2013-12-20-MGOX-USD-15m.txt "MtGOX" 1.000 \
      > plots/2013-12-16--12-20-BTCC5200-BSTP-MGOX-15m.png

    plot_prices_old.sh \
      "2013-12-16 to 2013-12-20 (15 min)" \
      '(0.25/24.0)' '(+16.0)' 4 \
      340 1100 \
      fix/2013-12-16--2013-12-20-BTCC-CNY-15m.txt "BTC-China" 5.000 \
      fix/2013-12-16--2013-12-20-BSTP-USD-15m.txt "Bitstamp" 0.926 \
      fix/2013-12-16--2013-12-20-MGOX-USD-15m.txt "MtGOX" 1.000 \
      > plots/2013-12-16--2013-12-20-BTCC5000-BSTP-MGOX-15m.png

    plot_prices_old.sh \
      "2014-01-05 to 2014-01-13 (hourly)" \
      '(1.00/24.0)' '(+5.0)' 4 \
      720 1020 \
      fix/2014-01-05--2014-01-13-BTCC-CNY-01h.txt "BTC-China" 6.100 \
      fix/2014-01-05--2014-01-13-BSTP-USD-01h.txt "Bitstamp" 1.000 \
      fix/2014-01-05--2014-01-13-MGOX-USD-01h.txt "MtGOX" 1.120 \
      > plots/2014-01-05--2014-01-13-BTCC-BSTP-MGOX-01h.png

    plot_prices_old.sh \
      "2014-01-18 to 2014-01-30 (hourly)" \
      '(1/24.0)' '(+18.0)' 4 \
      360 1350 \
      fix/2014-01-18--2014-01-30-MGOX-USD-01h.txt "MtGOX" 1.000 \
      fix/2014-01-18--2014-01-30-BSTP-USD-01h.txt "Bitstamp" 1.000 \
      > plots/foo.png

    plot_prices_old.sh \
      "2013-09-01 to 2014-01-30 (hourly)" \
      '(1/24.0)' '(+9.0)' 7.5 \
      40 1550 \
      fix/2013-09-01--2014-01-30-MGOX-USD-01h.txt "MtGOX" 1.000 \
      fix/2013-09-01--2014-01-30-BSTP-USD-01h.txt "Bitstamp" 1.000 \
      > plots/foo.png

    plot_prices.sh \
      "Mean daily prices 2010-07-17 to 2015-02-20" \
      '180' 6 \
      2010-06-20 2015-03-10 \
      0.04 1600 \
      "YES" \
      fix/2010-07-17--2014-02-25-MGOX-USD-01d.txt 16 "MtGOX"     1.000 0022ff \
      fix/2011-09-13--2015-02-20-BSTP-USD-01d.txt 16 "Bitstamp"  1.000 0066ff \
      fix/2011-08-14--2015-02-20-BTCE-USD-01d.txt 16 "BTC-e"     1.000 008800 \
      fix/2013-03-31--2015-02-20-BFNX-USD-01d.txt 16 "Bitfinex"  1.000 8800dd \
      fix/2011-06-13--2015-02-20-BTCC-CNY-01d.txt 16 "BTC-China" 6.100 ff0000 \
      fix/2013-06-12--2015-01-23-OKCO-CNY-01d.txt 16 "OKCoin.cn" 6.100 dd4400 \
      > plots/all-data.png

  # The official USD to CNY factor as of 2013-12-17 was 6.073.
  # The factors above were used to get the plots as close as possible.

PLOTTING PRICE RATIOS

    plot_two_prices_ratio.sh \
      "2013-09-01 to 2014-01-30 (hourly)" \
      '(1/24.0/(365.25/12))' '(+9.0)' 7.5 \
      0.90 1.333 \
      fix/2013-09-01--2014-01-30-MGOX-USD-01h.txt "MtGOX" 1.000 \
      fix/2013-09-01--2014-01-30-BSTP-USD-01h.txt "Bitstamp" 1.000 \
      > plots/2013-09-01--2014-01-30-MGOX-BSTP-ratio.png

REFERENCE PRICE

  See directory "ref-price" and update it as needed.

    rundate=2015-02-10
    lodate=2010-07-17
    hidate=2015-02-20
    ifile=../../ref-price/out/${rundate}-refprice-01d.txt
    ofile=${lodate}--${hidate}-PREF-USD-01d.txt
    ( cd fix && rm -f ${ofile} && cp -av ../${ifile} ${ofile} )
    chmod a-w fix/${ofile}

CREATING A TEST FILE

  Creating a daily series with exponential growth with gaps:

    make_test_series_file.gawk \
      fix/2010-07-17--2014-02-25-MGOX-USD-01d.txt \
      > fix/2010-07-17--2014-02-25-TEST-USD-01d.txt