The ‘Twitter Hedge Fund’ has an out-of-sample experience.
Derwent Capital, the hedge fund working with Johan Bollen and Huina Mao to implement their 'Twitter Predictor' strategy, had, until recently, been publishing its monthly returns on the web. This is fairly irregular: hedge funds typically do not release these data, due to regulatory concerns and performance anxiety. Even more irregular: as of May 3rd, 2012, the monthly returns were removed from the trading performance webpage. The page now states, “We are going through some exciting changes…more soon,” as does Derwent’s homepage; they no longer appear to be taking new investments.
You can see the last published monthly return values, as of April 27, 2012, in the Google cache. I replicate the data here^{1}:
| Period | Jan 12 | Feb 12 | Mar 12 | Apr 12 |
|:-------|-------:|-------:|-------:|-------:|
| Return | 2.04%  | 3.18%  | 1.89%  | NA     |
I added the NA for April 2012. My suspicion is that April was a down month and Derwent panicked, but I concede there are numerous alternative explanations. The total return over the period is 7.3%. While this is better than a sharp stick in the eye, is it consistent with the spectacular performance implied by the claims of Bollen’s original paper?
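The total-return arithmetic is just compounding; a minimal check in Python, using the three published monthly figures:

```python
# Compound the three published monthly simple returns (Jan-Mar 2012)
# into the total simple return over the period.
monthly = [0.0204, 0.0318, 0.0189]

total = 1.0
for r in monthly:
    total *= 1.0 + r
total -= 1.0

print(f"total return: {total:.1%}")  # 7.3%
```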
"That 86.7% figure is widely quoted in the media. What people forget is that … you might lose all your money in the 13 or 15% where you’re wrong"
Here is the (composite) null hypothesis that I intend to reject the hell out of:
Derwent is trading the model described by Bollen et al., longing and shorting the DJIA at the close, at unit leverage or greater, capturing the indicative value of the index, incurring no costs; and the ‘Twitter Predictor’ model has the advertised forecast accuracy of 86.7%.
To reject this null hypothesis, we would have to conclude one of the following:
- Derwent is not trading the Twitter Predictor model because of technical difficulties. Given that they inked a deal with Bollen over a year ago, and could be trading a single ETF once daily on a signal based on three-day-old tweets, they would have to be hopelessly incompetent. Since Bollen’s claims imply that Derwent is sitting on a gold mine, one would expect them to spare no expense bringing the strategy to market; how they could have failed to do so is inexplicable.
- Derwent is not trading the Twitter Predictor model because they do not believe it is profitable. This seems dishonest to me, since stories on the web link Derwent to Bollen’s paper. Although Derwent’s offering materials probably allow them to trade whatever the hell they want, it would be odd indeed if Bollen’s business partner didn’t buy his story while licensing his technology (and renting his reputation).
- Derwent is trading the Twitter Predictor model, but at less than unit leverage, i.e. they are keeping some part of their money in a ‘risk-free’ instrument, like cash. This would imply that Derwent is experiencing less risk than the strawman market-timer analyzed below. This is very odd behaviour for a hedge fund, especially one collecting only performance fees. Moreover, since backtests indicate little chance of massive drawdowns and astronomical upside potential for the Twitter Predictor, Derwent should be borrowing money to trade at leverage if possible.
- Derwent is trading the Twitter Predictor model and the forecast accuracy quoted in Bollen’s paper did not materialize in the real world.
None of these possibilities is attractive for Bollen; they all point to the ‘Twitter Predictor’ strategy being a statistical phantom, the product of smoke, mirrors and data dredging, amplified by a credulous media which has drunk deep of the social-media Kool-Aid on Twitter.
Testing the Null
It would be difficult to infer much from only three months of data even if you were given the daily marks, and less still from the monthly marks. However, under the null hypothesis quoted above, we have a toehold. There are a number of technical tricks we can employ (and I do so below), but the simplest test is to compare the live performance to that of a simulated market-timer trading the DJIA with Bollen’s fabled forecast accuracy.
Monte Carlo simulations
This is a rerun of the previous simulations, but applied to the live trading period. To recap: for one realization of the experiment, I simulate a trader who correctly guesses the sign of the log return of DJIA tomorrow with probability 86.7%, and trades at the close, at unit leverage, long or short based on that guess, over the period 2012-01-01 to 2012-03-31. I perform 10000 such Monte Carlo realizations with different random seeds, recording the total return of each experiment.
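One realization of the experiment can be sketched in a few lines of Python. The true inputs would be the realized absolute daily moves of DJIA over the period; I substitute synthetic values here, so the resulting numbers are illustrative only and will not match those quoted below:

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the absolute daily simple returns of DJIA over
# 2012-01-01 to 2012-03-31 (~62 trading days); in the actual
# experiment these would be the realized index moves.
w = np.abs(rng.normal(0.0, 0.006, size=62))

ACCURACY = 0.867   # Bollen's advertised forecast accuracy
N_SIMS = 10000

totals = np.empty(N_SIMS)
for k in range(N_SIMS):
    # +1 when the trader guesses tomorrow's sign correctly, -1 otherwise
    s = np.where(rng.random(62) < ACCURACY, 1.0, -1.0)
    # compound the signed daily simple returns into a total return
    totals[k] = np.prod(1.0 + s * w) - 1.0

print(f"median total return: {np.median(totals):.2%}")
print(f"fraction <= 7.3%:    {np.mean(totals <= 0.073):.2%}")
```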
Of the 10000 simulations, exactly 8 did as ‘poorly’ as Derwent, approximately 0.08%. The worst simulation experienced a total return over the period of 3.24%, compared with Derwent’s achieved total return of 7.3%. The median total return over the 10000 simulations is 20.83%.
Estimating the forecast accuracy
Assuming the null hypothesis, we know the absolute value of Derwent’s simple returns on every day in the trading period, although not the sign. We know their total log return over the period, and so we know their mean daily log return, which allows us to estimate their experienced forecast accuracy.
Suppose that, for \(i=1,2,\ldots,n\), \(w_i\) is the absolute value of the daily return of DJIA in the period in question. Let \(s_i\) be a \(\pm 1\) random variable that equals 1 with probability equal to the forecast accuracy. The total return experienced over the period would then be \[r_t = \prod_i (1 + s_i w_i).\] The fact that simple returns compound in this way is rather an annoyance. We know their total return, thus their total log return, and use this as an estimate of the sum of the signed daily returns as follows: \[\log r_t = \log \prod_i (1 + s_i w_i) = \sum_i \log\left(1 + s_i w_i\right) \approx \sum_i s_i w_i.\] This follows because for small \(x\) we have \(\log (1+x) \approx x\).
The sum on the right above could be used to estimate the forecast accuracy \(p\), which is the probability that the random variable \(s\) equals 1. This is basically just a weighted mean computation, conditional on the weights \(w_i\) being fixed: \[\mathbf{E}\left(\frac{\sum_i w_i s_i}{\sum_i w_i}\right) = 2p - 1.\] This means the statistic \[\hat{p} = \frac{1}{2} + \frac{\log r_t}{2 \sum_i w_i}\] can be used to estimate \(p\).
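The estimator is a one-liner; a minimal sketch in Python, where the example value for \(\sum_i w_i\) is hypothetical, chosen only to reproduce the scale of the numbers below:

```python
import math

def forecast_accuracy(total_return, sum_abs_returns):
    """Estimate p via p_hat = 1/2 + log(r_t) / (2 * sum_i w_i)."""
    return 0.5 + math.log(1.0 + total_return) / (2.0 * sum_abs_returns)

# hypothetical sum of absolute daily DJIA returns over the quarter
print(round(forecast_accuracy(0.073, 0.251), 2))  # 0.64
```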
Performing this calculation, I get the estimate \(\hat{p} = 0.64\). A fairly rough 95% confidence interval for the true forecast accuracy is \(\left[0.47,0.8\right]\). Note that at this type I rate we cannot reject the possibility that \(p = 0.5\), i.e. Derwent has no market timing ability. We can reject the hypothesis that \(p = 0.87\).
Sharpe Ratio
If we can estimate the volatility of Derwent’s daily returns, we can compute their achieved Sharpe ratio, based on log returns. Under the null hypothesis, based on historical simulations, this Sharpe ratio should be on the order of \(9\,\mbox{yr}^{-1/2}\).
Let \(r_i\) be Derwent’s daily simple returns on each day in the period in question, and let \(l_i = \log(1 + r_i)\) be the log returns. We do not have Derwent’s daily returns, so cannot compute their log returns on each day. However, under the null hypothesis we assume that \(\left|r_i\right| = w_i,\) the absolute simple returns of DJIA on each day.
We can also get a lower bound on the volatility of the log returns as follows. First note that \[\log(1 + w_i) \le \left|\log(1 + r_i)\right| = \left|l_i\right|.\] The sample variance, \(\hat{\sigma}^2\), of the log returns can then be bounded by \[(n-1) \hat{\sigma}^2 = \sum_i l_i^2 - n \left(\frac{\sum_i l_i}{n}\right)^2 \ge \sum_i \left(\log(1 + w_i)\right)^2 - n \left(\frac{\sum_i l_i}{n}\right)^2.\] Note that we have Derwent’s mean log return, \(\sum_i l_i / n\), by transforming their total return and using the ‘telescoping property.’
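Putting the bound together; a sketch under the stated assumptions, where the flat 0.5% absolute daily move is placeholder data, not the realized DJIA series:

```python
import math

def sharpe_upper_bound(total_return, abs_daily_returns, days_per_year=252):
    """Upper bound on the annualized Sharpe ratio of daily log returns.

    Uses log(1 + w_i) <= |l_i| to get a lower bound on the sample
    variance of the log returns, hence an upper bound on the ratio."""
    n = len(abs_daily_returns)
    mean_l = math.log(1.0 + total_return) / n   # telescoping property
    ssq_lower = sum(math.log(1.0 + w) ** 2 for w in abs_daily_returns)
    var_lower = (ssq_lower - n * mean_l ** 2) / (n - 1)
    return (mean_l / math.sqrt(var_lower)) * math.sqrt(days_per_year)

# placeholder: ~62 trading days of flat 0.5% absolute moves
print(round(sharpe_upper_bound(0.073, [0.005] * 62), 2))
```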
Using this lower bound on volatility, we get an upper bound on their Sharpe ratio, calculated on log returns. The value I get is \(3.3\,\mbox{yr}^{-1/2}\), with 95% confidence interval \(\left[-0.8\,\mbox{yr}^{-1/2}, 7.2\,\mbox{yr}^{-1/2}\right]\). So I am confident that Derwent does not have a Sharpe ratio of \(9\,\mbox{yr}^{-1/2}\), and I reject the null hypothesis.
Note that while \(3.3\,\mbox{yr}^{-1/2}\) should be considered ‘very good’, the sample size is so small we cannot be sure this was not just a fluke.
"We are going through some exciting changes…"
Based on the Monte Carlo simulations and the analysis based on inferred volatility, we can soundly reject the null; Derwent is not trading the Twitter Predictor, or is massively underlevered, or the forecast accuracy failed to materialize.
Note that this says relatively little about whether Derwent is a ‘good’ investment or not. If we cannot assume they are trading on the DJIA in the simple way outlined in the null hypothesis, then the Monte Carlo experiments and the forecast accuracy and Sharpe estimations become meaningless, and we are left with only three monthly return numbers, from which we can infer very little. My goal is merely to show that the forecast accuracy touted in Bollen’s paper has yet to be seen in the real world.
Disclaimer: The information provided does not constitute investment advice.
Disclosure: The author has no holdings in Twitter, but holds broad-market ETFs which intersect with the DJIA.

^{1} I believe these returns are gross, i.e. they do not reflect performance fees.