<?xml version="1.0" encoding="UTF-8"?>
<rss xmlns:dc="http://purl.org/dc/elements/1.1/" version="2.0"><channel><atom:link rel="hub" href="http://tumblr.superfeedr.com/" xmlns:atom="http://www.w3.org/2005/Atom"/><description>snarky skeptical statistics</description><title>Buy the Hype</title><generator>Tumblr (3.0; @sellthenews)</generator><link>http://sellthenews.tumblr.com/</link><item><title>Piled Higher and Deeper</title><description>&lt;p&gt;The business press is reporting on a recently published paper,
&lt;a href="http://www.nature.com/srep/2013/130425/srep01684/full/srep01684.html#supplementary-information"&gt;Quantifying Trading Behavior in Financial Markets using Google Trends&lt;/a&gt;, by Tobias Preis, Helen Susannah Moat, and H. Eugene Stanley.
This paper has not had as much impact as that of Bollen &lt;em&gt;et al.&lt;/em&gt;, probably because it does not make
such outlandish claims, but likely also because Google Trends is not as sexy as Twitter.&lt;/p&gt;

&lt;p&gt;The Preis &lt;em&gt;et al.&lt;/em&gt; paper has the dubious distinction of being the worst paper I&amp;#8217;ve read in the last month. 
Here are the problems I found with this paper before giving up on it:&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;It is not entirely clear that Google Trends data is causal: the historical data you retrieve now may not
represent what one would have (or even could have) observed at that point in time. Google&amp;#8217;s help pages make some
vague reference to data normalization, but neither confirm nor deny causality. If time trends are removed using
all the data, the entire exercise is utterly pointless.&lt;/li&gt;
&lt;li&gt;The authors do not understand how shorting works! They claim that the changes in &amp;#8216;cumulative returns&amp;#8217; from
a short position are \(\log(p(t)) - \log(p(t+1))\). Under this formulation, a short position can experience
unlimited gains but limited losses, when, in fact, the opposite is the case. The proper expression is
\(\log(2 - p(t+1)/p(t))\), which could be undefined if \(p(t+1)/p(t)\) is two or larger. This bungled backtest
accounting introduces a positive bias of order \((1 - p(t+1)/p(t))^2\), which can be large for the weekly hold periods
considered in the paper. The upshot is that short-biased strategies get a tailwind which is pure &amp;#8216;backtest arb&amp;#8217;.&lt;/li&gt;
&lt;li&gt;I am unable to replicate the backtest presented in Figure 2. Note the paper is ambiguous regarding how one
should act if the change in trend data is exactly zero (this occurs around 5% of the time for &amp;#8216;debt&amp;#8217; data,
using a three week normalization window), but breaking the tie in any of the three ways, and backtesting with either
the &amp;#8216;corrupt&amp;#8217; method for shorts or a correct method, never gives the 326% cumulative returns quoted in the
paper. The &amp;#8216;corrupt&amp;#8217; method does indeed boost total returns and Sharpe ratio. However, under none of the tested
configurations, including the suspect ones, does the Sharpe ratio achieve 95% significance.&lt;/li&gt;
&lt;li&gt;If I am to understand Figure 3 correctly, the &amp;#8216;debt&amp;#8217; signal achieves returns which are 2.31 standard
deviations above the returns of a &amp;#8216;random strategy&amp;#8217;. Presumably the random strategies do not have the shorting
bias that the &amp;#8216;debt&amp;#8217; signal does. However, given that approximately 100 different search terms are tested,
a 2.3 sigma event is not statistically significant when a Bonferroni correction is applied.&lt;/li&gt;
&lt;li&gt;Since the authors (or the paper&amp;#8217;s reviewers, if there indeed were any) are apparently aware of the pitfalls of 
multiple hypothesis testing, they do not draw much attention to the 2.3 sigma event. Rather, they compute the mean
&amp;#8216;Sharpe&amp;#8217; over the 98 strategies, then quote the t-statistic (a whopping 8.6) and p-value. 
Back in the eighties when professional statisticians bemoaned the coming availability of statistical software which 
would allow &lt;em&gt;hoi polloi&lt;/em&gt; to misuse statistical techniques, &lt;em&gt;this&lt;/em&gt; is what they were warning us about. Because
the search term time series measure latent &amp;#8216;interest&amp;#8217; with correlated errors, and because they are all backtested
on the same Dow Jones time history, the errors in the 98 backtests&amp;#8217; returns are correlated. One cannot perform a t-test
on the aggregate statistics without dealing with this correlation, otherwise one is rejecting the (composite)
null for the wrong reason: &lt;em&gt;i.e.&lt;/em&gt; because independence of errors is violated. (The 
&lt;a href="http://papers.ssrn.com/sol3/papers.cfm?abstract_id=907270"&gt;Leung and Wong&lt;/a&gt; test for paired Sharpe seems
more appropriate in this case.)&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;In all, this paper teaches me nothing about the world other than the low standards of the journal &amp;#8216;Scientific
Reports&amp;#8217;, which, I am horrified to find, is somehow associated with the journal &amp;#8216;Nature&amp;#8217;.&lt;/p&gt;
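
&lt;p&gt;The short-side accounting complaint in item 2 is easy to check numerically. What follows is a minimal sketch, not the authors&amp;#8217; code: the function names are mine, and I assume one unit of capital shorted over a single hold period with no borrow costs. It compares the paper&amp;#8217;s \(\log(p(t)) - \log(p(t+1))\) with \(\log(2 - p(t+1)/p(t))\) and prints the gap next to \((1 - p(t+1)/p(t))^2\).&lt;/p&gt;

```python
import math


def paper_short_log_return(p0, p1):
    """Short-side log return as the paper books it: log(p0) - log(p1)."""
    return math.log(p0) - math.log(p1)


def correct_short_log_return(p0, p1):
    """Log return of shorting one unit of capital: wealth goes 1 -> 2 - p1/p0."""
    ratio = p1 / p0
    if ratio >= 2.0:
        # The short is wiped out (or worse), so the log return is undefined.
        raise ValueError("price doubled or more: log return undefined")
    return math.log(2.0 - ratio)


def short_bias(p0, p1):
    """Paper accounting minus correct accounting; equals -log(1 - r^2)."""
    return paper_short_log_return(p0, p1) - correct_short_log_return(p0, p1)


if __name__ == "__main__":
    # Hypothetical weekly moves, chosen only for illustration.
    for move in (-0.10, -0.05, 0.05, 0.10):
        p0 = 100.0
        p1 = p0 * (1.0 + move)
        print(move, round(short_bias(p0, p1), 5), round(move ** 2, 5))
```

&lt;p&gt;The gap is positive whether the market rises or falls, which is why it acts as a tailwind for any strategy that spends time short.&lt;/p&gt;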

&lt;hr&gt;&lt;p&gt;&lt;strong&gt;Disclaimer&lt;/strong&gt; The information provided does not constitute investment advice.&lt;/p&gt;</description><link>http://sellthenews.tumblr.com/post/49271345693</link><guid>http://sellthenews.tumblr.com/post/49271345693</guid><pubDate>Tue, 30 Apr 2013 14:33:00 -0400</pubDate><category>quantitative-finance</category><category>market-timing</category><dc:creator>astrawman</dc:creator></item><item><title>Did you ever think your tweets might predict the future?</title><description>&lt;p&gt;Not wanting to be left behind on all this &amp;#8216;social media&amp;#8217; stuff, Fox Business News 
&lt;a href="http://video.foxbusiness.com/v/2198065964001/can-tweets-predict-the-stock-markets-future/"&gt;trotted out Johan Bollen&lt;/a&gt;
for an interview regarding his research. Bollen notes that his system is designed
for hedge funds looking for a little extra alpha, not retail clients. This displays
shrewd market positioning on his part, since Derwent&amp;#8217;s experiment with bringing 
social media trading to the masses appears to have deflated&amp;#8212;their
recent &amp;#8216;innovative&amp;#8217; self-auction earned a 
&lt;a href="http://www.seethefuturenow.com/auction/"&gt;non-binding bid of 120K GBP&lt;/a&gt; for the company, an ROI
of perhaps negative 65 percent on the 
initial 350K invested.&lt;/p&gt;

&lt;p&gt;I would like to believe that Bollen is giving me
a shoutout at 2:11, when he notes:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;It&amp;#8217;s absolutely clear that there&amp;#8217;s communities out there whose purpose is
  simply to spread misinformation or to &amp;#8230; 
  throw a wrench into &amp;#8230; 
  the gears of this algorithm.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;hr&gt;&lt;p&gt;&lt;strong&gt;Disclaimer&lt;/strong&gt; The information provided does not constitute investment advice.&lt;/p&gt;</description><link>http://sellthenews.tumblr.com/post/45150391415</link><guid>http://sellthenews.tumblr.com/post/45150391415</guid><pubDate>Mon, 11 Mar 2013 20:34:00 -0400</pubDate><category>twitter</category><category>quantitative-finance</category><category>market-timing</category><dc:creator>astrawman</dc:creator></item><item><title>The Sentiment Trading Platform is for Sale</title><description>&lt;p&gt;&lt;a href="http://www.derwentcapitalmarkets.com/"&gt;Derwent Capital&lt;/a&gt;, the former hedge fund turned retail broker
announced that they are &lt;a href="http://www.derwentcapitalmarkets.com/auction/"&gt;auctioning themselves&lt;/a&gt; to the
highest bidder. At the moment, the highest bid is 100K GBP, far lower than the
350K over/under number for profitability, 
&lt;a href="http://www.waterstechnology.com/buy-side-technology/news/2241367/dcm-capital-goes-under-the-hammer"&gt;according to Paul Hawtin&lt;/a&gt;, 
Derwent&amp;#8217;s CEO.
The &amp;#8216;guidance figure&amp;#8217; (read: anchor) is 5M GBP, and as part of the deal you take ownership of
the &amp;#8216;Sentiball&amp;#8217; trademark.&lt;/p&gt;

&lt;p&gt;As Hawtin &lt;a href="http://www.waterstechnology.com/buy-side-technology/news/2241367/dcm-capital-goes-under-the-hammer"&gt;notes:&lt;/a&gt;&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;The beauty of an auction is that you get a true valuation of the company.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;And so I will be greatly amused for the next ten days.&lt;/p&gt;

&lt;hr&gt;&lt;p&gt;&lt;strong&gt;Disclaimer&lt;/strong&gt; The information provided does not constitute investment advice.&lt;/p&gt;</description><link>http://sellthenews.tumblr.com/post/42530922672</link><guid>http://sellthenews.tumblr.com/post/42530922672</guid><pubDate>Thu, 07 Feb 2013 17:23:13 -0500</pubDate><category>twitter</category><category>quantitative-finance</category><category>derwent</category><dc:creator>astrawman</dc:creator></item><item><title>You had me at the third significant digit</title><description>&lt;p&gt;I have, in the past, 
&lt;a href="http://sellthenews.tumblr.com/post/21067996377/noitdoesnot"&gt;been rather harsh&lt;/a&gt;
on Bollen, Mao and Zeng for their Twitter
paper, which boggles the imagination with its naïveté. However, to their credit, 
theirs is not clearly the most ridiculous &amp;#8216;quant&amp;#8217; paper I have ever seen.
A recent contender for that distinction is 
&lt;em&gt;Limited Attention, Salience, and Stock Returns&lt;/em&gt;, by A. Subrahmanyam, J. Wei, and H-Y. Yu,
dated March 25, 2012.
Here is the abstract:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;We show that a long-short portfolio based on stocks that have just arrived to and left from
    extreme winner and loser deciles materially outperforms a conventional momentum portfolio. A
    6-month-ranking and 6-month-holding portfolio based on the standard Jegadeesh and Titman
    (1993, 2001) momentum strategy commands an average monthly return of 1.20% and a Sharpe
    ratio of 0.262 over the past four decades; the corresponding numbers for our long-short portfolio
    are 10.30% and 1.035, respectively. For the 2001-2010 period, our monthly return is even higher
    at 16.38%, compounding to an annual return of 517.36%. The sheer size of these profits poses a
    further, significant challenge to the asset pricing literature and the market efficiency hypothesis.
    We propose that arrival to an extreme decile is a salient signal that attracts retail investor
    attention, and stimulates strong buying, boosting returns. Supporting this explanation, we show
    that there is significantly abnormal buying pressure in extreme decile arrivals that reverses in the
    longer run.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This paper was &lt;em&gt;formerly&lt;/em&gt; posted at &lt;a href="http://ssrn.com/abstract=2027950"&gt;SSRN&lt;/a&gt;, 
but was mysteriously removed less than two weeks after 
the publication date (and after receiving some 
attention at &lt;a href="http://tinyurl.com/bma27ol"&gt;CXO Advisory&lt;/a&gt;).&lt;/p&gt;

&lt;p&gt;Some relevant facts about their purported &amp;#8220;challenge&amp;#8221; to the Efficient Markets Hypothesis
which are omitted from the abstract: their strategy rebalances monthly; they delay 
their signal by a month; the quoted Sharpe ratio numbers are &lt;em&gt;monthly&lt;/em&gt;; no mention
is made of leverage.&lt;/p&gt;

&lt;p&gt;So as a recap, the claim is that if one trades once a month, on a month-old signal,
based on a 12 month moving average of publicly available price and volume data,
on U.S. equities, one can capture a Sharpe around 3.5\(\mbox{yr}^{-1/2}\) and
annualized returns over 500 percent. Moreover, the returns have been measured with
no fewer than five significant digits.&lt;/p&gt;
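
&lt;p&gt;The arithmetic behind that recap can be reproduced in a few lines. This is only a sketch: the two inputs are the figures quoted in the abstract, the function names are mine, and the square-root-of-twelve annualization assumes independent monthly returns.&lt;/p&gt;

```python
import math

# Figures quoted in the abstract for the authors' long-short portfolio.
MONTHLY_RETURN_2001_2010 = 0.1638
MONTHLY_SHARPE = 1.035


def compound_annual(monthly_return):
    """Compound a constant monthly return over twelve months."""
    return (1.0 + monthly_return) ** 12 - 1.0


def annualize_monthly_sharpe(monthly_sharpe):
    """Scale a monthly Sharpe ratio to yearly units (assumes i.i.d. months)."""
    return monthly_sharpe * math.sqrt(12.0)


if __name__ == "__main__":
    print("annual return, percent:",
          round(100.0 * compound_annual(MONTHLY_RETURN_2001_2010), 1))
    print("annualized Sharpe:",
          round(annualize_monthly_sharpe(MONTHLY_SHARPE), 2))
```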

&lt;p&gt;If your error-checking sense is properly calibrated, you should have goosebumps
right now. If not, I am going to marsh your mellow by revealing that these results are,
indeed, too good to be true. There is no conceivable way such a large effect could
have lurked, unnoticed, within the landscape of technical strategies for five years,
much less for four decades. Moreover, to suggest that the returns of U.S. equities
could be predicted with such certainty based on a month-old highly autocorrelated signal
is ludicrous.&lt;/p&gt;

&lt;p&gt;Luckily for the world, someone must have notified the authors of their mistake,
and the paper went down the memory hole. The alternative explanation is that Derwent
inked a deal with Subrahmanyam, Wei, and Yu to license their technology, and they 
went into stealth mode.&lt;/p&gt;</description><link>http://sellthenews.tumblr.com/post/27963772643</link><guid>http://sellthenews.tumblr.com/post/27963772643</guid><pubDate>Wed, 25 Jul 2012 01:13:00 -0400</pubDate><category>quantitative-finance</category><dc:creator>astrawman</dc:creator></item><item><title>Converting Timing Edge to Sharpe</title><description>&lt;p&gt;Let \(x_t\) be the time series of relative returns of some instrument. As a very
rudimentary market timing model, suppose you have a signal \(s_{t-1}\) which equals
\(\mbox{sign}\left(x_t\right)\) with probability \(p = \frac{1}{2} + g\), and otherwise
equals \(-\mbox{sign}\left(x_t\right)\). Here \(g \in \left[-0.5,0.5\right]\) is one&amp;#8217;s
timing &lt;em&gt;edge&lt;/em&gt; over a coin flip. Note that I find this model somewhat weird because
the probability of correctly guessing tomorrow&amp;#8217;s returns is independent of the 
absolute &lt;em&gt;magnitude&lt;/em&gt; of the return. This is, however, the model evidently envisioned
by Bollen, Mao and Zeng, in their 
&lt;a href="http://arxiv.org/abs/1010.3003"&gt;Twitter market timing study.&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Previously I had &lt;a href="http://sellthenews.tumblr.com/post/21067996377/noitdoesnot"&gt;performed a Monte Carlo experiment&lt;/a&gt;
to convert \(g\), which Bollen &lt;em&gt;et al.&lt;/em&gt; claim equals \(11/30\) for their model,
into an annualized Sharpe ratio. There is a &amp;#8216;direct&amp;#8217; computation, however, which 
only requires us to estimate a normalized &amp;#8216;spread&amp;#8217; of the market returns.&lt;/p&gt;

&lt;p&gt;Suppose that you hold a long position in the instrument when your signal is positive,
and otherwise hold a short position. The expected return of your portfolio is, after
a little math, \(2g\mathbf{E}\left[|x|\right]\).  Because one is always either
entirely long or entirely short the market, the second moment of one&amp;#8217;s returns is
\(\mathbf{E}\left[x^2\right]\). Then the true, population, Sharpe ratio of one&amp;#8217;s
strategy is 
\[\psi = \frac{2g\mathbf{E}\left[|x|\right]}{\sqrt{\mathbf{E}\left[x^2\right] - 4g^2\mathbf{E}\left[|x|\right]^2}} = 
\frac{g}{\sqrt{\frac{\mathbf{E}\left[x^2\right]}{4\mathbf{E}\left[|x|\right]^2} - g^2}} =
\frac{g}{\sqrt{\kappa - g^2}}.\]&lt;/p&gt;
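
&lt;p&gt;As a sanity check on the formula, here is a small simulation sketch. It assumes zero-mean Gaussian daily returns (for which \(\kappa = \pi/8\)), a signal that is correct with probability \(\frac{1}{2} + g\) regardless of the size of the move, and no trading costs; the function names, volatility, and seed are mine and arbitrary.&lt;/p&gt;

```python
import math
import random


def population_sharpe(g, kappa):
    """Per-period Sharpe ratio from the closed form g / sqrt(kappa - g^2)."""
    return g / math.sqrt(kappa - g * g)


def simulated_sharpe(g, n, seed=0):
    """Sample Sharpe of the long/short timing strategy on Gaussian returns."""
    rng = random.Random(seed)
    total = 0.0
    total_sq = 0.0
    for _ in range(n):
        x = rng.gauss(0.0, 0.01)            # market return, assumed Gaussian
        correct = 0.5 + g > rng.random()    # signal is right w.p. 1/2 + g
        pnl = abs(x) if correct else -abs(x)
        total += pnl
        total_sq += pnl * pnl
    mean = total / n
    variance = total_sq / n - mean * mean
    return mean / math.sqrt(variance)


if __name__ == "__main__":
    edge = 0.05
    kappa_gaussian = math.pi / 8.0
    print("closed form:", round(population_sharpe(edge, kappa_gaussian), 4))
    print("simulated:  ", round(simulated_sharpe(edge, 200000), 4))
```

&lt;p&gt;The two numbers should agree up to sampling noise, which for this many draws is a few thousandths.&lt;/p&gt;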

&lt;p&gt;It remains only to compute or estimate \(\kappa = \mathbf{E}\left[x^2\right] / \left(4\mathbf{E}\left[|x|\right]^2\right)\) for the given market returns. Note
that when \(g\) is reasonably smaller than \(\sqrt{\kappa}\), a linear approximation is
pretty good. Assuming we trade daily, the &lt;em&gt;annualized&lt;/em&gt; Sharpe ratio has the
linear approximation
\[\psi \approx \sqrt{\frac{253}{\kappa}} g \mbox{ yr}^{-1/2}.\]&lt;/p&gt;

&lt;p&gt;Here is a table showing, for a few different distributions, the computed,
or estimated, value of \(\kappa\), as well as the annualized linear
approximation constant, \(\sqrt{253 / \kappa}\). In the last column,
I give the annualized Sharpe ratio corresponding to an edge of 
0.367, the value claimed by Bollen for
the &amp;#8216;Twitter predictor&amp;#8217;.  The distributions are: zero mean Gaussian 
(the variance does not matter), a Student t with 10 d.f., a Student t
with 4 d.f., and the daily relative returns of 
DJIA from 1930-01-02 to 2012-07-10.&lt;/p&gt;

&lt;table border="1"&gt;&lt;tr&gt;&lt;th&gt;
            distribution
        &lt;/th&gt;
        &lt;th&gt;
            &amp;#92;(\kappa&amp;#92;)
        &lt;/th&gt;
        &lt;th&gt;
            &amp;#92;(\sqrt{253/\kappa}&amp;#92;)
        &lt;/th&gt;
        &lt;th&gt;
            SR for &amp;#92;(g = 0.367&amp;#92;)
        &lt;/th&gt;
    &lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
            Gaussian
        &lt;/td&gt;
        &lt;td&gt;
            0.39 
        &lt;/td&gt;
        &lt;td&gt;
            25 
        &lt;/td&gt;
        &lt;td&gt;
            11 &amp;#92;(\mbox{yr}^{-1/2}&amp;#92;)
        &lt;/td&gt;
    &lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
            t(10)
        &lt;/td&gt;
        &lt;td&gt;
            0.42 
        &lt;/td&gt;
        &lt;td&gt;
            25 
        &lt;/td&gt;
        &lt;td&gt;
            11 &amp;#92;(\mbox{yr}^{-1/2}&amp;#92;)
        &lt;/td&gt;
    &lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
            t(4)
        &lt;/td&gt;
        &lt;td&gt;
            0.5 
        &lt;/td&gt;
        &lt;td&gt;
            22 
        &lt;/td&gt;
        &lt;td&gt;
            9.6 &amp;#92;(\mbox{yr}^{-1/2}&amp;#92;)
        &lt;/td&gt;
    &lt;/tr&gt;&lt;tr&gt;&lt;td&gt;
            DJIA from 1930-01-02 to 2012-07-10
        &lt;/td&gt;
        &lt;td&gt;
            0.59 
        &lt;/td&gt;
        &lt;td&gt;
            21 
        &lt;/td&gt;
        &lt;td&gt;
            8.6 &amp;#92;(\mbox{yr}^{-1/2}&amp;#92;)
        &lt;/td&gt;
    &lt;/tr&gt;&lt;/table&gt;&lt;p&gt;&lt;br/&gt;&lt;/p&gt;
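
&lt;p&gt;The three parametric rows of the table can be recomputed from closed forms. The sketch below is illustrative rather than the code I originally ran: function names are mine, the Student t moments are the standard ones (valid for more than two degrees of freedom), and the DJIA row simply reuses the \(\kappa\) from the table because the price history is not included here. The last column uses the exact formula with \(g = 11/30\).&lt;/p&gt;

```python
import math

TRADING_DAYS = 253  # days per year, as used in the post


def kappa_gaussian():
    """kappa = E[x^2] / (4 E[|x|]^2) for a zero-mean Gaussian is pi / 8."""
    return math.pi / 8.0


def kappa_student_t(nu):
    """kappa for a Student t with nu > 2 degrees of freedom."""
    second_moment = nu / (nu - 2.0)
    mean_abs = (2.0 * math.sqrt(nu) * math.gamma((nu + 1.0) / 2.0)
                / (math.sqrt(math.pi) * math.gamma(nu / 2.0) * (nu - 1.0)))
    return second_moment / (4.0 * mean_abs ** 2)


def linear_constant(kappa):
    """Annualized Sharpe per unit of daily edge, in the linear approximation."""
    return math.sqrt(TRADING_DAYS / kappa)


def annualized_sharpe(g, kappa):
    """Exact annualized Sharpe for edge g: sqrt(253) * g / sqrt(kappa - g^2)."""
    return math.sqrt(TRADING_DAYS) * g / math.sqrt(kappa - g * g)


if __name__ == "__main__":
    edge = 11.0 / 30.0
    rows = [
        ("Gaussian", kappa_gaussian()),
        ("t(10)", kappa_student_t(10.0)),
        ("t(4)", kappa_student_t(4.0)),
        ("DJIA (kappa copied from the table)", 0.59),
    ]
    for name, kappa in rows:
        print(name, round(kappa, 2), round(linear_constant(kappa)),
              round(annualized_sharpe(edge, kappa), 1))
```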

&lt;p&gt;Some takeaways:&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;The &amp;#8216;blackjack&amp;#8217; rule of thumb for market timing strategies of this
type is: &lt;em&gt;Annualized Sharpe is twenty-one times the daily edge.&lt;/em&gt; This 
confirms your suspicions that a five percent edge is pretty good.&lt;/li&gt;
&lt;li&gt;The annualized Sharpe for an edge of 0.367
is around 9\(\mbox{yr}^{-1/2}\), consistent with the 
&lt;a href="http://sellthenews.tumblr.com/post/21067996377/noitdoesnot"&gt;previous Monte Carlo findings&lt;/a&gt;.
I do not believe a Sharpe as high as 2.5\(\mbox{yr}^{-1/2}\) has ever been
realized for a market timing strategy (assuming at least three years of trading).
Bollen&amp;#8217;s strategy, were it real and not a statistical ghost, would represent
&lt;strong&gt;the greatest quantitative strategy ever discovered.&lt;/strong&gt; &lt;/li&gt;
&lt;li&gt;Fatter tailed distributions appear to have larger \(\kappa\), and thus a
constant edge has lower Sharpe in these markets.&lt;/li&gt;
&lt;/ol&gt;&lt;h2&gt;A More Powerful Yeti Detector&lt;/h2&gt;

&lt;p&gt;Given \(n\) observations of such a market timing signal, along with
the leading market returns, the standard error on the estimated edge, 
assuming the edge is near zero, is around \(\left(4n\right)^{-1/2}\).
Using the linear approximation to convert edge into a daily Sharpe ratio, the
standard error on Sharpe should be around \(\left(4n\kappa\right)^{-1/2}\).&lt;/p&gt;

&lt;p&gt;If, on the other hand, you backtested your timing strategy and computed the 
sample Sharpe ratio, the standard error&lt;sup id="fnref:p26960651244-1"&gt;&lt;a href="#fn:p26960651244-1" rel="footnote"&gt;1&lt;/a&gt;&lt;/sup&gt; is around \(n^{-1/2}\).
Noting that, by Jensen&amp;#8217;s Inequality, \(\kappa \ge 0.25\), it appears that the standard error on Sharpe as computed via
edge estimation is smaller than that computed by a backtest. Which
is to say you should get somewhat tighter estimates (modulo the uncertainty
in \(\kappa\)!) of Sharpe based on the edge method.&lt;/p&gt;
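
&lt;p&gt;To put rough numbers on that comparison, here is a sketch of the two standard errors side by side. The three-year sample length is a hypothetical choice, the function names are mine, and both formulas are the near-zero-edge approximations quoted above, so treat the output as order-of-magnitude only.&lt;/p&gt;

```python
import math


def se_sharpe_from_edge(n, kappa):
    """Approximate standard error of daily Sharpe via the edge estimate."""
    return 1.0 / math.sqrt(4.0 * n * kappa)


def se_sharpe_from_backtest(n):
    """Approximate standard error of the sample daily Sharpe ratio."""
    return 1.0 / math.sqrt(n)


if __name__ == "__main__":
    n = 253 * 3  # hypothetical: three years of daily observations
    # kappa = 0.25 is the Jensen lower bound; 0.59 is the DJIA value above.
    for kappa in (0.25, math.pi / 8.0, 0.59):
        edge_se = se_sharpe_from_edge(n, kappa)
        backtest_se = se_sharpe_from_backtest(n)
        print(round(kappa, 3), round(edge_se, 4), round(backtest_se, 4),
              round(edge_se / backtest_se, 3))
```

&lt;p&gt;The ratio of the two is \(\left(4\kappa\right)^{-1/2}\), which equals one at the lower bound \(\kappa = 0.25\) and shrinks as \(\kappa\) grows.&lt;/p&gt;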

&lt;p&gt;Unfortunately, this is something like having a more powerful Yeti detector:
If there were Yetis in the wild, you would be looking pretty smart; however,
since the incidence rate is effectively zero, you&amp;#8217;re just making type I 
errors. So it goes with market timing.&lt;/p&gt;

&lt;hr&gt;&lt;div class="footnotes"&gt;
&lt;hr&gt;&lt;ol&gt;&lt;li id="fn:p26960651244-1"&gt;
&lt;p&gt;This assumes that the daily Sharpe is modest, less than 0.15, say, and the
market is not terribly skewed. These are rough comparisons. &lt;a href="#fnref:p26960651244-1" rev="footnote"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;&lt;/div&gt;</description><link>http://sellthenews.tumblr.com/post/26960651244</link><guid>http://sellthenews.tumblr.com/post/26960651244</guid><pubDate>Wed, 11 Jul 2012 01:21:00 -0400</pubDate><category>market-timing</category><category>quantitative-finance</category><dc:creator>astrawman</dc:creator></item><item><title>Thanks for your response. Question: Why did you address your letter to Johan Bollen and not to Derwent?</title><description>&lt;p&gt;Bollen, who I lump with his co-authors, is an academic. In the long term, his
reputation is, or should be, worth more than a short term deal with a hedge
fund. He would be better served, and less harmed, than Derwent, by admitting
that ‘mistakes were made’ and moving on. After all, this is how the scientific
process is supposed to work: when a theory is inconsistent with facts, we throw
it away. We should applaud Bollen as a real scientist if he retracts his paper.&lt;/p&gt;

&lt;p&gt;On the other hand, sentiment analysis is Derwent Capital’s &lt;em&gt;raison d’être&lt;/em&gt;; they
have no other gimmick to set themselves apart. They simply cannot renounce Bollen’s 
paper or ‘Twitter market sentiment’.&lt;/p&gt;

&lt;p&gt;I imagine that the sentiment analysis ‘advice’ that Derwent provides on their platform 
is given in a way that is consistent with British securities law&lt;sup id="fnref:p26542555585-1"&gt;&lt;a href="#fn:p26542555585-1" rel="footnote"&gt;1&lt;/a&gt;&lt;/sup&gt;. My guess
is that stock tips from sentiment analysis are no worse, and probably no
better, than stock tips one might get from a human broker, and this latter
process is allowed in the United States, subject to certain
limitations. The overstated certainty of Bollen’s claims, if advertised by
Derwent, might be grounds for a fraud case, but I am no lawyer.&lt;/p&gt;

&lt;p&gt;I &lt;em&gt;am&lt;/em&gt;, however, a statistician, working as a ‘quant’ at a hedge fund. 
Bollen’s paper, in my opinion, has been harmful to the field of quantitative 
finance for many reasons:&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;The appalling statistical and logical errors in this 
&lt;a href="http://scholar.google.com/citations?view_op=view_citation&amp;hl=en&amp;user=jDmcdsUAAAAJ&amp;citation_for_view=jDmcdsUAAAAJ:zYLM7Y9cAGgC" title="google scholar"&gt;highly visible&lt;/a&gt; 
publication make a mockery of what little standards the field has.&lt;/li&gt;
&lt;li&gt;His ‘results’ raise a false bar. If I were to go to my employer with a
market timing model that I honestly believed&lt;sup id="fnref:p26542555585-2"&gt;&lt;a href="#fn:p26542555585-2" rel="footnote"&gt;2&lt;/a&gt;&lt;/sup&gt; had a predictive accuracy of
56%, why would my employer take that over Bollen’s 3 day old tweets model?&lt;/li&gt;
&lt;li&gt;Bollen’s paper is often cited as ‘proof’ that sentiment analysis presents
profitable trading strategies. The result has been a massive misallocation of
time, money, and human capital into chasing a statistical ghost. &lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;I also rather suspect that gambling on Twitter sentiment will make a whole lot
of people slightly less wealthy on average, while making a fair amount of money 
in fees for the brokers and sentiment peddlers. I want to believe that, deep
down, Bollen is a nice guy, and would rather not have somebody’s Aunt Tilly
lose her pension on his mistake.&lt;/p&gt;

&lt;hr&gt;&lt;div class="footnotes"&gt;
&lt;hr&gt;&lt;ol&gt;&lt;li id="fn:p26542555585-1"&gt;
&lt;p&gt;I am not familiar enough with either Derwent’s platform or British
securities law to say for certain. &lt;a href="#fnref:p26542555585-1" rev="footnote"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn:p26542555585-2"&gt;
&lt;p&gt;This is a huge hypothetical; I do not, in general, believe in market
timing strategies. &lt;a href="#fnref:p26542555585-2" rev="footnote"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;&lt;/div&gt;</description><link>http://sellthenews.tumblr.com/post/26542555585</link><guid>http://sellthenews.tumblr.com/post/26542555585</guid><pubDate>Thu, 05 Jul 2012 01:45:00 -0400</pubDate><dc:creator>astrawman</dc:creator></item><item><title>do you have the screenshots for the 2012 managed account reports at derwent? thanks.</title><description>&lt;p&gt;Yes, I do. And now the internets have it too:&lt;/p&gt;

&lt;p&gt;&lt;img src="http://i.imgur.com/6KmDi.png" width="95%"/&gt;&lt;/p&gt;</description><link>http://sellthenews.tumblr.com/post/26428715645</link><guid>http://sellthenews.tumblr.com/post/26428715645</guid><pubDate>Tue, 03 Jul 2012 13:21:00 -0400</pubDate><dc:creator>astrawman</dc:creator></item><item><title>An open letter to Johan Bollen</title><description>&lt;p&gt;Johan,&lt;/p&gt;

&lt;p&gt;You may be wondering why you are not living on your own Caribbean island
by now.  I had the same feeling once, a long time ago, after my first hedge fund
launch. You will get over it. I am guessing that Paul Hawtin 
is no longer returning your calls, since Derwent 
&lt;a href="http://sellthenews.tumblr.com/post/22334483882/derwents-performance"&gt;somehow fumbled&lt;/a&gt;
in implementing &lt;a href="http://arxiv.org/abs/1010.3003"&gt;your ideas&lt;/a&gt;. Well, you do not need them: I am going to
offer you a chance to redeem your research.&lt;/p&gt;

&lt;p&gt;Your paper claims that using two- or three-day old, publicly available data from
Twitter feeds allows one to predict 
the “daily up and down changes in the closing values of the DJIA” with “87.6% accuracy.” 
I &lt;a href="http://sellthenews.tumblr.com/post/21067996377/noitdoesnot"&gt;remain skeptical&lt;/a&gt; of
this claim; however, I will allow you to prove, to me and the world, the validity of 
your findings. Here is my proposal: over 80 market days, a 4 month period, 
you will send to me, before the start of every market session, a file 
containing the predictions for that day’s movement of the DJIA. 
This should be a simple “up” or “down”, unambiguously coded. We will 
define ‘daily movement’ in such a
way that one could act on the information in a meaningful way: you are to tell me the
future, not the past. You may encrypt the file if you wish, but
you must provide the key at the end of the 4 month period. At the end of that period,
we will determine the accuracy of your method.&lt;/p&gt;

&lt;p&gt;To make this project worth your time, I will wager you the sum of ten thousand 
U.S. dollars&lt;sup id="fnref:p26259141917-1"&gt;&lt;a href="#fn:p26259141917-1" rel="footnote"&gt;1&lt;/a&gt;&lt;/sup&gt;, at even odds, that your predictions are true for not more 
than 57 of the 80 market days.  That is, you must achieve 72% accuracy in
your predictions. This should be fairly easy to achieve given an “87.6% accuracy”: the
probability that you would lose this wager is less than 0.1%. However, if
you are flipping a fair, or nearly fair coin, you will &lt;em&gt;very&lt;/em&gt; likely lose.&lt;/p&gt;
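
&lt;p&gt;For readers who want to check the odds, here is a sketch of the binomial arithmetic. It assumes the 80 daily calls are independent with a constant hit rate, which is the most charitable reading of the claim; the function name is mine.&lt;/p&gt;

```python
from math import comb


def prob_at_most(k, n, p):
    """P(X is at most k) for X ~ Binomial(n, p), summed exactly."""
    return sum(comb(n, i) * p ** i * (1.0 - p) ** (n - i)
               for i in range(k + 1))


if __name__ == "__main__":
    days = 80
    cutoff = 57  # the forecaster loses with 57 or fewer correct calls

    # Chance of losing the wager if the claimed 87.6% accuracy is real.
    print("lose at 87.6% accuracy:", prob_at_most(cutoff, days, 0.876))

    # Chance of winning the wager with a fair coin (58 or more correct).
    print("win with a fair coin:  ", 1.0 - prob_at_most(cutoff, days, 0.5))
```

&lt;p&gt;Both probabilities come out well under one in a thousand, so the wager separates the two hypotheses about as cleanly as 80 days allow.&lt;/p&gt;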

&lt;p&gt;Of course, we would have to formalize this wager by setting the terms very clearly, 
agreeing on how we publicize the results, appointing a third party for dispute resolution, 
defining the definitive source of DJIA marks, putting the money in an escrow account, &lt;em&gt;etc&lt;/em&gt;.
You can save face by claiming that you have neither the time, money, nor lawyers for this 
kind of charade; that it would taint the integrity of your work; that you have 
nothing to prove, &lt;em&gt;etc&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Thus I expect that you will decline, or more likely, ignore my offer. Sadly, then, I 
must plow on in my crusade to discredit your paper. While that means continuing my 
rants on this obscure blog, I will also be forced to escalate my campaign. I am drafting
a letter&lt;sup id="fnref:p26259141917-2"&gt;&lt;a href="#fn:p26259141917-2" rel="footnote"&gt;2&lt;/a&gt;&lt;/sup&gt; to the editors of the Journal of Computational Science urging them,
in no uncertain terms, to retract your paper&lt;sup id="fnref:p26259141917-3"&gt;&lt;a href="#fn:p26259141917-3" rel="footnote"&gt;3&lt;/a&gt;&lt;/sup&gt;.&lt;/p&gt;

&lt;p&gt;The choice is yours, Johan: we can test your hypothesis in public, or I publicize
your failings.&lt;/p&gt;

&lt;p&gt;You can contact me on Twitter, although I predict you will not.&lt;/p&gt;

&lt;hr&gt;&lt;div class="footnotes"&gt;
&lt;hr&gt;&lt;ol&gt;&lt;li id="fn:p26259141917-1"&gt;
&lt;p&gt;Yes, I stole this idea from Mitt Romney. &lt;a href="#fnref:p26259141917-1" rev="footnote"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn:p26259141917-2"&gt;
&lt;p&gt;Readers of this blog: I am looking for co-signers. Contact me on Twitter if you
are interested. &lt;a href="#fnref:p26259141917-2" rev="footnote"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn:p26259141917-3"&gt;
&lt;p&gt;This is maybe not as bad as it sounds, since there is evidence that 
&lt;a href="http://www.esajournals.org/doi/pdf/10.1890/ES10-00142.1"&gt;retracted papers never really die&lt;/a&gt;. &lt;a href="#fnref:p26259141917-3" rev="footnote"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;&lt;/div&gt;</description><link>http://sellthenews.tumblr.com/post/26259141917</link><guid>http://sellthenews.tumblr.com/post/26259141917</guid><pubDate>Sun, 01 Jul 2012 23:57:24 -0400</pubDate><category>Twitter</category><category>Derwent</category><category>Bollen</category><dc:creator>astrawman</dc:creator></item><item><title>Derwent closes shop</title><description>&lt;p&gt;In May the Financial Times &lt;a href="http://www.ft.com/cms/s/0/d5d9c3f8-a5bf-11e1-b77a-00144feabdc0.html#axzz1yHsdv0Es"&gt;reported&lt;/a&gt;
that Derwent Capital, the hedge fund 
that &lt;a href="http://www.idsnews.com/news/story.aspx?id=80469"&gt;partnered with Johan Bollen and Huina Mao&lt;/a&gt; to trade 
the &lt;a href="http://arxiv.org/abs/1010.3003"&gt;&amp;#8220;Twitter Predictor&amp;#8221; Strategy&lt;/a&gt;, &amp;#8220;shut down&amp;#8221;.
The official story 
is that Derwent Capital Markets&amp;#8217; Absolute Return fund opened
for investments in July 2011, and shuttered after a single month, with reported
returns of 1.86%.&lt;/p&gt;

&lt;p&gt;There are a few oddities here:&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;Why is the FT reporting in May 2012 that a hedge fund closed in August 2011?&lt;sup id="fnref:p25682131606-1"&gt;&lt;a href="#fn:p25682131606-1" rel="footnote"&gt;1&lt;/a&gt;&lt;/sup&gt;
It would seem this is no longer news. To confirm this is not an error on the part
of the Financial Times, I quote a &amp;#8216;weekly sentiment email&amp;#8217; sent by
Derwent Capital on June 6, 2012: 
&amp;#8220;Some of you may have read about our Hedge Fund closing last year in press
articles this week.&amp;#8221; &lt;em&gt;What?&lt;/em&gt; I just caught up on the news of this &amp;#8216;moon landing&amp;#8217;, and
now you&amp;#8217;re telling me there are more events happening in the world?&lt;/li&gt;
&lt;li&gt;As late as the end of March 2012, Derwent was posting performance numbers
for &lt;em&gt;managed accounts&lt;/em&gt; on their webpage. The reported performance was
generally positive, but
&lt;a href="http://sellthenews.tumblr.com/post/22334483882/derwents-performance"&gt;not consistent&lt;/a&gt;
with the spectacular performance promised by Johan Bollen.
This period of Derwent&amp;#8217;s existence has gone down the memory hole.&lt;/li&gt;
&lt;li&gt;Derwent&amp;#8217;s founder, Paul Hawtin, speaking in the FT, claimed that, 
&amp;#8220;&amp;#8230; [a hedge fund] is a very difficult product to market and there&amp;#8217;s a very small clientele who
can even know about it.&amp;#8221;&lt;sup id="fnref:p25682131606-2"&gt;&lt;a href="#fn:p25682131606-2" rel="footnote"&gt;2&lt;/a&gt;&lt;/sup&gt; If we take Bollen&amp;#8217;s research at face value, however, 
Derwent is &lt;a href="http://sellthenews.tumblr.com/post/21067996377/noitdoesnot"&gt;sitting on a gold mine&lt;/a&gt;;
they do not need &lt;em&gt;clients&lt;/em&gt;; rather, they need a loan. Even a payday loan will do. As long as they
are paying less than 400 percent annually, they should borrow money and plow it into the &amp;#8216;Twitter
Predictor&amp;#8217;.&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;Note that the main thrust of the FT story is that Derwent is re-inventing itself as a retail trading platform with &amp;#8216;sentiment measures&amp;#8217; baked into it somehow. That is, they are &lt;em&gt;democratizing&lt;/em&gt; the process of gambling on Johan Bollen&amp;#8217;s faulty statistical practice.&lt;/p&gt;

&lt;div class="footnotes"&gt;
&lt;hr&gt;&lt;ol&gt;&lt;li id="fn:p25682131606-1"&gt;
&lt;p&gt;And why am I re-reporting it a month later? Because I have been busy. &lt;a href="#fnref:p25682131606-1" rev="footnote"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn:p25682131606-2"&gt;
&lt;p&gt;Note, however, that a Google search for &amp;#8220;Derwent Capital&amp;#8221; gives 28K links, and
a news search yields several dozen stories (or, rather, echoes of stories) linking Derwent to
Twitter. &lt;a href="#fnref:p25682131606-2" rev="footnote"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;&lt;/div&gt;</description><link>http://sellthenews.tumblr.com/post/25682131606</link><guid>http://sellthenews.tumblr.com/post/25682131606</guid><pubDate>Fri, 22 Jun 2012 20:34:00 -0400</pubDate><category>twitter</category><category>quantitative-finance</category><category>market-timing</category><category>derwent</category><dc:creator>astrawman</dc:creator></item><item><title>The 'Twitter Hedge Fund' has an out-of-sample experience.</title><description>&lt;p&gt;&lt;a href="http://www.derwentcapitalmarkets.com/"&gt;Derwent Capital&lt;/a&gt;, the Hedge fund which is 
&lt;a href="http://www.idsnews.com/news/story.aspx?id=80469"&gt;working with Johan Bollen and Huina Mao&lt;/a&gt; to implement
their &lt;a href="http://arxiv.org/abs/1010.3003"&gt;&amp;#8216;Twitter Predictor&amp;#8217; strategy&lt;/a&gt;, had, until recently, been
&lt;em&gt;publishing their monthly returns&lt;/em&gt; on the web.  This is fairly irregular: hedge funds typically do 
not release this data due to regulatory concerns and performance anxiety.  Even more irregular, as of
May 3rd, 2012, the monthly returns were removed from the 
&lt;a href="http://www.derwentcapitalmarkets.com/tradingperformance.html"&gt;trading performance webpage.&lt;/a&gt; 
The page now states, &amp;#8220;We are going through some exciting changes&amp;#8230;more soon,&amp;#8221; as does Derwent&amp;#8217;s 
homepage; they no longer appear to be taking new investments.&lt;/p&gt;

&lt;p&gt;You can see the last published monthly return values, as of April 27, 2012, in the 
&lt;a href="http://webcache.googleusercontent.com/search?q=cache:YopzeIpKN8AJ:www.derwentcapitalmarkets.com/tradingperformance.html+&amp;amp;cd=3&amp;amp;hl=en&amp;amp;ct=clnk&amp;amp;gl=us"&gt;google cache.&lt;/a&gt;
I replicate the data here&lt;sup id="fnref:p22334483882-1"&gt;&lt;a href="#fn:p22334483882-1" rel="footnote"&gt;1&lt;/a&gt;&lt;/sup&gt;:&lt;/p&gt;

&lt;table border="1" cellspacing="0" align="center" cellpadding="0" style=" width: 40%; text-align: center;"&gt;&lt;tr&gt;&lt;td width="8%"&gt;
            Period:
        &lt;/td&gt;
        &lt;td width="8%"&gt;
            Jan 12
        &lt;/td&gt;
        &lt;td width="8%"&gt;
            Feb 12
        &lt;/td&gt;
        &lt;td width="8%"&gt;
            Mar 12
        &lt;/td&gt;
        &lt;td width="8%"&gt;
            Apr 12
        &lt;/td&gt;
    &lt;/tr&gt;&lt;tr&gt;&lt;td width="8%"&gt;
            Return:
        &lt;/td&gt;
        &lt;td width="8%"&gt;
            &lt;span style=" color: #ff6600;"&gt;2.04%&lt;/span&gt; 
        &lt;/td&gt;
        &lt;td width="8%"&gt;
            &lt;span style=" color: #ff6600;"&gt;3.18%&lt;/span&gt; 
        &lt;/td&gt;
        &lt;td width="8%"&gt;
            &lt;span style=" color: #ff6600;"&gt;1.89%&lt;/span&gt; 
        &lt;/td&gt;
        &lt;td width="8%"&gt;
            &lt;span style=" color: #ff6600;"&gt;Na&lt;/span&gt; 
        &lt;/td&gt;
    &lt;/tr&gt;&lt;/table&gt;&lt;p&gt;I added the Na for April 2012. My suspicion is that April was a down month and Derwent panicked, but 
I concede there are numerous alternative explanations.  The total return over the period 
is 7.3%. 
While this is better than a sharp stick in the eye, is it consistent with the 
&lt;a href="http://sellthenews.tumblr.com/post/21067996377/noitdoesnot"&gt;spectacular performance implied&lt;/a&gt; by the
claims of Bollen&amp;#8217;s &lt;a href="http://arxiv.org/abs/1010.3003"&gt;original paper&lt;/a&gt;?&lt;/p&gt;
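
&lt;p&gt;As a quick check on the 7.3% figure, the three published monthly returns can be compounded directly. This is a minimal Python sketch; the variable names are mine, and it assumes the published figures are simple monthly returns that compound multiplicatively.&lt;/p&gt;

```python
import math

# Monthly simple returns as published by Derwent (Jan, Feb, Mar 2012).
monthly_returns = [0.0204, 0.0318, 0.0189]

# Simple returns compound multiplicatively.
gross = math.prod(1.0 + r for r in monthly_returns)
total_return = gross - 1.0

# The total log return is the log of the compounded gross return.
total_log_return = math.log(gross)

print(f"total return:     {total_return:.4f}")
print(f"total log return: {total_log_return:.4f}")
```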

&lt;h2&gt;&lt;a href="http://www.forbes.com/sites/thestreet/2012/04/03/stock-trading-and-social-media-dontbuythehype/"&gt;&amp;#8220;That 86.7% figure is widely quoted in the media. What people forget is that &amp;#8230; you might lose all your money in the 13 or 15% where you&amp;#8217;re wrong&amp;#8221;&lt;/a&gt;&lt;/h2&gt;

&lt;p&gt;Here is the (composite) null hypothesis that I intend to reject the hell out of:&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;Derwent is trading the model described by Bollen &lt;em&gt;et al.&lt;/em&gt;, longing and shorting the DJIA at the close, at unit leverage or greater, capturing the indicative value of the index, not incurring costs, and the &amp;#8216;Twitter Predictor&amp;#8217; model has the advertised forecast accuracy of 86.7%.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;To reject this null hypothesis, we would have to conclude one of the following:&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;Derwent is not trading the Twitter Predictor model because of technical difficulties.
Given that they inked a deal with Bollen over a year ago and could be trading on a single ETF
once daily, on a signal based on &lt;em&gt;three day old&lt;/em&gt; tweets, they would have to be hopelessly incompetent.
Since Bollen&amp;#8217;s claims imply that Derwent is 
&lt;a href="http://sellthenews.tumblr.com/post/21067996377/noitdoesnot"&gt;sitting on a gold mine&lt;/a&gt;, one would expect
them to spare no expense bringing the strategy to market.  How they have failed to do so is inexplicable.&lt;/li&gt;
&lt;li&gt;Derwent is not trading the Twitter Predictor model because they do not believe it is profitable. 
This seems dishonest to me, since
stories on the web link Derwent to Bollen&amp;#8217;s paper. Although Derwent&amp;#8217;s offering materials probably allow
them to trade whatever the hell they want, it would be odd indeed if Bollen&amp;#8217;s business partner didn&amp;#8217;t
buy his story, while licensing his technology (and renting his reputation).&lt;/li&gt;
&lt;li&gt;Derwent is trading the Twitter Predictor model, but at less than unit leverage, &lt;em&gt;i.e.&lt;/em&gt; they are
keeping some part of their money in a &amp;#8216;risk-free&amp;#8217; instrument, like cash.  This would imply
that Derwent is experiencing less risk than the strawman market-timer analyzed below. 
This is very odd behaviour for a hedge fund, especially one collecting
&lt;a href="http://www.derwentcapitalmarkets.com/privatemanagedaccounts.html"&gt;only performance fees&lt;/a&gt;.
Moreover, since backtests indicate little chance of massive drawdowns and 
&lt;a href="http://sellthenews.tumblr.com/post/21067996377/noitdoesnot"&gt;astronomical upside potential&lt;/a&gt; for
the Twitter Predictor, Derwent should be &lt;em&gt;borrowing money&lt;/em&gt; to trade at leverage if possible.&lt;/li&gt;
&lt;li&gt;Derwent is trading the Twitter Predictor model and the forecast accuracy quoted in Bollen&amp;#8217;s 
paper did not materialize in the real world.&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;None of these possibilities is attractive for Bollen; they all point to the &amp;#8216;Twitter Predictor&amp;#8217;
strategy being a statistical phantom, the product of smoke, mirrors and data dredging, amplified
by a credulous media which has drunk deep the social media Kool-Aid on Twitter.&lt;/p&gt;

&lt;h2&gt;Testing the Null&lt;/h2&gt;

&lt;p&gt;It would be difficult to infer much from only 3 months of data even if you 
were given the daily marks, less from the monthly marks. However, under the null hypothesis quoted
above, we have a toehold. There are a number of technical tricks we can employ (and I do so below),
but the simplest test is to compare the live performance to that of a simulated market-timer trading
on the DJIA with Bollen&amp;#8217;s fabled forecast accuracy.&lt;/p&gt;

&lt;h3&gt;Monte Carlo simulations&lt;/h3&gt;

&lt;p&gt;This is a rerun of the 
&lt;a href="http://sellthenews.tumblr.com/post/21067996377/noitdoesnot"&gt;previous simulations&lt;/a&gt;, but applied to the
live trading period. To recap, for one realization of the experiment, I simulate a trader who correctly 
guesses the sign of the log return of DJIA tomorrow with probability 86.7%, and 
trades at the close, at unit leverage, long or short based on that guess, over the period 
2012-01-01 to 2012-03-31.  I perform 10000 such Monte
Carlo realizations with different random seeds, recording the total returns of each experiment.&lt;/p&gt;
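
&lt;p&gt;One realization scheme of the kind described above can be sketched in Python as follows. This is an illustration under stated assumptions, not the code behind the numbers reported here: the function name &lt;code&gt;simulate_timer&lt;/code&gt; is hypothetical, and the daily returns are synthetic placeholders rather than actual DJIA data, so the printed values will not match the figures in this post.&lt;/p&gt;

```python
import numpy as np

def simulate_timer(abs_returns, accuracy, n_sims, seed=0):
    """Total simple return of a unit-leverage market timer, per simulation.

    On each day the timer earns +|r| with probability `accuracy` and
    -|r| otherwise, which is what a correct or incorrect long/short
    call at the close would earn.
    """
    rng = np.random.default_rng(seed)
    n_days = len(abs_returns)
    # True where the guess is right; P(U >= 1 - a) = a for uniform U.
    correct = rng.random((n_sims, n_days)) >= (1.0 - accuracy)
    signs = np.where(correct, 1.0, -1.0)
    # Compound the daily simple returns along each simulated path.
    return np.prod(1.0 + signs * abs_returns, axis=1) - 1.0

# ASSUMPTION: synthetic placeholder for the absolute daily returns of
# the DJIA over the live period (62 days, about 0.6% daily volatility).
data_rng = np.random.default_rng(42)
abs_returns = np.abs(data_rng.normal(0.0, 0.006, size=62))

totals = simulate_timer(abs_returns, accuracy=13 / 15, n_sims=10000)
derwent_total = 0.073
frac_as_poor = np.mean(derwent_total >= totals)

print(f"median simulated total return: {np.median(totals):.4f}")
print(f"worst simulated total return:  {np.min(totals):.4f}")
print(f"fraction doing as poorly as Derwent: {frac_as_poor:.4f}")
```

&lt;p&gt;Swapping the synthetic &lt;code&gt;abs_returns&lt;/code&gt; for the real absolute daily returns of the index over the period is the only change needed to run the experiment proper.&lt;/p&gt;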

&lt;p&gt;Of the 10000 simulations, exactly 8 did as &amp;#8216;poorly&amp;#8217; as Derwent, approximately
0.08%.
The worst simulation experienced a total return over the period of 3.24%, 
compared with Derwent&amp;#8217;s achieved total return of 7.3%. 
The median total return over the 10000 simulations is 20.83%.&lt;/p&gt;

&lt;h3&gt;Estimating the forecast accuracy&lt;/h3&gt;

&lt;p&gt;Assuming the null hypothesis, we know the absolute value of Derwent&amp;#8217;s simple returns
on every day in the trading period, although &lt;em&gt;not the sign&lt;/em&gt;. We know their total log return over the
period, and so we know their mean daily log return, which allows us to estimate their experienced
forecast accuracy.&lt;/p&gt;

&lt;p&gt;Suppose that, for \(i=1,2,\ldots,n\), the \(w_i\) are the &lt;em&gt;absolute&lt;/em&gt; values of the daily returns of DJIA 
in the period in question.  Let \(s_i\) be a \(\pm 1\) random variable that equals 1 with probability
equal to the forecast accuracy.  The gross total return (one plus the total simple return) experienced over the period would then be
\[r_t = \prod_i (1 + s_i w_i).\]
The fact that simple returns compound in this way is rather an annoyance. We know their total return, thus
their total log return, and use this as an estimate of the sum of daily returns as follows:
\[\log r_t = \log \prod_i (1 + s_i w_i) = \sum_i \log\left(1 + s_i w_i\right) \approx \sum_i s_i w_i.\]
This follows because for \(x\) small we have \(\log (1+x) \approx x\).&lt;/p&gt;

&lt;p&gt;The sum on the right above could be used to estimate the forecast accuracy \(p\), which is the probability that 
the random variable \(s\) equals 1.  This is basically just a weighted mean computation, conditional on the 
weights \(w_i\) being fixed:
\[\mathbf{E}\left(\frac{\sum_i w_i s_i}{\sum_i w_i}\right) = 2p - 1.\]
This means the statistic
\[\hat{p} = \frac{1}{2} + \frac{\log r_t}{2 \sum_i w_i}\]
can be used to estimate \(p\).&lt;/p&gt;
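
&lt;p&gt;The estimator above is short enough to sketch in Python. The function name &lt;code&gt;estimate_accuracy&lt;/code&gt; is mine, and the absolute returns below are synthetic stand-ins for the DJIA data, so this only checks that the statistic roughly recovers a known accuracy from simulated paths; it does not reproduce the estimate for Derwent.&lt;/p&gt;

```python
import numpy as np

def estimate_accuracy(total_return, abs_returns):
    """Estimate forecast accuracy p from a total return and daily |returns|.

    Implements p_hat = 1/2 + log(r_t) / (2 * sum_i w_i), where r_t is the
    gross total return, i.e. one plus the total simple return.
    """
    log_total = np.log1p(total_return)
    return 0.5 + log_total / (2.0 * np.sum(abs_returns))

# ASSUMPTION: synthetic absolute daily returns, not actual DJIA data.
rng = np.random.default_rng(1)
w = np.abs(rng.normal(0.0, 0.006, size=62))

# Sanity check: simulate timers with a known accuracy and recover it.
true_p = 0.8
correct = rng.random((5000, w.size)) >= (1.0 - true_p)
signs = np.where(correct, 1.0, -1.0)
totals = np.prod(1.0 + signs * w, axis=1) - 1.0
p_hats = estimate_accuracy(totals, w)

print(f"true accuracy:  {true_p}")
print(f"mean of p_hat:  {p_hats.mean():.3f}")
print(f"stdev of p_hat: {p_hats.std(ddof=1):.3f}")
```

&lt;p&gt;The spread of \(\hat{p}\) across simulations gives a feel for how wide a confidence interval must be with only about three months of data.&lt;/p&gt;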

&lt;p&gt;Performing this calculation, I get the estimate \(\hat{p} = 0.64\). A 
&lt;a href="http://stats.stackexchange.com/q/25895"&gt;fairly rough&lt;/a&gt; 95% confidence
interval for the true forecast accuracy is \(\left[0.47,0.8\right]\).
Note that at this type I rate we cannot reject the possibility that \(p = 0.5\), &lt;em&gt;i.e.&lt;/em&gt; Derwent
has no market timing ability. We &lt;em&gt;can&lt;/em&gt; reject the hypothesis that \(p = 0.87\).&lt;/p&gt;

&lt;h3&gt;Sharpe Ratio&lt;/h3&gt;

&lt;p&gt;If we can estimate the volatility of Derwent&amp;#8217;s daily returns, we can compute their achieved
Sharpe ratio, based on log returns.
Under the null hypothesis, based 
on &lt;a href="http://sellthenews.tumblr.com/post/21067996377/noitdoesnot"&gt;historical simulations&lt;/a&gt;, this Sharpe
ratio should be on the order of \(9\mbox{yr}^{-1/2}\).&lt;/p&gt;

&lt;p&gt;Let \(r_i\) be Derwent&amp;#8217;s daily simple returns on each day in the period in question, and let
\(l_i = \log(1 + r_i)\) be the log returns.
We do not have Derwent&amp;#8217;s daily returns, so cannot compute their log returns on each day. However, 
under the null hypothesis we assume that \(|r_i| = w_i,\) the absolute simple returns of 
DJIA on each day.&lt;/p&gt;

&lt;p&gt;We can also get a lower bound on the volatility of log returns as follows. 
First note that
\[\log(1 + |r_i|) \le |\log(1 + r_i)| = |l_i|.\]
The sample variance, \(\hat{\sigma}^2\), of the log returns can then be bounded by 
\[(n-1) \hat{\sigma}^2 = \sum_i l_i^2 - n \left(\frac{\sum_i l_i}{n}\right)^2 \ge \sum_i \left(\log(1 + |r_i|)\right)^2  - n \left(\frac{\sum_i l_i}{n}\right)^2.\]
Note that we have Derwent&amp;#8217;s mean returns, \(\sum_i l_i / n\), by transforming their total returns and using the
&amp;#8216;telescoping property.&amp;#8217;&lt;/p&gt;

&lt;p&gt;Using this lower bound on volatility, we get an &lt;em&gt;upper&lt;/em&gt; bound on their Sharpe ratio, calculated
on log returns. The value I get is \(3.3\mbox{yr}^{-1/2}\), 
with 95% confidence interval 
\([-0.8\mbox{yr}^{-1/2},7.2\mbox{yr}^{-1/2}]\).
So I am confident that Derwent does not have a Sharpe ratio of \(9\mbox{yr}^{-1/2}\), and
I reject the null hypothesis.&lt;/p&gt;

&lt;p&gt;Note that while \(3.3\mbox{yr}^{-1/2}\) should be considered &amp;#8216;very good&amp;#8217;,
the sample size is so small we cannot be sure this was not just a fluke.&lt;/p&gt;
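
&lt;p&gt;The bound described in this section can be sketched in Python as follows. The function name &lt;code&gt;sharpe_upper_bound&lt;/code&gt; is hypothetical, the risk-free rate is taken to be zero, 252 trading days per year is my assumption for annualization, and the absolute returns are synthetic placeholders for the DJIA data, so the printed value is illustrative only.&lt;/p&gt;

```python
import numpy as np

def sharpe_upper_bound(total_return, abs_returns, days_per_year=252):
    """Upper bound on the annualized Sharpe ratio of daily log returns.

    Uses the lower bound on the sample variance obtained by replacing
    each squared log return l_i^2 with log(1 + |r_i|)^2. The result is
    an upper bound only when the mean log return is positive.
    """
    w = np.asarray(abs_returns, dtype=float)
    n = w.size
    # Mean daily log return, from the total return by telescoping.
    mean_log = np.log1p(total_return) / n
    # Lower bound on (n - 1) times the sample variance of log returns.
    ss_lower = np.sum(np.log1p(w) ** 2) - n * mean_log ** 2
    vol_lower = np.sqrt(ss_lower / (n - 1))
    return np.sqrt(days_per_year) * mean_log / vol_lower

# ASSUMPTION: synthetic absolute daily returns, not actual DJIA data.
rng = np.random.default_rng(7)
w = np.abs(rng.normal(0.0, 0.006, size=62))

bound = sharpe_upper_bound(0.073, w)
print(f"upper bound on annualized Sharpe ratio: {bound:.2f}")
```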

&lt;h2&gt;&lt;a href="http://www.derwentcapitalmarkets.com/"&gt;&amp;#8220;We are going through some exciting changes&amp;#8230;&amp;#8221;&lt;/a&gt;&lt;/h2&gt;

&lt;p&gt;Based on the Monte Carlo simulations and the analysis based on inferred volatility, we can soundly reject 
the null; Derwent is not trading the Twitter Predictor, or is massively underlevered, or the forecast 
accuracy failed to materialize.&lt;/p&gt;

&lt;p&gt;Note that this says relatively little about whether Derwent is a &amp;#8216;good&amp;#8217; investment or not. If we cannot
assume they are trading on the DJIA in the simple way outlined in the null hypothesis, then the Monte Carlo
experiments, the forecast accuracy and Sharpe estimations become meaningless, and we are left with only
3 monthly returns numbers, from which we can infer very little. My goal is
merely to show that the forecast accuracy touted in Bollen&amp;#8217;s paper has yet to be seen in the real world.&lt;/p&gt;

&lt;hr&gt;&lt;p&gt;&lt;strong&gt;Disclaimer&lt;/strong&gt; The information provided does not constitute investment advice.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Disclosure&lt;/strong&gt; author has no holdings in Twitter, holds broad market ETFs which intersect with the
DJIA.&lt;/p&gt;

&lt;div class="footnotes"&gt;
&lt;hr&gt;&lt;ol&gt;&lt;li id="fn:p22334483882-1"&gt;
&lt;p&gt;I believe these returns are gross, &lt;em&gt;i.e.&lt;/em&gt; do not reflect performance fees. &lt;a href="#fnref:p22334483882-1" rev="footnote"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;&lt;/div&gt;</description><link>http://sellthenews.tumblr.com/post/22334483882</link><guid>http://sellthenews.tumblr.com/post/22334483882</guid><pubDate>Thu, 03 May 2012 16:01:00 -0400</pubDate><category>twitter</category><category>quantitative-finance</category><category>market-timing</category><dc:creator>astrawman</dc:creator></item><item><title>The junk science behind the 'Twitter Hedge Fund'</title><description>&lt;h2&gt;&lt;a href="http://arxiv.org/abs/1010.3003"&gt;&amp;#8220;Twitter Mood Predicts the Stock Market&amp;#8221;&lt;/a&gt;&lt;/h2&gt;

&lt;p&gt;In a widely cited &lt;a href="http://arxiv.org/abs/1010.3003"&gt;study&lt;/a&gt;, Johan Bollen, Huina Mao and Xiao-Jun Zeng claim
that&lt;/p&gt;

&lt;blockquote&gt;
  &lt;p&gt;&amp;#8230; collective mood states derived from large-scale Twitter feeds are correlated to the value of 
   the Dow Jones Industrial Average (DJIA) over time. &amp;#8230; We find an accuracy of 87.6% in predicting 
   the daily up and down changes in the closing values of the DJIA &amp;#8230;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The media have responded to this study with a mix of adulation and credulity.  Perhaps the 
narrative presented by Bollen &lt;em&gt;et al.&lt;/em&gt; is appealing because it assuages our suspicions 
that Twitter is a frivolous waste of time;
or perhaps it fits with the &amp;#8216;needle in the haystack&amp;#8217; technophiliac fantasy;
or perhaps it empowers the &lt;em&gt;hoi polloi&lt;/em&gt;, who now claim the mantle of controlling the Dow Jones Index by
the vagaries of their mood.&lt;/p&gt;

&lt;p&gt;Whatever the appeal of the paper, the story continues to resonate in the internet echo chamber, with ripples
still appearing now, some 18 months after the original publication. Among those reporting on this paper
without any hint of skepticism are
&lt;a href="http://www.telegraph.co.uk/technology/twitter/8755587/Twitter-becomes-latest-tool-for-hedge-fund-managers.html"&gt;The Telegraph&lt;/a&gt;,
&lt;a href="http://www.dailymail.co.uk/sciencetech/article-2036499/Twitter-used-hedge-fund-managers-predict-share-prices.html"&gt;The Daily Mail&lt;/a&gt;,
&lt;a href="http://www.usatoday.com/money/perfi/stocks/2011-05-03-wall-street-traders-mine-tweets_n.htm"&gt;USA Today&lt;/a&gt;,
&lt;a href="http://www.theatlantic.com/technology/archive/2010/10/predicting-stock-market-changes-using-twitter/64897/"&gt;The Atlantic&lt;/a&gt;,
&lt;a href="http://www.wired.com/wiredscience/2010/10/twitter-crystal-ball/"&gt;Wired Magazine&lt;/a&gt;,
&lt;a href="http://techland.time.com/2011/03/24/let-twitter-tell-you-where-to-invest-your-money/"&gt;Time Magazine&lt;/a&gt;,
&lt;a href="http://www.cnbc.com/id/15840232?video=1619250397&amp;amp;play=1"&gt;CNBC&lt;/a&gt;,
&lt;a href="http://archives.cnn.com/TRANSCRIPTS/1010/19/qmb.01.html"&gt;CNN&lt;/a&gt;,
&lt;a href="http://www.npr.org/templates/story/story.php?storyId=130793579"&gt;All Things Considered&lt;/a&gt;,
&lt;a href="http://www.onthemedia.org/2011/nov/11/sentiment-analysis-reveals-how-world-feeling/transcript/"&gt;On the Media&lt;/a&gt;,
and a &lt;a href="https://www.google.com/search?q=johan+bollen+87.6+accuracy"&gt;long tail&lt;/a&gt; of blogs, newspapers, &lt;em&gt;etc.&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;&lt;a href="http://www.wired.com/wiredscience/2010/10/twitter-crystal-ball/"&gt;&amp;#8220;We were pretty astonished that this actually worked&amp;#8221;&lt;/a&gt;&lt;/h2&gt;

&lt;p&gt;Given the entirely unskeptical reception the Bollen paper 
has received, there is a clear need for a critical evaluation of it, expressed in terms that
can be understood by those with no formal statistical training.&lt;/p&gt;

&lt;p&gt;The principal problems with this paper are:&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;The authors exhibit a level of sloppiness that taints the integrity of the results. From
basic accounting mistakes to appalling methodological flaws, these errors call into question whether
&lt;em&gt;any&lt;/em&gt; of their results can be trusted.&lt;/li&gt;
&lt;li&gt;The advertised results, &lt;em&gt;e.g.&lt;/em&gt; the purported forecast accuracy of their system, are biased &amp;#8216;by 
selection.&amp;#8217; Effectively, the authors have picked winners after the race is run, citing the results
of the race as unbiased estimates of true merit, without untangling the effects of luck.&lt;/li&gt;
&lt;li&gt;The advertised 86.7% forecast accuracy is suspect &lt;em&gt;in vacuo&lt;/em&gt;, since it would
yield &lt;em&gt;the greatest quantitative strategy ever discovered.&lt;/em&gt; That this would have
been discovered by newcomers to the world of quantitative finance, and that the strategy
depends only on two- to six-day old public information beggars the imagination. There is no
sensible physical model of how such a large effect could exist, nor any reason it 
would have passed undetected until now.&lt;/li&gt;
&lt;/ol&gt;&lt;h2&gt;&lt;a href="http://www.wired.com/wiredscience/2010/10/twitter-crystal-ball/"&gt;&amp;#8220;That&amp;#8217;s a pretty big result&amp;#8221;&lt;/a&gt;&lt;/h2&gt;

&lt;p&gt;There are roughly three parts of this paper beyond the introductory material:&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;The sentiment analysis tools employed are not insane, in that they correctly detect that
people are, in general, happy around Thanksgiving, and uneasy before an election, for example.&lt;/li&gt;
&lt;li&gt;Some of the mood scores found by the sentiment analysis tools are purportedly correlated 
with changes in the DJIA, according to a &amp;#8216;Granger causality&amp;#8217; analysis.&lt;/li&gt;
&lt;li&gt;The raw mood scores are turned into forecasts of daily DJIA movements. Accuracy of the
system is &amp;#8216;confirmed&amp;#8217; by looking at some extra data (a &amp;#8216;hold out&amp;#8217; set) which
was &lt;em&gt;not&lt;/em&gt; used in the training of the predictive models. &lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;I will tackle the last two findings in turn; the first finding is mostly devoid of any specific predictive 
claims.&lt;sup id="fnref:p21067996377-1"&gt;&lt;a href="#fn:p21067996377-1" rel="footnote"&gt;1&lt;/a&gt;&lt;/sup&gt; 
Because the paper gives only loose technical details, and the data used are not widely available 
(collecting &lt;em&gt;all&lt;/em&gt; Twitter feeds over a 1 year period is a technically challenging feat),
it is impossible to definitively refute the claims; rather they can only be cast into serious doubt.&lt;/p&gt;

&lt;h2&gt;the Granger Causality tests and Table II&lt;/h2&gt;

&lt;p&gt;The second &amp;#8216;finding&amp;#8217; of Bollen &lt;em&gt;et al.&lt;/em&gt; is that of purported statistical significance in a 
&lt;a href="http://en.wikipedia.org/wiki/Granger_causality"&gt;Granger causality test&lt;/a&gt;. This is supposed to establish
the ability of raw Twitter mood data to forecast changes in the DJIA index.  There are
numerous technical reasons why such an analysis might malfunction. However, none of them need be
invoked here because the authors make a much more basic statistical blunder, that of not correcting
for &lt;a href="http://en.wikipedia.org/wiki/Multiple_comparisons"&gt;multiple hypothesis testing&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;A classical statistical test spits out a &amp;#8216;p-value&amp;#8217;: the probability, computed assuming 
some condition that you would like to rule out, of seeing evidence at least as strong as what was 
actually observed. A p-value balances the amount of evidence and its 
strength.  If the resultant p-value is indeed small, smaller than some &amp;#8216;sacred value&amp;#8217;, usually taken to 
be 0.05, one claims that the &amp;#8216;null hypothesis&amp;#8217;, the condition assumed as part of the test, is unlikely, 
or is &amp;#8216;rejected&amp;#8217;.  In the Granger causality analysis being performed here, the &amp;#8216;null hypothesis&amp;#8217;, the 
hypothesis the authors wish to reject, is that the Twitter based signal has no forecasting ability on 
DJIA, and is effectively independent from it.&lt;/p&gt;

&lt;p&gt;Bollen &lt;em&gt;et al.&lt;/em&gt;, in Table II of their paper, commit the statistical sin of performing
many such tests (49 of them), and then attributing statistical significance to those that have a 
small p-value 
(they display the p-values in boldface if they are less than 0.10, attach two stars to those 
 less than 0.05, &lt;em&gt;etc.&lt;/em&gt;).
If one were to perform ten million such tests, and the null hypothesis were true 
(&lt;em&gt;i.e.&lt;/em&gt; if Twitter did not predict DJIA in any way), one would expect to have one million
resultant p-values less than 0.1, printed in boldface in one&amp;#8217;s enormous table.  Similarly,
one would expect to have one million p-values between 0.317 and 0.417, a hundred thousand
between 0.8349 and 0.8449, &lt;em&gt;etc.&lt;/em&gt;  The presence of many small p-values in this scenario is
simply due to chance &amp;#8216;bad luck&amp;#8217; under the null hypothesis.&lt;/p&gt;
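
&lt;p&gt;The claim that p-values are spread evenly when the null is true is easy to check by simulation. The Python sketch below is illustrative only: it uses a simple two-sided z-test on pure noise as a stand-in for the Granger tests (my substitution, not what the paper does), and all names are hypothetical.&lt;/p&gt;

```python
import math
import numpy as np

rng = np.random.default_rng(0)
n_tests, n_obs = 50_000, 50

# Each row is one experiment in which the null is TRUE: the data are
# pure noise with mean zero and known unit variance.
data = rng.normal(0.0, 1.0, size=(n_tests, n_obs))
z = data.mean(axis=1) * math.sqrt(n_obs)

# Two-sided p-value of a z-test: P(|Z| >= |z|) = erfc(|z| / sqrt(2)).
p_values = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])

# Fraction of p-values that would be printed in boldface.
frac_below_10 = np.mean(0.10 >= p_values)
# Fraction in an arbitrary band of the same width, 0.317 to 0.417.
in_band = (p_values >= 0.317) * (0.417 >= p_values)
frac_in_band = np.mean(in_band)

print(f"fraction of p-values below 0.10:          {frac_below_10:.3f}")
print(f"fraction of p-values in [0.317, 0.417]:   {frac_in_band:.3f}")
```

&lt;p&gt;Both fractions should land near one tenth, which is the point: a band of small p-values is no more special than any other band of the same width.&lt;/p&gt;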

&lt;p&gt;For comparison, here is a plot of the 
&lt;a href="http://en.wikipedia.org/wiki/Empirical_distribution_function"&gt;empirical distribution&lt;/a&gt; of the 
p-values from Table II. 
Under the null hypothesis, as one performs more and more statistical tests one expects the
p-values to be &amp;#8216;uniformly distributed&amp;#8217;, and thus the empirical CDF plot would fall on the \(y=x\) line, 
plotted in red here.  If the null were violated, &lt;em&gt;i.e.&lt;/em&gt; if the Twitter mood data exhibited
&amp;#8216;causality&amp;#8217; on the DJIA movement, we should see a lot of p-values on the left side of the plot,
and the empirical CDF would hug the left side and top of the plot, bowing away from the diagonal.
However, by my eye, the data are consistent with the null hypothesis, and the 
7 p-values less than 0.10 are no more remarkable
than the 13 that are greater than 0.90.&lt;/p&gt;

&lt;p&gt;&lt;img src="http://media.tumblr.com/tumblr_m2gdcoxgqx1r99ok3.jpg" alt=""/&gt;&lt;/p&gt;

&lt;p&gt;Performing a &lt;a href="http://en.wikipedia.org/wiki/Bonferroni_correction"&gt;Bonferroni correction&lt;/a&gt; 
for multiple tests, &lt;em&gt;none&lt;/em&gt; of the p-values from Bollen&amp;#8217;s Table II are considered
statistically significant at the 0.10 level. For the layman, the conclusion to be drawn
is that the evidence is not inconsistent with all the Twitter moods and lags being independent
from movements of the DJIA, and some of them &lt;em&gt;looking&lt;/em&gt; better than others due to chance.&lt;/p&gt;
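
&lt;p&gt;The Bonferroni arithmetic is simple enough to spell out. This Python sketch uses only figures quoted in this post (49 tests, and a smallest reported p-value of 0.013); the variable names are mine.&lt;/p&gt;

```python
n_tests = 49          # number of p-values in Table II, per the text
alpha = 0.10          # family-wise significance level
smallest_p = 0.013    # smallest p-value reported in Table II, per the text

# Bonferroni: compare each p-value to alpha / n_tests.
threshold = alpha / n_tests
# Equivalent view: the Bonferroni-adjusted p-value, capped at 1.
adjusted_p = min(1.0, smallest_p * n_tests)

# Significant only if the raw p-value is at or below the threshold.
significant = threshold >= smallest_p

print(f"per-test threshold:        {threshold:.5f}")
print(f"adjusted smallest p-value: {adjusted_p:.3f}")
print(f"significant at the {alpha} level: {significant}")
```

&lt;p&gt;Since even the smallest p-value fails the corrected threshold, none of the other 48 can pass it.&lt;/p&gt;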

&lt;h2&gt;the Forecast Model&lt;/h2&gt;

&lt;p&gt;The third, and perhaps most galvanizing, &amp;#8216;finding&amp;#8217; of Bollen &lt;em&gt;et al.&lt;/em&gt; is of an 
&amp;#8220;accuracy of 87.6% in predicting the daily up and down changes in the closing values of the DJIA.&amp;#8221;
This is formulated in terms of cross-validation of a Neural Net model,
using training and test (or &amp;#8216;hold out&amp;#8217;) sets of data.
The goal is to simulate how this model would be used in the real world, trading real money:&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;Train the model using all the data you have up to this very minute;&lt;/li&gt;
&lt;li&gt;Going forward, input each day&amp;#8217;s new Twitter data into the model to get predictions to make trades.&lt;/li&gt;
&lt;li&gt;Repeat this process, retraining the model as is expedient or necessary, and trading the forecasts
every day.&lt;/li&gt;
&lt;/ol&gt;&lt;p&gt;Typically when one trains a model on data, the model&amp;#8217;s own estimate of how well it understands or
can predict that data is optimistic. This is why one tests a model &lt;em&gt;methodology&lt;/em&gt; by training a
model, then validating its predictive ability on data that was not used in building the model.&lt;/p&gt;

&lt;p&gt;This is commonly accepted practice. However, Bollen&amp;#8217;s finding is broken in so many ways:&lt;/p&gt;

&lt;ol&gt;&lt;li&gt;&lt;p&gt;&lt;strong&gt;They got the number wrong.&lt;/strong&gt; They report an accuracy of 87.6% in the abstract and twice in
the paper; they report &lt;em&gt;the same figure&lt;/em&gt; as 86.7% twice, including in Table III.
Since the accuracy estimates are based on 15 (!) days of test data, the correct value
is the smaller one, 86.7% corresponding to the fraction 13 / 15.
The incorrect figure is widely quoted in the media, and was used by Johan Bollen during his
&lt;a href="http://archives.cnn.com/TRANSCRIPTS/1010/19/qmb.01.html"&gt;interview with CNN&lt;/a&gt;. Not that it
matters, because &amp;#8230;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The forecast accuracy is reported with far too many significant figures.&lt;/strong&gt; If the model had
correctly predicted 12 or 14 days&amp;#8217; directions, instead of 
the 13 it did, the number would change by plus or minus 
7 percentage points.  For the technically minded, the 
&lt;a href="http://en.wikipedia.org/wiki/Standard_error"&gt;standard error&lt;/a&gt; on the accuracy figure
is around 9%, and a one-sided 95% lower confidence bound on the
accuracy figure is 72%.  For the layman, the upshot is that
it is not inconceivable that the accuracy of the system is as small as
72%, but it looked better in this experiment simply due to
random luck. In all, reporting two significant figures is unwarranted, much
less three.  The effect is perhaps minor, but it does not instill confidence in the
authors&amp;#8217; attention to detail.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The accuracy figure is biased upward.&lt;/strong&gt; 
The reported 86.7&amp;#160;% accuracy is 
the &lt;em&gt;maximal&lt;/em&gt; accuracy achieved for the 8 models listed in Table III of the 
paper. As in Part II, where the smallest p-values were reported as &amp;#8216;significant&amp;#8217;, when they could 
be explained due to chance, here there is an (upward) bias in the sample accuracy numbers when 
selecting based on those same quantities.&lt;sup id="fnref:p21067996377-2"&gt;&lt;a href="#fn:p21067996377-2" rel="footnote"&gt;2&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;p&gt;As an analogy, imagine if the 8 models listed in Table III truly
had &lt;em&gt;no&lt;/em&gt; predictive ability, and thus a forecast accuracy of 50%. You can view them as fair coins.
The probability that a single fair coin would land heads 13 or more times out 
of 15 is 0.37%.  This probability is so small it makes us 
doubt the assumption that the models in Table III are really non-predictive.
However, if one were to flip 8 fair coins, the probability that
at least &lt;em&gt;one&lt;/em&gt; of them would land heads 13 or more times out of 
15 is 2.9%.
While this is still small, it is less damning of the assumption of non-predictive models.&lt;/p&gt;

&lt;p&gt;A similar problem exists with selecting the &amp;#8216;best&amp;#8217; model based on some sample
statistic, then using &lt;em&gt;that same sample statistic&lt;/em&gt; as an estimate of a population parameter.&lt;sup id="fnref:p21067996377-3"&gt;&lt;a href="#fn:p21067996377-3" rel="footnote"&gt;3&lt;/a&gt;&lt;/sup&gt;
Here the forecast accuracy of 86.7% is inflated by the fact that we selected the model based
on the estimate.&lt;/p&gt;

&lt;p&gt;And this is only the bias that we can observe from the paper. There is the very real possibility
of unobservable bias, &lt;em&gt;i.e.&lt;/em&gt; &lt;a href="http://en.wikipedia.org/wiki/Data_dredging"&gt;datamining bias&lt;/a&gt; and
&lt;a href="http://en.wikipedia.org/wiki/Publication_bias"&gt;publication bias&lt;/a&gt;. That is, the authors might have
tried numerous different data treatments and algorithms, evaluating the purported out-of-sample
accuracy, before settling on one where the results were considered sufficiently &amp;#8216;interesting.&amp;#8217; 
Continuing the coin flip analogy, if one were to
flip 50 fair coins 15 times, the probability that at least one of them
would land heads 13 or more times is 
17%. Now the results seem much less interesting.&lt;/p&gt;

&lt;p&gt;One cannot prove that the authors biased their results in this way. It just provides a 
plausible alternative explanation for the observed &amp;#8216;effect.&amp;#8217; The authors also did themselves
no favors by using such a tiny sample size: if their model had correctly predicted the
direction of the DJIA on 130 out of 150 days instead,
the possible effect of this kind of bias would be lessened.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The model accuracy seems high compared to the Granger causality results.&lt;/strong&gt;
The forecast accuracy of 86.7% seems rather high compared to the unconvincing p-values
reported in Bollen&amp;#8217;s Table II.
To test this, I perform some 
&lt;a href="http://en.wikipedia.org/wiki/Monte_Carlo_method"&gt;Monte Carlo experiments&lt;/a&gt;.  For one realization of 
the Monte Carlo experiment, I take the returns of the DJIA index
over the period February 28, 2008 to November 3, 2008, and spawn a -1/+1 random variable which 
has the sign of the next day&amp;#8217;s DJIA log return with probability \(13 / 15\). 
I then feed it to R&amp;#8217;s &lt;code&gt;grangertest&lt;/code&gt; function, with 2 lags, and record the p-value. I repeat this experiment
200 times. The point of this experiment is to get some kind of feeling for what a binary signal
with the purported accuracy would yield in a Granger analysis.&lt;/p&gt;
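
&lt;p&gt;For readers who want to see the flavor of one realization, here is a rough sketch. It is not the experiment described above: it uses synthetic Gaussian returns rather than DJIA data, and a plain correlation t-statistic stands in for R&amp;#8217;s &lt;code&gt;grangertest&lt;/code&gt;. The function names, the seed, the volatility, and the number of days are all assumptions made for illustration.&lt;/p&gt;

```python
import math
import random

def accurate_signal(returns, accuracy, rng):
    """signal[t] has the sign of returns[t + 1] with probability `accuracy`."""
    signal = []
    for nxt in returns[1:]:
        sign = math.copysign(1.0, nxt)
        # Keep the correct sign with probability `accuracy`, flip it otherwise.
        signal.append(rng.choices([sign, -sign], weights=[accuracy, 1 - accuracy])[0])
    return signal

def pearson(x, y):
    """Sample Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

rng = random.Random(2008)   # arbitrary seed
n_days = 170                # assumed: roughly the number of trading days in the window
returns = [rng.gauss(0.0, 0.02) for _ in range(n_days + 1)]  # synthetic, not DJIA
signal = accurate_signal(returns, 13 / 15, rng)

r = pearson(signal, returns[1:])   # today's signal against tomorrow's return
t_stat = r * math.sqrt((n_days - 2) / (1 - r * r))
print(f"correlation {r:.2f}, t-statistic {t_stat:.1f}")
```

&lt;p&gt;Even this crude stand-in suggests that a signal with the purported accuracy should be glaringly obvious to a linear test over a window of this length.&lt;/p&gt;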

&lt;p&gt;The &lt;em&gt;maximum&lt;/em&gt; p-value from 200 Monte Carlo realizations is 2e-05. 
Compare this to the smallest of the 49 p-values reported in Table II, 
0.013.  This is something of an apples-to-oranges comparison because, in
general, you cannot just compare p-values, and the Neural Net model can capture non-linear relationships
that the inherently linear Granger model does not. However, it is very suspicious to me that such an
accurate forecast could be made from raw data about which the Granger tests were so ambivalent.&lt;sup id="fnref:p21067996377-4"&gt;&lt;a href="#fn:p21067996377-4" rel="footnote"&gt;4&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;An 86.7% forecast accuracy on DJIA&amp;#8217;s daily movement would represent the
greatest quantitative strategy ever discovered.&lt;/strong&gt; 
As an illustration, here I perform a Monte Carlo simulation of the historical performance of a 
system with the purported forecasting ability. With probability 
\(13 / 15\), the strategy gains the absolute
return of the DJIA, and otherwise loses that amount. It trades at 1x leverage on the DJIA
from 1970-01-02 to 2012-04-13.  Here are the performance plots showing, respectively,
the cumulative return, the daily return, and the drawdown from peak. 
&lt;img src="http://media.tumblr.com/tumblr_m2gdcySiq51r99ok3.jpg" alt=""/&gt;&lt;/p&gt;

&lt;p&gt;Note that under the random seed chosen here, the simulation is on the wrong side of 
&lt;a href="http://en.wikipedia.org/wiki/Crash_of_1987"&gt;Black Monday&lt;/a&gt;, and thus the results are mildly
pessimistic. However, the annualized &lt;a href="http://en.wikipedia.org/wiki/Sharpe_ratio"&gt;Sharpe ratio&lt;/a&gt; 
of this backtest is \(9.2\mbox{yr}^{-1/2}\), with 95% confidence interval
\([8.9\mbox{yr}^{-1/2},9.5\mbox{yr}^{-1/2}]\).
It doubles its money every 26 weeks.&lt;br/&gt;
For the layman, the Sharpe ratio is the metric (other than &lt;em&gt;ex post&lt;/em&gt; returns!) by which 
trading strategies are measured.  To put these figures into context, an achieved 
(&lt;em&gt;i.e.&lt;/em&gt; in real trading, not backtesting) Sharpe ratio of 
\(1\mbox{yr}^{-1/2}\) is considered &amp;#8216;good&amp;#8217;; an achieved 
value of \(2\mbox{yr}^{-1/2}\) is &amp;#8216;excellent&amp;#8217;; anything north of \(3\mbox{yr}^{-1/2}\)
is the stuff of legend.&lt;sup id="fnref:p21067996377-5"&gt;&lt;a href="#fn:p21067996377-5" rel="footnote"&gt;5&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;
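
&lt;p&gt;The order of magnitude is easy to reproduce. The following is a minimal sketch of the same kind of simulation, assuming synthetic zero-mean Gaussian daily returns in place of the actual DJIA history, 252 trading days per year, and no trading costs; the function names, seed, and volatility are hypothetical, so the output will not match the figures quoted above.&lt;/p&gt;

```python
import math
import random
import statistics

def simulate_strategy(returns, accuracy, rng):
    """Gain the absolute return with probability `accuracy`, else lose it."""
    return [abs(r) * rng.choices([1, -1], weights=[accuracy, 1 - accuracy])[0]
            for r in returns]

def annualized_sharpe(daily, days_per_year=252):
    """Mean over standard deviation of daily returns, scaled to one year."""
    return statistics.mean(daily) / statistics.stdev(daily) * math.sqrt(days_per_year)

rng = random.Random(1987)   # arbitrary seed
market = [rng.gauss(0.0, 0.01) for _ in range(252 * 40)]  # synthetic: 40 years, 1% daily vol
strategy = simulate_strategy(market, 13 / 15, rng)

sharpe = annualized_sharpe(strategy)
print(f"annualized Sharpe ratio: {sharpe:.1f}")
```

&lt;p&gt;With Gaussian returns the result should land in the neighborhood of 11, somewhat different from the DJIA backtest but just as far beyond anything achieved in real trading.&lt;/p&gt;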

&lt;p&gt;I have read dozens of papers on quantitative strategies and market timing&lt;sup id="fnref:p21067996377-6"&gt;&lt;a href="#fn:p21067996377-6" rel="footnote"&gt;6&lt;/a&gt;&lt;/sup&gt;, and, to 
the best of my recollection, have never seen one claim a Sharpe ratio higher than \(4\mbox{yr}^{-1/2}\).
Shen&amp;#8217;s &lt;a href="http://www.kc.frb.org/publicat/reswkpap/pdf/rwp02-01.pdf"&gt;analysis of timing strategies&lt;/a&gt;, for example,
lists &amp;#8216;successful&amp;#8217; market-timing strategies with Sharpe ratios on the order of 
\(0.5\mbox{yr}^{-1/2}\) to
\(0.7\mbox{yr}^{-1/2}\).
None of the tin-foil hat purveyors of market timing signals one can find on the web claim Sharpe ratios
higher than \(2\mbox{yr}^{-1/2}\), nor do they promise 100% returns in 26 weeks.
Bollen was apparently unaware he had found the philosopher&amp;#8217;s stone when he was 
&lt;a href="http://www.theatlantic.com/technology/archive/2010/10/predicting-stock-market-changes-using-twitter/64897/"&gt;quoted as saying&lt;/a&gt;:
&amp;#8220;&amp;#8230; we are hopeful to find &amp;#8230; better improvements for more sophisticated market models,&amp;#8221; &lt;em&gt;i.e.&lt;/em&gt; we 
hope to make the model even better.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The putative mechanism for the forecast defies all common sense.&lt;/strong&gt; 
Part of the authors&amp;#8217; argument is that the &amp;#8216;Calm&amp;#8217; signal from Twitter is predictive of the DJIA 
&lt;em&gt;at two to six day lag&lt;/em&gt;, and thus they use lagged data from this signal as input to their 
forecast model.  Somewhat paradoxically, the one day lag of &amp;#8216;Calm&amp;#8217; does not give significant 
Granger p-values in Table II.  Somehow, we are to believe, the information content 
&amp;#8216;skips a day&amp;#8217; (or more).  This is contrary to common sense, and to the common practice of downweighting
older observations as less relevant. It is particularly hard to imagine how using 
two- or three-day old tweets would give one the &lt;em&gt;best market timing model of all time&lt;/em&gt;.&lt;/p&gt;

&lt;p&gt;Furthermore, because the daily movements of the DJIA are &amp;#8216;high frequency&amp;#8217; 
(autocorrelation would be &amp;#8216;arbed out&amp;#8217;), a gap such as this could cause the signal to appear
&amp;#8216;out of sync&amp;#8217;.
For example, let &lt;code&gt;P&lt;/code&gt; and &lt;code&gt;C&lt;/code&gt; stand for &amp;#8216;panic&amp;#8217; and &amp;#8216;calm&amp;#8217; in the Twitter &amp;#8216;Calm&amp;#8217; signal, and let 
&lt;code&gt;+&lt;/code&gt; and &lt;code&gt;-&lt;/code&gt; mean up and down days for the DJIA. Imagine the following stream of days, where the
DJIA moves exactly as suggested by the &amp;#8216;Calm&amp;#8217; signal two (market) days prior.&lt;sup id="fnref:p21067996377-7"&gt;&lt;a href="#fn:p21067996377-7" rel="footnote"&gt;7&lt;/a&gt;&lt;/sup&gt;&lt;/p&gt;

&lt;pre&gt;&lt;code&gt;Calm:  P  P  C  C  P  C  P  P  C ...
DJIA:   ...  -  -  +  +  -  +  - ...
&lt;/code&gt;&lt;/pre&gt;

&lt;p&gt;Because of the delay effect and DJIA&amp;#8217;s high frequency nature, in this example the DJIA often
has down days when the &amp;#8216;Calm&amp;#8217; signal is calm, and up days when it panics, meaning market participants pay more
attention to how the Twitterverse felt two or three days ago than how it feels today.&lt;br/&gt;
Are we to believe that Twitter users are &lt;em&gt;trading&lt;/em&gt; on how they felt two or three (but not one) days prior, 
and thus moving the market? Or are they &lt;em&gt;predicting&lt;/em&gt; the state of the world two or three days ahead of time,
without being able to predict tomorrow? Both of these models are nonsensical.&lt;/p&gt;
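
&lt;p&gt;The mismatch in the example stream can be counted mechanically. A small sketch, using only the nine days shown above and assuming that calm implies an up day and panic a down day; the variable names are my own.&lt;/p&gt;

```python
calm = "PPCCPCPPC"   # the 'Calm' signal, one letter per market day
lag = 2              # the DJIA follows the signal from two days earlier

# Calm suggests an up day, panic suggests a down day.
implied = {"C": "+", "P": "-"}
djia = [implied[c] for c in calm[:-lag]]   # DJIA moves from day `lag` onward

same_day = calm[lag:]                      # the signal on those same days
agree = sum(implied[c] == move for c, move in zip(same_day, djia))

print("DJIA:    ", " ".join(djia))
print("same day:", " ".join(same_day))
print(f"same-day agreement: {agree} of {len(djia)}")
```

&lt;p&gt;In this stream the market agrees with the same-day signal on only 2 of the 7 days, even though it follows the two-day-old signal perfectly.&lt;/p&gt;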

&lt;p&gt;A more reasoned interpretation of the results is that the two- to six-day lags in the
&amp;#8216;Calm&amp;#8217; signal looked better due to &lt;a href="http://en.wikipedia.org/wiki/Data_dredging"&gt;datamining bias&lt;/a&gt;,
and any justification for their existence (I have seen none) is &lt;em&gt;ex post&lt;/em&gt; storytelling.&lt;/p&gt;

&lt;p&gt;Moreover, given that the putative effect leads to the &lt;em&gt;best market timing model of all time&lt;/em&gt;, and
the signal is based on people&amp;#8217;s expression of mood, one would think that people, in general, would 
be good at market timing, &lt;em&gt;i.e.&lt;/em&gt; do significantly better than random. There is no evidence that this
is the case.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;&lt;strong&gt;The form of the accuracy claim is almost impossibly general.&lt;/strong&gt;
The forecast accuracy is quoted in terms of the predictive accuracy of the 
&amp;#8220;daily up and down changes in the closing values of the DJIA,&amp;#8221; full stop.
Are we to accept that this accuracy claim holds in both bear and bull markets?
In periods of high volatility and low? Regardless of whether tomorrow&amp;#8217;s DJIA
return is, in absolute value, 2 percent or 0.05 percent? It is not clear how 
such a broad claim could be extrapolated from performance during 
15 trading days in December 2008.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;&lt;h2&gt;&lt;a href="http://www.npr.org/templates/story/story.php?storyId=130793579"&gt;&amp;#8220;We&amp;#8217;re scientists, so we&amp;#8217;re not interested in making a quick buck.&amp;#8221;&lt;/a&gt;&lt;/h2&gt;

&lt;p&gt;Employing &lt;a href="http://en.wikipedia.org/wiki/Hanlon%27s_razor"&gt;Hanlon&amp;#8217;s Razor&lt;/a&gt;, I am to conclude that
Bollen, Mao and Zeng are statistical naifs. This is consistent with the egregious methodological 
flaws evidenced in their paper. If it were merely a matter of the authors&amp;#8217; reputation, we could
agree that mistakes were made and move on.  However, Bollen and Mao 
&lt;a href="http://www.idsnews.com/news/story.aspx?id=80469"&gt;have teamed up&lt;/a&gt; with a hedge fund to 
&amp;#8216;capitalize&amp;#8217; on this market timing model.&lt;sup id="fnref:p21067996377-8"&gt;&lt;a href="#fn:p21067996377-8" rel="footnote"&gt;8&lt;/a&gt;&lt;/sup&gt;  Thus unsuspecting real investors can lose real money
if the advertised forecast accuracy fails to exist in the real world. Moreover, given the fee
structure of hedge funds, investors in said fund are probably signing up for &amp;#8216;random walk
minus costs&amp;#8217;, which seems like a bad deal.&lt;/p&gt;

&lt;p&gt;It would be too simple to fault the media&amp;#8217;s fawning reaction to this paper. After all,
the whole story is stuffed full of new-technology-catnip, and there has apparently
been no accessible critical debate of its merits.  In my opinion, the peer-review process
has failed miserably here, and journalists can choose only to either re-report the finding as 
gospel fact or ignore it entirely.&lt;/p&gt;

&lt;hr&gt;&lt;p&gt;&lt;strong&gt;Disclosure:&lt;/strong&gt; The author has no holdings in Twitter, holds broad market ETFs which intersect with the
DJIA, has never made money in market-timing, and would short the Twitter hedge fund if shorting it
were possible.&lt;/p&gt;

&lt;div class="footnotes"&gt;
&lt;hr&gt;&lt;ol&gt;&lt;li id="fn:p21067996377-1"&gt;
&lt;p&gt;And I don&amp;#8217;t need a computer to tell me how people think about Christmas. &lt;a href="#fnref:p21067996377-1" rev="footnote"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn:p21067996377-2"&gt;
&lt;p&gt;Based on the accuracy values reported in Table III, I suspect there is another effect afoot here. &lt;a href="#fnref:p21067996377-2" rev="footnote"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn:p21067996377-3"&gt;
&lt;p&gt;Statisticians call this problem &amp;#8216;estimation after selection.&amp;#8217; &lt;a href="#fnref:p21067996377-3" rev="footnote"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn:p21067996377-4"&gt;
&lt;p&gt;Bear in mind that the magical black box used by Bollen is a Neural Network, a kind of model with a (well-deserved)
bad reputation for overfitting to training data and giving horrible out-of-sample performance.  They apparently
use a five-layer Neural Network, leading one to believe they first tried the one- through four-layer Neural Networks
before arriving at the desired result. &lt;a href="#fnref:p21067996377-4" rev="footnote"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn:p21067996377-5"&gt;
&lt;p&gt;The scale is somewhat nonlinear as well. For example, the probability of having a down year, say,
is (very roughly) bounded by \(1 / (1 + z^2)\), where \(z\) is the true Sharpe ratio of a strategy
in units of \(\mbox{yr}^{-1/2}\). And thus a Sharpe of 2 can be said to be four times as good as a Sharpe of 1. &lt;a href="#fnref:p21067996377-5" rev="footnote"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn:p21067996377-6"&gt;
&lt;p&gt;Debunking quantitative trading strategies is not just my hobby, it is my profession. &lt;a href="#fnref:p21067996377-6" rev="footnote"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn:p21067996377-7"&gt;
&lt;p&gt;The probability of guessing correctly 7 times in a row under a forecast accuracy of
87% is 37%. &lt;a href="#fnref:p21067996377-7" rev="footnote"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;li id="fn:p21067996377-8"&gt;
&lt;p&gt;Perhaps this is the first time that a &amp;#8216;quantitative&amp;#8217; hedge fund was launched based
on a 15 day backtest!  Or perhaps not. &lt;a href="#fnref:p21067996377-8" rev="footnote"&gt;↩&lt;/a&gt;&lt;/p&gt;
&lt;/li&gt;

&lt;/ol&gt;&lt;/div&gt;</description><link>http://sellthenews.tumblr.com/post/21067996377</link><guid>http://sellthenews.tumblr.com/post/21067996377</guid><pubDate>Sat, 14 Apr 2012 00:48:00 -0400</pubDate><category>twitter</category><category>quantitative-finance</category><category>market-timing</category><dc:creator>astrawman</dc:creator></item></channel></rss>
