Candlestick Pattern Matching

Here we discuss candlestick pattern matching in Trading Conceiver: the intricacies of pattern matching and our approach to it.

Intricacies include:

Patterns are not mathematical formulae
Different descriptions
Arbitrary values
Mismatch between description and examples
Canonical figures
Bullish / bearish versions can differ
Sight and perception can be deceptive

Our Approach includes:

Minimum requirements
Psychology
Number of occurrences
Number of input parameters
Not very specific candles
Not mutually exclusive
Prior trend

Intricacies in Pattern Matching

When coding software routines to implement patterns, many difficulties arise. We list them here. The purpose is not to burden the user with the complexity of the task, but to help him better understand the results he might get from our tool.

Patterns Are Not Mathematical Formulae

Patterns in general, and candlestick patterns are no exception, are not defined through mathematical formulae. Basically all authors describe them in words, without formulae, so they are not rigorously defined. When a software engineer implements them, he needs to translate the words into algorithms, i.e. mathematically defined directives. No matter how detailed a description is, it is not a mathematical formula. It is not like the definition of an indicator such as the Simple Moving Average (SMA), for which a formula exists: whoever calculates the SMA will always get the same result.

This is not the case for patterns. Pattern recognition depends on the 'translation' from words to algorithm. Hence different pattern recognition software will output different results, i.e. there will not be a one-to-one correspondence between the patterns found by different scanners. There is no right or wrong recognition software. They are just different: they interpret the pattern differently, are implemented differently, and are subject to individual sensibility and personal knowledge. Even when a description looks very detailed and precise, once it is implemented the results often show right away that it is faulty, for instance because it recognizes configurations not very similar to the 'canonical' pictures.

In Trading Conceiver, we implemented our own version. Because of the problem described above, we explicitly state all and only the conditions we check to match a pattern, i.e. the necessary and sufficient conditions: all of them must be true, and no other checks are performed. You can find this information in the Formulae section of each candle pattern's description.

Patterns in Formulae

To be precise, formulae for patterns can certainly be found, but they differ widely from each other, because they were obtained as just described, by translating words into formulae.

Different Descriptions

To exacerbate the problem, different authors give slightly different descriptions of the same pattern. We report an example about the Three Stars in the South further below. Again, this holds even when examining formula versions of the pattern. The same author might even accept different variations of the pattern. Furthermore, some rules might be considered mandatory by some authors but signal enhancements by others. The latter means that a certain condition is not required but, if present, makes the signal stronger. Typical examples are:

The longer the first candle, the more forceful the reversal signal.
A gap between the first two candles adds to the probability of the reversal.
The longer the upper shadow, the higher the potential of a reversal.

The pattern has the same name, but it is not uniquely defined. This poses further difficulties in writing the logical function implementing the pattern. Which rules, exactly, should be required?

Arbitrary Values

Sometimes the description or formula of a pattern contains arbitrary values. Typical examples are:

The lower shadow length must be at least two times that of the body.
The body must span at least 75% of the whole high-low range.
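A minimal sketch, in Python, of how each arbitrary value above becomes an input parameter instead of a hard-coded constant. The `Candle` class, the function names and the defaults (2.0 and 0.75, taken from the two examples) are illustrative assumptions, not Trading Conceiver's actual code.

```python
from dataclasses import dataclass


@dataclass
class Candle:
    open: float
    high: float
    low: float
    close: float

    @property
    def body(self) -> float:
        return abs(self.close - self.open)

    @property
    def hl_range(self) -> float:
        return self.high - self.low

    @property
    def lower_shadow(self) -> float:
        return min(self.open, self.close) - self.low


def has_long_lower_shadow(c: Candle, ratio: float = 2.0) -> bool:
    """Lower shadow at least `ratio` times the body (hypothetical parameter)."""
    return c.lower_shadow >= ratio * c.body


def body_dominates_range(c: Candle, fraction: float = 0.75) -> bool:
    """Body spans at least `fraction` of the high-low range (hypothetical parameter)."""
    if c.hl_range == 0:
        return False  # a flat candle has no range to compare against
    return c.body >= fraction * c.hl_range


hammer = Candle(open=10.0, high=10.5, low=8.0, close=10.4)
print(has_long_lower_shadow(hammer))             # True  (default ratio 2.0)
print(has_long_lower_shadow(hammer, ratio=6.0))  # False (stricter user setting)
```

The same candle passes or fails depending only on the value chosen for the parameter, which is exactly why such values are arbitrary.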

These values are of course arbitrary. To make them more useful, an input parameter should be introduced for each of them. This leads to the problem of the number of input parameters, as explained further below.

Mismatch Between Description and Examples

This might look surprising, but it is not rare to find, in the same source, a certain description of the pattern followed by chart examples that do not match it. When this happens, the examples usually imply less strict rules. So should the software adhere to the stricter description or to the looser examples? We suggest that the user look carefully at the examples the author offers when studying candlestick patterns. Such examples should come from real charts, not be drawn by hand on purpose. This way things will be clearer.

Canonical Figures

As a corollary of the previous point, the canonical figure representing the pattern, the one appearing in every source, drawn by hand rather than taken from a real stock chart, is only partially explanatory. Sometimes that drawing represents just one example, perhaps the most meaningful one: an ideal case, the epitome of the pattern. How much can a real configuration diverge from that picture? How much longer can a candle be? How much lower can it sit with respect to the others?

Bullish / Bearish Versions Can Differ

This is surprising, too. When describing both the bullish and the bearish versions of a pattern, which should correspond to each other by swapping dual quantities (e.g. high and low, white and black), the same author sometimes gives slightly different requirements. It is never clear whether this is intentional or simply a sign that the pattern is not precisely defined.

Sight and Perception Can Be Deceptive

When we look at a certain candle configuration to decide whether it constitutes a pattern, we rely on our perception, which can be misleading. Let's look at the example depicted here, representing two Doji. Someone might be tempted to consider the one on the right a Doji and the one on the left a long-legged Doji. And yet they have exactly the same body and the same shadows, so they are exactly the same candle. They are just drawn with a different width (perhaps exaggerated here for convenience), which means nothing: it is irrelevant. This is just one example, involving the form factor, i.e. width versus height, but there are many others. For instance, the zoom level at which we observe a chart could lead us to different conclusions.

An Example: the Notorious 'Long' Adjective

As an example, we want to dwell on the adjective 'long', which appears in many descriptions referring to candles. Here is a list of interpretations of what a 'long' candle should be, according to different authors:

The body of the candle must be long with respect to the previous candles' body. [1]
The body of the candle must be long with respect to the previous candles' high-low range.
The whole high-low range of the candle must be long with respect to the previous candles' high-low range. [2]
As in [1], and the shadows must be short compared with the body. [3]
[1] and [2] simultaneously.
[1], [2] and [3] simultaneously ([1] being already implied by [3]).

Note that all of the above options involve arbitrary quantities:

The threshold for considering something 'long'. What is the value above which we can state the candle is long with respect to the previous ones? 75%? 120%? Three times?
The lookback period. How many previous candles should we look at?
The kind of average to use for the length of the previous candles. It could be a simple moving average (SMA), an exponential moving average (EMA) or anything else.
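A minimal sketch of interpretations [1], [2] and [3], assuming the three arbitrary quantities are exposed as parameters: `threshold`, `lookback` and `avg` (SMA or EMA). `shadow_ratio` is a further arbitrary value introduced by [3]. All names and defaults are hypothetical, and the look-back here starts right before the candle being tested, which is only one possible choice of where the 'previous' candles begin.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Sequence


@dataclass
class Candle:
    open: float
    high: float
    low: float
    close: float

    @property
    def body(self) -> float:
        return abs(self.close - self.open)

    @property
    def hl_range(self) -> float:
        return self.high - self.low


Average = Callable[[Sequence[float]], float]


def sma(values: Sequence[float]) -> float:
    """Simple moving average of the given values."""
    return sum(values) / len(values)


def ema(values: Sequence[float]) -> float:
    """Exponential average seeded with the oldest value, smoothing 2 / (N + 1)."""
    alpha = 2.0 / (len(values) + 1)
    result = values[0]
    for v in values[1:]:
        result = alpha * v + (1 - alpha) * result
    return result


def _previous(candles: Sequence[Candle], i: int, lookback: int) -> Optional[Sequence[Candle]]:
    """The `lookback` candles right before index i, or None without enough history."""
    if lookback < 1 or i < lookback:
        return None
    return candles[i - lookback:i]


def long_body(candles, i, threshold=1.2, lookback=10, avg: Average = sma) -> bool:
    """[1] Body long with respect to the average body of the previous candles."""
    prev = _previous(candles, i, lookback)
    return prev is not None and candles[i].body > threshold * avg([c.body for c in prev])


def long_range(candles, i, threshold=1.2, lookback=10, avg: Average = sma) -> bool:
    """[2] High-low range long with respect to the previous candles' average range."""
    prev = _previous(candles, i, lookback)
    return prev is not None and candles[i].hl_range > threshold * avg([c.hl_range for c in prev])


def long_body_short_shadows(candles, i, threshold=1.2, lookback=10,
                            avg: Average = sma, shadow_ratio=0.5) -> bool:
    """[3] As [1], plus upper + lower shadows at most `shadow_ratio` times the body."""
    c = candles[i]
    shadows = c.hl_range - c.body
    return long_body(candles, i, threshold, lookback, avg) and shadows <= shadow_ratio * c.body


history = [Candle(10.0, 10.6, 9.9, 10.5) for _ in range(10)]
candles = history + [Candle(10.0, 11.2, 9.8, 11.0)]
i = len(candles) - 1
print(long_body(candles, i))                 # True: body 1.0 > 1.2 * 0.5
print(long_body(candles, i, threshold=3.0))  # False: 1.0 is not above 1.5
```

Combined interpretations are plain conjunctions, e.g. `long_body(...) and long_range(...)` for "[1] and [2] simultaneously".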

Unfortunately, other problems arise:

Some sources suggest values for the threshold, e.g. three times the average of the preceding candles. With such a high value, the risk is that the patterns become so rare that any result in the trading system is statistically meaningless.
Looking at the real-life examples proposed by the authors, 'long' sometimes appears to mean 'not short'. This again points to the subjectivity of the threshold for 'long'.
Sometimes it is clear that 'long' doesn't refer to a comparison with the previous candles, but simply to the other candles of the pattern.
Where exactly do we start looking backward for the 'previous' candles? Just before the pattern, or just before the candle within the pattern that is required to be 'long'?

On a final note, let's add some psychological considerations about looking at a chart, because perception counts and could lead us to different conclusions in different contexts.

What looks short when zoomed out might look long when zoomed in. This is not trivial. When the Fit Viewport option is selected in Trading Conceiver charts, the zoom changes continuously as the chart is shifted horizontally, and the effect of candles 'becoming' long or short is apparent.
Candles with long bodies stand out more to the eye, which may 'filter out' those with short bodies. So, when deciding mentally whether a candle is long or short, we could be misled.

Unfortunately, it is not wise to define in one single place what 'long' means for the user and reuse that definition for all patterns. The user may want to tailor that definition to the pattern. E.g. for a very rare pattern, he might decide to relax the definition of 'long' by changing the threshold. Depending on the market under study, he might prefer a different lookback period. If 'long' refers to the other candles of the pattern rather than to those prior to the pattern, a completely different definition could be desirable.

Another Example: 'Gap'

'Gap' is another dreaded word for the software programmer. Different authors might give different meanings to the gap between two candles. Here are some possibilities:

Body gap, i.e. considering only the bodies.
Whole range gap, i.e. considering shadows, too.
Opening gap, i.e. considering only the open price; again relative to the previous body or whole range.
A gap that, in its 'up' version, compares the high of the first candle with the lower of the open and the close of the second candle.
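A minimal sketch of the 'up' versions of the four gap definitions listed above, showing that the same pair of candles can gap under one definition and not under another. The function names are hypothetical; the 'down' versions would mirror them by swapping high with low, body top with body bottom, and reversing the inequality.

```python
from dataclasses import dataclass


@dataclass
class Candle:
    open: float
    high: float
    low: float
    close: float

    @property
    def body_top(self) -> float:
        return max(self.open, self.close)

    @property
    def body_bottom(self) -> float:
        return min(self.open, self.close)


# 'Up' gaps between a first candle a and a second candle b.

def body_gap_up(a: Candle, b: Candle) -> bool:
    """Body gap: only the bodies are considered."""
    return b.body_bottom > a.body_top


def range_gap_up(a: Candle, b: Candle) -> bool:
    """Whole range gap: the shadows are considered too."""
    return b.low > a.high


def opening_gap_up(a: Candle, b: Candle, versus_range: bool = False) -> bool:
    """Opening gap: only b's open, relative to a's body or to a's whole range."""
    return b.open > (a.high if versus_range else a.body_top)


def high_to_body_gap_up(a: Candle, b: Candle) -> bool:
    """a's high compared with the lower of b's open and close."""
    return b.body_bottom > a.high


a = Candle(open=10.0, high=11.0, low=9.5, close=10.8)
b = Candle(open=11.2, high=11.5, low=10.9, close=11.4)
print(body_gap_up(a, b), range_gap_up(a, b))  # True False
```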

Our Approach

To decide which conditions must be satisfied for a certain pattern, we weigh all of the following considerations.

Minimum Requirements

We prefer to select a set of minimum requirements by comparing those from various sources, so we tend to demand as few rules as possible. Obviously, they must form a reasonably complete set, not just the intersection of the sources' rules. In particular, we tend to omit rules that we consider mere signal enhancements.

Psychology

When in doubt, we refer to the psychology underlying the pattern, that is, its rationale, as guidance to refine the choice. Take the Bullish Breakaway, pictured here, as an example. We implemented it like this:

n-4, n-3, n-1 are black
n-2 can be black or white
n is white
body gap down between n-4 and n-3
avg(n-3) ≥ avg(n-2) ≥ avg(n-1)
open(n-3) < close(n) [1]
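The conditions above can be sketched in Python as follows. The list does not define `avg()`; this sketch assumes it is the body midpoint, (open + close) / 2, and (high + low) / 2 would be an equally plausible reading. It also assumes a candle with close equal to open counts as neither black nor white. The `Candle` class and the function name are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Candle:
    open: float
    high: float
    low: float
    close: float

    @property
    def is_black(self) -> bool:
        return self.close < self.open

    @property
    def is_white(self) -> bool:
        return self.close > self.open

    @property
    def avg(self) -> float:
        # ASSUMPTION: avg() is read as the body midpoint
        return (self.open + self.close) / 2


def bullish_breakaway(c: Sequence[Candle], n: int) -> bool:
    """True if the five candles ending at index n satisfy the listed conditions."""
    if n < 4:
        return False
    c4, c3, c2, c1, c0 = c[n - 4], c[n - 3], c[n - 2], c[n - 1], c[n]
    return (
        c4.is_black and c3.is_black and c1.is_black        # n-4, n-3, n-1 black
        and c0.is_white                                    # n white; n-2 unchecked
        and max(c3.open, c3.close) < min(c4.open, c4.close)  # body gap down
        and c3.avg >= c2.avg >= c1.avg                     # ongoing downtrend
        and c3.open < c0.close                             # condition [1]
    )


candles = [
    Candle(20.0, 20.5, 17.5, 18.0),  # n-4: black
    Candle(17.0, 17.2, 15.8, 16.0),  # n-3: black, body gaps down
    Candle(15.5, 16.0, 15.0, 15.8),  # n-2: white (either color allowed)
    Candle(15.5, 15.6, 14.5, 14.8),  # n-1: black
    Candle(14.9, 17.6, 14.8, 17.5),  # n: white, closes above open(n-3) = 17.0
]
print(bullish_breakaway(candles, 4))  # True
```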

The psychology of the pattern is the following. The first black candle represents a bearish trend. The second day gaps down, accelerating the downtrend, which is further confirmed by its black color. The third and fourth days continue the trend. So far, the trend is evident. The last day must clearly reverse the trend and go in the opposite direction, so it must be a white candle, and its close must be high enough to be convincing. If the close of the last day is higher than the open of the second day, it cancels the move of the three previous days and could herald a bullish reversal. So we think our implementation, which complies with the minimum requirements approach explained in the previous point, respects this psychology. Some authors, though, add requirements or have different ones:

The gap between n-4 and n-3 must also include the shadows, not just the bodies. This choice doesn't change the psychology of the pattern, and both are legitimate. We opted for the bodies version because it seems to be the most common among authors.
Some authors express the downtrend through conditions on the opens and/or closes of the candles. We used the average instead. In all cases, the psychology is respected: there is an ongoing downtrend. So we didn't add any condition on open or close values. We think our constraint is better, though: since the third candle can be of either color, conditions on its open or close values are unreliable.
Some require open(n) not to follow the direction of the downtrend, e.g. open(n) > close(n-1). We didn't add this rule because we consider it a signal enhancement rather than a requirement. Even without it, we think we still comply with the pattern psychology. Here, the number of occurrences also comes into play; see the next section.
Instead of [1], some require the last candle to close within the gap between the first two, i.e. without filling the gap completely. We think that our weaker condition [1] still honors the rationale of the pattern. Here again, the number of occurrences must be factored in.

Number of Occurrences

We also take the number of occurrences into account when defining the list of requirements, while of course trying to keep the core concept intact.

Too Rare

To be useful, a pattern must not occur too rarely, or the study of the trading system will be statistically meaningless. So if a pattern is already extremely rare with just a few rules, we avoid adding further conditions and may relax some of the existing ones. In particular, if only the canonical figure is accepted, occurrences are usually very few. Moreover, we decided not to implement patterns that virtually never occur; see Implemented Patterns.

Too Frequent

By the same token, a pattern that occurs too frequently is probably not very useful either: if it is always there, it can hardly give a strong signal. In that case, we try to add some requirements to limit its frequency.

Number of Input Parameters

This has been a major design decision.

The Code Is Already Implemented

The Composer already accepts input parameters for each trading algorithm, so accepting input parameters for candle patterns comes for free. It is already coded and available, with no extra effort on our part.

Too Cumbersome With Many Parameters

However, we decided to keep the number of input parameters for candlesticks as low as possible. Each added parameter would of course increase flexibility for the user, who could tailor the pattern search algorithm to his needs, but it would also make the tool harder to use. We believe that introducing many parameters to control pattern recognition would make the software too cumbersome and its use an exhausting process. So we used parameters only when strictly necessary. Please note that some trading algorithms in the Technical Indicators Based branch have quite a number of parameters, but these are basically the same across all of them: apart from the parameters of the specific technical indicator, the other parameters are always the same (MA, slope...). For candlesticks, on the contrary, each parameter would be specific to the pattern. It would be a nightmare for the user to understand what all the parameters mean and to enter pertinent values. The number of parameters required could easily skyrocket, as the following example shows. We would probably also need to give the user some means to save his preferred default values, for each pattern.

When a Parameter Is Added

If a condition requires an arbitrary number, which would call for an input parameter, we tend to drop that rule. We add a parameter only when a core rule must be introduced and it requires a number too subjective and arbitrary to hard-code. For instance, in the Bottom Tweezers, exemplified here, the core requirement is that the lows of the two candles be almost equal, so the rule cannot be dropped. Since deciding what 'almost' means is arbitrary, we let the user decide for himself and introduced an input parameter.
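A minimal sketch of the core Bottom Tweezers rule, with 'almost equal' controlled by one user parameter. Here the tolerance is assumed to be a percentage of the lower of the two lows; an absolute price tolerance, or one scaled to the average candle range, would be equally legitimate choices. The function and parameter names are hypothetical, and the pattern's other conditions are not shown.

```python
def lows_almost_equal(low_a: float, low_b: float, tolerance_pct: float = 0.1) -> bool:
    """True if the two lows differ by at most `tolerance_pct` percent of the lower one."""
    if tolerance_pct < 0:
        raise ValueError("tolerance_pct must be non-negative")
    reference = abs(min(low_a, low_b))
    return abs(low_a - low_b) <= reference * tolerance_pct / 100


print(lows_almost_equal(100.0, 100.05))                     # True
print(lows_almost_equal(100.0, 100.5))                      # False
print(lows_almost_equal(100.0, 100.5, tolerance_pct=1.0))   # True
```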

Example

As an example, take the Three Stars in the South, a simple pattern of three black candles, depicted here. We implemented it with the following conditions, which must be satisfied on day n:

high-low range(n-1) inside high-low range(n-2) [1]
high-low range(n) inside high-low range(n-1) [2]
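Conditions [1] and [2] can be sketched in Python as below. The sketch encodes only these two listed conditions and assumes 'inside' is inclusive, i.e. equal highs or lows are accepted; the `Candle` class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Candle:
    open: float
    high: float
    low: float
    close: float


def range_inside(inner: Candle, outer: Candle) -> bool:
    # ASSUMPTION: 'inside' is inclusive
    return inner.high <= outer.high and inner.low >= outer.low


def three_stars_in_the_south(c: Sequence[Candle], n: int) -> bool:
    """Conditions [1] and [2], checked on day n."""
    if n < 2:
        return False
    return range_inside(c[n - 1], c[n - 2]) and range_inside(c[n], c[n - 1])


candles = [
    Candle(20.0, 20.2, 17.0, 17.5),
    Candle(18.5, 19.0, 17.2, 17.6),
    Candle(18.0, 18.3, 17.4, 17.5),
]
print(three_stars_in_the_south(candles, 2))  # True
```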

However, other authors give different or additional requirements, listed here:

n-2 with
- no upper shadow [3]
- virtually no upper shadow [4]
lower shadow of n-1
- pretty long [5]
- shorter than lower shadow of n-2 [6]
body(n-1) shorter than body(n-2) [7]
close(n-1) < close(n-2) [8]
low(n-2) < low(n-1), instead of [1]. [9]
candle n must be small. [10]

So, even for a very simple pattern like this, we would need to introduce 10 parameters to let the user decide exactly how to identify the pattern. We think this is not the right way to go. Things get worse when one or more candles must be 'long', as explained earlier.

Not Very Specific Candles

We try to require the most general type of candle rather than very specific ones; that is, we are not too stringent about the types of candles comprising a pattern. For instance, in the Deliberation pattern, illustrated here, some authors require the third candle to be a Doji, others a Spinning Top. We simply require its body to be smaller than those of the other two.
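A minimal sketch contrasting the general rule ("body smaller than the other two") with a stricter Doji requirement. Only the third-candle rule is shown; the remaining Deliberation conditions are omitted. The Doji threshold of 10% of the range is a hypothetical value chosen for illustration.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Candle:
    open: float
    high: float
    low: float
    close: float

    @property
    def body(self) -> float:
        return abs(self.close - self.open)


def third_body_smallest(c: Sequence[Candle], n: int) -> bool:
    """General rule: body of candle n smaller than the bodies of both n-2 and n-1."""
    if n < 2:
        return False
    return c[n].body < c[n - 2].body and c[n].body < c[n - 1].body


def is_doji(c: Candle, max_body_fraction: float = 0.1) -> bool:
    """Stricter, hypothetical alternative: body at most 10% of the range."""
    hl_range = c.high - c.low
    return hl_range > 0 and c.body <= max_body_fraction * hl_range


candles = [
    Candle(10.0, 13.2, 9.9, 13.0),   # long white
    Candle(13.1, 15.8, 13.0, 15.6),  # long white
    Candle(15.7, 17.0, 15.0, 16.7),  # smaller body, but not a Doji
]
print(third_body_smallest(candles, 2), is_doji(candles[2]))  # True False
```

The third candle is accepted by the general rule but would be rejected if a Doji were required, which illustrates how more specific candle types reduce the number of occurrences.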

One-candle Patterns

We kept the one-candle patterns (Marubozu, Doji...) particularly simple. These patterns don't give very reliable signals because, after all, they consist of just one candle. Complicating their definition with input parameters seemed unjustified to us.

Not Mutually Exclusive

We tend to consider each pattern on its own, without trying to make it mutually exclusive with others. For instance, we consider the Dragonfly Doji and the Gravestone Doji to be 'standard' Doji, too, so Dragonfly and Gravestone are subsets of Doji. Other authors consider them mutually exclusive and add rules to guarantee that. Of course, this holds only as long as it is reasonable: some patterns are very similar to each other, and in that case we do introduce rules to distinguish them.
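A minimal sketch of the subset relationship: the Dragonfly and Gravestone checks are built on top of the Doji check, so every Dragonfly or Gravestone is also a Doji. The mutually exclusive variant used by other authors is shown for comparison. All thresholds (10% of the range) and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candle:
    open: float
    high: float
    low: float
    close: float


def is_doji(c: Candle, max_body_fraction: float = 0.1) -> bool:
    """Body at most `max_body_fraction` of a non-zero high-low range."""
    r = c.high - c.low
    return r > 0 and abs(c.close - c.open) <= max_body_fraction * r


def is_dragonfly_doji(c: Candle, max_shadow_fraction: float = 0.1) -> bool:
    """A Doji with (almost) no upper shadow: a subset of Doji by construction."""
    upper = c.high - max(c.open, c.close)
    return is_doji(c) and upper <= max_shadow_fraction * (c.high - c.low)


def is_gravestone_doji(c: Candle, max_shadow_fraction: float = 0.1) -> bool:
    """A Doji with (almost) no lower shadow: a subset of Doji by construction."""
    lower = min(c.open, c.close) - c.low
    return is_doji(c) and lower <= max_shadow_fraction * (c.high - c.low)


def is_doji_exclusive(c: Candle) -> bool:
    """Mutually exclusive variant: a Doji that is neither Dragonfly nor Gravestone."""
    return is_doji(c) and not is_dragonfly_doji(c) and not is_gravestone_doji(c)


dragonfly = Candle(open=10.0, high=10.05, low=8.0, close=10.0)
print(is_doji(dragonfly), is_dragonfly_doji(dragonfly))  # True True
print(is_doji_exclusive(dragonfly))                      # False
```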

Prior Trend

Most candlestick patterns are meaningful only within a certain trend, up or down. We don't check the prior trend in order to identify a pattern, because Trading Conceiver has a vast list of trading algorithms designed precisely to do that. There are so many that it would be arbitrary to pick one of them for the pattern. We leave this choice to the user, who can select the one he prefers: simply combine, with a logical AND, the pattern and one of the numerous algorithms in the Composer that check the trend in which the pattern occurs. See an example here.
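The Composer is used interactively, so the sketch below illustrates only the logic of combining two signals bar by bar, not any Trading Conceiver API. A hypothetical "falling SMA" filter stands in for whichever trend algorithm the user chooses, and the pattern signal is given as a plain list of booleans.

```python
from typing import List, Sequence


def falling_sma(closes: Sequence[float], lookback: int = 20) -> List[bool]:
    """Hypothetical downtrend filter: True at i when the SMA of closes is lower than at i-1."""
    signals = []
    for i in range(len(closes)):
        if i < lookback:
            signals.append(False)  # not enough history for two consecutive SMAs
        else:
            current = sum(closes[i - lookback + 1:i + 1]) / lookback
            previous = sum(closes[i - lookback:i]) / lookback
            signals.append(current < previous)
    return signals


def logical_and(pattern: Sequence[bool], trend: Sequence[bool]) -> List[bool]:
    """Bar-by-bar AND of a pattern signal and a trend signal."""
    if len(pattern) != len(trend):
        raise ValueError("signal series must have the same length")
    return [p and t for p, t in zip(pattern, trend)]


closes = [10.0, 9.0, 8.0, 7.0, 6.0]
pattern = [False, True, False, True, False]  # e.g. output of a pattern detector
print(logical_and(pattern, falling_sma(closes, lookback=2)))
# [False, False, False, True, False]
```

Keeping the trend check outside the pattern means any trend algorithm can be swapped in without touching the pattern definition.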