Format and Parsing

In this chapter, we describe the settings to correctly parse the format of the historical data file.

Format and Parsing

Introduction

In order to correctly parse the historical file, many non-trivial aspects must be taken into account. We suggest you read this chapter even if the auto-detect feature works like a charm for your data, because here you can develop the required sensibility in order to evaluate the correctness of the parsing.

The Character Set

Character Set and Character Encoding

You don't need to know the theory behind character set and character encoding, or the difference between the two concepts. All you need to know is that you must select the right one, so that the file appears intelligible. If the wrong character set / encoding is selected, the file results in being gibberish, with 'strange' characters.

The Character Set Combobox

You select the character set / encoding through the combobox:

  Character set

A brief description of the character encoding is put in the label below the combobox.

What to Check For

Upon selecting a character set, look at the Original file pane. If you can read the file, you selected a suitable character set; if you can't, try another one. Only the data count:

  Date / time
  Open
  High
  Low
  Close
  Volume

The header or other descriptive lines present in the file will be dismissed. So focus on data, without bothering much for the writings. With the wrong setting, all parsing results in errors, clearly visible everywhere, e.g. in spinners, in the parsed file and inconsistencies panes.

Which Character Set / Encoding to Try

Nowadays, the only character encoding used should be the

  UTF-8

That is the first one to try. If it doesn't work, for western languages start trying, in the order:

  - windows-1252
  - US-ASCII
  - ISO-8859-1

In all the other cases try selecting those whose name might be the right ones based on the country. For instance, if you downloaded data from a Japanese provider, try one of the Japanese encoding.

Examples

Correct Character Set / Encoding

Wrong Character Set / Encoding

The `Do not auto detect` Checkbox

If the auto detect feature doesn't select the right set, manually select the right one and then check the checkbox

  Do not auto detect

This way, it will not be overridden each time you run the auto-detect routine.

Field Separator

Definitions

Each line of the historical file constitutes one entry and contains data such as:

  Date / time
  Open
  High
  Low
  Close
  Volume

Each of these data represents a field. Being on the same line, they must be separated by a specific character, so that it is clear where fields begin and end. Such specific character is the field separator. As fields are usually tabled in columns, the field separator is also known as column separator. Fields separators delimit columns data. Indeed, the Parsed file pane shows fields in columns, where each column represents a field.

Pinpointing the Field Separator

The target of this paragraph is to pinpoint the field separator. When the field separator is selected properly, you will see each field in a different column in the Parsed file pane. So, upon selecting a field separator, always check the Parsed file table. If the separator is the right one, each field will occupy one column. While if it is a wrong separator, many kinds of errors can happen, like multiple fields being placed in the same column, or one field being split into two parts with each part being placed in a different column. Other errors likely will appear, as explained later on, but for now focus on the separation of the fields in the table.

Look at the Header First

Arguably the easiest way to identify the field separator is to look at the header, i.e. usually the first and only line reporting the column order, like:

  Date;Open;High;Low;Close;Volume

The field separator for this line is the semicolon (;) and it is usually the same adopted for the rest of the file. Looking at lines with numbers may be more confusing, because, in some formatting, numbers can have commas and dots, like

  1,234.56

Separating them with semicolons:

  2,120.36;2,182.30;2,100.59;2,164.44

the semicolon is not readily apparent as separator.

Don't Look Only at the Header

Nonetheless, it can happen that the separator for the header is not the same as the one used for the other lines. The important lines are those containing data, not the header. So always look also at lines with data.

Selecting the Field Separator

You select the field separator through the dedicated radio buttons. Almost always the field separator is one of the following character:

Comma (,)
Semicolon (;)
Tab
Space

Comma

If your file looks like the one in the example, then the separator is the comma (','). Select the radio button

  Comma (,)

Semicolon

If your file looks like the one in the example, then the separator is the semicolon (';'). Select the radio button

  Semicolon (;)

Tab

Tabs are not visible. A file whose field separator is a tab might look like the one in the example. A first hint that the tab is the separator comes from the fact that the space between the fields varies within the same line: can be narrower for a pair of columns and larger for another pair. In the example, the space between the first and the second column is greater than the space between the second and the third. A second hint is that some rows appear with columns not aligned. In the example, the header is not aligned with the data.

Show Tabs

If you have the suspect that the tab is the separator, in order to be sure and to look at tabs clearly, select the checkbox

  Show tabs

in the Original file pane. With the checkbox selected, each tab will be substituted with the string:

  <tab>

Now you are sure the tab is the separator you are looking for. Select the radio button

Tab

Space

If your file looks like the one in the example, then the separator is a space character (' '). Select the radio button

  Space

Other Characters

If the field separator for your file is none of the above, you can explicitly set it. Select the radio button

  Other

and input the character in the adjacent text field. Press the Enter key on the keyboard after editing this field to make changes effective. In the depicted example, the field separator is the vertical bar (|), a.k.a. pipe.

Multiple Characters

This feature, where you input the field separator in the text field, works even if the separator is not only one character, but a sequence of characters, i.e. a string. In the example, the field separator is the string, unrealistic, but just to show the capability:

sep

You can enter the string in the text field and it is correctly recognized.

Merge Consecutive Separators

Sometimes there are consecutive separators just to beautify the appearance of the file. This happens mostly when the field separator is the space. Multiple spaces might be added between the columns, both in order to align columns and to space columns apart. In this case, two consecutive spaces could be erroneously interpreted as two real separators with an empty field in between. In order to clearly distinguish between such cases, if consecutive field separators are added only to make the file more readable, select the checkbox

  Merge consecutive separators

When this checkbox is selected, consecutive separators are considered as one, i.e. they are merged together.

Text Delimiter

Definition

Each single datum between the field separators can be enclosed between two text delimiters so that anything inside is not to be confused with the field separator. Usually, but not necessarily, this happens when the number format has the same character of the field separator. For instance, the thousands separator can be the comma (','); e.g. the number one thousand could be written as:

  1,000

But also the field separator could be the same symbol, the comma. How to distinguish them? By putting numbers between text delimiters, it is specified that their commas are not field separators but number formatting. The field separator is not treated as such when within text delimiters. An example where the text delimiter are the double quotes:

  1/1/2000 , 1.2 , "3,400,000"

In the third field, the comma is not a field separator but a thousands separator, because it appears between double quotes.

Selecting the Text Delimiter

You select the field separator through the dedicated radio buttons. Almost always the field separator is absent or one of the following characters:

Double quotes (")
Single quotes (')

None

Usually, there is no text delimiter, like in all the previous examples. In such cases, select the radio button

  None

Double Quotes

If your file looks like the one in the example, then the text delimiter are the double quotes and the field separator is the comma (','). Select the radio button

  " (double quotes)

Now it is clear when the comma is a thousand separator (light blue) and when it is a field separator (yellow).

Single Quotes

If your file looks like the one in the example, then the text delimiter are the single quotes and the field separator is the comma (','). Select the radio button

  ' (single quotes)

Now it is clear when the comma is a thousand separator (light blue) and when it is a field separator (yellow).

Other Characters

If the text delimiter for your file is none of the above, you can explicitly set it. Select the radio button

  Other

and input the character in the adjacent text field. Press the Enter key on the keyboard after editing this field to make changes effective. In the depicted example, the text delimiter is the vertical bar (|), a.k.a. pipe. Now it is clear when the comma is a thousand separator (light blue) and when it is a field separator (yellow).

Multiple Characters

This feature, where you input the text delimiter in the text field, works even if the delimiter is not only one character, but a sequence of characters, i.e. a string. In the example, the text delimiter is the string, unrealistic, but just to show the capability:

  delim

You can enter the string in the text field and it is correctly recognized.

Skip Lines

Lines to Skip

Only lines with data, i.e. the open, high, low, close and volume, must be retained, while the others must be skipped, or ignored. Typically, an historical file at the beginning has descriptions and a header. They must all be dismissed. This might happen, although more rarely, at the end of the file, too. Empty lines must be dismissed, too and are automatically recognized.

The Skip Lines Spinners

You can select how many lines at the beginning and at the end of the file to skip with the respective spinners

  Skip the first lines
  Skip the last lines

For this task, it is convenient to select the radio buttons

  Show first and last lines

both in the Original file and in the Parsed file panes. In this example, only the very first line in the file, containing the header, must be skipped.

Fields / Columns

Now we have to identify the fields, or columns, containing the data we are interested in:

  Date
  Time
  Open
  High
  Low
  Close
  Volume

Look at the File Header First

Once again, the easiest way to do that, is to look at the header. That's exactly what it is there for. Indeed, without the header it wouldn't be obvious what kind of data each column holds. In the example, we have the following file header, highlighted in yellow:

  Date,Open,High,Low,Close,Adj Close,Volume

From the file header we can see that the time is missing, because they are classical end of day data and that there is the adjusted close, which we will dismiss.

The Text Area Header

The text area comes with a very useful header, highlighted in light blue, showing the ordinal number of the field. This way, you can easily locate the field number, because these are the values that you need. This header is relative and aligns to the first line visible in the text area. When you scroll the text area, it will change and adapt accordingly. It is most useful if you scroll the file so that the line with the file header appears first in the text area, so that you have a one-to-one correspondence between a field and its number. Fields start at number 1, not 0.
The text area header replicates the field separator, too; comma in the example.

The Spinners

You insert the number of each field through the dedicated spinners. For instance, you insert the number of the close field, or column, in the spinner

  Close

and so on.

Missing Field

If a field is missing, like the time in our example, set the spinner to 0.

Date

The date field usually contains information like:

  day / month / year

For daily data, each entry, or line in the file, is one day. For weekly data, each entry is one week, and the day might not be present. For yearly data, each entry is one year, and the day and month probably are not present. For the date field select the column number of your file containing the indication of the day, month and year, even if one of them is missing. In our example, a classical end of day, the date is the field number 1, so we set the spinner

  Date = 1

Time

The time field represents an instant within a day, typically in:

  hour : minute : second

and is present mostly only for intraday data. Sometimes, the 'second' information is missing. For the time field select the column number of your file containing the indication of the hour, minute and second, even if one of them is missing. In our example, the time is missing, so we set the spinner

  Time = 0

Date and Time Together

Sometimes, the information about date and time are condensed in one unique field, like:

  2017-11-18  09:31:57

In such a case you can put it indifferently in the date or time spinner and set the other spinner to 0. For example, if it is the first column, set the spinners like this:

  Date = 1
  Time = 0

Either the Date or the Time Must Be Present

Either the date or the time must be present. So the following configuration for the spinners is not allowed:

  Date = 0
  Time = 0

Open, High, Low and Close

Proceed the same way, by setting the spinners for open, high, low and close. In our example:

  Open = 2
  High = 3
  Low = 4
  Close = 5

Adjusted Close

As already explained in another chapter, as in this file there is information about open, high and low, you must use the close, not the adjusted close, in order to have coherent data. Hence, the adjusted close column, number 6 in the example, is simply dismissed.

Volume

Select which column contains the volume. In our example, the volume is the field number 7, so we put the spinner

  Volume = 7

If the volume field doesn't exist, set its spinner to 0.

Warnings and Errors

In order to facilitate you in this task, info, warnings and errors are given by changing the background color and the tooltip of the spinners. Read the tooltip of the spinner signaling that something might be wrong. Don't focus on the single spinner, though, but on the whole picture. Sometimes, a spinner might be set at the right value but signaling a warning just because there is another spinner set to a wrong value. Sometimes the warning or error might be generated by some other controls in the pane having the incorrect value.

Other Examples

If the columns in the file are:

    1 ,   2 ,   3 ,   4 ,  5 ,   6  ,    7
  Date, Time, Open, High, Low, Close, Volume

then select:

  Date: 1, Time: 2, Open: 3, High: 4, Low: 5, Close: 6, Volume: 7

If the columns in the file are:

    1 ,   2 ,   3 ,   4 ,  5
  Date, Open, High, Low, Close

then select:

  Date: 1, Time: 0, Open: 2, High: 3, Low: 4, Close: 5, Volume: 0

If the columns in the file are:

     1  ,   2 ,   3 
  Ticker, Date, Close

then select:

  Date: 2, Time: 0, Open: 0, High: 0, Low: 0, Close: 3, Volume: 0

  Date: 2, Time: 0, Open: 3, High: 3, Low: 3, Close: 3, Volume: 0

The latter technique about single-valued historical data is explained in the introduction of this section.

Number Format

Numbers are formatted in many different ways, mostly dependent on the language and the country. For instance, the following are different representations of the same number:

  123,456.7  ← English (United States) 
  123 456,7  ← French (France) 
  123.456,7  ← German (Germany)

When reading the historical file, Trading Conceiver must parse, or interpret, correctly the numbers.

Two Alternatives

We offer 2 methods to correctly parse the numbers:

By specifying the decimal and thousands separators.
By selecting the language.

These two methods are mutually exclusive, in that only one of them is applied. You select between the two by clicking anywhere within the area of the panes dedicated to them. Only the one you clicked is active, whilst the other is disabled.

The Original File Text Area Header

In order to make numbers stand out better, it is useful to select, in the original file pane, the radio button

  Show all file

and scroll the file past the file header and initial descriptions, until you reach the real data. This way, the text area header makes clear where the numbers start and end.

Format for the Parsed File Table

The parsed file table shows the numbers according to the user preference

  Tools → Options → Numbers

So be aware that the numbers in the original file and in the parsed file panes can be formatted in different ways, but they must represent the same numbers. In the example, the original file is in the format

  English (US)

but the numbers in the parsed file appear as selected in the options preferences, in French or German.

Decimal Separator

You select the decimal separator through the dedicated radio buttons. Look at the numbers in the original file text area and select the appropriate decimal separator. Upon selecting it, make sure the number is interpreted correctly, by looking at the parsed file pane. The numbers in the original file and in the parsed file panes must correspond. Almost always the decimal separator is one of the following characters:

Dot (.)
Comma (,)

Examples:

  123,456.7  ← dot
  123 456,7  ← comma
  123.456,7  ← comma

Other Characters

If the decimal separator for your file is none of the above, you can explicitly set it. Select the radio button

  Other

and input the character in the adjacent text field. Press the Enter key on the keyboard after editing this field to make changes effective.

Multiple Characters

This feature, where you input the decimal separator in the text field, works even if the separator is not only one character, but a sequence of characters, i.e. a string. You can enter the string in the text field and it is correctly recognized.

Thousands Separator

The thousands separator is more appropriately called the grouping separator, because it can group not only the 'thousands', but also the 'ten-thousands' in some languages. It is used to enhance readability. You select the thousands separator through the dedicated radio buttons. Look at the numbers in the original file text area and select the appropriate thousands separator. Upon selecting it, make sure the number is interpreted correctly, by looking at the parsed file pane. The numbers in the original file and in the parsed file panes must correspond. Almost always the thousands separator either is absent or is one of the following characters:

Comma (,)
Dot (.)

Examples:

  123456.7   ← none
  123,456.7  ← comma
  123.456,7  ← dot

Other Characters

If the thousands separator for your file is none of the above, you can explicitly set it. Select the radio button

  Other

and input the character in the adjacent text field. Press the Enter key on the keyboard after editing this field to make changes effective. Example:

  123 456,7  ← space

Input a space in the text field. Actually, for instance in the French locale, the space could be not a real space (space bar) but another character, looking like space, but different, like the Unicode character 0xA0 or 0x202F. If you are dealing with a certain language, you are surely familiar with its idiosyncrasies.

Multiple Characters

This feature, where you input the thousands separator in the text field, works even if the separator is not only one character, but a sequence of characters, i.e. a string. You can enter the string in the text field and it is correctly recognized.

Language

In order to correctly parse numbers, as an alternative to selecting the decimal and thousands separators, you can select the language the numbers are written in. You select the language through the dedicated combobox. Note that for the same language there are specific variants linked to countries, like:

  Spanish (Argentina)
  Spanish (Mexico)
  Spanish (Spain)
  ...

Select the appropriate one. Upon selecting it, make sure the number is interpreted correctly, by looking at the parsed file pane. The numbers in the original file and in the parsed file panes must correspond. The examples report an historical file written in French and one written in German. By selecting the right locale, the numbers are interpreted correctly. The parsed file table shows the numbers in English.

Don't Be Deceived by the Language

Don't be deceived by the country you are downloading the historical file from. It does not necessarily match the number format. For instance, an Italian provider could format numbers according to English rules. Look at the numbers.

Language: Input and Output

We highlight the fact that the Language combobox in this historical file window is used to parse numbers in the input historical file. On the contrary, the language combobox in

  Tools → Options → Numbers

is used to format the output numbers throughout Trading Conceiver.

Reverse Order

Sometimes the historical file is sorted from the most recent to the least recent date, sometimes the other way around. In the Parsed file table, dates must appear sorted from the oldest (top, first rows) to the most recent (bottom, last rows). Look at the date / time and make sure it is really that way. If it's not, you must toggle the checkbox

  Reverse line order

Basically, check this box if the most recent date appears first in your file, i.e. if time flows from bottom to top.

Final Steps

The Save Step

When you are sure the historical file is interpreted correctly, you can press the button

  Save

By saving the current format configuration, you can load it again when needed, without repeating all the previous steps.

The OK Step

The button

OK

is enabled if there are no severe errors and there is a valid license installed. If you are satisfied with how the file has been interpreted, press the 'OK' button. You are now ready to use the file.

Format and Parsing

Introduction

The Character Set

Character Set and Character Encoding

The Character Set Combobox

What to Check For

Which Character Set / Encoding to Try

Examples

Correct Character Set / Encoding

Wrong Character Set / Encoding

Wrong Character Set / Encoding

The Do not auto detect Checkbox

Field Separator

Definitions

Pinpointing the Field Separator

Look at the Header First

Don't Look Only at the Header

Selecting the Field Separator

Comma

Semicolon

Tab

Show Tabs

Space

Other Characters

Multiple Characters

Merge Consecutive Separators

Text Delimiter

Definition

Selecting the Text Delimiter

None

Double Quotes

Single Quotes

Other Characters

Multiple Characters

Skip Lines

Lines to Skip

The Skip Lines Spinners

Fields / Columns

Look at the File Header First

The Text Area Header

The Spinners

Missing Field

Date

Time

Date and Time Together

Either the Date or the Time Must Be Present

Open, High, Low and Close

Adjusted Close

Volume

Warnings and Errors

Other Examples

Number Format

Two Alternatives

The Original File Text Area Header

Format for the Parsed File Table

Decimal Separator

Other Characters

Multiple Characters

Thousands Separator

Other Characters

Multiple Characters

Language

Don't Be Deceived by the Language

Language: Input and Output

Reverse Order

Final Steps

The Save Step

The OK Step

The `Do not auto detect` Checkbox