Format and Parsing
In this chapter, we describe the settings to correctly parse the format of the historical data file.
Introduction
In order to correctly parse the historical file, many non-trivial aspects must be taken into account. We suggest you read this chapter even if the auto-detect feature works like a charm for your data, because here you can develop the required sensibility in order to evaluate the correctness of the parsing.The Character Set
Character Set and Character Encoding
You don't need to know the theory behind character set and character encoding, or the difference between the two concepts. All you need to know is that you must select the right one, so that the file appears intelligible. If the wrong character set / encoding is selected, the file results in being gibberish, with 'strange' characters.The Character Set Combobox

Character setA brief description of the character encoding is put in the label below the combobox.
What to Check For
Upon selecting a character set, look at theOriginal file
pane.
If you can read the file, you selected a suitable character set; if you can't, try another one.
Only the data count:
Date / time Open High Low Close VolumeThe header or other descriptive lines present in the file will be dismissed. So focus on data, without bothering much for the writings. With the wrong setting, all parsing results in errors, clearly visible everywhere, e.g. in spinners, in the parsed file and inconsistencies panes.
Which Character Set / Encoding to Try
Nowadays, the only character encoding used should be theUTF-8That is the first one to try. If it doesn't work, for western languages start trying, in the order:
- windows-1252 - US-ASCII - ISO-8859-1In all the other cases try selecting those whose name might be the right ones based on the country. For instance, if you downloaded data from a Japanese provider, try one of the
Japanese
encoding.
Examples
Correct Character Set / Encoding

Wrong Character Set / Encoding

Wrong Character Set / Encoding

The Do not auto detect
Checkbox

Do not auto detectThis way, it will not be overridden each time you run the auto-detect routine.
Field Separator
Definitions
Each line of the historical file constitutes one entry and contains data such as:Date / time Open High Low Close VolumeEach of these data represents a field. Being on the same line, they must be separated by a specific character, so that it is clear where fields begin and end. Such specific character is the field separator. As fields are usually tabled in columns, the field separator is also known as column separator. Fields separators delimit columns data. Indeed, the
Parsed file
pane shows fields in columns, where each column represents a field.
Pinpointing the Field Separator
The target of this paragraph is to pinpoint the field separator. When the field separator is selected properly, you will see each field in a different column in theParsed file
pane.
So, upon selecting a field separator, always check the Parsed file
table.
If the separator is the right one, each field will occupy one column.
While if it is a wrong separator, many kinds of errors can happen, like multiple fields being placed in the same column,
or one field being split into two parts with each part being placed in a different column.
Other errors likely will appear, as explained later on, but for now focus on the separation of the fields in the table.
Look at the Header First
Arguably the easiest way to identify the field separator is to look at the header, i.e. usually the first and only line reporting the column order, like:Date;Open;High;Low;Close;VolumeThe field separator for this line is the semicolon (;) and it is usually the same adopted for the rest of the file. Looking at lines with numbers may be more confusing, because, in some formatting, numbers can have commas and dots, like
1,234.56Separating them with semicolons:
2,120.36;2,182.30;2,100.59;2,164.44the semicolon is not readily apparent as separator.
Don't Look Only at the Header
Nonetheless, it can happen that the separator for the header is not the same as the one used for the other lines. The important lines are those containing data, not the header. So always look also at lines with data.Selecting the Field Separator

- Comma (,)
- Semicolon (;)
- Tab
- Space
Comma

Comma (,)
Semicolon

Semicolon (;)
Tab

Show Tabs

Show tabsin the
Original file
pane. With the checkbox selected, each tab will be substituted with the string:
<tab>Now you are sure the tab is the separator you are looking for. Select the radio button
Tab
Space

Space
Other Characters

Otherand input the character in the adjacent text field. Press the
Enter
key on the keyboard after editing this field to make changes effective.
In the depicted example, the field separator is the vertical bar (|), a.k.a. pipe.
Multiple Characters

sepYou can enter the string in the text field and it is correctly recognized.
Merge Consecutive Separators

Merge consecutive separatorsWhen this checkbox is selected, consecutive separators are considered as one, i.e. they are merged together.
Text Delimiter
Definition
Each single datum between the field separators can be enclosed between two text delimiters so that anything inside is not to be confused with the field separator. Usually, but not necessarily, this happens when the number format has the same character of the field separator. For instance, the thousands separator can be the comma (','); e.g. the number one thousand could be written as:1,000But also the field separator could be the same symbol, the comma. How to distinguish them? By putting numbers between text delimiters, it is specified that their commas are not field separators but number formatting. The field separator is not treated as such when within text delimiters. An example where the text delimiter are the double quotes:
1/1/2000 , 1.2 , "3,400,000"In the third field, the comma is not a field separator but a thousands separator, because it appears between double quotes.
Selecting the Text Delimiter

- Double quotes (")
- Single quotes (')
None
Usually, there is no text delimiter, like in all the previous examples. In such cases, select the radio buttonNone
Double Quotes

" (double quotes)Now it is clear when the comma is a thousand separator (light blue) and when it is a field separator (yellow).
Single Quotes

' (single quotes)Now it is clear when the comma is a thousand separator (light blue) and when it is a field separator (yellow).
Other Characters

Otherand input the character in the adjacent text field. Press the
Enter
key on the keyboard after editing this field to make changes effective.
In the depicted example, the text delimiter is the vertical bar (|), a.k.a. pipe.
Now it is clear when the comma is a thousand separator (light blue) and when it is a field separator (yellow).
Multiple Characters

delimYou can enter the string in the text field and it is correctly recognized.
Skip Lines
Lines to Skip

The Skip Lines Spinners
You can select how many lines at the beginning and at the end of the file to skip with the respective spinnersSkip the first lines Skip the last linesFor this task, it is convenient to select the radio buttons
Show first and last linesboth in the
Original file
and in the Parsed file
panes.
In this example, only the very first line in the file, containing the header, must be skipped.
Fields / Columns

Date Time Open High Low Close Volume
Look at the File Header First
Once again, the easiest way to do that, is to look at the header. That's exactly what it is there for. Indeed, without the header it wouldn't be obvious what kind of data each column holds. In the example, we have the following file header, highlighted in yellow:Date,Open,High,Low,Close,Adj Close,VolumeFrom the file header we can see that the time is missing, because they are classical end of day data and that there is the adjusted close, which we will dismiss.
The Text Area Header
The text area comes with a very useful header, highlighted in light blue, showing the ordinal number of the field. This way, you can easily locate the field number, because these are the values that you need. This header is relative and aligns to the first line visible in the text area. When you scroll the text area, it will change and adapt accordingly. It is most useful if you scroll the file so that the line with the file header appears first in the text area, so that you have a one-to-one correspondence between a field and its number. Fields start at number 1, not 0.The text area header replicates the field separator, too; comma in the example.
The Spinners
You insert the number of each field through the dedicated spinners. For instance, you insert the number of the close field, or column, in the spinnerCloseand so on.
Missing Field
If a field is missing, like the time in our example, set the spinner to 0.Date
The date field usually contains information like:day / month / yearFor daily data, each entry, or line in the file, is one day. For weekly data, each entry is one week, and the day might not be present. For yearly data, each entry is one year, and the day and month probably are not present. For the date field select the column number of your file containing the indication of the day, month and year, even if one of them is missing. In our example, a classical end of day, the date is the field number 1, so we set the spinner
Date = 1
Time
The time field represents an instant within a day, typically in:hour : minute : secondand is present mostly only for intraday data. Sometimes, the 'second' information is missing. For the time field select the column number of your file containing the indication of the hour, minute and second, even if one of them is missing. In our example, the time is missing, so we set the spinner
Time = 0
Date and Time Together
Sometimes, the information about date and time are condensed in one unique field, like:2017-11-18 09:31:57In such a case you can put it indifferently in the date or time spinner and set the other spinner to 0. For example, if it is the first column, set the spinners like this:
Date = 1 Time = 0
Either the Date or the Time Must Be Present
Either the date or the time must be present. So the following configuration for the spinners is not allowed:Date = 0 Time = 0
Open, High, Low and Close
Proceed the same way, by setting the spinners for open, high, low and close. In our example:Open = 2 High = 3 Low = 4 Close = 5
Adjusted Close
As already explained in another chapter, as in this file there is information about open, high and low, you must use the close, not the adjusted close, in order to have coherent data. Hence, the adjusted close column, number 6 in the example, is simply dismissed.Volume
Select which column contains the volume. In our example, the volume is the field number 7, so we put the spinnerVolume = 7If the volume field doesn't exist, set its spinner to 0.
Warnings and Errors
In order to facilitate you in this task, info, warnings and errors are given by changing the background color and the tooltip of the spinners. Read the tooltip of the spinner signaling that something might be wrong. Don't focus on the single spinner, though, but on the whole picture. Sometimes, a spinner might be set at the right value but signaling a warning just because there is another spinner set to a wrong value. Sometimes the warning or error might be generated by some other controls in the pane having the incorrect value.Other Examples
If the columns in the file are:1 , 2 , 3 , 4 , 5 , 6 , 7 Date, Time, Open, High, Low, Close, Volumethen select:
Date: 1, Time: 2, Open: 3, High: 4, Low: 5, Close: 6, Volume: 7If the columns in the file are:
1 , 2 , 3 , 4 , 5 Date, Open, High, Low, Closethen select:
Date: 1, Time: 0, Open: 2, High: 3, Low: 4, Close: 5, Volume: 0If the columns in the file are:
1 , 2 , 3 Ticker, Date, Closethen select:
Date: 2, Time: 0, Open: 0, High: 0, Low: 0, Close: 3, Volume: 0or
Date: 2, Time: 0, Open: 3, High: 3, Low: 3, Close: 3, Volume: 0The latter technique about single-valued historical data is explained in the introduction of this section.
Number Format
Numbers are formatted in many different ways, mostly dependent on the language and the country. For instance, the following are different representations of the same number:123,456.7 ← English (United States) 123 456,7 ← French (France) 123.456,7 ← German (Germany)When reading the historical file, Trading Conceiver must parse, or interpret, correctly the numbers.
Two Alternatives

- By specifying the decimal and thousands separators.
- By selecting the language.
The Original File Text Area Header

Show all fileand scroll the file past the file header and initial descriptions, until you reach the real data. This way, the text area header makes clear where the numbers start and end.
Format for the Parsed File Table

Tools → Options → NumbersSo be aware that the numbers in the original file and in the parsed file panes can be formatted in different ways, but they must represent the same numbers. In the example, the original file is in the format
English (US)but the numbers in the parsed file appear as selected in the options preferences, in French or German.
Decimal Separator

- Dot (.)
- Comma (,)
123,456.7 ← dot 123 456,7 ← comma 123.456,7 ← comma
Other Characters
If the decimal separator for your file is none of the above, you can explicitly set it. Select the radio buttonOtherand input the character in the adjacent text field. Press the Enter key on the keyboard after editing this field to make changes effective.
Multiple Characters
This feature, where you input the decimal separator in the text field, works even if the separator is not only one character, but a sequence of characters, i.e. a string. You can enter the string in the text field and it is correctly recognized.Thousands Separator

- Comma (,)
- Dot (.)
123456.7 ← none 123,456.7 ← comma 123.456,7 ← dot
Other Characters
If the thousands separator for your file is none of the above, you can explicitly set it. Select the radio buttonOtherand input the character in the adjacent text field. Press the Enter key on the keyboard after editing this field to make changes effective. Example:
123 456,7 ← spaceInput a space in the text field. Actually, for instance in the French locale, the space could be not a real space (space bar) but another character, looking like space, but different, like the Unicode character 0xA0 or 0x202F. If you are dealing with a certain language, you are surely familiar with its idiosyncrasies.
Multiple Characters
This feature, where you input the thousands separator in the text field, works even if the separator is not only one character, but a sequence of characters, i.e. a string. You can enter the string in the text field and it is correctly recognized.Language

Spanish (Argentina) Spanish (Mexico) Spanish (Spain) ...Select the appropriate one. Upon selecting it, make sure the number is interpreted correctly, by looking at the parsed file pane. The numbers in the original file and in the parsed file panes must correspond. The examples report an historical file written in French and one written in German. By selecting the right locale, the numbers are interpreted correctly. The parsed file table shows the numbers in English.
Don't Be Deceived by the Language
Don't be deceived by the country you are downloading the historical file from. It does not necessarily match the number format. For instance, an Italian provider could format numbers according to English rules. Look at the numbers.Language: Input and Output
We highlight the fact that theLanguage
combobox in this historical file window is used to parse numbers in the input historical file.
On the contrary, the language combobox in
Tools → Options → Numbersis used to format the output numbers throughout Trading Conceiver.
Reverse Order

Parsed file
table, dates must appear sorted from the oldest (top, first rows) to the most recent (bottom, last rows).
Look at the date / time and make sure it is really that way.
If it's not, you must toggle the checkbox
Reverse line orderBasically, check this box if the most recent date appears first in your file, i.e. if time flows from bottom to top.
Final Steps
The Save Step

SaveBy saving the current format configuration, you can load it again when needed, without repeating all the previous steps.
The OK Step

OKis enabled if there are no severe errors and there is a valid license installed. If you are satisfied with how the file has been interpreted, press the 'OK' button. You are now ready to use the file.