Numerical Cruncher



Data Configuration Files



Configuration files specify all the meta-information that Numerical Cruncher needs to access its input data.

The syntax of these files is similar to that of Microsoft Windows .INI files. Each section is headed by a bracket-enclosed line. Sections can appear in any order (the full CFG file is loaded into memory before being interpreted), and comments are allowed on lines beginning with ';'.
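
The syntax rules above can be illustrated with a minimal reader. This is only a sketch, not the parser NC itself uses: the function name parse_cfg and the sample dataset are hypothetical, and it assumes that leading whitespace before ';' or '[' is ignored and that each section's content is the list of non-empty lines that follow its header.

```python
def parse_cfg(text):
    """Return {section name: [content lines]} for an INI-like CFG text."""
    sections = {}
    current = None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(";"):
            continue  # skip blank lines and ';' comment lines
        if line.startswith("[") and line.endswith("]"):
            # A bracket-enclosed line opens a new section.
            current = line[1:-1].strip()
            sections[current] = []
        elif current is not None:
            # Any other line belongs to the most recently opened section.
            sections[current].append(line)
    return sections


# Hypothetical sample: the dataset and attribute names are made up.
SAMPLE = """\
; sections may appear in any order
[ATTRIBUTES]
sepal_length
petal_length
species

[FORMAT]
ASCII

[DESCRIPTION]
Sample flower measurements
"""

cfg = parse_cfg(SAMPLE)
print(cfg["FORMAT"])
print(cfg["ATTRIBUTES"])
```

Because the whole text is read before any section is looked up, the order in which the sections appear in the file does not matter.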

NC data configuration files can contain several kinds of sections. Some of them are common to all CFG files; others are specific to a particular data format.

Common sections

[DESCRIPTION] contains an ASCII description of the dataset that the CFG file refers to. The first line of this description is used as the title of the NC main window.

[FORMAT] specifies the kind of data to be accessed. Allowed values are ASCII (standard files as used in other packages such as C4.5 or LVQ_PAK), IMAGE (data contained in a set of RAW images) and JDBC (when Java DataBase Connectivity is used to access the data).

[ATTRIBUTES] must contain the list of pattern components, one identifier per line. Each ID corresponds to the name of a RAW image file (IMAGE format) or to a table column name (JDBC format); in the ASCII format it can be merely descriptive.

[CLASSIFIER] contains the ID of the attribute used for pattern classification. This section is optional. If it is omitted, you will not be able to test classification algorithms on the current input data.

Finally, [CLASSES] enumerates all the categories allowed for the input data. Like the previous section, it is optional.
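
The rules for the common sections can be summarised as a small consistency check. The sketch below is illustrative only: the function names and the sample values are hypothetical, the configuration is assumed to be already loaded as a dictionary mapping each section name to its list of lines, and two points are assumptions rather than documented behaviour (that format names are case-sensitive, and that the [CLASSIFIER] ID must match one of the IDs listed in [ATTRIBUTES]).

```python
ALLOWED_FORMATS = {"ASCII", "IMAGE", "JDBC"}


def check_common_sections(cfg):
    """Return a list of problems found in the common sections."""
    problems = []
    fmt = cfg.get("FORMAT", [])
    if len(fmt) != 1 or fmt[0] not in ALLOWED_FORMATS:
        problems.append("FORMAT must be one of ASCII, IMAGE or JDBC")
    attributes = cfg.get("ATTRIBUTES", [])
    if not attributes:
        problems.append("ATTRIBUTES must list at least one identifier")
    classifier = cfg.get("CLASSIFIER")
    if classifier is not None:
        # Assumption: the classifier ID must be one of the listed attributes.
        if len(classifier) != 1 or classifier[0] not in attributes:
            problems.append("CLASSIFIER must name exactly one listed attribute")
    return problems


def can_test_classification(cfg):
    """Classification tests need a valid, present [CLASSIFIER] section."""
    return "CLASSIFIER" in cfg and not check_common_sections(cfg)


# Hypothetical configuration, already split into sections.
cfg = {
    "DESCRIPTION": ["Sample flower measurements"],
    "FORMAT": ["ASCII"],
    "ATTRIBUTES": ["sepal_length", "petal_length", "species"],
    "CLASSIFIER": ["species"],
    "CLASSES": ["setosa", "versicolor"],
}
print(check_common_sections(cfg))
print(can_test_classification(cfg))
```

A configuration without [CLASSIFIER] still passes the check, but can_test_classification returns False for it, which mirrors the restriction described above.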

ASCII format

At a minimum, you must specify the file that contains the data in the [DATA FILE] section. NC will then randomly split the data into two subsets, the training set and the test set. The test set holds 30% of the available patterns.
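
The default random split described above can be sketched as follows. This is not NC's actual implementation: the function name split_patterns, the seed parameter and the rounding of the test-set size to the nearest integer are all assumptions made for illustration.

```python
import random


def split_patterns(patterns, test_fraction=0.30, seed=None):
    """Randomly split patterns into (training set, test set)."""
    rng = random.Random(seed)      # seed is optional, for repeatable splits
    shuffled = list(patterns)
    rng.shuffle(shuffled)
    # Assumption: the test-set size is rounded to the nearest integer.
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]


training, test = split_patterns(range(10), seed=0)
print(len(training), len(test))  # 7 3
```

Every pattern ends up in exactly one of the two subsets, so the training set receives the remaining 70% of the data.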

You can also fix the patterns in the training and test sets in advance. Use the [LEARNING FILE] and [TEST FILE] sections for that purpose. These sections must contain the names of the files that hold the training set and the test set, respectively.

IMAGE format

When data is stored as a set of RAW images, the size of those images must be indicated in the [WIDTH] and [HEIGHT] sections.

You must also use the sections [TRAINING FILE], [LEARNING FILE] and [TEST FILE] to reference the files where the patterns are labelled.

JDBC format

When a JDBC connection must be established to access data, some parameters are needed: