How To Analyze and Export Data

This tutorial describes how to analyze and export stored data from previous experiments.

Step 1 – Start the application

Start Evolved Horizon by clicking the Evolved Horizon icon in your OS; this first starts the Booter, which handles licensing and application upgrades. After the program has initialized, the Launcher is shown. If you are satisfied with the current choice of user, press the Launch button.

Figure 1. Start Evolved Horizon by clicking on the icon in your OS. The Booter will check for a license and available upgrades prior to starting the main application.

Figure 2. The Evolved Horizon launcher; choose your user and click on “Launch”
Step 2 – Connect to a database

After the launch sequence is complete, the Welcome interface is shown, Figure 3. In this tutorial it is assumed that you have already collected some channel data in an existing database. If you already have a database module loaded, with entries in it, you may skip the remainder of this step; otherwise, click Modules -> Database -> Add (the "+" icon). Evolved Horizon currently supports one type of database, the "Binary File".
The Id and the Database Root Folder settings are the most important. A database is a series of files and folders located under ../<Database Root Folder>/<Id>. So, to create a new database you simply choose a suitable folder and Id; to connect to an existing one, you must match both the folder and the Id. In this tutorial we assume a database located in ../Documents/Evolved Horizon Databases/Tutorial DB/Lab Database 1/.

Figure 3. Choose modules at the welcome screen

Figure 4. Create a new database module

The structure of each database is a database table, "EHZ_database.bin", and the "Data" and "Recovery" folders. Each assembly (experiment) generates a *.ehdat file that eventually ends up in the "Data" folder. You may manually add other .ehdat files to this folder, and they will be included in the database when a reconstruct command is issued; more on that later. The database listing on the Module Interface should now resemble the figure below, Figure 5. As seen, the database consists of 75 database entries, and it is properly connected since its status is "Connected!".

Figure 5. The database modules and some of their settings and status.
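Put together, the tutorial database described above is laid out roughly like this on disk (the individual *.ehdat file names are only illustrative):

../Documents/Evolved Horizon Databases/Tutorial DB/Lab Database 1/
    EHZ_database.bin          (the database table)
    Data/
        assembly_001.ehdat    (one file per assembly)
        assembly_002.ehdat
        ...
    Recovery/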

Great – you are now ready to analyze those (75) entries! If there are no entries in the database, you may create some random entries by entering this command in the console:

create database entry -id "Lab Database 1" -nr <10>

This command will create 10 entries containing sine curves, useful for testing the Analysis Interface.

If you wish to create the database from the console, instead of the interface, you may write:

dbnew -type "file" -id "Lab Database 1"

If you wish to view the details of the database, its status, etc., write:

dbinf -complete

Finally, the content (entries) of a database may be viewed in the console by typing:

dbtable

or, if you want to be explicit about which database table to view:

dbtable -id "Lab Database 1"

Ok, you have a database with experiments ready to be analyzed!

Step 3 – View and manipulate the database table

Ok. You have now created or connected to a database and have some entries in it, either from real sensor data or generated from the console. Let's analyze the data. First go to the Analysis Interface by clicking on the icon in the Welcome screen; see Figure 6.

Figure 6. Go to the Analysis interface by clicking the button in the Welcome screen.

You are now presented with the Analysis Interface – a combined database viewer, data visualizer and signal processing tool, as seen in Figure 7.

Database Viewer

The upper half of the interface is the database viewer and manipulator area, and various parts are indicated by rectangles in Figure 7. The blue rectangle is the database selector; it lists all database modules (if any), and upon selection updates the database table (red rectangle). Some handy database manipulator buttons are indicated by the green rectangle. The database table has a number of columns showing various attributes of each entry, e.g., the index, Assembly Id, and date. You may sort and search by clicking on a column name. In addition, the relative position and width of the columns may be altered.

Figure 7. The Analysis Interface with the database components highlighted.

If you change the database by selecting another database in the selector, the table content and any plot will be cleared. The column names of the database are rather self-explanatory, but there are some special columns:

  • Comment – The only field in which you may enter your own text, e.g., to leave a comment about the quality of the data. Double-click to enter the edit mode; upon pressing "Enter", the comment is saved to the database.
  • Datapoints – The total number of data points (time and channel data) for all sessions in the assembly.
  • Big Data – A true/false flag indicating whether Evolved Horizon has determined that the dataset exceeds a threshold of what your RAM can handle. If so, other techniques take over the plotting and analysis, and some options may not be available for Big Data-marked datasets.
  • Unique ID – Each assembly has a unique ID (UUID) attached to it, both in the database viewer and in each file. The UUID makes it easy to track an individual assembly and to handle exported data.

The database manipulator buttons will be discussed below (from left to right):

  • Remove – Removes the selected entries from the view and puts them in the recycle folder (if you have not changed the default setting of the database).
  • Export – Will be discussed in step 6 below…
  • Recycle – Transfers any deleted items from the recycle folder back to the database again.
  • Reload – Reloads the database and tries to add database items from files not currently included in the database. Useful when there are files in the "Data" folder that are not in the database table.
  • Refresh – Refreshes the table view from the database. Useful when more assemblies have been added but you have not refreshed the database view by selecting a new database.

Tip: By hovering the mouse over a button, a tooltip will be displayed describing the button’s function.

Of course, you may use the console for many database-related tasks. Use dbrefresh, dbrecycle, and dbreload as the equivalents of the interface buttons described above. If for some reason you wish to completely reconstruct the database from the *.ehdat files, use the dbreconstruct command. It will clear the database table and reconstruct it from the ground up using the source files in the "../data/" folder.
For all commands above, use the "-id <database id>" option if you do not want every loaded database to be included. Without the "-id" option there is an implicit "-all true", meaning that the command will be performed on all databases. Below is a screenshot of a dbreconstruct command, Figure 8.

Figure 8. Reconstructing the database in the console using the dbreconstruct command.
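For reference, the same console operations scoped explicitly to the tutorial database (using only the -id option described above) would look like:

dbrefresh -id "Lab Database 1"
dbreload -id "Lab Database 1"
dbreconstruct -id "Lab Database 1"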

Please make sure that you back up your database now and then. The *.ehdat files in ../<database folder>/data are the only essential files; with them, any database may be reconstructed.

Tip: For best performance and security, put the database folder on a fast, local drive that is backed up regularly.

Step 4 – Visualize data

Visualizing channel data with the default plot options is simply a matter of choosing your assemblies (hold down Ctrl or Shift for multiple selection) and pressing the Visualize Data button. All channels, of all sessions, of all currently selected assemblies will be loaded and plotted.
The data is now plotted, as seen in Figure 9, with time on the x-axis and the data values on the y-axis. The y-axis label is a combination of all the channels' unique "Unit" settings. A legend is included in the top right corner. The legend entries are color-coded to match the line colors, and the strings are constructed in the following manner to differentiate the channels:
DB#<the database index> | <Date started> | <session name> | <channel name>

Figure 9. Visualize the channel data by selecting a database entry and clicking the Visualize Data button.

Relating and visualizing a subset, or all, of the available experimental data can now be done within seconds. We have tried visualizing many assemblies with millions of data points, and they are quickly rendered into sharp plots. The speed of the rendering process is, of course, helped by fast SSDs and CPUs.

Ok, this step was fairly easy and straightforward… let's try some more interesting analysis of the data, for example by changing the plot type or by applying some digital signal processing filters (DSPs).

Step 5 – Analyze Data

Before we jump into the analysis of data, we should define some handy DSPs. Evolved Horizon DSPs perform data analysis on each of the channel vectors, and they range from scaling and normalization to calculus and smoothing filters. With DSPs you have the tools to gain data-driven insights. DSPs are modules, just like a device or a channel. We chose to design Evolved Horizon that way since the DSPs often need their analysis parameters adjusted to suit your needs; we cannot create the complete range of pre-defined DSPs a priori.

Go to Welcome Screen -> Modules -> DSP and define a set of DSPs to look like the screenshot below, Figure 10. Describing the exact definition of each DSP is beyond the scope of this tutorial. In essence, we have defined three smoothing DSPs (moving mean, moving median, and polynomial smoothing), three scaling DSPs (normalization and two custom equations), and one calculus DSP (derivative). For now, you may leave the settings of these DSPs at their defaults.

Figure 10. A set of DSPs that will be used in the analysis of channel data.

Ok. We now have some DSPs ready to be used, but before we use them we will start by adjusting the plot type and colormap. Open the advanced analysis panel by clicking the two downward-facing arrows on the left-hand side. This shrinks the plot area in favor of an advanced analysis panel consisting of a plot type selector, a colormap selector, and the various DSP components. You may click the double arrow again to hide the advanced panel. Let's start by changing the plot colormap; the available ones are:

  • Earth – Ranges from yellow to blue through green. Based on MATLAB's "Parula". Default.
  • Sun – Ranges from white to red through yellow. Based on MATLAB's "Hot".
  • Space – Ranges from magenta to cyan. Based on MATLAB's "Cool".
  • Spectra – Roughly a rainbow range. Based on MATLAB's "Jet".

Figure 11. If you are not happy with our default choice of colors, change to any of the available colormaps.

There are currently four types of plots:

  • Line – Transient data are plotted with a line connecting the data points. Default.
  • Histogram – Divides the data into a number of bins of a given width. Shows the underlying distribution.
  • Box Plot – Displays the distribution of the data in a box containing 50% of all data. The box covers the interquartile range from the 25th to the 75th percentile, with the 50th percentile (median) marked as a notch. Whiskers extend to ~99% of all data, and outliers are marked with red "+" signs. Non-overlapping notches between two channels indicate a difference in median at the 5% significance level.
  • Power Spectra – Shows the transformation from the time domain to the frequency domain in the form of the Lomb-Scargle power spectral density (PSD) estimate, in dBW/Hz units.

We will show you some examples of the available plot types, except for the Line:

Figure 12. A histogram of some sine waves; the bin width is determined automatically and the height of each bar is the number of counts.
Figure 13. Box plot of the channel data. The box covers 50% of the data distribution, marked at the 25th, 50th, and 75th percentiles. The whiskers extend the data range to ~99%. Outliers are marked with red "+" signs.
Figure 14. A time->frequency transformation using the non-uniform Lomb-Scargle power spectral density (PSD) estimate. A peak indicates a high relative contribution of that frequency to the original data.
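If you want to reproduce a transformation like the one in Figure 14 outside Evolved Horizon, a minimal sketch using SciPy's Lomb-Scargle periodogram could look like the following (the signal itself is made up, and the exact scaling used by Evolved Horizon may differ):

import numpy as np
from scipy.signal import lombscargle

# Made-up, irregularly sampled signal: a 5 Hz sine with noise
rng = np.random.default_rng(0)
t = np.sort(rng.uniform(0.0, 10.0, 500))             # non-uniform sample times [s]
y = np.sin(2 * np.pi * 5.0 * t) + 0.2 * rng.standard_normal(t.size)

freqs_hz = np.linspace(0.1, 20.0, 1000)               # frequencies to evaluate [Hz]
pgram = lombscargle(t, y - y.mean(), 2 * np.pi * freqs_hz)   # expects angular frequencies

print("Dominant frequency: %.2f Hz" % freqs_hz[np.argmax(pgram)])   # ~5 Hz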

Applying DSP

Ok, up until now we have only changed the presentation of the data, except for the Power Spectra plot type. This time we will apply some of the DSPs we defined in the sections above. Your own data will respond differently to the DSPs, but the main ideas are the same. Let's start by normalizing some data using our normalization DSP. Select the normalization DSP in the selector and press the button (the Greek letter Ψ) on the right-hand side. This executes the DSP on the data and re-renders the plot. As can be seen in Figure 15 below, the minimum and maximum of each line are now zero and one, respectively. By changing the normalization settings you may scale to any other range. In this case, the normalization makes it easy to compare the magnitudes of the channels.

Figure 15. Apply a custom DSP, such as this normalization DSP. Useful for relating channel data of different magnitudes.
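The idea behind this kind of normalization is plain min-max scaling. A minimal NumPy sketch of the same math (the channel values are made up, and Evolved Horizon's own implementation may differ):

import numpy as np

def normalize_to_range(y, lo=0.0, hi=1.0):
    # Scale a channel vector so its minimum maps to lo and its maximum to hi
    y = np.asarray(y, dtype=float)
    return lo + (hi - lo) * (y - y.min()) / (y.max() - y.min())

channel = np.array([2.0, 5.0, 3.5, 9.0, 4.0])         # made-up channel data
print(normalize_to_range(channel))                    # [0.  0.43  0.21  1.  0.29] (approximately)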

Now we will try to remove the noise from the data (assuming that the jaggedness is noise). Select the previously applied DSP in the DSP list below the selector and press the X button; this removes the selected DSP. Then apply the Savitzky-Golay smoothing DSP, which we chose to label "Poly Smooth 1", and check the Show Original checkbox. In our case, the end result looks like Figure 16 below:

Figure 16. Polynomial smoothing using the Savitzky-Golay technique. The original data is drawn as a greyish line.

Since we chose to compare the current data to the original, it is easy to see that the smoothing seems to overshoot for the yellow line and somewhat undershoot for the blue line. Now we have two options: either change the settings of Poly Smooth 1, i.e., the frame window or the polynomial order, or try another DSP to better smooth the data. We chose a symmetric moving mean (average) and a moving median, with window widths of 10 and 100, respectively. The effect of these two DSPs can be seen in Figure 17 and Figure 18.

Figure 17. A moving mean smoothing DSP with a symmetric window width of 10 applied to some data.
Figure 18. A moving median smoothing DSP with a symmetric window width of 100 applied to some data.

We would say that the moving mean fails for the blue line and the moving median fails for the yellow line. The reason for this is strongly rooted in the sample frequency: the yellow line is sampled at 10 Hz and the blue at 100 Hz. The moving median is also more discrete than the more continuous mean, and it is more robust against outliers, as expected.
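For readers who want to reproduce these smoothers outside Evolved Horizon, a rough Python equivalent of the three DSPs (window sizes as in the text; the signal is made up, and the tool's exact implementations may differ) is:

import numpy as np
import pandas as pd
from scipy.signal import savgol_filter

rng = np.random.default_rng(1)
t = np.linspace(0.0, 10.0, 1001)
y = np.sin(t) + 0.3 * rng.standard_normal(t.size)     # made-up noisy channel

poly_smooth = savgol_filter(y, window_length=51, polyorder=3)                               # Savitzky-Golay
moving_mean = pd.Series(y).rolling(10, center=True, min_periods=1).mean().to_numpy()        # window width 10
moving_median = pd.Series(y).rolling(100, center=True, min_periods=1).median().to_numpy()   # window width 100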
Try it yourself: apply our "Outlier DSP" to see if you can filter out the outliers.

We continue by showing a custom scaling DSP. You may scale the data with respect to both time and value; see below for a sine perturbation of the data. Virtually any mathematical function in the MATLAB language is available, and that is a lot!

Figure 19. A custom scaling DSP applying a sine perturbation to the data.
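As an illustration of what such a custom equation can do, a sine perturbation of the data might look like this outside the tool (the equation, amplitude, and frequency below are made up; inside Evolved Horizon the same thing would be expressed with MATLAB functions in the DSP settings):

import numpy as np

t = np.linspace(0.0, 10.0, 1001)                      # made-up time vector [s]
y = np.linspace(0.0, 1.0, 1001)                       # made-up channel values

y_perturbed = y + 0.1 * np.sin(2 * np.pi * 0.5 * t)   # add a slow 0.5 Hz sine on top of the data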

Now we shall demonstrate the true strength of DSPs: stacking multiple DSPs on top of each other for a combined effect. This allows for, e.g., smoothing, scaling, and integration, in that order. Removing any DSP will cause the analytical engine to recalculate the entire DSP stack. The Show Previous checkbox is now useful; before, it was the same as Show Original.
Show Previous overlays the current data with the output of the preceding DSP step, allowing you to see the progress of each step. See the example figure below, Figure 20, for a multiple-DSP stack.

Figure 20. Many applied DSPs stack and produce a final plot. Gaining data-driven insights has never been easier!
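Conceptually, a DSP stack is just function composition: each DSP takes the previous DSP's output as its input, and removing any DSP triggers a recalculation of the whole stack. A minimal Python sketch of the idea (the particular DSPs and data are made up):

import numpy as np
from scipy.signal import savgol_filter

def smooth(y):
    return savgol_filter(y, 51, 3)                    # polynomial smoothing

def normalize(y):
    return (y - y.min()) / (y.max() - y.min())        # min-max scaling

def derivative(y, dt=0.01):
    return np.gradient(y, dt)                         # numerical derivative

dsp_stack = [smooth, normalize, derivative]           # applied in this order

y = np.cumsum(np.random.default_rng(2).standard_normal(1000))   # made-up channel
stages = [y]
for dsp in dsp_stack:
    stages.append(dsp(stages[-1]))                    # keep every intermediate, like "Show Previous"
result = stages[-1]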

We are almost done with this step, just a couple of tips:

  • The plot type visualizes the current data, i.e., the combination of any applied DSPs.
  • The effect of the DSPs on the raw data is only temporary: you cannot alter the raw, original data as stored in the database.

Step 6 – Export Data

As a researcher you probably want your data collection to be smooth, flexible, and robust. You will probably also value a structured way to view, relate, and analyze the experiments, as shown above, for example to be able to test a hypothesis. But we totally understand that, in the end, you want the data exported: as a summary, to a more general file format, for further post-processing, or to have the figures published with the fonts, colors, etc. that you prefer. Don't fret, Evolved Horizon has many export options.

Basically, you may either export the data from the database table in the upper part of the analysis interface, or you may export the current visualized data (with the applied DSPs).

Database Export

The Export button exports the metadata (settings, Ids, etc.) and the raw data from each of the selected entries to either Excel or CSV format. You will be prompted for a filename and location in a dialog, and the resulting filename will be <your filename>_<UUID>.<format>, e.g., "My export_16fcee28-6de2-4a35-9661-dd17c1e5956c.xlsx". Thus, any data stored in the database may easily be transferred to other systems and formats using the Export button. Choosing many database entries results in many files (one file for each entry).

The data files are based on binary mat-files (.mat) that may easily be opened in MATLAB. Your data will never be locked away in a Stardots proprietary format!
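For readers without MATLAB: assuming the files are ordinary (pre-v7.3) binary mat-files, as the text suggests, they can also be read with SciPy. A small sketch (the file name and any variable names inside are illustrative, not documented):

from scipy.io import loadmat

data = loadmat("some_assembly.ehdat", appendmat=False)   # appendmat=False keeps the .ehdat extension
print(data.keys())   # the stored variables, plus __header__, __version__ and __globals__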

Visualization Export

You have three main export options for the visualization export, reachable via the three right-most buttons at the bottom of the interface. The export buttons are, from left to right:

  1. Data – Export the current data in CSV or Excel format.
  2. Image – Save the current plot as an image in the following image formats: .png, .jpg, .eps, .pdf.
  3. Snapshot – Take a snapshot of the current plot. In this snapshot (a MATLAB figure) you may perform many actions: zoom, pan, rotate, label data points, change the legend, print, save, and export to an image. You may open the saved figure (.fig) if you have MATLAB installed; from MATLAB you may alter just about anything in the plot, including colors, annotations, etc. The MATLAB figure format is very common in the R&D community.

Figure 21. Two snapshots with data exploration, zooming, printing, and exporting options.

Now – go hunt for that insight!