The advent of data

Over the years, there have been several approaches to storing data. It started with the punch card and progressed to magnetic storage such as tape and disk. Then came optical storage such as compact discs and DVDs, then flash memory such as solid-state drives, and finally the cloud storage available today.

Each step improved the capacity and portability of data. As an example, the Williams-Kilburn tube, invented in 1947, was the first fully electronic storage of data and was about the size of two soda cans stacked on top of each other. You would need 72 of these to store the equivalent of a single JPG image. Contrast that with a high-capacity micro-SD card today. It is just a little larger than a couple of Tic Tacs and can store up to a terabyte, or about 450,000 JPG images.
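The micro-SD comparison is simple division. The following is a minimal sketch of that back-of-the-envelope arithmetic. It assumes a decimal terabyte (10^12 bytes), which is how storage vendors usually advertise capacity. The helper names are hypothetical.

```python
# Back-of-the-envelope storage arithmetic for the micro-SD comparison.
# Assumption: "terabyte" means a decimal terabyte (10**12 bytes).

TERABYTE = 10**12  # bytes (decimal)


def implied_average_size(capacity_bytes: int, file_count: int) -> float:
    """Average file size implied by fitting `file_count` files on a device."""
    return capacity_bytes / file_count


def files_that_fit(capacity_bytes: int, avg_file_bytes: int) -> int:
    """Number of whole files of a given average size that fit on a device."""
    return capacity_bytes // avg_file_bytes


if __name__ == "__main__":
    avg = implied_average_size(TERABYTE, 450_000)
    print(f"Implied average JPG size: {avg / 1e6:.2f} MB")
    print(files_that_fit(TERABYTE, int(avg)))  # round trip back to the image count
```

Under that assumption, the 450,000-image figure corresponds to an average JPG of roughly 2.2 MB. Real JPG sizes vary widely with resolution and compression, so the count shifts accordingly.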

As the ability to store data has improved, so has the ability to access and use it. Data structures have improved over time, but their purpose remains the same: to organize a collection of data values, the relationships among them, and the operations that can be applied to them. There are non-relational structures such as ADABAS, an inverted-list database. Then there are the relational structures most people are familiar with, implemented in relational data stores such as DB2, SQL Server, Oracle, and Access. There are also graph databases that relate data as nodes and edges. Familiar names include Neo4j and Amazon Neptune, and multi-model stores such as MarkLogic can hold graph data as well. If you want to explore even more database types, there is a fairly complete list here. The number can be overwhelming because, as in most areas of technology, someone is always trying to invent something better.

Creating a data emulsion

You can really create magic when you mix models together. As part of the Cambria Integration Platform Solution (CIPS), we have implemented Snowflake, a cloud data platform. It runs entirely in the cloud and scales elastically. It also stores and queries both structured (relational) and semi-structured (JSON, XML, and similar) data in one place. What makes this so powerful is that not every data element you consume needs its own column in a table. For example, as part of the platform we ingest a JSON or XML message that has nine data elements. We store six of those elements as columns in a relational table, along with a pointer to the stored message. We can then query the six relational columns with standard SQL and join on the pointer to the message. Dot notation then reads the other three elements from the message as if it were all relational data. Consider the example below:

We store some of the elements from the incoming data stream in a relational table and keep the rest of the message in its raw form.
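The following is a minimal sketch of the ingestion split. It assumes a hypothetical nine-element order message in JSON; XML would need a different parser. Six elements become relational columns, and the whole message is kept as-is under a shared `message_id` pointer. In Snowflake the raw message would typically land in a VARIANT column; here it is simply a JSON string. The field names, `SAMPLE_MESSAGE`, and `split_message` are illustrative only and are not part of CIPS.

```python
import json
import uuid

# Hypothetical: the six of the nine elements promoted to relational columns.
RELATIONAL_FIELDS = (
    "order_id", "customer_id", "order_date", "status", "total", "currency",
)

# Hypothetical nine-element message. carrier, tracking_number, and notes are
# the three elements that stay only in the raw message.
SAMPLE_MESSAGE = json.dumps({
    "order_id": "SO-1001", "customer_id": "C-42", "order_date": "2024-05-01",
    "status": "shipped", "total": 129.95, "currency": "USD",
    "carrier": "UPS", "tracking_number": "TRK-0001", "notes": "Leave at door",
})


def split_message(raw: str) -> tuple[dict, dict]:
    """Split one incoming JSON message into a relational row and a raw record.

    The row keeps only RELATIONAL_FIELDS plus a `message_id` pointer. The raw
    record keeps the entire original message under that same id, so elements
    that were not promoted to columns can still be reached later by a join.
    """
    message = json.loads(raw)
    message_id = str(uuid.uuid4())  # pointer shared by both records
    # Missing elements become None, which maps to SQL NULL.
    row = {field: message.get(field) for field in RELATIONAL_FIELDS}
    row["message_id"] = message_id
    raw_record = {"message_id": message_id, "payload": raw}
    return row, raw_record


if __name__ == "__main__":
    row, raw_record = split_message(SAMPLE_MESSAGE)
    print(row)  # six promoted columns plus the message_id pointer
    print(raw_record["message_id"] == row["message_id"])
```

Keeping the full message rather than only the leftover three elements means a future column can be backfilled from the raw record without re-ingesting anything.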

We can create a query that joins the structured and semi-structured data into a single response.
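Here is a minimal, runnable sketch of the same pattern. It uses Python's built-in `sqlite3` module as a stand-in for Snowflake, so it is not Snowflake code. The table names, the sample message, and the `json_path` helper are hypothetical. `json_path` plays the role of Snowflake's path notation by reading an element out of the stored message. The join is saved as a view so it can be queried with plain SQL.

```python
import json
import sqlite3


def json_path(payload: str, path: str):
    """Read one element from a stored JSON message using a dotted path.

    Hypothetical helper standing in for Snowflake's path notation on a
    VARIANT column. Returns None (SQL NULL) if the path is missing.
    """
    value = json.loads(payload)
    for key in path.split("."):
        if not isinstance(value, dict) or key not in value:
            return None
        value = value[key]
    # SQLite stores only scalars, so nested objects/arrays come back as JSON text.
    return json.dumps(value) if isinstance(value, (dict, list)) else value


conn = sqlite3.connect(":memory:")
conn.create_function("json_path", 2, json_path)
# Let the view below call an application-defined function; some SQLite builds
# default this setting to OFF. Older SQLite versions ignore the pragma.
conn.execute("PRAGMA trusted_schema = ON")

conn.executescript("""
    -- Six promoted elements plus a pointer to the stored message.
    CREATE TABLE orders (
        message_id  TEXT PRIMARY KEY,
        order_id    TEXT,
        customer_id TEXT,
        order_date  TEXT,
        status      TEXT,
        total       REAL,
        currency    TEXT
    );
    -- The complete original message, kept as raw JSON text.
    CREATE TABLE raw_messages (
        message_id  TEXT PRIMARY KEY,
        payload     TEXT
    );
""")

message = {  # hypothetical nine-element message
    "order_id": "SO-1001", "customer_id": "C-42", "order_date": "2024-05-01",
    "status": "shipped", "total": 129.95, "currency": "USD",
    "carrier": "UPS", "tracking_number": "TRK-0001", "notes": "Leave at door",
}
conn.execute("INSERT INTO raw_messages VALUES (?, ?)", ("m-1", json.dumps(message)))
conn.execute(
    "INSERT INTO orders VALUES (?, ?, ?, ?, ?, ?, ?)",
    ("m-1", *(message[k] for k in
              ("order_id", "customer_id", "order_date", "status", "total", "currency"))),
)

# Join the relational columns with elements read from the raw message,
# and save the query as a view.
conn.execute("""
    CREATE VIEW order_details AS
    SELECT o.order_id, o.customer_id, o.order_date, o.status, o.total, o.currency,
           json_path(r.payload, 'carrier')         AS carrier,
           json_path(r.payload, 'tracking_number') AS tracking_number,
           json_path(r.payload, 'notes')           AS notes
    FROM orders AS o
    JOIN raw_messages AS r ON r.message_id = o.message_id
""")

# The view is queried with plain SQL, as if every column were relational.
for row in conn.execute("SELECT order_id, status, carrier FROM order_details"):
    print(row)  # ('SO-1001', 'shipped', 'UPS')
```

In Snowflake itself, the `json_path` calls would typically be written with path notation against a VARIANT column, for example `r.payload:carrier::string`. No user-defined function is needed there.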

From the example above, you can see the power of having access to an open-ended set of data elements without having to alter the relational schema every time you want to use a new one. The query can also be saved as a view, and that view can be queried with traditional SQL as if all the data were relational. As any good Gen Xer would say, “Dang, that’s rad!” This is just one example of the power of modern data structures, and there are undoubtedly many more.

In a nutshell, data storage and data structures have come a long way, and they will undoubtedly continue to evolve and adapt as AI becomes more ingrained in our solutions. Data is, of course, the driver of AI and has become the new currency of the digital world.