GeistHaus
log in · sign up

Qlik Dork

Part of qlikdork.com

Let's make it simple

stories primary
Step by Step… Into Garbage Data Quality: My Smart Knee Experience
Data Quality

My Smart Knee In December of 2023 I had a fight with a step ladder, and it won. As a result I’ve needed 3 surgeries and the most recent was a total knee replacement. Doesn’t sound fun, but for a… Read more ›

The post Step by Step… Into Garbage Data Quality: My Smart Knee Experience appeared first on Qlik Dork.

Show full content
My Smart Knee

In December of 2023 I had a fight with a step ladder, and it won. As a result I’ve needed 3 surgeries and the most recent was a total knee replacement. Doesn’t sound fun, but for a data guy I was really excited that my surgeon was going to install a smart knee. No, not a bionic knee like the Six Million Dollar Man. Just like any other TKR device but with the added value of a blue tooth communication device in it that could report data on my progress. Visions of bar charts, and trend charts, and best practices for visualizing knee data danced through my head as the surgeon went to work.

Alas, I have no best practices in visualization to offer you. Because the data obtained is hot garbage.

In their book “Data Quality Fundamentals: A Practioner’s Guide to Building Trustworthy Data Pipelines” the authors used a phrase I will never forget:

“Data for the sake of data is as useful as a fish riding a bicycle.”

The irony of it is that I’m at the stage in my recovery where I’m supposed to start using a stationary bike to gain flexibility and motion. As I go through pain later today doing that, at least I will have a smile on my face from the image I generated for this post.

Freshness

One of the overlays in a Venn Diagram for Data Quality and Modern Data Stack Pipelines would be freshness. If data isn’t flowing then the data pipeline is clogged up, and we all know how bad it smells when pipes are clogged up. The same goes for Data Quality. If the data is stale it simply can’t be trusted, and nobody is going to act on it. Before even seeing a single data value, my brain was already triggered … what do you mean you don’t know when you last talked to the device? The single easiest piece of data I would expect from a “smart device” is the ability to “log/trace” the communications. But here I was confronted with the inability to even know how fresh the data was when I looked at it. For the sake of this post I will refer to a Dork Trust Scale that goes from 0 – I would never act on this data to 10 – I will believe this data and use it to motivate me. I was already lowering my expectations and we will say I was at an 8 on my Dork Trust Scale.

Data vs Gut Driven

I’ve been in the industry long enough that I’ve challenged a lot of people with the old “You need to be more DATA DRIVEN, and less GUT DRIVEN” phrase. Yet, there I sat as the CEO of my Smart Knee looking at this chart. Instantly my gut took over and said “This is WRONG” “This CAN’T be right” “I DON’T TRUST this.”

I’m not trying to give you TMI but it takes me at least 15 steps to get from my bed to the restroom. It takes 20 to get from my reclining chair to the kitchen or the bathroom. Thus, my gut was telling … me this data was hot garbage. Before completely lowering the value on my Dork Trust Scale, I had to look at the “labels” on the chart. It was titled “qualified steps.” What does that mean? So, I pressed the question mark in the upper right corner expecting to see the semantic definitions for their “terms.” Bummer was it took me to a contact phone number. Which I did call and did spend 30 minutes on-line with and was told that the term “Qualified Steps” was rather technical. While I only partially understood their definition, the fact that I tried to express to him is that I had left my house on Saturday and walked quite a bit and walked many times on Sunday actually feeling good that my “gate” was improving. But they showed 0 steps. The complete absence of values for 2 whole days was simply to much for my gut to handle. My Dork Trust Scale had dropped to a 5.

Data Fidelity

Another core data quality concept is that of relationships. Fields/values don’t exist in a vacuum. Many years ago my dear friend Prashant Natarajan introduced me to the wonderful phrase “data fidelity” in his best selling work “Demystifying Big Data and Machine Learning for Healthcare.” Data fidelity is the measure of how faithfully a dataset preserves both the intrinsic correctness of individual fields and the contextual dependencies between them, ensuring that each field’s value retains its intended meaning only when the related fields that give it context are also valid. In the sales world it would mean you can’t act on the value in a field Discount Percentage if you don’t have the value for the Original Price.

In the case of my knee I looked at the chart for the Distance Traveled and my heart sank. You see Distance Traveled and Qualified Steps taken are directly related. Yet, I was being shown that I doubled the qualified steps I took on Monday vs Friday, yet the value for Distance Traveled didn’t even show up. There was NO data fidelity. Not only that, but if you look at the labels in the chart you will see the developers clearly don’t understand basic axis values.

But it gets even worse. Walking Speed is also related to Steps. Ignoring the axis values being incorrect, after all it’s a smart knee, not a smart visualization tool, you will notice that my speed seemed to improve slightly on Saturday. Ordinarily, you would think “good for you Qlik Dork you were getting better.” After all that’s kind of the goal to use it as motivation to continue improving.

But this chart actually has the opposite effect. How did I improve on my speed when in fact they reported 0 steps for Saturday. Worse, they reported steps on Monday, yet no distance traveled and clearly no speed.

Summary

If you are struggling with why nobody is adopting your reports, your analytics or your AI … consider where your data quality is at. You know that I’m a data driven guy. That I’m passionate about helping people make data driven decisions. Yet, here I am my friends telling you that my Dork Trust Scale is at 0 for my “smart knee.” The data doesn’t even meet a modicum of Data Quality standards. While I was really looking forward to having a data driven approach for my road to recovery, here I am relying on my gut to tell me if I’m getting better or not.

But this post isn’t just about my not-so-smart knee. It’s about helping you understand how vitally critical data quality is in enabling the executives/leaders in your organization to make data driven decisions. If you think you can slap charts together and then wait for them to complain about data quality, and then fix it … you are sadly mistaken. If your executive team thinks they are simply funding your efforts for a fish to ride a bike, and that they will free up even more money for those new AI projects you would like to undertake … you are sadly mistaken.

Data Quality needs to be built into your data pipeline from the ground up.

 

The post Step by Step… Into Garbage Data Quality: My Smart Knee Experience appeared first on Qlik Dork.

https://qlikdork.com/?p=85659
Extensions
Query Tags: Turning Precision into Visibility
FinOpsCloudCostManagementCloudDatabasesDataOpsQlikQueryTaggingSnowflakeTraceability

Background 6 years ago I wrote a post entitled “Actionable Intelligence” in which I described flipping the script on the data storytelling world. Rather than presenting a story about the data you have, you first consider what story you need… Read more ›

The post Query Tags: Turning Precision into Visibility appeared first on Qlik Dork.

Show full content
Background

6 years ago I wrote a post entitled “Actionable Intelligence” in which I described flipping the script on the data storytelling world. Rather than presenting a story about the data you have, you first consider what story you need to tell. Then you make sure you have the data to tell it. When I wrote “The Hidden Lens Every CFO Needs On Cloud Data Spend” I shaped an insightful story of how technical precision could be turned into strategic visibility which is needed. Query Tags are the key to revealing that hidden lens of traceability. While that post was intended more for the big cheeses in the organization, this post is for the data people in the organization. You know: Those moving the data, as well as those consuming the data.

The Foundation: A Flexible JSON Framework for Query Tags

In “The Hidden Lens Every CFO Needs On Cloud Data Spend” I explained that it is Query Tags that provide the hidden lens into the accountability and intelligence when it comes to cloud database costs. Each cloud database vendor varies in how you apply those query tags. But all are TEXT based. While you could define your query tag to be “Bob the Big Cheese Told Me to Insert this Data.” While it might be true, it is not a good foundation from which to move forward. I suggest a JSON framework for those query tags instead. One that lets you identify all of the pieces of information necessary to answer the questions that your organization can’t answer from the black hole that is the current bill.

Based on the expanded list I suggested in that post, your JSON structure would look like this with simple field/value pairs.

{
  “business_unit”: “”,
  “project”: “”,
  “environment”: “”,
  “process”: “”,
  “priority”: “”,
  “application”: “”,
  “deployment”: “”,
  “process_owner”: “”
}

If your organization is a giant conglomerate and business_unit won’t provide the clarity intended, perhaps you will need to add another fields like “region, territory, team” etc. If everything your organization does is in the cloud, you won’t have need for the “deployment” value, as none of the entries would be “on-premise.” In other words, this is my suggestion for a framework, not necessarily the exact blue print you should espouse for your company.

It’s always helpful to see an example, so here you go:

{
  “business_unit”: “Sales”,
  “project”: “Q1_Forecasting”,
  “environment”: “Prod”,
  “process”: “Data Load”,
  “priority”: “High”,
  “application”: “Qlik Replicate”,
  “deployment”: “On-Premise”,
  “process_owner”: “Qlik Dork”
}

Tagging at the Source: Implementing with Qlik Replicate

Until now I’ve been agnostic in terms of implementations. For the remainder of the post I will be talking solely to Qlikkies and the content will be all about how they can implement this framework. Specifically, these instructions revolve around Snowflake as the cloud database provider.

As you know Qlik Replicate involves push down SQL. Which means you can “tag” all of the SQL Queries that are pushed down to Snowflake.

Step 1: Edit your Snowflake Endpoint for your project

Step 2: Click on the “Advanced” tab

Step 3: Click on Internal Parameters.

Step 4: After the “Internal Parameters” dialog is presented to you, type “afterConnectScript” into the search. As soon as you are finished it will show a drop-down with “afterConnectScript” and you need to click on it. The concept here is that once your session is connected to Snowflake, we want to set a session value that will be used for the remaining of our session. In other words we want everything tagged with what we indicated should be used. ie Our framework.

Step 5: The system will then allow you to input a value.

Step 6: Please refer to the Snowflake documentation for Alter Session to understand all of the values that we could potentially set. The key is that it uses this basic syntax: “alter session set query_tag = ‘{tag you want}’

As I shared before it wants a TEXT value. For the framework example I used earlier, the Value for the afterConnectScript Parameter would be:

alter session set query_tag = ‘{ “business_unit”: “Sales”,   “project”: “Partner Testing”,   “environment”: “Test”,   “process”: “Replication”,   “priority”: “Trial”,   “application”: “Replicate “,   “deployment”: “On-Premise”,   “owner”: “QlikDork” }’

Obviously you need to input your actual values, and not those from this example. Once you have input the value press “Ok.”

Qlik Replicate provides an afterConnectScript parameter that can be used to set the query tags for the session as data is being moved into your cloud database

Step 7: Press Save for your endpoint

Step 8: Now when you run this Replication task all of the queries that are pushed down to Snowflake will be “tagged” and will visibility and clarity. No more black holes of just a bunch of queries against Snowflake.

Snowflake query details revealing the query tags that we sent for the session in which the query was executed

Closing the Loop: Query Tags in Qlik Sense

Your organization very well may interact with your cloud database much more for analytics/artificial intelligence than you do in pushing data into it. Thus it is even more vital that you utilize query tags as for your consumption queries.

Step 1: The first step is to ensure that all of your Snowflake connections are configured to allow you to add the query tags needed. Simply edit each of your Snowflake connections and check the box under Miscellaneous to allow non-select queries. You may well have done that for some of your connections in the past for various reasons. So, don’t be surprised if you edit them and find this already checked.

Step 2: When you issue a LIB CONNECT TO statement a “session” is being established with Snowflake in order to communicate and issue queries. Again please refer to the Snowflake documentation for Alter Session to understand all of the values that we could potentially set. As with our use for Qlik Replicate we want to alter that session and the JSON framework for our extraction queries against Snowflake. To do that we simply need to add a line of code like this and be sure an notice that in addition to configuring the connection to allow the non-select queries, we also have to add a statement that clearly indicates we don’t expect any return from our command:

LIB CONNECT TO ‘Snowflake Connection Name’;
SQL alter session set query_tag = ‘query tag’ !EXECUTE_NON_SELECT_QUERY;

In our case imagine that we are going to be extracting our Sales data that was moved into Snowflake by Replicate. The particular project is to track Quarterly Achievement numbers. For Replicate I used a priority “trial” suggesting you might have “trial/low/medium/high” type priorities. I also encouraged you to make the values fit your organization. So, this time I used T1 as the priority indicating it is Tier 1. Many organizations refer to things like that to indicate “if this fails on the weekend or late at night the on-call person better get it running again.” While our Qlik Replicate tasks run on-premise, this is Qlik Analytics in the cloud thus SaaS is the deployment type.

SQL alter session set query_tag = ‘{ “business_unit”: “Sales”,   “project”: “Quarterly Achievement”,   “environment”: “Production”,   “process”: “Extract”,   “priority”: “T1”,   “application”: “Qlik Sense”,   “deployment”: “SaaS”,   “owner”: “QlikDork” }’ !EXECUTE_NON_SELECT_QUERY;

Traceability

In many enterprise environments one extraction application is utilized to extract many different things and places them in a QVD library. Feel free to utilize additional SQL Alter Session commands as needed to be as clear as possible. We don’t want anything hidden.

LIB CONNECT TO ‘Snowflake Connection Name’;
SQL alter session set query_tag = ‘tag for Finance data extracts’ !EXECUTE_NON_SELECT_QUERY;

FinanceTable1:
FinanceTable2:

SQL alter session set query_tag = ‘tag for HR data extracts’ !EXECUTE_NON_SELECT_QUERY;

HRTable1:
HRTable2:

SQL alter session set query_tag = ‘tag for Sales data extracts’ !EXECUTE_NON_SELECT_QUERY;

SalesTable1:
SalesTable2:

For your process value you might be tempted to always use EXTRACT since Qlik Sense is always reading data. However, since the point of using the Query Tags is to add traceability feel free to be precise with values like FULL EXTRACT or INCREMENTAL EXTRACT. In addition if you utilize the On Demand Application Generation (ODAG) feature or Dynamic Views, consider specifying that in your query tag LIVE EXTRACT or ODAG EXTRACT or … whatever term you need to call that out so that your organization can clearly understand the difference in load for in-memory applications versus these on-demand technologies.

Summary

This is absolutely additional work above your existing tasks. However, your taking this time to ensure precise query tags makes your invisible work visible. It adds strategic value to your otherwise technical work. I see you … and you do matter.

The post Query Tags: Turning Precision into Visibility appeared first on Qlik Dork.

https://qlikdork.com/?p=85626
Extensions
The Hidden Lens Every CFO Needs on Cloud Database Spend
FinOpsAccountabilityCloudCostManagementCostAttributionCostTransparencyFinancialTransparencyOperationVisibilityQueryTaggingTagGovernanceTraceability

The Black Hole Your cloud database bill just arrived, and it’s a real doozey. Worst yet is the fact that it is a black hole. In other words when you ask for the spare change in the companies couch cushions… Read more ›

The post The Hidden Lens Every CFO Needs on Cloud Database Spend appeared first on Qlik Dork.

Show full content
The Black Hole

Your cloud database bill just arrived, and it’s a real doozey. Worst yet is the fact that it is a black hole. In other words when you ask for the spare change in the companies couch cushions to pay for it, you have no idea where the money is going when asked. You expect clarity on your bank/credit card statements. In this post I’m going to explain the hidden lens that can be used to gain it with your cloud database bill as well. And help you bring what’s really driving your cloud database spend into clear focus.

Clear Accountability

The first step would be ensuring clear accountability. That means you should be able to track:

  • Which business unit spent which part of that $85,200? Perhaps you went to a cloud database environment to support sales initiative. Is that what it’s being used for?
  • Which projects are consuming what dollar amounts of spend? Every project in the organization sounded good at some point to the person who started it. But do the projects have a sound return on investment or are we throwing money down the drain.
  • Which environments are the most costly? Is our development environment used way more than our production environment?
  • Whether the spend  is going towards populating your cloud database, or transforming values, or reading the data or AI against the data?

Your black hole, should begin moving forward on the path of insight.

Actionable Intelligence

In my eyes those items for clear accountability are just the minimum you need to ensure your billing data is brought into focus. However, for actionable intelligence we need our lens to zoom a little more so that we can track:

  • What priority are each of the processes that are running? Are we spending more on what the company deemed as high priority, or is more being spent on nice to have tasks?
  • What applications are driving the spend? Is your IT spend focused on the cost of the tools, or can you optimize your spend with a different tool?
  • Which deployments are you running and what is each costing? Your company set out to modernize it’s data strategy so does your cloud database spend reflect that?
  • Who are the owners of the processes? The process owners are typically different than the people running the queries against your cloud database.

The Hidden Lens

If you are anything like me you might have a lifetime of pictures at your fingertips. But who in the world is that adorable baby covered in cake icing? Is that your own daughter, or your Uncle Bob’s, neighbors, best friends daughter that you met one time on vacation? I guess we should have written something on the back of the photos we developed 42 years again. We had access to that important information, but at the time like me I’m sure you just assumed you would never forget that moment. Without recording it, that knowledge is now hidden from us and the picture is pretty much a throw away, instead of a legacy picture.

Just like an unlabeled photo eventually loses its meaning, an unlabeled project in your cloud database quickly loses its context — and when the context is gone, so is your ability to know whether the ongoing cost is justified.

Imagine Carol’s manager frantically tells her that the CEO must have access to some information from some system(s) for the Board and they need it YESTERDAY. So, Carol rushes off and builds what is needed and delivers what we will call Project Big Cheese. Imagine that Project Big Cheese is responsible for $3,371. It’s the price of doing business. Right?

Certainly, but it also means that every single day a $3,371 is being added to your cloud database bill. What if Carol had labeled the picture, I mean her work, with information about it being for Project Big Cheese?

Well you were in the meeting as well, and you know, that Project Big Cheese was a 1 time thing. The company needed that information for a bond renewal that you got. But Carol’s manager seems to have forgotten to give her that key piece of information. If you had the clarity into the charges on your bill, you could stop the project. But you don’t. So, after a year you have accumulated $1,230,415 in completely unnecessary charges. So sorry! I probably should have lead with that, instead of burying it down here on page 12.

In data terms that information you would write on the back of your photos, if you choose, is called “metadata.” Your data and analytics teams have access to that traceable knowledge when they they run their queries to insert data to, transform data in, or consume data from your cloud database. All the major cloud database providers (Snowflake, Databricks, Amazon, Microsoft and Google) provide the ability for your data teams to “tag” the data flowing in or out with that “metadata.” Thus they literally provide a lens, but it’s hidden from view unless your organization wants to use it.

Put another way, think of that of you and your college mates going over Niagara Falls in a barrel. Which would have been a classic, had your goofy friend not left the lens cover on. Your organization needs to take action to remove the cover, by adding tags, in order for the lens to be of any use.

Query Tagging

Without getting into all of the technical weeds, these “query tags” are the hidden lens that provides your organization with clarity into it’s spend. That’s what financial operations is all about for leadership.

For those reading this who deal with the data, understand they are more than just a #finops checkbox. It’s what makes your work visible, accountable and visible to leadership. It’s the checks/balances that are completely missed on all of the frantic calls. They are what provide you insight in reviews of your spend. Obviously each organization will differ in size and intent and the way costs are broken down. As a starting point I recommend this “minimum” list of values to ensure accountability:

    • Business Unit
    • Project
    • Environment
    • Process

If your organization wants full traceability then I recommend this extended “tag” list:

    • Business Unit
    • Project
    • Environment
    • Process
    • Priority
    • Application
    • Deployment
    • Process Owner

🧠 If you are the type who lives in the technical weeds be sure to read my follow-up postQuery Tags: Turning Precision into Visibility.” 🧠

 

The post The Hidden Lens Every CFO Needs on Cloud Database Spend appeared first on Qlik Dork.

https://qlikdork.com/?p=85598
Extensions
Read Data On The Fly
Application AutomationArchitectureCodingSelf ServicelivePartial ReloadQlik Senserapid prototype

Background Recently I’ve heard a lot of comments suggesting that Qlik Sense must be pre-loaded. Pre-loading has a lot of advantages for speed and reuse, but that doesn’t have to be the case. So, I published a video where I… Read more ›

The post Read Data On The Fly appeared first on Qlik Dork.

Show full content
Background

Recently I’ve heard a lot of comments suggesting that Qlik Sense must be pre-loaded. Pre-loading has a lot of advantages for speed and reuse, but that doesn’t have to be the case. So, I published a video where I demonstrated the ability to browse a Snowflake catalog, choose tables, and then read the data on the fly into application. If that sounds cool, I then demonstrated the ability to create a new application that loaded the data from the selected tables. All with no QVD’s up my sleeves.

The goal of this post is to document the technical bits of how I was able to do those things. If you haven’t seen the video, feel free to watch now before digging in to all the details.

Target Audience

I don’t often identify who my posts are for, but for this one I would probably suggest that if you aren’t a technical programmer type, you stop here. The video let you know what is possible, and the rest of this post gets technical in terms of how to actually make it happen. I don’t want you to lose the magic, by getting overwhelmed with the technical implementation.

Partial Reload

The magic behind the “Ingest” button I had on the screen is the ability of Qlik Sense to do a Partial Reload. That means that all of the data you had previously loaded (full load) can stay in tact, while new data is added to your data model. First you need to call the IsPartialReload function. Then you check the result of calling the function and either do the code you wish to do for an initial “full load” or the code you wish to perform if it’s being called to do a “partial reload.” The basic structure in your load script would be as follows.

If (IsPartialReload() = False()) THEN // Full Load Else // Partial Reload End If;

In my code the “full load” consisted of reading the data Snowflake Data Catalog to get a list of databases, schemas and tables. Then I simply presented those things in filter boxes just like you would any other data. Once I selected a table, an “Ingest” button was enabled. When I pressed the button, voila the system read the data on the fly for that table and it was part of my application. In Figure 1 I show you the properties for the button. First I go through a series of Setting Variables including setting a variable for the type of action I wish to take: “Add a Table” or “Clear the Tables.” Then I simply invoke the Action “Reload Data” and check the box indicating that I do not want to reload everything, I simply want to do a partial reload.

Figure 1: Ingest button properties panels showing Partial Reload action

 

In the “ELSE” section of my “IsPartialReload” condition of my load script I then check the “vReloadType” variable. If I pressed the Ingest button, then I simply setup the same type of code I would if I were pre-loading the data. In a previous video I produced that fully explains the partial reload functionality, I explained the concept of the load prefixes “Add” and “Replace.” Basically the “replace” says “if this table already exists in memory, replace it.” Using variables to read data on the fly from any location you wish. Now that’s cool.

Figure 2: Load script code for a Partial Reload

Clearing Tables

The intent of adding tables on the fly was to be able to rapidly prototype a solution and determine what data would be needed to answer business questions. Proving that Qlik Sense “can” read data on the fly is simply an exercise in what “can” be done. Being able to do rapid business intelligence prototype as I did in the video is what “should” be done to bridge the gap between IT and Business, or to allow end users a real self-service approach in which they didn’t even need to know how to load data. Clearing the tables was functionality that I added so that you could then start over and test a new scenario.

The Clear Tables button also invokes the partial reload script. But in the “else” condition for the “vReloadType” variable, instead of adding a new table, it clears the tables out of memory. Notice in Figure 2 that while looping through all of the in-memory tables backwards, I tell it not to delete 3 specific tables. The Databases, Tables and Tenants tables. You can probably guess that the Databases and Tables “tables” represent the data read from Snowflake. The Tenants table is the data read via a REST API that loads up the information about the SPACES available on the tenant. That information is needed when trying to create a new application as you will see in the next section.

Create a New Application

Reading the data on the fly probably caught your attention when you watched the video. But creating a new application on the fly, that then read all of the data in a more traditional approach probably kept your eyes glued to the monitor.

One of the key elements was being to identify the Space where the application should be stored. Hence that “Tenants” table that I loaded, and that I avoided not clearing out. Thus, the user is able to select the Space where they want to create the new application. The Application Name and Description are simply Input boxes for variables. So, let’s just jump right in to the “Create” button.

First the button action is set to “Execute Automation” and I selected a pre-built Qlik Application Automation. (Described later.) I checked the “Run mode: triggered” box which enables any/all users to have permissions to trigger “my” automation. Then I configured a bunch of parameters that simply contain a bunch of variable values. Tables is a concatenated list of all of the tables that were selected. The Library is a variable that contains the name of the Snowflake Connector to use to read the data from. The rest you can guess simply match selections and inputs from the screen.

Figure 3: Create button properties

 

Parameters go t0 the Start Block

I’m about to unveil one of the least known things about Qlik Application Automations. Any parameters that are passed automatically go to the Start block of the automation, and you don’t need any type of Input block.  They are passed just like you would pass a “body” to any other API.

Figure 4: Image of Start Block with an Output Block

 

For debugging purposes, I tend to always follow my Start block with an Output block so that I can see what parameter names I passed in, and how the values come across. You would be surprised how many times this Dork has typos, or I actually pass the wrong variable. In Figure 5, you can see the output history from one run.

Figure 5: Output block showing all of the passed in parameters

 

Once you know about the fact you can pass parameters, the rest becomes easy. You simply refer to them in the same way you refer to anything else in Qlik Application Automations.

vScript

The center of attention for my Qlik Application Automation is simply a variable named “vScript.” Why?

Well as you will see the rest of the process really just involves standard automation blocks, like Create Application, Save Application and Do Reload. As you know an application without a load script is useless, and that’s what this vScript variable will contain. The script we want that new application to have.

In Figure 6 you can see that the first thing we want to have in our load script is the Library Connection that will be used to load the data. Notice that I hard code the text “LIB CONNECT TO” but the library name itself comes from the Start blocks, Library parameter that gets passed in. Another thing to take note of is that Qlik Application Automation provides a handy dandy function you can refer to called “linebreak” that inserts a line break so our text won’t all run together.

Figure 6: Beginning of the vScript variable

Looping through the Tables

If you notice in Figure 5, my Qlik Sense application sends all of the tables as a concatenated list like: “Table1,Table2,Table3.” That’s because the expression behind the “Tables” parameters is simply the Qlik Sense Concat function: =Concat([Table Name], ‘,’)

Well we need to load each table separately in our load script (vScript) and that process begins by using a “Loop” block, that iterates through the “list” that I “explode” out of the Start.Tables string.

Figure 7: Loop block with explode table list

 

Before you panic … you DO NOT HAVE TO TYPE expressions like that in Qlik Application Automation. That string was created for me. When the item is blank I chose “Add Formula” and then used a Wizard as shown in Figure 8. Keep that in mind as you see other figures as well. I simply chose to show some of the values in their “raw” form so you could see how everything refers to the parameters from the start block. 

Figure 8: Wizard to build the explode command from the Start.Tables string

 

Adding Select Commands

You are already familiar with the fact that in your hand input load scripts you say “Table: Load …. Table Load … etc.” In Figure 9 you can see that same pattern. We append a “linebreak” each time through the loop. Then we append the table name that we are currently looping on. Then we simply do “SQL Select * From” before each table. Notice that we definitely want the fully qualified path of “Database.Schema.TableName” and that we surround each of the values with double quotes, just like we would type in Snowflake. Then we simply end each table in the loop with a “linebreak.”

Figure 9: Appending table names 1 by 1 to our vScript variable

 

Metadata

If I’ve said “Metadata” once this year, I’ve said “Metadata” a thousand times this year. In fact I even wrote an entire post called “The Importance of Metadata.” Not only did I hammer home the importance of Metadata, I demonstrated how to build it in to your Qlik Scripts so that Table/Field descriptions could be seen to ensure trust in the data.

So, while it is an extra step, I felt it was important enough to show you how easily you could take take care of it. Even in a situation like this where you are building an application on the fly. We simply append another linebreak to our vScript variable, then we create a mapping load called “TableMap.” The “TableMap” is populated with the Table Names and Comments for them from Snowflake. We add another linebreak and then simply follow the format “Comment Tables using TableMap” that adds the comments to the tables. Plus it’s a bonus win: Not only do you read data on the fly, you read the metadata about the data on the fly.

 

Figure 10: We have to append a TableMap to our script that will load the metadata from Snowflake

 

Create the Application

Feel free take a nice deep breath, stand, and relax your brain. Because believe it or not, the hardest part of this post is over. The next several steps are very easy to understand.

While it may seem difficult, creating an application is actually easy. You simply drag/drop the “Create App” block and fill in the simple inputs for Application Name, Application Description and Space ID. Best part … they were all passed in to the Start. Gotta love that.

Figure 11: Create Application block in Qlik Application Automation

 

When this block is invoked when you trigger the automation from the button in the Qlik Sense Application … boom a “brand new” application is created with the name and description you supplied in the Space ID that you supplied. Why did I call out “brand new” the way I did? Because I wanted to call out that there is another option for you to create an application, other than creating a “brand new” one. You could also Copy an Application instead. Pause on that for a second and think through “why” it might be beneficial to copy an application, rather than create a brand new one.

Many organizations like to start with “template applications.” Perhaps the template application script file has INCLUDE scripts that bring in a bunch of variables, or subroutines that are commonly used. Maybe the template application has a company logo that should be as a header, and color style properties setup. What if it has sheets already established for things like “Help me” “Dashboard” “Trends” “Details.” Albeit they wouldn’t have any data values they could follow a design pattern that your organization desires. Interesting to think through the possibilities. I would love to hear comments from any of you reading who are in that situation and the light goes on for you.

Figure 12: Alternative to creating a brand new application is to Copy an Application

 

Setting the Load Script

While establishing our “vScript” variable may well have been difficult to follow if you haven’t done much with Qlik Application Automation in the past, setting the script for our newly created application is easy. The Application ID is simply set to the ID returned by the Create (or Copy) Application block. If it is a “brand new” application, you simply set the script to the variable we have created and appended to.

Figure 13: Set Load Script block takes an Application ID and the Script you want it to use

 

If you do like the idea of “Copy Application” then you would want to first use the “Get Script” block and start our vScript variable with the contents of the copied application. Then keep appending all of the work we did. 

Saving the Application

I told you earlier to relax and breath again, that it was getting much easier. All we have to do now is use the “Save App” block and give it the ID of our newly created application that has our newly created (or appended) script in it.

Figure 14: Save Application block is super easy and only takes an Application ID

 

To Reload or Not Reload, that is the Question

At this point reloading the application is simply a matter of choice. For the sake of my video I certainly wanted to add a wow factor and have the opportunity to simply open our newly created application and start working. Seeing the METADATA already loaded for the data model view was kind of cool. But it is just a matter of preference. You may prefer to simply created the application, add the script for it, save it and be done. Then you can manipulate the script as desired before you ever actually load data.

Figure 15: Do Reload block allows you to reload the data for your newly created application so it’s ready to go when you open it.

 

Summary

Hopefully this journey was a fruitful one for you. Both in terms of your understand of just how flexible you can get with your Qlik Sense environment, and in terms of your technical understanding of how to implement it should you desire to. If nothing else, keep the link to this post or the video handy so when someone tells you “Qlik has to pre-load data using QVD files and we need to read data on the fly”, you can send it to them. Do a mic drop. And go grab lunch early with a smile on your face.

Bonus Points

If I did my Qlik Dork “thing” right, then this will seem like it’s absolutely something you can do. Perhaps even too easy, and you will feel guilty for allowing your end users to read data on the fly, but not using your own magical coding chops. Don’t worry I have a little challenge for you to kick things up a notch. You probably noticed that in my example I simply used “Select * From Table” as the pattern to iterate through all of the tables. For the Metadata I simply added Table Descriptions and not field Metadata. Clearly not what you would want to do in the real world. Right?

So, feel free to utilize my example to simply get started … and then try and add in the ability to list the columns for each table just the way that would be done if you used the Select in the Load Script from a connector. “Load Field1, Field2, Field3; Select Field1, Field2, Field3 from Table” kind of thing. Hint: The Snowflake Describe Table query is what you will mostly like want to use. Something like:

Describe Table “GENERALHOSPITALDB”.”dbo” .”Accounts”

You could use that in the way I pulled the table comments to create a mapping load for the field comments as well.

You can execute that command with the Snowflake “Do Query” block inside of your automation if you want so that you can have a list of fields to loop over so instead of Select *, you can load the field list. If you do this, I would recommend that you have 2 variables: vLoad and vSelect, so that for each field you can simply append accordingly.

Look at you out there getting ready to complete this bonus activity, to read Snowflake data and metadata on the fly. You go with your bad self. 

Figure 16: Do Query block can be used to fire a command to Snowflake and get a list of the fields for any table.

 

The post Read Data On The Fly appeared first on Qlik Dork.

https://qlikdork.com/?p=85558
Extensions
Visualizing Structure in an Unstructured World
Application AutomationBusiness ValueGenerative AIVisualizationaideepseekLLMQlik Application AutomationQlik Sensereasonerreasoningstructured dataunstructured data

Background Recently a very close family friend reached out to me and said “Can you help me visualize some data?” I was like “Heck yes.” We had a brief conversation and I found out my serial entrepreneur type friend was… Read more ›

The post Visualizing Structure in an Unstructured World appeared first on Qlik Dork.

Show full content
Background

Recently a very close family friend reached out to me and said “Can you help me visualize some data?” I was like “Heck yes.”

We had a brief conversation and I found out my serial entrepreneur type friend was now running yet another dental focused business called Practice Matchmaker. He had a message board where people posted messages wanting to sell a practice, or buy a practice or find an employee or search for a job. “That seems easy enough Allen, please just send me some sample data and I should be able to build a dashboard in an hour or two.” I mean after all, I am the Qlik Dork.

How hard could it be?

Guessing that like me, you assumed the message board had well structured forms that accepted inputs and the sample data I was expecting to see would have a myriad of columns. One table of seller data values. A table of buyer data values. A table of job posting values. Etc. I guess I should have known if it was that easy he probably wouldn’t have asked for my help.

Low and behold when I opened the Excel file all of the data was in a single worksheet.

No separate tables, and worse yet no myriad of columns. The worksheet had only 12 columns in it. If that doesn’t seem like enough data to you, it’s ok … I was scratching my head as well. Until my worst fears were realized: 1 of the 12 columns was simply labeled “Content.” It was a free-form, unstructured column where buyers, sellers, job applications etc., simply hand typed their messages. “Oh yeah it is from a message board where people just write to each other with no regards for Allen or the Qlik Dork’s desire to analyze and gain intelligence from their chicken scratch.”

Unstructured

Here are some of these sample messages (massaged to deidentify names, numbers, emails, locations etc):

  • Alex Carter: Hello! I’ve been practicing for 7 years and am now ready to take the next step into practice ownership. I’m looking to purchase a practice in Los Angeles with a minimum of 3 operatories. Prefer FFS/PPO (No Medicaid/HMOs). I’m also open to a part-time associateship that could lead to acquisition. Please comment below or send me a private message with any leads. Thank you! #looking #buying #losangeles #socal #california 2023-03-05 14:01:54
  • Jordan Blake: I’m interested to see if there are any associate doctors looking to work anywhere from 1-5 days a week in Santa Barbara, California. We are a general dental office looking to expand and strengthen our team. #associate #hiring #santabarbara I’m also interested in purchasing another dental office anywhere from Goleta to Ventura if anyone is considering selling. #buying 2023-07-20 2:21:54
  • Taylor Morgan: Looking to purchase a pediatric dental practice in Sacramento, Placer County, or surrounding areas. If you have any leads or are considering selling, please reach out. Thank you! #buying #california #sacramento #roseville 2023-07-20 17:55:07

Like me, you can read the messages and your brain can figure out what is going on. We could easily put this text into a comment field and let an end user read it. Right?

But how would you create structured data visualizations out of these unstructured comments? You can’t. Neither can I. So, my heart kind of sank. This close personal friend that I’ve known my entire life in Atlanta, who had basically provided job opportunities for my wife’s entire working career, needed help making sense of these chaotic and ill-formed unstructured text messages and I couldn’t do it. Well … maybe … what if I …

Reading and Inference

Quick Question: You know what is good at inferring information from text just like humans?

You got it my friend … Large LANGUAGE Models (LLM’s).

The Qlik Dork could just pass in Allen’s nasty data into an LLM with a prompt like:

What structured information can you pull from this unstructured message?

Bing-bang-boom I would get data in the way I could use it. The response I got was:

Figure 1: First response for a message asking it to give me the structured information contained in a free form text field

Figure 1: First response for a message asking it to give me the structured information contained in a free form text field

 

Look at that baby! It is so much easier to read. Right? Then it dawned on me … while it is easier to read, that is still very much unstructured data. Worse yet, was the fact that when I asked the same question for other messages I got different returns each time. Oh man, I should have known I would be foiled. All this LLM stuff is just hype anyway:

Second response for a message asking it to give me the structured information contained in a free form text field which is good but different than the first one

Figure 2: Second response for a message asking it to give me the structured information contained in a free form text field which is good but different than the first one. It’s visualizing structure in an unstructured world but not in a form I can use

Ask a better question

As a Chief Question Officer , I know that solutions are often uncovered when you ask better questions. In this case I was simply asking for the structured information, but wasn’t clear on the format. I had a very narrow expectation of what the technology could do. Like many, I got excited at the thought it could 1 simple task to make my life easier but wasn’t really pushing it. I needed to be able to load the “structured” data into a Qlik Sense application if I was going to create some spinning 5 dimensional pie charts that gave Allen a back rub. But I hadn’t asked for that. So I tried again:

Can you pull the structured information from this message and put it into a pipe delimited formatted that I could read into Qlik Sense?

Woohoo … now we are cooking with gas.

With a better question the response is returned in a pipe delimited format to ingest into Qlik Sense

Figure 3: With a better question the response is returned in a pipe delimited format to ingest into Qlik Sense

 

But shoot, again I had an issue with a different record:

Great response again with a pipe delimited format to ingest into Qlik Sense, but still different than figure 3

Figure 4: Great response again with a pipe delimited format to ingest into Qlik Sense, but still different than figure 3

Try again

It’s almost like this rusty bucket of bolts needed me to be even more specific on the DATA MODEL that I needed. Shouldn’t LLM’s be smart enough to read our minds and know we have thousands of messages and we need it to return the same thing for us every time? That’s not how it works my friends. You need to share all of the information that is in your head. Give it your hopes and dreams but more importantly provided the needed CONTEXT and SPECIFICITY.

Can you pull the structured information from this message and put it into a pipe delimited formatted that I could read into Qlik Sense? The fields I need are  Buyer Name, Dentistry Type, Goal Type, Desired Location, Post Date, Transition Timeline, Transition Preference and Operatory Room Requirements

 A more consistent response driven by a more clearly defined question. Visualizing structure in an unstructured world requires better prompts.

Figure 5: A more consistent response driven by a more clearly defined question. Visualizing structure in an unstructured world requires better prompts.

 

That worked out great and in fact was very consistent as I tried message after message after message. While I could easily parse that out in Qlik Sense, I didn’t want the hassle of the 10 seconds needed to remove the header row. Qlik Dork is a busy man you know. 😉

Then it hit me, I needed my response in more of a code friendly manner. I know what you are thinking … “Qlik Dork, LLM’s are really good at generating code.” You are so right … I should have thought about that. Perfect technology for inferring information from language and really good technology for most coding type tasks. Glad I have you here to push me to go further:

You are a data modeler and I need the following structured information that must be in a well formed JSON format that can be loaded by an application so no extra text above or below the json, and the field names should be in upper case but the data values should remain as is. The fields I need are Buyer Name, Dentistry Type, Goal Type, Desired Location, Post Date, Transition Timeline, Transition Preference and Operatory Room Requirements. If there are no values for Transition please state No Preference. Also please add a recommendation string on what you think might be needed to enhance the post so it would stand out and call that field RECOMMENDATION

For the sake of your time I will skip a few of the iterations I went through in terms of step by step but didn’t want to share them with you. Before asking for the field names in upper case, I had results that used 3 different formats for casing on the field names. I needed consistency so I asked for it. When no transition information was found sometimes I got null, and sometimes I got something else. I also needed to include the in my visualizations so Allen and his team could select those with no preference. So I asked for it. One of the things that Allen mentioned when we talked was actually asking for a recommendation that could be shared with the message poster to aid them in getting more attention for their post. So I asked for it.

Response formatted as a JSON block with consistent field names in an easily ingestible format that Qlik Sense can utilize for visualizing data in an unstructured world

Figure 6: Response formatted as a JSON block with consistent field names in an easily ingestible format that Qlik Sense can utilize for visualizing data in an unstructured world

Finish it off

How slick is that? This magical LLM was able to set me up for success in visualizing structure in an unstructured world. Now all I had to do was read those field values.

You might be thinking “But Qlik Dork that JSON code isn’t any better than the plain text you were getting originally. It’s still unstructured and it’s not like Qlik Sense allows you to easily pull the values from the JSON blocks.” Aha! You are wrong my friend. Qlik Sense does indeed have functions to deal with JSON, cause you know it’s awesome like that. Easy peasy lemon squeezy, I was able to pull each value directly out of the JSON block returned by my LLM. By using the JSON structure, you can also use that response anywhere else you might want, which is why I handed it to you on a silver platter.

Qlik Sense code using JSON functions to get the values out of the response so that the data can be visualized, filtered, aggregated on etc.

Figure 7: Qlik Sense code using JSON functions to get the values out of the response so that the data can be visualized, filtered, aggregated on etc.

 

LLM Choice

I have created many different Qlik Community Posts and Qlik Dork videos on many different LLM’s out there. If we were speaking privately I could share with you my preferences for different tasks. But for this particular task I selected an LLM that is currently receiving a ton of hype … Deepseek just for the fun of playing with it. “Look at the Qlik Dork on the bleeding edge.”

Figure 8: Deepseek logo

 

Figure 9: Qlik Application Automation Call URL block calling deepseek api to get responses

 

You would probably expect more of me than to do it just because I wanted to play with it, and to stay above the “hype”, and you would be right. So bear with me for a second and understand:

I am not recommending Deepseek. Their policy is to keep everything and that is scary with real data. I was passing faked names and other fake information just to test. The reason I wanted to make you “*gasp*” when I mentioned it was so I could bring up the one point that is driving everyone in the world crazy.

The low low low price of this thing. While you may not want to use it in the open internet, you very well may want to take the open source and run it in some secured network of your own making. The low processing requirements are blowing peoples minds. While I’m still not advocating for it, the fact that AWS and Snowflake now offer Deepseek in secured environments for you, after being tested and proven, might eventually become something you may consider.

So, I wanted to conduct the test for this fun one off process for my friend Allen using this because it was really cheap (and that dude is frugal). Plus I know how to manipulate and mask the data so sharing faked data didn’t bother me.

How low priced are we talking about? I got 1,000,000 tokens for $2. I meant I could have also bought a Double Big Gulp of Mountain Dew for that same price, but the price was worth it for a test. My prompts ranged anywhere from the low 300’s to the low 400’s for tokens used for each question.

Figure 10: Screenshot of the fact it took 315 for one of the prompts

 

Let’s be conservative for our cost calculations and error on the high side, and say 400 tokens per question. That comes out to 2,500 messages that could that I could visualize structure for for $2. That’s 11.5 messages processed for 1 penny. That’s affordable. That meant I didn’t need to smack myself in the head, or sacrifice a second or third Double Big Gulp of Mountain Dew trying figure out how to tune my prompt to give me exactly what I needed. It also afford Allen, and Practice Matchmaker, the chance to visualize structure in their unstructured world comprised of free form messages. By asking for recommendations it affords them to ponder how they could become PraticeMatchmaker.ai and utilize AI in a way to guide their clients to succeed with an affordable price point.

Switching Gears

I know my audience, and I know you aren’t buying or selling a dental practice. But my friend Allen’s “problem point” is the same as in your organization. More than likely you have “unstructured notes/comments/feedback/surveys” but the manual effort to pull useful structured information out of them is more than you wish to deal with. So, that unstructured data just accumulates and collects dust. At the same time you want to utilize AI, in a real and meaningful way. I want my post to be like a matchmaker to help you practice thinking about AI in those situations. (See what I did there?)

So, let’s say you take my recommendation to heart to think about using an LLM and you just happen to be in the healthcare space. You utilize a partner for surveys and they do a great job for of you aggregating the NUMBERS on the surveys. But what about all of that unstructured data the comments they write by hand and their is no structured # to aggregate? Maybe something like this:

The nursing staff was incredibly rude and completely dismissed or ignored questions me and my family had about my care. In some cases I actually saw them roll their eyes, one to the other. The clinical care I received was excellent and I was healed and out of the hospital faster than my surgeon originally told me it would take.

You’ve heard about Sentiment Analysis but let me ask you for real, does a value like .62 negative do much for you? Oh sure you can aggregate the numbers up so someone gets a bonus for “65% of our patients reported positive surveys.” But does the .62 help you know what actions you need to take internally to ensure you drive that number higher? Not a chance.

Are you on the edge of your seat waiting to see what response I get? I sure am so let’s get going. First thing I need to do is change my question.

Figure 11: Healthcare question asked to deepseek to analyze a survey comment

Then I changed my API call and instructed it to use the “deepseek-reasoner” model instead of just the chat model. I also told it to pretend it was a quality improvement analyst. You know, not just some bean counter tracking .62’s all day long.

Figure 12: Screenshot of Qlik Application Automation Call URL block to call the deepseek reasoner model and act as though it is a quality improvement analyst

I might be bias, but I think these results are insane. I’ve repeated this particular exercise on nearly every LLM I have access to, and truth be told, this reasoning model generated the best response I’ve seen to date. I just asked the question directly, one time, without any tuning. I’m sure as I get more familiarity with deepseeks “reasoner” model, I will understand just how far I can push it.

Sentiment Analysis

The survey reflects a mixed sentiment with strong negative emotions toward the nursing staff’s communication and bedside manner, contrasted with positive sentiment about the clinical outcomes and efficiency of care.


Key Words & Emotional Classifications

Negative Sentiment Categories

    1. Frustration/Disrespect: “incredibly rude,” “dismissed,” “ignored,” “roll their eyes.”
    2. Lack of Communication: “ignored questions,” “dismissed concerns.”

Positive Sentiment Categories

    1. Gratitude/Satisfaction: “excellent clinical care,” “healed faster,” “surgeon’s expectations exceeded.”

Word Cloud Suggestions

    • Negative focus: Rude, dismissed, ignored, roll eyes, frustration.
    • Positive focus: Excellent, healed, faster, clinical care, gratitude.

Recommended Actions

Addressing Negative Feedback

    1. Staff Training:
      • Conduct empathy and communication workshops for nursing staff to address dismissive behavior and improve patient-family interactions.
      • Implement role-playing scenarios to practice responding to questions respectfully.
    2. Patient-Family Communication Protocol:
      • Establish a structured process for nurses to acknowledge and address patient/family concerns (e.g., scheduled Q&A times).
    3. Internal Follow-Up:
      • Investigate the incident with the nursing team involved and provide constructive feedback.
      • Consider anonymous patient satisfaction surveys post-discharge to identify recurring issues.

Patient Follow-Up

    • Direct Apology: Have a patient experience manager contact the patient to apologize for the negative interactions and acknowledge their concerns.
    • Commitment to Improvement: Share steps being taken to prevent recurrence (e.g., training initiatives).

Amplifying Positive Feedback

    1. Recognize Clinical Team:
      • Publicly praise the clinical/surgical team in internal communications (e.g., newsletters, team meetings) for delivering efficient, effective care.
      • Consider awards or incentives for teams that exceed recovery expectations.
    2. Share Success Stories:
      • Use anonymized positive feedback in staff training to highlight ideal patient outcomes.

Word Cloud Design Tips
    • Use larger fonts for impactful negative terms (rude, ignored) to emphasize areas needing improvement.
    • Balance with positive terms (excellent, healed) in smaller sizes to reflect gratitude while prioritizing actionable feedback.

This approach addresses systemic issues while reinforcing strengths, fostering a culture of accountability and continuous improvement.

Are you kidding me?

Incredible response. Right?

So, you know, that response used 1,227 tokens (reasoning uses more tokens) for a rough cost of $0.002592, and it is a GAME CHANGER for us. Notice how I stayed with the theme of visualizing structure in an unstructured world. I actually had it roll things up emotionally and it gave me advice on how to build a well designed Word Cloud.

My next steps would be to pull all of this back … from each survey. Pull out the emotional classification words so I can build a real word cloud where I size the words based on how many surveys the were included in. That way I know if “rude” is a 1 off, or if we have a lot of patients that report it. I take the recommended actions and send them through a workflow engine and route them to the appropriate person on staff in whatever manner using Qlik Application Automation for example.

Sure glad my old friend Allen started me on this journey. Seems like his message board use case, for visualizing structure in an unstructured world, is actually really applicable to the rest of the world.

Summary

I am not advocating for Deepseek.

I am not recommending you throw it all your data.

I am not recommending you throw it any of your data. 

All I wanted to do was use if for this test case to highlight the fact that at the price point, and fact that it can run with much less CPU power is something that just might change the ROI needed. Which might justify use cases for Allen, and for you.

At $1 per prompt, Allen might not want to use it. He can pay a human less than that. But at a cost of .000864 per prompt, he can afford it. If it were 2 cents per prompt, would he still find it worth it, so that he can utilize a safer LLM? He’s evaluating that.

For my healthcare example … I think we would all agree that bringing that level of reasoning and presenting an action plan, would absolutely be worth several dollars for every survey comment. And would be a steal at just 2 cents per survey comment with a more trusted LLM. Because seeing RUDE, DISMISSED on a chart would tug at people’s hearts and make them want to get out of their seat to take action, or see it and realize a change is needed on their end.

If your organization is looking for an immediate win with AI – consider using it for visualizing structure in your unstructured world.

 

 

 

 

The post Visualizing Structure in an Unstructured World appeared first on Qlik Dork.

https://qlikdork.com/?p=85528
Extensions
Diving into Parquet
ArchitectureData Modelingassociative engineintegrityjoinmetadataparquetQlik SenseQVDSQL

Diving into Parquet: It’s Not Just About the File Format I’m finally diving into something I’ve been hinting at for a while: Diving into parquet files. We’re going to look at using parquet both internally and externally for storage because:… Read more ›

The post Diving into Parquet appeared first on Qlik Dork.

Show full content
Diving into Parquet: It’s Not Just About the File Format I’m finally diving into something I’ve been hinting at for a while: Diving into parquet files. We’re going to look at using parquet both internally and externally for storage because:
  • As I shared in my Qlik Community Post Making Sense of QVD Files, it’s not about the file format, it’s about the concept of reusing data.
  •  I want to prove that you can share the intellectual property in your applications, regardless of format, for reporting and other uses.
While I will be describing the progression with a few screenshots embedded, the video at the bottom is your way to actually see everything in action. I know this is knew and I didn’t want you think I was faking anything with manipulated images. 😂 Setting the Stage: Data and Metadata Let’s get oriented. In my example, I have a simple data model: patients and encounters.  Screenshot of inline data load script to understand the data values and metadata being used

Figure 1: Screenshot of inline data load script to understand the data values and metadata being used

I have four patients and five encounters. A couple of my patients, Dan The Wizard and Levi the man Turner, are so healthy they have no encounters at all. As I’ve previously discussed with you about metadata, our AI engine interprets their “cognitive wisdom” as 999 years old. This highlights the importance of metadata, which helps me understand my data fields correctly. Without it I might think I have really bad data. In my storing application (Figure 1), I’ve set up comments, table maps, field maps, and tag maps. This metadata travels with QVD files when the data is stored. When I read the data back (Figure 2), I can see my table comments, field comments, and tags.  The Data Model View is a great way to see information about the tables/fields including the metadata in the system about them.

Figure 2: Data Model View of the simple inline data we created in figure 1.

Switching Gears: From QVD to Parquet Now, here’s where it gets interesting. I’m going to switch from QVD to parquet format. I’m still using the same data—patients, encounters, table maps, comment maps, and field tag maps. But this time, instead of storing to a qvd file, I’m going to store to parquet, using the DataFiles, in the Dalton space. I’m creating files named Patients.parquet and Encounters.parquet. When I store to parquet format, my data model in the storing application remains the same, as you’d expect, with all the metadata intact so nothing to see there. You can store to parquet just like you do with QVD by changing the extension and the format to parquet

Figure 3: You can store to parquet just like you do with QVD by changing the extension and the format to parquet

When loading the data from parquet files instead of QVD, I still get my four rows of patients and five encounters. The data is all there. However, if I check my data model viewer, I have a problem: I’ve lost all the metadata.  To read from Parquet you simply change the file extension you are reading from and the format

Figure 4: To read from Parquet you simply change the file extension you are reading from and the format

 

When you store or read from parquet files all of the data is fine, but you lose the precious metadata

Figure 5: Data Model View of application that read from Parquet shows all of the data, but the metadata is missing

Parquet simply doesn’t retain the metadata. Therefore, if you are using files internally and need to keep your metadata, sticking with a QVD file format is likely your best option. Metadata aside, if I am just looking at the visuals, they remain the same whether I read from .qvd or .parquet files. The end user doesn’t need to know, or care about, the underlying architecture.  ✨✨✨ Good time to notice that Dan Wizard Pilla and Levi TheMan Turner have no encounters, and that EncounterID 103 has an associated PatientID that doesn’t actually exist. Keep note in your head of this as it will be important soon. Now back to post. ✨✨✨ Other than the metadata all of our data values survived the trip when storing into parquet files and reading from them.

Figure 6: Screenshot of a some filter panels and table object showing the data read in from parquet tables

Taking It External: Storing Parquet in S3 While end users don’t care about architecture, your boss’ boss just may. He might insist on not retaining so much IP inside of proprietary Qlik QVD files. I just showed you could do that. Yay. But you know that boss’ bosses can be bossy and their next edict is that it has to be shareable outside of your Qlik environment. Don’t panic that’s easily doable.  Let’s go ahead and store those Parquet files into an Amazon S3 bucket, rather than our data files library. Agility is the whole purpose of the libraries to begin with. I can’t help it if you were lazy and just always picked the default “DataFiles” library. 😁 I’m using the same four patients, five encounters, and mappings. This time, I’m storing to an Amazon S3 bucket (Figure 7).  This post was entitled Diving into Parquet and we just leaped with both feet. Storing multiple tables into a single parquet file that will act as a mart, not just a table

Figure 7: Storing both the Patients and Encounters table into a single parquet file called HealthMart

Did you actually read the image or did you just scroll your eyes over it. Because if you read it, and understood the implications in it … your mind would be blown. 🤯 In my previous videos and my Community Post: Making Sense of QVD files , I highlighted over and over that QVD files were “Qlik Virtual Data warehouse tables.” They are 1 for 1 for internal tables. But wowza, when you store to Parquet there is no 1 for 1 table limit. As it might be your “Gold/Mart” layer when you store it … the cool thing is it can actually be a “MART” that contains all of the needed tables. Just like a Data Mart, it has 1 center and is most likely a Star schema. If you want to dive into this, the first file must be the center of that star or snowflake in your Qlik data model. If I have other data islands for example, I can not write those tables into the same parquet “mart.” Notice in Figure 7, that rather than naming the file like I normally would with a table name … I actually called my “HealthMart.” Maybe, just maybe … your boss’ boss edict to externalize was actually a good thing for you. Sure you were ticked off at first, but now you just might feel like a superstar. Just like Qlik Data Product rolling up the entire data model for easy end user consumption … you’ve encapsulated all of the tables into 1 parquet file. My loading app is going to change to read externally from the Amazon S3 bucket and when I choose to read the “HealthMart.parquet” file (Figure 8) I see something interesting, and perhaps confusing.  Notice that I don’t see a Patients table or an Encounters table. Instead I see a “HealthMart” table that sure looks like my Patient data. And another table called PatientID:Encounters. Even more interesting perhaps is the surrogate that got created “%Key_PatientID:Encounters.” Are those good things? Load select from our Parquet mart, not table, file showing that much like an Excel load with multiple worksheets you can choose which tables in the mart you want. Also shows that the tables aren't the same table names we stored them as.

Figure 8: Load select from our Parquet mart, not table, file showing that much like an Excel load with multiple worksheets you can choose which tables in the mart you want. Also shows that the tables aren’t the same table names we stored them as.

Mart vs Table When we stored the data “as a mart” not as a “table” Qlik chooses to use the name we give the file for that center spoke. We called the file “HealthMart.Parquet” and it said “rock on Qlik Dork I will refer to your data mart as “HealthMart.” That’s a great thing. Understanding the other … is a bit more complex. But if I do the job right, you will understand it in no time. Within Qlik’s Associative Engine we have no problem at all with the fact that we have 2 patients with no encounters, and an encounter that had no patient. But If we are “externalizing” our data beyond Qlik’s walls … then we can no longer think with an Associative Engine mindset. We have to flatten the data out, the way a SQL VIEW would look. Within Qlik you don’t have to think about Inner, Outer, Left, Right, Upside down or inside out joins. But in a Data Mart world you do again. So, let’s take a quick look, just for giggles and grins at the Encounters “table” that is in our mart in Figure 9. A few things you need to notice: There are only 4 valid encounters listed, what happened to 103 and why are there 2 blank encounters but values for that weird surrogate key? When you store/read data from a Parquet mart you need to be aware of the things you loved forgetting about SQL JOINs.

Figure 9: Load Select for the Encounters table data read from our parquet mart.

 

Most Data Mart generation systems create the surrogate key to utilize to join facts/dimensions not knowing if you might perhaps change the  value. Maybe Dalton Ruer with PatientID 1 today, becomes 000001 tomorrow. Using this surrogate key value instead allows the relationships in the marts to keep on ticking without having to cascade changes to tons of tables down the road. Maybe we wouldn’t want anyone to see the actual PatientID field because it was PII. Well that surrogate key is sure handy dandy. Thanks for elevating our game Qlik. The question we are left with then is “why do we have 2 blank encounter rows with Dan and Levi’s patient surrogate key?” Since Patients was the center of the star, we MUST RETAIN all of it’s information. Qlik has flattened out that in-memory Associative Engine version of Encounters into more of a SQL view, and like a Left Join with the Patients information. We end up with 2 blank encounters but the %Key_PatientID:Encounters surrogate value. And just like a Left Join we do not return Encounter 103. Aha. Now I remember why I fell in love with the Associative Engine 15 years … I hated dealing with the symptoms of joins when the underlying data wasn’t perfect and had referential integrity. Almost like I created that inline data many posts earlier for this very reason.😎 Swapping Left and Right I hear you Mr./Mrs. Smarty Pants. Yes I did change my mart storage to store Encounters and then Patients. And you are 100% correct … I do get to see all of my encounters. Screenshot of load select where I created the parquet mart with encounters first and patients second

Figure 10: Screenshot of load select where I created the parquet mart with encounters first and patients second

Then Encounters becomes the LEFT side of a LEFT JOIN … and we end up with an entirely different issue with our Patients table. Dan and Levi, the stars of my AI Cognitive Wisdom Age calculation to demonstrate the importance of metadata are entirely gone from my system. Like I said … I hated dealing with the symptoms of joins when the underlying data wasn’t perfect and had referential integrity problems like I introduced. Be sure and consider this and other limitations of storing to Parquet files in general. Especially keep this in mind if you are going to try and create a Parquet MART. I entitled the post Diving into Parquet and we just dove. Depending on your data it might look like a reverse 4.5 somersault in the pike position with no splash on entry, or it could look like a belly flop. I merely wanted to point out that it is an option. Within Qlik Talend Cloud you would likely use our Data Product to surface a wonderful mart of all of the required tables. When sending data externally this is an option you might consider, as long as you understand the risks. If you have integrity issues, then I strongly suggest sticking with pure table based parquet files. That way all your data will travel forward. When you store into a mart ... you always have to consider what data might get left behind when it's flattened out.

Figure 11: Screenshot of the load select focusing on the loss of patients if we stored encounters into our parquet mart first

 

🧊🧊 If you got excited about the idea of having all of the parquet tables in one place as a data mart, but really want to ensure all of your data for all of your tables is maintained; Don’t give up hope as there just might be an iceberg in the waters ahead that can make you look like a hero. 🧊🧊

The Metadata Challenge Let’s put the referential integrity issues aside, since I forced them to make a point. We need to get back to the fact that we lost our metadata when saving to table based or mart based Parquet files. I really REALLY REALLY think it’s important that we keep it. If you need a refresher on the importance of metadata, please take the time to read my previous post that I’ve shared multiple times entitled: The Importance of Metada. If you agree with me that anyone, including people external to Qlik should know that the Age field isn’t biological then we need to solve this problem. Right? In the previous examples the metadata was created simply using MAPPING LOADS. That’s in memory only and can’t be shared. But nobody will think less of us if we make our Metadata a “real table.” Because “real tables” can be stored. If I alternatively switch to a real table that contains all of the information my code would look like this Screenshot of inline load script using a physical table store to contain all our metadata that is then read mapping table to apply the metadata to tables and fields

Figure 12 Screenshot of inline load script using a physical table store to contain all our metadata that is then read mapping table to apply the metadata to tables and fields

Solved I have all the same tagging “maps” that I needed, but now I can literally just store the metadata table itself into a Parquet file as well. So, external users can see it, just like they see .ReadMe or .MD files etc. If you have a "real" table that contains all your metadata then you are free to use it downstream internally or externally in this case

Figure 12: Screenshot of store statements saving the tables to a parquet mart file, and also storing the metdata for the entire mart to a parquet file as well

Notice my last step it is to simply Drop the Health_Metadata table because I don’t want a weird data island in the model. But as a challenge … feel free to comment on any thoughts or ways you can see yourself actually using that Metadata Island inside of an application? As you think of ideas … suddenly you will be loving METADATA as much as I do. While my load script code purposely removed the metadata I wanted you to see that you could retain it and challenge you to think of ways of using it.

Figure 13: Screenshot of the Data Model view showing our Metadata table

Reading the Metadata In my reading application all I have to do is implement the same basic code structure. Instead of an INLINE statement to build the metadata, I literally just read it from the file. Then I still go through and create the maps and then comment/tag the Tables and Fields. That’s just to easy. When reading from a metadata file all you need to do is read the table instead of using an inline and then simply build your mapping loads from it

Figure 14: Screenshot of the load script for an application that reads the metadata

Video I realize it’s hard to really grasp these very new concepts by just reading and seeing images. To that end please feel free to listen to me while you see the loading actually occuring.

Bonus

Just because I appreciate your actually reading the entire post I wanted to throw in a little bonus. I wanted you to see that not only does it all work with external S3 storage and Parquet … you can track the lineage of it as well. How cool is that? Now the Dork in me wants to have 2 different storage applications to see that the two tables in the healthmart file come from different locations.

The post Diving into Parquet appeared first on Qlik Dork.

https://qlikdork.com/?p=85492
Extensions
Decentralizing Metadata
AdminApplication AutomationData ModelingUser AdoptionVisualizationMaster ItemsmetadataQlikQlik SenseQVD

Prerequisite In my previous post on The Importance of Metadata I shared an example of some data values that looked crazy when viewed without the metadata about them. Then I shared how you could add that metadata into your Qlik… Read more ›

The post Decentralizing Metadata appeared first on Qlik Dork.

Show full content
Prerequisite

In my previous post on The Importance of Metadata I shared an example of some data values that looked crazy when viewed without the metadata about them. Then I shared how you could add that metadata into your Qlik Virtual Data warehouse files/tables. Normally I would simply put a blurb about “in my previous post … blah blah blah.” In this you should consider it a prerequisite so you understand the data involved, the metadata involved and more importantly: the concept of decentralizing the metadata to other developers not just yourself. If you don’t understand those things this post might go way over your head.

One person that totally understood was Armando Lopez. Before there were even any other comments on my post, Armando wanted more. “Qlik Dork it’s awesome that you can decentralize all of that delicious metadata with other developers, those who can read script files, and those who can understand the Data Model Viewer … but what about my users? They are BEGGING ME FOR IT?” Ok, I shared the image so you know that’s not exactly how he phrased it. I might have taken liberties for a smile.

Story as old as time. End users see lots of different values for the same "term" and aren't sure they can trust any of them. They are clammering for the metadata so they know if they should trust the answers. If they can't trust the answers, they can't act on the answers.

Armando asks about decentralizing metadata to end users as well as developers

Decentralizing Metadata to End Users

I’ve never met Armando’s end users. I’ve never even met Armando. But with my ESP I can tell you exactly how the conversations went with Armando, and his end users, and between your end users and you —- “Our company has many different ways that the same things can be measured. This report says the KPI is 17, that report says the KPI is 20 and that report says the KPI is 14. I can’t take ACTION because I don’t TRUST what I’m looking at.” Or worse, every meeting ends in a fight over the numbers and the only action items are “Dalton you go off and get the detail to prove to me why the number is 17, and Armando your action item is to prove why the number is 14.”

I was in that environment for 30 years … so, while I have never met Armando’s end users I do know their pain. There is no such thing as a “single source of truth.” There is a single source of truth by department, or organization, or even managers. Teams/people call them the same thing, but you absolutely must know which “version of the truth” you are seeing on the screen. If you have no idea, then you can’t trust it, and if you don’t trust it you certainly can’t act on it.

Please do not miss this point

In most environments the problem is even worse … I go back and run my SQL query and I get 18, not 17 anymore. Armando runs his SQL query again and he now sees 16 instead of 14. One of the most important aspects of using Qlik Virtual Data warehouse tables/files (QVDs) in the first place … is the fact that the data in them is a point in time snapshot and is CONSISTENT. While our calculations are obviously different, the data itself is CONSISTENT. Me and Armando can meet, share some cookies, enjoy a Mountain Dew and discuss our EXPRESSIONS/CALCULATIONS. Without having to pull our hair out, or doubt ourselves. Because each of us can absolutely reproduce the same numbers, without worrying that the source system values are constantly in flux.

Please do not miss this point either

As a customer who loved QlikView, I saw Qlik Sense for the first time in November 2014 and it changed my world. Not because it was new. Not because of responsive visuals that resized and changed what was displayed as the user changed what platform or size of screen they were using. The reason was simple … MASTER ITEMS. The ability to add the metadata and the heavy duty coding into a simple consumable Dimension or Measure that application designers could easily use to drag/drop and create screens and because END USERS could easily hover of a field and SEE THAT METADATA.

As a QlikView customer you better believe that I had lots of “super users.” They could copy objects and add a column. “Ooooh.” “Ahhhh.” They couldn’t change the KPI’s because they couldn’t recreate different versions of expressions they weren’t coders. Unless I hand coded metadata into each object, nobody was able to know for user which version of truth they saw. Master Items changed all of that. Now designers/super users can drag/drop them and see the definitions to know “oh I see Dalton was using this way that Division 1 considers to be right, and my department is in Division 2 and considers this way to be right and that’s what Armando was showing.

Getting personal for a moment … Master Items were so important to me, that I joined Qlik just months later to help customers globally understand their game changing power. So, you better believe I jumped in joy when Armando proactively asked for what I already had planned: “A way to expose that metadata to end users.” 🙏

Example

In the SAP world there are literally dozens of ways to calculate a value called Days Sales Outstanding and it is one of the biggest KPI’s organizations track. You can imagine how many fights might ensue in a meeting if nobody understands which of the dozens is being use. But what a difference it makes being able to hover over a value and see that in Dalton’s KPI it is a “weighted” calculation, where Armando’s is a “raw” value with no regard for the transaction value.

Master Items that allow end users to know with confidence if they are looking at the right version of the truth for their team/department/division

Weighted DSO Calculation with metadata visible thanks to Master Item

Master Items that allow end users to know with confidence if they are looking at the right version of the truth for their team/department/division

A different DSO calculation that is NOT weighted and the metadata is clear

 

QVD to Master Item

Now that we are all on the same page:

💡We believe that decentralizing metadata is important

💡We know how to do that in the QVD files themselves

💡We know that Qlik Sense provides the means to share that information to designers and end users

The next obvious thing we need to do is move all that wonderful metadata about the fields into some Master Items. I could have responded to Armando “Just copy/paste it one field at a time.” It’s not like we provide any means of Automating that task.

Oooh wait a second, we do have that Qlik Application Automation thing-a-ma-jiggy. Let’s think this through for a second because we just might be on to something here.

The Task of Decentralizing the Metadata

The manual task to decentralize the data even further to end users would simply be to go through all of the fields in an application one at time, copy the metadata, create a new Master Dimension, then paste the “comments” into the Description field, and then copy the tags for the field and paste them into the “tags” for the new Master Dimension. Oh and press Create of course.

Blank screen to Create a New Master Dimension

Blank screen to Create a New Master Dimension

Some of you reading are thinking “that doesn’t really seem so bad, I can do that manually with no problems.” At least I don’t have to think about using that Qlik Application Automation stuff. I’ve never really understood it, and who knows, it might make my head hurt. 😊

Others of you may build  applications that might have hundreds of fields in them. Like the SAP Accelerator for Orders to Cash application that I built to help our customers accelerate their journey. That might really stink to have to go through that manual process 1 field at a time. So, maybe, just maybe, this is a good time to finally take a look into Qlik Application Automation.

Quick Summary screen that utilizes system tables/fields to help the end user understand the volume of data inside the application.

Quick Summary screen that utilizes system tables/fields to help the end user understand the volume of data inside the application.

Automating the Task to Decentralize the Metadata

In this post I am not going to teach you everything about Qlik Application Automation. I’ve already created an entire playlist on YouTube that you can watch to learn the basics and the concepts that you can watch. 

What I will do is walk you through the algorithm/logic I used to automate the process of decentralizing the metadata with end users. As I’m going to share this Qlik Application Automation with you I wanted to start by displaying a picklist where you could simply choose the application you wanted to do this for. I could have been cheesy and hardcoded a single application, but you know that’s not how I roll.

While you can build an Application Automation for a hard coded application id, it's just as easy to allow a pick list so you can agilely do the same things as needed in more applications

Dropdown list showing all of the application names and ID’s for a particular space

Step 1

How did I present the list? Easy breezy lemon squeezy. I use a “List Applications” block that iterates through all of the applications. In my case the list of “all applications” is a bit overwhelming. So, I used the setting in that “block” to just choose my “personal” space where I knew the applications were that I would test with. As it looped through each one I simply added the Application Name and the ID to a Variable that itself was a list. Notice in the image above that I wrapped the ID in square brackets “[ ]” that’s important momentarily.

Screenshot from the Qlik Application Automation that lists the applications and builds the list previously visualized

Screenshot from the Qlik Application Automation that lists the applications and builds the list previously visualized

Step 2

Next I simply created an Input Block. Normally when you see them, like you did if you watched my playlist for Qlik Application Automations, you simply see an input field. In our case, I needed to utilize the Dropdown type instead. As for the options … too easy. I just handed it the list of application names and ids’ created above. Voila easy input box that I (you) can click and choose the desired application.

Screenshot from the Qlik Application Automation that uses the list for the dropdown input

Screenshot from the Qlik Application Automation that uses the list for the dropdown input

Step 3

Now that an application was chosen, I need to create a variable that would contain the Application ID. That’s what other needed blocks need as input. I previously made a point of indicating that I was wrapping the Application inside of square brackets. That was so that I could easily use a REGEX expression to parse the ID out from inside those brackets.

Screenshot from the Qlik Application Automation that uses a regex expression to parse out the selected application id from the dropdown list

Screenshot from the Qlik Application Automation that uses a regex expression to parse out the selected application id from the dropdown list

Don’t be too impressed. I knew I needed a REGEX expression. I knew it would work. But I don’t recite REGEX syntax in my sleep. So, how did I come up with the right set of jibberish?

Chat GPT of course. 😆

While I don't advocate just naively using any code Chat GPT recommends, I certainly do in cases like this where you need a regex expression you don't quickly know how to build by hand. If it's wrong the function will simply fail

Screenshot from Chat GPT showing one of the things it is really good at … building code you may not be strong in

Step  4

The next step was to have the Automation “open” the selected application and get the important information in it … LIKE THE LIST OF FIELDS. Then I simply looped through them.

Screenshot from the Qlik Application Automation showing you can ask for the application information about your selected application and then loop the fields it contains

Screenshot from the Qlik Application Automation showing you can ask for the application information about your selected application and then loop the fields it contains

Step 5

Before you say “I get it … then you simply used a block to create a Master Dimension for every field” … there are 2 things you should consider:

  1. Qlik maintains a series of “system tables” and “system fields.” They are values you can reference inside of your application. For instance that Quick Summary table object I used to illustrate that my SAP Accelerator had 963 fields. I didn’t really want those to become Master Dimensions.
  2. That value 963 fields might be a little overwhelming for application designers or end users to scroll through.

With both of those things in my mind I decided that I only wanted to “Create or Update” the Master Dimensions for those fields that had a value for the table they were sourced in, and that had a tag value of “%Dimension” that I had to use inside my load (creation) script as a clue … “Hey Automation this field is important so please create a Master Dimension” for it.

While it's awesome to have work done for you that saves you massive amounts of hours ... always consider the time for the people for which it's intended. You will almost certainly want to apply conditions to which of your fields you wish to create the master dimensions for

Screenshot from the Qlik Application Automation showing you can apply conditions so you aren’t just creating dimensions for every field in the application

Step 6

Immediately upon reading my conditions you might have thought of your own. That’s a great thing. The great thing is that you easily add as many conditions and decide if they are AND’s or OR’s. I simply chose something simple to show you the logic.

The most important part of this Qlik Application Automation is this step … Creating or Updating a Master Dimension with all of that previous metadata. In the image you can see that I simply said to create the Dimension with the same name as the field, decentralize the metadata in the Description field, and then add the tag metadata in the tags field. If the Master Dimension was already created, it is updated. If the Master Dimension did not exist, it will be created.

Screenshot from the Qlik Application Automation showing the good stuff ... creating a master dimension to further decentralize the metadata for end users and designers

Screenshot from the Qlik Application Automation showing the good stuff … creating a master dimension to further decentralize the metadata for end users and designers

Hands on to Decentralize your Metadata

I built this Qlik Application Automation to be shared. I wanted y’all to actually to be able to use it. So, here is the link where you can pull this down from a GitHub repository I created to share Qlik Application Automations.

Click here to go to the Qlik Dork’s GitHub Repository to get the Create Master Dimensions automation

If you really are new to Qlik Application Automations you might be saying “Gee, thanks for a JSON file Qlik Dork now what do I with it?” Well be sure to read this Qlik Community post that walks through how to import the JSON you download from me, into your tenant. Also be sure and watch the video playlist I shared earlier in this post to understand more about them.

Summary

The purpose of this post was to help you further value and appreciate the metadata you have, and help you further decentralize the metadata to your users as well as developers. If you are trying to increase adoption, and grow trust to encourage action … this is something you really need to focus on. I used a Qlik Application Automation to greatly simplify and speed up the initial process. When a new application is built from the QVD’s, simply load them, and before you start building screens … run this automation to decentralize the metadata into Master Dimensions to help you get started.

The End

Not really. Because I know some wise guy out there is going to respond in comments here, or on LinkedIn and say “Well that was really slick. But I’ve got a manager who really wants all of this deliciously powerful metadata available outside of the Qlik environment.” I hope they don’t because my goodness I’m not sure how I would respond. 😉

 

 

 

The post Decentralizing Metadata appeared first on Qlik Dork.

https://qlikdork.com/?p=85463
Extensions
The Importance of Metadata
CodingData ModelingmetadataQlikQlik SenseQlikViewQVD

Setting the Stage Imagine that you are an average ordinary Qlik Developer. You start to create a new application for some healthcare big cheese so you find the virtual data warehouse of QVD files and all goes well. The data… Read more ›

The post The Importance of Metadata appeared first on Qlik Dork.

Show full content
Setting the Stage

Imagine that you are an average ordinary Qlik Developer. You start to create a new application for some healthcare big cheese so you find the virtual data warehouse of QVD files and all goes well. The data loads just fine and just for giggles and grins you utilize the Data Preview in the Load Script to see the values and panic hits you. What on earth is wrong with the Age field? That Qlik Dork guy is falling apart and could be 99 for all you know. But Dan and Levi? C’mon they can’t possibly be that old. Suddenly you are reminded of the importance of metadata. If you knew how that field was defined, it might help you understand those crazy values. Right?

Without metadata anyone who sees ages of 999 could totally freak out

Anomalies in data showcase the importance of metadata

Comments = Metadata

Since I first started coding I had a love/hate relationship with comments. You might know the feeling. You want to type your code as fast as you can while the algorithm is fresh in your head. It will be quick … you don’t need comments in your script.

After sleeping for a few weeks/months/years … you have to go back in and edit that comment-less code. You smack yourself in the head because without comments, your well thought out algorithm and flow now makes no sense. Why did you have an IF statement for a code value of 930XQ? After spending 5 days of trial and error, the spider webs begin to clear out of your head. You are able to update the code and this time you do what your first coding instructor told you that you should always do … you add comments to the code. Now you, and any other developer who looks at your fine work can fully understand it.

What does any of that have to do with the work you do in extracting and transforming data and storing those data tables into QVDs for other developers?

Everything. Without metadata (comments from you describing the data) those QVDs in a Qlik Virtual Data warehouse environment, the data is like a bunch of spaghetti code with no comments. When an anomaly arises, weeks of trying to clear the spider webs.

Creating Metadata with Comments

It was so important to the developers they literally called the function to add “metadata” about the data “Comment.” You are able to add the much needed secret decoder metadata with table comments and field comments. They even provided a way for you to add metadata tags to your fields.

Metadata is added to tables and fields simply through the use of the Comment or Tag functions. Seeing the description for the Age field makes the values 999 make a lot more "sense" (pun intended)

Metadata is added to tables and fields simply through the use of the Comment or Tag functions

I know what you are thinking “Qlik Dork you would still have the same problem loading QVD files because you won’t see the code that extracts and transforms and stores the QVD files into the Qlik Virtual Data warehouse.”

Metadata Travels

The beauty is that all of the metadata comments or tags you add … actually become part of the data model and the QVD files themselves. Because Qlik absolutely understands the importance of metadata in helping everyone trust the data.

The Metadata you add to tables/fields is revealed in the data model viewer because Qlik understands the importance of that metadata to data modelers and developers

Data Model Viewer of the creating or utilizing application reveals the Metadata

While you are 100% correct … I will never see those comments in the application you built, me and any developer who has to maintain your application can quickly understand why the Age values are so high. You were using artificial intelligence to gain an understanding of the cognitive abilities. No doubt in any of our minds that cognitively Dan Wizard Pilla and Levi TheMan Turner are clearly working like someone with the wisdom gained from 999 years.

That few minutes of creating comments/metadata just saved both of us weeks of panic and work. I sure hope you added metadata for the Encounters table I loaded as well. Because understanding the Arrival and Discharge times are NOT the values when someone enters data into the Electronic Health Record will be huge, especially if the “know it all” down the hall presents screenshots of the electronic health record system showing your times don’t match. “Oh good you did. You must have had that issue in the past.”

Code revealing how to add metadata to the Encounters table

Code revealing how to add metadata to the Encounters table

Why isn’t PatientID commented for the Encounters table?

I’m just asking for a friend. I applaud you being so vigilant in your work, but the PatientID field is in the Encounters table, but you didn’t add a comment for it. I will never see your code anyway, but they might and they might be curious. Or if you realize I’m teasing you and I wrote the above code and you are seeing it, you might be curious.

The answer is the same as for the question about “what makes Qlik so unique and so powerful?” The Associative Engine of course.

Fields don’t exist as part of “tables.” They just exist in their own symbol tables, and bit pointers map the associations across everything.

Luckily, there is an alternative method of adding all this wonderful metadata and that is through a variation of the “Comment” function that lets you create mapping tables and simply “map” all of the comments/tags at 1 time. Faster to type without all of the syntax characters like line by line, and I prefer this method because it won’t confuse anyone either. That’s kind of cool. Right?

Code that reveals that rather than adding metadata 1 table/field/tag at a time you can create mapping tables and add all the metadata from them. Saving you time.

Code that reveals that rather than adding metadata 1 table/field/tag at a time you can create mapping tables and add all the metadata from them.

Maybe this easier method will help you lean a bit more towards love, than hate, when it comes to adding the needed metadata.

Summary of The Importance of Metadata

The goal of this post was to help you realize how important it is for you, and to encourage you “the data modeler:”

  • To really take to heart the importance of metadata for yourself and other developers
  • To understand how that metadata makes data so much clearer for everyone involved, especially in a decentralized development world
  • To take the few minutes needed to input key metadata with: the future you, anyone that later maintains the code, or tries to utilize the QVD files that you have contributed to the organizations Qlik Virtual Data warehouse library will thank you.

If you have already been doing this I’d love to see your comments about how valuable it is in your organization. 😎

If you didn’t know this could be done and are open to sharing I’d love to read your comments about the times you sure wish you had. 😉

Bonus Concept

While I won’t go into a deeper dive I did want to bring up a bonus concept. One that just hit me when looked at the image of the comment/tag maps. Purely by accident of course. Not pre-planned in anyway of course.  😉

“What if that metadata was read from a catalog system or database instead of being hand typed?” Interesting huh?

Metadata: The Secret Decoder Ring of Data Quiz

Unlike other recent posts … you are off the hook and there is no quiz. You either understand the importance of metadata after reading this, or it is me that has failed. 😭

The post The Importance of Metadata appeared first on Qlik Dork.

https://qlikdork.com/?p=85450
Extensions
Visualizing a Knowledge Graph?
Data ModelingVisualizationdata visualizationHealthcareKnowledge GraphLink TableNetwork ChartQlik SenseSemantic Layer

Goal While on a recent journey to learn more about Semantic Modeling and Graphing Database technology, I set off to try and visualize a Knowledge Graph. Oh sure I could have created some fake sales data with 2 or 3… Read more ›

The post Visualizing a Knowledge Graph? appeared first on Qlik Dork.

Show full content
Goal

While on a recent journey to learn more about Semantic Modeling and Graphing Database technology, I set off to try and visualize a Knowledge Graph. Oh sure I could have created some fake sales data with 2 or 3 simple tables, but you know that’s not how I roll. I wanted to visualize a knowledge graph that was more real world.

I wanted to visualize a knowledge graph that was complex and made me think.

I always talk about how complex Healthcare data is … so I thought “If I could walk the walk, instead of just talking the talk about how complex Healthcare data is, then I would really be able to make a point.”

So, I sat down with my headphones on and 15 minutes later I had this beautiful Network Chart completed. All my healthcare friends would be pleased that I put patients at the center of the universe, and selected good ‘ole Dalton Ruer as a demonstration. Surrounding Dalton are all of his encounters. Each encounter has Procedures performed, Care Plans that were issued, Diagnosis that were made, and Prescriptions that were rendered. Around those are the details about the Procedures, Care Plans, Conditions and Medications. Along with the Providers and the details about them. If you ever wondered what a “frequent flyer” in a health system looked like … This is it. That guy clearly has issues.

Network Chart fully populated and displaying a single parent

Truth be told

I was crossing my fingers when I said “15 minutes later” (kind of fibbing a bit) because honestly, I struggled a little. Ok, I struggled a lot.

I started out great … all the data loaded and the node table I built visualized very cleanly. But alas, I filtered for Dalton Ruer … and my chart didn’t change a single bit. I tapped my keyboard ever so gently, threw my mouse across the room, and screamed … but nothing changed. WHY OH WHY was this so hard. All my lovely filters worked perfectly. I could see all of the information mentioned above in them. But nothing changed in my Network Chart changed.

This chart is intended to illustrate the struggle where the nodes used for the Network Chart had no connection to the actual data

Base Network Chart with no connection to a filtered patient

 

Data Modeling

It’s been weeks, but I’m still blushing as I share an image of the first data model that I created. Now you might be thinking “Qlik Dork you should have denormalized those end tables so you had a Star Schema, instead of a Snowflake Schema.” But that wasn’t the issue at all. So, don’t wag your finger at me. 😆

It was a much more fundamental issue. It’s ok if it doesn’t jump out at you right away … it didn’t jump out to me either. (Which is a good thing because my struggle led to this post which will hopefully help you alleviate a struggle in the future.)

My Classes data island worked great, until I made a selection because it had no connection to the actual data

Data Model view of my problematic data model

Network Chart

Before I go much further, I should give you some details about what  Network Chart is and the values required to make them work. A Network Chart is a Qlik Sense Extension available as part of the Qlik Visualization bundle. Simply go to Custom Objects and drag/drop one to your screen.

A Network Chart is a specialized visualization extension that can be used to display knowledge graphs

Illustration of where you find a Network Chart

A Network Chart takes  a series of parameters. 4 Dimensions and 1 Measure that can be displayed as Help.

The values in my Classes table mapped directly to the properties required for a Network Chart

Network Chart properties match what is needed to visualize a knowledge graph

 

The Node Identifier is like any other tables “id/key” and needs to a unique value.

The Node Label is what will be displayed when you select the node. The splash image and the image for the Goal section above, demonstrates that.

The Node Parent is a reference to the Node Identifier of the parent node.

The Node Group is really for coloring. In this posts case, it distinguishes Patients from Encounters … etc.

The Tooltip measure is for displaying some value about that instance. In my case I am displaying the “Class IRI.” While the rest of the values all probably made sense, If you aren’t familiar with Graphing Databases, or Knowledge Graphs I’m sure that may be a new term for you. IRI stands for Internationalized Resource Identifier, and is what I’m displaying below the Patient Name in that image in the Goal section. You can read more about them by clicking here. 

Back to the Story

Before I dig into the real data, you need to understand the basic pattern. The image in the “Truth be Told” section was created by starting with this basic Inline code block. A Patient has no parent. An Encounter has a Patient as it’s parent. Other Classes all have the Encounter as their parent. All of it’s Class Groups are 1, which is why in that image all Nodes in the Network Chart are the same color.

I always recommend starting with something simple like this where you simply hand input nodes. So that you can quickly prototype and visualize something like a Network Chart

Image of the base code for the Classes table to help you understand the concept

 

With that foundation I went through a series of walking my actual healthcare information tables.

Pretty simple pattern of how to add information to the Classes table once you are sure you understand the parameters needed to visualize the knowledge graph

Image of code actually used to add my real data into the Classes table to feed our network chart

 

Easy breasy. I looked at the Preview of the Classes table and could see all of my entries. They were perfect.

So, why in the world wasn’t the Network Chart reflecting my selection of Dalton Ruer?

Data Model Modification

If it didn’t stand out to you before, if you go back to the image of the Data Model you will notice what I eventually realized … The Classes table is what is referred to as a Data Island in Qlik terms. It had absolutely no connection to my Patients, Encounters or anything else. Literally out on an island all by itself.

If you look at the script code above you will notice that Key/ID values for each of my real table’s values were renamed as [Class ID] so that they could be displayed in a common manner. After smacking myself in the head, and then laughing, I realized there was no chance of any filters working.

I quickly went back to the code and also added each value so that it would be associated with my actual tables.

Since my Classes table was originally a Data Island I updated it to include the ID's that associated it back to all of my entities.

Update to code to ensure the ID’s for each of my classes were also part of the Class records

 

While that would certainly contain the Associations needed … Can you guess how that worked out for me?

If you guessed “Horribly Horribly wrong” you really understand Data Modeling with Qlik. I had multiple tables referring to the same combinations of fields which created Synthetic Keys. But worse was the fact that a Circular Reference was created. Uggghh.

Nothing can break your heart more than seeing a horrible data model like that in the image. What sounded good on paper, and when typing the code, clearly went sideways and upside down

Image of a horrible data model with a circular reference and a slew of synthetic keys

Data Model Rewrite

It’s all fun and games laughing at myself now, but I assure you at the time I was losing my mind. I tried many attempts … over the course of … many days. I changed the code to eliminate the circular reference and considered the concept of simply not letting anyone see all of the synthetic keys.

But I hit another logical brick wall. While the table was associated, and my filter for my boy Dalton Ruer did show the Patient node for Dalton … all of the other nodes were then excluded. So, all I saw was Dalton and nothing else.

When you try to hide your data modeling issues, even if nobody will see them ... someone will eventually uncover them when visualizing the data

Image of the Network Chart that only showed the selection and no other values

 

Clearly that wasn’t optimal. Dalton was the parent record, and the only record not excluded from the Classes table was (… insert drum roll here …) Dalton as the Patient.

Light Dawns on Marblehead

I realized I needed a Link Table model where all of my ID’s were in 1 table if I wanted to be able to visualize the children and the selected parent(s). If you aren’t familiar with the concept of a Qlik Link Table, then be sure to check out my previous post Creating a Qlik Link Table.

Seriously, if you aren’t already familiar with Qlik Link Tables be sure to read it. If you read it, and wondered why I made a big deal of creating a Link Table just to handle 2/3 tables, the image below will clear that up for you. I needed a single fact record that related all of the information so that when Dalton Ruer was selected, I still knew how to visualize all of that other beautiful information.

A Link Table was the only way to go to resolve this particular type of goal where you are visualizing data in a traditional way as well as using a Network Chart so that filters were honored, and you could also see the children

Cleaned up data model with a Qlik Link Table so all facts are together

 

Yeah you got the Star Schema you wanted when you were wagging your finger at me. But most importantly I had a structure now that was more fitting of visualizing a Knowledge Graph. The final step in my Link Table was creating a Graph Key column that was a unique identifier for each row.

My Link table would still be separated from the Classes table until I added a unique identifier for that concatenation of all my individual fact ids. I chose RowNo() instead of actually trying to concatenate all of the values.

RowNo() added for the Link Table so that each row of parent/child facts was unique

 

Graph Key for the Win

That unique Graph Key was then the field that could associate the Nodes in my Classes table with the data itself. The Link Table solved the problem of only being able see any 1 Node if I filtered to something, while this Graph Key provided the Association to the nodes when I made selections. The code was the same for each class of data. I built a bunch of mapping tables to define what value(s) I wanted to use for each classes label.

For the child classes I used a concatenated key comprised of the GraphKEY value and the data ID for the Class ID.  Why? You probably guessed … The same Encounter ID had lots of children and when I first just used the Encounter ID as the Class ID none of them were distinguishable. Ooops! I shouldn’t have shared that I had made yet another mental mistake. 😂

This is the first blocks of code used to build my correct Classes table. The GraphKey serves to link the Classes table to my LinkTable.

Code to build the Classes table from the Link Table

 

Data Model Viewer image showing the GraphKEY as the association between my Link Table and all the data and the Classes table used to visualize the knowledge graph

Data Model Viewer image showing the GraphKEY as the association between my Link Table and all the data and the Classes table used to visualize the knowledge graph

 

Goal Achieved

The goal of this post wasn’t to argue for or pitch for any data modeling technique over any other. I had a perfectly good associative data model on my first attempt that worked great until the requirements changed. In this case, the requirements were to visualize the data within it, in the form of Triples for a Knowledge Graph, by using the Network Chart visualization. That “oh by the way can you show me … ” feature change, was what caused the need for a different data model approach. One that was able to be achieved by using a Link Tale.

Per the starting image the end result was one in which I could visualize the complexity of all of the data for Dalton Ruer. The following image demonstrates that I am also able to visualize multiple patients if desired as well. Appears like my boss Hugo Sheng is competitive and is trying to catchup on visits.

Image of the goal being achieved illustrating I could pick multiple patients and see the knowledge about both/all of them

Image of the goal being achieved illustrating I could pick multiple patients and see the knowledge about both/all of them

Disclaimer

Admittedly what I visualized in the Network Chart is not a complete Knowledge Graph. Just one representation of the Knowledge it would contain. 

 

Study Guide Questions
  1. What type of data did I use to illustrate the process of visualizing a knowledge graph? Why did I choose this type of data?
  2. What specific Qlik Sense visualization tool is used for creating a knowledge graph representation?
  3. What are the key parameters required by the Network Chart in Qlik Sense? Briefly describe each.
  4. Why did my initial data model prove unsuccessful for visualizing the knowledge graph?
  5. Explain the concept of a “Data Island” in Qlik Sense.
  6. What data modeling problem arose when I tried to directly connect the “Classes” table to the rest of the model?
  7. What is a Qlik Link Table, and why did I decide to use it in their revised data model?
  8. How did the introduction of the “Graph Key” column in the Link Table contribute to the successful visualization of the knowledge graph?
  9. What was the ultimate goal of my exploration of data modeling techniques in this post?
Answers
  1. I used healthcare data to illustrate the process, because of its inherent complexity and real-world relevance, demonstrating the power of knowledge graphs in handling intricate relationships.
  2. The Network Chart, a Qlik Sense extension available in the Qlik Visualization bundle, is what I used for visualizing a knowledge graph.
  3. The Network Chart requires: Node Identifier: a unique identifier for each node, like a primary key. Node Label: the displayed text when selecting a node. Node Parent: a reference to the identifier of the parent node. Node Group: used for color-coding nodes based on category. Tooltip measure: displays a value associated with the node, like an attribute.
  4. My Classes table with the node information was a data island and lacked connections to the other tables, preventing filtering and proper visualization of relationships.
  5. A Data Island in Qlik Sense refers to a table that is not connected to other tables in the data model. The “Classes” table initially acted as a Data Island, hindering the visualization as selections in connected tables could not propagate to the isolated “Classes” table.
  6. Directly connecting the “Classes” table led to synthetic keys due to multiple tables referencing the same field combinations. Additionally, it created circular references in the data model, making the model illogical and unusable for analysis.
  7. A Qlik Link Table is a central table connecting multiple fact tables, simplifying complex relationships. I used it to avoid synthetic keys and circular references, providing a clean and efficient structure for representing the knowledge graph relationships.
  8. The “Graph Key” column in the Link Table served as a unique identifier for each row, linking it to the “Classes” table. This connection enabled the Network Chart to associate specific data points with their corresponding nodes, allowing for dynamic filtering and visualization of the connected data.
  9. My goal was to highlight the importance of choosing the right data modeling approach based on specific analytical/visualization needs. While an associative model might work for general analysis, visualizing a knowledge graph required a Link Table structure for clarity and accurate representation. We all have situations where users introduce new requirements into the situation that may force data model changes. Give yourself grace when that happens.

The post Visualizing a Knowledge Graph? appeared first on Qlik Dork.

https://qlikdork.com/?p=85347
Extensions
Creating a Qlik Link Table
CodingData Modeling

Background Before I dive into the deep end of walking you through the process of creating a Qlik Link Table, I want to share a tiny bit of background. As I was learning Qlik one of the most impactful things… Read more ›

The post Creating a Qlik Link Table appeared first on Qlik Dork.

Show full content
Background Stephen Redmond's masterpiece book "Mastering QlikView" has been my go to resource for over 10 years.

Mastering QlikView by Stephen Redmond

Before I dive into the deep end of walking you through the process of creating a Qlik Link Table, I want to share a tiny bit of background.

As I was learning Qlik one of the most impactful things I ever read came from Stephen Redmond‘s “Mastering QlikView” book. I was really struggling to grasp the concept of the AGGR function, which according to so many experts could do all sorts of awesome things. The problem I struggled with, which you may as well, was the fact that the AGGR function does all it’s wondrous things IN MEMORY. You can’t see it and I’m a visual person.

Stephen said something profound to me, that I will soon be sharing with you in this post. On page 275 he explained that this invisible in-memory AGGR function contained nothing more than the data you could visualize in a Straight Table.

🤯 – My mind was blown, and frankly I felt rather foolish for not realizing that sooner. This Qlik guru was basically saying “Dalton it’s ok for you to be a visual person and need to see it first. If you need to do that, on your way to eventually becoming a Qlik Dork, go right ahead.”

Don’t let the fact that this book is over 10 years old and has the product name “QlikView” in it dismay you from immediately placing an order for it, if it isn’t already in your library. Stephen’s writing style is very much like my own. My theme is “Let’s Make it Simple” and that is exactly how he explains everything in the book. Never forget that the engine and scripting language, behind both QlikView and Qlik Sense are the same. This is a book that will help you take your scripting, and data modeling to the next level and I’m sure you will be able to focus on the code, and not the visuals. The reason I’m giving a shoutout for the concept of providing yourself a visual to understand is something I will use repeatedly within this post.

Setup

In my upcoming post, “Visualizing a Knowledgegraph”, I will walk through a struggle that required me to change my data model. The solution couldn’t be obtained without creating a Qlik link table. Not knowing my audiences familiarity with the concept, I wanted to provide this resource that they could refer to, as well an Google searches that they might use. As you scan the web you will find various terms like “link table” or “bridge table” each with it’s own nuances and reasons for creating them. Some are even about performance. They want to eliminate the hops between entities because just as it takes you time to walk the data model, it takes the Associative Engine time to traverse.

For this post, and the hands on aspects of it, I want you to forget about all of the many nuances, and simply focus on the primary goal … the ability to connect all of our Dimensions/Facts keys in one table. While, still retaining all of the values about the Dimensions/Facts in individual tables. There is simply no better way to learn than by doing. To that end, I will be using pure inline scripts to make it easy for you to copy and paste if you would like to walk through the examples to ensure this concept is solidly at your disposal. You will seldom ever need this, but if you face an issue where you will need it, I would like you to know you are prepared and can resolve it.

Before asking you to code the changes to create a “link table” I will be using Stephen’s great advice and asking you to visualize what you will get by simply using a Straight Table. As you will see it is of great advantage throughout the process.

Associative Engine

Many people getting started with Qlik simply think of it as a data visualization tool. So, in the load scripts they simply execute a Select clause against a SQL View. The same thing they would do with any other “visualization” tool. Over time, they learn, and I sure with analysts would as well, that what distinguishes Qlik (both QlikView and Qlik Sense) from everything else, is the Associative Engine. New Qlikkies realize that while they can execute selects against SQL Views, they will continue leaving data behind, and thus missing part of the important organizational story. They begin by bringing in each table individually as I have below. Again, feel free to copy and paste this load script to a new application so you can really follow along and learn. Clearly I have greatly reduced the complexity of the health data below, as well as the number of rows. 

Patients:
Load * Inline [
PatientID, Name, Age
1, Dalton Ruer, 60
2, Qlik Dork, 30
3, Stephen Redmond,
];

Encounters:
Load * Inline [
EncounterID, PatientID, Arrival, Discharge
101, 1, 12/2/1999 1:42 PM, 12/3/1999 2:17 AM
102, 2, 12/4/2012 3:17 AM, 12/5/2012 1:08 PM
103, 4, 12/17/2023 9:01 PM, 12/17/2023 10:12 PM
104, 1, 4/12/2003 7:16 PM, 4/14/2003 9:04 AM
105, 2, 7/4/2013 8:27 AM, 7/4/2013 10:15 PM
];

If you go ahead and run this script you will see a very very simple Associative Data Model.

Qlik's Associative Engine enables you to access all Encounters, all Patients and the association between them is managed through the PatientID field that exists in both tables.

Associative Relationship between the Patients and Encounters table

 

We will have all the Patients and all the Encounters In-Memory, and we can easily refer to either table when we create data visualizations because they are “associated” with each other because the Encounters table contains the key “PatientID” that links it to the proper patient. If you really are just learning about Qlik, to Stephen’s point, you can visualize the associative engine with a simple table visualization. We have a patient with no encounters (visits) which is perfectly natural. Stephen is a healthy guy. But we can also see that we have an encounter with a Patient ID that doesn’t actually exist. Yikes a referential integrity problem. But data is data and all we can do is reflect what the source system contains.

The great part of the Associative Engine is that you can visualize the data expected to be seen as well as what you might leave behind with SQL

Typical screen showing filters and a table object showing the data we loaded. Not all patients have encounters and not all encounters have patients

 

A simple selection of EncounterID 102 — reveals the nature of the Associative Engine itself. In the table we only see what the SQL View would return if we asked for Encounter 102, while the filters show you that all of the excluded information is still in memory. It’s just excluded because of the filters.

While the table object reflects a where clause of sorts like SQL script, the filters reflect the special nature of the Associative Engine and provide the end users the ability to see what is selected as well as what is excluded

Typical filters and a table reflecting the status of the Associative Engine after selection(s) are made

Link Table

For the sake of your time I will move on to how to use a Link Table in our Data Model, but wanted you to understand why so many do stop at the above form of data modeling. It provides everything they ever dreamed of and more. As I started I have another post where I explained a really complex scenario where this form of just purely relying on Associations between all of the tables didn’t work. I needed something where all of the “facts/ID’s” existed in a single place … a “link table.” I’ve very seldom ever come across this need, but absolutely did in this case. 95% of the time you won’t need to do, what we are about to do … but when you do, you will now be prepared.

Before we go through the coding to create it … I want you to visualize what our “link table” should look like so create a table object and simply include the PatientID and then include EncounterID. Not only is a Table object good for visualizing what to expect from AGGR statements, it’s ideal for seeing the relationships/links managed by the Associative Engine:

Simple Table object with values we would expect to see when we create a link table that contains the keys to our facts

Simple Table object with values we would expect to see when we create a link table that contains the keys to our facts

Coding a Link Table

Coding this is pretty straightforward. The first thing you want to do is simply populate the table, in the same way you built the visual table. You include all of the patients, then you “join in” to include all of the encounters. Go ahead and replace your code with the following and load the data.

Patients:
Load * Inline [
PatientID, Name, Age
1, Dalton Ruer, 60
2, Qlik Dork, 30
3, Stephen Redmond,
];

Encounters:
Load * Inline [
EncounterID, PatientID, Arrival, Discharge
101, 1, 12/2/1999 1:42 PM, 12/3/1999 2:17 AM
102, 2, 12/4/2012 3:17 AM, 12/5/2012 1:08 PM
103, 4, 12/17/2023 9:01 PM, 12/17/2023 10:12 PM
104, 1, 4/12/2003 7:16 PM, 4/14/2003 9:04 AM
105, 2, 7/4/2013 8:27 AM, 7/4/2013 10:15 PM
];

LinkTable:
Load PatientID
Resident Patients;

Join (LinkTable)
Load PatientID, EncounterID
Resident Encounters;

Synthetic Key

Well cruddy crud crud. The fact that we now have both PatientID and EncounterID in the Encounters table, as well as our new link table has caused a Synthetic Key.

The great thing about the load editor is that it will alert you right away if your data model has an issues.

Data Load trace indicating that a Synthetic Key has been encountered

We are trying to simplify things, not make them look even more complicated and confuse everyone if they look at our Data Model.

This image shows that when you create a link table you need to ensure that there is only a single field that is associated.

Data Model Viewer showing the Synthetic Key problem that arises if multiple tables contain an association for multiple field names instead of just one

 

We can avoid the Synthetic Key by simply adding a single line of code at the bottom of the script

Drop Field PatientID From Encounters;

I could have added that initially, but let’s face it … you would have wondered why in the world it was there. Now you know, and when you see an issue in the future you will be reminded how to resolve it. 😂

Link Table Data Model

Go ahead and load the data. After reloading your script you will have the following data model. The Data Preview at the bottom of the script editor reflects exactly what we visualized as our target.

Many times you will never need the Data Preview. But in cases like this where we are really pushing our Data Modeling skills, it can really be handy

Data preview in the Load Script showing our LinkTable is correct

 

If you were to look at the Data Model view, you would see that we now have a clean data model. One in which both tables still maintain their “key field”/”ID” so we can access all of the values for any specific Patient or Encounter. The biggest difference is that the LinkTable now contains the relationships between the Patients and the Encounters that they had.

Corrected Data Model Viewer image of a clean data model where are two facts are linked

Corrected Data Model Viewer image of a clean data model where are two facts are linked

 

More Complexity

Now let’s add some more data into our application. Let’s face it … two tables seems like a weak example. But since my goal is to make things simple, it was an easy way to start without the need to become overwhelmed. In my post about Visualizing a Knowledgegraph I used an illustrate with a ton of tables. Don’t worry I won’t have you add the hundreds of tables that we might really need in Healthcare, we will just add two more. We have a table of Diagnosis codes, problems that patients can have. On any given encounter, a patient might be diagnosed with 1 or more diagnosis codes. The diagnosis would happen at some time, and be diagnosed by someone and someone treatment might be prescribed to take care of the issue. So, let’s add those to your load script and load the data.

Diagnosis:
Load * Inline [
DiagnosisID, Diagnosis Description
999, Mountain Dew Addiction
998, Snicker Bar Addiction
];

EncounterDiagnosis:
Load * Inline [
EncounterID, DiagnosisID, When, ByWhom, Treatment
101, 999, 12/2/1999 3:07 PM, Doctor 1, Big Gulp
101, 998, 12/2/1999 3:08 PM, Doctor 1, King Size Snickers
102, 998, 12/5/2012 11:13 AM, Doctor 1, Baja Blast IV Injection
104, 998, 4/13/2003 7:13 AM, Doctor 2, Double Gulp
];

 

Go ahead and add that script, at the bottom of your code and do a reload. You will see that your data model should now look like this. While you could live with this hybrid, Link Table model mixed with pure Associated Model, we want to change it.

Data Model Viewer reflecting that the Diagnosis table and the table showing which Diagnosis were made on each Encounter

 

Naturally we want that Encounter Diagnosis information to be part of our LinkTable. Before we try to build it, let’s imagine what that would look like, by simply adding the DiagnosisID to our simple Table object:

Don’t you love visualizing the goal before trying to do all the mental gymnastics for something you can’t see? Good thing it’s super simple to do. Go ahead and add the following and do a reload. Notice how I was a nice guy this time, and helped you avoid the Synthetic Key that would have been created had you not dropped the EncounterID field from the EncounterDiagnosis table. 😁

Join (LinkTable)
Load EncounterID, DiagnosisID
Resident EncounterDiagnosis;

Drop Field EncounterID from EncounterDiagnosis;

Yeah! A simple Preview at the bottom of the load script reveals that we met our goal. I guess we are done and should head to an early lunch.

The Data Preview in the load script editor is an excellent way to quickly visualize the values in tables in your data model especially when you are doing something modeling wise, like creating a link table, that might be new to you

Data Preview in the load script editor reflects we matched our expectations

 

Not so fast

Just for giggles and grins, let’s go ahead and add some of that new information to a simple table just to make sure our users will have what they expect.

A table object with all fields indicates we have a Cartesian product. While the data model looked clean, and the LinkTable looked correct visualizing a table object that contains all the information is the only way to know if your data model is truly sound

A table object with all fields indicates we have a Cartesian product

Well that’s certainly not right. Each combination of Patient and Encounter for Dalton Ruer and for the Qlik Dork show all of the diagnosis information from all of their encounters.

I don’t get it!!!!! Our LinkTable reflected exactly what we previewed in a Table object we should have been fine. Right?????

So, what happened? Always remember … the Table object shows the DISTINCT values of the dimensions you choose. Our preview table only contained the PatientID, EncounterID and DiagnosisID. Even though the LinkTable itself now shows the right distinct values that we expected.  And our  Data Model Viewer looks super clean the problem was a logic one.

The “Key” for the EncounterDiagnosis “fact” isn’t really EncounterID or DiagnosisID. It’s really both. But we treated it as though it was only the DiagnosisID. Creating a Cartesian product where all Encounters display all of the new Encounter/Diagnosis information.

Close examination of the Data Model Viewer shows we used the wrong field for the association

Close examination of the Data Model Viewer shows we used the wrong field for the association

 

Resolution(s)

Better order in lunch, or grab a Snickers bar because we need to resolve this. There are really 3 distinct methods we can take to resolve it.

Option 1

Simply go back to the hybrid model, where we do nothing with Diagnosis and the Link table at all. While it avoids the Cartesian issue, and gets us to lunch right away … please don’t do this.

Option 2

If your end user tells you “I will never, ever, in a million years need to see Diagnosis information other than as it relates to Encounters” … then you are free to greatly simplify things. You can simply load the Diagnosis information as a Mapping Table, and then use a Preceding Load and do ApplyMap for the EncounterDiagnosis table.

Diagnosis: Mapping Load * Inline [ DiagnosisID, Diagnosis Description 999, Mountain Dew Addiction 998, Snicker Bar Addiction ]; EncounterDiagnosis: Load  EncounterID,      ApplyMap(‘Diagnosis’, DiagnosisID) as DiagnosisDescription,     When,     ByWhom,     Treatment; Load * Inline [ EncounterID, DiagnosisID, When, ByWhom, Treatment 101, 999, 12/2/1999 3:07 PM, Doctor 1, Big Gulp 101, 998, 12/2/1999 3:08 PM, Doctor 1, King Size Snickers 102, 998, 12/5/2012 11:13 AM, Doctor 1, Baja Blast IV Injection 104, 998, 4/13/2003 7:13 AM, Doctor 2, Double Gulp ];

 

Since EncounterID is already a part of the LinkTable … we are good to go. Right?????

Data Model Viewer using the EncounterID instead of the DiagnsosisID for the LinkTable

Data Model Viewer using the EncounterID instead of the DiagnsosisID for the LinkTable

Option 3

The reality is that option 1 simply kicked the can to let us grab lunch, but now we need to resolve it. Whenever you know you are simply doing something in a hurry, experience will prove that you will spend more time and have more headaches in the future if you do it right.

I prefaced Option 2 with “your end user said never, ever, in a million years” which we all know really translates to “until next week when I change my mind.” Besides, it is somewhat misleading to have the EncounterDiagnosis information associated by EncounterID. Because as mentioned … the real “fact” key for EncounterDiagnosis is actually a concatenation of the EncounterID –AND– DiagnosisID. We neglected to deal with that. A “fact” KEY needs to be unique. PatientID’s are unique and point to a single patient in the Patient Table. EncounterID’s are unique and point to only 1 Encounter in the Encounter Table. Yet the EncounterDiagnosis is actually linked in a 1 to many manner.

Do it right the first time

So, now let’s actually deal with the issue in the way it should be dealt with all along. Ensuring that our LinkTable contains all of the relevant FACT KEYS in a 1 to 1 manner. Since our EncounterDiagnosis is really “keyed” on EncounterID and DiagnosisID we need to concatenate them. This is one of the “nuances” I mentioned you would find if you search for the topic. Notice that I’m using a preceding load for the EncounterDiagnosis table. We would NOT want the table to actually contain the EncounterID field so we simply alias that field. We could also ignore it all together.

Next we concatenate the EncounterID and the DiagnosisID to form the real concatenated key that the data should likely contain.

Finally we want to add that concatenated key to our LinkTable.

Diagnosis: Load * Inline [ DiagnosisID, Diagnosis Description 999, Mountain Dew Addiction 998, Snicker Bar Addiction ]; EncounterDiagnosis: Load      EncounterID as Encounter%,      DiagnosisID,     EncounterID & ‘-‘ & DiagnosisID as EncounterDiagnosis_KEY,     When,     ByWhom,     Treatment; Load * Inline [ EncounterID, DiagnosisID, When, ByWhom, Treatment 101, 999, 12/2/1999 3:07 PM, Doctor 1, Big Gulp 101, 998, 12/2/1999 3:08 PM, Doctor 1, King Size Snickers 102, 998, 12/5/2012 11:13 AM, Doctor 1, Baja Blast IV Injection 104, 998, 4/13/2003 7:13 AM, Doctor 2, Double Gulp ]; Join (LinkTable) Load Encounter% as EncounterID, EncounterDiagnosis_KEY Resident EncounterDiagnosis; Now we have truly met our goal to have a LinkTable which contains all of the relevant dimensional keys. If you did trust that your end users would never, ever, in a million years want to see Diagnosis that have never been used, then feel free to do the apply map as in Option 2, so that you would have a pure Star Schema. Data Model Viewer of our corrected LinkTable which links to all "facts" in a 1 to 1 manner which is the goal

Data Model Viewer of our corrected LinkTable which links to all “facts” in a 1 to 1 manner

Our LinkTable ends up Creating a Qlik Link Table Summary

95 times out of a 100 you can accomplish your visualization goals by simply using the Associative data modeling directly, and never need a Qlik Link Table.

In my other post, on Visualizing a Knowledge Graph, the problem was that I did in fact HAVE TO use a Qlik Link Table. And realized that when I wrote it, I needed this post as a primer for those who have never implemented one.

The absolute “KEY” to remember when creating Qlik Link Tables is that they should link to all of your “facts” in a 1 to 1 manner. If you always focus on that thought, you will avoid a lot of other issues introduced. If you need a concatenated key field to do it … then just create it as part of the data load.

While you may never need it, hopefully, you are now armed with a new skill or two. As well as what is one of the best books ever written (in my opinion) to help you add even more skills to your repertoire.

Study Guide

I’m going to try something I’ve never done before as an additional way to help those who are really trying to grow their skills by providing this study guide section. I will provide some questions about the material, as well as the answers to them if you scroll further.

Questions
  1. What is the primary function of Qlik’s Associative Engine?
  2. Why might a simple SQL SELECT statement be insufficient when loading data into Qlik?
  3. Describe the situation in the source material where a referential integrity issue was encountered.
  4. What is a Qlik Link Table and what is its primary purpose?
  5. What issue arises when both the Encounters table and the LinkTable contain both PatientID and EncounterID?
  6. How can the synthetic key problem be resolved?
  7. Explain the Cartesian product problem that occurs after adding the Diagnosis and EncounterDiagnosis tables.
  8. What are the three options presented to resolve the Cartesian product problem?
  9. Why is Option 2 (using a Mapping Load and ApplyMap) potentially problematic?
  10. Describe the “Do it right the first time” approach (Option 3) and why it is considered the most robust solution.
Study Guide Key
  1. Qlik’s Associative Engine enables the exploration of relationships between data in memory, allowing users to make selections and see how those selections impact other data points across various tables.
  2. A simple SQL SELECT might not capture the rich relationships between data tables that Qlik’s associative engine leverages. It might also leave out data that is important for understanding the complete story within the data.
  3. The source material shows an encounter record (EncounterID 103) linked to a non-existent patient (PatientID 4), highlighting a data inconsistency.
  4. A Qlik link table is a central table containing keys from various fact tables. It serves to establish connections between different data elements and simplify complex data models.
  5. The presence of the same field combination (PatientID and EncounterID) in both the Encounters and LinkTable leads to a synthetic key issue, creating ambiguity in the data model.
  6. Dropping the redundant field (PatientID) from the Encounters table resolves the synthetic key problem, ensuring a clear and unambiguous data model.
  7. Adding Diagnosis information without a proper key results in a Cartesian product where each Patient/Encounter combination incorrectly displays all diagnoses. This occurs because the distinct values shown in the initial preview table don’t reflect the actual relationships in the data.
  8. Option 1: Revert to a hybrid model (least desirable), Option 2: Use Mapping Load and ApplyMap (potentially limiting), Option 3: Concatenate keys for accurate representation (most robust).
  9. Option 2 relies on the assumption that users will never need to analyze unused Diagnoses, which might prove inaccurate in the future. It also creates a misleading association based on EncounterID instead of the combined EncounterID and DiagnosisID key.
  10. Option 3 involves concatenating EncounterID and DiagnosisID to form a unique key for EncounterDiagnosis, ensuring a 1-to-1 relationship in the LinkTable. This accurately reflects the underlying data structure and prevents future issues.

The post Creating a Qlik Link Table appeared first on Qlik Dork.

https://qlikdork.com/?p=85366
Extensions