Data & AI
News & Insights
Powering Data Pipelines: XML Ingestion with Databricks Autoloader
Many of us have used, and maybe still use, XML as a standard format for various applications and industries in web services, APIs, and data integration scenarios for exchanging data between different systems, platforms, and programming languages.
Thinking outside of the Power BI box
I’ve had a recent experience with a client. We’ve stood their first Azure modern data platform. They’ve data streaming into the cloud, stored in a lake, transformed into various business data layer and presentation layer objects.
A Practitioners Guide to Databricks vs Snowflake
When comparing Databricks and Snowflake across various features and capabilities, it is evident that Databricks holds a competitive edge for TCO sensitive organizations seeking a unified analytics platform that supports all their data, all their users and all their use cases.
Taxonomy is the scientific name used predominantly to describe and classify organisms. Traditionally taxonomy uses hierarchical classifications to help scientists understand and organise the diversity of life on our planet.
“Office-Like strategy” – the March 2023 Power BI update takes UX in a very cool new direction
Since turning on the new on-object interaction opt-in preview in the March Power BI update, I must confess: I’m struggling with it, but I want more!
Artificial Intelligence (AI) for Business
Artificial Intelligence (AI) is rapidly becoming an integral part of our lives, and it has the potential to revolutionise every aspect of our existence.
What can Data Governance tools not do for you?
Data Governance tools have been in the market for more than 20 years, and throughout that time the vendors have been promising success, but the reality is that there are very few successful Data Governance implementations.
Data Modelling in Big Data Solutions – why bother
When I was a teenager, I read a quote that stayed with me. It was a comment by a character in a novel by David Eddings: "All the books in the world won't help you if they're just piled up in a heap."
Manage your fragging VertiPaq dictionary!
With organisations starting to realise the many benefits of Power BI Premium Capacities and utilising the golden dataset approach, more and more we’re pushing up against dataset size limits.
Ingesting Microsoft Excel using Apache Spark Structured Streaming
Everyone who has built data ingestion pipelines for the Internet of Things (IoT) using Apache Spark will be well versed in Apache Spark Structured Streaming.
Building a Power BI solution that encourages self-serve report development
We spend a lot of time writing about developing organisational Governance and Best Practices. It is one thing to set up a successful community of Governance and install and deploy Power BI; it’s a completely different thing to get report writers to actually use it.
Do you love Data, and know how to use it effectively?
To people in the Business Intelligence and Analytics world this question is like asking, do you like air? We love data! Not just for the sake of it, not just because we like looking at rows and rows of information, it’s not anything truly weird like that… We love data for the things it can do for us, the questions it can answer and efficiencies it can create.
Eagerly chatty bots
The release of the ChatGPT dialog chat bot has been a recent popular topic. From those just playing, others getting “help” to write their school essay, to experts involved in language processing, many have had a go and tried it. Perhaps you have already chatted with it for a few giggles and were surprised at how coherent, diverse and natural the responses were. Better yet, it seems that it can understand the context of your questions!
Gathering Data Projects requirement
In the last 3-5 years the variety of data and ways to onboard, process and visualise data using various tools and technologies have grown exponentially. Before 2015, if you worked on Microsoft platform data projects, you had limited tools like SQL Server for storage and transformation, SSIS for integration, SSAS for modelling and SSRS for reporting.
Unit Testing - let’s do this!
In a previous article we explored the why of Unit Testing. We discussed the main benefits and outlined reasons you and your team should be using Unit Testing as one aspect of your overall Testing Strategy. Implementing Unit Testing gives you benefits, such as having confidence in code, striving for better coding practices, and producing improved code.
The right house for your data workload: Data Lakehouse or Data Warehouse?
With businesses having an ever increasing volume of data, it is important to carefully consider the specific needs and requirements of your organisation when selecting a data architecture that is well-suited to business needs and can support the long-term growth of your data management and analysis systems.
A picture is worth a thousand words…
We have all heard the saying, and for the most part it is so very true. If we use the fact that a “word” is defined as two bytes, then a picture that is worth a thousand words will most likely be a very low-resolution image at 2 KB – not a lot of information in that! However, if I use that space to create a chart to display the sum totals of say ten million rows of sales data, I can convey much more than a “thousand words.”
Are your data visuals making you pie-eyed?
When you find yourself spending more time interpreting a visual than absorbing the information it contains, there’s a good chance you’re using the wrong visualisation technique. By breaking it down and understanding where the gaps in presentation are, you can hone your chart selection and formatting to get the most from your data.
Christmas Data Governance
As it’s getting close to Christmas, it’s time for the annual summary newsletter from the Data Governance team, here at Christmas Inc. As usual for us, it’s been a busy year ensuring that the data we collect here is accurate, timely, understood, secured and available when needed, to support our organisation’s mission statement of “Delivering sustainable joy”.
The Importance of Data Lineage and the Business Benefits
Data Lineage is the transparency and visibility over the journey of data as it flows throughout the business. Good data lineage provides the traceability and ability to answer where you got the data from, the transformations that happened to the data and what else has happened to the data on its journey to a report, dashboard or connected system.
The importance of Data Platform Assessments
“Data! Data! Data! I can’t make bricks without clay!” as Sherlock Holmes explained in ‘The Adventure of the Copper Beeches’, highlighting the importance of data. This has now become common knowledge and modern enterprises of all sizes have already acknowledged the catalytic force of the data they create and consume, to optimally analyse and use the data to make informed business decisions.
Top 12 Data Integration Challenges
Integration of data into a data lake or data warehouse is one of the core functions of any modern data analytics or reporting solution. In this article, we look at the top 12 considerations when implementing data integration for an Azure based data solution.
Data Olympics Go for Gold!
Gold is the epitome of sporting achievement. Sports people all over the world aspire to win a gold medal in the Olympics. When they first start their training, they will create a plan to set out short, medium and long-term goals. Short term goals might be to compete in a local event and finish in the top three. While a long-term goal might be to achieve gold in the Olympics.
Elastic Map Reduce (EMR) vs Databricks – Development examples
There are plenty of articles comparing EMR and Databricks. Most of them mention that EMR has a much steeper learning curve than Databricks. This is true but doesn't really give you much insight into what it is like to work with either EMR or Databricks, so I want to take you through a few summary examples of how a developer might work with each of them.
Beware of the undead Data Science projects
So, your data science endeavour failed again? Let’s be honest, too many data science projects end up in a drawer of forget and disappointment. It is almost comical how so many businesses have leftovers of failed data science projects lying in some sort of zombie status. Even those with mature data science teams have “those” projects.
Improving Azure IoT Edge application robustness
IoT solutions differ from other software and data platform solutions in that they need to be able to quickly recover from failures, reduce the amount of network traffic and be able to deal with being offline for a period. Adding an Azure IoT Edge server into the IoT solution can help implement some of these design features. But it needs to be configured properly for offline support and can be further enhanced by implementing a circuit breaker design pattern.
Can Explainable AI Replace AI?
A feature of statistical approaches to predictive algorithms, such as AI and Machine Learning (AIML), is that it can be difficult to explain the rationale behind any one particular prediction. Explainable AI refers to the collection of tools that data scientists can use to dilute this “black box” nature of their models with a view towards improving user understanding of the models and thus increasing user adoption.
Data Warehousing on Microsoft/Azure SQL Platforms
Before cloud, Microsoft’s offerings for SQL Server databases were straightforward to understand. Fast forward to today and in the Azure cloud you have a multitude of SQL options. The intention of this article is to discuss which option to choose for a modern cloud-based data platform.
How could better Data Governance have helped Optus mitigate their massive data hack?
For those of you buried under a rock, or not from Australia, Optus Telecoms (one of the major telecommunications providers in Australia) recently announced that they had suffered a major data breach exposing customer data on something approaching 10 million customers, including some personal documents such as driver's licence and passport information.
Managing Op Ex in A Cloud Data Platform
One of the key selling points of building data platforms in the cloud is that it eliminates upfront capital expenditure. Since it is a pay-as-you-go approach, the cost model transitions to an operational expenditure model. While the operational cost of a cloud infrastructure may not be significant at the outset, it could grow unexpectedly as more data platforms are migrated into or built in the cloud.
Unit Testing, if not why not?
Chances are if you develop any type of software, you have either heard of Unit Testing, would like to write some unit tests or you are actively writing unit tests while developing. If you aren't writing unit tests, what’s stopping you? If you are one of the few developers who aren’t sure what they are, Unit Testing is a method of testing that the smallest parts of your software work as expected.
Data – ‘The Good Oil’
It has often been said that data is the oil of the new age. Data is an enormously untapped reserve in our modern business world, and just like oil, is immensely valuable. However, just because you have the oil does not necessarily mean you have anything useful at all; it only comes into its own when you have a machine to put it through or you make something from it.
Data quality in data engineering
As a data engineer, going into new projects often means a lot of data discovery and data profiling to meet the requirement of a specific use case. Having worked in the Databricks environment for the last couple of years, I have seen Databricks introduce some excellent features for the discovery and profiling tasks, which make this initial work a breeze.
How to calculate a Data Quality score?
Earlier this year I worked for an organisation that was looking to improve its data quality measurement capabilities. The existing data quality measurement system was really only measuring the effectiveness of their processes, not the data collected. So, for example, a "data quality" measurement was how long new data took to arrive in their central registry from the original source of the data, in comparison to the standard benchmark time.
Requirements Gathering for Organisational Reports
Business Intelligence is more than just data reporting. When fully leveraged, it brings insights into data reporting that allow for making strategic decisions for organisations. Search “Business Intelligence” news on any web browser and you will find many studies and reports indicating the Business Intelligence market will grow quickly over the next few years.
Resource scarcity – A data challenge
Almost every customer that I have spoken with recently has had or is currently (and increasingly) having issues with the availability and retention of skilled resources. This experience extends from the small corner store, your local builder and plumber, right through to the organisation that delivers fresh milk to many of the world’s dairy markets.
Are you the Master of your Data?
I remember talking with an industry peer about how to make data quality issues clear to people who are not data specialists. She told me about her initiative that she called "12 Christines". What she did was to find all the different instances of herself as a customer in her company. Each different product or service that she'd purchased from her own company had created another "Christine" record without checking to see if she already existed.
How to transition and structure your organisation’s Power BI reporting
Your organisation has invested in a new Data Platform utilising Power BI on the front end. Now it is time for the next step: Deciding where to start. An organisation in this position can be quite overwhelmed with all the data needs from top to bottom of the organisational structure but just developing to the loudest demand (or "squeaky wheel") can cause quite a bit of disorganisation midway through your backlog.
Is your long-term data science solution sustainable?
Businesses will go to great lengths to gather, store and clean data. And with it make nice pictures, measure KPIs and record gains. Some will even use data science to create models, predictions and recommendations from such data. Machine Learning and Artificial Intelligence (AIML) are for many shiny novelties and very effective keywords at a stakeholders meeting.
Lies, Damn Lies and Data Science
Recently, I find myself being asked by friends and acquaintances from outside the industry “What is Data Science?” and “What does a Data Scientist do?” I struggle to find a concise yet understandable answer for them which doesn’t lead to more questions. How is Data Science different from Data Analytics?
The gap between the business and its data
One of the most common problems I have seen in companies is that there is a huge divide between the data that a business collects and the business decisions that are being made by its leaders. Gone are the days of being able to run a company based on gut feel alone; however, some business leaders are still trying to do that and you can see the evidence in the results.
6 Warning signs that your Data Governance program needs work
Data Governance is difficult to get right. 92% of data specialists in a recent TDWI study self-reported that their Data Governance program was not very successful. Too much governance and rigidity leads to inflexibility to change, a "one size fits all" mentality and focus on process at the expense of people and business value. Too little governance leads to multiple approaches that don't connect end to end to drive synergistic value.
Driving Actionable Analytics
So, “What are you gonna do about it?” It's a song the Pretenders covered back in the 90s, and it was also the challenge from Paul, the 10-year-old kid with whom I got into my first punch-up in the playground, years ago on a Friday at lunchtime. It's also the foundation of why we do what we do, both individually as data professionals and as a company: providing insights that drive action that results in change.
Fujitsu Data & AI Data and Analytics Assessment
Data and analytics capabilities have advanced a lot in recent years. Reporting and analytics tools have made more data available to more people more easily. Data science and artificial intelligence technologies have become mainstream. Yet many organisations are still operating older data platforms which may meet basic business reporting needs but are not fully capable of providing the advanced analytics capabilities required by modern business.
How embracing Azure might impact your operating model
If your organisation has decided that a cloud architecture will form the base of your future data strategy, it will be on the basis that there are significant benefits to be had in: allowing easy accessibility; leveraging the security features of cloud datacentres; quickly scaling resources up and down; and reducing the cost of infrastructure.
Is spending money on your data strategy a cost or an investment?
One of the biggest and most important questions that businesses need to answer for themselves is how they are going to look at the money they have budgeted for their data, as a cost or an investment. This goes right back to the culture of the company… So, who sets the culture of the company?
Data is all around us
Just like Billy Mack says, data is all around us, and data and reporting can be similar to decorating a Christmas tree. Like a Christmas tree, the reporting and dashboards can be flashy and sparkle, however the structure, storage and data integrity all play a vital part in data integrations and reporting.
Gain more business value from Advanced Analytics, by recognizing and overcoming common obstacles
In March 2021 the MIT Sloan Management Review included an article entitled ‘Why So Many Data Science Projects Fail to Deliver’. An interesting read at the time, I’ve found myself referring colleagues and clients to the article on numerous occasions since.
Making use of real-time data
Every business decision maker that relies on data to support their decisions should be interested in getting and processing their data in real time to rapidly make use of the information at hand. But how fast is “real time” data, and what does a business need to do to prepare themselves to make use of this elusive capability?
Source Control – Going up the maturity curve
Most of us reading this have had some discussions in and around source control, and some of us interact with a version of it on a regular basis. However, sometimes businesses need a helping hand to get them on a more formal source code control path to allow for all its benefits to flow through their ecosystem. Here’s how we are helping one valuable client uplift their disparate code base into a managed source control.
Why Data Governance projects fail and what to do instead (Part 1)
Whilst the numbers vary depending on your source, the consensus is that more than half of all Data Governance initiatives aren't successful. Successful in this context means that they deliver more value than their cost, AND that they are sustained and continuing over the medium to long term. A very recent Teradata-sponsored study found 90% of Australian organisations think that Data Governance initiatives should have more priority than they currently do.
Why Data Governance projects fail and what to do instead (Part 2)
In my previous post, I discussed the most common causes for Data Governance programs to be seen as unsuccessful. My view, in summary, is that Data Governance fails when it is Big Bang over Incremental, Formality over Delivery, and Ivory Tower over Engaged. The Fujitsu Data & AI ResultsNow® Data Governance approach takes these issues into account to deliver more successful long term Data Governance programs.
Building sustainable Data Governance Programs
If you've tried to sustain a New Year's resolution for more exercise or less drinking for more than the first few weeks, you know how hard it is to change to and maintain good habits over time. This same issue happens to organisations in relation to Data Governance programs. A lot of organisations find it very hard to sustain effective Data Governance over long periods of time, despite the benefits they can deliver.
How to transition from Column & Row Tabular reports to Visualisations
I often get asked to recreate column and row Excel reports in Power BI. Ultimately, businesses are making this request because they are comfortable with analysing data in columns and rows and believe there is only one way to quickly assess whether the organisation is functioning in an acceptable manner.
Present your back side: 7 ways to tone your Power BI booty
In the battle for the hearts and minds of executive Excel power users, should we be making our back end more presentable? Excel plugins are becoming more commonplace, and pro licensing more widespread, meaning more and more people are accessing our datasets on their own terms.
The benefits of automation
As the new year has begun, your company may be trying to organise or implement automation or dashboards but are unsure how to start, and the steps needed to automate your reports. It may seem impossible, and companies may feel it’s easier to extract, transform and load data manually and then load these into Power BI for reporting and dashboards.
Why do people bash on about ‘Single Source of Truth’?
I was involved in a conversation with a client today where they were talking about a manually maintained Microsoft Excel-based data source that was required to be included in data being pulled into a database for reporting purposes. After asking some more questions it was established that this data was actually being generated from a source that originated with Human Resources.
Azure Data Explorer
Organisations these days are producing more data than ever before and are needing it to be analysed faster than ever before. This growth has been particularly pronounced for timeseries and telemetry data primarily generated by SCADA and IoT devices. These types of data have some unique characteristics – such as high-velocity, high-volume and the need for analysis and reporting in near real time – that can be challenging for many tools.
Do you start with Data or the Outcome?
What is the biggest challenge that organisations have today with data? The Data Warehouse technician will tell you it’s the technology they are using, the Business Intelligence Analyst will tell you that it’s the quality of the data, the Data Scientist will tell you it’s the volume of the data... I am going to disagree with all of them, I believe it’s knowing what they want the data for!
Maximising the effectiveness of data professionals
Much of our consulting is focused on implementing platforms and tools that provide data teams (data analysts, data engineers and data scientists) the ability to uplift their productivity and solve more complex business problems or create insights into the operations of their organisation that reveal new strategic opportunities.
Power BI governance: Tuning your information engine in a data driven organisation
Power BI is a serious and complex set of tools for trained engineers and analysts to provide robust enterprise-wide, certified reporting to your business users and customers. But it is also a nice little tool for data savvy power users to pick up and throw data at when Excel just won’t do. Where’s the balance?
Delivery Like a G
To ‘Delivery Like a G’ could mean a different thing to different people, or roles, or where you fit into the organisation. The questions are pertinent: where does it start, where does it end and most importantly how do you get there?
Feng Shui Your Data
Welcome again to what you may still consider an unorthodox view on data and all things PMO. The art of de-cluttering just to the right point makes things so much more useful and apparent in the real world; doing the same for your PMO data will bring much needed clarity, if you do not already have it.
How accurate is my accuracy?
Can your Data Scientist tell you how confident they are in their predictions? Data science and predictive analytics are an integral part of data driven decision making. Now suppose you have commissioned a demand forecasting model from your data science team, and you can now receive detailed predictions for the volume of sales for the next month or the next year.
The low-down on Azure Purview
Microsoft's Enterprise Data Catalogue tool, Azure Purview, was released as "Generally Available" last week so I thought it was timely to do a review of its focus, features, functions, fees, and future.
Tips and Tricks: Deriving first Normal Form using pySpark / Databricks
In this article, I present a lesser-known inbuilt Spark function – “stack”, which is very useful for data wrangling operations. Code and output in this article were written in pySpark using a Databricks workspace.
Data platform architecture is not a topic that is subject to the rapid changes and advances that often afflict other technology solutions. Data warehouses were first implemented in the 1980s and have been the mainstay of many data solutions ever since. We had to wait until well past 2000 for big data architectures with data lakes and parallel compute capabilities to come along.
On being a Data Detective
Data is logical, organised – and a modern data platform will do exactly what you have told it to do, but it’s doing what you told it to do on ALL the data, not only the edge cases. Unfortunately, data is also noisy and messy. Even though a lot of today’s data is automatically generated, you are almost always adding it to human-generated data. And humans invariably make mistakes.
Power BI: Applying the 80/20 rule to an enterprise deployment to achieve 100% requirements coverage
Every good analyst knows that attention to detail is critical for reporting. But how much detail should you be trying to cover on behalf of your report users?
The Many Hats of a Chief Data Officer
All senior executives must wear many different hats to fulfill their duties, but the Chief Data Officer needs to be one of the most flexible executives, to deal with the significantly different roles that they must, from time to time, undertake. The need for different roles derives from the fact that there is a different vision for data within each organisation, and sometimes within separate divisions within a single organisation.
What is Delivery Culture, and why is it important to Data Analytics teams?
Recently we were asked by a client to give them some advice on how to establish a ‘Delivery Culture’. Now that’s a term I’ve been hearing more often in recent years, but I’m not sure that there is yet a commonly accepted definition of how a ‘Delivery Culture’ looks different to your average IT team culture.
Advanced analytics expertise in a democratised AI world
It has been said that Artificial intelligence (AI) is well on the way to being democratised in the workplace. But what does this really mean for businesses, employees and customers?
Why Data Governance needs to stop being overlooked
While Customer Experience (CX) analysis/design and Data Science have been embraced by business over the last decade, it surprises me how few organisations have managed to bring these capabilities together to really understand how to drive customer loyalty.
Do your CX and Data Science teams work together, and if not, why not?
Good CX analysis helps us to understand why our customers (or potential customers) behave the way they do. However good CX analysis is not easy to validate. Customer behaviour is notoriously illogical and driven by feelings, beliefs and concealed motivations. Often only a significant investment in ethnographic research and extensive observation gets a CX team past the customers rationalising their behaviour and reveals the true underlying drivers.
Accelerating IoT Analytics
Key Trends, Challenges and Opportunities in Water Utilities: There has been a significant growth in connected IoT devices, with almost 3 x growth from 2020 to 2030, with a significant slice of this being driven globally by the water utility and energy sector. Spending on sensor related devices will see at least a 35% growth from 2018 to 2022 worldwide, presenting massive opportunity in leveraging value from these devices!
You don’t have to be in security to be insecure
Many organisations struggle with keeping their information secure. Whilst IT departments are very experienced in on premise or private data centre security controls, moving to the cloud has significantly shifted the mindset, skillset and risk of data security and controls.
Accelerating the Time to Value utilising Modern Cloud Data Architectures
It wasn’t too long ago (10 to 15 years) that on-premise data warehouses were the ‘in thing’. Remember when we went through the entire process of selecting vendors, hardware and software and waiting 3 to 6 months to get the hardware installed and configured correctly?
Data Engineering and Business Intelligence – Everything has changed, except nothing has really changed
It is said that change is inevitable. In today’s data world it’s clear that the rate of change has been accelerating at an unprecedented rate. There is more data than ever, but at the same time, more storage capability, and faster ways of processing than ever.
External data sets to unlock additional business value … are you missing an opportunity?
As discussed in detail in a recent MIT Sloan Management Article, ‘Why external data should be part of your data strategy’, by Sara Brown, there is a vast array of public data and third-party data available which could be utilised to drive additional economic gains for your organisation.
From Proof Of Concept (POC) to Productionising AI
After being involved in many AI projects, I have realised that many AI POCs never reach the production stage due to the following reasons...
The 6th V of Big Data
People love memory aids, especially when it comes to learning lists of technical jargon. So, it is not surprising that when Doug Laney wrote his paper "3D Data Management: Controlling Data Volume, Velocity and Variety" in 2001, these soon became the 3 V's of Big Data.