BONUS | Ep. 4: Dan Smith - Technology & Data Analytics
Daniel Smith, Head of Innovation and Founder of Theory Lane Integration Solutions, concludes his conversation with IMA about all things relating to technology and analytics as they pertain to accounting and finance professionals. Starting on 6/24/19, he spoke to us about all things affecting the evolving role of the accounting and finance professional. He defined and simplified AI, machine learning, RPA, and other emerging technologies, and explained the various opportunities created by data and technology. In this episode, Dan discusses learning and developing new skills and competencies, applying technology and analytics, and the intersection of data and business: data science. From disruptive technologies to quantitative vs. qualitative data, and all things in between, Dan answers new questions about how management accountants can learn and apply these new skills. "When you first want to build a house...you don't ask that architect what kind of hammer they intend to use!" Learning the appropriate skills, applied with the correct tools, can lead to better strategic decisions and more successful outcomes for your business.
(*EXTENDED EPISODE* Conclusion of Episode 4 from 6/24/19)
#YAADS #datapossible
https://www.theorylane.com/
https://www.linkedin.com/in/daniel-smith-data-scientist/
https://www.linkedin.com/pulse/acid-just-sweet-80s-jeans-datapossible-daniel-smith/
https://github.com/thedanindanger
FULL EPISODE TRANSCRIPT
Music: (00:00)
Adam: (00:04)
Hey everybody. Welcome to Count Me In, IMA's podcast about all things affecting the accounting and finance world. I'm Adam Larson, here with Mitch Roshong, and this week we cover the topic of data analytics and emerging technologies in accounting and finance. We have an extended bonus episode for you where we will cover multiple areas within this topic and conclude a previously recorded conversation. Mitch, can you tell us more about it?
Mitch: (00:28)
Thanks, Adam. As you may remember, a while ago I spoke with Dan Smith at length about all these technology and data related topics. Again, Dan is the Head of Innovation and the Founder of Theory Lane Integration Solutions, and he offers a very unique perspective on how these ideas relate to accounting and finance. Our talk got even more interesting as it went on, so I'm really excited for you to hear the remainder of our conversation.
Music: (00:54)
Mitch: (00:56)
How can senior management accountants who may have limited knowledge when it comes to data analytics gain a deeper understanding, so they can enable themselves and their organizations to face these new challenges, or new opportunities as we've said, and work with technological tools?
Dan: (01:18)
Absolutely. We have this conversation almost every day. The easiest answer for me would be to check out IMA's analytics competency framework, because I've done a lot of advising with you guys on that. Quick plug! A longer answer is that I mentioned in a previous response the idea that we're starting to break down the barriers of traditional business structure. There was a famous statement made over half a century ago by, I believe, Dr. Conway. It's called Conway's law, and it comes up in software development all the time. Conway stated that any communication system designed in a business is going to model the structure of that business. In a modern context, it means that any solution, any application that's designed to solve a business problem, is going to model the structure of that business. Now we've created a whole new set of ways we can solve current problems with the new paradigm, which is really the internet, digital data. It's not just analytics; it's that now we can have information move in a completely different way. We have a business structure that is set up with pencil-and-paper data in mind. Up until the past 10 or 20 years, we've just used computers to accelerate what was otherwise a written form of communication. Now we have to have these cross-functional competencies, because information is no longer constrained to a specific department. Those cross-functional competencies are what we've been calling data science: the intersection of data, statistics, and the business application of data and statistics. In my general competency framework, not the one that's just for management accountants, I replaced statistics with machine learning, simply because machine learning to me is the application of statistics through computer programs, as opposed to a more traditional statistical approach. In most cases you don't actually need to know that much statistics (now, for financial accountants you do, because you guys are heavy in math); it's abstracted away in most of the models, and you just need to know if the result is right or wrong. So management accountants are a little bit of an exception. Otherwise, in terms of data, the question is whether you know the lower level competencies: how data moves in an organization, where it lives, how it was created, what the basic data structures are, and how to use the data to create analysis in such a way that it benefits the business. Those are the low level competencies. I'm going to get more into those later, so I don't want to dwell too much on them. Fundamentally, though, it's the difference between understanding the competencies, the low level reasoning behind what you're doing, versus thinking about what tool or what program you should use to solve this problem. It's understanding what the tool is doing to solve the problem, as opposed to just asking what type of tool to use.
Mitch: (06:03)
Once we have that foundational knowledge, those low level competencies, how do we move up? How do we get these skills and competencies? How do we learn the tools that are available so that we can make more effective decisions?
Dan: (06:20)
Yes, perfect segue. There's a slide that I use all the time, and you can probably find it on a webinar or on LinkedIn, where I talk about the idea of concepts versus tools versus technology. I use the analogy of building a house. When you first want to build a house, you talk to an architect. That architect uses the concepts of material design, of calculus, of structural engineering, all the ways in which he or she knows where to place a wall, to put the foundation in, to put up the roof, et cetera. What you don't ask that architect is what type of hammer they want to use, or what type of CAD software they're using to create the drawings. It's irrelevant. And an architect certainly wouldn't start learning architecture by going to the hardware store and sitting there evaluating what's the best hammer for the job. They would figure out what the concepts are, and they would realize that a hammer might not even be what they need. They might need a screwdriver; they might need a pneumatic press. I look at learning a specific tool in the same way. Some concepts and knowledge in one tool translate very easily into others. I tend to recommend a bottom-up approach, with the caveat that you want to be able to apply that knowledge as quickly as possible so you feel like you're doing something; it's easy to get discouraged if you just feel like you're taking classes all the time. A nice mix of that is Python and Jupyter notebooks, or the various notebook solutions you can find easily. Now, you can't "know" a programming language; that's a common misconception. You can be capable of solving problems using that programming language, but you will never learn all of it. It would be like learning English: people still study English all the time, but you can reach a level of functional competency with it. With Python, you can understand what's going on behind the scenes and do some basic programming, but it abstracts enough that you're not bogged down in defining every single thing and working through obscure knowledge. With that knowledge, with basic programming competencies, you can move into a lower level language like Java, C++, or C#. You can also easily move up into a more visual tool like RapidMiner or KNIME (Tableau is not as good an example), because you know what's going on behind the scenes. Similarly, when it comes to the data competencies, every BI tool across the board, and this is one of the few blanket statements that I'll make, every BI tool that you come across, be it Tableau, be it QlikView, be it Power BI, all they are doing is simplifying the process of creating an SQL statement. They're doing the aggregations and the joins for you so you don't have to write SQL code. That may seem like a benefit, and in many cases it is. You can quickly get started, you can explore some data, you can see some visualizations. It's great. Where it becomes a problem is when we spend so much time trying to get a specific visualization or a specific solution out of that one tool.
There are people that make a living off of one of those specific tools, because organizations have gotten themselves so embedded into one platform that they can't decouple their operations from the tool. When you use raw code, even SQL, or code like Python or .NET to execute SQL statements, it is transparent and editable. You can have simple version control on that code, and other people can easily pick up what it is that you did. If any of you have ever tried to use an Excel workbook created by somebody else, you know how long it takes to figure out what the heck they did, and even worse, if you try to debug it, it's nearly impossible: you have lines going everywhere and obscure references all over. It's a mess. That's much easier to do in transparent code. It's also why you'll see programmers pick up a BI tool rapidly: they understand the concept, and all they have to do is look up how to do that concept in this tool. If you just know the tool, it's very hard to navigate to another one, because you only know how to do something in that tool, as opposed to knowing the lower level thing you're trying to solve.
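[As a minimal sketch of Dan's point here, in Python with invented sample data: the join-plus-aggregation statement a BI tool generates behind the scenes is just a few transparent, versionable lines of code. The table and column names are hypothetical.]

import sqlite3

# Invented sample data; any database connection would work the same way.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (region_id INTEGER, amount REAL)")
conn.execute("CREATE TABLE regions (region_id INTEGER, name TEXT)")
conn.executemany("INSERT INTO orders VALUES (?, ?)",
                 [(1, 120.0), (1, 80.0), (2, 200.0)])
conn.executemany("INSERT INTO regions VALUES (?, ?)",
                 [(1, "East"), (2, "West")])

# Roughly the statement a BI tool builds when you drag "region" onto rows
# and "amount" onto values: a join plus an aggregation with GROUP BY.
query = """
SELECT r.name, SUM(o.amount) AS total_amount
FROM orders o
JOIN regions r ON r.region_id = o.region_id
GROUP BY r.name
ORDER BY total_amount DESC
"""
for name, total in conn.execute(query):
    print(name, total)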
Mitch: (13:09)
In your opinion, where does this aspect of business fit in, as far as which function? How does somebody in the finance function fit in? Where's the crossover, I guess, is what I'm asking, and where should all of this data really be housed within an organization? Because to me it almost sounds like two different sets of skills, and I just want to know where the merger is. What do you think management accountants really need to be aware of?
Dan: (13:45)
Yeah, and I'm going to split your question into two parts, because the first part was where should the data live, and the other is what is the role of the management accountant. The first one, where should the data live: I did a multi-part video with some people who are data integration specialists. Data integration is something a lot of people don't even know is a profession; it's the business of taking the data and applications in an organization and making sure they can talk to each other. What you hit upon, where should the data live, is a contentious topic in that space. What the industry is moving towards is that everything should have its own data set. Every problem should have its own set of data that's specific to it, and it's okay if that data is reproduced in places. Storage and processing, with the emergence of cloud platform as a service and infrastructure as a service, have become much more affordable, so we can have redundant data. There's a whole specialization, when it gets into really deep data theory, around things like the CAP theorem: the concept that a data environment can be consistent, available, or partition tolerant, but you can only choose two. Partition tolerant means the data can live in multiple locations at once. You can't have something that's always available and consistent if it's replicated in multiple locations, because one copy could go down; and it can't be consistent if it's always available and partition tolerant, because one copy is going to be updating ahead of the other. But that's a bit of a digression; I do have an article on LinkedIn about that. The reason I go into it is that it's okay now to have data that's specific to a problem or an organization; in fact, that's even better. The traditional argument against it was that it was expensive: expensive to house, expensive to manage, and there would be risk associated with that data being everywhere. Where the management accountant fits into that, and I've talked about this at length: if they are the ones that have those competencies of data, statistics, and the business applications they're in, they can be the one who articulates the value proposition of doing things differently, of investing a little more in your data environment, or of talking about the enormous return on investment for effective data governance. That takes longer than you would think; it does take a long time to get returns on investment in data governance, but once you get them, it's huge.
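[To make the CAP trade-off concrete, a toy Python sketch, not from the episode and with invented names: two replicas of one value, where a network partition forces a read to either stay available and risk staleness, or stay consistent and refuse to answer.]

# Toy model of the CAP trade-off: two replicas of one value. During a
# partition, a write reaches only the primary, and each read strategy
# then gives up a different property.
class Replica:
    def __init__(self):
        self.value = None
        self.version = 0

primary, secondary = Replica(), Replica()

def write(value, partitioned):
    primary.value = value
    primary.version += 1
    if not partitioned:  # replication only succeeds while the link is up
        secondary.value, secondary.version = primary.value, primary.version

def read_available(replica):
    # Choose availability: always answer, possibly with stale data.
    return replica.value

def read_consistent(replica, partitioned):
    # Choose consistency: refuse to answer when freshness can't be verified.
    if partitioned and replica.version < primary.version:
        raise RuntimeError("unavailable: replica may be stale")
    return replica.value

write("balance=100", partitioned=False)
write("balance=90", partitioned=True)   # the partition happens here
print(read_available(secondary))        # balance=100: stale but available
try:
    read_consistent(secondary, partitioned=True)
except RuntimeError as e:
    print(e)                            # consistent but not available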
Mitch: (17:35)
In order to communicate, kind of the last step in this progression, when you are speaking with the executives about decisions, a big hot topic is data visualization. So I'm curious about your thoughts on visualization. What kind of skills do you really need to effectively illustrate and visualize your data for your audience?
Dan: (18:00)
Yeah, and here's another place where I'm going to inject some unnecessary philosophy, but I want to give you a straight answer first. Visualizations, as they stand right now, are very, very important in order to move business to a better understanding of what machine learning, AI, and analytics in general are capable of doing. What I mean by that is that in the future, generally speaking, people will have a better sense of what machine learning is, what analytics is, what the new state of business is. We're going to understand these capabilities internally, in general, and we won't need to teach people data science in order to tell them what data science is telling them. Meaning, visualizations in a few decades won't be as important, but right now they're extremely important, because they're used to teach people data science. That's the big stumbling block when we're trying to get data scientists to make visualizations. Data scientists are generally not very good at visualizations, because those visuals are used to teach, they're used to explain, and that's a completely different skill set than the prediction, programming, and data modeling that data scientists have been trained to do. In my competency framework, I put visualizations not in machine learning and statistics, and not in analytics; I put visualizations in business, because visualization is a communication tool. Now, there are visualizations like dashboards, or a graph showing that there's a story to be told, or reporting to people. That's fine, but that's a reporting exercise: communicating information in a succinct manner so people can make fast decisions. That's kind of out of the box; those are normal things. Then you have very complex, specialized visualizations to tell advanced stories; that's a specialist's job. There are visualizations used as exploratory analytics, where I'm trying to find a problem. There are also visualizations used to represent performance. Those are descriptive visualizations; they say, this is how something performed in the past. Most management accountants right now are focused on those descriptive visualizations, which I would label as the baseline, the fundamental ones. Given that, and what you just said, Mitch, the skill with those visualizations is knowing what to report when, and making sure that you don't convey the information in a misleading way. There are two primary sources on what makes an effective visualization, and that's Tufte and Cleveland. Tufte tends to be artistic about visualizations; it's almost about the art of visualizing data and telling a story with it. Cleveland, on the other hand, gives very practical advice for which visualization to use when: how to represent the distance between one point and another, when to use a time series graph, when to use a pie chart (which is usually never, by the way), when to use a bar chart, et cetera. So the fundamental skills in that type of descriptive visualization are not about what tool to use; they're about what visualization to use and when, and how to represent that data in a simple, clean, effective manner. All right, now I'm just going to give you a couple of quick ones. What is QlikView? Okay, so QlikView is one of the family of BI tools, business intelligence tools. Compare it to a Spotfire, Tableau, Power BI, or Birst (that's B-I-R-S-T).
There are a lot of them out there, and which one you want to use is a matter of personal preference. They are largely for descriptive, and to a degree diagnostic, analytics, so they are always, in all cases, intended for communication to a human recipient. They are there to help a person better understand what data is doing in a business. Because of that, it can be a little challenging to operationalize some of the solutions that come about in QlikView or any of the other BI tools. By operationalize, I mean this: what will happen is that a person, or maybe a few people, will have so much capability with those tools that they create an entire workflow of merging databases together, creating their own table on their laptop, then making a bunch of visualizations and new tables, and visualizations off those tables, and so on and so forth. It is hard for the data engineers of the world to tease out exactly what the heck is happening in that thing and make it into a business application, say a new report or a new KPI, one with acceptance criteria and unit tests around it to make sure that it's consistently correct. I emphasize that last point not as a knock against BI tools. They're great to explore data, and they're great to communicate some complex relationships to people. Where folks run into problems (and back when I was doing a lot of BI engagements, this is where they would call in my team; I think you've also asked another question about why reports are slow), the reason behind it is that in a BI tool a single person can develop all this stuff, but nobody can ever figure out exactly what they did. So oftentimes they've failed to account for every corner case, as we would call it, where the data may not be correct. There may be something on the back end, or a business rule that wasn't applied correctly, and it's hard for people to identify where that issue is occurring because it's all self-contained within the platform. So what they are is BI tools: they're used for fast analytics and insights. What they are not is rigorous reporting platforms, nor are they artificial intelligence automation platforms for creating your own custom product. And the vendors themselves will not attest to being that; they will tell you up front that these tools should be used for exploratory, diagnostic, and to a degree descriptive analytics, and that you should not try to operationalize these things unless you have a robust system behind them.
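[As one concrete reading of Cleveland's practical advice above, a small matplotlib sketch with made-up numbers: a bar chart for comparing categories, where a pie would be harder to read, and a line for a time series.]

import matplotlib.pyplot as plt

# Cleveland-style choices: positions along a common scale (bars, dots)
# are read more accurately than angles and areas (pie slices).
regions = ["East", "West", "North", "South"]
revenue = [420, 380, 150, 290]          # invented figures, $K

months = list(range(1, 13))
headcount = [50, 52, 53, 55, 54, 56, 58, 60, 61, 63, 64, 66]

fig, (ax1, ax2) = plt.subplots(1, 2, figsize=(10, 4))
ax1.barh(regions, revenue)              # category comparison: bar, not pie
ax1.set_xlabel("Revenue ($K)")
ax1.set_title("Revenue by region")

ax2.plot(months, headcount, marker="o") # change over time: time series line
ax2.set_xlabel("Month")
ax2.set_ylabel("Headcount")
ax2.set_title("Headcount over the year")

fig.tight_layout()
plt.show()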
Mitch: (27:32)
How about R, Python, SQL, and Spark? How do they relate? What are their benefits?
Dan: (27:39)
All right, so R and Python are somewhat similar. R is kind of sliding out of favor at the moment because it is geared more toward purely statistical and mathematical operations. Explaining the subtle differences between them would take an entire podcast by itself, and it's nearly impossible to do without a visual aid, so to keep it simple: R and Python are what you would call scripting languages. They're true programming languages where you can create functions that execute operations, and you can import libraries that automate a lot of the routine, mundane things you do in code. A lot of websites have some Python element baked into them now, so when you open up an app on your computer, or you go to a website and it's doing some processing in the back end, there might be some Python operation somewhere along the line. It's similar with R, but on a smaller scale; you'll see those types of data operations embedded into BI tools like Qlik and Tableau to extend their functionality a little. SQL is a query language, as opposed to a scripting language. You can't really build, you can't, period, build an application on your computer with it, like a desktop app. You can't build Excel using SQL; you can't build Word out of SQL. It is a query language, meaning that you pass into it what you want from a data set and it returns it. You just say, give me these columns, the sum of this column, and group by those other columns. That's it. It's basically making pivot tables, just at a larger scale. Confounding all of this is Spark. Spark is a framework. You don't code in Spark; it's not a thing you program in, and it's not a programming language. You use Python or Scala or R to access the Spark API, that is, the application programming interface. The R, Python, or Scala operations get data into shape or pass commands into Spark, which then executes operations in a massively parallel, in-memory environment. So Spark is a framework that you need to know a few operations in, and you need to understand how to shape the data for it, but the way you interface with the Spark application is through R or Python code. So you would still need to know R, Python, or Scala (another scripting language) in order to use Spark. Spark is a way to execute data operations similar to the way you would in SQL, but in a distributed data environment. I must say, though, very long answer for this: I personally would not spend much time learning Spark, because it's largely abstracted away in a lot of the data operations we do today. You don't code much in Spark anymore; you're fine with just Python and SQL. If you have to study something that's really advanced, I would study TensorFlow, although even that is getting abstracted, in things like PyTorch. So conceptually you should understand what Spark and TensorFlow and those other advanced data operation frameworks are doing, but you probably do not need to learn how to code anything in them, because by the time you do, an easier way of doing it will have come along.
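[As an illustration of that interface, a minimal PySpark sketch, assuming a local pyspark installation; the data and names are invented. It expresses the same "sum this column, group by that column" request once through the Python API and once as a literal SQL statement handed to the same engine.]

from pyspark.sql import SparkSession, functions as F

# Python is the interface; Spark distributes and executes the actual work.
spark = SparkSession.builder.appName("pivot-at-scale").getOrCreate()

df = spark.createDataFrame(
    [("East", "Widget", 120.0), ("East", "Gadget", 80.0), ("West", "Widget", 200.0)],
    ["region", "product", "amount"],
)

# The "pivot table at larger scale" request, via the DataFrame API...
df.groupBy("region").agg(F.sum("amount").alias("total_amount")).show()

# ...and the same request as SQL against the same engine.
df.createOrReplaceTempView("orders")
spark.sql("SELECT region, SUM(amount) AS total_amount FROM orders GROUP BY region").show()

spark.stop()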
Announcer: (32:57)
This has been Count Me In, IMA's podcast, providing you with the latest perspectives of thought leaders from the accounting and finance profession. If you like what you heard and you'd like to be counted in for more relevant accounting and finance education, visit IMA's website at www.imanet.org.