Agile for DW/BI Programs
Tagged: Agile for DataWarehousing, BI
This topic contains 1 reply, has 2 voices, and was last updated by Saket Bansal 6 years, 7 months ago.
October 17, 2016 at 12:18 am #7502
The scope of a data warehouse program can be either loading the data warehouse or developing the application that will source data from the data warehouse. In either case, it’s important for programs using Agile that the focus is maintained on the end users.
A key tenet of Agile is working software delivered in small iterations that can be demonstrated to end users. Deliverables should be tangible and easily relatable to those end users. The short duration of iterations makes it possible to solicit feedback, align with users’ needs, identify gaps, and address them sooner in the process. It helps teams “fail fast, recover faster.”
A DW program focuses on sourcing, loading, transforming, and modeling the data and ensuring availability of data in a standard model called a schema. The data can cater to generic consumption patterns.
A BI program brings focus toward a specific customer, their reporting or application needs, and typically will have a visual interface through which an end user can be engaged.
For DW programs that include BI scope, it’s easy to engage end users and involve them during validation as part of sprints.
In data-only programs, this poses a challenge, as there is no user interface through which to demonstrate the functionality. This is one of the key challenges for a DW program, and teams find it hard to adopt the Agile model. This paper proposes best practices, suggests how they can be applied, and explains why Agile works effectively for such programs.
Characteristics of a DW/BI program
The definition of a data warehouse program is very important. Are we defining the entire BI program, which includes loading the DW data? Or are we separating the BI piece from the DW and catering to the BI program separately, which will then have data dependencies on the DW?
End-to-end solutions aren’t possible if we cater to BI programs only; they will always be waiting for data to become available. A DW program will use ETL or other data-loading tools to load the data into databases or big data platforms. Data profiling, data standardization, data modeling, and data loading form an integral part of a DW program. Developing the BI front end is only one part of the entire DW/BI program. If the DW activities are separated out, the BI teams will have to wait for them to be completed before they can test the BI functionality. Unlike BI front-end applications, DW solutions lean on data and its interrelationships in order to give the result for a query.
Data warehousing solutions are dependent on architecting the system correctly; time is needed to understand various data relationships, primary keys, and how to model the databases. Without having the right relationships and conditions up front, the data model and architecture can’t be accurately designed.
For a BI solution, it is easy to demo a working product every iteration and get feedback. People may comment on the positioning of the columns or the color scheme. The same isn’t true for a DW solution. In order to showcase the “solution,” the data has to be profiled, modeled, and coded correctly. Since data drives DW/BI projects and data management is the larger effort, the number of features that can be delivered in each release is extremely small.
Data-driven approach vs. business-driven approach
In general, Agile frameworks take a business-driven approach as opposed to DW/BI solutions, which are very data-driven in nature. Creating analytical databases is complex, time-consuming, and often expensive when data-driven methods are used.
Taking a data-driven approach, we would want to integrate and homogenize most of the data before the first query or report can be written. This means that most of the effort would go into integrating thousands of fields before any value is realized by the business. This could take a lot of time and effort, not to mention the risk involved in spending most of the project budget before business sees any value.
In the business-driven approach, only the data that is needed to answer specific business questions or to solve specific business problems is sourced. Agile practitioners work with the business community to define the most essential data elements, those that drive performance and can deliver value much more quickly. These requirements are captured as user stories.
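As a rough illustration of the business-driven approach, the sketch below derives the sourcing scope from user stories rather than from the full source catalog. The table names, column names, and story IDs are hypothetical, invented only for this example.

```python
# Hypothetical source catalog: every column available in the legacy systems.
SOURCE_COLUMNS = {
    "orders": ["order_id", "customer_id", "order_date", "amount",
               "currency", "sales_rep", "warehouse_code"],
    "customers": ["customer_id", "name", "region", "segment", "fax_number"],
}

# Hypothetical user stories, each listing the data elements it needs.
USER_STORIES = [
    {"id": "US-1", "question": "Monthly revenue by region",
     "needs": {"orders": ["order_date", "amount", "customer_id"],
               "customers": ["customer_id", "region"]}},
    {"id": "US-2", "question": "Top customers by order value",
     "needs": {"orders": ["customer_id", "amount"],
               "customers": ["customer_id", "name"]}},
]


def columns_to_source(stories, catalog):
    """Return, per table, only the columns that at least one story needs."""
    needed = {}
    for story in stories:
        for table, cols in story["needs"].items():
            for col in cols:
                if col not in catalog.get(table, []):
                    raise ValueError(
                        f"{story['id']} needs unknown column {table}.{col}")
                needed.setdefault(table, set()).add(col)
    # Keep the catalog's column order so the result is easy to read.
    return {t: [c for c in catalog[t] if c in cols]
            for t, cols in needed.items()}


scope = columns_to_source(USER_STORIES, SOURCE_COLUMNS)
for table, cols in scope.items():
    print(table, cols)
```

Columns such as fax_number or warehouse_code stay out of scope until a story asks for them, which is the point of the approach: integration effort follows business questions.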
Recommended practices for DW/BI programs
In order for the program to adopt Agile, those involved should consider the following recommended practices.
1. Adopt an evolutionary data modeling approach
Do not get into a “big design up front” approach. Instead, perform data modeling in an iterative, incremental, and collaborative manner. Have the architectural runway available for teams to start with. Apply changes to the model based on the team’s feedback and a better understanding of the requirements. A good approach is to build an end-to-end skeleton of the system to prove that all aspects of it work. For example, confirm that the legacy data systems are accessible, the ETL strategy works, the database regression strategy works, and the reporting tools can access the DW.
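An end-to-end skeleton can be very thin. The sketch below is one possible shape, using in-memory SQLite databases as stand-ins for both the legacy source and the warehouse; the table names, columns, and rows are assumptions made for illustration, and a real program would swap in its own connections and ETL tooling.

```python
import sqlite3


def build_legacy_source():
    """Stand-in for a legacy system (hypothetical table and rows)."""
    src = sqlite3.connect(":memory:")
    src.execute(
        "CREATE TABLE legacy_orders (id INTEGER, region TEXT, amount TEXT)")
    src.executemany(
        "INSERT INTO legacy_orders VALUES (?, ?, ?)",
        [(1, " south ", "100.50"), (2, "North", "20"), (3, "south", "9.50")])
    src.commit()
    return src


def extract(src):
    # Proves the legacy system can be reached and read.
    return src.execute("SELECT id, region, amount FROM legacy_orders").fetchall()


def transform(rows):
    # Standardize region names and cast amounts to numbers.
    return [(rid, region.strip().upper(), float(amount))
            for rid, region, amount in rows]


def load(dw, rows):
    # Proves the warehouse schema can be created and populated.
    dw.execute(
        "CREATE TABLE IF NOT EXISTS fact_orders "
        "(order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
    dw.executemany("INSERT INTO fact_orders VALUES (?, ?, ?)", rows)
    dw.commit()


def report(dw):
    # Proves a reporting query can read the warehouse.
    return dw.execute(
        "SELECT region, SUM(amount) FROM fact_orders "
        "GROUP BY region ORDER BY region").fetchall()


source = build_legacy_source()
warehouse = sqlite3.connect(":memory:")
load(warehouse, transform(extract(source)))
totals = report(warehouse)
print(totals)
```

Each function is deliberately trivial. The value lies in exercising every hop once, so that later iterations deepen a path that is already known to work.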
2. Database and code refactoring
For data migration and data integration programs, database and code refactoring will often be needed. Do not see this as a failure; expect it, because it is often needed as more information is uncovered.
Refactoring improves objective attributes of code that correlate with ease of maintenance; it also helps code understanding. Code refactoring encourages each developer to think about and understand design decisions; it favors the emergence of reusable design elements and code modules.
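One common style of database refactoring is additive: introduce the new structure, backfill it, and retire the old one only after consumers have moved. The sketch below shows that idea on a hypothetical customers table in SQLite; the column names and the standardization rule are assumptions for illustration.

```python
import sqlite3


def refactor_add_region_code(conn):
    """Add and backfill a standardized column without dropping the old one."""
    # PRAGMA table_info returns one row per column; index 1 is the column name.
    existing = [row[1] for row in conn.execute("PRAGMA table_info(customers)")]
    if "region_code" not in existing:  # makes the migration safe to re-run
        conn.execute("ALTER TABLE customers ADD COLUMN region_code TEXT")
    # Backfill only rows that have not been migrated yet.
    conn.execute(
        "UPDATE customers SET region_code = UPPER(TRIM(region)) "
        "WHERE region_code IS NULL")
    conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, region TEXT)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?)", [(1, " south"), (2, "North ")])

refactor_add_region_code(conn)
refactor_add_region_code(conn)  # second run changes nothing

rows = conn.execute(
    "SELECT customer_id, region, region_code FROM customers "
    "ORDER BY customer_id").fetchall()
print(rows)
```

Because the original region column is left in place, existing reports keep working while new ones switch to region_code, and the change can be rehearsed repeatedly in lower environments.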
3. Focus on data “usage”
In order to better serve the needs of the business, we need to understand how the data will be used. What’s the end goal of migrating all the columns of various tables? What kind of analytical reports will be generated using this data? If we focus too much on data and not on its usage, we will build a system that nobody is interested in using.
In order to effectively demo the system, it is important to understand the data usage. This helps develop, validate, and demo the intended behavior of the data.
4. Working software
The only accurate measure of progress on an Agile project is the delivery of working software, not the delivery of documentation or other non-executable work products that are nothing more than promises to deliver software at some point in the future. Giving a demo every two weeks as part of the iteration cycle helps get faster feedback. It helps incorporate “fail fast” behavior.
If a program has a BI front-end application, it is easier to demo the data usage on the screen. The iterations should be planned strategically so that not all the DW work is done up front before BI work is taken on. DW and BI iterations should overlap, so that the data is available and ready to be showcased when the BI application is demoed.
If a program doesn’t incorporate a BI front end, and is typically only movement or transformation of data from one source to another (typical ODS programs), the challenge lies in frequent demos. In order for the team to demo their working software, they need to provision a simple self-service UI that will allow a basic query against the data loaded into the database.
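A self-service demo tool does not need to be elaborate. The following is a minimal sketch of a read-only query helper over SQLite, with a plain-text renderer; the ods_accounts table and its rows are hypothetical, and the SELECT-only check is a simple guard for demos, not a security boundary.

```python
import sqlite3


def demo_query(conn, sql, max_rows=20):
    """Run a read-only query and return (column_names, rows) for display."""
    if not sql.lstrip().lower().startswith("select"):
        raise ValueError("Only SELECT statements are allowed in the demo tool")
    # SQLite-specific extra guard: the connection rejects writes from now on.
    conn.execute("PRAGMA query_only = ON")
    cur = conn.execute(sql)
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchmany(max_rows)  # cap output for the demo


def render(columns, rows):
    """Format the result as pipe-separated text lines."""
    lines = [" | ".join(columns)]
    lines += [" | ".join(str(v) for v in row) for row in rows]
    return "\n".join(lines)


# Hypothetical ODS table loaded during the sprint.
ods = sqlite3.connect(":memory:")
ods.execute("CREATE TABLE ods_accounts (account_id INTEGER, status TEXT)")
ods.executemany(
    "INSERT INTO ods_accounts VALUES (?, ?)",
    [(1, "ACTIVE"), (2, "CLOSED"), (3, "ACTIVE")])
ods.commit()

cols, rows = demo_query(
    ods,
    "SELECT status, COUNT(*) AS n FROM ods_accounts "
    "GROUP BY status ORDER BY status")
print(render(cols, rows))
```

In a sprint demo, stakeholders can propose the queries themselves, which turns a data-only deliverable into something they can inspect directly.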
5. Stakeholder participation
Stakeholder involvement is critical to the project. It is always preferable to have a product owner who represents the business and is available to the team every day. Active stakeholder participation helps the project move forward faster. It lets teams gather early feedback, adjust the design, match requirements with customer needs, and avoid surprises.
In DW programs, this role will depend on the identity of the end user for the data being delivered. Is it a reporting need, or is it needed by another application team, or is the production support team going to be responsible for this data?
Depending on the purpose and usage of the data, the product owner should be identified. This should be someone who can be involved early on in the life cycle to understand the requirements and needs.
6. Test throughout the lifecycle
One of the critical factors in the success of projects in the data community is testing; the number of data quality challenges that teams continue to suffer from in production databases is huge. It is common on Agile projects to do significantly more testing than typically occurs on traditional projects. Agilists prefer to test throughout the project lifecycle (in fact, many agilists take a test-first approach to development). For database testing, just as you should test your application code, you should also test your database code.
There are many types of testing that can be adopted for a DW/BI project: developer TDD, acceptance TDD (ATDD), regression testing, pre-production integration testing, etc.
It is critical to make sure that the data that will be available to the front-end BI application is tested thoroughly for validity and expected behavior. The team should have an extensive test suite to validate the data.
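A data test suite can start as a handful of SQL checks that each count violations. The sketch below assumes a hypothetical fact_sales table and dim_customer dimension in SQLite; the check names and rules are examples only, and a real suite would be driven by the program’s own data quality requirements.

```python
import sqlite3

# Each check is a query that returns the number of violating rows.
CHECKS = {
    "null_customer_keys":
        "SELECT COUNT(*) FROM fact_sales WHERE customer_id IS NULL",
    "duplicate_sale_ids":
        "SELECT COUNT(*) FROM (SELECT sale_id FROM fact_sales "
        "GROUP BY sale_id HAVING COUNT(*) > 1)",
    "orphan_customers":
        "SELECT COUNT(*) FROM fact_sales f "
        "LEFT JOIN dim_customer d ON f.customer_id = d.customer_id "
        "WHERE f.customer_id IS NOT NULL AND d.customer_id IS NULL",
    "negative_amounts":
        "SELECT COUNT(*) FROM fact_sales WHERE amount < 0",
}


def run_checks(conn):
    """Return {check_name: violation_count}; 0 means the check passed."""
    return {name: conn.execute(sql).fetchone()[0]
            for name, sql in CHECKS.items()}


# Hypothetical sample data with deliberate defects.
dw = sqlite3.connect(":memory:")
dw.execute("CREATE TABLE dim_customer (customer_id INTEGER)")
dw.execute(
    "CREATE TABLE fact_sales (sale_id INTEGER, customer_id INTEGER, amount REAL)")
dw.executemany("INSERT INTO dim_customer VALUES (?)", [(1,), (2,)])
dw.executemany(
    "INSERT INTO fact_sales VALUES (?, ?, ?)",
    [(10, 1, 50.0), (11, 2, 25.0), (11, 2, 25.0),
     (12, 9, 5.0), (14, 8, 3.0), (13, None, -1.0)])

results = run_checks(dw)
failures = {name: count for name, count in results.items() if count}
print(failures)
```

Run after every load, checks like these act as regression tests for the data itself, and new rules can be added in the same iteration that introduces a new table or column.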
7. Involve operations up front
Once the data is loaded into production systems, the operations and support teams take over. They are key stakeholders for any DW/BI project and should be treated as such. Involve them up front in the discussions and invite them to each demo session. It’s essential to find out what their requirements are, for example, deployment dates, archiving needs, and security details.
October 19, 2016 at 12:52 pm #7505
Thank you for your talk proposal. We are happy to accept your talk for the Hyderabad conference.