Installing dbt with SQL Server is extremely simple. I have had many clients come to me and ask whether they could use dbt with SQL Server on-prem or managed instances. They looked surprised when I answered, "Yes, why not?"
Now you might ask me, what is dbt? No, it's not Dialectical Behavior Therapy. It's short for data build tool. It is an open source ETL tool. Well, not an ETL tool, but rather an ELT tool.
You can think of it as a framework for managing your SQL database or warehouse. Don't worry, I will show you what I mean in this little exercise.
ELT stands for Extract, Load, Transform. In ELT, data is first loaded into the data warehouse in its raw state and then transformed there. This contrasts with ETL (Extract, Transform, Load), where the transformation happens before the data reaches the warehouse.
Will it replace Azure Data Factory? No. Azure Data Factory is mostly used for moving data from services or building pipelines. dbt's focus is on transformation, not on moving terabytes of data between services. A combination of ADF and dbt, or SSIS and dbt, means that your data architect is a smart dude.
Simply put, dbt is a data transformation tool that lets you write SQL code to define transformations on the data already residing in your warehouse or RDBMS (yes, local ones included). If you're considering migrating to a cloud data warehouse in the future, using dbt with your on-premises SQL Server can give your team familiarity with the tool and its concepts, making the transition smoother.
dbt promotes data transformation best practices, such as writing modular and reusable SQL code and improving documentation. It excels at documenting data transformations and keeping track of changes through version control. This can improve the maintainability and consistency of your data transformations, even for those done directly in SQL Server.
Installation Steps:
Install SQL Server: You know how to do it
Install Python: You already know this bit too.
Install Miniconda: Follow the instructions on the "Installing Miniconda" page of the Anaconda documentation to download Miniconda and install it. What is Miniconda? It is a minimal installer for conda, a Python package and environment manager. When you are using Python, it is better to use a separate environment for each project.
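After installing Miniconda, run the following commands. Adding python to the create command gives the new environment its own interpreter and pip, so dbt gets installed inside the environment rather than into another Python on your machine.
conda create --name dbt_proj python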
conda activate dbt_proj
pip install dbt-sqlserver
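If you want to confirm that the install landed in the active environment, here is a minimal sketch using only the Python standard library. The helper name installed_version is my own, and I am assuming dbt-core and dbt-sqlserver are the distribution names pip registered.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Run this inside the activated dbt_proj environment.
for name in ("dbt-core", "dbt-sqlserver"):
    print(name, installed_version(name) or "not installed")
```

If either line prints "not installed", pip most likely installed into a different Python than the one you are running.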
Now, go to the address bar in Windows Explorer and type %userprofile%
Create a folder named .dbt and, inside that folder, create the following file. The file name must be profiles.yml
demo:
  target: dev
  outputs:
    dev:
      type: sqlserver
      driver: 'ODBC Driver 18 for SQL Server' # (The ODBC Driver installed on your system)
      server: LAPTOP-51A4638R\MSSQLSERVER01
      port: 1433
      database: test
      schema: dbo
      windows_login: True
      trust_cert: True
What I am doing here is creating a config file that tells dbt how to connect to the SQL Server database and schema that I have created. If you have not created the SQL Server database and schema yet, I suggest you do that first. Let's break down the components:
demo: This is the name of the profile. You can define multiple profiles in the same file for different environments or data sources.
target: dev: This specifies the default target within this profile. Targets are typically used to differentiate between environments (dev, test, prod).
Under the outputs section, we have the details for connecting to the SQL Server instance:
dev: This reiterates the target name for this specific configuration.
type: sqlserver: This tells dbt that we're connecting to a SQL Server database.
driver: 'ODBC Driver 18 for SQL Server': This specifies the ODBC driver to be used for connecting to the database. Make sure this driver is installed on your system.
server: LAPTOP-51A4638R\MSSQLSERVER01: This defines the server name of the SQL Server instance. In this case, it's likely a local instance named MSSQLSERVER01 on a machine named LAPTOP-51A4638R.
port: 1433: This specifies the port on which the SQL Server instance is listening. The default port for SQL Server is 1433.
database: test: This defines the name of the specific database you want to connect to within the SQL Server instance.
schema: dbo: This specifies the default schema to use within the database.
windows_login: True: This indicates that you want to use Windows Authentication to connect to the database.
trust_cert: True: This setting tells dbt to trust the server's SSL certificate (if applicable).
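To make these settings a little less abstract, here is a rough sketch of how fields like these commonly map onto an ODBC connection string. This is not dbt-sqlserver's actual code: the function name build_connection_string and the dictionary layout are my own assumptions for illustration. The keywords themselves (DRIVER, SERVER, DATABASE, Trusted_Connection, TrustServerCertificate) are standard connection keywords for the Microsoft ODBC Driver for SQL Server.

```python
def build_connection_string(profile: dict) -> str:
    """Map profile-style settings to an ODBC connection string (illustrative only)."""
    server = profile["server"]
    # A backslash means a named instance (MACHINE\INSTANCE). In that case the
    # port is left out and the instance name identifies the target.
    if "\\" not in server and profile.get("port"):
        server = f"{server},{profile['port']}"

    parts = [
        f"DRIVER={{{profile['driver']}}}",  # renders as DRIVER={ODBC Driver ...}
        f"SERVER={server}",
        f"DATABASE={profile['database']}",
    ]
    if profile.get("windows_login"):
        parts.append("Trusted_Connection=yes")  # Windows Authentication
    if profile.get("trust_cert"):
        parts.append("TrustServerCertificate=yes")  # accept the server certificate
    return ";".join(parts)


# The same values as the dev output in profiles.yml above.
demo_dev = {
    "driver": "ODBC Driver 18 for SQL Server",
    "server": r"LAPTOP-51A4638R\MSSQLSERVER01",
    "port": 1433,
    "database": "test",
    "windows_login": True,
    "trust_cert": True,
}
print(build_connection_string(demo_dev))
```

A string like this can be handed to an ODBC client library such as pyodbc if you want to test the connection outside of dbt.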
Now save the file, and let's get back to installing dbt.
Now let's open Visual Studio Code and install the following extensions one by one.
Python
dbt
Download the ones with the highest number of downloads.
Let's now cd into the folder where you want to create your dbt project. In that folder, type the following command:
dbt init demo
This will spin up the dbt project, and you will see a new folder created. If you cd into the demo folder and run ls, you will see the project structure, including the models and seeds folders and the dbt_project.yml file.
Now let's find a CSV file and place it into the seeds folder. I will use the Retail Transaction Dataset CSV file from Kaggle (kaggle.com). Before ingesting this retail transaction table into the SQL test database, I want to show you that the table does not exist in my SQL database yet. For that, I used the code below:
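One way to run such a check from Python is sketched below. It is not necessarily the exact query used here. I am assuming a DB-API cursor such as the one pyodbc provides (it uses the ? parameter style), and the table name retail_transactions is hypothetical: dbt seed names the table after the CSV file, so substitute your own file name without the .csv extension.

```python
def table_exists(cursor, schema: str, table: str) -> bool:
    """Return True if schema.table is listed in INFORMATION_SCHEMA.TABLES."""
    cursor.execute(
        "SELECT COUNT(*) FROM INFORMATION_SCHEMA.TABLES "
        "WHERE TABLE_SCHEMA = ? AND TABLE_NAME = ?",
        (schema, table),
    )
    return cursor.fetchone()[0] > 0


# Example usage (needs pyodbc and a reachable SQL Server, so it is commented out):
# import pyodbc
# conn = pyodbc.connect(connection_string)  # connection_string: your ODBC string
# print(table_exists(conn.cursor(), "dbo", "retail_transactions"))
```

Before dbt seed runs, this should report that the table is missing; after the seed, it should report that it exists.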
Now you can see I have placed the retail transaction csv file into the seeds folder.
We are ready to run the following command to ingest the file into the dbo schema of the SQL test database. It should be a quick process.
dbt seed
Do you see that dbt has run and ingested the CSV into the dbo schema of our SQL Server test database? It took a bit longer than expected, but anyway, data ingestion is not what dbt is used for! I just wanted to show you the installation process and how it works. In one of my upcoming write-ups, I will probably try to explain the main dbt functionality and use cases. dbt became the talk of the town because of its code reusability, scalability and documentation support for data transformation, not because of its extract and load capabilities. No one that I know of uses dbt for extract and load purposes.
See you next time with another write-up.