
SQL : How to set up SQL Workbench ?

SQL Workbench/J is a free, DBMS-independent, cross-platform SQL query tool written in Java. It lets you write and run SQL, browse database objects such as schemas and tables, and work with data in any database that provides a JDBC driver, including MySQL.

In this post we want to interact with relational databases running in the cloud (AWS in this scenario), specifically databases hosted in the AWS RDS service. Using this workbench, we get a tool where we can visually browse the tables inside a database and run queries against them.

Let's see how to set this up :

  • Open a browser and search for "SQL Workbench" on Google
  • Click on https://www.sql-workbench.eu/
  • Under the Downloads tab, use the link below to download it
  • Generic package for all systems without support for importing or exporting Excel or OpenOffice spreadsheets (sha1)
  • In your local system, create a folder named "Big Data" and keep the downloaded archive inside that folder
  • Extract it, go inside the extracted folder, right-click on SQLWorkbench64, and under "Show more options" create a shortcut on the desktop so you can open it from there

Note : We need Java 11 to be installed; use "java -version" from the command prompt to confirm it. In case you need to install it, refer to any YouTube video to get it done, and make sure to set the Java path in the environment variables.


How to open a new workbench and connect to a Database ?

  • Click on the SQLWorkbench shortcut on your local desktop
  • Give the connection a name, for example "mysql", as per the below screenshot
  • We are trying to connect to a MySQL database, hence select the MySQL driver under the Driver section
  • Note that the first time you select the driver, you will be prompted to download the required libraries. Click Yes to proceed, then click "Download driver", select an older version such as 8.0.28 under the available versions, and click OK
  • Now enter the database username and password (you should have noted this information while creating the DB in AWS RDS)
  • Under the URL section, update the hostname and port number (you will find this information on the database's page under the RDS service in AWS) - check the below screenshot, and see the example URL after this list
  • Remove the name_of_database property from the URL in case you are not sure what to mention there
  • Click on Test to check whether the connection is successful
  • If the connection is successful, then click OK
  • If you are doing this for the first time, go to Tools and click "Show DB Tree"; it will then show the databases
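
For reference, a MySQL JDBC URL follows the pattern below. The endpoint shown is only a hypothetical placeholder; the real hostname and port (3306 is the MySQL default) come from your RDS console, and name_of_database can be removed if you are not sure of it.

jdbc:mysql://<your-rds-endpoint>:3306/name_of_database
jdbc:mysql://mydb.abcd1234.us-east-1.rds.amazonaws.com:3306/name_of_database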





  • To create a new database, use the below query
  • CREATE DATABASE <name_of_database>;
  • To execute it, the shortcut is Ctrl+Enter (or) click on the Run button in the top right corner
  • To switch to this database, use the below query (a complete example follows this list)
  • USE <name_of_database>;
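
For instance, with a hypothetical database named company_db, the two statements would look like this:

CREATE DATABASE company_db;
USE company_db;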

                                                                              



Below are some sample queries to create dummy tables and insert data into them.

-- Create the 'dept' table
CREATE TABLE dept (
    dept_id INT PRIMARY KEY,
    dept_name VARCHAR(100),
    location VARCHAR(100)
);

-- Insert 21 records into 'dept'
INSERT INTO dept (dept_id, dept_name, location) VALUES
(1, 'HR', 'New York'),
(2, 'Finance', 'Los Angeles'),
(3, 'Engineering', 'San Francisco'),
(4, 'Sales', 'Chicago'),
(5, 'Marketing', 'Boston'),
(6, 'Support', 'Seattle'),
(7, 'IT', 'Austin'),
(8, 'R&D', 'San Diego'),
(9, 'Operations', 'Houston'),
(10, 'Legal', 'Dallas'),
(11, 'Admin', 'Denver'),
(12, 'Procurement', 'Miami'),
(13, 'Security', 'Atlanta'),
(14, 'QA', 'Phoenix'),
(15, 'Logistics', 'Detroit'),
(16, 'Design', 'Portland'),
(17, 'Analytics', 'Philadelphia'),
(18, 'Training', 'San Jose'),
(19, 'Consulting', 'Salt Lake City'),
(20, 'Recruitment', 'Las Vegas'),
(21, 'Strategy', 'Minneapolis');

-- Create the 'emp' table
CREATE TABLE emp (
    emp_id INT PRIMARY KEY,
    emp_name VARCHAR(100),
    dept_id INT,
    job_title VARCHAR(100),
    salary DECIMAL(10, 2),
    FOREIGN KEY (dept_id) REFERENCES dept(dept_id)
);

-- Insert 21 records into 'emp'
INSERT INTO emp (emp_id, emp_name, dept_id, job_title, salary) VALUES
(1, 'John Doe', 1, 'Manager', 75000.00),
(2, 'Jane Smith', 2, 'Accountant', 65000.00),
(3, 'Bob Johnson', 3, 'Engineer', 85000.00),
(4, 'Alice Brown', 4, 'Sales Representative', 55000.00),
(5, 'Tom Clark', 5, 'Marketing Specialist', 60000.00),
(6, 'Nancy White', 6, 'Support Analyst', 50000.00),
(7, 'Steve Adams', 7, 'IT Administrator', 70000.00),
(8, 'Emma Lee', 8, 'R&D Scientist', 90000.00),
(9, 'Paul King', 9, 'Operations Manager', 75000.00),
(10, 'Susan Hill', 10, 'Legal Advisor', 80000.00),
(11, 'Mary Green', 11, 'Admin Assistant', 45000.00),
(12, 'David Wright', 12, 'Procurement Officer', 55000.00),
(13, 'Chris Hall', 13, 'Security Specialist', 60000.00),
(14, 'Megan Scott', 14, 'QA Analyst', 52000.00),
(15, 'Laura Evans', 15, 'Logistics Coordinator', 58000.00),
(16, 'Kevin Harris', 16, 'Graphic Designer', 62000.00),
(17, 'Sophia Young', 17, 'Data Analyst', 75000.00),
(18, 'Michael Turner', 18, 'Trainer', 48000.00),
(19, 'Olivia Walker', 19, 'Consultant', 67000.00),
(20, 'Ethan Lewis', 20, 'Recruiter', 52000.00),
(21, 'Isabella Carter', 21, 'Strategy Specialist', 89000.00);

-- Create the 'orders' table
CREATE TABLE orders (
    order_id INT PRIMARY KEY,
    emp_id INT,
    order_date DATE,
    order_amount DECIMAL(10, 2),
    FOREIGN KEY (emp_id) REFERENCES emp(emp_id)
);

-- Insert 21 records into 'orders'
INSERT INTO orders (order_id, emp_id, order_date, order_amount) VALUES
(1, 1, '2023-01-10', 1000.00),
(2, 2, '2023-01-15', 2000.00),
(3, 3, '2023-01-20', 1500.00),
(4, 4, '2023-02-10', 1200.00),
(5, 5, '2023-02-15', 1800.00),
(6, 6, '2023-02-20', 1400.00),
(7, 7, '2023-03-10', 1700.00),
(8, 8, '2023-03-15', 1300.00),
(9, 9, '2023-03-20', 1600.00),
(10, 10, '2023-04-10', 1900.00),
(11, 11, '2023-04-15', 1100.00),
(12, 12, '2023-04-20', 1500.00),
(13, 13, '2023-05-10', 2000.00),
(14, 14, '2023-05-15', 1250.00),
(15, 15, '2023-05-20', 1750.00),
(16, 16, '2023-06-10', 1350.00),
(17, 17, '2023-06-15', 1650.00),
(18, 18, '2023-06-20', 1550.00),
(19, 19, '2023-07-10', 1850.00),
(20, 20, '2023-07-15', 1950.00),
(21, 21, '2023-07-20', 2100.00);
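
Once the inserts complete, a quick join across the three tables can confirm that the data is in place. The query below is only an illustrative check against the tables created above:

-- Verify the inserts by joining employees to their departments and orders
SELECT e.emp_name,
       d.dept_name,
       o.order_date,
       o.order_amount
FROM emp e
JOIN dept d ON d.dept_id = e.dept_id
JOIN orders o ON o.emp_id = e.emp_id
ORDER BY o.order_date
LIMIT 5;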

Finally, our SQL Workbench is ready to interact with AWS. As a next step, we can use a Databricks account to create a Spark cluster and connect to this database to read and write data.
