
AWS : Boto3 (Accessing AWS using Python)

Boto3 is the Amazon Web Services software development kit for Python, which allows Python developers to write software that makes use of services like Amazon S3 and Amazon EC2. Boto3 is maintained and published by AWS.

Please find the latest documentation at : https://boto3.amazonaws.com/v1/documentation/api/latest/index.html

Command to install it : pip install boto3
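
After installation, a quick sanity check looks like this (a minimal sketch; the printed version depends on what pip installed on your machine) :

import boto3

# Confirm the package imports and show which version is installed
print(boto3.__version__)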


Local storage vs Cloud storage:

  • A local file system is block oriented, meaning storage is divided into blocks of roughly 1-4 KB each
  • A collection of multiple blocks makes up a file in local storage
  • Example : a 10 MB file occupies about 2,560 blocks (assuming 4 KB per block), as shown in the small calculation below
  • We can install software on the local system (indirectly, into blocks)
  • Local system blocks are managed by the operating system
  • Cloud storage, on the other hand, is object oriented storage, meaning everything is stored as an object
  • There is no size limit; it is used only to store data, and we can't install software in cloud storage
  • Cloud storage is managed by users
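
To make the block arithmetic concrete, here is a small Python sketch (the 4 KB block size is just the assumption from the example above) :

# How many 4 KB blocks does a 10 MB file occupy?
file_size_bytes = 10 * 1024 * 1024    # 10 MB
block_size_bytes = 4 * 1024           # 4 KB per block (assumed)

blocks_needed = -(-file_size_bytes // block_size_bytes)   # ceiling division
print(blocks_needed)                  # 2560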


We need to install an IDE such as PyCharm or Visual Studio Code and add AWS plugins like AWS Core and AWS Toolkit to write Python programs that connect to AWS using the Boto3 package. I have installed Visual Studio Code along with the AWS Toolkit and AWS Boto3 plugins, as shown in the below image.


Once you have installed the above AWS plugins, you will see an AWS icon on the left-hand side, as shown in the above screenshot. Click on the AWS icon and the screen will look as below.




Now, the first thing we need is IAM credentials. We can get those from the AWS IAM service. Follow the below steps to get them.

  • Log in to your AWS account using your credentials
  • Go to the IAM service
  • Using the "User Groups" option, create a group and assign full permissions for the EC2, RDS and S3 services as shown below
  • Now create a user under this group
  • Done, this user is now in a group that has full permissions for the required AWS services
  • Click on "Create access key" and generate the Access key and Secret access key

Now log in to AWS from your IDE (Visual Studio Code in this case) using the IAM credentials. Our IDE will be associated with AWS using these IAM credentials in the background.
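
As a side note, instead of hard-coding the keys inside every program, Boto3 can also read them from the shared credentials file at ~/.aws/credentials. A minimal sketch of that file (placeholders only, not real keys) :

[default]
aws_access_key_id = <your_access_key_id>
aws_secret_access_key = <your_secret_access_key>

With this file in place, boto3.Session(region_name='ap-south-1') will pick up the credentials automatically, so they never appear in your source code.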


Below Python code is to get the list of S3 buckets from AWS :

import boto3

# Create a boto3 session
session = boto3.Session(
    aws_access_key_id='XXXX-XXXX-XXXX-XXXX',
    aws_secret_access_key='xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx',
    region_name='ap-south-1'
)

# Create an S3 client
s3 = session.client('s3')

# List all the buckets in your account
response = s3.list_buckets()

# Print the bucket names
for bucket in response['Buckets']:
    print(bucket['Name'])

Output :

amathe1@HYDHTC119810LT /opt/ais/if -> /usr/local/bin/python3 ~/boto3_project/boto3_module/boto3_examples.py
arun2025


GitHub location to get the above Python code :
https://github.com/amathe1/boto3_project/blob/main/boto3_module/b3_get_s3_buckets_list.py

Let's see what we are doing in the above program :
  • First, we have to import the boto3 package
  • Next, build a boto3 session by passing the AWS access key ID, secret access key and region name of your AWS account
  • Note that this session is of type : <class 'boto3.session.Session'>
  • Now create an S3 client using the above session
  • This client is of type : <class 'botocore.client.S3'>
  • Now we get the list of all buckets using the above client
  • The response is of type : <class 'dict'>
For your understanding, this is what is inside the response (it is a dictionary). In the code we loop over the list stored under the key 'Buckets' and pick the value of the key 'Name' from each entry. We need some knowledge of Python containers (tuple, list, set, dict) to understand that for loop; see the short walkthrough after the response below.

{'ResponseMetadata': {'RequestId': 'D75SXKJJSTV5FMJK', 'HostId': '0WlVGx7o3RMvo+c/GxqE3x43DWgWTj6LIFsKG2WqRp4AmL2xadNlxVNkiAexki1/g2YKwlaLP5k=', 'HTTPStatusCode': 200, 'HTTPHeaders': {'x-amz-id-2': '0WlVGx7o3RMvo+c/GxqE3x43DWgWTj6LIFsKG2WqRp4AmL2xadNlxVNkiAexki1/g2YKwlaLP5k=', 'x-amz-request-id': 'D75SXKJJSTV5FMJK', 'date': 'Wed, 29 Jan 2025 09:21:41 GMT', 'content-type': 'application/xml', 'transfer-encoding': 'chunked', 'server': 'AmazonS3'}, 'RetryAttempts': 0}, 'Buckets': [{'Name': 'arun2025', 'CreationDate': datetime.datetime(2025, 1, 27, 14, 43, 40, tzinfo=tzutc())}], 'Owner': {'ID': '91e446ffa1b150c0133eaa3988eb39ecc882794690a4070586143886ef724df8'}}
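
For readers who are new to Python dicts, here is a small sketch of navigating the above response by hand (it assumes the response printed above, where the account has a single bucket) :

# 'Buckets' maps to a list of dicts, one dict per bucket
buckets = response['Buckets']

first_bucket = buckets[0]             # {'Name': 'arun2025', 'CreationDate': ...}
print(first_bucket['Name'])           # arun2025
print(first_bucket['CreationDate'])   # creation timestamp of the bucket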


We have successfully connected to AWS programmatically and retrieved the list of S3 buckets in the account (whose IAM credentials were used to log in from the IDE, as shown above in this blog).

In case you have provided an incorrect access key ID or secret access key, you will see the below error when you execute the above program :

Traceback (most recent call last):
  File "/Users/Arunkumar_Mathe/Documents/boto3_project/boto3_module/boto3_examples.py", line 15, in <module>
    response = s3.list_buckets()  
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/botocore/client.py", line 565, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/Library/Frameworks/Python.framework/Versions/3.9/lib/python3.9/site-packages/botocore/client.py", line 1021, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (InvalidAccessKeyId) when calling the ListBuckets operation: The AWS Access Key Id you provided does not exist in our records.
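
Rather than letting the program stop with this traceback, you can catch the error in code. A minimal sketch using botocore's ClientError (how you handle the failure is up to you) :

import boto3
from botocore.exceptions import ClientError

s3 = boto3.client('s3', region_name='ap-south-1')

try:
    response = s3.list_buckets()
    for bucket in response['Buckets']:
        print(bucket['Name'])
except ClientError as e:
    # Error codes include 'InvalidAccessKeyId', 'SignatureDoesNotMatch', etc.
    print('AWS call failed :', e.response['Error']['Code'])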



Applications of Boto3 module:

  • Managing AWS resources : Boto3 provides a simple and intuitive API for managing various AWS resources, such as EC2 instances, S3 buckets, DynamoDB tables, and more.
  • Automating AWS workflows : With Boto3, you can automate complex workflows and processes involving multiple AWS services. For example, you can create a script that automatically launches an EC2 instance, sets up a database on RDS, and deploys a web application on Elastic Beanstalk.
  • Data analysis and processing : Boto3 can be used to analyze and process large volumes of data stored in AWS services such as S3 and DynamoDB. You can use Boto3 to write scripts that read, write, and manipulate data stored in these services.
  • Monitoring and logging : Boto3 can be used to monitor and log various AWS resources, such as EC2 instances, Lambda functions, and CloudWatch metrics. You can create scripts that automatically monitor these resources and alert you if any issues arise.
  • Security and access control : Boto3 provides tools for managing security and access control in AWS. For example, you can use Boto3 to create and manage IAM users, groups, and policies, as well as to configure security groups and network ACLs.
Overall, Boto3 is a powerful and versatile tool that can be used to automate, manage, and monitor various AWS resources and services.
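
As one small illustration of the "Managing AWS resources" point above, below is a minimal sketch that lists EC2 instances in a region. It assumes the same credential setup used earlier in this blog and is only an example, not a full management script :

import boto3

ec2 = boto3.client('ec2', region_name='ap-south-1')

# describe_instances returns reservations, each containing one or more instances
response = ec2.describe_instances()
for reservation in response['Reservations']:
    for instance in reservation['Instances']:
        print(instance['InstanceId'], instance['State']['Name'])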


Below Python code is to create & delete S3 buckets in AWS :

import boto3

# Create a boto3 client
client = boto3.client('s3',
                      region_name='ap-south-1',
                      aws_secret_access_key='<put_your_secret_key_here>',
                      aws_access_key_id='<put_your_aws_access_key_id_here>')

# To create a bucket with name 'arun9705'
response = client.create_bucket(
    Bucket='arun9705',
    CreateBucketConfiguration={
        'LocationConstraint': 'ap-south-1'
    })

# To get list of buckets in our AWS account
response = client.list_buckets()

# Print the bucket names
for bucket in response['Buckets']:
    print(bucket['Name'])

# To delete given bucket
response = client.delete_bucket(
    Bucket='arun9705',
)
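
One point to keep in mind: delete_bucket only succeeds on an empty bucket, and a newly created bucket can take a moment to become visible. A small sketch using the S3 waiters with the same client as above (the bucket name is the one from the example) :

# Wait until the newly created bucket actually exists before using it
client.get_waiter('bucket_exists').wait(Bucket='arun9705')

# ... upload objects, work with the bucket, delete all objects ...

# After deleting the (empty) bucket, optionally wait until it is really gone
client.delete_bucket(Bucket='arun9705')
client.get_waiter('bucket_not_exists').wait(Bucket='arun9705')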

GitHub location to get the above Python code :
https://github.com/amathe1/boto3_project/blob/main/boto3_module/b3_create_delete_s3_buckets.py


I will cover other common examples of Boto3 in upcoming blogs. Have a great day!


Arun Mathe

Gmail ID : arunkumar.mathe@gmail.com

Contact No : +91 9704117111



