Access AWS Glue libraries and develop code locally, free of cost, on Windows

Naga Sri Harsha Akavarapu
5 min read · Oct 13, 2020

Use Docker containers to test your Glue scripts locally, free of cost and without using Dev Endpoints.

In this post, I will demonstrate how to run AWS Glue on your local Windows laptop by running a Docker container from the official image provided by AWS. Technically, it can also be run on macOS, but the steps would be a little different.

Below are the steps to set up AWS Glue functionality in a ‘Windows 10’ environment:

1) Download Docker for Windows and configure Docker settings

2) Pull Docker Image

3) Run Docker Image

Step 1: Download Docker for Windows

· Download Docker for Windows using this link: https://hub.docker.com/editions/community/docker-ce-desktop-windows/

· Scroll down and click on ‘Get Docker Desktop for Windows (stable)’.

· Run ‘Docker Desktop Installer.exe’ from the downloads folder (the path where you downloaded it).

· When you run the Docker Desktop app for the first time, you will see this window.

· Once Docker Desktop is installed, it will prompt you to choose either the “WSL2” or the “Hyper-V” setting.

· Select ‘Hyper-V’ and configure it. For further instructions on how to do this, refer to the following link:

o https://docs.microsoft.com/en-us/virtualization/hyper-v-on-windows/quick-start/enable-hyper-v

o Alternatively, the easiest way to set up ‘Hyper-V’ is to open ‘Windows PowerShell’ as Administrator and run this command:

Enable-WindowsOptionalFeature -Online -FeatureName Microsoft-Hyper-V -All

· Once you run this command in Windows PowerShell, it will ask you to restart the system. Please do that.

· After your system has successfully restarted, open ‘Docker Desktop’ again.

· Go to ‘Show hidden icons’ and you should be able to see a Docker icon.

· Right-click on the Docker icon and select the sign-in option. It will ask for your Docker ID and password.

· You can create a Docker ID and password by signing up using this link: https://hub.docker.com/

· Once you have successfully signed in, right-click on the Docker icon again and select Settings.

· Once you click on Settings, this screen will pop up.

· Make sure that all the selections on the ‘General’ tab match the above screenshot.

· Next, click on the “Resources” tab and make the selections as shown below in the screenshot. Note that these can vary with your own desktop/laptop hardware configuration, so you can tweak them to whatever is most appropriate for you.

· Once you have selected all the required settings, click on the ‘Apply and Restart’ option.

· Docker will restart and start running, thereby concluding its initial setup.

Step 2: Pull Docker Image

· Open the ‘Command Prompt’ on your machine and run the command below to pull the official AWS Glue image to your local machine. It may take a while.

docker pull amazon/aws-glue-libs:glue_libs_1.0.0_image_01

· The image size is 5.92 GB.

· Once the above step has completed, open Docker Desktop and you will be able to see the downloaded image.
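If you prefer to script this step, the pull can be sketched in Python as below. This is only an illustration and not part of the official AWS instructions: the helper names `build_pull_command` and `image_is_present` are hypothetical, and the sketch assumes the `docker` CLI is on your PATH and Docker Desktop is running.

```python
import subprocess
from typing import List

# Image tag taken from the pull command in this post.
GLUE_IMAGE = "amazon/aws-glue-libs:glue_libs_1.0.0_image_01"


def build_pull_command(image: str = GLUE_IMAGE) -> List[str]:
    """Return the argument list for `docker pull <image>` (hypothetical helper)."""
    return ["docker", "pull", image]


def image_is_present(image: str = GLUE_IMAGE) -> bool:
    """Return True if `docker image inspect` finds the image locally.

    Assumes the docker CLI is installed; inspect exits with a non-zero
    code when the image has not been pulled yet.
    """
    result = subprocess.run(
        ["docker", "image", "inspect", image],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return result.returncode == 0
```

For example, `subprocess.run(build_pull_command())` performs the same pull as the command above, and `image_is_present()` can be called afterwards to confirm that the download finished.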

Step 3: Run Docker Image

· Open the ‘Command Prompt’ on your machine and run the command below to set up a container with Jupyter notebook up and running:

docker run -it -p 8888:8888 --name glue_jupyter amazon/aws-glue-libs:glue_libs_1.0.0_image_01 /home/jupyter/jupyter_start.sh

· In the command above, ‘glue_jupyter’ can be replaced with whatever name you wish to give to this container.

· This command will run the Docker image and create a container named glue_jupyter, with the Jupyter server published on localhost port 8888 (default working port).

· Once you have run this command, open Docker Desktop and you will see a container named ‘glue_jupyter’.
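To make the parts of the run command easier to see, here is a minimal Python sketch that assembles it piece by piece. The function name `build_run_command` and its parameters are hypothetical and exist only for illustration; the image tag, port and start script are the ones used in this post. Note that the name option is `--name`, typed with two plain hyphens.

```python
from typing import List

# Values taken from the docker run command in this post.
GLUE_IMAGE = "amazon/aws-glue-libs:glue_libs_1.0.0_image_01"
JUPYTER_START = "/home/jupyter/jupyter_start.sh"


def build_run_command(name: str = "glue_jupyter", host_port: int = 8888) -> List[str]:
    """Return the argument list for the docker run command (hypothetical helper)."""
    return [
        "docker", "run",
        "-it",                       # interactive terminal
        "-p", f"{host_port}:8888",   # host port : container port used by Jupyter
        "--name", name,              # container name, customizable
        GLUE_IMAGE,                  # image pulled in Step 2
        JUPYTER_START,               # script that starts the notebook server
    ]


# Joining the list gives the same command line you would type by hand.
command_line = " ".join(build_run_command())
```

Passing a different `name` or `host_port` changes only those two parts of the command; the container-side port stays 8888 in this sketch.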

If you hover over the container, an ‘Open in Browser’ icon will show up. Click on it to open, in your web browser, the local server on which this container is running.

Command to start this local server: /home/jupyter/jupyter_start.sh (required only once)
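If the browser page does not load, a quick way to check whether anything is listening on the published port is a small socket test. The sketch below is an assumption-laden illustration: `is_port_open` is a hypothetical helper, and it only confirms that a TCP connection can be made, not that the server behind it is Jupyter.

```python
import socket


def is_port_open(host: str = "localhost", port: int = 8888, timeout: float = 2.0) -> bool:
    """Return True if something accepts TCP connections on host:port."""
    try:
        # create_connection raises OSError if the port is closed or unreachable.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling `is_port_open()` on the Windows host after the container has started should return True once the notebook server is up; False suggests the container is stopped or the port mapping differs.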

This interface is similar to Jupyter Notebook. You can create a PySpark notebook and initialise the AWS Glue context.

We can start a Spark application and a Spark session using the Glue context.

We can create dynamic frames and Spark DataFrames and perform operations similar to those we perform in Glue jobs, free of cost and with no job startup time.
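As a rough illustration of what such a notebook cell might look like, here is a minimal sketch that creates a Glue context, gets the Spark session from it, and converts between a Spark DataFrame and a Glue DynamicFrame. The sample rows, the frame name "people" and the function names are made up for this example, and the code assumes it runs inside the container, where the `pyspark` and `awsglue` libraries are installed.

```python
# Sample data used only for illustration; it is not from the original post.
SAMPLE_ROWS = [("alice", 34), ("bob", 45)]
SAMPLE_COLUMNS = ["name", "age"]


def create_glue_context():
    """Create a GlueContext and return it together with its SparkSession."""
    # Imported here so this file still loads on machines without the Glue
    # libraries; inside the container these imports are available.
    from pyspark.context import SparkContext
    from awsglue.context import GlueContext

    sc = SparkContext.getOrCreate()
    glue_context = GlueContext(sc)
    return glue_context, glue_context.spark_session


def demo():
    """Round-trip a small DataFrame through a Glue DynamicFrame."""
    from awsglue.dynamicframe import DynamicFrame

    glue_context, spark = create_glue_context()
    df = spark.createDataFrame(SAMPLE_ROWS, SAMPLE_COLUMNS)

    # Spark DataFrame -> Glue DynamicFrame ("people" is an arbitrary name).
    dyf = DynamicFrame.fromDF(df, glue_context, "people")
    dyf.printSchema()

    # Glue DynamicFrame -> Spark DataFrame, then an ordinary Spark filter.
    older = dyf.toDF().filter("age > 40")
    older.show()
    return older
```

In a notebook you would typically run the body of `create_glue_context` once in the first cell and reuse `glue_context` and `spark` in later cells; the functions here are only a way to keep the sketch self-contained.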

Additional Information:

· If you wish to connect to any AWS service, you need to configure the AWS CLI inside the Docker container.

· To do this, click on the ‘CLI’ icon of the created container.

· In the CLI shell, run the command ‘aws configure’.

· Enter your own ‘Access Key ID’ and ‘Secret Access Key’ when prompted.

· Now you will be able to access AWS services in the created container.

· To run Spark commands and see the possible operations, refer to this document for details: https://aws.amazon.com/blogs/big-data/building-an-aws-glue-etl-pipeline-locally-without-an-aws-account/
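To confirm that the ‘aws configure’ step above stored your keys, you can inspect the credentials file it writes. The sketch below assumes the default location (`~/.aws/credentials`) and the `default` profile; `has_default_credentials` is a hypothetical helper, and it only checks that both entries exist, not that the keys are valid.

```python
import configparser
from pathlib import Path
from typing import Optional


def has_default_credentials(path: Optional[Path] = None) -> bool:
    """Return True if the credentials file has both keys under [default]."""
    if path is None:
        # Default location used by `aws configure` (assumed, not overridden).
        path = Path.home() / ".aws" / "credentials"

    parser = configparser.ConfigParser()
    parser.read(path)  # a missing file is ignored and leaves the parser empty

    if not parser.has_section("default"):
        return False
    section = parser["default"]
    return bool(section.get("aws_access_key_id")) and bool(
        section.get("aws_secret_access_key")
    )
```

Running `has_default_credentials()` in a notebook cell inside the container should return True after ‘aws configure’ has been completed there.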

For further reading, please refer to the following blog post from AWS: https://aws.amazon.com/blogs/big-data/developing-aws-glue-etl-jobs-locally-using-a-container/
