Working With Amazon EMR Notebooks Using Jupyter Notebook

Working With EMR Notebooks on AWS

Amazon EMR Notebooks, now presented in the console as EMR Studio Workspaces, offer a simpler way to interact with data processing clusters. They use the familiar open-source Jupyter Notebook and JupyterLab editors and are accessible directly from the Amazon EMR console. This arrangement may be more efficient than running notebooks hosted directly on an EMR cluster. Users with sufficient IAM permissions can open the editor directly in the console.

Notebook Statuses

Knowing a notebook’s current status tells you when and how you can interact with it. The statuses you may encounter are summarized below:

Starting: The notebook is being created and attached to the cluster. At this stage you cannot open the editor, stop the notebook, delete it, or change its cluster. Starting is usually quick, but it can take longer if a cluster is being created at the same time.

Ready: The notebook is fully prepared and you can open it in the notebook editor. In this status you can stop or delete the notebook. To change the attached cluster, however, you must stop the notebook first. A notebook left idle in the Ready status for a long time is stopped automatically.

Pending: The notebook has been created but is waiting for integration with the cluster to complete; the cluster may still be provisioning resources or handling other requirements. You can open the notebook editor in local mode in this status, but code that depends on cluster processes will fail.

Stopping: The notebook is shutting down, or the cluster it is attached to is terminating. As in the Starting status, you cannot open the editor, stop the notebook, delete it, or change its cluster.

Stopped: The notebook has shut down successfully. From this status you can delete the notebook, change the cluster it is attached to, or start it again on the same cluster (if that cluster is still running).

Deleting: The notebook is being removed from the console’s list. The notebook file (NotebookName.ipynb) remains in Amazon S3 and continues to incur the applicable storage charges even after the notebook entry is deleted.

To see the most recent status, refresh the notebook list in the console.
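
The status rules above can be condensed into a small lookup table. The following is a minimal sketch in plain Python, not an AWS API: the action names (open_editor, stop, start, delete, change_cluster) are illustrative labels, and only actions that the descriptions above explicitly mention are listed. For Pending in particular, only opening the editor is recorded because nothing else is stated.

```python
# Actions described as available in each notebook status.
# The action names are hypothetical labels, not AWS API operations.
ALLOWED_ACTIONS = {
    "Starting": set(),
    "Ready": {"open_editor", "stop", "delete"},
    "Pending": {"open_editor"},  # editor opens in local mode only
    "Stopping": set(),
    "Stopped": {"start", "change_cluster", "delete"},
    "Deleting": set(),
}


def can_perform(status: str, action: str) -> bool:
    """Return True if `action` is listed as available in `status`."""
    return action in ALLOWED_ACTIONS.get(status, set())


# A Ready notebook must be stopped before its cluster can be changed.
print(can_perform("Ready", "change_cluster"))    # False
print(can_perform("Stopped", "change_cluster"))  # True
```

A table like this makes it easy to see why changing clusters from the Ready status requires a stop first.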

Working Within the Notebook Editor

The notebook must be in the Ready or Pending status before you can open the notebook editor. Select the notebook from the list, then choose Open in JupyterLab or Open in Jupyter. The editor opens in a new browser tab. Once it is open, choose the kernel that matches your programming language from the Kernel menu.

An important constraint of the editor opened through the console is that only one user can have a given EMR notebook open at a time. If you try to open a notebook that someone else already has open, an error appears. For security, Amazon EMR generates a unique pre-signed URL for each session, and that URL is valid only for a short time.

Sharing this URL is strongly discouraged: anyone who receives it inherits your permissions, which creates a security risk. Control access instead through IAM permissions policies, and make sure the service role for EMR Notebooks has access to the Amazon S3 location you specified.

Saving Your Work

While you work in the editor, the contents of your notebook cells and their output are saved automatically and periodically to the notebook file in Amazon S3. The editor shows “autosaved” when there are no changes since the last save and “unsaved” when there are. You can also save manually by pressing CTRL+S or choosing Save and Checkpoint from the File menu. A manual save creates a checkpoint file (NotebookName.ipynb) in the checkpoints folder inside the notebook’s main Amazon S3 folder. Only the most recent checkpoint file is kept in this location.
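
To make the checkpoint location concrete, here is a small sketch that builds the checkpoint path from the notebook’s S3 folder and name. The function name, bucket, and folder values are hypothetical placeholders; the only layout assumed is the one described above, a checkpoints folder inside the notebook’s own S3 folder.

```python
def checkpoint_uri(notebook_folder_uri: str, notebook_name: str) -> str:
    """Build the S3 URI where the latest manual checkpoint would be stored.

    notebook_folder_uri: the notebook's own folder in S3 (placeholder value below).
    notebook_name: the notebook name without the .ipynb extension.
    """
    base = notebook_folder_uri.rstrip("/")  # tolerate a trailing slash
    return f"{base}/checkpoints/{notebook_name}.ipynb"


# Hypothetical bucket and folder names, for illustration only.
print(checkpoint_uri("s3://example-bucket/notebooks/EXAMPLE-ID/", "MyNotebook"))
# s3://example-bucket/notebooks/EXAMPLE-ID/checkpoints/MyNotebook.ipynb
```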

Changing Attached Clusters

A useful feature is the ability to change the cluster an EMR notebook is attached to without changing the notebook’s content. This is possible only for notebooks in the Stopped status. Select the stopped notebook, view its details, and choose Change cluster. Then either pick an existing compatible cluster running Hadoop, Spark, and Livy, or create a new one. Finally, choose the security group options and confirm with Change cluster and start notebook.
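
The two preconditions for a cluster change (a Stopped notebook and a cluster with Hadoop, Spark, and Livy) can be expressed as a simple pre-flight check. This is a sketch in plain Python rather than a call to any AWS service; the function name and the way the cluster’s applications are passed in are assumptions made for illustration.

```python
# Applications the target cluster needs, per the text above.
REQUIRED_APPS = {"hadoop", "spark", "livy"}


def can_change_cluster(notebook_status, cluster_apps):
    """Return (ok, reason) for attaching a notebook to a different cluster.

    notebook_status: the notebook's current status string.
    cluster_apps: names of applications installed on the target cluster.
    """
    if notebook_status != "Stopped":
        return False, "notebook must be Stopped"
    missing = REQUIRED_APPS - {app.lower() for app in cluster_apps}
    if missing:
        return False, "cluster is missing: " + ", ".join(sorted(missing))
    return True, "ok"


print(can_change_cluster("Ready", ["Hadoop", "Spark", "Livy"]))
print(can_change_cluster("Stopped", ["Hadoop", "Spark"]))
print(can_change_cluster("Stopped", ["Hadoop", "Spark", "Livy"]))
```

Checking the status first mirrors the console flow, where Change cluster is only offered once the notebook has stopped.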

Deleting Notebooks and Associated Files

You can delete an EMR notebook from your list of available notebooks in the Amazon EMR console. Importantly, this does not delete the underlying notebook files in Amazon S3. Those files remain in S3 and continue to accrue storage charges.

To remove both the notebook entry and its files, first note the notebook’s Amazon S3 location (shown in the notebook details), then delete the notebook from the console. Next, manually remove the folder and its contents from that S3 location using the AWS Command Line Interface (AWS CLI) or the Amazon S3 console. With the AWS CLI, a recursive remove on the notebook directory deletes the folder and everything in it.
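
As a hedged illustration of the manual cleanup step, the sketch below builds an `aws s3 rm ... --recursive` command for a notebook folder. The helper name and the S3 path are hypothetical placeholders; the helper only constructs the argument list and does not run it, so you can inspect the command before executing it yourself (for example with `subprocess.run`).

```python
import shlex
from typing import List


def s3_recursive_delete_command(notebook_folder_uri: str) -> List[str]:
    """Build an AWS CLI command that removes a notebook folder and its contents."""
    if not notebook_folder_uri.startswith("s3://"):
        raise ValueError("expected an s3:// URI")
    path = notebook_folder_uri[len("s3://"):].strip("/")
    # Require a folder below the bucket so a whole bucket is never targeted.
    if "/" not in path:
        raise ValueError("refusing to target an entire bucket")
    return ["aws", "s3", "rm", f"s3://{path}/", "--recursive"]


# Placeholder location; substitute the path shown in your notebook details.
cmd = s3_recursive_delete_command("s3://example-bucket/notebooks/EXAMPLE-ID")
print(shlex.join(cmd))
```

The bucket-level guard is a design choice of this sketch, not an AWS requirement: a recursive remove is irreversible, so it is worth rejecting obviously over-broad targets.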

Sharing and Using Notebook Files

Every EMR notebook is stored in Amazon S3 as a file named NotebookName.ipynb. Any notebook file that is compatible with the version of Jupyter Notebook that EMR Notebooks uses can be opened as an EMR notebook. To use a notebook file from another user, save the .ipynb file locally and then upload it with the upload feature in the Jupyter or JupyterLab editor. The same procedure lets you restore a notebook that was deleted from the console, provided you still have the file, or work with publicly shared Jupyter notebooks.
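
Before uploading a notebook file from someone else, it can help to confirm that it is a well-formed notebook and see which format version it declares. The sketch below relies only on the fact that an .ipynb file is a JSON document with top-level nbformat and nbformat_minor fields; which versions EMR Notebooks accepts is not stated here, so the function only reports the version rather than judging compatibility. The function name is an illustrative choice.

```python
import json


def notebook_format_version(ipynb_text: str):
    """Return (nbformat, nbformat_minor) declared in the text of an .ipynb file."""
    doc = json.loads(ipynb_text)
    if not isinstance(doc, dict) or "nbformat" not in doc:
        raise ValueError("not a Jupyter notebook document")
    return int(doc["nbformat"]), int(doc.get("nbformat_minor", 0))


# A minimal in-memory notebook used as sample input.
sample = json.dumps(
    {"cells": [], "metadata": {}, "nbformat": 4, "nbformat_minor": 5}
)
print(notebook_format_version(sample))  # (4, 5)
```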

Alternatively, you can use a different notebook file as the basis for a new EMR notebook by replacing the notebook file in S3. Before you begin, close any open editor sessions and stop the EMR notebook if it is running.

Then create a new EMR notebook with the exact name you want the new file to have, and note its S3 location and Notebook ID. Stop the newly created notebook. Finally, copy the .ipynb file over the existing file in that S3 location using a tool such as the AWS CLI, making sure the file name matches the name you gave when creating the new notebook. With the AWS CLI, this is a single copy of the local file to the notebook’s S3 path.
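
The final copy step can be sketched as follows. The helper builds an `aws s3 cp` command whose destination file name is derived from the notebook name, which enforces the name-matching rule described above. The helper name, the local file name, and the S3 folder are hypothetical placeholders, and the helper only returns the argument list without running it.

```python
import shlex
from typing import List


def s3_replace_notebook_command(
    local_file: str, notebook_folder_uri: str, notebook_name: str
) -> List[str]:
    """Build an AWS CLI command that overwrites a notebook file in S3.

    The destination is always <folder>/<notebook_name>.ipynb so the file name
    matches the name given when the new EMR notebook was created.
    """
    if not local_file.endswith(".ipynb"):
        raise ValueError("expected a local .ipynb file")
    if not notebook_folder_uri.startswith("s3://"):
        raise ValueError("expected an s3:// URI")
    dest = f"{notebook_folder_uri.rstrip('/')}/{notebook_name}.ipynb"
    return ["aws", "s3", "cp", local_file, dest]


# Placeholder values; use the S3 location noted from your notebook details.
cmd = s3_replace_notebook_command(
    "SharedNotebook.ipynb", "s3://example-bucket/notebooks/EXAMPLE-ID/", "MyNotebook"
)
print(shlex.join(cmd))
```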
