How to Create an EMR Notebook
In the new Amazon EMR console, Amazon Web Services (AWS) has integrated Amazon EMR Notebooks into the Amazon EMR Studio Workspaces experience. The goal of this integration is to offer a single environment for notebook development and big data processing. As a result, new notebooks in the new console are now created mainly through the “Create Workspace” button.
The steps outlined here apply to the old console. To create an EMR notebook this way, users open the Amazon EMR console, choose “Notebooks,” and then choose “Create notebook.”
Users are asked for a Notebook name and given the choice to include a Notebook description when the creation process begins. Connecting the notebook to an Amazon EMR cluster, where the code will be run, is the next crucial step.
There are two main ways for users to associate a cluster:
Choose an existing cluster
Users can choose this option if a suitable EMR cluster is already running: click “Choose,” pick the cluster from the list, and then click “Choose cluster” to confirm the selection. According to the documentation, EMR Notebooks have specific cluster requirements. Dedicated sections of the documentation provide further details on these requirements, differences among EMR release versions, and security considerations.
Create a cluster
As an alternative, users can choose “Create a cluster” to have Amazon EMR create a new cluster just for the notebook. Users specify a cluster name when they create a cluster this way. This workflow defaults to the most recent supported EMR release version and the required applications (Hadoop, Spark, and Livy); some settings, such as the release version and the pre-selected applications, might not be changeable.
By choosing the EC2 instance type and entering the required number of instances, users can customize the instance settings. One instance serves as the primary node, and the others function as core nodes. The instance type matters because it determines the maximum number of notebooks that can connect to the cluster at once, subject to certain limits.
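The primary/core split described above can be sketched in a few lines of Python. This is only an illustration: the console performs the split itself, the function name instance_groups is hypothetical, and the dictionary layout is assumed to mirror the InstanceGroups entries of the EMR RunJobFlow API (where the primary node role is still spelled MASTER).

```python
def instance_groups(instance_type: str, instance_count: int) -> list[dict]:
    """Split a total instance count into one primary node plus core nodes."""
    if instance_count < 1:
        raise ValueError("a cluster needs at least one instance")

    # Exactly one instance is always reserved for the primary node.
    groups = [{
        "Name": "Primary",
        "InstanceRole": "MASTER",  # EMR API name for the primary node role
        "InstanceType": instance_type,
        "InstanceCount": 1,
    }]

    # Every remaining instance becomes a core node.
    core_count = instance_count - 1
    if core_count > 0:
        groups.append({
            "Name": "Core",
            "InstanceRole": "CORE",
            "InstanceType": instance_type,
            "InstanceCount": core_count,
        })
    return groups


if __name__ == "__main__":
    # "m5.xlarge" is just an example instance type.
    for group in instance_groups("m5.xlarge", 3):
        print(group["InstanceRole"], group["InstanceCount"])
```

Entering three instances therefore yields one primary node and two core nodes; entering one instance yields a primary node only.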
Further cluster options in the creation process include the EMR role and the EC2 instance profile, for which users can keep the defaults or select custom roles; the documentation links to further information about these service roles. Users can also choose an EC2 key pair that allows SSH connections to cluster instances.
Auto-termination is an optional but helpful feature available in Amazon EMR versions 5.30.0 and later and 6.1.0 and later. When it is enabled by selecting the check box, users specify an idle period after which the cluster shuts down automatically. For security groups, users can use the defaults or choose custom security groups available in the cluster’s VPC, and they can set separate groups for the primary instance and the notebook client instance.
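The version rule for auto-termination can be expressed as a small check. The sketch below is not part of any AWS SDK; the function name supports_auto_termination is hypothetical, and it assumes the rule means 5.30.0 or later within the 5.x line, and 6.1.0 or later otherwise. It accepts release labels with or without the emr- prefix.

```python
import re


def supports_auto_termination(release_label: str) -> bool:
    """Return True if the release is 5.30.0+ (5.x line) or 6.1.0+."""
    match = re.fullmatch(r"(?:emr-)?(\d+)\.(\d+)\.(\d+)", release_label)
    if match is None:
        raise ValueError(f"unrecognized release label: {release_label!r}")

    version = tuple(int(part) for part in match.groups())
    if version[0] < 5:
        return False
    if version[0] == 5:
        # Assumption: within 5.x, the feature starts at 5.30.0.
        return version >= (5, 30, 0)
    # Assumption: for 6.x and newer lines, the feature starts at 6.1.0.
    return version >= (6, 1, 0)


if __name__ == "__main__":
    for label in ("emr-5.29.0", "emr-5.30.0", "emr-6.0.0", "emr-6.1.0"):
        print(label, supports_auto_termination(label))
```

Note that 6.0.0 is excluded even though it is numerically newer than 5.30.0, which is why the check treats each major line separately.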
Notebook creation also involves notebook-specific settings in addition to the cluster settings. Users can choose either a custom role or the default AWS service role for the notebook client instance. The notebook file is stored at the Notebook location on Amazon S3; users can select their own location, or Amazon EMR can create the bucket or folder if it doesn’t already exist. The file layout is standardized: within the designated S3 location, a folder named after the notebook ID is created, and it holds the notebook file, named NotebookName with the .ipynb extension.
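To make the storage layout concrete, the following sketch builds the expected S3 URI of a notebook file from its parts. The function name notebook_s3_uri is hypothetical, and the bucket name and notebook ID in the usage lines are placeholders rather than real resources.

```python
def notebook_s3_uri(location: str, notebook_id: str, notebook_name: str) -> str:
    """Build <location>/<NotebookID>/<NotebookName>.ipynb."""
    # Strip trailing slashes so the result has no doubled separator.
    base = location.rstrip("/")
    return f"{base}/{notebook_id}/{notebook_name}.ipynb"


if __name__ == "__main__":
    # Placeholder bucket and notebook ID, for illustration only.
    print(notebook_s3_uri("s3://amzn-s3-demo-bucket/notebooks/",
                          "e-EXAMPLEID", "MyNotebook"))
```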
One important security consideration: if an encrypted location in Amazon S3 is chosen, the service role for EMR Notebooks (EMR_Notebooks_DefaultRole by default) must be set up as a key user of the AWS KMS key used for encryption. The documentation links to AWS KMS documentation and support articles with instructions for adding key users to key policies.
If a Git-based repository has been added to Amazon EMR, users can link it to the notebook by selecting “Git repository,” then “Choose repository,” and picking the repository from the list.
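As a rough illustration of what "key user" means in practice, the sketch below builds a key policy statement modeled on the "Allow use of the key" statement that the KMS console generates for key users. The function name key_user_statement is hypothetical, and the account ID and role ARN in the usage lines are placeholders; the real ARN of the notebook service role should be looked up in IAM, since its path may differ.

```python
import json


def key_user_statement(role_arn: str) -> dict:
    """Key policy statement granting a role cryptographic use of a KMS key."""
    return {
        "Sid": "Allow use of the key",
        "Effect": "Allow",
        "Principal": {"AWS": role_arn},
        "Action": [
            "kms:Encrypt",
            "kms:Decrypt",
            "kms:ReEncrypt*",
            "kms:GenerateDataKey*",
            "kms:DescribeKey",
        ],
        # Inside a key policy, "*" refers to the key the policy is attached to.
        "Resource": "*",
    }


if __name__ == "__main__":
    # Placeholder account ID and role ARN; substitute the values from IAM.
    arn = "arn:aws:iam::111122223333:role/EMR_Notebooks_DefaultRole"
    print(json.dumps(key_user_statement(arn), indent=2))
```

The statement would be added to the Statement list of the key's existing policy rather than replacing it; the KMS documentation referenced above remains the authoritative procedure.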
Lastly, users can add tags to the notebook as key-value pairs. The documentation includes an important note about a default tag whose key is creatorUserID and whose value is the user’s IAM user ID. This tag is applied automatically for access control, and because it can be used with IAM policies to manage access, users are advised not to alter or delete it. After all options have been configured, selecting “Create Notebook” completes the notebook creation process.
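The role of the default tag is easier to see in a policy. The sketch below builds an IAM policy document that allows notebook actions only when the tag value matches the caller's user ID. It rests on several assumptions that should be verified against the AWS documentation before use: the function name creator_only_policy is hypothetical, the action names are the EMR notebook ("editor") actions as recalled from the IAM action list, and the tag key is spelled as in the text above (tag keys are case-sensitive, so check the exact spelling on an existing notebook).

```python
import json


def creator_only_policy(tag_key: str = "creatorUserID") -> dict:
    """IAM policy allowing notebook actions only on notebooks the caller created."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            # Assumed action names; confirm them in the EMR IAM reference.
            "Action": [
                "elasticmapreduce:DescribeEditor",
                "elasticmapreduce:StartEditor",
                "elasticmapreduce:StopEditor",
                "elasticmapreduce:DeleteEditor",
                "elasticmapreduce:OpenEditorInConsole",
            ],
            "Resource": "*",
            "Condition": {
                "StringEquals": {
                    # ${aws:userId} is an IAM policy variable resolved per request.
                    f"elasticmapreduce:ResourceTag/{tag_key}": "${aws:userId}"
                }
            },
        }],
    }


if __name__ == "__main__":
    print(json.dumps(creator_only_policy(), indent=2))
```

A policy of this shape stops matching if the tag is edited or removed, which is why the documentation advises leaving it alone.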
Users should be aware that these instructions describe the creation procedure in the old console; the new console integrates EMR Notebooks as EMR Studio Workspaces. EMR Notebooks users need additional IAM role permissions to access existing notebooks as Workspaces or to create new ones with the “Create Workspace” button in the new console. It is not possible to create notebooks with the Amazon EMR API or the AWS CLI.
The detailed creation instructions found in some existing documentation still match the old console interface, but this shift reflects AWS’s effort to centralize notebook development within the EMR Studio environment.