How to set up an EMR Studio in AWS? Requirements for EMR Studio

Setting up an Amazon EMR Studio involves a few key steps that ensure users can access and use the environment efficiently. The process starts once you have met certain prerequisites.

How to set up an EMR studio

Requirements for EMR Studio Setup

Before you begin the setup, you should have:

  • An AWS account
  • Permissions to create and manage an EMR Studio.
  • An Amazon S3 bucket designated for EMR Studio to back up Workspaces and notebook files.
  • An Amazon Virtual Private Cloud (VPC) and up to five subnets, needed if you want to connect to Amazon EMR on EC2 or Amazon EMR on EKS clusters or to use Git repositories. EMR Studio can be used with EMR Serverless without a VPC.

Steps for Setup

The setup procedure typically consists of the following steps:

Select an Authentication Mode: Choose either IAM authentication mode or IAM Identity Center authentication mode for your Studio. This choice affects how you manage users and permissions. IAM mode authenticates users through AWS IAM, while IAM Identity Center mode authenticates them through IAM Identity Center. IAM mode is compatible with many identity providers and is easy to set up if you are already familiar with IAM authentication or federation. IAM Identity Center mode offers a simple way to assign users and groups, which helps if you are new to AWS or Amazon EMR. It also integrates with Microsoft Active Directory and SAML 2.0 identity providers, which makes multi-account federation easier.

Create the EMR Studio Service Role: An EMR Studio needs an IAM service role to interact with other AWS services. The role lets the Studio create a secure network channel between Workspaces and clusters, store notebook files in Amazon S3, and access AWS Secrets Manager when linking Git repositories. Define all Amazon S3 permissions for notebook storage and all AWS Secrets Manager permissions for Git repositories in this service role.

You create this role with a trust policy that allows elasticmapreduce.amazonaws.com to assume it. AWS recommends restricting the trust policy with the aws:SourceArn and aws:SourceAccount condition keys to guard against the confused deputy problem. After creating the role with the trust policy, you attach an IAM permissions policy to it. This policy must include permissions for actions such as ec2:ModifyNetworkInterfaceAttribute, governed by Amazon EC2 tag-based access control, and the specific S3 read and write operations on your designated S3 bucket. If your S3 bucket is encrypted, you also need the appropriate AWS Key Management Service (KMS) permissions. Some of the policy's statements about tagging network interfaces and default security groups must remain unchanged for the service role to function properly.
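As a rough illustration of the trust policy described above, the following Python sketch builds the policy document as a dictionary. The function name, the sample account ID, and the sample Region are placeholders chosen for this example, and the wildcard pattern used for aws:SourceArn is one common way to scope the condition rather than the only valid one.

```python
import json


def build_service_role_trust_policy(account_id: str, region: str) -> dict:
    """Build a trust policy that lets Amazon EMR assume the Studio service role.

    The condition keys restrict the role to requests that originate from
    EMR resources in the given account and Region (confused deputy guard).
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "elasticmapreduce.amazonaws.com"},
                "Action": "sts:AssumeRole",
                "Condition": {
                    "StringEquals": {"aws:SourceAccount": account_id},
                    "ArnLike": {
                        "aws:SourceArn": (
                            f"arn:aws:elasticmapreduce:{region}:{account_id}:*"
                        )
                    },
                },
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder account ID and Region, for illustration only.
    policy = build_service_role_trust_policy("111122223333", "us-east-1")
    print(json.dumps(policy, indent=2))
```

The printed JSON could be saved to a file and passed to `aws iam create-role` through its `--assume-role-policy-document` option. The permissions policy for S3, EC2, and Secrets Manager is attached separately and is not shown here.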

Set up user permissions for EMR Studio: To have fine-grained control over what users can do in the Studio, you must set up user access policies.

  • To use IAM Identity Center authentication mode, you need to create an EMR Studio user role. The role's trust relationship policy allows elasticmapreduce.amazonaws.com to perform the sts:AssumeRole and sts:SetContext operations. Before assigning users, you attach your EMR Studio session policies to this user role. Session policies specify fine-grained permissions, such as the ability to create new EMR clusters, for the users or groups assigned to the Studio. A user's final permissions are the intersection of their session policy and the EMR Studio user role. If a user belongs to more than one group assigned to the Studio, their permissions are a combination of the groups' policies.
  • In IAM authentication mode, you use IAM permissions policies and attribute-based access control (ABAC) to give users access to a Studio. You allow the elasticmapreduce:CreateStudioPresignedUrl action in a user's IAM permissions policy, and you can use the Studio's ARN or ABAC tags to limit the user to a particular Studio.
  • Regardless of the authentication mode, you define one or more IAM permissions policies that describe what users may do. Example basic, intermediate, and advanced policies grant increasing levels of authority over actions such as creating Workspaces, attaching and detaching clusters, managing Git repositories, and creating clusters. Permissions for data access control are configured on the clusters, not in the Studio user permissions.
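The two policy shapes mentioned in the list above can be sketched in Python as follows. This is a minimal illustration, not a complete set of user permissions: the function names and sample identifiers are hypothetical, and a real user policy would normally include further statements for Workspaces, clusters, and Git repositories.

```python
import json


def build_user_role_trust_policy() -> dict:
    """Trust policy for the EMR Studio user role (IAM Identity Center mode)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "elasticmapreduce.amazonaws.com"},
                "Action": ["sts:AssumeRole", "sts:SetContext"],
            }
        ],
    }


def build_studio_access_policy(region: str, account_id: str, studio_id: str) -> dict:
    """IAM-mode policy that limits a user to one Studio by its ARN."""
    studio_arn = (
        f"arn:aws:elasticmapreduce:{region}:{account_id}:studio/{studio_id}"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowCreateStudioPresignedUrl",
                "Effect": "Allow",
                "Action": ["elasticmapreduce:CreateStudioPresignedUrl"],
                "Resource": studio_arn,
            }
        ],
    }


if __name__ == "__main__":
    # Placeholder Region, account ID, and Studio ID, for illustration only.
    print(json.dumps(build_user_role_trust_policy(), indent=2))
    print(json.dumps(
        build_studio_access_policy("us-east-1", "111122223333", "es-EXAMPLE12345"),
        indent=2,
    ))
```

Restricting by ARN is the simpler of the two options; an ABAC variant would replace the fixed resource with a tag-based condition on the Studio.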

(Optional) Define Security Groups: To manage network traffic for your EMR Studio, you can create custom security groups. If you don't select custom security groups when you create the Studio, default security groups are used. With custom security groups, you choose an engine security group, which allows inbound access from Workspaces, and a Workspace security group, which allows outbound access to attached clusters and Git repositories.

Create an EMR Studio: You can create the Studio with the Amazon EMR console or the AWS CLI. The console offers a simple experience with preset settings for interactive workloads or batch jobs, which also creates an EMR Serverless application. Alternatively, select 'Custom' for complete control over the settings. Custom settings include the Studio name, S3 location, workspace count, authentication mode (IAM or IAM Identity Center), VPC, subnets, and security groups. For IAM authentication, you can optionally specify an identity provider (IdP) login URL and a RelayState parameter name for federated users.

For IAM Identity Center authentication, you must choose both the EMR Studio service role and the user role. You can also enable trusted identity propagation for a more streamlined sign-on experience. To create a Studio programmatically, use the AWS CLI command create-studio, which requires different arguments depending on the authentication mode selected.
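The same request can be made from Python. The sketch below only assembles the keyword arguments that boto3's EMR `create_studio` call accepts and checks the rules stated in this guide (a user role for IAM Identity Center mode, at most five subnets). The helper name and every sample identifier are hypothetical; note that the API spells the IAM Identity Center mode as "SSO".

```python
from typing import List, Optional


def build_create_studio_args(
    name: str,
    auth_mode: str,
    vpc_id: str,
    subnet_ids: List[str],
    service_role: str,
    workspace_sg: str,
    engine_sg: str,
    s3_location: str,
    user_role: Optional[str] = None,
    idp_auth_url: Optional[str] = None,
    idp_relay_state_parameter_name: Optional[str] = None,
) -> dict:
    """Assemble keyword arguments for the EMR create_studio API call."""
    if auth_mode not in ("SSO", "IAM"):
        raise ValueError("auth_mode must be 'SSO' (IAM Identity Center) or 'IAM'")
    if not 1 <= len(subnet_ids) <= 5:
        raise ValueError("a Studio takes between one and five subnets")

    args = {
        "Name": name,
        "AuthMode": auth_mode,
        "VpcId": vpc_id,
        "SubnetIds": list(subnet_ids),
        "ServiceRole": service_role,
        "WorkspaceSecurityGroupId": workspace_sg,
        "EngineSecurityGroupId": engine_sg,
        "DefaultS3Location": s3_location,
    }
    if auth_mode == "SSO":
        # IAM Identity Center mode needs the user role as well.
        if user_role is None:
            raise ValueError("user_role is required for IAM Identity Center mode")
        args["UserRole"] = user_role
    else:
        # Optional federation settings for IAM mode.
        if idp_auth_url:
            args["IdpAuthUrl"] = idp_auth_url
        if idp_relay_state_parameter_name:
            args["IdpRelayStateParameterName"] = idp_relay_state_parameter_name
    return args


# With boto3 installed and credentials configured, the call would be:
#   import boto3
#   boto3.client("emr").create_studio(**args)
```

Keeping the argument assembly separate from the API call makes it possible to review or log the request before anything is created in the account.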

Assign a User or Group to an EMR Studio: Once the Studio has been created, you can assign users and groups to it. The approach depends on the authentication mode.

  • In IAM authentication mode, user assignment and permissions are configured in IAM, and possibly in your identity provider as well. You do this by allowing the CreateStudioPresignedUrl action in the user's IAM permissions policy and limiting access to the Studio by its ARN or ABAC tags.
  • In IAM Identity Center authentication mode, you can manage users with the Amazon EMR management console or the AWS CLI. When assigning users or groups through the console, you find and select them in the Identity Center directory. In the AWS CLI, you use the create-studio-session-mapping command and supply the Studio ID, the identity name, the identity type (USER or GROUP), and the ARN of the session policy to associate. You specify a session policy at the time of assignment. You can later change the permissions of assigned users by modifying the session policy.
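For the IAM Identity Center case, the assignment described above could be scripted along these lines. The sketch builds the arguments for boto3's `create_studio_session_mapping` call; the helper name, Studio ID, identity name, and policy ARN are placeholders invented for this example.

```python
def build_session_mapping_args(
    studio_id: str,
    identity_name: str,
    identity_type: str,
    session_policy_arn: str,
) -> dict:
    """Assemble keyword arguments for create_studio_session_mapping."""
    if identity_type not in ("USER", "GROUP"):
        raise ValueError("identity_type must be 'USER' or 'GROUP'")
    return {
        "StudioId": studio_id,
        "IdentityName": identity_name,
        "IdentityType": identity_type,
        "SessionPolicyArn": session_policy_arn,
    }


# With boto3 installed and credentials configured, the call would be:
#   import boto3
#   boto3.client("emr").create_studio_session_mapping(**args)
```

To change an assigned user's permissions later, the same identifiers can be reused with a different session policy ARN in an update call.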

You can start utilising the Amazon EMR Studio after completing these steps.
