EMR Studio Features Requirements and Limits AWS

Features, specifications, and limitations of Amazon EMR Studio:

Amazon EMR Studio offers an integrated development environment (IDE) for data preparation and visualization, collaboration across teams, and application debugging. When using EMR Studio, consider the following factors: tool usage, cluster requirements, known issues, feature constraints, service limits, and regional availability.

Amazon EMR Studio Features

  • Administrators can use AWS Service Catalog to link an EMR Studio to a collection of cluster templates. This lets users create new EMR clusters running on Amazon EC2 for a Workspace. Administrators can decide which users or groups in a Studio have access to particular cluster templates, or to none at all.
  • Use the Amazon EMR service role to define access permissions to notebook files stored in Amazon S3 or to read secrets from AWS Secrets Manager. Session policies are not supported for these permissions.
  • Multiple EMR Studios can be created to control access to EMR clusters situated in various Virtual Private Clouds (VPCs).
  • Amazon EMR on EKS clusters can be set up using the AWS Command Line Interface (CLI). You can then attach these clusters to Workspaces through the Studio interface, using a managed endpoint, to run notebook jobs.
  • The considerations that apply when using trusted identity propagation with Amazon EMR also apply to EMR Studio. EMR Studio can only connect to EMR clusters that use trusted identity propagation if the Studio is set up with IAM Identity Center and has trusted identity propagation enabled.
  • To improve the security of off-console applications used with Amazon EMR, the application hosting domains are registered in the Public Suffix List (PSL). Examples include emrappui-prod.us-east-1.amazonaws.com, emrnotebooks-prod.us-east-1.amazonaws.com, and emrstudio-prod.us-east-1.amazonaws.com. If you need to set sensitive cookies in the default domain name, use cookies with a __Host- prefix for additional protection. This helps defend against cross-site request forgery (CSRF).
  • For encryption in transit, EMR Studio Workspaces and Persistent UI endpoints use FIPS 140 certified cryptographic modules, which makes it easier to adopt the service for regulated workloads.
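As a rough illustration of the __Host- cookie recommendation above, the following sketch builds a Set-Cookie header value with Python's standard http.cookies module. The function name and the cookie name "session" are hypothetical, and the HttpOnly and SameSite attributes are an extra hardening assumption that the text above does not mention.

```python
from http.cookies import SimpleCookie


def build_host_prefixed_cookie(name: str, value: str) -> str:
    """Return a Set-Cookie header value for a cookie that uses the __Host- prefix.

    Hypothetical helper for illustration only; it is not part of any AWS SDK.
    """
    cookie = SimpleCookie()
    key = f"__Host-{name}"
    cookie[key] = value
    # Browsers only accept a __Host- cookie when it is Secure, has Path=/,
    # and carries no Domain attribute, so no Domain is set here.
    cookie[key]["secure"] = True
    cookie[key]["path"] = "/"
    # Extra hardening (an assumption, not a requirement stated in the article).
    cookie[key]["httponly"] = True
    cookie[key]["samesite"] = "Strict"
    return cookie[key].OutputString()


header_value = build_host_prefixed_cookie("session", "abc123")
print("Set-Cookie:", header_value)
```

Because a __Host- cookie cannot carry a Domain attribute, it stays bound to the exact host that set it, which is what makes the prefix useful on shared hosting domains.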

Amazon EMR Studio prerequisites and compatibility

  • EMR Studio is compatible with Amazon EMR versions 5.32.0 and later (5.x series) and 6.2.0 and later (6.x series).
  • When a Studio uses IAM Identity Center with trusted identity propagation, the associated EMR clusters must also use trusted identity propagation.
  • Before setting up a Studio, make sure that proxy management tools such as FoxyProxy or SwitchyOmega are disabled in your browser. Active proxies can cause Network Failure errors during Studio creation.
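The version requirement above can be expressed as a small check. This is a minimal sketch with hypothetical helper names (parse_release, supports_emr_studio); it only covers the 5.x and 6.x series named in this article and raises an error for any other series rather than guessing.

```python
from typing import Tuple

# Minimum Amazon EMR release per series, as stated in the prerequisites above.
MINIMUM_BY_SERIES = {5: (5, 32, 0), 6: (6, 2, 0)}


def parse_release(version: str) -> Tuple[int, int, int]:
    """Turn '6.4.0' or 'emr-6.4.0' into a comparable tuple such as (6, 4, 0)."""
    if version.startswith("emr-"):
        version = version[len("emr-"):]
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def supports_emr_studio(version: str) -> bool:
    """Check a release against the minimum for its series.

    Series not covered by this article raise ValueError instead of guessing.
    """
    release = parse_release(version)
    minimum = MINIMUM_BY_SERIES.get(release[0])
    if minimum is None:
        raise ValueError(f"series {release[0]}.x is not covered by this sketch")
    return release >= minimum


for candidate in ("emr-5.31.0", "emr-5.32.0", "6.1.0", "6.4.0"):
    print(candidate, supports_emr_studio(candidate))
```

Comparing tuples of integers avoids the pitfall of comparing version strings, where "5.4.0" would sort after "5.32.0".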

Amazon EMR Studio Limitations and Known Problems

  • EMR Studio does not support the Python magic commands %alias, %alias_magic, %automagic, %macro, %%js, and %%javascript. Changing KERNEL_USERNAME with %env or %set_env, or changing proxy_user with %configure, is also not supported.
  • Amazon EMR on EKS clusters do not support SparkMagic commands in EMR Studio.
  • In multi-line Scala statements written in notebook cells, every line except the last must end with a period.
  • Kernels on Amazon EMR on EKS clusters may occasionally fail to start due to timeout issues. If this happens, you might need to restart the kernel and close and reopen the notebook file. The Restart kernel operation does not work as expected with EMR on EKS clusters: after choosing Restart kernel, you must restart the Workspace for the kernel restart to take effect.
  • If a Workspace is not attached to a cluster, an error message appears when a user opens a notebook and chooses a kernel. You can dismiss this error, but you must attach the Workspace to a cluster and choose a kernel before you can run code.
  • When Amazon EMR 6.2.0 is used with a security configuration, the Workspace interface may appear blank. If you need to set up EMRFS S3 authorisation or data encryption with a security configuration, use a different supported version.
  • On-cluster Spark UI links may not work or may not appear while debugging EMR on EC2 jobs. To regenerate these links, run the %%info command in a new cell.
  • In Amazon EMR versions 5.32.0, 5.33.0, 6.2.0, and 6.3.0, Jupyter Enterprise Gateway does not clean up idle kernels on the primary node. Idle kernels consume resources and may cause long-running clusters to fail. AWS provides a script that can be used to set up idle kernel cleanup for these versions.
  • Running a Python3 kernel does not submit a Spark job. As a result, if the auto-termination policy is enabled on Amazon EMR versions 5.32.0, 5.33.0, 6.2.0, or 6.3.0, a cluster may be marked as idle and terminated even while it has an active Python3 kernel. Use Amazon EMR version 6.4.0 or later for auto-termination with a Python3 kernel.
  • When displaying a Spark DataFrame with %%display, very wide tables may be truncated. To create a scrollable view, right-click the output and choose Create New View for Output.
  • With a Spark-based kernel (PySpark, Spark, or SparkR), interrupting a running cell does not stop the corresponding Spark job, which keeps running. To stop the job, use the on-cluster Spark UI.
  • Using EMR Studio Workspaces as the root user of an AWS account results in a 403: Forbidden error, because the Jupyter Enterprise Gateway configuration does not allow root user access. Use another authentication method rather than the root user for routine operations.
  • EMR Studio does not support the following Amazon EMR features:
    • Connecting to and running jobs on clusters that use a security configuration with Kerberos authentication.
    • Clusters with more than one primary node.
    • Clusters with EC2 instances based on AWS Graviton2 for EMR 6.x releases below 6.9.0 and 5.x releases below 5.36.1.
  • A Studio that uses trusted identity propagation does not support the following features:
    • Creating EMR clusters without a template.
    • Using EMR Serverless applications.
    • Launching Amazon EMR on EKS clusters.
    • Using a runtime role.
    • Enabling Workspace collaboration or SQL Explorer.
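Several of the limitations above depend on the Amazon EMR release. The sketch below collects them into one checker so a planned cluster can be screened before it is attached to a Workspace. All names here (release_warnings and its parameters) are hypothetical, and the rules are only a transcription of the bullets above, not an official compatibility matrix.

```python
from typing import List, Tuple

# Releases listed above as affected by the idle-kernel and auto-termination issues.
AFFECTED_RELEASES = {(5, 32, 0), (5, 33, 0), (6, 2, 0), (6, 3, 0)}

# First release in each series that works with Graviton2 instances in EMR Studio.
GRAVITON2_MINIMUM = {5: (5, 36, 1), 6: (6, 9, 0)}


def parse_release(version: str) -> Tuple[int, int, int]:
    """Turn '6.4.0' or 'emr-6.4.0' into a comparable tuple such as (6, 4, 0)."""
    if version.startswith("emr-"):
        version = version[len("emr-"):]
    major, minor, patch = (int(part) for part in version.split("."))
    return major, minor, patch


def release_warnings(
    version: str,
    security_configuration: bool = False,
    auto_termination: bool = False,
    graviton2: bool = False,
) -> List[str]:
    """Return the known issues from the list above that apply to a planned cluster."""
    release = parse_release(version)
    warnings: List[str] = []
    if release in AFFECTED_RELEASES:
        warnings.append("idle kernels on the primary node are not cleaned up")
        if auto_termination:
            warnings.append("auto-termination may stop a cluster with an active Python3 kernel")
    if release == (6, 2, 0) and security_configuration:
        warnings.append("the Workspace interface may appear blank with a security configuration")
    minimum = GRAVITON2_MINIMUM.get(release[0])
    if graviton2 and minimum is not None and release < minimum:
        warnings.append("Graviton2-based EC2 instances are not supported on this release")
    return warnings


for issue in release_warnings("6.2.0", security_configuration=True, auto_termination=True):
    print("-", issue)
```

An empty list means none of the release-specific issues described here apply; it does not rule out the other limitations, such as Kerberos or multiple primary nodes.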

Amazon EMR Studio Service limits

The EMR Studio service limits are as follows:

  • EMR Studios: each AWS account may have a maximum of 100.
  • Subnets: no more than five can be associated with one EMR Studio.
  • IAM Identity Center groups: no more than five can be assigned to each EMR Studio.
  • IAM Identity Center users: up to 100 can be assigned to each EMR Studio.
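The limits above lend themselves to a pre-flight check before a Studio is created. The following is a minimal sketch: StudioPlan and limit_violations are hypothetical names rather than part of any AWS SDK, and the constants simply restate the numbers in this section.

```python
from dataclasses import dataclass, field
from typing import List

# Limits as listed in this section.
MAX_STUDIOS_PER_ACCOUNT = 100
MAX_SUBNETS_PER_STUDIO = 5
MAX_GROUPS_PER_STUDIO = 5
MAX_USERS_PER_STUDIO = 100


@dataclass
class StudioPlan:
    """Hypothetical description of a Studio you intend to create."""
    name: str
    subnet_ids: List[str] = field(default_factory=list)
    identity_center_groups: List[str] = field(default_factory=list)
    identity_center_users: List[str] = field(default_factory=list)


def limit_violations(plan: StudioPlan, existing_studios: int) -> List[str]:
    """Return a message for every service limit the plan would exceed."""
    problems: List[str] = []
    if existing_studios + 1 > MAX_STUDIOS_PER_ACCOUNT:
        problems.append(f"account already has {existing_studios} Studios")
    if len(plan.subnet_ids) > MAX_SUBNETS_PER_STUDIO:
        problems.append(f"{len(plan.subnet_ids)} subnets exceeds {MAX_SUBNETS_PER_STUDIO}")
    if len(plan.identity_center_groups) > MAX_GROUPS_PER_STUDIO:
        problems.append(f"{len(plan.identity_center_groups)} groups exceeds {MAX_GROUPS_PER_STUDIO}")
    if len(plan.identity_center_users) > MAX_USERS_PER_STUDIO:
        problems.append(f"{len(plan.identity_center_users)} users exceeds {MAX_USERS_PER_STUDIO}")
    return problems


plan = StudioPlan(
    name="analytics-studio",
    subnet_ids=[f"subnet-{i}" for i in range(6)],  # placeholder IDs, one too many
    identity_center_groups=["data-eng", "data-sci"],
)
print(limit_violations(plan, existing_studios=12))
```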

Amazon EMR Studio availability regions

| AWS Region | Region code | Live Spark UI support |
| --- | --- | --- |
| US East (Ohio) | us-east-2 | Yes |
| US East (N. Virginia) | us-east-1 | Yes |
| US West (N. California) | us-west-1 | Yes |
| US West (Oregon) | us-west-2 | Yes |
| Africa (Cape Town) | af-south-1 | Yes |
| Asia Pacific (Hong Kong) | ap-east-1 | Yes |
| Asia Pacific (Jakarta) | ap-southeast-3 | No |
| Asia Pacific (Melbourne) | ap-southeast-4 | No |
| Asia Pacific (Mumbai) | ap-south-1 | Yes |
| Asia Pacific (Osaka) | ap-northeast-3 | No |
| Asia Pacific (Seoul) | ap-northeast-2 | Yes |
| Asia Pacific (Singapore) | ap-southeast-1 | Yes |
| Asia Pacific (Sydney) | ap-southeast-2 | Yes |
| Asia Pacific (Tokyo) | ap-northeast-1 | Yes |
| Canada (Central) | ca-central-1 | Yes |
| Europe (Frankfurt) | eu-central-1 | Yes |
| Europe (Ireland) | eu-west-1 | Yes |
| Europe (London) | eu-west-2 | Yes |
| Europe (Milan) | eu-south-1 | Yes |
| Europe (Paris) | eu-west-3 | Yes |
| Europe (Spain) | eu-south-2 | Yes |
| Europe (Stockholm) | eu-north-1 | Yes |
| Europe (Zurich) | eu-central-2 | No |
| Israel (Tel Aviv) | il-central-1 | No |
| Middle East (UAE) | me-central-1 | No |
| South America (São Paulo) | sa-east-1 | Yes |
| AWS GovCloud (US-East) | us-gov-east-1 | Yes |
| AWS GovCloud (US-West) | us-gov-west-1 | Yes |
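For scripted checks, the availability list above can be kept as a small lookup. This is only a sketch: the function name is hypothetical, the data is a snapshot of the list in this article and may go out of date, and the GovCloud entries use the standard AWS region codes us-gov-east-1 and us-gov-west-1.

```python
# Regions where EMR Studio is listed with live Spark UI support.
WITH_LIVE_SPARK_UI = {
    "us-east-2", "us-east-1", "us-west-1", "us-west-2", "af-south-1",
    "ap-east-1", "ap-south-1", "ap-northeast-2", "ap-southeast-1",
    "ap-southeast-2", "ap-northeast-1", "ca-central-1", "eu-central-1",
    "eu-west-1", "eu-west-2", "eu-south-1", "eu-west-3", "eu-south-2",
    "eu-north-1", "sa-east-1", "us-gov-east-1", "us-gov-west-1",
}

# Regions where EMR Studio is listed without live Spark UI support.
WITHOUT_LIVE_SPARK_UI = {
    "ap-southeast-3", "ap-southeast-4", "ap-northeast-3",
    "eu-central-2", "il-central-1", "me-central-1",
}


def live_spark_ui_supported(region_code: str) -> bool:
    """Return whether the listed Region supports the live Spark UI.

    Raises KeyError for Regions that do not appear in the list at all.
    """
    if region_code in WITH_LIVE_SPARK_UI:
        return True
    if region_code in WITHOUT_LIVE_SPARK_UI:
        return False
    raise KeyError(f"{region_code} is not in the EMR Studio availability list")


for code in ("us-east-1", "eu-central-2"):
    print(code, live_spark_ui_supported(code))
```

Raising KeyError for unlisted Regions keeps "EMR Studio is not listed here" distinct from "listed, but without live Spark UI".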
