Analytics

Elastic MapReduce (EMR)

  • Fully managed Hadoop framework as a service.

    • You can also run other frameworks on EMR that integrate with Hadoop, such as Apache Spark, Hive, HBase, Presto, and Flink.

  • It runs on a cluster of EC2 instances, and the cluster can be terminated automatically when its tasks complete.

  • EMR can analyze data held in several data stores, including S3 and DynamoDB.

  • EMR Studio is an IDE for creating data science applications.
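
The automatic termination mentioned above can be sketched as a request for the EMR RunJobFlow API. This is a minimal sketch, not a definitive setup: the helper name, bucket paths, instance types, and release label are assumptions for illustration, and the code only builds the request dictionary without contacting AWS.

```python
def build_transient_cluster_request(name, script_s3_uri, log_s3_uri):
    """Build keyword arguments for EMR RunJobFlow (nothing is sent to AWS here)."""
    return {
        "Name": name,
        "ReleaseLabel": "emr-6.15.0",  # assumption: use a release you have validated
        "LogUri": log_s3_uri,
        "Applications": [{"Name": "Spark"}],
        "Instances": {
            "InstanceGroups": [
                {"Name": "Primary", "InstanceRole": "MASTER",
                 "InstanceType": "m5.xlarge", "InstanceCount": 1},
                {"Name": "Core", "InstanceRole": "CORE",
                 "InstanceType": "m5.xlarge", "InstanceCount": 2},
            ],
            # False means the cluster terminates once the last step finishes.
            "KeepJobFlowAliveWhenNoSteps": False,
        },
        "Steps": [{
            "Name": "spark-job",
            "ActionOnFailure": "TERMINATE_CLUSTER",
            "HadoopJarStep": {
                "Jar": "command-runner.jar",
                "Args": ["spark-submit", "--deploy-mode", "cluster", script_s3_uri],
            },
        }],
        # Default role names created by EMR; yours may differ.
        "JobFlowRole": "EMR_EC2_DefaultRole",
        "ServiceRole": "EMR_DefaultRole",
    }


request = build_transient_cluster_request(
    "nightly-report",                      # hypothetical cluster name
    "s3://example-bucket/jobs/report.py",  # hypothetical script location
    "s3://example-bucket/emr-logs/",       # hypothetical log location
)
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("emr").run_job_flow(**request)`.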

Athena

  • Allows you to analyze data stored in S3 buckets using standard SQL statements.

    • You can use EMR to cleanse and/or transform the data before querying it with Athena.

  • Pay only for the queries you run, billed by the amount of data scanned (per TB).

  • Serverless.

  • Good for analyzing logs stored in S3, such as ELB logs, S3 access logs, and others.
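
As a rough illustration of querying logs with SQL and paying per TB, the sketch below builds an Athena StartQueryExecution request and estimates a query's cost from the bytes it scans. The table and column names, the output location, the binary definition of a terabyte, and the default price per TB are all assumptions; check the current pricing for your region.

```python
TB = 1024 ** 4  # assumption: one terabyte counted as 2**40 bytes


def build_error_query(table, limit=20):
    """SQL counting 5xx responses per target in a hypothetical ELB log table."""
    return (
        "SELECT target_ip, COUNT(*) AS errors "
        f"FROM {table} "
        "WHERE elb_status_code >= 500 "  # hypothetical column names
        "GROUP BY target_ip ORDER BY errors DESC "
        f"LIMIT {limit}"
    )


def build_athena_request(query, database, output_s3_uri):
    """Keyword arguments for Athena StartQueryExecution (nothing is sent here)."""
    return {
        "QueryString": query,
        "QueryExecutionContext": {"Database": database},
        "ResultConfiguration": {"OutputLocation": output_s3_uri},
    }


def estimate_query_cost(bytes_scanned, usd_per_tb=5.0):
    """Approximate cost: data scanned in TB times an assumed price per TB."""
    return bytes_scanned / TB * usd_per_tb


query = build_error_query("elb_logs")
request = build_athena_request(query, "logs_db", "s3://example-bucket/athena-results/")
cost = estimate_query_cost(2 * TB)
```

Because the bill follows the data scanned, partitioning or compressing the logs in S3 is a common way to keep query costs down.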

AWS Glue

  • It is a serverless data integration service that makes it easy to discover, prepare and combine data for analytics, machine learning and application development.

  • It provides all of the capabilities needed for data integration so that you can start analyzing your data and putting it to use in minutes.

Redshift

  • A data warehouse service. Similar to Athena, it can be used to analyze data in S3 with SQL.

  • It can scale to query exabytes of data.

  • Pay for the queries you run AND for the Redshift cluster.

FinSpace

  • Is a petabyte-scale data management and analytics service, purpose-built for the financial services industry.

  • Also includes a library of over 100 financial analysis functions.

Kinesis

  • Allows you to collect, process, and analyze real-time streaming data.

  • It consists of a number of services, such as:

    • Kinesis Data Streams: Capture, process, and store data streams.

    • Kinesis Data Firehose: Load data streams into AWS data stores.

    • Kinesis Data Analytics: Analyze streams with SQL or Apache Flink.

    • Kinesis Video Streams: Capture, process, and store video streams.
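
To make the capture step of Kinesis Data Streams more concrete, here is a minimal sketch that builds a PutRecord request for a single event. The stream name, event fields, and helper name are hypothetical, and the code only prepares the request without sending it.

```python
import json


def build_put_record_request(stream_name, event, partition_key):
    """Keyword arguments for Kinesis PutRecord (nothing is sent to AWS here)."""
    return {
        "StreamName": stream_name,
        # Record payloads are bytes; JSON is one common encoding choice.
        "Data": json.dumps(event).encode("utf-8"),
        # Records that share a partition key are routed to the same shard.
        "PartitionKey": partition_key,
    }


event = {"sensor": "sensor-1", "temperature_c": 21.5}  # hypothetical event
request = build_put_record_request("example-stream", event, event["sensor"])
```

With boto3 installed and credentials configured, the dictionary could be passed as `boto3.client("kinesis").put_record(**request)`.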

QuickSight

  • Is a Business Intelligence (BI) reporting tool.

    • Similar to Tableau or, if you're a Java programmer, to BIRT.

  • It allows you to visualize your analyzed data.

  • Uses SPICE, a Super-fast, Parallel, In-memory Calculation Engine.

  • Advertised by AWS at about 1/10 the cost of traditional BI software.

CloudSearch

  • Is a fully managed search engine service that supports up to 34 languages.

  • It allows you to create search solutions for your website or application.

OpenSearch (previously Elasticsearch)

  • Is a fully managed Elasticsearch service.

  • It is a real-time distributed search and analytics engine.

    • Elasticsearch is among the most popular enterprise search engines (used by Facebook, GitHub, Stack Exchange, Quora, ...).

  • It allows high-speed crawling and analysis of data stored on AWS.

  • Analyzes data from:

    • S3

    • Kinesis Data Streams

    • DynamoDB Streams

    • CloudWatch Logs

    • CloudTrail logs
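
As an example of the kind of log analysis described above, the sketch below builds a search request body in the Elasticsearch/OpenSearch query DSL: a full-text match combined with a time-range filter. The field names `message` and `@timestamp` and the helper name are assumptions about how the logs were indexed.

```python
import json


def build_log_search(text, since="now-1h", size=10):
    """Query DSL body: full-text match on a message field, limited to recent entries."""
    return {
        "size": size,
        "query": {
            "bool": {
                # Scored full-text match on the (hypothetical) message field.
                "must": [{"match": {"message": text}}],
                # Non-scoring filter that keeps only entries newer than `since`.
                "filter": [{"range": {"@timestamp": {"gte": since}}}],
            }
        },
        "sort": [{"@timestamp": {"order": "desc"}}],
    }


body = build_log_search("timeout")
payload = json.dumps(body)  # JSON text to send as the request body
```

The payload would typically be sent to the `_search` endpoint of an index on the domain, using whatever authentication the domain requires.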
