selected_top_l2
"/home/yossef/notes/Su/selected_top/selected_top_l2.md"
path: Su/selected_top/selected_top_l2.md
- **fileName**: selected_top_l2
- **Created on**: 2025-04-12 14:59:08
Big Data Challenges: The V’s:
- Volume: amount of data generated
- Variety: all kinds of data are generated (text, image, voice, time series, etc.)
- Velocity: rate at which data are produced and should be processed
- Veracity: noise/anomalies in data, truthfulness
- Value: how do we extract/learn valuable knowledge from the data
- Various: data come from all kinds of sources
What is Vertical Scaling (Scaling Up)?
Increase the processing power by adding more resources to an existing node
- upgrade the processor
- upgrade the memory
- upgrade the storage
What is Vertical Scaling (pros and cons)?
- Performance increases without modifying applications
- Limited scalability (bounded by Moore's law)
- Expensive (non-linear cost)
What is Horizontal Scaling (Scaling Out)?
Increase the processing power by adding more nodes to the system
- Cluster of commodity servers
What is Horizontal Scaling (pros and cons)?
- Often requires modifying applications
- Practically unlimited scalability (not bounded by the limits of a single machine)
- Less expensive (roughly linear cost)
What is scaling up vs. scaling out?
Scaling up: more CPU cores, memory, and disk are added to a single powerful computer.
Scaling out: the task is divided among a large number of less powerful machines,
each with slower CPUs, less memory, and less disk space.
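The scaling-out idea above can be sketched in a few lines: one task (summing a list) is split into slices, each slice is handled by a separate worker, and the partial results are combined. This is only an illustration under assumptions: threads stand in for machines, and the helper names `split_into_parts` and `scale_out_sum` are hypothetical, not part of any framework.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_parts(items, n_parts):
    """Split items into n_parts contiguous slices of near-equal size."""
    size, extra = divmod(len(items), n_parts)
    parts, start = [], 0
    for i in range(n_parts):
        # The first `extra` slices get one additional element.
        end = start + size + (1 if i < extra else 0)
        parts.append(items[start:end])
        start = end
    return parts


def scale_out_sum(items, n_workers):
    """Each worker sums its own slice; the partial sums are then combined."""
    parts = split_into_parts(items, n_workers)
    # Assumption: threads play the role of the "less powerful machines".
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum, parts))
    return sum(partial_sums)


data = list(range(1, 101))
total = scale_out_sum(data, n_workers=4)
print(total)  # 5050
```

Adding capacity here means raising `n_workers`, while scaling up would mean making a single worker faster.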
A hierarchical infrastructure
- Resources clustered in racks
- Communication inside a rack is more efficient than between racks
- Resources can even be geographically distributed
What are the 3 major challenges posed by cluster architecture?
- Ensure reliability upon node failure
- Minimize network communication bottlenecks
- Provide an easy distributed programming model
What is Cloud computing?
A model for enabling convenient, on-demand network access to a shared pool of
configurable computing resources (e.g., networks, servers, storage,
applications, and services) that can be rapidly provisioned and released with
minimal management effort or service provider interaction.
What are the essential characteristics of Cloud computing?
- On-demand self-service,
- Broad network access,
- Resource pooling,
- Rapid elasticity, and
- Measured Service.
What are the Cloud computing Service Models?
- Cloud Software as a Service (SaaS),
- Cloud Platform as a Service (PaaS), and
- Cloud Infrastructure as a Service (IaaS).
What are the Cloud computing Deployment Models?
- Private cloud,
- Community cloud,
- Public cloud, and
- Hybrid cloud.
What is MapReduce?
A programming model for processing big data sets in parallel.
It runs on top of a distributed file system, which consists of three main components:
- chunk server: large files are split into several chunks; each chunk is stored on one or more nodes and is usually replicated 2 or 3 times
- master server: stores metadata about the files and chunks, such as the number of chunks and the address where each chunk is stored
- client API: allows clients to access the data stored on the chunk servers
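A toy sketch of the three components above may help: data is split into chunks, a "master" table records which nodes hold each replica, and a "client" lookup picks a live replica. Everything here is an assumption made for illustration: the function names, the node names, and the round-robin placement policy are hypothetical, and real systems choose replica locations differently (for example, rack-aware placement).

```python
def split_into_chunks(data, chunk_size):
    """Chunk-server side: cut the data into fixed-size chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def place_replicas(num_chunks, nodes, replication=3):
    """Master side: metadata mapping chunk id -> nodes that hold a replica.

    Assumption: simple round-robin placement, chosen only for clarity.
    """
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    return {
        chunk_id: [nodes[(chunk_id + r) % len(nodes)] for r in range(replication)]
        for chunk_id in range(num_chunks)
    }


def locate_chunk(metadata, chunk_id, failed_nodes=()):
    """Client side: ask the metadata for the first replica on a live node."""
    for node in metadata[chunk_id]:
        if node not in failed_nodes:
            return node
    raise RuntimeError(f"all replicas of chunk {chunk_id} are unavailable")


chunks = split_into_chunks(b"abcdefghij", chunk_size=4)
metadata = place_replicas(len(chunks), nodes=["n0", "n1", "n2", "n3"])
print(chunks)                                         # [b'abcd', b'efgh', b'ij']
print(locate_chunk(metadata, 0, failed_nodes={"n0"}))  # n1
```

Because each chunk lives on several nodes, the lookup still succeeds when one node is down, which is the point of replication.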
What is MapReduce designed for?
- easy parallel programming
- easy management of large data sets
- transparent handling of hardware and software failures
What are the 3 steps of MapReduce?
- Map:
  - Mapper applies the Map function to a single element
- Group by key: Sort and shuffle
  - System sorts all the key-value pairs by key, and outputs key-(list of values) pairs
- Reduce:
  - User-written Reduce function is applied to each key-(list of values) pair
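The three steps can be illustrated with the classic word count, run here in a single process. This is a minimal sketch rather than a real MapReduce runtime: the names `map_fn`, `group_by_key`, `reduce_fn`, and `map_reduce` are chosen for this example, and the sample input is made up.

```python
from collections import defaultdict


def map_fn(line):
    """Map: applied to a single element (one line), emits (word, 1) pairs."""
    return [(word.lower(), 1) for word in line.split()]


def group_by_key(pairs):
    """Group by key: sort by key and collect values into key-(list of values)."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return sorted(groups.items())


def reduce_fn(key, values):
    """Reduce: applied to each key-(list of values) pair, sums the counts."""
    return key, sum(values)


def map_reduce(records):
    mapped = [pair for record in records for pair in map_fn(record)]
    grouped = group_by_key(mapped)
    return [reduce_fn(key, values) for key, values in grouped]


lines = ["big data is big", "data is everywhere"]
word_counts = map_reduce(lines)
print(word_counts)
# [('big', 2), ('data', 2), ('everywhere', 1), ('is', 2)]
```

Only `map_fn` and `reduce_fn` are written by the user; in a real system the grouping step is done by the framework.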
What does the MapReduce environment take care of?
- Partitioning the input data
- Scheduling the program’s execution across a set of machines
- Performing the group by key step
- Handling machine failures
- Managing required inter-machine communication
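As one illustration of the group by key step in the list above, intermediate key-value pairs are commonly routed to reducers by hashing the key, so that all values for the same key land on the same reducer. The sketch below assumes a CRC32-based hash partitioner and uses the hypothetical helper names `partition_for` and `shuffle`; actual frameworks may use a different hash function or a custom partitioner.

```python
import zlib


def partition_for(key, num_reducers):
    """Pick a reducer index for a key.

    zlib.crc32 is used instead of the built-in hash() because hash() on
    strings is randomized per process, while crc32 is stable across runs.
    """
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def shuffle(pairs, num_reducers):
    """Route every (key, value) pair to the partition of its reducer."""
    partitions = [[] for _ in range(num_reducers)]
    for key, value in pairs:
        partitions[partition_for(key, num_reducers)].append((key, value))
    return partitions


pairs = [("big", 1), ("data", 1), ("is", 1), ("big", 1), ("data", 1)]
partitions = shuffle(pairs, num_reducers=3)
for index, partition in enumerate(partitions):
    print(index, partition)
```

Since the reducer index depends only on the key, both `("big", 1)` pairs end up in the same partition regardless of which mapper produced them.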
continue:./selected_top_l3.md
before:./selected_top_l1.md