

Infrastructure Scaling

 

Scale-up: increasing the performance of a single server (hardware) so it can handle more requests.

 

Scale-out: adding new servers (hardware) of the same specification and spreading requests across them.

In general, scaling up involves service downtime and additional hardware costs.

For an RDBMS, scaling up requires the cumbersome work of migrating the existing server's data to the new server and reorganizing it.

NoSQL databases, on the other hand, were designed with scale-out in mind from the start, so even as data volume and request volume grow, you can simply add more hardware of similar specification.


Horizontal Scaling (Scale-Out)

  • What It Is: Horizontal scaling involves adding more machines (or nodes) to the system and distributing the load across them. In the context of databases, this means distributing the data and queries across multiple servers.
  • Challenges for RDBMS:
    • Data Distribution: RDBMS are designed with a single-node architecture in mind. Distributing relational data across multiple nodes requires sharding, which can be complex. Sharding splits the database into smaller, more manageable pieces (shards) that are distributed across multiple servers, but it requires careful management to maintain consistency and performance.
    • ACID Compliance: Ensuring ACID properties across multiple nodes is difficult. For example, ensuring strong consistency and isolation of transactions across distributed nodes can lead to increased complexity and performance trade-offs.
    • Complex Queries: SQL queries, especially those involving joins, aggregates, and complex transactions, are harder to execute efficiently when data is spread across multiple nodes (see the sketch after this list).
    • Synchronization: Keeping data synchronized across nodes can introduce latency and requires sophisticated algorithms to handle replication and consistency.
    • Increased Complexity: Managing multiple nodes, dealing with node failures, and ensuring that all nodes have the correct data adds significant operational complexity.
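To make the data-distribution and complex-query challenges concrete, here is a minimal scatter-gather sketch. Two in-memory SQLite databases stand in for two shard servers, and a modulo rule stands in for the hash function; the schema and all names are made up for the example. Even with users and their orders co-located on the same shard, the application has to query every shard and merge the results itself.

```python
# Minimal sketch (not production code) of querying hash-sharded data.
# Two in-memory SQLite databases stand in for two shard servers.
import sqlite3

SHARDS = [sqlite3.connect(":memory:"), sqlite3.connect(":memory:")]

def shard_for(user_id: int) -> sqlite3.Connection:
    # Hypothetical routing rule: modulo on the shard key stands in for a hash.
    return SHARDS[user_id % len(SHARDS)]

# Create the same schema on every shard.
for db in SHARDS:
    db.execute("CREATE TABLE users  (id INTEGER PRIMARY KEY, name TEXT)")
    db.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, total REAL)")

# Writes are routed by the shard key, so related rows stay on one shard.
for uid, name in [(1, "kim"), (2, "lee"), (3, "park")]:
    shard_for(uid).execute("INSERT INTO users VALUES (?, ?)", (uid, name))
for oid, uid, total in [(10, 1, 25.0), (11, 2, 40.0), (12, 3, 12.5)]:
    shard_for(uid).execute("INSERT INTO orders VALUES (?, ?, ?)", (oid, uid, total))

# No single node holds all rows, so a "JOIN over everything" becomes
# scatter-gather: query every shard, then merge in the application.
report = []
for db in SHARDS:
    report.extend(db.execute(
        "SELECT u.name, o.total FROM orders o JOIN users u ON u.id = o.user_id"
    ).fetchall())
print(report)  # [('lee', 40.0), ('kim', 25.0), ('park', 12.5)]
```

If the two tables were sharded on different keys, even this per-shard join would no longer work, and rows would have to be moved between nodes (by the application or by a distributed SQL engine) to answer the query.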

Scale-Out Solutions in RDBMS

Despite the challenges, some modern RDBMS have introduced features to facilitate horizontal scaling:

  • Sharding: Some RDBMS support sharding natively, allowing for horizontal distribution of data across multiple servers. However, this often requires careful planning and may not work well for all types of workloads.
  • Replication: Many RDBMS support replication (e.g., master-slave or master-master setups), which can help distribute read workloads across multiple servers. However, write operations still generally occur on a single master node, which can become a bottleneck (a minimal read/write routing sketch follows this list).
  • Distributed SQL Databases: Some newer systems, like Google Spanner, CockroachDB, and YugabyteDB, are designed to support SQL queries and ACID transactions while being distributed across multiple nodes. These systems are designed to scale out more easily than traditional RDBMS.
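As a rough illustration of the read/write split that replication enables, the sketch below routes all writes to a single primary and spreads reads over replicas. Separate in-memory SQLite databases stand in for separate servers, and the explicit backup call stands in for the replication the database engine would normally perform on its own; the schema and names are assumptions made up for the example.

```python
# Minimal sketch of primary/replica read-write splitting.
import random
import sqlite3

primary = sqlite3.connect(":memory:")
replicas = [sqlite3.connect(":memory:"), sqlite3.connect(":memory:")]

def execute_write(sql: str, params=()):
    # All writes go to the single primary, which is why it can become a bottleneck.
    primary.execute(sql, params)
    primary.commit()

def execute_read(sql: str, params=()):
    # Reads are load-balanced across replicas (here: picked at random).
    return random.choice(replicas).execute(sql, params).fetchall()

execute_write("CREATE TABLE IF NOT EXISTS visits (page TEXT, count INTEGER)")
execute_write("INSERT INTO visits VALUES (?, ?)", ("/home", 1))

# Stand-in for replication: copy the primary's state to each replica.
for r in replicas:
    primary.backup(r)

print(execute_read("SELECT * FROM visits"))  # [('/home', 1)]
```

Note that this only scales reads; to scale writes as well, the data itself has to be split across nodes (sharding) or handled by a distributed SQL database.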

 

Understanding DB I/O Bottlenecks

  1. I/O Operations in Databases:
    • Disk I/O: Databases frequently read from and write to disk storage. These I/O operations can become bottlenecks, particularly when the disk cannot keep up with the volume of read/write requests.
    • Network I/O: If the database is distributed or interacts with other services over a network, network bandwidth can also be a limiting factor, especially when transferring large volumes of data.
  2. Disk I/O Bottlenecks:
    • Storage Speed: Traditional spinning hard drives (HDDs) have slower read/write speeds compared to solid-state drives (SSDs). If a database relies on slower storage, the I/O operations may become a bottleneck, leading to slower query responses and reduced throughput.
    • Throughput and Latency: The storage device's throughput (the amount of data that can be read or written per unit of time) and latency (the time it takes to start a data transfer) are critical factors. SSDs generally offer lower latency and higher throughput compared to HDDs, making them better suited for I/O-intensive workloads (a rough measurement sketch follows this list).
    • I/O Contention: When multiple processes or queries try to access the disk simultaneously, I/O contention can occur, where processes are forced to wait for access to the disk, leading to performance degradation.
  3. Network I/O Bottlenecks:
    • Bandwidth Limitations: If the database is distributed across multiple nodes or interacts heavily with external systems, the available network bandwidth can become a bottleneck. This is particularly true for operations that involve transferring large datasets or frequent communication between nodes.
    • Latency: Network latency, the time it takes for data to travel between nodes, can also impact performance, particularly in distributed databases or cloud-based systems where data might be stored in different regions.
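The throughput/latency distinction can be seen with a crude measurement like the one below: one large sequential pass is dominated by throughput, while many small random reads are dominated by per-operation latency. This is only an illustrative sketch, not a rigorous disk benchmark (the OS page cache alone will skew it), and the file name, file size, and block size are arbitrary assumptions.

```python
# Rough illustration of throughput (MB/s) vs. latency (time per small read).
import os
import random
import time

PATH = "io_test.bin"
SIZE = 64 * 1024 * 1024   # 64 MiB test file
BLOCK = 4 * 1024          # 4 KiB blocks, a typical database page size

with open(PATH, "wb") as f:
    f.write(os.urandom(SIZE))

# Sequential read: one large pass, reported as throughput.
start = time.perf_counter()
with open(PATH, "rb") as f:
    while f.read(1024 * 1024):
        pass
seq = time.perf_counter() - start
print(f"sequential: {SIZE / seq / 1e6:.1f} MB/s")

# Random reads: many small seeks, reported as average latency per read.
start = time.perf_counter()
with open(PATH, "rb") as f:
    for _ in range(2000):
        f.seek(random.randrange(0, SIZE - BLOCK))
        f.read(BLOCK)
rnd = time.perf_counter() - start
print(f"random: {rnd / 2000 * 1e6:.1f} µs per {BLOCK // 1024} KiB read")

os.remove(PATH)
```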

 

Scaling Database Infrastructure

Partitioning: splitting a database table into smaller tables.

  • Vertical partitioning: splitting a table by columns.
    • Normalization, which splits tables by column to eliminate duplicated data, can itself be seen as a form of vertical partitioning.
    • Because a query reads the full record first and then picks out the requested columns, no matter which columns it selects, moving rarely used columns into a separate table via vertical partitioning can make queries more efficient, depending on the query patterns.
  • Horizontal partitioning: splitting a table by rows.
    • As the amount of data grows, the index size and query time grow with it; to keep this in check, rows are split across several tables (e.g., split alphabetically, or convert the Id to a hash value and split by that hash). A minimal sketch follows this list.
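A minimal sketch of hash-based horizontal partitioning, assuming rows are routed by a simple modulo of the Id (standing in for a hash function): one logical users table is split across several smaller physical tables, so a point lookup by Id only touches one small table and index. The table names and partition count are made up for the example.

```python
# Minimal sketch of hash-based horizontal partitioning within one database.
import sqlite3

PARTITIONS = 4
db = sqlite3.connect(":memory:")
for i in range(PARTITIONS):
    db.execute(f"CREATE TABLE users_{i} (id INTEGER PRIMARY KEY, name TEXT)")

def partition_of(user_id: int) -> str:
    # Deterministic routing: the same Id always lands in the same partition.
    return f"users_{user_id % PARTITIONS}"

def insert_user(user_id: int, name: str) -> None:
    db.execute(f"INSERT INTO {partition_of(user_id)} VALUES (?, ?)", (user_id, name))

def find_user(user_id: int):
    # A lookup by Id only has to scan one small partition's index.
    return db.execute(
        f"SELECT * FROM {partition_of(user_id)} WHERE id = ?", (user_id,)
    ).fetchone()

insert_user(42, "kim")
print(find_user(42))  # (42, 'kim')
```

Sharding, described next, applies the same routing idea but places each partition on a different database server rather than in a different table of the same database.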

Sharding: the table is split by rows as in horizontal partitioning, but each resulting piece is stored in a different DB server; the goal is to distribute load across DB servers.

 

Replication: the primary DB server's data and operations are copied to a replica and kept in sync, so the replica can take over as a backup when the primary fails. Read queries can also be spread across the replicas to prevent overload.
