자료구조 이론정리

데이터구조 학습과 선택의 접근관점

1. 데이터 구조 선택에 있어서 고려할 것들.
- 데이터 간의 순서관계가 유의미한가? Hash table과 Dynamic Array
- 데이터 간의 대소관계가 유의미한가? Sorting으로 자원 절약이 가능한지
- 메모리 inplace로 구현해야하는가? Dynamic Programming 혹은 Stack과 Queue 구조를 사용하면 좋을지
- 데이터의 조회가 주목적인가, 데이터의 보관이 주목적인가? Hashtable/Dynamic Array 혹은 Stack/Queue
- 데이터 의미 단위가 전체 데이터의 맥락과 별개로, 개별 데이터에 있는가 혹은 전체 데이터 맥락을 고려해야하는가? Hashtable/Dynamic Array 혹은 Stack/Queue
2. 사용시의 장점과 단점
3. 실제 활용되는 분야
4. 응용 : 프로그래밍 언어(파이썬/C)에서 실제 구현은 어떻게 되어있는지? 그 이유는?

Data Structure, 자료구조

Array, Dynamic Array, Direct Access Array
Sorted Array
Hash table
Linked list
Stack and Queue, Priority Queue
AVL, Heap

Interface(ADT/API) VS Data Structure

A data structure : is a way to store a non-constant amount of data, supporting a set of operations to interact with that data. The set of operations supported by a data structure is called an interface.
Interface 라는 요구사항/규약이, 구체적인 구현으로 달성된 상태를 ADT라고 한다.
Interface
- 공학적의미 : In theory, the set of all signatures (parameter and return value) defined by an object's operations is called the interface to the object.
- 일반적의미 : It serves as a contract specifying the methods, properties, and behaviors that a class, module, or component should provide, without specifying how these details are implemented.
  - When a company or website publicizes an interface, it typically means they are providing documentation and information about the methods, functions, or services that external developers or systems can use to interact with their platform or services. (User Interface)

Sequence Interface

a collection of items in an extrinsic order, where each item stored has a rank in the sequence
Sequence Operation Contract : Insert_at, Get_at, Set_at, Delete_at
Sequence ADT Example : Array and Dynamic Array : Linked list
Sequence 자료구조 특이성; Zero indexing, Not Including Last
- 해당 내용에 대해서 Dijkstra의 논증 https://www.cs.utexas.edu/users/EWD/ewd08xx/EWD831.PDF

1. Why Index Start from Zero?
- 이는 컴퓨터가 Offset방식을 사용하여 데이터의 위치를 파악하기 때문이다. 바꿔말하면, RAM에서 데이터의 구역분배는 절대적배치가 아니기에, 특정 데이터 탐색은 데이터 구역 시작점으로부터 상대적 위치(몇칸을 평행이동 시킬지)로 파악하기 때문.
- arr[i] is interpreted as *(arr + i). arr is the address of the array or address of 0th index element of the array. arr + i mean the address at i distance away from the starting element of the array.
2. Index exclude last address?
- a=<x<b 방식은 두가지 이점이 있는데 1. 시작지점인 a를 포함하여 표현하고 있음. 2. b-a로 array length를 바로 구할 수 있어, 평행이동이 필요한 횟수가 바로 계산됨. 반면, a=<x=<b-1의 경우 a~b-1까지라는 개념은 인간관점에선 직관적이지만, 컴퓨터 관점에서는 ((b-a-1)+1)이라는 +1이라는 부가적인 연산을 통해 인식할 수 있음
결론 : It is more about convention and common heritage (such as languages influenced by C) to increase efficiency when there was limited computer resource.

배열

특 : 각 데이터를 인덱스와 1:1 대응하도록 구성되며 데이터가 메모리 상에 연속적으로 저장됨.
장 : Direct Access Arrays구조를 가지고 있어 인덱스를 이용하여 데이터에 빠르게 접근 가능. 메모리에 배치된 데이터의 초기 index에서 찾고자하는 index만큼의 변위를 추가하여 탐색하면 빠르게 탐색가능 O(1)
단 : 데이터의 추가/삭제가 번거로운 편.
- 최초 생성시 길이를 정해서 생성함. 만약, 용량이상으로 새로운 데이터를 추가할 때, 다른 메모리 영역에 배열을 새로 생성하여 복사 진행. 단, 최대길이는 넉넉하게 생성하기때문에 tail에 데이터 추가/삭제로 인하여 자주 변하지는 않음. armortized O(1)
- 배열의 중간의 데이터를 삽인,삭제할시 그 뒤의 데이터를 모두 평행시켜야함. O(n)

public class arrayList<E> {
    private Object[] elements;
    private int size = 0;
    private static final int DEFAULT_CAPACITY = 10;

    public arrayList() {
        elements = new Object[DEFAULT_CAPACITY];
    }

    public void add(E e) {
        if (size == elements.length) {
            ensureCapacity(); // Increase capacity when needed
        }
        elements[size++] = e;
    }

    @SuppressWarnings("unchecked")
    public E get(int index) {
        if (index >= size || index < 0) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        return (E) elements[index];
    }

    public void remove(int index) {
        if (index >= size || index < 0) {
            throw new IndexOutOfBoundsException("Index: " + index + ", Size: " + size);
        }
        int numMoved = size - index - 1;
        if (numMoved > 0) {
            System.arraycopy(elements, index + 1, elements, index, numMoved);
        }
        elements[--size] = null; // Clear to let GC do its work
    }

    private void ensureCapacity() {
        int newSize = elements.length * 2;
        Object[] newElements = new Object[newSize];
        System.arraycopy(elements, 0, newElements, 0, size);
        elements = newElements;
    }

    public int size() {
        return size;
    }

    public boolean isEmpty() {
        return size == 0;
    }
}

코딩테스트 : 복합구조의 List의 경우( ArrayList<ArrayList<int[]>>) python과 달리 안정성이 떨어져서 Class를 생성하는 것을 고려해볼 것.

연결리스트 Linked List

특 : 데이터를 링크로 연결해서 관리하는 자료구조. 자료의 순서는 정해져 있지만, 메모리상 물리적 연속성이 보장되지는 않음
- 노드 (Node) - 데이터 저장 단위로, 값과 포인터로 구성
장 : 데이터 공간을 미리 할당할 필요 없으며, 리스트의 길이가 가변적이라 head/tail 데이터에 한하여 추가/삭제 용이. O(1)
단 : 연결구조를 위한 별도 데이터 공간 필요, 데이터 접근시 head or tail부터 접근가능하여 접근속도 느림 O(n)
단 : head or tail이 아닌 데이터 추가, 삭제 시 해당 데이터까지 접근하는 속도가 느려서 느림응용 : 주로 활용은 doubly linked list로써 일반 linked list의 head 혹은 tail부터의 접근속도를 보완함.

public class linkedList<E> {
    // Node inner class for LinkedList
    private static class Node<E> {
        E data;
        Node<E> next;

        Node(E data) {
            this.data = data;
            this.next = null;
        }
    }

    private Node<E> head; // Head node to hold the list

    public linkedList() {
        head = null;
    }

    // Add element to the end of the list
    public void add(E element) {
        Node<E> newNode = new Node<>(element);
        if (head == null) {
            head = newNode;
        } else {
            Node<E> current = head;
            while (current.next != null) {
                current = current.next;
            }
            current.next = newNode;
        }
    }

    // Retrieve element by its position in the list
    public E get(int index) {
        Node<E> current = head;
        int count = 0;
        while (current != null) {
            if (count == index) {
                return current.data;
            }
            count++;
            current = current.next;
        }
        throw new IndexOutOfBoundsException("Index: " + index);
    }

    // Display all elements of the list
    public void printList() {
        Node<E> current = head;
        while (current != null) {
            System.out.print(current.data + " -> ");
            current = current.next;
        }
        System.out.println("null");
    }
}

스택 Stack

특징 : 후입선출 (Last In First Out; LIFO) 자료구조 - 마지막에 들어온 데이터가 먼저 나가는 구조.
함수 콜 스택, 수식 계산, 인터럽트 처리 등, 프로그램의 메모리구조의 stack영역에서 함수 호출시 stack을 사용하며, 지역변수 정보, 매개변수정보, 함수종료 후 복귀주소가 저장된다.
하던 작업을 멈추고, stack call의 일을 우선으로 처리함. 마트에 가던 중 동전이 떨어져있는 것을 발견하면 동접을 줍고서 가던길을 계속간다.
java의 stack은 collections framework의 list interface의 vector subclass로 collection과 list의 method를 갖는다.
기능 : 기본적으로 데이터 추가, 꺼내기, 스택 공간 확인 동작으로 이루어짐
- push(item) : 스택의 맨 위에 요소를 추가하고 추가된 요소를 반환
- pop() : 스택의 맨 위 요소를 제거하고 반환. 스택이 비어 있으면 EmptyStackException 발생
- peek() : 스택의 맨 위 요소를 제거하지 않고 반환. 스택이 비어 있으면 EmptyStackException 발생
- isempty() : 스택이 비어있는지 확인 (true / false)

큐 Queue

특 : 선입선출 (First In First Out; FIFO) 자료구조- 먼저 들어온 데이터가 먼저 나가는 구조
Java에서 Queue는 Interface(추상클래스)로 'java.util.Queue (interface)- java.util.Deque (interface) - ArrayDeque/ LinkedList'의 상속관계를 갖는다. 그러므로 실제활용시에는 LinkedList를 통해서 Queue를 구현한다.
기능
- boolean offer(E e): 큐의 끝에 요소를 추가. 용량을 초과하면 false 반환.
- E poll(): 큐의 앞에서 요소를 제거하고 반환. 큐가 비어 있으면 null 반환.
- E peek(): 큐의 앞 요소를 반환. 큐가 비어 있으면 null 반환.
python의 내장라이브러리 중 deque의 deque.popleft(), appendleft()로 구현가능
- from collection import deque, variable = deque(list)
- deque는 dynamic array를 사용하는 파이썬의 list와는 다르게, double linked list를 사용하여 양끝단에서의 insertion과 deletion에 대한 비용을 줄임.단, pointer 저장을 위한 메모리 사용과 random access가 불가하여 조회에 시간이 오래걸리는 것은 단점임

데크 Deque

양쪽에서 삽입과 삭제가 모두 가능한 자료구조 ( Deque: Doubly-ended Queue ), Stack과 Queue 를 합친 형태
Scroll : 한 쪽의 입력을 제한한 데크, Shelf : 한 쪽의 출력을 제한한 데크
method
- offerFirst(E e) / offerLast(E e): Adds an element at the front/tail of the deque and returns true if successful, or false if the deque is full.
- pollFirst() / pollLast(): Removes and returns the first/tail element of the deque, or returns null if the deque is empty.

Set Interface

Sets maintain a collection of items based on an intrinsic property involving what the items are, usually based on a unique key, x.key
Set Operation Contract
- Find_withkey, Insert_withkey
임의로 배치된 배열 내의 데이터의 탐색은 비용이 많이든다. 그러나 데이터의 값을 인덱스와 내재적으로 유의미한 관계가 있도록 배치하면, 바로 해당 값을 찾을 수 있음. (Hashing기법)
- Direct Access Array의 특이성 : Most operations within a computer only allow for constant logical branching, like if statements in your code. However, one operation on your computer allows for non-constant branching facto which is a normal static array that associates a semantic meaning with each array index locationDirect Access Arrays : static array that associates a semantic meaning with each array index location

해쉬 테이블

해쉬 함수를 사용하여 키를 해시값으로 매핑하고, 이 해시값을 색인(index)삼아 데이터의 값(value)을 키와 함께 pair로 저장하는 자료구조. 데이터가 저장되는 곳을 슬롯 혹은 버킷이라고 함. 키를 통해서 데이터 값에 빠르게 접근가능함.

해싱 Hashing - 키를 특정 계산식에 넣어 나온 결과를 사용하여 값에 접근하는 과정
- Direct Access Array의 빠른 접근 속도를 살리되, possible input의 최대값 u만큼의 메모리를 할당하고 싶지 않을때 사용함. (Direct Access Arrays : static array that associates a semantic meaning with each array index location)
- 용어정리 ; 키: 해시 테이블 접근을 위한 입력 값, 해시 함수: 키를 해시 값으로 매핑하는 연산 , 해시 값: 해시 테이블의 인덱스, 해시 테이블: 키-값을 연관시켜 저장하는 데이터 구조
- to store the items in a smaller dynamic direct access array, with m = O(n) slots instead of u. Use a function that maps item keys to different slots of the direct access array, h(k) (a hash function or a hash map) : {0,...,u−1} → {0,...,m−1}(a hash table)

Hash충돌
- 해쉬값의 개수보다 보통은 더 많은 키값을 해쉬값으로 변환하기 때문에 해쉬함수가 서로 다른 두 개의 키에 대해서 동일한 해쉬값을 반환하는 해쉬 충돌이 발생함 (비둘기집의 원리)
- 해결법1 : 체인법 (Separate Chaining) : 한 버킷에 들어갈수있는 엔트리 값에 제한을 두지 않음. 충돌 발생 시, 테이블 내의 다른 위치를 탐색하는 것이 아닌 연결 리스트를 이용하여 해당 테이블에 데이터를 연결
- 해결법2 : 개방 주소법 (Open Addressing): 한 버킷당 들어갈 수 있는 엔트리를 하나로 제한함. 충돌 시, hash값이 아닌 다른 주소에 데이터를 저장할 수 있도록 허용. Address를 바꿀 수 있다는 점에서 Open Addressing이라고 부름.
  - A. 선형탐사법 (Linear Probing) : 해당 해쉬값에서 고정 폭(1칸?)을 옮겨 다음 해쉬값에 해당하는 버켓에 엑세스함.
    - 일차 군집화(clustering) 문제 발생- 반복된 충돌 발생 시 해당 지점 주변에 데이터가 몰리는 경우 발생
  - B. 제곱탐사법(Quadratic Probing) : 해당 해쉬값에서 특정 수의 제곱만큼의 간격을 두고 탐사하는 방법. n제곱 간격으로 탐사 (2^1칸, 2^2칸, 2^3칸 ...)
    - 일차 군집화(clustering) 문제 일부 보완하나 여전히 특정 부분에서 군집이되는 이차 군집화 문제 발생 가능성
  - C. 이중해싱(Double Hashing) : 해싱 함수를 이중으로 사용함. 해시 함수 1: 최초 해시를 구할 때 사용, 해시 함수 2: 충돌 발생 시, 탐사 이동 간격을 해쉬값으로 사용. 선형 탐사, 제곱 탐사에 비해 데이터가 골고루 분포되어 Clustering이있다.

Good hash function / Bad hash fucntion
- 해쉬테이블의 삽입,삭제,탐색은 O(1)이 걸리지만 잘 짜여지지 못한 hash function은 잦은 충돌로 O(n)이 걸릴 수 있다.
- 보통 적재율(자료의 수/버켓의 수)는 20%수준으로 유지한다.

Binary Tree : a linked node container, similar to a linked list node (a connected graph with no cycles)

Tree concept
- Root : node that has no parent
- depth : number of edge from a certain node to root upward
- height : number of edge from a certain node to leave(longest downward path)
Binary Tree Operation
- Traversal Order : every node in node <X>’s left subtree is before <X>, every node in node <X>’s right subtree is after <X>
- Insert node <Y> after node <X> in the traversal order
  - If <X> has no right child, make <Y> the right child of <X>. Otherwise, make <Y> the left child of <X>’s successor (which cannot have a left child) Running time is O(h) where h is the height of the tree
- Delete the item in node from ’s subtree
  - If <X> is a leaf, detach from parent and return. If <X> has a left child, swap items with the predecessor of <X> and recurse ∗ Otherwise <X> has a right child, swap items with the successor of <X> and recurse
- Subtree Aggregation
  - Ex : Subtree Height, Subtree Maximum Value, Subtree Key Max etc
  - Subtree Aggregation Property Maintanence
    - Insertion : Update all ancestors of an inserted or deleted node in O(h) time by walking up the tree
    - Search :Look up subtree property in O(1) time

AVL Tree

AVL Tree (Adelson-Velsky and Landis, 1962) ,A binary tree that maintains balanced O(log n) height under dynamic operations.
Operation : change the structure of a tree, while preserving traversal order by Rotations for balance.
Set and Sequence Application : By maintaining sorted traversal order to search for specific key in logn time, we can implment Set AVL. By maintaining subtree size property to search for specific order in logn time,we can implement Sequence AVL

Heap Tree as Set AVL

Java에서 Tree구조는 모두 heaptree를 사용하여, 내부적으로 자료의 순서를 지키고 있음.
Idea: keep larger elements higher in tree, but only locally (subtree aggregation, 중복이 허용된 반정렬상태)
부모 노드의 키가 자식 노드의 키보다 작거나 같은 형태. 부모노드와 자식노드의 관계는 i와 2i or 2i+1.
- 데이터 중, Priority가 높은 것에 따라서, 순서를 바꾸는 것이므로 set자료구조임. 따라서, 삽입 후 순서바꾸기가 들어가기때문에, 최초의 요소 간의 삽입위치 및 배열순서는 크게 중요하지 않음.
Operations: Excellent for finding, extracting the min/max, and inserting/deleting elements.
Deletion, Insertion : to delete swap the max value with last value of the array and heapify down. Insertion as well.
Sorting: Can be used for efficient in-place sorting using heap structure. pop element sequencely.
Building : Use Max heapify down for O(n),
Diagram Example

*Set Data Structure의 경우 heap tree를 사용할시, O(n)이 필요하여 AVL Tree보다 우위에 있음

ㅇPython implmentation : import heapq. iterable을 input으로 받아 최소힙을 구현함. 최대힙은 input값을 음수로 표기하고 output값을 다시 양수로 변환하여 구현한다.
- heapify(iterable): This function converts the iterable into a heap in-place. It rearranges the elements of the iterable such that they satisfy the heap property.
- heappush(heap, item): This function adds the item to the heap while maintaining the heap property. It inserts the item at the correct position in the heap.
- heappop(heap): This function removes and returns the smallest element from the heap while maintaining the heap property. It pops the root element of the heap.

* 각 자료구조의 기초부터의 구현은 기술면접으로 출제될 가능성이 있음

'개발기술 > 알고리즘, 자료구조, PS' 카테고리의 다른 글

코딩테스트 탐색문제 (DFS,BFS,DP) (0)	2025.01.12
그래프이론 알고리즘, 자료구조 정리 (2)	2024.09.18
정렬 알고리즘 (1)	2024.08.10
코딩테스트 문제후기 (3)	2024.05.21
코딩테스트 학습법 및 빈출유형 문제풀이 (0)	2024.01.21

생각정리, 개발정리

자료구조 이론정리

데이터구조 학습과 선택의 접근관점

Data Structure, 자료구조

Interface(ADT/API) VS Data Structure

Sequence Interface

Set Interface

'개발기술 > 알고리즘, 자료구조, PS' 카테고리의 다른 글

티스토리툴바

자료구조 이론정리

데이터구조 학습과 선택의 접근관점

Data Structure, 자료구조

Interface(ADT/API) VS Data Structure

Sequence Interface

Set Interface

'개발기술 > 알고리즘, 자료구조, PS' 카테고리의 다른 글

'개발기술/알고리즘, 자료구조, PS' Related Articles

티스토리툴바