블로그 이사 중입니다 >> gumgood.github.io

Algorithm Gym :: Data structures 본문

Problem Solving/분류없음

Algorithm Gym :: Data structures

gumgood 2019. 5. 8. 12:25

Today I want to introduce you some very very useful data structures.

 

Trees

Trees are one of the most useful data structures. A tree is a connected-acyclic graph. Ther are too many types of trees, like : rooted trees, weighted trees, directed trees, tries, etc.

 

Partial sum

There are two types of problems solvable by partial sum.

 

1. Problems which you are asked to answer some queries about the sum of a part of elements(without modify queries).

 

Solution of all of this problems are the same. You just need to know how to solve one of them:

 

Example : You are asked some queries on an array \(a_1, a_2, ... a_n\). Each query give you numbers \(l\) and \(r\) and you should print \(a_l+a_{l+1}+...+a_r\).

 

Solution: You should have another array \(p_1, p_2, ..., p_n\) which, all of its members are initially 0, for each query, you should increase \(p_l\) by v and decrease \(p_{r+1}\) by \(v\).

 

An then, for each \(i\), starting from 1 you should increase \(p_i\) by \(p_{i-1}\). So, final array would be \(a_1+p_1, a_2+p_2, ..., a_n+p_n\).

 

hard problem of partial sum : https://codeforces.com/gym/100571/problem/B

 

Disjoint sets

Disjoint sets are also useful data structures. Using them is fast and easy. We use theme in many algorithms, like Kruskal's and Prim's.

 

Disjoint sets, or DSU (Disjoint Sets Union) as their name, are sum sets. Imagine we have some boxes and some tools and initially each tool is in one box. Mostly we are given some queries and ask to merge two boxes or print the members of a box or find which box is some tool in.

 

For rest of these, let's consider that initially there is exactly one tool in a box. That is, we have \(n\) tools and \(n\) boxes and initially, tool number \(i\) is in box number \(i\).

 

For this propose, we can use so many containers. Like:

 

Trees

Trees are the most useful containers for DSU. For each vertex, we keep it's parent (and parrent of the root is -1). So, initially are parents are set to -1, and we have queries to find the root of each box(having the root, we can easily find the box's index) and queries for merging two trees. For better time complexity, every time we want to find the root of each vertex, we set it's parent to the root for the next queries.And while merging, we always want to minimize the height of the tree, so when we want to merge the boxes, it's like we put all the tools of the box with fewer tools in the other box.

The best way I've seen to code this kind of DSU, is style of bmerry : (C++)

int root(int v){return par[v] < 0 ? v : (par[v] = root(par[v]));}
void merge(int x,int y){	//	x and y are some tools (vertices)
        if((x = root(x)) == (y = root(y))     return ;
	if(par[y] < par[x])	// balancing the height of the tree
		swap(x, y);
	par[x] += par[y];
	par[y] = x;
}

In the code above, for each root \(v\), \(par[v]\) equals the negative of number of tools in that box.

 

Array, vectors

We keep tools in a vector (or an array) and when we have a query to merge two boxes, we put all the tools of the box with  fewer tools in the other box.

 

The time complexity is good because for each tool, we take and put it in an other box at most \(log(n)\) times (each time the size of the vector will be at least doubled).

 

So time complextiy would be \(O(nlog(n))\).

 

Sets (red-black trees)

Other way is to keep them in a red_black tree (in C++ it's set). We do exactly like vectors, so time complextiy would be \(O(nlog^2(n))\). (One \(log\) is for inserting).

 

Problems : Hamro and tools, TROY Query (Join the group ACM-OI first)

Tries

Tries are some kind of rooted trees in which each edge has a character on it. Actually, trie is some kind of DFA (Determining Finite Automata). For a bunch of strings, their trie is the smallest rooted tree with a character on each edge and each of these strings can be build by writing down the characters in the path from the root to some node.

 

It's advantage is, LCP (Longest Common Prefix) of two of these strings is the LCA (Lowest Common Ancestor) of their nodes in the trie(a node that we can build the string by writing down the characters in the path from the root to that node).

 

Generating the trie:

 

Root is vertex number 0 (C++)

int x[MAX_NUMBER_OF_NODES][MAX_ASCII_CODE], next = 1; //initially all numbers in x are -1
void build(string s){
	int i = 0, v = 0;
	while(i < s.size()){
		if(x[v][s[i]] == -1)
			v = x[v][s[i++]] = next ++;
		else
			v = x[v][s[i++]];
	}
}

Problemhttps://codeforces.com/contest/456/problem/D

 

Suffix array

Suffix array is a data structure that helps you sort all the suffixes in lexicography order.

 

This array consists of integers, the beginning of suffixes.

 

There are two ways to achieve this goal :

 

One) Non-deterministic algorithm : Use Robin-Carp and for check if a suffix is lexicographically less than another one, find their LCP using binary search + hash and then check the next character after their LCP.

 

Code:

namespace HashSuffixArray
{
	const int
		MAXN = 1 << 21;

	typedef unsigned long long hash;

	const hash BASE = 137;

	int N;
	char * S;
	int sa[MAXN];
	hash h[MAXN], hPow[MAXN];

	#define getHash(lo, size) (h[lo] - h[(lo) + (size)] * hPow[size])

	inline bool sufCmp(int i, int j)
	{
		int lo = 1, hi = min(N - i, N - j);
		while (lo <= hi)
		{
			int mid = (lo + hi) >> 1;
			if (getHash(i, mid) == getHash(j, mid))
				lo = mid + 1;
			else
				hi = mid - 1;
		}
		return S[i + hi] < S[j + hi];
	}

	void buildSA()
	{
		N = strlen(S);
		hPow[0] = 1;
		for (int i = 1; i <= N; ++i)
			hPow[i] = hPow[i - 1] * BASE;
		h[N] = 0;
		for (int i = N - 1; i >= 0; --i)
			h[i] = h[i + 1] * BASE + S[i], sa[i] = i;

		stable_sort(sa, sa + N, sufCmp);
	}

} // end namespace HashSuffixArray

Two) Determinisitc algorithm : We sort them \(log(MaxLength)\) steps, int the \(i-th\) step (counting from 0), we sort them according to their first \(2^i\) characters and puts the suffixes whit the same prefix with \(2^i\) characters in the same buckets.

 

Code:

/*
Suffix array O(n lg^2 n)
LCP table O(n)
*/
#include <cstdio>
#include <algorithm>
#include <cstring>

using namespace std;

#define REP(i, n) for (int i = 0; i < (int)(n); ++i)

namespace SuffixArray
{
	const int MAXN = 1 << 21;
	char * S;
	int N, gap;
	int sa[MAXN], pos[MAXN], tmp[MAXN], lcp[MAXN];

	bool sufCmp(int i, int j)
	{
		if (pos[i] != pos[j])
			return pos[i] < pos[j];
		i += gap;
		j += gap;
		return (i < N && j < N) ? pos[i] < pos[j] : i > j;
	}

	void buildSA()
	{
		N = strlen(S);
		REP(i, N) sa[i] = i, pos[i] = S[i];
		for (gap = 1;; gap *= 2)
		{
			sort(sa, sa + N, sufCmp);
			REP(i, N - 1) tmp[i + 1] = tmp[i] + sufCmp(sa[i], sa[i + 1]);
			REP(i, N) pos[sa[i]] = tmp[i];
			if (tmp[N - 1] == N - 1) break;
		}
	}

	void buildLCP()
	{
		for (int i = 0, k = 0; i < N; ++i) if (pos[i] != N - 1)
		{
			for (int j = sa[pos[i] + 1]; S[i + k] == S[j + k];)
			++k;
			lcp[pos[i]] = k;
			if (k)--k;
		}
	}
} // end namespace SuffixArray

Heaps

A heap is a binary rooted tree (a rooted tree that each node has at most 2 children) and each vectex has a value.

 

Heap property : Heap usually has a property, like the value of each vertex is equal to or greater than the value of its child(ren) (we call this a max heap). We can use heaps in heap sort.

Fibonacci heaps

A fibonacci heap is a kind of heap with better complexities. We don't need to know what a fibonacci heap is. C++ already has one, priority_queue. (정말 그런지 cplusplus에서 확인해야함)

 

Binary Search Tree (BST)

A binary search tree (BST) is a binary rooted tree that every node has a value, and for each node, the value of every node in its left child's subtree is less than its value and the value of every node in its right child's subtree is greater than that. Usually we perform some queries on BSTs, like inserting, deleting, asking and ... .

Binary search trees are too useful.

Red-black trees

A red-black tree is a kind of BST that after each query, BST will be balanced in such a way that it's height remains \(O(logn)\).

 

C++ already has a red-black tree inside, set.

 

You can read about them in C++ references.

Unfortunately, set has not any function to find the \(k-th\) smallest minimum or find the index of an element, bust there is a data structure in C++ with does it in \(O(log(n))\)(also contains all set functions). tree:

#include<bits/stdc++.h>
#include<ext/pb_ds/assoc_container.hpp>
#include<ext/pb_ds/tree_policy.hpp>
using namespace __gnu_pbds;
using namespace std;
template <typename T>
using ordered_set = tree<T, null_type, less<T>, rb_tree_tag, tree_order_statistics_node_update>;

int main(){
	ordered_set<int>  s;
	s.insert(1); 
	s.insert(3);
	cout << s.order_of_key(2) << endl; // the number of elements in the s less than 2
	cout << *s.find_by_order(0) << endl; // print the 0-th smallest number in s(0-based)
}

This works even in C++ 98!

 

You can read more about it, just google 'sgi STL'.

 

SQRT Decomposition

Suppose we have an array \(a_1, a_2, ..., a_n\) and \(k = \sqrt{n}\). We partition this array into \(k\) pieces each containing \(k\) elements of \(a\).

 

Doing this, we can do a lot of things in \(O(\sqrt{n})\). Usually we use them in the problems with modify and ask queries.

 

ProblemsHoles, DZY Loves Colors, RMQ (range minimum query) problem

Sparse Table

The main...

0 Comments
댓글쓰기 폼