/ Algorithm

Algorithm: An Application of Bitmap Data Structure

0x00: General Introduction

Bitmap is a very useful data structure which is widely used in indexing, data compressing and other situations. It doesn't only demand little memory, but also has an optimized time performance.

0x01: Lead-In: Company Problem

Description

There's a company with a staff of \(n\) numbered from \(1\) to \(n\). Because the number of staff is enormous, you need to write a management system for them.

When the staff start work or leave, they are ought to check in or check out a code in the system. Also, they are capable of updating their codes at any time. When it's time to get off work, all staff will be logged out of the system automatically.

The administrator of the company may want to know how many staff are working and which code is used for logging to the system of any staff.

Input Format

In the first line, two integers \(n\), \(m\) are given.

Each of the following \(m\) lines contains one of these contents:

  • I a c means number \(a\) staff logs in to the system using the code \(c\). If \(a\) is logged in, assign \(c\) to his or her code.
    • Oa means staff numbered \(a\) logs out of the system. If \(a\) isn't logged in, the operation is invalid.
  • C means it's time to get off work. All staff log out of the system.
  • N means inquiring how many staff are working.
    • Qa means making an inquiry about the code of the staff numbered \(a\). If the specific staff is logged out, answer \(-1\).

Output Format

Answer an integer, which is the sum of all inquiries beginning with N and Q.

int is supposed to be used to store that value and don't care about the overflowed part.

Sample Input

10	8
I	1	2
I	1	4
Q	1
C
I	2	4
O	2
Q	2
N

Sample Output

3

Clarification

  • When asked Q 1, you are supposed to return \(4\)
    • When asked Q2, you are supposed to return \(-1\) since the staff numbered \(4\) is logged out
  • When asked N, you are supposed to return \(0\) since all the staff have been logged out
  • The sum of these three inquires is \(3\)

Limitation

Input Data Scale

  • \(n \le 10,000,000\)

  • \(m \le 1,000,000\)

  • \(1\le a \le n\)

  • Each code is within the range of \([0,2^{31})\)

Time Limitation

  • 2 secs per test case

Memory Limitation

  • 256 Megabytes in total

0x02: Brute-Force Algorithm

Firstly, I use map in C++ STL to represent a pair of the correspondence between one staff and its code.

#include <iostream>
#include <map>
using namespace std;
typedef map<int, int> my_map;
void logout_all(my_map &c) {
    for (my_map::iterator it = c.begin(); it != c.end(); ) {
        it = c.erase(it);
    }
    return ;
}
bool is_loggedin(my_map &c, int staff) {
    my_map::iterator it = c.find(staff);
    if (it != c.end()) {
        return true;
    }
    return false;
}
void update(my_map &c, int staff, int code) {
    my_map::iterator it = c.find(staff);
    if (it != c.end()) {
        it->second = code;
    }
    return ;
}
void login(my_map &c, int staff, int code) {
    c.insert(make_pair(staff, code));
    return ;
}
void logout(my_map &c, int staff) {
    my_map::iterator it = c.find(staff);
    if (it != c.end()) {
        c.erase(it);
    }
    return ;
}
int query_loggedin(my_map &c) {
    int ans = 0;
    for (my_map::iterator it = c.begin(); it != c.end(); ++ it) {
        ++ ans;
    }
    return ans;
}
int query_staff(my_map &c, int staff) {
    if (is_loggedin(c, staff)) {
        my_map::iterator it = c.find(staff);
        return it->second;
    }
    return -1;
}
int main(void) {
    int n, m;
    int ans = 0;
    my_map c;
    cin >> n >> m;
    for (int i = 1; i <= m; ++ i) {
        char ch;
        int a, b;
        cin >> ch;
        switch (ch) {
        case 'I':
            cin >> a >> b;
            if (is_loggedin(c, a)) {
                update(c, a, b);
            }
            else {
                login(c, a, b);
            }
            break;
        case 'O':
            cin >> a;
            if (is_loggedin(c, a)) {
                logout(c, a);
            }
            break;
        case 'C':
            logout_all(c);
            break;
        case 'N':
            ans += query_loggedin(c);
            break;
        case 'Q':
            cin >> a;
            ans += query_staff(c, a);
            break;
        }
    }
    cout << ans << endl;
    return 0;
}

Consider an example that repeats logging out all staff of the system: each logout_all operation needs \(O(n)\), so the worst case is \(O(nm)\) which will exceed the time limitation.

0x03: Using Bitmap

To reduce the unnecessary time consumption when logging out all the staff, we can use one bit to represent whether one staff is working or not, since we can erase one binary digit in \(O(1)\).

In C++ STL, we have bitset to store a fixed-size sequence of \(N\) bits. Considering \(n\) is determined during the runtime, we have to design a dynamic data class. Among many binary functions, I referred to some bits hacks. You can check it out at the end of the text.

#include <iostream>
#include <cstring>
#include <cmath>
#include <map>
using namespace std;
class bitmap {
private:
    const size_t INT_BITS = sizeof(int) * 8;
    const size_t SHIFT = static_cast<size_t>(log(INT_BITS) / log(2));
    const size_t MASK = INT_BITS - 1; // 00011111
    int *bits;
    size_t size, range;
public:
    void set(size_t i) {
        // i >> SHIFT is equivalent to i / (2 ^ SHIFT)
        // i & MASK is equivalent to i mod MASK
        bits[i >> SHIFT] |= 1 << (i & MASK);
        return ;
    }
    void clear(size_t i) {
        bits[i >> SHIFT] &= ~(1 << (i & MASK));
        return ;
    }
    void clear_all(void) {
        memset(bits, 0, sizeof(int) * range);
        /*
        for (size_t i = 0; i <= range; ++ i) {
            bits[i] = 0;
        }
        */
        return ;
    }
    bool operator[] (size_t i) {
        return bits[i >> SHIFT] & (1 << (i & MASK));
    }
    bool get(size_t i) {
        return bits[i >> SHIFT] & (1 << (i & MASK));
    }
    int count(void) {
        int c = 0;
        for (size_t i = 0; i <= range; ++ i) {
            int v = bits[i];
            for (; v; ++ c) {
                v &= v - 1;
            }
        }
        return c;
    }
    bitmap(size_t s) {
        size = s;
        range = size / INT_BITS;
        bits = new int[size / INT_BITS];
        return ;
    }
    ~bitmap(void) {
        delete[] bits;
        return ;
    }
};
int main(void) {
    int n, m;
    int ans = 0;
    int working = 0;
    cin >> n >> m;
    bitmap c(n);
    c.clear_all();
    map<int, int> code;
    for (int i = 1; i <= m; ++ i) {
        char ch;
        int a, b;
        cin >> ch;
        switch (ch) {
        case 'I':
            cin >> a >> b;
            code[a] = b;
            if (! c.get(a)) {
                ++ working;
                c.set(a);
            }
            break;
        case 'O':
            cin >> a;
            if (c.get(a)) {
                -- working;
                c.clear(a);
            }
            break;
        case 'C':
            working = 0;
            c.clear_all();
            break;
        case 'N':
            ans += working;
            break;
        case 'Q':
            cin >> a;
            if (c.get(a)) {
                ans += code[a];
            }
            else {
                -- ans;
            }
            break;
        }
    }
    cout << ans << endl;
    return 0;
}

0x04: One More Application

Given 250 million integers, you're supposed to answer the number of non-duplicate numbers.

You are given at most 600 Megabytes memory space.

0x05: Thanks and References

Thanks BML for his revision of this post.

[1] CppReference.com - Bitset

[2] Counting bits set, Brian Kernighan's way