Source Code Party - Container Performance

A lovely night.

Wanna dance the Waltz with this beauty?

2020_11_24_1

🕺: 😍 yes, please

💃: Do you wanna dance with me?

🕺: 😍 of course

💃: Men are sometimes liars.

🕺: I am the only compatible partner for you.

💃: Humans can’t be trusted, only the source code won’t lie!

🤖: Yes? I’m always online for you, my dear.

Prelude to the Waltz: Why this?

std::vector std::deque std::list
Indexed Access Super Fast Fast Impossible
Insert/remove front Slow Fast Fast
Insert/remove back Super Fast Very Fast Fast
Insert/remove elsewhere Slow Fast Very Fast
Memory Low High High
Splicing/Joining Slow Very Slow Fast
Stability * Poor Very Poor Good

from some page here

Should I look at the source code of Container?

3 Sources

Indexed Access

Vector: Super Fast

Line 3042:

template <class _Allocator>
typename vector<bool, _Allocator>::reference
vector<bool, _Allocator>::at(size_type __n)
{
    if (__n >= size())
        this->__throw_out_of_range();
    return (*this)[__n];
}

As shown in the list, to have an indexed access to an element is just like to retrieve an item in an array.

Deque: Fast

template <class _Tp, class _Allocator>
inline
typename deque<_Tp, _Allocator>::reference
deque<_Tp, _Allocator>::at(size_type __i)
{
    if (__i >= __base::size())
        __base::__throw_out_of_range();
    size_type __p = __base::__start_ + __i;
    return *(*(__base::__map_.begin() + __p / __base::__block_size) + __p % __base::__block_size);
}

Seems that Deque is using a combination of small continuous blocks, so if we wanna get a value, we need to find the place of the small continuous block, then get the value. It looks a bit complicated than Vector, so it’s slower.

List: Impossible !

Ahaha 😀, look at the source code, no such function. lol.

Push Back <- adds an element to the end

Vector: Super Fast

Line 3060:

template <class _Allocator>
void
vector<bool, _Allocator>::push_back(const value_type& __x)
{
    if (this->__size_ == this->capacity())
        reserve(__recommend(this->__size_ + 1));
    ++this->__size_;
    back() = __x;
}

I am curious what is so-called __recommend, which is also mentioned in the cs106L lecture note, so I dig in.

__recommend: if the new size is larger than the max size the system can give, throw error. If current the capacity (the size of the allocated memory) is larger than or equal to half of the max capacity, then return the max capacity; If not, return 2 x current capacity, or the new size, depends on which is larger.

template <class _Tp, class _Allocator>
inline _LIBCPP_INLINE_VISIBILITY
typename vector<_Tp, _Allocator>::size_type
vector<_Tp, _Allocator>::__recommend(size_type __new_size) const
{
    const size_type __ms = max_size();
    if (__new_size > __ms)
        this->__throw_length_error();   // boundary check
    const size_type __cap = capacity();
    if (__cap >= __ms / 2)
        return __ms;
    return _VSTD::max<size_type>(2*__cap, __new_size);
}

max_size: roughly understand this is to get the max size a vector can be allocated. Ignore the deeper details.

template <class _Tp, class _Allocator>
typename vector<_Tp, _Allocator>::size_type
vector<_Tp, _Allocator>::max_size() const _NOEXCEPT
{
    return _VSTD::min<size_type>(__alloc_traits::max_size(this->__alloc()),
                                 numeric_limits<difference_type>::max());
}

capacity: get the size of the allocated memory

size_type capacity() const _NOEXCEPT
        {return static_cast<size_type>(__end_cap() - __begin_);}

end_cap: returns the pointer to the end of the allocated memory, different from the end pointer

pointer& __end_cap() _NOEXCEPT
        {return __end_cap_.first();} // __compressed_pair<pointer, allocator_type> __end_cap_;

back(): access the last element, return a reference

_LIBCPP_INLINE_VISIBILITY reference back()        
{return __make_ref(__size_ - 1);}

Now, I come back to this code: when the size is equal to the allocated memory, means it’s full, then the system reserves a larger recommended size to this vector, get the new size, and then assign the address of the new value to the last element’s reference(alias). No wonder it’s super fast.

template <class _Allocator>
void
vector<bool, _Allocator>::push_back(const value_type& __x)
{
    if (this->__size_ == this->capacity())
        reserve(__recommend(this->__size_ + 1));
    ++this->__size_;
    back() = __x;
}

Deque: Very Fast

template <class _Tp, class _Allocator>
void
deque<_Tp, _Allocator>::push_back(const value_type& __v)
{
    allocator_type& __a = __base::__alloc(); // allocate a new space
    if (__back_spare() == 0)
        __add_back_capacity();  // make sure got a back_spare for the new space
    // __back_spare() >= 1
    __alloc_traits::construct(__a, _VSTD::addressof(*__base::end()), __v); // educated guess, link the new allocated space to the last pointer of the base, put the new value v inside the allocated space
    ++__base::size();   // increase the base size
}

As seen from the code, seems no copying options, just allocated new space and add new value, so it’s very fast.

List: Fast

template <class _Tp, class _Alloc>
void
list<_Tp, _Alloc>::push_back(const value_type& __x)
{
    __node_allocator& __na = base::__node_alloc();
    __hold_pointer __hold = __allocate_node(__na);
    __node_alloc_traits::construct(__na, _VSTD::addressof(__hold->__value_), __x);
    __link_nodes_at_back(__hold.get()->__as_link(), __hold.get()->__as_link());
    ++base::__sz();
    __hold.release();
}

Obviously 😅, it’s fast, but slower than deque and vector.

Insertion

There are many types of insertion:

iterator insert(const_iterator position, const value_type& x);
iterator insert(const_iterator position, value_type&& x);
iterator insert(const_iterator position, size_type n, const value_type& x);
template <class InputIterator>
    iterator insert(const_iterator position, InputIterator first, InputIterator last);
iterator insert(const_iterator position, initializer_list<value_type> il);

for Insertion, just take a look at the simplest one: inserts value before pos (position).

Vector

front: slow; elsewhere: slow; end: super fast; memory: low

Pre

To read the Insert function of Vector, needs to understand the following concepts first:

  1. Doc
  2. std::allocator
  3. std::iterator
  4. std::copy_backward
template< class BidirIt1, class BidirIt2 >
BidirIt2 copy_backward( BidirIt1 first, BidirIt1 last, BidirIt2 d_last );
  • first, last: the range of the elements to copy from
  • d_last: the end of the destination range
  1. A const_iterator is an iterator that points to const value (like a const T* pointer), use * to visit the element that the iterator points to.

  2. __const_iterator_cast

iterator __const_iterator_cast(const_iterator __p) _NOEXCEPT
        {return begin() + (__p - cbegin());}   //starting point + diff

cbegin():

 const_iterator         cbegin()  const _NOEXCEPT
        {return begin();}
  1. std::copy
template< class InputIt, class OutputIt >
OutputIt copy( InputIt first, InputIt last, OutputIt d_first );

Copies the elements in the range, defined by [first, last), to another range beginning at d_first.

  1. std::swap

  2. std::vector::swap

  3. Difference between std::swap and std::vector::swap

Code

Then we can look at the code of vector:

template <class _Allocator>
typename vector<bool, _Allocator>::iterator
vector<bool, _Allocator>::insert(const_iterator __position, const value_type& __x)
{
    iterator __r;
    if (size() < capacity())  // if allocated memory is enough 
    {
        const_iterator __old_end = end();
        ++__size_;    // increase the size
        _VSTD::copy_backward(__position, __old_end, end());  // copy the second-half of the vector in a backward manner starting from the end of current size, and the first half remains the same
        __r = __const_iterator_cast(__position); // get the iterator version of the const iterator
    }
    else
    {
        vector __v(__alloc()); // allocate some new spaces for a temporary vector
        __v.reserve(__recommend(__size_ + 1));  // reserve recommended new space for it
        __v.__size_ = __size_ + 1;              // calculate the  size
        __r = _VSTD::copy(cbegin(), __position, __v.begin()); // copy the first part of the original vector(until pos) from the original location to the starting point of the temp vector's space
        _VSTD::copy_backward(__position, cend(), __v.end()); // copy the second part backwards from the end of the temp space
        swap(__v); // swap the original v's content with the temp __v's content
    }
    *__r = __x;   // Make the iterator's pointer pointing to the new value's address
    return __r;	  // return the iterator of the inserted value
} 

So I guess, inserting in front/elsewhere is slow because _VSTD::copy is slow!

Based on cpp, the behaviour of copy is:

template<class InputIterator, class OutputIterator>
  OutputIterator copy (InputIterator first, InputIterator last, OutputIterator result)
{
  while (first!=last) {
    *result = *first;
    ++result; 
    ++first;
  }
  return result;
}

The time complexity seems to be O(n). But if the element to add is the last element, we don’t need to copy the whole array, since the copy_backward is used, so just appending the new element to the end, so it’s very fast.

Deque

front: slow; elsewhere: fast; end: very fast; memory: high

Doc iterator insert( iterator pos, const T& value );

I feel sleepy, this part is the DIY part for you =) To found that front is slow, and elsewhere is fast!

template <class _Tp, class _Allocator>
typename deque<_Tp, _Allocator>::iterator
deque<_Tp, _Allocator>::insert(const_iterator __p, value_type&& __v)
{
    size_type __pos = __p - __base::begin();
    size_type __to_end = __base::size() - __pos;
    allocator_type& __a = __base::__alloc();
    if (__pos < __to_end)
    {   // insert by shifting things backward
        if (__front_spare() == 0)
            __add_front_capacity();
        // __front_spare() >= 1
        if (__pos == 0)
        {
            __alloc_traits::construct(__a, _VSTD::addressof(*--__base::begin()), _VSTD::move(__v));
            --__base::__start_;
            ++__base::size();
        }
        else
        {
            iterator __b = __base::begin();
            iterator __bm1 = _VSTD::prev(__b);
            __alloc_traits::construct(__a, _VSTD::addressof(*__bm1), _VSTD::move(*__b));
            --__base::__start_;
            ++__base::size();
            if (__pos > 1)
                __b = _VSTD::move(_VSTD::next(__b), __b + __pos, __b);
            *__b = _VSTD::move(__v);
        }
    }
    else
    {   // insert by shifting things forward
        if (__back_spare() == 0)
            __add_back_capacity();
        // __back_capacity >= 1
        size_type __de = __base::size() - __pos;
        if (__de == 0)
        {
            __alloc_traits::construct(__a, _VSTD::addressof(*__base::end()), _VSTD::move(__v));
            ++__base::size();
        }
        else
        {
            iterator __e = __base::end();
            iterator __em1 = _VSTD::prev(__e);
            __alloc_traits::construct(__a, _VSTD::addressof(*__e), _VSTD::move(*__em1));
            ++__base::size();
            if (__de > 1)
                __e = _VSTD::move_backward(__e - __de, __em1, __e);
            *--__e = _VSTD::move(__v);
        }
    }
    return __base::begin() + __pos;
}

List

front: fast; elsewhere: very fast; end: fast; memory: high

Doc iterator insert( iterator pos, const T& value );

template <class _Tp, class _Alloc>
typename list<_Tp, _Alloc>::iterator
list<_Tp, _Alloc>::insert(const_iterator __p, const value_type& __x)
{
#if _LIBCPP_DEBUG_LEVEL == 2
    _LIBCPP_ASSERT(__get_const_db()->__find_c_from_i(&__p) == this,
        "list::insert(iterator, x) called with an iterator not"
        " referring to this list");
#endif
    __node_allocator& __na = base::__node_alloc();
    __hold_pointer __hold = __allocate_node(__na);
    __node_alloc_traits::construct(__na, _VSTD::addressof(__hold->__value_), __x);
    __link_nodes(__p.__ptr_, __hold->__as_link(), __hold->__as_link());
    ++base::__sz();
#if _LIBCPP_DEBUG_LEVEL == 2
    return iterator(__hold.release()->__as_link(), this);
#else
    return iterator(__hold.release()->__as_link());
#endif
}

Basically, constructing a new node and put in, so it’s very fast compared with others!

Source Code Party is over…

2020_11_24_2

💃: Good night~

References