HSDS
 All Data Structures Namespaces Files Functions Variables Typedefs Friends Macros
Data Structures | Public Member Functions
hsds::WaveletMatrix Class Reference

Wavelet matrix class. More...

#include <wavelet-matrix.hpp>

Public Member Functions

 WaveletMatrix ()
 Constructor. More...
 
virtual ~WaveletMatrix ()
 Destructor. More...
 
void clear ()
 Clear wavelet matrix. More...
 
void swap (WaveletMatrix &x)
 Exchanges the content of the instance. More...
 
void build (std::vector< uint64_t > &src)
 Build the wavelet matrix form array(src) More...
 
uint64_t size () const
 Return the number of elements. More...
 
uint64_t operator[] (uint64_t pos)
 Lookup A[pos]. More...
 
uint64_t lookup (uint64_t pos) const
 Lookup A[pos]. More...
 
uint64_t rank (uint64_t c, uint64_t pos) const
 Compute the rank = the frequency of a character 'c' in the prefix of the array A[0...pos) More...
 
uint64_t rankLessThan (uint64_t c, uint64_t pos) const
 Compute the frequency of characters c' < c in the subarray A[0...pos) More...
 
uint64_t rankMoreThan (uint64_t c, uint64_t pos) const
 Compute the frequency of characters c' > c in the subarray A[0...pos) More...
 
void rankAll (uint64_t c, uint64_t begin_pos, uint64_t end_pos, uint64_t &rank, uint64_t &rank_less_than, uint64_t &rank_more_than) const
 Compute the frequency of characters c' < c, c'=c, and c' > c, in the subarray A[begin_pos...end_pos) More...
 
uint64_t select (uint64_t c, uint64_t rank) const
 Compute the select = the position of the (rank+1)-th occurrence of 'c' in the array. More...
 
uint64_t selectFromPos (uint64_t c, uint64_t pos, uint64_t rank) const
 Compute the select = the position of the (rank+1)-th occurrence of 'c' in the suffix of the array starting from 'pos'. More...
 
uint64_t freq (uint64_t c) const
 Compute the frequency of the character c. More...
 
uint64_t freqSum (uint64_t min_c, uint64_t max_c) const
 Compute the frequency of the characters. More...
 
uint64_t freqRange (uint64_t min_c, uint64_t max_c, uint64_t begin_pos, uint64_t end_pos) const
 Compute the frequency of characters min_c <= c' < max_c in the subarray A[beg_pos ... More...
 
void maxRange (uint64_t begin_pos, uint64_t end_pos, uint64_t &pos, uint64_t &val) const
 Range Max Query. More...
 
void minRange (uint64_t begin_pos, uint64_t end_pos, uint64_t &pos, uint64_t &val) const
 Range Min Query. More...
 
void quantileRange (uint64_t begin_pos, uint64_t end_pos, uint64_t k, uint64_t &pos, uint64_t &val) const
 Range Quantile Query, Return the K-th smallest value in the subarray. More...
 
void listModeRange (uint64_t min_c, uint64_t max_c, uint64_t beg_pos, uint64_t end_pos, uint64_t num, std::vector< ListResult > &res) const
 List the distinct characters appeared in A[beg_pos ... More...
 
void listMinRange (uint64_t min_c, uint64_t max_c, uint64_t beg_pos, uint64_t end_pos, uint64_t num, std::vector< ListResult > &res) const
 List the distinct characters in A[beg_pos ... More...
 
void listMaxRange (uint64_t min_c, uint64_t max_c, uint64_t beg_pos, uint64_t end_pos, uint64_t num, std::vector< ListResult > &res) const
 List the distinct characters appeared in A[beg_pos ... More...
 
void save (std::ostream &os) const throw (hsds::Exception)
 Save the current status to a stream. More...
 
void load (std::istream &is) throw (hsds::Exception)
 Load the current status from a stream. More...
 
uint64_t map (void *ptr, uint64_t mapSize) throw (hsds::Exception)
 Mapping pointer to the WaveletMatrix. More...
 

Detailed Description

Wavelet matrix class.

See also http://www.dcc.uchile.cl/~gnavarro/ps/spire12.4.pdf

Definition at line 67 of file wavelet-matrix.hpp.

Constructor & Destructor Documentation

hsds::WaveletMatrix::WaveletMatrix ( )

Constructor.

virtual hsds::WaveletMatrix::~WaveletMatrix ( )
virtual

Destructor.

Member Function Documentation

void hsds::WaveletMatrix::build ( std::vector< uint64_t > &  src)

Build the wavelet matrix form array(src)

Parameters
[in]srcAn array to be initialized
void hsds::WaveletMatrix::clear ( )

Clear wavelet matrix.

uint64_t hsds::WaveletMatrix::freq ( uint64_t  c) const

Compute the frequency of the character c.

Parameters
[in]cThe character to be examined
Returns
Return the frequency of c in the array.
uint64_t hsds::WaveletMatrix::freqRange ( uint64_t  min_c,
uint64_t  max_c,
uint64_t  begin_pos,
uint64_t  end_pos 
) const

Compute the frequency of characters min_c <= c' < max_c in the subarray A[beg_pos ...

end_pos)

Parameters
[in]min_cThe smallest character to be examined
[in]max_cThe upper bound of the character to be examined
[in]begin_posThe beginning position of the array (inclusive)
[in]end_posThe ending position of the array (not inclusive)
Returns
The frequency of characters min_c <= c < max_c in the subarray A[beg_pos .. end_pos) or NOTFOUND if max_c > alphabet_num or end_pos > length
uint64_t hsds::WaveletMatrix::freqSum ( uint64_t  min_c,
uint64_t  max_c 
) const

Compute the frequency of the characters.

Parameters
[in]min_cThe minimum character
[in]max_cThe maximum character
Returns
Return the frequency of min_c <= c < max_c in the array.
void hsds::WaveletMatrix::listMaxRange ( uint64_t  min_c,
uint64_t  max_c,
uint64_t  beg_pos,
uint64_t  end_pos,
uint64_t  num,
std::vector< ListResult > &  res 
) const

List the distinct characters appeared in A[beg_pos ...

end_pos) from largest ones

Parameters
min_cThe smallest character to be examined
max_cThe upper bound of the character to be examined
beg_posThe beginning position of the array (inclusive)
end_posThe ending position of the array (not inclusive)
numThe maximum number of reporting results.
resThe distinct characters in the A[beg_pos ... end_pos) from largest ones. Each item consists of c:character and freq: frequency of c.
void hsds::WaveletMatrix::listMinRange ( uint64_t  min_c,
uint64_t  max_c,
uint64_t  beg_pos,
uint64_t  end_pos,
uint64_t  num,
std::vector< ListResult > &  res 
) const

List the distinct characters in A[beg_pos ...

end_pos) min_c <= c < max_c from smallest ones

Parameters
min_cThe smallest character to be examined
max_cThe upper bound of the character to be examined
beg_posThe beginning position of the array (inclusive)
end_posThe ending position of the array (not inclusive)
numThe maximum number of reporting results.
resThe distinct characters in the A[beg_pos ... end_pos) from smallest ones. Each item consists of c:character and freq: frequency of c.
void hsds::WaveletMatrix::listModeRange ( uint64_t  min_c,
uint64_t  max_c,
uint64_t  beg_pos,
uint64_t  end_pos,
uint64_t  num,
std::vector< ListResult > &  res 
) const

List the distinct characters appeared in A[beg_pos ...

end_pos) from most frequent ones

Parameters
min_cThe smallest character to be examined
max_cThe upper bound of the character to be examined
beg_posThe beginning position of the array (inclusive)
end_posThe ending position of the array (not inclusive)
numThe maximum number of reporting results.
resThe distinct characters in the A[beg_pos ... end_pos) from most frequent ones. Each item consists of c:character and freq: frequency of c.
void hsds::WaveletMatrix::load ( std::istream &  is) throw (hsds::Exception)

Load the current status from a stream.

Parameters
[in]isThe input stream where the status is saved
Exceptions
hsds::ExceptionWhen failed to load.
uint64_t hsds::WaveletMatrix::lookup ( uint64_t  pos) const

Lookup A[pos].

Parameters
[in]posThe position
Returns
return A[pos] if found, or return NOT_FOUND if pos >= size
uint64_t hsds::WaveletMatrix::map ( void *  ptr,
uint64_t  mapSize 
) throw (hsds::Exception)

Mapping pointer to the WaveletMatrix.

Parameters
[in]ptrThe pointer of the mmaped file
[in]mapSizeThe size of mmaped file
Returns
Actually mapped size(byte size of offset from ptr).
Exceptions
hsds::ExceptionWhen failed to load.
void hsds::WaveletMatrix::maxRange ( uint64_t  begin_pos,
uint64_t  end_pos,
uint64_t &  pos,
uint64_t &  val 
) const

Range Max Query.

Parameters
[in]begin_posThe beginning position
[in]end_posThe ending position
[out]posThe position where the largest value appeared in the subarray A[beg_pos .. end_pos) If there are many items having the largest values, the smallest pos will be reported
[out]valThe largest value appeared in the subarray A[beg_pos ... end_pos)
void hsds::WaveletMatrix::minRange ( uint64_t  begin_pos,
uint64_t  end_pos,
uint64_t &  pos,
uint64_t &  val 
) const

Range Min Query.

Parameters
[in]begin_posThe beginning position
[in]end_posThe ending position
[out]posThe position where the smallest value appeared in the subarray A[beg_pos .. end_pos) If there are many items having the smallest values, the smallest pos will be reported
[out]valThe smallest value appeared in the subarray A[beg_pos ... end_pos)
uint64_t hsds::WaveletMatrix::operator[] ( uint64_t  pos)
inline

Lookup A[pos].

Parameters
[in]posThe position
Returns
return A[pos] if found, or return NOT_FOUND if pos >= size

Definition at line 115 of file wavelet-matrix.hpp.

void hsds::WaveletMatrix::quantileRange ( uint64_t  begin_pos,
uint64_t  end_pos,
uint64_t  k,
uint64_t &  pos,
uint64_t &  val 
) const

Range Quantile Query, Return the K-th smallest value in the subarray.

Parameters
[in]begin_posThe beginning position
[in]end_posThe ending position
[in]kThe order (should be smaller than end_pos - beg_pos).
[out]posThe position where the k-th largest value appeared in the subarray A[beg_pos .. end_pos) If there are many items having the k-th largest values, the smallest pos will be reported
[out]valThe k-th largest value appeared in the subarray A[beg_pos ... end_pos)
uint64_t hsds::WaveletMatrix::rank ( uint64_t  c,
uint64_t  pos 
) const

Compute the rank = the frequency of a character 'c' in the prefix of the array A[0...pos)

Parameters
[in]cCharacter to be examined
[in]posThe position of the prefix (not inclusive)
Returns
The frequency of a character 'c' in the prefix of the array A[0...pos) or NOT_FOUND if c >= alphabet_num or pos > size
void hsds::WaveletMatrix::rankAll ( uint64_t  c,
uint64_t  begin_pos,
uint64_t  end_pos,
uint64_t &  rank,
uint64_t &  rank_less_than,
uint64_t &  rank_more_than 
) const

Compute the frequency of characters c' < c, c'=c, and c' > c, in the subarray A[begin_pos...end_pos)

Parameters
cThe character
begin_posThe beginning position of the array (not inclusive)
end_posThe ending position of the array (not inclusive)
rankThe frefquency of c in A[0...pos)
rank_less_thanThe frequency of c' < c in A[0...pos)
rank_more_thanThe frequency of c' > c in A[0...pos)
uint64_t hsds::WaveletMatrix::rankLessThan ( uint64_t  c,
uint64_t  pos 
) const

Compute the frequency of characters c' < c in the subarray A[0...pos)

Parameters
[in]cThe upper bound of the characters
[in]posThe position of the end of the prefix (not inclusive)
Returns
The frequency of characters c' < c in the prefix of the array A[0...pos) or NOTFOUND if c > alphabet_num or pos > size
uint64_t hsds::WaveletMatrix::rankMoreThan ( uint64_t  c,
uint64_t  pos 
) const

Compute the frequency of characters c' > c in the subarray A[0...pos)

Parameters
[in]cThe lower bound of the characters
[in]posThe position of the end of the prefix (not inclusive)
Returns
The frequency of characters c' < c in the prefix of the array A[0...pos) or NOTFOUND if c > alphabet_num or pos > length
void hsds::WaveletMatrix::save ( std::ostream &  os) const throw (hsds::Exception)

Save the current status to a stream.

Parameters
[out]osThe output stream where the data is saved
Exceptions
hsds::ExceptionWhen failed to save.
uint64_t hsds::WaveletMatrix::select ( uint64_t  c,
uint64_t  rank 
) const

Compute the select = the position of the (rank+1)-th occurrence of 'c' in the array.

Parameters
[in]cCharacter to be examined
[in]rankThe rank of the character
Returns
The position of the (rank+1)-th occurrence of 'c' in the array. or NOT_FOUND if c >= alphabet_num or rank > freq(c)
uint64_t hsds::WaveletMatrix::selectFromPos ( uint64_t  c,
uint64_t  pos,
uint64_t  rank 
) const

Compute the select = the position of the (rank+1)-th occurrence of 'c' in the suffix of the array starting from 'pos'.

Parameters
[in]cCharacter to be examined
[in]posThe beginning position of the suffix (inclusive)
[in]rankThe rank of the character
Returns
The position of the (rank+1)-th occurrence of 'c' in the suffix of the array. or NOT_FOUND if c >= alphabet_num or rank > freq(c) - rank(c, pos)
uint64_t hsds::WaveletMatrix::size ( ) const
inline

Return the number of elements.

Returns
number of elements.

Definition at line 104 of file wavelet-matrix.hpp.

void hsds::WaveletMatrix::swap ( WaveletMatrix x)

Exchanges the content of the instance.

Parameters
[in,out]xAnother WaveletMatrix instance

The documentation for this class was generated from the following file: