Python wrapper for KMC API

marekkokot edited this page Dec 29, 2018 · 9 revisions

The Python wrapper for the C++ KMC API is designed to mimic the C++ interface. C++ and Python are very different languages, so some compromises had to be made. For example, Python cannot pass an integer by reference, but that is how the C++ KMC API returns the counter of a k-mer. One (rather ugly) workaround is to wrap the integer in a class, whose instances are passed by reference in Python.
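
The workaround can be illustrated with a minimal, self-contained sketch. The Count class below is only a stand-in with a single value field, and read_counter is a hypothetical function invented for this illustration; neither is the actual py_kmc_api implementation.

```python
class Count:
    """Stand-in for an output parameter: a mutable holder with one field."""

    def __init__(self, value=0):
        self.value = value


def read_counter(out):
    """Hypothetical function: reports success and writes its result into `out`."""
    out.value = 42  # visible to the caller, because `out` refers to a shared object
    return True


def try_with_plain_int(n):
    """Rebinding a plain int inside a function does not affect the caller."""
    n = 42
    return True


counter = Count()
ok = read_counter(counter)   # counter.value now holds the result

plain = 0
try_with_plain_int(plain)    # plain is unchanged in the caller
```

The function's return value stays free to signal success or failure, as in the C++ interface, while the result itself travels through the wrapper object.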

The py_kmc_api module contains the following classes:

  • CountVec - contains one field (value), a list of integers that serves as the output parameter of the GetCountersForRead method of the CKMCFile class.
  • Count - contains one field (value), a single integer that serves as the output parameter of the ReadNextKmer and CheckKmer methods of the CKMCFile class.
  • LongKmerRepresentation - contains one field (value), a list of integers holding the binary representation of a k-mer (on the C++ side each integer is an 8-byte unsigned value); it is used by the to_long method of the KmerAPI class.
  • CKMCFileInfo - contains basic information about a KMC database.
  • CKmerAPI - represents a single k-mer.
  • CKMCFile - represents a KMC database, which may be opened in one of two modes: listing (only part of the database is loaded into memory; provides sequential access to the database) or random access (the whole database is loaded into memory; provides existence queries for specific k-mers).

The public interface of these classes is described below.


CKMCFileInfo class

Fields:

  • kmer_length - the length of a k-mer
  • mode - always 0 (1 for older versions of KMC, where Quake-aware counters were supported)
  • counter_size - the number of bytes used to store each counter in the KMC database
  • lut_prefix_length - internal parameter of the KMC database; see API.pdf for details
  • signature_len - the length of the signature (also an internal parameter, used when the database was constructed)
  • min_count - minimum value of a counter (a k-mer with fewer occurrences than this value is not stored in the database)
  • max_count - maximum value of a counter (a k-mer with more occurrences than this value is not stored in the database)
  • both_strands - True if kmc was run with the -b switch, False otherwise
  • total_kmers - the total number of k-mers stored in the database

Methods: None
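
How counter_size, min_count and max_count relate can be sketched in plain Python. The helper names below (max_representable, is_stored) are hypothetical and not part of py_kmc_api. The sketch assumes that counters are unsigned integers and that both bounds are inclusive, as the field descriptions above suggest.

```python
def max_representable(counter_size):
    """Largest value a counter of `counter_size` bytes can hold (assumes unsigned)."""
    return 256 ** counter_size - 1


def is_stored(occurrences, min_count, max_count):
    """A k-mer is kept only if its occurrence count lies in [min_count, max_count]."""
    return min_count <= occurrences <= max_count


# Example: with min_count=2 and max_count=100, a k-mer seen once is dropped.
kept = [n for n in (1, 2, 50, 100, 101) if is_stored(n, 2, 100)]
```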

CKmerAPI class

Fields: None

Methods:

  • __init__(length) - constructor; takes one parameter, the length of a k-mer (if omitted, the default value 1 is used)
  • __init__(kmer: KmerAPI) - constructor that creates a new object based on an existing one (a copy constructor in C++ nomenclature)
  • assign(kmer: KmerAPI) - replaces the k-mer with the one passed as a parameter (the equivalent of the C++ copy assignment operator)
  • __eq__(kmer: KmerAPI) - equality comparison
  • __lt__(kmer: KmerAPI) - less-than comparison
  • get_asci_symbol(pos) - returns the symbol at the 0-based position pos
  • get_num_symbol(pos) - same as get_asci_symbol, but returns the encoded symbol (A->0, C->1, G->2, T->3)
  • __str__ - converts the k-mer to its string representation
  • to_long -
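
The symbol encoding used by get_num_symbol, and the idea of a binary k-mer representation held in 64-bit words, can be sketched in plain Python. This is an illustration only. The helpers num_symbol and pack_kmer are hypothetical. The packing layout is also an assumption and may differ from what to_long actually produces: 2 bits per symbol, up to 32 symbols per word, earlier symbols in more significant bits, and a partial last word right-aligned.

```python
SYMBOL_CODES = {"A": 0, "C": 1, "G": 2, "T": 3}
SYMBOLS_PER_WORD = 32  # 64 bits per word / 2 bits per symbol


def num_symbol(kmer, pos):
    """Numeric code of the symbol at 0-based position `pos`."""
    return SYMBOL_CODES[kmer[pos]]


def pack_kmer(kmer):
    """Pack a k-mer string into a list of 64-bit unsigned words (assumed layout)."""
    words = []
    for start in range(0, len(kmer), SYMBOLS_PER_WORD):
        word = 0
        for symbol in kmer[start:start + SYMBOLS_PER_WORD]:
            # Shift earlier symbols towards the high bits, append the new 2-bit code.
            word = (word << 2) | SYMBOL_CODES[symbol]
        words.append(word)
    return words


packed = pack_kmer("ACGT")  # a single word holding the bits 00 01 10 11
```

Two bits per symbol is the natural fit for a four-letter alphabet, which is why a k-mer longer than 32 symbols needs more than one 8-byte integer in this sketch.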

CKMCFile class

Fields:

Methods: