io->bio #339

Closed
wants to merge 67 commits into from
Changes from 2 commits
Commits
b87947b
Moved io to bio
Koeng101 Aug 24, 2023
5c887a3
fixed io imports
Koeng101 Aug 24, 2023
d8f4b38
Add more generic definitions to bio
Koeng101 Aug 31, 2023
4fb41ff
Update bio/fastq/fastq.go
Koeng101 Sep 1, 2023
2452282
update fasta
Koeng101 Sep 1, 2023
6dda2b9
Merge branch 'ioToBio' of github.com:TimothyStiles/poly into ioToBio
Koeng101 Sep 1, 2023
16fbcbd
add fasta updates and parser
Koeng101 Sep 1, 2023
382a014
made readability improvements
Koeng101 Sep 2, 2023
0bbd05e
changed ParseWithHeader
Koeng101 Sep 2, 2023
eb68f81
removed int64 in reads
Koeng101 Sep 2, 2023
344220c
add more example tests
Koeng101 Sep 2, 2023
03f8b68
gotta update this for this tests!
Koeng101 Sep 2, 2023
6199c43
integrate slow5
Koeng101 Sep 5, 2023
65f0539
have examples covering most of changes
Koeng101 Sep 5, 2023
8ff6da4
removed interfaces
Koeng101 Sep 5, 2023
00732a4
updated with NewXXXParser
Koeng101 Sep 7, 2023
3ce8109
added 3 parsers
Koeng101 Sep 7, 2023
630bd88
added pileup
Koeng101 Sep 7, 2023
df98fe3
add concurrent functions plus better documentation
Koeng101 Sep 9, 2023
fa4d29a
moved svb to ioToBio
Koeng101 Sep 11, 2023
f80b317
Improve tests
Koeng101 Sep 11, 2023
37859a8
make better docs for header
Koeng101 Sep 11, 2023
e24801b
Update bio/fasta/fasta_test.go
Koeng101 Sep 12, 2023
584b73e
changed name of LowLevelParser to parserInterface
Koeng101 Sep 12, 2023
da7118a
Merge branch 'main' into ioToBio
Koeng101 Sep 12, 2023
5e6204f
zw -> zipWriter
Koeng101 Sep 12, 2023
90316d3
remove a identifier from pileup
Koeng101 Sep 12, 2023
7b2cd52
genbank parser now compatible
Koeng101 Sep 13, 2023
9b55fda
writeTo interface now fulfilled
Koeng101 Sep 13, 2023
6655565
make linter happy :)
Koeng101 Sep 13, 2023
11972ae
convert all types to io.WriterTo
Koeng101 Sep 14, 2023
12a4b48
fixed linter issues
Koeng101 Sep 14, 2023
4b50625
handle EOF better
Koeng101 Sep 14, 2023
f44721c
fixed tutorial
Koeng101 Sep 14, 2023
b192fda
fix genbank read error
Koeng101 Sep 14, 2023
3eab1f9
remove io.WriterTo checks
Koeng101 Sep 14, 2023
0edfd1c
fix with cmp.Equal
Koeng101 Sep 16, 2023
34de749
Merge pull request #341 from TimothyStiles/slow5StreamVByte2
Koeng101 Sep 16, 2023
6abe0cd
Merge branch 'main' into ioToBio
Koeng101 Oct 28, 2023
4c61c22
genbank tests merged
Koeng101 Oct 28, 2023
1d23668
sample merge
Koeng101 Oct 28, 2023
56772bb
Merge branch 'main' of github.com:TimothyStiles/poly into ioToBio
Koeng101 Oct 28, 2023
956d26e
make linter happy
Koeng101 Oct 28, 2023
158fcf1
Added generic collections module
abondrn Oct 30, 2023
8862a6c
Switched Feature.Attributes to use multimap
abondrn Oct 30, 2023
ef07e94
Fixed tests
abondrn Oct 30, 2023
cac1e55
Ran linter
abondrn Oct 30, 2023
8025bc2
Added copy methods
abondrn Oct 30, 2023
1f49f9d
Adds new functional test that addresses case where there is a partial…
abondrn Oct 30, 2023
8112866
Ran linter
abondrn Oct 30, 2023
fec8796
Add capability to compute sequence features and marshal en masse
abondrn Oct 30, 2023
9ce9f4f
Add methods to convert polyjson -> genbank
abondrn Oct 30, 2023
89a2ba4
Removed generic collections library in favor of hand-rolled multimap,…
abondrn Oct 30, 2023
b88d7b8
Propogate handrolled multimap to test files
abondrn Oct 30, 2023
b4c3a37
Responded to more comments
abondrn Oct 30, 2023
8b82d7b
Reduced new example genbank file
abondrn Oct 31, 2023
f523651
Resolved lint errors, added test StoreFeatureSequences and fixed unco…
abondrn Oct 31, 2023
1270ec8
Added multimap.go file doc
abondrn Oct 31, 2023
9c322f6
Responded to more comments
abondrn Oct 31, 2023
f124fae
First merge attempt
abondrn Nov 4, 2023
98b6984
Fixed deref issue
abondrn Nov 4, 2023
fc2ca75
Merged updated branch
abondrn Nov 4, 2023
25e0f61
Fixed tests, moved genbank files
abondrn Nov 4, 2023
60abf6d
Fixed fasta docs
abondrn Nov 4, 2023
7e3c812
Added changelog
abondrn Nov 5, 2023
35a5492
Merge pull request #394 from abondrn/ioToBio-genbank
Koeng101 Nov 5, 2023
433df00
added to changelog
Koeng101 Nov 10, 2023
23 changes: 23 additions & 0 deletions bio/bio.go
@@ -0,0 +1,23 @@
/*
Package bio provides utilities for reading and writing sequence data.
*/
package bio

import (
"io"
)

/*
This package is supposed to be empty and only exists to provide a doc string.
Otherwise its namespace would collide with Go's native IO package.
*/

type Parser[T any, TH any] interface {
Header() (TH, error)
Next() (T, error)
MaxLineCount() int64
}

type Writer interface {
Write(io.Writer) error
}
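A minimal sketch of how a type might satisfy the two interfaces above. The `Parser` and `Writer` definitions are copied from this diff; `record`, `sliceParser`, `collect`, and `render` are hypothetical names invented for illustration, and signalling the end of input with `io.EOF` is an assumption rather than something this diff specifies.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"io"
)

// Parser and Writer are copied from bio/bio.go in this diff.
type Parser[T any, TH any] interface {
	Header() (TH, error)
	Next() (T, error)
	MaxLineCount() int64
}

type Writer interface {
	Write(io.Writer) error
}

// record is a hypothetical sequence type, used only for illustration.
type record struct {
	Name     string
	Sequence string
}

// Write emits the record as a two-line, FASTA-like entry.
func (r *record) Write(w io.Writer) error {
	_, err := fmt.Fprintf(w, ">%s\n%s\n", r.Name, r.Sequence)
	return err
}

// sliceParser is a hypothetical in-memory parser with an empty string header.
type sliceParser struct {
	records []record
	pos     int
}

func (p *sliceParser) Header() (string, error) { return "", nil }

// Next returns io.EOF once the records are exhausted (an assumed convention).
func (p *sliceParser) Next() (record, error) {
	if p.pos >= len(p.records) {
		return record{}, io.EOF
	}
	r := p.records[p.pos]
	p.pos++
	return r, nil
}

func (p *sliceParser) MaxLineCount() int64 { return 0 }

// Compile-time checks, in the style of bio_test.go.
var (
	_ Parser[record, string] = &sliceParser{}
	_ Writer                 = &record{}
)

// collect drains any Parser until it reports io.EOF.
func collect[T any, TH any](p Parser[T, TH]) ([]T, error) {
	var out []T
	for {
		item, err := p.Next()
		if errors.Is(err, io.EOF) {
			return out, nil
		}
		if err != nil {
			return out, err
		}
		out = append(out, item)
	}
}

// render writes every record to an in-memory buffer and returns the text.
func render(records []record) (string, error) {
	var buf bytes.Buffer
	for i := range records {
		if err := records[i].Write(&buf); err != nil {
			return "", err
		}
	}
	return buf.String(), nil
}

func main() {
	p := &sliceParser{records: []record{
		{Name: "a", Sequence: "ATGC"},
		{Name: "b", Sequence: "GGCC"},
	}}
	records, err := collect[record, string](p)
	if err != nil {
		panic(err)
	}
	text, err := render(records)
	if err != nil {
		panic(err)
	}
	fmt.Print(text)
}
```

Because `collect` depends only on the interface, the same loop would work for any format whose parser exposes `Next`, which appears to be the motivation for the generic definition.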
13 changes: 13 additions & 0 deletions bio/bio_test.go
@@ -0,0 +1,13 @@
package bio

import (
"testing"

"github.com/TimothyStiles/poly/bio/fasta"
"github.com/TimothyStiles/poly/bio/fastq"
)

func TestWriter(t *testing.T) {
var _ Writer = &fastq.Fastq{}
var _ Writer = &fasta.Fasta{}
}
12 changes: 6 additions & 6 deletions io/example_test.go → bio/example_test.go
@@ -1,10 +1,10 @@
package io_test
package bio_test

import (
"github.com/TimothyStiles/poly/io/fasta"
"github.com/TimothyStiles/poly/io/genbank"
"github.com/TimothyStiles/poly/io/gff"
"github.com/TimothyStiles/poly/io/polyjson"
"github.com/TimothyStiles/poly/bio/fasta"
"github.com/TimothyStiles/poly/bio/genbank"
"github.com/TimothyStiles/poly/bio/gff"
"github.com/TimothyStiles/poly/bio/polyjson"
)

// This is where the integration tests that make effed up cyclic dependencies go.
@@ -22,7 +22,7 @@ func Example() {
// Poly can also output these file formats. Every file format has a corresponding Write function.
_ = gff.Write(gffInput, "test.gff")
_ = genbank.Write(gbkInput, "test.gbk")
_ = fasta.Write(fastaInput, "test.fasta")
_ = fasta.WriteFile(fastaInput, "test.fasta")
_ = polyjson.Write(jsonInput, "test.json")

// Extra tips:
File renamed without changes.
12 changes: 8 additions & 4 deletions io/fasta/example_test.go → bio/fasta/example_test.go
@@ -7,7 +7,7 @@ import (
"os"
"strings"

"github.com/TimothyStiles/poly/io/fasta"
"github.com/TimothyStiles/poly/bio/fasta"
)

//go:embed data/base.fasta
@@ -40,8 +40,12 @@ func ExampleParse() {
// ExampleBuild shows basic usage for Build
func ExampleBuild() {
fastas, _ := fasta.Read("data/base.fasta") // get example data
fasta, _ := fasta.Build(fastas) // build a fasta byte array
firstLine := string(bytes.Split(fasta, []byte("\n"))[0])
var buffer bytes.Buffer // Initialize a buffer to write fastas into
for _, fasta := range fastas {
_ = fasta.Write(&buffer) // write each record into the buffer

}
firstLine := string(bytes.Split(buffer.Bytes(), []byte("\n"))[0])

fmt.Println(firstLine)
// Output: >gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]
@@ -50,7 +54,7 @@ func ExampleBuild() {
// ExampleWrite shows basic usage of the writer.
func ExampleWrite() {
fastas, _ := fasta.Read("data/base.fasta") // get example data
_ = fasta.Write(fastas, "data/test.fasta") // write it out again
_ = fasta.WriteFile(fastas, "data/test.fasta") // write it out again
testSequence, _ := fasta.Read("data/test.fasta") // read it in again

os.Remove("data/test.fasta") // getting rid of test file
87 changes: 50 additions & 37 deletions io/fasta/fasta.go → bio/fasta/fasta.go
@@ -13,7 +13,6 @@ package fasta

import (
"bufio"
"bytes"
"compress/gzip"
"errors"
"fmt"
@@ -56,12 +55,6 @@ Keoni

******************************************************************************/

var (
gzipReaderFn = gzip.NewReader
openFn = os.Open
buildFn = Build
)

// Fasta is a struct representing a single Fasta file element with a Name and its corresponding Sequence.
type Fasta struct {
Name string `json:"name"`
@@ -303,7 +296,7 @@ Start of Read functions
// Deprecated: Use Parser.ParseNext() instead.
func ReadGzConcurrent(path string, sequences chan<- Fasta) {
file, _ := os.Open(path) // TODO: these errors need to be handled/logged
reader, _ := gzipReaderFn(file)
reader, _ := gzip.NewReader(file)
go func() {
defer file.Close()
defer reader.Close()
@@ -322,12 +315,12 @@ func ReadConcurrent(path string, sequences chan<- Fasta) {

// ReadGz reads a gzipped file into an array of Fasta structs.
func ReadGz(path string) ([]Fasta, error) {
file, err := openFn(path)
file, err := os.Open(path)
if err != nil {
return nil, err
}
defer file.Close()
reader, err := gzipReaderFn(file)
reader, err := gzip.NewReader(file)
if err != nil {
return nil, err
}
@@ -337,7 +330,7 @@ func ReadGz(path string) ([]Fasta, error) {

// Read reads a file into an array of Fasta structs
func Read(path string) ([]Fasta, error) {
file, err := openFn(path)
file, err := os.Open(path)
if err != nil {
return nil, err
}
@@ -351,38 +344,58 @@ Start of Write functions

******************************************************************************/

// Build converts a Fastas array into a byte array to be written to a file.
func Build(fastas []Fasta) ([]byte, error) {
var fastaString bytes.Buffer
fastaLength := len(fastas)
for fastaIndex, fasta := range fastas {
fastaString.WriteString(">")
fastaString.WriteString(fasta.Name)
fastaString.WriteString("\n")

lineCount := 0
// write the fasta sequence 80 characters at a time
for _, character := range fasta.Sequence {

fastaString.WriteRune(character)
lineCount++
if lineCount == 80 {
fastaString.WriteString("\n")
lineCount = 0
}
// Write writes a single Fasta record to w, wrapping the sequence at 80 characters per line.
func (fasta *Fasta) Write(w io.Writer) error {
_, err := w.Write([]byte(">"))
if err != nil {
return err
}
_, err = w.Write([]byte(fasta.Name))
if err != nil {
return err
}
_, err = w.Write([]byte("\n"))
if err != nil {
return err
}

lineCount := 0
// write the fasta sequence 80 characters at a time
for _, character := range fasta.Sequence {

_, err = w.Write([]byte{byte(character)})
if err != nil {
return err
}
if fastaIndex != fastaLength-1 {
fastaString.WriteString("\n\n")
lineCount++
if lineCount == 80 {
_, err = w.Write([]byte("\n"))
if err != nil {
return err
}
lineCount = 0
}
}
return fastaString.Bytes(), nil
_, err = w.Write([]byte("\n\n"))
if err != nil {
return err
}
return nil
}

// Write writes a fasta array to a file.
func Write(fastas []Fasta, path string) error {
fastaBytes, err := buildFn(fastas) // fasta.Build returns only nil errors.
// WriteFile writes a fasta array to a file.
func WriteFile(fastas []Fasta, path string) error {
file, err := os.OpenFile(path, os.O_WRONLY|os.O_CREATE, 0644)
if err != nil {
return err
}
return os.WriteFile(path, fastaBytes, 0644)
defer file.Close()

for _, fasta := range fastas {
err = fasta.Write(file)
if err != nil {
return err
}
}
return nil
}
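The per-record `Write` method above checks the error of every small write. As a rough alternative sketch (not the code in this PR), the same output can be produced through a `bufio.Writer`, which remembers the first write error and returns it from `Flush`. The `Fasta` struct mirrors the one in the diff; `writeWrapped`, `renderFasta`, and the `lineWidth` parameter are hypothetical names added for illustration, and the sketch assumes ASCII sequences.

```go
package main

import (
	"bufio"
	"bytes"
	"fmt"
	"io"
)

// Fasta mirrors the struct in the diff above.
type Fasta struct {
	Name     string
	Sequence string
}

// writeWrapped is a hypothetical helper that reproduces the layout of the
// Write method in the diff: a ">" header line, the sequence with a newline
// after every lineWidth characters, then a trailing "\n\n".
func writeWrapped(w io.Writer, record Fasta, lineWidth int) error {
	bw := bufio.NewWriter(w)
	bw.WriteString(">")
	bw.WriteString(record.Name)
	bw.WriteString("\n")

	lineCount := 0
	for i := 0; i < len(record.Sequence); i++ {
		bw.WriteByte(record.Sequence[i]) // assumes single-byte (ASCII) residues
		lineCount++
		if lineCount == lineWidth {
			bw.WriteString("\n")
			lineCount = 0
		}
	}
	bw.WriteString("\n\n")

	// After the first failed write, bufio.Writer rejects further data and
	// Flush returns that error, so one check at the end is enough.
	return bw.Flush()
}

// renderFasta is a hypothetical convenience wrapper that returns the text.
func renderFasta(record Fasta, lineWidth int) (string, error) {
	var buf bytes.Buffer
	if err := writeWrapped(&buf, record, lineWidth); err != nil {
		return "", err
	}
	return buf.String(), nil
}

func main() {
	text, err := renderFasta(Fasta{Name: "s", Sequence: "ATGCATGCAT"}, 4)
	if err != nil {
		panic(err)
	}
	fmt.Printf("%q\n", text) // prints ">s\nATGC\nATGC\nAT\n\n"
}
```

One design point worth noting when writing to a file rather than a buffer: `os.OpenFile` with only `os.O_WRONLY|os.O_CREATE` does not truncate an existing file, whereas `os.Create` (or adding `os.O_TRUNC`) does, which matches the behaviour of the `os.WriteFile` call that `WriteFile` replaces.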
51 changes: 6 additions & 45 deletions io/fasta/fasta_test.go → bio/fasta/fasta_test.go
@@ -73,63 +73,24 @@ func BenchmarkParser(b *testing.B) {

func TestRead_error(t *testing.T) {
t.Run("Read errors opening the file", func(t *testing.T) {
openErr := errors.New("open error")
oldOpenFn := openFn
openFn = func(name string) (*os.File, error) {
return nil, openErr
}
defer func() {
openFn = oldOpenFn
}()
_, err := Read("/tmp/file")
openErr := errors.New("open /tmp/file12345: no such file or directory")
_, err := Read("/tmp/file12345")
assert.EqualError(t, err, openErr.Error())
})

t.Run("ReadGz errors opening the file", func(t *testing.T) {
openErr := errors.New("open error")
oldOpenFn := openFn
openFn = func(name string) (*os.File, error) {
return nil, openErr
}
defer func() {
openFn = oldOpenFn
}()
_, err := ReadGz("/tmp/file")
openErr := errors.New("open /tmp/file12345: no such file or directory")
_, err := ReadGz("/tmp/file12345")
assert.EqualError(t, err, openErr.Error())
})

t.Run("ReadGz errors reading the file", func(t *testing.T) {
readErr := errors.New("read error")
oldOpenFn := openFn
oldGzipReaderFn := gzipReaderFn
openFn = func(name string) (*os.File, error) {
return &os.File{}, nil
}
gzipReaderFn = func(r io.Reader) (*gzip.Reader, error) {
return nil, readErr
}
defer func() {
openFn = oldOpenFn
gzipReaderFn = oldGzipReaderFn
}()
_, err := ReadGz("/tmp/file")
readErr := errors.New("open /tmp/file12345: no such file or directory")
_, err := ReadGz("/tmp/file12345")
assert.EqualError(t, err, readErr.Error())
})
}
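The rewritten tests above compare against the literal message `open /tmp/file12345: no such file or directory`, which depends on the operating system's wording and on that path not existing. A possible, more portable check is sketched below, using `errors.Is` with `fs.ErrNotExist` from the standard library. The helper names `openMissing` and `isNotExist` are hypothetical and not part of this PR.

```go
package main

import (
	"errors"
	"fmt"
	"io/fs"
	"os"
	"path/filepath"
)

// openMissing tries to open a file that cannot exist, because it lives in a
// freshly created (and therefore empty) temporary directory.
func openMissing() error {
	dir, err := os.MkdirTemp("", "fasta-example")
	if err != nil {
		return err
	}
	defer os.RemoveAll(dir)

	_, err = os.Open(filepath.Join(dir, "missing.fasta"))
	return err
}

// isNotExist reports whether err wraps the platform's "file does not exist"
// condition, independent of the exact message text.
func isNotExist(err error) bool {
	return errors.Is(err, fs.ErrNotExist)
}

func main() {
	err := openMissing()
	fmt.Println("missing file detected:", isNotExist(err))
}
```

In a test, the same idea would read as an assertion on `errors.Is(err, fs.ErrNotExist)` after calling `Read` or `ReadGz` with a path inside `t.TempDir()`.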

func TestWrite_error(t *testing.T) {
buildErr := errors.New("build error")
oldBuildFn := buildFn
buildFn = func(fastas []Fasta) ([]byte, error) {
return nil, buildErr
}
defer func() {
buildFn = oldBuildFn
}()
err := Write([]Fasta{}, "/tmp/file")
assert.EqualError(t, err, buildErr.Error())
}

func TestParser(t *testing.T) {
parser := NewParser(nil, 256)
for testIndex, test := range []struct {
File renamed without changes.
File renamed without changes.
Binary file added bio/fastq/data/pOpen_v3.fastq.gz
Binary file not shown.
2 changes: 1 addition & 1 deletion io/fastq/example_test.go → bio/fastq/example_test.go
@@ -6,7 +6,7 @@ import (
"os"
"strings"

"github.com/TimothyStiles/poly/io/fastq"
"github.com/TimothyStiles/poly/bio/fastq"
)

//go:embed data/nanosavseq.fastq