UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 2270: invalid continuation byte #219

Open
valleyUp opened this issue Jul 23, 2024 · 1 comment

Comments

@valleyUp

Hi,
When reading MAF files, the default encoding on my laptop appears to be UTF-8, so reading fails on a file that contains bytes that are not valid UTF-8 (here, byte 0xd4 at position 2270). The modification below opens the file with latin-1 encoding instead. Hopefully this helps others who run into the same issue!

diff --git a/AnnotatorCore.py b/AnnotatorCore.py
index a448a7d..6e34f82 100644
--- a/AnnotatorCore.py
+++ b/AnnotatorCore.py
@@ -506,7 +506,7 @@ def processalterationevents(eventfile, outfile, previousoutfile, defaultCancerTy
     if os.path.isfile(previousoutfile):
         cacheannotated(previousoutfile, defaultCancerType, cancerTypeMap)
     outf = open(outfile, 'w+', 1000)
-    with open(eventfile, DEFAULT_READ_FILE_MODE) as infile:
+    with open(eventfile, DEFAULT_READ_FILE_MODE, encoding='latin-1') as infile:
         reader = csv.reader(infile, delimiter='\t')

         headers = readheaders(reader)
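
To reproduce the error outside the annotator, here is a minimal sketch. It assumes a tab-separated file containing a single raw 0xd4 byte; the file name, column names, and row contents are made up for illustration and are not taken from AnnotatorCore.py or a real MAF file.

```python
import csv
import os
import tempfile

# Hypothetical sample: one tab-separated row containing the raw byte 0xd4.
# In latin-1 that byte is "Ô"; in UTF-8 it starts a two-byte sequence, and
# the following "T" is not a valid continuation byte.
raw = b"Hugo_Symbol\tNote\nBRAF\tC\xd4TE\n"

path = os.path.join(tempfile.mkdtemp(), "sample.maf")
with open(path, "wb") as f:
    f.write(raw)


def read_rows(path, encoding):
    """Read all tab-separated rows from path using the given encoding."""
    with open(path, "r", encoding=encoding, newline="") as infile:
        return list(csv.reader(infile, delimiter="\t"))


# Decoding as UTF-8 raises UnicodeDecodeError, as in the issue title.
try:
    read_rows(path, "utf-8")
    utf8_failed = False
except UnicodeDecodeError:
    utf8_failed = True

# Decoding as latin-1 succeeds, because every byte maps to a character.
latin1_rows = read_rows(path, "latin-1")
```

Passing `encoding=` explicitly also removes the dependence on the platform default, which is what differs between machines.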
@zhx828
Member

zhx828 commented Oct 3, 2024

@valleyUp thanks! Do you think this is a general solution that can be used by anyone? If so, do you mind sending a pull request?
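
On the question of generality, one trade-off is worth noting: latin-1 maps every byte to a character, so it never raises, but it silently garbles input that really is UTF-8. A possible compromise, sketched below under the assumption that trying UTF-8 first is acceptable, is a fallback helper. The name `read_text_with_fallback` and the sample files are hypothetical, and this is not a tested patch for AnnotatorCore.py.

```python
import os
import tempfile


def read_text_with_fallback(path, encodings=("utf-8", "latin-1")):
    """Return (text, encoding) for the first encoding that decodes cleanly."""
    with open(path, "rb") as f:
        data = f.read()
    for enc in encodings:
        try:
            return data.decode(enc), enc
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the given encodings could decode " + path)


# Why latin-1 alone is risky: valid UTF-8 input is decoded without error,
# but into the wrong characters.
garbled = "\u00e9".encode("utf-8").decode("latin-1")

# Hypothetical sample files: one valid UTF-8, one containing a bare 0xd4.
tmpdir = tempfile.mkdtemp()
utf8_path = os.path.join(tmpdir, "utf8.txt")
legacy_path = os.path.join(tmpdir, "legacy.txt")
with open(utf8_path, "wb") as f:
    f.write("caf\u00e9".encode("utf-8"))
with open(legacy_path, "wb") as f:
    f.write(b"C\xd4TE")

utf8_result = read_text_with_fallback(utf8_path)
legacy_result = read_text_with_fallback(legacy_path)
```

With the default order, the UTF-8 file is read as UTF-8 and only the file with the stray 0xd4 byte falls back to latin-1.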
