UnicodeDecodeError: 'utf-8' codec can't decode byte 0xd4 in position 2270: invalid continuation byte #219

valleyUp · 2024-07-23T13:31:00Z

Hi,
When reading MAF files, the default encoding on my laptop seems to be utf-8, and reading the file fails with the error in the title. The following change opens the file with latin-1 encoding instead. Hopefully this helps others who run into the same issue!

diff --git a/AnnotatorCore.py b/AnnotatorCore.py
index a448a7d..6e34f82 100644
--- a/AnnotatorCore.py
+++ b/AnnotatorCore.py
@@ -506,7 +506,7 @@ def processalterationevents(eventfile, outfile, previousoutfile, defaultCancerTy
     if os.path.isfile(previousoutfile):
         cacheannotated(previousoutfile, defaultCancerType, cancerTypeMap)
     outf = open(outfile, 'w+', 1000)
-    with open(eventfile, DEFAULT_READ_FILE_MODE) as infile:
+    with open(eventfile, DEFAULT_READ_FILE_MODE, encoding='latin-1') as infile:
         reader = csv.reader(infile, delimiter='\t')

         headers = readheaders(reader)
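
For anyone who would rather not hard-code latin-1, here is a minimal sketch of a fallback approach: try utf-8 first and only retry with latin-1 when decoding fails. The helper name `read_tsv_rows` and its `encodings` parameter are hypothetical and not part of AnnotatorCore.py, and the plain `"r"` mode is an assumption about what `DEFAULT_READ_FILE_MODE` resolves to.

```python
import csv


def read_tsv_rows(path, encodings=("utf-8", "latin-1")):
    """Return all rows of a tab-separated file, trying each encoding in order.

    Hypothetical helper for illustration; not part of AnnotatorCore.py.
    """
    if not encodings:
        raise ValueError("at least one encoding is required")
    last_error = None
    for encoding in encodings:
        try:
            # newline="" is what the csv module docs recommend for file objects.
            with open(path, "r", encoding=encoding, newline="") as infile:
                # Read everything inside the try block so that a decode error
                # anywhere in the file triggers the next encoding.
                return list(csv.reader(infile, delimiter="\t"))
        except UnicodeDecodeError as exc:
            last_error = exc
    # Every candidate encoding failed; re-raise the last decode error.
    raise last_error
```

Since latin-1 maps every possible byte to a character, the fallback never raises, but it can silently produce wrong characters if the file was actually written in some other encoding. Files that are valid utf-8 are still read as utf-8, so existing behavior is unchanged for them.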

zhx828 · 2024-10-03T21:49:30Z

@valleyUp thanks! Do you think this is a general solution that can be used by anyone? If so, do you mind sending a pull request?
