forked from DidierStevens/DidierStevensSuite
find-file-in-file.py
#!/usr/bin/env python
__description__ = 'Find if a file is present in another file'
__author__ = 'Didier Stevens'
__version__ = '0.0.5'
__date__ = '2015/11/14'
"""
Source code put in public domain by Didier Stevens, no Copyright
https://DidierStevens.com
Use at your own risk
History:
2013/07/24: start
2013/07/25: continue
2013/09/28: bug fixes; added option overlap
2013/10/17: v0.0.2 added batch mode: processing of several containing files
2013/10/31: v0.0.3 added error handling ProcessAt
2014/05/03: v0.0.4 added option skip
2014/06/11: added options partial, output, range, hexdump
2014/06/13: added option quiet
2014/11/25: changed help string for options; added manual
2015/09/17: 0.0.5 added indicator (End of containing file)
2015/11/14: bugfix File2StringZIP and added option -r
Todo:
cOutput: close on dispose
"""
import optparse
import operator
import glob
import collections
import zipfile
import sys
import binascii
import textwrap
MALWARE_PASSWORD = 'infected'
def PrintManual():
    manual = '''
Manual:
find-file-in-file is a program to test if one file (the contained file) can be found inside another file (the containing file).
Here is an example.
We have a file called contained-1.txt with the following content: ABCDEFGHIJKLMNOPQRSTUVWXYZ
and have a file called containing-1.txt with the following content: 0000ABCDEFGHIJKLM1111NOPQRSTUVWXYZ2222
When we execute the following command:
find-file-in-file.py contained-1.txt containing-1.txt
We get this output:
0x00000004 0x0000000d (50%)
0x00000015 0x0000000d (50%)
Finished
This means that the file contained-1.txt was completely found inside file containing-1.txt. At position 0x00000004 we found a first part (0x0000000d bytes) and at position 0x00000015 we found a second part (0x0000000d bytes).
We can use option hexdump (-x) to see which bytes were found:
find-file-in-file.py -x contained-1.txt containing-1.txt
0x00000004 0x0000000d (50%)
41 42 43 44 45 46 47 48 49 4a 4b 4c 4d
0x00000015 0x0000000d (50%)
4e 4f 50 51 52 53 54 55 56 57 58 59 5a
Finished
The containing file may contain the contained file in an arbitrary order, like file containing-2.txt:
0000NOPQRSTUVWXYZ1111ABCDEFGHIJKLM2222
Example:
find-file-in-file.py -x contained-1.txt containing-2.txt
0x00000015 0x0000000d (50%)
41 42 43 44 45 46 47 48 49 4a 4b 4c 4d
0x00000004 0x0000000d (50%)
4e 4f 50 51 52 53 54 55 56 57 58 59 5a
Finished
If a part of the contained file is found at the end of the containing file then indicator (End of containing file) is used, like file containing-4.txt:
0000NOPQRSTUVWXYZ1111ABCDEFGHIJKLM
Example:
find-file-in-file.py -x contained-1.txt containing-4.txt
0x00000015 0x0000000d (50%) (End of containing file)
41 42 43 44 45 46 47 48 49 4a 4b 4c 4d
0x00000004 0x0000000d (50%)
4e 4f 50 51 52 53 54 55 56 57 58 59 5a
Finished
The containing file does not need to contain the complete contained file, like file containing-3.txt:
0000ABCDEFGHIJKLM1111
Example:
find-file-in-file.py -x contained-1.txt containing-3.txt
0x00000004 0x0000000d (50%)
41 42 43 44 45 46 47 48 49 4a 4b 4c 4d
Remaining 13 (50%)
The message "Remaining 13 (50%)" means that the last 13 bytes of the contained file were not found in the containing file (that's 50% of the contained file).
If the contained file starts with a byte sequence not present in the containing file, nothing will be found. Example with file contained-2.txt:
0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ
Nothing is found:
find-file-in-file.py -x contained-2.txt containing-1.txt
Remaining 36 (100%)
If you know how long that initial byte sequence is, you can skip it. Use option rangebegin (-b) to specify the position in the contained file from where you want to start searching.
Example:
find-file-in-file.py -x -b 10 contained-2.txt containing-1.txt
0x00000004 0x0000000d (50%)
41 42 43 44 45 46 47 48 49 4a 4b 4c 4d
0x00000015 0x0000000d (50%)
4e 4f 50 51 52 53 54 55 56 57 58 59 5a
Finished
If you want to skip bytes at the end of the contained file, use option rangeend (-e).
If you don't know how long that initial byte sequence is, you can instruct find-file-in-file to "brute-force" it. With option partial (-p), one byte at a time will be removed from the beginning of the contained file until a match is found.
Example:
find-file-in-file.py -x -p contained-2.txt containing-1.txt
File: containing-1.txt (partial 0x0a)
0x00000004 0x0000000d (50%)
41 42 43 44 45 46 47 48 49 4a 4b 4c 4d
0x00000015 0x0000000d (50%)
4e 4f 50 51 52 53 54 55 56 57 58 59 5a
Finished
"(partial 0x0a)" tells you that the first 10 bytes of the contained file were skipped before a match was found.
There are some other options:
-m minimum: find-file-in-file will search for byte sequences of 10 bytes long minimum. If you want to change this minimum, use option -m minimum.
-o overlap: find-file-in-file will not let byte sequences overlap. Use option -o overlap to remove this restriction.
-v verbose: be verbose in batch mode (more than one containing file).
-O output: besides writing output to stdout, write the output also to the given file.
-q quiet: do not output to stdout.
If the contained and/or containing files are ZIP files containing a single file, then the single file is extracted from the ZIP files and analyzed. To analyze the ZIP file and not the file inside the ZIP file, use option -r: regular; consider the ZIP file as a regular file.
'''
    for line in manual.split('\n'):
        print(textwrap.fill(line))
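# The matching strategy described in the manual can be sketched on its own.
# This is a simplified illustration, not the implementation used further down:
# the name find_parts and its return shape are assumptions made for this example,
# and it only handles the non-overlapping case (no -o, -b, -e or -p).

```python
def find_parts(contained, containing, minimum=10):
    """Greedy sketch: repeatedly take the longest run of `contained`
    that starts at the current position and occurs in `containing`."""
    parts = []          # (position in containing, length) for each run found
    used = set()        # positions of containing that were already matched
    remaining = contained
    while len(remaining) >= minimum:
        best_index, best_length = None, 0
        # Candidate positions are located via the first `minimum` bytes.
        start = containing.find(remaining[:minimum])
        while start != -1:
            length = 0
            while (length < len(remaining)
                   and start + length < len(containing)
                   and remaining[length] == containing[start + length]
                   and start + length not in used):
                length += 1
            if length > best_length:
                best_index, best_length = start, length
            start = containing.find(remaining[:minimum], start + 1)
        if best_index is None:
            break
        parts.append((best_index, best_length))
        used.update(range(best_index, best_index + best_length))
        remaining = remaining[best_length:]
    # The second value is the number of bytes of `contained` left unmatched.
    return parts, len(remaining)


if __name__ == '__main__':
    contained = b'ABCDEFGHIJKLMNOPQRSTUVWXYZ'
    containing = b'0000ABCDEFGHIJKLM1111NOPQRSTUVWXYZ2222'
    for index, length in find_parts(contained, containing)[0]:
        print('0x%08x 0x%08x' % (index, length))
```

# With the contained-1.txt and containing-1.txt contents from the manual, the
# sketch reports the same two runs: position 0x04 and position 0x15, 0x0d bytes each.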
#Convert 2 Bytes If Python 3
def C2BIP3(string):
    if sys.version_info[0] > 2:
        return bytes([ord(x) for x in string])
    else:
        return string
def File2StringZIP(filename, regular=False):
    f = None
    oZipfile = None
    # A ZIP file holding exactly one member is opened transparently (unless -r is given)
    if not regular and filename.lower().endswith('.zip'):
        oZipfile = zipfile.ZipFile(filename, 'r')
        if len(oZipfile.infolist()) == 1:
            f = oZipfile.open(oZipfile.infolist()[0], 'r', C2BIP3(MALWARE_PASSWORD))
        else:
            f = None
            oZipfile.close()
            oZipfile = None
    if f == None:
        try:
            f = open(filename, 'rb')
        except:
            return None
    try:
        return f.read()
    except:
        return None
    finally:
        f.close()
        if oZipfile != None:
            oZipfile.close()
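# For comparison, the same single-member ZIP handling can be sketched with
# context managers, which close the handles on every path. This is a hedged
# Python 3 illustration: the name read_single_member_or_raw is an assumption,
# and unlike File2StringZIP it lets I/O errors propagate instead of returning None.

```python
import zipfile


def read_single_member_or_raw(filename, password=b'infected'):
    """Return the bytes of the only member of a ZIP file, or the raw file bytes."""
    if filename.lower().endswith('.zip'):
        with zipfile.ZipFile(filename, 'r') as archive:
            members = archive.infolist()
            if len(members) == 1:
                # The password is only consulted when the member is encrypted.
                with archive.open(members[0], 'r', pwd=password) as member:
                    return member.read()
    # Not a ZIP file, or a ZIP file with zero or several members: read it as is.
    with open(filename, 'rb') as f:
        return f.read()
```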
def File2Strings(filename):
    try:
        f = open(filename, 'r')
    except:
        return None
    try:
        # Return a real list: on Python 3, map() yields an iterator that cannot be summed with lists
        return [line.rstrip('\n') for line in f.readlines()]
    except:
        return None
    finally:
        f.close()
def ProcessAt(argument):
    if argument.startswith('@'):
        strings = File2Strings(argument[1:])
        if strings == None:
            raise Exception('Error reading %s' % argument)
        else:
            return strings
    else:
        return [argument]
def ExpandFilenameArguments(filenames):
    # Expand @file arguments, expand wildcards, then remove duplicates while keeping order
    return list(collections.OrderedDict.fromkeys(sum(map(glob.glob, sum(map(ProcessAt, filenames), [])), [])))
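# The one-liner above packs three steps into one expression. An unrolled sketch
# of the same idea may be easier to follow. The name expand_arguments is an
# assumption for this example, it targets Python 3.7+ (insertion-ordered dicts),
# and a missing @file raises the underlying OSError rather than a generic Exception.

```python
import glob


def expand_arguments(arguments):
    """Expand @file arguments and wildcards, dropping duplicates in first-seen order."""
    # Step 1: replace each @file argument by the lines of that file.
    patterns = []
    for argument in arguments:
        if argument.startswith('@'):
            with open(argument[1:], 'r') as f:
                patterns.extend(line.rstrip('\n') for line in f)
        else:
            patterns.append(argument)
    # Steps 2 and 3: expand wildcards and keep only the first occurrence of each name.
    seen = {}
    for pattern in patterns:
        for filename in glob.glob(pattern):
            seen.setdefault(filename, None)
    return list(seen)
```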
# CIC: Call If Callable
def CIC(expression):
    if callable(expression):
        return expression()
    else:
        return expression

# IFF: IF Function
def IFF(expression, valueTrue, valueFalse):
    if expression:
        return CIC(valueTrue)
    else:
        return CIC(valueFalse)
class cOutput():
    def __init__(self, filename, quiet=False):
        self.filename = filename
        self.quiet = quiet
        if filename == '':
            self.fOut = None
        else:
            self.fOut = open(self.filename, 'w')

    def __del__(self):
        self.Close()

    def Close(self):
        if self.fOut != None:
            self.fOut.close()

    def Print(self, line):
        if not self.quiet:
            print(line)
        if self.fOut != None:
            self.fOut.write(line + '\n')
def Hexdumpline(data):
    # hexlify returns bytes on Python 3; decode so the pairs can be joined as text
    line = binascii.hexlify(data[0:16]).decode('ascii')
    return ' '.join(line[iIter:iIter + 2] for iIter in range(0, len(line), 2))

def Hexdump(data, length, oOutput):
    while length >= 16:
        oOutput.Print(' ' + Hexdumpline(data[0:16]))
        length -= 16
        data = data[16:]
    if length > 0:
        oOutput.Print(' ' + Hexdumpline(data[0:length]))
def Match(contained, containing, index, dFound):
    # Return the number of bytes of contained that match containing from position index on
    for i in range(len(contained)):
        if i + index >= len(containing) or contained[i] != containing[i + index] or i + index in dFound:
            return i
    return len(contained)
def ScanSub(contained, containing, dFound, options):
    dMatches = {}
    index = 0
    while True:
        result = containing.find(contained[0:options.minimum], index)
        if result == -1:
            break
        found = Match(contained, containing, result, dFound)
        if found > 0:
            dMatches[result] = found
        index = result + 1
    if dMatches == {}:
        return None, None
    # items() works on Python 2 and 3 (iteritems() is Python 2 only); pick the longest match
    return max(dMatches.items(), key=operator.itemgetter(1))
def Scan(fileContaining, contained, containing, singleMode, partial, options):
    global oOutput

    totalLength = 0
    dFound = {}
    remaining = contained
    printFilename = not singleMode
    if options.verbose:
        oOutput.Print('File: %s' % fileContaining)
    while True:
        index, length = ScanSub(remaining, containing, dFound, options)
        if index == None:
            if singleMode or options.verbose:
                oOutput.Print('Remaining %d (%d%%)' % (len(remaining), len(remaining) * 100.0 / len(contained)))
            break
        totalLength += length
        if printFilename and not options.verbose:
            oOutput.Print('File: %s%s' % (fileContaining, IFF(partial != 0, ' (partial 0x%02x)' % partial, '')))
            printFilename = False
        if len(containing[index + length:]) == 0:
            eofMessage = ' (End of containing file)'
        else:
            eofMessage = ''
        oOutput.Print('0x%08x 0x%08x (%d%%)%s' % (index, length, length * 100.0 / len(contained), eofMessage))
        if options.hexdump:
            Hexdump(remaining, length, oOutput)
        if not options.overlap:
            # Mark the matched positions so later matches cannot reuse them
            for counter in range(length):
                dFound[counter + index] = True
        remaining = remaining[length:]
        if len(remaining) == 0:
            oOutput.Print('Finished')
            break
        if len(remaining) < options.minimum:
            oOutput.Print('Remaining bytes (%d) smaller than the minimum (option -m)' % len(remaining))
            break
    return totalLength
def FindFileInFile(fileContained, filesContaining, options):
    global oOutput

    # Pass option quiet along, otherwise -q has no effect
    oOutput = cOutput(options.output, options.quiet)

    if options.minimum < 1:
        oOutput.Print('Option m is too small')
        return

    containedAll = File2StringZIP(fileContained, options.regular)
    if containedAll == None:
        oOutput.Print('Error reading file %s' % fileContained)
        return

    try:
        filesContaining = ExpandFilenameArguments(filesContaining)
    except Exception as e:
        # Print expects a string (it appends a newline when writing to the output file)
        oOutput.Print(str(e))
        return
    singleMode = len(filesContaining) == 1

    rangebegin = IFF(options.rangebegin > 0, options.rangebegin, 0)
    rangeend = IFF(options.rangeend > 0, options.rangeend, len(containedAll))
    containedRange = containedAll[rangebegin:rangeend]
    for fileContaining in filesContaining:
        containing = File2StringZIP(fileContaining, options.regular)
        if containing == None:
            if singleMode or options.verbose:
                oOutput.Print('Error reading file %s' % fileContaining)
            continue
        if len(containing) < options.minimum:
            if singleMode or options.verbose:
                # Report the filename, not the content of the file
                oOutput.Print('Error file %s is smaller than the minimum (option -m)' % fileContaining)
            continue
        if options.partial:
            partials = range(len(containedRange) - options.minimum)
            singleMode = False
        else:
            partials = [0]
        while len(partials) > 0:
            contained = containedRange[partials[0]:]
            if len(contained) < options.minimum:
                oOutput.Print('Error file %s is smaller than the minimum (option -m)' % fileContained)
                return
            totalLength = Scan(fileContaining, contained, containing, singleMode, partials[0], options)
            if totalLength > 0:
                partials = partials[totalLength:]
            else:
                partials = partials[1:]
def Main():
    moredesc = '''
Arguments:
file-containing can be a single file, several files, and/or @file
@file: run the command on each file listed in the text file specified
wildcards are supported
batch mode is enabled when more than one file is specified
Source code put in the public domain by Didier Stevens, no Copyright
Use at your own risk
https://DidierStevens.com'''

    oParser = optparse.OptionParser(usage='usage: %prog [options] file-contained file-containing [...]\n' + __description__ + moredesc, version='%prog ' + __version__)
    oParser.add_option('--man', action='store_true', default=False, help='Print manual')
    oParser.add_option('-m', '--minimum', type=int, default=10, help='Minimum length of byte-sequence to find (default 10)')
    oParser.add_option('-o', '--overlap', action='store_true', default=False, help='Found sequences may overlap')
    oParser.add_option('-v', '--verbose', action='store_true', default=False, help='Be verbose in batch mode')
    oParser.add_option('-p', '--partial', action='store_true', default=False, help='Perform partial search of contained file')
    oParser.add_option('-O', '--output', default='', help='Output to file')
    oParser.add_option('-b', '--rangebegin', type=int, default=-1, help='Select the beginning of the contained file (by default byte 0)')
    oParser.add_option('-e', '--rangeend', type=int, default=-1, help='Select the end of the contained file (by default last byte)')
    oParser.add_option('-x', '--hexdump', action='store_true', default=False, help='Hexdump of found bytes')
    oParser.add_option('-q', '--quiet', action='store_true', default=False, help='Do not output to standard output')
    oParser.add_option('-r', '--regular', action='store_true', default=False, help='Handle a ZIP file like a regular (non-ZIP) file')
    (options, args) = oParser.parse_args()

    if options.man:
        oParser.print_help()
        PrintManual()
        return

    if len(args) < 2:
        oParser.print_help()
        return
    else:
        FindFileInFile(args[0], args[1:], options)
if __name__ == '__main__':
    Main()