TODO
----
General development directions
------------------------------
* Support for more transport protocols.
* More APIs, e.g. a Java class with libdpsearch support.
* Support for huge databases with hundreds or thousands of millions of documents.
* Make it more manageable, i.e. administration tools, etc.
Search quality and results presentation
---------------------------------------
* Click rank
* Administrator-defined dynamic site priority:
- approved sites, which should be displayed at the top of the results;
- disapproved sites (e.g. for abuse), which should not be displayed.
* Take word context into account: <b>, <font size="xx">, <big> and so on.
* Optional automatic URL limit by SERVER_NAME variable.
* "Exclude" limits, for example "search through everything except
a given site": ue=http://esite/
* Rank URLs with long pathnames lower than direct hits on, say, a domain
name with no directory path.
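The path-length ranking idea above could be sketched roughly as follows. This is only an illustration, not existing code: the helper names url_path_depth and penalized_rank, the 5% per-level penalty and the 50% cap are all assumptions made for the example.

```c
#include <string.h>

/* Hypothetical helper: count the '/' separators that follow the leading
 * slash of a URL's path. "http://host" and "http://host/" give 0;
 * "http://host/a/b.html" gives 1. The query string and fragment are ignored. */
static int url_path_depth(const char *url)
{
    const char *p = strstr(url, "://");
    int depth = 0;

    p = p ? p + 3 : url;      /* skip the scheme if present */
    p = strchr(p, '/');       /* leading slash of the path, if any */
    if (p == NULL)
        return 0;
    for (p++; *p != '\0' && *p != '?' && *p != '#'; p++) {
        if (*p == '/')
            depth++;
    }
    return depth;
}

/* Assumed weighting: every directory level costs 5% of the base rank,
 * and the total penalty is capped at 50%. */
static double penalized_rank(double base_rank, const char *url)
{
    double penalty = 0.05 * url_path_depth(url);

    if (penalty > 0.5)
        penalty = 0.5;
    return base_rank * (1.0 - penalty);
}
```

With these assumed weights, a hit on http://esite/ keeps its full rank, while a hit several directories deep loses a small fraction of it per level.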
Indexing related stuff
----------------------
* Detect clones at the site level. Currently this is implemented at the page
level only. The idea is to detect that a site being indexed is a mirror of
another site after indexing only several pages, without having to index them all.
* SPAM clearance.
* Fix the indexer becoming slow when ServerTable is big. This is because
of a full sequential scan. Make an in-memory cache for part of ServerTable.
* Fix "posgreSQL.org" and "posgresql.org" being treated as
different sites.
* FTP digest ls-lR.gz support. For example, ftp://ftp.chg.ru/ls-lR.gz
* Make it possible for external parsers to return converted content
together with headers like Content-Type, Title and so on.
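For the hostname case item above, one possible fix is to lowercase the host part of every URL before it is stored or compared. The sketch below is an assumption about how that might look, not the indexer's actual code: lowercase_url_host is a hypothetical helper. It leaves the path and any user:password@ prefix untouched, since those can be case-sensitive.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical helper: lowercase the host part of a URL in place.
 * The scheme, userinfo, path, query and fragment are left unchanged. */
static void lowercase_url_host(char *url)
{
    char *start = strstr(url, "://");
    char *end, *p, *at = NULL;

    start = start ? start + 3 : url;        /* skip the scheme if present */
    end = start + strcspn(start, "/?#");    /* end of the authority part */

    /* Skip "user:password@" if present; the last '@' separates it from the host. */
    for (p = start; p < end; p++) {
        if (*p == '@')
            at = p;
    }
    if (at != NULL)
        start = at + 1;

    /* Cast to unsigned char: passing a negative char to tolower() is undefined. */
    for (p = start; p < end; p++)
        *p = (char)tolower((unsigned char)*p);
}
```

After this normalization, "http://posgreSQL.org/Docs/" and "http://posgresql.org/Docs/" share the same host string, so a plain strcmp() on the host would treat them as one site.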
Charset related stuff
---------------------
* Remove the "ForceIISCharset1251 yes/no" command. Replace it with an
enhanced "CharsetByServer <charset> <regexp> [<regexp>...]"
command.
* Stateful character sets support: UTF-7, Asian ISO-2022-XX
and others. They will not be used as a LocalCharset because
they take too much space; however, the indexer should be able to index
them, and the search frontend should be able to use them as
a BrowserCharset.
Misc
----
* Smart search results cache cleaning after reindexing.
* Make it possible to set table names in indexer.conf and search.htm.
* Learn about Dublin Core, a simple set of standard metadata for web pages.
http://www.searchtools.com/related/metadata.html#dc
* Add curl library support.
* Optimization for clusterisation.
Portability and code quality
----------------------------
Remove warnings on various platforms. Currently it builds without
warnings on Linux and FreeBSD with these CFLAGS:
-Wall
-Wconversion
-Wshadow
-Wpointer-arith
-Wcast-qual
-Wcast-align
-Wwrite-strings
-Waggregate-return
-Wstrict-prototypes
-Wmissing-prototypes
-Wmissing-declarations
-Wredundant-decls
-Wnested-externs
-Wlong-long
-Winline
However, compilers on some other platforms do produce warnings,
for example about mixed signed/unsigned chars with the NetBSD Alpha compiler.
Please report those warnings, along with suggestions for fixing them, to [email protected]