# The structure is kept the same as in the original file,
# so that JavaScript dot notation can be used to reach nested JSON fields.
# METADATA = {
# "newid",
# "oldid",
# "cgisplit",
# "lewissplit",
# "topics"
# }
# FULLTEXT = {
# "title",
# "dateline",
# "body"
# }
# LISTITEMS = {
# "places",
# "people",
# "orgs",
# "exchanges",
# "companies",
# "topics"
# }
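One way to mirror the field groups above in Python is shown below, with a small helper that reads a nested field by a dotted path in the spirit of JavaScript dot notation. This is a sketch under assumptions: the tuple constants only copy the comment block, and `get_path` is a hypothetical helper, not part of any library.

```python
from typing import Any, Dict

# Field groups copied from the comment block above (hypothetical constant names).
METADATA = ("newid", "oldid", "cgisplit", "lewissplit", "topics")
FULLTEXT = ("title", "dateline", "body")
LISTITEMS = ("places", "people", "orgs", "exchanges", "companies", "topics")


def get_path(doc: Dict[str, Any], path: str, default: Any = None) -> Any:
    """Read a nested field by a dotted path such as "reuters.text.title".

    Walks the nested dicts one key at a time and returns `default`
    as soon as a key is missing or a non-dict value is reached.
    """
    current: Any = doc
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return default
        current = current[key]
    return current


# Hypothetical record in the shape of the schema.
doc = {"reuters": {"attr": {"newid": "1"}, "text": {"title": "EXAMPLE TITLE"}}}
print(get_path(doc, "reuters.text.title"))  # EXAMPLE TITLE
print(get_path(doc, "reuters.attr.oldid"))  # None
```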
schema:
{
"reuters": {
"attr": {
"newid",
"oldid",
"n",
"set",
"cgisplit",
"lewissplit",
"topics"
},
"date",
"topics",
"places",
"people",
"orgs",
"exchanges",
"companies",
"unknown",
"text": {
"title",
"dateline",
"body"
}
}
}
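A minimal sketch of turning one raw SGML record into the nested structure above follows. It assumes each string holds exactly one `<REUTERS ...>...</REUTERS>` record in the Reuters-21578 layout (list values wrapped in `<D>` tags, title, dateline and body inside `<TEXT>`). `parse_reuters` and its helpers are hypothetical names; attribute names are lowercased to match the schema keys, and entities such as `&lt;` are not decoded. A real SGML file holds many records and would need to be split first.

```python
import re
from typing import Any, Dict, Optional

# Assumed tag groups, matching the schema keys above.
LIST_TAGS = ("topics", "places", "people", "orgs", "exchanges", "companies")
TEXT_TAGS = ("title", "dateline", "body")

_OPEN_TAG = re.compile(r"<REUTERS\s+([^>]*)>", re.IGNORECASE)
_ATTR = re.compile(r'([A-Za-z]+)="([^"]*)"')
_ITEM = re.compile(r"<D>(.*?)</D>", re.IGNORECASE | re.DOTALL)


def _inner(tag: str, sgml: str) -> Optional[str]:
    """Return the stripped content between <TAG ...> and </TAG>, or None."""
    pattern = rf"<{tag}\b[^>]*>(.*?)</{tag}>"
    match = re.search(pattern, sgml, re.IGNORECASE | re.DOTALL)
    return match.group(1).strip() if match else None


def parse_reuters(sgml: str) -> Dict[str, Any]:
    """Convert one SGML record into the nested dict described by the schema."""
    open_tag = _OPEN_TAG.search(sgml)
    attrs = {}
    if open_tag:
        attrs = {k.lower(): v for k, v in _ATTR.findall(open_tag.group(1))}
    record: Dict[str, Any] = {
        "attr": attrs,
        "date": _inner("date", sgml),
        "unknown": _inner("unknown", sgml),
    }
    for tag in LIST_TAGS:
        # Missing or empty list elements become empty lists.
        record[tag] = _ITEM.findall(_inner(tag, sgml) or "")
    text = _inner("text", sgml) or ""
    record["text"] = {tag: _inner(tag, text) for tag in TEXT_TAGS}
    return {"reuters": record}


# Hypothetical record: the opening tag is copied from the example under API 3,
# everything else is made up.
sample = """<REUTERS TOPICS="YES" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="5544" NEWID="1">
<DATE>26-FEB-1987 15:01:01.79</DATE>
<TOPICS><D>example-topic</D></TOPICS>
<PLACES><D>usa</D></PLACES>
<PEOPLE></PEOPLE>
<TEXT><TITLE>EXAMPLE TITLE</TITLE><BODY>Example body.</BODY></TEXT>
</REUTERS>"""
doc = parse_reuters(sample)
print(doc["reuters"]["attr"]["newid"], doc["reuters"]["places"])  # 1 ['usa']
```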
Expected APIs
1. API to list content (overview)
a) list by date:
* later than a given datetime
* earlier than a given datetime
* between two datetimes
b) list by content category:
"topics" one of [set]
"places" one of [set]
"people" one of [set]
"orgs" one of [set]
"exchanges" one of [set]
"companies" one of [set]
2. API to search content
Full-text search (substring match) over these fields:
"text": {
"title".includes(xyz)
"dateline".includes(xyz)
"body".includes(xyz)
}
Returns zero, one or many results (overview).
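A naive sketch of the search API follows: a case-insensitive substring scan over title, dateline and body that returns zero, one or many overview entries. The function name, the overview shape (newid and title) and the empty-query rule are assumptions; for the whole collection, a proper full-text index (for example an inverted index or a database's built-in full-text search) would likely replace this linear scan.

```python
from typing import Any, Dict, Iterable, List

SEARCH_FIELDS = ("title", "dateline", "body")


def search(docs: Iterable[Dict[str, Any]], query: str) -> List[Dict[str, Any]]:
    """Return an overview of every document whose title, dateline or body
    contains `query` as a case-insensitive substring."""
    needle = query.casefold()
    if not needle:
        return []  # treat an empty query as "no results" rather than "everything"
    hits = []
    for doc in docs:
        text = doc["reuters"].get("text", {})
        if any(needle in (text.get(field) or "").casefold() for field in SEARCH_FIELDS):
            hits.append({"newid": doc["reuters"]["attr"].get("newid"),
                         "title": text.get("title")})
    return hits


# Hypothetical parsed records.
docs = [
    {"reuters": {"attr": {"newid": "1"},
                 "text": {"title": "Example cocoa review", "dateline": "CITY, Feb 26 -",
                          "body": "Example body about cocoa."}}},
    {"reuters": {"attr": {"newid": "2"},
                 "text": {"title": "Example grain report", "body": "Wheat exports rose."}}},
]
print([hit["newid"] for hit in search(docs, "COCOA")])  # ['1']
print(search(docs, "xyz"))                              # []
```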
3. API to get a specific item by id or another identifier (returns the body for that identifier, usually oldid/newid)
Matches when a metadata field equals one specific value:
e.g.: <REUTERS TOPICS="YES" LEWISSPLIT="TRAIN" CGISPLIT="TRAINING-SET" OLDID="5544" NEWID="1">
"reuters": {
"attr": {
"newid" == value
"oldid" == value
"cgisplit" == value
"lewissplit" == value
"topics" == value
},