-
Notifications
You must be signed in to change notification settings - Fork 1
/
work_notes.txt
155 lines (109 loc) · 4.19 KB
/
work_notes.txt
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
FIX
- DSpace web source (login, options must be called properly)
- single run
- WEB export core
- MARCjs won't compile on nodejs 12
- remove request module OK
TODO:
- source node suggested keys shoudl be default keys (like Finna node)
- save visible keys to node settings
- re-generate schema should refresh keys
-
- node run register /stop functionality / force
- mark fields that node produces, also out.setter
- remove node dir on node remove
- File downloads
- CLD core
- user auth
- custom node functionality
- forked node functionality
- allow labeling of collections
- shibboleth
- devmode
DONE:
- PDF core
- script node
- allow labeling of nodes
- facet and search queries
- facet view
- add "views" section to UI
- finalize socket.io communication (messages must include project and node uuids) OK
- csv core OK
- node dirs
FIXED:
- OK node settings are not remembered
/pipe
collection of nodes, that are executed one after one. Nodes do not have to be from same collection or even project.
- title
- id
- nodes
Settings with underscore are lifted to UI (are not stored in settings)
What is GLAMpipe?
High Level Data Processing and Storage API - with a GUI
Rest aliases (nicknames)
LAS-WS
docker pull jiemakel/las-ws:1.1
docker run -d -p 9000:9000 --name las_ws jiemakel/las-ws:1.1
TESTING
worker-farm -> huge performance boost!
- makes communication a little bit more complex. -> Generic process counter is needed, child processes are aware only on they part
API:
/api/v2/collections/[COLLECTION_ID]/fields
- returns all fields of the collection
-
search
/api/v2/collections/[COLLECTION_ID]/docs
parameters:
- limit
- skip
- keys -> keys included
- nokeys -> keys not included
- mongoquery -> query object that is passed to mongo "mongoquery={'title': 'My document'}"
- "title=kissa|koira"
Nodes and cores
Nodes are synchronous code. No file writing or web requests. In order to write file or make a web request the node can use core modules.
Node gives options to core module and the result is given back to node.
Currently available core modules:
- web
- (pdf)
- (csv)
CHANGES:
- nodes can be labeled -> /nodes/labels/dspace_thesesis_query/run
- custom nodes can be made -> every file in node must be editable -> PUT /nodes/[UUID]/scripts/run.js
- custom node can be saved as user node -> where to store?
- schema checker is run after every process node -> if there are new fields, then attach them to latest node run. This way we can get fields set by "out.setter"
- possibly option to disable schem checker
- nodes and config can be loaded on the fly (no need to restart container)
- up-to-date check for data -> for example, rerun of source node invalidates all flollowing processing and export nodes.
- automatic node ordening -> if node is dependent of the output of the other node, then the second node is show earlien in the node list.
- python export
- view nodes are now separated from export nodes
- master-data collection -> immutable collection that can be used in project like it was a normal collection
Core communication
* things that are set in GP-node
core.options
- options object for core module (core request)
- the format depends on core module (for example, options object for web-core options is the options of request module)
core.skip
- tell core to skip this request
- if skip is non-empty, it is passed to a next round as core.error
core.skip = "no file present"
if(core.error)
out.value = GP.error + core.error
* Things that are set by core
core.data
- data that is coming from core module (core response)
core.error
- is set when core can not fulfill the request (file is missing, network error etc.)
OR
if core.skip is non-empty
OPTIONS
Options are node type specific values stored in the database. These are used mostly for URLs for nodes (like DSpace nodes)
SCHEMA
Every collection has their own schema in gp__schemas. Schema keeps track of fields in the collection.
source:
- when a source node is run, the createSchema is run after every import
- this results the "original fields" list
process and export:
- if node uses out.setter, then fields in the setter are added to schema in every round.
- if node has setings.out_value, then this is added to the the schema when node is created