This repository has been archived by the owner on Sep 26, 2022. It is now read-only.

System monitor #148

Open: orbitalturtle wants to merge 19 commits into master

Conversation

orbitalturtle (Collaborator)

This PR puts together a first draft of a system monitor for a Teos watchtower. It can be run once Elasticsearch and Kibana are running to produce a Kibana dashboard with several graphs displaying Teos data.

Would love some feedback if anyone has time!

I also have my eyes on the following improvements:

  • Right now, each time the program boots up it wipes out all the data previously entered into Elasticsearch and refills it from the log file in order to gather the most recent log data. It would probably be better to track how many logs have been indexed so far and, when the program starts up, resume where it left off (see the sketch after this list).
  • The data could be updated in real time (maybe Logstash can help here? I'm not super familiar with it).
  • I might be missing something, but right now it seems the regtest and mainnet Teos log data are mixed into the same log file, which would muddle the data in the dashboard. So perhaps it would be good to have separate files for each.
  • Testing and exception handling are very minimal so far.
  • There's probably much more that can be graphed :-) esp. time series data.
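
For reference, a minimal sketch of the offset-tracking idea from the first bullet (the state-file name monitor_offset.json and the function name are illustrative, not part of this PR):

import json
import os

OFFSET_FILE = "monitor_offset.json"  # hypothetical file storing the last read position


def read_new_log_lines(log_path):
    """Return only the log lines appended since the last run and remember the new offset."""
    offset = 0
    if os.path.exists(OFFSET_FILE):
        with open(OFFSET_FILE) as f:
            offset = json.load(f).get("offset", 0)

    with open(log_path) as log_file:
        log_file.seek(offset)
        new_lines = log_file.readlines()
        new_offset = log_file.tell()

    with open(OFFSET_FILE, "w") as f:
        json.dump({"offset": new_offset}, f)

    return new_lines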

@sr-gi (Member) commented May 20, 2020

Super excited about this 🚀 I'll give it a look asap.

@yahgwai (Collaborator) commented May 25, 2020

Looks cool, always nice to see some dashboards :)

I think you might be able to do without your own custom loading code though. You can take two basic approaches to loading logs: handler or agent.

The handler approach is to add an additional logging stream. You already have a console and file stream:

self.f_logger.info(self._create_file_message(msg, **kwargs))
self.c_logger.info(self._create_console_message(msg, **kwargs))

so you would just add a third stream for Elasticsearch. To do this you'd want to use a 3rd-party lib, as it would do a better job of keeping track of location (in case of a crash) and avoiding blocking IO. I don't work with Python so I can't assess the quality of the lib, but this one seems OK: https://github.com/cmanaha/python-elasticsearch-logger
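
For illustration, a rough sketch of what that third stream could look like with python-elasticsearch-logger (adapted from that library's README; the exact class and parameter names should be double-checked there, and the index name is made up):

import logging

from cmreslogging.handlers import CMRESHandler

# Every record sent to this logger is also indexed into Elasticsearch
# (the index name "teos_logs" is hypothetical).
es_handler = CMRESHandler(
    hosts=[{"host": "localhost", "port": 9200}],
    auth_type=CMRESHandler.AuthType.NO_AUTH,
    es_index_name="teos_logs",
)

logger = logging.getLogger("teos")
logger.setLevel(logging.INFO)
logger.addHandler(es_handler)

logger.info("Watcher initialized", extra={"component": "Watcher"})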

The agent approach is the one you've taken already, but you can use an existing agent instead of writing your own. Filebeat (https://www.elastic.co/beats/filebeat) will keep track of files and locations, and you can use Logstash (https://www.elastic.co/logstash) if you need transforms.

There's a good discussion comparing the two approaches with the author of python-elasticsearch-logger in cmanaha/python-elasticsearch-logger#44.

@yahgwai (Collaborator) commented May 25, 2020

You have a potential problem with overlapping fields of different types. Elasticsearch won't accept a field whose type differs from its mapping. So if you have two logs like:

logger.info(message, testy="testy")
logger.info(message, testy=10)

the latter would be rejected.

I haven't checked to see if there are any instances of this issue, just flagging it as a potential problem.

@sr-gi (Member) commented Jun 3, 2020

Starting to check this finally, sorry for taking so long :(

@sr-gi (Member) left a comment

I cannot give you a proper review since I'm not able to run it. It may be a misconfiguration on my end or something. Maybe we can chat about it.

I've left some minor comments on what I've been able to find so far.

Also, visualizations.py seems like a view. Couldn't it be a JSON file?

Finally, I think the best approach is for this to run as a module or plugin on top of teos, but with no actual connection to it, so the user can opt in by specifying some runtime params in teos (or the config file). Therefore I think what @yahgwai was suggesting would make more sense than just parsing the log file (my bad here, since I suggested this approach in the first place).

I'd focus on fixing the current approach, though, and we can update it later.

Resolved review threads: monitor/README.md (3), monitor/__init__.py (2)
conf_loader = ConfigLoader(DATA_DIR, CONF_FILE_NAME, DEFAULT_CONF, command_line_conf)
conf = conf_loader.build_config()

max_users = conf.get("DEFAULT_SLOTS")

Member

Not completely sure what max_users is used for, but you're using DEFAULT_SLOTS, which defines the default number of slots a user gets for a free subscription.

Collaborator Author

Ohhhhhhh, I see. I'm not sure how I reached this conclusion, but I was thinking that was the maximum number of users a watchtower supports. Now I see there's a maximum number of appointments, but not of users. I'll rework the graph I made with that bit of data accordingly.

Resolved review threads: monitor/monitor_start.py, monitor/sample-monitor.conf, monitor/visualizer.py, monitor/data_loader.py
@orbitalturtle (Collaborator Author)

Awesome, thanks for the feedback @yahgwai and @sr-gi, I'll look into this over the weekend.

@sr-gi self-requested a review June 16, 2020 14:26

@sr-gi (Member) left a comment

There's a conflict with the .gitignore version in master. Merge or rebase.

@sr-gi self-requested a review June 16, 2020 14:38

@sr-gi (Member) left a comment

Overall I think this creates a good base to build on for getting logs into Elasticsearch online. A structure that I think would make sense going in that direction is:

  • Changing start_monitor to monitord, and starting it from teosd as a standalone process if enabled (through the teos conf).
  • Sending logs from teosd to monitord. This can be done in multiple ways, but maybe the easiest one is through a queue (check multiprocessing and queues; see the sketch after this list).
  • Sending the data in a similar way to how you're doing it now, but pulling from the queue instead of from a file. Notice this may simplify some of the code, since you no longer need to care about where you left off in the file.
  • Having data persistence in mind (what to do if the process is stopped with pending data). This may be worth leaving as an enhancement, but it is good to keep in mind while designing the code.
  • Having a backup strategy. The server connection may be interrupted and the data needs to be stored locally until it can be sent again. Check the watchtower-plugin for an example of how to handle this (feel free to ping me about it too). As with the previous point, this may be worth leaving as an enhancement.
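
A minimal sketch of the queue-based handoff from the second bullet (names like monitord_loop and log_queue are hypothetical, not taken from this PR):

import json
import multiprocessing


def monitord_loop(log_queue):
    # Placeholder consumer: a real monitord would index each entry into
    # Elasticsearch instead of printing it.
    while True:
        entry = log_queue.get()
        if entry is None:  # sentinel used to shut the monitor down
            break
        print("would index:", json.dumps(entry))


if __name__ == "__main__":
    log_queue = multiprocessing.Queue()
    monitord = multiprocessing.Process(target=monitord_loop, args=(log_queue,))
    monitord.start()

    # teosd side: besides writing to the log file, put structured entries on the queue.
    log_queue.put({"message": "Watcher initialized", "component": "Watcher"})
    log_queue.put(None)  # shutdown
    monitord.join()

Persistence and backup (the last two bullets) would then live in monitord, around the log_queue.get() loop.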

Nice job so far!

PS:

  • Code needs proper formatting (black 120)
  • Looks like the code breaks if the URLs are set with http://

@@ -0,0 +1,34 @@
This is a system monitor for viewing available user slots, appointments, and other data related to Teos. Data is loaded and searched using Elasticsearch and visualized using Kibana to produce something like this:

Member

The README would need some work, but I'll leave the nits for the final version.

KIBANA_HOST = localhost
KIBANA_PORT = 9243

# CLOUD_ID = System_monitor:sdiohafdlkjfl

Member

All the cloud stuff should be removed.

ES_HOST = localhost
ES_PORT = 9200
KIBANA_HOST = localhost
KIBANA_PORT = 9243

Member

Change it to the default Kibana port (5601).

@@ -0,0 +1,14 @@
[Teos]
DATA_DIR = ~/.teos

Member

All fields in the config file must match the ones in __init__.py.

You may have some fields that are not included in the config file because you may not want them to be settable, but you should not have anything in the conf file that has no default.
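
For instance, assuming the monitor's __init__.py follows the same DEFAULT_CONF convention used with ConfigLoader elsewhere in teos (a dict of {"value", "type"} entries; the exact format and field list here are assumptions, not taken from this PR), each conf-file field would have a matching default:

# Hypothetical defaults mirroring sample-monitor.conf
DEFAULT_CONF = {
    "DATA_DIR": {"value": "~/.teos", "type": str},
    "ES_HOST": {"value": "localhost", "type": str},
    "ES_PORT": {"value": 9200, "type": int},
    "KIBANA_HOST": {"value": "localhost", "type": str},
    "KIBANA_PORT": {"value": 5601, "type": int},  # 5601 is Kibana's default port
}

That way every field in sample-monitor.conf has a default to fall back on.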

visualizer.create_dashboard()

if __name__ == "__main__":
command_line_conf = {}

Member

cmd params are missing. Take a look at teosd.py to see how to add them.
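
A possible sketch of wiring in cmd params (illustrative only; check teosd.py for the project's actual convention, and treat the flag names below as assumptions):

import argparse

parser = argparse.ArgumentParser(description="teos system monitor")
parser.add_argument("--eshost", dest="ES_HOST", type=str)
parser.add_argument("--esport", dest="ES_PORT", type=int)
parser.add_argument("--kibanahost", dest="KIBANA_HOST", type=str)
parser.add_argument("--kibanaport", dest="KIBANA_PORT", type=int)
args = parser.parse_args()

# Keep only the options the user actually set, so conf-file values and
# defaults still apply through ConfigLoader.
command_line_conf = {k: v for k, v in vars(args).items() if v is not None}

The resulting command_line_conf dict can then be passed to ConfigLoader as in the snippet above.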

results = self.index_client.delete(index)

# For testing purposes...
def search_logs(self, field, keyword, index):

Member

This does not seem to be used anymore. If the only purpose is testing, maybe it could be part of a local temp file or the testing suite.

return results


def get_all_logs(self):

Member

This method does not seem to be used anywhere.

self.kibana_endpoint = "http://{}:{}".format(kibana_host, kibana_port)
self.saved_obj_endpoint = "{}/api/saved_objects/".format(self.kibana_endpoint)
self.space_endpoint = "{}/api/spaces/space/".format(self.kibana_endpoint)
self.headers = headers = {

Member

There's a double variable assignment there

self.max_users = max_users

def create_dashboard(self):
index_pattern = None

Member

These initialisations do not seem necessary. A dict returns None if a value is not found when calling get.
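
For instance (values are made up):

data = {"dashboard": {"title": "teos"}}
index_pattern = data.get("index_pattern")  # None, no KeyError and no prior init needed

The same applies to index_id below.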

dashboard = data.get("dashboard")
image_url = data.get("imageUrl")

index_id = None

Member

Same here, it does not need init.

@sr-gi mentioned this pull request Jul 2, 2020