Using eLabFTW as a flexible data entry system #4857
din14970 started this conversation in Show and tell (2 comments, 5 replies)
- Well, it is! It's not normalized because everything is tucked up in the […]
- Hey @din14970 Would you be interested in presenting your system (10 minutes) at the next Community Meeting?
Context & problem
I am sharing this because I think it will be recognizable for many research groups collecting lab data. Perhaps it can serve as an inspiration.
I currently work at an applied research institute, helping research teams with various data-related topics. A group in our chemistry department works on lignin depolymerization and has problems with data management. Currently they record all their data across multiple Excel files on SharePoint. This has the following issues:
So essentially, a database is being maintained that cannot be used as a database: it cannot be searched or queried, the data cannot be analyzed, and the quality of any given entry cannot be verified.
Solution requirements
The implemented solution, part 1
As I had prior experience using eLabFTW, and had similar issues when I was working as a researcher myself, I proposed we could pilot eLabFTW for their case. Since their data was quite structured I proposed to make heavy use of the recent metadata fields feature. We approached the problem as follows:
Sourced lignins come in from external parties or suppliers. These are then depolymerized in various ways, yielding a depolymerized lignin. There is a many-to-one relationship from depolymerized lignin to sourced lignin, which is why it is split out as a separate item. At the moment we include the depolymerization conditions directly in the depolymerized-lignin template, but one can imagine that if depolymerization becomes very complex, with multiple different protocols, it may need to be split into a separate template. Characterizations are split out into experiment templates, although we may move them to resource templates in the future (since we discovered that experiments do not track which templates they originate from, and experiment templates are generally not admin-controlled).
When filled in, these templates render as structured forms in view mode and in edit mode (screenshots omitted).
At this stage the researcher already has an environment for recording new data. Notice that we use the newest feature: linking to other entities directly from the metadata.
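As a concrete illustration, a resource template's metadata might look like the sketch below. The field names, options, and positions are invented for this example; the shape follows eLabFTW's `extra_fields` metadata convention as I understand it, with an `items`-type field providing the link to another resource (here, the sourced lignin a depolymerized lignin was made from):

```python
import json

# Hypothetical metadata for a "Depolymerized lignin" resource template.
# All field names are illustrative, not the group's real schema.
# The "items" type renders as a link to another eLabFTW resource.
DEPOLYMERIZED_LIGNIN_METADATA = {
    "extra_fields": {
        "Source lignin": {
            "type": "items",   # link to another resource item
            "value": "",       # filled in with the linked item's id
            "position": 1,
        },
        "Depolymerization method": {
            "type": "select",
            "options": ["hydrogenolysis", "oxidation", "acidolysis"],
            "value": "",
            "position": 2,
        },
        "Reaction temperature (C)": {
            "type": "number",
            "value": "",
            "position": 3,
        },
    }
}

def metadata_json() -> str:
    """Serialize the template metadata as a JSON string, which is the
    form eLabFTW stores in the template's metadata field."""
    return json.dumps(DEPOLYMERIZED_LIGNIN_METADATA)
```

The resulting JSON string can be pasted into the template's metadata editor or pushed via the API.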
Ultimately we end up with a system that behaves conceptually like this (diagram omitted):
What we achieve with this:
What we cannot yet achieve with this:
The implemented solution part 2
To address the requirements related to data analysis, there are several possible solutions. These are not necessarily eLabFTW-specific, but I thought I would share them anyway to show what we have done and to serve as inspiration.
The simplest approach would be to write a script that extracts and recombines all the data into tables, which are downloaded and stored locally in some tabular form. These could then be analyzed or queried with a tool of choice (Excel, Python, matplotlib, R, pandas, DuckDB, ...). The main downsides to this approach:
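For illustration, a minimal version of such an extraction script could look like the sketch below. The instance URL, API key, and category id are placeholders, and the endpoint and query-parameter shapes follow eLabFTW's v2 REST API as I understand it; the flattening of metadata extra fields into columns is the part that carries over to any variant of this approach:

```python
import csv
import json
import urllib.request

ELAB_URL = "https://elab.example.org/api/v2"  # assumption: your instance
API_KEY = "..."                               # assumption: a read-only API key

def flatten_item(item: dict) -> dict:
    """Turn one eLabFTW item into a flat row: id, title, plus every
    metadata extra field as its own column."""
    row = {"id": item.get("id"), "title": item.get("title")}
    extra = json.loads(item.get("metadata") or "{}").get("extra_fields", {})
    for name, field in extra.items():
        row[name] = field.get("value")
    return row

def fetch_items(category_id: int) -> list[dict]:
    """Download all resources of one category and flatten them."""
    req = urllib.request.Request(
        f"{ELAB_URL}/items?cat={category_id}&limit=9999",
        headers={"Authorization": API_KEY},
    )
    with urllib.request.urlopen(req) as resp:
        return [flatten_item(i) for i in json.load(resp)]

if __name__ == "__main__":
    # e.g. dump the (hypothetical) "Sourced lignin" category to CSV
    rows = fetch_items(category_id=1)
    with open("sourced_lignin.csv", "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=rows[0].keys())
        writer.writeheader()
        writer.writerows(rows)
```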
We opted for another approach. At our research institute we are also implementing an AWS-based research data platform that supports building automated data flows and pipelines. I used this system to extract and transform the eLabFTW data into tables whose data lives on AWS S3 and whose schemas are registered in AWS Glue databases. These tables can then be queried directly, e.g. with SQL in AWS Athena, or using tools like awswrangler (for a pandas interface).
The pipeline is written in Python, containerized, and deployed on Airflow; at the moment it runs once per day, at night.
I map each item back to a row, so each entity type becomes a table. Links are parsed so that they refer to row numbers in another table (enabling joins). The end result is a set of data artifacts on S3.
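The link-to-row-number translation described above can be sketched as a pure function. Table and column names below are hypothetical, and the input is assumed to be per-type lists of flattened items that still carry eLabFTW item ids in their link columns:

```python
def build_tables(items_by_type: dict[str, list[dict]],
                 link_columns: dict[str, str]) -> dict[str, list[dict]]:
    """Turn per-type item lists into join-ready tables.

    items_by_type: e.g. {"sourced_lignin": [...], "depolymerized_lignin": [...]}
        where each item dict has an eLabFTW "id" plus metadata columns.
    link_columns: maps a link column to the table it points into,
        e.g. {"source_lignin_id": "sourced_lignin"}.
    Link values (eLabFTW item ids) are rewritten to row numbers in the
    target table, so the tables can later be joined on row index.
    """
    # First pass: remember which row each eLabFTW id ends up on.
    row_of_id = {
        table: {item["id"]: row for row, item in enumerate(items)}
        for table, items in items_by_type.items()
    }
    # Second pass: copy rows, translating link columns to row numbers.
    tables: dict[str, list[dict]] = {}
    for table, items in items_by_type.items():
        rows = []
        for item in items:
            row = dict(item)
            for col, target in link_columns.items():
                if col in row and row[col] is not None:
                    row[col] = row_of_id[target].get(row[col])
            rows.append(row)
        tables[table] = rows
    return tables
```

Each resulting table is then written out (e.g. as Parquet) to S3 and its schema registered in Glue.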
Tables can be queried directly in Athena (screenshot omitted).
This now allows us to write custom queries to answer arbitrary questions. I don't have a screenshot of a notebook, but we have also shown that awswrangler lets us read the data as pandas tables, which we can then process and analyze with plots.
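As a sketch of what such a custom query might look like: the table and column names below are invented for illustration (not the group's real schema), while `awswrangler`'s `athena.read_sql_query` is the actual entry point for pulling the result back as a pandas DataFrame:

```python
def yield_by_method_query(database: str = "lignin_lab") -> str:
    """A hypothetical join across the extracted tables: average product
    yield per depolymerization method. The database, table, and column
    names are placeholders."""
    return f"""
        SELECT d.method, AVG(c.yield_pct) AS avg_yield
        FROM {database}.depolymerized_lignin d
        JOIN {database}.characterization c
          ON c.depolymerized_lignin_row = d.row_nr
        GROUP BY d.method
        ORDER BY avg_yield DESC
    """

if __name__ == "__main__":
    # Requires AWS credentials and the awswrangler package.
    import awswrangler as wr
    df = wr.athena.read_sql_query(yield_by_method_query(), database="lignin_lab")
    print(df.head())
```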
Finally, we are also experimenting with a data catalogue called DataHub, which should make these kinds of datasets searchable across the entire organization. It would also allow registering more metadata about the tables themselves (e.g. descriptions on columns).
DataHub also has a data lineage feature, which can show which tasks produced which datasets, which is pretty cool.
So, as a summary, the system now looks like this (diagram omitted).
Summary
We are testing eLabFTW as a flexible data-entry system that allows us to quickly implement forms for recording high-quality lab data. Through the API we can easily manage the templates, and even the items, and perform migrations as necessary. We can also implement automation scripts and tools. If we had to implement a custom web application for each research group, it would take longer than any research project; eLabFTW enabled us to build this entire project in about three months (not a full-time effort, with more than one month spent on migrating the Excel files).
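As an example of the kind of migration the API makes possible, here is a hypothetical sketch that renames a metadata field across items. The URL, API key, and field names are placeholders, and it assumes the v2 API's `PATCH /items/{id}` accepts an updated `metadata` string; the pure rename step is the part that generalizes:

```python
import json
import urllib.request

ELAB_URL = "https://elab.example.org/api/v2"  # assumption: your instance
API_KEY = "..."                               # assumption: a write API key

def rename_field(metadata_json_str: str, old: str, new: str) -> str:
    """One migration step: rename a metadata extra field on an item,
    keeping its value and settings intact."""
    md = json.loads(metadata_json_str or "{}")
    fields = md.get("extra_fields", {})
    if old in fields:
        fields[new] = fields.pop(old)
    return json.dumps(md)

def patch_item(item_id: int, new_metadata: str) -> None:
    """Push one item's updated metadata back via the v2 API."""
    req = urllib.request.Request(
        f"{ELAB_URL}/items/{item_id}",
        data=json.dumps({"metadata": new_metadata}).encode(),
        headers={"Authorization": API_KEY,
                 "Content-Type": "application/json"},
        method="PATCH",
    )
    urllib.request.urlopen(req).close()
```

In practice such a script would loop over all items of the affected template, applying `rename_field` and patching each one.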
We also employ our AWS data-analysis platform to make the same data available in a format that can easily be queried or analyzed from various programming languages. The researcher only has to care about entering data in eLab; they, or others, can then easily extract insights from that data later.
It is still early days, both for eLabFTW in this group and for our data platform, but we are excited to develop the system further, bring it to more teams, and see where it goes.