Support API based import/export for pageviews as JSON #2653
Comments
I am not much of a TypeScript user, and I've never touched ClickHouse, but I'm happy to contribute towards this effort. As I mentioned, it's a blocker for me and I'd be happy to avoid having to start over 😄
Are you referring to self-hosting or cloud? For the cloud product there is a guide on how to import data by providing a CSV file: https://umami.is/docs/cloud/import-data. This is a cloud-only feature because it's made specifically for our infrastructure. For self-hosting there is no such feature, because 1) it's database specific and 2) you should have full access to your database and access to tools like
I'm talking about self-hosting, but I'm not totally sure I understand why it matters. Maybe I didn't express my intent properly, so I'll try to clarify.
I'm not sure I understand this. How is it any more database specific than calling

In my specific case, I pulled my events out of Mixpanel and I have them all in JSON; no database involved. This request is more about user-friendly APIs: things like indexing multiple views in one call, etc. This is practically no different whether it's cloud hosted or self-hosted, unless I'm missing something.

Edit: to clarify, I'm not asking Umami to accept the Mixpanel JSON format I have; it would be up to me to translate it to the Umami format in advance of passing it to the API.

Lots of people have asked about import/export in various issues and discussions, and the answer appears to be that it's not being added. Adding APIs that can be used this way allows the community to do the work of building importers from other analytics services, so that the Umami team doesn't have to.
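To illustrate the kind of translation step being described, here is a minimal sketch of mapping a Mixpanel export record to an Umami-style pageview. The field names on both sides are hypothetical, chosen only for illustration; they are not Umami's actual schema.

```typescript
// Hypothetical Mixpanel export record shape (assumption, not a real spec).
interface MixpanelEvent {
  event: string;
  properties: {
    time: number;           // unix seconds
    $current_url?: string;
    $browser?: string;
    $os?: string;
  };
}

// Hypothetical Umami-style pageview shape (assumption, not Umami's schema).
interface UmamiPageview {
  url: string;
  createdAt: Date;
  browser?: string;
  os?: string;
}

// Translate one record: extract the path, convert the timestamp,
// and normalize the browser name to lowercase.
function toUmamiPageview(ev: MixpanelEvent): UmamiPageview {
  return {
    url: ev.properties.$current_url
      ? new URL(ev.properties.$current_url).pathname
      : "/",
    createdAt: new Date(ev.properties.time * 1000),
    browser: ev.properties.$browser?.toLowerCase(),
    os: ev.properties.$os,
  };
}
```

The point is that each importer only needs a mapping function like this; the validation and insertion would be handled once, by a shared API.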
Well, sure, if you also happen to know the schema of Umami's tables. I (and anyone else who ever wants this) could sit here and sift through Prisma definitions to figure out how to map data from other services into Umami, but this is definitely a) not productive and b) a HUGE turn-off.

The same is true for export; clearly I can log into the database and try to figure out how to stitch all the necessary tables together, but I feel like it's pretty standard to offer this via an API. It's extremely common that people want to pull things out of their main analytics storage for things like scripting and more complex evaluation.
I think I get what you're saying. You want to be able to use the API to import data so that it uses the same internal logic as send. I think we can support something like that. But session data will be tricky, because your session id is calculated on the fly at time of creation. I'm sure we can figure something out, though. So, there are really only two tables that you would care about for import/export, and that's

There is already an existing
Yeah, pretty much hit the nail on the head. I understand there's probably a lot of work here, and I know it'll take some time to get to this state.

As a thought exercise based on your initial response, I actually did this over the weekend because I wanted to use Umami for my personal projects. It took me approximately 6 hours to write something to map my data across to Umami's format (converting my Mixpanel events to both pageview events and session models). Most of this was spent looking at how to duplicate Umami's logic (things like normalizing browser, OS, that type of stuff).

To import, I ditched SQL and instead loaded Prisma from the Umami directory and wrote an import using that. This took another couple of hours, but worked. However, I did hit a case where I had formatted something incorrectly, and of course it wasn't easy to revert without going into SQL. Obviously not an Umami issue, just a note for anyone who tries the same approach.

My hope is that these new endpoints will eventually accept the Umami models, and use whatever validation is necessary to determine the inputs are correct (e.g. things like OS should be "Windows 10" or "Windows 11" but not "Windows"). Then it would be possible to write a generic

I agree the session stuff is a bit awkward, but a lot of the other providers you'd be importing from have their own session identification. In the case you don't have an existing session and/or device identifier, the path I took was to simply hash the event identifier and generate a 1:1 session. I know this is far from ideal, but there's really nothing else you can do in this case (unless

It's currently already pretty close, it's just that
Yes, although I also wanted to put a

Sorry for the text dump, just wanted to share from my migration this weekend 😅.
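The hashing workaround mentioned above (no source session id, so each event gets a 1:1 pseudo-session) could be sketched roughly like this. The function name and salt are made up for illustration; Umami's real session-id derivation is different and computed server-side.

```typescript
import { createHash } from "node:crypto";

// Derive a stable, UUID-shaped pseudo-session id from a per-event
// identifier. The same eventId always yields the same id, giving a
// deterministic 1:1 event-to-session mapping for imported data.
function pseudoSessionId(eventId: string, salt = "umami-import"): string {
  const hex = createHash("sha256").update(`${salt}:${eventId}`).digest("hex");
  // Format the first 32 hex characters as a UUID-shaped string (8-4-4-4-12).
  return [
    hex.slice(0, 8),
    hex.slice(8, 12),
    hex.slice(12, 16),
    hex.slice(16, 20),
    hex.slice(20, 32),
  ].join("-");
}
```

This is only a stopgap for sources with no session concept at all; where the source has its own session or device identifier, hashing that instead would at least group a visitor's events together.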
Instead of this API, I think it's better to just document the table structure and maybe how to use pg_dump and pg_restore. If you're self-hosting, you have direct access to the database.

Edit: if you have a CSV or whatever, a script is really simple to make. I have my own here: https://github.com/boehs/site/blob/master/utils/cloudflare-analytics-to-umami-lord-save-me.ts
@whitfin did you document this process in more detail anywhere? I have some data I'd like to import to Umami, and I'm wondering how hard it would be.
@boehs the issue with that is the assumption that everyone who wants analytics for their blog (or whatever) knows and wants to interact with a database. It's much more likely you break something doing it directly in the database instead of through the Umami API (which would have model validation, etc). Plus, as you point out, anyone can write scripts to move their stuff around (like both me and you did), but the advantage of an API is that someone somewhere could make a
I think this is an assumption that doesn't necessarily hold up. Not every user will be a single person hosting a blog on a $5 server.
@githubyouser, I didn't, although I could consider doing so. The script from @boehs looks like a decent starting point, though!
Thanks @whitfin. I took a look at @boehs's script, but I've never used TypeScript before, so I'm not quite sure how to get started.

Basically, I want to import just the bare minimum of data (pageviews, etc.) from an old Matomo installation for several fairly low-traffic sites. My Matomo install is getting too expensive to maintain, and I'm starting fresh with a lite installation of Umami, which seems to be a lot less resource-hungry.

I've been experimenting with Umami for a few months now, and I was actually able to migrate a website from one test Umami installation to another (following this guide), so that was a win, but moving from a different analytics install is a lot bigger of a challenge. :(
I am attempting to migrate to Umami from Mixpanel, due to Mixpanel's new data retention rules. Everything is set up and went fairly smoothly, so for my next step I wanted to import my data from Mixpanel into Umami.
After looking through documentation, discussion, code, etc., it appears that this is a weakness of Umami and something I believe we can improve.
I propose adding (at least) the following routes:
Although it's possible to index pageview events via /send by just not providing eventName, this isn't really intuitive for the API itself (it's fine for script.js), so I propose splitting them here. They would use the same underlying logic, of course.

There is also no way at all to access raw documents via the API, which isn't ideal for anyone trying to export their data to process in any other form. There are some tools internally that would do the queries required, but ideally we'd probably have pagination here.
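For context, the "/send without an event name" behavior described above looks roughly like this. The payload shape below is an assumption based on how Umami's tracker posts data at the time of writing, and WEBSITE_ID, the hostname, and the Umami host are all placeholders.

```typescript
// Body shape assumed for POST /api/send (a sketch, not a confirmed spec).
interface SendBody {
  type: "event";
  payload: {
    website: string;   // website id from the Umami dashboard
    hostname: string;
    url: string;
    referrer?: string;
    // no `name` field => treated as a pageview, not a custom event
  };
}

// Build a pageview body: omitting `name` is what distinguishes
// a pageview from a named custom event.
function buildPageview(websiteId: string, hostname: string, url: string): SendBody {
  return { type: "event", payload: { website: websiteId, hostname, url } };
}

// Fire a single pageview at a self-hosted Umami instance.
async function sendPageview(host: string, body: SendBody): Promise<void> {
  await fetch(`${host}/api/send`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(body),
  });
}
```

The awkwardness the proposal targets is visible here: registering a pageview means sending an "event" and deliberately leaving a field out, which is fine for the tracker script but unintuitive for a public import API.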
So I guess my question is... is this on the roadmap? If it were to be contributed, would it be accepted? It's quite a blocker on my side; essentially I cannot use Umami unless I can bring my historical data with me.