Wappalyzer technologies table not showing expected results for Ecommerce #25

Open
rockeynebhwani opened this issue Aug 16, 2021 · 6 comments

Comments

@rockeynebhwani

Take this site as an example: https://www.maxtondesign.co.uk/. It uses the OpenCart ecommerce platform according to both the Wappalyzer Chrome extension and BuiltWith (https://builtwith.com/detailed/maxtondesign.co.uk). BuiltWith shows OpenCart in use since August 2018.

When we query the technologies table for this site, the Ecommerce category does not appear at all. This skews the stats for the Ecommerce chapter.

SELECT
  *
FROM
  `httparchive.technologies.2021_*`
WHERE
  url = 'https://www.maxtondesign.co.uk/' AND
  category = 'Ecommerce'

@pmeenan - Can we please check whether the Wappalyzer integration is working as expected?

@rviscomi transferred this issue from HTTPArchive/almanac.httparchive.org Aug 16, 2021
@rockeynebhwani
Author

@pmeenan - We are also seeing junk values in the technologies table in some cases.

image

@pmeenan
Member

pmeenan commented Aug 17, 2021

Looks like OpenCart detection relies on cookies. Looking now to see whether those are plumbed through and how to add them if they aren't.
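
For readers unfamiliar with cookie-based detection, here is a minimal Python sketch of the idea, not the actual WebPageTest or Wappalyzer code. The rule format, the `detect_by_cookies` helper, and the `OCSESSID` cookie name are assumptions made for illustration; the real OpenCart rule may differ.

```python
import re
from typing import Dict, List

# Hypothetical rule set in the spirit of cookie patterns:
# technology name -> {cookie name: regex the cookie value must match}.
# An empty pattern means "the cookie only has to be present".
# The OCSESSID entry is an assumption for illustration only.
COOKIE_RULES: Dict[str, Dict[str, str]] = {
    "OpenCart": {"OCSESSID": ""},
}


def detect_by_cookies(cookies: Dict[str, str],
                      rules: Dict[str, Dict[str, str]] = COOKIE_RULES) -> List[str]:
    """Return the technologies whose cookie patterns match the given cookies."""
    detected = []
    for tech, patterns in rules.items():
        for name, pattern in patterns.items():
            value = cookies.get(name)
            if value is None:
                continue  # this cookie was not set by the page
            if pattern == "" or re.search(pattern, value):
                detected.append(tech)
                break  # one matching cookie is enough for this technology
    return detected
```

If the crawler never passes the page's cookies to the detection step, `cookies` is effectively empty and a cookie-only technology can never be reported, which would match the missing Ecommerce rows described above.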

@pmeenan
Member

pmeenan commented Aug 17, 2021

Created a PR for WebPageTest. The fix should be in the September crawl, but it's too late for August, which is just wrapping up.

@tunetheweb
Member

Thanks @pmeenan !

@rockeynebhwani / @rviscomi I think the second issue is a hangover from when Wappalyzer was broken, which affected crawls at the beginning of the year (I think it was tracked in HTTPArchive/almanac.httparchive.org#1843).

When I run this:

SELECT
  _TABLE_SUFFIX AS run,
  COUNT(0) AS total
FROM
  `httparchive.technologies.*`
WHERE
  _TABLE_SUFFIX > '2021' AND
  app LIKE '%function%'
GROUP BY
  _TABLE_SUFFIX
ORDER BY
  run

We don't see any recent issues:

image

@rviscomi, you should probably clean up your Web Technologies Report to filter out these "apps" so they don't show in the dropdown.

@rockeynebhwani
Author

@pmeenan / @tunetheweb - I am still seeing some inconsistencies in the July table when I compare the results with the Wappalyzer extension. Example: https://ezplaytoys.com/

I ran this query and got only 7 results:

SELECT
  *
FROM
  `httparchive.technologies.2021_07_01_mobile`
WHERE
  url = 'https://ezplaytoys.com/'

The same query for desktop gives 21 results. I don't expect such a large difference between desktop and mobile for this site.
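
To see exactly which detections differ, the two result sets can be compared once exported. This is a rough Python sketch, assuming each row is a dict with the `category` and `app` columns used in the queries in this thread; `missing_on_mobile` is a hypothetical helper name.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]  # one exported row of a technologies table


def missing_on_mobile(desktop: Iterable[Row],
                      mobile: Iterable[Row]) -> List[Tuple[str, str]]:
    """Return (category, app) pairs detected on desktop but absent on mobile."""
    desktop_pairs = {(r["category"], r["app"]) for r in desktop}
    mobile_pairs = {(r["category"], r["app"]) for r in mobile}
    # Set difference keeps only the desktop-only detections; sort for stable output.
    return sorted(desktop_pairs - mobile_pairs)
```

Passing the desktop rows and the mobile rows for the same URL lists the technologies that only the desktop crawl picked up.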

@rviscomi
Member

I cleaned up a bunch of obviously bad technology names in the dashboard table (httparchive.core_web_vitals.technologies): for example, names containing only spaces, dots, and/or numbers, and source code fragments like function or this.

image
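
A filter along these lines could be sketched in Python as follows. This is not the cleanup that was actually run; the `is_junk_app_name` helper and both regular expressions are assumptions based only on the description above, and the source-code check in particular may need tuning before it is trusted on real data.

```python
import re

# Names made up only of whitespace, dots, and digits (including the empty string).
_ONLY_SPACES_DOTS_DIGITS = re.compile(r"^[\s.\d]*$")
# Fragments that look like leaked JavaScript source, e.g. "function" or "this.".
_LOOKS_LIKE_SOURCE = re.compile(r"\bfunction\b|\bthis\.")


def is_junk_app_name(name: str) -> bool:
    """Return True if the technology name looks like junk rather than a real app."""
    return bool(
        _ONLY_SPACES_DOTS_DIGITS.match(name)
        or _LOOKS_LIKE_SOURCE.search(name)
    )
```

Valid app names with a version number appended are deliberately left alone by this check, since it only rejects names with no letters at all or names that resemble source code.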

There are still some odd combinations of valid app names with version numbers appended. I won't touch those since there are lots of them and some may be useful.

image

This doesn't prevent new datasets from adding more junk to the dashboard, so we may need to clean it up again in the future or put better checks in place in HA or upstream in WPT.

@rviscomi transferred this issue from HTTPArchive/httparchive.org Jun 30, 2022
4 participants