DataAnalyzer.app

DataAnalyzer understands and converts any JSON (or CSV) data into type-aware code for any language!

A passion project by Dan Levy ✨

Introduction

If you consume popular APIs or utilize Component libraries, you've probably had the tedious task of re-implementing data structures you (hopefully) found in documentation.

What happens in the all-too-common case when the docs are wrong, outdated or missing?

With internal or private APIs the situation is generally worse.

When facing unreliable docs, often all you can count on is the actual HTTP response data.

Solution

DataAnalyzer.app to the rescue!

It can ingest raw data and generate intelligent type-aware code.

The schema-analyzer uses a highly extensible adapter/template pattern which can accommodate almost any kind of output. (e.g. SQL, ORM, GraphQL, Classes/Interfaces, Swagger JSON, JSON Schema to Protocol Buffers, and much more.)
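As a rough illustration of the adapter idea, here is a minimal sketch in TypeScript. The `FieldSummary` and `OutputAdapter` names, and both adapters, are hypothetical and invented for this example; they are not the schema-analyzer's actual API. The point is only that one inferred schema can feed many interchangeable writers.

```typescript
// Hypothetical shapes for illustration; not the schema-analyzer's real API.
interface FieldSummary {
  name: string;
  type: "String" | "Number" | "Boolean" | "Date";
  nullable: boolean;
}

// Every output format implements the same small interface.
interface OutputAdapter {
  render(typeName: string, fields: FieldSummary[]): string;
}

const typescriptAdapter: OutputAdapter = {
  render(typeName, fields) {
    const tsTypes: Record<FieldSummary["type"], string> = {
      String: "string",
      Number: "number",
      Boolean: "boolean",
      Date: "Date",
    };
    // Nullable fields become optional properties.
    const lines = fields.map(
      (f) => `  ${f.name}${f.nullable ? "?" : ""}: ${tsTypes[f.type]};`
    );
    return `interface ${typeName} {\n${lines.join("\n")}\n}`;
  },
};

const sqlAdapter: OutputAdapter = {
  render(typeName, fields) {
    const sqlTypes: Record<FieldSummary["type"], string> = {
      String: "TEXT",
      Number: "NUMERIC",
      Boolean: "BOOLEAN",
      Date: "TIMESTAMP",
    };
    // Non-nullable fields get a NOT NULL constraint.
    const cols = fields.map(
      (f) => `  ${f.name} ${sqlTypes[f.type]}${f.nullable ? "" : " NOT NULL"}`
    );
    return `CREATE TABLE ${typeName} (\n${cols.join(",\n")}\n);`;
  },
};
```

Under this sketch, adding a new output format means writing one more object with a `render` method; the analysis step stays untouched.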

Contributors: View todo & idea list

Issues/Requests/PRs welcome! 💜💙💚💛🧡♥️

"Wait, there's more!"

DataAnalyzer has 3 Powerful Features to Explore:

1. Analyze column type & size stats from any JSON/CSV!

2. Generate auto-typed code & database interfaces, instantly!

3. Visualize results, explore & understand your data structure!

Features

The primary goal is to support any input JSON/CSV and infer as much as possible. More data will generally yield better results.

Completed

Note: CSV files must include column names.

TODO

Move the demo button to a hover menu over the Data Input Panel (with a Clear button.)
Change the OutputButtons to use a scrolling grid of icons.
Convert CSS-in-JS to use Linaria.

Bugs by area/function

Library.TypeMatcher('Timestamp'): Aggregate calculations fail with only 1 match. Should fall back to the value.
SQL.writer: Use actual nested table ID Column in FOREIGN KEY.
SQL.writer: Null/nullable fields emit correctly.

Better code generator support

Render output using handlebars templates.
Support multiple output files.
Use AI to name subtypes based on their column names (when assigning the fully qualified path or de-duplicating types).
Add fuzzy matching of types if fields meet similarity threshold.

WHEN
  Completed TypeSummary processing.
  And SubType Shapes (column names) have `>= X%` similar columns.
GIVEN
  Nested type shapes:
    'latitude|longitude'
    'latitude|longitude|title'
    'latitude|longitude|url'
THEN
  1. Return an adjusted type with combined fields
    `latitude|longitude|title*|url*`
  2. Determine a new suggested name.
  3. Apply the Rename & Update fields with unified/composite type.
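Step 1 of the spec above could be sketched roughly as follows. This is not implemented in the project; `mergeSimilarShapes` is a hypothetical helper, and the similarity measure (the share of each shape's columns that appear in every shape) is an assumption, since the spec only says `>= X%` similar columns. Naming (step 2) and applying the rename (step 3) are left out.

```typescript
// Hypothetical helper: merges pipe-delimited shapes when each shape is
// "similar enough" to the columns shared by all of them.
// Assumption: similarity = (columns common to all shapes) / (columns in this shape).
function mergeSimilarShapes(shapes: string[], threshold: number): string | null {
  if (shapes.length === 0) return null;
  const columnLists = shapes.map((shape) => shape.split("|"));

  // Columns that appear in every shape.
  const common = columnLists[0].filter((col) =>
    columnLists.every((cols) => cols.some((c) => c === col))
  );

  const similarEnough = columnLists.every(
    (cols) => common.length / cols.length >= threshold
  );
  if (!similarEnough) return null;

  // Columns missing from at least one shape become optional (marked with *).
  const optional: string[] = [];
  for (const cols of columnLists) {
    for (const col of cols) {
      const isCommon = common.some((c) => c === col);
      const alreadySeen = optional.some((c) => c === col);
      if (!isCommon && !alreadySeen) optional.push(col);
    }
  }
  return common.concat(optional.sort().map((col) => `${col}*`)).join("|");
}

const combined = mergeSimilarShapes(
  ["latitude|longitude", "latitude|longitude|title", "latitude|longitude|url"],
  0.6
);
console.log(combined); // latitude|longitude|title*|url*
```

A stricter or looser measure (for example, pairwise overlap) would change which shapes get merged, so the threshold should be tuned against real data.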

Type inference & detection

Range option for precise Timestamp detection.
Option to visit Hypermedia URLs to discover nested types?
Custom type matchers/regex patterns.
De-duplicate similar shaped objects (example below)

type PokemonGame struct {
    Name string
    Url string
}

type PokemonMove struct {
    Name string
    Url string
}

Becomes the generic (possibly prefixed struct):

type NameUrl struct {
    Name string
    Url string
}
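One possible way to approach this de-duplication, sketched in TypeScript: group types by a key built from their sorted field names and types, then collapse each group with more than one member into a single type named after its fields. `TypeShape` and `dedupeByShape` are hypothetical names for this example, and the naming rule (concatenating field names) is an assumption based on the `NameUrl` result above.

```typescript
// Hypothetical representation of an inferred type: field name -> field type.
interface TypeShape {
  name: string;
  fields: Record<string, string>;
}

interface DedupeResult {
  merged: TypeShape[];
  renames: Record<string, string>; // old type name -> shared type name
}

function dedupeByShape(types: TypeShape[]): DedupeResult {
  // Group types whose field names and field types are identical.
  const groups: Record<string, TypeShape[]> = {};
  for (const t of types) {
    const key = Object.keys(t.fields)
      .sort()
      .map((f) => `${f}:${t.fields[f]}`)
      .join("|");
    (groups[key] = groups[key] || []).push(t);
  }

  const merged: TypeShape[] = [];
  const renames: Record<string, string> = {};
  for (const key of Object.keys(groups)) {
    const group = groups[key];
    if (group.length === 1) {
      merged.push(group[0]); // unique shape: keep as-is
      continue;
    }
    // Assumed naming rule: concatenate the sorted field names.
    const name = Object.keys(group[0].fields).sort().join("");
    merged.push({ name, fields: group[0].fields });
    for (const t of group) renames[t.name] = name;
  }
  return { merged, renames };
}
```

With `PokemonGame` and `PokemonMove` as input, both map to `NameUrl` in `renames`, and any type with a unique shape passes through unchanged.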

Web App Interface

Migrate leftover Bootstrap utility classes to Material.
Add a "Schema Editor" table-like view to tune & view the results.
Fix options & overall menu
Add App Bar for config, or use router, modal - anything to get away from z-index BS.
Complete Web Worker for Background Processing.
Add confirmation for processing lots of data. (Rows and raw MB limit?)
Set up Plausible analytics.

Code Writers

Add TypeScript+Mongoose Support (Possibly write all templates in TypeScript first, using tsc to emit JS code as needed?)
SQL CREATE TABLE
Add Zod support (like Yup or Joi)
JSON Schemas (for libraries like ajv)
Swagger yaml Reader/Writer
Binary Encoders (protocol buffers, thrift, avro)
Java Persistence API
Rails Models

Project Goals

The primary goal is to support any input JSON/CSV and infer as much as possible. More data will generally yield better results.

Output Support

Tips & Notes

For enum detection, adjust the relevant thresholds if you know (approximately) the expected number of unique enum values. For more accurate results, provide a randomized sample of 100+ rows. Accuracy increases (and speed decreases) greatly with 1,000+ rows.

Enumeration detection.
- You can set a required minimum row count (default: 100 rows).
- The enum limit sets the maximum number of unique values allowed.
  - For example, with a limit of 10 enum items:
  - Only fields with a uniqueCount <= 10 will 'match' as enumerations and include an enum property.
Not Null detection.
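The enumeration rule described above can be sketched as a small function. This is an illustration only: `detectEnum`, `EnumOptions`, and the option names are hypothetical and do not reflect the schema-analyzer's real option names. The default of 100 rows comes from the notes above, and the limit of 10 comes from the example.

```typescript
// Hypothetical option names; the real library's options may differ.
interface EnumOptions {
  minRowCount: number; // rows required before enum detection applies
  maxUniqueValues: number; // the "enum limit"
}

// Returns the sorted unique values if the column qualifies as an enum, else null.
function detectEnum(
  values: string[],
  options: EnumOptions = { minRowCount: 100, maxUniqueValues: 10 }
): string[] | null {
  // Too few rows to trust the unique count.
  if (values.length < options.minRowCount) return null;

  // Keep the first occurrence of each value.
  const unique = values.filter((v, i) => values.indexOf(v) === i);

  return unique.length <= options.maxUniqueValues ? unique.sort() : null;
}
```

A column of 120 rows cycling through three colors would qualify under the defaults, while the same three values across only 50 rows would not, because the row count falls below the minimum.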

For more info on the Schema Analyzer (core library) powering the DataAnalyzer.app, check out the schema-analyzer docs!

Included Type Matchers

Some of these (Email) are aliases of a base type (String). See code for more details on structure/relationship.

Unknown
ObjectId
UUID
Boolean
Date
Timestamp
Currency
Float
Number
BigNumber
Email
String
Array
Object
Null


elite-libs/DataAnalyzer.app
