hqgourl

A Go (Golang) package for extracting, parsing, and manipulating URLs.

Features

  • Flexible URL extraction from text using regular expressions.
  • Domain parsing into subdomains, root domains, and TLDs.
  • Extends the standard net/url URL parsing with additional fields.

Installation

go get -v -u github.com/hueristiq/hqgourl

Usage

URL Extraction

package main

import (
    "fmt"

    "github.com/hueristiq/hqgourl"
)

func main() {
    extractor := hqgourl.NewURLExtractor()
    text := "Check out this website: https://example.com and send an email to info@example.com."
    
    regex := extractor.CompileRegex()
    matches := regex.FindAllString(text, -1)
    
    fmt.Println("Found URLs:", matches)
}

The URLExtractor allows customization of the URL extraction process through various options. For instance, you can specify whether to include URL schemes and hosts in the extraction and provide custom regex patterns for these components.

  • Extracting URLs with Specific Schemes

    extractor := hqgourl.NewURLExtractor(
        hqgourl.URLExtractorWithSchemePattern(`(?:https?|ftp)://`),
    )

    This configuration will extract only URLs starting with http, https, or ftp schemes.

  • Extracting URLs with Custom Host Patterns

    extractor := hqgourl.NewURLExtractor(
        hqgourl.URLExtractorWithHostPattern(`(?:www\.)?example\.com`),
    )

    This setup will extract URLs that have hosts matching www.example.com or example.com.
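
To make the effect of the two options above more concrete, here is a minimal standard-library sketch of how a scheme pattern and a host pattern can be joined into one expression. The `buildPattern` helper and the optional path tail are assumptions made for illustration; this is not how hqgourl assembles its pattern internally.

```go
package main

import (
	"fmt"
	"regexp"
)

// buildPattern is a hypothetical helper: it concatenates a scheme pattern,
// a host pattern, and an optional path tail into a single expression.
func buildPattern(scheme, host string) *regexp.Regexp {
	return regexp.MustCompile(scheme + host + `(?:/[^\s]*)?`)
}

func main() {
	re := buildPattern(`(?:https?|ftp)://`, `(?:www\.)?example\.com`)
	text := "See https://www.example.com/docs and ftp://example.com but not https://other.org"

	// Only URLs whose scheme AND host both match are returned.
	fmt.Println(re.FindAllString(text, -1))
}
```

Because both fragments are plain regular expressions, narrowing either one narrows the set of matches.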

Note

Because the API is centered on regexp.Regexp, the other regexp.Regexp methods (for example FindAllStringIndex and ReplaceAllString) are also available on the compiled pattern.
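
As a rough illustration of what other `regexp.Regexp` methods offer, the sketch below uses a simplified, hand-written pattern as a stand-in for the compiled one. `urlPattern`, `redactURLs`, and `urlOffsets` are hypothetical names, and the pattern is far less thorough than the one the package builds.

```go
package main

import (
	"fmt"
	"regexp"
)

// urlPattern is a simplified stand-in, NOT the expression hqgourl compiles.
var urlPattern = regexp.MustCompile(`https?://[^\s]+`)

// redactURLs replaces every match with a fixed placeholder.
func redactURLs(text string) string {
	return urlPattern.ReplaceAllString(text, "[link]")
}

// urlOffsets returns the [start, end) byte offsets of every match.
func urlOffsets(text string) [][]int {
	return urlPattern.FindAllStringIndex(text, -1)
}

func main() {
	text := "Docs: https://example.com/docs and http://example.org"

	fmt.Println(redactURLs(text))
	for _, loc := range urlOffsets(text) {
		fmt.Println(loc[0], loc[1], text[loc[0]:loc[1]])
	}
}
```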

Domain Parsing

package main

import (
    "fmt"
    "github.com/hueristiq/hqgourl"
)

func main() {
    dp := hqgourl.NewDomainParser()

    parsedDomain := dp.Parse("subdomain.example.com")

    fmt.Printf("Subdomain: %s, Root Domain: %s, TLD: %s\n", parsedDomain.Sub, parsedDomain.Root, parsedDomain.TopLevel)
}
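
For readers unfamiliar with the terms, the following naive sketch shows what splitting a domain into subdomain, root, and TLD means. It is deliberately simplistic and is not the package's algorithm: `splitDomain` is a hypothetical function that treats the last label as the TLD, so multi-label suffixes such as `co.uk` are not handled.

```go
package main

import (
	"fmt"
	"strings"
)

// splitDomain is a naive illustration: the last label is taken as the TLD
// and the label before it as the root. Multi-label suffixes (e.g. "co.uk")
// are NOT handled; a real parser needs a suffix list.
func splitDomain(domain string) (sub, root, tld string) {
	labels := strings.Split(domain, ".")
	n := len(labels)
	if n < 2 {
		// A single label has no TLD to separate.
		return "", domain, ""
	}

	return strings.Join(labels[:n-2], "."), labels[n-2], labels[n-1]
}

func main() {
	sub, root, tld := splitDomain("subdomain.example.com")
	fmt.Printf("Subdomain: %s, Root Domain: %s, TLD: %s\n", sub, root, tld)
}
```

A suffix-aware parser avoids the `co.uk` problem, which is the main reason to prefer a dedicated domain parser over string splitting.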

URL Parsing

package main

import (
    "fmt"
    "github.com/hueristiq/hqgourl"
)

func main() {
    up := hqgourl.NewURLParser()

    parsedURL, err := up.Parse("https://subdomain.example.com:8080/path/file.txt")
    if err != nil {
        fmt.Println("Error parsing URL:", err)

        return
    }

    fmt.Printf("Subdomain: %s\n", parsedURL.Domain.Sub)
    fmt.Printf("Root Domain: %s\n", parsedURL.Domain.Root)
    fmt.Printf("TLD: %s\n", parsedURL.Domain.TopLevel)
    fmt.Printf("Port: %d\n", parsedURL.Port)
    fmt.Printf("File Extension: %s\n", parsedURL.Extension)
}
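
For comparison, this is roughly what obtaining the port and file extension looks like with the standard library alone. The `parts` type and `inspect` function are hypothetical names for this sketch; it shows the kind of fields a richer URL type can add on top of net/url, not how hqgourl derives them.

```go
package main

import (
	"fmt"
	"net/url"
	"path"
	"strconv"
)

// parts holds the extra fields derived in this sketch (hypothetical type).
type parts struct {
	Host      string
	Port      int
	Extension string
}

// inspect parses raw with net/url and derives the numeric port and the
// file extension of the path.
func inspect(raw string) (parts, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return parts{}, err
	}

	p := parts{Host: u.Hostname(), Extension: path.Ext(u.Path)}

	// u.Port() returns "" when the URL carries no explicit port.
	if s := u.Port(); s != "" {
		p.Port, err = strconv.Atoi(s)
		if err != nil {
			return parts{}, err
		}
	}

	return p, nil
}

func main() {
	p, err := inspect("https://subdomain.example.com:8080/path/file.txt")
	if err != nil {
		fmt.Println("Error parsing URL:", err)

		return
	}

	fmt.Printf("Host: %s, Port: %d, Extension: %s\n", p.Host, p.Port, p.Extension)
}
```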

Set a default scheme:

up := hqgourl.NewURLParser(hqgourl.URLParserWithDefaultScheme("https"))

Contributing

Issues and Pull Requests are welcome! Check out the contribution guidelines.

Licensing

This utility is distributed under the MIT license.

Credits

Contributors

Thanks to the amazing contributors for keeping this project alive.

Similar Projects

Thanks to these similar open source projects. Check them out; one of them may fit your needs.

  • DomainParser
  • urlx
  • xurls
  • goware's tldomains
  • jakewarren's tldomains