Finds the URL to the ‘favicon’ for a website. This is useful if you want to display the ‘favicon’ in an HTML document or web application, especially if the website is behind a firewall.
library(faviconPlease)
faviconPlease("https://github.com/")
## [1] "https://github.githubassets.com/favicons/favicon.svg"
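For the HTML use case, the returned URL can be placed in an <img> tag. The following is a minimal sketch: the helper name faviconImg() and the 16-pixel size are assumptions for illustration and are not part of faviconPlease. The URL is the one returned above.

```r
# Hypothetical helper (not part of faviconPlease):
# wrap a favicon URL in an HTML <img> tag of a given pixel size
faviconImg <- function(favicon, size = 16) {
  sprintf('<img src="%s" width="%d" height="%d" alt="favicon">',
          favicon, size, size)
}

favicon <- "https://github.githubassets.com/favicons/favicon.svg"
faviconImg(favicon)
```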
Also check out my blog post on faviconPlease for more background and examples.
Install latest release from CRAN:
install.packages("faviconPlease")
Install development version from GitHub:
install.packages("remotes")
remotes::install_github("jdblischak/faviconPlease")
Please note that the faviconPlease project is released with a Contributor Code of Conduct. By contributing to this project, you agree to abide by its terms.
By default, faviconPlease() uses the following strategy to find the URL to the favicon for a given website. It stops once it finds a URL and returns it.
- Download the HTML file and search its <head> for any <link> elements with rel="icon" or rel="shortcut icon".
- Download the HTML file at the root of the server (i.e. discard the path) and search its <head> for any <link> elements with rel="icon" or rel="shortcut icon".
- Attempt to download a file called favicon.ico at the root of the server. This is the default location that a browser checks if the HTML file does not specify an alternative location in a <link> element. If the file favicon.ico is successfully downloaded, then this URL is returned.
- If the above steps fail, as a fallback, use the favicon service provided by the search engine DuckDuckGo. This provides a nice default for websites that don’t have a favicon (or whose favicon can’t be easily found).
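The "stop at the first hit, otherwise fall back" logic of the steps above can be sketched as follows. This is an illustration only, not the package's source code: findFavicon() and the stub functions are hypothetical names, and the stubs stand in for the real downloads so that the sketch runs without network access.

```r
# Illustrative sketch of the "first hit wins" strategy; NOT the package's source.
# Each finder takes (scheme, server, path) and returns a URL or "".
findFavicon <- function(scheme, server, path, functions, fallback) {
  for (f in functions) {
    favicon <- f(scheme, server, path)
    if (nzchar(favicon)) return(favicon)  # stop at the first URL found
  }
  fallback(server)  # nothing found: defer to the fallback service
}

# Stub finders so the sketch runs offline
finderFails <- function(scheme, server, path) ""
finderRoot <- function(scheme, server, path) {
  sprintf("%s://%s/favicon.ico", scheme, server)
}
fallbackStub <- function(server) {
  sprintf("https://icons.duckduckgo.com/ip3/%s.ico", server)
}

findFavicon("https", "github.com", "/jdblischak/faviconPlease",
            functions = list(finderFails, finderRoot),
            fallback = fallbackStub)
## [1] "https://github.com/favicon.ico"
```

Passing functions = NULL to this sketch skips the loop entirely, so only the fallback is used.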
The default strategy above is designed to reliably get you a favicon URL for most websites. However, you can customize it as needed.
The default fallback function is faviconDuckDuckGo(). To instead use Google’s favicon service, you can set the argument fallback = faviconGoogle.
Note that neither DuckDuckGo nor Google has every favicon you might expect, and availability can change over time. You can see some examples in my blog post. Fortunately, both provide a generic favicon to insert when they don’t have the one requested.
You can use your own custom fallback function instead. It must accept one argument, which is the server, e.g. "github.com". The easiest approach would be to copy-paste one of the existing fallback functions and modify it to use your alternative favicon service.
args(faviconDuckDuckGo)
## function (server)
## NULL
body(faviconDuckDuckGo)
## {
##     iconService <- "https://icons.duckduckgo.com/ip3/%s.ico"
##     favicon <- sprintf(iconService, server)
##     return(favicon)
## }
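Following that pattern, a custom fallback might look like the sketch below. The function name faviconMyService() and the icons.example.com URL template are placeholders, not a real favicon service.

```r
# Hypothetical fallback function modeled on faviconDuckDuckGo().
# The URL template is a placeholder; replace it with a real favicon service.
faviconMyService <- function(server) {
  iconService <- "https://icons.example.com/%s.ico"
  sprintf(iconService, server)
}

faviconMyService("github.com")
## [1] "https://icons.example.com/github.com.ico"
```

It would then be supplied to faviconPlease() via the argument fallback = faviconMyService.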
If you have a URL to a generic favicon file that you would like to use as a fallback, you can directly pass this as a character vector. It could also be a path to an image file on the server where your app is running.
The default strategy first checks the <head> for a link to the favicon file and then checks for the availability of the file favicon.ico. You can change this order, or only perform one of them, by changing the argument functions passed to faviconPlease(). It should be a list of functions.
# default
functions = list(faviconLink, faviconIco)
# Switch the order
functions = list(faviconIco, faviconLink)
# Only search <head>
functions = list(faviconLink)
# Only check for favicon.ico
functions = list(faviconIco)
# Skip the favicon functions entirely and just use the fallback
functions = NULL
You can also create your own custom favicon function to pass to faviconPlease(). By default it must accept 3 arguments. It will be passed the URL’s scheme (e.g. "https"), server (e.g. "github.com"), and path (e.g. "/jdblischak/faviconPlease"). Your function should return the URL to a favicon or an empty string, "", if it can’t find one.
# Favicon functions must accept at least 3 positional arguments
args(faviconLink)
## function (scheme, server, path)
## NULL
As a concrete example, here is a custom function for searching for favicon.ico on Ubuntu 20.04, which has increased security settings (see troubleshooting section below).
faviconIcoUbuntu20 <- function(scheme, server, path) {
  faviconIco(scheme, server, path, method = "wget",
             extra = c("--no-check-certificate",
                       "--ciphers=DEFAULT:@SECLEVEL=1"))
}
It calls faviconIco() with the specific settings needed by download.file() to work on Ubuntu 20.04. You could then use your custom function instead of the default faviconIco() by calling faviconPlease() with functions = list(faviconLink, faviconIcoUbuntu20).
Note that the example function faviconIcoUbuntu20() will likely fail on Windows, macOS, and Ubuntu versions prior to 20.04.
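A custom favicon function does not have to download anything. As a minimal sketch of the "return a URL or an empty string" contract, the function below looks up servers in a fixed table, which might suit internal sites behind a firewall. The name faviconKnown(), the server names, and the URLs are all hypothetical.

```r
# Hypothetical lookup table of known favicons; names and URLs are placeholders
knownFavicons <- c(
  "intranet.example.com" = "https://intranet.example.com/static/logo.png",
  "wiki.example.com" = "https://wiki.example.com/images/favicon.png"
)

# Custom favicon function with the same 3 arguments as faviconLink()
faviconKnown <- function(scheme, server, path) {
  if (server %in% names(knownFavicons)) {
    return(knownFavicons[[server]])
  }
  ""  # signal "not found" so the next function (or the fallback) is tried
}

faviconKnown("https", "wiki.example.com", "/")
## [1] "https://wiki.example.com/images/favicon.png"
faviconKnown("https", "github.com", "/jdblischak/faviconPlease")
## [1] ""
```

To try the table first and the defaults afterwards, it could be passed as functions = list(faviconKnown, faviconLink, faviconIco).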
Unfortunately it’s not easy to make this foolproof for all operating systems and all websites. Here are some known issues:
- download.file(), used by faviconIco(), is known to have cross-platform issues. Thus the official documentation in ?download.file recommends: "Setting the method should be left to the end user." Accordingly, faviconIco() exposes the arguments method, extra, and headers, which are passed directly to download.file(). Alternatively you can set the global options "download.file.method" or "download.file.extra".
- Ubuntu 20.04 increased its default security settings for downloading files from the internet (details). Unfortunately many websites have not updated their SSL certificates to comply with the increased security restrictions. faviconLink() has a workaround for this situation, but not faviconIco(). As an example, here’s how you could detect the availability of favicon.ico for the Ensembl website on Ubuntu 20.04:
faviconIco("https", "www.ensembl.org", "", method = "wget", extra = c("--no-check-certificate", "--ciphers=DEFAULT:@SECLEVEL=1"))
Alternatively, if it’s an option for you, you could avoid this workaround by using the previous Ubuntu LTS release 18.04. Also note that the above command will fail on Ubuntu 18.04 because the default wget installed doesn’t have the argument --ciphers.
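The global options mentioned in the first item above can be set once per session instead of passing method and extra on every call. The sketch below reuses the Ubuntu 20.04 values from this README as an example. Whether they suit your system is an assumption to verify, and they are likely to fail elsewhere for the same reasons as faviconIcoUbuntu20().

```r
# Set the download options globally; options() returns the previous values
oldOptions <- options(
  download.file.method = "wget",
  download.file.extra = c("--no-check-certificate",
                          "--ciphers=DEFAULT:@SECLEVEL=1")
)

getOption("download.file.method")
## [1] "wget"

# Restore the previous settings when done
options(oldOptions)
```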