Finding a Website’s Favicon with Ruby

For a project I’ve been working on, I wanted to have my Sidekiq worker (which is part of an RSS crawler) discover the favicon for a website and cache it for later display. It was fun figuring out a way to do this, so I just had to share.

A Brief History of Favicons

Favicons, or “shortcut icons,” can be defined in multiple ways. Like all too many things in web design, browsers handle them in slightly different and mildly incompatible ways, meaning there’s plenty of redundancy. Favicons came to be when Microsoft added them to Internet Explorer 5 in 1999, implementing a feature where the browser would check the server for a file named favicon.ico and display it in certain parts of the UI. The following year, the W3C published a standard method for defining a favicon. Rather than simply having the browser look for a file in the root directory, an HTML document should specify a file in the header with a <link> tag, just like with stylesheets.

Fast forward to the present, and you have a bit of screwiness.

  • All major web browsers check for the link tag first, and fall back to favicon.ico if it’s not found.
  • You can define multiple icons in the HTML header. You can have ICO/PNG/GIF formats, as well as different sizes.
  • Some browsers support larger 32×32 favicons, while others will only use the 16×16 ones. Chrome for Mac prefers the 32×32 ones, and scales them down to 16×16 on Macs without Retina displays.
  • Big Bad Internet Explorer only supports ICO files for favicons, not PNGs.

The most compatible way to set up your favicon is to define both 32×32 and 16×16 icons in your header, using the PNG format, and to drop a 16×16 ICO version named “favicon.ico” into your web root. Browsers that play nicely will use the PNG ones in whatever dimensions they prefer, and IE will fall back to the ICO file.

Writing the Class

Now that the history lesson is out of the way, you can see why there’s a bit of a challenge here. Depending on how badly you want to find and display that icon, you may have to write logic for the different methods. For this tutorial, I will focus on two: the simplest, which is checking whether there’s a favicon.ico, and a basic implementation of checking for a link tag that defines a shortcut icon.

Before we do anything else, we need to install a couple of dependencies: the httparty and nokogiri gems. Either add them to your Gemfile and do a bundle install, or use the gem install command to install them manually.
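If you go the Bundler route, a minimal Gemfile might look like the sketch below. I’m not pinning versions here, which is an assumption on my part; add constraints if you need reproducible installs.

```ruby
# Gemfile (a minimal sketch; version constraints are deliberately left out)
source "https://rubygems.org"

gem "httparty" # simple HTTP requests
gem "nokogiri" # HTML parsing
```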

Now require the necessary libraries at the top of a new Ruby file and we can get going.

require "httparty"
require "nokogiri"
require "base64"

We can define a class to make a nice, clean interface for this to keep it modular and easier to reuse. As you can see below, I’ve made a Favicon class and added some accessors for instance variables, as well as an initialize method that assigns the parameter it receives to the @host instance variable before calling the method we will be defining next.

require "httparty"
require "nokogiri"
require "base64"

class Favicon
  attr_reader :host, :uri, :base64

  def initialize(host)
    @host = host
    check_for_ico_file
  end
end

We’ll be implementing the simplest part first. The check_for_ico_file method will send an HTTP GET request to /favicon.ico on the server specified in @host and check to see if a file exists. (The server will send a 200 OK response if it does, and a 404 Not Found error otherwise.) If it does, the URL will be saved to an instance variable and the icon file’s contents will be base64 encoded before being saved to an instance variable as well.

The HTTParty gem is great for this, since it drastically simplifies basic HTTP requests like this one.

# Check /favicon.ico
def check_for_ico_file
  uri = URI::HTTP.build({:host => @host, :path => '/favicon.ico'}).to_s
  res = HTTParty.get(uri)
  if res.code == 200
    @base64 = Base64.encode64(res.body)
    @uri = uri
  end
end
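One thing the method above glosses over is that HTTParty.get raises an exception when the host can’t be reached, which matters inside a background worker. Here is a rough sketch of a more defensive fetch. The fetch_icon name and the fetcher parameter are my own inventions for illustration (any callable that returns something with code and body works, so HTTParty.method(:get) should fit), and the list of rescued errors is an assumption rather than an exhaustive one.

```ruby
require "base64"
require "socket"
require "timeout"

# Hypothetical helper, not part of the Favicon class above.
# `fetcher` is any callable that takes a URL string and returns an object
# responding to #code and #body, e.g. HTTParty.method(:get).
def fetch_icon(url, fetcher)
  res = fetcher.call(url)
  # Treat anything other than a 200 with a non-empty body as "no icon".
  return nil unless res.code == 200 && !res.body.to_s.empty?

  { uri: url, base64: Base64.strict_encode64(res.body) }
rescue SocketError, SystemCallError, Timeout::Error => e
  # Assumed set of network failures: DNS errors, refused connections, timeouts.
  warn "favicon fetch failed for #{url}: #{e.class}"
  nil
end

# Stand-in response object so the sketch runs without touching the network.
FakeResponse = Struct.new(:code, :body)

url     = "http://example.com/favicon.ico"
found   = fetch_icon(url, ->(_url) { FakeResponse.new(200, "abc") })
missing = fetch_icon(url, ->(_url) { FakeResponse.new(404, "") })
down    = fetch_icon(url, ->(_url) { raise SocketError, "no such host" })

puts found.inspect
puts missing.inspect
puts down.inspect
```

Returning nil for every failure keeps the caller simple: it only has to check whether it got a result back.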

If you want, you could go ahead and instantiate the class to try out what we have so far. If you pass it the domain name of a site that uses the /favicon.ico convention, the object should find it without issue.

favicon = Favicon.new("arstechnica.com")

puts favicon.uri
#Outputs http://arstechnica.com/favicon.ico

puts favicon.base64
#Outputs a bunch of base64-encoded gibberish. More on this later

puts favicon.host
#Outputs arstechnica.com

Now let’s handle link tags! The process for that is a little bit more in-depth. First we need to request a web page from the server, such as the index page, and parse it for tags that resemble <link rel="shortcut icon" href="..." />. Then we have to evaluate the contents of href to make sure it’s an absolute URL, and prepend the domain name if it is not. After that, we can finally make a request to get the icon itself and save it.

Still with me? Excellent, now here’s the code to do that. I’ll comment it a little more thoroughly, since it looks messier at a glance.

# Check "shortcut icon" tag
def check_for_html_tag

  # Load the index page with HTTParty and pass the contents to Nokogiri for parsing
  uri = URI::HTTP.build({:host => @host, :path => '/'}).to_s
  res = HTTParty.get(uri)
  doc = Nokogiri::HTML(res.body)

  # Use an xpath expression to tell Nokogiri what to look for.
  doc.xpath('//link[@rel="shortcut icon"]').each do |tag|

    # This is the contents of the "href" attribute, which we pass to Ruby's URI module for analysis
    taguri = URI(tag['href'])

    if taguri.host.to_s.empty?
      # There is no domain name in taguri. It's a relative URI!
      # So we have to join it with the index URL we built at the beginning of the method
      iconuri = URI.join(uri, taguri).to_s
    else
      # There is a domain name in taguri, so we're good
      iconuri = taguri.to_s
    end

    # Grab the icon and set the instance variables
    res = HTTParty.get(iconuri)
    if res.code == 200
      @base64 = Base64.encode64(res.body)
      @uri = iconuri
    end
    
  end

end
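Two details in that method could probably be loosened. URI.join already returns an absolute reference unchanged, so the host check is not strictly needed, and many sites write rel="icon" (or mix the case) instead of rel="shortcut icon". The sketch below shows both ideas in plain Ruby; resolve_icon_url and icon_rel? are hypothetical helper names of mine, not part of Nokogiri or of the class above.

```ruby
require "uri"

# Hypothetical helper: resolve an href from a <link> tag against the page URL.
# URI.join leaves absolute URLs alone and resolves relative ones.
def resolve_icon_url(page_url, href)
  URI.join(page_url, href).to_s
end

# Hypothetical helper: true when a rel attribute names an icon, regardless of
# case or of extra tokens such as "shortcut".
def icon_rel?(rel)
  rel.to_s.downcase.split.include?("icon")
end

page = "http://example.com/"
puts resolve_icon_url(page, "/images/favicon.png")
puts resolve_icon_url(page, "favicon.png")
puts resolve_icon_url(page, "https://static.example.net/favicon.ico")

puts icon_rel?("shortcut icon")
puts icon_rel?("ICON")
puts icon_rel?("stylesheet")
```

With Nokogiri, something like doc.css('link[rel]').select { |tag| icon_rel?(tag['rel']) } could then stand in for the exact-match xpath expression, though I’d treat that as a starting point rather than a drop-in replacement.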

Now there’s one more thing to do before we’re done. The initialize method needs to be tweaked so it calls our newest method:

def initialize(host)
  @host = host
  check_for_ico_file
  check_for_html_tag
end

Now the class will check for the favicon.ico file first, then the HTML tag. If the HTML tag is present and its icon loads, it will take precedence, because check_for_html_tag runs last and overwrites @uri and @base64.

Available as a Gist! For your convenience, the results of this tutorial are available as a GitHub Gist.

Using the Class

Now all you have to do is include the class with a require statement, and grab favicons.

require_relative "favicon"

favicon = Favicon.new("arstechnica.com")

puts favicon.uri
#Outputs http://static.arstechnica.net/favicon.ico

puts favicon.base64
#Outputs a bunch of base64-encoded gibberish. More on this later

puts favicon.host
#Outputs arstechnica.com

Now…what of that “base64-encoded gibberish?” It’s the perfect format for a little trick called Data URIs, which you can read all about over at CSS-Tricks. If you cache that base64 string somewhere, probably in a database, you can output it like so:

<img width="16" height="16" alt="favicon" src="" />

It will display like any other image, but won’t use an additional HTTP request, because the image data is already embedded on the page. This makes it perfect for a list of web sites with icons beside them. Instead of kicking off several HTTP requests for individual tiny images, you just embed them right in the page.
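To make the Data URI step concrete, here is a small sketch that turns the cached string into an img tag. The favicon_data_uri and favicon_img_tag helpers are hypothetical, and the image/x-icon MIME type is an assumption that suits ICO files; a PNG icon would want image/png instead. Base64.encode64 inserts line breaks, so the sketch strips whitespace before embedding.

```ruby
require "base64"

# Hypothetical helper: build a Data URI from a base64 string.
# The default MIME type is an assumption suited to .ico files.
def favicon_data_uri(base64, mime = "image/x-icon")
  # Base64.encode64 adds a newline every 60 encoded characters; remove them.
  "data:#{mime};base64,#{base64.gsub(/\s+/, "")}"
end

# Hypothetical helper: wrap the Data URI in the img tag shown above.
def favicon_img_tag(base64, mime = "image/x-icon")
  %(<img width="16" height="16" alt="favicon" src="#{favicon_data_uri(base64, mime)}" />)
end

encoded = Base64.encode64("abc") # stand-in for real icon bytes
puts favicon_data_uri(encoded)
puts favicon_img_tag(encoded, "image/png")
```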

If you’re unfortunate enough that you must support antique versions of Internet Explorer (version seven or prior) then you can’t use Data URIs, as they were not supported. However, all is not lost. You could conceivably adapt the class and have it write the image data to files on the server instead of base64-encoding them.
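A rough sketch of that file-based fallback follows. The write_icon_file name, the one-file-per-host layout, and the .ico extension are all assumptions for illustration; it takes the raw response body rather than the base64 string.

```ruby
require "fileutils"
require "tmpdir"

# Hypothetical helper: write raw icon bytes to <dir>/<host>.ico and return the path.
# The .ico extension is an assumption; an icon found via a link tag may be a PNG.
def write_icon_file(host, body, dir)
  FileUtils.mkdir_p(dir)
  # Keep only characters that are safe in a file name.
  safe_name = host.gsub(/[^A-Za-z0-9.\-]/, "_")
  path = File.join(dir, "#{safe_name}.ico")
  File.binwrite(path, body)
  path
end

# Demo against a temporary directory so nothing is left behind.
Dir.mktmpdir do |dir|
  path = write_icon_file("arstechnica.com", "fake icon bytes", dir)
  puts File.basename(path)
  puts File.binread(path).bytesize
end
```

Inside the class, you would pass res.body and point dir at wherever your web server serves static files from.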
