2016-02-14

I've made a gem to retrieve holidays from Google Calendar

Ruby

I've made a gem to retrieve national holidays from Google Calendar with a simple interface.


My motivation is that, as far as I can see, there is no suitable gem for my purpose.

What I want to do

Here is what I want to do, and the features the gem provides:

List holidays in a particular month
List holidays in a particular year
Check whether a date is a holiday or not
Retrieve holidays for many countries

Plus, I want to do these things with FEWER API CALLS to Google Calendar.

Strengths

Notable features are CACHING and PRELOAD. Some tools have similar features, but they need API access every time.

As for my gem, it needs only a few API calls when you use the preload feature properly.

For example, when you pass date_range: 1.year as a preload parameter, the gem retrieves and caches holidays from "this day last year" to "this day next year" at initialization. As long as you only access dates in this range, the gem never calls the Google API.
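A rough sketch of that preload window, using only Ruby's standard `date` library instead of ActiveSupport's `1.year`. The helpers `preload_window` and `cached?` are hypothetical illustrations of the idea, not the gem's API.

```ruby
require "date"

# Hypothetical helper: the preload window for a one-year date_range,
# i.e. from "this day last year" to "this day next year".
def preload_window(today)
  today.prev_year..today.next_year
end

# Hypothetical helper: a date inside the window can be answered from the
# cache; a date outside it would need another API call.
def cached?(window, date)
  window.cover?(date)
end

window = preload_window(Date.new(2016, 2, 14))
p window.begin.to_s                       # => "2015-02-14"
p window.end.to_s                         # => "2017-02-14"
p cached?(window, Date.new(2016, 12, 25)) # => true
p cached?(window, Date.new(2017, 6, 1))   # => false
```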

In addition, the cache uses an LRU (least recently used) algorithm, so its memory usage stays bounded instead of growing without limit.
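To illustrate the LRU idea, here is a minimal LRU cache sketch that relies on Ruby's insertion-ordered `Hash`. The `LruCache` class and keying entries by year are assumptions for illustration; the gem's actual cache may be implemented differently.

```ruby
# Minimal LRU cache sketch: the Hash keeps insertion order, so the first
# key is always the least recently used one.
class LruCache
  def initialize(capacity)
    @capacity = capacity
    @store = {}
  end

  def get(key)
    return nil unless @store.key?(key)
    # Re-insert the entry to mark it as most recently used.
    value = @store.delete(key)
    @store[key] = value
  end

  def put(key, value)
    @store.delete(key)
    @store[key] = value
    # Evict the least recently used entry when over capacity.
    @store.delete(@store.first[0]) if @store.size > @capacity
    value
  end

  def keys
    @store.keys
  end
end

cache = LruCache.new(2)
cache.put(2015, :holidays_2015)
cache.put(2016, :holidays_2016)
cache.get(2015)                 # touch 2015, so 2016 becomes least recently used
cache.put(2017, :holidays_2017) # evicts 2016
p cache.keys # => [2015, 2017]
```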

How to use it

Sample code is below:

require "holidays_from_google_calendar"

usa_holidays = HolidaysFromGoogleCalendar::Holidays.new do |config|
  config.calendar = {
    nation: "usa",
    language: "en"
  }

  config.credential = {
    api_key: "YOUR OWN GOOGLE API KEY"
  }
end

usa_holidays.in_year(Date.parse("2016-02-06")) # Retrieve 2016's holidays
=> [#<HolidaysFromGoogleCalendar::Holiday:0x007ff7d42df748 @date=Fri, 01 Jan 2016, @name="New Year's Day">,
 #<HolidaysFromGoogleCalendar::Holiday:0x007ff7d42def28 @date=Mon, 18 Jan 2016, @name="Martin Luther King Day">,
 #<HolidaysFromGoogleCalendar::Holiday:0x007ff7d42de820 @date=Sun, 14 Feb 2016, @name="Valentine's Day">,
 #<HolidaysFromGoogleCalendar::Holiday:0x007ff7d42de0f0 @date=Mon, 15 Feb 2016, @name="Presidents' Day">,
 #<HolidaysFromGoogleCalendar::Holiday:0x007ff7d42dd808 @date=Sun, 13 Mar 2016, @name="Daylight Saving Time starts">,
 #<HolidaysFromGoogleCalendar::Holiday:0x007ff7d42dc8e0 @date=Sun, 27 Mar 2016, @name="Easter Sunday">,
 #<HolidaysFromGoogleCalendar::Holiday:0x007ff7d4ab3b40 @date=Wed, 13 Apr 2016, @name="Thomas Jefferson's Birthday">,
 #<HolidaysFromGoogleCalendar::Holiday:0x007ff7d4ab36b8 @date=Sun, 08 May 2016, @name="Mother's Day">,
 #<HolidaysFromGoogleCalendar::Holiday:0x007ff7d4ab3230 @date=Mon, 30 May 2016, @name="Memorial Day">,
 #<HolidaysFromGoogleCalendar::Holiday:0x007ff7d4ab2d08 @date=Sun, 19 Jun 2016, @name="Father's Day">,
 #<HolidaysFromGoogleCalendar::Holiday:0x007ff7d4ab26a0 @date=Mon, 04 Jul 2016, @name="Independence Day">,
 #<HolidaysFromGoogleCalendar::Holiday:0x007ff7d4ab21a0 @date=Mon, 05 Sep 2016, @name="Labor Day">,
 #<HolidaysFromGoogleCalendar::Holiday:0x007ff7d4ab1cc8 @date=Mon, 10 Oct 2016, @name="Columbus Day (regional holiday)">,
 #<HolidaysFromGoogleCalendar::Holiday:0x007ff7d4ab1610 @date=Mon, 31 Oct 2016, @name="Halloween">,
 #<HolidaysFromGoogleCalendar::Holiday:0x007ff7d4ab0f30 @date=Sun, 06 Nov 2016, @name="Daylight Saving Time ends">,
 #<HolidaysFromGoogleCalendar::Holiday:0x007ff7d4ab0a58 @date=Tue, 08 Nov 2016, @name="Election Day">,
 #<HolidaysFromGoogleCalendar::Holiday:0x007ff7d4ab0648 @date=Fri, 11 Nov 2016, @name="Veterans Day">,
 #<HolidaysFromGoogleCalendar::Holiday:0x007ff7d4ab0238 @date=Thu, 24 Nov 2016, @name="Thanksgiving Day">,
 #<HolidaysFromGoogleCalendar::Holiday:0x007ff7d42d7f20 @date=Sat, 24 Dec 2016, @name="Christmas Eve">,
 #<HolidaysFromGoogleCalendar::Holiday:0x007ff7d42d7a70 @date=Sun, 25 Dec 2016, @name="Christmas Day">,
 #<HolidaysFromGoogleCalendar::Holiday:0x007ff7d42d7660 @date=Mon, 26 Dec 2016, @name="Christmas Day observed">,
 #<HolidaysFromGoogleCalendar::Holiday:0x007ff7d42d70e8 @date=Sat, 31 Dec 2016, @name="New Year's Eve">]

usa_holidays.in_month(Date.parse("3rd March 2016")) # Retrieve holidays of March, 2016
=> [#<HolidaysFromGoogleCalendar::Holiday:0x007ff7d42dd808 @date=Sun, 13 Mar 2016, @name="Daylight Saving Time starts">,
 #<HolidaysFromGoogleCalendar::Holiday:0x007ff7d42dc8e0 @date=Sun, 27 Mar 2016, @name="Easter Sunday">]

usa_holidays.holiday?(Date.parse("Oct 31 2016")) # Halloween
=> true

usa_holidays.holiday?(Date.parse("April 16th 2016")) # Saturday
=> true

usa_holidays.holiday?(Date.parse("April 17th 2016")) # Sunday
=> true

usa_holidays.holiday?(Date.parse("Aug 2 2016")) # Weekday (Tuesday)
=> false
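Judging from the results above, `holiday?` also returns true for weekends. The following sketch reproduces that behavior over an in-memory list. `Holiday`, `HOLIDAYS`, and the three top-level methods are hypothetical stand-ins, not the gem's implementation, and the list is a hand-picked subset of the 2016 output above.

```ruby
require "date"

# Hypothetical stand-in for HolidaysFromGoogleCalendar::Holiday.
Holiday = Struct.new(:date, :name)

# Hypothetical cached list, as if already fetched from Google Calendar.
HOLIDAYS = [
  Holiday.new(Date.new(2016, 3, 13), "Daylight Saving Time starts"),
  Holiday.new(Date.new(2016, 3, 27), "Easter Sunday"),
  Holiday.new(Date.new(2016, 10, 31), "Halloween")
]

def in_year(date)
  HOLIDAYS.select { |h| h.date.year == date.year }
end

def in_month(date)
  HOLIDAYS.select { |h| h.date.year == date.year && h.date.month == date.month }
end

# Weekends count as holidays, matching the Saturday/Sunday results above.
def holiday?(date)
  date.saturday? || date.sunday? || HOLIDAYS.any? { |h| h.date == date }
end

p in_month(Date.new(2016, 3, 3)).map(&:name) # => ["Daylight Saving Time starts", "Easter Sunday"]
p holiday?(Date.new(2016, 4, 16))            # Saturday => true
p holiday?(Date.new(2016, 8, 2))             # Tuesday => false
```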

Impression

I need to keep learning more about data structures and algorithms.

2015-11-04

How to read a text file in batches with Ruby

Ruby

While reviewing a colleague's source code, I found some awful code in his program.

Like this:

File.open("./production.log") do |file|
  results = {}
  file.readline # skip header
  file.each_line do |line|
    key, value = process(line)
    results[key] = value
    next if results.size < 2000
    output(results)
    results = {}
  end
  output(results)
end

Oops! How did this happen!?

He said, "We need to purge the results from memory every 2,000 lines, otherwise we run out of memory. And there is no method to get 'each batch of lines' from a text file in Ruby."

I see. We need a method to read files in batches, like ActiveRecord's "find_in_batches". So I searched for such a method and found the same question as ours on Stack Overflow.


The answers there work, but they aren't cool! In the end, I couldn't find a great way.

Ideal ways

I'd like to use something like the "find_in_batches" method, like this:

File.open("./production.log", skip_header: true) do |file|
  file.batch_line(batch_size: 2000) do |lines|
    results = process(lines)
    output(results)
  end
end

Alternatively, it would be nice to wrap the text file in an object that has a "batch_line"*1 method:

file = TextFilePager.new("./production.log", skip_header: true)
file.batch_line(batch_size: 2000) do |lines|
  results = process(lines)
  output(results) 
end

Both are better than the previous code: we can easily understand what they will do.

First Proposal

Using "Enumerator::Lazy" and "each_slice" seems to be a great way:

File.open("./production.log") do |file|
  file.lazy.drop(1).each_slice(2000) do |lines|
    results = process(lines)
    output(results)
  end
end
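To see what this pipeline yields without a real log file, here is a small runnable check that uses `StringIO` as a stand-in for `File` (both include `Enumerable`, so `lazy` works the same way). The sample content and the batch size of 2 are arbitrary choices for the demo.

```ruby
require "stringio"

# In-memory stand-in for File.open("./production.log"): one header line, five data lines.
io = StringIO.new("header\na\nb\nc\nd\ne\n")

batches = []
# Same pipeline as above: skip the header lazily, then take slices of 2 lines.
io.lazy.drop(1).each_slice(2) do |lines|
  batches << lines.map(&:chomp) # chomp only to make the output easier to read
end

p batches # => [["a", "b"], ["c", "d"], ["e"]]
```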

It's a really simple way, because we need no additional classes or methods.

All we need is to understand "Enumerator::Lazy"*2 and "each_slice". That's it.

Second Proposal

Making a new class, like "TextFilePager", seems like a nice idea:

class TextFilePager
  DEFAULT_BATCH_SIZE = 1000

  def initialize(file_path, skip_header: false, delete_line_break: false)
    @file_path = file_path
    @skip_header = skip_header
    @delete_line_break = delete_line_break
  end

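  # Yields the file's lines in arrays of up to batch_size lines.
  def batch_line(batch_size: DEFAULT_BATCH_SIZE)
    File.open(@file_path) do |file|
      file.gets if skip_header?
      loop do
        line, lines = "", []
        batch_size.times do
          break if (line = file.gets).nil?
          lines << (delete_line_break? ? line.chomp : line)
        end
        # Stop without yielding an empty batch when EOF falls exactly on a
        # batch boundary (or the file has no data lines).
        break if lines.empty?
        yield lines
        # EOF was reached in the middle of this batch.
        break if line.nil?
      end
    end
  end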

  def skip_header?
    @skip_header
  end

  def delete_line_break?
    @delete_line_break
  end
end

(Gist: necojackarc/text_file_pager.rb)

You can use this class like this:

file = TextFilePager.new("./production.log", skip_header: true)
file.batch_line(batch_size: 2000) do |lines|
  results = process(lines)
  output(results) 
end

This code is exactly the same as what I wrote in the section above. The reason is simple: I created this class to match my ideal interface, TDD-style.
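As a variation that is not part of the Gist, `batch_line` could delegate the batching to `each_slice`, combining both proposals. `each_slice` never yields an empty array, so no special case is needed when the number of lines is an exact multiple of `batch_size`. `CompactTextFilePager` is a hypothetical name, and this sketch assumes a block is always given.

```ruby
require "tempfile"

# Hypothetical variant of TextFilePager: same options, but the batching is
# delegated to Enumerable#each_slice (IO includes Enumerable).
class CompactTextFilePager
  DEFAULT_BATCH_SIZE = 1000

  def initialize(file_path, skip_header: false, delete_line_break: false)
    @file_path = file_path
    @skip_header = skip_header
    @delete_line_break = delete_line_break
  end

  # Assumes a block is given; yields arrays of up to batch_size lines.
  def batch_line(batch_size: DEFAULT_BATCH_SIZE)
    File.open(@file_path) do |file|
      file.gets if @skip_header
      file.each_slice(batch_size) do |lines|
        yield(@delete_line_break ? lines.map(&:chomp) : lines)
      end
    end
  end
end

batches = []
Tempfile.create("pager") do |tmp|
  tmp.write("header\n1\n2\n3\n4\n") # 4 data lines: an exact multiple of 2
  tmp.flush
  pager = CompactTextFilePager.new(tmp.path, skip_header: true, delete_line_break: true)
  pager.batch_line(batch_size: 2) { |lines| batches << lines }
end

p batches # => [["1", "2"], ["3", "4"]]
```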

Conclusion

I prefer the first proposal because it only requires knowledge of Ruby itself, and we don't need to make a new class. Too many classes are a burden, so we shouldn't add one unless we really need it.

However, the second way gives us a good, intuitive interface. Plus, we can easily pass options to the object. If you need this kind of text file processing again and again, adding a new class may be a good choice.

It might also be nice to add a "batch_line" method to the "IO" or "File" class. But I don't like modifying core classes much, because that affects our code widely.
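One middle ground, not discussed in the original post, is a refinement: it adds the method only in files that explicitly opt in with `using`, so other code is unaffected. The module name `BatchLine` and its method are hypothetical, and `using` has to be called at the top level of a file.

```ruby
require "tempfile"

# Hypothetical refinement: IO#batch_line exists only where `using BatchLine` is active.
module BatchLine
  refine IO do
    # Yields arrays of up to batch_size lines (IO includes Enumerable).
    def batch_line(batch_size: 1000, &block)
      each_slice(batch_size, &block)
    end
  end
end

using BatchLine # must be at the top level of a file

batches = []
Tempfile.create("batch") do |file|
  file.write("a\nb\nc\n")
  file.rewind
  file.batch_line(batch_size: 2) { |lines| batches << lines.map(&:chomp) }
end

p batches # => [["a", "b"], ["c"]]
```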

Hence, when it comes to this problem, I'll reach for "Enumerator::Lazy" and "each_slice" first.

Thanks for reading.

Appendix

Here is my original post on this topic, written in Japanese.

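*1: Of course, this method doesn't exist yet.

*2: `lazy` is only needed because of `drop`: without it, `drop(1)` would read the whole file into an array first. If you don't need `drop`, `lazy` is unnecessary.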

Taka’s blog

The blog of a software engineer who works at a start-up in London