Ruby, Ruby on Rails, Artificial Intelligence and something more ...

Ver y Cocinar, descubre la cocina y disfrútala a simple vista

RailsConf Europe 2007 Speaker

RailsConf Europe 2007 Speaker
http://www.railsconfeurope.com/

Spejman Thumblog!

Monday, 14 May 2007

Backup of Bloglines keep new items

If you are a Bloglines user, you probably save the interesting posts using "keep new" feature.

If you have use this reader for a time, the quantity of post saved could be large... Some day I thought that won't be funny to loose all this data, then I build a ruby script to backup bloglines "keep new" items into a xml file.

With this script I learned better Mechanize and Hpricot Ruby libraries.
Este script me ha servido para probar dos librerías muy útiles de Ruby: mechanize y hpricot.

You need these libraries in order to use the script:


gem install json
gem install activesupport
gem install hpricot
gem install mechanize


And here you have the script. I think is very understandable. I hope it helps you to backup your bloglines account (remember to change EMAIL and PASSWORD values with yours) or to understand better how mechanize and hpricot works.

require "rubygems"
require "hpricot"
require "json"
require "mechanize"
require "active_support"

# Reads a bloglines javascript tree structure that has all
# feeds data.
def read_tree( tree_base, label = "" )
tree_base.each do |tree|
if tree["kids"]
read_tree tree["kids"], label + "/" + tree["n"]
else
@feeds << [tree["n"], label, tree["kn"], "http://www.bloglines.com/myblogs_display?sub=#{tree["id"]}&site=#{tree["sid"]}"]
end
end
end

# Add more memory to hpricot otherwise couldn't load some webs.
Hpricot.buffer_size = 262144

agent = WWW::Mechanize.new
page = agent.get 'http://www.bloglines.com/login'

form = page.forms[1]
form.email = 'EMAIL'
form.password = 'PASSWORD'

page = agent.submit form

# Get the bloglines sindicated feeds
menu_page = agent.get "http://www.bloglines.com/myblogs_subs"
start_text = "var initTreeData = "
end_text = "\n;\n"
js_feeds_tree_str = menu_page.content[menu_page.content.index(start_text)+start_text.size..menu_page.content.index(end_text)]
feeds_tree = JSON.parse js_feeds_tree_str.gsub("\\","")
@feeds = []
read_tree(feeds_tree["kids"])

puts "<bloglines_saves>"
@feeds.each do |feed|

page = agent.get feed[3]
doc= Hpricot(page.content)

# get the content of all saved feed posts
content = ((doc/"body")/"td.article")
next if content.empty?
puts "<feed name=\"#{feed[0].strip}\" folder=\"#{feed[1].strip}\">"

# Iterate each saved feed post
((doc/"body")/"a.bl_itemtitle").each_with_index do |title, index|
puts "<feed_save title=\"#{title.inner_html.strip}\" href=\"#{title.attributes["href"]}\">"
puts content[index].inner_html.to_xs
puts "</feed_save>"
end
puts "</feed>"

end
puts "</bloglines_saves>"

Más información:

2 comments:

admin said...

Thanks, great solution, works like a charm!

mikehedge said...

I want to export my "keep New" items from Bloglines.

what do I do? do I install Ruby?

About Me

Barcelona, Barcelona, Spain