Showing posts with label no-SQL.

Wednesday, July 25, 2012

Using the Datomic free edition in a lein based project

Hopefully I can save a few people some time:

I flailed a bit trying to use the first released version yesterday, but after updating to a newer version (datomic-free-0.8.3343) life is good. Download a recent release and, from inside the unzipped distribution directory, do a local Maven install:

mvn install:install-file -DgroupId=com.datomic -DartifactId=datomic-free -Dfile=datomic-free-0.8.3343.jar -DpomFile=pom.xml
Then set up a lein project.clj file referencing this version:
(defproject datomic-test "1.0.0-SNAPSHOT"
  :description "Datomic test"
  :dependencies [[org.clojure/clojure "1.4.0"]
                 [com.datomic/datomic-free "0.8.3343"]])
Do a lein deps and then, in the datomic-free-0.8.3343 directory, start up a transactor using the embedded H2 database:
cp config/samples/free-transactor-template.properties mw.properties
bin/transactor mw.properties
When you run the transactor it prints the pattern for a connection URI; I used that URI in a modified version of the Datomic developers' Clojure startup example code:
(ns datomic-test.test.core
  (:use [datomic-test.core])
  (:use [clojure.test]))

(use '[datomic.api :only [q db] :as d])
(use 'clojure.pprint)

(def uri "datomic:free://localhost:4334//news")
(d/create-database uri)
(def conn (d/connect uri))

(def schema-tx (read-string (slurp "data/schema.dtm")))
(println "schema-tx:")
(pprint schema-tx)

@(d/transact conn schema-tx)

;; add some data:
(def data-tx [{:news/title "Rain Today", :news/url "", :db/id #db/id[:db.part/user -1000001]}])

@(d/transact conn data-tx)

(def results (q '[:find ?n :where [?n :news/title]] (db conn)))
(println (count results))
(pprint results)
(pprint (first results))

(def id (ffirst results))
(def entity (-> conn db (d/entity id)))

;; display the entity map's keys
(pprint (keys entity))

;; display the value of the entity's community name
(println (:news/title entity))
That was easy. You need to define a schema before using Datomic; here is the simple schema in data/schema.dtm that I wrote, patterned on their sample schema (but much simpler!). Note that the file's contents are a single vector of attribute maps, so the read-string call above reads all of the attribute definitions in one form:
 [

 ;; news

 {:db/id #db/id[:db.part/db]
  :db/ident :news/title
  :db/valueType :db.type/string
  :db/cardinality :db.cardinality/one
  :db/fulltext true
  :db/doc "A news story's title"
  :db.install/_attribute :db.part/db}

 {:db/id #db/id[:db.part/db]
  :db/ident :news/url
  :db/valueType :db.type/string
  :db/cardinality :db.cardinality/one
  :db/doc "A news story's url"
  :db.install/_attribute :db.part/db}

 {:db/id #db/id[:db.part/db]
  :db/ident :news/summary
  :db/valueType :db.type/string
  :db/cardinality :db.cardinality/one
  :db/doc "Automatically generated summary of a news story"
  :db.install/_attribute :db.part/db}

 {:db/id #db/id[:db.part/db]
  :db/ident :news/category
  :db/valueType :db.type/string
  :db/cardinality :db.cardinality/many
  :db/fulltext true
  :db/doc "Categories automatically set for a news story"
  :db.install/_attribute :db.part/db}

 ]


I hope this saves you some time :-)

Saturday, January 21, 2012

Citrusleaf: an interesting (non open source) NoSQL data store

I have been using Citrusleaf for a customer (SiteScout) task. Interesting technology. Maybe it is because I am excessively frugal, but I almost always favor the open source tools (Ruby, Clojure, Java, PostgreSQL, MongoDB, Emacs, Rails, GWT, etc.) that I base my businesses on. That said, I also rely on paid-for software and services (IntelliJ, RubyMine, Heroku, AWS services, etc.), and Citrusleaf looks like a worthy tool because of its speed and scalability, which it gets from Paxos, generous use of memory, efficient multicast (when possible) for communication between nodes in a cluster, and so on.

Wednesday, January 18, 2012

Yes, the DynamoDB managed data service is a very big deal

Just announced today: DynamoDB solves several problems for developers:

  • No administration except for creating database tables (including some decisions, like choosing simple lookup keys or keys with range indices, and whether reads should be consistent)
  • Fast and predictable performance at any scale (but see comment below on the requirement for provisioning)
  • Fault tolerance
  • Efficient atomic counters
The probable hassle for developers that I see is knowing how to provision tables for reasonable numbers of allowed reads and writes per second. When you create a table, one option is to get warning emails when you hit 80% of provisioned capacity; I interpret this to mean that you really had better not go over the capacity that you have provisioned. Amazon needs to know how much capacity you need in order to allocate enough computing nodes for your tables. The capacity that you pay for can be raised and lowered to avoid runtime exceptions when you exceed your provisioned number of reads and/or writes per second.
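To make the provisioning model above concrete, here is a toy Ruby sketch (my own illustration, not part of the AWS SDK) of the three regimes a table can be in: comfortably under capacity, in the 80% warning band, or over provisioned throughput (where requests get throttled):

```ruby
# Toy helper illustrating DynamoDB-style provisioned capacity checks.
# Given a table's provisioned capacity (units per second) and an observed
# request rate, report :ok, :warn (at or above the 80% threshold where
# Amazon can send warning emails), or :throttled (over capacity).
def capacity_status(provisioned_per_sec, observed_per_sec)
  return :throttled if observed_per_sec > provisioned_per_sec
  return :warn if observed_per_sec >= 0.8 * provisioned_per_sec
  :ok
end

p capacity_status(10, 5)   # => :ok        (well under capacity)
p capacity_status(10, 9)   # => :warn      (in the 80% warning band)
p capacity_status(10, 12)  # => :throttled (over provisioned throughput)
```

In the real service the numbers come from the read and write capacity units you set when creating or updating the table; the point is just that you want headroom above your observed request rate.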

The latest AWS Java SDK handles DynamoDB. For Ruby, the latest aws-sdk gem (gem install aws-sdk) supports DynamoDB. I signed up for DynamoDB, looked at the Java example, and wrote a little bit of working Ruby code using the documentation; I had to change the example code slightly to get it to work:

require 'aws-sdk'

# credentials come from the environment
dynamo_db = AWS::DynamoDB.new(
    :access_key_id => ENV['AMAZON_ACCESS_KEY_ID'],
    :secret_access_key => ENV['AMAZON_SECRET_ACCESS_KEY'])
table = dynamo_db.tables.create('my-table', 10, 5)

# wait until the new table's status changes from :creating to :active
begin
  sleep 3
  puts "Waiting on status change #{table.status}"
end while table.status == :creating

# add an item
item = table.items.create('id' => '12345', 'foo' => 'bar')

# add attributes to an item
item.attributes.add 'category' => %w(demo), 'tags' => %w(sample item)
p item

# update an item with mixed add, delete, update
item.attributes.update do |u|
  u.add 'colors' => %w(red)
  u.set 'category' => 'demo-category'
  u.delete 'foo'
end
p item.attributes.to_h

# delete attributes
item.attributes.delete 'colors', 'category'

# get attributes
p item.attributes.to_h

# delete the item and all of its attributes
item.delete
I used the AWS web console to then delete the test table to avoid charges.

DynamoDB is a big deal because, while it is easy enough to horizontally scale web applications and back end business applications, it is a real pain to scale out data storage for session handling and application data. Aside from the cost of the service itself, Amazon is removing these hassles for developers.

I think that in addition to deployments on EC2, DynamoDB will be a very big deal for Heroku users because it gives them another data store option in addition to Heroku's excellent managed PostgreSQL service, MongoHQ, Cloudant, and other third-party data service providers.

Sunday, November 08, 2009

"always on" MongoDB installation on my laptop

I spend a lot of time experimenting with infrastructure software, sometimes for customer jobs and sometimes just because it is fun to learn new things. For non-SQL data stores, I have spent a lot of time in the last year experimenting with and using CouchDB, the AppEngine datastore, Tokyo Cabinet, MongoDB, Cassandra, and SimpleDB. Tokyo Cabinet and SimpleDB store hash values as strings and don't have the great client APIs that the others have, because of the limitations of string-only hash values. That said, for an Amazon-hosted application SimpleDB can be a good choice, and Tokyo Cabinet is lightweight and easy to install and use. Cassandra looks great, and as I have written here before, it is easy to use from Ruby and has great features.

MongoDB has great performance and capabilities similar to Cassandra's. Chris Kampmeier has a great writeup that covers installing MongoDB on OS X, including setting it up as a system service. I followed Chris's directions. A pleasant surprise is that MongoDB has a light footprint, so leaving it running as a service, like I do with PostgreSQL and MySQL, is reasonable.

Along with the mongo and mongo_record gems, MongoDB is an awesome tool, and always keeping it running makes it easy to experiment with. BTW, I also keep Sesame and CouchDB running on my local network for much the same reasons.