I am very pleased to be helping the Common Crawl Organization

Originally published January 13, 2014

I am setting aside some of my time to volunteer with CommonCrawl.org.

Much of the world's information is now digitized and on the web. Search engines give people only a tiny view of the web, sort of like shining a low-powered flashlight around a forest at night. The Common Crawl provides data from billions of web pages as compressed web archive files in Amazon S3 storage, which lets individuals and organizations inexpensively access much of the web for whatever information they need: like turning the lights on :-)

The crawl data is now stored in a different file format, and my first project is writing programming examples and how-to material for working with this new format.
