
One approach is to put the text files in Amazon S3 and write MapReduce jobs that you run with Elastic MapReduce. I did this a number of years ago for a customer project; it was inexpensive and a nice platform to work with. Microsoft, Google, and Amazon all have data warehousing products you can try if you don't want to write MapReduce jobs.
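
For anyone who hasn't written one, here is a rough sketch of what a MapReduce job over text looks like, using word count as the stand-in task. This is not the code from that project; the function names (map_lines, reduce_pairs, run_local) are made up for illustration, and run_local only simulates the shuffle/sort step in a single process so you can try the logic before paying for a cluster.

```python
from itertools import groupby


def map_lines(lines):
    """Mapper: emit a (word, 1) pair for every whitespace-separated token."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reduce_pairs(pairs):
    """Reducer: sum the counts for each word.

    The input must already be sorted by key, which is what the
    shuffle phase provides in a real MapReduce job.
    """
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        yield word, sum(count for _, count in group)


def run_local(lines):
    """Simulate map -> shuffle/sort -> reduce in one process (illustration only)."""
    return dict(reduce_pairs(sorted(map_lines(lines))))
```

On a real cluster the mapper and reducer would typically be separate scripts reading stdin and writing tab-separated key/value lines, but the shape of the logic is the same.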

That said, if you are only processing 2 GB of text, you can often do that in memory on your laptop. This is especially true if you are doing NLP on individual sentences or paragraphs.
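
As a minimal sketch of the laptop route: iterate over the file one line at a time, split each line into sentences, and process each sentence independently, so only the running results stay in memory. The regex splitter below is a deliberately naive assumption (it breaks after ., !, or ? followed by whitespace and will mishandle abbreviations); a real NLP library would do better. The names iter_sentences and token_counts are hypothetical.

```python
import re
from collections import Counter

# Naive sentence boundary: whitespace that follows ., !, or ?
_SENTENCE_END = re.compile(r"(?<=[.!?])\s+")


def iter_sentences(lines):
    """Yield sentences one at a time from an iterable of text lines."""
    for line in lines:
        for sentence in _SENTENCE_END.split(line.strip()):
            if sentence:  # skip blanks produced by empty lines
                yield sentence


def token_counts(lines):
    """Count lowercased whitespace tokens, one sentence at a time."""
    counts = Counter()
    for sentence in iter_sentences(lines):
        counts.update(sentence.lower().split())
    return counts
```

Because an open file object is itself an iterable of lines, you can pass it straight to token_counts without loading the whole file first.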


