Jeff Allen wrote:
Public release of Haitian Creole language data by Carnegie Mellon
The Language Technologies Institute (LTI) of Carnegie Mellon
University's School of Computer Science (CMU SCS) is making publicly
available the Haitian Creole spoken and text data that we have
collected or produced....
http://www.speech.cs.cmu.edu/haitian/
FarkasAndras wrote:
I'd have expected Google to be the first out of the blocks, and certainly not this fast.
I tip my hat to Redmond.
Because of my background in having helped develop the previous speech-to-speech MT system at CMU 12 years ago (many of my papers that describe the system are at: https://www.box.net/shared/bz4sq9jx88), I have been receiving emails and phone calls from so many MT developers over the past 2 weeks. They are all screaming to have data to build such systems.
So I've spent a significant amount of time with CMU this week identifying and verifying all of the data that is archived and helping progressively make it available on the CMU page. This is to promote the innovation of MT systems with such data. More data is coming.
And there are many MT development teams in many labs scrambling right now to get various types of MT systems together.
Just having one out "first" doesn't mean that it's the best, or that it fits a specifically identified domain/subject field/topic, or a particular translation/interpretation use case.
But lots is happening and more will appear.
Jeff