My 3 hour weekend project: Fix My Movie Subtitles (fixmysubs.com)
91 points by superasn on Dec 24, 2011 | 41 comments


Being a non-native English speaker, I often like to use subtitles when watching movies. While there are some very good sites like podnapisi, subtitlesource, etc., it is unfortunately still sometimes hard to find matching subtitles, and the sync is usually off.

So to fix this problem I wrote a Perl script for my weekend project, and it was surprisingly easy to port to the web thanks to Twitter's Bootstrap CSS. So here is the site. Even the domain name is merely 1 hour old!

Your questions / comments are welcome. I hope this is useful to people like me who have had trouble finding movie subtitles and also people who are hearing-impaired.


I have used DVDs with foreign subtitles to learn languages. One thing I don't like is that the spoken translation is usually different than the subtitled one. I think it would be a nice learning aid if the two translations could be made to match. Or to at least have the option to put in alternate subtitled translations.


Translated words and phrases often have to be very different than the literal translation. For example: "sack" in English can also mean firing a worker, or plundering a city. The word for "a large bag" in most languages will not have these same extra meanings.

Also, you may even need 3 subtitles:

- transcription of the spoken language: les jeux sont faits

- literal translation: the games are made/done

- correct/useful translation: the chips are down


I'm not 100% sure since I'm not the guy you responded to, but I think his situation is that he's listening to the audio in the same language as the subtitles, and they are different.

I've seen this happen when English subtitles were enabled for an English language movie I've been watching. Sometimes what is transcribed in the subtitles is different than what people are actually saying (though the gist is the same) even though both are in English.

In any case, this is clearly not something easily fixable by a website script.


Good catch. That looks right now you've pointed it out. I was thrown off by the word "translation".


I'm not sure if you're using "translation" for "transcription" here, but I think I'm in the same position.

I've been using foreign movies subtitled in the same language as is spoken as a method for studying languages for years as well, and I really hate it when the transcription is different from the dialogue. The problem seems to be that the transcription is taken directly from the movie/show script, but when filming, actors rarely follow the script word-for-word. Unless the movie has been fansubbed by native speakers, it's quite hard to find subs with the exact transcription.

I initially thought fixmysubs addressed this somehow, but I guess not. Cool project nonetheless.


I use subtitles most of the time. It helps a lot with strange accents and noisy movies. I can watch most things without subs, but I have to turn up the volume and it can get annoying.

I've used some desktop apps that do the same thing your website does. It was really hit or miss and a lot of work. These days if I can't find matching subs, I either wait or watch without.

Instead of searching a dozen websites manually, I highly recommend SubtitleSeeker. It indexes all major subtitle sites and it lets you drill down by language, release, episode, etc.

http://www.subtitleseeker.com/


While I'm sure this would probably violate all sorts of copyright, I'd love a site that could provide alternate dubbed languages (specifically Spanish, German and Portuguese) for movies.

The dubbing is never quite right or simply not included (for the media I'm looking at purchasing, I always ensure that at least Spanish is covered). I'm using television shows and movies whose scripts I have nearly memorized to help with "immersion" between more traditional study sessions (I took many years of Spanish and gave up ... this hack wasn't mine, it was mentioned by a coworker and it turned out to be a very clever one). In the US, it's easy to find DVD/BD versions that include Spanish (and, to a lesser extent, French, which I'm not interested in), but they rarely include anything else (and most of the stuff I want to watch includes neither).

This service is intriguing to me. I don't routinely download movies, and I'm unfamiliar with the .srt and .sub formats, but I'm wondering if this could help with reading in another language the same way it has helped with listening/comprehension. It certainly seems like your site is solving a problem I've seen when transcoding DVDs ... it used to be audio mismatch (largely a problem of the past with modern container formats); now it sounds like the same pain point exists with subtitles.

Well done.


http://www.yabla.com might be what you are looking for -- it's great for learning via subtitled foreign clips. The content isn't dubbed, however; rather, it's from countries that speak the language natively.


http://subscene.com/ and a few other sites offer subtitles in many languages for pirated videos. No doubt it violates all sorts of copyright.



Nice idea -- more automation would be awesome.

I wrote a simple script at one point for dealing with constant offsets by doing simple analysis on the audio track of a movie and the subtitle file to match up peaks. It worked...okay, although I was using very simple features.

It would be great to make it more robust, although I don't know if there's an easy way to upload some audio from a movie to a website (that wouldn't be a lot of work for the user).
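
A minimal sketch of the kind of constant-offset matching described above, not the commenter's actual script. It assumes the audio has already been reduced to a loudness value per fixed-size time bin (`energy`, hypothetical) and that the subtitle file has been parsed into `(start, end)` pairs in seconds (`cues`, hypothetical). It then tries every lag in a window and keeps the one where the subtitles' on-screen time lines up best with the loud parts of the audio.

```python
import math


def subtitle_activity(cues, n_bins, bin_s):
    """Return 1.0 for every time bin in which some cue is on screen, else 0.0."""
    activity = [0.0] * n_bins
    for start, end in cues:
        first = max(0, int(start / bin_s))
        last = min(n_bins, math.ceil(end / bin_s))
        for i in range(first, last):
            activity[i] = 1.0
    return activity


def best_offset(energy, cues, bin_s=0.1, max_shift_s=30.0):
    """Estimate the constant shift (seconds) to add to every subtitle time.

    energy: loudness per bin of the audio track (assumed precomputed).
    cues:   list of (start, end) subtitle times in seconds.
    """
    n = len(energy)
    activity = subtitle_activity(cues, n, bin_s)
    mean = sum(energy) / n
    centered = [e - mean for e in energy]  # quiet bins now count against a lag
    max_lag = int(max_shift_s / bin_s)

    best_lag, best_score = 0, float("-inf")
    for lag in range(-max_lag, max_lag + 1):
        score = 0.0
        for i, active in enumerate(activity):
            j = i + lag
            if active and 0 <= j < n:
                score += centered[j]
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag * bin_s
```

Loudness is a crude stand-in for "someone is speaking", so music or explosions will mislead it; that is probably the kind of weakness meant by "very simple features".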


I also miss captions (on Netflix for TV; on other devices, certain movies have them).

My wife and I used to (and still do) learn quite a lot of new words from them, as well as how to pronounce and understand certain words.

Lately it was very useful for us, as we can turn down the volume, and not disturb our son while he's sleeping. Just a notch up & down for music/sounds, and reading the rest of the movie.


Nice weekend project!

My current workflow to sync subtitles is to find one sentence in the beginning of the movie and one near the end, and search the subtitle text for these sentences. The four timestamps are then run through the Perl program subs (see http://search.cpan.org/~karasik/Subtitles-1.03/subs).

Perhaps using a search-based interface, instead of letting the user pick from all lines, would be a nice extension.
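
The arithmetic behind the two-sentence workflow above can be sketched as a straight line through two points. This is only an illustration of the idea, not what the `subs` program does internally; the function names and the example numbers are made up.

```python
def two_point_fit(sub_a, video_a, sub_b, video_b):
    """Return (scale, offset) such that video_time = scale * sub_time + offset.

    sub_a / sub_b:     times (seconds) of two lines as written in the subtitle file.
    video_a / video_b: times (seconds) at which those lines are actually spoken.
    """
    if sub_a == sub_b:
        raise ValueError("the two reference lines must have different times")
    scale = (video_b - video_a) / (sub_b - sub_a)
    offset = video_a - scale * sub_a
    return scale, offset


def retime(sub_time, scale, offset):
    """Map one subtitle timestamp (seconds) onto the video's timeline."""
    return scale * sub_time + offset


# One line near the start and one near the end (hypothetical values).
scale, offset = two_point_fit(100.0, 101.0, 5100.0, 4901.0)
corrected = retime(2600.0, scale, offset)
```

Picking the two sentences far apart matters: a small error in either measured time is divided by the distance between them when computing the scale.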


Hmm, good solution for the times when you can't find subtitles that match. Often VLC's subtitle sync works though (lets you adjust in hundreds of ms each way).


Thank you :) Yes, there is an option in KMPlayer too (for subtitle fps conversion and time shifting) but sometimes trial and error just doesn't seem to work for me. So, I wrote this script instead (it was good fun).

So far I've tried many different movies, deliberately downloading subtitles with the wrong FPS and time shift and then fixing them with the tool, and it has worked as it should :)


It's been a while since I last used third-party subtitles for a movie, but as I recall, it rarely happens that all captions are off by a constant offset. Usually the desynchronization itself stems from a movie version having longer gaps between scenes, for instance. In that case, this won't be much help, but otherwise, good job.


That's a framerate issue, e.g. 23.976 frames/s vs. 25 frames/s. Easily correctable, not sure if this service can handle that though.


Yes, it can automatically handle all frame-rate issues with time-shifting. What it can't handle is if there is an irregular break in between (say extra subtitles in between, extended versions, etc).
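
For reference, a frame-rate fix with an optional time shift comes down to rescaling every timestamp. The sketch below is an assumption about how such a conversion could look for .srt text, not the site's Perl code; `convert_fps`, `subs_fps` and `video_fps` are hypothetical names. A frame shown at time t in a 25 frames/s encoding appears at t * 25 / 23.976 in a 23.976 frames/s encoding, which gives the scale factor.

```python
import re

# Matches SRT timestamps such as 01:02:03,456
TIMESTAMP = re.compile(r"(\d+):(\d{2}):(\d{2}),(\d{3})")


def to_ms(match):
    """Convert a matched HH:MM:SS,mmm timestamp to milliseconds."""
    h, m, s, ms = (int(g) for g in match.groups())
    return ((h * 60 + m) * 60 + s) * 1000 + ms


def from_ms(total_ms):
    """Format milliseconds as HH:MM:SS,mmm, clamping negatives to zero."""
    total_ms = max(0, round(total_ms))
    s, ms = divmod(total_ms, 1000)
    m, s = divmod(s, 60)
    h, m = divmod(m, 60)
    return f"{h:02d}:{m:02d}:{s:02d},{ms:03d}"


def convert_fps(srt_text, subs_fps, video_fps, shift_ms=0):
    """Rescale all timestamps from subs_fps timing to video_fps timing."""
    scale = subs_fps / video_fps
    return TIMESTAMP.sub(
        lambda m: from_ms(to_ms(m) * scale + shift_ms), srt_text
    )


line = "00:00:23,976 --> 00:00:47,952"
print(convert_fps(line, subs_fps=25, video_fps=23.976))
```

The regular expression rewrites anything that looks like a timestamp, including one quoted in dialogue; a stricter version would only touch the timing lines. As the reply above notes, a single scale and shift cannot repair files with irregular breaks such as extended cuts.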


Yes, that too, but that's not what I had in mind. I recall having to adjust the timing after a major block of footage, not permanently as would be the case with a mismatched framerate. Again, it might be a rare problem.


I had the same need and found this a couple of years ago:

code.google.com/p/subeditor/downloads/list

Have been using this since then. It seems to use the first two times to calculate the difference in speed, while from the third onward it just adjusts the starting point, keeping the same speed.


Question: How did you know a "well" CSS class exists in Twitter Bootstrap? I cannot find it in the documentation, but it's in your code and it's in the Bootstrap file!


I found it in one of their demo applications. They hide a lot of stuff in there.


Got a link to their demo apps? I just have the 3 html examples linked in from the main doc sheet.


I also found that the Twitter Bootstrap documentation isn't complete. Using Chrome's Web Inspector to look at interesting elements of the Twitter Bootstrap website is a nice addition to its documentation.


There are sites that do this. I don't remember which, but one of the sites where you can download subtitles already has a little JavaScript app that does the same.


Could you add some screenshots to explain the process?

I'll test it on my next subtitle issue :)


Yes, I am hoping to add a 30 second video to explain the whole process. Basically, you upload an srt file in the first screen and then match any 3 subtitle lines to their correct times (as determined from watching the video).

Based on that information, the script calculates the discrepancy in the frame rate or time shift by comparing it with the original file and then fixes that for you (it also gives you two different versions if more than one solution fits).
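
One plausible way to turn three matched lines into a correction is a least-squares line fit, sketched below. This is a guess at the general approach, not the site's actual script; `fit_line` and the sample numbers are hypothetical. The slope is the speed (frame-rate) factor, the intercept is the time shift, and the worst residual shows whether a single line explains all three points.

```python
def fit_line(pairs):
    """Least-squares fit of video_time = scale * sub_time + shift.

    pairs: list of (sub_time, video_time) in seconds, at least two distinct
           sub_time values.
    Returns (scale, shift, worst_error), where worst_error is the largest
    absolute miss in seconds over the given pairs.
    """
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    if sxx == 0:
        raise ValueError("need at least two different subtitle times")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    scale = sxy / sxx
    shift = mean_y - scale * mean_x
    worst_error = max(abs(scale * x + shift - y) for x, y in pairs)
    return scale, shift, worst_error


# Three lines matched by hand (hypothetical values, in seconds).
matches = [(60.0, 64.0), (1800.0, 1878.4), (5400.0, 5632.0)]
scale, shift, worst_error = fit_line(matches)
```

A scale near 1.0 means a pure time shift; a scale near 25 / 23.976 (about 1.0427) or its inverse points to a frame-rate mismatch. A large `worst_error` suggests the irregular-break case, which a single scale and shift cannot repair.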


Are you going to have subtitles for your 30 second tutorial video? ;-)


I had the reverse idea. A plugin for VLC that takes the MD5SUM of whatever file you're watching and looks for the matching/appropriate subtitle file and autoloads it.


SMPlayer [1] searches opensubtitles.org with a simple hash [2].

[1] http://smplayer.sourceforge.net/

[2] http://trac.opensubtitles.org/projects/opensubtitles/wiki/Ha...


Fantastic. Thanks for this!


Not to pick nits, but I really don't want to compute the MD5SUM of a 4GB DVD before I can play it. Maybe hash the header or the metadata (if there is any) instead?


Maybe MD5 the first hundred and last hundred MB? Add 10 samples from the middle of the movie, taken at equal distances calculated based on the size of the file.

Having said that, I don't mind having to compute 4 GB MD5s, since that also helps me check the integrity of the file at a later date.
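
The sampling idea above could look roughly like the sketch below. The 100 MB edges and 10 samples come from the comment; the size of each middle sample (`sample_bytes`, 1 MB here) is not specified there and is an assumption, as are the function names.

```python
import hashlib
import os


def _feed(digest, f, offset, length, block=1 << 20):
    """Feed `length` bytes starting at `offset` into the digest, block by block."""
    f.seek(offset)
    remaining = length
    while remaining > 0:
        chunk = f.read(min(block, remaining))
        if not chunk:
            break
        digest.update(chunk)
        remaining -= len(chunk)


def sampled_md5(path, edge_bytes=100 * 1024 * 1024, samples=10,
                sample_bytes=1024 * 1024):
    """MD5 over the head, evenly spaced middle samples, and the tail of a file."""
    size = os.path.getsize(path)
    digest = hashlib.md5()
    with open(path, "rb") as f:
        if size <= 2 * edge_bytes:
            # Small file: the head and tail would overlap, so hash everything.
            _feed(digest, f, 0, size)
        else:
            _feed(digest, f, 0, edge_bytes)
            step = (size - 2 * edge_bytes) // (samples + 1)
            for i in range(1, samples + 1):
                _feed(digest, f, edge_bytes + i * step, sample_bytes)
            _feed(digest, f, size - edge_bytes, edge_bytes)
    return digest.hexdigest()
```

A sampled hash like this identifies a file quickly but says nothing about the bytes it skipped, so it does not replace a full MD5 for the integrity check mentioned above.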


OpenSubtitles hashes only the initial and final 64 kB.
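
A sketch of a hash in that style, for illustration. The comment above only states that the first and last 64 kB are used; the rest (adding the file size, summing the chunks as little-endian 64-bit integers, wrapping at 64 bits, printing 16 hex digits) is my understanding of the commonly described scheme and should be checked against the OpenSubtitles wiki page linked earlier in the thread before relying on it. `edge_hash` is a made-up name, and rejecting small files is a simplification of this sketch.

```python
import os
import struct

CHUNK = 64 * 1024  # bytes read from each end of the file


def edge_hash(path):
    """Return a 16-hex-digit hash built from the file size and both 64 kB edges."""
    size = os.path.getsize(path)
    if size < 2 * CHUNK:
        raise ValueError("file too small for this sketch (needs at least 128 kB)")

    total = size
    with open(path, "rb") as f:
        for offset in (0, size - CHUNK):
            f.seek(offset)
            block = f.read(CHUNK)
            # Interpret the chunk as unsigned little-endian 64-bit integers.
            total += sum(struct.unpack(f"<{CHUNK // 8}Q", block))

    # Keep only the low 64 bits, as a fixed-width sum would.
    return f"{total & 0xFFFFFFFFFFFFFFFF:016x}"
```

Because only 128 kB are read regardless of file size, this avoids the cost of hashing a whole 4 GB file that was raised earlier in the thread.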


Another project that uses a hash to search for subtitles on opensubtitles.org is Periscope (see http://code.google.com/p/periscope/), a Python command line program that searches several subtitles websites.


I made that project :) It uses hashes for OpenSubtitles and TheSubDB. It uses file names for other sources (Podnapisi, SubtitleSource, ...)


Thanks a lot for it! I really like it and use it almost daily.


I use Totem's "Download subtitles", pick the one with the best rating, click "Play with subtitles", and most of the time it works out perfectly.


My movie player should have this integrated. Nice idea.


Copyright 2012? Really?



