Just released: my utility for people reading 4pda.ru via Babelfish and Google Translate
4pda.ru is, along with XDA-Developers, one of the best sources of Windows Mobile-related hacking information. It was there that I found, for example, the hacked 1023 TAO MIDlet Managers, on which Risidoro then built his latest (1036) TAO releases at XDA-Dev, along with a lot of other info never published in English. As I’m neither a businessman traveling all around Russia nor someone who reads, watches or listens to Russian-language literature, movies or music, I would never have thought that my knowledge of Russian would turn out to be useful in my professional life. Now it has :)
Should you not know Russian, I’ve created a tool that greatly helps in reading 4pda forums. For this, you will, of course, want to use Babelfish or Google Translate to translate these pages. Just enter the URL (in the case of the MIDlet thread, http://4pda.ru/forum/index.php?showtopic=1333 ) in the Translate a Web page text field, select “Russian to English” in the lower “Select from and to languages” drop-down list and click the bottom Translate button.
After a while, you’ll notice that not everything is displayed: after about 100 kbytes of source HTML, posts are cut off and there is no way to get the remaining part translated, as can also be seen in THIS screenshot (see the “<<<<<<<< snip >>>>>>>>” at the bottom, which shows that Babelfish won’t translate any more). The situation is pretty similar with Google, which, after a while, tends to switch back to the original language.
The wrong approaches you can take in these cases are as follows:
- you cut and paste the text into the upper, direct text input field of Babelfish / Google T. This is a very awkward and slow solution because you can only have some 2-3 kbytes of text translated at a time
- save the original Web page to your local PC, edit its HTML source (cut out the first, say, half of the original page) and upload the edited version to any Web server so that it becomes visible to Babelfish / Google T. This also involves a lot of additional work.
Unfortunately, the print mode (clicking the “Версия для печати” link, which takes you HERE: HUGE page!) doesn’t help much. While it does, to some degree, clean up the code and remove the stuff that is unnecessary for most quick translations (avatars, number of posts, links to other pages, ads), it returns the thread as one big file, which is, with longer threads, well above 100 kbytes. This means you won’t see most of the newer posts translated either.
The right approach, of course, is using my tool ;-). I’ve created a program that automatically downloads the contents of an entire thread, sliced into small HTML pages whose names follow a convention that makes it easy to autogenerate links to them. What is more, I’ve also released its source (available HERE) so that you can see how it works. You’re also free to modify it to download other forum content in a much more Babelfish-friendly (and, for that matter, also PDA / mobile-friendly) format.
Usage
- if you haven’t already done so, install a Java environment on your desktop PC (free JDK download HERE)
- download the above-mentioned source file
- enter the “javac ForPDAruSimplifier.java” command in the same directory (from inside, say, Total Commander) so that the source is compiled
- enter the following command:
java ForPDAruSimplifier 30 1050 1333 "4pda-"
where the parameters are as follows:
- 30 is pretty much fixed for 4pda.ru (but different with other sites; this is why I’ve made it an easily modifiable parameter) – the number of posts displayed on a forum page
- 1050 is a product of 30 and 35. 30 is the above-introduced posts-on-a-page parameter; 35 is the number of thread pages. (Now, the MIDlet thread has 35 pages.)
- 1333 is the number of the thread itself; it can be very easily found. For example, the 1333 for the MIDlet thread can be very easily spotted if you take a look at the URL of the thread: http://4pda.ru/forum/index.php?showtopic=1333. Yes, it’s the number after “showtopic=”.
- Finally, "4pda-" instructs the tool to save the target files with the “4pda-” filename prefix. You can use any other prefix.
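To make the arithmetic behind these parameters easier to follow, here is a minimal sketch of how the four arguments could map to per-page URLs and output file names. It is not the actual ForPDAruSimplifier source: the class name ThreadPagePlan and its helper methods are made up for illustration, and the "&st=" post-offset parameter is an assumption based on how Invision-style forums usually page through a topic.

```java
public class ThreadPagePlan {
    // Assumption: the forum pages through a topic with an "st" post offset.
    // This pattern is NOT taken from the released source.
    static final String TOPIC_URL = "http://4pda.ru/forum/index.php?showtopic=";

    /** Number of forum pages needed for totalPosts, rounding up. */
    static int pageCount(int postsPerPage, int totalPosts) {
        return (totalPosts + postsPerPage - 1) / postsPerPage;
    }

    /** URL of the zero-based page pageIndex of the given topic. */
    static String pageUrl(int topicId, int postsPerPage, int pageIndex) {
        return TOPIC_URL + topicId + "&st=" + (pageIndex * postsPerPage);
    }

    /** Output file name, numbered from 01 so that links are easy to generate. */
    static String fileName(String prefix, int pageIndex) {
        return String.format("%s%02d.html", prefix, pageIndex + 1);
    }

    public static void main(String[] args) {
        // Fall back to the values of the example command when no arguments are given.
        boolean hasArgs = args.length >= 4;
        int postsPerPage = hasArgs ? Integer.parseInt(args[0]) : 30;
        int totalPosts = hasArgs ? Integer.parseInt(args[1]) : 1050;
        int topicId = hasArgs ? Integer.parseInt(args[2]) : 1333;
        String prefix = hasArgs ? args[3] : "4pda-";

        int pages = pageCount(postsPerPage, totalPosts);
        for (int i = 0; i < pages; i++) {
            System.out.println(pageUrl(topicId, postsPerPage, i) + " -> " + fileName(prefix, i));
        }
    }
}
```

With the example values (30, 1050, 1333, "4pda-"), this plan covers 35 pages, the first being saved as 4pda-01.html and the last as 4pda-35.html.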
After running the tool, upload the resulting files (in this case, 4pda-01.html … 4pda-35.html) to the Web so that Babelfish / Google T. can access them. You can then start entering the new addresses into Babelfish / Google. An even better and easier approach is creating a link file, where all you need to do is click the links in order with Ctrl (IE) or Ctrl-Shift (Opera) held down. The latter makes sure the links are opened in a background tab. Just give the following links a try to see this for yourself. For Babelfish, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38; for Google Translate, 01, 02, 03, 04, 05, 06, 07, 08, 09, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38. (Downloadable, original source of the above HTML links HERE)
See the difference? None of the original pages were cut in half and all posts are perfectly readable.
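Because the sliced files follow a predictable naming convention, such a link file can be generated rather than typed by hand. The following is only a sketch of one way to do it: the class name LinkFileSketch is made up, and both the translator URL prefix and the host the files are uploaded to are placeholders that you need to replace with the real addresses you use.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

public class LinkFileSketch {
    /** URL-encodes the page address so it can be passed as a query parameter. */
    static String encode(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException(e); // UTF-8 is always available
        }
    }

    /**
     * Builds an HTML page with one numbered link per sliced file.
     * translatorPrefix: everything the translator URL needs before the page address (placeholder).
     * hostBase: where the sliced files were uploaded (placeholder).
     */
    static String buildLinks(String translatorPrefix, String hostBase, String filePrefix, int pages) {
        StringBuilder html = new StringBuilder("<html><body>\n");
        for (int i = 1; i <= pages; i++) {
            String name = String.format("%s%02d.html", filePrefix, i);
            // Escape '&' so the href stays valid HTML.
            String target = (translatorPrefix + encode(hostBase + name)).replace("&", "&amp;");
            html.append(String.format("<a href=\"%s\">%02d</a>\n", target, i));
        }
        return html.append("</body></html>\n").toString();
    }

    public static void main(String[] args) {
        // Both addresses below are hypothetical placeholders, not real services.
        System.out.print(buildLinks("http://translator.example/translate?from=ru&to=en&url=",
                "http://myhost.example/", "4pda-", 35));
    }
}
```

Save the output as an HTML file, open it in your browser and click the numbered links in order as described above.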
Note that I’ve modified the code so that it doesn’t include local file attachments (my code removes the links). If you know 4pda.ru (or most Russian PDA sites, for that matter) and their views on international copyright issues, you know why I’ve chosen to do so.
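For those who want to adapt the tool to other forums, here is a rough sketch of how such link removal could be done with a regular expression. It is not the code from the released source: the class name AttachmentLinkStripper is made up, and the "act=attach" marker used to recognize attachment links is only an assumption about what these URLs look like, so check the actual HTML of the forum you are processing.

```java
import java.util.regex.Pattern;

public class AttachmentLinkStripper {
    // Assumption: attachment links carry "act=attach" in their href.
    // Matches the whole <a ...>...</a> element, across line breaks, ignoring case.
    private static final Pattern ATTACHMENT_LINK = Pattern.compile(
            "<a\\b[^>]*href\\s*=\\s*[\"'][^\"']*act=attach[^\"']*[\"'][^>]*>.*?</a>",
            Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the HTML with all matching attachment links removed. */
    static String strip(String html) {
        return ATTACHMENT_LINK.matcher(html).replaceAll("");
    }

    public static void main(String[] args) {
        String sample = "<p>See <a href=\"index.php?act=attach&id=5\">game.jar</a> and "
                + "<a href=\"http://example.com/\">site</a></p>";
        System.out.println(strip(sample));
    }
}
```

Ordinary links are left untouched; only anchors whose address contains the assumed marker are dropped.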




Glad you've found it useful :)
BTW, thanks for pointing to Google Translate - it is indeed better than Babelfish and has real advantages over it:
- it doesn't try to convert the characters of unknown words into English pronunciation (for example, j -> zh (as in JAR), e -> je (as in Esmertec), c -> k and b -> p (as in CAB, CPC) etc.)
- quick look
- doesn't mess up numbers at all (unlike Babelfish)
- doesn't put a . after \ and _ characters.
Note that the size restrictions seem to depend on the actual server load. In general, you'll get approximately the same amount of text translated by Google as by Babel (the rest being left in the original Russian). You'll, therefore, still want to use my above-released program to "clean up" 4PDA forums.
Of course, it doesn't know new, but otherwise grammatically OK, IT / slang words like "девайс" ("device", written in Russian as pronounced). But that was also a problem(?) with Babelfish.
Sometimes, however, it delivers decidedly worse results - for example, the title "Управление в java-играх (мидлетах). Известные проблемы и некоторые способы их решения" is translated as "Control in java- games (midletakh). Known problems and some methods of their solution" by Babel and as "Office of java games (MIDlets). Known issues, and some ways to address them." by Google. As can clearly be seen, the first word ("управление", which is "administration, governance, management") is pretty much misinterpreted in this context by Google, while it is well translated by Babel. Still, Google's overall presentation uses much better English ("Known problems and some methods of their solution" (Babel) vs. "Known issues, and some ways to address them" (Google)) and it generally knows a few more new words (see MIDlets vs. midletakh). Incidentally, the latter also shows that Babel doesn't even try to sound / look similar to the source Russian: for example, instead of the much-easier-to-understand "ah" ("ах", that is, the Plural Prepositional case in all three genders) postfix, it uses "akh", which is a real pain in the back for people who know Russian grammar but prefer reading these English "translations" instead.
Yes, it should work flawlessly under any JVM capable of running standalone Java applications - that is, the very old JEODEK, CrEme or MySaifu.
I'll try to make a MIDlet version of it.