Scribd in HTML (scribd.com)
315 points by ZeroGravitas on May 7, 2010 | 104 comments


These folks just went from my list of websites I dread visiting to "damn, that is some sweet technology".


And they probably wouldn't have been able to pull it off at this scale had they not first started a site you dreaded.


It may be "sweet" technology. I just haven't found out yet what the purpose of this or the previous technology is.


This is more than a 'sweet' technology. It is a big step forward for the web, because it means that all the documents uploaded to Scribd are now fully indexable!

They're now all accessible by crawlers, and will start showing up in search engine results. I believe the vision they craft (all the world's documents living online as part of the fabric of the web) is a very powerful one.


It's a big step forward only in comparison to what Scribd did before. The Web has always been perfectly capable of showing indexable documents, even PDF documents.


For a more standard document example, with complex math, charts etc. see:

http://www.scribd.com/documents/5/Paper-5


Wouah, that one was really awesome. And in full screen mode, it's as good as reading a real pdf. Awesome work scribd. Performance wise, Firefox seems to have a little bit of a hard time.


My Firefox died for a while on this one as well (3.6.3).


Excellent (on my own build of 64 bit Firefox on Linux). I will now start using Scribd for more than just downloading stuff on it.

One note about using NoScript: I can toggle on and off the annoying Facebook popup by hitting the S button it points to, but I must enable a Facebook site to banish it with its X in the corner widget.


Is anybody else having problems reading that in full screen? Under Firefox 3.6 and Windows 7 it consistently locks up my browser after scrolling 4 or 5 pages.


Awesome.

One note: is there a way to pin the bottom toolbar as hidden? It's kind of annoying to read a document with it popping in and out every 30 seconds...


We're probably going to do something to improve the toolbar hiding interaction. Thanks for the feedback.


Also, using the arrows on the keyboard to move to another page would be awesome.


If you change the View Mode to Slideshow, this allows you to change slides via the left and right arrow keys.


Good to hear! I don't have a problem with the toolbar hiding/showing at all - I just noticed it continuously popping up as I scroll down in Chrome / OSX. It seemed tied to the scroll action, and tended to distract a little. Good to know you guys are on it - you guys switching to HTML5 is a boon!


Works really well here on FF3.6. This is REALLY awesome!


Another thing is that scrolling is kinda sluggish in Chrome on my 2007 Mac Mini. Google docs PDF reader seems to scroll much more smoothly. It's still a big improvement over the flash version.


This is a major coup for Scribd. I think it's a service that people begrudgingly and painstakingly used. But now it's just joyful, and kind of sexy. But as someone points out - Scribd is a bridge and if the things that they bridge to ever contain the ability to embed or convert to HTML5, what will they do?


Their employees will retire, to live forever on the free pizza and beer which every geek will gratefully buy them in exchange for compelling every single popular proprietary document format to support effort-free export to standard HTML.

Or, you know, they'll invent something else. It's a lot to ask of a business plan that it be good for twenty years.


Reminds me of the Google Chrome cartoon: http://www.google.com/googlebooks/chrome/ Even the colors, characters and "objects with faces". Not necessarily a bad thing, just curious.


I was thinking the same thing. However, it made me respect them just a little bit more. They could have just written a blog post and said... "We're no longer using Flash," but instead, they took the time to build something that a) illustrated very well their new technology, and b) actually entertained some of us. Valiant effort.


You called it. That was exactly what inspired this presentation. That Chrome cartoon was awesome - made me an instant Chrome advocate.


This is really nice work guys. It's slightly disorienting to see HTML5 in real use. I kept looking for a scrollbar until I realized it was just a regular web page; the scrollbar is my browser's scrollbar. (Though I think part of it was the positioning of the toolbar at the bottom; it looks like the page is split by the toolbar and it's not really.) The rendering in-browser is beautiful, and much more usable than a pdf viewer on my system.


I applaud their effort, but when I look at real documents in both formats (e.g. http://www.scribd.com/documents/5/Paper-5 like ZeroGravitas posted here), the HTML5 version looks very noticeably worse than Flash (tested in Chrome 5, Safari 4, FF 3.6). Fonts are rough or missing, kerning is shot to hell, and layout looks like it was performed on a shake table. It resembles the output of a poor PDF viewer.

I wonder if this was the right time to roll out the HTML5 format.


I don't know if we're seeing the same thing: http://people.cs.vt.edu/~scschnei/pictures/scribd.png

That's with FF 3.6.3 on a Mac. That looks pretty damned good to me. I like that much better than the Flash version, partly because I think it renders better and partly because it's easier to navigate. The only problem I see is that the text could be darker so that it would pop more.


Yes, good point. I am on Windows XP; you are on Mac and SURPRISE! things look much better for you. I should have mentioned OS in my original post.

http://ftso.net/scribd-chrome5-winxp.gif shows what I am seeing (zoomed in for a little more clarity; it still looks bad at regular size).


Thanks for the feedback. Win XP has a different font renderer than Win 7, which in turn is different from OS X's. The fonts look best in Win XP with ClearType enabled. We are going to be working on improving the font quality in XP in all modes.

It looks like you don't even have font smoothing enabled. Enabling that will make a huge improvement, ClearType or not.


Thanks for the tips! I never spent much time making XP look better, I mostly just turn off all the piss-me-off stuff the first day I run the machine, and forget it.

I'm glad it was a mixture of XP and PEBKAC causing the output to look bad. Like I hinted before, I am in complete support of your decision to move to HTML5, and I am tickled to see it look so good (on other people's screens ;).

EDIT: Wow, yes, ClearType smoothing really takes care of the font appearance.


Have you tried turning anti-aliasing on? It’s a system level control in Display Properties I believe (though it’s been many years since I used Windows in anger).


I'm on Ubuntu with Chrome:

http://i.imgur.com/qF0xA.png


Can you do screenshot comparisons? I am not seeing these problems and I don't know if it has to do with my browser/OS or because I just don't notice these things.


Also super laggy when scrolling fast over the content. I never see that kind of scroll lag and it was quite apparent while viewing your provided ZeroGravitas link.

I have to say going from a "viewer" to having all the content on the same page is a huge improvement.


In order to get that level of display control, they're doing some insane things under the hood with the HTML. It's a tag fiesta in there (understandably -- there's no other way to do it), and complex HTML can put some real demands on the processor -- but nothing compared to the same content in Flash.
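
To make the "tag fiesta" concrete, here is a minimal sketch of the general idea: one absolutely positioned span per run of text. The `TextRun` shape and the `renderPage` function are hypothetical names made up for illustration, and this is only a guess at the approach, not Scribd's actual markup or converter.

```typescript
// Hypothetical shape for one run of text extracted from a page (an assumption,
// not Scribd's real format).
interface TextRun {
  text: string;
  x: number; // left offset in pixels
  y: number; // top offset in pixels
  fontSize: number; // in pixels
  fontFamily: string;
}

// Escape the characters that are significant in HTML text and attribute values.
function escapeHtml(s: string): string {
  return s
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");
}

// Emit one absolutely positioned <span> per run, inside a relatively
// positioned page <div> that fixes the page dimensions.
function renderPage(runs: TextRun[], width: number, height: number): string {
  const spans = runs.map(
    (r) =>
      `<span style="position:absolute;left:${r.x}px;top:${r.y}px;` +
      `font-size:${r.fontSize}px;font-family:${escapeHtml(r.fontFamily)}">` +
      `${escapeHtml(r.text)}</span>`
  );
  return (
    `<div style="position:relative;width:${width}px;height:${height}px">` +
    spans.join("") +
    `</div>`
  );
}
```

With a span per line or per word, a dense page quickly turns into hundreds of elements, which would be consistent with the scrolling cost people are reporting.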

In Safari scrolling that document is very smooth. So much better than the Flash Scribd. Kudos.


Yeah. Even if scrolling is laggy, it's arguably better than having to click "next" for every page.


There is a lot going on under the hood, and we have put substantial work into optimization for things like scrolling through content. We know we can still do better, though, so expect to see things even more performant in the coming weeks!


I was quite impressed by how good it was. There are problems but they are at least not immediately obvious. I’m still not sure, though, whether it’s the right thing to do.


The HTML5 fonts look fine for me (FF 3.6.3, OS X 10.5), but the text overflows the right margin rather badly, and it persists even after resizing. I hope they can work out the kinks.


Mac/Safari 4: looks perfect. Better than the boxed Flash version.


Fine job, both technically and graphically. Smooth on Chrome. The only aesthetic change I'd make is that the up/down buttons could jump, or scroll faster - at first I was unsure what they were doing differently from my window scroll bar.

I did find one bug... On slide 14 (well, actually all of them, but it's most noticeable there) you can indeed highlight and copy the text... but not the last character in a block. If you try to select the last character you'll invert the selection to be from the start of the slide to your highlight point (sometimes this includes the page frame, so it looks like you've selected the whole page). I suspect (based on my own bad habits) that it's a boundary error, counting the length of the highlight from 1 when the string length is counted from 0.

Rendering more complex documents isn't as perfect as pdf, eg column-spacing or margins can look a little bit off, but that's a minor cosmetic flaw that I'm sure will be fixed.
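
To illustrate the kind of boundary error suspected above, here is a small sketch. The helpers `selectBuggy` and `selectFixed` are hypothetical and only show the general class of off-by-one mistake; they are not taken from Scribd's code or from any browser.

```typescript
// Selection offsets are caret positions, so valid values run from 0 up to
// and including text.length.

// Buggy variant: treats the index of the last character as the largest valid
// end offset, so the final character can never be included.
function selectBuggy(text: string, start: number, end: number): string {
  const maxEnd = text.length - 1; // off by one
  return text.slice(start, Math.min(end, maxEnd));
}

// Fixed variant: the end offset is exclusive, so it may equal text.length.
function selectFixed(text: string, start: number, end: number): string {
  const maxEnd = text.length;
  return text.slice(start, Math.min(end, maxEnd));
}

console.log(selectBuggy("Scribd", 0, 6)); // "Scrib"
console.log(selectFixed("Scribd", 0, 6)); // "Scribd"
```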


I've noticed Chrome itself has some highlighting issues. Another example: put some text in at http://www.eeemo.net/ , the Zalgo text generator, which works by putting lots of Unicode character decorations on your base text. Chrome demonstrates some very strange highlighting/copy/paste interactions on that page if you try to copy the Zalgo-ed text out.

My guess is that it's not a Scribd bug, and I also suspect there's probably not much they can do to fix it.


Hmm, seems fine for me with Chrome beta 5, but you might be right. I don't have FF installed right now to compare.


That is extremely cool. I always found Scribd really frustrating to use - the disconnect between my normal browser use and the embedded Flash reader felt about the same as the disconnect caused by viewing the same file in Acrobat Reader, so I didn't really see the point of it.

If browsers take this onboard as a common HTML5 scenario to be optimised, and it becomes a viable, quick and plugin-free way of reading any document online, I will be very happy :)

Kudos to the Scribd team for the mighty effort this must have taken to implement.


As somebody who dreads scribd links (I won't install flash), this really looks like a good step forward. PDFs are still better viewed in okular, but I could see this being useful for viewing MS formats. Really slick interface, too.


Likewise, I won't install okular. Evince is the only game in town.


Tried it on my iPhone: only the first three frames show some text (no images), and the rest of the sheets are blank. I believe it is the future, though. Can anyone with an iPad try this out?


I have no iPad, but Opera 10.52 (Windows) has a similar behaviour: The first three frames are there (with images), the rest of the sheets are blank.


Opera 10.53 on Mac OS X Tiger and the same thing happens.


I get a "download now" button on my iPad =(

edit: Works great on iPad. Copy/paste doesn't work for text, but that might be an iPad issue.


Press 'regular site' and select the presentation in the upper right corner.


AH. Thank you!


So, it turns out that the version of Mobile Safari on the iPhone doesn't support getBoundingClientRect, which we were relying on. The fix for that should go out soon, at which point it will work on your iPhone.
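
For anyone curious what a fallback can look like where getBoundingClientRect is missing, one common approach is to sum offsetLeft/offsetTop up the offsetParent chain and subtract the scroll position. The sketch below is an assumption about a typical workaround, not necessarily the fix that will ship; `OffsetNode`, `Rect` and `approximateClientRect` are illustrative names.

```typescript
// Minimal structural view of the DOM properties this sketch reads. A real
// HTMLElement would need a cast, since its offsetParent is typed as Element.
interface OffsetNode {
  offsetLeft: number;
  offsetTop: number;
  offsetWidth: number;
  offsetHeight: number;
  offsetParent: OffsetNode | null;
}

interface Rect {
  left: number;
  top: number;
  right: number;
  bottom: number;
}

// Approximate a viewport-relative rectangle by walking the offsetParent chain
// and then subtracting the page scroll position. This ignores borders, CSS
// transforms and scrolled inner containers, so it is only an approximation.
function approximateClientRect(el: OffsetNode, scrollX: number, scrollY: number): Rect {
  let left = 0;
  let top = 0;
  for (let node: OffsetNode | null = el; node !== null; node = node.offsetParent) {
    left += node.offsetLeft;
    top += node.offsetTop;
  }
  left -= scrollX;
  top -= scrollY;
  return { left, top, right: left + el.offsetWidth, bottom: top + el.offsetHeight };
}
```

In a page, the scroll arguments would typically come from something like window.pageXOffset and window.pageYOffset.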


Comparison of HTML5 display and Flash on chrome + win7: http://dl.dropbox.com/u/2601554/comparison.png


Initially I was expecting the purchase of opinion through the long-played Flash vs. HTML5 gang/mob, "Down with Flash" signs high in the air, binary opinionated nonsense that is so common.

Respect paid to Flash for its use when it was needed.


Was it really needed? Most of the stuff I've seen on Scribd was basically pdfs in an iFrame, except replace iFrame with Flash.


How will this change affect API users or those who have embedded Scribd content on their sites?


We are working on a migration path that will switch people over to the HTML versions (unless they don't want it). Are you one of those people? If so ping me directly - jared at scribd.


I was wondering about this too. I often get document conversion errors (using the JavaScript API), especially with UTF-8 documents. Think HTML5 will help?


Honestly, I'm not sure. Send me your Scribd username and I'll take a look.


Much better than their Flash interface. But completely broken in Firefox 3.0.x (under Red Hat; can't advance past the first page and there are no pictures) and the fonts are too big under Iceweasel 3.5.9 (under Debian). Until they stop requiring a log-in to download the original PDF so I can view the content in a decent viewer, I'm going to continue cursing every time I accidentally follow a link to scribd.


Would you mind emailing me screenshots of the issues you are seeing? We are trying to document all inconsistencies, and your help is much appreciated.

quin -at- scribd.com


Nice. I have hated scribd for years, because their flash app sucked enormously and I just wanted a PDF.

But this is better than that - now Scribd is actually usable.


When viewing flash slides, the scrolling is annoying. I prefer to hit right and get the next page. Can't this be simulated in HTML5?


Sure thing; the view mode switcher is in the middle of the toolbar at the bottom. Select slideshow mode from there and you're set.


I wonder how Scribd pulled this off wrt HR. I mean, they must have had a sizable investment in Flash engineering. Did those people leave? Did they just start doing HTML stuff instead?

Back in the day when I was doing webdev, there was a pretty serious schism between webdevs and flash devs, and never the twain met. Maybe this is less true now.


I don't think Scribd made too much investment in Flash developers. On the server side they were using "PDF2SWF", released under the GPL by Swftools.org. My understanding is that, initially, they were using 'FlashPaper' to display documents in the front end.


I feel like the real accomplishment here is that they are converting documents to websites on the fly - complete with font-faces and image positioning. How do you do that? If that's possible, will we need front-end layout developers in a few years (we'll probably still need animation/transition development, right?)?


I'm totally in that "this is great" camp.

That said, can someone familiar with HTML5 explain, or provide a link, for the seemingly crazy source? Is this a result of the work-in-progress framework that creates these? Or is this really what it takes? Kinda looks like the source out of .doc --> .html conversion.


Nice! But scrolling that page makes my cpu cry in Firefox...


No problem in Chrome.


Slightly less load in Chrome for me but it's still significant.

What probably matters is I am using Windows XP with integrated graphics on the motherboard.

Are you using Windows 7 with discrete graphics?

I am betting a Flash version would have the same or more load so the issue is probably moot.


Are you using Windows 7 with discrete graphics?

No I'm running 64-bit Ubuntu 10.04. I do have integrated graphics though. I agree that Flash wouldn't likely be any better.


Looks nice, but it suffers from the typical "page with too much JavaScript + fancy features" sluggish scrolling issue (I'm only talking 8-10fps vs a more regular 20-30fps here, but I'm sensitive ;-)). This problem is reduced, but still present, in Chrome, even. Good start though.


It's really quite atypical, but all things considered, that problem is quite difficult to get around in this case. We do, however, plan to put significant effort into optimizations.


Considering you are almost emulating what Flash does natively, it is an awesome effort already. I dare say that with Chrome's continuing improvements, it'll be barely noticeable soon. That said, people running slow machines might have more issues.


Wow. I would love to see a technical write-up about how they generate these documents. (I assume they're programmatically generated...?) How do they get around issues of font licensing? (Where do the font files themselves come from?) Really, really cool.


Awesome! I am psyched for the interest ... we'll publish an explanation of the tech details next week.


I know it's been said here a couple times, but positive reinforcement is good: I used to get annoyed every time a document was linked to on scribd, and only went there grudgingly. In my mind, this is a complete 180 and a very welcome upgrade.


As an engineer on the project, things like this are amazing to hear. Thanks so much for the feedback!


Great work! My browser doesn't come to a halt when I click a link to visit scribd! Huuuuuge PLUS. I would avoid your site because of this experience, but will no longer fear.


By the way, I like how they used a semi-geeky woman to present this (well at least in 'toon form).

Are there actually female coders at Scribd by any chance or was this just a marketing concept?


No female coders here presently - but we are hiring and welcome any candidates! All of marketing is female, though.


Haha. Seems to be the norm in tech firms. I know it is where I work.


Wonderful! I honestly never liked the flash choice. I always wished for something like google's view as html, but done right. And this may be it!


Wow. Sweet.

Quick question: is there a way to enable this by default on the main site? I notice that if you go to /doc/, it shows Flash, and if you go to document, it's HTML.

So, from the front page, I click on Scribd in HTML link, and it takes me to http://www.scribd.com/doc/30964170/Scribd-in-HTML5

Edit: NM, I see the light blue box on the right side there. >_<


What's unclear to me is if they're gonna keep showing Flash to those browsers that don't have HTML5 capability. I haven't seen this noted anywhere.


Believe it or not, it supports IE6, 7 and 8 out of the box. Was not easy either.


That's an understatement.


Finally! I've always wondered why Scribd went Flash; it made their usability less than the documents they were copying, most of the time.


Works well overall, but the damn toolbar is fixed width, so it gets clipped when the browser window is smaller than 1024 wide, which is lame.


Yeah, we need to fix that. I imagine we will take care of that soon.


The illustrations are kick ass - gave me a feeling I was reading a comic strip. +5 just for this document.


It would be great to see Adobe products export to HTML5 like this.


Wow, I might actually start clicking the [scribd] link now!


What does this mean for their copy protection? How can they prevent people from downloading/printing works now?


It's a shame that the link on the last page to Special Agent Productions takes you to a Flash splash screen.


This is absolutely going to get scribd bought by Adobe. It's a no-brainer. Case closed. Amazing work, guys!


Great technology, underwhelming choice of a font. g's that look like q's are rather distracting.


I can't navigate with arrow keys


Try switching the 'view mode' in the bottom bar (just to the left of the search field) to 'Slideshow'. Arrow keys should work once you do that.


We will be implementing more and more advanced features such as hotkeys for various actions. These are the sorts of things we are anxious to hear suggestions for, so keep them coming.


Or the back button. But it's still damn cool.


Didn't we do this yesterday?

http://news.ycombinator.com/item?id=1326047

And the day before yesterday?

http://news.ycombinator.com/item?id=1322768


keyboard support?


There is keyboard support in some of the view modes. We will be adding more in the near future.



