retrobits (Author) Posted May 24, 2010
orpheuswaking wrote: "At first glance these are the books I have that are not in the Torrent." Hey orpheus, those look like great additions to the collection. I am fairly sure that Thumpnugget would be interested in scanning those and would want to discuss this with you further. He will be available again in mid-June, so expect a follow-up then.
+orpheuswaking Posted May 24, 2010
retrobits wrote: "I am fairly sure that Thumpnugget would be interested in scanning those and would want to discuss with you further. He will be available again in mid-June, so expect a follow up then." No problem... He can PM me anytime.
Redb3ard Posted July 15, 2010
I hate to be critical of such work, please don't take my suggestion that way. Digitizing these books is important... they're the sort that are discarded even by libraries. However, it is a very sizable archive. And most of these books could be reduced in size without losing any of the information in the pages.
I know that the standard is to digitize these in some raster format (usually jpeg, but for those who want high quality, they generally use tiff) and make PDFs of them. However, such books were usually light on actual images... a diagram here or there, things of that nature. Generally, for a single person, anything other than this would be very difficult... even with OCR, it makes so many mistakes that they can't be corrected by one man.
We could as a community transcribe it into html with very little work for any single individual. If each book averages 120 pages, then if you just put up a website where each person proof-reads a page or half a page from OCR, or, for those who know how, does the markup (wrapping paragraphs in p tags, the occasional img for a diagram, the occasional table)... we could probably complete a book every few weeks. Even with images, I'm betting that most books would clock in under about 3 megs.
No matter what though, congratulations and good work. People who work to preserve books, even for niche interests like this, are doing a very good thing. I'd say it was even noble, if it didn't sound so corny.
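As a rough back-of-the-envelope check of the crowd-proofreading idea in the post above, here is a minimal Python sketch. The 120-page book and the half-page task size come from the post; the volunteer count and weekly pace are assumptions picked only for illustration, and `weeks_to_finish` is a hypothetical helper, not part of any existing tool.

```python
import math


def weeks_to_finish(pages_per_book: int, pages_per_task: float,
                    volunteers: int, tasks_per_volunteer_per_week: float) -> int:
    """Return the whole number of weeks needed to proofread one book."""
    # Number of proofreading tasks the book is split into.
    tasks = math.ceil(pages_per_book / pages_per_task)
    # How many tasks the whole group clears in one week.
    weekly_capacity = volunteers * tasks_per_volunteer_per_week
    return math.ceil(tasks / weekly_capacity)


# 120 pages in half-page tasks is 240 tasks.
# Assumed for illustration: 40 volunteers, each doing 2 tasks a week.
print(weeks_to_finish(120, 0.5, 40, 2))  # prints 3
```

Under those assumed numbers a book takes about three weeks, which is in line with the "every few weeks" guess; a smaller or less active group stretches that out proportionally.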
Hatta Posted July 15, 2010
I think vectorizing these books would be a great thing, for ease of searching, copy & paste, etc. But I like having the raster images too. It feels more authentic to read, well, as far as reading a PDF of a 20-year-old book can be authentic. I like seeing the original typesetting. I'm not sure size is such a big deal. It's just 2 DVDs. Hard drives are in the terabyte range these days. You can get an 8 gig flash drive for $10. I'd definitely be up for typing in a book or two if there's some interest in such a project. But I'd prefer to add the text to the PDFs, rather than turning it all into HTML.
+Allan Posted July 15, 2010
Redb3ard wrote: "I hate to be critical of such work, please don't take my suggestion that way. [...]" I hate to be critical but... many of the books do have a lot of images in them. Modern OCR programs are actually pretty good as long as the source isn't too poor. Yes, you have to correct stuff and it does take time, but it's not so hard that "they can't be corrected by one man." HTMLing does take a lot of time, although with modern web page creation programs it's a lot easier than it used to be. Especially those damn tables.
If you really believe in what you're saying about the preservation of books like these, you could email Kevin Savetz over at www.atarimagazines.com and offer to HTML the many books and magazines he has permission from the authors themselves to post. I did many books and magazines for him but unfortunately I don't have the time like I used to. Allan
+wood_jl Posted July 15, 2010
I hate to be critical, as well. However, I absolutely **LOVE** these books, the way they are!! I want ORIGINAL typesetting! I like to see them as they were intended to be seen. When I set up my laptop next to my Atari8, and I have the PDF on the screen, looking as it's intended - I'm pretty darn happy! I still love having the real book - and I have several - but I'm not sure if I'd call these books (the way they are scanned) the next-best-thing - or maybe even better - as a laptop screen doesn't try to close on you and lose the page when you're typing in a listing. Target had 1-TB external USB hard drives on sale last week for $79. Broadband internet has been popular for years now. I don't see the need to reduce the size any, esp. if it meant compromising the look of the books even in the slightest. Bravo to Thumpnugget, and whoever else made efforts in this regard.
+orpheuswaking Posted July 15, 2010
I think everyone is entitled to their views, pro or con... I'm just glad that these are scanned no matter how it's done. Personally I like them the way they are; I want to see the layout the way it was intended. Plus, I printed out 60% of them.
+Allan Posted July 15, 2010
My only gripe with jpgs and PDFs is that you can't text search them nor cut text from them. Allan
+orpheuswaking Posted July 15, 2010
Can't do that with a real book either.
Havok69 Posted July 15, 2010
Allan wrote: "My only gripe with jpgs and PDFs is that you can't text search them nor cut text from them." Who says you can't do a search on PDFs? Of course, I do have the full version of Acrobat, so perhaps it's different for me. I suppose that when these are created they could be OCR'd first and then posted up so everyone can do a search. And to voice my opinion - I prefer the books in the format they were printed in. I absolutely detest HTML-formatted books, especially if you want to print the whole thing in one shot.
+Allan Posted July 15, 2010
Havok69 wrote: "Who says you can't do a search on PDFs?" You can't unless you made the original PDF from text. With these PDFs, they're just made from tiffs or jpgs. Allan
+Allan Posted July 15, 2010
orpheuswaking wrote: "Can't do that with a real book either" Ever hear of an index or a Table of Contents? Yes, I know they are not as good as a full word search but they are better than nothing. Allan
Havok69 Posted July 15, 2010
Allan wrote: "You can't unless you made the original PDF from text. With these pdfs, they're just made from tiffs or jpgs." Incorrect, sir - I did just that on a couple of books without even OCR'ing them. Which one can't you search?
+wood_jl Posted July 15, 2010
Allan wrote: "Ever hear of an index or a Table of Contents? Yes I know they are not as good as a full word search but they are better than nothing." Fortunately, when these books were scanned - in all their original glory - the tables of contents were scanned and included too, so what's the disadvantage to them, again?
TwiliteZoner Posted July 15, 2010
For those with Usenet access and good retention rates, these are available in the usual groups.
+MrFish Posted July 16, 2010 (edited July 16, 2010)
Allan wrote: "You can't unless you made the original PDF from text. With these pdfs, they're just made from tiffs or jpgs." I'm pretty sure every book in this collection is text searchable. It's done by using Acrobat (not Reader). The function is called "Paper Capture", which is just their term for OCR'ing. When using the feature the user has the choice of converting the OCR'ed parts to actual text or leaving the image and having the text "behind it", so to speak. You can search, select text, and copy & paste from the latter, which was used on this collection.
The only minus is that the OCR'ing knows nothing of word wrap. So, the end of each physical line of text will have a line return when pasted. I prefer this method as well, as the original fonts and formatting are there and you get the advantages of OCR'ing. The OCR'ing isn't always perfect but it's quite painless to add to your PDF once it's together.
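For the line-return issue described in the post above, a small clean-up step after pasting can help. The following is a minimal Python sketch, not something used on this collection: `unwrap_lines` is a hypothetical helper, and it assumes that paragraphs in the pasted text are separated by blank lines and that a trailing hyphen marks a word split across two printed lines.

```python
import re


def unwrap_lines(pasted: str, join_hyphens: bool = True) -> str:
    """Join hard-wrapped lines into paragraphs; blank lines separate paragraphs."""
    paragraphs = re.split(r"\n\s*\n", pasted.strip())
    result = []
    for para in paragraphs:
        # Drop empty fragments and surrounding whitespace on each physical line.
        lines = [ln.strip() for ln in para.splitlines() if ln.strip()]
        text = ""
        for ln in lines:
            if join_hyphens and text.endswith("-"):
                # Assumption: a trailing hyphen is a word broken across lines.
                text = text[:-1] + ln
            elif text:
                text += " " + ln
            else:
                text = ln
        result.append(text)
    return "\n\n".join(result)


sample = "The OCR layer keeps each phys-\nical line of text, so pasted\ntext has hard returns."
print(unwrap_lines(sample))
```

This is only suitable for running prose. Program listings copied from these books should be left alone, since their line breaks are meaningful, and `join_hyphens=False` avoids damaging words that are genuinely hyphenated at a line end.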
Mirage Posted July 16, 2010
I like it better the way it is as well. We're not in 1993 anymore, where storage space is a huge issue. If it is, either burn it to a DVD, copy it to an external hard drive, or buy a better computer. And get better software, as has been explained already. Sorry, just my opinion. I like seeing it exactly as it was, so that if there are any errors, I know they are from the book, rather than having to guess what OCR mucked up.
russg Posted July 16, 2010 (edited July 16, 2010)
It seems a most important reference is missing: the 'Technical Reference Notes'. I guess it would count as a manual, which may be why it was not included. There are a few others, such as the DOS II listing and the 800XL Field Service Manual. For understanding the inner workings of our Atari, the Technical Reference Notes is superb. The Technical Reference Notes is probably already available OCR'd somewhere. I'll look around.
+Allan Posted July 16, 2010
wood_jl wrote: "Fortunately, when these books were scanned - in all their original glory - the tables of contents were scanned and included too, so what's the disadvantage to them, again?" Well, I was talking about searching the whole book, not just the table of contents. Allan
+Allan Posted July 16, 2010
Havok69 wrote: "Incorrect sir - I did just that on a couple of books without even OCR'ing them. Which one can't you search?" Then that means somebody typed or OCRed the Table of Contents. And that's great, but I like the ability to do a search through the whole text. Allan
+remowilliams Posted July 16, 2010
As someone who has done a whole lot of archival scanning - size is largely irrelevant these days, and original image capture preserves the uniqueness of the works while underlaid OCR provides most of the search functionality one could want. It's not always 100% perfect for text cut and paste, but it is often very good.
+hunmanik Posted July 16, 2010
TwiliteZoner wrote: "For those with Usenet access with good retention rates these are available in the usual groups." What groups?
Havok69 Posted July 16, 2010 (edited July 16, 2010)
Allan wrote: "Then that means somebody typed or OCRed the Table of Contents. And that's great but I like the ability to do a search through the whole text." I never said anything about a table of contents - that was wood; I can search the entire text, copy, paste it into Notepad, Word, etc. You are probably using the free reader, which apparently doesn't have that functionality - I am using Acrobat Pro...
hunmanik wrote: "What groups?" I remember seeing it pop up in alt.binaries.boneless.
+MrFish Posted July 16, 2010
Havok69 wrote: "You probably are using the free reader which doesn't have that functionality apparently - I am using Acrobat Pro..." You don't need Acrobat Pro or anything special to text search, etc. Once Paper Capture has been done on the PDF you can use any PDF reader.
+Allan Posted July 16, 2010
MrFish wrote: "You don't need Acrobat Pro or anything special to text search, etc. Once Paper Capture has been done on the PDF you can use any PDF reader." I honestly never checked it. If that's the case then he OCRed them as well, which is a nice addition. I don't know how accurate it is, but for 99% of the searches it's probably fine. Allan