6/29/2009

Nautilus script: make a PNG thumbnail of a PDF file

* requirements :
1. python
2. gmessage (sudo apt-get install gmessage)
3. ImageMagick (sudo apt-get install imagemagick)

* source file : ~/.gnome2/nautilus-scripts/thumbnail_pdf.py (file mode 755)


#!/usr/bin/python
# -*- coding: UTF-8 -*-

import os
import subprocess

def showme(s):
    # for easy debugging, gmessage is used
    subprocess.call(['gmessage', '-center', str(s)])

def MakePDFThumbnail(filename):
    # The convert command from ImageMagick is used here.
    # -sample 50%x50% --> downsize the PDF pages to 50%
    # +append --> append the pages horizontally; for vertical appending, use -append
    # The arguments are passed as a list, so file names with spaces or quotes are safe.
    subprocess.call(['convert', filename, '-sample', '50%x50%', '+append', filename + '.png'])

tmp = os.environ.get('NAUTILUS_SCRIPT_SELECTED_FILE_PATHS', '')
fns = tmp.split('\n')
#showme(len(fns)) # for debugging

for f in fns:
    filename, fileext = os.path.splitext(f)
    # compare case-insensitively so .pdf, .PDF and .Pdf all match
    if fileext.lower() == '.pdf':
        MakePDFThumbnail(f)
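
The selection handling in the script can be tried outside Nautilus with a small helper. This is only a sketch: the name select_pdf_paths is hypothetical, and it assumes the selected paths arrive as one newline-separated string that may end with an empty entry.

```python
import os


def select_pdf_paths(selected_paths):
    """Return the PDF paths found in a newline-separated path string."""
    pdfs = []
    for path in selected_paths.split('\n'):
        # Skip empty entries, e.g. the one left by a trailing newline.
        if not path:
            continue
        ext = os.path.splitext(path)[1]
        # Compare case-insensitively so .pdf, .PDF and .Pdf all match.
        if ext.lower() == '.pdf':
            pdfs.append(path)
    return pdfs
```

Calling select_pdf_paths(os.environ.get('NAUTILUS_SCRIPT_SELECTED_FILE_PATHS', '')) would give the list of files to convert.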

ref : http://www.imagemagick.org/script/convert.php

6/23/2009

Setting "Color Management" in firefox 3

http://cafe.naver.com/inmacbook.cafe?iframe_url=/ArticleRead.nhn%3Farticleid=75366



If the picture above looks like the following one, your web browser is OK.

(Safari 3 is the only browser that displays the colors of the picture above correctly.)

If not, it looks like this:

three parts in different colors.


Firefox 3 has an option to correct this:

Type about:config in the address bar.

Search for the gfx.color_management.enabled option and change it to true.

Restart Firefox.


The picture then displays correctly.

6/21/2009

gfx module

http://www.swftools.org/gfx_tutorial.html



The Python gfx module



1. gfx

All functionality of pdf2swf, swftools' PDF to SWF converter, is also exposed by the Python module "gfx". gfx contains a PDF parser (based on xpdf) and a number of rendering backends. In particular, it can extract text from PDF pages, create bitmaps from them, or convert PDF files to SWF. The latter is similar to what swftools' (http://www.swftools.org) pdf2swf utility offers, but more powerful: you can also create individual SWF files from single pages of a PDF or mix pages from different PDFs.

1.1 Compiling gfx and installing

To install gfx, you first need to download and uncompress one of the archives at http://www.swftools.org/download.html. You then basically have two options:
  • You can build the Python module using setup.py
  • You can build it "manually" by using make
To do the former, all that should be required is


Code listing 1.1



python setup.py build
python setup.py install

This is the preferred way. If the above gives you any trouble or you prefer make, the following will also create the Python module:


Code listing 1.2



./configure
make
# substitute the following path with your correct python installation:
cp lib/python/*.so /usr/lib/python2.4/site-packages/

You can test whether the python module was properly installed by doing


Code listing 1.3



python -c 'import gfx'

Once the module has been properly installed, you can start working with it; see the next section.
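
The same check can be done from inside a script, which is handy when a program should fail with a readable message instead of a traceback. A minimal sketch; the helper name module_available is hypothetical and nothing here is specific to gfx.

```python
import importlib


def module_available(name):
    """Return True if the named module can be imported, else False."""
    try:
        importlib.import_module(name)
    except ImportError:
        return False
    return True


if __name__ == '__main__':
    if module_available('gfx'):
        print('gfx is installed')
    else:
        print('gfx is missing; check the build output and the site-packages path')
```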

1.2 Reading a PDF file

Reading PDF files is done using the open() call. Once the document has been opened, you can query the resulting object for some information about the PDF:


Code listing 1.4



#!/usr/bin/python
import gfx

doc = gfx.open("pdf", "document.pdf")

print "Author:", doc.getInfo("author")
print "Subject:", doc.getInfo("subject")
print "PDF Version:", doc.getInfo("version")

Using getInfo, you can query the following fields:

title, subject, keywords, author, creator, producer, creationdate, moddate, linearized, tagged, encrypted, oktoprint, oktocopy, oktochange, oktoaddnotes, version

Depending on the PDF file, not all these fields may contain useful information.

Some PDF files may be protected, or even password encrypted. You can recognize protected files by the fact that doc.getInfo("encrypted") returns "yes". If, in addition, doc.getInfo("oktocopy") returns "no", then the file has copy protection enabled, which means that the gfx module won't allow you to extract information from it: extraction of pages (see below) will raise an exception.
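
The two getInfo checks described above can be wrapped in one helper so a script can skip protected files before it tries to extract pages. This is a sketch under assumptions: extraction_allowed and FakeDoc are hypothetical names, and FakeDoc only imitates the getInfo method so the helper can be tried without a PDF.

```python
def extraction_allowed(doc):
    """Return False when the document reports copy protection."""
    encrypted = doc.getInfo("encrypted") == "yes"
    copy_blocked = doc.getInfo("oktocopy") == "no"
    return not (encrypted and copy_blocked)


class FakeDoc(object):
    """Stand-in for a gfx document, used only to exercise the helper."""

    def __init__(self, info):
        self._info = info

    def getInfo(self, key):
        return self._info.get(key, "")
```

With a real document, the call would be extraction_allowed(gfx.open("pdf", "document.pdf")).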

If the PDF file is password encrypted, you need the password to display the file. You can pass the password to the open function by appending it to the filename, using '|' as separator:



Code listing 1.5



#!/usr/bin/python
import gfx

doc = gfx.open("pdf", "protecteddocument.pdf|mysecretpassword")
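
If the password comes from a variable, the filename argument can be assembled by a small helper. A sketch only: with_password is a hypothetical name, and how gfx treats a filename that itself contains '|' is not covered here.

```python
def with_password(filename, password=None):
    """Build the filename argument for gfx.open(), adding '|password' if given."""
    if password is None:
        return filename
    return filename + "|" + password
```

The result would then be passed on as gfx.open("pdf", with_password("protecteddocument.pdf", "mysecretpassword")).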

1.3 Reading an Image or SWF file

Reading image files or SWF files is done analogously. You only need to pass a different filetype specifier to the open() function:


Code listing 1.6



#!/usr/bin/python
import gfx

doc1 = gfx.open("image", "myimage.png")
doc2 = gfx.open("swf", "flashfile.swf")

You can use all objects opened with gfx.open() in the same way. In particular, you can extract pages from them (as described in the next section) and render those pages to any kind of output device. (Note that for image files, the number of pages in the document is always 1.)

1.4 Extracting pages from a (PDF/SWF/Image) file

Once the document has been properly opened, you can start working with the content, i.e., the individual pages. You can extract a page from a file using the getPage() function. The resulting Page object gives you additional information about the file. getPage() expects the page number, which starts at 1 for the first page.

The following code lists all pages in a file, along with their size:



Code listing 1.7



#!/usr/bin/python
import gfx

doc = gfx.open("pdf", "document.pdf")
for pagenr in range(1,doc.pages+1):
    page = doc.getPage(pagenr)
    print "Page", pagenr, "has dimensions", page.width, "x", page.height

Note: The size of pages can vary in PDF documents. Don't make the common mistake of querying only the first page for its dimensions and using that for all other pages.
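
One way to follow this advice is to collect the size of every page up front and work from that list. The sketch below uses hypothetical names (page_sizes, FakePage, FakeDoc); the two Fake classes only imitate the pages, getPage, width and height members used in the listing above, so the helper can be tried without a PDF.

```python
def page_sizes(doc):
    """Return a list of (width, height) tuples, one per page."""
    sizes = []
    for pagenr in range(1, doc.pages + 1):  # page numbers start at 1
        page = doc.getPage(pagenr)
        sizes.append((page.width, page.height))
    return sizes


class FakePage(object):
    """Stand-in for a gfx page."""

    def __init__(self, width, height):
        self.width, self.height = width, height


class FakeDoc(object):
    """Stand-in for a gfx document whose pages differ in size."""

    def __init__(self, sizes):
        self._sizes = sizes
        self.pages = len(sizes)

    def getPage(self, pagenr):
        return FakePage(*self._sizes[pagenr - 1])
```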

1.5 Rendering pages to bitmaps

The gfx module contains a number of rendering backends. The most interesting is probably the ImageList renderer, which creates images from pages. The following code extracts the first page of a PDF document as an image:


Code listing 1.8



#!/usr/bin/python
import gfx
doc = gfx.open("pdf", "document.pdf")
img = gfx.ImageList()
img.setparameter("antialise", "1") # turn on antialiasing
page1 = doc.getPage(1)
img.startpage(page1.width,page1.height)
page1.render(img)
img.endpage()
img.save("thumbnail80x80.png")

There are a number of pitfalls to be aware of here:
  • The width and height of the thumbnail must match those of the page (page.width, page.height). You can specify smaller (or larger) sizes, but this clips or extends the page rather than scaling it. To scale the page, you can use the multiply option of the ImageList (which scales the page up by an integer factor) or the zoom option of the PDF parser (which is the same as the DPI, 72 by default, and lets you scale the image to any size while keeping the aspect ratio). Alternatively, you can use page.getImage() instead of ImageList, which, however, only gives you a raw image string.
  • The save() function of ImageList only creates PNG files.
  • If you rendered more than one page, save() might create several files, one for each page. If the filename passed to save() is "image.png", the files will be named "image.1.png", "image.2.png", etc.
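
To pick a zoom value for a wanted thumbnail width, a little arithmetic is enough. The sketch below assumes that the rendered size grows linearly with zoom and that a zoom of 72 yields the page at page.width; zoom_for_width is a hypothetical helper, not part of gfx.

```python
DEFAULT_ZOOM = 72.0  # default zoom (DPI) stated in the text above


def zoom_for_width(page_width, target_width):
    """Return the zoom value that scales page_width to target_width."""
    if page_width <= 0:
        raise ValueError("page_width must be positive")
    return DEFAULT_ZOOM * target_width / page_width
```

The result would be passed as a string, e.g. gfx.setparameter("zoom", str(zoom_for_width(612, 80))), before the PDF is opened.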

1.6 Extracting text from PDF files

Using the PlainText device, you can extract fulltext from PDF files. The following code snippet demonstrates this behaviour:


Code listing 1.9



#!/usr/bin/python
import gfx
doc = gfx.open("pdf", "document.pdf")
text = gfx.PlainText()
for pagenr in range(1,doc.pages+1):
    page = doc.getPage(pagenr)
    text.startpage(page.width, page.height)
    page.render(text)
    text.endpage()
text.save("document_fulltext.txt")

If you want to extract text from images, or have broken PDF files (i.e., PDF files where the fonts don't correctly reference Unicode characters, and hence text can't be extracted properly the "normal" way), you should use the OCR device instead. In the code above, substitute the call to gfx.PlainText() with the following:


Code listing 1.10



...
gfx.setparameter("zoom", "400")
text = gfx.OCR()
...

As you can see, the OCR device behaves just like any other device. Internally, it will generate images out of the pages that it's asked to process, and perform OCR (Optical character recognition) on them. You should use the "zoom" parameter to scale up the images that OCR operates on, for better results (in this example, the images are scaled up by 400%).

1.7 Rendering pages to SWF files

As the gfx module derives from pdf2swf, of course it can also convert PDF files to SWF files. The code needed for this is similar to the previous examples:



Code listing 1.11



#!/usr/bin/python
import gfx
doc = gfx.open("pdf", "document.pdf")
swf = gfx.SWF()
for pagenr in range(1,doc.pages+1):
    page = doc.getPage(pagenr)
    swf.startpage(page.width, page.height)
    page.render(swf)
    swf.endpage()
swf.save("document.swf")

With the gfx.SWF device (and with pdf2swf, too), you have a number of options for how the SWF content should be created:
  • Render everything in the same way as in the PDF: shapes will be converted to shapes, text to text, and bitmaps to bitmaps. (pdf2swf without any options)
  • Render as much as possible to bitmaps, but keep text as text and links as links. (pdf2swf -O1) In gfx, this is done by passing the "poly2bitmap" parameter to the module.
  • Render everything to bitmaps. Only links will be preserved. (pdf2swf -O2) In gfx, this is done by passing the "bitmap" parameter to the module.
It's important that you set all parsing-related parameters before loading the PDF file, as most of the optimization is done during the loading process:


Code listing 1.12



#!/usr/bin/python
import gfx
gfx.setparameter("bitmap", "1") # or "poly2bitmap"
doc = gfx.open("pdf", "document.pdf")
...

Parsing related parameters are: bitmap, poly2bitmap, bitmapfonts, fonts, fontdir, languagedir.
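
The "parameters first, then open" order can be enforced by routing both steps through one function. A sketch under assumptions: open_pdf and RecordingGfx are hypothetical names, and RecordingGfx only imitates the setparameter and open calls of the gfx module so the call order can be inspected without a PDF.

```python
PARSING_PARAMETERS = ("bitmap", "poly2bitmap", "bitmapfonts",
                      "fonts", "fontdir", "languagedir")


def open_pdf(gfx_module, filename, params):
    """Set the parsing parameters first, then open the PDF."""
    for key, value in sorted(params.items()):
        if key not in PARSING_PARAMETERS:
            raise ValueError("not a parsing parameter: %s" % key)
        gfx_module.setparameter(key, str(value))
    return gfx_module.open("pdf", filename)


class RecordingGfx(object):
    """Stand-in for the gfx module that records the order of calls."""

    def __init__(self):
        self.calls = []

    def setparameter(self, key, value):
        self.calls.append(("setparameter", key, value))

    def open(self, filetype, filename):
        self.calls.append(("open", filetype, filename))
        return "document"
```

With the real module, the call would be open_pdf(gfx, "document.pdf", {"bitmap": 1}).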

1.8 Putting more than one input page on one SWF page

You don't need to start a new output page for every input page you get. Therefore, you can e.g. put pairs of two pages beside each other:


Code listing 1.13



#!/usr/bin/python
import gfx
doc = gfx.open("pdf", "document.pdf")
swf = gfx.SWF()

for pagenr in range(doc.pages//2):
    page1 = doc.getPage(pagenr*2+1)
    page2 = doc.getPage(pagenr*2+2)
    swf.startpage(page1.width+page2.width, max(page1.height,page2.height))
    page1.render(swf,move=(0,0))
    page2.render(swf,move=(page1.width,0), clip=(page1.width,0,page1.width+page2.width,page2.height))
    swf.endpage()

if doc.pages%2:
    # for an odd number of pages, render final page
    page = doc.getPage(doc.pages)
    swf.startpage(page.width,page.height)
    page.render(swf)
    swf.endpage()

swf.save("document.swf")

In this code, we used the move and clip parameters of the render function to shift the second page to the right, and then clip it to its bounding box.
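
The page bookkeeping in this listing (which pages share an output page, and how large that output page is) can be checked separately from the rendering. A minimal sketch with hypothetical helper names pair_pages and spread_size; sizes are plain (width, height) tuples.

```python
def pair_pages(page_count):
    """Group page numbers 1..page_count into side-by-side pairs."""
    pairs = []
    for first in range(1, page_count, 2):
        pairs.append((first, first + 1))
    if page_count % 2:
        # Odd page count: the last page stands alone.
        pairs.append((page_count,))
    return pairs


def spread_size(left, right):
    """Size of an output page holding two pages side by side."""
    return (left[0] + right[0], max(left[1], right[1]))
```

For a five-page document, pair_pages(5) groups pages 1 and 2, pages 3 and 4, and leaves page 5 on its own.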

1.9 Parsing (PDF/Image/SWF) content yourself

If none of the supplied output devices (PlainText, ImageList, SWF) is doing what you need, you can also process the PDF content yourself. The gfx module gives you an easy way to do it, by translating the usually very complex PDF file contents into a number of very simple drawing operations. In order to pass those operations to Python, you need the PassThrough output device, together with a custom class:


Code listing 1.14



import gfx
class MyOutput:
    # each method takes self, since the methods are called on an instance
    def setparameter(self, key, value):
        print "setparameter",key,value
    def startclip(self, outline):
        print "startclip",outline
    def endclip(self):
        print "endclip"
    def stroke(self, outline, width, color, capstyle, jointstyle, miterLimit):
        print "stroke",outline
    def fill(self, outline, color):
        print "fill",outline
    def fillbitmap(self, outline, image, matrix, colortransform):
        print "fillbitmap",outline
    def fillgradient(self, outline, gradient, gradienttype, matrix):
        print "fillgradient",outline
    def addfont(self, font):
        print "addfont"
    def drawchar(self, font, glyph, color, matrix):
        print "drawchar"
    def drawlink(self, outline, url):
        print "drawlink", outline, url

doc = gfx.open("pdf", "document.pdf")
output = gfx.PassThrough(MyOutput())
doc.getPage(1).render(output)

The above is the minimum set of functions that the class passed to "PassThrough" must have in order to process all PDF content. If any of the functions are not defined, an error message will be printed; however, the rendering process will not be aborted.
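
A custom output class does not have to print anything; it can also collect data. The sketch below records the URL of every link and ignores all other drawing operations. LinkCollector is a hypothetical name, and the sketch assumes PassThrough calls these functions as methods of the instance, which is why each one takes self.

```python
class LinkCollector(object):
    """Output class that ignores drawing operations and records link URLs."""

    def __init__(self):
        self.urls = []

    def setparameter(self, key, value):
        pass

    def startclip(self, outline):
        pass

    def endclip(self):
        pass

    def stroke(self, outline, width, color, capstyle, jointstyle, miterLimit):
        pass

    def fill(self, outline, color):
        pass

    def fillbitmap(self, outline, image, matrix, colortransform):
        pass

    def fillgradient(self, outline, gradient, gradienttype, matrix):
        pass

    def addfont(self, font):
        pass

    def drawchar(self, font, glyph, color, matrix):
        pass

    def drawlink(self, outline, url):
        # Keep only the target; the outline is not needed here.
        self.urls.append(url)
```

It would be hooked up like MyOutput in the listing above: create collector = LinkCollector(), render pages to gfx.PassThrough(collector), then read collector.urls.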

mendeley (reference manager)

Hardy/8.04:
deb http://www.mendeley.com/repositories/xUbuntu_8.04 /
Intrepid/8.10:
deb http://www.mendeley.com/repositories/xUbuntu_8.10 /

apt-get update
apt-get install mendeleydesktop

Then in a terminal window as a user enter the following to run Mendeley Desktop:
mendeleydesktop

6/19/2009

OpenOffice Calc function script giving a grade from a rank

The following function returns a grade (e.g., A+, A, ...) for a given rank.

Function getdegree(V)
    '--------------------------------------------------
    ' Parameters must be set according to each class
    '--------------------------------------------------

    total = 31 ' total number of students
    Ap_percent = 15 * 0.01 ' percentage of A+ grades = 15%
    A_percent = 30 * 0.01 ' cumulative percentage up to grade A
    Bp_percent = 45 * 0.01
    B_percent = 60 * 0.01
    Cp_percent = 85 * 0.01
    C_percent = 100 * 0.01
    Dp_percent = 0 * 0.01
    D_percent = 0 * 0.01

    '---------------------------------------------------

    ' highest rank that still receives each grade
    Ap = int(total * Ap_percent)
    A = int(total * A_percent)
    Bp = int(total * Bp_percent)
    B = int(total * B_percent)
    Cp = int(total * Cp_percent)
    C = int(total * C_percent)
    Dp = int(total * Dp_percent)
    D = int(total * D_percent)

    ' the ranges share their end points; the first matching case wins
    select case V
        case 1 to Ap
            getdegree = "A+"
        case Ap to A
            getdegree = "A"
        case A to Bp
            getdegree = "B+"
        case Bp to B
            getdegree = "B"
        case B to Cp
            getdegree = "C+"
        case Cp to C
            getdegree = "C"
        case C to Dp
            getdegree = "D+"
        case Dp to D
            getdegree = "D"
        case else
            getdegree = "F"
    end select

End Function
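
To check the cutoffs before using the function on a real class list, the same logic can be written in Python. This is a sketch, not a translation of every detail: grade_for_rank and GRADE_BANDS are hypothetical names, and the cutoffs are computed with integer arithmetic instead of int(total * percent), which gives the same values (4, 9, 13, 18, 26, 31) for the parameters above.

```python
# (grade, cumulative percentage), in the order the cases are tested above
GRADE_BANDS = [("A+", 15), ("A", 30), ("B+", 45), ("B", 60),
               ("C+", 85), ("C", 100), ("D+", 0), ("D", 0)]


def grade_for_rank(rank, total=31, bands=GRADE_BANDS):
    """Return the grade for a rank (1 = best) among `total` students."""
    if rank < 1:
        return "F"
    for grade, cumulative_percent in bands:
        # Highest rank that still receives this grade.
        cutoff = total * cumulative_percent // 100
        if rank <= cutoff:
            return grade
    return "F"
```

With 31 students, ranks 1 to 4 get A+, rank 5 starts the A band, rank 31 still gets C, and any rank above 31 falls through to F.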