Unicode in DATs

Hello TouchDesigners (this is a problem that affects the display of non-English languages in TD),
I'm trying to parse a text file DAT into a usable Table DAT so I can render it with the Text TOP.

All good, except for one problem: the DAT class doesn't support Unicode characters!

I've successfully read a .txt file with a Python script and rendered it with Unicode characters in the Text TOP, but I need to use DATs to parse the information I want to render, so that approach isn't an option right now…

Any clues?

Thanks!

Hi ruigato,

Like you said — the DAT class doesn’t support Unicode characters, so you’ll have to come up with some way to accomplish what you want to do without using DATs for parsing the text. What kind of things do you want to do to the text? Can they be done using just Python?

If you absolutely need to use a Table DAT, you can encode the strings you're working with as Unicode escapes (sequences that look like ‘\uFFEF’) and store those in the Table DAT. You'll then have to push them into the text parameter of the Text TOP with another script; you can't reference the Table DAT in the Text TOP's text parameter directly. I don't know why, but if I had to guess, it has to do with the Unicode escapes being re-encoded as Unicode.
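A minimal sketch of the escape round trip in plain Python, assuming Python's built-in unicode_escape codec is used to produce the escapes. The TouchDesigner side (writing the escaped string into a table cell, then assigning the decoded string to the Text TOP's text parameter from a script) is left out:

```python
def to_escapes(text):
    # Non-ASCII characters become backslash escapes such as \u03b1,
    # so the result is plain ASCII and safe to store in a Table DAT cell.
    return text.encode('unicode_escape').decode('ascii')


def from_escapes(escaped):
    # Reverse step: turn the \uXXXX escapes back into real characters
    # before pushing the string into the Text TOP.
    return escaped.encode('ascii').decode('unicode_escape')


greek = 'Καλημέρα'
escaped = to_escapes(greek)
print(escaped.isascii())               # True
print(from_escapes(escaped) == greek)  # True
```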

If you just want to use the Table DAT so you can select from a handful of different strings, I think a better approach is to store a list of Unicode strings on the Text TOP and then reference them by fetching the list and indexing into it.
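To try the store-and-fetch idea outside TouchDesigner, here is a small stand-in. FakeOp is hypothetical and only mimics the store() and fetch() calls used in this thread; inside TD you would call them on the real operator (for example the Text TOP) instead:

```python
class FakeOp:
    """Hypothetical stand-in for a TouchDesigner operator's storage."""

    def __init__(self):
        self._storage = {}

    def store(self, key, value):
        self._storage[key] = value

    def fetch(self, key, default=None):
        return self._storage.get(key, default)


text_top = FakeOp()
# Keep the Unicode strings as a Python list; no DAT is involved,
# so nothing gets converted to a narrower character set.
text_top.store('phrases', ['Γειά σου', 'Καλημέρα', 'Ευχαριστώ'])

index = 1  # e.g. driven by a slider or a CHOP channel
current = text_top.fetch('phrases')[index]  # 'Καλημέρα'
```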

Attached is an example network that contains both methods. I used Greek text; most fonts include a Greek character set, but if the text doesn't show up, try a different font!

Nic
unicode_td.zip (7.59 KB)

Hi Nic,

Thank you very much for your help!

I've come up with a similar approach, using Python to do the parsing of the text file. Here's the file with my solution:

Cheers,
Rui
unicode.zip (45.5 KB)

Meanwhile, a simple copy/paste from the web browser into a Text DAT gives Unicode text inside DAT operators!

DATs currently support extended ASCII, which includes quite a few non-English characters, but definitely nothing like Japanese, Chinese, Korean, etc.
That is what you are seeing here, not Unicode, sorry.
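One way to check ahead of time whether a piece of text will survive in a DAT is to try encoding it with a single-byte code page. This sketch assumes Windows-1252 (cp1252, the usual Western "ANSI" code page) as the meaning of "extended ASCII"; the exact table TouchDesigner uses is not stated here, so treat the result as a rough guide:

```python
def fits_extended_ascii(text, codepage='cp1252'):
    # True if every character has a single-byte encoding in the code page.
    try:
        text.encode(codepage)
    except UnicodeEncodeError:
        return False
    return True


def unsupported_chars(text, codepage='cp1252'):
    # Characters that would be lost, in order of first appearance.
    bad = []
    for ch in text:
        try:
            ch.encode(codepage)
        except UnicodeEncodeError:
            if ch not in bad:
                bad.append(ch)
    return bad


print(fits_extended_ascii('ação, coração'))  # True
print(fits_extended_ascii('日本語'))          # False
missing = unsupported_chars('café 東京')      # ['東', '京']
```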

Ahhh, thanks for the heads-up, Malcolm.

So if I convert a text file to extended ASCII, I might get the Portuguese characters right?

Maybe that is what happens when I copy directly from the HTML browser…

Edit: I just tested converting the file from Unicode to ANSI in Notepad++, and it works!
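If you have many files, the same Notepad++ conversion can be scripted. A minimal sketch, assuming the source is UTF-8 and that "ANSI" means Windows-1252 (cp1252); characters with no cp1252 equivalent are replaced with "?" instead of raising an error. The demo writes to a throwaway temporary folder; in practice you would pass your own file paths:

```python
import os
import tempfile


def utf8_to_ansi(src_path, dst_path, codepage='cp1252'):
    # 'utf-8-sig' also strips a byte-order mark if the editor wrote one;
    # newline='' keeps the original line endings untouched.
    with open(src_path, encoding='utf-8-sig', newline='') as src:
        text = src.read()
    # errors='replace' swaps unencodable characters for '?'
    with open(dst_path, 'w', encoding=codepage, errors='replace', newline='') as dst:
        dst.write(text)


# Demo on a throwaway file
demo_dir = tempfile.mkdtemp()
src = os.path.join(demo_dir, 'subs_utf8.txt')
dst = os.path.join(demo_dir, 'subs_ansi.txt')
with open(src, 'w', encoding='utf-8', newline='') as f:
    f.write('Olá, ação!\r\n東京\r\n')
utf8_to_ansi(src, dst)
with open(dst, 'rb') as f:
    converted = f.read()
print(converted)  # b'Ol\xe1, a\xe7\xe3o!\r\n??\r\n'
```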

Cheers,
Rui

I'm not sure about all of the characters in the Portuguese alphabet, but you'll be able to use text containing any of the characters that appear in this ASCII table (including the extended table):

ascii-code.com/

I am using ruigato's .tox.

I'm trying to use what you created for the French language.
It works when I use your downloaded .tox and just change the file,
but when I try to replicate it, it doesn't work.
Can you help me figure out what's wrong?
TEST_TITLE_TOX.zip (7.96 KB)

Try using nicwolf1's example above or the examples found here in the wiki:
derivative.ca/wiki088/index. … le=Unicode

The ruigato example uses Table DATs to do some of its processing, but Table DATs do not support Unicode, so you can't use that approach if you are trying to load in Unicode characters.

NewProject.28.toe (8.85 KB)

Hello,
Coming from Isadora, where I was able to do synced subtitles (Isadora accepts Unicode…), I worked hard between Christmas and New Year, and I am now able to use Unicode text and parse it accordingly.

First, I load my Unicode .srt text with a start script:
def start():
    # read the UTF-8 subtitle file and keep it in /project1's storage
    with open('lampedusaST.srt', encoding='utf8') as f:
        s = f.read()
    op('/project1').store('loadedText', s)
    return
Then I parse the file into a table, calculating the in and out times:
# table1 layout: column 0 = subtitle number, 1 = in time (ms),
# 2 = out time (ms), 3 and up = subtitle text lines
def tc_to_ms(tc):
    # SRT timecode 'HH:MM:SS,mmm' -> milliseconds
    hms, mil = tc.strip().split(',')
    h, m, s = hms.split(':')
    return int(h) * 3600000 + int(m) * 60000 + int(s) * 1000 + int(mil)

tableau = op('table1')
tableau.clear(keepFirstRow=True)
texte = op('/project1').fetch('loadedText')
# subtitle blocks are separated by a blank line; skip empty blocks
# (e.g. the one left by the file's trailing newlines)
texteParag = [p.strip('\n') for p in texte.split('\n\n') if p.strip()]
for i, parag in enumerate(texteParag):
    tableau.appendRow()
    paragLigne = parag.split('\n')
    for j, ligne in enumerate(paragLigne):
        if j == 0:
            tableau[i+1, 0] = ligne
        elif j == 1:
            # SRT timing line: 'HH:MM:SS,mmm --> HH:MM:SS,mmm'
            TC = ligne.split(' --> ')
            tableau[i+1, 1] = tc_to_ms(TC[0])
            tableau[i+1, 2] = tc_to_ms(TC[1])
        else:
            # text lines go in columns 3, 4, ...
            tableau[i+1, j+1] = ligne
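To test the parsing logic outside TouchDesigner, the same steps can be written as a plain function that returns rows instead of writing into table1. This is a sketch, not the script above: it assumes standard SRT blocks (number, then "HH:MM:SS,mmm --> HH:MM:SS,mmm", then text lines) separated by blank lines, and each row follows the table layout above (number, in ms, out ms, text lines):

```python
def srt_time_to_ms(tc):
    # 'HH:MM:SS,mmm' -> integer milliseconds
    hms, millis = tc.strip().split(',')
    hours, minutes, seconds = (int(x) for x in hms.split(':'))
    return hours * 3600000 + minutes * 60000 + seconds * 1000 + int(millis)


def parse_srt(text):
    rows = []
    # Normalise Windows line endings, then split on blank lines.
    blocks = text.replace('\r\n', '\n').strip().split('\n\n')
    for block in blocks:
        lines = block.split('\n')
        if len(lines) < 2:
            continue  # skip malformed or empty blocks
        start, end = lines[1].split(' --> ')
        rows.append([lines[0], srt_time_to_ms(start), srt_time_to_ms(end)] + lines[2:])
    return rows


sample = (
    "1\n00:00:01,000 --> 00:00:03,500\nBonjour à tous\nÇa va ?\n\n"
    "2\n00:01:02,250 --> 00:01:04,000\nÀ bientôt\n"
)
rows = parse_srt(sample)
print(rows[1][:3])  # ['2', 62250, 64000]
```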
Finally, while the movie plays, I search the table by comparing timecodes and put the subtitle into the Text TOPs, which keeps the French accented letters intact:
def cook(scriptOp):
    mP = op('moviefilein1')
    tableau = op('table1')
    ligne1 = op('ligne1')
    ligne2 = op('ligne2')
    nombreRows = tableau.numRows
    # current movie time in milliseconds (25 fps movie)
    mTMil = int(mP.index) / 25 * 1000
    # skip rows whose out time has already passed (row 0 is the header)
    i = 1
    while i < nombreRows and mTMil > float(tableau[i, 2].val):
        i += 1
    if i < nombreRows and float(tableau[i, 1].val) < mTMil < float(tableau[i, 2].val):
        ligne1.par.text = tableau[i, 3].val
        ligne2.par.text = tableau[i, 4].val
    else:
        # between subtitles: show nothing
        ligne1.par.text = ' '
        ligne2.par.text = ' '
    return
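The while loop above scans from the first row on every cook. A sketch of an alternative, assuming the rows are sorted by in time (as SRT files normally are): convert the frame to milliseconds, then use Python's bisect module to jump straight to the candidate subtitle. The 25 fps rate comes from the script above; cues is a hypothetical list of (in_ms, out_ms, text) tuples you would build once from the table:

```python
import bisect


def frame_to_ms(frame, fps=25):
    # frame index -> milliseconds, same conversion as mP.index / 25 * 1000
    return frame / fps * 1000


def find_subtitle(cues, starts, t_ms):
    # cues: (in_ms, out_ms, text) tuples sorted by in_ms
    # starts: [c[0] for c in cues], built once, not on every cook
    k = bisect.bisect_right(starts, t_ms) - 1  # last cue with in_ms <= t_ms
    if k >= 0:
        start, end, text = cues[k]
        if start < t_ms < end:
            return text
    return ' '  # blank between subtitles, like the original script


cues = [(1000, 3500, 'Bonjour'), (62250, 64000, 'Au revoir')]
starts = [c[0] for c in cues]
print(find_subtitle(cues, starts, frame_to_ms(50)))         # Bonjour (2000 ms)
print(repr(find_subtitle(cues, starts, frame_to_ms(100))))  # ' ' (4000 ms)
```

Each lookup is a binary search instead of a linear scan, which only starts to matter with long subtitle files.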
Here is the actual program (it still needs improvement…)
Jacques