Three things…
First, avoid using name as a variable, as it already has a meaning in various places. You can see that its colour is different from the other variables (on my system it is coloured purple, like a property, rather than green, like a variable). Try something like fName instead.
Second, the key cause of your issue is the words command.
If you run the following line in a separate script, you will see the issue:
words of "111.1.1 - 檔案(cookie)"
--> {"111.1.1", "檔", "案", "cookie"}
To the system, each of the two Chinese characters is considered a word on its own. Additionally, the parentheses are treated as punctuation and not as part of the word cookie. I should note that, depending upon your system environment (mine is set to US English), you may get a different result.
The outcome is that when you ask for the first three words of the string, you don’t get cookie.
Will your input string always have the same structure?
e.g. ‘date string’ ‘ - ’ ‘Chinese characters’ ‘word in parentheses’
Updates:
Try this. Instead of using grep and words (which may not work, depending upon the input string), I use sed and paragraphs. The sed command matches the string with three capturing groups and puts each group on its own line, so the result contains three paragraphs. Each of the three variables can then be assigned the corresponding paragraph.
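To make the mechanism concrete, here is a minimal shell sketch of the "one capturing group per line" idea, using the sample string from the question. The function and variable names (split_fields, date_part, name_part, person_part) are my own placeholders and not part of the AppleScript; the pattern is a slightly simplified, anchored form of the second sed expression used in the script.

```shell
#!/bin/sh
# Hypothetical helper: print the three captured groups on separate lines.
# A backslash followed by a newline in the replacement inserts a line break.
split_fields() {
  printf '%s\n' "$1" | sed -E 's@^([[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+)[[:blank:]]-[[:blank:]](.*)(\(.*\))$@\1\
\2\
\3@'
}

fields=$(split_fields "113.1.29 - 一段文字(123)")

# Pick out each line, the same way AppleScript's paragraphs would.
date_part=$(printf '%s\n' "$fields" | sed -n '1p')
name_part=$(printf '%s\n' "$fields" | sed -n '2p')
person_part=$(printf '%s\n' "$fields" | sed -n '3p')

printf 'date=%s name=%s person=%s\n' "$date_part" "$name_part" "$person_part"
```

Note that the third group captures the parentheses as well, so person_part is (123) rather than 123. If you want the bare value, move the escaped parentheses outside the group.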
Update: Revised the script to properly handle both inputString and currentFolderPath, per the comments and image.
set inputString to quoted form of "113.1.29 - 一段文字(123)"
set currentFolderPath to "/Users/cookie/Desktop/113.10.10 - helloworld測試(test)"
set regexCmd to "sed -Ee 's@^.*\\/([[:digit:]]+\\.[[:digit:]]+\\.[[:digit:]].*$)@\\1@' -e 's@([[:digit:]]+\\.[[:digit:]]+\\.[[:digit:]]+)[[:blank:]]-[[:blank:]](.*)(\\(.*\\))@\\1\\
\\2\\
\\3@ ' "
-- input string
set {inputDateString, inputFileName, inputPerson} to extractData(inputString, regexCmd)
display alert "Date string: " & inputDateString
display alert "File name: " & inputFileName
display alert "Person: " & inputPerson
log "inputString: " & inputString & linefeed & "inputDateString: " & inputDateString & linefeed & "inputFileName: " & inputFileName & linefeed & "inputPerson: " & inputPerson & linefeed
-- folder path
set currentFolderPath to quoted form of currentFolderPath
set {folderDateString, folderFileName, folderPerson} to extractData(currentFolderPath, regexCmd)
log "currentFolderPath: " & currentFolderPath & linefeed & "folderDateString: " & folderDateString & linefeed & "folderFileName: " & folderFileName & linefeed & "folderPerson: " & folderPerson
-- handler
on extractData(inputString, regexCmd)
try
set capturedData to {}
-- pass the input as data via %s so that a literal % in it is not read as a printf format
set regexResults to do shell script "printf '%s' " & inputString & " | " & regexCmd
set {rocDateString, filename, person} to paragraphs of regexResults
set capturedData to {rocDateString, filename, person}
on error
set capturedData to {"", "", ""}
end try
return capturedData
end extractData
BTW, here is the shell version of the command, so you can also test it in the terminal.
% printf "113.1.29 - 一段文字(123)" | sed -E -e 's@^.*/([[:digit:]]+\.[[:digit:]]+\.[[:digit:]].*$)@\1@' -e 's@([[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+)[[:blank:]]-[[:blank:]](.*)(\(.*\))@\1\
\2\
\3@ '
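If you want to check the folder-path case in the terminal as well, the following sketch wraps both sed expressions in a small function. The name extract_fields is a placeholder of mine, and I have tightened the patterns slightly with anchors, so treat it as an approximation of the command above rather than an exact copy. The first expression strips everything up to the last slash that is followed by a date; the second splits what remains into three lines.

```shell
#!/bin/sh
# Hypothetical helper: accepts either a bare "date - name(person)" string
# or a full path that ends in one, and prints the three parts on separate lines.
extract_fields() {
  # printf '%s\n' passes the input as data, so a literal % is not treated as a format.
  printf '%s\n' "$1" | sed -E \
    -e 's@^.*/([[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+.*)$@\1@' \
    -e 's@^([[:digit:]]+\.[[:digit:]]+\.[[:digit:]]+)[[:blank:]]-[[:blank:]](.*)(\(.*\))$@\1\
\2\
\3@'
}

extract_fields "/Users/cookie/Desktop/113.10.10 - helloworld測試(test)"
extract_fields "113.1.29 - 一段文字(123)"
```

A bare string contains no slash, so the first expression leaves it untouched and only the second one applies.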
