Generation

generate functionFri, 12 May 2023

Find all the Chinese characters in a file mixed with Chinese, Japanese and English using langid

def find_chinese(filename): text = open(filename, "r").read() text = re.sub("[\s+\.\!\/_,$%^*(+\"\'“”《》?“]+|[+——!,。?、~@#¥%……&*()]+", "",text) result = "" for i in text: if langid.classify(i)[0]=="zh": result += i return result

Want to kickstart your project?Use the new AI Studio to create your code