Hardcoding Flair's tag_dictionary

November 16, 2020

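from typing import List

from flair.data import Corpus
from flair.datasets import ColumnCorpus
from flair.embeddings import BytePairEmbeddings, StackedEmbeddings, TokenEmbeddings
from flair.models import SequenceTagger

# loadCorpus is the author's own helper and is not shown in the post.
# Stand-in (assumption): read two-column "token tag" BIO files with ColumnCorpus.
def loadCorpus(datapath: str) -> Corpus:
    return ColumnCorpus(datapath, {0: "text", 1: "ner"},
                        train_file="train.tsv", test_file="test.tsv", dev_file="devel.tsv")

datapath = "/path/to/BIO"  # folder containing train.tsv, test.tsv and devel.tsv
corpus: Corpus = loadCorpus(datapath)
tag_type = "ner"
tag_dictionary = corpus.make_tag_dictionary(tag_type=tag_type)
# Overwrite the dictionary contents by hand; items are bytes, as in the generated dictionary
tag_dictionary.idx2item = [b'<unk>', b'O', b'B-Chemical', b'I-Chemical', b'<START>', b'<STOP>']  # index -> tag, as a list
tag_dictionary.item2idx = {b'<unk>':0, b'O':1, b'B-Chemical':2, b'I-Chemical':3, b'<START>':4, b'<STOP>':5}  # tag -> index, as a dict
embedding_objects: List[TokenEmbeddings] = []
embedding_objects.append(BytePairEmbeddings("en"))
embeddings: StackedEmbeddings = StackedEmbeddings(embeddings=embedding_objects)
tagger: SequenceTagger = SequenceTagger(
    hidden_size=256,
    embeddings=embeddings,
    tag_dictionary=tag_dictionary,
    tag_type=tag_type,
    use_crf=True,
    )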

from flair.trainers import ModelTrainer
resultpath = "/path/to/result"
trainer: ModelTrainer = ModelTrainer(tagger, corpus)
trainer.train(
    str(resultpath),
    learning_rate=0.1,
    mini_batch_size=128,
    max_epochs=3,
    patience=5,
    embeddings_storage_mode="gpu"
    )
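Writing idx2item and item2idx out by hand means the two can drift apart (a tag present in one but not the other, or indices that disagree). A minimal sketch, assuming you only want to state the tag order once: build the list from plain strings and derive the dict from it. The function name build_tag_maps is hypothetical and not part of Flair; it only reproduces the bytes-keyed layout used in the code above, and it has not been tested against the training error described below.

```python
from typing import Dict, List, Tuple


def build_tag_maps(tags: List[str]) -> Tuple[List[bytes], Dict[bytes, int]]:
    """Return (idx2item, item2idx) built from one ordered tag list.

    Hypothetical helper: the list order defines the indices, and both
    structures use UTF-8 bytes, matching the hardcoded example above.
    """
    if len(set(tags)) != len(tags):
        raise ValueError("duplicate tag in tag list")
    idx2item = [tag.encode("utf-8") for tag in tags]
    # Derive the reverse mapping so it can never disagree with idx2item
    item2idx = {item: idx for idx, item in enumerate(idx2item)}
    return idx2item, item2idx


idx2item, item2idx = build_tag_maps(
    ["<unk>", "O", "B-Chemical", "I-Chemical", "<START>", "<STOP>"]
)
# Then assign both to the Flair dictionary object:
# tag_dictionary.idx2item = idx2item
# tag_dictionary.item2idx = item2idx
```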

Addendum

When I trained with this approach, I got an error. So maybe the tag dictionary really can only be built with .make_tag_dictionary...
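The post does not say what the error was. One thing that may be worth ruling out (an assumption, not a confirmed cause) is a tag that appears in train.tsv, test.tsv or devel.tsv but is missing from the hardcoded list. A minimal sketch, assuming each non-blank line holds a token and its tag separated by whitespace, with the tag in the last column; the function name find_missing_tags is hypothetical.

```python
from typing import Iterable, List, Set


def find_missing_tags(lines: Iterable[str], known_tags: List[str]) -> Set[str]:
    """Collect tags in BIO-formatted lines that are not in known_tags.

    Assumes "token<whitespace>tag" per line with the tag in the last column.
    Blank lines (sentence boundaries) and lines without a tag are skipped.
    """
    known = set(known_tags)
    missing: Set[str] = set()
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # sentence boundary or line without a tag column
        tag = parts[-1]
        if tag not in known:
            missing.add(tag)
    return missing


# Usage with one of the corpus files (path from the example above):
# with open("/path/to/BIO/train.tsv", encoding="utf-8") as f:
#     print(find_missing_tags(f, ["O", "B-Chemical", "I-Chemical"]))
```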

Flair

Posted by vastee