Trains any class that implements the flair.nn.Model interface.
:param base_path: Main path to which all output during training is logged and models are saved
:param learning_rate: Initial learning rate (or max, if scheduler is OneCycleLR)
:param mini_batch_size: Size of mini-batches during training
:param mini_batch_chunk_size: If mini-batches are larger than this number, they get broken down into chunks of this size for processing purposes
:param max_epochs: Maximum number of epochs to train. Training terminates once this number is reached.
:param scheduler: The learning rate scheduler to use
:param cycle_momentum: If scheduler is OneCycleLR, whether the scheduler should also cycle the momentum
:param anneal_factor: The factor by which the learning rate is annealed
:param patience: Number of epochs without improvement that the trainer waits
before annealing the learning rate
:param min_learning_rate: If the learning rate falls below this threshold, training terminates
:param train_with_dev: If True, training is performed using both train+dev data
:param monitor_train: If True, the model is evaluated on the training data at the end of each epoch
:param monitor_test: If True, the model is evaluated on the test data at the end of each epoch
:param embeddings_storage_mode: One of 'none' (all embeddings are deleted and freshly recomputed),
'cpu' (embeddings are stored on CPU) or 'gpu' (embeddings are stored on GPU)
:param checkpoint: If True, a full checkpoint is saved at the end of each epoch
:param save_final_model: If True, the final model is saved
:param anneal_with_restarts: If True, the last best model is restored when annealing the learning rate
:param shuffle: If True, data is shuffled during training
:param param_selection_mode: If True, testing is performed against dev data. Use this mode when doing
parameter selection.
:param num_workers: Number of workers used by the data loader
:param sampler: An optional data sampler, for special sampling of the training data
:param eval_on_train_fraction: Fraction of the training data on which to run evaluation.
If 0., no evaluation is performed on a fraction of the training data;
if 'dev', the size is taken from the size of the dev set
:param eval_on_train_shuffle: If True, the training data fraction is determined at the start of training
and kept fixed during training; otherwise it is sampled at the beginning of each epoch
:param kwargs: Additional keyword arguments passed to the optimizer
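
The interaction of mini_batch_chunk_size, anneal_factor, patience and min_learning_rate may be easier to follow with a small standalone sketch. The helpers below (split_into_chunks and simulate_annealing) are hypothetical and not part of flair; they assume a plateau rule similar to PyTorch's ReduceLROnPlateau, in which the learning rate is multiplied by anneal_factor once the number of epochs without improvement exceeds patience, and that a higher dev score is better. The trainer's actual behavior may differ in detail.

```python
from typing import List, Sequence


def split_into_chunks(batch: Sequence, chunk_size: int) -> List[Sequence]:
    """Hypothetical helper: break one mini-batch into consecutive chunks
    of at most chunk_size items (cf. mini_batch_chunk_size)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    return [batch[i:i + chunk_size] for i in range(0, len(batch), chunk_size)]


def simulate_annealing(
    dev_scores: Sequence[float],
    learning_rate: float = 0.1,
    anneal_factor: float = 0.5,
    patience: int = 3,
    min_learning_rate: float = 1e-4,
) -> List[float]:
    """Hypothetical helper: return the learning rate used in each epoch.

    Assumption: higher scores are better, and the rate is annealed once
    the count of non-improving epochs exceeds patience.
    """
    best = float("-inf")
    bad_epochs = 0
    history: List[float] = []
    for score in dev_scores:
        # Training stops once the learning rate falls below the threshold.
        if learning_rate < min_learning_rate:
            break
        history.append(learning_rate)
        if score > best:
            best = score
            bad_epochs = 0
        else:
            bad_epochs += 1
        if bad_epochs > patience:
            learning_rate *= anneal_factor
            bad_epochs = 0
    return history


if __name__ == "__main__":
    # A mini-batch of 10 items processed in chunks of 4.
    print(split_into_chunks(list(range(10)), 4))
    # Dev score plateaus after the second epoch.
    scores = [0.5, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6, 0.6]
    print(simulate_annealing(scores, patience=1, min_learning_rate=0.04))
```

In this sketch, chunking only changes how a mini-batch is processed, not which items it contains, and a larger patience delays annealing, while a larger min_learning_rate ends the simulated run sooner.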