Chinese ELMo: deep contextualized word representations for Chinese. This repository only outputs context-independent word embeddings.

Dependencies: python3, tensorflow >= 1.10, jieba.

Usage:
1. Prepare the data (see the data and vocab directories). pre_data/vocab.py builds the vocabulary; keep each data file small, otherwise it runs out of memory.
2. Train the model: train_elmo.py.
3. Dump the weights: dump_weights.py, then change the 261 in options.json to 262.
4. Dump the word embeddings to an HDF5 file: usage_token.py.

Results: the embeddings look reasonable under a visualization tool; on a text-matching task, AUC improves by 1-2 points.

License: MIT
2022-11-15 21:49:53 3.32MB nlp tensorflow word-embedding wordvectors
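Once usage_token.py has dumped the embeddings, they can be inspected from Python. A minimal sketch, assuming h5py is available and the vectors sit under a dataset named "embedding" (both the file name and the dataset key are assumptions; list f.keys() to check):

```python
import h5py
import numpy as np

# Read the token embeddings dumped by usage_token.py.
# File name and dataset key are assumptions -- inspect f.keys() if they differ.
with h5py.File("token_embeddings.hdf5", "r") as f:
    embeddings = f["embedding"][...]  # shape: (vocab_size, embedding_dim)

print(embeddings.shape)

# Cosine similarity between two rows, as a quick sanity check in the spirit
# of the "inspect with a visualization tool" evaluation mentioned above.
def cosine(u, v):
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

print(cosine(embeddings[1], embeddings[2]))
```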
Representation learning algorithms in practice: word embedding & KG embedding

Neural language models
• Word vector learning
– Prediction-based models: word2vec
– Count-based models: GloVe
• Break
• Knowledge graph representation learning
– Common evaluation tasks
– Mapping-based methods: TransE, TransR
– Tensor-factorization methods: RESCAL
• Hands-on practice
– The C&W model
• Defining the computation (Construction)
– Input parameters
– Model parameters
– Model computation
– Optimization process
• Running the computation (Execution)
– Initialize model parameters
– Learning process
» Fetch the training data
» Run the learning step
– Save model parameters
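To make the KG-embedding part of the outline concrete, here is a minimal numpy sketch of TransE's score and one margin-based SGD step, organized along the same Construction/Execution split as the outline. The dimensions, margin, learning rate, and toy triples are illustrative assumptions, not values from the course.

```python
import numpy as np

rng = np.random.default_rng(0)

# Construction: toy sizes and model parameters (all values are illustrative).
n_entities, n_relations, dim = 5, 2, 8
E = rng.normal(scale=0.1, size=(n_entities, dim))   # entity embeddings
R = rng.normal(scale=0.1, size=(n_relations, dim))  # relation embeddings

def score(h, r, t):
    """TransE plausibility: ||E[h] + R[r] - E[t]||; lower is more plausible."""
    return np.linalg.norm(E[h] + R[r] - E[t])

# Execution: one margin-based SGD step on a (positive, corrupted) triple pair.
def sgd_step(pos, neg, margin=1.0, lr=0.01):
    (h, r, t), (h2, r2, t2) = pos, neg
    if margin + score(h, r, t) - score(h2, r2, t2) <= 0:
        return  # the pair already satisfies the margin
    d_pos = E[h] + R[r] - E[t]
    d_pos /= np.linalg.norm(d_pos) + 1e-9  # grad of ||x|| is x / ||x||
    d_neg = E[h2] + R[r2] - E[t2]
    d_neg /= np.linalg.norm(d_neg) + 1e-9
    E[h] -= lr * d_pos; R[r] -= lr * d_pos; E[t] += lr * d_pos
    E[h2] += lr * d_neg; R[r2] += lr * d_neg; E[t2] -= lr * d_neg

for _ in range(100):
    sgd_step(pos=(0, 0, 1), neg=(0, 0, 3))  # negative triple: corrupted tail
print(score(0, 0, 1), score(0, 0, 3))       # positive triple should score lower
```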
Using short Twitter texts, the sentiment carried by words is incorporated while training the word vectors, yielding sentiment-aware word embeddings. The model used is SSWE; the archive contains three text files: SSWE-h.txt, SSWE-r.txt, SSWE-u.txt. The trained word vectors have dimension 50.
2021-06-05 20:20:47 91.37MB word embedding
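A short sketch for reading one of these files, assuming the common plain-text layout of a token followed by its 50 values on each line (the actual layout inside the SSWE files is an assumption):

```python
import numpy as np

def load_embeddings(path, dim=50):
    """Load 'word v1 ... v50' lines into a dict of numpy vectors."""
    vectors = {}
    with open(path, encoding="utf-8") as f:
        for line in f:
            parts = line.rstrip().split()
            if len(parts) != dim + 1:  # skip headers or malformed lines
                continue
            vectors[parts[0]] = np.asarray(parts[1:], dtype=np.float32)
    return vectors

sswe = load_embeddings("SSWE-u.txt")  # path is an assumption
print(len(sswe), "vectors loaded")
```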
From Word Embedding to BERT: the development history of pre-training techniques in NLP.pdf
2021-03-08 13:06:43 4.29MB embedding NLP BERT machine
Abstract—Although there has been substantial research in software analytics for effort estimation in traditional software projects, little work has been done for estimation in agile projects, especially estimating user stories or issues. Story points are the most common unit of measure used for estimating the effort involved in implementing a user story or resolving an issue. In this paper, we offer for the first time a comprehensive dataset for story-point-based estimation that contains 23,313 issues from 16 open source projects. We also propose a prediction model for estimating story points based on a novel combination of two powerful deep learning architectures: long short-term memory and recurrent highway network. Our prediction system is end-to-end trainable from raw input data to prediction outcomes without any manual feature engineering. An empirical evaluation demonstrates that our approach consistently outperforms three common effort estimation baselines and two alternatives on both Mean Absolute Error and Standardized Accuracy.
2019-12-21 21:41:53 401KB Word embedding NLP
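The paper's model (LSTM plus recurrent highway network) is more involved than what fits here; as a rough stand-in for the end-to-end idea only, a plain Keras LSTM regressor from token ids to a story-point estimate might look like the sketch below. The vocabulary size, sequence length, and layer widths are illustrative assumptions, and this is not the authors' architecture.

```python
import numpy as np
import tensorflow as tf

# Illustrative sizes, not values from the paper.
vocab_size, seq_len, embed_dim = 10000, 100, 50

# Raw token ids -> embedding -> LSTM document vector -> story-point estimate.
inputs = tf.keras.Input(shape=(seq_len,), dtype="int32")
x = tf.keras.layers.Embedding(vocab_size, embed_dim)(inputs)
x = tf.keras.layers.LSTM(64)(x)
outputs = tf.keras.layers.Dense(1, activation="relu")(x)  # story points are non-negative

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam", loss="mae")  # MAE matches the reported metric

# Smoke test on random ids; real training would use tokenized issue text.
demo = np.random.randint(0, vocab_size, size=(2, seq_len))
print(model.predict(demo, verbose=0))
```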