c# - .NET: get the data from a .docx file as one big string in C#

I want to read string data from a .docx file in my C# code.

I am trying to use ApplicationClass Application = new ApplicationClass(); but I get this error:

The type 'Microsoft.Office.Interop.Word.ApplicationClass' has no constructors defined

I want to get the full text of my docx file, not separate words!


    foreach (FileInfo f in docFiles)
    {
        Application wo = new Application();
        object nullobj = Missing.Value;
        object file = f.FullName;
        Document doc = wo.Documents.Open(ref file, ..., ref nullobj);
        doc.Activate();
        doc. ==??
    }



How can I get the whole text from a docx file?


Try using the Word.Application interface instead of ApplicationClass.

See: Understanding Office Primary Interop Assembly Classes and Interfaces
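To connect this to the question's unfinished `doc. ==??` line, here is a minimal sketch of reading the whole document body through the `Application` interface. It assumes Microsoft Word is installed and the project references Microsoft.Office.Interop.Word; the class name `WordInteropTextReader` and the method name `ReadAllText` are hypothetical, chosen only for illustration.

```csharp
using System.IO;
using Word = Microsoft.Office.Interop.Word;

// Hypothetical helper, not part of any library.
public static class WordInteropTextReader
{
    // Returns the text of the document's main body as one string.
    public static string ReadAllText(string docxPath)
    {
        // Fail early, before starting Word, if the file is missing.
        string fullPath = Path.GetFullPath(docxPath);
        if (!File.Exists(fullPath))
            throw new FileNotFoundException("Document not found.", fullPath);

        // Instantiate through the interface, not ApplicationClass.
        Word.Application app = new Word.Application();
        Word.Document doc = null;
        try
        {
            doc = app.Documents.Open(fullPath, ReadOnly: true);
            // Content is a Range over the main document; Text is its plain text.
            return doc.Content.Text;
        }
        finally
        {
            // Casts avoid the method/event name ambiguity warning on Close and Quit.
            if (doc != null)
                ((Word._Document)doc).Close(SaveChanges: false);
            ((Word._Application)app).Quit();
        }
    }
}
```

In the question's loop this would be called as `string text = WordInteropTextReader.ReadAllText(f.FullName);`. Note that Word marks paragraph ends in `Range.Text` with carriage returns, so the result may need normalizing before display.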

This is what I use to extract the whole text from a docx file!


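    // ZipFile.Read is the DotNetZip (Ionic.Zip) API, not System.IO.Compression
    using (ZipFile zip = ZipFile.Read(filename))
    {
        MemoryStream stream = new MemoryStream();
        // Extract the main document part into memory
        zip.Extract(@"word/document.xml", stream);
        stream.Seek(0, SeekOrigin.Begin);
        XmlDocument xmldoc = new XmlDocument();
        xmldoc.Load(stream);
        // InnerText concatenates every text node in the document
        string PlainTextContent = xmldoc.DocumentElement.InnerText;
    }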



Enjoy.
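The same in-memory idea can be sketched without a third-party zip library, assuming .NET Framework 4.5 or later with references to System.IO.Compression and System.IO.Compression.FileSystem. The class name `DocxInMemoryReader` and the method name `GetInnerText` are hypothetical.

```csharp
using System.IO;
using System.IO.Compression;
using System.Xml;

// Hypothetical helper, not part of any library.
public static class DocxInMemoryReader
{
    // Reads word/document.xml straight from the archive, without extracting to disk.
    public static string GetInnerText(string docxPath)
    {
        using (ZipArchive archive = ZipFile.OpenRead(docxPath))
        {
            // Zip entry names always use forward slashes.
            ZipArchiveEntry entry = archive.GetEntry("word/document.xml");
            if (entry == null)
                throw new InvalidDataException("word/document.xml not found in the archive.");

            using (Stream stream = entry.Open())
            {
                var xmldoc = new XmlDocument();
                xmldoc.Load(stream);
                // InnerText concatenates all text nodes with no separators.
                return xmldoc.DocumentElement.InnerText;
            }
        }
    }
}
```

Because nothing is written to disk, there is no temporary directory to clean up afterwards.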

Make sure you are using .NET Framework 4.5.


    using NUnit.Framework;

    [TestFixture]
    public class GetDocxInnerTextTestFixture
    {
        private string _inputFilepath = @"../../TestFixtures/TestFiles/input.docx";

        [Test]
        public void GetDocxInnerText()
        {
            string documentText = DocxInnerTextReader.GetDocxInnerText(_inputFilepath);

            Assert.IsNotNull(documentText);
            Assert.IsTrue(documentText.Length > 0);
        }
    }



    using System.IO;
    using System.IO.Compression;
    using System.Xml;

    public static class DocxInnerTextReader
    {
        public static string GetDocxInnerText(string docxFilepath)
        {
            string folder = Path.GetDirectoryName(docxFilepath);
            // Path.Combine inserts the directory separators
            string extractionFolder = Path.Combine(folder, "extraction");

            // Remove the output of any previous run
            if (Directory.Exists(extractionFolder))
                Directory.Delete(extractionFolder, true);

            ZipFile.ExtractToDirectory(docxFilepath, extractionFolder);
            // The main document part lives at word/document.xml
            string xmlFilepath = Path.Combine(extractionFolder, "word", "document.xml");

            var xmldoc = new XmlDocument();
            xmldoc.Load(xmlFilepath);

            return xmldoc.DocumentElement.InnerText;
        }
    }



First of all, you need to add references to some assemblies, such as:

    System.Xml
    System.IO.Compression.FileSystem



Secondly, make sure your class includes these using directives:

    using System.IO;
    using System.IO.Compression;
    using System.Xml;



Then you can use the code below:


    public string DocxToString(string docxPath)
    {
        // Extraction directory: next to the .docx, named "<file name>.ext"
        string extractDir = Path.Combine(Path.GetDirectoryName(docxPath), Path.GetFileName(docxPath) + ".ext");
        // Delete the old extraction directory, if any
        if (Directory.Exists(extractDir)) Directory.Delete(extractDir, true);
        // Extract all media and XML parts into the extraction directory
        ZipFile.ExtractToDirectory(docxPath, extractDir);

        XmlDocument xmldoc = new XmlDocument();
        // Load the extracted XML file that contains the document text
        xmldoc.Load(Path.Combine(extractDir, "word", "document.xml"));
        // Read all text of the document from the XML
        return xmldoc.DocumentElement.InnerText;
    }



Enjoy.
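One caveat with `InnerText` is that it joins all text nodes with no separators, so paragraph boundaries are lost. A minimal sketch of keeping them, assuming the stream holds the contents of word/document.xml (for example, opened from the extracted file); the class name `DocxParagraphReader` and the method name `GetTextWithParagraphBreaks` are hypothetical.

```csharp
using System;
using System.IO;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper, not part of any library.
public static class DocxParagraphReader
{
    // Main WordprocessingML namespace, as declared in word/document.xml.
    private static readonly XNamespace W =
        "http://schemas.openxmlformats.org/wordprocessingml/2006/main";

    // Returns the text of each w:p element on its own line.
    public static string GetTextWithParagraphBreaks(Stream documentXml)
    {
        XDocument xml = XDocument.Load(documentXml);

        // A paragraph's text is split across w:t elements, one or more per run.
        // Limitation: paragraphs nested inside another paragraph (such as
        // text boxes) would be emitted twice by this simple query.
        var paragraphs = xml.Descendants(W + "p")
            .Select(p => string.Concat(p.Descendants(W + "t").Select(t => t.Value)));

        return string.Join(Environment.NewLine, paragraphs);
    }
}
```

With the extraction approach above, the stream could come from `File.OpenRead(Path.Combine(extractDir, "word", "document.xml"))`.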

...