shloktech
diff --git a/Readme.md b/Readme.md
Lines changed: 53 additions & 9 deletions
diff --git a/build/lib/md2docx_python/src/docx2md_python.py b/build/lib/md2docx_python/src/docx2md_python.py
Lines changed: 95 additions & 0 deletions
diff --git a/dist/md2docx_python-1.0.0-py3-none-any.whl b/dist/md2docx_python-1.0.0-py3-none-any.whl
6.93 KB
@@ -2,8 +2,17 @@
 
 ## Overview
 
-Simple and straight forward Python utility that converts a Markdown file (`.md`) to a Microsoft Word document (`.docx`). It supports multiple Markdown elements, including headings, bold and italic text, both unordered and ordered lists, and many more.
+A simple, straightforward Python utility that converts Markdown files (`.md`) to Microsoft Word documents (`.docx`) and vice versa. It supports common Markdown elements, including headings, bold and italic text, and both unordered and ordered lists.
 
+## Word to Markdown Conversion Example:
+#### Input .docx file:
+![image](https://github.com/user-attachments/assets/2891ebdf-ff36-4fd5-af2f-b35413264b06)
+
+#### Output .md file:
+![image](https://github.com/user-attachments/assets/e46c096b-762e-4f0c-a0ab-f81c3069a533)
+
+
+## Markdown to Word Conversion Example:
 #### Input .md file:
 ![image](https://github.com/user-attachments/assets/c2325e52-05a7-4e11-8f28-4eeb3d8c06f5)
 
@@ -13,18 +22,22 @@ Simple and straight forward Python utility that converts a Markdown file (`.md`)
 
 ## Features
 
-- Converts Markdown headers (`#`, `##`, `###`) to Word document headings.
-- Supports bold and italic text formatting.
-- Converts unordered (`*`, `-`) and ordered (`1.`, `2.`) lists.
-- Handles paragraphs with mixed content.
+- Bi-directional conversion between Markdown and Word documents
+- Handles code blocks in Word documents for various programming languages, such as Python and Ruby
+- Converts Markdown headers (`#`, `##`, `###`) to Word document headings and back
+- Supports bold and italic text formatting
+- Converts unordered (`*`, `-`) and ordered (`1.`, `2.`) lists
+- Handles paragraphs with mixed content
+- Preserves document structure during conversion
 
 ## Prerequisites
 
 You need to have Python installed on your system along with the following libraries:
 
-- `markdown` for converting Markdown to HTML.
-- `python-docx` for creating and editing Word documents.
-- `beautifulsoup4` for parsing HTML.
+- `markdown` for converting Markdown to HTML
+- `python-docx` for creating and editing Word documents
+- `beautifulsoup4` for parsing HTML
+- `mammoth` for converting Word to HTML
 
 Sure, let's enhance your instructions for clarity and completeness:
 
@@ -74,7 +87,33 @@ This code will create a file named `amazon_case_study.docx`, which is the conver
 
 ---
 
-This should make it easier to understand and follow the steps. Let me know if you need any more help or further enhancements!
+#### For Converting Word to Markdown
+Use the `word_to_markdown()` function to convert your Word document to Markdown:
+
+```python
+word_to_markdown(word_file, markdown_file)
+```
+
+- `word_file`: The path to the Word document you want to convert
+- `markdown_file`: The desired path and name for the output Markdown file
+
+
+Here's a complete example:
+
+```python
+from md2docx_python.src.docx2md_python import word_to_markdown
+
+# Define the paths to your files
+word_file = "sample_files/test_document.docx"
+markdown_file = "sample_files/test_document_output.md"
+
+# Convert the Word document to a Markdown file
+word_to_markdown(word_file, markdown_file)
+```
+
+This code will create a file named `test_document_output.md`, which is the conversion of `test_document.docx` to the Markdown format.
+
+---
 
 ## Why this repo and not others ?
 
@@ -108,6 +147,11 @@ Here are some reasons why this repo might be considered better or more suitable
 ### 8. **Privacy**
    - If you work at a company and use an online tool to convert your Markdown files to Word, the service may store your files, which could leak sensitive company information. With this repo, the conversion runs entirely on your own system.
 
+### 9. **Bi-directional Conversion**
+   - **Complete Workflow**: Convert documents in both directions, allowing for round-trip document processing
+   - **Format Preservation**: Maintains formatting and structure when converting between formats
+   - **Flexibility**: Easily switch between Markdown and Word formats based on your needs
+
 ### Comparison to Other Scripts
 - **Feature Set**: Some scripts may lack comprehensive support for Markdown features or may not handle lists and text formatting well.
 - **Performance**: Depending on the implementation, performance might vary. This script is designed to be efficient for typical Markdown files.
 
@@ -0,0 +1,95 @@
+from docx import Document
+import re
+
+
+def word_to_markdown(word_file, markdown_file):
+    """
+    Convert a Word document to Markdown format
+
+    Args:
+        word_file (str): Path to the input Word document
+        markdown_file (str): Path to the output Markdown file
+    """
+    # Open the Word document
+    doc = Document(word_file)
+    markdown_content = []
+
+    for paragraph in doc.paragraphs:
+        # Skip empty paragraphs
+        if not paragraph.text.strip():
+            continue
+
+        # Get paragraph style
+        style = paragraph.style.name.lower()
+
+        # Handle code blocks
+        if style.startswith("code block") or style.startswith("source code"):
+            markdown_content.append(f"```\n{paragraph.text.strip()}\n```\n\n")
+            continue
+
+        # Handle headings
+        if style.startswith("heading"):
+            level = style[-1]  # Get heading level from style name
+            markdown_content.append(f"{'#' * int(level)} {paragraph.text.strip()}\n")
+            continue
+
+        # Handle lists
+        if style.startswith("list bullet"):
+            markdown_content.append(f"* {paragraph.text.strip()}\n")
+            continue
+        if style.startswith("list number"):
+            markdown_content.append(f"1. {paragraph.text.strip()}\n")
+            continue
+
+        # Handle regular paragraphs with formatting
+        formatted_text = ""
+        for run in paragraph.runs:
+            text = run.text
+            if text.strip():
+                # Handle inline code (typically monospace font)
+                if run.font.name in [
+                    "Consolas",
+                    "Courier New",
+                    "Monaco",
+                ] or style.startswith("code"):
+                    if "\n" in text:
+                        text = f"```\n{text}\n```"
+                    else:
+                        text = f"`{text}`"
+                # Apply both bold and italic (checked first so it is reachable)
+                elif run.bold and run.italic:
+                    text = f"***{text}***"
+                # Apply bold
+                elif run.bold:
+                    text = f"**{text}**"
+                # Apply italic
+                elif run.italic:
+                    text = f"*{text}*"
+                formatted_text += text
+
+        if formatted_text:
+            markdown_content.append(f"{formatted_text}\n")
+
+        # Add an extra newline after paragraphs
+        markdown_content.append("\n")
+
+    # Write to markdown file
+    with open(markdown_file, "w", encoding="utf-8") as f:
+        f.writelines(markdown_content)
+
+
+def clean_markdown_text(text):
+    """
+    Clean and normalize markdown text
+
+    Args:
+        text (str): Text to clean
+
+    Returns:
+        str: Cleaned text
+    """
+    # Collapse runs of spaces and tabs, leaving newlines intact
+    text = re.sub(r"[ \t]+", " ", text)
+    # Collapse three or more consecutive newlines into one blank line
+    text = re.sub(r"\n\s*\n\s*\n", "\n\n", text)
+    return text.strip()
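
Two details of the new module depend on ordering. In the run loop, a combined bold-and-italic check has to come before the single-attribute checks, or it can never match. In `clean_markdown_text`, collapsing whitespace with `\s+` would also swallow newlines, leaving the blank-line rule with nothing to do. The following is a minimal, standard-library-only sketch of both points, not the module's actual code: `format_run` is a hypothetical helper, and plain booleans stand in for the `run.bold` and `run.italic` attributes that python-docx provides.

```python
import re


def format_run(text, bold=False, italic=False, monospace=False):
    """Wrap one run of text in Markdown markers (hypothetical helper)."""
    if not text.strip():
        return text  # leave whitespace-only runs unwrapped
    if monospace:
        return f"`{text}`"
    # The combined case must be tested before the single-attribute cases.
    if bold and italic:
        return f"***{text}***"
    if bold:
        return f"**{text}**"
    if italic:
        return f"*{text}*"
    return text


def clean_markdown_text(text):
    """Collapse spaces/tabs and excess blank lines, keeping paragraph breaks."""
    text = re.sub(r"[ \t]+", " ", text)           # spaces and tabs only
    text = re.sub(r"\n\s*\n\s*\n", "\n\n", text)  # 3+ newlines -> one blank line
    return text.strip()


# (text, bold, italic) tuples standing in for the runs of one paragraph
runs = [
    ("Plain ", False, False),
    ("both", True, True),
    (" and ", False, False),
    ("bold", True, False),
]
line = "".join(format_run(t, bold=b, italic=i) for t, b, i in runs)
print(line)  # Plain ***both*** and **bold**
```

Keeping the formatting decision in a small pure function like this makes the precedence easy to unit-test without building a `.docx` fixture.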