My Debut Novel “The Human Countermove” Is Now Available!

I’m gonna keep this update brief.

After three years my debut novel is now available to purchase on Amazon! It’s a cerebral, near-future sci-fi built from my love of strategy games like Chess. In the next few days I will be releasing a post discussing all the different strategies and games I built my book from, but today it’s all about the celebration!

Thank you to all my readers, my family, and my friends. Becoming a novelist took a lot longer than I expected, but I’ve enjoyed every little project along the way. The terrible practice novel, the staged reading of my play, the years developing ed-tech stories for students, each project was a step on my journey here.

Don’t worry, I have no intention of stopping. My next project (Project APHELION) is already about 10% of the way through its second draft, so hopefully it won’t be too long before we’re back here again with another exciting story.

As I schedule appearances at book signings, farmer’s markets, and reader events, I will post them here.

Thank you again, and happy reading.

– Logan Sidwell

The Human Countermove is now available for purchase! https://www.amazon.com/dp/B0FM9R7T5F

In a nation ruled by AI Minds, productivity is everything—even play.

Once a legend in the world of strategy games, Zouk Solinsen is now just another burnout in a society obsessed with efficiency. But when the Minds announce a high-stakes tournament—with a seat on the ruling council as the prize—Zouk is drawn back into the fray, determined to reshape the future.

With help from the enigmatic Torrez Institute, Zouk racks up early victories against the Minds. But when Maya Torrez reveals the cost of her support—a violent coup against the Minds—he rejects it and strikes out alone.

Now, with no allies, dwindling resources, and a nation on the brink, Zouk faces the biggest game of his life—and a final, impossible choice: reform the system from within, or burn it all down.

The Circle and The Allegorical Battle for Society’s Soul

The following post contains spoilers for the novel The Circle by Dave Eggers.

I used to work for a tech company, somewhere over 1000 employees. I did a bit of coding, a bit of problem-solving, but most importantly a whole lot of messaging other people. There were a million different channels for a million different things. Some niche, some broad, but every one of them had new posts each morning.

When I first started, I tried to keep up with everything. It made me a nervous wreck. Then I tried to ignore everything, and I’d miss key announcements. I’ve always disliked those big messaging systems, and I’m glad I’m free of them.

Reading The Circle by Dave Eggers was like being dumped right back into the worst of it.

The book tells the story of Mae Holland, an eager-to-please young woman hired into the biggest social media company in the country, The Circle. She starts her job by constantly monitoring and posting to every little channel in The Circle’s network. The chapters where she’s posting, reading, and responding to surveys stressed me out. Early on, they highlight The Circle’s attitude towards information: any moment not gathering or generating information is a moment wasted.

But it’s not all posts and likes. The story’s true plot is a battle for Mae’s soul. 

At work, the executives and the employees make the argument for all the good social media is bringing to the world. No more secrets. No more backroom deals. All the world a friend.

At home, Mae’s parents and ex-boyfriend strive to protect their privacy. They don’t dare put down Mae’s achievements, but there’s a quiet reticence from her family to hop on board the information bandwagon.

But The Circle isn’t about Mae, and the fight for Mae’s soul is allegorical. The true fight is ours.

The Products of The Circle

We see a dozen different products from The Circle over the course of the book. Tiny cameras planted on every street corner, centralized identity systems to tie every post to a single person, complete catalogs of a person’s history. Each product helps build The Circle’s philosophy. Any information that isn’t recorded is information wasted. We even see 1984-style slogans like “All that happens must be known”.

But it comes from a good place. One of The Circle’s employees, Francis Garaventa, is out there inventing new ideas with the goal of protecting children. The kind of respectable, un-debatable goal that justifies putting chips in kids’ arms.

Later in the book, we see politicians wearing body cameras for their conversations. We see The Circle ask their users all kinds of questions and use those polls to push their political influence forward. The novel asks its readers hard questions. Is it so wrong to want to live in a transparent world? Is it so wrong to want to protect everyone? Aren’t you tired of the secrets and backroom deals of today?

Of course, with each product, we see both sides. All the good it could do, and all the privacy we’d have to surrender.

The Three Wise Men

In the back half of the book, we meet The Three Wise Men. These are the founders of The Circle. One a tech genius, one a product guy, and one a salesman.

Here, the book poses its second debate. If the products of The Circle didn’t send a shiver down your spine, if you find yourself drawn in by them, happy to surrender a little privacy for a little more safety, Dave Eggers presents the flaw in making such an exchange.

The scene is presented as a meeting of three aquatic animals. A reclusive seahorse, an ever-stretching octopus, and a shark. The Three Wise Men. The meeting ends the way it always had to end. A shark is the only thing left in the tank.

A decade after the whole world joins The Circle, who will control the company? And how long do good intentions last?

Guided Into Their Arms

Mae’s journey into the inner sanctum of The Circle is one filled with tricks and manipulations. As she embraces the philosophy of The Circle, her relationship with her family weakens with every visit. Her ex-boyfriend’s diatribes in favor of a less connected world feel more out of place with each speech. Mae’s embrace of the “Privacy is Theft” motto enables her to post his heartfelt hand-written letter online, a place where an echo chamber of commenters reinforces her every bias.

Then Mae makes a mistake. One with mild police involvement. The Circle is benevolent, it’s understanding, it helps Mae find freedom through confession of her mistakes.

A friend of mine pointed out The Circle’s tricks were exactly what a cult does to ensure its members stay with them forever. Cut off family and friends, take away Mae’s identity outside The Circle, let the social network fill her with all the love she’s losing without her family.

With Mae secure in The Circle, the evil plot is revealed. It’s not enough that all Circle users surrender their data. Everyone needs to be a part of it. A friendly invitation to be enforced on every citizen.

Tragedy and Hard Decisions

Major spoilers below.

Mae embraces all of it, and The Circle’s influence is pushed to its limit, to tragic results. We see the cost of total transparency when one character’s historical ancestry is revealed to be a long line of monsters and criminals. We see the cost of enforced participation when Mae targets her ex-boyfriend to be brought into the fold.

This is the bucket of cold water, the moment of lucidity in Mae’s data-mining fever. She’s given a chance to change course. An opportunity to tear The Circle down before it’s drawn around the entire nation.

And she doesn’t.

Because it’s not really her choice to make. The book isn’t about Mae Holland saving the world from the dangers of social media. It’s about society’s enthusiastic surrender of our freedoms, about our call to lift every rock and shine a light down every alley, disregarding any notion of ‘privacy’.

So when Mae makes the wrong choice at the end of the book, she only does it because it’s what we’ve all been doing. Each time we make a new profile, refuse to delete an old one, dig up an old mistake to tear a person down, or offer a new picture for verification, we move one step closer to closing The Circle.

The battle of The Circle is far from over. There have been some real victories for privacy in the last decade. Victories that probably looked impossible when this book was written. But if you want a clear picture of the sides and of what could be at stake, The Circle makes an extremely compelling case.

My debut novel, THE HUMAN COUNTERMOVE is now available for pre-order!

Three Years Later… I Have a Novel

On September 1st, my debut novel is being released. The Human Countermove. Getting it released is incredibly exciting, and knowing it took three years fills me with a quiet dread. The journey has been incredibly long. Two years to write it. One year to decide what to do with it, and now it’s available for sale. I’m counting every pre-order on a little calendar, crossing off a square with every sale.

Not that you can trust me, but it’s my opinion I’ve written a compelling book. My mom liked it, for one. That’s a big improvement over my practice novel. My beta readers liked it. I even managed to convince one of my readers to review two different drafts, which is unheard of in the beta reader space. Usually you only get one chance to impress someone.

But it’s here. It’s been professionally edited, copy-edited, and gone over again and again. Ready for scrutinizing eyes.

The Journey

They say the first one million words are practice. I believe I hit the equivalent of one million words somewhere near the end of my first draft. There was a day when a switch flipped in my head. From then on, my understanding of scene composition, dialog, and character motivations was just clearer.

For someone editing their first book, a sudden jump in skill is very bad news. It meant I had to face my rough, rough, rough first draft and clean it up with a newfound understanding of storytelling. That’s a lot of work for a single broom.

I lost momentum a couple of times. My systems for reliably writing weren’t in place yet. One weekend I’d pump out 13,000 words, then nothing for a month.

Even the soul of the story wasn’t there on the first go-around. I found it partway into the second draft. A great idea that really clarified the narrative. Funny enough, I wanted to put that heart in the sequel. My editor talked me out of it, convinced me that good ideas are meant to be spent, and that my debut should be as strong as it can be.

In my opinion the back third of this book is where it excels, a final arc that imbues the whole story with purpose. The place where all those funny little ideas were vacuumed out of a hypothetical sequel and pulled into the original.

Choosing to Self Publish

I’m an impatient man. It’s silly of me to be impatient after spending two years writing up a draft, but I was ready for this project to be out there. I’ve met plenty of writers sitting on twelve novels just waiting for the right agent to turn them into stars. That’s not the path for me.

The scariest part of self-publishing is knowing that every inch of success is entirely on you. That also means if the book only sells a dozen copies, it’s your fault. For me, that didn’t seem so bad. I’d rather improve by releasing my work and letting people give me honest feedback than hide away and write book after book on my own. I’ve never worked on something that didn’t get released to the public within four months of being finished before, so a year of waiting was an eternity.

Now that the time is here, I’m really enjoying the process. Soon there will be something out in the world that I’m proud of, something I made, something I’m eager to share. Lately I’ve been attending a lot of farmer’s markets. I haven’t made a single sale, but the experience has been a blast. I get to spend time speaking to real people, giving advice to novice writers, learning what different readers like reading. After all this time on my own pushing to finish a product, getting to know someone else’s story is sort of, healing.

My review of self-publishing so far: Owning my own book and owning my own success is hard work and an absolute joy.

The Novel

I can’t write this whole thing up without talking about my novel! The book is titled “The Human Countermove”, there’ll be a link and description down at the end. But here, in this little blog, I want to give a more informal description.

The book tells the story of Zouk, a washed up strategy game grandmaster who challenges the three AI rulers of his society to determine society’s future.

It’s a cerebral near-future sci-fi, inspired by my love of chess and strategy games. The premise is drawn from the famous chess match Kasparov vs Deep Blue (1997), where mankind’s best chess player was soundly defeated by an algorithm.

I wrote this thing on the hunt for some narrative payback. In real life, we got our butt handed to us. In The Human Countermove, the big question at the start of the book is, ‘What can a person do to out-think something that is cognitively superior?’ Zouk Solinsen is my very own John Henry the steel-driving man, except this time instead of trying to beat the machine by brute-force, Zouk pulls every trick in the book to get an advantage.

One thing I fought hard to keep in the book was a rejection of the normal dystopian tropes. So often in these things society is irredeemable, and it all descends into war and destruction. The reader watches the conflict between robots and humans pave a fiery trail for centuries, they see the last few untracked humans turn into a rebellion. I’m ready for something new.

Our main character is a victim of a broken system. A system that demands efficiency in every act. Work and play and rest, all measured, all prescribed in particular doses. It’s not unreasonable to be angry. A broken system needs change. But at the heart of the story is one issue: does the system need to be burned down, or do we not yet understand it? Is there something inherently wrong with a society run by AI Minds? Maybe. Or maybe there’s just a separation between what mankind asks for and what we really want.

Conclusion

As silly as it is, I’ve often defined whether or not I’m a writer by the absence of a published book. I’ve worked professionally in the field, I’ve written for graphics teams, voice actors, education companies, by all means, I am a writer. But this was the last hurdle. As soon as this book comes out, I get to say it to myself and mean every word.

Next week, I will be a novelist.

My debut novel is now available for pre-order. Release Date September 1st. I’m still working out the last few kinks on the paperback side, but that option should be made available soon.

https://www.amazon.com/dp/B0FM9R7T5F

In a nation ruled by AI Minds, productivity is everything—even play.

Once a legend in the world of strategy games, Zouk Solinsen is now just another burnout in a society obsessed with efficiency. But when the Minds announce a high-stakes tournament—with a seat on the ruling council as the prize—Zouk is drawn back into the fray, determined to reshape the future.

With help from the enigmatic Torrez Institute, Zouk racks up early victories against the Minds. But when Maya Torrez reveals the cost of her support—a violent coup against the Minds—he rejects it and strikes out alone.

Now, with no allies, dwindling resources, and a nation on the brink, Zouk faces the biggest game of his life—and a final, impossible choice: reform the system from within, or burn it all down.

My First Draft Took 7 months, Here’s What I Learned

I just finished the first draft of my second book. It took 7 months. The final word count was about 87,000 words. That averages out to about 410 words per day. But that’s not the reality.

The reality is half my book was written across 7 very productive weeks, and half my book was written across 5 very unproductive months. Here’s what I learned.

Find The Process

Last week I wrote a post about my writing process. On days I wrote, I always hit my wordcount goal of 1,200 words. But for a long time, getting my butt in the chair and focussing enough to work proved impossible. Then I started pre-writing with a pen and paper, and I set a time on my phone each day for writing, and everything got easier.

From the moment I found my process, my average word-count per week shot up to 6,000. About 5.5 days per week on and off. If I had hit that number from the start, the 87,000 words would have been done in about 14 and a half weeks, a little over 3 months.

Momentum is Everything

Forming a consistent rhythm is hard. And sometimes life forces us to make exceptions. Here’s what I’ve learned about myself.

If I take a one day break from writing, I can get back to writing the next day without any issue.

If I take a two day break, I get kinda anxious and starting again becomes a challenge.

After three days, the momentum is gone, and I have to start cold.

The next time I’m writing the first draft of a book, I plan on allocating three dedicated months, with only brief weekend retreats to break things up. Once the habit is formed, it’s harder to break it than to fulfill it. But if I give myself too many excuses, too many easy outs, the habit dies before it’s formed.

Love (With Your Novel) is Fleeting

It’s easy to fall in love with a book. It’s much harder to stay in love. You can only work on the same task for so long before you start to hate it. A terrible kind of insecurity bubbles up, a voice in your ear whispers that your story is terrible.

About 3 months into my drafting, I stopped loving my book. Worse, I stopped liking it. And once that happened, getting words on the page was almost impossible.

The good news is: It’s fixable. It took a little wining and dining, but with the right attitude and a careful approach, I was able to rediscover my passion at least twice while getting the thing written.

The process was pretty simple. When I had been away from my book for a couple weeks and the spark was gone, I’d revisit the book the way I had at the start. Begin by visualizing the world, the aesthetics, the look and wonder of the story. The joy of the concept rather than the pain of the details. Then I’d see my characters, the protagonist with all their flaws, and everything they were trying to do. But it was more than seeing them, it was seeing what was still in store for them. I’d have a third of a book written, and I’d be able to look into the future and know what was still on its way. The end of the arc, still not on the page. My love would reignite. I had seen everything I loved about the story and everything that was still in store. It’s the reason I’m telling the story, the idea that bubbles in my stomach and warms my heart.

Too Much Buildup is Bad for The Writer

Ideas are made to be spent. Once they come into your brain, they fill a space of it until the day you get them onto the page. Worse, a great idea likes to return again and again, occupying most of your thoughts as you imagine the same scene from a hundred different angles.

The trouble with all that thinking is the buildup. At the end of the day, you only get to tell the story one way. And what does that mean for all those other perspectives? They’re tossed in the bin. Maybe I get to pull an idea or two along the way, but most of it is just wasted brainspace.

My brain knows it’s wasted work, and it hates it.

If I love a scene too much, my brain does everything in its power to keep me from writing it. To write is to commit, it takes the infinite possibility and beauty of a concept and turns it into concrete words.

For me, the best thing I can do with a scene I love is get through it as soon as possible. Keep the reimaginings low, keep the ways to spruce things up limited, and let the scene be like you saw it for the first time in your head, even when sometimes it’s just two characters chatting in a garage. It’s much easier to edit a poorly written chapter than fill a blank page.

The Outline is Key

My outline was my most important ingredient. It turned the impossible journey of 100,000 words into a bunch of 1,200 word slices. When I lost momentum, I put a list on the wall, a series of individual scenes pulled from my outline. With each scene written, I’d cross it off and move onto the next. It meant all I really needed to think about was what was directly ahead, not the entire maw that is the rest of the novel. With this book, the further the outline got into the story, the more loosely it described the events. That hurt me a lot. The less detail I determined early, the more work I had on the day.

New Rules

For me, seven months is too long to write a draft. The longer it takes to write, the more complications crop up along the way. My dream is to draft in 3-4 months. Less than that isn’t possible unless I start increasing my daily word count goals, and I’d rather consistently hit the daily goals I have now than risk pushing myself too hard and lose months from burnout. So, with all that in mind, I’ve set myself a few new rules:

  1. From the moment I start my draft, the next three months can have no major trips, just the occasional weekend getaway.
  2. If I miss 1 day of writing, I have to do everything in my power to make sure I hit my goal the next day.
  3. Once a scene is imagined, it doesn’t get revisited until the day I write it. No over-engineering here.
  4. Outline early, and outline thoroughly.

Hopefully in the near future I’ll be hitting my goal of 2 books a year.

DEBUT NOVEL NOW AVAILABLE FOR PRE-ORDER! (Not the story described in this article):

The Human Countermove is now available for pre-order! https://www.amazon.com/dp/B0FM9R7T5F


Some Python Code Proofed My Book in 5 minutes

I wrote my book word by word, no AI involved. An editor helped me develop the story and a copy-editor made sure the manuscript was clean. I’ve read my book about a dozen times. Then my layout person gave me the final version of the book and I realized I had to read the whole thing again to check for new errors.

First I did it properly. My eyes were basically blind by the end. But I wanted a second sweep. The thing is, any person asked to do the job will make a mistake. They’ll overlook something. They won’t realize one paragraph is copied over twice or accidentally cut a space between two sentences. What I needed was a perfect sweep. A complete comparison between my original manuscript and the final epub document. The kind of sweep that could only be performed by a soulless machine with an inflexible view of correct and incorrect.

When I’m not writing I’m coding, and this kind of repetitive, detail-oriented, clearly defined task is the perfect fit for a machine. In fact, it was such a perfect fit, the whole process only took an hour.

What Did The Machine Do?

First I defined my requirements. This code was written to spot exactly one type of problem: copy-and-paste mistakes made by the layout person. It’s not going to spot typos, it’s not going to spot grammar issues, and it’s certainly not going to point out plot holes. This machine is very stupid, but it performs its job to the letter.

Manuscript format: DOCX

Final Book Layout format: EPUB

Goal: Review every sentence in the EPUB and DOCX files and identify any sentence missing from one file that is present in the other. This should capture any omissions, insertions, or errors in the final manuscript. Then, identify if any sentences appear in the same manuscript more than once. This should identify any ‘duplicate chapter’ or ‘duplicate paragraph’ problems.

The complete code will be shown at the end in case you want to use it, but first I’ll walk you through the parts.

Step 1: Parse the DOCX Manuscript

import docx

def extract_text_from_docx(docx_path):
    doc = docx.Document(docx_path)
    full_text = []
    for para in doc.paragraphs:
        if para.text.strip():  # skip empty paragraphs
            full_text.append(para.text.strip())
    return '\n'.join(full_text)

This code is pretty straightforward: it parses the .docx file into paragraphs, skips the empty ones, then joins the rest together into one big blob of text.

Step 2: Parse the EPUB Book

This code is almost identical to the DOCX version, but EPUB has a lot more nuance to its data-types. We have to ensure we only retrieve the actual text items, and parse them out of HTML into plain-text. Then we join it all together in one big wall of book.

import ebooklib
from ebooklib import epub
from bs4 import BeautifulSoup

def extract_text_from_epub(epub_path):
    book = epub.read_epub(epub_path)
    text_content = []

    for item in book.get_items():
        if item.get_type() == ebooklib.ITEM_DOCUMENT:
            soup = BeautifulSoup(item.get_content(), 'html.parser')
            # Remove scripts and styles
            for tag in soup(['script', 'style']):
                tag.decompose()
            text = soup.get_text(separator=' ', strip=True)
            if text:
                text_content.append(text)

    return '\n'.join(text_content)

Step 3: Split the book-blobs into sentences

This part uses a tool called the Natural Language Toolkit (NLTK). Sometimes what NLTK considers a sentence is a little funny, like it’ll join two sets of short quotes together. But we cannot allow perfect to be the enemy of good, so as long as NLTK is responsible for both sentence splitting procedures, the final outputs should be identical.

import nltk
from nltk.tokenize import sent_tokenize
nltk.download('punkt')
nltk.download('punkt_tab')

def split_text_into_sentences(text):
    return sent_tokenize(text)
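
To see why a quirky splitter is fine, here is a tiny sketch. It swaps NLTK for a deliberately naive regex splitter (naive_split is a made-up stand-in, not part of the script above), just to show that when the same splitter runs on both texts, its odd choices land identically on both sides.

```python
import re
from typing import List

def naive_split(text: str) -> List[str]:
    """Hypothetical stand-in for sent_tokenize: split on whitespace
    that directly follows '.', '!' or '?'."""
    return [s for s in re.split(r'(?<=[.!?])\s+', text.strip()) if s]

# The closing quote hides the punctuation from this splitter, so the
# short quotes stay glued to the sentence that follows them.
manuscript_text = 'He nodded. "Go." "Now?" She left.'
final_text = 'He nodded. "Go." "Now?" She left.'

print(naive_split(manuscript_text))
# The split is "wrong", but it is wrong in the same way for both texts.
print(naive_split(manuscript_text) == naive_split(final_text))
```

The comparison only cares that both lists were cut the same way, not that every cut is linguistically perfect.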

Step 3b: Data cleanup

You may have noticed some really long character replacement stuff. Turns out the docx parser picks up a few too many newlines and the epub parser likes directional quotes, so all of that gets replaced with nice, consistent sentencing.

from typing import List

def docx_scan():
    docx_path = "FILENAME.docx"
    text = extract_text_from_docx(docx_path)
    sentences: List[str] = split_text_into_sentences(text)
    
    # Replace newlines with spaces and curly quotes with straight quotes
    for i, val in enumerate(sentences):
        sentences[i] = val.replace('\n', ' ').replace('“', '"').replace('”', '"').replace("‘", "'").replace("’", "'").replace("\'", "'")

    return sentences

def epub_scan():
    epub_path = 'FILENAME.epub'
    text = extract_text_from_epub(epub_path)
    sentences: List[str] = split_text_into_sentences(text)

    # Same cleanup as docx_scan, so both lists use identical characters
    for i, val in enumerate(sentences):
        sentences[i] = val.replace('\n', ' ').replace('“', '"').replace('”', '"').replace("‘", "'").replace("’", "'").replace("\'", "'")

    return sentences
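
Since the same replacement chain shows up in both scan functions, one way to keep them in sync is a shared helper. The sketch below is an assumption about how that could look, not code from the original script: normalize_sentence and _CLEANUP_TABLE are hypothetical names, and str.translate applies the newline and curly-quote replacements in a single pass.

```python
from typing import List

# Hypothetical shared cleanup table (not in the original script).
_CLEANUP_TABLE = str.maketrans({
    '\n': ' ',       # stray newlines from the docx parser
    '\u201c': '"',   # left double quote
    '\u201d': '"',   # right double quote
    '\u2018': "'",   # left single quote
    '\u2019': "'",   # right single quote / apostrophe
})

def normalize_sentence(sentence: str) -> str:
    """Return the sentence with newlines and curly quotes normalized."""
    return sentence.translate(_CLEANUP_TABLE)

def normalize_all(sentences: List[str]) -> List[str]:
    """Apply the same cleanup to every sentence in a list."""
    return [normalize_sentence(s) for s in sentences]

print(normalize_sentence('\u201cDon\u2019t,\u201d she said.\nHe nodded.'))
```

With something like this, both docx_scan and epub_scan could call normalize_all on their sentence lists, and a new replacement would only need to be added in one place.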

Step 4: Crawl through the two books

This is a bit of a doozy, but this function essentially crawls through the final book looking for the next sentence in the manuscript. If it doesn’t find it in 10 sentences, it reports the sentence missing and moves on.
Note: The original draft of this post had a different algorithm that failed to account for sentence order. There’s nothing a programmer does more than tinker with their code, but this function is a big improvement on the original, trust me.

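def compare_books(manuscript: List[str], final_book: List[str]):
    # We sweep through final_book searching for sentences from manuscript
    book_1_pos: int = 0
    book_2_pos: int = 0
    while book_1_pos < len(manuscript):
        found: bool = False
        target_sentence: str = manuscript[book_1_pos]
        # Look at most 10 sentences ahead of the last match
        for sweep_position in range(book_2_pos, book_2_pos+10):
            if(sweep_position < len(final_book) and target_sentence == final_book[sweep_position]):
                book_1_pos += 1
                # Resume after the match so one sentence can't be matched twice
                book_2_pos = sweep_position + 1
                found = True
                # Stop sweeping; 'continue' here kept scanning the window and
                # could advance book_1_pos twice on nearby duplicates
                break
        
        if not found:
            book_1_pos += 1
            # Report the missing sentence, skipping any that contain ' - '
            if ' - ' not in target_sentence:
                print(target_sentence)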
    

And because of the way the function is written, we can actually crawl through both books the same way.

    epub_sentences = epub_scan()
    docx_sentences = docx_scan()
    # Check the epub file for errors
    compare_books(docx_sentences, epub_sentences)
    # Check the docx file for errors
    compare_books(epub_sentences, docx_sentences)
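
Here is a toy run of the same idea on four-sentence "books", to make the two-direction check concrete. This is a sketch rather than the script itself: find_missing is a hypothetical variant that returns the missing sentences instead of printing them, resumes just after each match, and leaves out the ' - ' filter.

```python
from typing import List

def find_missing(source: List[str], target: List[str], window: int = 10) -> List[str]:
    """Return sentences of source that are not found, in order,
    within `window` sentences of the current position in target."""
    missing: List[str] = []
    target_pos = 0
    for sentence in source:
        # Only look a short distance ahead so sentence order is respected.
        for pos in range(target_pos, min(target_pos + window, len(target))):
            if target[pos] == sentence:
                target_pos = pos + 1  # resume after the match
                break
        else:
            # The inner loop finished without a match.
            missing.append(sentence)
    return missing

manuscript = ["A.", "B.", "C.", "D."]
final_book = ["A.", "C.", "C.", "D."]  # "B." was lost, "C." was pasted twice

print(find_missing(manuscript, final_book))  # ['B.'] is missing from the final book
print(find_missing(final_book, manuscript))  # ['C.'] is extra in the final book
```

The first direction catches what was dropped, the second catches what was added, which is why the function gets called twice with the arguments swapped.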

There are ~8000 sentences in my book. The computer reads both copies twice, which comes to about 32,000 sentence reads, and each sentence is compared against at most 10 nearby sentences. A very cheap, less than one second scan for errors.

All the differences are then written out to a file. There were a bunch of false positives. Of the 54 reported omissions, 4 sentences turned out to contain errors, the rest were quirks of the epub format. But finding real errors means it’s working! And it means my layout person did a fantastic job!
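
For a second opinion on the hand-rolled sweep, Python's standard-library difflib can compare the two sentence lists as sequences. To be clear, this is a swapped-in technique and not what the script does: diff_sentences is a hypothetical helper, and the sketch assumes both lists have already been split and cleaned the same way.

```python
import difflib
from typing import List, Tuple

def diff_sentences(manuscript: List[str], final_book: List[str]) -> List[Tuple[str, List[str], List[str]]]:
    """Return (tag, manuscript_sentences, final_sentences) for every
    stretch where the two lists disagree."""
    matcher = difflib.SequenceMatcher(a=manuscript, b=final_book, autojunk=False)
    changes = []
    for tag, a_start, a_end, b_start, b_end in matcher.get_opcodes():
        if tag != 'equal':  # tags are 'replace', 'delete', 'insert' or 'equal'
            changes.append((tag, manuscript[a_start:a_end], final_book[b_start:b_end]))
    return changes

# "B." was dropped from the final book in this toy example.
print(diff_sentences(["A.", "B.", "C."], ["A.", "C."]))
# [('delete', ['B.'], [])]
```

Because it reports changed sentences from both sides together, a single pass covers omissions and insertions at once.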

Step 5: Check for duplicates

Finally, we do a quick check in both sentence lists for duplicates. The results here reveal my laziness as an author. It turns out I have ~90 non-unique sentences in my book. Most are ‘He said’, ‘She said’, ‘He nodded’, but the strangest one was “Alpha, Golf, Delta, Charlie.” which is a list of squadrons that are referenced in that exact order on two different occasions.

    non_unique_docx = set([x for x in docx_sentences if docx_sentences.count(x) > 1])

    non_unique_epub = set([x for x in epub_sentences if epub_sentences.count(x) > 1])

    print(f"Docx copies: {len(non_unique_docx)}")
    print(f"Epub copies: {len(non_unique_epub)}")

I verified that the total number of non-unique sentences was identical in the DOCX and EPUB formats and moved on.
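
Matching totals are a good sign, but two different mistakes could in principle cancel out. A slightly stricter check compares the count of each repeated sentence between the two files. This is a sketch using collections.Counter, with hypothetical function names that are not part of the script.

```python
from collections import Counter
from typing import Dict, List, Tuple

def repeated_sentences(sentences: List[str]) -> Dict[str, int]:
    """Map each sentence that appears more than once to its count."""
    counts = Counter(sentences)
    return {s: n for s, n in counts.items() if n > 1}

def repeat_mismatches(docx_sentences: List[str], epub_sentences: List[str]) -> Dict[str, Tuple[int, int]]:
    """Sentences repeated in at least one file whose counts differ,
    mapped to (docx_count, epub_count)."""
    docx_counts = Counter(docx_sentences)
    epub_counts = Counter(epub_sentences)
    mismatches = {}
    for sentence in docx_counts.keys() | epub_counts.keys():
        d, e = docx_counts[sentence], epub_counts[sentence]
        if max(d, e) > 1 and d != e:
            mismatches[sentence] = (d, e)
    return mismatches

docx = ["He said.", "Hi.", "He said.", "Bye."]
epub = ["He said.", "Hi.", "He said.", "Bye.", "Bye."]  # "Bye." pasted twice

print(repeated_sentences(docx))       # {'He said.': 2}
print(repeat_mismatches(docx, epub))  # {'Bye.': (1, 2)}
```

Counter also counts every sentence in a single pass, instead of calling list.count once per sentence.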

Conclusion

I always felt a little uneasy about the final version of my book. Even when I had been through it myself, I couldn’t be sure I hadn’t overlooked a massive error. I still can’t be completely sure, but there’s something really reassuring about having a machine do a run-through. When precision is the aim, somehow the passionless report of a calculator is more comforting than a thumbs-up from a professional.

Complete File:

import docx
import nltk
from nltk.tokenize import sent_tokenize
import ebooklib
from ebooklib import epub
from bs4 import BeautifulSoup
from typing import List, Set

nltk.download('punkt')
nltk.download('punkt_tab')

def extract_text_from_docx(docx_path):
    doc = docx.Document(docx_path)
    full_text = []
    for para in doc.paragraphs:
        if para.text.strip():  # skip empty paragraphs
            full_text.append(para.text.strip())
    return '\n'.join(full_text)

def split_text_into_sentences(text):
    return sent_tokenize(text)

def docx_scan():
    docx_path = "YOURFILE.docx"
    text = extract_text_from_docx(docx_path)
    sentences: List[str] = split_text_into_sentences(text)
    
    for i, val in enumerate(sentences):
        sentences[i] = val.replace('\n', ' ').replace('“', '"').replace('”', '"').replace("‘", "'").replace("’", "'").replace("\'", "'")

    return sentences


def extract_text_from_epub(epub_path):
    book = epub.read_epub(epub_path)
    text_content = []

    for item in book.get_items():
        if item.get_type() == ebooklib.ITEM_DOCUMENT:
            soup = BeautifulSoup(item.get_content(), 'html.parser')
            # Remove scripts and styles
            for tag in soup(['script', 'style']):
                tag.decompose()
            text = soup.get_text(separator=' ', strip=True)
            if text:
                text_content.append(text)

    return '\n'.join(text_content)

def epub_scan():
    epub_path = 'YOURFILE.epub'
    text = extract_text_from_epub(epub_path)
    sentences: List[str] = split_text_into_sentences(text)

    for i, val in enumerate(sentences):
        sentences[i] = val.replace('\n', ' ').replace('“', '"').replace('”', '"').replace("‘", "'").replace("’", "'").replace("\'", "'")

    return sentences

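def compare_books(manuscript: List[str], final_book: List[str]):
    # We sweep through final_book searching for sentences from manuscript
    book_1_pos: int = 0
    book_2_pos: int = 0
    while book_1_pos < len(manuscript):
        found: bool = False
        target_sentence: str = manuscript[book_1_pos]
        # Look at most 10 sentences ahead of the last match
        for sweep_position in range(book_2_pos, book_2_pos+10):
            if(sweep_position < len(final_book) and target_sentence == final_book[sweep_position]):
                book_1_pos += 1
                # Resume after the match so one sentence can't be matched twice
                book_2_pos = sweep_position + 1
                found = True
                # Stop sweeping; 'continue' here kept scanning the window and
                # could advance book_1_pos twice on nearby duplicates
                break
        
        if not found:
            book_1_pos += 1
            # Report the missing sentence, skipping any that contain ' - '
            if ' - ' not in target_sentence:
                print(target_sentence)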

def main():
    epub_sentences = epub_scan()
    docx_sentences = docx_scan()
    compare_books(docx_sentences, epub_sentences)
    compare_books(epub_sentences, docx_sentences)

    non_unique_docx = set([x for x in docx_sentences if docx_sentences.count(x) > 1])
    non_unique_epub = set([x for x in epub_sentences if epub_sentences.count(x) > 1])
    print(f"Docx copies: {len(non_unique_docx)}")
    print(f"Epub copies: {len(non_unique_epub)}")


if __name__ == '__main__':
    main()