Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[$500] [Live Markdown] Copying and pasting from google docs comes over as bold #41109

Closed
thienlnam opened this issue Apr 26, 2024 · 88 comments
Closed
Assignees
Labels
Bug Something is broken. Auto assigns a BugZero manager. Reviewing Has a PR in review Weekly KSv2

Comments

@thienlnam
Copy link
Contributor

thienlnam commented Apr 26, 2024

  1. In a google doc, highlight and copy some regular text
  2. Paste it into the composer

Actual
image
image

Expectation
It gets pasted as regular text

cc @tomekzaw

Upwork Automation - Do Not Edit
  • Upwork Job URL: https://www.upwork.com/jobs/~0181c168843a4dc29c
  • Upwork Job ID: 1783979270880542720
  • Last Price Increase: 2024-05-20
  • Automatic offers:
    • alitoshmatov | Reviewer | 102487686
    • ShridharGoel | Contributor | 102487688
Issue OwnerCurrent Issue Owner: @abekkala
@thienlnam thienlnam added Daily KSv2 Bug Something is broken. Auto assigns a BugZero manager. labels Apr 26, 2024
@thienlnam thienlnam self-assigned this Apr 26, 2024
Copy link

melvin-bot bot commented Apr 26, 2024

Triggered auto assignment to @abekkala (Bug), see https://stackoverflow.com/c/expensify/questions/14418 for more details. Please add this bug to a GH project, as outlined in the SO.

@thienlnam thienlnam added the External Added to denote the issue can be worked on by a contributor label Apr 26, 2024
@melvin-bot melvin-bot bot changed the title [Live Markdown] Copying and pasting from google docs comes over as bold [$250] [Live Markdown] Copying and pasting from google docs comes over as bold Apr 26, 2024
Copy link

melvin-bot bot commented Apr 26, 2024

Job added to Upwork: https://www.upwork.com/jobs/~0181c168843a4dc29c

@melvin-bot melvin-bot bot added the Help Wanted Apply this label when an issue is open to proposals by contributors label Apr 26, 2024
Copy link

melvin-bot bot commented Apr 26, 2024

Triggered auto assignment to Contributor-plus team member for initial proposal review - @alitoshmatov (External)

@techflare641
Copy link

@thienlnam Is it okay to prevent all formatting when paste text?

@thienlnam
Copy link
Contributor Author

We'd like to retain the formatting that it was copied with, instead of preventing all formatting at all

@jainilparikh
Copy link

@thienlnam , isn't this the expected behaviour ? If the text was bold on Google docs, we should send it as bold in our chat too.

The composer is showing * text * since it's allowing the end user to choose whether they want to continue with the formatting that their copied text used on Google docs or whether they want to change it. In-case they want to change it, they can use markdown (which the composer supports) to change the formatting.

Copy link

melvin-bot bot commented Apr 27, 2024

📣 @jainilparikh! 📣
Hey, it seems we don’t have your contributor details yet! You'll only have to do this once, and this is how we'll hire you on Upwork.
Please follow these steps:

  1. Make sure you've read and understood the contributing guidelines.
  2. Get the email address used to login to your Expensify account. If you don't already have an Expensify account, create one here. If you have multiple accounts (e.g. one for testing), please use your main account email.
  3. Get the link to your Upwork profile. It's necessary because we only pay via Upwork. You can access it by logging in, and then clicking on your name. It'll look like this. If you don't already have an account, sign up for one here.
  4. Copy the format below and paste it in a comment on this issue. Replace the placeholder text with your actual details.
    Screen Shot 2022-11-16 at 4 42 54 PM
    Format:
Contributor details
Your Expensify account email: <REPLACE EMAIL HERE>
Upwork Profile Link: <REPLACE LINK HERE>

@jainilparikh
Copy link

Contributor details
Your Expensify account email: [email protected]
Upwork Profile Link: https://www.upwork.com/freelancers/~01845f58bb8fa86b29

Copy link

melvin-bot bot commented Apr 27, 2024

✅ Contributor details stored successfully. Thank you for contributing to Expensify!

@skyweb331
Copy link
Contributor

skyweb331 commented Apr 28, 2024

@thienlnam
When I copy "Test123" from google doc, it is translated into HTML like this.

<html>
<body>
<!--StartFragment--><meta charset="utf-8"><b style="font-weight:normal;" id="docs-internal-guid-ef0d09b9-7fff-2c97-04c0-4fe5952c2707"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">TEST</span><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;"> 12</span></b><!--EndFragment-->
</body>
</html>

Too much complicated, but simply it is
<b><span style="font-weight: 700">Test</span><span style="font-weight:400">123</span></b>

But, ExpensiMark does not detect the font-weight, it is detecting <b> tag, that's why pasted text is parsed into *Test 123* in our editor.

As I mentioned in #41110 (comment), this could not be done by regex. Regex has its limitation and could not parse all these complex html. ( I finally find the exact example that regex could not fit for every cases ). Do you think, we can catch css styles using regex? Maybe, Yes. But it is too much complicated.

So my proposal is to change the HTML parsing logic like using xml2js or fast-xml-parser (#40571 (comment))

This way, we can completely resolve the root problem.

(cc @jjcoffee )

@tomekzaw
Copy link
Contributor

@skyweb331 We could just create DOM subtree for the pasted HTML and then get pure text using the following approach: https://stackoverflow.com/a/6743966/23325954

@skyweb331
Copy link
Contributor

@tomekzaw
Yes. If we use DOM, solution is really simple. I also recommended to use DOM, but on mobile device, DOM parsing is available but slow...That's why ExpensiMark is using regex. xml2js and fast-xml-parser is not using DOM and I recommend it.
( I had discussion with @ikevin127 for this. https://expensify.slack.com/archives/C01GTK53T8Q/p1713721057329569 )

@melvin-bot melvin-bot bot added the Overdue label Apr 29, 2024
Copy link

melvin-bot bot commented Apr 30, 2024

@abekkala, @thienlnam, @alitoshmatov Uh oh! This issue is overdue by 2 days. Don't forget to update your issues!

@thienlnam
Copy link
Contributor Author

Could you put a post in #expensify-open-source about the problem and the solution solved by using another parser so we can get some more eyes on it?

@melvin-bot melvin-bot bot removed the Overdue label May 1, 2024
@skyweb331
Copy link
Contributor

@thienlnam https://expensify.slack.com/archives/C01GTK53T8Q/p1713721057329569
I already posted this and discussed with Kevin Bader.
We have all parsing Markdown to HTML and vice-versa in One file ExpensiMark and using regex for everything. It is not well-structured and hard to maintain...

Copy link

melvin-bot bot commented May 3, 2024

📣 It's been a week! Do we have any satisfactory proposals yet? Do we need to adjust the bounty for this issue? 💸

@melvin-bot melvin-bot bot added the Overdue label May 3, 2024
@ShridharGoel
Copy link
Contributor

Proposal

Please re-state the problem that we are trying to solve in this issue.

Copying and pasting non-bold text from Google docs gets pasted as bold text.

What is the root cause of that problem?

Google docs adds some extra HTML values when copying text.

Example, when copying Testing, the below is the copied HTML:

<meta charset='utf-8'><meta charset="utf-8"><b style="font-weight:normal;" id="docs-internal-guid-1138e939-7fff-aca6-792e-add7dee8781e"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Testing</span></b>

When copying Testing which is in bold in Google docs, below is the copied HTML:

<meta charset='utf-8'><meta charset="utf-8"><b style="font-weight:normal;" id="docs-internal-guid-c0d177b2-7fff-5eed-41bb-35f0cf5b210e"><span style="font-size:11pt;font-family:Arial,sans-serif;color:#000000;background-color:transparent;font-weight:700;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">Testing</span></b>

The font-weight is 400 in the 1st example, while it is 700 in the 2nd.

But, in both cases, the outer HTML is wrapped in <b></b> because of which ExpensiMark takes it to be a bold text.

https://github.com/Expensify/expensify-common/blob/1713f28214f0e7176c4fd13433fb0ea15491ebf9/lib/ExpensiMark.js#L408-L412

What changes do you think we should make in order to solve the problem?

Update ExpensiMark logic for bold to the below:

{
    name: 'bold',
    regex: /<(b|strong)(?:"[^"]*"|'[^']*'|[^'">])*>([\s\S]*?)<\/\1>(?![^<]*(<\/pre>|<\/code>))/gi,
    replacement: (match, tagContent, innerContent) => {
        const isFontWeightBold = () => {
            return (fontWeightMatch[1].replaceAll(/\s/g,'').includes('font-weight:bold;') || fontWeightMatch[1].replaceAll(/\s/g,'').includes('font-weight:700;'))
        }
        const fontWeightMatch = innerContent.match(/style="(.*?)"/);
        const isBold = fontWeightMatch ? (isFontWeightBold() ? true : false) : true;
        return isBold ? `*${innerContent}*` : innerContent;
    },
},

If the inner content has font-weight style, then this checks if the font-weight of the span element is set to bold or 700.

If yes, then it uses the bold markdown, else it uses it as a normal text. We can polish the logic and code further.

Copy link

melvin-bot bot commented May 7, 2024

@abekkala, @thienlnam, @alitoshmatov Eep! 4 days overdue now. Issues have feelings too...

@ShridharGoel
Copy link
Contributor

Do we need to handle italic as well? I think it's outside the scope of this issue, would like to know thoughts on this.

@jasperhuangg
Copy link
Contributor

@ShridharGoel I think it's fine to only handle the bold case for this particular issue. @alitoshmatov thoughts?

@melvin-bot melvin-bot bot added the Overdue label Jul 1, 2024
@marcochavezf
Copy link
Contributor

I agree to handle only bold here cc @alitoshmatov

@melvin-bot melvin-bot bot added Overdue and removed Overdue labels Jul 1, 2024
@marcochavezf
Copy link
Contributor

Waiting on @alitoshmatov's review Expensify/expensify-common#710 (review)

@melvin-bot melvin-bot bot added Overdue and removed Overdue labels Jul 4, 2024
@marcochavezf
Copy link
Contributor

Expensify/expensify-common#710 merged

@melvin-bot melvin-bot bot removed the Overdue label Jul 8, 2024
@marcochavezf
Copy link
Contributor

@ShridharGoel @alitoshmatov given the PR for expensify-common was merged, I suspect we'd need another PR to bump the package in App, right?

@ShridharGoel
Copy link
Contributor

Yes, will open it in some time.

@melvin-bot melvin-bot bot added Reviewing Has a PR in review Weekly KSv2 and removed Daily KSv2 labels Jul 10, 2024
@ShridharGoel
Copy link
Contributor

#45208

@ShridharGoel
Copy link
Contributor

Update: PR is in review.

@alitoshmatov
Copy link
Contributor

alitoshmatov commented Jul 18, 2024

We can start regression period, other similar PR was quicker then ours and bumped expensify-common to the latest version which also includes our changes. So we can close our PR, and start regression period.

Our changes was in version 2.0.40, It is already 2.0.49

"expensify-common": "2.0.49",

@ShridharGoel
Copy link
Contributor

@abekkala This seems to be due for payments, thanks.

@abekkala
Copy link
Contributor

@thienlnam @marcochavezf I noticed that the PR for this issue was closed without merging based on another PR fixing this issue sooner.

Can you confirm if payments are still payable to both contributors here?

@marcochavezf
Copy link
Contributor

@abekkala yup it's still payable, since the PR that was closed was to bump the version in App, but the main work was done in another PR

@abekkala
Copy link
Contributor

PAYMENT SUMMARY:

@abekkala
Copy link
Contributor

@alitoshmatov payment sent and contract ended - thank you! 🎉

@ShridharGoel
Copy link
Contributor

Accepted, thanks.

@abekkala
Copy link
Contributor

@ShridharGoel payment sent and contract ended - thank you! 🎉

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Bug Something is broken. Auto assigns a BugZero manager. Reviewing Has a PR in review Weekly KSv2
Projects
Status: Done
Development

No branches or pull requests