Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add ability to import Google docs into posts #8756

Merged
merged 42 commits into from Feb 29, 2024

Conversation

Will-Howard
Copy link
Collaborator

@Will-Howard Will-Howard commented Feb 12, 2024

Based on these designs. There are a few differences from the designs, some of which are based on this slack thread, and some of which are things I changed (@agnesstenlund I can change them back if you want).

The changes from the thread are:

  • Removing the option to link your account

The additional changes I made:

  • I changed the copy a bit, including removing the "Import shared Google doc" header (because this is implied now there is only one option)
  • I made it so all the bits of the UI (the button and the "This will overwrite" message) show before you enter a url. I did this because I personally find it a bit annoying when you can't tell what all the steps are before you fill in part of a form (mentioned in the slack thread), and also now there is only one form I think it looks ok to have more info in one go

Summary of the logic

Clicking "Import Google doc" does the following:

  • Exports the doc as html, using a "service account" (that an admin has to log into to set up). For us the account is eaforum.posts@gmail.com. I decided to go with a plain gmail account to avoid any risk of exposing CEA files (because you'll be able to download any doc this account has access to), also it's a bit shorter
  • Do some conversion (see convertImportedGoogleDoc) of the html to make it work with ckeditor, this is mostly to replicate things we do on paste in ckeditor
  • Create a new revision and/or post depending on whether this is the new post or edit post form:
    • For new posts, create a draft post with the converted html (which also creates a revision in a callback)
    • For editing posts, create a new revision (this being the latest revision will cause it to load in the editor), but don't set this as the live revision on the post. This means they will need to hit publish to update the live post
  • Refresh the page

Screenshots

Screenshot 2024-02-20 at 11 44 46
Screenshot 2024-02-20 at 11 44 54
Screenshot 2024-02-20 at 11 45 01
Screenshot 2024-02-20 at 11 45 24
Screenshot 2024-02-20 at 11 41 11

Things left to do, that I will aim to do before merging:

  • Add analytics
  • Add a reminder to admins (in the frontpage sidebar) to re-login to the service account when the session gets close to expiring
  • Stretch: Make the doc url pre-fill if you are on a post where you have previously imported from a google doc
  • Stretch: Add tests for the convertImportedGoogleDoc function (which replicates the logic in ckeditor)

┆Issue is synchronized with this Asana task by Unito

@Will-Howard Will-Howard changed the base branch from master to wh-version-history-2024-02 February 19, 2024 15:54
@Will-Howard Will-Howard changed the base branch from wh-version-history-2024-02 to master February 19, 2024 15:54
@Will-Howard Will-Howard changed the base branch from master to wh-form-group-refactor-2024-02 February 19, 2024 16:09
@agnesstenlund
Copy link
Collaborator

agnesstenlund commented Feb 20, 2024

A question–will it not also overwrite saved changes?

@agnesstenlund
Copy link
Collaborator

it takes 1 sec before the button become non-disabled, any chance that could be faster? It's long enough for me to get confused that it's not working:

Screen.Recording.2024-02-20.at.13.56.05.mov

@agnesstenlund
Copy link
Collaborator

fyi I changed my mind slightly about the colors/ UI when seeing it live–changed to black and grey:
Screenshot 2024-02-20 at 13 53 16

@agnesstenlund
Copy link
Collaborator

If I delete the text in the input the button doesn't go back to disabled unless I refresh:

Screen.Recording.2024-02-20.at.14.00.20.mov

@Will-Howard
Copy link
Collaborator Author

@agnesstenlund

A question–will it not also overwrite saved changes?

This is a bit of a semantic issue that is hard to get across in a short sentence, basically:

  • For changes where you've previously clicked "save as draft" or "publish", you will be able to restore them from version history, but it will "overwrite them" as in replacing what is currently in the editor
  • For changes you haven't saved, it will overwrite them completely and there is no way to get them back (except possibly via the "Do you want to restore your text" bar, but I'm not sure if this will work reliably

So I think the current wording is correct if you take overwrite to mean "delete in a way you can't recover easily"

it takes 1 sec before the button become non-disabled, any chance that could be faster? It's long enough for me to get confused that it's not working

It's not really possible to make this faster unfortunately without changing how it works. I expect it to be a bit faster on production though (maybe twice as fast, part of the time is looking things up in our database which will be faster, the rest is looking up the doc with google which will be the same).

If we did change how it works we could do something like un-disabling the button as soon as it gets a valid url (whether or not we have access to it), but then failing when you click the button if we don't have access (and still showing the message)

If I delete the text in the input the button doesn't go back to disabled unless I refresh:

Fixed this 👍

@Will-Howard
Copy link
Collaborator Author

Comments from Lizka:

Copy changes:

  • [Not exactly related to gdoc import] “Share draft” should IMO be “Share this draft” — next to the gdoc import it looks like it has something to do with gdocs…

Seems good 👍 , I've updated this

  • “Paste a link…” maybe is clearer if we write “Paste a link to your Google Doc here; make sure the anyone with the link can view or the Doc is shared with [email address].” (It’s more words, but less disorienting imo)

Hmm I prefer the original (bc it's fewer words), interested in what @agnesstenlund thinks. I think I'm assuming people will basically have a good understanding of how google doc permissions work so will get the idea without much explanation. Here's a side by side of the two wordings:

Screenshot 2024-02-20 at 14 40 19 Screenshot 2024-02-20 at 14 40 54
  • Love the red “We don’t have access” thing — allays basically all of my concerns
    • If I go and update the doc and don’t change anything on this tab, will it automatically update? Otherwise I’d add a “Re-upload” button or something if possible

Good catch, I've changed it so it doesn't update automatically. Unfortunately this means we now might be at risk of hitting a rate limit, although according to my calculation it should be able to tolerate ~80 users using it concurrently, which should hopefully be way more than we need

  • I don’t get the “Google Doc import service account” screenshot, but I assume it’s not meant to be viewable to users? If it is, I’d change that page.

Yup this is for admins (and only devs basically), users won't see this

@agnesstenlund
Copy link
Collaborator

Agree that the rewrite is too long. Since we say Google doc above I don't think it's necessary + the length of that line makes me slightly more confused about it?

@agnesstenlund
Copy link
Collaborator

agnesstenlund commented Feb 20, 2024

could do something like "Paste your Google doc link here. It needs to be public or shared with eaforum.posts@gmail.com" if we think we should mention google doc

@Will-Howard
Copy link
Collaborator Author

could do something like "Paste your Google doc link here. It needs to be public or shared with eaforum.posts@gmail.com" if we think we should mention google doc

Eh 🤷‍♂️, I think it's hopefully obvious from the placeholder and button text

Comment on lines +173 to +174
const googleClientId = googleClientIdSetting.get();
const googleOAuthSecret = googleOAuthSecretSetting.get();
Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've set up a dev and prod client in the google cloud console (see here)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think I know what you did here, but @Will-Howard, can you elaborate on the kind of credentials you set up?

The part I need to figure out is the consent screen, I think. How did that work for you?

@Will-Howard
Copy link
Collaborator Author

@oetherington btw the error in the console is a (non-serious) bug in the google api library, for which there is a fix that will be merged soon

@jpaddison3
Copy link
Collaborator

I love that when you link to that fix it puts a backlink in the referenced issue. Like, "we're waiting on you bro".

Copy link
Collaborator

@oetherington oetherington left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM.

I left quite a few comments but I don't think anything is necessarily blocking, except maybe admin gating the auth routes (if I'm understanding correctly how those are supposed to work).

const clientIdSetting = new DatabaseServerSetting<string | null>('googleDocImport.oAuth.clientId', null)
const clientSecretSetting = new DatabaseServerSetting<string | null>('googleDocImport.oAuth.secret', null)

let _oAuthClient: OAuth2Client | null = null;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think a preceding underscore is the correct naming convention for globals. Generally you'd use OAUTH_CLIENT or g_oAuthClient. A preceding underscore has the special meaning of implying that the variable is unused, and making linters ignore it if that's the case.

@@ -398,7 +478,7 @@ export const addAuthMiddlewares = (addConnectHandler: AddMiddlewareType) => {
});
}

addConnectHandler('/auth/google/callback', (req: Request, res: Response, next: NextFunction) => {
addConnectHandler('/auth/google/callback', (req: AnyBecauseTodo, res: AnyBecauseTodo, next: AnyBecauseTodo) => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this line is an incorrect merge conflict resolution

return (
<div className={classes.root}>
<Link to={"/admin/googleServiceAccount"}>
{message}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: indentation

@@ -74,6 +74,7 @@ const AdminHome = ({ classes }: {
<li><Link className={classes.link} to="/admin/synonyms">Search Synonyms</Link></li>
<li><Link className={classes.link} to="/admin/tagMerge">{taggingNameCapitalSetting.get()} Merging Tool</Link></li>
<li><span className={classes.link} onClick={refreshDbSettings}>Refresh DB Settings</span></li>
<li><Link className={classes.link} to="/admin/googleServiceAccount">Google Doc import service account</Link></li>
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the <Loading /> on the line below corresponse to the "Refresh DB Settings" option above, so this probably shouldn't go in between them (it's a bit janky but, hey, it's an admin page).

@@ -209,6 +209,7 @@
"express-session": "^1.17.1",
"feedparser-promised": "^1.2.2",
"fp-ts": "^2.13.1",
"googleapis": "^133.0.0",
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

WRT the error in the package.json for this library, it might be worth making a manual patch (which should be very straight forward). The issue has existed for 6 weeks and the PR to fix it has already been open for 3 weeks, and I have less than 0 faith in Google being competant enough to actually fix it in a timely manner.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll give them a couple more weeks, but I'll set a reminder with a deadline so we don't just get stuck with this forever

@@ -20,6 +20,9 @@ import { Request, Response, NextFunction, json } from "express";
import { AUTH0_SCOPE, loginAuth0User, signupAuth0User } from './authentication/auth0';
import { IdFromProfile, UserDataFromProfile, getOrCreateForumUser } from './authentication/getOrCreateForumUser';
import { promisify } from 'util';
import { OAuth2Client } from 'google-auth-library';
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given that this file already has multiple oauth clients, maybe this should be OAuth2Client as GoogleOAuth2Client

const callbackUrl = "google_oauth2callback"
const oauth2Client = new OAuth2Client(googleClientId, googleOAuthSecret, `${getSiteUrl()}${callbackUrl}`);

addConnectHandler('/auth/linkgdrive', (req: Request, res: Response) => {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should these two handlers be admin gated?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes they should! Great catch

/**
* Convert the footnotes we get in google doc html to a format ckeditor can understand. This is mirroring the logic
* in the footnotes plugin (see e.g. public/lesswrong-editor/src/ckeditor5-footnote/src/footnoteEditing/googleDocsFootnotesNormalizer.js)
*/
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we put a comment in that JS file as well so if it gets updated those updates are more likely to be copied here too?

return null;
}

return await canAccessGoogleDoc(fileUrl)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: superfluous async/await

const { html: withRehostedImages } = await convertImagesInHTML(html, postId, url => url.includes("googleusercontent"))
const withConvertedFootnotes = googleDocConvertFootnotes(withRehostedImages)
const withNormalisedRedirects = googleDocRemoveRedirects(withConvertedFootnotes)
const ckEditorMarkup = await dataToCkEditor(withNormalisedRedirects, "html")
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of import issues I noticed that probably need to be handled here:

  • The document title got imported as normal text at the top of the document, but maybe we should use it to set the post title and remove it from the document body?
  • Bold, italics and underline don't work
  • Internal links still point to the google doc, maybe we can fix this now or is it still difficult?

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The document title got imported as normal text at the top of the document, but maybe we should use it to set the post title and remove it from the document body?

When creating a new post it already sets the actual title of the doc as the title, I guess it will often be repeated in the main body but it doesn't seem that costly for people to remove it manually

Bold, italics and underline don't work

Thanks, will fix. If it's not completely straightforward I might do it as a followup (but still today)

Internal links still point to the google doc, maybe we can fix this now or is it still difficult?

We can fix that now, I will do this one as a followup, probably also today

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok I did the bold/italics one, and made a ticket for internal links

@Will-Howard Will-Howard changed the base branch from wh-form-group-refactor-2024-02 to master February 29, 2024 12:46
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

5 participants