Desktop Automation (DA)

Analyze, recognize, and control mouse and keyboard inputs to automate processes in desktop applications

Desktop Automation (DA) is the process of training EMILY to analyze, recognize, and control mouse and keyboard inputs in order to automate processes in desktop applications. The screen settings (e.g., resolution, color depth, text scaling) of the computer where DA skills are deployed should match those of the computer used for training. It is recommended to train skills directly on the computer where they will be deployed, and to avoid touching the mouse and keyboard while a skill is running.
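
Because a DA skill depends on the screen looking the same as it did during training, a skill could check the screen size before doing anything else. The sketch below is one way to do that with `api.screen.size()` from the Low-Code API. It assumes the call resolves to an object with numeric `width` and `height` fields, which this page does not confirm. `TRAINED_SIZE`, `sameScreenSize`, and `assertTrainedScreen` are hypothetical names, and 1920x1080 is only a placeholder. Text scaling and color depth are not checked here.

```javascript
// Screen size recorded on the training computer (placeholder values).
const TRAINED_SIZE = { width: 1920, height: 1080 }

// Returns true when both sizes have the same width and height.
function sameScreenSize(trained, current) {
  return Boolean(current) &&
    trained.width === current.width &&
    trained.height === current.height
}

// Throws before any mouse or keyboard action if the screen size differs.
// `api` is the object provided to Low-Code scripts.
async function assertTrainedScreen(api, trained = TRAINED_SIZE) {
  // Assumption: size() resolves to { width, height }.
  const current = await api.screen.size()
  if (!sameScreenSize(trained, current)) {
    throw new Error(
      `Screen size ${JSON.stringify(current)} differs from training size ${JSON.stringify(trained)}`
    )
  }
  return current
}
```

In a skill, `await assertTrainedScreen(api)` could be the first line of the script, so that a mismatch stops the run before any clicks are sent.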

Training Interface

Clicking "OPEN TRAINER UI" launches the training mode panel, where you design DA processes directly on your local machine. Alternatively, you can open the Trainer URL from another computer on the same network and design the process remotely.

When the training mode panel appears in the upper right corner of the screen, it is recommended to minimize the EMILY main window and the WAP window. This frees up screen space for the training mode.

The training mode panel has three modes: Live, No-Code, and Low-Code. Live mode is for manually adjusting the desktop or applications. In No-Code mode, you use the mouse to bring up a menu and select an instruction for each workflow step; the selected instructions are written into the code editor area of Low-Code mode. Low-Code mode lets you manually adjust the code generated in No-Code mode for maximum design flexibility.

In Live mode, you can freely adjust the desktop to the state it should be in at the beginning of the workflow. Then, click the snapshot button to take a screen snapshot for No-Code mode. You can also press Ctrl/Cmd+0 to take a snapshot immediately, or Ctrl/Cmd+5 to take a snapshot after a 5-second delay.

In No-Code mode, you can single-click with the mouse to bring up a menu and choose the step command you want to generate, such as OPEN FILE, INPUT TEXT, SEND KEYS, SLEEP, COPY XY, CLICK LEFT, CLICK RIGHT, and more. For the OPEN FILE command, you can either double-click a specified file with the mouse to launch the associated application or choose to start an application by directly executing its executable file.
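
To give a sense of what ends up in the code editor, the sketch below writes a few of the step commands above by hand, using calls from the Low-Code API section. The mapping from each No-Code command to a specific call is an assumption, not a description of what EMILY actually generates. `notes.txt` is a hypothetical file in the workspace, and `openAndType` is a hypothetical wrapper used only so the steps can be run against any `api` object.

```javascript
// Hand-written equivalent of four steps: OPEN FILE, SLEEP, INPUT TEXT, SEND KEYS.
// `api` is the object provided to Low-Code scripts.
async function openAndType(api) {
  // OPEN FILE: open a file with its associated application
  await api.shell.openPath('notes.txt')
  // SLEEP: give the application two seconds to start
  await api.sleep(2000)
  // INPUT TEXT: type into the focused window
  await api.keyboard.type('hello')
  // SEND KEYS: confirm with ENTER
  await api.keyboard.enter()
}
```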

Once the step command is generated in No-Code mode, a reminder will appear as shown in the figure below to indicate that the command has been generated:

You can switch to Low-Code mode at this point to review and adjust the generated code:

In any mode, clicking the ▶︎ execute button on the training mode panel automatically switches EMILY to Live mode and starts executing the commands in order. You can stop the execution by clicking the ■ stop button or pressing Shift+ESC.

You can then take another snapshot of the screen, switch back to No-Code mode, and continue adding the next step command. For example, you can drag the mouse across a button on the screen to mark its area, release the mouse to bring up the menu, and select the next step command to add.

As the number of step commands grows, you can run all of them in sequence at any time with the ▶︎ execute button on the training mode panel. You can also select a portion of the code in the code editor and click the RUN SELECTED CODE button above the editor to run only the selected step commands.

After completing all the step commands, make sure to commit the code. Step commands are added to the TRIAL code area, and clicking the Commit button in the training mode panel pushes them to the FINAL area. Commit a second time to push them from the FINAL area back to the DA WAP window.

At this point, clicking the SAVE button completes the DA WAP process section. If you forget to SAVE, you can open the training mode's workspace, where the code committed in the FINAL code editing area is saved in a file named code-final-xxxx.js.

Precautions

  1. On Windows, Windows 10 64-bit or later is required, and the Microsoft VC++ library must be installed in advance.

  2. When a DA skill is running, it takes control of the mouse and keyboard. Any manual mouse or keyboard action during the run can disrupt the automation and lead to errors.

  3. Users should ensure that the input method editor (IME) is in the same state as it was during training. The system may automatically switch the input method between applications, which can lead to incorrect text input.
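
For precaution 3, one possible mitigation is to paste text instead of typing it, since pasted text typically does not pass through IME composition. The sketch below assumes that `api.pasteText` inserts its argument through the clipboard, which this page does not state explicitly, and that the target field accepts pasted input. `enterText` and its `usePaste` option are hypothetical.

```javascript
// Enters text into the focused field.
// With usePaste (the default) the text is pasted; otherwise it is typed key by key,
// in which case the active IME decides which characters appear.
// `api` is the object provided to Low-Code scripts.
async function enterText(api, text, { usePaste = true } = {}) {
  if (usePaste) {
    await api.pasteText(text)
  } else {
    await api.keyboard.type(text)
  }
}
```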

Low-Code API

// Get screen width and height
let size = await api.screen.size()
console.log(size)

// Search for the position most similar to the specified image, optionally limited to a region of the screen
let found1 = await api.screen.find('crop-1234.png', {left:0,top:0,width:600,height:600})
console.log(found1)

// Find all possible positions on the screen that have a similarity of 95% or more to the specified image
let found2 = await api.screen.find('crop-1234.png', {confidence:0.95, all:true})
console.log(found2)

// Wait up to 10 seconds for the position most similar to the specified image to appear, optionally limited to a region of the screen
let found3 = await api.screen.waitFor('crop-5678', 10000, {left:0,top:0,width:600,height:600,confidence:0.95})

// Capture a 100x100 pixel image from the screen, starting at coordinates (0,0)
await api.screen.capture('snapshot.png', 0, 0, 100, 100)

// Find the position most similar to crop-2345 (with a confidence level of 95%), move the mouse cursor there instantly, and click at a relative offset (x, y). Wait for 500ms before proceeding to the next command
await api.clickCrop('crop-2345', x, y, {confidence:0.95, wait:500, instant: true})

// Find the position most similar to crop-3456 (with a confidence level of 95%) and move the mouse cursor to a relative offset (x, y). Wait for 500ms before proceeding to the next command
await api.moveToCrop('crop-3456', x, y, {confidence:0.95, wait:500})

// Move the mouse cursor to (x, y)
await api.mouse.move(x, y)

// Click the left mouse button, then double-click the left button, and then click the right button
await api.mouse.clickLeft()
await api.mouse.doubleClick()
await api.mouse.clickRight()

// Drag the mouse cursor from the current position to (x, y)
await api.mouse.drag(x, y)

// Scroll the mouse wheel upwards/downwards/leftwards/rightwards
await api.mouse.scrollUp(n)
await api.mouse.scrollDown(n)
await api.mouse.scrollLeft(n)
await api.mouse.scrollRight(n)

// Send 'hello' from the keyboard
await api.keyboard.type('hello')

// Press and release the A key
await api.keyboard.press(api.key.A)
await api.keyboard.release(api.key.A)

// Send common keyboard control keys: ENTER/ESC/BACKSPACE/TAB
await api.keyboard.enter()
await api.keyboard.escape()
await api.keyboard.backspace()
await api.keyboard.tab()

// Send the keyboard hotkey ALT+F4
await api.keyboard.type(api.key.LeftAlt, api.key.F4)

// Send keyboard commands: Select All/Copy/Paste
await api.ctrlA()
await api.ctrlC()
await api.ctrlV()

// Paste Text
await api.pasteText('hello')

// Synchronous functions for reading and writing the clipboard
api.clipboard.writeText('hello')
console.log(api.clipboard.readText())

// Read a CSV file using a comma as the delimiter, skipping zero rows before the header
let rows = await api.readCSV('input.csv', ',', 0)
// Write a CSV file with the header columns in the following order: name, age
await api.writeCSV('output.csv', [{name:'Alice',age:20},{name:'Bob',age:25}], ['name','age'])

// Wait for 5 seconds
await api.sleep(5000)

// Open any file in the workspace
await api.shell.openPath('output.csv')
// Open any file
await api.shell.openPath('/Users/emily/Desktop/input.xlsx')
// Open File Explorer/Finder and select a file
await api.shell.showItemInFolder('/Users/emily/Desktop/input.xlsx')
// Move the file to the bin
await api.shell.trashItem('/Users/emily/Desktop/input.xlsx')
// Open a web link
await api.shell.openExternal('https://google.com')

// Run a PowerShell command
let result1 = await api.powerShell('Get-Process')

// Execute a PowerShell PS1 script file
let result2 = await api.powerShell('./myscript.ps1')
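
The calls above can be combined into a data-driven loop. The following sketch reads rows from a CSV file and enters each one into a two-field form. It assumes that `api.readCSV` resolves to an array of objects keyed by the header names (mirroring the input that `api.writeCSV` accepts), that the form moves between fields with TAB, and that ENTER submits it. `crop-name-field` is a hypothetical crop of the first input field, and `fillForm` is a hypothetical name.

```javascript
// Enters each CSV row into a two-field form and returns the number of rows processed.
// Assumption: rows look like [{ name: 'Alice', age: '20' }, ...].
// `api` is the object provided to Low-Code scripts.
async function fillForm(api, csvPath) {
  const rows = await api.readCSV(csvPath, ',', 0)
  for (const row of rows) {
    // Click the first field to focus it, then wait 500 ms
    await api.clickCrop('crop-name-field', 0, 0, { confidence: 0.95, wait: 500 })
    await api.pasteText(String(row.name))
    // Move to the second field
    await api.keyboard.tab()
    await api.pasteText(String(row.age))
    // Submit the form
    await api.keyboard.enter()
  }
  return rows.length
}
```

A call such as `await fillForm(api, 'input.csv')` would process every row in order. Since each row repeats the same click, any failure to find the crop is likely to surface on the first row.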

Keys that api.key provides:

Space LeftAlt RightAlt LeftControl RightControl LeftShift RightShift LeftSuper RightSuper

F1 F2 F3 F4 F5 F6 F7 F8 F9 F10 F11 F12

Num0 Num1 Num2 Num3 Num4 Num5 Num6 Num7 Num8 Num9

A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Grave Minus Equal Backspace LeftBracket RightBracket Backslash Semicolon Quote Return Comma Period Slash

Left Up Right Down

Print Pause Insert Delete Home End PageUp PageDown

NumPad0 NumPad1 NumPad2 NumPad3 NumPad4 NumPad5 NumPad6 NumPad7 NumPad8 NumPad9

Add Subtract Multiply Divide Decimal

CapsLock ScrollLock NumLock
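
Key names from the list above can be combined into hotkeys in the same way as the ALT+F4 example. The helper below is a small sketch that looks up names on `api.key` and fails early on a misspelled name. `sendHotkey` is a hypothetical name, and the sketch assumes that passing several keys to `api.keyboard.type` sends them as one combination, as the ALT+F4 example suggests.

```javascript
// Sends a hotkey given key names such as 'LeftAlt', 'F4'.
// `api` is the object provided to Low-Code scripts.
async function sendHotkey(api, ...names) {
  const keys = names.map((name) => {
    // Reject names that api.key does not provide
    if (!(name in api.key)) {
      throw new Error(`Unknown key name: ${name}`)
    }
    return api.key[name]
  })
  await api.keyboard.type(...keys)
}
```

For example, `await sendHotkey(api, 'LeftAlt', 'F4')` would be equivalent to the ALT+F4 line in the listing above.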
