Encountered a random NPE when playing with bookclubs. Test case demonstrating the problem is attached.
Threw in a bunch of minor tweaks to logs to make life of the person debugging code more pleasant.
Attaching or modifying a visualizations returns early on, to avoid a situation when a background job is stalled (by other jobs) and eventually the request timeouts.
This has an unfortunate consequence that any error reported in the `UpsertVisualizationJob` cannot be reported as a directly reply to a request because the sender has already been removed from the list.
Added more logs to discover why we get errors in the first place.
Modified the API a bit so that we carry `VisualizationContext` instead of three parameters all over the place.
Bonus:
Modified `JsonRpcServerTestKit` to implicitly require a position so that we get better error reporting on failures.
Fixes a random crash (*) during instrumentation. Notice how `onTailCallReturn` calls `onReturnValue` with `null` frame.
Bonus: noticed that for some reason we weren't getting logs for `ExecutionService`. This turned out to be the problem with the logger name which by default was `[enso]` not
`[enso.org.enso.interpreter.service.ExecutionService`] and there is some logic there that normalizes the name and assumed a dot after `enso`. This change fixes the logic.
(*)
```
[enso.org.enso.interpreter.service.ExecutionService] Execution of function main failed (Cannot invoke "com.oracle.truffle.api.frame.VirtualFrame.materialize()" because "frame" is null).
java.lang.NullPointerException: Cannot invoke "com.oracle.truffle.api.frame.VirtualFrame.materialize()" because "frame" is null
at org.enso.interpreter.instrument.IdExecutionInstrument$IdEventNodeFactory$IdExecutionEventNode.onReturnValue(IdExecutionInstrument.java:246)
at org.enso.interpreter.instrument.IdExecutionInstrument$IdEventNodeFactory$IdExecutionEventNode.onTailCallReturn(IdExecutionInstrument.java:274)
at org.enso.interpreter.instrument.IdExecutionInstrument$IdEventNodeFactory$IdExecutionEventNode.onReturnExceptional(IdExecutionInstrument.java:258)
at org.graalvm.truffle/com.oracle.truffle.api.instrumentation.ProbeNode$EventProviderChainNode.innerOnReturnExceptional(ProbeNode.java:1395)
at org.graalvm.truffle/com.oracle.truffle.api.instrumentation.ProbeNode$EventChainNode.onReturnExceptional(ProbeNode.java:1031)
at org.graalvm.truffle/com.oracle.truffle.api.instrumentation.ProbeNode.onReturnExceptionalOrUnwind(ProbeNode.java:296)
at org.enso.interpreter.node.ExpressionNodeWrapper.executeGeneric(ExpressionNodeWrapper.java:119)
at org.enso.interpreter.node.ClosureRootNode.execute(ClosureRootNode.java:85)
at jdk.internal.vm.compiler/org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.executeRootNode(OptimizedCallTarget.java:718)
at jdk.internal.vm.compiler/org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.profiledPERoot(OptimizedCallTarget.java:641)
at jdk.internal.vm.compiler/org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callBoundary(OptimizedCallTarget.java:574)
at jdk.internal.vm.compiler/org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.doInvoke(OptimizedCallTarget.java:558)
at jdk.internal.vm.compiler/org.graalvm.compiler.truffle.runtime.OptimizedCallTarget.callDirect(OptimizedCallTarget.java:504)
at jdk.internal.vm.compiler/org.graalvm.compiler.truffle.runtime.OptimizedDirectCallNode.call(OptimizedDirectCallNode.java:69)
at org.enso.interpreter.node.callable.thunk.ThunkExecutorNode.doCached(ThunkExecutorNode.java:69)
at org.enso.interpreter.node.callable.thunk.ThunkExecutorNodeGen.executeAndSpecialize(ThunkExecutorNodeGen.java:207)
at org.enso.interpreter.node.callable.thunk.ThunkExecutorNodeGen.executeThunk(ThunkExecutorNodeGen.java:167)
...
```
# Important Notes
Fixes regressions introduced in #8148 and #8162
* OpenFileCmd sends a reply when finished
Lack of reply and therefore a non-determinism on when OpenFile handler
can finish, led to some sporadic instability. Once caches format got
changed, things were taking such a long time that I wasn't able to start
even a basic project (requests would start to timeout).
The change also removes PushContextCmd from synchronous cmds (as
introduced in #798 to remove initialization problems in tests as well as
in real scenarions); the change did the job but was also a bit
controversial.
The change can also help with randomly failing applies (#8174) as IDE kept
closing and opening the project that might have exploited the
race-condition.
* Adapt tests
* Make tests more resilient to out of order messages
* Drop retries that lead to confusing errors
* less random failures
* s/OpenFileNotification/OpenFileRequest
Fixes#8186 by turning `IllegalStateException` into log message. Re-assigning of `BindingsMap` can happen in the IDE where evaluation of modules is repeated again and again. In addition to that avoid dropping errors in compiler without them being noticed.
Towards reduced reliance on Scala semantics.
Translated IR.scala to IR.java and extracted implicits that now need to be imported explicitly.
# Important Notes
1:1 translation. For now `@Identifier` and `@ExternalID` represent the old type aliases but are not verified at compile time.
This is because in a mixed Scala/Java world this seems impossible to employ such frameworks as Checker.
close#8033
Changelog:
- update: run language server initialization once
- fix: issues with async `getSuggestionDatabase` message handling in new IDE
- update: implement unique background jobs
- refactor: initialization logic to Java
- refactor: `UniqueJob` to a marker interface
Unmasked position info for the failed edits so that it is possible to understand what edits are attempted at all. So far unable to locate the problem based solely on the logs and the stacktrace. Added a test confirming that making edits at the end of the file continues to be fine (that's where the problem usually manifests itself).
# Important Notes
References #7978
`AmbgiuousImport` error and `DuplicatedImport` warning must not have `Source` as one of its fields or compiler will crash during a serialization attempt.
Changed the implementation to provide it as a parameter to the `message` method instead.
Fixes#8089
close#7819
Warnings
```
Execution of visualization [...] on value [...] of [Panic] failed.
```
are produced when you are typing an incomplete expression in the component browser and can be misleading.
It seems that Runtime Connector wasn't respecting the protocol it defined itself. The connector should be waiting on the `Api.InitializedNotification` message and only then start forwarding messages. So far it seems this hasn't been a problem, or at least wasn't reported as such, because initialization was fast enough.
Modified `Handler` so that we are certain that its fields hold initialized values when being accessed by different threads.
Should fix problems mentioned in #7898.
It looks like visualization commands had required context lock unnecessairly. Context manager methods are synchronized and therefore should not need the lock before submitting a correspodning background job.
Additionally, the presence of a context lock leads a deadlock:
1. Consider a long running execution of a program that does not finish within the 5 seconds
2. In the meantime there comes either an `AttachVisualization` or `DetachVisualization` request from the client
The latter will get stuck and eventually timeout out because it cannot acquire the lock withing the required time limits. It is still possible that a single long-running `ExecuteJob` might block other ones (including visualization ones) but that's a separate work.
Fixes some issues present in #7941.
# Important Notes
We need to still investigate `ExecuteJob`s need for context lock which might delay the execution of other background jobs that require it as well (mostly concerned about visualization).
close#7608
Changelog:
- update: log separately the evaluation of visualization expression and its arguments
- update: add visualization expression, its arguments, and the value type to the log
# Important Notes
Example
```
[TRACE] [2023-09-06T20:41:45+03:00] [enso] Executing visualization [VisualizationConfiguration(d195fdd8-d4e8-400f-a0d4-e50417eddd0a,ModuleMethod(MethodPointer(Standard.Visualization.Table.Visualization,Standard.Visualization.Table.Visualization,prepare_visualization),Vector(1000)),local.New_Project_1.Main)] on expression [ddc060df-9b59-48e5-bc61-aca849347343] of [class org.enso.interpreter.runtime.data.vector.Vector$Generic]...
```
Refactoring deeply nested IR classes to shallow nesting.
Fixes#7017
# Important Notes
It's a big PR but fairly consistent. Split multiple hierarchy levels into subpackages.
close#7604
After moving the rename action to the dashboard, IDE is unaware of the new project name. PR implements a new `refactoring/projectRenamed` notification that is sent from the server to clients and informs them about the changed project name.
# Important Notes
https://github.com/enso-org/enso/assets/357683/7c62726d-217e-4e69-8e48-568e0b7b8c34
Seeing plenty of
```
java.lang.NullPointerException: Some(Null receiver values are not supported by libraries.)
at org.graalvm.truffle/com.oracle.truffle.api.library.LibraryFactory.dispatch(LibraryFactory.java:528)
at org.graalvm.truffle/com.oracle.truffle.api.library.LibraryFactory.getUncached(LibraryFactory.java:396)
at org.enso.interpreter.runtime.error.WarningsLibraryGen$UncachedDispatch.hasWarnings(WarningsLibraryGen.java:440)
at org.enso.interpreter.instrument.job.ProgramExecutionSupport$.sendExpressionUpdate(ProgramExecutionSupport.scala:366)
at org.enso.interpreter.instrument.job.ProgramExecutionSupport$.$anonfun$executeProgram$1(ProgramExecutionSupport.scala:62)
at org.enso.interpreter.instrument.job.ProgramExecutionSupport$.$anonfun$executeProgram$10(ProgramExecutionSupport.scala:151)
at java.base/java.lang.Iterable.forEach(Iterable.java:75)
at org.enso.interpreter.instrument.job.ProgramExecutionSupport$.executeProgram(ProgramExecutionSupport.scala:139)
at org.enso.interpreter.instrument.job.ProgramExecutionSupport$.$anonfun$runProgram$3(ProgramExecutionSupport.scala:217)
```
during execution of simple programs.
Added a guard to prevent us from sending expression updates when dealing with nulls.
close#7206
fixes
```
[error] [2023-07-04T08:43:55.015Z] [akka.actor.OneForOneStrategy] Cannot invoke "org.enso.interpreter.instrument.execution.CommandProcessor.stop()" because the return value of "org.enso.interpreter.instrument.Handler.commandProcessor()" is null
java.lang.NullPointerException: Some(Cannot invoke "org.enso.interpreter.instrument.execution.CommandProcessor.stop()" because the return value of "org.enso.interpreter.instrument.Handler.commandProcessor()" is null)
at org.enso.interpreter.instrument.Handler.onMessage(Handler.scala:119)
at org.enso.interpreter.instrument.Endpoint.$anonfun$sendBinary$1(Handler.scala:66)
at org.enso.interpreter.instrument.Endpoint.$anonfun$sendBinary$1$adapted(Handler.scala:64)
at scala.Option.foreach(Option.scala:437)
at org.enso.interpreter.instrument.Endpoint.sendBinary(Handler.scala:64)
at org.graalvm.truffle/com.oracle.truffle.api.instrumentation.TruffleInstrument$Env$MessageTransportProxy$MessageEndpointProxy.sendBinary(TruffleInstrument.java:1052)
at org.enso.languageserver.runtime.RuntimeConnector$$anonfun$initialized$1.applyOrElse(RuntimeConnector.scala:79)
at scala.runtime.AbstractPartialFunction.apply(AbstractPartialFunction.scala:35)
at org.enso.logger.akka.ActorLoggingReceive.apply(ActorLoggingReceive.scala:35)
at org.enso.logger.akka.ActorLoggingReceive.apply(ActorLoggingReceive.scala:14)
at scala.PartialFunction.applyOrElse(PartialFunction.scala:214)
at scala.PartialFunction.applyOrElse$(PartialFunction.scala:213)
at org.enso.logger.akka.ActorLoggingReceive.applyOrElse(ActorLoggingReceive.scala:14)
at scala.PartialFunction$Combined.applyOrElse(PartialFunction.scala:305)
at akka.actor.Actor.aroundReceive(Actor.scala:537)
at akka.actor.Actor.aroundReceive$(Actor.scala:535)
at org.enso.languageserver.runtime.RuntimeConnector.aroundReceive(RuntimeConnector.scala:20)
at akka.actor.ActorCell.receiveMessage(ActorCell.scala:579)
at akka.actor.ActorCell.invoke(ActorCell.scala:547)
at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270)
at akka.dispatch.Mailbox.run(Mailbox.scala:231)
at akka.dispatch.Mailbox.exec(Mailbox.scala:243)
at java.base/java.util.concurrent.ForkJoinTask.doExec(ForkJoinTask.java:373)
at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(ForkJoinPool.java:1182)
```
Package's config information, once loaded, never changed. While there is typically no need for it, this was problematic when the config became out-of-sync with the filesystem, like in the case of project rename action.
In rename, the config's properties would be updated in the FS, but that would never be reflected in module's package. Therefore further compilations would continue to ask for the old namespace.
Most of the changes are cosmetic (s/`.config`/`.getConfig()`) except for the new `reloadConfig` method on `Package` that is being called in `RenameProjectCmd` handler.
Closes#7062.
# Important Notes
The reported `ExecutionFailed` error should have been mostly fixed already via #7143. This change makes sure that all the related warnings are gone as well and the compiler uses the updated namespace.
`executionFailed` instead is sent when an evaulation finishes with a a critical failure or a non-critical error.
The PR tries to miniminally modify the change in the messages exchange so as to avoid a major redesign at this point.
Closes#7002.
# Important Notes
Unblocks IDE which will need to modify to this new setup.
Fixes#6955 by:
- using `visualisationModule` to specify the module where the visualization is to be used
- referring to method in `Meta.get_annotation` with `.method_name` - e.g. unresolved symbol notation
- evaluating arguments to `Meta.get_annotation` in the context of the user module (which can access the extension functions)
Request Timeouts started plaguing IDE due to numerous `executionContext/***Visualization` requests. While caused by a bug they revealed a bigger problem in the Language Server when serving large amounts of requests:
1) Long and short lived jobs are fighting for various locks. Lock contention leads to some jobs waiting for a longer than desired leading to unexpected request timeouts. Increasing timeout value is just delaying the problem.
2) Requests coming from IDE are served almost instantly and handled by various commands. Commands can issue further jobs that serve request. We apparently have and always had a single-thread thread pool for serving such jobs, leading to immediate thread starvation.
Both reasons increase the chances of Request Timeouts when dealing with a large number of requests. For 2) I noticed that while we used to set the `enso-runtime-server.jobParallelism` option descriptor key to some machine-dependent value (most likely > 1), the value set would **only** be available for instrumentation. `JobExecutionEngine` where it is actually used would always get the default, i.e. a single-threaded ThreadPool. This means that this option descriptor was simply misused since its introduction. Moved that option to runtime options so that it can be set and retrieved during normal operation.
Adding parallelism intensified problem 1), because now we could execute multiple jobs and they would compete for resources. It also revealed a scenario for a yet another deadlock scenario, due to invalid order of lock acquisition. See `ExecuteJob` vs `UpsertVisualisationJob` order for details.
Still, a number of requests would continue to randomly timeout due to lock contention. It became apparent that
`Attach/Modify/Detach-VisualisationCmd` should not wait until a triggered `UpsertVisualisationJob` sends a response to the client; long and short lived jobs will always compete for resources and we cannot guarantee that they will not timeout that way. That is why the response is sent immediately from the command handler and not from the job executed after it.
This brings another problematic scenario:
1. `AttachVisualisationCmd` is executed, response sent to the client, `UpsertVisualisationJob` scheduled.
2. In the meantime `ModifyVisualisationCmd` comes and fails; command cannot find the visualization that will only be added by `UpsertVisualisationJob`, which might have not yet been scheduled to run.
Remedied that by checking visualisation-related jobs that are still in progress. It also allowed for cancelling jobs which results wouldn't be used anyway (`ModifyVisualisationCmd` sends its own `UpsertVisualisationJob`). This is not a theoretical scenario, it happened frequently on IDE startup.
This change does not fully solve the rather problematic setup of numerous locks, which are requested by short and long lived jobs. A better design should still be investigated. But it significantly reduces the chances of Request Timeouts which IDE had to deal with.
With this change I haven't been able to experience Request Timeouts for relatively modest projects anymore.
I added the possibility of logging wait times for locks to better investigate further problems.
Closes#7005
There was an inherent race condition between edit, close & open commands which could not be prevented solely using locks. `EditFileCmd` triggered `EnsureCompiledJob` which was applying edits collected over time. At the same `CloseFileCmd` and `OpenFileCmd` were executed asynchronously and required locks on compilation unit and file lock.
Additionally, open file was resetting the module's runtime source irrespective of any edits that could already have been applied with the asynchronous execution in `EnsureCompiledJob`. This was visible especially during early manipulation of the project when open/close was performed due to a bug in IDE (#6843).
Now commands can be run either synchronously or asynchronously. Only that way can we ensure that `close` & `open` commands finish by the time any editions are being applied to module's sources.
Closes#6841.
# Important Notes
In the given video, `"foo"` would be greyed out because it would never be part of the module's (runtime) sources. Therefore no IR would be generated for it or instrumentation, meaning it would be present in `expressionUpdates` information necessary for IDE.
[Kazam_screencast_00014.webm](https://github.com/enso-org/enso/assets/292128/226a17b8-729a-415a-803f-003a9695b2f1)